# Data transformation

Here we transform and clean the datasets we considered most relevant to our objectives (business.parquet and review.parquet, both optimized in an earlier step). We start by importing pandas and warnings.

In [None]:
import pandas as pd
import warnings

We use warnings to suppress warning messages. Note that this call silences every warning for the rest of the session.

In [148]:
warnings.filterwarnings('ignore')

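If silencing every warning globally is too broad, a narrower option is to suppress them only around a specific call. The following is a minimal sketch, not part of the original notebook; `noisy` is a hypothetical helper used purely for illustration.

```python
import warnings


def noisy():
    # Hypothetical helper that emits a warning, for illustration only.
    warnings.warn("example warning", UserWarning)
    return 42


# Warnings are ignored only inside the with-block; the previous
# filter configuration is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    value = noisy()
```
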
We read the file 'business.parquet'.

In [149]:
business = pd.read_parquet('Yelp/business.parquet')

In [150]:
business

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,,93101,34.426679,-119.711197,5.0,7,0,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,,63123,38.551126,-90.335695,3.0,15,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Shipping Centers, Local Services, Notaries, Ma...","{'Friday': '8:0-18:30', 'Monday': '0:0-0:0', '..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,,85711,32.223236,-110.880452,3.5,22,0,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Department Stores, Shopping, Fashion, Home & G...","{'Friday': '8:0-23:0', 'Monday': '8:0-22:0', '..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,CA,19107,39.955505,-75.155564,4.0,80,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Friday': '7:0-21:0', 'Monday': '7:0-20:0', '..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,MO,18054,40.338183,-75.471659,4.5,13,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Brewpubs, Breweries, Food","{'Friday': '12:0-22:0', 'Monday': None, 'Satur..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,3388 Gateway Blvd,Edmonton,IN,T6J 5H2,53.468419,-113.492054,3.0,13,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Nail Salons, Beauty & Spas","{'Friday': '10:0-19:30', 'Monday': '10:0-19:30..."
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,2813 Bransford Ave,Nashville,DE,37204,36.115118,-86.766925,4.0,5,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Pets, Nurseries & Gardening, Pet Stores, Hobby...","{'Friday': '9:30-17:30', 'Monday': '9:30-17:30..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,"6020 E 82nd St, Ste 46",Indianapolis,AB,46250,39.908707,-86.065088,3.5,8,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Shopping, Jewelry, Piercing, Toy Stores, Beaut...",
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,2472 Troy Rd,Edwardsville,AB,62025,38.782351,-89.950558,4.0,24,1,"{'AcceptsInsurance': None, 'AgesAllowed': None...","Fitness/Exercise Equipment, Eyewear & Optician...","{'Friday': '9:0-20:0', 'Monday': '9:0-20:0', '..."


We drop the columns that are irrelevant to the analysis and remove any duplicate rows.

In [151]:
business.drop(['attributes', 'hours'], axis=1, inplace=True)

In [152]:
business.drop_duplicates(inplace=True)

In [153]:
business.drop(['address', 'postal_code', 'is_open'], axis=1, inplace=True)

In [154]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ",Santa Barbara,,34.426679,-119.711197,5.0,7,"Doctors, Traditional Chinese Medicine, Naturop..."
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,Affton,,38.551126,-90.335695,3.0,15,"Shipping Centers, Local Services, Notaries, Ma..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,Tucson,,32.223236,-110.880452,3.5,22,"Department Stores, Shopping, Fashion, Home & G..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,Philadelphia,CA,39.955505,-75.155564,4.0,80,"Restaurants, Food, Bubble Tea, Coffee & Tea, B..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,Green Lane,MO,40.338183,-75.471659,4.5,13,"Brewpubs, Breweries, Food"
...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,Edmonton,IN,53.468419,-113.492054,3.0,13,"Nail Salons, Beauty & Spas"
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,Nashville,DE,36.115118,-86.766925,4.0,5,"Pets, Nurseries & Gardening, Pet Stores, Hobby..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,Indianapolis,AB,39.908707,-86.065088,3.5,8,"Shopping, Jewelry, Piercing, Toy Stores, Beaut..."
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,Edwardsville,AB,38.782351,-89.950558,4.0,24,"Fitness/Exercise Equipment, Eyewear & Optician..."


The first three rows of the 'state' column are null. After checking the cities, only Santa Barbara belongs to one of the five states in scope (California); the other two cities, Tucson and Affton, are in Arizona and Missouri respectively, so they are of no interest. We therefore set the value 'CA' (California) on the first row.

In [155]:
business.at[0,'state'] = 'CA'

In [156]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ",Santa Barbara,CA,34.426679,-119.711197,5.0,7,"Doctors, Traditional Chinese Medicine, Naturop..."
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,Affton,,38.551126,-90.335695,3.0,15,"Shipping Centers, Local Services, Notaries, Ma..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,Tucson,,32.223236,-110.880452,3.5,22,"Department Stores, Shopping, Fashion, Home & G..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,Philadelphia,CA,39.955505,-75.155564,4.0,80,"Restaurants, Food, Bubble Tea, Coffee & Tea, B..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,Green Lane,MO,40.338183,-75.471659,4.5,13,"Brewpubs, Breweries, Food"
...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,Edmonton,IN,53.468419,-113.492054,3.0,13,"Nail Salons, Beauty & Spas"
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,Nashville,DE,36.115118,-86.766925,4.0,5,"Pets, Nurseries & Gardening, Pet Stores, Hobby..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,Indianapolis,AB,39.908707,-86.065088,3.5,8,"Shopping, Jewelry, Piercing, Toy Stores, Beaut..."
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,Edwardsville,AB,38.782351,-89.950558,4.0,24,"Fitness/Exercise Equipment, Eyewear & Optician..."

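The same fix can be expressed as a lookup instead of a hard-coded row index, which may be easier to extend if more null rows appear. This is a sketch on toy data; `city_to_state` is a hypothetical, hand-verified mapping and is not part of the original notebook.

```python
import pandas as pd

# Toy rows mirroring the three null-state rows discussed above.
toy = pd.DataFrame({
    "city": ["Santa Barbara", "Affton", "Tucson", "Philadelphia"],
    "state": [None, None, None, "CA"],
})

# Hypothetical lookup: only cities that were verified by hand.
city_to_state = {"Santa Barbara": "CA"}

# Fill only the rows whose state is missing; existing values stay untouched.
# Cities absent from the lookup remain null.
missing = toy["state"].isna()
toy.loc[missing, "state"] = toy.loc[missing, "city"].map(city_to_state)
```

Rows for cities outside the lookup keep their null state and can then be dropped or inspected separately.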

In [157]:
business.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150346 entries, 0 to 150345
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   business_id   150346 non-null  object 
 1   name          150346 non-null  object 
 2   city          150346 non-null  object 
 3   state         150344 non-null  object 
 4   latitude      150346 non-null  float64
 5   longitude     150346 non-null  float64
 6   stars         150346 non-null  float64
 7   review_count  150346 non-null  int64  
 8   categories    150243 non-null  object 
dtypes: float64(3), int64(1), object(5)
memory usage: 10.3+ MB


Null values remain in the 'categories' column, so we drop those rows: 'categories' is essential because our client is interested in hospitality businesses, and without it we cannot tell whether a business belongs to that sector. Because dropna() is called without a subset, it also removes the two remaining rows whose 'state' is null, which were out of scope anyway.

In [158]:
business.dropna(inplace=True)

In [159]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ",Santa Barbara,CA,34.426679,-119.711197,5.0,7,"Doctors, Traditional Chinese Medicine, Naturop..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,Philadelphia,CA,39.955505,-75.155564,4.0,80,"Restaurants, Food, Bubble Tea, Coffee & Tea, B..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,Green Lane,MO,40.338183,-75.471659,4.5,13,"Brewpubs, Breweries, Food"
5,CF33F8-E6oudUQ46HnavjQ,Sonic Drive-In,Ashland City,AZ,36.269593,-87.058943,2.0,6,"Burgers, Fast Food, Sandwiches, Food, Ice Crea..."
6,n_0UpQx1hsNbnPUSlodU8w,Famous Footwear,Brentwood,PA,38.627695,-90.340465,2.5,13,"Sporting Goods, Fashion, Shoe Stores, Shopping..."
...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,Edmonton,IN,53.468419,-113.492054,3.0,13,"Nail Salons, Beauty & Spas"
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,Nashville,DE,36.115118,-86.766925,4.0,5,"Pets, Nurseries & Gardening, Pet Stores, Hobby..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,Indianapolis,AB,39.908707,-86.065088,3.5,8,"Shopping, Jewelry, Piercing, Toy Stores, Beaut..."
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,Edwardsville,AB,38.782351,-89.950558,4.0,24,"Fitness/Exercise Equipment, Eyewear & Optician..."

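To make explicit which nulls are being removed, `dropna` accepts a `subset` argument. The sketch below, on toy data, contrasts dropping rows with a null in 'categories' only against dropping rows with a null in any column (the behaviour of the call used in the notebook).

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "state": ["CA", None, "FL"],
    "categories": ["Hotels", "Tours", None],
})

# Drop rows only when 'categories' is null: keeps A and B.
only_categories = toy.dropna(subset=["categories"])

# Drop rows with a null in any column: keeps only A.
any_null = toy.dropna()
```
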

Next we keep only the businesses whose 'categories' value contains the string 'Hotel'. Because this is a substring match, it also captures the broader 'Hotels & Travel' category (car rentals, tours and similar), as the output shows.

In [160]:
hotel = business[business['categories'].str.contains('Hotel')]

In [161]:
hotel

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
18,8wGISYjYkE2tSqn3cDMu8A,Nifty Car Rental,Kenner,PA,29.981183,-90.254012,3.5,14,"Automotive, Car Rental, Hotels & Travel, Truck..."
34,w_AMNoI1iG9eay7ncmc67w,River 127,New Orleans,PA,29.951359,-90.064672,3.0,12,"Event Planning & Services, Hotels, Hotels & Tr..."
55,xM6LoUcnpDpMBzXs_7dXAg,Fairfield Inn & Suites,Kennett Square,AB,39.856248,-75.694610,3.0,37,"Hotels, Hotels & Travel, Event Planning & Serv..."
65,uczmbBk5O3tYhGue13dCDg,New Orleans Spirit Tours,New Orleans,IN,29.958431,-90.065173,4.0,38,"Hotels & Travel, Tours, Local Flavor"
67,eYxGFkxo6m3SYGVTh5m2nQ,Big Boyz Toyz Motorcycle Rentals,Tucson,PA,32.250324,-110.903655,4.5,8,"Towing, Hotels & Travel, Automotive, Motorcycl..."
...,...,...,...,...,...,...,...,...,...
150244,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,3.5,6,"Tours, Hotels & Travel, Event Planning & Servi..."
150253,ktZZNOKr3NcRo8hqQnc-FA,B & B Transportation,Tucson,AB,32.132352,-111.000017,3.5,9,"Hotels & Travel, Transportation, Medical Trans..."
150255,sj4kRiUYo3akee0CuUbONw,Extended Stay America - St. Louis - Westport -...,Maryland Heights,TN,38.697319,-90.447389,2.0,18,"Hotels & Travel, Event Planning & Services, Ho..."
150269,2dVJ7R-3JMmu2v4DJYtBbw,Spring Mount Hotel,Schwenksville,AZ,40.275532,-75.456772,2.0,5,"Nightlife, Cafes, Hotels, Bars, Hotels & Trave..."

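If a stricter filter were wanted, one option is to split the category string and test for an exact label. The sketch below uses invented toy rows (names and category strings are hypothetical, not taken from the dataset) and assumes categories are separated by commas, as the displayed values suggest.

```python
import pandas as pd

# Toy data: names and category strings are invented for illustration.
toy = pd.DataFrame({
    "name": ["Lodging A", "Car rental B", "Lodging C"],
    "categories": [
        "Hotels, Hotels & Travel, Event Planning & Services",
        "Automotive, Car Rental, Hotels & Travel",
        "Nightlife, Cafes, Hotels, Bars",
    ],
})

# Substring match, as in the notebook: all three rows match.
substring = toy[toy["categories"].str.contains("Hotel")]

# Exact match: split on commas, strip whitespace, look for the label 'Hotels'.
labels = toy["categories"].str.split(",").apply(
    lambda parts: {part.strip() for part in parts}
)
exact = toy[labels.apply(lambda found: "Hotels" in found)]
```

Whether the broader or the stricter filter is appropriate depends on how the client defines hospitality businesses.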

In [162]:
business = hotel

Now we filter for the five states with the highest tourism inflow (California, Florida, New York, Nevada and Texas).

In [164]:
business = business[business['state'].isin(['CA', 'FL', 'NY', 'NV', 'TX'])]

In [165]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
143,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,4.5,6,"Buses, Transportation, Bus Tours, Hotels & Tra..."
181,ORL4JE6tz3rJxVqkdKfegA,Gaylord Opryland Resort & Convention Center,Nashville,NV,36.211592,-86.694319,3.0,1639,"Venues & Event Spaces, Performing Arts, Arts &..."
246,MyE_zdul_JO-dOHOug4GQQ,Watson Adventures Scavenger Hunts,Philadelphia,FL,40.119713,-75.009710,3.0,8,"Local Flavor, Team Building Activities, Active..."
259,vjLSYNGFkPu4Y5HKoJlzYg,Rancho 777,Reno,FL,39.532347,-119.804255,2.0,5,"Event Planning & Services, Hotels, Hotels & Tr..."
551,1tLrjXG-I9hkOoZxlbwDqw,Olde Black Horse,Norristown,NV,40.108381,-75.318943,4.0,9,"American (Traditional), Bars, Breakfast & Brun..."
...,...,...,...,...,...,...,...,...,...
149956,XGGPXLaa_B2Qc79cq9YPlg,Beach House Inn,Santa Barbara,FL,34.410388,-119.695894,4.5,96,"Apartments, Hotels, Hotels & Travel, Home Serv..."
150046,1pBiQhcwaI_kF3urOdnG5A,Hotel Indigo Nashville,Nashville,NV,36.152989,-86.795709,3.0,39,"Hotels, Shopping, Hotels & Travel, Venues & Ev..."
150098,FvN-rcK9Ly3iK-zPSVKbDA,Tucson Ghost Tour,Tucson,FL,32.177254,-110.795688,5.0,8,"Tours, Hotels & Travel"
150112,2GnAOElY1MqaLZ2NHMerhw,Pelican RV Park,New Orleans,FL,30.008066,-90.021493,2.5,10,"RV Parks, Hotels & Travel"

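The displayed rows pair some cities with several different state codes (Tucson, for instance, appears with 'PA', 'AB' and 'FL'), so a quick sanity check on 'state' may be worthwhile before relying on this filter. The following is a rough sketch on toy data; it only flags candidates for manual review, since city names that legitimately exist in more than one state would be flagged too.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "city": ["Tucson", "Tucson", "Tucson", "Reno", "Reno"],
    "state": ["PA", "AB", "FL", "NV", "NV"],
})

# Number of distinct state codes observed for each city.
states_per_city = toy.groupby("city")["state"].nunique()

# Cities mapped to more than one state deserve a manual look.
suspicious = states_per_city[states_per_city > 1].index.tolist()
```
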

In [166]:
business = business.reset_index(drop=True)

In [167]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,stars,review_count,categories
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,4.5,6,"Buses, Transportation, Bus Tours, Hotels & Tra..."
1,ORL4JE6tz3rJxVqkdKfegA,Gaylord Opryland Resort & Convention Center,Nashville,NV,36.211592,-86.694319,3.0,1639,"Venues & Event Spaces, Performing Arts, Arts &..."
2,MyE_zdul_JO-dOHOug4GQQ,Watson Adventures Scavenger Hunts,Philadelphia,FL,40.119713,-75.009710,3.0,8,"Local Flavor, Team Building Activities, Active..."
3,vjLSYNGFkPu4Y5HKoJlzYg,Rancho 777,Reno,FL,39.532347,-119.804255,2.0,5,"Event Planning & Services, Hotels, Hotels & Tr..."
4,1tLrjXG-I9hkOoZxlbwDqw,Olde Black Horse,Norristown,NV,40.108381,-75.318943,4.0,9,"American (Traditional), Bars, Breakfast & Brun..."
...,...,...,...,...,...,...,...,...,...
1531,XGGPXLaa_B2Qc79cq9YPlg,Beach House Inn,Santa Barbara,FL,34.410388,-119.695894,4.5,96,"Apartments, Hotels, Hotels & Travel, Home Serv..."
1532,1pBiQhcwaI_kF3urOdnG5A,Hotel Indigo Nashville,Nashville,NV,36.152989,-86.795709,3.0,39,"Hotels, Shopping, Hotels & Travel, Venues & Ev..."
1533,FvN-rcK9Ly3iK-zPSVKbDA,Tucson Ghost Tour,Tucson,FL,32.177254,-110.795688,5.0,8,"Tours, Hotels & Travel"
1534,2GnAOElY1MqaLZ2NHMerhw,Pelican RV Park,New Orleans,FL,30.008066,-90.021493,2.5,10,"RV Parks, Hotels & Travel"


We drop the 'stars' and 'review_count' columns because we are going to merge with the 'review.parquet' dataset, which has its own per-review 'stars' column; removing them avoids confusion and column-name clashes.

In [168]:
business.drop(['stars', 'review_count'], axis=1, inplace=True)

In [170]:
business

Unnamed: 0,business_id,name,city,state,latitude,longitude,categories
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra..."
1,ORL4JE6tz3rJxVqkdKfegA,Gaylord Opryland Resort & Convention Center,Nashville,NV,36.211592,-86.694319,"Venues & Event Spaces, Performing Arts, Arts &..."
2,MyE_zdul_JO-dOHOug4GQQ,Watson Adventures Scavenger Hunts,Philadelphia,FL,40.119713,-75.009710,"Local Flavor, Team Building Activities, Active..."
3,vjLSYNGFkPu4Y5HKoJlzYg,Rancho 777,Reno,FL,39.532347,-119.804255,"Event Planning & Services, Hotels, Hotels & Tr..."
4,1tLrjXG-I9hkOoZxlbwDqw,Olde Black Horse,Norristown,NV,40.108381,-75.318943,"American (Traditional), Bars, Breakfast & Brun..."
...,...,...,...,...,...,...,...
1531,XGGPXLaa_B2Qc79cq9YPlg,Beach House Inn,Santa Barbara,FL,34.410388,-119.695894,"Apartments, Hotels, Hotels & Travel, Home Serv..."
1532,1pBiQhcwaI_kF3urOdnG5A,Hotel Indigo Nashville,Nashville,NV,36.152989,-86.795709,"Hotels, Shopping, Hotels & Travel, Venues & Ev..."
1533,FvN-rcK9Ly3iK-zPSVKbDA,Tucson Ghost Tour,Tucson,FL,32.177254,-110.795688,"Tours, Hotels & Travel"
1534,2GnAOElY1MqaLZ2NHMerhw,Pelican RV Park,New Orleans,FL,30.008066,-90.021493,"RV Parks, Hotels & Travel"

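The clash being avoided can be seen on a toy example: when both DataFrames carry a non-key column with the same name, pandas keeps both and appends the default suffixes `_x` and `_y`. The frames below are invented for illustration.

```python
import pandas as pd

# Toy frames: one business-level rating, two review-level ratings.
business_toy = pd.DataFrame({"business_id": ["b1"], "stars": [4.5]})
review_toy = pd.DataFrame({"business_id": ["b1", "b1"], "stars": [5.0, 1.0]})

# Overlapping non-key columns receive the suffixes '_x' and '_y'.
clash = business_toy.merge(review_toy, on="business_id")

# Dropping the business-level column first leaves a single 'stars' column.
clean = business_toy.drop(columns=["stars"]).merge(review_toy, on="business_id")
```
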

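Now we open the file 'review.parquet' and load it into a pandas DataFrame. In this optimized file the 'text' column holds the sentiment score computed earlier rather than the raw review text.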

In [172]:
review = pd.read_parquet('Yelp/review.parquet')

In [173]:
review

Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3.0,0,0,0,0.8597,2018-07-07 22:09:11
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5.0,1,0,1,0.9858,2012-01-03 15:28:18
2,saUsX_uimxRlCVr67Z4Jig,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3.0,0,0,0,0.9201,2014-02-05 20:30:30
3,AqPFMleE6RsU23_auESxiA,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5.0,1,0,1,0.9588,2015-01-04 00:01:03
4,Sx8TMOWLNuJBWer-0pcmoA,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4.0,1,0,1,0.9804,2017-01-14 20:54:15
...,...,...,...,...,...,...,...,...,...
6990275,H0RIamZu0B0Ei0P4aeh3sQ,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5.0,1,2,1,0.1027,2014-12-17 21:45:20
6990276,shTPgbgdwTHSuU67mGCmZQ,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5.0,2,1,2,0.8549,2021-03-31 16:55:10
6990277,YNfNhgZlaaCO5Q_YJR4rEw,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4.0,1,0,0,0.6792,2019-12-30 03:56:30
6990278,i-I4ZOhoX70Nw5H0FwrQUA,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5.0,1,0,0,0.9982,2022-01-19 18:59:27


We drop the irrelevant columns:

In [177]:
review.drop(['useful', 'funny', 'cool'], axis=1, inplace=True)

In [178]:
review

Unnamed: 0,review_id,user_id,business_id,stars,text,date
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3.0,0.8597,2018-07-07 22:09:11
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5.0,0.9858,2012-01-03 15:28:18
2,saUsX_uimxRlCVr67Z4Jig,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3.0,0.9201,2014-02-05 20:30:30
3,AqPFMleE6RsU23_auESxiA,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5.0,0.9588,2015-01-04 00:01:03
4,Sx8TMOWLNuJBWer-0pcmoA,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4.0,0.9804,2017-01-14 20:54:15
...,...,...,...,...,...,...
6990275,H0RIamZu0B0Ei0P4aeh3sQ,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5.0,0.1027,2014-12-17 21:45:20
6990276,shTPgbgdwTHSuU67mGCmZQ,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5.0,0.8549,2021-03-31 16:55:10
6990277,YNfNhgZlaaCO5Q_YJR4rEw,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4.0,0.6792,2019-12-30 03:56:30
6990278,i-I4ZOhoX70Nw5H0FwrQUA,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5.0,0.9982,2022-01-19 18:59:27


Finally, we merge with the already filtered business DataFrame to keep only the reviews of hospitality-related businesses in the five states with the highest tourism inflow.

In [179]:
merge = business.merge(review, on='business_id')

In [180]:
merge

Unnamed: 0,business_id,name,city,state,latitude,longitude,categories,review_id,user_id,stars,text,date
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",Bj6g6pCM5dBBlNO2EwRP6w,NePHFwSt7Hvuc9tbQ6S-aQ,5.0,0.9766,2014-08-21 14:12:08
1,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",uzJfKuY4kNjpIHQJLHg8Gg,eLuM7MT4twNmdAPF8Xxi7Q,5.0,0.8829,2020-06-01 15:11:31
2,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",CSy6BEXqqr-hJfwrUhK3dQ,yFFa4AIe7zM3o9pEzVIB2Q,1.0,-0.9607,2016-07-25 02:50:46
3,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",FQnqH3UDL01DhFPV03ADPw,VCxY2F6Q4ADlaRLTD3m6rA,5.0,0.9606,2019-05-06 20:57:24
4,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",a5u1u1Or4-qmhEQYKzgLZQ,udmsyG7J4Hgl954xVL1hjQ,5.0,0.8595,2021-10-26 21:09:44
...,...,...,...,...,...,...,...,...,...,...,...,...
82920,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",FZPZOACgGia2dONvLXSS7w,qjfMBIZpQT9DDtw_BWCopQ,4.0,0.9836,2016-03-04 01:48:59
82921,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",TBTrB5u5H1CUlUI-GHuu9Q,NA9fkvlaPKoNwbk97eO6bA,1.0,0.8388,2016-02-15 20:22:33
82922,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",_iDSYEvWhjTcw02Rt4WM_A,pCMX9AXtoXLvIldAuLGS3Q,1.0,0.3293,2017-08-25 13:04:27
82923,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",ZRpLZsP7SDIIV3csMdk4gw,SWJtXwHLpl-6CXPPiTCNQQ,5.0,0.9334,2018-02-18 02:18:39

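The merge acts as a filter because `merge` defaults to an inner join: reviews whose 'business_id' is absent from the filtered business table are discarded. The sketch below, on toy data, makes the join type explicit and adds the `validate` argument so pandas raises an error if 'business_id' is not unique on the business side. The frames and ids are invented.

```python
import pandas as pd

business_toy = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "name": ["Lodging A", "Lodging B"],
})
review_toy = pd.DataFrame({
    "business_id": ["b1", "b1", "b3"],
    "review_id": ["r1", "r2", "r3"],
})

# Inner join: b2 (no reviews) and b3 (not in the business table) are dropped.
# validate="one_to_many" checks that each business_id appears once on the left.
merged = business_toy.merge(
    review_toy,
    on="business_id",
    how="inner",
    validate="one_to_many",
)
```
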

We divide the star rating by 5 and add it to the sentiment score we computed with NLTK (the 'text' column). A combined value greater than 1.5 is treated as positive, a value from 1 to 1.5 inclusive as neutral, and a value below 1 as negative.

In [187]:
merge['sent_analysis'] = merge['stars'] / 5 + merge['text']

In [188]:
merge

Unnamed: 0,business_id,name,city,state,latitude,longitude,categories,review_id,user_id,stars,text,date,sent_analysis
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",Bj6g6pCM5dBBlNO2EwRP6w,NePHFwSt7Hvuc9tbQ6S-aQ,5.0,0.9766,2014-08-21 14:12:08,1.9766
1,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",uzJfKuY4kNjpIHQJLHg8Gg,eLuM7MT4twNmdAPF8Xxi7Q,5.0,0.8829,2020-06-01 15:11:31,1.8829
2,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",CSy6BEXqqr-hJfwrUhK3dQ,yFFa4AIe7zM3o9pEzVIB2Q,1.0,-0.9607,2016-07-25 02:50:46,-0.7607
3,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",FQnqH3UDL01DhFPV03ADPw,VCxY2F6Q4ADlaRLTD3m6rA,5.0,0.9606,2019-05-06 20:57:24,1.9606
4,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",a5u1u1Or4-qmhEQYKzgLZQ,udmsyG7J4Hgl954xVL1hjQ,5.0,0.8595,2021-10-26 21:09:44,1.8595
...,...,...,...,...,...,...,...,...,...,...,...,...,...
82920,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",FZPZOACgGia2dONvLXSS7w,qjfMBIZpQT9DDtw_BWCopQ,4.0,0.9836,2016-03-04 01:48:59,1.7836
82921,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",TBTrB5u5H1CUlUI-GHuu9Q,NA9fkvlaPKoNwbk97eO6bA,1.0,0.8388,2016-02-15 20:22:33,1.0388
82922,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",_iDSYEvWhjTcw02Rt4WM_A,pCMX9AXtoXLvIldAuLGS3Q,1.0,0.3293,2017-08-25 13:04:27,0.5293
82923,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",ZRpLZsP7SDIIV3csMdk4gw,SWJtXwHLpl-6CXPPiTCNQQ,5.0,0.9334,2018-02-18 02:18:39,1.9334

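As a worked check of the arithmetic, the sketch below recomputes the combined score for two of the displayed rows and for the extremes of the scale. It assumes star ratings run from 1 to 5 and that the sentiment score lies between -1 and 1, which matches the values shown but is not stated in the notebook; `combined_score` is a hypothetical helper name.

```python
def combined_score(stars: float, sentiment: float) -> float:
    """Star rating scaled to at most 1, plus the sentiment score."""
    return stars / 5 + sentiment


# Extremes under the stated assumptions: roughly -0.8 and 2.0.
lowest = combined_score(1, -1.0)
highest = combined_score(5, 1.0)

# Two rows from the output above.
first_row = combined_score(5.0, 0.9766)    # about 1.9766
third_row = combined_score(1.0, -0.9607)   # about -0.7607
```

Since the combined value can range from about -0.8 to 2, the negative band (below 1) covers most of that range. A 1-star review with a sentiment score of 0.8388, as in row 82921, lands at about 1.0388 and therefore falls in the neutral band.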

Here we assign 2 for positive, 1 for neutral and 0 for negative.

In [189]:
merge['sent_analysis'] = merge['sent_analysis'].apply(lambda x: 2 if x > 1.5 else (1 if x >= 1 else 0))

In [190]:
merge

Unnamed: 0,business_id,name,city,state,latitude,longitude,categories,review_id,user_id,stars,text,date,sent_analysis
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",Bj6g6pCM5dBBlNO2EwRP6w,NePHFwSt7Hvuc9tbQ6S-aQ,5.0,0.9766,2014-08-21 14:12:08,2
1,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",uzJfKuY4kNjpIHQJLHg8Gg,eLuM7MT4twNmdAPF8Xxi7Q,5.0,0.8829,2020-06-01 15:11:31,2
2,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",CSy6BEXqqr-hJfwrUhK3dQ,yFFa4AIe7zM3o9pEzVIB2Q,1.0,-0.9607,2016-07-25 02:50:46,0
3,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",FQnqH3UDL01DhFPV03ADPw,VCxY2F6Q4ADlaRLTD3m6rA,5.0,0.9606,2019-05-06 20:57:24,2
4,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",a5u1u1Or4-qmhEQYKzgLZQ,udmsyG7J4Hgl954xVL1hjQ,5.0,0.8595,2021-10-26 21:09:44,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
82920,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",FZPZOACgGia2dONvLXSS7w,qjfMBIZpQT9DDtw_BWCopQ,4.0,0.9836,2016-03-04 01:48:59,2
82921,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",TBTrB5u5H1CUlUI-GHuu9Q,NA9fkvlaPKoNwbk97eO6bA,1.0,0.8388,2016-02-15 20:22:33,1
82922,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",_iDSYEvWhjTcw02Rt4WM_A,pCMX9AXtoXLvIldAuLGS3Q,1.0,0.3293,2017-08-25 13:04:27,0
82923,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",ZRpLZsP7SDIIV3csMdk4gw,SWJtXwHLpl-6CXPPiTCNQQ,5.0,0.9334,2018-02-18 02:18:39,2

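The same labelling can be written without a row-by-row lambda. The sketch below uses `numpy.select` on a few sample scores, including the boundary values 1.5 and 1.0, and compares the result with the `apply` version used in the notebook. It is an alternative formulation, not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Sample combined scores, including the boundaries 1.5 and 1.0.
scores = pd.Series([1.9766, 1.5, 1.0388, 1.0, 0.5293, -0.7607])

# Row-by-row version, as in the notebook.
by_apply = scores.apply(lambda x: 2 if x > 1.5 else (1 if x >= 1 else 0))

# Vectorized version: conditions are evaluated in order, first match wins.
labels = np.select([scores > 1.5, scores >= 1], [2, 1], default=0)
by_select = pd.Series(labels, index=scores.index)
```

Both versions treat a score of exactly 1.5 as neutral and a score of exactly 1.0 as neutral.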

In [191]:
review = merge

We drop the 'text' column, which holds the sentiment score, since it is no longer needed.

In [193]:
review.drop('text', axis=1, inplace=True)

In [194]:
review

Unnamed: 0,business_id,name,city,state,latitude,longitude,categories,review_id,user_id,stars,date,sent_analysis
0,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",Bj6g6pCM5dBBlNO2EwRP6w,NePHFwSt7Hvuc9tbQ6S-aQ,5.0,2014-08-21 14:12:08,2
1,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",uzJfKuY4kNjpIHQJLHg8Gg,eLuM7MT4twNmdAPF8Xxi7Q,5.0,2020-06-01 15:11:31,2
2,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",CSy6BEXqqr-hJfwrUhK3dQ,yFFa4AIe7zM3o9pEzVIB2Q,1.0,2016-07-25 02:50:46,0
3,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",FQnqH3UDL01DhFPV03ADPw,VCxY2F6Q4ADlaRLTD3m6rA,5.0,2019-05-06 20:57:24,2
4,-aeZuatjCDMV1X4gCTz9Ug,David Thomas Trailways,Philadelphia,FL,40.106409,-74.973937,"Buses, Transportation, Bus Tours, Hotels & Tra...",a5u1u1Or4-qmhEQYKzgLZQ,udmsyG7J4Hgl954xVL1hjQ,5.0,2021-10-26 21:09:44,2
...,...,...,...,...,...,...,...,...,...,...,...,...
82920,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",FZPZOACgGia2dONvLXSS7w,qjfMBIZpQT9DDtw_BWCopQ,4.0,2016-03-04 01:48:59,2
82921,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",TBTrB5u5H1CUlUI-GHuu9Q,NA9fkvlaPKoNwbk97eO6bA,1.0,2016-02-15 20:22:33,1
82922,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",_iDSYEvWhjTcw02Rt4WM_A,pCMX9AXtoXLvIldAuLGS3Q,1.0,2017-08-25 13:04:27,0
82923,JkF0um3dxe-cOBYeergOhQ,St Petersburg Carriages,St. Petersburg,NV,27.775253,-82.632121,"Tours, Hotels & Travel, Event Planning & Servi...",ZRpLZsP7SDIIV3csMdk4gw,SWJtXwHLpl-6CXPPiTCNQQ,5.0,2018-02-18 02:18:39,2


Finally, we save the DataFrame in parquet format.

In [195]:
review.to_parquet('review.parquet', index=False)