## ETL - Extract, Transform and Load

In this stage of the project we carry out the ETL process (extract, transform, load) so that we can start obtaining useful information from the data. It lets us gauge how much information we have, the structures and formats of the data, and the relationships between the datasets we will work with, so that their later use is correct and efficient.

#### Importing libraries:

In [1]:
import pandas as pd
from io import StringIO
from io import BytesIO
import pyarrow.parquet as pq
import io
import pyarrow as pa
import os
import Utilidades as ut

### Business 📊

*We define the file path:*

In [2]:
ruta = '../Yelp/business.pkl' 

*We load the file contents into a DataFrame:*

In [4]:
with open(ruta, "rb") as file:
    df_business = pd.read_pickle(BytesIO(file.read()))

*We display the DataFrame:*

In [5]:
df_business.head(2)

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,...,state.1,postal_code.1,latitude.1,longitude.1,stars.1,review_count.1,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,,93101,34.426679,-119.711197,5.0,7,...,,,,,,,,,,
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,,63123,38.551126,-90.335695,3.0,15,...,,,,,,,,,,


*The columns were duplicated, so we drop the duplicates:*

In [6]:
df_business = df_business.loc[:,~df_business.columns.duplicated()]
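
As a small illustration of how this mask works, the sketch below builds a toy DataFrame with a repeated column label and keeps only the first occurrence of each label. The frame `df_demo` and its values are invented for the example and are not part of the Yelp data.

```python
import pandas as pd

# Toy frame with a repeated column label (values are made up for illustration)
df_demo = pd.DataFrame(
    [[1, "a", None], [2, "b", None]],
    columns=["business_id", "state", "state"],
)

# columns.duplicated() marks every repeat of a label after its first occurrence
mask_duplicated = df_demo.columns.duplicated()

# Keep only the columns that are NOT marked as duplicates
df_demo_clean = df_demo.loc[:, ~mask_duplicated]

print(mask_duplicated.tolist())
print(list(df_demo_clean.columns))
```

Note that this approach keeps the first copy of each label regardless of its contents, so it is only appropriate when the later copies carry no extra information, as appears to be the case in the output above.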

*We verify:*

In [7]:
df_business.head(2)

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."


*We normalize the column names:*

In [8]:
ut.normalizacion_columnas(df_business)

Unnamed: 0,Business_Id,Name,Address,City,State,Postal_Code,Latitude,Longitude,Stars,Review_Count,Is_Open,Attributes,Categories,Hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,CA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,MO,18054,40.338183,-75.471659,4.5,13,1,"{'BusinessAcceptsCreditCards': 'True', 'Wheelc...","Brewpubs, Breweries, Food","{'Wednesday': '14:0-22:0', 'Thursday': '16:0-2..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,3388 Gateway Blvd,Edmonton,IN,T6J 5H2,53.468419,-113.492054,3.0,13,1,"{'ByAppointmentOnly': 'False', 'RestaurantsPri...","Nail Salons, Beauty & Spas","{'Monday': '10:0-19:30', 'Tuesday': '10:0-19:3..."
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,2813 Bransford Ave,Nashville,DE,37204,36.115118,-86.766925,4.0,5,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Pets, Nurseries & Gardening, Pet Stores, Hobby...","{'Monday': '9:30-17:30', 'Tuesday': '9:30-17:3..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,"6020 E 82nd St, Ste 46",Indianapolis,AB,46250,39.908707,-86.065088,3.5,8,1,"{'RestaurantsPriceRange2': '1', 'BusinessAccep...","Shopping, Jewelry, Piercing, Toy Stores, Beaut...",
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,2472 Troy Rd,Edwardsville,AB,62025,38.782351,-89.950558,4.0,24,1,"{'BusinessParking': '{'garage': False, 'street...","Fitness/Exercise Equipment, Eyewear & Optician...","{'Monday': '9:0-20:0', 'Tuesday': '9:0-20:0', ..."
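
The `Utilidades` module is not shown in this notebook, so the exact behavior of `ut.normalizacion_columnas` is unknown. Judging only from the before/after column names (`business_id` becomes `Business_Id`) and from the fact that later cells use `df_business['Name']` without reassignment, a helper along the following lines would produce the same result. The function name `normalizar_columnas_demo` is hypothetical and this is a sketch, not the project's implementation.

```python
import pandas as pd


def normalizar_columnas_demo(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for ut.normalizacion_columnas (not the real helper).

    Renames the columns in place so that each underscore-separated word is
    capitalized, e.g. 'business_id' -> 'Business_Id', and returns the frame.
    """
    df.columns = [
        "_".join(part.capitalize() for part in str(col).split("_"))
        for col in df.columns
    ]
    return df


# Made-up single-row frame using column names that appear in the notebook
df_demo = pd.DataFrame({"business_id": ["x1"], "review_count": [7], "is_open": [1]})
normalizar_columnas_demo(df_demo)
print(list(df_demo.columns))
```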


*We filter the DataFrame down to a smaller one, keeping only the records whose name (column **Name**) contains **McDonald**:*

In [9]:
df_mcdonalds = df_business[df_business['Name'].str.contains('McDonald|Mc Donald|Mcdonald|McDonalds|Mc Donalds|Mcdonalds', case=False, na=False)]
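
Because `case=False` already makes the match case-insensitive, the six alternatives in the pattern above reduce to two spellings: with or without a space after "Mc". The sketch below checks this on a few made-up names; `short_pattern` is an illustrative simplification, not the expression used in the notebook.

```python
import pandas as pd

# Made-up names, including a missing value
names = pd.Series(["McDonald's", "Mc Donald's", "MCDONALDS", "Burger King", None])

long_pattern = "McDonald|Mc Donald|Mcdonald|McDonalds|Mc Donalds|Mcdonalds"
short_pattern = "Mc ?Donald"  # optional space between "Mc" and "Donald"

# na=False treats missing names as non-matches instead of propagating NaN
mask_long = names.str.contains(long_pattern, case=False, na=False)
mask_short = names.str.contains(short_pattern, case=False, na=False)

print(mask_long.tolist())
print(mask_short.tolist())
```

Both masks select the same rows here. Either pattern also matches unrelated businesses whose name merely contains "McDonald", so a further filter on another column may be needed.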

*We inspect the states where the McDonald's branches are located:*

In [10]:
df_mcdonalds['State'].unique()

array(['PA', 'NJ', 'AZ', 'MO', 'FL', 'CA', 'ID', 'IN', 'LA', 'TN', 'DE',
       'NV', 'AB', 'IL'], dtype=object)

In [11]:
df_mcdonalds.head(2)

Unnamed: 0,Business_Id,Name,Address,City,State,Postal_Code,Latitude,Longitude,Stars,Review_Count,Is_Open,Attributes,Categories,Hours
193,yM8LlTInbQH4FwWC97lz6w,McDonald's,1919 S Jefferson,St. Louis,PA,63104,38.612495,-90.221942,1.5,100,1,"{'Alcohol': 'u'none'', 'BikeParking': 'True', ...","Fast Food, Restaurants, Food, Burgers, Coffee ...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
361,aNXw3PkXVt8ANwLyCfcmpg,McDonald's,2333 Welsh Rd,Lansdale,NJ,19446,40.263706,-75.317916,1.5,17,1,"{'WiFi': 'u'free'', 'GoodForKids': 'True', 'Bu...","Restaurants, Fast Food, Burgers, Food, Coffee ...","{'Monday': '6:0-23:0', 'Tuesday': '6:0-23:0', ..."


### Checkin 📊

*We define the file path:*

In [12]:
ruta = '../Yelp/checkin.json'

*We load the file contents into a DataFrame:*

In [13]:
df_checkin = pd.read_json(ruta, lines=True)

*We inspect the structure of the DataFrame:*

In [14]:
df_checkin.head(2)

Unnamed: 0,business_id,date
0,---kPU91CF4Lq2-WlRu9Lw,"2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020..."
1,--0iUa4sNDFiZFrAdIWhZQ,"2010-09-13 21:43:09, 2011-05-04 23:08:15, 2011..."


*We normalize the DataFrame's column names:*

In [15]:
ut.normalizacion_columnas(df_checkin)

Unnamed: 0,Business_Id,Date
0,---kPU91CF4Lq2-WlRu9Lw,"2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020..."
1,--0iUa4sNDFiZFrAdIWhZQ,"2010-09-13 21:43:09, 2011-05-04 23:08:15, 2011..."
2,--30_8IhuyMHbSOcNWd6DQ,"2013-06-14 23:29:17, 2014-08-13 23:20:22"
3,--7PUidqRWpRSpXebiyxTg,"2011-02-15 17:12:00, 2011-07-28 02:46:10, 2012..."
4,--7jw19RH9JKXgFohspgQw,"2014-04-21 20:42:11, 2014-04-28 21:04:46, 2014..."
...,...,...
131925,zznJox6-nmXlGYNWgTDwQQ,"2013-03-23 16:22:47, 2013-04-07 02:03:12, 2013..."
131926,zznZqH9CiAznbkV6fXyHWA,2021-06-12 01:16:12
131927,zzu6_r3DxBJuXcjnOYVdTw,"2011-05-24 01:35:13, 2012-01-01 23:44:33, 2012..."
131928,zzw66H6hVjXQEt0Js3Mo4A,"2016-12-03 23:33:26, 2018-12-02 19:08:45"


### Tip 📊

*We define the file path:*

In [18]:
ruta = '../Yelp/tip.json' 

*We load the file contents into a DataFrame:*

In [19]:
df_tip = pd.read_json(ruta, lines=True)

*We inspect the structure of the DataFrame:*

In [20]:
df_tip.head(2)

Unnamed: 0,user_id,business_id,text,date,compliment_count
0,AGNUgVwnZUey3gcPCJ76iw,3uLgwr0qeCNMjKenHJwPGQ,Avengers time with the ladies.,2012-05-18 02:17:21,0
1,NBN4MgHP9D3cw--SnauTkA,QoezRbYQncpRqyrLH6Iqjg,They have lots of good deserts and tasty cuban...,2013-02-05 18:35:10,0


### Review 📊

*We define the file path:*

In [21]:
ruta = '../review-002.json' 

*We read the **.json** file in chunks of 10,000 records and collect the chunks in a list. We then concatenate the chunks into a single DataFrame and save it in **.parquet** format so that it takes up less space:*

In [22]:
# Empty list that will hold the DataFrame chunks
dataframes = []

# Number of records per chunk
tamano_fragmento = 10000

# Read the file in chunks; each chunk is a DataFrame
with open(ruta, 'r') as file:
    for chunk in pd.read_json(file, lines=True, chunksize=tamano_fragmento):
        dataframes.append(chunk)

# Concatenate the chunks into a single DataFrame
df_final = pd.concat(dataframes, ignore_index=True)

# Save the DataFrame in Parquet format so that it takes up less space
df_final.to_parquet('../review-002.parquet')
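
The chunked-reading pattern can be tried on a small scale without the real file. The sketch below uses an in-memory buffer of five made-up JSON Lines records and a chunk size of 2; the data and the names `json_lines`, `chunks` and `df_demo` are assumptions made for the example.

```python
import io

import pandas as pd

# Five JSON Lines records standing in for the review file (made-up data)
json_lines = "\n".join(
    '{"review_id": "r%d", "stars": %d}' % (i, i % 5 + 1) for i in range(5)
)

chunks = []
# With lines=True and a chunksize, read_json yields one DataFrame per chunk
for chunk in pd.read_json(io.StringIO(json_lines), lines=True, chunksize=2):
    chunks.append(chunk)

# Rebuild a single DataFrame with a fresh 0..n-1 index
df_demo = pd.concat(chunks, ignore_index=True)

print(len(chunks), len(df_demo))
```

Keep in mind that concatenating every chunk still places the full dataset in memory at the end. If only a subset of rows is needed, filtering each chunk before appending it would keep the final DataFrame smaller.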


*We load the file into a DataFrame for later transformation:*

In [23]:
df_reviews = pd.read_parquet('../review-002.parquet')

*We inspect the structure of the DataFrame:*

In [24]:
df_reviews.head(2)

Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3,0,0,0,"If you decide to eat here, just be aware it is...",2018-07-07 22:09:11
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5,1,0,1,I've taken a lot of spin classes over the year...,2012-01-03 15:28:18


*We normalize the DataFrame's column names:*

In [25]:
ut.normalizacion_columnas(df_reviews)

Unnamed: 0,Review_Id,User_Id,Business_Id,Stars,Useful,Funny,Cool,Text,Date
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3,0,0,0,"If you decide to eat here, just be aware it is...",2018-07-07 22:09:11
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5,1,0,1,I've taken a lot of spin classes over the year...,2012-01-03 15:28:18
2,saUsX_uimxRlCVr67Z4Jig,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3,0,0,0,Family diner. Had the buffet. Eclectic assortm...,2014-02-05 20:30:30
3,AqPFMleE6RsU23_auESxiA,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5,1,0,1,"Wow! Yummy, different, delicious. Our favo...",2015-01-04 00:01:03
4,Sx8TMOWLNuJBWer-0pcmoA,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4,1,0,1,Cute interior and owner (?) gave us tour of up...,2017-01-14 20:54:15
...,...,...,...,...,...,...,...,...,...
6990275,H0RIamZu0B0Ei0P4aeh3sQ,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5,1,2,1,Latest addition to services from ICCU is Apple...,2014-12-17 21:45:20
6990276,shTPgbgdwTHSuU67mGCmZQ,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5,2,1,2,"This spot offers a great, affordable east week...",2021-03-31 16:55:10
6990277,YNfNhgZlaaCO5Q_YJR4rEw,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4,1,0,0,This Home Depot won me over when I needed to g...,2019-12-30 03:56:30
6990278,i-I4ZOhoX70Nw5H0FwrQUA,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5,1,0,0,For when I'm feeling like ignoring my calorie-...,2022-01-19 18:59:27


### User 📊

*We define the file path:*

In [26]:
ruta = '../user-001.parquet'

*We read the file into the variable **df_user** (a PyArrow table at this point):*

In [27]:
df_user = pq.read_table(ruta, use_threads=True)

*We convert it to a pandas DataFrame:*

In [28]:
df_user = df_user.to_pandas()

*We inspect the structure of the DataFrame:*

In [29]:
df_user.head(2)

Unnamed: 0,user_id,name,review_count,yelping_since,useful,funny,cool,elite,friends,fans,...,compliment_more,compliment_profile,compliment_cute,compliment_list,compliment_note,compliment_plain,compliment_cool,compliment_funny,compliment_writer,compliment_photos
0,qVc8ODYU5SZjKXVBgXdI7w,Walker,585,2007-01-25 16:47:26,7217,1259,5994,2007,"NSCy54eWehBJyZdG2iE84w, pe42u7DcCH2QmI81NX-8qA...",267,...,65,55,56,18,232,844,467,467,239,180
1,j14WgRoU_-2ZE1aw1dXrJg,Daniel,4333,2009-01-25 04:35:42,43091,13066,27281,"2009,2010,2011,2012,2013,2014,2015,2016,2017,2...","ueRPE0CX75ePGMqOFVj6IQ, 52oH4DrRvzzl8wh5UXyU0A...",3138,...,264,184,157,251,1847,7054,3131,3131,1521,1946


*We normalize the DataFrame's column names:*

In [30]:
ut.normalizacion_columnas(df_user)

Unnamed: 0,User_Id,Name,Review_Count,Yelping_Since,Useful,Funny,Cool,Elite,Friends,Fans,...,Compliment_More,Compliment_Profile,Compliment_Cute,Compliment_List,Compliment_Note,Compliment_Plain,Compliment_Cool,Compliment_Funny,Compliment_Writer,Compliment_Photos
0,qVc8ODYU5SZjKXVBgXdI7w,Walker,585,2007-01-25 16:47:26,7217,1259,5994,2007,"NSCy54eWehBJyZdG2iE84w, pe42u7DcCH2QmI81NX-8qA...",267,...,65,55,56,18,232,844,467,467,239,180
1,j14WgRoU_-2ZE1aw1dXrJg,Daniel,4333,2009-01-25 04:35:42,43091,13066,27281,"2009,2010,2011,2012,2013,2014,2015,2016,2017,2...","ueRPE0CX75ePGMqOFVj6IQ, 52oH4DrRvzzl8wh5UXyU0A...",3138,...,264,184,157,251,1847,7054,3131,3131,1521,1946
2,2WnXYQFK0hXEoTxPtV2zvg,Steph,665,2008-07-25 10:41:00,2086,1010,1003,20092010201120122013,"LuO3Bn4f3rlhyHIaNfTlnA, j9B4XdHUhDfTKVecyWQgyA...",52,...,13,10,17,3,66,96,119,119,35,18
3,SZDeASXq7o05mMNLshsdIA,Gwen,224,2005-11-29 04:38:33,512,330,299,200920102011,"enx1vVPnfdNUdPho6PH_wg, 4wOcvMLtU6a9Lslggq74Vg...",28,...,4,1,6,2,12,16,26,26,10,9
4,hA5lMy-EnncsH4JoR-hFGQ,Karen,79,2007-01-05 19:40:59,29,15,7,,"PBK4q9KEEBHhFvSXCUirIw, 3FWPpM7KU1gXeOM_ZbYMbA...",1,...,1,0,0,0,1,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2105592,4QGxxakRZeOlg_qDuxmTeQ,Jennilee,38,2012-01-19 23:33:02,74,9,6,,kmwNG5LZSHFmveg6wYYdrw,0,...,1,0,0,0,1,4,0,0,1,0
2105593,tmelBbVBGAzXBVfH2u_R6g,Gerry,19,2009-06-09 16:34:54,14,5,2,,"BFYdCAMFyjYHDwesndEXEg, _9fTIqfSJc7g3V_o76XRVg...",1,...,1,0,0,0,0,1,0,0,0,0
2105594,tpBznnD6uJN3m_pJubj09w,Emily,26,2013-08-13 23:18:11,4,1,2,,"bKV3ly2MuK-K1cptMrFknQ, liel18zRoSB4tEkUP7i6Cg...",0,...,0,0,0,0,1,0,0,0,0,0
2105595,Kst_srPw7GdYydMFYdCtzw,Heatheranne,25,2015-01-10 00:06:25,21,2,5,,"dzHTk52vbGtbktRm_B-wEg, fOfFLV7IbBDN6lzARaLqdg...",0,...,0,0,0,0,0,1,0,0,0,0


### Site Metadata 📊

*We define the file path:*

In [31]:
ruta = '../Google Maps/metadata-sitios'

*We read the **.json** files in the **MetadatosSitios** folder of our bucket, keep only the records whose **name** contains **McDonald's**, and combine them into a single DataFrame. We then save the DataFrame locally in Parquet format for later use:*

In [33]:
import os

In [34]:
# Empty list that will hold the filtered DataFrames
dfs = []

# Iterate over the files in the directory
for filename in os.listdir(ruta):

    # If the file is a JSON file, read it and keep the records whose name contains McDonald's
    if filename.endswith('.json'):
        filepath = os.path.join(ruta, filename)
        df = pd.read_json(filepath, lines=True)
        df_filtered = df[df['name'].str.contains(r"\bMcDonald's\b|\bMc Donald's\b", case=False, na=False, regex=True)]
        dfs.append(df_filtered)

# If any DataFrames were collected, concatenate them and save the result in Parquet format
if len(dfs) > 0:
    merged_df = pd.concat(dfs)
    merged_df.reset_index(drop=True, inplace=True)

    # Save the DataFrame in Parquet format
    merged_df.to_parquet('../metadatos-sitios.parquet')

else:
    print("No JSON files matching the condition were found in the directory.")
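
The same read, filter and concatenate loop can be exercised end to end on throwaway files. The sketch below creates a temporary folder with two made-up JSON Lines files and one unrelated file; the file names, records and the shortened pattern `\bMc ?Donald's\b` are assumptions for illustration. Unlike the cell above, it appends a filtered frame only when it has rows, so the final check reflects whether any record matched rather than whether any JSON file was read.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened pattern: optional space after "Mc" (illustrative, not the notebook's)
pattern = r"\bMc ?Donald's\b"

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Two made-up JSON Lines files plus one file that should be ignored
    (folder / "1.json").write_text(
        '{"name": "McDonald\'s", "gmap_id": "a"}\n'
        '{"name": "Taco Place", "gmap_id": "b"}\n'
    )
    (folder / "2.json").write_text('{"name": "Mc Donald\'s", "gmap_id": "c"}\n')
    (folder / "notes.txt").write_text("not a JSON file")

    frames = []
    for filepath in sorted(folder.glob("*.json")):
        df = pd.read_json(filepath, lines=True)
        matches = df[df["name"].str.contains(pattern, case=False, na=False, regex=True)]
        # Only keep frames that actually contain matching rows
        if not matches.empty:
            frames.append(matches)

# Fall back to an empty frame when nothing matched
merged_demo = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

print(merged_demo)
```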



*We read the file generated above into a DataFrame so we can go on to transform it:*

In [37]:
df_sitios = pd.read_parquet('../metadatos-sitios.parquet')

*We normalize its column names:*

In [38]:
ut.normalizacion_columnas(df_sitios)

Unnamed: 0,Name,Address,Gmap_Id,Description,Latitude,Longitude,Category,Avg_Rating,Num_Of_Reviews,Price,Hours,Misc,State,Relative_Results,Url
0,McDonald's,"McDonald's, 1205 S Main St, Manteca, CA 95336",0x80904101ce001281:0x76db23c5d22346ae,"Classic, long-running fast-food chain known fo...",37.785995,-121.218062,"[Fast food restaurant, Breakfast restaurant, C...",2.4,48,$,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x8090410018dc2657:0xed7a807ae3eeee6d, 0x8090...",https://www.google.com/maps/place//data=!4m2!3...
1,McDonald's,"McDonald's, 1000 Palisades Center Dr, West Nya...",0x89c2e9cf8e139235:0x24bfb20e9e09f260,"Classic, long-running fast-food chain known fo...",41.097768,-73.955392,"[Fast food restaurant, Breakfast restaurant, C...",2.2,18,$,,{'Accessibility': ['Wheelchair accessible rest...,,"[0x89c2e9e6ef010ddb:0xe923f7207b70d6f9, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
2,McDonald's,"McDonald's, 341 5th Ave, New York, NY 10016",0x89c259a9b55adb77:0xfe5e87207e736efc,"Classic, long-running fast-food chain known fo...",40.747916,-73.984586,[Fast food restaurant],3.1,16,,,"{'Accessibility': None, 'Amenities': None, 'At...",,"[0x89c259a9b2e6f0b1:0xca9f9eef13b45d33, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
3,McDonald's,"McDonald's, 2400 Aviation Dr, Dallas, TX 75261",0x864c2b8770fc957d:0xdbc6f271dec0dcef,"Classic, long-running fast-food chain known fo...",32.902380,-97.037369,[Fast food restaurant],4.3,4,$,,{'Accessibility': ['Wheelchair accessible entr...,,,https://www.google.com/maps/place//data=!4m2!3...
4,McDonald's,"McDonald's, 571 Walton Blvd, Las Cruces, NM 88001",0x86de3d67b2b54843:0xa54f3893ef44d96,"Classic, long-running fast-food chain known fo...",32.315042,-106.750119,[Fast food restaurant],4.0,5,$,"[[Wednesday, 5AM–11PM], [Thursday, 5AM–11PM], ...","{'Accessibility': None, 'Amenities': ['Good fo...",Permanently closed,"[0x86de17e200933271:0xb244ae1ca1025934, 0x86de...",https://www.google.com/maps/place//data=!4m2!3...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1075,McDonald's,"McDonald's, 6830 Normandy Blvd, Jacksonville, ...",0x88e5b8c338923433:0xd9e7a0e51cb221df,"Classic, long-running fast-food chain known fo...",30.301466,-81.758458,"[Fast food restaurant, Breakfast restaurant, C...",3.0,98,$,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x88e5b8dd48a26893:0xb19a87d6cf5abd5, 0x88e5b...",https://www.google.com/maps/place//data=!4m2!3...
1076,McDonald's,"McDonald's, 427 10th Ave, New York, NY 10001",0x89c259b390583a49:0x4cd01dab5eb15b8,"Classic, long-running fast-food chain known fo...",40.754286,-73.999704,"[Fast food restaurant, American restaurant, Br...",3.0,26,,"[[Saturday, Open 24 hours], [Sunday, Open 24 h...","{'Accessibility': None, 'Amenities': ['Good fo...",Permanently closed,"[0x89c25852fd69fe15:0x111a7d314332dcf3, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
1077,Martin Brower/McDonald's Distribution Center,"Martin Brower/McDonald's Distribution Center, ...",0x88d902d94ebfb5e5:0x6ba01da2209293b8,,26.250990,-80.138780,[Wholesaler],3.7,28,,"[[Saturday, Open 24 hours], [Sunday, Open 24 h...",{'Accessibility': ['Wheelchair accessible entr...,Open 24 hours,"[0x88d91d0c82d408d1:0xb8074e2d66fc6879, 0x88d9...",https://www.google.com/maps/place//data=!4m2!3...
1078,McDonald's,"McDonald's, 2545 Rimrock Ave, Grand Junction, ...",0x87471cfab4899bc1:0xf222ae275fc2025a,"Classic, long-running fast-food chain known fo...",39.076767,-108.581491,"[Fast food restaurant, Breakfast restaurant, C...",3.8,54,$,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x87471ce3b65bba6d:0x5093f43dc1437cdf, 0x8747...",https://www.google.com/maps/place//data=!4m2!3...


*We check the unique values in the **Name** column. Some are other businesses that merely include McDonald's in their name, while others are the restaurants we are interested in but carry a description or location details in the name, so we need a way to filter them:*

In [39]:
df_sitios['Name'].unique()

array(["McDonald's", "McDonald's Studio", "McDonald's Lunch",
       "Mc Donald's Towing of Paw Paw", "Mc Donald's",
       "McDonald's OMG Building", "McDonald's Corporate Office.",
       "Norman McDonald's Country Drive-In",
       "McDonald's Warehouse Corporate office", "McDonald's Cafe & BBQ",
       "Mcdonald's Self Storage", "McDonald's Budget Printing",
       "Peninsula McDonald's Office", "Mc Donald's Kennels",
       "Graviss McDonald's Disc Golf Course", "Mcdonald's Playplace",
       "McDonald's Regional Office", "Mc Donald's RV Park & Car Wash",
       "McDonald's Kennel",
       "Martin Brower L.L.C/McDonald's Distribution Center",
       "Mc Donald's on church ave", "McDonald's - Corporate Office",
       "Mcdonald's Play Area", "Mc Donald's Service Station",
       "Bluemound Rd. at McDonald's",
       "Cabin in the Clouds Christmas Forest (Formerly McDonald's Tree Farm)",
       "McDonald's Transmission Repair", "McDonald's / Ross's",
       "McDonald's HVAC", "McDon

*The elements of the **Category** column are stored as lists. We convert them so that we can apply the necessary transformations to that column:*

In [40]:
df_sitios['Category'].value_counts()

Category
[Fast food restaurant]                                                                                                                    32
[Corporate office]                                                                                                                        11
[Restaurant]                                                                                                                              10
[Dessert shop]                                                                                                                             5
[Bus stop]                                                                                                                                 2
                                                                                                                                          ..
[Fast food restaurant, Breakfast restaurant, Coffee shop, Hamburger restaurant, Restaurant, Sandwich shop]                                 1
[Fas

*First, we change the data type to **str**:*

In [41]:
df_sitios['Category'] = df_sitios['Category'].astype(str)

*Then we go through each element of the column and remove the square brackets and single quotes:*

In [42]:
df_sitios['Category'] = df_sitios['Category'].apply(lambda x: x.replace('[','').replace(']','').replace('\'',''))
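
In the verification output that follows, the categories end up separated only by spaces ("Fast food restaurant Breakfast restaurant ..."), which makes it hard to tell where one category ends and the next begins. An alternative, sketched here on made-up data, is to join the elements of each list-like cell with an explicit separator instead of converting the whole cell to text. The helper `join_categories` is hypothetical and assumes each cell is a list, an array, or missing.

```python
import numpy as np
import pandas as pd

# Made-up cells: an array, a list and a missing value
categories = pd.Series(
    [
        np.array(["Fast food restaurant", "Breakfast restaurant"]),
        ["Corporate office"],
        None,
    ]
)


def join_categories(value, sep=", "):
    """Join a list-like cell into one string; None or float NaN becomes ''."""
    if value is None or isinstance(value, float):
        return ""
    return sep.join(str(item) for item in value)


categories_joined = categories.apply(join_categories)

print(categories_joined.tolist())
```

With a separator in place, the column can still be searched with `str.contains`, and it can be split back into individual categories later if needed.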

*We verify:*

In [43]:
df_sitios.head(2)

Unnamed: 0,Name,Address,Gmap_Id,Description,Latitude,Longitude,Category,Avg_Rating,Num_Of_Reviews,Price,Hours,Misc,State,Relative_Results,Url
0,McDonald's,"McDonald's, 1205 S Main St, Manteca, CA 95336",0x80904101ce001281:0x76db23c5d22346ae,"Classic, long-running fast-food chain known fo...",37.785995,-121.218062,Fast food restaurant Breakfast restaurant Coff...,2.4,48,$,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x8090410018dc2657:0xed7a807ae3eeee6d, 0x8090...",https://www.google.com/maps/place//data=!4m2!3...
1,McDonald's,"McDonald's, 1000 Palisades Center Dr, West Nya...",0x89c2e9cf8e139235:0x24bfb20e9e09f260,"Classic, long-running fast-food chain known fo...",41.097768,-73.955392,Fast food restaurant Breakfast restaurant Coff...,2.2,18,$,,{'Accessibility': ['Wheelchair accessible rest...,,"[0x89c2e9e6ef010ddb:0xe923f7207b70d6f9, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...


*We filter the DataFrame, keeping the records whose **Category** contains "Restaurant", "restaurant", "Fast food" and/or "fast-food":*

In [44]:
df_sitios_ = df_sitios[df_sitios['Category'].str.contains('restaurant|Fast food|Restaurant|fast-food')]

*We examine the records that contain none of the keywords above:*

In [45]:
df_sitios[~df_sitios['Category'].str.contains('restaurant|Fast food|Restaurant|fast-food')]

Unnamed: 0,Name,Address,Gmap_Id,Description,Latitude,Longitude,Category,Avg_Rating,Num_Of_Reviews,Price,Hours,Misc,State,Relative_Results,Url
24,McDonald's Studio,"McDonald's Studio, 141 Bridge Ave E, Delano, M...",0x52b4a9af2ca9aa29:0x20efe9b9990af8b3,,45.04181,-93.788087,Portrait studio,4.9,8,,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x52b4a9105e951a1d:0x1ac4d8fb0264249d, 0x52b4...",https://www.google.com/maps/place//data=!4m2!3...
28,Mc Donald's Towing of Paw Paw,"Mc Donald's Towing of Paw Paw, 39617 W Red Arr...",0x881741d71056f773:0xde1e96ed0a460024,,42.211307,-85.933143,Towing service Auto wrecker,3.2,8,,,,,"[0x88176a4b799092ed:0x10fc20d91d5ecfc9, 0x8817...",https://www.google.com/maps/place//data=!4m2!3...
33,McDonald's OMG Building,"McDonald's OMG Building, 103H-05-016.13, Stark...",0x88813551237acdef:0xbefd10848e7590fc,,33.450522,-88.845755,Corporate office,2.0,2,,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x8881352e623fac9d:0x9ee1e3c04c81e6b0, 0x8881...",https://www.google.com/maps/place//data=!4m2!3...
35,McDonald's Corporate Office.,"McDonald's Corporate Office., 511 E John Carpe...",0x864e835e3ad12929:0x42d2edb1cd7fbad1,,32.860478,-96.934822,Corporate office,1.3,95,,"[[Saturday, Open 24 hours], [Sunday, Open 24 h...",{'Accessibility': ['Wheelchair accessible entr...,Open 24 hours,,https://www.google.com/maps/place//data=!4m2!3...
40,McDonald's Warehouse Corporate office,"McDonald's Warehouse Corporate office, 16097 N...",0x872b75ccc33b9bbd:0x212af1c91857fe50,"Classic, long-running fast-food chain known fo...",33.631859,-111.903212,Corporate office,3.1,8,,"[[Saturday, Closed], [Sunday, Closed], [Monday...",{'Accessibility': ['Wheelchair accessible entr...,Closed ⋅ Opens 8AM Mon,"[0x872b38981deaaaab:0xe20c4449799fb5b9, 0x872b...",https://www.google.com/maps/place//data=!4m2!3...
96,Mcdonald's Self Storage,"Mcdonald's Self Storage, 9509 US-69, Huntingto...",0x863839f97e03a02b:0xe8aa756b8df4bf0e,,31.283903,-94.589142,Self-storage facility,3.6,16,,,{'Accessibility': ['Wheelchair accessible entr...,,"[0x8638375feb1884b9:0x3ccd79d971bf7d10, 0x8638...",https://www.google.com/maps/place//data=!4m2!3...
97,McDonald's Budget Printing,"McDonald's Budget Printing, 2647 Bechelli Ln, ...",0x54d2ecbc2d2583e7:0x7c4e83a1efc41831,,40.568718,-122.361943,Commercial printer,4.5,35,,"[[Wednesday, 8AM–5PM], [Thursday, 8AM–5PM], [F...",,Closed ⋅ Opens 8AM Thu,"[0x54d2ecb78115eb79:0x3084caf3c5ebe29, 0x54d29...",https://www.google.com/maps/place//data=!4m2!3...
98,Peninsula McDonald's Office,"Peninsula McDonald's Office, 9465 Provost Rd N...",0x54903aa212ecc13d:0xd9be99e02cb7f1a3,,47.649122,-122.708039,Payroll service,3.7,3,,"[[Wednesday, 9AM–4PM], [Thursday, 9AM–4PM], [F...",{'Accessibility': ['Wheelchair accessible entr...,Closed ⋅ Opens 9AM Thu,"[0x54903ac1a0c44f39:0x8b2100d0e1037251, 0x5490...",https://www.google.com/maps/place//data=!4m2!3...
111,Mc Donald's Kennels,"Mc Donald's Kennels, 3502 NW Half Mile Rd, Sil...",0x5490252e9337616b:0x48cb2a89612bba3f,,47.680898,-122.696435,Kennel Pet groomer,4.5,57,,"[[Wednesday, 8:30AM–5:30PM], [Thursday, 8:30AM...",{'Accessibility': ['Wheelchair accessible entr...,Closed ⋅ Opens 8:30AM,"[0x549024be85a4ce81:0x6b10b12bd0ba42a7, 0x5490...",https://www.google.com/maps/place//data=!4m2!3...
132,Graviss McDonald's Disc Golf Course,"Graviss McDonald's Disc Golf Course, Versaille...",0x88426fe899ed6003:0x23ac8e623940f45f,,38.049935,-84.763302,Disc golf course,4.6,17,,"[[Wednesday, 6AM–9PM], [Thursday, 6AM–9PM], [F...",,Closed ⋅ Opens 6AM,"[0x88427348ab7754e3:0xbca7895eaf29216e, 0x8842...",https://www.google.com/maps/place//data=!4m2!3...


*These clearly belong to entirely different categories and are not part of the fast-food chain we are interested in.*
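
The two complementary selections above can also be derived from a single boolean mask, which avoids repeating the pattern. The sketch below uses a small made-up frame; the case-insensitive pattern `restaurant|fast[ -]food` is an assumption that is slightly broader than the one in the notebook, since it also accepts other capitalizations.

```python
import pandas as pd

# Made-up rows modeled on the names and categories shown above
df_demo = pd.DataFrame(
    {
        "Name": ["McDonald's", "McDonald's Studio", "McDonald's", "Mc Donald's Kennels"],
        "Category": [
            "Fast food restaurant",
            "Portrait studio",
            "Restaurant",
            "Kennel Pet groomer",
        ],
    }
)

# One case-insensitive mask covers "restaurant", "Restaurant", "Fast food" and "fast-food"
is_restaurant = df_demo["Category"].str.contains(
    r"restaurant|fast[ -]food", case=False, na=False
)

df_restaurants = df_demo[is_restaurant]   # rows to keep
df_others = df_demo[~is_restaurant]       # rows to review before discarding

print(df_restaurants["Name"].tolist())
print(df_others["Category"].tolist())
```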