- **IMPORTACIÓN, EXTRACCIÓN Y CONVERSION DE LOS DATOS DE METADATA**

- Leer todos los json de metadata, joinearlos y filtrar por starbucks y dunkin, obteniendo dos dataframes por separado.

In [1]:
import os
import json
import pandas as pd

def convert_json_to_dataframe(input_folder):
    # Listar todos los archivos en la carpeta de entrada
    files = os.listdir(input_folder)
    
    # Lista para almacenar los DataFrames
    dataframes = []
    
    for file in files:
        if file.endswith('.json'):
            file_path = os.path.join(input_folder, file)
            try:
                # Leer el archivo JSON línea por línea
                with open(file_path, 'r') as f:
                    data = [json.loads(line) for line in f]
                
                # Convertir a DataFrame y agregar a la lista
                df = pd.DataFrame(data)
                dataframes.append(df)
                print('Processed', file)
            except ValueError as e:
                print('Error processing', file, ':', e)
    
    # Unir todos los DataFrames en uno solo
    combined_df = pd.concat(dataframes, ignore_index=True)
    
    # Filtrar el dataframe solo para las filas que contengan "Starbucks" en la columna "name"
    combined_df['name'] = combined_df['name'].apply(lambda x: x if x is not None else [])
    starbucks_merged_df = combined_df[combined_df['name'].apply(lambda x: 'Starbucks' in x)]
    
    # Filtrar el dataframe solo para las filas que contengan "Dunkin" en la columna "name"
    dunkin_merged_df = combined_df[combined_df['name'].apply(lambda x: 'Dunkin' in x)]
    
    return starbucks_merged_df, dunkin_merged_df

# Llamar a la función
starbucks_merged_df, dunkin_merged_df = convert_json_to_dataframe('metadata-sitios')

# Contar el número de filas
num_rows_starbucks = starbucks_merged_df.shape[0]
num_rows_dunkin = dunkin_merged_df.shape[0]
print('Número de filas de Starbucks:', num_rows_starbucks)
print('Número de filas de Dunkin:', num_rows_dunkin)

Processed 1.json
Processed 10.json
Processed 11.json
Processed 2.json
Processed 3.json
Processed 4.json
Processed 5.json
Processed 6.json
Processed 7.json
Processed 8.json
Processed 9.json
Número de filas de Starbucks: 3494
Número de filas de Dunkin: 2180


In [7]:
starbucks_merged_df.to_parquet('Starbucks metadata google crudo.parquet')

In [8]:
dunkin_merged_df.to_parquet('Dunkin metadata google crudo.parquet')

In [None]:
starbucks_merged_df.head()

Unnamed: 0,name,address,gmap_id,description,latitude,longitude,category,avg_rating,num_of_reviews,price,hours,MISC,state,relative_results,url
2759,Starbucks,"Starbucks, 777 Coushatta Dr, Kinder, LA 70648",0x863b157fa8b51d01:0x1e4fe1352f3c5410,Seattle-based coffeehouse chain known for its ...,30.544849,-92.813979,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.3,3,$$,"[[Thursday, 7AM–7PM], [Friday, 7AM–7PM], [Satu...","{'Service options': ['Takeout', 'Dine-in', 'De...",Closed ⋅ Opens 7AM,"[0x863b14b4c7136f01:0xb153879cf4c9fd95, 0x863b...",https://www.google.com/maps/place//data=!4m2!3...
4689,Starbucks,"Starbucks, 1021 S Highline Pl, Sioux Falls, SD...",0x878eb38c36597305:0xcbf23f742073a95a,Seattle-based coffeehouse chain known for its ...,43.539009,-96.654927,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.2,18,$$,"[[Wednesday, 7AM–8PM], [Thursday, 7AM–8PM], [F...","{'Service options': ['Delivery', 'Takeout', 'D...",Closed ⋅ Opens 7AM Thu,"[0x878eb38bf781ca91:0xc0710c30260e2429, 0x878e...",https://www.google.com/maps/place//data=!4m2!3...
6767,Starbucks,"Starbucks, 3285 Crosspark Rd, Coralville, IA 5...",0x87e445ce7b5d0903:0x64ef7bd0dd566918,Seattle-based coffeehouse chain known for its ...,41.721587,-91.60453,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.0,15,$$,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",Closes soon ⋅ 8PM ⋅ Opens 6AM Thu,"[0x87e441aeb4a25f27:0xb13d1072372fbdd1, 0x87e4...",https://www.google.com/maps/place//data=!4m2!3...
8853,Starbucks,"Starbucks, 9600 Falls of Neuse Rd, Raleigh, NC...",0x89ac57caf94e3281:0x8d8e0a9bcb3797f4,Seattle-based coffeehouse chain known for its ...,35.90497,-78.60109,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.5,8,$$,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Dine-in', 'De...",Open ⋅ Closes 8PM,,https://www.google.com/maps/place//data=!4m2!3...
9937,Starbucks,"Starbucks, Exit 326 Eastbound, Milepost 324, 6...",0x89c693867ed7de97:0x3f76a336d8d512e0,Seattle-based coffeehouse chain known for its ...,40.083163,-75.4401,"[Coffee shop, Cafe, Coffee store, Espresso bar]",2.9,15,$$,"[[Wednesday, 7AM–7PM], [Thursday, 7AM–7PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",Open ⋅ Closes 7PM,"[0x89c6944bb5799605:0x16d89163c48f625a, 0x89c6...",https://www.google.com/maps/place//data=!4m2!3...


Unnamed: 0,name,address,gmap_id,description,latitude,longitude,category,avg_rating,num_of_reviews,price,hours,MISC,state,relative_results,url
2759,Starbucks,"Starbucks, 777 Coushatta Dr, Kinder, LA 70648",0x863b157fa8b51d01:0x1e4fe1352f3c5410,Seattle-based coffeehouse chain known for its ...,30.544849,-92.813979,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.3,3,,"[[Thursday, 7AM–7PM], [Friday, 7AM–7PM], [Satu...","{'Service options': ['Takeout', 'Dine-in', 'De...",LA,"[0x863b14b4c7136f01:0xb153879cf4c9fd95, 0x863b...",https://www.google.com/maps/place//data=!4m2!3...
4689,Starbucks,"Starbucks, 1021 S Highline Pl, Sioux Falls, SD...",0x878eb38c36597305:0xcbf23f742073a95a,Seattle-based coffeehouse chain known for its ...,43.539009,-96.654927,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.2,18,,"[[Wednesday, 7AM–8PM], [Thursday, 7AM–8PM], [F...","{'Service options': ['Delivery', 'Takeout', 'D...",SD,"[0x878eb38bf781ca91:0xc0710c30260e2429, 0x878e...",https://www.google.com/maps/place//data=!4m2!3...
6767,Starbucks,"Starbucks, 3285 Crosspark Rd, Coralville, IA 5...",0x87e445ce7b5d0903:0x64ef7bd0dd566918,Seattle-based coffeehouse chain known for its ...,41.721587,-91.60453,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.0,15,,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",IA,"[0x87e441aeb4a25f27:0xb13d1072372fbdd1, 0x87e4...",https://www.google.com/maps/place//data=!4m2!3...
8853,Starbucks,"Starbucks, 9600 Falls of Neuse Rd, Raleigh, NC...",0x89ac57caf94e3281:0x8d8e0a9bcb3797f4,Seattle-based coffeehouse chain known for its ...,35.90497,-78.60109,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.5,8,,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Dine-in', 'De...",NC,,https://www.google.com/maps/place//data=!4m2!3...
9937,Starbucks,"Starbucks, Exit 326 Eastbound, Milepost 324, 6...",0x89c693867ed7de97:0x3f76a336d8d512e0,Seattle-based coffeehouse chain known for its ...,40.083163,-75.4401,"[Coffee shop, Cafe, Coffee store, Espresso bar]",2.9,15,,"[[Wednesday, 7AM–7PM], [Thursday, 7AM–7PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",PA,"[0x89c6944bb5799605:0x16d89163c48f625a, 0x89c6...",https://www.google.com/maps/place//data=!4m2!3...


In [19]:
dunkin_merged_df.head()

Unnamed: 0,name,address,gmap_id,description,latitude,longitude,category,avg_rating,num_of_reviews,price,hours,MISC,state,relative_results,url
535,Dunkin Bridge,"Dunkin Bridge, Yale, OK 74085",0x87b16b690c76dc71:0xdf78fabac3bdaa5f,,36.044505,-96.820584,[Bridge],5.0,5,,,,,"[0x87b0dfddc4e496ad:0x295ee748aa3bdf41, 0x87b1...",https://www.google.com/maps/place//data=!4m2!3...
742,Dunkin',"Dunkin', 4008 Bell Blvd, Queens, NY 11361",0x89c261f60bdf13db:0x38da730e4687a97b,Long-running chain serving signature breakfast...,40.763985,-73.77143,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",3.5,8,$,"[[Thursday, 6AM–7PM], [Friday, 6AM–7PM], [Satu...","{'Service options': ['Delivery', 'Takeout', 'D...",Open ⋅ Closes 7PM,"[0x89c3ab9229879ec3:0x3f4b2b46d7d2c503, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
2139,Dunkin,"Dunkin, 1132 Mineral Spring Ave, North Provide...",0x89e44489cbeccc03:0xd3b75bf4e9a39824,Long-running chain serving signature breakfast...,41.867869,-71.428798,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",3.9,8,$,,"{'Service options': ['Delivery', 'Takeout'], '...",,"[0x89e445c7c4df7a27:0x43ad29caf35d3302, 0x89e4...",https://www.google.com/maps/place//data=!4m2!3...
7003,Dunkin',"Dunkin', 525 Pleasant Valley Ave, Mt Laurel To...",0x89c1352001dc66d1:0xb8ca54f815dbb1bf,Long-running chain serving signature breakfast...,39.948164,-74.949908,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",4.1,8,$,"[[Wednesday, 5AM–8PM], [Thursday, 5AM–8PM], [F...","{'Service options': ['Delivery', 'Drive-throug...",Closes soon ⋅ 8PM ⋅ Opens 5AM Thu,,https://www.google.com/maps/place//data=!4m2!3...
14129,Dunkin',"Dunkin', In Stop & Shop, 380 Main Ave, Norwalk...",0x89e81daec8b2f445:0x6fb1428534e11ad0,Long-running chain serving signature breakfast...,41.140586,-73.423947,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",3.9,15,$,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",Open ⋅ Closes 8PM,"[0x89e81dae24931a4f:0x90566a736c61470d, 0x89e8...",https://www.google.com/maps/place//data=!4m2!3...


- **IMPORTACIÓN, EXTRACCIÓN Y CONVERSIÓN DE LOS DATOS DE REVIEWS-ESTADOS.**

-Aplicamos una funcion, para mergear todos lo archivos parquet de los reviews de cada estado con la data filtrada de starbucks de metadata y 

In [4]:
import os
import pandas as pd
import json

def direct_json_to_dataframe():
    # Listar todas las carpetas dentro de 'reviews-estados'
    base_folder = 'reviews-estados'
    folders = [f for f in os.listdir(base_folder) if os.path.isdir(os.path.join(base_folder, f))]
    
    # Lista para almacenar los DataFrames
    df_list = []
    
    for folder in folders:
        folder_path = os.path.join(base_folder, folder)
        # Listar archivos JSON en la carpeta
        json_files = [f for f in os.listdir(folder_path) if f.endswith('.json')]
        
        # Leer y combinar archivos JSON
        for json_file in json_files:
            file_path = os.path.join(folder_path, json_file)
            with open(file_path, 'r') as file:
                data = [json.loads(line) for line in file if line.strip()]
            df = pd.DataFrame(data)
            df_list.append(df)
    
    # Concatenar todos los DataFrames
    combined_df = pd.concat(df_list, ignore_index=True)
    return combined_df

# Ejecutar la función para obtener el DataFrame combinado
reviews_df = direct_json_to_dataframe()

# Filtrar el dataframe solo para las filas que contengan "Starbucks" en la columna "name"
starbucks_merged_df['name'] = starbucks_merged_df['name'].apply(lambda x: x if x is not None else [])
starbucks_merged_df = starbucks_merged_df[starbucks_merged_df['name'].apply(lambda x: 'Starbucks' in x)]

# Filtrar el dataframe solo para las filas que contengan "Dunkin" en la columna "name"
dunkin_merged_df['name'] = dunkin_merged_df['name'].apply(lambda x: x if x is not None else [])
dunkin_merged_df = dunkin_merged_df[dunkin_merged_df['name'].apply(lambda x: 'Dunkin' in x)]

# Buscar los gmap_id que coinciden
starbucks_reviews_df = reviews_df[reviews_df['gmap_id'].isin(starbucks_merged_df['gmap_id'])]
dunkin_reviews_df = reviews_df[reviews_df['gmap_id'].isin(dunkin_merged_df['gmap_id'])]

# Mostrar el número de filas y las primeras filas de los DataFrames resultantes
print('Número de filas de Starbucks reviews:', starbucks_reviews_df.shape[0])

print('Número de filas de Dunkin reviews:', dunkin_reviews_df.shape[0])


Número de filas de Starbucks reviews: 156729
Número de filas de Dunkin reviews: 115892


In [5]:
starbucks_reviews_df.to_parquet('Starbucks reviews google crudo.parquet')

In [6]:
dunkin_reviews_df.to_parquet('Dunkin reviews google crudo.parquet')

In [9]:
starbucks_reviews_df.head()

Unnamed: 0,user_id,name,time,rating,text,pics,resp,gmap_id
28071,100107003653040726165,Jacob McCalpin,1505996531691,5,Chanel is the greatest barista of all time. I'...,,,0x88891beed225fed1:0x3c63ad3e69972d22
28072,108921061266588850634,Alex Z,1538579468609,2,The food is always warm and delicious but the ...,,,0x88891beed225fed1:0x3c63ad3e69972d22
28073,115087327175786879005,James Drummond,1557117732370,1,The location is a franchise of sorts operated ...,,,0x88891beed225fed1:0x3c63ad3e69972d22
28074,103797448577708424762,Matthew Pearson,1555686635302,1,Go to the one in Sterne. This place is a mess....,,,0x88891beed225fed1:0x3c63ad3e69972d22
28075,104674782787422072897,Craig Winn,1534647989256,5,Open early and well staffed.,[{'url': ['https://lh5.googleusercontent.com/p...,,0x88891beed225fed1:0x3c63ad3e69972d22


In [10]:
dunkin_reviews_df.head()

Unnamed: 0,user_id,name,time,rating,text,pics,resp,gmap_id
87803,104957977998342094168,William Clark,1487172724801,4,Great coffee and Donuts. Iced tea is also grea...,[{'url': ['https://lh5.googleusercontent.com/p...,,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87804,117483898282300950125,Birthday Bandit,1486494183709,5,I stop by here often because it's on my way to...,,,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87805,114586050187658234891,Michael Connolly,1489493684756,5,Sad to see this location close up. I stopped h...,,,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87806,105146756185088866130,Carleen Yates,1427380070256,5,I had a great experience when I went in the fo...,,,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87807,105407381034578943677,Ian Cobb,1484142431869,4,I used to live up north in they only had Dunki...,,,0x889a4e8a2f05a603:0xea1325e2785d9fb4


- **MANIPULACIÓN DE DATOS NULOS.**

In [11]:
starbucks_merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3494 entries, 2759 to 3024062
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              3494 non-null   object 
 1   address           3494 non-null   object 
 2   gmap_id           3494 non-null   object 
 3   description       3275 non-null   object 
 4   latitude          3494 non-null   float64
 5   longitude         3494 non-null   float64
 6   category          3494 non-null   object 
 7   avg_rating        3494 non-null   float64
 8   num_of_reviews    3494 non-null   int64  
 9   price             3266 non-null   object 
 10  hours             3167 non-null   object 
 11  MISC              3493 non-null   object 
 12  state             3171 non-null   object 
 13  relative_results  2838 non-null   object 
 14  url               3494 non-null   object 
dtypes: float64(3), int64(1), object(11)
memory usage: 436.8+ KB


In [12]:
dunkin_merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2180 entries, 535 to 3024249
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              2180 non-null   object 
 1   address           2179 non-null   object 
 2   gmap_id           2180 non-null   object 
 3   description       1919 non-null   object 
 4   latitude          2180 non-null   float64
 5   longitude         2180 non-null   float64
 6   category          2180 non-null   object 
 7   avg_rating        2180 non-null   float64
 8   num_of_reviews    2180 non-null   int64  
 9   price             1908 non-null   object 
 10  hours             1816 non-null   object 
 11  MISC              2173 non-null   object 
 12  state             1822 non-null   object 
 13  relative_results  1787 non-null   object 
 14  url               2180 non-null   object 
dtypes: float64(3), int64(1), object(11)
memory usage: 272.5+ KB


In [13]:
starbucks_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 156729 entries, 28071 to 89582065
Data columns (total 8 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   user_id  156729 non-null  object
 1   name     156729 non-null  object
 2   time     156729 non-null  int64 
 3   rating   156729 non-null  int64 
 4   text     75038 non-null   object
 5   pics     2866 non-null    object
 6   resp     42 non-null      object
 7   gmap_id  156729 non-null  object
dtypes: int64(2), object(6)
memory usage: 10.8+ MB


In [14]:
dunkin_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 115892 entries, 87803 to 89025369
Data columns (total 8 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   user_id  115892 non-null  object
 1   name     115892 non-null  object
 2   time     115892 non-null  int64 
 3   rating   115892 non-null  int64 
 4   text     56708 non-null   object
 5   pics     1509 non-null    object
 6   resp     270 non-null     object
 7   gmap_id  115892 non-null  object
dtypes: int64(2), object(6)
memory usage: 8.0+ MB


In [15]:
starbucks_reviews_df.isnull().sum()

user_id         0
name            0
time            0
rating          0
text        81691
pics       153863
resp       156687
gmap_id         0
dtype: int64

In [16]:
dunkin_reviews_df.isnull().sum()

user_id         0
name            0
time            0
rating          0
text        59184
pics       114383
resp       115622
gmap_id         0
dtype: int64

-Se eliminan las columnas que tienen una cantidad relevante de nulos, en este caso "pics" y "resp".

In [17]:
starbucks_reviews_df = starbucks_reviews_df.drop(columns=['pics', 'resp'])
print('Columnas pics y resp han sido eliminadas.')

Columnas pics y resp han sido eliminadas.


In [18]:
dunkin_reviews_df = dunkin_reviews_df.drop(columns=['pics', 'resp'])
print('Columnas pics y resp han sido eliminadas.')

Columnas pics y resp han sido eliminadas.


- **TRANSFORMACIÓN DE COLUMNAS**

-Le cambiamos el nombre a la columna "state" por "open-close".

In [19]:
# Cambiar el nombre de la columna 'state' a 'open-close' de reviews starbucks
starbucks_reviews = starbucks_reviews_df.rename(columns={'state': 'open-close'})
print('La columna state ha sido renombrado como open-close.')

La columna state ha sido renombrado como open-close.


In [20]:
# Cambiar el nombre de la columna 'state' a 'open-close' de reviews dunkin 
dunkin_reviews_ = dunkin_reviews_df.rename(columns={'state': 'open-close'})
print('La columna state ha sido renombrado como open-close.')

La columna state ha sido renombrado como open-close.


-Creamos una nueva columna llamada "state" con las siglas de cada estado.

In [21]:

# Lista de abreviaturas estatales
state_abbreviations = ['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC']

# Función para extraer la abreviatura del estado de "address"
def extract_state(address):
    if address is None:
        return None
    for state in state_abbreviations:
        if state in address:
            return state
    return None

# Aplicar la función para crear la nueva columna
dunkin_merged_df['state'] = dunkin_merged_df['address'].apply(extract_state)



In [22]:
starbucks_merged_df['state'] = starbucks_merged_df['address'].apply(extract_state)

In [23]:
null_count = starbucks_merged_df['state'].isnull().sum()
print('Cantidad de valores nulos en la columna state:', null_count)
null_count2 = dunkin_merged_df['state'].isnull().sum()
print('Cantidad de valores nulos en la columna state:', null_count)

Cantidad de valores nulos en la columna state: 1
Cantidad de valores nulos en la columna state: 1


- **CONVERSIÓN DE TIPOS DE DATOS.**

-Convertimos la columna "time" a valores reales de fecha y hora.

In [24]:
import datetime
import pandas as pd

# Función para convertir timestamp en milisegundos a formato legible
def convertir_timestamp(timestamp_ms):
    if pd.isna(timestamp_ms):
        return None
    try:
        timestamp_s = float(timestamp_ms) / 1000
        date_time = datetime.datetime.fromtimestamp(timestamp_s)
        return date_time.strftime('%Y-%m-%d %H:%M:%S')
    except Exception as e:
        print(f"Error converting timestamp: {timestamp_ms}, error: {e}")
        return None

# Convertir la columna 'time' a tipo numérico
starbucks_reviews_df['time'] = pd.to_numeric(starbucks_reviews_df['time'], errors='coerce')

# Aplicar la función a la columna 'time'
starbucks_reviews_df['time'] = starbucks_reviews_df['time'].apply(convertir_timestamp)

# Mostrar el head del dataframe para verificar la conversión
starbucks_reviews_df.head()

Unnamed: 0,user_id,name,time,rating,text,gmap_id
28071,100107003653040726165,Jacob McCalpin,2017-09-21 09:22:11,5,Chanel is the greatest barista of all time. I'...,0x88891beed225fed1:0x3c63ad3e69972d22
28072,108921061266588850634,Alex Z,2018-10-03 12:11:08,2,The food is always warm and delicious but the ...,0x88891beed225fed1:0x3c63ad3e69972d22
28073,115087327175786879005,James Drummond,2019-05-06 01:42:12,1,The location is a franchise of sorts operated ...,0x88891beed225fed1:0x3c63ad3e69972d22
28074,103797448577708424762,Matthew Pearson,2019-04-19 12:10:35,1,Go to the one in Sterne. This place is a mess....,0x88891beed225fed1:0x3c63ad3e69972d22
28075,104674782787422072897,Craig Winn,2018-08-19 00:06:29,5,Open early and well staffed.,0x88891beed225fed1:0x3c63ad3e69972d22


In [25]:
# Convertir la columna 'time' a tipo numérico
dunkin_reviews_df['time'] = pd.to_numeric(dunkin_reviews_df['time'], errors='coerce')

# Aplicar la función a la columna 'time'
dunkin_reviews_df['time'] = dunkin_reviews_df['time'].apply(convertir_timestamp)

# Mostrar el head del dataframe para verificar la conversión
dunkin_reviews_df.head()

Unnamed: 0,user_id,name,time,rating,text,gmap_id
87803,104957977998342094168,William Clark,2017-02-15 12:32:04,4,Great coffee and Donuts. Iced tea is also grea...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87804,117483898282300950125,Birthday Bandit,2017-02-07 16:03:03,5,I stop by here often because it's on my way to...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87805,114586050187658234891,Michael Connolly,2017-03-14 09:14:44,5,Sad to see this location close up. I stopped h...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87806,105146756185088866130,Carleen Yates,2015-03-26 11:27:50,5,I had a great experience when I went in the fo...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87807,105407381034578943677,Ian Cobb,2017-01-11 10:47:11,4,I used to live up north in they only had Dunki...,0x889a4e8a2f05a603:0xea1325e2785d9fb4


-Analizamos la columna "price".

In [26]:
# Contar los valores en la columna "price"
price_counts1 = starbucks_merged_df['price'].value_counts()
price_counts2 = dunkin_merged_df['price'].value_counts()

# Mostrar los conteos
print(price_counts1)
print(price_counts2)

price
$$    3111
₩₩     152
$        2
₩        1
Name: count, dtype: int64
price
$    1844
₩      64
Name: count, dtype: int64


In [27]:
# Diccionario de mapeo de símbolos a etiquetas
price_mapping = {
    '$': 'Low',
    '$$': 'Moderate',    '$$$': 'high'
}

# Aplicar el mapeo a la columna "price"
starbucks_merged_df['price'] = starbucks_merged_df['price'].map(price_mapping)
dunkin_merged_df['price'] = dunkin_merged_df['price'].map(price_mapping)

In [28]:
starbucks_merged_df.head()

Unnamed: 0,name,address,gmap_id,description,latitude,longitude,category,avg_rating,num_of_reviews,price,hours,MISC,state,relative_results,url
2759,Starbucks,"Starbucks, 777 Coushatta Dr, Kinder, LA 70648",0x863b157fa8b51d01:0x1e4fe1352f3c5410,Seattle-based coffeehouse chain known for its ...,30.544849,-92.813979,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.3,3,Moderate,"[[Thursday, 7AM–7PM], [Friday, 7AM–7PM], [Satu...","{'Service options': ['Takeout', 'Dine-in', 'De...",LA,"[0x863b14b4c7136f01:0xb153879cf4c9fd95, 0x863b...",https://www.google.com/maps/place//data=!4m2!3...
4689,Starbucks,"Starbucks, 1021 S Highline Pl, Sioux Falls, SD...",0x878eb38c36597305:0xcbf23f742073a95a,Seattle-based coffeehouse chain known for its ...,43.539009,-96.654927,"[Coffee shop, Cafe, Coffee store, Espresso bar]",3.2,18,Moderate,"[[Wednesday, 7AM–8PM], [Thursday, 7AM–8PM], [F...","{'Service options': ['Delivery', 'Takeout', 'D...",SD,"[0x878eb38bf781ca91:0xc0710c30260e2429, 0x878e...",https://www.google.com/maps/place//data=!4m2!3...
6767,Starbucks,"Starbucks, 3285 Crosspark Rd, Coralville, IA 5...",0x87e445ce7b5d0903:0x64ef7bd0dd566918,Seattle-based coffeehouse chain known for its ...,41.721587,-91.60453,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.0,15,Moderate,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",IA,"[0x87e441aeb4a25f27:0xb13d1072372fbdd1, 0x87e4...",https://www.google.com/maps/place//data=!4m2!3...
8853,Starbucks,"Starbucks, 9600 Falls of Neuse Rd, Raleigh, NC...",0x89ac57caf94e3281:0x8d8e0a9bcb3797f4,Seattle-based coffeehouse chain known for its ...,35.90497,-78.60109,"[Coffee shop, Cafe, Coffee store, Espresso bar]",4.5,8,Moderate,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Dine-in', 'De...",NC,,https://www.google.com/maps/place//data=!4m2!3...
9937,Starbucks,"Starbucks, Exit 326 Eastbound, Milepost 324, 6...",0x89c693867ed7de97:0x3f76a336d8d512e0,Seattle-based coffeehouse chain known for its ...,40.083163,-75.4401,"[Coffee shop, Cafe, Coffee store, Espresso bar]",2.9,15,Moderate,"[[Wednesday, 7AM–7PM], [Thursday, 7AM–7PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...",PA,"[0x89c6944bb5799605:0x16d89163c48f625a, 0x89c6...",https://www.google.com/maps/place//data=!4m2!3...


In [31]:
# Lista de valores permitidos
allowed_values = ['low', 'moderate', 'high']

# Filtrar los valores que no están en la lista de valores permitidos
invalid_values = starbucks_merged_df[starbucks_merged_df['price'].isin(allowed_values)]['price'].unique()

# Mostrar los valores inválidos
print(invalid_values)

[]


In [32]:
# Lista de valores permitidos
allowed_values1 = ['low', 'moderate', 'high']

# Filtrar los valores que no están en la lista de valores permitidos
invalid_values1 = dunkin_merged_df[dunkin_merged_df['price'].isin(allowed_values)]['price'].unique()

# Mostrar los valores inválidos
print(invalid_values1)

[]


-Reordenamos las columnas.

In [37]:
ordered_columns = ['gmap_id','name', 'address','state', 'num_of_reviews','avg_rating','price','latitude','longitude','category']
remaining_columns = [col for col in starbucks_merged_df.columns if col not in ordered_columns]
new_column_order = ordered_columns + remaining_columns

starbucks_merged_df = starbucks_merged_df[new_column_order]
print('Columns reordered successfully.')
starbucks_merged_df.head()

Columns reordered successfully.


Unnamed: 0,gmap_id,name,address,state,num_of_reviews,avg_rating,price,latitude,longitude,category,description,hours,MISC,relative_results,url
2759,0x863b157fa8b51d01:0x1e4fe1352f3c5410,Starbucks,"Starbucks, 777 Coushatta Dr, Kinder, LA 70648",LA,3,3.3,Moderate,30.544849,-92.813979,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Thursday, 7AM–7PM], [Friday, 7AM–7PM], [Satu...","{'Service options': ['Takeout', 'Dine-in', 'De...","[0x863b14b4c7136f01:0xb153879cf4c9fd95, 0x863b...",https://www.google.com/maps/place//data=!4m2!3...
4689,0x878eb38c36597305:0xcbf23f742073a95a,Starbucks,"Starbucks, 1021 S Highline Pl, Sioux Falls, SD...",SD,18,3.2,Moderate,43.539009,-96.654927,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 7AM–8PM], [Thursday, 7AM–8PM], [F...","{'Service options': ['Delivery', 'Takeout', 'D...","[0x878eb38bf781ca91:0xc0710c30260e2429, 0x878e...",https://www.google.com/maps/place//data=!4m2!3...
6767,0x87e445ce7b5d0903:0x64ef7bd0dd566918,Starbucks,"Starbucks, 3285 Crosspark Rd, Coralville, IA 5...",IA,15,4.0,Moderate,41.721587,-91.60453,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x87e441aeb4a25f27:0xb13d1072372fbdd1, 0x87e4...",https://www.google.com/maps/place//data=!4m2!3...
8853,0x89ac57caf94e3281:0x8d8e0a9bcb3797f4,Starbucks,"Starbucks, 9600 Falls of Neuse Rd, Raleigh, NC...",NC,8,4.5,Moderate,35.90497,-78.60109,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Dine-in', 'De...",,https://www.google.com/maps/place//data=!4m2!3...
9937,0x89c693867ed7de97:0x3f76a336d8d512e0,Starbucks,"Starbucks, Exit 326 Eastbound, Milepost 324, 6...",PA,15,2.9,Moderate,40.083163,-75.4401,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 7AM–7PM], [Thursday, 7AM–7PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x89c6944bb5799605:0x16d89163c48f625a, 0x89c6...",https://www.google.com/maps/place//data=!4m2!3...


In [38]:
ordered_columns = ['gmap_id','name', 'address','state', 'num_of_reviews','avg_rating','price','latitude','longitude','category']
remaining_columns = [col for col in dunkin_merged_df.columns if col not in ordered_columns]
new_column_order = ordered_columns + remaining_columns

dunkin_merged_df = dunkin_merged_df[new_column_order]
print('Columns reordered successfully.')
dunkin_merged_df.head()

Columns reordered successfully.


Unnamed: 0,gmap_id,name,address,state,num_of_reviews,avg_rating,price,latitude,longitude,category,description,hours,MISC,relative_results,url
535,0x87b16b690c76dc71:0xdf78fabac3bdaa5f,Dunkin Bridge,"Dunkin Bridge, Yale, OK 74085",OK,5,5.0,,36.044505,-96.820584,[Bridge],,,,"[0x87b0dfddc4e496ad:0x295ee748aa3bdf41, 0x87b1...",https://www.google.com/maps/place//data=!4m2!3...
742,0x89c261f60bdf13db:0x38da730e4687a97b,Dunkin',"Dunkin', 4008 Bell Blvd, Queens, NY 11361",NY,8,3.5,Low,40.763985,-73.77143,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Thursday, 6AM–7PM], [Friday, 6AM–7PM], [Satu...","{'Service options': ['Delivery', 'Takeout', 'D...","[0x89c3ab9229879ec3:0x3f4b2b46d7d2c503, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
2139,0x89e44489cbeccc03:0xd3b75bf4e9a39824,Dunkin,"Dunkin, 1132 Mineral Spring Ave, North Provide...",RI,8,3.9,Low,41.867869,-71.428798,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,,"{'Service options': ['Delivery', 'Takeout'], '...","[0x89e445c7c4df7a27:0x43ad29caf35d3302, 0x89e4...",https://www.google.com/maps/place//data=!4m2!3...
7003,0x89c1352001dc66d1:0xb8ca54f815dbb1bf,Dunkin',"Dunkin', 525 Pleasant Valley Ave, Mt Laurel To...",NJ,8,4.1,Low,39.948164,-74.949908,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Wednesday, 5AM–8PM], [Thursday, 5AM–8PM], [F...","{'Service options': ['Delivery', 'Drive-throug...",,https://www.google.com/maps/place//data=!4m2!3...
14129,0x89e81daec8b2f445:0x6fb1428534e11ad0,Dunkin',"Dunkin', In Stop & Shop, 380 Main Ave, Norwalk...",CT,15,3.9,Low,41.140586,-73.423947,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x89e81dae24931a4f:0x90566a736c61470d, 0x89e8...",https://www.google.com/maps/place//data=!4m2!3...


-Verificar que todas las columnas tengan el tipo de dato correcto.

In [41]:
import pandas as pd

# Función para convertir columnas en un DataFrame
def convert_columns(df):
    # Convertir 'latitude' y 'longitude' a float
    try:
        df['latitude'] = df['latitude'].astype(float)
        df['longitude'] = df['longitude'].astype(float)
        print('Columnas latitude y longitude convertidas a float exitosamente.')
    except Exception as e:
        print('Error al convertir latitude y longitude a float:', e)

    # Convertir 'avg_rating' a numérico
    try:
        df['avg_rating'] = pd.to_numeric(df['avg_rating'], errors='coerce')
        print('Columna avg_rating convertida a numérico exitosamente.')
    except Exception as e:
        print('Error al convertir avg_rating a numérico:', e)

    # Mostrar los tipos de datos para confirmar los cambios
    print(df.dtypes)

# Aplicar la función a starbucks_merged_df
print('Procesando starbucks_merged_df:')
convert_columns(starbucks_merged_df)

# Aplicar la función a dunkin_merged_df
print('Procesando dunkin_merged_df:')
convert_columns(dunkin_merged_df)

Procesando starbucks_merged_df:
Columnas latitude y longitude convertidas a float exitosamente.
Columna avg_rating convertida a numérico exitosamente.
gmap_id              object
name                 object
address              object
state                object
num_of_reviews        int64
avg_rating          float64
price                object
latitude            float64
longitude           float64
category             object
description          object
hours                object
MISC                 object
relative_results     object
url                  object
dtype: object
Procesando dunkin_merged_df:
Columnas latitude y longitude convertidas a float exitosamente.
Columna avg_rating convertida a numérico exitosamente.
gmap_id              object
name                 object
address              object
state                object
num_of_reviews        int64
avg_rating          float64
price                object
latitude            float64
longitude           float64
category       

In [42]:
import pandas as pd

# Función para convertir columnas en un DataFrame
def convert_columns(df):
    # Convertir 'time' a datetime
    try:
        df['time'] = pd.to_datetime(df['time'])
        print('Columna time convertida a datetime exitosamente.')
    except Exception as e:
        print('Error al convertir time a datetime:', e)

    # Convertir 'rating' a numérico
    try:
        df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
        print('Columna rating convertida a numérico exitosamente.')
    except Exception as e:
        print('Error al convertir rating a numérico:', e)

    # Mostrar los tipos de datos para confirmar los cambios
    print(df.dtypes)

# Aplicar la función a starbucks_reviews_df
print('Procesando starbucks_reviews_df:')
convert_columns(starbucks_reviews_df)

# Aplicar la función a dunkin_reviews_df
print('Procesando dunkin_reviews_df:')
convert_columns(dunkin_reviews_df)

Procesando starbucks_reviews_df:
Columna time convertida a datetime exitosamente.
Columna rating convertida a numérico exitosamente.
user_id            object
name               object
time       datetime64[ns]
rating              int64
text               object
gmap_id            object
dtype: object
Procesando dunkin_reviews_df:
Columna time convertida a datetime exitosamente.
Columna rating convertida a numérico exitosamente.
user_id            object
name               object
time       datetime64[ns]
rating              int64
text               object
gmap_id            object
dtype: object


-Asegurarse de que los datos estén en un formato consistente

In [44]:
import re

# Función para normalizar direcciones
def normalize_address(address):
    if address is None:
        return None
    address = address.strip().title()  # Eliminar espacios en blanco y poner en formato título
    address = re.sub(r'\s+', ' ', address)  # Reemplazar múltiples espacios por un solo espacio
    return address

# Aplicar la función de normalización a la columna 'address' en starbucks_merged_df
starbucks_merged_df['address'] = starbucks_merged_df['address'].apply(normalize_address)
print('Direcciones normalizadas exitosamente en starbucks_merged_df.')
print(starbucks_merged_df['address'].head())

# Aplicar la función de normalización a la columna 'address' en dunkin_merged_df
dunkin_merged_df['address'] = dunkin_merged_df['address'].apply(normalize_address)
print('Direcciones normalizadas exitosamente en dunkin_merged_df.')
print(dunkin_merged_df['address'].head())

Direcciones normalizadas exitosamente en starbucks_merged_df.
2759        Starbucks, 777 Coushatta Dr, Kinder, La 70648
4689    Starbucks, 1021 S Highline Pl, Sioux Falls, Sd...
6767    Starbucks, 3285 Crosspark Rd, Coralville, Ia 5...
8853    Starbucks, 9600 Falls Of Neuse Rd, Raleigh, Nc...
9937    Starbucks, Exit 326 Eastbound, Milepost 324, 6...
Name: address, dtype: object
Direcciones normalizadas exitosamente en dunkin_merged_df.
535                          Dunkin Bridge, Yale, Ok 74085
742              Dunkin', 4008 Bell Blvd, Queens, Ny 11361
2139     Dunkin, 1132 Mineral Spring Ave, North Provide...
7003     Dunkin', 525 Pleasant Valley Ave, Mt Laurel To...
14129    Dunkin', In Stop & Shop, 380 Main Ave, Norwalk...
Name: address, dtype: object


In [45]:
starbucks_merged_df.head()

Unnamed: 0,gmap_id,name,address,state,num_of_reviews,avg_rating,price,latitude,longitude,category,description,hours,MISC,relative_results,url
2759,0x863b157fa8b51d01:0x1e4fe1352f3c5410,Starbucks,"Starbucks, 777 Coushatta Dr, Kinder, La 70648",LA,3,3.3,Moderate,30.544849,-92.813979,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Thursday, 7AM–7PM], [Friday, 7AM–7PM], [Satu...","{'Service options': ['Takeout', 'Dine-in', 'De...","[0x863b14b4c7136f01:0xb153879cf4c9fd95, 0x863b...",https://www.google.com/maps/place//data=!4m2!3...
4689,0x878eb38c36597305:0xcbf23f742073a95a,Starbucks,"Starbucks, 1021 S Highline Pl, Sioux Falls, Sd...",SD,18,3.2,Moderate,43.539009,-96.654927,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 7AM–8PM], [Thursday, 7AM–8PM], [F...","{'Service options': ['Delivery', 'Takeout', 'D...","[0x878eb38bf781ca91:0xc0710c30260e2429, 0x878e...",https://www.google.com/maps/place//data=!4m2!3...
6767,0x87e445ce7b5d0903:0x64ef7bd0dd566918,Starbucks,"Starbucks, 3285 Crosspark Rd, Coralville, Ia 5...",IA,15,4.0,Moderate,41.721587,-91.60453,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x87e441aeb4a25f27:0xb13d1072372fbdd1, 0x87e4...",https://www.google.com/maps/place//data=!4m2!3...
8853,0x89ac57caf94e3281:0x8d8e0a9bcb3797f4,Starbucks,"Starbucks, 9600 Falls Of Neuse Rd, Raleigh, Nc...",NC,8,4.5,Moderate,35.90497,-78.60109,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Dine-in', 'De...",,https://www.google.com/maps/place//data=!4m2!3...
9937,0x89c693867ed7de97:0x3f76a336d8d512e0,Starbucks,"Starbucks, Exit 326 Eastbound, Milepost 324, 6...",PA,15,2.9,Moderate,40.083163,-75.4401,"[Coffee shop, Cafe, Coffee store, Espresso bar]",Seattle-based coffeehouse chain known for its ...,"[[Wednesday, 7AM–7PM], [Thursday, 7AM–7PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x89c6944bb5799605:0x16d89163c48f625a, 0x89c6...",https://www.google.com/maps/place//data=!4m2!3...


In [46]:
starbucks_merged_df.to_parquet('../data/Starbucks_metadata_ETL_limpio.parquet')

In [47]:
dunkin_merged_df.head()

Unnamed: 0,gmap_id,name,address,state,num_of_reviews,avg_rating,price,latitude,longitude,category,description,hours,MISC,relative_results,url
535,0x87b16b690c76dc71:0xdf78fabac3bdaa5f,Dunkin Bridge,"Dunkin Bridge, Yale, Ok 74085",OK,5,5.0,,36.044505,-96.820584,[Bridge],,,,"[0x87b0dfddc4e496ad:0x295ee748aa3bdf41, 0x87b1...",https://www.google.com/maps/place//data=!4m2!3...
742,0x89c261f60bdf13db:0x38da730e4687a97b,Dunkin',"Dunkin', 4008 Bell Blvd, Queens, Ny 11361",NY,8,3.5,Low,40.763985,-73.77143,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Thursday, 6AM–7PM], [Friday, 6AM–7PM], [Satu...","{'Service options': ['Delivery', 'Takeout', 'D...","[0x89c3ab9229879ec3:0x3f4b2b46d7d2c503, 0x89c2...",https://www.google.com/maps/place//data=!4m2!3...
2139,0x89e44489cbeccc03:0xd3b75bf4e9a39824,Dunkin,"Dunkin, 1132 Mineral Spring Ave, North Provide...",RI,8,3.9,Low,41.867869,-71.428798,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,,"{'Service options': ['Delivery', 'Takeout'], '...","[0x89e445c7c4df7a27:0x43ad29caf35d3302, 0x89e4...",https://www.google.com/maps/place//data=!4m2!3...
7003,0x89c1352001dc66d1:0xb8ca54f815dbb1bf,Dunkin',"Dunkin', 525 Pleasant Valley Ave, Mt Laurel To...",NJ,8,4.1,Low,39.948164,-74.949908,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Wednesday, 5AM–8PM], [Thursday, 5AM–8PM], [F...","{'Service options': ['Delivery', 'Drive-throug...",,https://www.google.com/maps/place//data=!4m2!3...
14129,0x89e81daec8b2f445:0x6fb1428534e11ad0,Dunkin',"Dunkin', In Stop & Shop, 380 Main Ave, Norwalk...",CT,15,3.9,Low,41.140586,-73.423947,"[Coffee shop, Bagel shop, Bakery, Breakfast re...",Long-running chain serving signature breakfast...,"[[Wednesday, 6AM–8PM], [Thursday, 6AM–8PM], [F...","{'Service options': ['Takeout', 'Delivery', 'D...","[0x89e81dae24931a4f:0x90566a736c61470d, 0x89e8...",https://www.google.com/maps/place//data=!4m2!3...


In [48]:
dunkin_merged_df.to_parquet('../data/Dunkin_metadata_ETL_limpio.parquet')

-Le cambiamos el nombre a la columna "time" por "date" a starbucks.

In [3]:
# Cambiar el nombre de la columna
starbucks_reviews_df.rename(columns={'time': 'date'}, inplace=True)

In [4]:
starbucks_reviews_df.head()

Unnamed: 0,user_id,name,date,rating,text,gmap_id
28071,100107003653040726165,Jacob McCalpin,2017-09-21 09:22:11,5,Chanel is the greatest barista of all time. I'...,0x88891beed225fed1:0x3c63ad3e69972d22
28072,108921061266588850634,Alex Z,2018-10-03 12:11:08,2,The food is always warm and delicious but the ...,0x88891beed225fed1:0x3c63ad3e69972d22
28073,115087327175786879005,James Drummond,2019-05-06 01:42:12,1,The location is a franchise of sorts operated ...,0x88891beed225fed1:0x3c63ad3e69972d22
28074,103797448577708424762,Matthew Pearson,2019-04-19 12:10:35,1,Go to the one in Sterne. This place is a mess....,0x88891beed225fed1:0x3c63ad3e69972d22
28075,104674782787422072897,Craig Winn,2018-08-19 00:06:29,5,Open early and well staffed.,0x88891beed225fed1:0x3c63ad3e69972d22


In [5]:
starbucks_reviews_df.to_parquet('../data/Starbucks_reviews_ETL_limpio.parquet')

-Le cambiamos el nombre a la columna "time" por "date" a dunkin

In [8]:
# Cambiar el nombre de la columna
dunkin_reviews_df.rename(columns={'time': 'date'}, inplace=True)

In [9]:
dunkin_reviews_df.head()

Unnamed: 0,user_id,name,date,rating,text,gmap_id
87803,104957977998342094168,William Clark,2017-02-15 12:32:04,4,Great coffee and Donuts. Iced tea is also grea...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87804,117483898282300950125,Birthday Bandit,2017-02-07 16:03:03,5,I stop by here often because it's on my way to...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87805,114586050187658234891,Michael Connolly,2017-03-14 09:14:44,5,Sad to see this location close up. I stopped h...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87806,105146756185088866130,Carleen Yates,2015-03-26 11:27:50,5,I had a great experience when I went in the fo...,0x889a4e8a2f05a603:0xea1325e2785d9fb4
87807,105407381034578943677,Ian Cobb,2017-01-11 10:47:11,4,I used to live up north in they only had Dunki...,0x889a4e8a2f05a603:0xea1325e2785d9fb4


In [10]:
dunkin_reviews_df.to_parquet('../data/Dunkin_reviews_ETL_limpio.parquet')