### Editing the Airbnb data to add each commune's sector to the `neighbourhood_group` column and remove the columns that will not go into the dashboard.

We start by importing pandas.

In [None]:
import pandas as pd  # Import the working library

We load the dataset and preview it.

In [3]:
data = pd.read_csv('listings.csv')
data.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,49392,Share my Flat in Providencia,224592,Maria,,Providencia,-33.43277,-70.59892,Private room,48422.0,3,0,,,1,178,0,
1,52811,Suite Providencia 1 Santiago Chile,244792,Cristián,,Providencia,-33.42959,-70.6188,Entire home/apt,49335.0,1,45,2021-11-04,0.27,3,0,0,
2,53494,depto centro ski el colorado chile,249097,Paulina,,Lo Barnechea,-33.34521,-70.29543,Entire home/apt,235714.0,2,48,2024-09-01,0.48,1,365,11,
3,65058,Dpto amoblado centro historico,318016,Patricio,,Recoleta,-33.43049,-70.64079,Private room,,2,0,,,1,0,0,
4,73752,Barrio Lastarria,374124,Daniela&Ricardo,,Santiago,-33.43865,-70.64241,Private room,,3,0,,,1,0,0,


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13974 entries, 0 to 13973
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              13974 non-null  int64  
 1   name                            13974 non-null  object 
 2   host_id                         13974 non-null  int64  
 3   host_name                       13974 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   13974 non-null  object 
 6   latitude                        13974 non-null  float64
 7   longitude                       13974 non-null  float64
 8   room_type                       13974 non-null  object 
 9   price                           12111 non-null  float64
 10  minimum_nights                  13974 non-null  int64  
 11  number_of_reviews               13974 non-null  int64  
 12  last_review                     

We identify the communes present in the data.

In [5]:
data['neighbourhood'].unique()

array(['Providencia', 'Lo Barnechea', 'Recoleta', 'Santiago',
       'La Florida', 'Las Condes', 'La Reina', 'Ñuñoa', 'Independencia',
       'San Miguel', 'Vitacura', 'Maipú', 'Peñalolén', 'Estación Central',
       'Pedro Aguirre Cerda', 'San Joaquín', 'Macul', 'El Bosque',
       'Lo Espejo', 'La Cisterna', 'Quinta Normal', 'Quilicura',
       'Pudahuel', 'Lo Prado', 'Huechuraba', 'Renca', 'Cerrillos',
       'La Granja', 'Conchalí', 'Cerro Navia', 'San Ramón'], dtype=object)

The `neighbourhood_group` column has no non-null values, so we drop it.

In [6]:
data.drop(columns='neighbourhood_group', inplace=True)
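
Before dropping a column for being empty, it can be worth confirming that it really holds no values. The following is a minimal sketch of that check, assuming a small hypothetical table called `sample` in place of the real `listings.csv` data; the names `sample`, `empty_columns` and `cleaned` are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the listings table; the real data comes from listings.csv.
sample = pd.DataFrame({
    "neighbourhood_group": [float("nan"), float("nan"), float("nan")],
    "neighbourhood": ["Providencia", "Recoleta", "Santiago"],
})

# Columns in which every value is missing.
empty_columns = [col for col in sample.columns if sample[col].isna().all()]

# Drop only those columns, returning a new frame instead of mutating in place.
cleaned = sample.drop(columns=empty_columns)

print(empty_columns)
print(list(cleaned.columns))
```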

We build the function that classifies each commune by sector and apply it to create the new `neighbourhood_group` column.

In [7]:
def clasificar_sector(row):
    """
    Classify a commune of the Metropolitan Region into a sector.

    Parameters:
        row (dict or Series): Row of data whose 'neighbourhood' key holds the commune name.

    Returns:
        str: Sector the commune belongs to.
    """
    comuna = row['neighbourhood']

    # Classification by sector
    if comuna in ['Las Condes', 'Providencia', 'Vitacura', 'La Reina', 'Ñuñoa', 'Lo Barnechea']:
        return 'Nororiente'
    elif comuna in ['Santiago', 'Estación Central', 'Quinta Normal', 'Independencia', 'Recoleta']:
        return 'Centro'
    elif comuna in ['Maipú', 'Cerrillos', 'Pudahuel', 'Lo Prado', 'Renca', 'Cerro Navia']:
        return 'Poniente'
    elif comuna in ['Pedro Aguirre Cerda', 'San Joaquín', 'San Miguel', 'San Bernardo', 'Lo Espejo', 'La Cisterna', 'La Granja', 'Puente Alto', 'La Florida', 'La Pintana', 'El Bosque', 'San Ramón']:
        return 'Sur'
    elif comuna in ['Peñalolén', 'Macul', 'Pirque']:
        return 'Suroriente'
    elif comuna in ['Colina', 'Huechuraba', 'Quilicura', 'Lampa', 'Tiltil', 'Conchalí']:
        return 'Norte'
    elif comuna in ['San José de Maipo']:
        return 'Cordillera'
    else:
        return 'Desconocido'  # Fallback in case the commune is not in any list

data['neighbourhood_group'] = data.apply(clasificar_sector, axis=1)
data.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license,neighbourhood_group
0,49392,Share my Flat in Providencia,224592,Maria,Providencia,-33.43277,-70.59892,Private room,48422.0,3,0,,,1,178,0,,Nororiente
1,52811,Suite Providencia 1 Santiago Chile,244792,Cristián,Providencia,-33.42959,-70.6188,Entire home/apt,49335.0,1,45,2021-11-04,0.27,3,0,0,,Nororiente
2,53494,depto centro ski el colorado chile,249097,Paulina,Lo Barnechea,-33.34521,-70.29543,Entire home/apt,235714.0,2,48,2024-09-01,0.48,1,365,11,,Nororiente
3,65058,Dpto amoblado centro historico,318016,Patricio,Recoleta,-33.43049,-70.64079,Private room,,2,0,,,1,0,0,,Centro
4,73752,Barrio Lastarria,374124,Daniela&Ricardo,Santiago,-33.43865,-70.64241,Private room,,3,0,,,1,0,0,,Centro
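
The same classification can also be expressed as a dictionary lookup, which keeps each commune in exactly one sector and avoids a row-wise `apply`. This is only a sketch of that alternative, not the approach used in this notebook: `SECTOR_BY_COMUNA` and `sample` are hypothetical names, the dictionary lists just a few communes from the `if`/`elif` chain, and "Atlantis" is a made-up name used to exercise the fallback.

```python
import pandas as pd

# Sector lookup taken from the if/elif chain; only a few communes are shown here.
SECTOR_BY_COMUNA = {
    "Providencia": "Nororiente",
    "Lo Barnechea": "Nororiente",
    "Recoleta": "Centro",
    "Santiago": "Centro",
    "Maipú": "Poniente",
}

# Hypothetical miniature of the listings table ("Atlantis" is a made-up commune).
sample = pd.DataFrame(
    {"neighbourhood": ["Providencia", "Santiago", "Maipú", "Atlantis"]}
)

# map() looks up each commune; names missing from the dict become NaN,
# which fillna() turns into the same fallback label the function uses.
sample["neighbourhood_group"] = (
    sample["neighbourhood"].map(SECTOR_BY_COMUNA).fillna("Desconocido")
)

print(sample)
```

Because dictionary keys are unique, a commune cannot be listed under two sectors by accident, which is easy to do with overlapping lists.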


We check that every commune has been classified; an empty result means no row fell through to `Desconocido`.

In [8]:
data[data['neighbourhood_group'] == 'Desconocido']

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license,neighbourhood_group
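
A complementary check is to count rows per sector and list the distinct commune names that ended up with the fallback label. The sketch below assumes a small hypothetical table `sample` that already has a `neighbourhood_group` column; "Atlantis" is a made-up commune standing in for an unclassified value.

```python
import pandas as pd

# Hypothetical miniature of the classified table.
sample = pd.DataFrame({
    "neighbourhood": ["Providencia", "Santiago", "Santiago", "Atlantis"],
    "neighbourhood_group": ["Nororiente", "Centro", "Centro", "Desconocido"],
})

# Rows per sector, a quick way to spot an unexpectedly large or missing group.
sector_counts = sample["neighbourhood_group"].value_counts()

# Distinct communes that fell through to the fallback label.
unclassified = sample.loc[
    sample["neighbourhood_group"] == "Desconocido", "neighbourhood"
].unique().tolist()

print(sector_counts)
print(unclassified)
```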


We drop the columns that will not be used in the dashboard.

In [12]:
data.drop(columns=['minimum_nights', 'calculated_host_listings_count', 'availability_365', 'number_of_reviews_ltm', 'license'], inplace=True)
data.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,number_of_reviews,last_review,reviews_per_month,neighbourhood_group
0,49392,Share my Flat in Providencia,224592,Maria,Providencia,-33.43277,-70.59892,Private room,48422.0,0,,,Nororiente
1,52811,Suite Providencia 1 Santiago Chile,244792,Cristián,Providencia,-33.42959,-70.6188,Entire home/apt,49335.0,45,2021-11-04,0.27,Nororiente
2,53494,depto centro ski el colorado chile,249097,Paulina,Lo Barnechea,-33.34521,-70.29543,Entire home/apt,235714.0,48,2024-09-01,0.48,Nororiente
3,65058,Dpto amoblado centro historico,318016,Patricio,Recoleta,-33.43049,-70.64079,Private room,,0,,,Centro
4,73752,Barrio Lastarria,374124,Daniela&Ricardo,Santiago,-33.43865,-70.64241,Private room,,0,,,Centro
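
An in-place `drop` raises a `KeyError` if the cell is run a second time, because the columns are already gone. One way to make the step safe to re-run is the `errors="ignore"` option of `DataFrame.drop`. The sketch below assumes a hypothetical table `sample` with only a few of the real columns; `unused`, `trimmed` and `trimmed_again` are illustrative names.

```python
import pandas as pd

# Hypothetical miniature of the listings table with a subset of its columns.
sample = pd.DataFrame({
    "id": [49392, 52811],
    "price": [48422.0, 49335.0],
    "minimum_nights": [3, 1],
    "license": [None, None],
})

# Columns not needed downstream; 'availability_365' is absent from this sample.
unused = ["minimum_nights", "availability_365", "license"]

# errors="ignore" skips labels that are not present, so repeating the call
# does not raise a KeyError.
trimmed = sample.drop(columns=unused, errors="ignore")
trimmed_again = trimmed.drop(columns=unused, errors="ignore")

print(list(trimmed_again.columns))
```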


Finally, we write the new dataset to disk.

In [13]:
output_file = "airbnb_stgo.csv"
data.to_csv(output_file, index=False)

print(f"CSV file created: {output_file}")

CSV file created: airbnb_stgo.csv
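
To confirm the export, the file can be read back and compared with the table that was written. The sketch below assumes a small hypothetical table `sample` and writes to a temporary directory so that it does not touch the real `airbnb_stgo.csv`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature of the final table.
sample = pd.DataFrame({
    "id": [49392, 52811],
    "neighbourhood": ["Providencia", "Providencia"],
    "neighbourhood_group": ["Nororiente", "Nororiente"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "airbnb_stgo.csv")
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the CSV is read back.
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

same_columns = list(reloaded.columns) == list(sample.columns)
same_shape = reloaded.shape == sample.shape

print(same_columns, same_shape)
```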
