# EDA of tourist accommodation in Madrid (Airbnb)

In this notebook we carry out an **Exploratory Data Analysis (EDA)** of the tourist accommodation in Madrid listed on Airbnb, using the `listings.csv` dataset from Inside Airbnb.

**Goal of the analysis**

- Understand how nightly prices are distributed.
- Analyze the impact of:
  - **location** (districts / neighbourhoods),
  - **room type**,
  - **capacity (number of guests)**

on the **nightly price**.

This notebook corresponds to the file `main.ipynb` of the `EDA_Alojamientos_turisticos_Madrid` project.

In [28]:
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Plot style
plt.style.use("ggplot")
sns.set(rc={"figure.figsize": (10, 5)})

# Create the image folder if it does not exist
os.makedirs("src/img", exist_ok=True)

In [29]:
# Path to the data file
ruta_datos = "src/data/listings.csv"

# Load the dataset
df = pd.read_csv(ruta_datos, low_memory=False)

# Quick look at the first rows
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license;
0,21853,Bright and airy room,83531.0,Abdel,Latina,Cármenes,40.40381,-3.7413,Private room,,4.0,33.0,2018-07-15,0.25,2.0,198.0,0.0,;
1,30320,Apartamentos Dana Sol,130907.0,Danuta Weronika,Centro,Sol,40.41476,-3.70418,Entire home/apt,157.0,5.0,173.0,2025-08-27,0.93,17.0,342.0,1.0,;
2,30959,Beautiful loft in Madrid Center,132883.0,Angela,Centro,Embajadores,40.41259,-3.70105,Entire home/apt,,3.0,8.0,2017-05-30,0.06,1.0,0.0,0.0,;
3,40916,Apartasol Apartamentos Dana,130907.0,Danuta Weronika,Centro,Universidad,40.42247,-3.70577,Entire home/apt,143.0,5.0,53.0,2025-09-11,0.29,17.0,341.0,4.0,;
4,62423,MAGIC ARTISTIC HOUSE IN THE CENTER OF MADRID,303845.0,Arturo,Centro,Justicia,40.41884,-3.69655,Private room,65.0,1.0,249.0,2025-09-05,2.78,3.0,299.0,41.0,;


In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25094 entries, 0 to 25093
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              25094 non-null  object 
 1   name                            22442 non-null  object 
 2   host_id                         22442 non-null  float64
 3   host_name                       22349 non-null  object 
 4   neighbourhood_group             22442 non-null  object 
 5   neighbourhood                   22442 non-null  object 
 6   latitude                        22442 non-null  float64
 7   longitude                       22442 non-null  float64
 8   room_type                       22442 non-null  object 
 9   price                           16931 non-null  float64
 10  minimum_nights                  22442 non-null  float64
 11  number_of_reviews               22442 non-null  float64
 12  last_review                     
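
The row filter applied in the next cell keeps rows with at most 10 missing values. One way to motivate such a threshold is to look at how missing values are distributed across rows. The following is a minimal sketch on a toy frame; the data and the names `toy`, `missing_per_row`, `missing_distribution` and `toy_clean` are invented for illustration and are not taken from `listings.csv`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two complete rows, one row missing only the price,
# and one nearly empty row such as a malformed CSV line might produce
toy = pd.DataFrame(
    {
        "id": ["1", "2", "3", "broken text"],
        "name": ["A", "B", "C", np.nan],
        "price": [50.0, np.nan, 80.0, np.nan],
        "room_type": ["Private room", "Entire home/apt", "Private room", np.nan],
    }
)

# Number of missing values in each row
missing_per_row = toy.isna().sum(axis=1)

# How many rows have 0, 1, 2, ... missing values
missing_distribution = missing_per_row.value_counts().sort_index()
print(missing_distribution)

# Keep rows at or below a chosen threshold (1 here; the notebook uses 10)
toy_clean = toy[missing_per_row <= 1].copy()
print(len(toy_clean))
```

A clear gap in this distribution (many rows with few missing values, a separate group with almost everything missing) would suggest where to place the cut.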

In [31]:
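# Keep rows with at most 10 missing values; .copy() avoids SettingWithCopyWarning on later assignments
df = df[df.isna().sum(axis=1) <= 10].copy()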
df.info()


<class 'pandas.core.frame.DataFrame'>
Index: 22442 entries, 0 to 25093
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              22442 non-null  object 
 1   name                            22442 non-null  object 
 2   host_id                         22442 non-null  float64
 3   host_name                       22349 non-null  object 
 4   neighbourhood_group             22442 non-null  object 
 5   neighbourhood                   22442 non-null  object 
 6   latitude                        22442 non-null  float64
 7   longitude                       22442 non-null  float64
 8   room_type                       22442 non-null  object 
 9   price                           16931 non-null  float64
 10  minimum_nights                  22442 non-null  float64
 11  number_of_reviews               22442 non-null  float64
 12  last_review                     17670
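
The output above shows `host_id` stored as `float64`, even though it is an identifier. A possible follow-up, sketched here on invented data, is to convert such columns to pandas' nullable integer dtype. The `toy` frame is hypothetical, and whether this conversion is wanted in the notebook is an assumption.

```python
import pandas as pd

# Toy data (hypothetical) mimicking identifiers that were loaded as floats
toy = pd.DataFrame({"host_id": [83531.0, 130907.0, None]})

# The nullable "Int64" dtype stores whole numbers and keeps missing values as <NA>
toy["host_id"] = toy["host_id"].astype("Int64")
print(toy["host_id"].dtype)
```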

In [32]:
# Rows with a missing host name
df[df["host_name"].isna()]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license;
743,6899113,Habitación en Vallehermoso,36068881.0,,Chamberí,Vallehermoso,40.438820,-3.706190,Private room,,1.0,0.0,,,1.0,0.0,0.0,;
854,7712666,LINDO APARTAMENTO MUY BIEN SITUADO,40516896.0,,Tetuán,Castillejos,40.462600,-3.690190,Entire home/apt,,7.0,0.0,,,1.0,0.0,0.0,;
978,8963605,loft cerca de atocha,46845753.0,,Puente de Vallecas,San Diego,40.395250,-3.668770,Entire home/apt,,1.0,0.0,,,1.0,0.0,0.0,;
3210,22943516,A3 Comfortable And Pleasant Apartment In Madrid,91506664.0,,Centro,Embajadores,40.405418,-3.701753,Entire home/apt,100.0,2.0,174.0,2025-08-25,1.88,16.0,329.0,15.0,H28391AAV68;
3211,22943847,A5 Comfortable And Pleasant Apartment In Madrid,91506664.0,,Centro,Embajadores,40.405708,-3.701652,Entire home/apt,95.0,2.0,121.0,2025-03-23,1.31,16.0,317.0,6.0,H28391AAV68;
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24436,1471538723582381999,Apartamento en Noviciado,518331793.0,,Centro,Universidad,40.425558,-3.705515,Entire home/apt,241.0,1.0,7.0,2025-08-31,6.18,53.0,0.0,7.0,ESFCTU00000304400021559900000000000000000VT-50...
24437,1471610422615046582,Apartamento en Noviciado 2,518331793.0,,Centro,Universidad,40.425558,-3.705515,Entire home/apt,327.0,1.0,7.0,2025-08-30,6.56,53.0,0.0,7.0,ESFCTU00000304400021559900000000000000000VT-50...
24807,1496936866768919379,Aparta-Estudio en Plaza Mayor,91506664.0,,Centro,Sol,40.414688,-3.707189,Entire home/apt,143.0,1.0,0.0,,,16.0,323.0,0.0,ESHFNT0000280910004794110010000000000000000000...
25062,1506426210772187082,Studio in Plaza Callao Madrid,91506664.0,,Centro,Universidad,40.421598,-3.706432,Entire home/apt,123.0,2.0,0.0,,,16.0,314.0,0.0,ESHFNT0000281080000856480020000000000000000000...


In [33]:
# Replace missing host names with a placeholder ("Desconocido" means "Unknown")
df["host_name"] = df["host_name"].fillna("Desconocido")

In [35]:
df["host_name"].value_counts()

host_name
Francisco Andres    462
Home Club           416
Ukio                381
Jorge               339
MIT House           334
                   ... 
Roi                   1
Gustavo Miguel        1
Dunia Dalila          1
Edelmira              1
Tatiana Camila        1
Name: count, Length: 3648, dtype: int64

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 22442 entries, 0 to 25093
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              22442 non-null  object 
 1   name                            22442 non-null  object 
 2   host_id                         22442 non-null  float64
 3   host_name                       22442 non-null  object 
 4   neighbourhood_group             22442 non-null  object 
 5   neighbourhood                   22442 non-null  object 
 6   latitude                        22442 non-null  float64
 7   longitude                       22442 non-null  float64
 8   room_type                       22442 non-null  object 
 9   price                           16931 non-null  float64
 10  minimum_nights                  22442 non-null  float64
 11  number_of_reviews               22442 non-null  float64
 12  last_review                     17670

In [39]:
df[df["license;"].isna()]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license;
427,3712285,Hipster style & perfect location ;),18396400.0,Daniel,Centro,Embajadores,40.411,-3.70328,Entire home/apt,,300.0,10.0,2018-06-10,0.08,1.0,83.0,0.0,
18175,1201433668441586580,Penthouse 3 B Boutique Duplex terraces; 3-4 pe...,589559365.0,Daniel,Puente de Vallecas,Numancia,40.40134,-3.6637,Entire home/apt,147.0,1.0,5.0,2025-09-05,0.45,5.0,73.0,5.0,
18178,1201446110425479538,1 Apartament Atico 3º A (nº 6); up to 4-6 guest,589559365.0,Daniel,Puente de Vallecas,Numancia,40.40183,-3.66457,Entire home/apt,139.0,2.0,9.0,2025-08-21,1.37,5.0,62.0,9.0,
18179,1201449248718083992,Boutique Apartment Entreplanta (nº 1); 3-5 guest,589559365.0,Daniel,Puente de Vallecas,Numancia,40.40081,-3.66449,Entire home/apt,145.0,1.0,3.0,2025-05-12,0.27,5.0,72.0,3.0,
18180,1201452489545612646,Studio Apartment 1ºB (nº3) or 2ºB (nº5); 2-4 g...,589559365.0,Daniel,Puente de Vallecas,Numancia,40.40134,-3.6637,Entire home/apt,130.0,1.0,10.0,2025-07-06,0.91,5.0,80.0,10.0,
21996,1380771706672353061,Luxury Collection; Conde de Aranda,5697443.0,The Everywhere Home,Salamanca,Recoletos,40.422244,-3.686983,Entire home/apt,1725.0,3.0,0.0,,,7.0,107.0,0.0,
22218,1390793860643108945,San Blas Apartment; Pool and Padel for 4 Guests.,687823325.0,Natalia,San Blas - Canillejas,Simancas,40.439161,-3.617413,Entire home/apt,97.0,1.0,25.0,2025-09-03,5.03,1.0,351.0,25.0,


In [40]:
df["license;"].value_counts()

license;
;                                                         14266
Exempt;                                                     603
En proceso;                                                 276
ESABCD123456789123456789123456789123456-HH-7891234567;       49
350202309690;                                                27
                                                          ...  
ESFCTU00002811800046654500000000000000000000VT-143906;        1
ESFCTU00000607000021559700000000000000000VT-509234-A1;        1
ESFCNT00002810400043401100000000000000000000000000007;        1
ESFCTU000028091000091767000000000000000000000VT130756;        1
ESFCNT00002810900032334100000000000000000000000000005;        1
Name: count, Length: 6101, dtype: int64
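
The counts above show that both the column name (`license;`) and its values carry a trailing semicolon, and that the most frequent value is a bare `;`. A minimal sketch of one possible cleanup follows. It uses invented data, and treating the empty values as unknown is an assumption rather than something the notebook does.

```python
import pandas as pd

# Toy data (hypothetical) reproducing the trailing-semicolon pattern seen above
toy = pd.DataFrame({"license;": [";", "Exempt;", "En proceso;", None]})

# Rename the column and strip the trailing ";" from every value
toy = toy.rename(columns={"license;": "license"})
cleaned = toy["license"].str.rstrip(";")

# Assumption: empty strings and missing values both mean "unknown"
toy["license"] = cleaned.mask(cleaned.isna() | (cleaned == ""), "desconocido")
print(toy["license"].tolist())
```

With this cleanup the bare `;` entries are counted together with the truly missing ones, instead of appearing as a separate category.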

In [41]:
# Replace missing licenses with a placeholder ("desconocido" means "unknown")
df["license;"] = df["license;"].fillna("desconocido")

In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 22442 entries, 0 to 25093
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              22442 non-null  object 
 1   name                            22442 non-null  object 
 2   host_id                         22442 non-null  float64
 3   host_name                       22442 non-null  object 
 4   neighbourhood_group             22442 non-null  object 
 5   neighbourhood                   22442 non-null  object 
 6   latitude                        22442 non-null  float64
 7   longitude                       22442 non-null  float64
 8   room_type                       22442 non-null  object 
 9   price                           16931 non-null  float64
 10  minimum_nights                  22442 non-null  float64
 11  number_of_reviews               22442 non-null  float64
 12  last_review                     17670

In [47]:
# Listings with a known price; the review columns are dropped for this subset
df_precios = df[df["price"].notna()]
df_precios = df_precios.drop(columns=["last_review", "reviews_per_month"])
df_precios.info()

<class 'pandas.core.frame.DataFrame'>
Index: 16931 entries, 1 to 25093
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              16931 non-null  object 
 1   name                            16931 non-null  object 
 2   host_id                         16931 non-null  float64
 3   host_name                       16931 non-null  object 
 4   neighbourhood_group             16931 non-null  object 
 5   neighbourhood                   16931 non-null  object 
 6   latitude                        16931 non-null  float64
 7   longitude                       16931 non-null  float64
 8   room_type                       16931 non-null  object 
 9   price                           16931 non-null  float64
 10  minimum_nights                  16931 non-null  float64
 11  number_of_reviews               16931 non-null  float64
 12  calculated_host_listings_count  16931
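
With a price-only subset such as `df_precios`, the goals stated at the top (price by location and by room type) could be approached with grouped summaries. The sketch below runs on invented values; only the column names follow the dataset, and the choice of the median is an assumption made here because it is less sensitive to extreme prices than the mean.

```python
import pandas as pd

# Toy data (hypothetical values; column names follow the dataset)
toy = pd.DataFrame(
    {
        "neighbourhood_group": ["Centro", "Centro", "Latina", "Latina", "Centro"],
        "room_type": [
            "Entire home/apt",
            "Private room",
            "Private room",
            "Private room",
            "Entire home/apt",
        ],
        "price": [150.0, 60.0, 40.0, 50.0, 170.0],
    }
)

# Median nightly price per district, highest first
median_by_district = (
    toy.groupby("neighbourhood_group")["price"].median().sort_values(ascending=False)
)
print(median_by_district)

# Median nightly price for each district / room type combination
median_table = toy.pivot_table(
    index="neighbourhood_group", columns="room_type", values="price", aggfunc="median"
)
print(median_table)
```

Combinations with no listings appear as `NaN` in the pivot table, which makes sparse district / room type pairs easy to spot.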

In [50]:
# Listings with at least one review; the price column is dropped for this subset
df_reviews = df[df["last_review"].notna()]
df_reviews = df_reviews.drop(columns=["price"])
df_reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17670 entries, 0 to 24975
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              17670 non-null  object 
 1   name                            17670 non-null  object 
 2   host_id                         17670 non-null  float64
 3   host_name                       17670 non-null  object 
 4   neighbourhood_group             17670 non-null  object 
 5   neighbourhood                   17670 non-null  object 
 6   latitude                        17670 non-null  float64
 7   longitude                       17670 non-null  float64
 8   room_type                       17670 non-null  object 
 9   minimum_nights                  17670 non-null  float64
 10  number_of_reviews               17670 non-null  float64
 11  last_review                     17670 non-null  object 
 12  reviews_per_month               17670

In [53]:
# All listings, without the columns that still contain missing values
df_completo = df.drop(columns=["price", "last_review", "reviews_per_month"])
df_completo.info()

<class 'pandas.core.frame.DataFrame'>
Index: 22442 entries, 0 to 25093
Data columns (total 15 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              22442 non-null  object 
 1   name                            22442 non-null  object 
 2   host_id                         22442 non-null  float64
 3   host_name                       22442 non-null  object 
 4   neighbourhood_group             22442 non-null  object 
 5   neighbourhood                   22442 non-null  object 
 6   latitude                        22442 non-null  float64
 7   longitude                       22442 non-null  float64
 8   room_type                       22442 non-null  object 
 9   minimum_nights                  22442 non-null  float64
 10  number_of_reviews               22442 non-null  float64
 11  calculated_host_listings_count  22442 non-null  float64
 12  availability_365                22442

In [59]:
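# Keep only listings that have both a price and at least one review.
# Both conditions are built from `df` and combined into a single mask,
# which avoids filtering one DataFrame with a mask indexed on another.
df_precio_reviews = df[df["price"].notna() & df["last_review"].notna()]
df_precio_reviews.info()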

<class 'pandas.core.frame.DataFrame'>
Index: 14022 entries, 1 to 24975
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              14022 non-null  object 
 1   name                            14022 non-null  object 
 2   host_id                         14022 non-null  float64
 3   host_name                       14022 non-null  object 
 4   neighbourhood_group             14022 non-null  object 
 5   neighbourhood                   14022 non-null  object 
 6   latitude                        14022 non-null  float64
 7   longitude                       14022 non-null  float64
 8   room_type                       14022 non-null  object 
 9   price                           14022 non-null  float64
 10  minimum_nights                  14022 non-null  float64
 11  number_of_reviews               14022 non-null  float64
 12  last_review                     14022

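
Filtering a DataFrame with a boolean mask built from a different DataFrame makes pandas reindex the mask and emit a `UserWarning`. Two ways of requiring several non-missing columns without that mismatch are sketched below on invented data; `toy`, `both_mask`, `with_mask` and `with_dropna` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy data (hypothetical values; column names follow the dataset)
toy = pd.DataFrame(
    {
        "price": [157.0, None, 65.0, 90.0],
        "last_review": ["2025-08-27", "2017-05-30", None, "2025-09-05"],
    }
)

# Option 1: combine both conditions into one mask built from the same DataFrame
both_mask = toy["price"].notna() & toy["last_review"].notna()
with_mask = toy[both_mask]

# Option 2: drop rows where any of the listed columns is missing
with_dropna = toy.dropna(subset=["price", "last_review"])

# Both options select the same rows
print(with_mask.equals(with_dropna))
```

The `dropna(subset=...)` form is shorter when the only condition is "not missing", while the explicit mask is easier to extend with other conditions.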
