# **Ciudad de Mexico Airbnb EDA**

## **Introduction**

[InsideAirbnb](https://insideairbnb.com) is a project that scrapes real Airbnb listings for analysis and advocacy. Airbnb claims to be part of the "sharing economy" and to be disrupting the hotel industry. However, the data shows that the majority of Airbnb listings in most cities are entire homes, many of which are rented all year round, disrupting housing and communities.

The main question I want to answer is:
### **How is Airbnb really being used in and affecting the neighbourhoods of CDMX?**

Structured question tree:
- descriptive->diagnostic->predictive->prescriptive

Let's break the question down into smaller subquestions:

**Inventory & Distribution**
- How many Airbnb listings exist in CDMX, and how are they distributed geographically?
- Which colonias have the highest density of listings?
- What % of housing inventory in each colonia is Airbnb vs. long-term residential units?
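
A minimal sketch of how the first two questions could be approached, using a toy table in place of the real data. It assumes the area name lives in a column called `neighbourhood_cleansed` (not shown in the truncated output further down, so treat the name as an assumption), and the granularity of that column should be checked against the colonia level these questions ask about.

```python
import pandas as pd

# Toy stand-in for the listings table. `neighbourhood_cleansed` is an
# assumed column name; the area names are only illustrative.
toy_listings = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "neighbourhood_cleansed": [
        "Cuauhtémoc", "Cuauhtémoc", "Coyoacán", "Cuauhtémoc", "Benito Juárez",
    ],
})

# Number of distinct listings per area, largest first
listings_per_area = (
    toy_listings.groupby("neighbourhood_cleansed")["id"]
    .nunique()
    .sort_values(ascending=False)
)

# Share of all listings that falls in each area
share_per_area = listings_per_area / listings_per_area.sum()

print(listings_per_area)
print(share_per_area.round(2))
```

Turning counts into a true density would additionally need each area's surface, which could come from `neighbourhoods.geojson`.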

**Behaviour & Usage**
- What % of hosts have multiple listings (professional operators) vs. single-unit hosts?
- Which neighbourhoods have the highest share of multi-property hosts?
- What % of hosts are foreign vs. nationals?
- Are listings primarily short-stay (<7 days) or long-stay (>28 days) oriented?
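
The multi-listing question can be sketched by counting listings per `host_id`. The toy table below is purely illustrative. The real dataset also ships a `calculated_host_listings_count` column that could be used instead, but counting rows directly keeps the logic visible.

```python
import pandas as pd

# Toy stand-in for the listings table (illustrative values only)
toy_listings = pd.DataFrame({
    "id": [10, 11, 12, 13, 14, 15],
    "host_id": [1, 1, 2, 3, 3, 3],
})

# Listings per host, then flag hosts with more than one listing
listings_per_host = toy_listings.groupby("host_id")["id"].nunique()
is_multi_host = listings_per_host > 1

# Share of hosts that run several listings
share_multi_hosts = is_multi_host.mean()

# Share of listings that belong to such hosts
share_listings_multi = toy_listings["host_id"].map(is_multi_host).mean()

print(f"Multi-listing hosts: {share_multi_hosts:.1%}")
print(f"Listings run by multi-listing hosts: {share_listings_multi:.1%}")
```

Reporting both shares matters because a small group of hosts can control a large part of the inventory.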

**Occupancy & Demand**
- What are average occupancy rates by colonia?
- What is seasonal demand like (high and low months)?
- What is the average length of stay by neighbourhood?
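
One possible way to get an occupancy rate per area, assuming that `estimated_occupancy_l365d` holds the estimated number of nights booked over the last 365 days and that areas are identified by a `neighbourhood_cleansed` column. Both are assumptions to verify against the InsideAirbnb data dictionary. The numbers below are toy values.

```python
import pandas as pd

# Toy stand-in for the listings table (illustrative values only)
toy_listings = pd.DataFrame({
    "neighbourhood_cleansed": ["A", "A", "B"],
    "estimated_occupancy_l365d": [73, 146, 0],
})

# Assumed interpretation: nights booked out of 365
toy_listings["occupancy_rate"] = toy_listings["estimated_occupancy_l365d"] / 365

# Average occupancy rate per area
avg_occupancy = (
    toy_listings.groupby("neighbourhood_cleansed")["occupancy_rate"].mean()
)

print(avg_occupancy.round(2))
```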

**Pricing**
- What are nightly prices by area?
- How have prices changed YoY?
- Are Airbnb prices correlated with tourism hotspots or metro accessibility?
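
Before prices can be compared by area they usually need to be converted to numbers. The sketch below assumes a `price` column stored as text such as `"$1,250.00"` and a `neighbourhood_cleansed` column for the area. Neither is visible in the truncated output further down, so both names and the format are assumptions.

```python
import pandas as pd

# Toy stand-in for the listings table (illustrative values only)
toy_listings = pd.DataFrame({
    "neighbourhood_cleansed": ["A", "A", "B"],
    "price": ["$1,250.00", "$750.00", None],
})

# Strip the currency symbol and thousands separators, then convert;
# anything that cannot be parsed becomes NaN
toy_listings["price_num"] = pd.to_numeric(
    toy_listings["price"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Median is less sensitive than the mean to a few very expensive listings
median_price = (
    toy_listings.groupby("neighbourhood_cleansed")["price_num"].median()
)

print(median_price)
```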

**Housing Market**
- Do colonias with high Airbnb density also show higher increases in rent prices (INEGI rent/rate data)?
- Correlation between Airbnb listing growth and:
    - rent increases
    - % residents reporting household displacement (if I find it)

**Local Economy**
- Estimated Airbnb revenue injected into each colonia - Where does the money go?
    - % captured by foreign / non-local operators
    - potential "leakage" where hosts do not reside locally
- Are neighbourhoods with high Airbnb density seeing:
    - growth in restaurants/cafés (openings)
    - drop in local traditional commerce?

### **Social & Urban-Fabric Questions**
- Are neighbourhoods with high Airbnb use experiencing demographic shifts (proxy metrics: long-stay pricing, English-language listing share)?
- Which colonias show a mismatch between resident population size and number of active short-term rentals?
- Are there signs of "Airbnb-driven tourism corridors" (Roma -> Condesa -> Juarez, etc.)?

### **Spatial Questions (dashboards + maps)**
- Heatmap: Listings per hectare by colonia
- Which Airbnb clusters appear near:
    - metro stations
    - parks
    - tourist attractions
    - coworking spaces
- Are listing clusters expanding over time?

### **Strategic / Policy-Oriented Questions**
- Should regulations limit multi-listing hosts in CDMX?
- Which colonias are at highest risk of displacement?
- Where could Airbnb be redirected to boost tourism in under-visited districts?

## **Datasets**
- listings.csv.gz: Detailed listings data
- listings.csv: Summary information and metrics for listings in Mexico City (good for visualisations).
- calendar.csv.gz: Detailed calendar data
- reviews.csv.gz: Detailed review data
- reviews.csv: Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).
- neighbourhoods.csv: Neighbourhood list for geo filter. Sourced from city or open source GIS files.
- neighbourhoods.geojson: GeoJSON file of neighbourhoods of the city.
- INEGI housing / rent data (for correlation)
- SEDECO economic statistics (local businesses)

In [7]:
# Importing libraries
import warnings
warnings.filterwarnings('ignore')

import numpy as np # Linear algebra
import pandas as pd # Data manipulation
import matplotlib.pyplot as plt # Plots
import seaborn as sns # More plots

datasetpath = "./dataset/"

In [39]:
listings = pd.read_csv(datasetpath + "listings.csv.gz")
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27051 entries, 0 to 27050
Data columns (total 79 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            27051 non-null  int64  
 1   listing_url                                   27051 non-null  object 
 2   scrape_id                                     27051 non-null  int64  
 3   last_scraped                                  27051 non-null  object 
 4   source                                        27051 non-null  object 
 5   name                                          27051 non-null  object 
 6   description                                   26309 non-null  object 
 7   neighborhood_overview                         13736 non-null  object 
 8   picture_url                                   27051 non-null  object 
 9   host_id                                       27051 non-null 

In [42]:
# Cleaning listings dataset: drop columns not needed for this analysis
columns_to_drop = [
    "listing_url", "scrape_id", "last_scraped", "source", "picture_url",
    "host_url", "host_since", "host_response_time", "host_response_rate",
    "host_acceptance_rate", "host_is_superhost", "host_thumbnail_url",
    "host_picture_url", "host_has_profile_pic", "bathrooms", "bathrooms_text",
    "bedrooms", "beds", "amenities", "number_of_reviews",
    "number_of_reviews_ltm", "number_of_reviews_l30d", "number_of_reviews_ly",
    "first_review", "last_review", "review_scores_rating",
    "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin",
    "review_scores_communication", "review_scores_location", "review_scores_value",
    "instant_bookable", "neighbourhood_group_cleansed", "calendar_updated", "license"
]
listings = listings.drop(columns=columns_to_drop)
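
Running the drop cell a second time raises a `KeyError`, because the columns are already gone. A small helper along these lines (hypothetical, not part of the original notebook) makes the cleaning step safe to re-run by ignoring missing columns and returning a new DataFrame.

```python
import pandas as pd


def drop_if_present(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df without the listed columns, skipping any that are absent."""
    return df.drop(columns=columns, errors="ignore")


# Toy example: one column exists, the other does not
toy = pd.DataFrame({"id": [1], "listing_url": ["u"], "name": ["n"]})
trimmed = drop_if_present(toy, ["listing_url", "scrape_id"])

# Applying the same step again is harmless
trimmed_again = drop_if_present(trimmed, ["listing_url", "scrape_id"])

print(list(trimmed_again.columns))
```

The trade-off is that `errors="ignore"` also hides typos in column names, so it is worth checking the resulting columns once.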

In [44]:
listings.head(5)

Unnamed: 0,id,name,description,neighborhood_overview,host_id,host_name,host_location,host_about,host_neighbourhood,host_listings_count,...,availability_365,calendar_last_scraped,availability_eoy,estimated_occupancy_l365d,estimated_revenue_l365d,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,35797,Villa Dante,"Dentro de Villa un estudio de arte con futon, ...","Santa Fe Shopping Mall, Interlomas Park and th...",153786,Dici,"Mexico City, Mexico","Master in visual arts, film photography & Mark...",,1.0,...,363,2025-09-27,94,0,0.0,1,1,0,0,
1,44616,Condesa Haus,A new concept of hosting in mexico through a b...,,196253,Fernando,"Mexico City, Mexico",Condesa Haus Rentals offers independent stud...,Condesa,13.0,...,360,2025-09-28,90,6,108000.0,9,4,2,0,0.38
2,56074,Great space in historical San Rafael,This great apartment is located in one of the ...,Very traditional neighborhood with all service...,265650,Maris,"Mexico City, Mexico",I am a University Professor now retired after ...,San Rafael,1.0,...,333,2025-09-28,63,30,17730.0,1,1,0,0,0.48
3,67703,"2 bedroom apt. deco bldg, Condesa","Comfortably furnished, sunny, 2 bedroom apt., ...",,334451,Nicholas,"Mexico City, Mexico","I am a journalist writing about food, (book an...",Hipódromo,3.0,...,252,2025-09-28,9,6,,2,2,0,0,0.3
4,70644,Beautiful light Studio Coyoacan- full equipped !,COYOACAN designer studio quiet & safe! well eq...,Coyoacan is a beautiful neighborhood famous fo...,212109,Trisha,"Mexico City, Mexico","I am a documentary film maker & photo curator,...",Coyoacán,3.0,...,234,2025-09-28,51,48,,3,2,1,0,0.81
