&nbsp;
# **Airbnb vs Long-Term Rental Price Analysis**

# Goal

To assess how the concentration of short‑term rentals correlates with neighborhood rental‑price trends in Paris and London by integrating and analyzing public housing and Airbnb datasets.

# Table of Contents

1. Setup
   - Import utility functions (Plotters & parsers)
   - Load Paris Data
   - Load London Data

2. Comparative Analysis 
   - What are the Airbnb densities in Paris & London? 
   - What are the long-term rental price increases in Paris & London?

3. Data Exploration 
   - Market Structure 
   - How does Airbnb density compare to rental price increase?
   - Let’s do a simple polynomial regression 
   - What is the best polynomial degree to fit these data points? 
   - Can we get a better sense of this trend?
   - And an even better one?
   - What are the correlations?
   - Optional observation for London

4. Conclusions

5. Open‑ended Challenge

&nbsp;

&nbsp;
# 1) Setup
&nbsp;

In [None]:
%load_ext autoreload
%autoreload 2
import json
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import shape, Point
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate, KFold
from sklearn.metrics import r2_score
import seaborn as sns
from scipy.stats import pearsonr

&nbsp;
## Load utility functions (Plotters & parsers)
#### Located in the ```utils.py``` file
&nbsp;

In [None]:
from utils import *

&nbsp;
## Load Paris Data
###### Data fetched from:
###### https://www.data.gouv.fr/fr/datasets/logement-encadrement-des-loyers/#:~:text=Ce%20jeu%20de%20donn%C3%A9es%20pr%C3%A9sente,des%20ann%C3%A9es%20pr%C3%A9c%C3%A9dentes%20est%20conserv%C3%A9
###### https://insideairbnb.com
&nbsp;

In [None]:
def load_paris_data():
    print("\n" + "="*50)
    print("LOADING PARIS DATA")
    print("="*50)

    # File paths
    csv_rentals = "../data/paris/paris_rentals.csv"
    csv_airbnb = "../data/paris/paris_airbnb.csv"

    df_rentals_initial = pd.read_csv(csv_rentals, 
                                     delimiter=';', 
                                     on_bad_lines='skip', 
                                     encoding='utf-8')
    
    # Use fine grid neighborhoods from rentals data (geojson is given)
    df_neigh = df_rentals_initial.drop_duplicates(subset="Numéro du quartier")
    df_neigh = df_neigh[["Numéro du quartier", "geo_shape"]]
    print(f"Number of unique Paris neighborhoods: {len(df_neigh)}")
    
    # Rename and convert the GeoJSON geometry to Shapely objects
    df_neigh.rename(columns={"Numéro du quartier": "neigh_id"}, inplace=True)
    df_neigh["geometry"] = df_neigh["geo_shape"].apply(convert_geojson_to_shape)
    gdf_neigh = gpd.GeoDataFrame(df_neigh, geometry="geometry", crs="EPSG:4326")
    
    # Load Paris Airbnb data
    df_airbnb = pd.read_csv(csv_airbnb, 
                            delimiter=',', 
                            on_bad_lines='skip', 
                            encoding='utf-8')
    
    # Create point geometry from latitude and longitude
    df_airbnb['geometry'] = df_airbnb.apply(lambda row: create_point_from_coords(row, 7, 6), axis=1)
    df_airbnb = df_airbnb[df_airbnb['geometry'].notnull()]
    gdf_airbnb = gpd.GeoDataFrame(df_airbnb, geometry='geometry', crs="EPSG:4326")
    
    # Spatial join: assign Airbnb listings to neighborhoods
    gdf_airbnb_joined = gpd.sjoin(gdf_airbnb, gdf_neigh, how='left', predicate='within')
    airbnb_counts = gdf_airbnb_joined.groupby('neigh_id').size().reset_index(name='airbnb_count')
    gdf_neigh = gdf_neigh.merge(airbnb_counts, on='neigh_id', how='left')
    gdf_neigh['airbnb_count'] = gdf_neigh['airbnb_count'].fillna(0).astype(int)
    
    # Compute area in km² and calculate density
    # (note: EPSG:3857 is not an equal-area projection, so areas are overstated at this latitude)
    gdf_neigh['area_km2'] = gdf_neigh.to_crs(epsg=3857).area / 1e6
    gdf_neigh['airbnb_density'] = gdf_neigh['airbnb_count'] / gdf_neigh['area_km2']
    print(f"Total number of Airbnb listings in Paris: {len(gdf_airbnb)}")
    
    # Filter rental data by year
    df_rentals_2024 = df_rentals_initial[df_rentals_initial.iloc[:, 0] == 2024].copy()
    df_rentals_2019 = df_rentals_initial[df_rentals_initial.iloc[:, 0] == 2019].copy()
    
    # Parse coordinates and rental prices
    df_rentals_2024['geometry'] = df_rentals_2024.iloc[:, 13].apply(parse_paris_coords)
    df_rentals_2019['geometry'] = df_rentals_2019.iloc[:, 13].apply(parse_paris_coords)
    
    # Rental price is in column index 7
    df_rentals_2024['rental_price'] = pd.to_numeric(df_rentals_2024.iloc[:, 7], errors='coerce')
    df_rentals_2019['rental_price'] = pd.to_numeric(df_rentals_2019.iloc[:, 7], errors='coerce')
    
    # Filter valid entries
    df_rentals_2024 = df_rentals_2024[df_rentals_2024['geometry'].notnull() & df_rentals_2024['rental_price'].notnull()]
    df_rentals_2019 = df_rentals_2019[df_rentals_2019['geometry'].notnull() & df_rentals_2019['rental_price'].notnull()]
    
    # Create GeoDataFrames in preparation for spatial join
    gdf_rentals_2024 = gpd.GeoDataFrame(df_rentals_2024, geometry='geometry', crs="EPSG:4326")
    gdf_rentals_2019 = gpd.GeoDataFrame(df_rentals_2019, geometry='geometry', crs="EPSG:4326")
    
    # Spatial join rentals to neighborhoods
    gdf_rentals_2024_joined = gpd.sjoin(gdf_rentals_2024, gdf_neigh, how='left', predicate='within')
    gdf_rentals_2019_joined = gpd.sjoin(gdf_rentals_2019, gdf_neigh, how='left', predicate='within')
    
    # Calculate average prices for each neighborhood
    avg_prices_2024 = gdf_rentals_2024_joined.groupby('neigh_id')['rental_price'].mean().reset_index(name='avg_rental_price_2024')
    gdf_neigh = gdf_neigh.merge(avg_prices_2024, on='neigh_id', how='left')
    gdf_neigh['avg_rental_price_2024'] = gdf_neigh['avg_rental_price_2024'].fillna(0)
    
    avg_prices_2019 = gdf_rentals_2019_joined.groupby('neigh_id')['rental_price'].mean().reset_index(name='avg_rental_price_2019')
    gdf_neigh = gdf_neigh.merge(avg_prices_2019, on='neigh_id', how='left')
    gdf_neigh['avg_rental_price_2019'] = gdf_neigh['avg_rental_price_2019'].fillna(0)
    
    # Compute rental price increase (2024 - 2019)
    gdf_neigh['price_increase'] = gdf_neigh['avg_rental_price_2024'] - gdf_neigh['avg_rental_price_2019']
    
    return gdf_neigh
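
The `area_km2` step above projects to EPSG:3857 (Web Mercator), which is not an equal-area projection. The sketch below estimates how much that inflates areas at a given latitude, using the spherical Mercator scale factor; the city-centre latitudes are approximate values chosen for illustration, and `web_mercator_area_inflation` is a hypothetical helper, not part of `utils.py`.

```python
import math

def web_mercator_area_inflation(lat_deg: float) -> float:
    """Approximate factor by which Web Mercator overstates areas at a latitude.

    The spherical Mercator projection stretches both axes by sec(lat),
    so areas are stretched by sec(lat) squared.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Approximate city-centre latitudes (assumed values, for illustration only)
for city, lat in {"Paris": 48.86, "London": 51.51}.items():
    factor = web_mercator_area_inflation(lat)
    print(f"{city}: areas overstated ~{factor:.2f}x, densities understated by the same factor")
```

Because the factor is almost constant within one city, the ranking of neighbourhoods should be largely unaffected, but absolute densities and the Paris-versus-London comparison are. Projecting to an equal-area CRS (for example EPSG:3035 for Europe) before calling `.area` is one way to avoid the distortion.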

&nbsp;
## Load London Data
###### Data fetched from:
###### https://data.london.gov.uk/dataset/average-private-rents-borough
###### https://insideairbnb.com
&nbsp;

In [None]:
def load_london_data():
    print("\n" + "="*50)
    print("LOADING LONDON DATA")
    print("="*50)
    
    # File paths
    excel_rentals = "../data/london/london_rentals.xls"
    csv_airbnb = "../data/london/london_airbnb.csv"
    geojson_neigh = "../data/london/london_neighbourhoods.geojson"
    
    # Load Airbnb listings
    df_airbnb = pd.read_csv(csv_airbnb, encoding="utf-8")
    
    # Build GeoDataFrame for Airbnb data
    DF = df_airbnb.copy()
    DF['geometry'] = DF.apply(lambda row: create_point_from_coords(row, 7, 6), axis=1)
    DF = DF[DF['geometry'].notnull()]
    gdf_airbnb = gpd.GeoDataFrame(DF, geometry='geometry', crs='EPSG:4326')
    
    # Load and process rental data
    # positional column indices in the Excel
    YEAR_COL, QUARTER_COL, NEIGH_COL, CATEGORY_COL, PRICE_COL = 0, 1, 3, 4, 6
    NEIGH_NAME = 'neighbourhood'
    
    # read raw rentals sheet
    raw = pd.read_excel(excel_rentals, sheet_name="Raw data", header=None)
    raw.rename(columns={NEIGH_COL: NEIGH_NAME}, inplace=True)
    
    # filter by years, Q1, all categories
    df_filt = raw[(raw.iloc[:, YEAR_COL].isin([LONDON_START_YEAR, LONDON_END_YEAR])) &
                   (raw.iloc[:, QUARTER_COL]=='Q1') &
                   (raw.iloc[:, CATEGORY_COL]=='All categories')].copy()
    
    # parse price and drop NAs
    df_filt[PRICE_COL] = pd.to_numeric(df_filt.iloc[:, PRICE_COL], errors='coerce')
    df_filt.dropna(subset=[NEIGH_NAME, PRICE_COL], inplace=True)
    
    # average price per neighbourhood per year
    avg_start = df_filt[df_filt.iloc[:, YEAR_COL]==LONDON_START_YEAR]
    avg_start = avg_start.groupby(NEIGH_NAME)[PRICE_COL].mean().reset_index(name=f"avg_price_{LONDON_START_YEAR}")
    avg_end = df_filt[df_filt.iloc[:, YEAR_COL]==LONDON_END_YEAR]
    avg_end = avg_end.groupby(NEIGH_NAME)[PRICE_COL].mean().reset_index(name=f"avg_price_{LONDON_END_YEAR}")
    
    # merge average prices
    df_rentals = pd.merge(avg_start, avg_end, on=NEIGH_NAME, how='outer').fillna(0)
    
    # compute price change (end year minus start year)
    df_rentals['price_change'] = df_rentals[f"avg_price_{LONDON_END_YEAR}"] - df_rentals[f"avg_price_{LONDON_START_YEAR}"]
    
    # Merge with neighbourhood geometries and compute Airbnb density
    gdf_neigh = gpd.read_file(geojson_neigh)
    gdf_neigh = gdf_neigh.merge(df_rentals, on=NEIGH_NAME, how='left').fillna(0)
    joined = gpd.sjoin(gdf_airbnb, gdf_neigh, how='left', predicate='within')
    counts = joined.groupby('neighbourhood_right').size().reset_index(name='airbnb_count')
    counts.rename(columns={'neighbourhood_right':'neighbourhood'}, inplace=True)
    gdf_neigh = gdf_neigh.merge(counts, on='neighbourhood', how='left').fillna({'airbnb_count':0})
    gdf_neigh['area_km2'] = gdf_neigh.to_crs(epsg=3857).area / 1e6
    gdf_neigh['airbnb_density'] = gdf_neigh['airbnb_count'] / gdf_neigh['area_km2']
    
    print(f"Number of unique London neighborhoods: {len(gdf_neigh)}")
    print(f"Total Airbnb listings in London: {len(gdf_airbnb)}")
    
    return gdf_neigh
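
Both loaders fill missing average prices with 0 before taking the difference. The sketch below, on made-up borough names and rents, shows what that does to a neighbourhood missing one year of data, and one possible alternative (keep `NaN` and drop incomplete rows explicitly). It is an illustration of the pandas behaviour, not a statement about how many rows are affected in the real data.

```python
import pandas as pd

# Hypothetical average rents; borough "C" has no start-year data
avg_start = pd.DataFrame({"neighbourhood": ["A", "B"],
                          "avg_price_start": [1200.0, 1500.0]})
avg_end = pd.DataFrame({"neighbourhood": ["A", "B", "C"],
                        "avg_price_end": [1400.0, 1650.0, 1300.0]})

merged = pd.merge(avg_start, avg_end, on="neighbourhood", how="outer")

# Zero-filling treats the missing start price as 0,
# so C's "change" equals its full end-year rent
zero_filled = merged.fillna(0)
zero_filled["price_change"] = zero_filled["avg_price_end"] - zero_filled["avg_price_start"]

# Keeping NaN propagates the gap, so incomplete boroughs can be dropped explicitly
nan_kept = merged.copy()
nan_kept["price_change"] = nan_kept["avg_price_end"] - nan_kept["avg_price_start"]
complete = nan_kept.dropna(subset=["price_change"])

print(zero_filled[["neighbourhood", "price_change"]])
print(complete[["neighbourhood", "price_change"]])
```

A zero-filled row would appear as an extreme price change in the maps, scatter plots, and correlations that follow, so it may be worth checking how many neighbourhoods lack data for either year.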

&nbsp;
# 2) Comparative Analysis
&nbsp;

In [None]:
# Load data for both cities
paris_data = load_paris_data()
london_data = load_london_data()
print("\nOK")

&nbsp;
## What are the Airbnb densities in Paris & London?
&nbsp;

In [None]:
# 1. Airbnb Density Maps
fig, axes = plt.subplots(1, 2, figsize=(18, 8))

paris_data.plot(column='airbnb_density', cmap='Blues', legend=True, ax=axes[0], edgecolor='black')
axes[0].set_title('Paris: Airbnb Density (listings/km²)')

london_data.plot(column='airbnb_density', cmap='Greens', legend=True, ax=axes[1], edgecolor='black')
axes[1].set_title('London: Airbnb Density (listings/km²)')

plt.tight_layout()
plt.show()

&nbsp;
## What are the long-term rental price increases in Paris & London?
&nbsp;

In [None]:
# 2. Rental Price Increase Maps
fig, axes = plt.subplots(1, 2, figsize=(18, 8))

paris_data.plot(column='price_increase', cmap='Blues', legend=True, ax=axes[0], edgecolor='black')
axes[0].set_title('Paris: Rental Price Increase (2024 - 2019)')

london_data.plot(column='price_change', cmap='Greens', legend=True, ax=axes[1], edgecolor='black')
axes[1].set_title(f'London: Rental Price Change ({LONDON_END_YEAR}–{LONDON_START_YEAR})')

plt.tight_layout()
plt.show()

&nbsp;
# 3) Data Exploration

## Market Structure

In [None]:
# 3. Density Distribution Comparison
fig, ax = plot_density_distribution_comparison(paris_data, london_data)
plt.show()

Right-skewed distributions indicate that a few neighborhoods have a very high Airbnb concentration, which may reflect two distinct types of neighborhood: tourist and residential.
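
The skew can also be quantified rather than read off the histogram. The following is a minimal sketch on synthetic log-normal data standing in for per-neighbourhood densities; `sample_skewness` is a hypothetical helper, and with the real data one would pass `paris_data['airbnb_density'].to_numpy()` instead.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for per-neighbourhood Airbnb densities (listings/km²)
densities = rng.lognormal(mean=3.0, sigma=1.0, size=500)

def sample_skewness(values: np.ndarray) -> float:
    """Third standardized moment (biased estimator); positive means right-skewed."""
    centered = values - values.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

skew = sample_skewness(densities)
mean, median = densities.mean(), np.median(densities)
print(f"skewness={skew:.2f}, mean={mean:.1f}, median={median:.1f}")
```

A positive skewness together with a mean above the median is the usual signature of a long right tail.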
&nbsp;
## How does Airbnb density compare to rental price increase?
&nbsp;

In [None]:
# 4. Scatter Plot: Airbnb Density vs Price Increase
fig, axes = plot_airbnb_density_scatter_comparison(paris_data, london_data)
plt.show()

&nbsp;
## Let's do a simple polynomial regression
### With cross validation to find the best degree of approximation
&nbsp;

In [None]:
# 5. Bias-Variance Trade-off Analysis
paris_X = paris_data['price_increase'].values.reshape(-1, 1)
paris_Y = paris_data['airbnb_density'].values
paris_cv_results = fit_polynomial_models(paris_X, paris_Y)
print("\nParis Cross-validation results:")
print(paris_cv_results)

london_X = london_data['price_change'].values.reshape(-1, 1)
london_Y = london_data['airbnb_density'].values
london_cv_results = fit_polynomial_models(london_X, london_Y)
print("\nLondon Cross-validation results:")
print(london_cv_results)

fig, axes = plot_bias_variance_tradeoff_comparison(paris_cv_results, london_cv_results)
plt.show()
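
`fit_polynomial_models` lives in `utils.py` and is not shown here. As a rough idea of what such a degree search could look like, the sketch below cross-validates polynomial fits of increasing degree with scikit-learn on synthetic inverted-U data. The function name `cv_polynomial_degrees`, the degree range, the fold count, and the output columns are all assumptions, not the notebook's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def cv_polynomial_degrees(X, y, degrees=range(1, 6), n_splits=5, seed=0):
    """Hypothetical stand-in: mean train/test R² for each polynomial degree."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for degree in degrees:
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        scores = cross_validate(model, X, y, cv=cv, scoring="r2",
                                return_train_score=True)
        rows.append({
            "degree": degree,
            "train_r2": scores["train_score"].mean(),
            "test_r2": scores["test_score"].mean(),
        })
    return pd.DataFrame(rows)

# Synthetic inverted-U data, for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(80, 1))
y = 50 - 20 * (X[:, 0] - 1.5) ** 2 + rng.normal(0, 3, size=80)

results = cv_polynomial_degrees(X, y)
# Pick the degree with the highest mean held-out R²
best_degree = int(results.loc[results["test_r2"].idxmax(), "degree"])
print(results.round(3))
print("Selected degree:", best_degree)
```

Selecting on the held-out score rather than the training score is what guards against overfitting: training R² keeps rising with degree, while the test score typically levels off or drops.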

&nbsp;
## What is the best polynomial degree to fit these data points?
&nbsp;

In [None]:
# Fit best models for both Paris and London
paris_best_deg, paris_poly, paris_model, paris_r = fit_best_model(paris_X, paris_Y, paris_cv_results)
london_best_deg, london_poly, london_model, london_r = fit_best_model(london_X, london_Y, london_cv_results)

print(f"Best Polynomial Degree for Paris : {paris_best_deg}")
print(f"Best Polynomial Degree for London : {london_best_deg}")
# 6. Plot polynomial regression comparison
fig, axes = plot_polynomial_regression_comparison(
    paris_X, paris_Y, paris_poly, paris_model, paris_r, paris_best_deg,
    london_X, london_Y, london_poly, london_model, london_r, london_best_deg
)
plt.show()

&nbsp;
## Can we get a better sense of this trend?
### Using a box plot
&nbsp;

In [None]:
# 7. Box Plots - with 5 bins
paris_data['price_increase_bin'] = pd.qcut(paris_data['price_increase'], q=5, duplicates='drop')
london_data['price_bin'] = pd.qcut(london_data['price_change'], q=5, duplicates='drop')

plot_boxplot_comparison(paris_data, london_data)
plt.show()
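
One detail of `pd.qcut(..., duplicates='drop')` is worth keeping in mind: when many values are tied, several quantile edges coincide and are dropped, so fewer than the requested 5 bins come back. A small sketch with made-up values, half of them zero:

```python
import pandas as pd

# Hypothetical price increases where half the neighbourhoods share the value 0
price_increase = pd.Series([0, 0, 0, 0, 0, 1, 2, 3, 4, 5], dtype=float)

binned = pd.qcut(price_increase, q=5, duplicates="drop")
n_bins = binned.cat.categories.size

print(f"Requested 5 bins, got {n_bins}")
print(binned.value_counts(sort=False))
```

If the real columns contain many identical values, checking `cat.categories.size` confirms how many boxes the plot actually shows.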

&nbsp;
## And an even better one?
### By plotting the median of each bin of the box plots
&nbsp;

In [None]:
# 8. Regression on Binned Data
plot_quadratic_fit_comparison(paris_data, london_data, paris_best_deg, london_best_deg)
plt.show()
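
`plot_quadratic_fit_comparison` is defined in `utils.py`. The idea of regressing on bin medians can be sketched as follows, on synthetic data and with a fixed quadratic degree; the column names mirror the Paris frame, but the data, the noise level, and the choice of `np.polyfit` are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic neighbourhood-level data, for illustration only
df = pd.DataFrame({"price_increase": rng.uniform(0, 3, size=200)})
df["airbnb_density"] = (60 - 25 * (df["price_increase"] - 1.5) ** 2
                        + rng.normal(0, 8, size=200))

# Bin by quintile of price increase, then summarise each bin by its medians
df["bin"] = pd.qcut(df["price_increase"], q=5, duplicates="drop")
medians = df.groupby("bin", observed=True)[["price_increase", "airbnb_density"]].median()

# Fit a quadratic through the (median x, median y) points
coeffs = np.polyfit(medians["price_increase"].to_numpy(),
                    medians["airbnb_density"].to_numpy(), deg=2)
peak_x = -coeffs[1] / (2 * coeffs[0])  # vertex of the fitted parabola

print(medians.round(2))
print(f"Fitted peak near price_increase = {peak_x:.2f}")
```

Medians damp the influence of outlying neighbourhoods, at the cost of fitting only a handful of points, so the resulting curve is best read as a summary of the trend rather than a model with meaningful uncertainty.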

&nbsp;
## What are the correlations?
### By using Pearson correlation
The Pearson correlation coefficient summarizes in a single value between –1 and 1 the strength and direction of a linear relationship: +1 denotes a perfect positive linear relationship, –1 a perfect negative one, and 0 no linear relationship. Unlike R², it also conveys the direction of the relationship.
&nbsp;

In [None]:
# 9. Pearson Correlation Comparison
paris_corr, paris_pvalue = pearsonr(paris_data['price_increase'], paris_data['airbnb_density'])
london_corr, london_pvalue = pearsonr(london_data['price_change'], london_data['airbnb_density'])

# Create correlation bar chart
plot_correlation_bar_chart(paris_corr, london_corr, paris_pvalue, london_pvalue)
plt.show()

significant_paris = paris_pvalue < 0.05
significant_london = london_pvalue < 0.05

print("\nParis Analysis:")
print(f"- The correlation coefficient between Price Increase and Airbnb Density of {paris_corr:.3f} is {'statistically significant' if significant_paris else 'not statistically significant'}")
print(f"- Neighborhoods with higher rental price changes tend to have {'higher' if paris_corr > 0 else 'lower'} Airbnb density")

print("\nLondon Analysis:")
print(f"- The correlation coefficient between Price Increase and Airbnb Density of {london_corr:.3f} is {'statistically significant' if significant_london else 'not statistically significant'}")
print(f"- Neighborhoods with higher rental price changes tend to have {'higher' if london_corr > 0 else 'lower'} Airbnb density")

print("\nComparison:")
if (paris_corr > 0) == (london_corr > 0):
    print("- Both cities show a similar directional relationship between rental price changes and Airbnb density")
else:
    print("- The cities show opposite directional relationships between rental price changes and Airbnb density")

strength = "stronger" if abs(paris_corr) > abs(london_corr) else "weaker"
print(f"- Paris shows a {strength} correlation than London")
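
To make the link between Pearson's r and R² concrete, the sketch below computes r from its definition on synthetic data and compares r² with the R² of a straight-line least-squares fit. The data are made up; only the identity between the two quantities for a degree-1 fit is the point.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data with a negative linear trend, for illustration only
x = rng.uniform(0, 10, size=100)
y = -2.0 * x + rng.normal(0, 4, size=100)

# Pearson r from its definition: covariance over the product of standard deviations
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)

# R² of the least-squares straight line fitted to the same points
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r = {r:.3f}, r² = {r**2:.3f}, R² = {r_squared:.3f}")
```

For a straight-line fit the two agree, so r adds only the sign. For the higher-degree polynomial fits used earlier, R² can exceed r² because the curve captures non-linear structure that a linear correlation misses.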

&nbsp;
## Optional observation for London
&nbsp;

In [None]:
# 10. London: Price Change vs Latest Price Comparison
fig, axes = plot_london_price_comparisons(london_data, london_best_deg)
plt.show()
print(secret_observation)

&nbsp;
# 4) Conclusions

In **Paris**, short‑term rental density peaks in neighbourhoods with **moderate** rental‑price increases (approximately 1.4–2 €/m²) and is lower in the areas with the **lowest** and **highest** price growth.

In **London**, Airbnb density rises **consistently** across quintiles of rental‑price change, with median densities climbing from around 6 to 80 listings/km² and extreme values approaching 200 listings/km². The nearly monotonic increase indicates a moderate but direct link between stronger market growth and short‑term rental concentration.

Across both cities, these findings point to an association between short‑term rental concentration and rental‑market price growth: close to monotonic in London, and peaking at moderate growth in Paris. This is consistent with Airbnb following rising rents and possibly contributing to them, but correlation alone cannot separate the two. To address housing affordability concerns, further analyses that control for tourism, housing supply, and regulatory policies are essential for establishing causation and informing targeted interventions.

&nbsp;
# 5) Open-ended Challenge

Assuming London is experiencing a higher increase in rental prices than Paris, how would one try to explain that incremental difference?

## a. What would be useful to explain that difference?

#### 1. Law and Regulations
Example: Barcelona's proposed ban on short-term rentals could slow market growth ([Cities Today](https://cities-today.com/barcelona-set-to-ban-short-term-rentals/)). London or Paris might likewise enact such regulations.

#### 2. Public Opinion (Cultural Factors)
Indicators: Google Trends data, hashtags on social media platforms tracking evolving sentiment toward short-term rentals.

#### 3. Airbnb Occupancy Rates and Growth Dynamics
Data tracking Airbnb listings, occupancy rates, booking frequencies, and overall growth trends.

#### 4. GDP and Macroeconomic Factors
Regions with robust GDP growth and high employment levels support greater renter purchasing power, increasing rental market competition and prices.

#### 5. Local Infrastructure Dynamics
Impact of openings or closures of major cultural venues, tech campuses, or other significant attractions influencing nearby rental markets.

#### 6. Transportation Infrastructure
Influence of changes in airport operations and flight availability. For example, the potential withdrawal of Ryanair from Carcassonne Airport may significantly affect local tourism and rental demand ([L’Indépendant](https://www.lindependant.fr/2025/03/18/la-compagnie-aerienne-ryanair-va-t-elle-se-retirer-de-laeroport-de-carcassonne-12577376.php)).

#### 7. Taxation and Mortgage Rates
Increased property-transaction taxes or mortgage-interest rates raise costs of ownership, pushing more households into rental markets and boosting rental demand.

#### 8. Age Distribution of the Population
Younger demographics, especially students and recent graduates, increase rental turnover and demand; ageing populations typically stabilize or reduce rental pressure.

#### 9. Demographics and Migration Flows
High net in-migration from domestic or international sources, coupled with high household-formation rates, intensifies demand and rental market pressures.

 



&nbsp;

## b. Investigation and analytical structure 

1. **Data Acquisition**  
   | Theme                  | Key Artifacts                                            | Source & Frequency                               |
   |------------------------|----------------------------------------------------------|--------------------------------------------------|
   | *Rent & Housing*       | Median asking rents, hedonic rent index, housing starts  | Zoopla API (UK), SeLoger API (FR) – monthly      |
   | *Short‑Let Dynamics*   | Airbnb active listings, occupancy, RevPAR                | AirDNA – quarterly                               |
   | *Regulation Timeline*  | Rent caps, tax changes                                   | City council minutes, gov sites – event dates    |
   | *Public Sentiment*     | Google Trends scores, Instagram/X hashtag counts         | Google Trends, Brandwatch – weekly               |
   | *Macro & Labour*       | GDP, unemployment, mortgage rate, CPI                    | ONS (UK), INSEE/Eurostat (FR) – quarterly        |
   | *Population Movers*    | Net migration, student enrolments, population age        | Eurostat city stats – annual                     |
   | *Narrative Signals*    | Scraped headlines and dates from top housing coverage    | Guardian, Le Monde, etc. – daily                 |

2. **Data Preparation**  
   - Harmonise geography (Greater London vs. Île‑de‑France) and time unit (quarters).  
   - Clean and fill gaps (e.g., interpolate annual data; flag policy‑change quarters).  
   - Create simple lags (1 quarter) and interaction flags where relevant.

3. **Exploratory Analysis**  
   - Plot rent‑growth curves side by side to confirm the London–Paris gap.  
   - Compute correlations between rent growth and each driver.  
   - Note any sharp deviations around known events (policy roll‑outs, infrastructure openings).

4. **Attribution Check**  
   - For each candidate driver, calculate ΔDriver = Driver_London – Driver_Paris over the period.  
   - Apply a simple elasticity (for example %Δrent / %Δdriver) to estimate each factor’s contribution to the rent‑growth gap.

5. **Robustness & Sensitivity**  
   - Swap rent measures (mean vs. median; furnished vs. unfurnished).  
   - Flag extraordinary periods (e.g., COVID lockdown quarters).  
   - Test one‑at‑a‑time hypothetical swaps (e.g., London under Paris rent‑cap values).

6. **Synthesis & Communication**  
   - Rank the top 2–3 drivers by their estimated gap contribution.  
   - Visualise results with a simple bar chart: “Factor contributions to London–Paris rent gap.”  
   - Summarise findings in a brief narrative, for example (figures are purely illustrative):  
     > “About 60 % of the higher rent price growth in London is explained by short‑let expansion and looser regulations, with mortgage‑cost shifts accounting for another 25 %.”  
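
The attribution check in step 4 can be sketched in a few lines. Every number below is a hypothetical placeholder (driver changes, elasticities, and the rent-growth gap alike), and `attribute_gap` is an illustrative helper rather than an established method; in practice the elasticities would have to come from the literature or from a fitted model.

```python
# All numbers below are hypothetical placeholders, not estimates
drivers = {
    # name: (% change in London, % change in Paris, assumed elasticity of rent)
    "short_let_listings": (30.0, 10.0, 0.10),
    "mortgage_rate": (40.0, 25.0, 0.05),
    "net_migration": (8.0, 3.0, 0.20),
}
rent_growth_gap = 5.0  # London minus Paris rent growth, in percentage points

def attribute_gap(drivers, gap):
    """Return each driver's estimated share of the rent-growth gap."""
    shares = {}
    for name, (london, paris, elasticity) in drivers.items():
        delta_driver = london - paris             # ΔDriver = Driver_London - Driver_Paris
        contribution = elasticity * delta_driver  # percentage points of rent growth
        shares[name] = contribution / gap
    # Whatever the listed drivers do not account for
    shares["unexplained"] = 1.0 - sum(shares.values())
    return shares

shares = attribute_gap(drivers, rent_growth_gap)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20}: {share:6.1%}")
```

Treating contributions as additive ignores interactions between drivers, which is one reason the robustness checks in step 5 matter.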

