SQLite connection

In [1]:
import sqlite3
import pandas as pd

conn = sqlite3.connect("../data/processed/airbnb_analysis.db")


1) KPI: Number of listings per city (via the star schema)

In [2]:
query = """
SELECT c.city AS city,
       COUNT(*) AS nb_listings
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
GROUP BY c.city
ORDER BY nb_listings DESC;
"""
pd.read_sql(query, conn)


,city,nb_listings
0,Madrid,18833
1,Barcelona,15054
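
To make the star-schema join concrete, here is a minimal sketch on an in-memory database. The table and column names `fact_listings`, `dim_city`, `city_id`, `city` and `price` come from the queries in this notebook; the `listing_id` key, the column types and the three toy rows are assumptions made for illustration only, not the actual contents of `airbnb_analysis.db`.

```python
import sqlite3

# Throwaway in-memory database; `demo` is used so the notebook's `conn` is untouched.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_listings (
    listing_id INTEGER PRIMARY KEY,  -- hypothetical key, not shown in the notebook
    city_id INTEGER REFERENCES dim_city(city_id),
    price REAL
);
INSERT INTO dim_city VALUES (1, 'Madrid'), (2, 'Barcelona');
-- Toy rows (assumed values): two listings in Madrid, one in Barcelona.
INSERT INTO fact_listings (city_id, price) VALUES (1, 80), (1, 120), (2, 150);
""")

# Same shape as the KPI query: the fact table carries the key,
# the dimension table supplies the readable city name.
counts = demo.execute("""
SELECT c.city AS city, COUNT(*) AS nb_listings
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
GROUP BY c.city
ORDER BY nb_listings DESC;
""").fetchall()
demo.close()

print(counts)
```

On the toy rows, the query returns one `(city, nb_listings)` pair per city, ordered from the largest count to the smallest.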


2) KPI: Average price per city

In [3]:
query = """
SELECT c.city AS city,
       ROUND(AVG(f.price), 2) AS avg_price
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
GROUP BY c.city
ORDER BY avg_price DESC;
"""
pd.read_sql(query, conn)


,city,avg_price
0,Barcelona,158.26
1,Madrid,134.16


3) KPI: Median price per city (pure SQL)

In [4]:
query = """
WITH ranked AS (
  SELECT
    c.city AS city,
    f.price AS price,
    ROW_NUMBER() OVER (PARTITION BY c.city ORDER BY f.price) AS rn,
    COUNT(*) OVER (PARTITION BY c.city) AS cnt
  FROM fact_listings f
  JOIN dim_city c ON f.city_id = c.city_id
)
SELECT city,
       ROUND(AVG(price), 2) AS median_price
FROM ranked
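-- Integer division selects the middle row when cnt is odd,
-- and the two middle rows (averaged above) when cnt is even.
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)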
GROUP BY city;
"""
pd.read_sql(query, conn)


,city,median_price
0,Barcelona,129.0
1,Madrid,110.0
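
The row-number approach above can be checked on a small example. The sketch below runs the same pattern against a hypothetical `prices` table (toy data, one group with an odd count and one with an even count) and compares the result with `statistics.median` from the standard library. It assumes the SQLite build bundled with Python supports window functions (SQLite 3.25 or later).

```python
import sqlite3
import statistics

# Toy data (hypothetical): city "A" has 3 prices, city "B" has 4.
rows = [("A", 10.0), ("A", 30.0), ("A", 20.0),
        ("B", 4.0), ("B", 8.0), ("B", 12.0), ("B", 40.0)]

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE prices (city TEXT, price REAL)")
demo.executemany("INSERT INTO prices VALUES (?, ?)", rows)

median_sql = """
WITH ranked AS (
  SELECT city, price,
         ROW_NUMBER() OVER (PARTITION BY city ORDER BY price) AS rn,
         COUNT(*) OVER (PARTITION BY city) AS cnt
  FROM prices
)
SELECT city, AVG(price) AS median_price
FROM ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY city
ORDER BY city;
"""
# Each result row is (city, median_price), so dict() maps city -> median.
sql_medians = dict(demo.execute(median_sql).fetchall())
demo.close()

# Reference medians computed in Python for comparison.
py_medians = {
    city: statistics.median(p for c, p in rows if c == city)
    for city in ("A", "B")
}

print(sql_medians, py_medians)
```

For the odd-sized group both expressions in the `IN` list resolve to the same row; for the even-sized group they resolve to the two central rows, whose average is the median.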


4) KPI: Average price by room type and city

In [5]:
query = """
SELECT c.city AS city,
       rt.room_type AS room_type,
       ROUND(AVG(f.price), 2) AS avg_price,
       COUNT(*) AS nb_listings
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
JOIN dim_room_type rt ON f.room_type_id = rt.room_type_id
GROUP BY c.city, rt.room_type
ORDER BY c.city, avg_price DESC;
"""
pd.read_sql(query, conn)


,city,room_type,avg_price,nb_listings
0,Barcelona,Hotel room,223.94,50
1,Barcelona,Entire home/apt,191.6,10288
2,Barcelona,Private room,85.01,4610
3,Barcelona,Shared room,76.21,106
4,Madrid,Entire home/apt,157.63,13561
5,Madrid,Hotel room,151.1,41
6,Madrid,Private room,73.94,5084
7,Madrid,Shared room,46.42,147


5) KPI: Average availability per city

In [6]:
query = """
SELECT c.city AS city,
       ROUND(AVG(f.availability_365), 2) AS avg_availability
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
GROUP BY c.city
ORDER BY avg_availability DESC;
"""
pd.read_sql(query, conn)


,city,avg_availability
0,Barcelona,227.26
1,Madrid,212.92


6) KPI: Top 10 most expensive neighbourhoods (with a minimum-listings threshold to limit noise)

In [7]:
query = """
SELECT c.city AS city,
       n.neighbourhood AS neighbourhood,
       ROUND(AVG(f.price), 2) AS avg_price,
       COUNT(*) AS nb_listings
FROM fact_listings f
JOIN dim_city c ON f.city_id = c.city_id
JOIN dim_neighbourhood n ON f.neighbourhood_id = n.neighbourhood_id
GROUP BY c.city, n.neighbourhood
HAVING nb_listings >= 30  -- keep only neighbourhoods with at least 30 listings
ORDER BY avg_price DESC
LIMIT 10;
"""
pd.read_sql(query, conn)


,city,neighbourhood,avg_price,nb_listings
0,Barcelona,Diagonal Mar i el Front Marítim del Poblenou,241.86,126
1,Barcelona,la Dreta de l'Eixample,223.32,1902
2,Madrid,Recoletos,216.59,260
3,Barcelona,la Vila Olímpica del Poblenou,213.86,132
4,Madrid,Castellana,208.83,160
5,Madrid,Goya,187.02,313
6,Barcelona,Sant Antoni,185.49,794
7,Barcelona,l'Antiga Esquerra de l'Eixample,185.34,787
8,Barcelona,el Fort Pienc,178.73,385
9,Madrid,Cortes,176.94,855
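
To illustrate why the threshold matters, the sketch below ranks a hypothetical `listings` table with and without a minimum group size. The table, its two neighbourhoods and all prices are invented for the example; the threshold is passed as a bound parameter so it can be varied without editing the SQL.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE listings (neighbourhood TEXT, price REAL)")
# Toy data (assumed): one outlier listing in "tiny", forty ordinary ones in "large".
demo.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("tiny", 900.0)] + [("large", 100.0 + i) for i in range(40)],
)

ranking_sql = """
SELECT neighbourhood,
       ROUND(AVG(price), 2) AS avg_price,
       COUNT(*) AS nb_listings
FROM listings
GROUP BY neighbourhood
HAVING COUNT(*) >= ?
ORDER BY avg_price DESC;
"""
unfiltered = demo.execute(ranking_sql, (1,)).fetchall()   # every group kept
filtered = demo.execute(ranking_sql, (30,)).fetchall()    # groups with >= 30 rows
demo.close()

print(unfiltered)
print(filtered)
```

Without the threshold, the single-listing neighbourhood tops the ranking on the strength of one price; with it, only the well-populated group remains.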


In [8]:
conn.close()
print("✅ Step 2 complete: SQL KPIs ready.")


✅ Step 2 complete: SQL KPIs ready.
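
As an alternative to calling `close()` in a final cell, a connection can be wrapped in `contextlib.closing`, so that it is released even if a query raises part-way through. This is only a sketch on an in-memory database, not the notebook's own setup. Note that a `sqlite3` connection used directly in a `with` statement manages the transaction (commit or rollback) but does not close the connection.

```python
import sqlite3
from contextlib import closing

# The connection is closed when the block exits, whether or not an error occurred.
with closing(sqlite3.connect(":memory:")) as demo:
    value = demo.execute("SELECT 1 + 1").fetchone()[0]

# Using the connection afterwards raises ProgrammingError, which confirms it is closed.
try:
    demo.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)
```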
