# Project 5 – VizData – Group Y
### Air Pollution Exposure Near Educational Institutions in Île-de-France
**Team Members & Roles:**
- CEO: [Name]
- HR Director: [Name]
- Software Engineer: [Name]
- Data Scientist: [Name]
- Marketing Director: [Name]
- Communication Director: [Name]

## 1. Problem Statement

Air pollution is a major concern in urban areas such as Paris, especially around educational institutions. This project analyzes annual pollution data (NO₂, PM10, PM2.5) recorded from 2012 to 2017 around schools and crèches in Île-de-France. Our goals are to:
- Track pollution trends over time
- Identify the most affected zones
- Recommend safer areas for future school planning

## 2. Dataset Description

The data was compiled from public datasets provided by data.gouv.fr and Airparif. It includes:
- Annual average concentrations of NO₂, PM10, and PM2.5
- Geolocated measurements near educational institutions
- Cross-referenced data from the EAJE (établissements d'accueil du jeune enfant, i.e. early-childhood care facilities) and school address registries

📎 [Data Source](https://www.data.gouv.fr/fr/datasets/r/cc16163c-aca0-4977-97da-8ce592f78de1)

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('air_pollution_paris.csv')  # update with your file path

# Preview
df.head()
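
Before plotting, it can help to confirm that the file has the columns the later cells rely on. The sketch below is a minimal check, assuming the column names used in the plotting cells (`Year`, `Arrondissement`, `NO2`, `PM10`, `PM2.5`); the helper `check_schema` and the toy rows are hypothetical and only illustrate the call.

```python
import pandas as pd

# Column names assumed from the plotting cells of this notebook; adjust to the real CSV.
EXPECTED_COLUMNS = ["Year", "Arrondissement", "NO2", "PM10", "PM2.5"]
POLLUTANTS = ["NO2", "PM10", "PM2.5"]


def check_schema(frame: pd.DataFrame) -> pd.DataFrame:
    """Raise if an expected column is missing; return a copy with numeric pollutant columns."""
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    cleaned = frame.copy()
    for col in POLLUTANTS:
        # Entries that cannot be parsed as numbers become NaN instead of raising
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned


# Toy rows with made-up values, only to show how the helper behaves
toy = pd.DataFrame({
    "Year": [2012, 2013],
    "Arrondissement": ["75004", "75015"],
    "NO2": ["40.0", "n/a"],
    "PM10": [25.0, 22.0],
    "PM2.5": [15.0, 14.0],
})
checked = check_schema(toy)
print(checked.dtypes)
```

On the real data, `df = check_schema(df)` would fail early with a clear message if a header is spelled differently in the CSV.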

### 4.1 Trends Over Time

In [ ]:
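# Column names ("Year", "NO2", "PM10", "PM2.5") must match the CSV headers
# With several rows per year, lineplot draws the yearly mean with a confidence band
plt.figure(figsize=(10, 5))
sns.lineplot(data=df, x="Year", y="NO2", label="NO₂")
sns.lineplot(data=df, x="Year", y="PM10", label="PM10")
sns.lineplot(data=df, x="Year", y="PM2.5", label="PM2.5")
plt.title("Pollution Trends Over Time")
plt.xlabel("Year")
plt.ylabel("Concentration (µg/m³)")
plt.legend(title="Pollutant")
plt.show()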
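
The trend plot can be complemented with numbers. The sketch below computes the yearly mean of each pollutant and the percent change between the first and last year in the data. It assumes the same column names as the plotting cell; the helper `yearly_change` is hypothetical and the toy values are made up purely for illustration, not taken from the dataset.

```python
import pandas as pd

POLLUTANTS = ["NO2", "PM10", "PM2.5"]  # assumed column names


def yearly_change(frame: pd.DataFrame):
    """Return yearly means and the percent change between the first and last year."""
    yearly = frame.groupby("Year")[POLLUTANTS].mean().sort_index()
    first, last = yearly.iloc[0], yearly.iloc[-1]
    pct_change = (last - first) / first * 100
    return yearly, pct_change


# Toy data with made-up values: two measurements per year
toy = pd.DataFrame({
    "Year": [2012, 2012, 2017, 2017],
    "NO2": [40.0, 50.0, 36.0, 45.0],
    "PM10": [20.0, 30.0, 20.0, 20.0],
    "PM2.5": [10.0, 10.0, 11.0, 11.0],
})
yearly, pct_change = yearly_change(toy)
print(yearly)
print(pct_change.round(1))
```

Calling `yearly_change(df)` on the real data would give a figure to quote next to the plot, for example the percent change in NO₂ between 2012 and 2017.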

### 4.2 Correlation Between Pollutants

In [ ]:
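# Pearson correlation between the three pollutants
corr = df[["NO2", "PM10", "PM2.5"]].corr()
# Fix the color scale to [-1, 1] so the midpoint of the palette means "no correlation"
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Matrix")
plt.show()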

### 4.3 Pollution by Arrondissement

In [ ]:
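# Mean concentration per arrondissement, sorted from highest to lowest NO₂
pollutants = ["NO2", "PM10", "PM2.5"]
arr_mean = df.groupby("Arrondissement")[pollutants].mean().sort_values("NO2", ascending=False)
arr_mean.plot(kind="bar", figsize=(12, 6), title="Average Pollution by Arrondissement")
plt.ylabel("Concentration (µg/m³)")
plt.tight_layout()
plt.show()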

## 5. Interpretation of Results

- NO₂ and PM10 decline steadily over the period, while PM2.5 fluctuates.
- The high correlation between pollutants suggests shared sources such as traffic and combustion.
- Most polluted districts: 75004 and 75020. Least polluted: 75015 and 75016.
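
A ranking like the one in the last bullet can be derived directly from the per-district means. The sketch below is one possible way to do it, assuming NO₂ is the ranking criterion and that the columns are named `Arrondissement` and `NO2`; the helper `rank_districts`, the district labels, and the values are placeholders, not results from the dataset.

```python
import pandas as pd


def rank_districts(frame: pd.DataFrame, pollutant: str = "NO2", n: int = 2):
    """Return the n most and n least polluted districts by mean concentration."""
    means = frame.groupby("Arrondissement")[pollutant].mean()
    most = means.nlargest(n).index.tolist()    # highest mean first
    least = means.nsmallest(n).index.tolist()  # lowest mean first
    return most, least


# Toy data with placeholder district labels and made-up values
toy = pd.DataFrame({
    "Arrondissement": ["A", "A", "B", "C", "D", "D"],
    "NO2": [50.0, 52.0, 30.0, 45.0, 20.0, 22.0],
})
most, least = rank_districts(toy)
print("Most polluted:", most)
print("Least polluted:", least)
```

Ranking on a single pollutant is a simplification; repeating the call with `pollutant="PM10"` or `pollutant="PM2.5"` shows whether the ordering holds across pollutants.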

## 6. Recommendations

- Prioritize new construction in the least polluted districts (75015, 75016).
- Avoid siting new institutions in the most polluted districts (75004, 75020).
- Retrofit institutions in marginal districts (e.g. 75010) with air filters and greenery.
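
The three groups used above (lower-pollution, marginal, high-risk) can be made explicit with a simple rule. The sketch below splits districts into three equal-sized tiers by mean NO₂ using `pd.qcut`. The tertile cut-offs are an assumption made for illustration, not the criterion used in this analysis, and the helper `assign_tiers`, the labels, and the toy values are hypothetical.

```python
import pandas as pd

# Tier names are illustrative; tertiles are an assumed cut-off, not a regulatory threshold
TIER_LABELS = ["lower", "marginal", "higher"]


def assign_tiers(frame: pd.DataFrame, pollutant: str = "NO2") -> pd.DataFrame:
    """Group districts into three equal-sized tiers by mean concentration."""
    means = frame.groupby("Arrondissement")[pollutant].mean()
    tiers = pd.qcut(means, q=3, labels=TIER_LABELS).astype(str)
    return pd.DataFrame({"mean": means, "tier": tiers})


# Toy data with placeholder district labels and made-up values
toy = pd.DataFrame({
    "Arrondissement": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "NO2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
tiers = assign_tiers(toy)
print(tiers)
```

Quantile-based tiers are relative to the districts in the data, so a district can land in the "lower" tier and still exceed a health guideline; fixed thresholds would be the alternative if absolute limits matter.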

## 7. Supporting Materials

🖼️ Project Poster: `./poster/air_pollution_poster.png`  
📊 PowerPoint: `./presentation/AirPollution_Presentation.pptx`

## 8. Conclusion

This analysis highlights the importance of urban planning in protecting children's health. Our results can support data-driven decisions for safer school placement and long-term air quality improvement policies in Île-de-France.