# Hello DBMS+: From SQL to Carbon Footprints
This notebook walks through a complete data analysis workflow combining **SQL queries**, **exploration of multiple tables**, and **carbon footprint calculation**.  
The goal is to show how different data sources can be used to derive **relevant insights**, both for understanding global data and for environmental assessment.

The notebook is structured in two main parts:  
1. **SQL Jobs 1 to 9**: exploration, filtering, aggregation and joins across several tables (`world`, `students`, `nobel`, `SomeCompany`, etc.), to understand and manipulate the data step by step.  
2. **Big Job: Carbon Footprint**: analysis of electricity generation mix data by country and region, calculation of CO2 emissions, visualisations, and an estimate of the number of trees needed to offset these emissions.

In [1]:
# Imports

# Data manipulation and analysis
import pandas as pd
import numpy as np

# Visualisation
import matplotlib.pyplot as plt
import seaborn as sns

# Display plots directly in the notebook
%matplotlib inline

# SQL magic to run queries directly in cells
%load_ext sql

# DataFrame display formatting for readability
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

## Data Exploration and Analysis with SQL

In [2]:
# PostgreSQL connection: replace <password> with your own (never commit real credentials)
%sql postgresql://postgres:<password>@localhost:65432/postgres

## Big Job

### Cleaning the CSV file

#### Notes:

1. **CSV format**  
   - Specific encoding: `WIN1252`  
   - Delimiter: `;`  
   - Line 143 is completely empty → causes an import error in PostgreSQL  

2. **Splitting the dataset into two tables**  
   - **`country` table** → all rows before `"World"` (country names)  
   - **`world` table** → the `"World"` row plus all regional rows  

3. **Lowercase names**  
   - All country names are stored in lowercase (`LOWER(country)`) to simplify later queries  

4. **Table columns**  
   - `country`: `country, coal, gas, oil, hydro, renewable, nuclear`  
   - `world`: `region, coal, gas, oil, hydro, renewable, nuclear`
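Notes 1 to 3 can be sketched with the standard library alone. The inline sample below is an assumption about the file layout: it uses decimal points (as the `df.head()` output suggests) and represents the empty line as `;;;;;;`. Unlike the SQL cells further down, which split on explicit name lists, this sketch splits by position, as note 2 describes; `to_record` is a hypothetical helper.

```python
import csv
import io

# Sample in the assumed file format: cp1252 (WIN1252) bytes, ';' delimiter, decimal points,
# and one fully empty line. The values are copied from rows shown in this notebook.
raw_bytes = (
    "Country;Coal;Gas;Oil;Hydro;Renewable;Nuclear\r\n"
    "Albania;0;0;0;100;0;0\r\n"
    "Algeria;0;97.8;1.8;0.4;0;0\r\n"
    ";;;;;;\r\n"
    "World;40.7;21.6;4.1;16.2;6.0;10.6\r\n"
    "East Asia & Pacific;60.6;13.5;2.2;15.0;4.2;3.8\r\n"
).encode("cp1252")


def to_record(fields):
    """Hypothetical helper: lowercase the name (like LOWER(country)) and parse the shares."""
    return (fields[0].strip().lower(), *(float(value) for value in fields[1:]))


reader = csv.reader(io.StringIO(raw_bytes.decode("cp1252"), newline=""), delimiter=";")
header = next(reader)
# Skip lines whose fields are all empty (the equivalent of dropna(how='all'))
rows = [fields for fields in reader if any(cell.strip() for cell in fields)]

# Position-based split: rows before "World" are countries, "World" and after are regions
split_at = next(i for i, fields in enumerate(rows) if fields[0] == "World")
countries = [to_record(fields) for fields in rows[:split_at]]
regions = [to_record(fields) for fields in rows[split_at:]]

print(header)
print(countries)
print(regions)
```

Splitting by position avoids hard-coding region names, but it relies on the file keeping `"World"` as the first regional row.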


In [3]:
# Load the CSV into a pandas DataFrame
csv_path = r"C:\Users\Paul-Emmanuel Buffe\Desktop\la_plateforme\travaux_la_plateforme\hello-dbms\data\carbon-footprint-data.csv"
df = pd.read_csv(csv_path, delimiter=';', encoding='windows-1252')  # WIN1252-encoded, ';'-delimited

# Drop empty rows
df.dropna(how='all', inplace=True)  # only drop rows that are entirely empty

In [4]:
print(df.head(3))

   Country  Coal   Gas   Oil  Hydro  Renewable  Nuclear
0  Albania  0.00  0.00  0.00 100.00       0.00     0.00
1  Algeria  0.00 97.80  1.80   0.40       0.00     0.00
2   Angola  0.00  0.00 46.80  53.20       0.00     0.00


### Creating the tables
**"original_raw" table**

In [5]:
# %%sql
# CREATE TABLE IF NOT EXISTS original_raw (
#     country VARCHAR(100),
#     coal NUMERIC NULL,
#     gas NUMERIC NULL,
#     oil NUMERIC NULL,
#     hydro NUMERIC NULL,
#     renewable NUMERIC NULL,
#     nuclear NUMERIC NULL
# );

In [6]:
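# Insert the DataFrame rows (run once).
# ipython-sql binds the :name parameters from Python variables, so values are escaped
# and a country name containing an apostrophe no longer breaks the query.
# for _, row in df.iterrows():
#     country = row['Country']
#     coal, gas, oil, hydro, renewable, nuclear = (
#         float(row[c]) for c in ['Coal', 'Gas', 'Oil', 'Hydro', 'Renewable', 'Nuclear'])
#     %sql INSERT INTO original_raw (country, coal, gas, oil, hydro, renewable, nuclear) VALUES (:country, :coal, :gas, :oil, :hydro, :renewable, :nuclear)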
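Building an `INSERT` by string formatting breaks as soon as a value contains an apostrophe, and it opens the door to SQL injection. The sketch below contrasts the two approaches using Python's built-in `sqlite3` as a stand-in for PostgreSQL (assumption: with psycopg2 the placeholder would be `%s` rather than `?`). The apostrophe row is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL
# (assumption: with psycopg2 the placeholder would be %s instead of ?)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE original_raw (country TEXT, coal NUMERIC, gas NUMERIC, oil NUMERIC, "
    "hydro NUMERIC, renewable NUMERIC, nuclear NUMERIC)"
)

rows = [
    ("Albania", 0.0, 0.0, 0.0, 100.0, 0.0, 0.0),
    ("Algeria", 0.0, 97.8, 1.8, 0.4, 0.0, 0.0),
    ("Hypo d'Land", 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),  # hypothetical name with an apostrophe
]

# String-built SQL: the apostrophe closes the string literal early -> syntax error
unsafe_sql = f"INSERT INTO original_raw (country) VALUES ('{rows[2][0]}')"
try:
    conn.execute(unsafe_sql)
    unsafe_failed = False
except sqlite3.OperationalError:
    unsafe_failed = True

# Parameterised version: values are bound separately, so quotes are harmless
conn.executemany(
    "INSERT INTO original_raw (country, coal, gas, oil, hydro, renewable, nuclear) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)
names = [name for (name,) in conn.execute("SELECT country FROM original_raw ORDER BY rowid")]

print("string-built query failed:", unsafe_failed)
print(names)
```

With pandas, `DataFrame.to_sql` can also perform the bulk insertion, given a SQLAlchemy engine or a `sqlite3` connection.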

**"country" table**

In [7]:
# %%sql
# CREATE TABLE IF NOT EXISTS country(
#     country VARCHAR(100),
#     coal NUMERIC NULL,
#     gas NUMERIC NULL,
#     oil NUMERIC NULL,
#     hydro NUMERIC NULL,
#     renewable NUMERIC NULL,
#     nuclear NUMERIC NULL
# );

In [8]:
# %%sql
# INSERT INTO country(country, coal, gas, oil, hydro, renewable, nuclear)
# SELECT 
#     LOWER(country), 
#     coal, 
#     gas, 
#     oil, 
#     hydro, 
#     renewable, 
#     nuclear
# FROM original_raw
# WHERE country NOT IN ('World', 'East Asia & Pacific', 'Europe & Central',
#                       'Latin America & Caribbean', 'Middle East & North Afrika',
#                       'North America', 'South Asia', 'Sub­Saharan Africa');

**"world" table**

In [9]:
# %%sql
# CREATE TABLE IF NOT EXISTS world(
#     region VARCHAR(100),
#     coal NUMERIC NULL,
#     gas NUMERIC NULL,
#     oil NUMERIC NULL,
#     hydro NUMERIC NULL,
#     renewable NUMERIC NULL,
#     nuclear NUMERIC NULL
# );

In [10]:
# %%sql
# INSERT INTO world(region, coal, gas, oil, hydro, renewable, nuclear)
# SELECT LOWER(country), 
#         coal, 
#         gas, 
#         oil, 
#         hydro, 
#         renewable, 
#         nuclear
# FROM original_raw
# WHERE country IN ('World', 'East Asia & Pacific', 'Europe & Central',
#                       'Latin America & Caribbean', 'Middle East & North Afrika',
#                       'North America', 'South Asia', 'Sub­Saharan Africa');
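The `IN` / `NOT IN` lists must match the raw names character for character, including the source's own spellings such as `Middle East & North Afrika`. A typo silently sends a region into `country` instead of `world`. A quick check is to list the names that matched no raw row; the sketch below uses `sqlite3` as a stand-in, keeps only two columns, and uses an abridged list whose last entry is a hypothetical typo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_raw (country TEXT, coal NUMERIC)")
conn.executemany(
    "INSERT INTO original_raw VALUES (?, ?)",
    [("Albania", 0.0), ("World", 40.7), ("East Asia & Pacific", 60.6), ("Europe & Central", 24.1)],
)
conn.execute("CREATE TABLE world (region TEXT, coal NUMERIC)")
conn.execute("CREATE TABLE country (country TEXT, coal NUMERIC)")

# Abridged region list; "Europe and Central" is a hypothetical typo for "Europe & Central"
region_names = ["World", "East Asia & Pacific", "Europe and Central"]
placeholders = ", ".join("?" for _ in region_names)

# Same split as the notebook: listed names go to `world`, all others to `country`
conn.execute(f"INSERT INTO world SELECT LOWER(country), coal FROM original_raw "
             f"WHERE country IN ({placeholders})", region_names)
conn.execute(f"INSERT INTO country SELECT LOWER(country), coal FROM original_raw "
             f"WHERE country NOT IN ({placeholders})", region_names)

# Sanity check: every listed name should match a raw row exactly
raw_names = {name for (name,) in conn.execute("SELECT country FROM original_raw")}
unmatched = [name for name in region_names if name not in raw_names]
country_names = [name for (name,) in conn.execute("SELECT country FROM country ORDER BY rowid")]

print("unmatched:", unmatched)
print("country table:", country_names)  # the mistyped region leaked into `country`
```

On the real data, an empty `unmatched` list confirms that all eight region names were found.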

**Checking the created tables**

In [11]:
%sql SELECT * FROM country LIMIT 3;

 * postgresql://postgres:***@localhost:65432/postgres
3 rows affected.


country,coal,gas,oil,hydro,renewable,nuclear
albania,0.0,0.0,0.0,100.0,0.0,0.0
algeria,0.0,97.8,1.8,0.4,0.0,0.0
angola,0.0,0.0,46.8,53.2,0.0,0.0


In [12]:
%sql SELECT * FROM world LIMIT 3;

 * postgresql://postgres:***@localhost:65432/postgres
3 rows affected.


region,coal,gas,oil,hydro,renewable,nuclear
world,40.7,21.6,4.1,16.2,6.0,10.6
east asia & pacific,60.6,13.5,2.2,15.0,4.2,3.8
europe & central,24.1,24.3,1.3,16.6,10.5,22.4


### Preparing and analysing the countries' energy sources for the CO2 emissions study

**World-level analysis**

In [13]:
%%sql
WITH filtered AS (
  SELECT *
  FROM world
  WHERE region = 'world'
)
SELECT 'coal' AS source, coal AS value FROM filtered
UNION ALL
SELECT 'gas' AS source, gas AS value FROM filtered
UNION ALL
SELECT 'oil' AS source, oil AS value FROM filtered
UNION ALL
SELECT 'hydro' AS source, hydro AS value FROM filtered
UNION ALL
SELECT 'renewable' AS source, renewable AS value FROM filtered
UNION ALL
SELECT 'nuclear' AS source, nuclear AS value FROM filtered;




 * postgresql://postgres:***@localhost:65432/postgres
6 rows affected.


source,value
coal,40.7
gas,21.6
oil,4.1
hydro,16.2
renewable,6.0
nuclear,10.6
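The `UNION ALL` query turns one wide row into one row per energy source (an unpivot). The same reshape in plain Python, applied to the `world` row shown earlier, is a one-liner. For whole tables, pandas offers `DataFrame.melt`, and PostgreSQL can do it in a single pass with `CROSS JOIN LATERAL (VALUES ...)`.

```python
# World row as returned by SELECT * FROM world WHERE region = 'world'
world_row = {"region": "world", "coal": 40.7, "gas": 21.6, "oil": 4.1,
             "hydro": 16.2, "renewable": 6.0, "nuclear": 10.6}

SOURCES = ("coal", "gas", "oil", "hydro", "renewable", "nuclear")

# Wide -> long: one (source, value) pair per energy column, in the UNION ALL order
long_rows = [(source, world_row[source]) for source in SOURCES]

for source, value in long_rows:
    print(f"{source},{value}")
```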


**Energy mix of the major regions: fossil fuels vs. other sources**

In [14]:
%%sql
SELECT
    region,
    CEIL(gas + oil + coal) AS fossil_energie,
    ROUND(hydro + renewable + nuclear) AS others
FROM world
WHERE region = 'world'
UNION ALL
-- most fossil-dependent region
SELECT *
FROM (
    SELECT
        region,
        (gas + oil + coal) AS fossil_energie,
        ROUND(hydro + renewable + nuclear) AS others
    FROM world
    WHERE region <> 'world'
    ORDER BY 2 DESC
    LIMIT 1
) AS most_fossil
UNION ALL
-- least fossil-dependent region
SELECT *
FROM (
    SELECT
        region,
        (gas + oil + coal) AS fossil_energie,
        ROUND(hydro + renewable + nuclear) AS others
    FROM world
    WHERE region <> 'world'
    ORDER BY 2 ASC
    LIMIT 1
) AS least_fossil;

 * postgresql://postgres:***@localhost:65432/postgres
3 rows affected.


region,fossil_energie,others
world,67.0,33
middle east & north afrika,96.3,3
latin america & caribbean,43.1,55
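The `67` in the first row comes from `CEIL`, which always rounds up, whereas the `others` column uses `ROUND`. Recomputing with Python's `decimal` module (used here as a stand-in for PostgreSQL's exact `NUMERIC` arithmetic) gives the unrounded world shares: 66.4 % fossil and 32.8 % other sources, 99.2 % in total. For these positive values, `ROUND_HALF_UP` matches PostgreSQL's `ROUND` on numeric, which rounds ties away from zero.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

# World row from the `world` table (percent of electricity generation)
coal, gas, oil = Decimal("40.7"), Decimal("21.6"), Decimal("4.1")
hydro, renewable, nuclear = Decimal("16.2"), Decimal("6.0"), Decimal("10.6")

fossil = gas + oil + coal               # exact sum
others = hydro + renewable + nuclear    # exact sum

fossil_ceil = fossil.to_integral_value(rounding=ROUND_CEILING)   # what CEIL(...) returns
fossil_round = fossil.to_integral_value(rounding=ROUND_HALF_UP)  # what ROUND(...) would return
others_round = others.to_integral_value(rounding=ROUND_HALF_UP)  # ROUND(...) as in the query

print(fossil, others, fossil + others)          # 66.4 32.8 99.2
print(fossil_ceil, fossil_round, others_round)  # 67 66 33
```

Using `ROUND` on both columns (or `ROUND(..., 1)` to keep one decimal) would make the two columns directly comparable.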


**Regions where coal accounts for more than a quarter of electricity generation**

In [15]:
%%sql
SELECT
    region,
    MAX(coal) AS max_coal
FROM world
WHERE region <> 'world'
  AND coal > 25
GROUP BY region
ORDER BY max_coal DESC

 * postgresql://postgres:***@localhost:65432/postgres
4 rows affected.


region,max_coal
south asia,65.7
east asia & pacific,60.6
sub­saharan africa,51.4
north america,35.7


**Use of so-called "clean" energy sources by region**

**Renewables**

In [16]:
%%sql
SELECT 
    DISTINCT region,
    MAX(renewable) AS max_renewable
FROM world
WHERE region NOT IN ('world')
group by 1
ORDER BY 2 DESC
LIMIT 1

 * postgresql://postgres:***@localhost:65432/postgres
1 rows affected.


region,max_renewable
europe & central,10.5


**Nuclear energy**

In [17]:
%%sql
SELECT 
    DISTINCT region,
    MAX(nuclear) AS max_nuclear
FROM world
WHERE region NOT IN ('world')
group by 1
ORDER by 2 DESC
LIMIT 1

 * postgresql://postgres:***@localhost:65432/postgres
1 rows affected.


region,max_nuclear
europe & central,22.4


**Hydroelectricity**

In [18]:
%%sql
SELECT
    region,
    MAX(hydro) AS max_hydro
FROM world
WHERE region <> 'world'
GROUP BY region
ORDER BY max_hydro DESC
LIMIT 1



 * postgresql://postgres:***@localhost:65432/postgres
1 rows affected.


region,max_hydro
latin america & caribbean,46.5
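Since the `world` table holds one row per region, `GROUP BY region` with `MAX(...)` does not actually aggregate anything: `ORDER BY <column> DESC LIMIT 1` returns the same row. The sketch below checks this with `sqlite3` on the three `world` rows displayed earlier only, so its leaders (all `europe & central`) differ from the full-table results. On ties, `LIMIT 1` keeps an arbitrary row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (region TEXT, coal REAL, gas REAL, oil REAL, "
             "hydro REAL, renewable REAL, nuclear REAL)")
# Only the three rows displayed earlier by SELECT * FROM world LIMIT 3
conn.executemany("INSERT INTO world VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("world", 40.7, 21.6, 4.1, 16.2, 6.0, 10.6),
    ("east asia & pacific", 60.6, 13.5, 2.2, 15.0, 4.2, 3.8),
    ("europe & central", 24.1, 24.3, 1.3, 16.6, 10.5, 22.4),
])

leaders = {}
for col in ("renewable", "nuclear", "hydro"):  # fixed column names, safe to interpolate
    grouped = conn.execute(
        f"SELECT region, MAX({col}) FROM world WHERE region <> 'world' "
        "GROUP BY region ORDER BY 2 DESC LIMIT 1"
    ).fetchone()
    plain = conn.execute(
        f"SELECT region, {col} FROM world WHERE region <> 'world' "
        f"ORDER BY {col} DESC LIMIT 1"
    ).fetchone()
    leaders[col] = (grouped, plain)

for col, (grouped, plain) in leaders.items():
    print(col, grouped, plain, grouped == plain)
```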


**Analysis of the energy mixes of the world's countries**

**Countries with at least 66 % clean energy**

In [19]:
%%sql
SELECT
    country,
    coal,
    gas,
    oil,
    nuclear,
    hydro,
    renewable,
    nuclear + hydro + renewable AS total_energies_propres
FROM country
WHERE nuclear + hydro + renewable >= 66
ORDER BY total_energies_propres DESC



 * postgresql://postgres:***@localhost:65432/postgres
34 rows affected.


country,coal,gas,oil,nuclear,hydro,renewable,total_energies_propres
paraguay,0.0,0.0,0.0,0.0,100.0,0.0,100.0
albania,0.0,0.0,0.0,0.0,100.0,0.0,100.0
nepal,0.0,0.0,0.0,0.0,99.8,0.2,100.0
iceland,0.0,0.0,0.0,0.0,71.0,28.9,99.9
"congo, dem. rep.",0.0,0.1,0.0,0.0,99.9,0.0,99.9
ethiopia,0.0,0.0,0.1,0.0,95.6,4.3,99.9
namibia,0.0,0.0,0.9,0.0,99.1,0.0,99.1
sweden,0.6,0.3,0.2,42.3,41.5,14.3,98.1
norway,0.1,1.8,0.0,0.0,96.0,1.7,97.7
switzerland,0.0,0.7,0.1,39.3,54.3,3.8,97.4


**Countries with at least 66 % fossil fuels**

In [20]:
%%sql
SELECT
    country,
    coal,
    gas,
    oil,
    nuclear,
    hydro,
    renewable,
    coal + gas + oil AS total_energies_fossiles
FROM country
WHERE coal + gas + oil >= 66
ORDER BY total_energies_fossiles DESC

 * postgresql://postgres:***@localhost:65432/postgres
69 rows affected.


country,coal,gas,oil,nuclear,hydro,renewable,total_energies_fossiles
"yemen, rep.",0.0,38.6,61.4,0.0,0.0,0.0,100.0
qatar,0.0,100.0,0.0,0.0,0.0,0.0,100.0
trinidad and tobago,0.0,99.8,0.2,0.0,0.0,0.0,100.0
saudi arabia,0.0,51.2,48.8,0.0,0.0,0.0,100.0
bahrain,0.0,100.0,0.0,0.0,0.0,0.0,100.0
kuwait,0.0,33.7,66.3,0.0,0.0,0.0,100.0
libya,0.0,53.7,46.3,0.0,0.0,0.0,100.0
oman,0.0,97.4,2.6,0.0,0.0,0.0,100.0
turkmenistan,0.0,100.0,0.0,0.0,0.0,0.0,100.0
botswana,95.8,0.0,4.2,0.0,0.0,0.0,100.0
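The two queries above sort countries into clean-dominated and fossil-dominated groups; countries where neither group reaches 66 % appear in neither result. The sketch below applies the same inclusive cut-off in Python to a few rows copied from this notebook's outputs. `classify` and its `"mixed"` label are hypothetical additions, not part of the original analysis.

```python
FOSSIL = ("coal", "gas", "oil")
CLEAN = ("hydro", "renewable", "nuclear")
THRESHOLD = 66  # same inclusive cut-off as the two queries (>= 66)

# Mixes copied from query outputs in this notebook (percent of electricity generation)
mixes = {
    "sweden":  {"coal": 0.6, "gas": 0.3, "oil": 0.2, "hydro": 41.5, "renewable": 14.3, "nuclear": 42.3},
    "qatar":   {"coal": 0.0, "gas": 100.0, "oil": 0.0, "hydro": 0.0, "renewable": 0.0, "nuclear": 0.0},
    "denmark": {"coal": 34.4, "gas": 6.5, "oil": 1.0, "hydro": 0.0, "renewable": 55.8, "nuclear": 0.0},
    "france":  {"coal": 2.2, "gas": 2.3, "oil": 0.3, "hydro": 11.3, "renewable": 5.1, "nuclear": 78.4},
}


def classify(mix):
    """Hypothetical helper: 'clean', 'fossil', or 'mixed' when neither group reaches the threshold."""
    if sum(mix[s] for s in CLEAN) >= THRESHOLD:
        return "clean"
    if sum(mix[s] for s in FOSSIL) >= THRESHOLD:
        return "fossil"
    return "mixed"


labels = {name: classify(mix) for name, mix in mixes.items()}
print(labels)
```

Since the six shares sum to at most 100 %, a country can never reach 66 % in both groups, so the order of the two checks does not matter.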


**Distribution of country energy mixes worldwide**

In [26]:
%%sql
SELECT *
FROM country
WHERE coal >= 66
ORDER BY coal DESC

 * postgresql://postgres:***@localhost:65432/postgres
13 rows affected.


country,coal,gas,oil,hydro,renewable,nuclear
kosovo,96.9,0.0,0.3,2.8,0.0,0.0
botswana,95.8,0.0,4.2,0.0,0.0,0.0
south africa,93.0,0.0,0.1,0.4,1.0,5.5
mongolia,92.3,0.0,4.5,0.0,3.2,0.0
estonia,87.4,0.6,0.3,0.2,10.9,0.0
poland,83.0,3.4,1.0,1.4,11.1,0.0
"hong kong sar, china",76.2,23.0,0.6,0.0,0.2,0.0
india,75.1,4.9,1.8,10.2,5.2,2.8
china,72.6,2.0,0.2,18.6,4.1,2.3
kazakhstan,71.9,19.2,1.0,7.9,0.0,0.0


In [25]:
%%sql
SELECT *
FROM country
WHERE renewable >= 33
ORDER BY renewable DESC

 * postgresql://postgres:***@localhost:65432/postgres
3 rows affected.


country,coal,gas,oil,hydro,renewable,nuclear
denmark,34.4,6.5,1.0,0.0,55.8,0.0
kenya,0.0,0.0,18.5,35.8,45.7,0.0
nicaragua,0.0,0.0,46.1,8.9,45.0,0.0


In [22]:
%%sql
SELECT *
FROM country
WHERE nuclear >= 50

 * postgresql://postgres:***@localhost:65432/postgres
3 rows affected.


country,coal,gas,oil,hydro,renewable,nuclear
france,2.2,2.3,0.3,11.3,5.1,78.4
hungary,20.8,14.4,0.2,1.0,9.7,53.3
slovak republic,12.4,6.0,1.1,15.5,7.4,57.1
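Each of the three threshold queries looks at a single source. A complementary view is each country's dominant source, that is, the column with the largest share. The sketch below computes it in Python on rows copied from the outputs above (a hypothetical helper, not part of the notebook). It shows, for example, that Nicaragua's largest source is still oil, just ahead of renewables.

```python
# Mixes copied from the query outputs above (percent of electricity generation)
mixes = {
    "kosovo":    {"coal": 96.9, "gas": 0.0, "oil": 0.3, "hydro": 2.8, "renewable": 0.0, "nuclear": 0.0},
    "denmark":   {"coal": 34.4, "gas": 6.5, "oil": 1.0, "hydro": 0.0, "renewable": 55.8, "nuclear": 0.0},
    "kenya":     {"coal": 0.0, "gas": 0.0, "oil": 18.5, "hydro": 35.8, "renewable": 45.7, "nuclear": 0.0},
    "nicaragua": {"coal": 0.0, "gas": 0.0, "oil": 46.1, "hydro": 8.9, "renewable": 45.0, "nuclear": 0.0},
    "france":    {"coal": 2.2, "gas": 2.3, "oil": 0.3, "hydro": 11.3, "renewable": 5.1, "nuclear": 78.4},
}

# Dominant source = column with the largest share (max keeps the first one on a tie)
dominant = {name: max(mix, key=mix.get) for name, mix in mixes.items()}
for name, source in dominant.items():
    print(f"{name}: {source} ({mixes[name][source]} %)")
```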


## Summary of findings on the global distribution of electricity sources

### Global overview

* Fossil fuels (oil, gas, coal) account for about two thirds of global electricity generation (66.4 %).
* So-called clean sources (nuclear, hydro, renewables) account for only about one third (32.8 %).
* The most fossil-dependent region, the Middle East & North Africa, reaches 96.3 % fossil fuels against only about 3 % clean energy.

### Regional analysis

* Hydroelectricity is the main clean source in most regions.
* Latin America & Caribbean is the region least dependent on fossil fuels (43.1 %), thanks to its large hydroelectric output (46.5 %).
* Three regions still generate more than 50 % of their electricity from coal: South Asia (65.7 %), East Asia & Pacific (60.6 %) and Sub-Saharan Africa (51.4 %).
* In North America, coal still accounts for 35.7 % of generation.
* Europe is the only region that has developed both:

  * nuclear (22.4 %),
  * renewables (10.5 %).
    → Together, however, these two sources cover only about one third of the European mix (32.9 %).

### Country analysis

* Mostly thanks to hydroelectricity, 34 countries get at least 66 % of their electricity from clean sources.
* At the other end, 69 countries get at least 66 % of their electricity from fossil fuels.
* 13 countries generate at least two thirds of their electricity from coal, including:

  * China: 73 %,
  * India: 75 %.
* Some countries with large renewable shares still depend on coal:

  * Example: Denmark, 56 % renewables but 34 % coal.
* A few notable exceptions:

  * France: 78 % of its electricity comes from nuclear power.
  * Kenya: about 81 % from hydro and renewables (35.8 % + 45.7 %).
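The combined shares quoted in this summary can be recomputed from the query outputs above. The sketch uses `Decimal` so that the sums are exact, as with PostgreSQL's `NUMERIC` type. The labels are illustrative.

```python
from decimal import Decimal as D

# Shares copied from the query outputs above (percent of electricity generation)
figures = {
    "europe & central: nuclear + renewable": D("22.4") + D("10.5"),
    "kenya: hydro + renewable": D("35.8") + D("45.7"),
}

for label, value in figures.items():
    print(f"{label}: {value} %")
```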
