# Exploration: Environment

Exploration of the environmental data files:
- **CatNat**: `catnat_gaspar.csv` (natural disaster decrees)
- **Risks**: `risq_gaspar.csv` (GASPAR risks by commune)

In [1]:
import pandas as pd
import os

DATA_DIR = os.path.join(os.path.dirname(os.getcwd()), '')  # parent directory = environnement/ (the trailing '' keeps the final separator)
print(f"Data directory: {DATA_DIR}")

Data directory: /Users/cedricsanchez/Documents/MSPR/MSPR_1/data/input/environnement/


## 1. Natural disasters (CatNat)

In [2]:
# NOTE: sep=',' does not match this file; the output below shows one column whose name contains ';' separators.
catnat = pd.read_csv(os.path.join(DATA_DIR, 'catnat_gaspar.csv'), sep=',', low_memory=False)
print(f"Shape: {catnat.shape}")
print(f"Columns: {list(catnat.columns)}")
catnat.head()

Shape: (260799, 1)
Columns: ['cod_nat_catnat;cod_commune;lib_commune;num_risque_jo;lib_risque_jo;dat_deb;dat_fin;dat_pub_arrete;dat_pub_jo;dat_maj']


Unnamed: 0,cod_nat_catnat;cod_commune;lib_commune;num_risque_jo;lib_risque_jo;dat_deb;dat_fin;dat_pub_arrete;dat_pub_jo;dat_maj
0,BUDD8750027A;06120;Saint-Étienne-de-Tinée;GLT;...
1,BUDD8750038A;2B050;Calvi;ICB;Inondations et/ou...
2,BUDD8750038A;30006;Aimargues;ICB;Inondations e...
3,BUDD8750038A;30020;Aubord;ICB;Inondations et/o...
4,BUDD8750038A;30033;Beauvoisin;ICB;Inondations ...
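The single column in the output above, whose name contains every field separated by `;`, suggests that the file is semicolon-delimited. Below is a minimal sketch of re-reading it under that assumption. Only the header is taken from the output above; the two data rows are made up for illustration. Reading `cod_commune` as text is also an assumption, meant to preserve codes such as `06120` and `2B050`.

```python
import io
import pandas as pd

# Header copied from the notebook output; the data rows are hypothetical.
sample = io.StringIO(
    "cod_nat_catnat;cod_commune;lib_commune;num_risque_jo;lib_risque_jo;"
    "dat_deb;dat_fin;dat_pub_arrete;dat_pub_jo;dat_maj\n"
    "XXXX0000001A;06120;Commune A;GLT;Risk label A;"
    "2000-01-01;2000-01-02;2000-02-01;2000-02-02;2000-03-01\n"
    "XXXX0000002A;2B050;Commune B;ICB;Risk label B;"
    "2000-01-01;2000-01-02;2000-02-01;2000-02-02;2000-03-01\n"
)

# sep=';' splits the fields; dtype keeps commune codes as text
# (leading zeros and Corsican codes like 2B050 would not survive a numeric cast).
catnat_fixed = pd.read_csv(sample, sep=';', dtype={'cod_commune': str})

print(catnat_fixed.shape)
print(catnat_fixed['cod_commune'].tolist())
```

On the real file, the same arguments would be passed to the existing call, for example `pd.read_csv(os.path.join(DATA_DIR, 'catnat_gaspar.csv'), sep=';', dtype={'cod_commune': str}, low_memory=False)`.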


In [3]:
catnat.info()
print()
catnat.describe(include='all')

<class 'pandas.DataFrame'>
RangeIndex: 260799 entries, 0 to 260798
Data columns (total 1 columns):
 #   Column                                                                                                                Non-Null Count   Dtype
---  ------                                                                                                                --------------   -----
 0   cod_nat_catnat;cod_commune;lib_commune;num_risque_jo;lib_risque_jo;dat_deb;dat_fin;dat_pub_arrete;dat_pub_jo;dat_maj  260799 non-null  str  
dtypes: str(1)
memory usage: 36.0 MB



Unnamed: 0,cod_nat_catnat;cod_commune;lib_commune;num_risque_jo;lib_risque_jo;dat_deb;dat_fin;dat_pub_arrete;dat_pub_jo;dat_maj
count,260799
unique,248398
top,INTE9600255A;13055;Marseille;MVT;Mouvement de ...
freq,6
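`describe` reports 260799 rows but only 248398 unique values, so 260799 - 248398 = 12401 rows repeat an earlier row exactly. Below is a minimal sketch of that check on a toy frame. The values are made up, and `raw_line` is a hypothetical column name standing in for the single unparsed column.

```python
import pandas as pd

# Toy stand-in for the single-column frame: 'a;1' appears three times.
toy = pd.DataFrame({'raw_line': ['a;1', 'b;2', 'a;1', 'a;1', 'c;3']})

n_rows = len(toy)
n_unique = toy['raw_line'].nunique()
# duplicated() flags every row that is identical to an earlier one
n_repeats = int(toy.duplicated().sum())

print(n_rows, n_unique, n_repeats)

# keep=False returns all occurrences of repeated rows, for inspection
repeated = toy[toy.duplicated(keep=False)]
print(repeated)
```

With no missing values in the column, the number of repeats equals the row count minus the unique count, which is the relationship used for the figure above.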


In [4]:
# Natural disaster types: value counts for text columns with at most 30 distinct values
for col in catnat.select_dtypes(include='object').columns:
    n_unique = catnat[col].nunique()
    if n_unique <= 30:
        print(f"{col} ({n_unique} valeurs) :")
        print(catnat[col].value_counts().head(15))
        print()

See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  for col in catnat.select_dtypes(include='object').columns:
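The message above is triggered by `select_dtypes(include='object')`, and the `str` dtype shown by `info()` indicates that text columns are not plain `object` columns here. One way to avoid depending on the dtype name is to test each column with `pandas.api.types`. The sketch below assumes the cell is meant to list low-cardinality text columns; `low_cardinality_counts` and the toy data are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def low_cardinality_counts(df, max_unique=30, top=15):
    """Return value counts for text columns with at most `max_unique` distinct values."""
    result = {}
    for col in df.columns:
        s = df[col]
        # Accept both legacy object columns and dedicated string dtypes
        if not (is_object_dtype(s) or is_string_dtype(s)):
            continue
        if s.nunique() <= max_unique:
            result[col] = s.value_counts().head(top)
    return result


toy = pd.DataFrame({
    'num_risque_jo': ['ICB', 'ICB', 'GLT'],  # toy text values
    'n': [1, 2, 3],                          # numeric column, skipped
})
counts = low_cardinality_counts(toy)
print(counts['num_risque_jo'])
```

On the single-column frame loaded above, either version prints nothing, because its only column has 248398 distinct values, well over the threshold of 30.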


## 2. GASPAR risks

In [5]:
# NOTE: sep=',' leads to the ParserError below; the file does not appear to be comma-delimited.
risques = pd.read_csv(os.path.join(DATA_DIR, 'risq_gaspar.csv'), sep=',', low_memory=False)
print(f"Shape: {risques.shape}")
print(f"Columns: {list(risques.columns)}")
risques.head()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 1858, saw 2
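The error says the parser expected 1 field and found 2 on line 1858. That is the typical symptom of a file that is not comma-delimited, with a stray comma inside one value. Below is a hedged sketch of picking the delimiter from the header line before calling `read_csv`. `guess_delimiter` is a hypothetical helper, the candidate list is an assumption, and the sample text is made up, since the actual columns of `risq_gaspar.csv` are not shown here.

```python
import io
import pandas as pd


def guess_delimiter(header_line, candidates=(';', ',', '\t', '|')):
    """Return the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


# Made-up sample: semicolon-delimited, with a comma inside one value
text = "code;label\n01;Flood\n02;Landslide, rockfall\n"

sep = guess_delimiter(text.splitlines()[0])
# dtype=str keeps codes such as '01' as text
df = pd.read_csv(io.StringIO(text), sep=sep, dtype=str)

print(repr(sep), df.shape)
```

With the real file, the header line would be read from disk first (for example with `open(path).readline()`); the file's encoding is not known from this notebook.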


In [None]:
risques.info()
print()
risques.describe(include='all')

## 3. Data quality

In [None]:
# Count missing values per DataFrame (requires both frames to have loaded successfully)
for nom, df in [('CatNat', catnat), ('Risks', risques)]:
    nulls = df.isnull().sum()
    total = nulls.sum()
    print(f"{nom}: {total} missing values out of {df.size}")
    if total > 0:
        # Show up to 10 columns with missing values and their share of rows
        for col, n in nulls[nulls > 0].head(10).items():
            print(f"  {col}: {n} ({100*n/len(df):.1f}%)")
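Since `risques` failed to load earlier, the loop as written would stop with a `NameError`. Below is a minimal sketch of the same check as a reusable function that returns a table instead of printing, so it can be applied to whichever frames loaded. `missing_report` is a hypothetical name and the toy frame is made up.

```python
import pandas as pd


def missing_report(df):
    """Per-column count and percentage of missing values, worst first."""
    n_missing = df.isna().sum()
    report = pd.DataFrame({
        'n_missing': n_missing,
        'pct_missing': (100 * n_missing / len(df)).round(1),
    })
    # Keep only columns that actually have missing values
    report = report[report['n_missing'] > 0]
    return report.sort_values('n_missing', ascending=False)


toy = pd.DataFrame({
    'a': [1, None, 3, None],
    'b': ['x', 'y', None, 'z'],
    'c': [1, 2, 3, 4],
})
report = missing_report(toy)
print(report)
```

A frame read with the wrong delimiter will show few or no missing values, because each line is a single non-empty string; the report is only meaningful once the columns are split correctly.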