![logo uvbf](https://drive.google.com/uc?export=download&id=1eP-0JTAV3p7a_mhAPHyVdFiB9pPckKr6)

### Project: **Data processing with Pandas [(see the project description here)](https://drive.google.com/uc?export=download&id=1aV0Ubxitme75fwMlKcR-cWazEJ--zysC)**

- **student**: Kabore Abdoul Fataoh <br>
- **email**: [abdoulfataoh@gmail.com](mailto:abdoulfataoh@gmail.com)
- **class**: Data Mining and Artificial Intelligence (Fouille de Données et Intelligence Artificielle)

<hr style="height: 2px">

<br>

### 0. Export the database that records information about Burkina Faso

### 1. Download and unzip the zip file (in a bash shell)

```sh
 wget https://download.geonames.org/export/dump/BF.zip
 unzip BF.zip
```
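
Where `wget` or `unzip` is not available, the same step can be sketched in Python with the standard library only. The helper names `download_and_extract` and `extract_zip` are hypothetical and not part of the original project.

```python
import urllib.request
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir="."):
    """Extract every member of the archive and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_and_extract(url, dest_dir="."):
    """Download a zip archive from `url` into `dest_dir`, then extract it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the last path component of the URL as the local file name (e.g. BF.zip).
    zip_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, zip_path)
    return extract_zip(zip_path, dest)


# Usage (requires network access):
# download_and_extract("https://download.geonames.org/export/dump/BF.zip")
```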

### 2. Description of the features in the ```BF.txt``` dataset

- ```geonameid```         : integer id of record in geonames database
- ```name```              : name of geographical point (utf8) varchar(200)
- ```asciiname```         : name of geographical point in plain ascii characters, varchar(200)
- ```alternatenames```    : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
- ```latitude```          : latitude in decimal degrees (wgs84)
- ```longitude```         : longitude in decimal degrees (wgs84)
- ```feature class```     : see http://www.geonames.org/export/codes.html, char(1)
- ```feature code```      : see http://www.geonames.org/export/codes.html, varchar(10)
- ```country code```      : ISO-3166 2-letter country code, 2 characters
- ```cc2```               : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
- ```admin1 code```       : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
- ```admin2 code```       : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) 
- ```admin3 code```       : code for third level administrative division, varchar(20)
- ```admin4 code```       : code for fourth level administrative division, varchar(20)
- ```population```        : bigint (8 byte int) 
- ```elevation```         : in meters, integer
- ```dem```               : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
- ```timezone```          : the iana timezone id (see file timeZone.txt) varchar(40)
- ```modification date``` : date of last modification in yyyy-MM-dd format

### 3. Loading the dataset
We use the **pandas** library to process the data in this project. [(how to install pandas)](https://pypi.org/project/pandas/)

In [1]:
import pandas as pd

In [2]:
features_names = [
    "geonameid",
    "name",
    "asciiname",
    "alternatenames",
    "latitude",
    "longitude",
    "feature class",
    "feature code",
    "country code",
    "cc2",
    "admin1 code",
    "admin2 code",
    "admin3 code",
    "admin4 code",
    "population",
    "elevation",
    "dem",
    "timezone",
    "modification date"
]

In [3]:
dataset = pd.read_csv("BF.txt", sep="\t", names=features_names, keep_default_na=False)

In [4]:
dataset.head()

| | geonameid | name | asciiname | alternatenames | latitude | longitude | feature class | feature code | country code | cc2 | admin1 code | admin2 code | admin3 code | admin4 code | population | elevation | dem | timezone | modification date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | Zyonguen | | 12.36667 | -0.45 | P | PPL | BF | | 4 | | | | 0 | | 293 | Africa/Ouagadougou | 2012-06-05 |
| 1 | 2353159 | Zyiliwèlè | Zyiliwele | | 12.38333 | -2.73333 | P | PPL | BF | | 6 | | | | 0 | | 277 | Africa/Ouagadougou | 2012-06-05 |
| 2 | 2353160 | Zyanko | Zyanko | | 12.78333 | -0.41667 | P | PPL | BF | | 5 | | | | 0 | | 301 | Africa/Ouagadougou | 2012-06-05 |
| 3 | 2353161 | Zouta | Zouta | | 13.14908 | -1.28197 | P | PPL | BF | | 5 | 70.0 | | | 0 | | 306 | Africa/Ouagadougou | 2010-07-31 |
| 4 | 2353162 | Zourtenga | Zourtenga | | 12.95741 | -1.28745 | P | PPL | BF | | 5 | | | | 0 | | 290 | Africa/Ouagadougou | 2018-09-05 |
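
The `keep_default_na=False` argument used when loading the file changes how pandas treats empty cells and strings such as `NA`. The sketch below illustrates the difference on a tiny made-up tab-separated sample (the rows and the three column names are assumptions for illustration, not data from `BF.txt`).

```python
import io

import pandas as pd

# Hypothetical sample: one name is literally "NA", one population cell is empty.
sample = "1\tNA\t100\n2\tZouta\t\n"
columns = ["id", "name", "population"]

# Default behaviour: "NA" and empty cells are parsed as missing values.
default = pd.read_csv(io.StringIO(sample), sep="\t", names=columns)

# With keep_default_na=False the text is kept exactly as written.
as_text = pd.read_csv(
    io.StringIO(sample), sep="\t", names=columns, keep_default_na=False
)

print(default["name"].isna().tolist())  # which names were treated as missing
print(as_text["name"].tolist())         # names kept as plain strings
print(as_text["population"].tolist())   # the empty cell stays an empty string
```

One trade-off of this choice is that a numeric column containing empty cells is read as text rather than as numbers, so it may need an explicit conversion before any arithmetic.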


### 3. Preprocessing operations and filters

#### - Create the new dataset with the columns ```identifiers```, ```place names```, ```latitudes```, ```longitudes```

In [5]:
select_columns = ["geonameid", "asciiname", "latitude", "longitude"]

# We chose the `asciiname` column instead of `name` in order to ignore
# accented characters, which would otherwise affect our results in question 4:
# the places Éléoui, Èrmos and Èrza each begin with an accented character.

In [6]:
# .copy() makes burkina_location an independent DataFrame rather than a slice of dataset
burkina_location = dataset[select_columns].copy()
burkina_location.head()

| | geonameid | asciiname | latitude | longitude |
|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | 12.36667 | -0.45 |
| 1 | 2353159 | Zyiliwele | 12.38333 | -2.73333 |
| 2 | 2353160 | Zyanko | 12.78333 | -0.41667 |
| 3 | 2353161 | Zouta | 13.14908 | -1.28197 |
| 4 | 2353162 | Zourtenga | 12.95741 | -1.28745 |


#### - Rename them with the following names: ```'ID'```, ```'location_name'```, ```'lat'```, ```'long'```

In [7]:
new_column_names = ["ID", "location_name", "lat", "long"]

In [8]:
burkina_location.columns = new_column_names
burkina_location.head()

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | 12.36667 | -0.45 |
| 1 | 2353159 | Zyiliwele | 12.38333 | -2.73333 |
| 2 | 2353160 | Zyanko | 12.78333 | -0.41667 |
| 3 | 2353161 | Zouta | 13.14908 | -1.28197 |
| 4 | 2353162 | Zourtenga | 12.95741 | -1.28745 |
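
Assigning a list to `.columns` relies on the column order. As an alternative sketch (not the approach used in this notebook), `DataFrame.rename` with an explicit old-to-new mapping makes each pairing visible and returns a new DataFrame. The sample below reuses the first two rows shown above, and the `column_map` name is introduced here for illustration.

```python
import pandas as pd

# Small sample built from the first rows of the dataset.
df = pd.DataFrame({
    "geonameid": [2353158, 2353159],
    "asciiname": ["Zyonguen", "Zyiliwele"],
    "latitude": [12.36667, 12.38333],
    "longitude": [-0.45, -2.73333],
})

# Explicit mapping from original column names to the requested names.
column_map = {
    "geonameid": "ID",
    "asciiname": "location_name",
    "latitude": "lat",
    "longitude": "long",
}

renamed = df.rename(columns=column_map)  # df itself is left unchanged
print(renamed.columns.tolist())
```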


#### - Save this data to a CSV file named ```burkina_location.csv```

In [9]:
burkina_location.to_csv("burkina_location.csv", encoding="utf-8")
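
By default `to_csv` also writes the row index as a first, unnamed column. The sketch below shows what that means when the file is read back, using an in-memory buffer instead of a file on disk; the `round_trip` helper is hypothetical and only there for the demonstration.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "ID": [2353158, 2353159],
    "location_name": ["Zyonguen", "Zyiliwele"],
    "lat": [12.36667, 12.38333],
    "long": [-0.45, -2.73333],
})


def round_trip(frame, **to_csv_kwargs):
    """Write `frame` to an in-memory CSV and read it back with read_csv."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(df)                  # default: the index is written too
without_index = round_trip(df, index=False)  # only the four data columns

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

If the saved index is not needed later, passing `index=False` keeps the CSV limited to the four data columns; otherwise `index_col=0` can be given to `read_csv` when reloading.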

 <br>

### 4. Operations on the CSV file burkina_location.csv.

##### - Extract the rows whose name contains ```'gounghin'``` and save them to the file ```gounghin.csv```

In [10]:
# note: case is ignored; na=False makes any missing name count as "no match"
mask = burkina_location["location_name"].str.contains(r"gounghin", case=False, regex=True, na=False)
gounghin = burkina_location[mask]

In [11]:
# Save
gounghin.to_csv("gounghin.csv", encoding="utf-8")
gounghin

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 147 | 2353306 | Gounghin | 12.06677 | -1.42134 |
| 7256 | 2360473 | Gounghin | 12.62488 | -1.36398 |
| 10227 | 2570204 | Gounghin | 12.31436 | -1.379 |
| 10688 | 10342749 | Gounghin | 12.06667 | -0.15 |
| 10701 | 10629032 | BICIAB // Gounghin | 12.35921 | -1.54273 |
| 10760 | 11257296 | Gounghin Department | 12.06671 | -0.15484 |
| 10787 | 11900526 | Gounghin Nord | 12.3612 | -1.55055 |
| 10788 | 11900528 | Zone Industrielle de Gounghin | 12.36631 | -1.54137 |
| 10794 | 11900619 | Gounghin Sud | 12.35298 | -1.54342 |
| 10808 | 11900680 | Gounghin | 12.35895 | -1.54442 |
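
The substring filter can be checked on a handful of names before running it on the full dataset. This is a minimal sketch: the names are taken from the output above, and a `None` entry is added (an assumption, to show what `na=False` does with a missing value).

```python
import pandas as pd

names = pd.Series([
    "Gounghin",
    "Gounghin Nord",
    "Zone Industrielle de Gounghin",
    "Zouta",
    None,  # missing name, added only for the demonstration
])

# case=False ignores letter case; na=False treats missing names as non-matching.
mask = names.str.contains("gounghin", case=False, na=False)

print(names[mask].tolist())
```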


##### - Extract the subset of the database (file burkina_location.csv) whose place names begin with a letter between 'A' and 'P' (alphabetical order).

In [12]:
mask = burkina_location["location_name"].str.contains(r"^[A-P]", regex=True, na=False)

In [13]:
burkina_location_A_P = burkina_location[mask].sort_values(by=["location_name"], ascending=True)
burkina_location_A_P

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 10590 | 6913771 | Abanda | 15.06808 | -0.59805 |
| 10013 | 2363251 | Abanga | 13.32429 | 0.31151 |
| 11035 | 11980339 | Abassi | 12.27728 | -1.13662 |
| 10551 | 6874881 | Abaye | 13.44080 | -3.90190 |
| 10012 | 2363250 | Aberekui | 12.50000 | -3.41667 |
| ... | ... | ... | ... | ... |
| 10357 | 2597270 | Province du Zondoma | 13.18333 | -2.36667 |
| 3107 | 2356291 | Pwedogo | 12.67657 | -1.86640 |
| 3106 | 2356290 | Pwiga | 14.42583 | -0.50691 |
| 3105 | 2356289 | Pyeongou | 12.11667 | 0.55000 |
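
The A-to-P selection depends on how the first character compares with `'A'` and `'P'`, which is why accented and lowercase initials matter. The sketch below compares the first letter directly with `Series.between` and checks it against the regular expression; the lowercase name `pala` is a made-up example, and the accented spelling of Éléoui is written with escape codes to avoid encoding surprises.

```python
import pandas as pd

names = pd.Series([
    "Abanda",
    "Pwiga",
    "Zouta",
    "\u00c9l\u00e9oui",  # "Éléoui": accented capital, sorts after "P"
    "Eleoui",            # ASCII form of the same name
    "pala",              # hypothetical lowercase name
])

# Compare the first character with the bounds (inclusive on both sides).
first_letter = names.str[0]
mask = first_letter.between("A", "P")

# The same selection expressed as a regular expression anchored at the start.
regex_mask = names.str.match(r"[A-P]")

print(names[mask].tolist())
```

Both forms select the same names here; the accented spelling is excluded while its ASCII form is kept, which is consistent with the choice of `asciiname` made earlier.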


##### - Identify the minimum latitude and the minimum longitude, respectively, and the corresponding place names in the file burkina_location.csv.

In [14]:
# minimum latitude
min_lat_mask = burkina_location["lat"] == burkina_location["lat"].min()
burkina_location[min_lat_mask]

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 11149 | 12224995 | Fadio-Mepehn | 9.4295 | -2.7775 |


In [15]:
# minimum longitude
min_long_mask = burkina_location["long"] == burkina_location["long"].min()
burkina_location[min_long_mask]

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 1255 | 2354426 | Tinobole | 10.75 | -5.48333 |
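
An alternative sketch for the same question uses `idxmin`, which returns the index label of the smallest value so the row can be fetched with `.loc`. The sample reuses three rows that appear in the outputs of this notebook.

```python
import pandas as pd

sample = pd.DataFrame({
    "location_name": ["Zyonguen", "Fadio-Mepehn", "Tinobole"],
    "lat": [12.36667, 9.4295, 10.75],
    "long": [-0.45, -2.7775, -5.48333],
})

# idxmin gives the index label of the minimum; .loc fetches that row.
southernmost = sample.loc[sample["lat"].idxmin()]
westernmost = sample.loc[sample["long"].idxmin()]

print(southernmost["location_name"], southernmost["lat"])
print(westernmost["location_name"], westernmost["long"])
```

Note that `idxmin` returns only the first occurrence when several rows share the minimum, whereas the boolean mask used above returns all of them.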


##### - Which places have coordinates satisfying (lat >= 11 and lon <= 0.5)?

In [16]:
mask = (burkina_location["lat"] >= 11) & (burkina_location["long"] <= 0.5)
burkina_location[mask]

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | 12.36667 | -0.45000 |
| 1 | 2353159 | Zyiliwele | 12.38333 | -2.73333 |
| 2 | 2353160 | Zyanko | 12.78333 | -0.41667 |
| 3 | 2353161 | Zouta | 13.14908 | -1.28197 |
| 4 | 2353162 | Zourtenga | 12.95741 | -1.28745 |
| ... | ... | ... | ... | ... |
| 11288 | 12358467 | Kate | 14.13461 | -0.81244 |
| 11290 | 12358654 | Koulhole | 13.43898 | -1.16817 |
| 11293 | 12358657 | Nagbingou | 13.55244 | -0.46760 |
| 11295 | 12358676 | Sella | 14.35699 | 0.28666 |
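
The two conditions are combined element-wise with `&`, and each comparison needs its own parentheses because `&` binds more tightly than `>=` and `<=`. As a sketch, the same filter can be wrapped in a small reusable function (the name `filter_by_bounds` is hypothetical) using the method forms `ge` and `le`, which avoid the parentheses issue. The sample rows come from outputs shown in this notebook.

```python
import pandas as pd


def filter_by_bounds(frame, min_lat, max_long):
    """Keep rows with lat >= min_lat and long <= max_long."""
    mask = frame["lat"].ge(min_lat) & frame["long"].le(max_long)
    return frame[mask]


sample = pd.DataFrame({
    "location_name": ["Zyonguen", "Fadio-Mepehn", "Pyeongou", "Sella"],
    "lat": [12.36667, 9.4295, 12.11667, 14.35699],
    "long": [-0.45, -2.7775, 0.55, 0.28666],
})

selected = filter_by_bounds(sample, min_lat=11, max_long=0.5)
print(selected["location_name"].tolist())
```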


### 5. Excel output

#### - Create an Excel file and name it: mini_projet

In [17]:
writer = pd.ExcelWriter("mini_projet.xlsx", engine="xlsxwriter")

#### - Create a sheet in this file named gounghin and save the data containing the name 'gounghin' obtained in 4.1

In [18]:
# the `encoding` argument of to_excel was removed in pandas 2.0, so it is not passed
gounghin.to_excel(writer, sheet_name="gounghin")

#### - Create a second sheet in the same file named A_to_P and save the data from 4.2

In [19]:
burkina_location_A_P.to_excel(writer, sheet_name="A_to_P")

In [20]:
#### Save the file to disk

In [21]:
# ExcelWriter.save() was removed in pandas 2.0; close() writes the file and releases it
writer.close()
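
As a more compact sketch of the whole Excel step, `pd.ExcelWriter` can be used as a context manager, which writes and closes the workbook automatically when the block ends. The helper name `write_sheets` is hypothetical, and the code assumes that an Excel engine such as `openpyxl` or `xlsxwriter` is installed.

```python
import pandas as pd


def write_sheets(path, sheets):
    """Write each DataFrame in `sheets` (sheet name -> DataFrame) to its own sheet."""
    # Leaving the `with` block saves the workbook and closes the file.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=sheet_name)


# Usage with the DataFrames built earlier in this notebook:
# write_sheets("mini_projet.xlsx", {
#     "gounghin": gounghin,
#     "A_to_P": burkina_location_A_P,
# })
```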