![logo uvbf](meta/logo.png)

### Project: **Data Processing with Pandas**

- **Student**: Kabore Abdoul Fataoh <br>
- **Email**: [abdoulfataoh@gmail.com](mailto:abdoulfataoh@gmail.com)
- **Class**: Data Mining and Artificial Intelligence

<hr style="height: 2px">

#### 1. Export the database that catalogs information on Burkina Faso

#### 0. Downloading and unzipping the zip file (in a bash shell)

```sh
 wget https://download.geonames.org/export/dump/BF.zip
 unzip BF.zip
```
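Where `wget` or `unzip` is not available, the same two steps can be sketched with Python's standard library. The helper names `download` and `extract` are illustrative choices and not part of the original notebook; the URL is the one used in the shell commands.

```python
import urllib.request
import zipfile
from pathlib import Path

GEONAMES_URL = "https://download.geonames.org/export/dump/BF.zip"


def download(url: str, dest: Path) -> Path:
    """Download url to dest and return dest (hypothetical helper)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, out_dir: Path) -> list[str]:
    """Extract every member of the zip archive into out_dir; return their names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


# Usage (requires network access):
#   archive = download(GEONAMES_URL, Path("BF.zip"))
#   extract(archive, Path("."))
```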

#### 1. Description of the features of the ```BF.txt``` dataset

- ```geonameid```         : integer id of record in geonames database
- ```name```              : name of geographical point (utf8) varchar(200)
- ```asciiname```         : name of geographical point in plain ascii characters, varchar(200)
- ```alternatenames```    : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
- ```latitude```          : latitude in decimal degrees (wgs84)
- ```longitude```         : longitude in decimal degrees (wgs84)
- ```feature class```     : see http://www.geonames.org/export/codes.html, char(1)
- ```feature code```      : see http://www.geonames.org/export/codes.html, varchar(10)
- ```country code```      : ISO-3166 2-letter country code, 2 characters
- ```cc2```               : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
- ```admin1 code```       : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
- ```admin2 code```       : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) 
- ```admin3 code```       : code for third level administrative division, varchar(20)
- ```admin4 code```       : code for fourth level administrative division, varchar(20)
- ```population```        : bigint (8 byte int) 
- ```elevation```         : in meters, integer
- ```dem```               : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
- ```timezone```          : the iana timezone id (see file timeZone.txt) varchar(40)
- ```modification date``` : date of last modification in yyyy-MM-dd format
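The descriptions above imply a type for each column. The following is a minimal sketch of how such types could be passed to `pd.read_csv`. The `dtypes` mapping and the reduced seven-column layout are illustrative, the two sample rows are taken from the preview shown in this notebook, and reading the admin codes as strings is an assumption intended to keep a code such as `70` from being displayed as `70.0`.

```python
import io

import pandas as pd

# Column types suggested by the GeoNames descriptions (illustrative mapping).
dtypes = {
    "geonameid": "int64",
    "name": "string",
    "latitude": "float64",
    "longitude": "float64",
    "admin1 code": "string",  # codes are identifiers, not quantities
    "admin2 code": "string",
    "population": "int64",
}

columns = list(dtypes)

# Two tab-separated rows in a reduced layout, standing in for BF.txt.
sample = (
    "2353161\tZouta\t13.14908\t-1.28197\t5\t70\t0\n"
    "2353162\tZourtenga\t12.95741\t-1.28745\t5\t\t0\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", names=columns, dtype=dtypes)
print(df.dtypes)
print(df)
```

With string dtypes, an empty admin code is read as a missing value rather than as an empty string, which differs from the `keep_default_na=False` behaviour used when loading the full file.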

#### 2. Loading the dataset

In [14]:
import pandas as pd

In [45]:
features_names = [
    "geonameid",
    "name",
    "asciiname",
    "alternatenames",
    "latitude",
    "longitude",
    "feature class",
    "feature code",
    "country code",
    "cc2",
    "admin1 code",
    "admin2 code",
    "admin3 code",
    "admin4 code",
    "population",
    "elevation",
    "dem",
    "timezone",
    "modification date"
]

In [41]:
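# BF.txt is tab-separated with no header row, so the column names are supplied explicitly.
# keep_default_na=False keeps empty fields as empty strings instead of NaN.
dataset = pd.read_csv("BF.txt", sep="\t", names=features_names, keep_default_na=False)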

In [44]:
dataset.head()

| | geonameid | name | asciiname | alternatenames | latitude | longitude | feature class | feature code | country code | cc2 | admin1 code | admin2 code | admin3 code | admin4 code | population | elevation | dem | timezone | modification date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | Zyonguen | | 12.36667 | -0.45 | P | PPL | BF | | 4 | | | | 0 | | 293 | Africa/Ouagadougou | 2012-06-05 |
| 1 | 2353159 | Zyiliwèlè | Zyiliwele | | 12.38333 | -2.73333 | P | PPL | BF | | 6 | | | | 0 | | 277 | Africa/Ouagadougou | 2012-06-05 |
| 2 | 2353160 | Zyanko | Zyanko | | 12.78333 | -0.41667 | P | PPL | BF | | 5 | | | | 0 | | 301 | Africa/Ouagadougou | 2012-06-05 |
| 3 | 2353161 | Zouta | Zouta | | 13.14908 | -1.28197 | P | PPL | BF | | 5 | 70.0 | | | 0 | | 306 | Africa/Ouagadougou | 2010-07-31 |
| 4 | 2353162 | Zourtenga | Zourtenga | | 12.95741 | -1.28745 | P | PPL | BF | | 5 | | | | 0 | | 290 | Africa/Ouagadougou | 2018-09-05 |


#### 3. Apply the necessary preprocessing and filtering to this file, keeping only the following columns:
- Identifiers, place names, latitudes, longitudes
- Rename them as follows: 'ID', 'location_name', 'lat', 'long'
- Save this data to a CSV file named burkina_location.csv

In [52]:
select_columns = ["geonameid", "name", "latitude", "longitude"]

In [54]:
burkina_location = dataset[select_columns]
burkina_location.head()

| | geonameid | name | latitude | longitude |
|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | 12.36667 | -0.45 |
| 1 | 2353159 | Zyiliwèlè | 12.38333 | -2.73333 |
| 2 | 2353160 | Zyanko | 12.78333 | -0.41667 |
| 3 | 2353161 | Zouta | 13.14908 | -1.28197 |
| 4 | 2353162 | Zourtenga | 12.95741 | -1.28745 |


In [55]:
new_column_names = ["ID", "location_name", "lat", "long"]

In [60]:
burkina_location.columns = new_column_names
burkina_location.head()

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 0 | 2353158 | Zyonguen | 12.36667 | -0.45 |
| 1 | 2353159 | Zyiliwèlè | 12.38333 | -2.73333 |
| 2 | 2353160 | Zyanko | 12.78333 | -0.41667 |
| 3 | 2353161 | Zouta | 13.14908 | -1.28197 |
| 4 | 2353162 | Zourtenga | 12.95741 | -1.28745 |


###### Saving ```burkina_location``` in CSV format

In [70]:
burkina_location.to_csv("burkina_location.csv", encoding="utf-8")
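As a compact alternative, the selection, renaming, and saving steps can be chained. This is a sketch, not the notebook's own method: it uses `rename` with an explicit mapping and passes `index=False` so that the row index is not written as an extra unnamed column. The sample rows are copied from the preview above, the extra `country code` column is only there to show that it gets dropped, and the file is written to a temporary directory for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Three rows copied from the dataset preview, plus one column to be dropped.
dataset = pd.DataFrame({
    "geonameid": [2353158, 2353159, 2353160],
    "name": ["Zyonguen", "Zyiliwèlè", "Zyanko"],
    "latitude": [12.36667, 12.38333, 12.78333],
    "longitude": [-0.45, -2.73333, -0.41667],
    "country code": ["BF", "BF", "BF"],
})

# Map each original column name to its new name; the keys double as the selection.
renames = {
    "geonameid": "ID",
    "name": "location_name",
    "latitude": "lat",
    "longitude": "long",
}

burkina_location = dataset[list(renames)].rename(columns=renames)

# Write without the index, then read the file back to check the columns.
out_path = Path(tempfile.mkdtemp()) / "burkina_location.csv"
burkina_location.to_csv(out_path, index=False, encoding="utf-8")
reloaded = pd.read_csv(out_path, encoding="utf-8")
print(list(reloaded.columns))
```

Using a mapping ties each new name to the column it replaces, so the result does not depend on the order of two separate lists.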

 <br>

#### 4. Operations on the burkina_location.csv file

##### - Extract the rows whose name contains 'gounghin' and save them to the file ```gounghin.csv```

In [113]:
# Note: case is ignored; na=False treats missing names as non-matches
mask = burkina_location["location_name"].str.contains(r"gounghin", case=False, regex=True, na=False)

In [114]:
# Save the result
gounghin = burkina_location[mask]
gounghin.to_csv("gounghin.csv", encoding="utf-8")
gounghin

| | ID | location_name | lat | long |
|---|---|---|---|---|
| 147 | 2353306 | Gounghin | 12.06677 | -1.42134 |
| 7256 | 2360473 | Gounghin | 12.62488 | -1.36398 |
| 10227 | 2570204 | Gounghin | 12.31436 | -1.379 |
| 10688 | 10342749 | Gounghin | 12.06667 | -0.15 |
| 10701 | 10629032 | BICIAB // Gounghin | 12.35921 | -1.54273 |
| 10760 | 11257296 | Gounghin Department | 12.06671 | -0.15484 |
| 10787 | 11900526 | Gounghin Nord | 12.3612 | -1.55055 |
| 10788 | 11900528 | Zone Industrielle de Gounghin | 12.36631 | -1.54137 |
| 10794 | 11900619 | Gounghin Sud | 12.35298 | -1.54342 |
| 10808 | 11900680 | Gounghin | 12.35895 | -1.54442 |
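
Since this step is framed as an operation on the CSV file, the filter can also be applied after reading `burkina_location.csv` back from disk. The sketch below uses an in-memory stand-in for that file (four rows copied from the outputs in this notebook, assumed to be saved without the index column) and contrasts a substring match with an exact, case-insensitive match. The variable names are illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for burkina_location.csv, saved without the index column.
csv_text = """ID,location_name,lat,long
2353306,Gounghin,12.06677,-1.42134
11900526,Gounghin Nord,12.3612,-1.55055
10629032,BICIAB // Gounghin,12.35921,-1.54273
2353161,Zouta,13.14908,-1.28197
"""

locations = pd.read_csv(io.StringIO(csv_text))
names = locations["location_name"]

# Substring match, ignoring case; na=False treats missing names as non-matches.
contains_mask = names.str.contains("gounghin", case=False, na=False)

# Exact match, ignoring case: keeps only rows named exactly "Gounghin".
exact_mask = names.str.lower() == "gounghin"

print(locations[contains_mask]["location_name"].tolist())
print(locations[exact_mask]["location_name"].tolist())
```

The substring match also returns entries such as "Gounghin Nord" and "BICIAB // Gounghin", which matches the task as stated; the exact match is only relevant if the place name alone is wanted.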
