# 19 Feb. 2016 NUMA - Python script | [Hackfrancophonie](https://www.etalab.gouv.fr/hackfrancophonie-un-open-data-camp-autour-des-donnees-ouvertes-par-les-pays-francophones) | Ecole au Mali
---


<img src="HackFrancophonie.png"></img>

## The *Ecole au Mali* team:

Thomas Roca (Agence Française de développement) & Romain Dorgueil ()<br>
Claire-Lise Dubost (), Patrick Zougbede (OECD-PARIS21), Guillaume Huet ()


## Data availability:
+ We use data provided by HackFrancophonie, available in the event's [GitHub folder](https://github.com/etalab/HackFrancophonie)
+ More specifically, we use the list of schools in Mali: [Liste des écoles au Mali](https://github.com/etalab/HackFrancophonie/wiki/Liste-des-%C3%A9coles-du-Mali) | [Download the data](https://raw.githubusercontent.com/opendatamali/datasets/master/ecole-mali/MLI_schools.csv)
+ Population of Mali (census) from the [Mali Data Atlas 2013](http://mali.opendataforafrica.org/bqrabjg/mali-data-atlas-26-april-2013), which provides information at the district (*Cercle*) level | [Download the data](https://raw.githubusercontent.com/ThomasRoca/data/master/PopulationDataMali.csv)
+ We also use further information from the [UN OCHA HDX platform](https://data.hdx.rwlabs.org/dataset/administrative-boundaries-cod-mli), which provides [shapefiles for the districts (Cercles) of Mali](http://data.hdx.rwlabs.org/dataset/d2ec62bb-5a93-436d-8297-88b3ee9b6818/resource/986d42a2-dfa1-4317-aaa8-a1cb276ee5bd/download/mli-admnbnda-adm2-gov.zip)
+ Tools used: Python 3.4 and [CartoDB](http://www.cartodb.com)


## Questions asked:
+ Can open data shed light on the geographic distribution of schools in Mali?
+ Are there inequalities in the distribution of schools across the territory?
+ Are there regions that appear to be short of teachers?
+ Is there a boy-girl inequality in access to education in Mali?

## Limitations and possible improvements
+ A large number of schools are not geolocated in the database we had access to.
+ *Crowd mappers* could continue the work done on the [Liste des écoles au Mali](https://github.com/etalab/HackFrancophonie/wiki/Liste-des-%C3%A9coles-du-Mali) data and help geolocate the schools that are not yet located.
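
A rough way to quantify the first limitation is to count the schools that have no coordinates. The sketch below assumes that the `X` and `Y` columns of the school file hold longitude and latitude and are empty for schools that are not geolocated (which is what the first rows displayed in section I.1 suggest). The four-row sample frame is made up for illustration and is not the real data.

```python
import pandas as pd

# Illustrative sample only; the real data comes from MLI_schools.csv
sample = pd.DataFrame({
    "Région": ["BAMAKO", "BAMAKO", "KAYES", "KAYES"],
    "X": [None, -7.980433, None, None],
    "Y": [None, 12.610233, None, None],
})

# A school counts as geolocated only when both coordinates are present
sample["geolocated"] = sample[["X", "Y"]].notna().all(axis=1)

# Share of schools without coordinates, per region
missing_share = 1 - sample.groupby("Région")["geolocated"].mean()
print(missing_share)
```

Run on the full dataset, the same two lines would indicate which regions would benefit most from a crowd-mapping effort.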

# I. Data manipulation
## I.1 School dataset

In [2]:
import pandas as pd
import numpy as np

#Loading dataset
data_school="https://raw.githubusercontent.com/opendatamali/datasets/master/ecole-mali/MLI_schools.csv"
dataset = pd.read_csv(data_school)

dataset['Cercle'] = dataset['Cercle'].str.lower().str.replace('-', ' ')
#print(dataset.columns)

dataset.head()

| | Région | AE | CAP | Cercle | Commune | NOM_ETABLISSEMENT | Localites | X | Y | CODE_ETABLISSEMENT | ... | STATUT | PRESENCE_RESTAURANT | PRESENCE_LATRINES | LATRINES_FILLES_SEPAREES | NOMBRE_LATRINES | EAU_POTABLE | GARCONS | FILLES | TOTAL | NBRE ENSEIGNANTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BAMAKO | BAMAKO RIVE GAUCHE | SEBENICORO | bamako | COMMUNE IV | ABDRAHAMANE DIALLO | | | | 11073468 | ... | Privé laïc | 0 | 1 | 0 | 4 | 1) robinet | 35 | 29 | 64 | 9 |
| 1 | BAMAKO | BAMAKO RIVE DROITE | FALADIE | bamako | COMMUNE VI | ATTAHAZIBIATOU | | | | 11074493 | ... | Medersa | 0 | 1 | 0 | 2 | 5) pas de point d'eau | 26 | 23 | 49 | 7 |
| 2 | BAMAKO | BAMAKO RIVE DROITE | KALABAN COURA | bamako | COMMUNE V | ECOLE PRIVEE LAROUSSE | | | | 11074981 | ... | Privé laïc | 0 | 1 | 0 | 2 | 1) robinet | 118 | 134 | 252 | 8 |
| 3 | BAMAKO | BAMAKO RIVE GAUCHE | SEBENICORO | bamako | COMMUNE IV | LA REFERENCE | | | | 11073466 | ... | Privé laïc | 0 | 1 | 1 | 2 | 5) pas de point d'eau | 19 | 21 | 40 | 4 |
| 4 | BAMAKO | BAMAKO RIVE DROITE | TOROKOROBOUGOU | bamako | COMMUNE V | A.E.CO.DA [1er C] | | -7.980433 | 12.610233 | 766677 | ... | Communautaire | 0 | 1 | 0 | 5 | 1) robinet | 494 | 432 | 926 | 9 |


# Cleaning the data
+ The population data we have access to is only available at the *Cercle* level.
+ Given the limited time available, we use only these variables:
    + **cercle** (name of the district)
    + **statut** (whether the school is private or public)
    + **garcons** (the number of boys attending the school)
    + **filles** (the number of girls attending the school)
    + **total** (total number of children attending the school = garcons + filles)
    + **nbre_enseignants** (the number of teachers in the school)

In [3]:
#Keep only the variable we need
dataset_school=dataset[['Cercle','GARCONS', 'FILLES', 'TOTAL', 'NBRE ENSEIGNANTS']]

#Aggregate at the Cercle level
dataset_school=dataset_school.groupby(['Cercle']).agg([np.sum])
dataset_school.to_csv('dataset_school.csv')
dataset_school.head()

| Cercle | GARCONS (sum) | FILLES (sum) | TOTAL (sum) | NBRE ENSEIGNANTS (sum) |
|---|---|---|---|---|
| abeibara | 128 | 55 | 183 | 8 |
| ansongo | 8944 | 7381 | 16325 | 483 |
| bafoulabe | 21883 | 14227 | 36110 | 903 |
| bamako | 236435 | 232342 | 468777 | 13950 |
| banamba | 22830 | 13530 | 36360 | 841 |
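
Passing a list to `agg` is what produces the extra `sum` header level in the output above. As a minimal sketch, the same aggregation can be written with a plain `sum()`, which keeps flat column names. The three-row sample is invented for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Invented school-level rows, for illustration only
sample = pd.DataFrame({
    "Cercle": ["abeibara", "abeibara", "ansongo"],
    "GARCONS": [100, 28, 8944],
    "FILLES": [40, 15, 7381],
})
sample["TOTAL"] = sample["GARCONS"] + sample["FILLES"]

# One row per cercle, one flat column per variable
per_cercle = sample.groupby("Cercle").sum()
print(per_cercle)
```

Either form gives the same totals; the flat version only makes the later column renaming a little easier to follow.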


## I.2. Population dataset

In [4]:
import pandas as pd
import numpy as np
#Load the data
data_pop="https://raw.githubusercontent.com/ThomasRoca/data/master/PopulationDataMali.csv"
dataset_pop = pd.read_csv(data_pop)
dataset_pop.head()

| | Location | District | Cercle | Location frenchname | Ménages | Population résidente | Population résidente2 (Hommes) | Population résidente (Femelles) |
|---|---|---|---|---|---|---|---|---|
| 0 | ML-1-KC | Kayes | Kayes Cercle | Cercle De Kayes | 80763 | 513172 | 254777 | 258395 |
| 1 | ML-1-BA | Kayes | Bafoulabé | Cercle De Bafoulabe | 35266 | 233647 | 119040 | 114607 |
| 2 | ML-1-KI | Kayes | Kita | Cercle De Kita | 62129 | 432531 | 220318 | 212213 |
| 3 | ML-1-DI | Kayes | Diéma | Cercle De Diema | 32950 | 211772 | 109282 | 102490 |
| 4 | ML-1-KE | Kayes | Keniéba | Cercle De Kenieba | 33295 | 197050 | 98757 | 98293 |


## Clean the data
+ Location names: to match the different files we need clean, lower-case *Cercle* names
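
The name cleaning can also be sketched as a small stand-alone function. `normalize_cercle_name` is a hypothetical helper, not part of the notebook. Unlike the cell below, it strips the prefixes only at the start of the name, and it also removes accents. That last step is an addition: the notebook relies on the `Location frenchname` column, which is already unaccented, whereas the `Cercle` column of the same table carries accents (for example `Bafoulabé`).

```python
import unicodedata

def normalize_cercle_name(name):
    """Hypothetical helper: lower-case, drop prefixes, hyphens and accents."""
    text = name.lower()
    for prefix in ("cercle de ", "district de "):
        if text.startswith(prefix):
            text = text[len(prefix):]
    text = text.replace("-", " ")
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip()

print(normalize_cercle_name("Cercle De Bafoulabe"))
print(normalize_cercle_name("Bafoulabé"))
```

Applying one function to the name columns of both files would keep the two cleaning steps from drifting apart.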

In [5]:
#Keep the variables we are interested in (.copy() avoids a SettingWithCopyWarning on the assignment below)
dataset_pop=dataset_pop[['Location frenchname','Ménages', 'Population résidente', 'Population résidente2 (Hommes)', \
                         'Population résidente (Femelles)']].copy()

#Lower case, remove 'cercle de ' and 'district de ' from district names, replace hyphens with spaces
dataset_pop['Location frenchname'] = dataset_pop['Location frenchname'].str.lower().str.replace('cercle de ', '') \
                                    .str.replace('district de ', '').str.replace('-', ' ')

#Set index
dataset_pop=dataset_pop.set_index(['Location frenchname']).dropna()
dataset_pop.sort_index(inplace=True)
dataset_pop.head()

| Location frenchname | Ménages | Population résidente | Population résidente2 (Hommes) | Population résidente (Femelles) |
|---|---|---|---|---|
| abeibara | 1796 | 10296 | 4808 | 5488 |
| ansongo | 21966 | 131953 | 65745 | 66208 |
| bafoulabe | 35266 | 233647 | 119040 | 114607 |
| bamako | 286381 | 1810366 | 902723 | 907643 |
| banamba | 28278 | 191005 | 95901 | 95104 |


# Merging dataset
+ Here we need to merge the census dataset and the school dataset
+ Then we rename the index and the column names

In [6]:
#Merge population and school datasets (rows are aligned on the cercle name used as index)
dataset_final = pd.concat([dataset_pop, dataset_school], axis=1)

#rename columns (NB: population_g and population_fi hold the GARCONS and FILLES pupil counts)
dataset_final.columns = ['menage', 'population', 'population_h', 'population_fem', 'population_g', \
                         'population_fi', 'nb_eleves', 'nb_enseignants']

#rename index
dataset_final.index.names = ['cercle']
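
Because the two tables are aligned on the cercle name, a cercle spelled differently in the two files ends up as two half-empty rows rather than one complete row. The sketch below shows one way to list such mismatches with `DataFrame.merge(..., indicator=True)`. The two small frames and the `kayes` / `kayes cercle` mismatch are invented for illustration.

```python
import pandas as pd

# Invented excerpts, indexed by cleaned cercle name
pop = pd.DataFrame({"population": [10296, 131953, 513172]},
                   index=["abeibara", "ansongo", "kayes"])
school = pd.DataFrame({"nb_eleves": [183, 16325, 500]},
                      index=["abeibara", "ansongo", "kayes cercle"])

# The _merge column says whether a name was found in both tables or only one
check = pop.merge(school, how="outer", left_index=True, right_index=True,
                  indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

An empty `unmatched` frame would mean every cercle was found in both sources.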

# Doing some computation
+ First we need to aggregate at the district level (Cercle) so that we can match our data with the census data (solely available at the Cercle level)
+ Then we will compute averages and ratios.

In [7]:
#Sex ratios: men/women in the population, boys/girls among pupils
dataset_final['ratio_h_fem']=dataset_final['population_h']/dataset_final['population_fem']
dataset_final['ratio_g_fi']=dataset_final['population_g']/dataset_final['population_fi']

#Pupils per teacher and inhabitants per teacher
dataset_final['eleves_par_ens']=dataset_final['nb_eleves']/dataset_final['nb_enseignants']
dataset_final['pop_par_ens']=dataset_final['population']/dataset_final['nb_enseignants']

#Compute the deviation from the average number of pupils per teacher (unweighted mean across cercles)
dataset_final['sous_effct_ens']=dataset_final['eleves_par_ens'] - dataset_final['eleves_par_ens'].mean()

#Set a binary variable: 0 if sous_effct_ens < 0 (fewer pupils per teacher than average), otherwise 1
dataset_final['binary_sous_effct']=np.where(dataset_final['sous_effct_ens']<0, 0 , 1)
dataset_final.to_csv('dataset_final.csv')
dataset_final.head()

| cercle | menage | population | population_h | population_fem | population_g | population_fi | nb_eleves | nb_enseignants | ratio_h_fem | ratio_g_fi | eleves_par_ens | pop_par_ens | sous_effct_ens | binary_sous_effct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| abeibara | 1796 | 10296 | 4808 | 5488 | 128 | 55 | 183 | 8 | 0.876093 | 2.327273 | 22.875 | 1287.0 | -16.698787 | 0 |
| ansongo | 21966 | 131953 | 65745 | 66208 | 8944 | 7381 | 16325 | 483 | 0.993007 | 1.21176 | 33.799172 | 273.194617 | -5.774615 | 0 |
| bafoulabe | 35266 | 233647 | 119040 | 114607 | 21883 | 14227 | 36110 | 903 | 1.03868 | 1.538132 | 39.988926 | 258.745293 | 0.415139 | 1 |
| bamako | 286381 | 1810366 | 902723 | 907643 | 236435 | 232342 | 468777 | 13950 | 0.994579 | 1.017616 | 33.604086 | 129.775341 | -5.969701 | 0 |
| banamba | 28278 | 191005 | 95901 | 95104 | 22830 | 13530 | 36360 | 841 | 1.00838 | 1.687361 | 43.234245 | 227.116528 | 3.660458 | 1 |
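
As a check on the arithmetic, the abeibara row can be recomputed by hand. The sketch uses only values copied from the output above. It also recovers the reference average implied by `sous_effct_ens`, which is the unweighted mean of the cercle-level ratios rather than a national pupils-per-teacher figure.

```python
# Values copied from the abeibara row of the output above
nb_eleves, nb_enseignants = 183, 8
sous_effct_ens = -16.698787

# Pupils per teacher in this cercle
eleves_par_ens = nb_eleves / nb_enseignants

# sous_effct_ens = eleves_par_ens - mean, hence mean = eleves_par_ens - sous_effct_ens
mean_eleves_par_ens = eleves_par_ens - sous_effct_ens

# Same rule as np.where(sous_effct_ens < 0, 0, 1)
binary_sous_effct = 0 if sous_effct_ens < 0 else 1

print(eleves_par_ens, round(mean_eleves_par_ens, 6), binary_sous_effct)
```

The implied average is about 39.57 pupils per teacher, and the other four rows shown give the same figure.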


In [25]:
from IPython.display import HTML
from string import Template
import json

#Get the data series to display
dataset_viz=dataset_final[['eleves_par_ens','pop_par_ens']]

#Top 5 cercles by inhabitants per teacher
#(DataFrame.sort() no longer exists in current pandas, so sort_values() is used)
top5=dataset_viz.sort_values('pop_par_ens', ascending=False).head(5)

#Bottom 5 cercles: sorted ascending, then reversed so the whole chart reads from highest to lowest
bottom5=dataset_viz.sort_values('pop_par_ens').head(5).iloc[::-1]

#Concatenate top 5 and bottom 5, then extract names and both series in the same order
selection=pd.concat([top5, bottom5])
P_P_cercle=selection.index.tolist()
P_P_list=selection['pop_par_ens'].tolist()
#Value of 'eleves_par_ens' for the cercles we display
E_E_list=selection['eleves_par_ens'].tolist()

#print(P_P_list,E_E_list,P_P_cercle)

#Serialize as JSON so the values are written into the page as valid JavaScript literals
Input = {'P_P_list':json.dumps(P_P_list),'E_E_list':json.dumps(E_E_list),'P_P_cercle':json.dumps(P_P_cercle) }

html='''
<!DOCTYPE html>
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<script type="text/javascript" src="//code.jquery.com/jquery-1.9.1.js"></script>
<title></title>
<script type='text/javascript'>
$(function () {
    $('#container').highcharts({
        chart: {type: 'column'},
        title: {text: "Enseignants, habitants et eleves dans dix regions du Mali"},
        subtitle: { text: 'Source: opendata.ml, opendataforafrica'},
        xAxis: {
            categories: $P_P_cercle,
            crosshair: true },
        yAxis: { min: 0, title: {text: 'Effectifs'}},
        tooltip: {
           valueDecimals: 0,
            headerFormat: '<span style="font-size:10px">{point.key}</span><table>',
            pointFormat: '<tr><td style="color:{series.color};padding:0">{series.name}: </td>' +
                '<td style="padding:0"><b>{point.y} </b></td></tr>',
            footerFormat: '</table>', shared: true, useHTML: true },
        plotOptions: { column: {pointPadding: 0.2, borderWidth: 0 }},
       
           series: [{
            name: "Nombre d'habitants pour un enseignant",
            data: $P_P_list,
                color:'rgb(255, 123, 0)'
        }, {
            name: "Nombre d'eleves pour un enseignant",
            data: $E_E_list,
                visible:false,
                color:'rgb(244, 186, 48)'
        }] });});
</script>
</head><body>
<script src="https://code.highcharts.com/highcharts.js"></script>
<script src="https://code.highcharts.com/modules/exporting.js"></script>
<div id="container" style="min-width: 100%; height: 500px; margin: 0 auto"></div>
 </body>
</html>
'''

#Fill in the template and write the chart page to disk
filename="Bar_chart.html"
content=Template(html).safe_substitute(Input)
with open(filename,'w',encoding='utf-8') as f:
    f.write(content)
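
The page template mixes jQuery's `$(...)` calls with `$name` placeholders. This works because `Template.safe_substitute` leaves any `$` that does not start a known placeholder untouched. The sketch below shows this on a one-line template. Serializing the values with `json.dumps` is a choice made here so that they are written as valid JavaScript literals, and the two-cercle sample values are taken from the table in the previous section.

```python
import json
from string import Template

# One-line stand-in for the full HTML page
snippet = Template(
    "$('#container').highcharts({categories: $P_P_cercle, data: $P_P_list});"
)

values = {
    "P_P_cercle": json.dumps(["abeibara", "bamako"]),
    "P_P_list": json.dumps([1287.0, 129.775341]),
}

# "$(" is not a valid placeholder, so safe_substitute copies it through unchanged
rendered = snippet.safe_substitute(values)
print(rendered)
```

With `substitute` instead of `safe_substitute`, the same template would raise a `ValueError` on the first `$(`.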

---
### Slideshow starts here...
---


<span style="color:#dda325; font-size:45px; line-height:1.25; font-weight:bold;" markdown="1">Ecole au Mali </span><span style="color:#ffb600; font-size:45px; line-height:1.5; font-weight:bold;" markdown="1" >where to invest?</span><br>
<br>

<img src="http://www.stats4dev.com/prez/EcoleMali2.jpg">


<span style="font-size:24px; line-height:1.5; color:#ffb600; font-weight:bold" markdown="1" > &#9733; Claire-Lise, Guillaume, Patrick, Romain, Thomas &#9733;
</span><br>
<span style="font-size:16px; line-height:1.5; color:#14b53a; font-weight:bold" markdown="1" >
February 2016 version - #HackFrancophonie </span>

<img src="http://www.stats4dev.com/prez/logoX.png">

# Research question

How can Mali's open data be used to identify the geographic areas where 
building schools could address inequalities in access to education?

## Context:

+ **47.5%** of Mali's population is under 14 years old

+ **83.5%** of children are enrolled in primary school (World Bank)

+ Only **10%** of the population lives in the three northern regions, which make up two thirds of the national territory. The low population density in these regions raises specific problems of access to services, including education



# First representation
+ To get a first feel for the distribution, we decided to plot the top 5 and bottom 5 *Cercles* by number of inhabitants per teacher, together with their number of pupils per teacher

*Warning: the data we were given access to may not be exhaustive*
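
The selection behind the chart can be sketched with `Series.nlargest` and `Series.nsmallest`. The five values are the `pop_par_ens` figures of the five cercles displayed in the computation section. These are simply the first rows alphabetically, not the actual extremes of the full dataset, so the sketch takes the top 2 and bottom 2 of this sample only.

```python
import pandas as pd

# Inhabitants per teacher for the five cercles shown earlier (sample, not the full dataset)
pop_par_ens = pd.Series(
    {"abeibara": 1287.0, "ansongo": 273.194617, "bafoulabe": 258.745293,
     "bamako": 129.775341, "banamba": 227.116528},
    name="pop_par_ens",
)

# Cercles with the most and the fewest inhabitants per teacher
top2 = pop_par_ens.nlargest(2)
bottom2 = pop_par_ens.nsmallest(2)
print(top2.index.tolist(), bottom2.index.tolist())
```

A high value means few teachers relative to the population, so the first group is where a shortage is most likely.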



In [36]:
HTML('''<iframe src="Bar_chart.html" scrolling="no"  frameborder="0" width="100%" height="550px"></iframe>''')

## Mali: current situation
+ Geographic distribution of schools
+ Population by cercle
+ Villages

In [37]:
from IPython.display import HTML
HTML('''<iframe src="https://rdorgueil.cartodb.com/viz/038df9b0-d706-11e5-96ef-0e98b61680bf/embed_map" scrolling="no"  frameborder="0" width="100%" height="575px"></iframe>''')

# Teachers per inhabitant

In [38]:
from IPython.display import HTML
HTML('''<iframe src="https://rdorgueil.cartodb.com/viz/5374969a-d720-11e5-8ee0-0ea31932ec1d/embed_map" scrolling="no"  frameborder="0"  width="100%" height="575px"></iframe>''')

## Tentative policy recommendation: a better allocation of resources?

+ Teacher shortfall per school

In [39]:
from IPython.display import HTML
HTML('''<iframe src="https://rdorgueil.cartodb.com/viz/975cfd7a-d71b-11e5-be79-0e31c9be1b51/embed_map" scrolling="no" frameborder="0" width="100%" height="575px"></iframe>''')

# Towards finer granularity?

In [40]:
from IPython.display import HTML
HTML('''<iframe src="https://rdorgueil.cartodb.com/viz/907f986e-d720-11e5-8a5a-0ea31932ec1d/embed_map" scrolling="no" frameborder="0"  width="100%" height="575px"></iframe>''')