# Public health research with Python

**FAO data: file exploration and data cleaning**

In [235]:
import pandas as pd

In [236]:
import numpy as np

*a. undernutrition.csv: exploration and data cleaning*

- Loading file:

In [275]:
undernutrition = pd.read_csv('undernutrition.csv', sep=';')

- Display the first 5 rows:

In [276]:
undernutrition.head()

| | Zone | Année | Valeur (en million d'hab) |
|---|---|---|---|
| 0 | Afghanistan | 2012-2014 | 8.6 |
| 1 | Afghanistan | 2013-2015 | 8.8 |
| 2 | Afghanistan | 2014-2016 | 8.9 |
| 3 | Afghanistan | 2015-2017 | 9.7 |
| 4 | Afghanistan | 2016-2018 | 10.5 |


- Number of rows and columns:

In [277]:
undernutrition.shape

(1218, 3)

- Display the column dtypes:

In [278]:
undernutrition.dtypes

Zone                         object
Année                        object
Valeur (en million d'hab)    object
dtype: object

- Display the unique values of the "Valeur" column:

In [280]:
undernutrition["Valeur (en million d'hab)"].unique()

array(['8.6', '8.8', '8.9', '9.7', '10.5', '11.1', '2.2', '2.5', '2.8',
       '3', '3.1', '3.3', '0.1', '1.3', '1.2', nan, '7.6', '6.2', '5.3',
       '5.6', '5.8', '5.7', '1.5', '1.6', '1.1', '1.7', '<0.1', '21.7',
       '22.4', '23.3', '22.3', '21.5', '20.9', '0.8', '2', '1.9', '1.8',
       '0.4', '0.5', '0.3', '0.2', '3.2', '3.4', '3.6', '3.8', '2.1',
       '2.3', '2.4', '0.6', '0.7', '0.9', '3.9', '2.7', '1.4', '4.8',
       '4.6', '4.9', '5', '4.4', '4.3', '4.2', '4.5', '26.2', '24.3',
       '21.3', '21.1', '2.9', '5.1', '5.2', '5.4', '203.8', '198.3',
       '193.1', '190.9', '190.1', '189.2', '23.6', '24', '24.1', '3.7',
       '7.3', '7.8', '8.4', '9', '9.1', '10.1', '10', '10.7', '11.5',
       '11.9', '11.8', '8.7', '10.3', '11', '1', '5.5', '6.8', '7.9',
       '5.9', '7', '9.2', '9.4', '9.6', '6.7', '7.1', '7.2', '14.7',
       '17.4', '20.2', '22.2', '22.8', '24.6', '31.1', '28.5', '25.4',
       '24.8', '26.1', '14.5', '15.4', '16.5', '15.8', '15.7', '10.8',
       '

- Convert the "Valeur" column to numeric (unparseable entries become NaN):

In [282]:
undernutrition["Valeur (en million d'hab)"] = pd.to_numeric(undernutrition["Valeur (en million d'hab)"],errors = 'coerce')
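
Note that `errors='coerce'` turns every non-numeric string into NaN, including the `'<0.1'` marker visible in the unique values above. The following is a minimal sketch, on a small made-up Series rather than the real file, of how one might check which entries get coerced before deciding how to treat them. The substitute value `"0.05"` is purely an illustrative assumption and is not used anywhere in this notebook.

```python
import pandas as pd

# Toy sample mimicking the raw column (values are illustrative only)
raw = pd.Series(["8.6", "<0.1", None, "3"])

converted = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but became NaN after coercion
coerced_mask = raw.notna() & converted.isna()
coerced_values = raw[coerced_mask].tolist()

# Hypothetical alternative: substitute the "<0.1" marker before converting
placeholder = "0.05"  # assumption for illustration only
explicit = pd.to_numeric(raw.replace("<0.1", placeholder), errors="coerce")
```

Listing `coerced_values` first makes it visible that censored values such as `'<0.1'` and genuinely missing entries would otherwise end up in the same NaN bucket.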

- Display the column dtypes after conversion:

In [283]:
undernutrition.dtypes

Zone                          object
Année                         object
Valeur (en million d'hab)    float64
dtype: object

- Display the unique values after conversion:

In [284]:
undernutrition["Valeur (en million d'hab)"].unique()

array([8.600e+00, 8.800e+00, 8.900e+00, 9.700e+00, 1.050e+01, 1.110e+01,
       2.200e+00, 2.500e+00, 2.800e+00, 3.000e+00, 3.100e+00, 3.300e+00,
       1.000e-01, 1.300e+00, 1.200e+00,       nan, 7.600e+00, 6.200e+00,
       5.300e+00, 5.600e+00, 5.800e+00, 5.700e+00, 1.500e+00, 1.600e+00,
       1.100e+00, 1.700e+00, 2.170e+01, 2.240e+01, 2.330e+01, 2.230e+01,
       2.150e+01, 2.090e+01, 8.000e-01, 2.000e+00, 1.900e+00, 1.800e+00,
       4.000e-01, 5.000e-01, 3.000e-01, 2.000e-01, 3.200e+00, 3.400e+00,
       3.600e+00, 3.800e+00, 2.100e+00, 2.300e+00, 2.400e+00, 6.000e-01,
       7.000e-01, 9.000e-01, 3.900e+00, 2.700e+00, 1.400e+00, 4.800e+00,
       4.600e+00, 4.900e+00, 5.000e+00, 4.400e+00, 4.300e+00, 4.200e+00,
       4.500e+00, 2.620e+01, 2.430e+01, 2.130e+01, 2.110e+01, 2.900e+00,
       5.100e+00, 5.200e+00, 5.400e+00, 2.038e+02, 1.983e+02, 1.931e+02,
       1.909e+02, 1.901e+02, 1.892e+02, 2.360e+01, 2.400e+01, 2.410e+01,
       3.700e+00, 7.300e+00, 7.800e+00, 8.400e+00, 

- Number of unique values:

In [285]:
len(undernutrition["Valeur (en million d'hab)"].unique())

139
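
`len(...unique())` counts NaN as one of the distinct values. pandas also offers `Series.nunique`, which excludes NaN by default. A short sketch on toy data (not the real column) showing the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series([8.6, 8.8, np.nan, 8.6, np.nan])

with_nan = len(s.unique())             # NaN counted as one distinct value
without_nan = s.nunique()              # NaN excluded (dropna=True by default)
same_as_len = s.nunique(dropna=False)  # matches len(s.unique())
```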

- Display the "Valeur" column:

In [286]:
undernutrition["Valeur (en million d'hab)"]

0        8.6
1        8.8
2        8.9
3        9.7
4       10.5
        ... 
1213     NaN
1214     NaN
1215     NaN
1216     NaN
1217     NaN
Name: Valeur (en million d'hab), Length: 1218, dtype: float64

- Replace NaN values with 0:

In [287]:
undernutrition.fillna(0, inplace=True)
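
Calling `fillna` on the whole DataFrame fills NaN in every column, and once filled, a zero can no longer be told apart from a value that was missing. As a hedged sketch on toy data, one option is to fill only the numeric column and keep a flag for the rows that were missing. The column name `valeur_manquante` is a hypothetical helper, not part of the original data.

```python
import numpy as np
import pandas as pd

col = "Valeur (en million d'hab)"
df = pd.DataFrame({
    "Zone": ["A", "A", "B"],
    "Année": ["2013", "2014", "2013"],
    col: [8.6, np.nan, np.nan],
})

# Record which values were missing before filling (hypothetical helper column)
df["valeur_manquante"] = df[col].isna()

# Fill only the numeric column, leaving the other columns untouched
df[col] = df[col].fillna(0)
```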

- Display the "Valeur" column after filling:

In [288]:
undernutrition["Valeur (en million d'hab)"]

0        8.6
1        8.8
2        8.9
3        9.7
4       10.5
        ... 
1213     0.0
1214     0.0
1215     0.0
1216     0.0
1217     0.0
Name: Valeur (en million d'hab), Length: 1218, dtype: float64

- Replace each three-year range in the "Année" column with its middle year:

In [289]:
undernutrition = undernutrition.replace("2012-2014","2013")

In [290]:
undernutrition = undernutrition.replace("2013-2015","2014")

In [291]:
undernutrition = undernutrition.replace("2014-2016","2015")

In [292]:
undernutrition = undernutrition.replace("2015-2017","2016")

In [293]:
undernutrition = undernutrition.replace("2016-2018","2017")

In [294]:
undernutrition = undernutrition.replace("2017-2019","2018")
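
The six replacements above all follow the same rule: a range such as "2012-2014" becomes its middle year. As a sketch under that assumption, the middle year can be computed from the range string itself instead of being listed by hand. The toy frame below stands in for the real data.

```python
import pandas as pd

df = pd.DataFrame({"Année": ["2012-2014", "2013-2015", "2017-2019"]})

# Split "start-end" into two integer columns (labelled 0 and 1)
bounds = df["Année"].str.split("-", expand=True).astype(int)

# Middle year of each range, kept as a string like in the cells above
df["Année"] = ((bounds[0] + bounds[1]) // 2).astype(str)
```

This only touches the "Année" column, whereas `DataFrame.replace` scans every column for the given value.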

- Display the undernutrition DataFrame:

In [295]:
print(undernutrition)

             Zone Année  Valeur (en million d'hab)
0     Afghanistan  2013                        8.6
1     Afghanistan  2014                        8.8
2     Afghanistan  2015                        8.9
3     Afghanistan  2016                        9.7
4     Afghanistan  2017                       10.5
...           ...   ...                        ...
1213     Zimbabwe  2014                        0.0
1214     Zimbabwe  2015                        0.0
1215     Zimbabwe  2016                        0.0
1216     Zimbabwe  2017                        0.0
1217     Zimbabwe  2018                        0.0

[1218 rows x 3 columns]


- Add a column with the "Valeur" values converted from millions to thousands of inhabitants:

In [296]:
undernutrition["Valeur (en milliers d'hab)"] = undernutrition["Valeur (en million d'hab)"] * 1000

- Display the undernutrition DataFrame:

In [297]:
print(undernutrition)

             Zone Année  Valeur (en million d'hab)  Valeur (en milliers d'hab)
0     Afghanistan  2013                        8.6                      8600.0
1     Afghanistan  2014                        8.8                      8800.0
2     Afghanistan  2015                        8.9                      8900.0
3     Afghanistan  2016                        9.7                      9700.0
4     Afghanistan  2017                       10.5                     10500.0
...           ...   ...                        ...                         ...
1213     Zimbabwe  2014                        0.0                         0.0
1214     Zimbabwe  2015                        0.0                         0.0
1215     Zimbabwe  2016                        0.0                         0.0
1216     Zimbabwe  2017                        0.0                         0.0
1217     Zimbabwe  2018                        0.0                         0.0

[1218 rows x 4 columns]


*b. population.csv: exploration and data cleaning*

- Loading file:

In [210]:
population = pd.read_csv('population.csv', sep=';')

- Display the first 5 rows:

In [211]:
population.head()

| | Zone | Année | Valeur (en milliers d'hab) |
|---|---|---|---|
| 0 | Afghanistan | 2013 | 32269.589 |
| 1 | Afghanistan | 2014 | 33370.794 |
| 2 | Afghanistan | 2015 | 34413.603 |
| 3 | Afghanistan | 2016 | 35383.032 |
| 4 | Afghanistan | 2017 | 36296.113 |


- Number of rows and columns:

In [257]:
population.shape

(1416, 3)

- Display the column dtypes:

In [258]:
population.dtypes

Zone                           object
Année                           int64
Valeur (en milliers d'hab)    float64
dtype: object

- Display the unique values of the "Valeur" column:

In [260]:
population["Valeur (en milliers d'hab)"].unique()

array([32269.589, 33370.794, 34413.603, ..., 14030.331, 14236.595,
       14438.802])

- Number of unique values:

In [261]:
len(population["Valeur (en milliers d'hab)"].unique())

1413

*c. food_help.csv: exploration and data cleaning*


- Loading file:

In [362]:
foodHelp = pd.read_csv('food_help.csv', sep=';')

- Display the first 5 rows:

In [363]:
foodHelp.head()

| | Pays bénéficiaire | Année | Produit | Valeur |
|---|---|---|---|---|
| 0 | Afghanistan | 2013 | Autres non-céréales | 682 |
| 1 | Afghanistan | 2014 | Autres non-céréales | 335 |
| 2 | Afghanistan | 2013 | Blé et Farin | 39224 |
| 3 | Afghanistan | 2014 | Blé et Farin | 15160 |
| 4 | Afghanistan | 2013 | Céréales | 40504 |


- Number of rows and columns:

In [364]:
foodHelp.shape

(1475, 4)

- Display the column dtypes:

In [365]:
foodHelp.dtypes

Pays bénéficiaire    object
Année                 int64
Produit              object
Valeur                int64
dtype: object

- Display the unique values of the "Valeur" column:

In [366]:
foodHelp["Valeur"].unique()

array([  682,   335, 39224, ...,    96,  5022,  2310], dtype=int64)

- Number of unique values:

In [367]:
len(foodHelp["Valeur"].unique())

1086

- Rename the "Valeur" column to "Valeur (en tonnes)":

In [368]:
# rename() returns a new DataFrame, so assign the result to keep the new name
foodHelp = foodHelp.rename(columns = {'Valeur': 'Valeur (en tonnes)'})
foodHelp

| | Pays bénéficiaire | Année | Produit | Valeur (en tonnes) |
|---|---|---|---|---|
| 0 | Afghanistan | 2013 | Autres non-céréales | 682 |
| 1 | Afghanistan | 2014 | Autres non-céréales | 335 |
| 2 | Afghanistan | 2013 | Blé et Farin | 39224 |
| 3 | Afghanistan | 2014 | Blé et Farin | 15160 |
| 4 | Afghanistan | 2013 | Céréales | 40504 |
| ... | ... | ... | ... | ... |
| 1470 | Zimbabwe | 2015 | Mélanges et préparations | 96 |
| 1471 | Zimbabwe | 2013 | Non-céréales | 5022 |
| 1472 | Zimbabwe | 2014 | Non-céréales | 2310 |
| 1473 | Zimbabwe | 2015 | Non-céréales | 306 |
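
By default `DataFrame.rename` returns a new DataFrame and leaves the original untouched, so the result has to be assigned for the new column name to persist in later cells. A small sketch with toy data illustrating this behaviour:

```python
import pandas as pd

df = pd.DataFrame({"Valeur": [682, 335]})

# rename() returns a renamed copy; df itself is not modified
renamed = df.rename(columns={"Valeur": "Valeur (en tonnes)"})

original_cols = list(df.columns)      # still the old name
renamed_cols = list(renamed.columns)  # carries the new name
```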


*d. food_availability.csv: exploration and data cleaning*

- Loading file:

In [369]:
foodAvailibility = pd.read_csv('food_availability.csv', sep=';')

- Display the first 5 rows:

In [370]:
foodAvailibility.head()

Unnamed: 0,Zone,Produit,Origine,Aliments pour animaux,Autres Utilisations,Disponibilité alimentaire (Kcal/personne/jour),Disponibilité alimentaire en quantité (kg/personne/an),Disponibilité de matière grasse en quantité (g/personne/jour),Disponibilité de protéines en quantité (g/personne/jour),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Semences,Traitement,Variation de stock
0,Afghanistan,Abats Comestible,animale,,,5.0,1.72,0.2,0.77,53.0,,,53.0,,53.0,,,
1,Afghanistan,"Agrumes, Autres",vegetale,,,1.0,1.29,0.01,0.02,41.0,2.0,40.0,39.0,2.0,3.0,,,
2,Afghanistan,Aliments pour enfants,vegetale,,,1.0,0.06,0.01,0.03,2.0,,2.0,2.0,,,,,
3,Afghanistan,Ananas,vegetale,,,0.0,0.0,,,0.0,,0.0,0.0,,,,,
4,Afghanistan,Bananes,vegetale,,,4.0,2.7,0.02,0.05,82.0,,82.0,82.0,,,,,


- Number of rows and columns:

In [371]:
foodAvailibility.shape

(15605, 18)

- Replace NaN values with 0:

In [372]:
foodAvailibility.fillna(0, inplace=True)

- Display the foodAvailibility DataFrame:

In [373]:
print(foodAvailibility)

               Zone                Produit   Origine  Aliments pour animaux  \
0       Afghanistan       Abats Comestible   animale                    0.0   
1       Afghanistan        Agrumes, Autres  vegetale                    0.0   
2       Afghanistan  Aliments pour enfants  vegetale                    0.0   
3       Afghanistan                 Ananas  vegetale                    0.0   
4       Afghanistan                Bananes  vegetale                    0.0   
...             ...                    ...       ...                    ...   
15600  Îles Salomon       Viande de Suides   animale                    0.0   
15601  Îles Salomon    Viande de Volailles   animale                    0.0   
15602  Îles Salomon          Viande, Autre   animale                    0.0   
15603  Îles Salomon                    Vin  vegetale                    0.0   
15604  Îles Salomon         Épices, Autres  vegetale                    0.0   

       Autres Utilisations  Disponibilité alimentai

- Display the column dtypes:

In [374]:
foodAvailibility.dtypes

Zone                                                              object
Produit                                                           object
Origine                                                           object
Aliments pour animaux                                            float64
Autres Utilisations                                              float64
Disponibilité alimentaire (Kcal/personne/jour)                   float64
Disponibilité alimentaire en quantité (kg/personne/an)           float64
Disponibilité de matière grasse en quantité (g/personne/jour)    float64
Disponibilité de protéines en quantité (g/personne/jour)         float64
Disponibilité intérieure                                         float64
Exportations - Quantité                                          float64
Importations - Quantité                                          float64
Nourriture                                                       float64
Pertes                                             

- Convert the food-use quantity columns to tonnes (multiply by 1000):

In [375]:
foodAvailibility["Aliments pour animaux"] = foodAvailibility["Aliments pour animaux"] * 1000

In [376]:
foodAvailibility["Autres Utilisations"] = foodAvailibility["Autres Utilisations"] * 1000

In [377]:
foodAvailibility["Disponibilité intérieure"] = foodAvailibility["Disponibilité intérieure"] * 1000

In [378]:
foodAvailibility["Exportations - Quantité"] = foodAvailibility["Exportations - Quantité"] * 1000

In [379]:
foodAvailibility["Importations - Quantité"] = foodAvailibility["Importations - Quantité"] * 1000

In [380]:
foodAvailibility["Nourriture"] = foodAvailibility["Nourriture"] * 1000

In [381]:
foodAvailibility["Pertes"] = foodAvailibility["Pertes"] * 1000

In [382]:
foodAvailibility["Production"] = foodAvailibility["Production"] * 1000

In [383]:
foodAvailibility["Semences"] = foodAvailibility["Semences"] * 1000

In [384]:
foodAvailibility["Traitement"] = foodAvailibility["Traitement"] * 1000

In [385]:
foodAvailibility["Variation de stock"] = foodAvailibility["Variation de stock"] * 1000
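
The eleven cells above apply the same multiplication to each quantity column. As a sketch of a more compact form, the columns can be listed once and scaled together. The toy frame below contains only two of the columns, with illustrative values.

```python
import pandas as pd

# Toy frame with two of the quantity columns (values illustrative)
df = pd.DataFrame({
    "Zone": ["A", "B"],
    "Nourriture": [53.0, 39.0],
    "Pertes": [0.0, 2.0],
})

quantity_cols = ["Nourriture", "Pertes"]

# Multiply all listed columns by 1000 in one vectorized step
df[quantity_cols] = df[quantity_cols] * 1000
```

Keeping the column names in a single list also makes it harder to scale one column twice by re-running a cell out of order.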

- Display the foodAvailibility DataFrame:

In [386]:
print(foodAvailibility)

               Zone                Produit   Origine  Aliments pour animaux  \
0       Afghanistan       Abats Comestible   animale                    0.0   
1       Afghanistan        Agrumes, Autres  vegetale                    0.0   
2       Afghanistan  Aliments pour enfants  vegetale                    0.0   
3       Afghanistan                 Ananas  vegetale                    0.0   
4       Afghanistan                Bananes  vegetale                    0.0   
...             ...                    ...       ...                    ...   
15600  Îles Salomon       Viande de Suides   animale                    0.0   
15601  Îles Salomon    Viande de Volailles   animale                    0.0   
15602  Îles Salomon          Viande, Autre   animale                    0.0   
15603  Îles Salomon                    Vin  vegetale                    0.0   
15604  Îles Salomon         Épices, Autres  vegetale                    0.0   

       Autres Utilisations  Disponibilité alimentai

- Rename the converted columns by appending "(en tonnes)":

In [390]:
# rename() returns a new DataFrame, so assign the result to keep the new names
foodAvailibility = foodAvailibility.rename(columns = {
    'Aliments pour animaux': 'Aliments pour animaux (en tonnes)',
    'Autres Utilisations': 'Autres Utilisations (en tonnes)',
    'Disponibilité intérieure': 'Disponibilité intérieure (en tonnes)',
    'Exportations - Quantité': 'Exportations - Quantité (en tonnes)',
    'Importations - Quantité': 'Importations - Quantité (en tonnes)',
    'Nourriture': 'Nourriture (en tonnes)',
    'Pertes': 'Pertes (en tonnes)',
    'Production': 'Production (en tonnes)',
    'Semences': 'Semences (en tonnes)',
    'Traitement': 'Traitement (en tonnes)',
    'Variation de stock': 'Variation de stock (en tonnes)',
})
foodAvailibility

Unnamed: 0,Zone,Produit,Origine,Aliments pour animaux (en tonnes),Autres Utilisations (en tonnes),Disponibilité alimentaire (Kcal/personne/jour),Disponibilité alimentaire en quantité (kg/personne/an),Disponibilité de matière grasse en quantité (g/personne/jour),Disponibilité de protéines en quantité (g/personne/jour),Disponibilité intérieure (en tonnes),Exportations - Quantité (en tonnes),Importations - Quantité (en tonnes),Nourriture (en tonnes),Pertes (en tonnes),Production (en tonnes),Semences (en tonnes),Traitement (en tonnes),Variation de stock (en tonnes)
0,Afghanistan,Abats Comestible,animale,0.0,0.0,5.0,1.72,0.20,0.77,53000.0,0.0,0.0,53000.0,0.0,53000.0,0.0,0.0,0.0
1,Afghanistan,"Agrumes, Autres",vegetale,0.0,0.0,1.0,1.29,0.01,0.02,41000.0,2000.0,40000.0,39000.0,2000.0,3000.0,0.0,0.0,0.0
2,Afghanistan,Aliments pour enfants,vegetale,0.0,0.0,1.0,0.06,0.01,0.03,2000.0,0.0,2000.0,2000.0,0.0,0.0,0.0,0.0,0.0
3,Afghanistan,Ananas,vegetale,0.0,0.0,0.0,0.00,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Afghanistan,Bananes,vegetale,0.0,0.0,4.0,2.70,0.02,0.05,82000.0,0.0,82000.0,82000.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15600,Îles Salomon,Viande de Suides,animale,0.0,0.0,45.0,4.70,4.28,1.41,3000.0,0.0,0.0,3000.0,0.0,2000.0,0.0,0.0,0.0
15601,Îles Salomon,Viande de Volailles,animale,0.0,0.0,11.0,3.34,0.69,1.14,2000.0,0.0,2000.0,2000.0,0.0,0.0,0.0,0.0,0.0
15602,Îles Salomon,"Viande, Autre",animale,0.0,0.0,0.0,0.06,0.00,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15603,Îles Salomon,Vin,vegetale,0.0,0.0,0.0,0.07,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
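
Since every renamed column receives the same suffix, the mapping can also be built from a list of column names. A minimal sketch on a toy frame, where `quantity_cols` is an illustrative subset of the real columns:

```python
import pandas as pd

df = pd.DataFrame({"Zone": ["A"], "Nourriture": [53000.0], "Pertes": [0.0]})

quantity_cols = ["Nourriture", "Pertes"]

# Build {old_name: new_name} for each listed column and assign the result
df = df.rename(columns={c: f"{c} (en tonnes)" for c in quantity_cols})
```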


1. *Proportion of undernourished people in 2017:*