# Table of contents

* [Introduction](#introduction)
* [I) Importing the data](#i)
* [II) Cleaning the dataset](#ii)

# Introduction <a class="anchor" id="introduction"></a>
La Poule qui chante wants to expand internationally. I was asked to carry out a study based on FAO data in order to group the countries we could target for exporting our poultry. The work is split across two notebooks.

This one contains only the preliminaries: importing the data and cleaning the dataset.

The core of the work is in the second notebook: PCA, two different clustering methods, and the analysis.

Enjoy the read!

# I) Importing the data <a class="anchor" id="i"></a>

In [1]:
# Not everything below is needed in this notebook; I import the same block in every notebook just in case
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
import seaborn as sns
import plotly.express as px
import datetime as dt
import scipy.stats as st
import math
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

In [2]:
dispo_alim = pd.read_csv('DisponibiliteAlimentaire_2017.csv')
dispo_alim

Unnamed: 0,Code Domaine,Domaine,Code zone,Zone,Code Élément,Élément,Code Produit,Produit,Code année,Année,Unité,Valeur,Symbole,Description du Symbole
0,FBS,Nouveaux Bilans Alimentaire,2,Afghanistan,5511,Production,2511,Blé et produits,2017,2017,Milliers de tonnes,4281.00,S,Données standardisées
1,FBS,Nouveaux Bilans Alimentaire,2,Afghanistan,5611,Importations - Quantité,2511,Blé et produits,2017,2017,Milliers de tonnes,2302.00,S,Données standardisées
2,FBS,Nouveaux Bilans Alimentaire,2,Afghanistan,5072,Variation de stock,2511,Blé et produits,2017,2017,Milliers de tonnes,-119.00,S,Données standardisées
3,FBS,Nouveaux Bilans Alimentaire,2,Afghanistan,5911,Exportations - Quantité,2511,Blé et produits,2017,2017,Milliers de tonnes,0.00,S,Données standardisées
4,FBS,Nouveaux Bilans Alimentaire,2,Afghanistan,5301,Disponibilité intérieure,2511,Blé et produits,2017,2017,Milliers de tonnes,6701.00,S,Données standardisées
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
176595,FBS,Nouveaux Bilans Alimentaire,181,Zimbabwe,5142,Nourriture,2899,Miscellanees,2017,2017,Milliers de tonnes,19.00,S,Données standardisées
176596,FBS,Nouveaux Bilans Alimentaire,181,Zimbabwe,645,Disponibilité alimentaire en quantité (kg/pers...,2899,Miscellanees,2017,2017,kg,1.33,Fc,Donnée calculée
176597,FBS,Nouveaux Bilans Alimentaire,181,Zimbabwe,664,Disponibilité alimentaire (Kcal/personne/jour),2899,Miscellanees,2017,2017,Kcal/personne/jour,1.00,Fc,Donnée calculée
176598,FBS,Nouveaux Bilans Alimentaire,181,Zimbabwe,674,Disponibilité de protéines en quantité (g/pers...,2899,Miscellanees,2017,2017,g/personne/jour,0.04,Fc,Donnée calculée


In [3]:
population = pd.read_csv('Population_2000_2018.csv')
population

Unnamed: 0,Code Domaine,Domaine,Code zone,Zone,Code Élément,Élément,Code Produit,Produit,Code année,Année,Unité,Valeur,Symbole,Description du Symbole,Note
0,OA,Séries temporelles annuelles,2,Afghanistan,511,Population totale,3010,Population-Estimations,2000,2000,1000 personnes,20779.953,X,Sources internationales sûres,
1,OA,Séries temporelles annuelles,2,Afghanistan,511,Population totale,3010,Population-Estimations,2001,2001,1000 personnes,21606.988,X,Sources internationales sûres,
2,OA,Séries temporelles annuelles,2,Afghanistan,511,Population totale,3010,Population-Estimations,2002,2002,1000 personnes,22600.770,X,Sources internationales sûres,
3,OA,Séries temporelles annuelles,2,Afghanistan,511,Population totale,3010,Population-Estimations,2003,2003,1000 personnes,23680.871,X,Sources internationales sûres,
4,OA,Séries temporelles annuelles,2,Afghanistan,511,Population totale,3010,Population-Estimations,2004,2004,1000 personnes,24726.684,X,Sources internationales sûres,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4406,OA,Séries temporelles annuelles,181,Zimbabwe,511,Population totale,3010,Population-Estimations,2014,2014,1000 personnes,13586.707,X,Sources internationales sûres,
4407,OA,Séries temporelles annuelles,181,Zimbabwe,511,Population totale,3010,Population-Estimations,2015,2015,1000 personnes,13814.629,X,Sources internationales sûres,
4408,OA,Séries temporelles annuelles,181,Zimbabwe,511,Population totale,3010,Population-Estimations,2016,2016,1000 personnes,14030.331,X,Sources internationales sûres,
4409,OA,Séries temporelles annuelles,181,Zimbabwe,511,Population totale,3010,Population-Estimations,2017,2017,1000 personnes,14236.595,X,Sources internationales sûres,


# II) Cleaning the dataset <a class="anchor" id="ii"></a>

In [4]:
# Many columns are not useful here
dispo_drop = dispo_alim.drop(columns=['Code Domaine','Domaine','Code zone','Code Élément','Code Produit','Code année','Symbole','Description du Symbole'])
dispo_drop

Unnamed: 0,Zone,Élément,Produit,Année,Unité,Valeur
0,Afghanistan,Production,Blé et produits,2017,Milliers de tonnes,4281.00
1,Afghanistan,Importations - Quantité,Blé et produits,2017,Milliers de tonnes,2302.00
2,Afghanistan,Variation de stock,Blé et produits,2017,Milliers de tonnes,-119.00
3,Afghanistan,Exportations - Quantité,Blé et produits,2017,Milliers de tonnes,0.00
4,Afghanistan,Disponibilité intérieure,Blé et produits,2017,Milliers de tonnes,6701.00
...,...,...,...,...,...,...
176595,Zimbabwe,Nourriture,Miscellanees,2017,Milliers de tonnes,19.00
176596,Zimbabwe,Disponibilité alimentaire en quantité (kg/pers...,Miscellanees,2017,kg,1.33
176597,Zimbabwe,Disponibilité alimentaire (Kcal/personne/jour),Miscellanees,2017,Kcal/personne/jour,1.00
176598,Zimbabwe,Disponibilité de protéines en quantité (g/pers...,Miscellanees,2017,g/personne/jour,0.04


In [5]:
# Not all products are useful either: poultry meat is the one we care about
dispo_volaille = dispo_drop.loc[dispo_drop['Produit'] == "Viande de Volailles"].reset_index(drop=True)
dispo_volaille

Unnamed: 0,Zone,Élément,Produit,Année,Unité,Valeur
0,Afghanistan,Production,Viande de Volailles,2017,Milliers de tonnes,28.00
1,Afghanistan,Importations - Quantité,Viande de Volailles,2017,Milliers de tonnes,29.00
2,Afghanistan,Variation de stock,Viande de Volailles,2017,Milliers de tonnes,0.00
3,Afghanistan,Disponibilité intérieure,Viande de Volailles,2017,Milliers de tonnes,57.00
4,Afghanistan,Pertes,Viande de Volailles,2017,Milliers de tonnes,2.00
...,...,...,...,...,...,...
2056,Zimbabwe,Nourriture,Viande de Volailles,2017,Milliers de tonnes,67.00
2057,Zimbabwe,Disponibilité alimentaire en quantité (kg/pers...,Viande de Volailles,2017,kg,4.68
2058,Zimbabwe,Disponibilité alimentaire (Kcal/personne/jour),Viande de Volailles,2017,Kcal/personne/jour,16.00
2059,Zimbabwe,Disponibilité de protéines en quantité (g/pers...,Viande de Volailles,2017,g/personne/jour,1.59


In [6]:
# Working with kcal, or with grams of protein or fat, does not seem relevant
dispo_volaille_drop = dispo_volaille.loc[dispo_volaille['Unité'].isin(["Milliers de tonnes", "kg"])].reset_index(drop=True)
dispo_volaille_drop

Unnamed: 0,Zone,Élément,Produit,Année,Unité,Valeur
0,Afghanistan,Production,Viande de Volailles,2017,Milliers de tonnes,28.00
1,Afghanistan,Importations - Quantité,Viande de Volailles,2017,Milliers de tonnes,29.00
2,Afghanistan,Variation de stock,Viande de Volailles,2017,Milliers de tonnes,0.00
3,Afghanistan,Disponibilité intérieure,Viande de Volailles,2017,Milliers de tonnes,57.00
4,Afghanistan,Pertes,Viande de Volailles,2017,Milliers de tonnes,2.00
...,...,...,...,...,...,...
1540,Zimbabwe,Traitement,Viande de Volailles,2017,Milliers de tonnes,6.00
1541,Zimbabwe,Alimentation pour touristes,Viande de Volailles,2017,Milliers de tonnes,0.00
1542,Zimbabwe,Résidus,Viande de Volailles,2017,Milliers de tonnes,0.00
1543,Zimbabwe,Nourriture,Viande de Volailles,2017,Milliers de tonnes,67.00


In [7]:
# The Produit and Année columns can be dropped: we know this is poultry meat in 2017
# The only column not expressed in thousands of tonnes is the food supply quantity (kg/person/year)
# So the Unité column is dropped as well, and that column is moved to the end of the dataframe.
dispo_pivot = dispo_volaille_drop.pivot_table(index='Zone', columns='Élément', values='Valeur')
dispo_pivot

Zone,Alimentation pour touristes,Aliments pour animaux,Autres utilisations (non alimentaire),Disponibilité alimentaire en quantité (kg/personne/an),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Résidus,Semences,Traitement,Variation de stock
Afghanistan,,,,1.53,57.0,,29.0,55.0,2.0,28.0,0.0,,,0.0
Afrique du Sud,0.0,,,35.69,2118.0,63.0,514.0,2035.0,83.0,1667.0,0.0,,,0.0
Albanie,,,,16.36,47.0,0.0,38.0,47.0,,13.0,0.0,,,4.0
Algérie,0.0,,,6.38,277.0,0.0,2.0,264.0,13.0,275.0,0.0,,,0.0
Allemagne,,,,19.47,1739.0,646.0,842.0,1609.0,,1514.0,-38.0,,167.0,-29.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Émirats arabes unis,,,,43.47,412.0,94.0,433.0,412.0,,48.0,0.0,,,-26.0
Équateur,0.0,,,19.31,341.0,0.0,0.0,324.0,17.0,340.0,0.0,,,-1.0
États-Unis d'Amérique,,,89.0,55.68,18266.0,3692.0,123.0,18100.0,,21914.0,0.0,,77.0,80.0
Éthiopie,0.0,,,0.13,14.0,,1.0,14.0,1.0,14.0,0.0,,,0.0
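As a hedged illustration of what the pivot above does, the sketch below reshapes a tiny long-format table (toy rows copied from the Afghanistan and Albania values) into one row per country. `pivot_table` aggregates with the mean by default, so a duplicated (Zone, Élément) pair would be averaged silently; the `duplicated` check is an addition for illustration and is not part of the original notebook.

```python
import pandas as pd

# Toy long-format table: one row per (country, element) pair
long_df = pd.DataFrame({
    "Zone": ["Afghanistan", "Afghanistan", "Albanie", "Albanie"],
    "Élément": ["Production", "Importations - Quantité"] * 2,
    "Valeur": [28.0, 29.0, 13.0, 38.0],
})

# Count repeated (Zone, Élément) pairs, which the default mean would merge
n_dupes = int(long_df.duplicated(subset=["Zone", "Élément"]).sum())

# Long to wide: one row per Zone, one column per Élément
wide_df = long_df.pivot_table(index="Zone", columns="Élément", values="Valeur")
print(n_dupes)
print(wide_df)
```

If `n_dupes` were non-zero, it would be worth deciding explicitly how to combine the repeated rows before pivoting.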


In [8]:
dispo_pivot.reset_index(drop=False, inplace=True)
dispo_pivot

Élément,Zone,Alimentation pour touristes,Aliments pour animaux,Autres utilisations (non alimentaire),Disponibilité alimentaire en quantité (kg/personne/an),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Résidus,Semences,Traitement,Variation de stock
0,Afghanistan,,,,1.53,57.0,,29.0,55.0,2.0,28.0,0.0,,,0.0
1,Afrique du Sud,0.0,,,35.69,2118.0,63.0,514.0,2035.0,83.0,1667.0,0.0,,,0.0
2,Albanie,,,,16.36,47.0,0.0,38.0,47.0,,13.0,0.0,,,4.0
3,Algérie,0.0,,,6.38,277.0,0.0,2.0,264.0,13.0,275.0,0.0,,,0.0
4,Allemagne,,,,19.47,1739.0,646.0,842.0,1609.0,,1514.0,-38.0,,167.0,-29.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,Émirats arabes unis,,,,43.47,412.0,94.0,433.0,412.0,,48.0,0.0,,,-26.0
168,Équateur,0.0,,,19.31,341.0,0.0,0.0,324.0,17.0,340.0,0.0,,,-1.0
169,États-Unis d'Amérique,,,89.0,55.68,18266.0,3692.0,123.0,18100.0,,21914.0,0.0,,77.0,80.0
170,Éthiopie,0.0,,,0.13,14.0,,1.0,14.0,1.0,14.0,0.0,,,0.0


In [9]:
dispoalim = dispo_pivot.pop('Disponibilité alimentaire en quantité (kg/personne/an)')
dispo_pivot.insert(14, 'Disponibilité alimentaire en quantité (kg/personne/an)', dispoalim)
dispo_pivot

Élément,Zone,Alimentation pour touristes,Aliments pour animaux,Autres utilisations (non alimentaire),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Résidus,Semences,Traitement,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an)
0,Afghanistan,,,,57.0,,29.0,55.0,2.0,28.0,0.0,,,0.0,1.53
1,Afrique du Sud,0.0,,,2118.0,63.0,514.0,2035.0,83.0,1667.0,0.0,,,0.0,35.69
2,Albanie,,,,47.0,0.0,38.0,47.0,,13.0,0.0,,,4.0,16.36
3,Algérie,0.0,,,277.0,0.0,2.0,264.0,13.0,275.0,0.0,,,0.0,6.38
4,Allemagne,,,,1739.0,646.0,842.0,1609.0,,1514.0,-38.0,,167.0,-29.0,19.47
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,Émirats arabes unis,,,,412.0,94.0,433.0,412.0,,48.0,0.0,,,-26.0,43.47
168,Équateur,0.0,,,341.0,0.0,0.0,324.0,17.0,340.0,0.0,,,-1.0,19.31
169,États-Unis d'Amérique,,,89.0,18266.0,3692.0,123.0,18100.0,,21914.0,0.0,,77.0,80.0,55.68
170,Éthiopie,0.0,,,14.0,,1.0,14.0,1.0,14.0,0.0,,,0.0,0.13
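The `pop` / `insert` pair above moves one column to the last position. A minimal sketch of the same idea on a toy frame follows; using `len(df.columns)` as the insert position is my own variation, intended to avoid hard-coding the index 14, and the short column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Zone": ["Albanie"],
    "kg/personne/an": [16.36],   # hypothetical short name for the per-capita column
    "Production": [13.0],
})

# pop removes the column and returns it as a Series
moved = df.pop("kg/personne/an")

# After pop, len(df.columns) is the position just past the last column
df.insert(len(df.columns), "kg/personne/an", moved)
print(list(df.columns))
```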


In [10]:
# Moving on to the population table: once again, we get rid of the columns we do not need
population_drop = population[['Zone', 'Année', 'Valeur']]
population_drop

Unnamed: 0,Zone,Année,Valeur
0,Afghanistan,2000,20779.953
1,Afghanistan,2001,21606.988
2,Afghanistan,2002,22600.770
3,Afghanistan,2003,23680.871
4,Afghanistan,2004,24726.684
...,...,...,...
4406,Zimbabwe,2014,13586.707
4407,Zimbabwe,2015,13814.629
4408,Zimbabwe,2016,14030.331
4409,Zimbabwe,2017,14236.595


In [11]:
# The two tables will be joined, but only 2017 is of interest
population_2017 = population_drop.loc[population_drop['Année'] == 2017].reset_index(drop=True)
population_2017

Unnamed: 0,Zone,Année,Valeur
0,Afghanistan,2017,36296.113
1,Afrique du Sud,2017,57009.756
2,Albanie,2017,2884.169
3,Algérie,2017,41389.189
4,Allemagne,2017,82658.409
...,...,...,...
231,Venezuela (République bolivarienne du),2017,29402.484
232,Viet Nam,2017,94600.648
233,Yémen,2017,27834.819
234,Zambie,2017,16853.599


In [12]:
# Multiply by 1000, since the population was expressed in thousands of people
population_2017 = population_2017.copy()
population_2017['Valeur'] *= 1000
population_2017['Valeur'] = population_2017['Valeur'].astype(int)
population_2017

Unnamed: 0,Zone,Année,Valeur
0,Afghanistan,2017,36296113
1,Afrique du Sud,2017,57009756
2,Albanie,2017,2884169
3,Algérie,2017,41389189
4,Allemagne,2017,82658409
...,...,...,...
231,Venezuela (République bolivarienne du),2017,29402484
232,Viet Nam,2017,94600648
233,Yémen,2017,27834819
234,Zambie,2017,16853599
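One caveat with the conversion above: `astype(int)` truncates toward zero, and a floating-point product can land just below the intended integer. The sketch below, on three values taken from the table, rounds before casting; this is a suggested safeguard, not what the original cell does.

```python
import pandas as pd

# Population in thousands of people, as in the FAO file
pop_thousands = pd.Series([36296.113, 57009.756, 2884.169], name="Valeur")

# Round to the nearest whole person before casting, so that a product such as
# 123.999999 becomes 124 instead of being truncated to 123
pop_people = (pop_thousands * 1000).round().astype("int64")
print(pop_people.tolist())
```

Casting to `"int64"` explicitly also avoids depending on the platform's default integer width.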


In [13]:
# Left join on Zone: there is no need to keep countries for which we have no food balance data
data = dispo_pivot.merge(population_2017[['Zone', 'Valeur']], on='Zone', how="left")
data

Unnamed: 0,Zone,Alimentation pour touristes,Aliments pour animaux,Autres utilisations (non alimentaire),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Résidus,Semences,Traitement,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Valeur
0,Afghanistan,,,,57.0,,29.0,55.0,2.0,28.0,0.0,,,0.0,1.53,36296113
1,Afrique du Sud,0.0,,,2118.0,63.0,514.0,2035.0,83.0,1667.0,0.0,,,0.0,35.69,57009756
2,Albanie,,,,47.0,0.0,38.0,47.0,,13.0,0.0,,,4.0,16.36,2884169
3,Algérie,0.0,,,277.0,0.0,2.0,264.0,13.0,275.0,0.0,,,0.0,6.38,41389189
4,Allemagne,,,,1739.0,646.0,842.0,1609.0,,1514.0,-38.0,,167.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,Émirats arabes unis,,,,412.0,94.0,433.0,412.0,,48.0,0.0,,,-26.0,43.47,9487203
168,Équateur,0.0,,,341.0,0.0,0.0,324.0,17.0,340.0,0.0,,,-1.0,19.31,16785361
169,États-Unis d'Amérique,,,89.0,18266.0,3692.0,123.0,18100.0,,21914.0,0.0,,77.0,80.0,55.68,325084756
170,Éthiopie,0.0,,,14.0,,1.0,14.0,1.0,14.0,0.0,,,0.0,0.13,106399924
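A left join keeps every row of the left table, so a country missing from the population table would end up with a NaN population without any warning. The following sketch, on toy data with one deliberately unmatched (hypothetical) country, uses the `indicator` and `validate` options of `merge` to make that visible. It is an optional check, not something the original notebook runs.

```python
import pandas as pd

left = pd.DataFrame({
    "Zone": ["Albanie", "Algérie", "Pays imaginaire"],  # last row is made up
    "Production": [13.0, 275.0, 1.0],
})
right = pd.DataFrame({
    "Zone": ["Albanie", "Algérie"],
    "Valeur": [2884169, 41389189],
})

# validate raises if Zone is not unique on either side;
# indicator adds a _merge column telling where each row came from
merged = left.merge(right, on="Zone", how="left",
                    validate="one_to_one", indicator=True)

# Countries present on the left but absent from the population table
unmatched = merged.loc[merged["_merge"] == "left_only", "Zone"].tolist()
print(unmatched)
```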


In [14]:
data.rename(columns={'Valeur':'Population'}, inplace=True)
data

Unnamed: 0,Zone,Alimentation pour touristes,Aliments pour animaux,Autres utilisations (non alimentaire),Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Pertes,Production,Résidus,Semences,Traitement,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
0,Afghanistan,,,,57.0,,29.0,55.0,2.0,28.0,0.0,,,0.0,1.53,36296113
1,Afrique du Sud,0.0,,,2118.0,63.0,514.0,2035.0,83.0,1667.0,0.0,,,0.0,35.69,57009756
2,Albanie,,,,47.0,0.0,38.0,47.0,,13.0,0.0,,,4.0,16.36,2884169
3,Algérie,0.0,,,277.0,0.0,2.0,264.0,13.0,275.0,0.0,,,0.0,6.38,41389189
4,Allemagne,,,,1739.0,646.0,842.0,1609.0,,1514.0,-38.0,,167.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
167,Émirats arabes unis,,,,412.0,94.0,433.0,412.0,,48.0,0.0,,,-26.0,43.47,9487203
168,Équateur,0.0,,,341.0,0.0,0.0,324.0,17.0,340.0,0.0,,,-1.0,19.31,16785361
169,États-Unis d'Amérique,,,89.0,18266.0,3692.0,123.0,18100.0,,21914.0,0.0,,77.0,80.0,55.68,325084756
170,Éthiopie,0.0,,,14.0,,1.0,14.0,1.0,14.0,0.0,,,0.0,0.13,106399924


In [15]:
# Check for dtype errors
data.dtypes

Zone                                                       object
Alimentation pour touristes                               float64
Aliments pour animaux                                     float64
Autres utilisations (non alimentaire)                     float64
Disponibilité intérieure                                  float64
Exportations - Quantité                                   float64
Importations - Quantité                                   float64
Nourriture                                                float64
Pertes                                                    float64
Production                                                float64
Résidus                                                   float64
Semences                                                  float64
Traitement                                                float64
Variation de stock                                        float64
Disponibilité alimentaire en quantité (kg/personne/an)    float64
Population

In [16]:
# Look for missing values
data.isnull().sum()

Zone                                                        0
Alimentation pour touristes                                94
Aliments pour animaux                                     171
Autres utilisations (non alimentaire)                     138
Disponibilité intérieure                                    2
Exportations - Quantité                                    37
Importations - Quantité                                     2
Nourriture                                                  2
Pertes                                                    105
Production                                                  4
Résidus                                                     8
Semences                                                  171
Traitement                                                126
Variation de stock                                          3
Disponibilité alimentaire en quantité (kg/personne/an)      0
Population                                                  0
dtype: int64

In [17]:
# Too many missing values in these columns, so we drop them
data_drop = data.drop(columns=['Alimentation pour touristes', 'Aliments pour animaux', 'Autres utilisations (non alimentaire)', 'Pertes', 'Semences', 'Traitement'])
data_drop

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
0,Afghanistan,57.0,,29.0,55.0,28.0,0.0,0.0,1.53,36296113
1,Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756
2,Albanie,47.0,0.0,38.0,47.0,13.0,0.0,4.0,16.36,2884169
3,Algérie,277.0,0.0,2.0,264.0,275.0,0.0,0.0,6.38,41389189
4,Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...,...
167,Émirats arabes unis,412.0,94.0,433.0,412.0,48.0,0.0,-26.0,43.47,9487203
168,Équateur,341.0,0.0,0.0,324.0,340.0,0.0,-1.0,19.31,16785361
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
170,Éthiopie,14.0,,1.0,14.0,14.0,0.0,0.0,0.13,106399924
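The columns above were picked by hand from the missing-value counts. As a sketch, the same selection could be expressed with a threshold on the share of missing values. The 50% cutoff is an assumption of mine: with the counts shown earlier (94 to 171 missing out of 172 for the dropped columns, at most 37 for the kept ones), it would select the same six columns.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Zone": ["A", "B", "C", "D"],
    "Production": [1.0, 2.0, np.nan, 4.0],        # 25% missing
    "Semences": [np.nan] * 4,                     # 100% missing
    "Pertes": [np.nan, np.nan, np.nan, 2.0],      # 75% missing
})

max_missing_share = 0.5  # hypothetical cutoff, not stated in the notebook

# isna().mean() gives the fraction of missing values per column
missing_share = toy.isna().mean()
to_drop = missing_share[missing_share > max_missing_share].index.tolist()

cleaned = toy.drop(columns=to_drop)
print(to_drop)
```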


In [18]:
# Look for duplicated countries
data_drop.loc[data_drop.duplicated(subset=['Zone'], keep=False)]

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population


In [19]:
# Look for outliers / aberrant values
data_drop.describe().round(2)

Unnamed: 0,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
count,170.0,135.0,170.0,170.0,168.0,164.0,169.0,172.0,172.0
mean,687.59,132.19,89.53,657.05,725.19,-2.84,13.67,20.21,42841750.0
std,2187.18,513.78,186.67,2136.55,2501.46,13.58,75.36,15.86,153063700.0
min,2.0,0.0,0.0,2.0,0.0,-125.0,-119.0,0.13,52045.0
25%,30.5,0.0,3.0,28.5,13.75,0.0,0.0,6.44,2874480.0
50%,100.0,3.0,16.0,99.5,70.0,0.0,0.0,18.09,9757833.0
75%,368.25,32.0,81.25,365.25,409.75,0.0,7.0,30.04,30138740.0
max,18266.0,4223.0,1069.0,18100.0,21914.0,0.0,859.0,72.31,1421022000.0
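The summary above shows maxima far above the third quartile. One common way to list such rows, sketched below on six production values taken from the tables in this notebook, is the 1.5 × IQR rule. This rule is my choice for illustration; the notebook itself inspects sorted extracts instead. A flagged row is not necessarily an error, since very large countries are legitimately extreme.

```python
import pandas as pd

# Production in thousands of tonnes for a few countries from the dataset
production = pd.Series(
    [28.0, 1667.0, 13.0, 275.0, 1514.0, 21914.0],
    index=["Afghanistan", "Afrique du Sud", "Albanie", "Algérie",
           "Allemagne", "États-Unis d'Amérique"],
)

q1, q3 = production.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)

# Rows above the upper fence are candidates for a closer look
flagged = production[production > upper_fence].index.tolist()
print(flagged)
```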


In [20]:
data_drop.sort_values('Disponibilité intérieure').head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
80,Kiribati,2.0,,1.0,2.0,1.0,0.0,0.0,17.98,114158
137,Sao Tomé-et-Principe,2.0,,2.0,2.0,1.0,0.0,1.0,9.47,207089
171,Îles Salomon,3.0,0.0,6.0,3.0,0.0,0.0,3.0,4.45,636039
160,Vanuatu,3.0,,4.0,3.0,1.0,0.0,1.0,11.66,285510
43,Djibouti,3.0,,3.0,3.0,,0.0,0.0,2.68,944099
60,Guinée-Bissau,4.0,,4.0,4.0,3.0,0.0,3.0,2.16,1828145
133,Saint-Kitts-et-Nevis,4.0,0.0,4.0,3.0,0.0,0.0,0.0,55.77,52045
44,Dominique,4.0,0.0,4.0,3.0,0.0,0.0,0.0,35.19,71458
152,Timor-Leste,5.0,,11.0,5.0,1.0,0.0,7.0,4.24,1243258
48,Eswatini,7.0,0.0,2.0,7.0,6.0,0.0,0.0,6.46,1124805


In [21]:
data_drop.sort_values('Disponibilité intérieure', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
34,"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
21,Brésil,9982.0,4223.0,3.0,9982.0,14201.0,0.0,0.0,48.03,207833823
52,Fédération de Russie,4556.0,115.0,226.0,4509.0,4444.0,-7.0,-1.0,30.98,145530082
98,Mexique,4219.0,9.0,972.0,4058.0,3249.0,,-6.0,32.52,124777324
66,Inde,3661.0,4.0,0.0,2965.0,3545.0,0.0,-119.0,2.22,1338676785
75,Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
67,Indonésie,2323.0,0.0,1.0,1904.0,2301.0,,-21.0,7.19,264650963
124,Royaume-Uni de Grande-Bretagne et d'Irlande du...,2234.0,359.0,779.0,2131.0,1814.0,0.0,0.0,31.94,66727460
68,Iran (République islamique d'),2220.0,45.0,6.0,2220.0,2174.0,0.0,-86.0,27.52,80673883


In [22]:
data_drop.sort_values('Exportations - Quantité', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
21,Brésil,9982.0,4223.0,3.0,9982.0,14201.0,0.0,0.0,48.03,207833823
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
117,Pays-Bas,372.0,1418.0,608.0,346.0,1100.0,-78.0,-82.0,20.33,17021347
119,Pologne,1156.0,1025.0,55.0,1150.0,2351.0,-59.0,225.0,30.3,37953180
151,Thaïlande,881.0,796.0,2.0,896.0,1676.0,-48.0,1.0,12.95,69209810
31,Chine - RAS de Hong-Kong,280.0,663.0,907.0,391.0,24.0,-125.0,-12.0,53.51,7306322
16,Belgique,152.0,656.0,338.0,144.0,463.0,-25.0,-6.0,12.65,11419748
4,Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
34,"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
51,France,1573.0,501.0,506.0,1485.0,1750.0,-2.0,183.0,22.9,64842509


In [23]:
data_drop.sort_values('Importations - Quantité', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
75,Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
98,Mexique,4219.0,9.0,972.0,4058.0,3249.0,,-6.0,32.52,124777324
31,Chine - RAS de Hong-Kong,280.0,663.0,907.0,391.0,24.0,-125.0,-12.0,53.51,7306322
4,Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
124,Royaume-Uni de Grande-Bretagne et d'Irlande du...,2234.0,359.0,779.0,2131.0,1814.0,0.0,0.0,31.94,66727460
7,Arabie saoudite,1435.0,10.0,722.0,1435.0,616.0,0.0,-108.0,43.36,33101178
117,Pays-Bas,372.0,1418.0,608.0,346.0,1100.0,-78.0,-82.0,20.33,17021347
1,Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756
51,France,1573.0,501.0,506.0,1485.0,1750.0,-2.0,183.0,22.9,64842509
69,Iraq,566.0,0.0,470.0,561.0,96.0,0.0,0.0,14.95,37552781


In [24]:
data_drop.sort_values('Production', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
34,"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
21,Brésil,9982.0,4223.0,3.0,9982.0,14201.0,0.0,0.0,48.03,207833823
52,Fédération de Russie,4556.0,115.0,226.0,4509.0,4444.0,-7.0,-1.0,30.98,145530082
66,Inde,3661.0,4.0,0.0,2965.0,3545.0,0.0,-119.0,2.22,1338676785
98,Mexique,4219.0,9.0,972.0,4058.0,3249.0,,-6.0,32.52,124777324
119,Pologne,1156.0,1025.0,55.0,1150.0,2351.0,-59.0,225.0,30.3,37953180
67,Indonésie,2323.0,0.0,1.0,1904.0,2301.0,,-21.0,7.19,264650963
75,Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
157,Turquie,1674.0,429.0,3.0,1674.0,2192.0,0.0,92.0,20.64,81116450


In [25]:
data_drop.sort_values('Nourriture', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
34,"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
21,Brésil,9982.0,4223.0,3.0,9982.0,14201.0,0.0,0.0,48.03,207833823
52,Fédération de Russie,4556.0,115.0,226.0,4509.0,4444.0,-7.0,-1.0,30.98,145530082
98,Mexique,4219.0,9.0,972.0,4058.0,3249.0,,-6.0,32.52,124777324
66,Inde,3661.0,4.0,0.0,2965.0,3545.0,0.0,-119.0,2.22,1338676785
75,Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
68,Iran (République islamique d'),2220.0,45.0,6.0,2220.0,2174.0,0.0,-86.0,27.52,80673883
124,Royaume-Uni de Grande-Bretagne et d'Irlande du...,2234.0,359.0,779.0,2131.0,1814.0,0.0,0.0,31.94,66727460
1,Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756


In [26]:
# Only 19 of the 172 countries have a value other than 0 (or NaN) in the Résidus column
data_drop.sort_values('Résidus').head(20)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
31,Chine - RAS de Hong-Kong,280.0,663.0,907.0,391.0,24.0,-125.0,-12.0,53.51,7306322
117,Pays-Bas,372.0,1418.0,608.0,346.0,1100.0,-78.0,-82.0,20.33,17021347
119,Pologne,1156.0,1025.0,55.0,1150.0,2351.0,-59.0,225.0,30.3,37953180
151,Thaïlande,881.0,796.0,2.0,896.0,1676.0,-48.0,1.0,12.95,69209810
4,Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
146,Suède,187.0,23.0,84.0,164.0,157.0,-37.0,31.0,16.6,9904896
16,Belgique,152.0,656.0,338.0,144.0,463.0,-25.0,-6.0,12.65,11419748
65,Hongrie,266.0,210.0,58.0,246.0,493.0,-13.0,74.0,25.27,9729823
70,Irlande,128.0,93.0,99.0,123.0,110.0,-11.0,-12.0,25.82,4753279
52,Fédération de Russie,4556.0,115.0,226.0,4509.0,4444.0,-7.0,-1.0,30.98,145530082


In [27]:
data_drop.sort_values('Variation de stock').head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
66,Inde,3661.0,4.0,0.0,2965.0,3545.0,0.0,-119.0,2.22,1338676785
7,Arabie saoudite,1435.0,10.0,722.0,1435.0,616.0,0.0,-108.0,43.36,33101178
68,Iran (République islamique d'),2220.0,45.0,6.0,2220.0,2174.0,0.0,-86.0,27.52,80673883
117,Pays-Bas,372.0,1418.0,608.0,346.0,1100.0,-78.0,-82.0,20.33,17021347
34,"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
4,Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
167,Émirats arabes unis,412.0,94.0,433.0,412.0,48.0,0.0,-26.0,43.47,9487203
166,Égypte,1250.0,1.0,110.0,1250.0,1118.0,-1.0,-23.0,12.96,96442591
67,Indonésie,2323.0,0.0,1.0,1904.0,2301.0,,-21.0,7.19,264650963
25,Bénin,161.0,0.0,123.0,161.0,18.0,0.0,-20.0,14.4,11175198


In [28]:
data_drop.sort_values('Variation de stock', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
75,Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
119,Pologne,1156.0,1025.0,55.0,1150.0,2351.0,-59.0,225.0,30.3,37953180
51,France,1573.0,501.0,506.0,1485.0,1750.0,-2.0,183.0,22.9,64842509
90,Malaisie,1621.0,44.0,68.0,1220.0,1724.0,-1.0,128.0,39.21,31104646
57,Grèce,178.0,29.0,79.0,162.0,246.0,0.0,118.0,15.32,10569450
127,République de Corée,854.0,6.0,137.0,854.0,838.0,0.0,115.0,16.7,51096415
18,Bolivie (État plurinational de),429.0,1.0,1.0,403.0,533.0,0.0,103.0,36.0,11192855
157,Turquie,1674.0,429.0,3.0,1674.0,2192.0,0.0,92.0,20.64,81116450
123,Roumanie,381.0,69.0,146.0,381.0,392.0,0.0,88.0,19.37,19653969
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756


In [29]:
data_drop.sort_values('Disponibilité alimentaire en quantité (kg/personne/an)').head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
170,Éthiopie,14.0,,1.0,14.0,14.0,0.0,0.0,0.13,106399924
149,Tchad,7.0,0.0,1.0,7.0,6.0,0.0,0.0,0.45,15016753
78,Kenya,35.0,0.0,0.0,34.0,35.0,0.0,0.0,0.67,50221142
105,Niger,21.0,0.0,3.0,20.0,19.0,0.0,0.0,0.94,21602382
106,Nigéria,202.0,0.0,0.0,192.0,201.0,0.0,0.0,1.01,190873244
125,Rwanda,19.0,0.0,0.0,18.0,19.0,,0.0,1.49,11980961
14,Bangladesh,250.0,,0.0,240.0,249.0,,0.0,1.5,159685424
112,Ouganda,66.0,0.0,0.0,62.0,65.0,0.0,0.0,1.52,41166588
0,Afghanistan,57.0,,29.0,55.0,28.0,0.0,0.0,1.53,36296113
142,Soudan,69.0,,2.0,65.0,67.0,0.0,0.0,1.6,40813397


In [30]:
data_drop.sort_values('Disponibilité alimentaire en quantité (kg/personne/an)', ascending=False).head(10)

Unnamed: 0,Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
134,Saint-Vincent-et-les Grenadines,8.0,,9.0,8.0,0.0,0.0,1.0,72.31,109827
72,Israël,636.0,3.0,0.0,556.0,629.0,0.0,-10.0,67.39,8243848
136,Samoa,15.0,0.0,17.0,13.0,0.0,0.0,2.0,64.77,195352
135,Sainte-Lucie,11.0,,10.0,10.0,1.0,0.0,0.0,56.69,180954
133,Saint-Kitts-et-Nevis,4.0,0.0,4.0,3.0,0.0,0.0,0.0,55.77,52045
169,États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
154,Trinité-et-Tobago,76.0,0.0,23.0,75.0,61.0,0.0,8.0,54.54,1384059
6,Antigua-et-Barbuda,7.0,0.0,7.0,5.0,0.0,0.0,0.0,54.1,95426
31,Chine - RAS de Hong-Kong,280.0,663.0,907.0,391.0,24.0,-125.0,-12.0,53.51,7306322
74,Jamaïque,152.0,1.0,31.0,149.0,128.0,0.0,7.0,51.1,2920848


In [31]:
# Sanity check for Saint-Vincent-et-les Grenadines: 8 thousand tonnes of food, in kg, divided by its population
8000000/109827

72.84183306472907
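The one-off division above can be generalised: food in thousands of tonnes, converted to kg and divided by the population, should land close to the reported kg/person/year. A small sketch on three rows from the tables follows; the 2% tolerance and the column name `kg_par_personne` are assumptions for illustration.

```python
import pandas as pd

check = pd.DataFrame(
    {
        "Nourriture": [8.0, 18100.0, 55.0],            # thousands of tonnes
        "Population": [109827, 325084756, 36296113],   # people
        "kg_par_personne": [72.31, 55.68, 1.53],       # reported by the FAO
    },
    index=["Saint-Vincent-et-les Grenadines", "États-Unis d'Amérique", "Afghanistan"],
)

# 1 thousand tonnes = 1,000,000 kg
recomputed = check["Nourriture"] * 1_000_000 / check["Population"]

# Relative gap between the recomputed and the reported per-capita values
rel_gap = (recomputed - check["kg_par_personne"]).abs() / check["kg_par_personne"]
print(rel_gap.round(4))
```

Small gaps are expected because the food quantities are rounded to the nearest thousand tonnes.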

In [32]:
# Set Zone as the index, which will be more convenient for the PCA
data_drop.set_index('Zone', inplace=True)
data_drop

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
Afghanistan,57.0,,29.0,55.0,28.0,0.0,0.0,1.53,36296113
Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756
Albanie,47.0,0.0,38.0,47.0,13.0,0.0,4.0,16.36,2884169
Algérie,277.0,0.0,2.0,264.0,275.0,0.0,0.0,6.38,41389189
Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...
Émirats arabes unis,412.0,94.0,433.0,412.0,48.0,0.0,-26.0,43.47,9487203
Équateur,341.0,0.0,0.0,324.0,340.0,0.0,-1.0,19.31,16785361
États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
Éthiopie,14.0,,1.0,14.0,14.0,0.0,0.0,0.13,106399924


In [33]:
# Count the missing values in each column
data_drop.isna().sum()

Disponibilité intérieure                                   2
Exportations - Quantité                                   37
Importations - Quantité                                    2
Nourriture                                                 2
Production                                                 4
Résidus                                                    8
Variation de stock                                         3
Disponibilité alimentaire en quantité (kg/personne/an)     0
Population                                                 0
dtype: int64

In [34]:
# Rows where 'Disponibilité intérieure' is missing
data_drop.loc[data_drop['Disponibilité intérieure'].isna()]

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
Ouzbékistan,,,,,,,,1.96,31959785
République démocratique populaire lao,,,,,,,,10.91,6953035


In [35]:
# Drop Laos and Uzbekistan: with no values for these countries, there is little to analyse
data_drop = data_drop.copy()
data_drop = data_drop.drop(['Ouzbékistan', 'République démocratique populaire lao'], axis=0)
data_drop

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
Afghanistan,57.0,,29.0,55.0,28.0,0.0,0.0,1.53,36296113
Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756
Albanie,47.0,0.0,38.0,47.0,13.0,0.0,4.0,16.36,2884169
Algérie,277.0,0.0,2.0,264.0,275.0,0.0,0.0,6.38,41389189
Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...
Émirats arabes unis,412.0,94.0,433.0,412.0,48.0,0.0,-26.0,43.47,9487203
Équateur,341.0,0.0,0.0,324.0,340.0,0.0,-1.0,19.31,16785361
États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
Éthiopie,14.0,,1.0,14.0,14.0,0.0,0.0,0.13,106399924
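Rather than naming the two countries by hand, the empty rows could also be dropped by rule. This is only a sketch on a hand-built miniature of `data_drop` (the names `mini`, `flow_cols` and `cleaned` are hypothetical, and only three of the quantity columns are reproduced): a row is kept if at least one quantity column is filled in.

```python
import numpy as np
import pandas as pd

# Miniature stand-in for data_drop, with values copied from the tables above.
flow_cols = ["Disponibilité intérieure", "Importations - Quantité", "Production"]
mini = pd.DataFrame(
    {
        "Disponibilité intérieure": [57.0, np.nan, 47.0],
        "Importations - Quantité": [29.0, np.nan, 38.0],
        "Production": [28.0, np.nan, 13.0],
        "Population": [36296113, 31959785, 2884169],
    },
    index=pd.Index(["Afghanistan", "Ouzbékistan", "Albanie"], name="Zone"),
)

# Drop a row only when ALL of the quantity columns are missing.
cleaned = mini.dropna(how="all", subset=flow_cols)
print(cleaned.index.tolist())
```

On the real data this would presumably remove the same two zones, provided no other row is entirely empty on the chosen columns.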


In [36]:
# Rows that still contain at least one missing value
mask = data_drop.isna().any(axis=1)
valeurs_na = data_drop[mask]
valeurs_na

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
Afghanistan,57.0,,29.0,55.0,28.0,0.0,0.0,1.53,36296113
Bahamas,26.0,,24.0,16.0,6.0,0.0,4.0,43.17,381755
Bangladesh,250.0,,0.0,240.0,249.0,,0.0,1.5,159685424
Burkina Faso,46.0,,0.0,44.0,46.0,0.0,0.0,2.27,19193234
Cabo Verde,10.0,,12.0,9.0,1.0,0.0,4.0,17.62,537498
Cambodge,38.0,,10.0,37.0,28.0,0.0,0.0,2.34,16009409
Cuba,342.0,,312.0,269.0,29.0,0.0,-1.0,23.72,11339254
Djibouti,3.0,,3.0,3.0,,0.0,0.0,2.68,944099
Gambie,8.0,,16.0,8.0,2.0,0.0,10.0,3.53,2213889
Grenade,8.0,,7.0,5.0,1.0,0.0,0.0,45.7,110874


In [37]:
# Replacing missing values with the mean (or the median) seems risky to me, especially for exports,
# so I choose to replace them with 0 instead
data_drop.fillna(0, inplace=True)
data_drop.isna().sum()

Disponibilité intérieure                                  0
Exportations - Quantité                                   0
Importations - Quantité                                   0
Nourriture                                                0
Production                                                0
Résidus                                                   0
Variation de stock                                        0
Disponibilité alimentaire en quantité (kg/personne/an)    0
Population                                                0
dtype: int64
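To see why the mean looks risky for exports, here is a small sketch on the first five zones of the table, with their `Exportations - Quantité` values copied from above. The variable names are hypothetical and five rows are of course not representative of the whole dataset.

```python
import numpy as np
import pandas as pd

# Export quantities for the first five zones; Afghanistan is the missing one.
exports = pd.Series(
    [np.nan, 63.0, 0.0, 0.0, 646.0],
    index=["Afghanistan", "Afrique du Sud", "Albanie", "Algérie", "Allemagne"],
    name="Exportations - Quantité",
)

mean_fill = exports.mean()      # NaN is skipped by default
median_fill = exports.median()  # same here
print(f"mean: {mean_fill}, median: {median_fill}")

# The choice made in this notebook: treat a missing export figure as "no exports".
filled = exports.fillna(0)
print(filled)
```

On this sample the mean is pulled up by Allemagne, and filling Afghanistan with it would give the country more exports than its production of 28. Zero is an assumption too, but a more plausible one for small producers.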

In [38]:
# Check the dimensions of our data
data_drop.shape

(170, 9)

Cheating a little...
Marty, I come from the future, and the PCA has revealed some outliers. If a few of them have to be ignored, better to deal with that here than to start everything over!

In [39]:
# The list of suspects...
data_drop.loc[["Chine, continentale", "États-Unis d'Amérique", "Brésil", "Inde", "Chine - RAS de Hong-Kong", "Japon", "Allemagne", "Pologne", "Pays-Bas"], :]

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
"Chine, continentale",18161.0,576.0,452.0,17518.0,18236.0,-1.0,-50.0,12.33,1421021791
États-Unis d'Amérique,18266.0,3692.0,123.0,18100.0,21914.0,0.0,80.0,55.68,325084756
Brésil,9982.0,4223.0,3.0,9982.0,14201.0,0.0,0.0,48.03,207833823
Inde,3661.0,4.0,0.0,2965.0,3545.0,0.0,-119.0,2.22,1338676785
Chine - RAS de Hong-Kong,280.0,663.0,907.0,391.0,24.0,-125.0,-12.0,53.51,7306322
Japon,2415.0,10.0,1069.0,2359.0,2215.0,0.0,859.0,18.5,127502725
Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
Pologne,1156.0,1025.0,55.0,1150.0,2351.0,-59.0,225.0,30.3,37953180
Pays-Bas,372.0,1418.0,608.0,346.0,1100.0,-78.0,-82.0,20.33,17021347


Some of these countries probably stand out because of very high imports, residuals, population or stock variation, but that does not disqualify them from our study. On the other hand, the United States, Brazil and India (and, to a lesser extent, Poland) produce huge quantities of poultry meat and import little or nothing at all.

Since PCA is very sensitive to outliers, it seems sensible to me to remove the United States, India and Brazil from the dataset. I note that China's production is gargantuan as well, but that does not stop it from importing a lot, so I choose to keep it.
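The reasoning above can be written as an explicit filter. This is a sketch only: the two thresholds `PRODUCTION_MIN` and `IMPORTS_MAX` are hypothetical values picked by eye to reproduce the choice described here, not something derived from the PCA. The figures are copied from the suspects table.

```python
import pandas as pd

# Production and imports of the nine suspects, copied from the table above.
suspects = pd.DataFrame(
    {
        "Production": [18236.0, 21914.0, 14201.0, 3545.0, 24.0, 2215.0, 1514.0, 2351.0, 1100.0],
        "Importations - Quantité": [452.0, 123.0, 3.0, 0.0, 907.0, 1069.0, 842.0, 55.0, 608.0],
    },
    index=pd.Index(
        [
            "Chine, continentale", "États-Unis d'Amérique", "Brésil", "Inde",
            "Chine - RAS de Hong-Kong", "Japon", "Allemagne", "Pologne", "Pays-Bas",
        ],
        name="Zone",
    ),
)

# Hypothetical thresholds: "produces a lot" and "imports little".
PRODUCTION_MIN = 3000
IMPORTS_MAX = 200

is_big_producer = suspects["Production"] > PRODUCTION_MIN
imports_little = suspects["Importations - Quantité"] < IMPORTS_MAX
to_drop = suspects[is_big_producer & imports_little].index.tolist()
print(to_drop)
```

With these thresholds, China stays in because of its imports and Poland stays in because its production is below the cut-off. Other thresholds would give a different list.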

In [40]:
data_final = data_drop.drop(["États-Unis d'Amérique", "Brésil", "Inde"], axis=0)
data_final

Zone,Disponibilité intérieure,Exportations - Quantité,Importations - Quantité,Nourriture,Production,Résidus,Variation de stock,Disponibilité alimentaire en quantité (kg/personne/an),Population
Afghanistan,57.0,0.0,29.0,55.0,28.0,0.0,0.0,1.53,36296113
Afrique du Sud,2118.0,63.0,514.0,2035.0,1667.0,0.0,0.0,35.69,57009756
Albanie,47.0,0.0,38.0,47.0,13.0,0.0,4.0,16.36,2884169
Algérie,277.0,0.0,2.0,264.0,275.0,0.0,0.0,6.38,41389189
Allemagne,1739.0,646.0,842.0,1609.0,1514.0,-38.0,-29.0,19.47,82658409
...,...,...,...,...,...,...,...,...,...
Égypte,1250.0,1.0,110.0,1250.0,1118.0,-1.0,-23.0,12.96,96442591
Émirats arabes unis,412.0,94.0,433.0,412.0,48.0,0.0,-26.0,43.47,9487203
Équateur,341.0,0.0,0.0,324.0,340.0,0.0,-1.0,19.31,16785361
Éthiopie,14.0,0.0,1.0,14.0,14.0,0.0,0.0,0.13,106399924


In [41]:
data_final.shape

(167, 9)

In [42]:
# Export to CSV; the rest happens in the second notebook
data_final.to_csv('data_final.csv')
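Because `Zone` is the index, `to_csv` writes it as the first column of the file. A minimal sketch of how the second notebook might read it back with the index restored, using an in-memory buffer and a two-row stand-in (`mini`, `buffer` and `reloaded` are hypothetical names) instead of the real `data_final.csv`:

```python
import io

import pandas as pd

# Two-row stand-in for data_final, indexed by Zone.
mini = pd.DataFrame(
    {"Production": [28.0, 1667.0], "Population": [36296113, 57009756]},
    index=pd.Index(["Afghanistan", "Afrique du Sud"], name="Zone"),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
mini.to_csv(buffer)
buffer.seek(0)

# index_col="Zone" turns the first CSV column back into the index.
reloaded = pd.read_csv(buffer, index_col="Zone")
print(reloaded)
```

Without `index_col`, `Zone` would come back as an ordinary column and would have to be set as the index again before the PCA.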

To be continued...