## Life satisfaction and GDP per capita

Life satisfaction

# Source
This dataset was obtained from the OECD website at: http://stats.oecd.org/index.aspx?DataSetCode=BLI

# Data description

# Example usage with Python pandas

# Objective

The goal of this first part of the work is to build an SQL database that compiles, country by country, the OECD Better Life Index indicator and GDP.

GDP will be expressed in euros and the indicator in the units defined by the OECD.

The steps are to:
- create the database
- extract and convert the required data
- load the data into the database
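
The three steps above can be sketched end to end with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: SQLite (in memory) stands in for the SQL server, and the table and column names (`country_indicators`, `gdp_per_capita_eur`, `life_satisfaction`) are hypothetical. The two rows reuse values that appear later in this notebook.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real SQL server.
con = sqlite3.connect(":memory:")

# Step 1: create the table (hypothetical table and column names).
con.execute(
    """
    CREATE TABLE IF NOT EXISTS country_indicators (
        country TEXT PRIMARY KEY,
        gdp_per_capita_eur REAL,
        life_satisfaction REAL
    )
    """
)

# Step 2: rows already extracted and converted (values taken from this notebook).
rows = [("France", 40.659341, 6.5), ("Mexique", 9.89011, 6.5)]

# Step 3: load the rows with a parameterized insert.
con.executemany("INSERT INTO country_indicators VALUES (?, ?, ?)", rows)
con.commit()

stored = con.execute(
    "SELECT country, life_satisfaction FROM country_indicators ORDER BY country"
).fetchall()
print(stored)
con.close()
```

Parameterized inserts (`?` placeholders) avoid building SQL strings by hand, which matters once country names contain apostrophes or accents.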


# Resource

Dataset obtained from the IMF website at: http://goo.gl/j1MSKe

# Data description

# Example usage with Python pandas

In [222]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
import statsmodels.api as sm
from sklearn import linear_model

!pip install pays
from pays import Countries

pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 9999999)



In [223]:
df = pd.read_csv("BLI_24022020153204055.csv", index_col=None, na_values=['NA'])
df.head()

,LOCATION,Country,INDICATOR,Indicator,MEASURE,Measure,INEQUALITY,Inequality,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,Value,Flag Codes,Flags
0,AUS,Australia,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,5.4,,
1,AUT,Austria,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.5,,
2,BEL,Belgium,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.7,,
3,CAN,Canada,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,6.0,,
4,CZE,Czech Republic,JE_LMIS,Labour market insecurity,L,Value,TOT,Total,PC,Percentage,0,Units,,,3.1,,


In [224]:
df = df[df['INEQUALITY'] == "TOT"]
df = df.pivot(index="Country", columns="Indicator", values="Value")
df.head(5)

Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,Housing expenditure,Labour market insecurity,Life expectancy,Life satisfaction,Long-term unemployment rate,Personal earnings,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education
Australia,5.0,,81.0,13.04,73.0,63.5,1.1,32759.0,427064.0,20.0,5.4,82.5,7.3,1.31,49126.0,95.0,,85.0,2.7,502.0,14.35,91.0,93.0,21.0
Austria,16.0,0.9,85.0,6.66,72.0,80.6,0.5,33541.0,308325.0,21.0,3.5,81.7,7.1,1.84,50349.0,92.0,1.6,70.0,1.3,492.0,14.55,80.0,92.0,17.0
Belgium,15.0,1.9,77.0,4.75,63.0,70.1,1.0,30364.0,386006.0,21.0,3.7,81.5,6.9,3.54,49675.0,91.0,2.2,74.0,2.0,503.0,15.7,89.0,84.0,19.3
Brazil,10.0,6.7,49.0,7.13,61.0,35.6,26.7,,,,,74.8,6.4,,,90.0,,,2.2,395.0,,79.0,73.0,16.2
Canada,7.0,0.2,91.0,3.69,73.0,82.2,1.3,30854.0,423849.0,22.0,6.0,81.9,7.4,0.77,47622.0,93.0,2.6,88.0,2.9,523.0,14.56,68.0,91.0,17.3
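
The filter-then-pivot step can be seen on a tiny long-format table. This is a minimal sketch: the `TOT` values are copied from the outputs above, while the `MN` row and its value are a made-up placeholder that only serves to show why the filter comes first.

```python
import pandas as pd

# Long format: one row per (country, indicator, inequality group).
# The "MN" row is a hypothetical placeholder, not real data.
long_df = pd.DataFrame(
    {
        "Country": ["Australia", "Australia", "Austria", "Austria", "Australia"],
        "Indicator": [
            "Labour market insecurity",
            "Life satisfaction",
            "Labour market insecurity",
            "Life satisfaction",
            "Life satisfaction",
        ],
        "INEQUALITY": ["TOT", "TOT", "TOT", "TOT", "MN"],
        "Value": [5.4, 7.3, 3.5, 7.1, 0.0],
    }
)

# Keep only the totals so that each (Country, Indicator) pair is unique.
totals = long_df[long_df["INEQUALITY"] == "TOT"]

# Wide format: one row per country, one column per indicator.
wide = totals.pivot(index="Country", columns="Indicator", values="Value")
print(wide)
```

Without the filter, `pivot` would find two values for the same country and indicator and raise an error, because it cannot place both in one cell.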


In [225]:
df['Life satisfaction'].head()

Country
Australia    7.3
Austria      7.1
Belgium      6.9
Brazil       6.4
Canada       7.4
Name: Life satisfaction, dtype: float64

In [226]:
df_pib = pd.read_csv("WEO_Data.csv", encoding='latin1', na_values=['NA'])
df_pib.head(3)

,Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After,Unnamed: 7,Unnamed: 8,Unnamed: 9
0,Afghanistan,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,599.994,2013.0,
1,Albania,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,3.0,995.383,2010.0
2,Algeria,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,4.0,318.135,2014.0
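
Judging from the three rows shown above, figures written with a comma as thousands separator appear to have been split across two columns on load (Albania: `3.0` and `995.383`, followed by the year), while figures below 1,000 stay whole (Afghanistan: `599.994`). The following sketch shows one way such rows could be recombined. It assumes that a non-empty `Unnamed: 9` marks a split row, which is an inference from these three rows only and not a verified property of the file.

```python
import pandas as pd

# The three rows displayed above, reduced to the affected columns.
raw = pd.DataFrame(
    {
        "Unnamed: 7": [599.994, 3.0, 4.0],
        "Unnamed: 8": [2013.0, 995.383, 318.135],
        "Unnamed: 9": [None, 2010.0, 2014.0],
    },
    index=["Afghanistan", "Albania", "Algeria"],
)

# Assumption: when "Unnamed: 9" is filled, the number was split in two parts.
was_split = raw["Unnamed: 9"].notna()

# Keep the value as is for unsplit rows, otherwise rebuild thousands + remainder.
gdp_usd = raw["Unnamed: 7"].where(
    ~was_split, raw["Unnamed: 7"] * 1000 + raw["Unnamed: 8"]
)
print(gdp_usd)
```

If the source file can be re-exported with quoted fields or another delimiter, fixing the import is preferable to repairing the columns afterwards.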


In [227]:
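# Caution: for figures of 1,000 or more, "Unnamed: 7" appears to hold only the thousands
# part (e.g. 3.0 for Albania, with 995.383 left in "Unnamed: 8").
df_pib.rename(columns={"Unnamed: 7": "GDP per Capita"}, inplace=True)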
df_pib.set_index("Country", inplace=True)
df_pib.head(3)

Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After,GDP per Capita,Unnamed: 8,Unnamed: 9
Afghanistan,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,599.994,2013.0,
Albania,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,3.0,995.383,2010.0
Algeria,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,4.0,318.135,2014.0


In [228]:
del df_pib['Unnamed: 8']
df_pib.head(3)

Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After,GDP per Capita,Unnamed: 9
Afghanistan,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,599.994,
Albania,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,3.0,2010.0
Algeria,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,4.0,2014.0


In [229]:
del df_pib['Unnamed: 9']
df_pib.head(3)

Country,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After,GDP per Capita
Afghanistan,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,599.994
Albania,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,3.0
Algeria,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,4.0


In [230]:
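# astype returns a new Series and takes no inplace argument, so assign the result back
df_pib['GDP per Capita'] = df_pib['GDP per Capita'].astype(float)
df_pib['GDP per Capita']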

Country
Afghanistan                           599.994
Albania                                 3.000
Algeria                                 4.000
Angola                                  4.000
Antigua and Barbuda                    14.000
Argentina                              13.000
Armenia                                 3.000
Australia                              50.000
Austria                                43.000
Azerbaijan                              5.000
The Bahamas                            23.000
Bahrain                                23.000
Bangladesh                              1.000
Barbados                               15.000
Belarus                                 5.000
Belgium                                40.000
Belize                                  4.000
Benin                                 780.063
Bhutan                                  2.000
Bolivia                                 2.000
Bosnia and Herzegovina                  4.000
Botswana                  
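
As a side note on the conversion step, `astype` does not modify a Series in place; it returns a converted copy that has to be assigned. A minimal sketch, using string versions of three values from the output above as illustrative input:

```python
import pandas as pd

# Illustrative input: numbers stored as text.
values = pd.Series(["599.994", "3", "4"], name="GDP per Capita")

# astype returns a new Series; the original is left untouched.
converted = values.astype(float)

print(values.dtype, converted.dtype)
```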

In [231]:
full_country_set = pd.merge(left=df, right=df_pib, left_index=True, right_index=True)
full_country_set.sort_values(by="GDP per Capita", inplace=True)
full_country_set.head(3)

Country,Air pollution,Dwellings without basic facilities,Educational attainment,Employees working very long hours,Employment rate,Feeling safe walking alone at night,Homicide rate,Household net adjusted disposable income,Household net wealth,Housing expenditure,Labour market insecurity,Life expectancy,Life satisfaction,Long-term unemployment rate,Personal earnings,Quality of support network,Rooms per person,Self-reported health,Stakeholder engagement for developing regulations,Student skills,Time devoted to leisure and personal care,Voter turnout,Water quality,Years in education,Subject Descriptor,Units,Scale,Country/Series-specific Notes,2015,Estimates Start After,GDP per Capita
South Africa,22.0,37.0,73.0,18.12,43.0,36.1,13.7,,,18.0,,57.5,4.7,16.46,,88.0,,,,,14.92,73.0,67.0,,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,5.0
Colombia,10.0,23.9,54.0,26.56,67.0,44.4,24.5,,,17.0,,76.2,6.3,0.79,,89.0,1.2,,1.4,410.0,,53.0,75.0,14.1,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,6.0
Brazil,10.0,6.7,49.0,7.13,61.0,35.6,26.7,,,,,74.8,6.4,,,90.0,,,2.2,395.0,,79.0,73.0,16.2,Gross domestic product per capita,current prices,U.S. dollars,Units,See notes for: Gross domestic product,current prices (National currency) Population...,8.0
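
The merge above joins the two frames on their country index. With the default inner join, a country present in only one frame is dropped from the result. The sketch below illustrates this on three countries, with values copied from the outputs in this notebook.

```python
import pandas as pd

# Life satisfaction for two OECD-listed countries (values from the outputs above).
life = pd.DataFrame(
    {"Life satisfaction": [7.3, 6.4]}, index=["Australia", "Brazil"]
)

# GDP column as loaded in this notebook; Afghanistan has no life satisfaction row.
gdp = pd.DataFrame(
    {"GDP per Capita": [599.994, 50.0, 8.0]},
    index=["Afghanistan", "Australia", "Brazil"],
)

# Inner join on the index: only countries present in both frames survive.
merged = pd.merge(left=life, right=gdp, left_index=True, right_index=True)
merged = merged.sort_values(by="GDP per Capita")
print(merged)
```

Passing `how="outer"` instead would keep unmatched countries with missing values, which can help spot spelling differences between the two sources.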


## Test

In [232]:
full_country_set[['GDP per Capita', 'Life satisfaction']].loc['France']

GDP per Capita       37.0
Life satisfaction     6.5
Name: France, dtype: float64

In [233]:
full_country_set[['GDP per Capita', 'Life satisfaction']].loc['Mexico']

GDP per Capita       9.0
Life satisfaction    6.5
Name: Mexico, dtype: float64

In [234]:
#remove_indices = [0, 1, 6, 8, 33, 34, 35]
#keep_indices = list(set(range(36)) - set(remove_indices))
sample_data = full_country_set[["GDP per Capita", "Life satisfaction"]]

In [235]:
sample_data.to_csv('life_satisfaction_vs_gdp_per_capita.csv')

In [236]:
sample_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, South Africa to Luxembourg
Data columns (total 2 columns):
GDP per Capita       40 non-null float64
Life satisfaction    40 non-null float64
dtypes: float64(2)
memory usage: 2.2+ KB


In [237]:
columns = ["gdp_per_capita", "life_satisfaction"]
sample_data.columns = columns

In [238]:
sample_data

Country,gdp_per_capita,life_satisfaction
South Africa,5.0,4.7
Colombia,6.0,6.3
Brazil,8.0,6.4
Russia,9.0,5.8
Turkey,9.0,5.5
Mexico,9.0,6.5
Poland,12.0,6.1
Hungary,12.0,5.6
Latvia,13.0,5.9
Chile,13.0,6.5


In [239]:
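# The column is already float64 at this point, so both lines leave it unchanged;
# stripping commas would only matter if the values had been read as strings.
sample_data.gdp_per_capita = sample_data.gdp_per_capita.replace(',', '')
sample_data.gdp_per_capita = sample_data.gdp_per_capita.astype(float)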

In [240]:
sample_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, South Africa to Luxembourg
Data columns (total 2 columns):
gdp_per_capita       40 non-null float64
life_satisfaction    40 non-null float64
dtypes: float64(2)
memory usage: 2.2+ KB


In [241]:
# assign() builds a new DataFrame, which avoids pandas' SettingWithCopyWarning.
# Dividing treats 0.91 as the dollar value of one euro; if 0.91 is the euro value
# of one dollar, the conversion should multiply instead.
sample_data = sample_data.assign(gdp_per_capita=sample_data['gdp_per_capita'].div(0.91))


In [242]:
sample_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, South Africa to Luxembourg
Data columns (total 2 columns):
gdp_per_capita       40 non-null float64
life_satisfaction    40 non-null float64
dtypes: float64(2)
memory usage: 2.2+ KB


In [243]:
sample_data.loc['France']

gdp_per_capita       40.659341
life_satisfaction     6.500000
Name: France, dtype: float64
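
The France figure above, 40.659341, is 37.0 divided by 0.91. Whether dividing is the right operation depends on how the rate is quoted, so it can help to make the direction explicit. In the sketch below both function names are hypothetical, and 0.91 is simply the rate used in this notebook, not a verified exchange rate.

```python
def usd_to_eur(amount_usd, eur_per_usd):
    """Convert dollars to euros when the rate is quoted as euros per dollar."""
    return amount_usd * eur_per_usd


def usd_to_eur_from_inverse(amount_usd, usd_per_eur):
    """Convert dollars to euros when the rate is quoted as dollars per euro."""
    return amount_usd / usd_per_eur


france_usd = 37.0  # France's "GDP per Capita" value as loaded in this notebook

# Reading 0.91 as dollars per euro reproduces the notebook's result.
as_in_notebook = usd_to_eur_from_inverse(france_usd, 0.91)

# Reading 0.91 as euros per dollar gives a lower figure.
alternative = usd_to_eur(france_usd, 0.91)

print(round(as_in_notebook, 6), round(alternative, 2))
```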

In [244]:
sample_data = sample_data.rename(columns={'gdp_per_capita': 'PIB par habitant en €'})


In [245]:
sample_data = sample_data.rename(columns={'life_satisfaction': 'indicateur du vivre mieux'})

In [246]:
#sample_data.index.name = 'Noms de Pays'

In [247]:
sample_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, South Africa to Luxembourg
Data columns (total 2 columns):
PIB par habitant en €        40 non-null float64
indicateur du vivre mieux    40 non-null float64
dtypes: float64(2)
memory usage: 960.0+ bytes


In [248]:
#sample_data = sample_data.pivot(index = 'Country').reset_index(); sample_data.columns.name = None

# Trial translation of country names

In [249]:
sample_data = sample_data.reset_index()

In [250]:
sample_data

,Country,PIB par habitant en €,indicateur du vivre mieux
0,South Africa,5.494505,4.7
1,Colombia,6.593407,6.3
2,Brazil,8.791209,6.4
3,Russia,9.89011,5.8
4,Turkey,9.89011,5.5
5,Mexico,9.89011,6.5
6,Poland,13.186813,6.1
7,Hungary,13.186813,5.6
8,Latvia,14.285714,5.9
9,Chile,14.285714,6.5


In [251]:
# Install library
%pip install country-list

Note: you may need to restart the kernel to use updated packages.


In [252]:
# Import library
from country_list import countries_for_language
# All countries in English
countries_en = dict(countries_for_language('en'))
#countries_en

In [253]:
# All countries in French
countries_fr = dict(countries_for_language('fr'))
#countries_fr

In [254]:
countries_en['p1'] = 'Czech Republic' 
countries_en['p2'] = 'Korea'
countries_en['p3'] = 'Slovak Republic' 

countries_fr['p1'] = 'République Tchèque' 
countries_fr['p2'] = 'Corée'
countries_fr['p3'] = 'République Slovaque' 

In [255]:
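# Reverse lookup: English name -> country code, then code -> French name.
# A KeyError here means an English name has no exact match in countries_en.
code_by_name = {name: code for code, name in countries_en.items()}
pays = [countries_fr[code_by_name[country]] for country in sample_data.Country]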

In [256]:
pays

['Afrique du Sud',
 'Colombie',
 'Brésil',
 'Russie',
 'Turquie',
 'Mexique',
 'Pologne',
 'Hongrie',
 'Lettonie',
 'Chili',
 'Lituanie',
 'République Slovaque',
 'Estonie',
 'République Tchèque',
 'Grèce',
 'Portugal',
 'Slovénie',
 'Espagne',
 'Corée',
 'Italie',
 'Japon',
 'Israël',
 'France',
 'Nouvelle-Zélande',
 'Allemagne',
 'Belgique',
 'Finlande',
 'Pays-Bas',
 'Canada',
 'Autriche',
 'Royaume-Uni',
 'Suède',
 'Australie',
 'Islande',
 'Irlande',
 'Danemark',
 'États-Unis',
 'Norvège',
 'Suisse',
 'Luxembourg']
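
A name-to-name translation like this one depends on every English name having an exact match in the source dictionary. The sketch below shows a variant that also reports the names it could not match. The two small dictionaries are stand-ins for the `country_list` data, and `translate` is a hypothetical helper, not part of that library.

```python
# Tiny stand-ins for the country_list dictionaries (code -> name).
names_en = {"FR": "France", "MX": "Mexico", "p2": "Korea"}
names_fr = {"FR": "France", "MX": "Mexique", "p2": "Corée"}


def translate(names, source, target):
    """Translate names via their shared codes; collect names with no match."""
    code_by_name = {name: code for code, name in source.items()}
    translated, unmatched = [], []
    for name in names:
        code = code_by_name.get(name)
        if code is None:
            unmatched.append(name)
        else:
            translated.append(target[code])
    return translated, unmatched


# "Atlantis" is a deliberately unknown name to exercise the unmatched path.
translated, unmatched = translate(["Mexico", "Korea", "Atlantis"], names_en, names_fr)
print(translated, unmatched)
```

Checking that `unmatched` is empty before assigning the translated list back to the frame guards against a length mismatch.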

In [257]:
sample_data['Country'] = pays

In [258]:
sample_data

,Country,PIB par habitant en €,indicateur du vivre mieux
0,Afrique du Sud,5.494505,4.7
1,Colombie,6.593407,6.3
2,Brésil,8.791209,6.4
3,Russie,9.89011,5.8
4,Turquie,9.89011,5.5
5,Mexique,9.89011,6.5
6,Pologne,13.186813,6.1
7,Hongrie,13.186813,5.6
8,Lettonie,14.285714,5.9
9,Chili,14.285714,6.5


In [259]:
sample_data = sample_data.rename(columns={'Country': 'Noms pays'})

In [260]:
sample_data

,Noms pays,PIB par habitant en €,indicateur du vivre mieux
0,Afrique du Sud,5.494505,4.7
1,Colombie,6.593407,6.3
2,Brésil,8.791209,6.4
3,Russie,9.89011,5.8
4,Turquie,9.89011,5.5
5,Mexique,9.89011,6.5
6,Pologne,13.186813,6.1
7,Hongrie,13.186813,5.6
8,Lettonie,14.285714,5.9
9,Chili,14.285714,6.5


In [261]:

'''#import important libraries
from urllib.request import urlopen   
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import datetime
import urllib.request as request
import json

import mysql.connector
from mysql.connector import errorcode
import pymysql
from sqlalchemy import create_engine'''


In [262]:
sample_data.to_csv('oecd_pib_db.csv')
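
Besides the CSV export, the same frame could be written straight to an SQL table with `DataFrame.to_sql`. This is a sketch under assumptions: an in-memory SQLite connection stands in for the MySQL server targeted further down, and the table name `oecd_pib` is borrowed from the database name used there. The two rows are copied from the output above.

```python
import sqlite3

import pandas as pd

# Two rows copied from the translated table above.
frame = pd.DataFrame(
    {
        "Noms pays": ["Afrique du Sud", "Colombie"],
        "PIB par habitant en €": [5.494505, 6.593407],
        "indicateur du vivre mieux": [4.7, 6.3],
    }
)

# Assumption: in-memory SQLite instead of the MySQL server.
con = sqlite3.connect(":memory:")

# Create (or replace) the table and load every row; the index is not stored.
frame.to_sql("oecd_pib", con, if_exists="replace", index=False)

# Read the table back to confirm the round trip.
back = pd.read_sql_query("SELECT * FROM oecd_pib", con)
con.close()
print(back)
```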

In [108]:
# Establish the database connection, create the database, and handle errors
# Wrap the calls in try/except to catch every mysql.connector.Error
'''def create_database_and_connect_to_mysql():
    con, db_cursor = None, None
    try:
        # mysql.connector takes the host and the port as separate arguments
        con = mysql.connector.connect(host='localhost', port=3306, user='root', passwd='1234')

        db_cursor = con.cursor(buffered=True)

        # create the database
        db_cursor.execute('CREATE DATABASE IF NOT EXISTS oecd_pib')
    except mysql.connector.Error as err:
        if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
            print("Error with user name or password")
        elif err.errno == errorcode.ER_BAD_DB_ERROR:
            print("Database does not exist or database name Error")
        else:
            print(err)

    # both values stay None if the connection failed
    return db_cursor, con


create_database_and_connect_to_mysql()'''

In [109]:
# Check that the database was created
'''def check_databases():
    cur, conn = create_database_and_connect_to_mysql()

    # show all databases
    cur.execute("SHOW DATABASES")
    for x in cur:
        print(x)

check_databases()'''