# Data manipulation

In this section, we will work with the GDP data in the file `country_gdp_per_capita.csv`.

The data comes from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD.

We start by opening the file to understand how it is organized.

In [42]:
# import pandas
import pandas as pd

## read_csv
The `read_csv` function loads a `.csv` file into a DataFrame. It takes many arguments, to be chosen according to the format of the file.

The official documentation is here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

In [43]:
# load the .csv file into a DataFrame, with ';' as separator and ',' as decimal mark
df = pd.read_csv('country_gdp_per_capita.csv', sep=';', decimal=',')
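
To see what `sep` and `decimal` change, here is a minimal sketch that reads a small in-memory sample instead of the course file. The two rows and the name `df_sample` are made up for illustration; only the layout (`;` between fields, `,` as decimal mark) is taken from the call above.

```python
import io
import pandas as pd

# Hypothetical two-row sample using the same layout as the call above:
# fields separated by ';' and decimals written with ','.
sample = io.StringIO(
    "Country Name;Country Code;2021;2022\n"
    "Aruba;ABW;29127,75938;33300,83882\n"
    "Angola;AGO;1927,474078;3000,444231\n"
)

df_sample = pd.read_csv(sample, sep=';', decimal=',')

# With decimal=',' the year columns are parsed as floats rather than text.
print(df_sample.dtypes)
```

Without `decimal=','`, values such as `29127,75938` would be kept as text, and numeric operations on those columns would not work as expected.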

## read_excel

The `read_excel` function loads an Excel file into a DataFrame. It takes many arguments, to be chosen according to the format of the file.

The official documentation is here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

In [44]:
# load the .xlsx file into a DataFrame
df_metadata = pd.read_excel('metadata_country.xlsx')

## head and tail

The `head` and `tail` methods show the first and last rows of the DataFrame, respectively. They are very useful for getting a quick overview of the DataFrame.

In [45]:
# first 10 rows
df.head(10)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Aruba,ABW,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,26940.26411,28419.26453,28449.71295,29329.08175,30918.48358,31902.80982,24008.12782,29127.75938,33300.83882,
1,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554.0,144.342434,148.774835,157.04758,166.849791,177.769086,...,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829,
2,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451.0,62.443703,60.950364,82.021738,85.511073,105.243196,...,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,,
3,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061.0,112.128417,117.814663,122.370114,130.700278,137.301801,...,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219,
4,Angola,AGO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,5011.984412,3217.339244,1809.709377,2439.374441,2540.508878,2191.347764,1450.905112,1927.474078,3000.444231,
5,Albania,ALB,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,4578.633208,3952.803574,4124.05539,4531.032207,5287.660801,5396.214243,5343.037704,6377.203096,6810.114041,
6,Andorra,AND,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,45680.54694,38885.54859,39931.23626,40632.20898,42904.81159,41328.61239,37207.23887,42072.31942,41992.77278,
7,Arab World,ARB,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,7306.051166,6287.314935,6117.871507,6230.762079,6579.417105,6504.789633,5644.14257,6419.161029,7625.252464,
8,United Arab Emirates,ARE,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,46865.9646,41525.1389,41054.53957,43063.96748,46722.26872,45376.17084,37629.17417,44332.34005,53707.98008,
9,Argentina,ARG,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,12334.79825,13789.06042,12790.26414,14613.03565,11795.16275,9963.674162,8500.837939,10650.86046,13650.60463,


In [46]:
# last 5 rows
df.tail(5)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
261,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,3902.530841,3520.782075,3759.472855,4009.353811,4384.18868,4416.029253,4310.934002,5269.783901,5340.268798,
262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.51201,543.637538,650.272218
263,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,681.131111,...,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254,
264,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,296.022427,...,1724.57622,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.90157,
265,Zimbabwe,ZWE,GDP per capita (current US$),NY.GDP.PCAP.CD,2766433633,279.332656,275.966139,277.532515,282.376856,294.893605,...,1407.034291,1410.329173,1421.787791,1192.107012,2269.177012,1421.868596,1372.696674,1773.920411,1676.821489,


## dtypes

The `dtypes` attribute gives the data type of each column.

In [47]:
# column types
df.dtypes

Country Name       object
Country Code       object
Indicator Name     object
Indicator Code     object
1960               object
                   ...   
2019              float64
2020              float64
2021              float64
2022              float64
2023              float64
Length: 68, dtype: object
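
In the output above, the `1960` column is reported as `object` while the later years are `float64`. One possible way to turn such a column into numbers is `pd.to_numeric`. The sketch below uses a made-up column, not the course data, and whether coercing unparseable values to `NaN` is appropriate depends on why the column was read as text in the first place.

```python
import pandas as pd

# Hypothetical column mixing numeric strings with a stray text value.
df_sample = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '1960': ['144.3', 'not a number', None],
})

print(df_sample.dtypes)  # '1960' is not a numeric dtype at this point

# errors='coerce' turns values that cannot be parsed into NaN.
df_sample['1960'] = pd.to_numeric(df_sample['1960'], errors='coerce')
print(df_sample.dtypes)  # '1960' is now float64
```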

## columns

The `columns` attribute returns the column names.

In [48]:
# column names
df.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022',
       '2023'],
      dtype='object')

## describe

The `describe` method returns descriptive statistics for the numeric columns.

In [49]:
# descriptive statistics
df.describe()

Unnamed: 0,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
count,127.0,128.0,128.0,128.0,139.0,142.0,146.0,151.0,151.0,160.0,...,260.0,259.0,258.0,258.0,258.0,259.0,258.0,256.0,243.0,11.0
mean,502.976697,520.404631,553.468201,604.577825,673.803865,731.469716,745.022845,760.15128,820.672635,996.443665,...,17064.380632,15521.545235,15768.631337,16523.962131,17507.516485,17445.304475,16426.99865,18233.2753,18049.821164,14989.974736
std,651.69428,684.418041,727.363933,793.862993,874.272171,948.220722,979.174039,1006.42312,1083.502348,1493.251714,...,25905.040269,23524.477663,23699.741977,24412.973978,26002.720821,25805.132404,24825.62828,28549.377752,27221.317318,17964.502908
min,27.27463,27.953179,25.80701,17.331833,15.113227,11.792676,16.523528,21.500298,21.450314,20.655084,...,257.818557,289.359627,242.539527,244.145422,232.060617,216.972971,216.827417,221.157803,259.025031,650.272218
25%,107.718713,111.533599,112.351406,119.885298,139.029272,145.356579,139.935403,148.595671,150.691255,186.882516,...,2157.278578,2074.992015,2095.783356,2116.826848,2191.952298,2214.933292,2160.593455,2322.371311,2337.05615,1728.751652
50%,191.907441,188.794331,207.681429,223.273188,243.13056,253.550917,249.467378,264.227758,282.6816,326.386292,...,6904.585574,6204.929901,6108.914305,6443.380323,6929.960914,6955.880824,6188.234459,7151.905796,7182.265382,4295.407496
75%,555.974983,574.788985,603.571809,675.38774,784.041552,884.721621,857.226487,798.869953,876.480505,1014.516177,...,20314.151055,18168.196055,18688.78761,20264.31379,22054.13513,21649.08957,19460.492775,21139.835847,20831.15572,31940.43765
max,3066.562869,3243.843078,3374.515171,3573.941185,4081.424492,4228.745371,4336.426587,4695.92339,5032.144743,12077.76404,...,195772.7243,170338.6804,174412.4945,173611.6879,193968.0901,199382.8386,182537.3874,235132.7842,240862.1824,48983.62172


## set_index

The `set_index` method changes the index.

In [50]:
# use the Country Name column as the index; drop=False also keeps it as a regular column
df.set_index(["Country Name"],drop=False)

Country Name,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
Aruba,Aruba,ABW,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,26940.264110,28419.264530,28449.712950,29329.081750,30918.483580,31902.809820,24008.127820,29127.759380,33300.838820,
Africa Eastern and Southern,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554,144.342434,148.774835,157.047580,166.849791,177.769086,...,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829,
Afghanistan,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451,62.443703,60.950364,82.021738,85.511073,105.243196,...,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,,
Africa Western and Central,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061,112.128417,117.814663,122.370114,130.700278,137.301801,...,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219,
Angola,Angola,AGO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,5011.984412,3217.339244,1809.709377,2439.374441,2540.508878,2191.347764,1450.905112,1927.474078,3000.444231,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Kosovo,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,3902.530841,3520.782075,3759.472855,4009.353811,4384.188680,4416.029253,4310.934002,5269.783901,5340.268798,
Yemen,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538,650.272218
South Africa,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,681.131111,...,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254,
Zambia,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,296.022427,...,1724.576220,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.901570,
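
Note that `set_index` returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch with made-up sample rows (`df_sample` and `indexed` are illustrative names):

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['Aruba', 'Angola'],
    '2022': [33300.83882, 3000.444231],
})

# set_index returns a new DataFrame; df_sample keeps its numeric index.
indexed = df_sample.set_index('Country Name')

# Rows can now be looked up by label.
print(indexed.loc['Angola', '2022'])
```

Here the default `drop=True` is used, so `Country Name` is no longer a regular column of `indexed`.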


## reset_index

The `reset_index` method restores the default numeric index (from 0 to the number of rows minus 1).

In [51]:
# reset the index; with drop=False the old index is kept as a column
df.reset_index(drop=False)

Unnamed: 0,index,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,0,Aruba,ABW,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,...,26940.264110,28419.264530,28449.712950,29329.081750,30918.483580,31902.809820,24008.127820,29127.759380,33300.838820,
1,1,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554,144.342434,148.774835,157.047580,166.849791,...,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829,
2,2,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451,62.443703,60.950364,82.021738,85.511073,...,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,,
3,3,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061,112.128417,117.814663,122.370114,130.700278,...,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219,
4,4,Angola,AGO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,...,5011.984412,3217.339244,1809.709377,2439.374441,2540.508878,2191.347764,1450.905112,1927.474078,3000.444231,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,261,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,...,3902.530841,3520.782075,3759.472855,4009.353811,4384.188680,4416.029253,4310.934002,5269.783901,5340.268798,
262,262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538,650.272218
263,263,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,...,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254,
264,264,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,...,1724.576220,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.901570,
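
The effect of `drop` is easier to see when the index carries information. A minimal sketch, assuming a small made-up frame indexed by country name:

```python
import pandas as pd

indexed = pd.DataFrame(
    {'2022': [33300.83882, 3000.444231]},
    index=pd.Index(['Aruba', 'Angola'], name='Country Name'),
)

kept = indexed.reset_index(drop=False)    # old index becomes a column
dropped = indexed.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```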


## sort_values

The `sort_values` method sorts the data in ascending or descending order.

In [52]:
# sort in ascending order
df.sort_values(by='2022', ascending=True)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
16,Burundi,BDI,GDP per capita (current US$),NY.GDP.PCAP.CD,7136022425,72.088782,73.942008,78.948269,85.964725,50.990420,...,257.818557,289.359627,242.539527,244.145422,232.060617,216.972971,216.827417,221.157803,259.025031,
34,Central African Republic,CAF,GDP per capita (current US$),NY.GDP.PCAP.CD,6677009521,71.993203,71.439100,72.839829,78.393525,81.428270,...,394.856933,351.879755,372.135456,414.740322,435.932297,426.408753,435.469248,461.137511,427.058096,
210,Sierra Leone,SLE,GDP per capita (current US$),NY.GDP.PCAP.CD,1399861255,140.098309,143.887993,143.690390,150.448578,142.635576,...,702.338588,581.293412,515.447840,484.456129,519.649964,506.606914,493.432241,504.621288,475.795728,
151,Madagascar,MDG,GDP per capita (current US$),NY.GDP.PCAP.CD,1326702842,134.293094,138.362478,138.459331,142.536192,144.203832,...,517.136183,455.638035,464.616158,503.498059,512.543991,512.279666,462.404229,503.352081,516.592616,
262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538,650.272218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
227,Syrian Arab Republic,SYR,GDP per capita (current US$),NY.GDP.PCAP.CD,1860242872,198.934696,226.877838,237.936395,257.454145,247.714366,...,1071.234204,857.497867,664.341672,862.319063,1111.872092,1124.520554,537.090235,420.622705,,
239,Tonga,TON,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,4125.425454,4117.931047,3978.437686,4367.256603,4649.613301,4878.978686,4605.970841,4425.971492,,
254,Venezuela,RB,VEN,GDP per capita (current US$),"NY,GDP,PCAP,CD",939.560806,954.355361,1006.879977,1060.570324,874.199411,...,12433.980790,15975.729380,,,,,,,,
255,British Virgin Islands,VGB,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,,,,,,,,,,


In [53]:
# sort in descending order
df.sort_values(by='2022', ascending=False)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
149,Monaco,MCO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,195772.724300,170338.680400,174412.494500,173611.687900,193968.090100,199382.838600,182537.387400,235132.784200,240862.1824,
144,Luxembourg,LUX,GDP per capita (current US$),NY.GDP.PCAP.CD,2242015817,2222.366366,2311.798849,2441.038555,2755.633117,2780.092719,...,123678.702100,105462.012600,106899.293500,110193.213800,116786.511700,112726.439700,116905.370400,133711.794400,125006.0218,
27,Bermuda,BMU,GDP per capita (current US$),NY.GDP.PCAP.CD,1902402085,1961.538135,2020.385929,2020.265212,2199.726968,2282.216573,...,98467.683990,102005.625600,106885.878500,111820.581500,113050.736900,116153.166100,107791.886400,111774.669100,118774.7907,
177,Norway,NOR,GDP per capita (current US$),NY.GDP.PCAP.CD,144175566,1560.324931,1667.247430,1775.582655,1937.884614,2164.468823,...,97666.695180,74809.965800,70867.361000,76131.838400,82792.842710,76430.588950,68340.018100,93072.892510,108729.1869,
111,Ireland,IRL,GDP per capita (current US$),NY.GDP.PCAP.CD,6856147124,739.276406,797.006288,852.135302,965.135423,1023.773726,...,55752.764980,62179.264270,62784.065690,70150.737020,79446.939110,80848.301900,85973.088490,102001.798200,103983.2913,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
227,Syrian Arab Republic,SYR,GDP per capita (current US$),NY.GDP.PCAP.CD,1860242872,198.934696,226.877838,237.936395,257.454145,247.714366,...,1071.234204,857.497867,664.341672,862.319063,1111.872092,1124.520554,537.090235,420.622705,,
239,Tonga,TON,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,4125.425454,4117.931047,3978.437686,4367.256603,4649.613301,4878.978686,4605.970841,4425.971492,,
254,Venezuela,RB,VEN,GDP per capita (current US$),"NY,GDP,PCAP,CD",939.560806,954.355361,1006.879977,1060.570324,874.199411,...,12433.980790,15975.729380,,,,,,,,
255,British Virgin Islands,VGB,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,,,,,,,,,,
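
Both outputs above end with the same rows because countries with no 2022 value are placed last regardless of the sort direction. The sketch below illustrates this on made-up data and shows the `na_position` argument, which changes where missing values go.

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '2022': [500.0, None, 100.0],
})

# By default missing values go last, whatever the sort direction.
asc = df_sample.sort_values(by='2022', ascending=True)
desc = df_sample.sort_values(by='2022', ascending=False)

# na_position='first' moves them to the top instead.
na_first = df_sample.sort_values(by='2022', na_position='first')

print(asc['Country Name'].tolist())
print(desc['Country Name'].tolist())
print(na_first['Country Name'].tolist())
```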


## Selecting part of the DataFrame

The desired columns are selected by passing their names as a list.

In [54]:
# select the GDP values for 1960, 1980, 2000 and 2020
df[['Country Name', '1960', '1980', '2000', '2020']]

Unnamed: 0,Country Name,1960,1980,2000,2020
0,Aruba,,,21026.167090,24008.127820
1,Africa Eastern and Southern,1413859554,720.771506,710.665706,1355.805923
2,Afghanistan,6236937451,291.649791,,512.055098
3,Africa Western and Central,1070537061,709.848876,519.757395,1688.075575
4,Angola,,712.369763,556.884244,1450.905112
...,...,...,...,...,...
261,Kosovo,,,,4310.934002
262,Yemen,"NY,GDP,PCAP,CD",,421.723115,693.816504
263,South Africa,5295619227,3034.660366,3241.661396,5753.066494
264,Zambia,2285673985,678.774900,364.026145,956.831729
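
The number of brackets matters: a single column name returns a Series, while a list of names returns a DataFrame. A minimal sketch on made-up rows:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['Aruba', 'Angola'],
    '2000': [21026.16709, 556.884244],
    '2020': [24008.12782, 1450.905112],
})

one_column = df_sample['2020']                   # a Series
sub_frame = df_sample[['Country Name', '2020']]  # a DataFrame

print(type(one_column).__name__, type(sub_frame).__name__)
```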


## drop_duplicates

The `drop_duplicates` method removes duplicated values and returns the unique records.


In [55]:
# remove duplicated country names
df["Country Name"].drop_duplicates()

0                            Aruba
1      Africa Eastern and Southern
2                      Afghanistan
3       Africa Western and Central
4                           Angola
                  ...             
261                         Kosovo
262                          Yemen
263                   South Africa
264                         Zambia
265                       Zimbabwe
Name: Country Name, Length: 264, dtype: object
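
Applied to a whole DataFrame, `drop_duplicates` compares entire rows by default. The `subset` argument restricts the comparison to some columns, which keeps the other columns in the result. A minimal sketch on made-up data:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['Aruba', 'Aruba', 'Angola'],
    '2022': [1.0, 2.0, 3.0],
})

# subset restricts the comparison to the listed columns;
# keep='first' (the default) retains the first row of each group.
unique_rows = df_sample.drop_duplicates(subset=['Country Name'], keep='first')

print(unique_rows)
```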

## dropna

The `dropna` method removes the rows that contain missing values.

In [56]:
# keep only the countries that have a GDP value for every year
df.dropna()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
23,Bahamas,The,BHS,GDP per capita (current US$),"NY,GDP,PCAP,CD",1483.003682,1581.30398,1680.494849,1792.422274,1921.437235,...,26955.78893,28203.35568,29724.95334,29675.53589,30708.98702,31483.97884,32279.01136,23998.26802,28260.43255,31458.30081
44,Congo,Rep.,COG,GDP per capita (current US$),"NY,GDP,PCAP,CD",124.782359,139.995753,149.702467,150.738168,158.135593,...,3719.651036,3623.80746,2455.34791,2107.503024,2227.720139,2715.243105,2508.944783,2011.269479,2540.473212,2649.225022
96,Hong Kong SAR,China,HKG,GDP per capita (current US$),"NY,GDP,PCAP,CD",424.056554,436.754412,487.821134,565.72781,629.591526,...,38403.77771,40315.37395,42432.16197,43734.19807,46160.42979,48537.56689,48359.0012,46109.22999,49764.79312,48983.62172
126,Korea,Rep.,KOR,GDP per capita (current US$),"NY,GDP,PCAP,CD",158.274136,93.831383,106.159703,146.302493,123.606375,...,27179.51701,29252.93124,28737.43917,29280.44032,31600.73587,33447.15628,31902.4169,31721.29891,35142.26427,32422.57449
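
Requiring every column to be filled is strict, as the short result above shows. The `subset` and `how` arguments relax the condition. A minimal sketch on made-up data:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '2021': [10.0, None, 30.0],
    '2022': [None, None, 35.0],
})

# Default: drop a row as soon as one value is missing.
all_present = df_sample.dropna()

# Only look at the 2021 column.
has_2021 = df_sample.dropna(subset=['2021'])

# Drop a row only if both year columns are missing.
any_value = df_sample.dropna(how='all', subset=['2021', '2022'])

print(all_present['Country Name'].tolist())
print(has_2021['Country Name'].tolist())
print(any_value['Country Name'].tolist())
```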


## fillna

The `fillna` method replaces missing values.

In [57]:
# replace missing values with 0
df.fillna(0)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Aruba,ABW,GDP per capita (current US$),NY.GDP.PCAP.CD,0,0.000000,0.000000,0.000000,0.000000,0.000000,...,26940.264110,28419.264530,28449.712950,29329.081750,30918.483580,31902.809820,24008.127820,29127.759380,33300.838820,0.000000
1,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554,144.342434,148.774835,157.047580,166.849791,177.769086,...,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829,0.000000
2,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451,62.443703,60.950364,82.021738,85.511073,105.243196,...,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,0.000000,0.000000
3,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061,112.128417,117.814663,122.370114,130.700278,137.301801,...,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219,0.000000
4,Angola,AGO,GDP per capita (current US$),NY.GDP.PCAP.CD,0,0.000000,0.000000,0.000000,0.000000,0.000000,...,5011.984412,3217.339244,1809.709377,2439.374441,2540.508878,2191.347764,1450.905112,1927.474078,3000.444231,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,0,0.000000,0.000000,0.000000,0.000000,0.000000,...,3902.530841,3520.782075,3759.472855,4009.353811,4384.188680,4416.029253,4310.934002,5269.783901,5340.268798,0.000000
262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",0.000000,0.000000,0.000000,0.000000,0.000000,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538,650.272218
263,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,681.131111,...,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254,0.000000
264,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,296.022427,...,1724.576220,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.901570,0.000000
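
`fillna` also accepts a dict, so each column can receive its own replacement value. The sketch below uses made-up data, and the replacement values are arbitrary placeholders. Keep in mind that filling a missing GDP with 0 changes statistics such as the mean.

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['A', 'B'],
    '2021': [10.0, None],
    '2022': [None, 40.0],
})

# A dict maps each column to its own replacement value.
filled = df_sample.fillna({'2021': 0, '2022': -1})

# The original DataFrame is not modified.
print(filled)
```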


## drop

The `drop` method removes rows (`axis=0`) or columns (`axis=1`).

In [58]:
# drop the first and fifth rows
df.drop([0, 4], axis= 0)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
1,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554,144.342434,148.774835,157.047580,166.849791,177.769086,...,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829,
2,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451,62.443703,60.950364,82.021738,85.511073,105.243196,...,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,,
3,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061,112.128417,117.814663,122.370114,130.700278,137.301801,...,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219,
5,Albania,ALB,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,4578.633208,3952.803574,4124.055390,4531.032207,5287.660801,5396.214243,5343.037704,6377.203096,6810.114041,
6,Andorra,AND,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,45680.546940,38885.548590,39931.236260,40632.208980,42904.811590,41328.612390,37207.238870,42072.319420,41992.772780,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,3902.530841,3520.782075,3759.472855,4009.353811,4384.188680,4416.029253,4310.934002,5269.783901,5340.268798,
262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,,...,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538,650.272218
263,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,681.131111,...,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254,
264,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,296.022427,...,1724.576220,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.901570,


In [59]:
# drop the 2023 column
df.drop('2023', axis=1)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,Aruba,ABW,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,26514.868980,26940.264110,28419.264530,28449.712950,29329.081750,30918.483580,31902.809820,24008.127820,29127.759380,33300.838820
1,Africa Eastern and Southern,AFE,GDP per capita (current US$),NY.GDP.PCAP.CD,1413859554,144.342434,148.774835,157.047580,166.849791,177.769086,...,1736.849038,1725.332959,1554.167299,1444.003514,1625.286236,1558.307482,1507.982881,1355.805923,1545.613215,1644.062829
2,Afghanistan,AFG,GDP per capita (current US$),NY.GDP.PCAP.CD,6236937451,62.443703,60.950364,82.021738,85.511073,105.243196,...,638.733185,626.512931,566.881133,523.053012,526.140801,492.090632,497.741429,512.055098,355.777826,
3,Africa Western and Central,AFW,GDP per capita (current US$),NY.GDP.PCAP.CD,1070537061,112.128417,117.814663,122.370114,130.700278,137.301801,...,2154.150832,2248.316255,1882.264038,1648.762676,1590.277754,1735.374911,1812.446822,1688.075575,1766.943618,1785.312219
4,Angola,AGO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,5061.349240,5011.984412,3217.339244,1809.709377,2439.374441,2540.508878,2191.347764,1450.905112,1927.474078,3000.444231
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,XKX,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,3704.562803,3902.530841,3520.782075,3759.472855,4009.353811,4384.188680,4416.029253,4310.934002,5269.783901,5340.268798
262,Yemen,Rep.,YEM,GDP per capita (current US$),"NY,GDP,PCAP,CD",,,,,,...,1349.990610,1497.747941,1557.601406,1488.416269,1069.816998,893.716494,701.714869,693.816504,578.512010,543.637538
263,South Africa,ZAF,GDP per capita (current US$),NY.GDP.PCAP.CD,5295619227,543.042224,560.699395,601.599951,642.688431,681.131111,...,7441.230854,6965.137897,6204.929901,5735.066787,6734.475153,7067.724165,6702.526617,5753.066494,7073.612754,6766.481254
264,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,2285673985,216.274674,208.562685,209.453362,236.941713,296.022427,...,1840.320553,1724.576220,1307.909649,1249.923143,1495.752138,1475.199883,1268.120941,956.831729,1134.713454,1456.901570
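
Instead of `axis`, `drop` also accepts the `index` and `columns` keywords, which some readers find more explicit. A minimal sketch on made-up data:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '2022': [1.0, 2.0, 3.0],
    '2023': [None, None, None],
})

fewer_rows = df_sample.drop(index=[0, 2])      # same as drop([0, 2], axis=0)
fewer_cols = df_sample.drop(columns=['2023'])  # same as drop('2023', axis=1)

print(fewer_rows['Country Name'].tolist())
print(fewer_cols.columns.tolist())
```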


## insert

The `insert` method adds a column at a given position.

In [60]:
# insert the 2024 column at the end of the DataFrame
df.insert(len(df.columns), '2024', 100)
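
Unlike most of the methods shown so far, `insert` modifies the DataFrame in place and returns `None`, so running the same cell twice fails because the column already exists. A minimal sketch on made-up data (the country codes are placeholders):

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['A', 'B'],
    '2023': [1.0, 2.0],
})

# insert modifies df_sample in place and returns None.
result = df_sample.insert(1, 'Country Code', ['AAA', 'BBB'])

# Inserting a label that already exists raises ValueError by default.
try:
    df_sample.insert(0, '2023', 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df_sample.columns.tolist(), duplicate_rejected)
```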

## Filtering with loc

The `loc` indexer filters the data and returns a DataFrame with the matching rows. Conditions built from comparisons on numbers and from string operations can be used with `loc`.

In [61]:
# select France
df.loc[df["Country Name"] == 'France']

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
77,France,FRA,GDP per capita (current US$),NY.GDP.PCAP.CD,1333881573,1430.434624,1585.735311,1758.856659,1928.999402,2060.299715,...,36652.92231,37062.53357,38781.04949,41557.85486,40494.89829,39179.74426,43671.30841,40886.25327,,100


In [62]:
# select France, Japan and Argentina
df.loc[df["Country Name"].isin(['France', 'Japan', 'Argentina'])]

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
9,Argentina,ARG,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,13789.06042,12790.26414,14613.03565,11795.16275,9963.674162,8500.837939,10650.86046,13650.60463,,100
77,France,FRA,GDP per capita (current US$),NY.GDP.PCAP.CD,1333881573.0,1430.434624,1585.735311,1758.856659,1928.999402,2060.299715,...,36652.92231,37062.53357,38781.04949,41557.85486,40494.89829,39179.74426,43671.30841,40886.25327,,100
119,Japan,JPN,GDP per capita (current US$),NY.GDP.PCAP.CD,4753190756.0,568.907743,639.640785,724.693762,843.616878,928.518849,...,34960.63938,39375.47316,38834.05293,39751.1331,40415.95676,40040.76551,40058.53733,34017.27181,,100


In [63]:
# select France's GDP per capita in 1960
df.loc[df["Country Name"] == 'France', ['1960']]

Unnamed: 0,1960
77,1333881573


In [64]:
# select the countries with a GDP per capita above US$50,000 in 2022
df.loc[df['2022'] > 50000 ]

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
8,United Arab Emirates,ARE,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,41525.1389,41054.53957,43063.96748,46722.26872,45376.17084,37629.17417,44332.34005,53707.98008,,100
13,Australia,AUS,GDP per capita (current US$),NY.GDP.PCAP.CD,1810597443.0,1877.600224,1854.64189,1967.108991,2131.3803,2281.011956,...,56758.8692,49918.79393,53954.55349,57273.52048,55049.57192,51868.24756,60697.24544,65099.84591,,100
14,Austria,AUT,GDP per capita (current US$),NY.GDP.PCAP.CD,9354604269.0,1031.815004,1087.834243,1167.000532,1269.412583,1374.53214,...,44195.81759,45307.58786,47429.15846,51466.55656,50067.58573,48789.49785,53517.89045,52084.6812,,100
27,Bermuda,BMU,GDP per capita (current US$),NY.GDP.PCAP.CD,1902402085.0,1961.538135,2020.385929,2020.265212,2199.726968,2282.216573,...,102005.6256,106885.8785,111820.5815,113050.7369,116153.1661,107791.8864,111774.6691,118774.7907,,100
35,Canada,CAN,GDP per capita (current US$),NY.GDP.PCAP.CD,2259250511.0,2240.433039,2268.585346,2374.498448,2555.111146,2770.361804,...,43596.13554,42315.60371,45129.4293,46548.63841,46374.15275,43562.43583,52515.19984,55522.44569,,100
37,Switzerland,CHE,GDP per capita (current US$),NY.GDP.PCAP.CD,1787360348.0,1971.316323,2131.391652,2294.182847,2501.29319,2620.475547,...,83806.4476,82153.07454,82254.37693,85217.36915,84121.93103,85897.78433,93446.43445,93259.90572,,100
52,Cayman Islands,CYM,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,77295.84529,78858.28026,81255.11246,85231.77425,89846.32122,83897.50544,88475.60047,99624.88544,,100
58,Denmark,DNK,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,53254.85637,54663.99837,57610.09818,61591.92887,59592.98069,60836.59241,69268.6518,67790.05399,,100
75,Finland,FIN,GDP per capita (current US$),NY.GDP.PCAP.CD,1179353011.0,1327.427224,1411.702398,1522.319242,1707.503938,1882.086858,...,42801.90812,43814.02651,46412.13648,49987.62616,48629.85823,49169.71934,53504.69365,50871.93045,,100
78,Faroe Islands,FRO,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,1594.52271,...,52726.68738,56833.91654,59328.23766,62576.80164,63203.74478,62234.96868,69108.20658,66979.27732,,100


In [65]:
# select the countries with a GDP per capita above US$50,000 in 2021 whose name starts with the letter A
df.loc[(df['2021']> 50000) & (df['Country Name'].str.startswith('A'))]

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
13,Australia,AUS,GDP per capita (current US$),NY.GDP.PCAP.CD,1810597443,1877.600224,1854.64189,1967.108991,2131.3803,2281.011956,...,56758.8692,49918.79393,53954.55349,57273.52048,55049.57192,51868.24756,60697.24544,65099.84591,,100
14,Austria,AUT,GDP per capita (current US$),NY.GDP.PCAP.CD,9354604269,1031.815004,1087.834243,1167.000532,1269.412583,1374.53214,...,44195.81759,45307.58786,47429.15846,51466.55656,50067.58573,48789.49785,53517.89045,52084.6812,,100
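
Conditions are combined with `&` (and), `|` (or) and `~` (not), and each condition must be wrapped in parentheses when written inline. A minimal sketch on a made-up four-row frame, with the masks stored in illustrative variables:

```python
import pandas as pd

df_sample = pd.DataFrame({
    'Country Name': ['Australia', 'Austria', 'Angola', 'Bermuda'],
    '2021': [60697.24544, 53517.89045, 1927.474078, 111774.6691],
})

# Each condition is a boolean Series aligned with the rows.
rich = df_sample['2021'] > 50000
starts_with_a = df_sample['Country Name'].str.startswith('A')

both = df_sample.loc[rich & starts_with_a]    # AND
either = df_sample.loc[rich | starts_with_a]  # OR
not_rich = df_sample.loc[~rich]               # NOT

print(both['Country Name'].tolist())
print(not_rich['Country Name'].tolist())
```

Naming the masks first keeps long filters readable and lets the same condition be reused in several selections.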
