First, let's load the data into a DataFrame.

In [1]:
import pandas as pd

In [2]:
world_inflation = pd.read_csv('world_inflation/Data set for inflation.csv')
world_inflation.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,0.627472,-2.372065,0.421441,0.474764,-0.931196,-1.028282,3.626041,4.257462,,
1,Africa Eastern and Southern,AFE,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,9.158707,5.750981,5.37029,5.250171,6.571396,6.399343,4.720811,4.120246,5.404815,7.240978
2,Afghanistan,AFG,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,6.441213,7.385772,4.673996,-0.661709,4.383892,4.975952,0.626149,2.302373,,
3,Africa Western and Central,AFW,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,4.578375,2.439201,1.758052,2.130268,1.494564,1.764635,1.78405,1.758565,2.492522,3.925603
4,Angola,AGO,"Inflation, consumer prices (annual %)",FP.CPI.TOTL.ZG,,,,,,,...,10.277905,8.777814,7.280387,9.150372,30.695313,29.843587,19.628608,17.081215,,


We can see that we do not need the columns "Indicator Name" and "Indicator Code". Both hold the same value for every observation, and we already know that the data concerns the annual inflation rate (CPI).

In [3]:
world_inflation = world_inflation.drop(columns=["Indicator Name", "Indicator Code"])
world_inflation.head()

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,ABW,,,,,,,,,...,0.627472,-2.372065,0.421441,0.474764,-0.931196,-1.028282,3.626041,4.257462,,
1,Africa Eastern and Southern,AFE,,,,,,,,,...,9.158707,5.750981,5.37029,5.250171,6.571396,6.399343,4.720811,4.120246,5.404815,7.240978
2,Afghanistan,AFG,,,,,,,,,...,6.441213,7.385772,4.673996,-0.661709,4.383892,4.975952,0.626149,2.302373,,
3,Africa Western and Central,AFW,,,,,,,,,...,4.578375,2.439201,1.758052,2.130268,1.494564,1.764635,1.78405,1.758565,2.492522,3.925603
4,Angola,AGO,,,,,,,,,...,10.277905,8.777814,7.280387,9.150372,30.695313,29.843587,19.628608,17.081215,,


Now, let's see what kind of data we have. We have 266 observations with 64 features, of which 62 refer to the years 1960-2021.

In [4]:
world_inflation.shape

(266, 64)
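The column arithmetic can be double-checked programmatically rather than by eye. Below is a minimal sketch on a tiny stand-in frame whose values are copied from the preview above; `toy`, `year_cols` and `id_cols` are illustrative names that do not appear in the notebook, and the digit-only rule for spotting year columns is an assumption about how the columns are labelled.

```python
import pandas as pd

# Tiny stand-in for world_inflation; values copied from the preview above.
toy = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "Country Code": ["ABW", "AGO"],
    "2018": [3.626041, 19.628608],
    "2019": [4.257462, 17.081215],
})

# Assumption: year columns are exactly those whose label consists of digits.
year_cols = [c for c in toy.columns if c.isdigit()]
id_cols = [c for c in toy.columns if c not in year_cols]

print(toy.shape)                     # (rows, columns)
print(len(id_cols), len(year_cols))  # identifier columns, year columns
```

Applied to the full frame, the same split should report 2 identifier columns and 62 year columns, which add up to the 64 shown by `shape`.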

However, a lot of data is missing, i.e. the number of countries with an inflation observation varies by year. For example, 1960 has just 70 non-null cells, while 2020 has 213.

In [5]:
world_inflation.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 64 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country Name  266 non-null    object 
 1   Country Code  266 non-null    object 
 2   1960          70 non-null     float64
 3   1961          72 non-null     float64
 4   1962          74 non-null     float64
 5   1963          75 non-null     float64
 6   1964          79 non-null     float64
 7   1965          86 non-null     float64
 8   1966          93 non-null     float64
 9   1967          100 non-null    float64
 10  1968          101 non-null    float64
 11  1969          102 non-null    float64
 12  1970          107 non-null    float64
 13  1971          111 non-null    float64
 14  1972          114 non-null    float64
 15  1973          117 non-null    float64
 16  1974          120 non-null    float64
 17  1975          124 non-null    float64
 18  1976          125 non-null    
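The per-year counts can also be pulled out as numbers instead of being read off the `info()` listing. The sketch below uses a three-row stand-in with values copied from the preview above; `toy`, `non_null` and `missing_share` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Three rows copied from the preview above (2020 is missing for two of them).
toy = pd.DataFrame({
    "Country Code": ["ABW", "AFE", "AFG"],
    "2019": [4.257462, 4.120246, 2.302373],
    "2020": [np.nan, 5.404815, np.nan],
})

year_cols = ["2019", "2020"]

# Number of non-null observations per year.
non_null = toy[year_cols].notna().sum()

# Share of missing observations per year (mean of a boolean mask).
missing_share = toy[year_cols].isna().mean()

print(non_null)
print(missing_share)
```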

We can see that the number of observations increased over time, but this still leaves us with the problem of missing values. Basically, there are three things you can do with missing values: keep them, drop them, or fill them. In our case the best strategy is to keep them: dropping columns or rows with missing data would leave us with very little data, and it would make no sense to fill in values for inflation, which is a country-specific and sometimes volatile variable. However, we need to keep in mind that a lot of data is missing.
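To make the three options concrete, here is a minimal sketch on a small stand-in frame with values copied from the preview above. The variable names are illustrative, and filling with the column mean is shown only to demonstrate the mechanics, not as a recommendation for inflation data.

```python
import numpy as np
import pandas as pd

# Stand-in frame indexed by country code; values copied from the preview above.
toy = pd.DataFrame(
    {
        "2019": [4.257462, 4.120246, 2.302373],
        "2020": [np.nan, 5.404815, np.nan],
    },
    index=["ABW", "AFE", "AFG"],
)

# 1. Keep: leave NaN in place; pandas reductions skip NaN by default.
kept = toy
mean_2020 = kept["2020"].mean()

# 2. Drop: remove every row (axis=0) or every column (axis=1) containing NaN.
dropped_rows = toy.dropna(axis=0)
dropped_cols = toy.dropna(axis=1)

# 3. Fill: replace NaN with a substitute value, here each column's mean.
filled = toy.fillna(toy.mean())

print(mean_2020)
print(dropped_rows.shape, dropped_cols.shape)
print(filled)
```

Even in this tiny example, dropping rows keeps only one of three observations, which illustrates why keeping the missing values is the less destructive choice here.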

To finish up this introductory part, we can have a look at a basic statistical summary of our data.

In [6]:
world_inflation.describe()

Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
count,70.0,72.0,74.0,75.0,79.0,86.0,93.0,100.0,101.0,102.0,...,237.0,235.0,233.0,232.0,232.0,227.0,224.0,222.0,213.0,192.0
mean,3.55498,3.491289,4.630783,5.768657,6.26387,8.192841,17.765764,5.469524,6.309716,4.446334,...,5.499223,4.031436,3.61261,3.516273,5.891873,4.594103,3.971099,4.589566,6.696483,7.829945
std,6.795697,4.508248,15.209955,17.049591,13.749261,33.376181,117.536433,14.150802,18.30969,4.14057,...,6.521095,5.05175,5.600188,9.659254,30.205721,13.055816,7.484516,18.529245,39.906358,30.515852
min,-5.030042,-3.9,-3.846154,-2.694655,-4.535654,-3.878976,-1.361868,-8.422486,-10.033895,-4.339051,...,-3.045863,-4.294873,-1.509245,-3.749145,-3.078218,-1.5371,-2.814698,-3.233389,-2.595243,-0.772844
25%,0.87137,1.468978,1.147323,1.695327,1.870349,1.940405,2.479008,1.564937,1.588785,2.347005,...,2.577182,1.461883,0.906798,0.3052,0.388838,1.429107,1.659275,1.114597,0.574163,2.333678
50%,1.945749,2.102977,2.669962,2.876322,3.328408,3.410026,3.815659,3.020244,3.161937,3.388412,...,3.85216,2.767897,2.559749,1.548692,1.65214,2.380236,2.527344,2.216776,2.002412,3.789169
75%,4.037155,3.601606,4.614353,4.997767,4.822426,4.93817,6.951872,4.500319,4.697428,5.811237,...,6.314745,5.273786,4.625551,4.03125,4.17139,4.436118,3.961816,3.101341,3.681813,5.329709
max,39.590444,22.747264,131.39785,145.910781,108.994709,306.76311,1136.254112,106.0,128.843042,21.763295,...,59.219736,40.639428,62.16865,121.738085,379.999586,187.85163,83.501529,255.304991,557.201817,382.815998
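With 62 year columns the summary is very wide, so it can be easier to read with one row per year. A minimal sketch, assuming a small stand-in frame (`toy` and `summary` are illustrative names, with values copied from the preview above):

```python
import numpy as np
import pandas as pd

# Stand-in frame with two year columns; values copied from the preview above.
toy = pd.DataFrame({
    "2019": [4.257462, 4.120246, 2.302373],
    "2020": [np.nan, 5.404815, np.nan],
})

# Transpose so that each year becomes a row and each statistic a column.
summary = toy.describe().T

print(summary[["count", "mean", "50%", "max"]])
```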


Now, you might have noticed that the dataframe consists not only of data for countries, but also of data for geographical regions (continental areas) and groups of countries (organizations like the OECD). Let's divide the data into more consistent tables.

In [7]:
continents_inflation =

SyntaxError: invalid syntax (2422227326.py, line 1)
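One possible way to make such a split, not necessarily the one intended in the cell above, is to filter on "Country Code" with a hand-maintained list of codes that denote regions or groups. In the sketch below, `aggregate_codes` is a hypothetical list containing only the two regional codes visible in the preview (AFE and AFW); the full list would have to be compiled separately.

```python
import pandas as pd

# Stand-in frame; names, codes and 2019 values copied from the preview above.
toy = pd.DataFrame({
    "Country Name": [
        "Aruba",
        "Africa Eastern and Southern",
        "Afghanistan",
        "Africa Western and Central",
        "Angola",
    ],
    "Country Code": ["ABW", "AFE", "AFG", "AFW", "AGO"],
    "2019": [4.257462, 4.120246, 2.302373, 1.758565, 17.081215],
})

# Assumption: a hand-picked, incomplete list of codes for regions and groups.
aggregate_codes = ["AFE", "AFW"]

# Boolean mask: True where the row is a region or group, False for a country.
is_aggregate = toy["Country Code"].isin(aggregate_codes)

aggregates = toy[is_aggregate].reset_index(drop=True)
countries = toy[~is_aggregate].reset_index(drop=True)

print(aggregates["Country Name"].tolist())
print(countries["Country Name"].tolist())
```

Filtering on the code rather than the name avoids depending on exact spelling, at the cost of maintaining the list by hand.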