## Pre-Processing

This is the pre-processing notebook file for our Final Project.

The goal of this pre-processing is to reduce the data to the countries that we want to classify (the countries of the G20) and to economic indicators from the World Bank.

In [1]:
import pandas as pd
import numpy as np

dataset = pd.read_csv("WDIData.csv")
dataset

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Unnamed: 66
0,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,,,,,,...,16.936004,17.337896,17.687093,18.140971,18.491344,18.825520,19.272212,19.628009,,
1,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.RU.ZS,,,,,,,...,6.499471,6.680066,6.859110,7.016238,7.180364,7.322294,7.517191,7.651598,,
2,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.UR.ZS,,,,,,,...,37.855399,38.046781,38.326255,38.468426,38.670044,38.722783,38.927016,39.042839,,
3,Africa Eastern and Southern,AFE,Access to electricity (% of population),EG.ELC.ACCS.ZS,,,,,,,...,31.794160,32.001027,33.871910,38.880173,40.261358,43.061877,44.270860,45.803485,,
4,Africa Eastern and Southern,AFE,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,,,,,,,...,18.663502,17.633986,16.464681,24.531436,25.345111,27.449908,29.641760,30.404935,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
383567,Zimbabwe,ZWE,Women who believe a husband is justified in be...,SG.VAW.REFU.ZS,,,,,,,...,,,14.500000,,,,,,,
383568,Zimbabwe,ZWE,Women who were first married by age 15 (% of w...,SP.M15.2024.FE.ZS,,,,,,,...,,,3.700000,,,,5.418352,,,
383569,Zimbabwe,ZWE,Women who were first married by age 18 (% of w...,SP.M18.2024.FE.ZS,,,,,,,...,,33.500000,32.400000,,,,33.658057,,,
383570,Zimbabwe,ZWE,Women's share of population ages 15+ living wi...,SH.DYN.AIDS.FE.ZS,,,,,,,...,59.100000,59.400000,59.500000,59.700000,59.900000,60.100000,60.300000,60.500000,60.7,


Now that the data is loaded, I'm going to first remove all non-G20 countries.

The countries that will remain in the dataset are the G20 members plus the countries represented at the G20 through the European Union (42 countries in total):

<ul>
    <li>Argentina</li>
    <li>Austria</li>
    <li>Australia</li>
    <li>Belgium</li>
    <li>Brazil</li>
    <li>Bulgaria</li>
    <li>Canada</li>
    <li>China</li>
    <li>Croatia</li>
    <li>Cyprus</li>
    <li>Czechia</li>
    <li>Denmark</li>
    <li>Estonia</li>
    <li>Finland</li>
    <li>France</li>
    <li>Germany</li>
    <li>Greece</li>
    <li>Hungary</li>
    <li>India</li>
    <li>Indonesia</li>
    <li>Ireland</li>
    <li>Italy</li>
    <li>Japan</li>
    <li>Latvia</li>
    <li>Lithuania</li>
    <li>Malta</li>
    <li>Mexico</li>
    <li>Netherlands</li>
    <li>Poland</li>
    <li>Portugal</li>
    <li>Slovakia</li>
    <li>Slovenia</li>
    <li>South Korea</li>
    <li>Spain</li>
    <li>Sweden</li>
    <li>Romania</li>
    <li>Russia</li>
    <li>Saudi Arabia</li>
    <li>South Africa</li>
    <li>Turkey</li>
    <li>United Kingdom</li>
    <li>United States</li>
</ul>

In [2]:
countries = ['Argentina', 'Austria', 'Australia', 'Belgium', 'Brazil', 'Bulgaria', 'Canada', 'China', 'Croatia',
            'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'India',
            'Indonesia', 'Ireland', 'Italy', 'Japan', 'Latvia', 'Lithuania', 'Malta', 'Mexico', 'Netherlands',
            'Poland', 'Portugal', 'Slovak Republic', 'Slovenia', 'South Korea', 'Spain', 'Sweden', 'Romania', 
             'Russian Federation', 'Saudi Arabia', 'South Africa', 'Turkiye', 'United Kingdom', 'United States']

dataset = dataset[dataset['Country Name'].isin(countries)]
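
Because `isin` silently ignores names that have no exact match, a misspelled or differently formatted country name simply disappears from the result. The list above has 42 names, while the label array built later in this notebook has 40 entries, so a check like the one below may be worth running before filtering. This is a minimal sketch: `unmatched_names` and the toy frame are hypothetical, and the "Korea, Rep." spelling is only an assumption about how the World Bank file might label a country.

```python
import pandas as pd

def unmatched_names(requested, frame, column="Country Name"):
    """Return the requested names that never appear in frame[column]."""
    available = set(frame[column].unique())
    return sorted(name for name in requested if name not in available)

# Toy frame standing in for the WDI data (illustrative values only).
toy = pd.DataFrame({"Country Name": ["Argentina", "Korea, Rep.", "Japan"]})

# Any name printed here would be dropped without warning by isin().
print(unmatched_names(["Argentina", "South Korea", "Japan"], toy))
```

Running the same helper on the real `countries` list and `dataset` (before the filter is applied) would show whether every requested country is actually present.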


In [3]:
dataset

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Unnamed: 66
80752,Argentina,ARG,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,,,,,,...,99.600000,99.7,99.700000,99.800000,99.8,99.800000,99.9,99.9,,
80753,Argentina,ARG,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.RU.ZS,,,,,,,...,94.300000,95.0,95.500000,96.000000,96.3,96.600000,97.0,97.2,,
80754,Argentina,ARG,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.UR.ZS,,,,,,,...,99.900000,99.9,99.900000,99.900000,99.9,99.900000,99.9,99.9,,
80755,Argentina,ARG,Access to electricity (% of population),EG.ELC.ACCS.ZS,,,,,,,...,99.342674,100.0,99.625389,99.849579,100.0,99.989578,100.0,100.0,,
80756,Argentina,ARG,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,,,,,,,...,94.687187,100.0,97.091499,98.836769,100.0,99.871811,100.0,100.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369147,United States,USA,Women who believe a husband is justified in be...,SG.VAW.REFU.ZS,,,,,,,...,,,,,,,,,,
369148,United States,USA,Women who were first married by age 15 (% of w...,SP.M15.2024.FE.ZS,,,,,,,...,,,,,,,,,,
369149,United States,USA,Women who were first married by age 18 (% of w...,SP.M18.2024.FE.ZS,,,,,,,...,,,,,,,,,,
369150,United States,USA,Women's share of population ages 15+ living wi...,SH.DYN.AIDS.FE.ZS,,,,,,,...,23.000000,22.8,22.700000,22.500000,22.4,22.300000,22.3,22.2,22.1,


Next, I'm going to remove the rows for indicators that we don't want. This means keeping only the economic indicators that we decided, as a group, to use for classifying countries by their COVID-19 responses.

The economic indicators that we are using for classification come from a World Bank article (the World Bank is also the source of the dataset), where they are listed as "Featured Indicators" for a country's economy. They are therefore a curated subset, not every economic indicator available for a country.

The indicators that will be included are: 

<ul>
    <li>GDP (current US&#36;)</li>
    <li>GDP growth (annual %)</li>
    <li>Agriculture, value added (annual % growth)</li>
    <li>Industry, value added (annual % growth)</li>
    <li>Manufacturing, value added (annual % growth)</li>
    <li>Services, value added (annual % growth)</li>
    <li>Final consumption expenditure (annual % growth)</li>
    <li>Gross capital formation (annual % growth)</li>
    <li>Exports of goods and services (annual % growth)</li>
    <li>Imports of goods and services (annual % growth)</li>
    <li>Agriculture, value added (% of GDP)</li>
    <li>Industry, value added (% of GDP)</li>
    <li>Services, value added (% of GDP)</li>
    <li>Final consumption expenditure (% of GDP)</li>
    <li>Gross capital formation (% of GDP)</li>
    <li>Exports of goods and services (% of GDP)</li>
    <li>Imports of goods and services (% of GDP)</li>
</ul>

Luckily, the article provides the code for each indicator, so I can filter on the Indicator Code column and then remove that column later.

In [4]:
codes = ['NY.GDP.MKTP.CD', 'NY.GDP.MKTP.KD.ZG', 'NV.AGR.TOTL.KD.ZG', 'NV.IND.TOTL.KD.ZG', 'NV.IND.MANF.KD.ZG',
        'NV.SRV.TOTL.KD.ZG', 'NE.CON.TOTL.KD.ZG', 'NE.GDI.TOTL.KD.ZG', 'NE.EXP.GNFS.KD.ZG', 'NE.IMP.GNFS.KD.ZG',
        'NV.AGR.TOTL.ZS', 'NV.IND.TOTL.ZS', 'NV.SRV.TOTL.ZS', 'NE.CON.TOTL.ZS', 'NE.GDI.TOTL.ZS', 'NE.EXP.GNFS.ZS',
        'NE.IMP.GNFS.ZS']

dataset = dataset[dataset['Indicator Code'].isin(codes)]

In [5]:
dataset

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Unnamed: 66
80822,Argentina,ARG,"Agriculture, forestry, and fishing, value adde...",NV.AGR.TOTL.ZS,,,,,,12.904163,...,6.052918,6.712704,5.156686,6.264566,5.231622,4.537879,5.111017,5.932404,6.869856,
80823,Argentina,ARG,"Agriculture, forestry, and fishing, value adde...",NV.AGR.TOTL.KD.ZG,,,,,,,...,11.476620,3.104253,7.541962,-4.716943,3.445065,-14.581522,21.265569,-7.135723,0.295629,
81148,Argentina,ARG,Exports of goods and services (% of GDP),NE.EXP.GNFS.ZS,7.604049,5.994947,4.691843,7.890454,5.563716,6.225874,...,14.617173,14.405479,10.705652,12.527095,11.320283,14.436686,17.695944,16.591817,18.300899,
81149,Argentina,ARG,Exports of goods and services (annual % growth),NE.EXP.GNFS.KD.ZG,,-10.638592,45.238463,0.000000,-8.196661,8.928500,...,-3.520240,-6.979167,-2.777373,5.317354,2.615391,0.645996,9.102187,-17.326848,8.988938,
81175,Argentina,ARG,Final consumption expenditure (% of GDP),NE.CON.TOTL.ZS,79.036479,70.100693,79.246140,82.013374,75.900751,75.497032,...,82.793803,82.332955,84.004217,83.376470,84.436467,85.275279,81.936276,79.051086,76.351268,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
368330,United States,USA,"Industry (including construction), value added...",NV.IND.TOTL.ZS,,,,,,,...,19.241626,19.331515,18.587132,18.042221,18.436601,18.616777,18.295603,18.438755,,
368331,United States,USA,"Industry (including construction), value added...",NV.IND.TOTL.KD.ZG,,,,,,,...,2.389576,2.275465,2.367074,0.608765,3.231599,2.974208,2.378995,0.757656,,
368438,United States,USA,"Manufacturing, value added (annual % growth)",NV.IND.MANF.KD.ZG,,,,,,,...,2.809850,1.771980,1.015219,-0.296072,3.357970,4.072325,1.738677,-0.038291,,
368952,United States,USA,"Services, value added (% of GDP)",NV.SRV.TOTL.ZS,,,,,,,...,75.816054,75.827541,76.742396,77.439693,77.031393,76.739738,77.199503,80.136410,,


Then, I'm going to remove the columns that we don't want. We only care about the years 2020 and 2021, because that is the target timeframe of our problem: the COVID-19 pandemic. I'm only going to keep Country Name, Indicator Name, 2020, and 2021.

In [6]:
keep = ['Country Name', 'Indicator Name', '2020', '2021']

dataset = dataset[keep]

In [7]:
dataset

Unnamed: 0,Country Name,Indicator Name,2020,2021
80822,Argentina,"Agriculture, forestry, and fishing, value adde...",5.932404,6.869856
80823,Argentina,"Agriculture, forestry, and fishing, value adde...",-7.135723,0.295629
81148,Argentina,Exports of goods and services (% of GDP),16.591817,18.300899
81149,Argentina,Exports of goods and services (annual % growth),-17.326848,8.988938
81175,Argentina,Final consumption expenditure (% of GDP),79.051086,76.351268
...,...,...,...,...
368330,United States,"Industry (including construction), value added...",18.438755,
368331,United States,"Industry (including construction), value added...",0.757656,
368438,United States,"Manufacturing, value added (annual % growth)",-0.038291,
368952,United States,"Services, value added (% of GDP)",80.136410,


Now, I'll combine the indicators so that each country has a single row of data. Before that, though, I'm going to check whether a significant amount of the 2021 data is missing and whether using only the 2020 data would be more viable.

In [8]:
dataset.isna().sum()

Country Name       0
Indicator Name     0
2020               9
2021              43
dtype: int64

Since 2021 has far more missing values (43) than 2020 (9), I will keep only the 2020 data.
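
Raw counts are easier to compare when expressed as a share of all rows. The sketch below uses a made-up toy frame, not the project data, and shows that `isna().mean()` gives the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Toy frame with illustrative values only (not the WDI data).
toy = pd.DataFrame({
    "2020": [1.0, 2.0, np.nan, 4.0],
    "2021": [np.nan, np.nan, np.nan, 4.0],
})

# Mean of a boolean mask = fraction of rows that are missing.
missing_share = toy.isna().mean()
print(missing_share)
```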

In [9]:
dataset = dataset.drop('2021', axis = 1)

In [10]:
dataset

Unnamed: 0,Country Name,Indicator Name,2020
80822,Argentina,"Agriculture, forestry, and fishing, value adde...",5.932404
80823,Argentina,"Agriculture, forestry, and fishing, value adde...",-7.135723
81148,Argentina,Exports of goods and services (% of GDP),16.591817
81149,Argentina,Exports of goods and services (annual % growth),-17.326848
81175,Argentina,Final consumption expenditure (% of GDP),79.051086
...,...,...,...
368330,United States,"Industry (including construction), value added...",18.438755
368331,United States,"Industry (including construction), value added...",0.757656
368438,United States,"Manufacturing, value added (annual % growth)",-0.038291
368952,United States,"Services, value added (% of GDP)",80.136410


Now onto merging rows. The final product should have one row per country, with columns like "Country Name, Indicator1, Indicator2, ..." and so on.

In [11]:
dataset_pivot = dataset.pivot(index = 'Country Name', columns = 'Indicator Name', values = '2020')
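
`DataFrame.pivot` raises a `ValueError` if any index/column pair occurs more than once, so it can help to check for duplicates first. The following is a small sketch on made-up data (the country labels "A" and "B" and all values are placeholders), showing the same long-to-wide reshaping used above.

```python
import pandas as pd

# Long format: one row per (country, indicator) pair. Values are made up.
long_df = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Indicator Name": ["GDP growth (annual %)", "GDP (current US$)"] * 2,
    "2020": [-9.9, 3.9e11, -0.004, 1.3e12],
})

# Fail early with a clear message if a country/indicator pair repeats.
dupes = long_df.duplicated(subset=["Country Name", "Indicator Name"])
assert not dupes.any(), "duplicate country/indicator pairs"

# Wide format: one row per country, one column per indicator.
wide_df = long_df.pivot(index="Country Name", columns="Indicator Name", values="2020")
print(wide_df.shape)
```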

In [12]:
dataset_pivot

Country Name,"Agriculture, forestry, and fishing, value added (% of GDP)","Agriculture, forestry, and fishing, value added (annual % growth)",Exports of goods and services (% of GDP),Exports of goods and services (annual % growth),Final consumption expenditure (% of GDP),Final consumption expenditure (annual % growth),GDP (current US$),GDP growth (annual %),Gross capital formation (% of GDP),Gross capital formation (annual % growth),Imports of goods and services (% of GDP),Imports of goods and services (annual % growth),"Industry (including construction), value added (% of GDP)","Industry (including construction), value added (annual % growth)","Manufacturing, value added (annual % growth)","Services, value added (% of GDP)","Services, value added (annual % growth)"
Argentina,5.932404,-7.135723,16.591817,-17.326848,79.051086,-12.051155,389591000000.0,-9.895269,14.01973,-11.016885,13.556325,-17.865058,23.311218,-9.369053,-7.76065,54.612674,-10.592109
Australia,2.006092,-10.068572,23.983854,-1.741033,73.802872,-0.416539,1327836000000.0,-0.003837,22.268947,,20.055673,-7.666706,25.462096,0.049462,-1.370961,66.280808,0.643603
Austria,1.099909,-3.060587,51.435008,-10.759174,71.324373,-6.2733,433258500000.0,-6.734514,25.901203,-4.646383,48.563935,-9.376541,25.457853,-5.659447,-6.997076,63.145665,-7.30338
Belgium,0.638362,-6.824667,80.042038,-5.459022,74.395963,-5.796638,521676900000.0,-5.680734,24.197411,-6.928552,78.635413,-5.944309,19.4793,-3.093288,-3.687455,69.588938,-5.888087
Brazil,5.893246,3.753746,16.795068,-1.842905,83.369693,-5.223752,1448566000000.0,-3.878676,15.930622,-4.026255,16.095384,-9.840618,17.701653,-3.400928,-4.357614,62.795211,-4.344701
Bulgaria,3.505993,-3.301805,55.323289,-12.066986,78.191183,1.533585,69889350000.0,-4.38715,20.339766,-5.279527,53.854238,-5.393745,21.923347,-8.218678,,61.253206,-3.338591
Canada,,3.328808,29.363448,-8.692684,79.806948,-4.559653,1645423000000.0,-5.233024,22.25707,-9.818021,31.419082,-10.858232,,-6.310691,-9.685379,,-4.967
China,7.698643,3.132128,18.54106,,55.330442,-0.273954,14687670000000.0,2.239702,43.366674,4.310804,16.04819,,37.842787,2.464846,,54.458561,1.947954
Croatia,3.216507,3.626613,42.024894,-22.725535,82.849174,-2.819567,57203780000.0,-8.099761,23.908878,-2.813907,48.782821,-12.260688,21.155324,-1.523108,-3.863769,59.349699,-8.371669
Cyprus,1.930842,-1.384491,75.780512,-5.106329,83.355085,-0.942781,24692100000.0,-4.984293,19.169241,-11.300714,78.304838,-2.547918,12.606087,-6.383702,-5.435013,74.286844,-4.432524


From here, we can start visualizing and modeling the data.

In [13]:
dataset_pivot.isna().sum()

Indicator Name
Agriculture, forestry, and fishing, value added (% of GDP)           1
Agriculture, forestry, and fishing, value added (annual % growth)    0
Exports of goods and services (% of GDP)                             0
Exports of goods and services (annual % growth)                      1
Final consumption expenditure (% of GDP)                             0
Final consumption expenditure (annual % growth)                      0
GDP (current US$)                                                    0
GDP growth (annual %)                                                0
Gross capital formation (% of GDP)                                   0
Gross capital formation (annual % growth)                            2
Imports of goods and services (% of GDP)                             0
Imports of goods and services (annual % growth)                      1
Industry (including construction), value added (% of GDP)            1
Industry (including construction), value added (annual % growt

In [14]:
dataset_pivot.to_csv('Country_Indicators.csv')

Loading the CSV back into the notebook fixes the headers: the pivoted index and column labels become ordinary columns.

In [15]:
dataset_processed = pd.read_csv('Country_Indicators.csv')
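
The same header clean-up can probably be done in memory, without writing and re-reading a file. This is a sketch of that alternative on a toy frame (values are rounded placeholders), not the approach used in this notebook: `reset_index` turns the index into a regular column, and clearing `columns.name` removes the leftover "Indicator Name" label.

```python
import pandas as pd

# Toy frame shaped like the pivot output (illustrative values only).
wide = pd.DataFrame(
    {"GDP growth (annual %)": [-9.9, -0.004]},
    index=pd.Index(["Argentina", "Australia"], name="Country Name"),
)
wide.columns.name = "Indicator Name"  # mimic the label left by pivot

flat = wide.reset_index()   # "Country Name" becomes a regular column
flat.columns.name = None    # drop the leftover "Indicator Name" label
print(list(flat.columns))
```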

In [16]:
dataset_processed

Unnamed: 0,Country Name,"Agriculture, forestry, and fishing, value added (% of GDP)","Agriculture, forestry, and fishing, value added (annual % growth)",Exports of goods and services (% of GDP),Exports of goods and services (annual % growth),Final consumption expenditure (% of GDP),Final consumption expenditure (annual % growth),GDP (current US$),GDP growth (annual %),Gross capital formation (% of GDP),Gross capital formation (annual % growth),Imports of goods and services (% of GDP),Imports of goods and services (annual % growth),"Industry (including construction), value added (% of GDP)","Industry (including construction), value added (annual % growth)","Manufacturing, value added (annual % growth)","Services, value added (% of GDP)","Services, value added (annual % growth)"
0,Argentina,5.932404,-7.135723,16.591817,-17.326848,79.051086,-12.051155,389591000000.0,-9.895269,14.01973,-11.016885,13.556325,-17.865058,23.311218,-9.369053,-7.76065,54.612674,-10.592109
1,Australia,2.006092,-10.068572,23.983854,-1.741033,73.802872,-0.416539,1327836000000.0,-0.003837,22.268947,,20.055673,-7.666706,25.462096,0.049462,-1.370961,66.280808,0.643603
2,Austria,1.099909,-3.060587,51.435008,-10.759174,71.324373,-6.2733,433258500000.0,-6.734514,25.901203,-4.646383,48.563935,-9.376541,25.457853,-5.659447,-6.997076,63.145665,-7.30338
3,Belgium,0.638362,-6.824667,80.042038,-5.459022,74.395963,-5.796638,521676900000.0,-5.680734,24.197411,-6.928552,78.635413,-5.944309,19.4793,-3.093288,-3.687455,69.588938,-5.888087
4,Brazil,5.893246,3.753746,16.795068,-1.842905,83.369693,-5.223752,1448566000000.0,-3.878676,15.930622,-4.026255,16.095384,-9.840618,17.701653,-3.400928,-4.357614,62.795211,-4.344701
5,Bulgaria,3.505993,-3.301805,55.323289,-12.066986,78.191183,1.533585,69889350000.0,-4.38715,20.339766,-5.279527,53.854238,-5.393745,21.923347,-8.218678,,61.253206,-3.338591
6,Canada,,3.328808,29.363448,-8.692684,79.806948,-4.559653,1645423000000.0,-5.233024,22.25707,-9.818021,31.419082,-10.858232,,-6.310691,-9.685379,,-4.967
7,China,7.698643,3.132128,18.54106,,55.330442,-0.273954,14687670000000.0,2.239702,43.366674,4.310804,16.04819,,37.842787,2.464846,,54.458561,1.947954
8,Croatia,3.216507,3.626613,42.024894,-22.725535,82.849174,-2.819567,57203780000.0,-8.099761,23.908878,-2.813907,48.782821,-12.260688,21.155324,-1.523108,-3.863769,59.349699,-8.371669
9,Cyprus,1.930842,-1.384491,75.780512,-5.106329,83.355085,-0.942781,24692100000.0,-4.984293,19.169241,-11.300714,78.304838,-2.547918,12.606087,-6.383702,-5.435013,74.286844,-4.432524


Now to create a final column on the data to give it the classification number for our modeling phase.

We define a country's COVID response using the COVID-19 Stringency Index from the Oxford Coronavirus Government Response Tracker (OxCGRT) project, which combines nine metrics into a score from 0-100 for government strictness during the COVID-19 pandemic. These metrics include restrictions on public gatherings, school closures, workplace closures, and so on. The webpage I'm getting the data from is <a href="https://ourworldindata.org/metrics-explained-covid19-stringency-index">here</a>, and the source is also listed below.

Using the tool embedded in the webpage that provides the Stringency Index, I set the start date to Jan 1, 2020, and the end date to Dec 31, 2020, to include only measures during 2020, as that is the period our economic data covers.

From there, we wanted distinct categories for the classification modeling section of our project, so I converted the 0-100 scale into three different groups.

A score of 0-50 will give a 1, meaning that the country's government responded the least out of the three groups.

A score of 51-75 will give a 2, meaning that there were more government policies than the first group, but less than the third group.

And finally, a score of 76-100 will give a 3, meaning that these countries had the most policies compared to the other two groups.
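
The grouping above can be sketched with `pandas.cut`. The Japan, Malta, and New Zealand scores are the ones quoted in this notebook; the "Example" entry is made up to illustrate group 3. Note one assumption: the ranges in the text leave fractional scores between 50 and 51 (and between 75 and 76) undefined, and this sketch places them in the higher group.

```python
import pandas as pd

# Scores quoted in this notebook, plus one made-up value ("Example").
scores = pd.Series({
    "Japan": 45.37,
    "Malta": 52.78,
    "New Zealand": 22.22,
    "Example": 80.0,  # hypothetical score, only to show group 3
})

# Bins are [0, 50], (50, 75], (75, 100]; the handling of values such as
# 50.5 is an assumption, since the text only names whole-number ranges.
groups = pd.cut(
    scores,
    bins=[0, 50, 75, 100],
    labels=[1, 2, 3],
    include_lowest=True,
).astype(int)
print(groups)
```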

For reference, the score reflects government policies, not how well they worked or how hard a country was hit by infection. This can produce some surprising results that other factors might explain. One that surprised me was Japan, with a low score of 45.37; this could be because Japan shut its borders and, as an island nation, was better able to isolate itself from the rest of the world. Another island nation in our group, Malta, had a fairly close score of 52.78, which might suggest a pattern, but other islands such as Cyprus and Australia had scores in the 70s, so not every island behaved the same way. (Even New Zealand, which didn't make the cut for our dataset, had a score of 22.22!)

I chose these group boundaries because few countries in our selected group fell below 50, and most were in the 50-80 range. I didn't want them all to be in the same group, as that would defeat the purpose of our classification.

Now, to add this data to our dataframe. For reference, I'm reading the numbers off the webpage and adding them manually, because the data download they provide contains the raw metrics that the tool on the webpage uses to calculate the index, not the index itself.

In [17]:
index_classification = np.array([3, 2, 3, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 3, 3, 1, 2, 
                        3, 2, 2, 3, 3, 3, 3, 1, 2, 2, 3, 1, 3, 2, 3, 3, 2])

In [18]:
len(index_classification)

40

In [19]:
dataset_processed['Response Group'] = index_classification.tolist()
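
Assigning labels by position only works if the array order matches the row order exactly. A possibly more robust alternative, sketched here with three of the country/group pairs shown in this notebook's output, is to key the labels by country name and use `Series.map`; the `group_by_country` dictionary is a hypothetical name.

```python
import pandas as pd

# Toy frame with three of the countries from the processed data.
frame = pd.DataFrame({"Country Name": ["Argentina", "Australia", "Austria"]})

# Labels keyed by name, so row order no longer matters.
group_by_country = {"Argentina": 3, "Australia": 2, "Austria": 3}
frame["Response Group"] = frame["Country Name"].map(group_by_country)

# Countries without a label show up as NaN and can be listed explicitly.
missing = frame.loc[frame["Response Group"].isna(), "Country Name"].tolist()
print(frame)
print(missing)
```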

In [20]:
dataset_processed

Unnamed: 0,Country Name,"Agriculture, forestry, and fishing, value added (% of GDP)","Agriculture, forestry, and fishing, value added (annual % growth)",Exports of goods and services (% of GDP),Exports of goods and services (annual % growth),Final consumption expenditure (% of GDP),Final consumption expenditure (annual % growth),GDP (current US$),GDP growth (annual %),Gross capital formation (% of GDP),Gross capital formation (annual % growth),Imports of goods and services (% of GDP),Imports of goods and services (annual % growth),"Industry (including construction), value added (% of GDP)","Industry (including construction), value added (annual % growth)","Manufacturing, value added (annual % growth)","Services, value added (% of GDP)","Services, value added (annual % growth)",Response Group
0,Argentina,5.932404,-7.135723,16.591817,-17.326848,79.051086,-12.051155,389591000000.0,-9.895269,14.01973,-11.016885,13.556325,-17.865058,23.311218,-9.369053,-7.76065,54.612674,-10.592109,3
1,Australia,2.006092,-10.068572,23.983854,-1.741033,73.802872,-0.416539,1327836000000.0,-0.003837,22.268947,,20.055673,-7.666706,25.462096,0.049462,-1.370961,66.280808,0.643603,2
2,Austria,1.099909,-3.060587,51.435008,-10.759174,71.324373,-6.2733,433258500000.0,-6.734514,25.901203,-4.646383,48.563935,-9.376541,25.457853,-5.659447,-6.997076,63.145665,-7.30338,3
3,Belgium,0.638362,-6.824667,80.042038,-5.459022,74.395963,-5.796638,521676900000.0,-5.680734,24.197411,-6.928552,78.635413,-5.944309,19.4793,-3.093288,-3.687455,69.588938,-5.888087,2
4,Brazil,5.893246,3.753746,16.795068,-1.842905,83.369693,-5.223752,1448566000000.0,-3.878676,15.930622,-4.026255,16.095384,-9.840618,17.701653,-3.400928,-4.357614,62.795211,-4.344701,2
5,Bulgaria,3.505993,-3.301805,55.323289,-12.066986,78.191183,1.533585,69889350000.0,-4.38715,20.339766,-5.279527,53.854238,-5.393745,21.923347,-8.218678,,61.253206,-3.338591,2
6,Canada,,3.328808,29.363448,-8.692684,79.806948,-4.559653,1645423000000.0,-5.233024,22.25707,-9.818021,31.419082,-10.858232,,-6.310691,-9.685379,,-4.967,2
7,China,7.698643,3.132128,18.54106,,55.330442,-0.273954,14687670000000.0,2.239702,43.366674,4.310804,16.04819,,37.842787,2.464846,,54.458561,1.947954,3
8,Croatia,3.216507,3.626613,42.024894,-22.725535,82.849174,-2.819567,57203780000.0,-8.099761,23.908878,-2.813907,48.782821,-12.260688,21.155324,-1.523108,-3.863769,59.349699,-8.371669,2
9,Cyprus,1.930842,-1.384491,75.780512,-5.106329,83.355085,-0.942781,24692100000.0,-4.984293,19.169241,-11.300714,78.304838,-2.547918,12.606087,-6.383702,-5.435013,74.286844,-4.432524,2


Now to save this as a CSV again and send it on to the modeling and visualization phases.

In [21]:
dataset_processed.to_csv('Country_Indicators_Response_Groups.csv')

## Sources

<ul>
    <li>https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators</li>
    <li>https://ourworldindata.org/metrics-explained-covid19-stringency-index</li>
</ul>