# Country Comparison Australia - New Zealand

In the following, the data for Australia and New Zealand are cleaned, preprocessed and compared. We chose these two countries for their geographical proximity, their historical, political and economic ties, and our domain knowledge of the area. Moreover, the data we see in our Master's program often focuses on the US and Europe, so we decided to shift our focus to another important part of the world.

We therefore obtained the complete SDG (Sustainable Development Goals) dataset for both countries for the period 2006 to 2020 from the World Bank Database.

## Data Preprocessing

This step is necessary because of the high prevalence of missing data. By the end of this script, a new dataset with cleaned data will be created, whose variables will be used for visual analysis in Tableau.

#### Importing Libraries and Dataset

In [2]:
import pandas as pd
import numpy as np

In [3]:
raw_csv_data = pd.read_csv("AUS_NZ_Complete_2006-2020.csv")

raw_csv_data

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
0,Australia,AUS,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,..,..,..,..
1,Australia,AUS,Access to electricity (% of population),EG.ELC.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
2,Australia,AUS,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
3,Australia,AUS,"Access to electricity, urban (% of urban popul...",EG.ELC.ACCS.UR.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
4,Australia,AUS,Account ownership at a financial institution o...,FX.OWN.TOTL.ZS,..,..,..,..,..,99.06484222,..,..,98.85956573,..,..,99.51937103,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2881,New Zealand,NZL,Women who believe a husband is justified in be...,SG.VAW.REFU.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2882,New Zealand,NZL,Women who were first married by age 15 (% of w...,SP.M15.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2883,New Zealand,NZL,Women who were first married by age 18 (% of w...,SP.M18.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2884,New Zealand,NZL,Women's share of population ages 15+ living wi...,SH.DYN.AIDS.FE.ZS,19.1,19.2,19.2,19.2,19.1,18.9,18.7,18.5,18.2,17.8,17.5,17.1,16.6,16.3,15.9
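The preview above shows that the export marks missing observations with `..`. As a minimal sketch, and assuming the file uses `..` only for that purpose, `pd.read_csv` can map the marker to `NaN` while parsing through its `na_values` parameter. The in-memory `sample_csv` below is a made-up stand-in for the real file, not an excerpt from it.

```python
import io

import pandas as pd

# Made-up two-row stand-in for the World Bank export (illustration only).
sample_csv = io.StringIO(
    "Country Name,Country Code,Series Name,Series Code,2019 [YR2019],2020 [YR2020]\n"
    "Australia,AUS,Access to electricity (% of population),EG.ELC.ACCS.ZS,100,..\n"
    "New Zealand,NZL,Access to electricity (% of population),EG.ELC.ACCS.ZS,100,..\n"
)

# na_values tells pandas to treat ".." as missing while parsing,
# so the year columns come back numeric instead of as strings.
sample = pd.read_csv(sample_csv, na_values="..")

print(sample.dtypes)
```

Reading the data this way would make the later `..` replacement unnecessary, at the cost of hiding the marker from the raw preview.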


#### Restructuring Dataset

In [4]:
#Create subsets of data for each country

raw_csv_AUS = raw_csv_data[raw_csv_data["Country Code"]=="AUS"]

raw_csv_NZL = raw_csv_data[raw_csv_data["Country Code"]=="NZL"]
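A boolean filter such as the one above returns a subset that pandas may treat as derived from the original frame, and modifying it in place later can trigger a `SettingWithCopyWarning`. A minimal sketch of one common way around this is to call `.copy()` when creating each subset. The frame `frame` and the names `aus` and `nzl` are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical miniature of the raw data (illustration only).
frame = pd.DataFrame({
    "Country Code": ["AUS", "AUS", "NZL"],
    "Series Name": ["a", "b", "a"],
    "2006 [YR2006]": ["100", "..", "100"],
})

# .copy() makes each subset an independent DataFrame,
# so in-place changes no longer refer back to `frame`.
aus = frame[frame["Country Code"] == "AUS"].copy()
nzl = frame[frame["Country Code"] == "NZL"].copy()

# In-place edits on the copies leave the original untouched.
aus.drop(columns=["Country Code"], inplace=True)
```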

In [5]:
raw_csv_AUS

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
0,Australia,AUS,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,..,..,..,..
1,Australia,AUS,Access to electricity (% of population),EG.ELC.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
2,Australia,AUS,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
3,Australia,AUS,"Access to electricity, urban (% of urban popul...",EG.ELC.ACCS.UR.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
4,Australia,AUS,Account ownership at a financial institution o...,FX.OWN.TOTL.ZS,..,..,..,..,..,99.06484222,..,..,98.85956573,..,..,99.51937103,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1438,Australia,AUS,Women who believe a husband is justified in be...,SG.VAW.REFU.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1439,Australia,AUS,Women who were first married by age 15 (% of w...,SP.M15.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1440,Australia,AUS,Women who were first married by age 18 (% of w...,SP.M18.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1441,Australia,AUS,Women's share of population ages 15+ living wi...,SH.DYN.AIDS.FE.ZS,11.7,11.9,12.1,12.3,12.6,12.7,12.9,13.1,13.2,13.3,13.4,13.5,13.6,13.7,13.8


In [6]:
raw_csv_NZL

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
1443,New Zealand,NZL,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,..,..,..,..
1444,New Zealand,NZL,Access to electricity (% of population),EG.ELC.ACCS.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1445,New Zealand,NZL,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1446,New Zealand,NZL,"Access to electricity, urban (% of urban popul...",EG.ELC.ACCS.UR.ZS,100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1447,New Zealand,NZL,Account ownership at a financial institution o...,FX.OWN.TOTL.ZS,..,..,..,..,..,99.43672943,..,..,99.52553558,..,..,99.17844391,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2881,New Zealand,NZL,Women who believe a husband is justified in be...,SG.VAW.REFU.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2882,New Zealand,NZL,Women who were first married by age 15 (% of w...,SP.M15.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2883,New Zealand,NZL,Women who were first married by age 18 (% of w...,SP.M18.2024.FE.ZS,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2884,New Zealand,NZL,Women's share of population ages 15+ living wi...,SH.DYN.AIDS.FE.ZS,19.1,19.2,19.2,19.2,19.1,18.9,18.7,18.5,18.2,17.8,17.5,17.1,16.6,16.3,15.9


In [7]:
#drop columns: Country Name, Country Code, Series Code
#reassign the result instead of using inplace=True on a filtered subset,
#which is what triggers pandas' SettingWithCopyWarning

raw_csv_AUS = raw_csv_AUS.drop(columns=["Country Name", "Country Code", "Series Code"])

raw_csv_NZL = raw_csv_NZL.drop(columns=["Country Name", "Country Code", "Series Code"])


In [8]:
raw_csv_AUS

Unnamed: 0,Series Name,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
0,Access to clean fuels and technologies for coo...,100,100,100,100,100,100,100,100,100,100,100,..,..,..,..
1,Access to electricity (% of population),100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
2,"Access to electricity, rural (% of rural popul...",100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
3,"Access to electricity, urban (% of urban popul...",100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
4,Account ownership at a financial institution o...,..,..,..,..,..,99.06484222,..,..,98.85956573,..,..,99.51937103,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1438,Women who believe a husband is justified in be...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1439,Women who were first married by age 15 (% of w...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1440,Women who were first married by age 18 (% of w...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
1441,Women's share of population ages 15+ living wi...,11.7,11.9,12.1,12.3,12.6,12.7,12.9,13.1,13.2,13.3,13.4,13.5,13.6,13.7,13.8


In [9]:
raw_csv_NZL

Unnamed: 0,Series Name,2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
1443,Access to clean fuels and technologies for coo...,100,100,100,100,100,100,100,100,100,100,100,..,..,..,..
1444,Access to electricity (% of population),100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1445,"Access to electricity, rural (% of rural popul...",100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1446,"Access to electricity, urban (% of urban popul...",100,100,100,100,100,100,100,100,100,100,100,100,100,100,..
1447,Account ownership at a financial institution o...,..,..,..,..,..,99.43672943,..,..,99.52553558,..,..,99.17844391,..,..,..
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2881,Women who believe a husband is justified in be...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2882,Women who were first married by age 15 (% of w...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2883,Women who were first married by age 18 (% of w...,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..
2884,Women's share of population ages 15+ living wi...,19.1,19.2,19.2,19.2,19.1,18.9,18.7,18.5,18.2,17.8,17.5,17.1,16.6,16.3,15.9


In [10]:
#transpose data 

transposed_csv_AUS = raw_csv_AUS.transpose().reset_index()
transposed_csv_AUS.columns = np.arange(len(transposed_csv_AUS.columns))

transposed_csv_NZL = raw_csv_NZL.transpose().reset_index()
transposed_csv_NZL.columns = np.arange(len(transposed_csv_NZL.columns))

In [11]:
#rename column headers with first row & change series name to year

transposed_csv_AUS.rename(columns=transposed_csv_AUS.iloc[0], inplace = True)
transposed_csv_AUS.drop(index=transposed_csv_AUS.index[0], axis=0, inplace=True)

transposed_csv_NZL.rename(columns=transposed_csv_NZL.iloc[0], inplace = True)
transposed_csv_NZL.drop(index=transposed_csv_NZL.index[0], axis=0, inplace=True)

transposed_csv_AUS.rename(columns={"Series Name": "Year"}, inplace=True)
transposed_csv_NZL.rename(columns={"Series Name": "Year"}, inplace=True)
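The transpose, renumber, rename and drop steps above can also be expressed more compactly. The following is a sketch of an alternative, not the notebook's own method: it sets `Series Name` as the index before transposing, so the indicator names become column headers directly. The frame `wide` is a made-up two-indicator example.

```python
import pandas as pd

# Made-up stand-in for one country's subset after dropping the code columns.
wide = pd.DataFrame({
    "Series Name": [
        "Access to electricity (% of population)",
        "GDP growth (annual %)",
    ],
    "2006 [YR2006]": ["100", "2.8"],
    "2007 [YR2007]": ["100", ".."],
})

# Indicator names become the index, then the column headers after transposing.
tidy = wide.set_index("Series Name").T
tidy.index.name = "Year"    # the old year headers are now the row labels
tidy.columns.name = None    # drop the leftover "Series Name" axis label
tidy = tidy.reset_index()   # turn the year labels into a regular column

print(tidy)
```

The result has one row per year and one column per indicator, which is the same layout the cells above produce.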

In [12]:
transposed_csv_AUS

Unnamed: 0,Year,Access to clean fuels and technologies for cooking (% of population),Access to electricity (% of population),"Access to electricity, rural (% of rural population)","Access to electricity, urban (% of urban population)",Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+)","Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+)",...,Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006 [YR2006],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,11.7,100
2,2007 [YR2007],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,11.9,100
3,2008 [YR2008],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.1,100
4,2009 [YR2009],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.3,100
5,2010 [YR2010],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.6,100
6,2011 [YR2011],100,100,100,100,99.06484222,98.59228516,99.59469604,99.11392212,97.95354462,...,..,..,..,..,..,..,..,..,12.7,100
7,2012 [YR2012],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.9,100
8,2013 [YR2013],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,13.1,100
9,2014 [YR2014],100,100,100,100,98.85956573,99.03623199,98.670578,99.61399078,100,...,..,..,..,..,..,..,..,..,13.2,100
10,2015 [YR2015],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,13.3,100


In [13]:
transposed_csv_NZL

Unnamed: 0,Year,Access to clean fuels and technologies for cooking (% of population),Access to electricity (% of population),"Access to electricity, rural (% of rural population)","Access to electricity, urban (% of urban population)",Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+)","Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+)",...,Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006 [YR2006],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.1,100
2,2007 [YR2007],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
3,2008 [YR2008],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
4,2009 [YR2009],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
5,2010 [YR2010],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.1,100
6,2011 [YR2011],100,100,100,100,99.43672943,99.44830322,99.42394257,99.60158539,99.37693787,...,..,..,..,..,..,..,..,..,18.9,100
7,2012 [YR2012],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,18.7,100
8,2013 [YR2013],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,18.5,100
9,2014 [YR2014],100,100,100,100,99.52553558,99.22291565,99.86442566,99.60870361,99.36186981,...,..,..,..,..,..,..,..,..,18.2,100
10,2015 [YR2015],100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,17.8,100


In [14]:
#Clean year column data

type(transposed_csv_AUS["Year"][1])

transposed_csv_AUS["Year"][1][:4]

'2006'

In [15]:
transposed_csv_AUS.Year=transposed_csv_AUS.Year.str[:4]

transposed_csv_NZL.Year=transposed_csv_NZL.Year.str[:4]
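Slicing the first four characters leaves `Year` as a string. If an integer year is preferred, for example for sorting or for a continuous axis later on, a sketch like the following could be used. It assumes every label starts with a four-digit year, as in the outputs above; the Series `years` is a small hypothetical sample.

```python
import pandas as pd

# Hypothetical sample of the year labels as they appear in the export.
years = pd.Series(["2006 [YR2006]", "2007 [YR2007]", "2020 [YR2020]"], name="Year")

# Capture the leading four digits and convert them to integers.
year_int = years.str.extract(r"^(\d{4})", expand=False).astype(int)

print(year_int.tolist())
```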

In [16]:
transposed_csv_AUS

Unnamed: 0,Year,Access to clean fuels and technologies for cooking (% of population),Access to electricity (% of population),"Access to electricity, rural (% of rural population)","Access to electricity, urban (% of urban population)",Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+)","Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+)",...,Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,11.7,100
2,2007,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,11.9,100
3,2008,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.1,100
4,2009,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.3,100
5,2010,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.6,100
6,2011,100,100,100,100,99.06484222,98.59228516,99.59469604,99.11392212,97.95354462,...,..,..,..,..,..,..,..,..,12.7,100
7,2012,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,12.9,100
8,2013,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,13.1,100
9,2014,100,100,100,100,98.85956573,99.03623199,98.670578,99.61399078,100,...,..,..,..,..,..,..,..,..,13.2,100
10,2015,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,13.3,100


In [17]:
transposed_csv_NZL

Unnamed: 0,Year,Access to clean fuels and technologies for cooking (% of population),Access to electricity (% of population),"Access to electricity, rural (% of rural population)","Access to electricity, urban (% of urban population)",Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+)","Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+)","Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+)",...,Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.1,100
2,2007,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
3,2008,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
4,2009,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.2,100
5,2010,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,19.1,100
6,2011,100,100,100,100,99.43672943,99.44830322,99.42394257,99.60158539,99.37693787,...,..,..,..,..,..,..,..,..,18.9,100
7,2012,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,18.7,100
8,2013,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,18.5,100
9,2014,100,100,100,100,99.52553558,99.22291565,99.86442566,99.60870361,99.36186981,...,..,..,..,..,..,..,..,..,18.2,100
10,2015,100,100,100,100,..,..,..,..,..,...,..,..,..,..,..,..,..,..,17.8,100


In [18]:
#Replace ".." with NaN

transposed_csv_AUS=transposed_csv_AUS.replace('..', np.nan)

transposed_csv_NZL=transposed_csv_NZL.replace('..', np.nan)

#### Data Exploration

In [19]:
#dataset checkpoint

AUS_structured = transposed_csv_AUS.copy()

NZL_structured = transposed_csv_NZL.copy()

In [20]:
AUS_structured.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15 entries, 1 to 15
Columns: 1444 entries, Year to Young people (ages 15-24) newly infected with HIV
dtypes: float64(389), object(1055)
memory usage: 169.9+ KB


In [21]:
NZL_structured.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 15 entries, 1 to 15
Columns: 1444 entries, Year to Young people (ages 15-24) newly infected with HIV
dtypes: float64(402), object(1042)
memory usage: 169.3+ KB


In [22]:
AUS_structured.describe()

Unnamed: 0,Adequacy of social insurance programs (% of total welfare of beneficiary households),Adequacy of social protection and labor programs (% of total welfare of beneficiary households),Adequacy of social safety net programs (% of total welfare of beneficiary households),Adequacy of unemployment benefits and ALMP (% of total welfare of beneficiary households),Adjusted net national income (annual % growth),Adjusted net national income per capita (annual % growth),"Agricultural machinery, tractors","Agricultural machinery, tractors per 100 sq. km of arable land","Annualized average growth rate in per capita real survey mean consumption or income, bottom 40% of population (%)","Annualized average growth rate in per capita real survey mean consumption or income, total population (%)",...,"Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49)","Women participating in the three decisions (own health care, major household purchases, and visiting family) (% of women age 15-49)",Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24)
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mean,,,,,,,,,,,...,,,,,,,,,,
std,,,,,,,,,,,...,,,,,,,,,,
min,,,,,,,,,,,...,,,,,,,,,,
25%,,,,,,,,,,,...,,,,,,,,,,
50%,,,,,,,,,,,...,,,,,,,,,,
75%,,,,,,,,,,,...,,,,,,,,,,
max,,,,,,,,,,,...,,,,,,,,,,


In [23]:
NZL_structured.describe()

Unnamed: 0,Adequacy of social insurance programs (% of total welfare of beneficiary households),Adequacy of social protection and labor programs (% of total welfare of beneficiary households),Adequacy of social safety net programs (% of total welfare of beneficiary households),Adequacy of unemployment benefits and ALMP (% of total welfare of beneficiary households),"Agricultural machinery, tractors","Agricultural machinery, tractors per 100 sq. km of arable land","Annualized average growth rate in per capita real survey mean consumption or income, bottom 40% of population (%)","Annualized average growth rate in per capita real survey mean consumption or income, total population (%)",Antiretroviral therapy coverage for PMTCT (% of pregnant women living with HIV),ARI treatment (% of children under 5 taken to a health provider),...,"Women making their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (% of women age 15-49)","Women participating in the three decisions (own health care, major household purchases, and visiting family) (% of women age 15-49)",Women who believe a husband is justified in beating his wife (any of five reasons) (%),Women who believe a husband is justified in beating his wife when she argues with him (%),Women who believe a husband is justified in beating his wife when she burns the food (%),Women who believe a husband is justified in beating his wife when she goes out without telling him (%),Women who believe a husband is justified in beating his wife when she neglects the children (%),Women who believe a husband is justified in beating his wife when she refuses sex with him (%),Women who were first married by age 15 (% of women ages 20-24),Women who were first married by age 18 (% of women ages 20-24)
count,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mean,,,,,,,,,,,...,,,,,,,,,,
std,,,,,,,,,,,...,,,,,,,,,,
min,,,,,,,,,,,...,,,,,,,,,,
25%,,,,,,,,,,,...,,,,,,,,,,
50%,,,,,,,,,,,...,,,,,,,,,,
75%,,,,,,,,,,,...,,,,,,,,,,
max,,,,,,,,,,,...,,,,,,,,,,
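The `describe()` outputs above only list columns with a count of 0, and `info()` reports most columns as `object`. This suggests that the columns holding actual values are not yet stored with a numeric dtype, so `describe()` skips them. A minimal sketch of one way to convert them is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into `NaN`. The frame `structured` is a made-up example, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Made-up example: values stored as strings, as after the ".." replacement.
structured = pd.DataFrame({
    "Year": ["2006", "2007", "2008"],
    "Indicator A": ["100", "99.5", np.nan],
    "Indicator B": [np.nan, np.nan, np.nan],
})

numeric = structured.copy()

# Convert every column except Year; unparseable entries become NaN.
for col in numeric.columns.drop("Year"):
    numeric[col] = pd.to_numeric(numeric[col], errors="coerce")

print(numeric.dtypes)
print(numeric.describe())
```

After the conversion, `describe()` reports summary statistics for `Indicator A` as well, not only for the empty column.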


In [24]:
# select non-numeric columns AUS (columns stored with object dtype)
all_non_numeric = AUS_structured.select_dtypes(exclude=[np.number])
non_numeric_cols = all_non_numeric.columns.values
print(non_numeric_cols)

['Year'
 'Access to clean fuels and technologies for cooking (% of population)'
 'Access to electricity (% of population)' ...
 'Women Business and the Law Index Score (scale 1-100)'
 "Women's share of population ages 15+ living with HIV (%)"
 'Young people (ages 15-24) newly infected with HIV']


In [25]:
# select non-numeric columns NZL (columns stored with object dtype)
all_non_numeric = NZL_structured.select_dtypes(exclude=[np.number])
non_numeric_cols = all_non_numeric.columns.values
print(non_numeric_cols)

['Year'
 'Access to clean fuels and technologies for cooking (% of population)'
 'Access to electricity (% of population)' ...
 'Women Business and the Law Index Score (scale 1-100)'
 "Women's share of population ages 15+ living with HIV (%)"
 'Young people (ages 15-24) newly infected with HIV']


In [26]:
# % of missing AUS
for col in AUS_structured.columns:
    pct_missing = np.mean(AUS_structured[col].isnull())
    print('{} - {}%'.format(col, round(pct_missing*100)))

Year - 0%
Access to clean fuels and technologies for cooking (% of population) - 27%
Access to electricity (% of population) - 7%
Access to electricity, rural (% of rural population) - 7%
Access to electricity, urban (% of urban population) - 7%
Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, primary education or less (% of popul

Food production index (2014-2016 = 100) - 13%
Food, beverages and tobacco (% of value added in manufacturing) - 13%
Foreign direct investment, net (BoP, current US$) - 0%
Foreign direct investment, net inflows (% of GDP) - 7%
Foreign direct investment, net inflows (BoP, current US$) - 7%
Foreign direct investment, net outflows (% of GDP) - 0%
Foreign direct investment, net outflows (BoP, current US$) - 0%
Forest area (% of land area) - 0%
Forest area (sq. km) - 0%
Forest rents (% of GDP) - 7%
Fossil fuel energy consumption (% of total) - 33%
Fuel exports (% of merchandise exports) - 7%
Fuel imports (% of merchandise imports) - 7%
GDP (constant 2015 US$) - 0%
GDP (constant LCU) - 0%
GDP (current LCU) - 0%
GDP (current US$) - 0%
GDP deflator (base year varies by country) - 0%
GDP deflator: linked series (base year varies by country) - 0%
GDP growth (annual %) - 0%
GDP per capita (constant 2015 US$) - 0%
GDP per capita (constant LCU) - 0%
GDP per capita (current LCU) - 0%
GDP per capita (

Insurance and financial services (% of commercial service exports) - 0%
Insurance and financial services (% of commercial service imports) - 0%
Insurance and financial services (% of service exports, BoP) - 0%
Insurance and financial services (% of service imports, BoP) - 0%
Intentional homicides (per 100,000 people) - 13%
Intentional homicides, female (per 100,000 female) - 13%
Intentional homicides, male (per 100,000 male) - 13%
Interest payments (% of expense) - 7%
Interest payments (% of revenue) - 7%
Interest payments (current LCU) - 7%
Interest rate spread (lending rate minus deposit rate, %) - 7%
Internally displaced persons, new displacement associated with conflict and violence (number of cases) - 100%
Internally displaced persons, new displacement associated with disasters (number of cases) - 13%
Internally displaced persons, total displaced by conflict and violence (number of people) - 100%
International migrant stock (% of population) - 87%
International migrant stock, tota

Population ages 40-44, female (% of female population) - 0%
Population ages 40-44, male (% of male population) - 0%
Population ages 45-49, female (% of female population) - 0%
Population ages 45-49, male (% of male population) - 0%
Population ages 50-54, female (% of female population) - 0%
Population ages 50-54, male (% of male population) - 0%
Population ages 55-59, female (% of female population) - 0%
Population ages 55-59, male (% of male population) - 0%
Population ages 60-64, female (% of female population) - 0%
Population ages 60-64, male (% of male population) - 0%
Population ages 65 and above (% of total population) - 0%
Population ages 65 and above, female - 0%
Population ages 65 and above, female (% of female population) - 0%
Population ages 65 and above, male - 0%
Population ages 65 and above, male (% of male population) - 0%
Population ages 65 and above, total - 0%
Population ages 65-69, female (% of female population) - 0%
Population ages 65-69, male (% of male population

School enrollment, tertiary (gross), gender parity index (GPI) - 67%
School enrollment, tertiary, female (% gross) - 67%
School enrollment, tertiary, male (% gross) - 67%
Scientific and technical journal articles - 13%
Secondary education, duration (years) - 0%
Secondary education, general pupils - 20%
Secondary education, general pupils (% female) - 20%
Secondary education, pupils - 67%
Secondary education, pupils (% female) - 80%
Secondary education, teachers - 100%
Secondary education, teachers (% female) - 100%
Secondary education, teachers, female - 100%
Secondary education, vocational pupils - 80%
Secondary education, vocational pupils (% female) - 80%
Secondary income receipts (BoP, current US$) - 0%
Secondary income, other sectors, payments (BoP, current US$) - 0%
Secure Internet servers - 27%
Secure Internet servers (per 1 million people) - 27%
Self-employed, female (% of female employment) (modeled ILO estimate) - 7%
Self-employed, male (% of male employment) (modeled ILO est
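
With more than 1,400 columns, the printed list is hard to scan. As a sketch, the same percentages can be computed in one vectorised step and sorted, and columns above a chosen share of missing values can be filtered out. The cut-off `max_missing` is a hypothetical value for illustration, not one taken from this analysis, and `structured` is a made-up frame.

```python
import numpy as np
import pandas as pd

# Made-up frame with different degrees of missingness.
structured = pd.DataFrame({
    "Year": ["2006", "2007", "2008", "2009"],
    "Mostly complete": [1.0, 2.0, 3.0, np.nan],
    "Sparse": [np.nan, np.nan, np.nan, 4.0],
    "Empty": [np.nan] * 4,
})

# Share of missing values per column, most incomplete first.
missing_share = structured.isnull().mean().sort_values(ascending=False)
print((missing_share * 100).round())

# Hypothetical cut-off: keep columns with at most 50% missing values.
max_missing = 0.5
kept = structured.loc[:, structured.isnull().mean() <= max_missing]

print(list(kept.columns))
```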

In [27]:
# % of missing NZL
for col in NZL_structured.columns:
    pct_missing = np.mean(NZL_structured[col].isnull())
    print('{} - {}%'.format(col, round(pct_missing*100)))

Year - 0%
Access to clean fuels and technologies for cooking (% of population) - 27%
Access to electricity (% of population) - 7%
Access to electricity, rural (% of rural population) - 7%
Access to electricity, urban (% of urban population) - 7%
Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, poorest 40% (% of population ages 15+) - 80%
Account ownership at a financial institution or with a mobile-money-service provider, primary education or less (% of popul

Child employment in agriculture, male (% of male economically active children ages 7-14) - 100%
Child employment in manufacturing (% of economically active children ages 7-14) - 100%
Child employment in manufacturing, female (% of female economically active children ages 7-14) - 100%
Child employment in manufacturing, male (% of male economically active children ages 7-14) - 100%
Child employment in services (% of economically active children ages 7-14) - 100%
Child employment in services, female (% of female economically active children ages 7-14) - 100%
Child employment in services, male (% of male economically active children ages 7-14) - 100%
Children (0-14) living with HIV - 100%
Children (ages 0-14) newly infected with HIV - 100%
Children in employment, female (% of female children ages 7-14) - 100%
Children in employment, male (% of male children ages 7-14) - 100%
Children in employment, self-employed (% of children in employment, ages 7-14) - 100%
Children in employment, self-e

Electricity production from hydroelectric sources (% of total) - 33%
Electricity production from natural gas sources (% of total) - 33%
Electricity production from nuclear sources (% of total) - 33%
Electricity production from oil sources (% of total) - 33%
Electricity production from oil, gas and coal sources (% of total) - 33%
Electricity production from renewable sources, excluding hydroelectric (% of total) - 33%
Electricity production from renewable sources, excluding hydroelectric (kWh) - 33%
Employers, female (% of female employment) (modeled ILO estimate) - 7%
Employers, male (% of male employment) (modeled ILO estimate) - 7%
Employers, total (% of total employment) (modeled ILO estimate) - 7%
Employment in agriculture (% of total employment) (modeled ILO estimate) - 7%
Employment in agriculture, female (% of female employment) (modeled ILO estimate) - 7%
Employment in agriculture, male (% of male employment) (modeled ILO estimate) - 7%
Employment in industry (% of total employ

Gross fixed capital formation (% of GDP) - 7%
Gross fixed capital formation (annual % growth) - 7%
Gross fixed capital formation (constant 2015 US$) - 7%
Gross fixed capital formation (constant LCU) - 7%
Gross fixed capital formation (current LCU) - 7%
Gross fixed capital formation (current US$) - 7%
Gross fixed capital formation, private sector (% of GDP) - 100%
Gross fixed capital formation, private sector (current LCU) - 100%
Gross intake ratio in first grade of primary education, female (% of relevant age group) - 67%
Gross intake ratio in first grade of primary education, male (% of relevant age group) - 67%
Gross intake ratio in first grade of primary education, total (% of relevant age group) - 67%
Gross national expenditure (% of GDP) - 7%
Gross national expenditure (constant 2015 US$) - 7%
Gross national expenditure (constant LCU) - 7%
Gross national expenditure (current LCU) - 7%
Gross national expenditure (current US$) - 7%
Gross national expenditure deflator (base year vari

Mortality rate, infant (per 1,000 live births) - 7%
Mortality rate, infant, female (per 1,000 live births) - 7%
Mortality rate, infant, male (per 1,000 live births) - 7%
Mortality rate, neonatal (per 1,000 live births) - 7%
Mortality rate, under-5 (per 1,000 live births) - 7%
Mortality rate, under-5, female (per 1,000 live births) - 7%
Mortality rate, under-5, male (per 1,000 live births) - 7%
Multidimensional poverty headcount ratio (% of total population) - 100%
Multidimensional poverty headcount ratio, children (% of population ages 0-17) - 100%
Multidimensional poverty headcount ratio, female (% of female population) - 100%
Multidimensional poverty headcount ratio, household (% of total households) - 100%
Multidimensional poverty headcount ratio, male (% of male population) - 100%
Multidimensional poverty index (scale 0-1) - 100%
Multidimensional poverty index, children (population ages 0-17) (scale 0-1) - 100%
Multidimensional poverty intensity (average share of deprivations exper

Poverty headcount ratio at national poverty lines (% of population) - 100%
Power outages in firms in a typical month (number) - 100%
PPG, bonds (NFL, current US$) - 100%
PPG, commercial banks (NFL, current US$) - 100%
PPG, IBRD (DOD, current US$) - 100%
PPG, IDA (DOD, current US$) - 100%
PPG, official creditors (NFL, current US$) - 100%
PPG, other private creditors (NFL, current US$) - 100%
PPG, private creditors (NFL, current US$) - 100%
PPP conversion factor, GDP (LCU per international $) - 0%
PPP conversion factor, private consumption (LCU per international $) - 0%
Pregnant women receiving prenatal care (%) - 100%
Preprimary education, duration (years) - 0%
Present value of external debt (% of exports of goods, services and primary income) - 100%
Present value of external debt (% of GNI) - 100%
Present value of external debt (current US$) - 100%
Prevalence of anemia among children (% of children ages 6-59 months) - 7%
Prevalence of anemia among non-pregnant women (% of women ages 15

Total greenhouse gas emissions (% change from 1990) - 53%
Total greenhouse gas emissions (kt of CO2 equivalent) - 13%
Total natural resources rents (% of GDP) - 7%
Total reserves (% of total external debt) - 100%
Total reserves (includes gold, current US$) - 0%
Total reserves in months of imports - 0%
Total reserves minus gold (current US$) - 0%
Total tax and contribution rate (% of profit) - 7%
Trade (% of GDP) - 7%
Trade in services (% of GDP) - 0%
Trademark applications, direct nonresident - 7%
Trademark applications, direct resident - 7%
Trademark applications, nonresident, by count - 7%
Trademark applications, resident, by count - 7%
Trademark applications, total - 7%
Trained teachers in lower secondary education (% of total teachers) - 100%
Trained teachers in lower secondary education, female (% of female teachers) - 100%
Trained teachers in lower secondary education, male (% of male teachers) - 100%
Trained teachers in preprimary education (% of total teachers) - 100%
Trained t

#### Data Cleaning

In [28]:
# Drop every Australian column whose rounded share of missing values is above 0%
for col in AUS_structured.columns:
    if round(AUS_structured[col].isnull().mean() * 100) > 0:
        AUS_structured.drop(col, axis=1, inplace=True)

In [29]:
AUS_structured

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,Age dependency ratio (% of working-age population),"Age dependency ratio, old (% of working-age population)","Age dependency ratio, young (% of working-age population)","Agriculture, forestry, and fishing, value added (% of GDP)","Agriculture, forestry, and fishing, value added (annual % growth)","Agriculture, forestry, and fishing, value added (constant 2015 US$)","Agriculture, forestry, and fishing, value added (constant LCU)",...,"Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)",Urban population,Urban population (% of total population),Urban population growth (annual %),Women Business and the Law Index Score (scale 1-100),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006,820,1000,48.18525523,19.15531705,29.02993818,2.726734442,2.857285829,28245301766,41110000000,...,4.78,9.47,10.55,10.03,17531121,84.7,1.614637068,81.25,11.7,100
2,2007,880,1000,47.89079729,19.19907657,28.69172073,2.201100136,-15.04013622,23997169905,34927000000,...,4.38,9.25,9.48,9.37,17666387,84.822,0.768615016,81.25,11.9,100
3,2008,930,1000,47.74597554,19.29376434,28.45221121,2.338630424,8.09402468,25939506760,37754000000,...,4.23,8.6,9.02,8.82,18049708,84.943,2.146571279,81.25,12.1,100
4,2009,970,1000,47.77266022,19.48479413,28.28786609,2.291720397,17.08163373,30370398297,44203000000,...,5.56,10.35,12.48,11.46,18451611,85.063,2.202217639,83.75,12.3,100
5,2010,1000,1000,47.94845437,19.79040532,28.15804905,2.197568265,-0.748817954,30142979302,43872000000,...,5.21,11.13,11.95,11.55,18767085,85.182,1.695285378,83.75,12.6,100
6,2011,1000,1000,48.51375386,20.24288155,28.27087231,2.276542366,3.46690372,31188007373,45393000000,...,5.08,10.78,11.94,11.38,19056040,85.3,1.527957416,91.25,12.7,100
7,2012,1000,1000,49.06719336,20.7654893,28.30170406,2.254748049,0.973718415,31491690744,45835000000,...,5.22,10.96,12.46,11.73,19414834,85.402,1.865330137,91.25,12.9,100
8,2013,1000,1000,49.64057799,21.32775898,28.31281901,2.278906495,-0.728700775,31262210549,45501000000,...,5.66,11.36,13.04,12.22,19775013,85.502,1.838175799,93.75,13.1,100
9,2014,1000,1000,50.25970006,21.8858011,28.37389896,2.216223655,1.14502978,31620172170,46022000000,...,6.08,12.47,14.11,13.31,20095657,85.602,1.608455102,96.875,13.2,100
10,2015,1000,1000,50.92505621,22.41932738,28.50572883,2.372644114,1.425405241,32070887761,46678000000,...,6.05,11.91,14.24,13.11,20410546,85.701,1.554800599,96.875,13.3,100


In [30]:
# Drop every New Zealand column whose rounded share of missing values is above 0%
for col in NZL_structured.columns:
    if round(NZL_structured[col].isnull().mean() * 100) > 0:
        NZL_structured.drop(col, axis=1, inplace=True)

In [31]:
NZL_structured

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,Age dependency ratio (% of working-age population),"Age dependency ratio, old (% of working-age population)","Age dependency ratio, young (% of working-age population)",Agricultural raw materials exports (% of merchandise exports),Agricultural raw materials imports (% of merchandise imports),Antiretroviral therapy coverage (% of people living with HIV),"Automated teller machines (ATMs) (per 100,000 adults)",...,"Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)",Urban population,Urban population (% of total population),Urban population growth (annual %),Women Business and the Law Index Score (scale 1-100),Women's share of population ages 15+ living with HIV (%),Young people (ages 15-24) newly infected with HIV
1,2006,200,200,50.45287588,18.38602024,32.06685564,10.90948702,0.781938301,50,69.45640267,...,3.86,10.46,9.48,9.94,3615494,86.4,1.287272925,91.875,19.1,100
2,2007,200,200,50.32725494,18.61107236,31.71618259,10.09632547,0.701945152,51,69.644089,...,3.66,10.17,10.04,10.1,3646829,86.34,0.862952376,91.875,19.2,100
3,2008,200,200,50.2703229,18.87032332,31.39999958,8.912394894,0.64838844,51,71.7871042,...,4.17,10.96,11.83,11.41,3675355,86.28,0.779170376,91.875,19.2,100
4,2009,200,200,50.31760727,19.20558005,31.11202722,9.865896213,0.704765091,53,72.56596293,...,6.12,17.37,16.19,16.76,3709702,86.22,0.93018233,91.875,19.2,100
5,2010,200,200,50.48961789,19.6430338,30.84658409,10.70372623,0.623351158,56,72.21915768,...,6.56,17.79,17.06,17.41,3748563,86.16,1.04210166,91.875,19.1,100
6,2011,200,200,51.02720177,20.06990518,30.95729659,10.87970337,0.616814037,60,76.37628663,...,6.49,16.35,18.52,17.49,3774624,86.1,0.692820855,91.875,18.9,100
7,2012,200,200,51.50952722,20.59441393,30.91511328,11.08356517,0.643127267,61,74.81533438,...,6.93,18.38,17.63,17.99,3798063,86.161,0.619042484,91.875,18.7,100
8,2013,200,200,51.97083813,21.18686484,30.78397329,12.41178227,0.675342566,62,72.38263652,...,5.840000153,15.60999966,15.0,15.28999996,3830023,86.221,0.837960847,91.875,18.5,100
9,2014,200,200,52.44643363,21.7989118,30.64752184,11.53734646,0.685766425,64,70.61082241,...,5.429999828,14.75,13.52000046,14.10000038,3896881,86.281,1.730568118,91.875,18.2,100
10,2015,200,200,52.94728969,22.40242783,30.54486186,11.87077685,0.848069383,68,69.07275519,...,5.41,13.86,13.86,13.86,3979802,86.341,2.105557973,91.875,17.8,100
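
The two loops above keep only the columns without missing values. A minimal sketch of the same idea on a toy frame, assuming that `DataFrame.dropna(axis=1)` is an acceptable substitute for the loop: it removes every column that contains at least one missing value, which should give the same result whenever a single missing value already rounds to a non-zero percentage. The frame `toy` and its column names are made up for illustration and are not part of the World Bank data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for AUS_structured / NZL_structured (illustrative values only)
toy = pd.DataFrame({
    "Year": [2006, 2007, 2008],
    "Indicator A": [1.0, 2.0, 3.0],
    "Indicator B": [np.nan, 5.0, 6.0],
})

# Share of missing values per column, in percent
pct_missing = toy.isnull().mean() * 100

# Keep only the columns that have no missing value at all
toy_clean = toy.dropna(axis=1)
kept_columns = list(toy_clean.columns)
print(kept_columns)
```

Unlike the in-place loop, this returns a new frame and leaves the original untouched, which makes it easier to re-run the cell.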


#### Feature Selection

In [32]:
# Out of the remaining columns, 85 have been identified based on domain knowledge;
# these will be used for further analysis in Tableau.
# The same list is applied to both countries.
selected_columns = [
    "Year",
    # HIV
    "Adults (ages 15+) and children (ages 0-14) newly infected with HIV",
    "Adults (ages 15-49) newly infected with HIV",
    "Incidence of HIV, ages 15-24 (per 1,000 uninfected population ages 15-24)",
    "Incidence of HIV, ages 15-49 (per 1,000 uninfected population ages 15-49)",
    "Incidence of HIV, all (per 1,000 uninfected population)",
    "Prevalence of HIV, female (% ages 15-24)",
    "Prevalence of HIV, male (% ages 15-24)",
    "Prevalence of HIV, total (% of population ages 15-49)",
    "Women's share of population ages 15+ living with HIV (%)",
    "Young people (ages 15-24) newly infected with HIV",
    # Population
    "Population ages 00-04, female (% of female population)",
    "Population ages 00-04, male (% of male population)",
    "Population ages 0-14 (% of total population)",
    "Population ages 0-14, female",
    "Population ages 0-14, female (% of female population)",
    "Population ages 0-14, male",
    "Population ages 0-14, male (% of male population)",
    "Population ages 0-14, total",
    "Population ages 05-09, female (% of female population)",
    "Population ages 05-09, male (% of male population)",
    "Population ages 10-14, female (% of female population)",
    "Population ages 10-14, male (% of male population)",
    "Population ages 15-19, female (% of female population)",
    "Population ages 15-19, male (% of male population)",
    "Population ages 15-64 (% of total population)",
    "Population ages 15-64, female",
    "Population ages 15-64, female (% of female population)",
    "Population ages 15-64, male",
    "Population ages 15-64, male (% of male population)",
    "Population ages 15-64, total",
    "Population ages 20-24, female (% of female population)",
    "Population ages 20-24, male (% of male population)",
    "Population ages 25-29, female (% of female population)",
    "Population ages 25-29, male (% of male population)",
    "Population ages 30-34, female (% of female population)",
    "Population ages 30-34, male (% of male population)",
    "Population ages 35-39, female (% of female population)",
    "Population ages 35-39, male (% of male population)",
    "Population ages 40-44, female (% of female population)",
    "Population ages 40-44, male (% of male population)",
    "Population ages 45-49, female (% of female population)",
    "Population ages 45-49, male (% of male population)",
    "Population ages 50-54, female (% of female population)",
    "Population ages 50-54, male (% of male population)",
    "Population ages 55-59, female (% of female population)",
    "Population ages 55-59, male (% of male population)",
    "Population ages 60-64, female (% of female population)",
    "Population ages 60-64, male (% of male population)",
    "Population ages 65 and above (% of total population)",
    "Population ages 65 and above, female",
    "Population ages 65 and above, female (% of female population)",
    "Population ages 65 and above, male",
    "Population ages 65 and above, male (% of male population)",
    "Population ages 65 and above, total",
    "Population ages 65-69, female (% of female population)",
    "Population ages 65-69, male (% of male population)",
    "Population ages 70-74, female (% of female population)",
    "Population ages 70-74, male (% of male population)",
    "Population ages 75-79, female (% of female population)",
    "Population ages 75-79, male (% of male population)",
    "Population ages 80 and above, female (% of female population)",
    "Population ages 80 and above, male (% of male population)",
    "Population, female",
    "Population, female (% of total population)",
    "Population, male",
    "Population, male (% of total population)",
    "Population, total",
    # Gender equality
    "Proportion of seats held by women in national parliaments (%)",
    "Women Business and the Law Index Score (scale 1-100)",
    # Employment
    "Employment to population ratio, 15+, female (%) (national estimate)",
    "Employment to population ratio, 15+, male (%) (national estimate)",
    "Employment to population ratio, 15+, total (%) (modeled ILO estimate)",
    "Employment to population ratio, 15+, total (%) (national estimate)",
    "Employment to population ratio, ages 15-24, female (%) (national estimate)",
    "Employment to population ratio, ages 15-24, male (%) (national estimate)",
    "Employment to population ratio, ages 15-24, total (%) (national estimate)",
    "GDP per person employed (constant 2017 PPP $)",
    # Unemployment
    "Unemployment, female (% of female labor force) (national estimate)",
    "Unemployment, male (% of male labor force) (national estimate)",
    "Unemployment, total (% of total labor force) (modeled ILO estimate)",
    "Unemployment, total (% of total labor force) (national estimate)",
    "Unemployment, youth female (% of female labor force ages 15-24) (national estimate)",
    "Unemployment, youth male (% of male labor force ages 15-24) (national estimate)",
    "Unemployment, youth total (% of total labor force ages 15-24) (national estimate)",
]

AUS_preprocessed = AUS_structured[selected_columns]

NZL_preprocessed = NZL_structured[selected_columns]
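
Because the cleaning step drops a different set of columns for each country, a column kept for Australia may no longer exist for New Zealand, and selecting it would raise a `KeyError`. The sketch below shows one way to check this before selecting. The helper `missing_columns` and the toy frames are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import pandas as pd


def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [name for name in wanted if name not in frame.columns]


# Toy frames with placeholder values; in the notebook these would be
# AUS_structured and NZL_structured after the cleaning step
aus_toy = pd.DataFrame({"Year": [2006], "Population, total": [1]})
nzl_toy = pd.DataFrame({"Year": [2006], "Urban population": [1]})

wanted = ["Year", "Population, total"]

aus_missing = missing_columns(aus_toy, wanted)
nzl_missing = missing_columns(nzl_toy, wanted)
print(aus_missing)
print(nzl_missing)
```

An empty list for both countries means the selection can be applied safely to both frames.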


In [33]:
AUS_preprocessed

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,"Incidence of HIV, ages 15-24 (per 1,000 uninfected population ages 15-24)","Incidence of HIV, ages 15-49 (per 1,000 uninfected population ages 15-49)","Incidence of HIV, all (per 1,000 uninfected population)","Prevalence of HIV, female (% ages 15-24)","Prevalence of HIV, male (% ages 15-24)","Prevalence of HIV, total (% of population ages 15-49)",Women's share of population ages 15+ living with HIV (%),...,"Employment to population ratio, ages 15-24, male (%) (national estimate)","Employment to population ratio, ages 15-24, total (%) (national estimate)",GDP per person employed (constant 2017 PPP $),"Unemployment, female (% of female labor force) (national estimate)","Unemployment, male (% of male labor force) (national estimate)","Unemployment, total (% of total labor force) (modeled ILO estimate)","Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)"
1,2006,820,1000,0.03,0.07,0.04,0.1,0.1,0.1,11.7,...,64.47,63.7,86036.90687,4.92,4.67,4.78,4.78,9.47,10.55,10.03
2,2007,880,1000,0.03,0.08,0.04,0.1,0.1,0.1,11.9,...,64.99,64.13,87587.10641,4.78,4.04,4.38,4.38,9.25,9.48,9.37
3,2008,930,1000,0.03,0.08,0.04,0.1,0.1,0.1,12.1,...,65.48,64.51,88309.22179,4.57,3.96,4.23,4.23,8.6,9.02,8.82
4,2009,970,1000,0.03,0.08,0.05,0.1,0.1,0.1,12.3,...,61.62,61.18,89394.68955,5.4,5.69,5.56,5.56,10.35,12.48,11.46
5,2010,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.6,...,61.29,60.52,89365.70721,5.38,5.07,5.21,5.21,11.13,11.95,11.55
6,2011,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.7,...,60.57,60.39,90147.48017,5.3,4.89,5.08,5.08,10.78,11.94,11.38
7,2012,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.9,...,59.73,59.6,92551.26906,5.33,5.14,5.22,5.22,10.96,12.46,11.73
8,2013,1000,1000,0.03,0.08,0.05,0.1,0.1,0.1,13.1,...,58.58,58.66,94022.70698,5.61,5.71,5.66,5.66,11.36,13.04,12.22
9,2014,1000,1000,0.03,0.08,0.04,0.1,0.1,0.1,13.2,...,57.44,57.7,95697.03461,6.17,6.0,6.08,6.08,12.47,14.11,13.31
10,2015,1000,1000,0.03,0.08,0.04,0.1,0.1,0.1,13.3,...,58.15,58.45,95946.07078,6.07,6.04,6.05,6.05,11.91,14.24,13.11


In [34]:
NZL_preprocessed

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,"Incidence of HIV, ages 15-24 (per 1,000 uninfected population ages 15-24)","Incidence of HIV, ages 15-49 (per 1,000 uninfected population ages 15-49)","Incidence of HIV, all (per 1,000 uninfected population)","Prevalence of HIV, female (% ages 15-24)","Prevalence of HIV, male (% ages 15-24)","Prevalence of HIV, total (% of population ages 15-49)",Women's share of population ages 15+ living with HIV (%),...,"Employment to population ratio, ages 15-24, male (%) (national estimate)","Employment to population ratio, ages 15-24, total (%) (national estimate)",GDP per person employed (constant 2017 PPP $),"Unemployment, female (% of female labor force) (national estimate)","Unemployment, male (% of male labor force) (national estimate)","Unemployment, total (% of total labor force) (modeled ILO estimate)","Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)"
1,2006,200,200,0.02,0.06,0.04,0.1,0.1,0.1,19.1,...,60.92,58.0,73981.51254,4.2,3.56,3.86,3.86,10.46,9.48,9.94
2,2007,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,60.31,57.96,74870.73634,3.97,3.4,3.66,3.66,10.17,10.04,10.1
3,2008,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,58.06,55.77,73740.30139,4.26,4.08,4.17,4.17,10.96,11.83,11.41
4,2009,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,53.43,50.97,74654.53926,6.14,6.1,6.12,6.12,17.37,16.19,16.76
5,2010,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.1,...,50.96,49.22,75423.5376,6.93,6.23,6.56,6.56,17.79,17.06,17.41
6,2011,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.9,...,50.56,49.0,76127.9796,6.76,6.25,6.49,6.49,16.35,18.52,17.49
7,2012,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.7,...,50.18,48.45,77909.95667,7.39,6.52,6.93,6.93,18.38,17.63,17.99
8,2013,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.5,...,50.74,49.02,77993.10993,6.420000076,5.309999943,5.77,5.840000153,15.60999966,15.0,15.28999996
9,2014,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.2,...,53.91,51.6,78180.06698,6.130000114,4.800000191,5.37,5.429999828,14.75,13.52000046,14.10000038
10,2015,200,200,0.02,0.06,0.03,0.1,0.1,0.1,17.8,...,54.86,52.91,79373.7439,5.9,4.97,5.35,5.41,13.86,13.86,13.86


In [35]:
# Label each row with its country before joining the two datasets.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.

AUS_preprocessed = AUS_preprocessed.assign(Country="Australia")

NZL_preprocessed = NZL_preprocessed.assign(Country="New Zealand")


In [37]:
NZL_preprocessed

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,"Incidence of HIV, ages 15-24 (per 1,000 uninfected population ages 15-24)","Incidence of HIV, ages 15-49 (per 1,000 uninfected population ages 15-49)","Incidence of HIV, all (per 1,000 uninfected population)","Prevalence of HIV, female (% ages 15-24)","Prevalence of HIV, male (% ages 15-24)","Prevalence of HIV, total (% of population ages 15-49)",Women's share of population ages 15+ living with HIV (%),...,"Employment to population ratio, ages 15-24, total (%) (national estimate)",GDP per person employed (constant 2017 PPP $),"Unemployment, female (% of female labor force) (national estimate)","Unemployment, male (% of male labor force) (national estimate)","Unemployment, total (% of total labor force) (modeled ILO estimate)","Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)",Country
1,2006,200,200,0.02,0.06,0.04,0.1,0.1,0.1,19.1,...,58.0,73981.51254,4.2,3.56,3.86,3.86,10.46,9.48,9.94,New Zealand
2,2007,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,57.96,74870.73634,3.97,3.4,3.66,3.66,10.17,10.04,10.1,New Zealand
3,2008,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,55.77,73740.30139,4.26,4.08,4.17,4.17,10.96,11.83,11.41,New Zealand
4,2009,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.2,...,50.97,74654.53926,6.14,6.1,6.12,6.12,17.37,16.19,16.76,New Zealand
5,2010,200,200,0.02,0.07,0.04,0.1,0.1,0.1,19.1,...,49.22,75423.5376,6.93,6.23,6.56,6.56,17.79,17.06,17.41,New Zealand
6,2011,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.9,...,49.0,76127.9796,6.76,6.25,6.49,6.49,16.35,18.52,17.49,New Zealand
7,2012,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.7,...,48.45,77909.95667,7.39,6.52,6.93,6.93,18.38,17.63,17.99,New Zealand
8,2013,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.5,...,49.02,77993.10993,6.420000076,5.309999943,5.77,5.840000153,15.60999966,15.0,15.28999996,New Zealand
9,2014,200,200,0.02,0.06,0.03,0.1,0.1,0.1,18.2,...,51.6,78180.06698,6.130000114,4.800000191,5.37,5.429999828,14.75,13.52000046,14.10000038,New Zealand
10,2015,200,200,0.02,0.06,0.03,0.1,0.1,0.1,17.8,...,52.91,79373.7439,5.9,4.97,5.35,5.41,13.86,13.86,13.86,New Zealand


In [38]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead
joint_preprocessed = pd.concat([AUS_preprocessed, NZL_preprocessed])

joint_preprocessed

Unnamed: 0,Year,Adults (ages 15+) and children (ages 0-14) newly infected with HIV,Adults (ages 15-49) newly infected with HIV,"Incidence of HIV, ages 15-24 (per 1,000 uninfected population ages 15-24)","Incidence of HIV, ages 15-49 (per 1,000 uninfected population ages 15-49)","Incidence of HIV, all (per 1,000 uninfected population)","Prevalence of HIV, female (% ages 15-24)","Prevalence of HIV, male (% ages 15-24)","Prevalence of HIV, total (% of population ages 15-49)",Women's share of population ages 15+ living with HIV (%),...,"Employment to population ratio, ages 15-24, total (%) (national estimate)",GDP per person employed (constant 2017 PPP $),"Unemployment, female (% of female labor force) (national estimate)","Unemployment, male (% of male labor force) (national estimate)","Unemployment, total (% of total labor force) (modeled ILO estimate)","Unemployment, total (% of total labor force) (national estimate)","Unemployment, youth female (% of female labor force ages 15-24) (national estimate)","Unemployment, youth male (% of male labor force ages 15-24) (national estimate)","Unemployment, youth total (% of total labor force ages 15-24) (national estimate)",Country
1,2006,820,1000,0.03,0.07,0.04,0.1,0.1,0.1,11.7,...,63.7,86036.90687,4.92,4.67,4.78,4.78,9.47,10.55,10.03,Australia
2,2007,880,1000,0.03,0.08,0.04,0.1,0.1,0.1,11.9,...,64.13,87587.10641,4.78,4.04,4.38,4.38,9.25,9.48,9.37,Australia
3,2008,930,1000,0.03,0.08,0.04,0.1,0.1,0.1,12.1,...,64.51,88309.22179,4.57,3.96,4.23,4.23,8.6,9.02,8.82,Australia
4,2009,970,1000,0.03,0.08,0.05,0.1,0.1,0.1,12.3,...,61.18,89394.68955,5.4,5.69,5.56,5.56,10.35,12.48,11.46,Australia
5,2010,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.6,...,60.52,89365.70721,5.38,5.07,5.21,5.21,11.13,11.95,11.55,Australia
6,2011,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.7,...,60.39,90147.48017,5.3,4.89,5.08,5.08,10.78,11.94,11.38,Australia
7,2012,1000,1000,0.03,0.09,0.05,0.1,0.1,0.1,12.9,...,59.6,92551.26906,5.33,5.14,5.22,5.22,10.96,12.46,11.73,Australia
8,2013,1000,1000,0.03,0.08,0.05,0.1,0.1,0.1,13.1,...,58.66,94022.70698,5.61,5.71,5.66,5.66,11.36,13.04,12.22,Australia
9,2014,1000,1000,0.03,0.08,0.04,0.1,0.1,0.1,13.2,...,57.7,95697.03461,6.17,6.0,6.08,6.08,12.47,14.11,13.31,Australia
10,2015,1000,1000,0.03,0.08,0.04,0.1,0.1,0.1,13.3,...,58.45,95946.07078,6.07,6.04,6.05,6.05,11.91,14.24,13.11,Australia
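
The preview above only shows Australian rows, so it is worth confirming that the joined frame really contains both countries. A minimal sketch on toy data, assuming `pd.concat` is used for the join; the frames and their values are illustrative only, and `ignore_index=True` is an optional choice that gives the joined frame a fresh 0-based index.

```python
import pandas as pd

# Toy stand-ins for AUS_preprocessed and NZL_preprocessed
aus_toy = pd.DataFrame({"Year": [2006, 2007]}).assign(Country="Australia")
nzl_toy = pd.DataFrame({"Year": [2006, 2007]}).assign(Country="New Zealand")

# Stack the two frames on top of each other
joint_toy = pd.concat([aus_toy, nzl_toy], ignore_index=True)

# Count the rows contributed by each country
rows_per_country = joint_toy["Country"].value_counts().to_dict()
print(rows_per_country)
```

Both countries should appear with the same number of rows, one per year.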


#### Save Preprocessed Datasets as CSV

In [39]:
AUS_preprocessed.to_csv("AUS_preprocessed.csv", index=False)

In [40]:
NZL_preprocessed.to_csv("NZL_preprocessed.csv", index=False)

In [41]:
joint_preprocessed.to_csv("joint_preprocessed.csv", index=False)
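
The exports use `index=False`, so the DataFrame index is not written as an extra column and Tableau only sees the actual variables. A small sketch of that behaviour, writing to an in-memory buffer instead of a file so that nothing is created on disk; the toy frame is illustrative only.

```python
import io

import pandas as pd

toy = pd.DataFrame({
    "Year": [2006, 2007],
    "Country": ["Australia", "New Zealand"],
})

# Write the CSV into memory without the index column, then read it back
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

restored_columns = list(restored.columns)
print(restored_columns)
```

With `index=True` (the default), the restored frame would contain an additional unnamed column holding the old index.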