## Introduction
This notebook series proposes a data preprocessing script to visualize the correlation between access to clean drinking water and child mortality across the world.
The datasets were downloaded from WHO open data.


### Step 1 - Load and clean the first dataset about drinking water access
We inspect the dataset, keep only the rows and columns we need, and save the result as a new CSV file.

In [3]:
## Load 1st CSV file
import pandas as pd
import numpy as np
import scipy.stats as stats

url1 = 'drinking_water.csv'
df1 = pd.read_csv(url1, delimiter=',')
print(f"df1 type: {type(df1)}")   ## df1 type: <class 'pandas.core.frame.DataFrame'>
print(f"df1 shape: {df1.shape}")  ## df1 shape: (198, 127)

df1 type: <class 'pandas.core.frame.DataFrame'>
df1 shape: (198, 127)


In [7]:
## Check the details of the dataset 
df1.info()

## DETAILS

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 198 entries, 0 to 197
Columns: 127 entries, Unnamed: 0 to 2000.5
dtypes: object(127)
memory usage: 196.6+ KB


In [8]:
## Check the first rows of the dataset
df1.head(8)

Unnamed: 0.1,Unnamed: 0,2020,2020.1,2020.2,2020.3,2020.4,2020.5,2019,2019.1,2019.2,...,2001.2,2001.3,2001.4,2001.5,2000,2000.1,2000.2,2000.3,2000.4,2000.5
0,,Population using at least basic drinking-water...,Population using at least basic drinking-water...,Population using at least basic drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...,Population using at least basic drinking-water...,Population using at least basic drinking-water...,Population using at least basic drinking-water...,...,Population using at least basic drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...,Population using at least basic drinking-water...,Population using at least basic drinking-water...,Population using at least basic drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...,Population using safely managed drinking-water...
1,"Countries, territories and areas",Total,Urban,Rural,Total,Urban,Rural,Total,Urban,Rural,...,Rural,Total,Urban,Rural,Total,Urban,Rural,Total,Urban,Rural
2,Afghanistan,75,100,66,28,36,24,72,98,64,...,21,11,21,8,28,52,21,11,21,8
3,Albania,95,96,94,71,,,95,96,93,...,81,49,,,87,96,80,49,,
4,Algeria,94,96,90,72,74,69,94,96,90,...,84,71,82,53,90,94,83,70,82,52
5,Andorra,100,100,100,91,,,100,100,100,...,100,91,,,100,100,100,91,,
6,Angola,57,72,28,,,,57,72,28,...,22,,,,41,61,21,,,
7,Antigua and Barbuda,,,,,,,,,,...,,,,,98,,,,,


In [9]:
df1.columns

Index(['Unnamed: 0', '2020', '2020.1', '2020.2', '2020.3', '2020.4', '2020.5',
       '2019', '2019.1', '2019.2',
       ...
       '2001.2', '2001.3', '2001.4', '2001.5', '2000', '2000.1', '2000.2',
       '2000.3', '2000.4', '2000.5'],
      dtype='object', length=127)

In [11]:
# Rename columns to be more descriptive
old_column_name = 'Unnamed: 0'
new_column_name = 'Countries'
df1.rename(columns={old_column_name: new_column_name}, inplace=True)

# Display the updated column names
print("\nUpdated column names:")
print(df1.columns)


Updated column names:
Index(['Countries', '2020', '2020.1', '2020.2', '2020.3', '2020.4', '2020.5',
       '2019', '2019.1', '2019.2',
       ...
       '2001.2', '2001.3', '2001.4', '2001.5', '2000', '2000.1', '2000.2',
       '2000.3', '2000.4', '2000.5'],
      dtype='object', length=127)


In [14]:
# Drop row 0, which holds the indicator names rather than country data
row_index = 0
df1 = df1.drop(row_index)

# Display the updated DataFrame
print("\nUpdated DataFrame:")
print(df1)



Updated DataFrame:
                              Countries   2020 2020.1 2020.2 2020.3 2020.4  \
1      Countries, territories and areas  Total  Urban  Rural  Total  Urban   
2                           Afghanistan     75    100     66     28     36   
3                               Albania     95     96     94     71    NaN   
4                               Algeria     94     96     90     72     74   
5                               Andorra    100    100    100     91    NaN   
..                                  ...    ...    ...    ...    ...    ...   
193  Venezuela (Bolivarian Republic of)     94    NaN    NaN    NaN    NaN   
194                            Viet Nam     97     99     96    NaN    NaN   
195                               Yemen     61     77     51    NaN    NaN   
196                              Zambia     65     87     48    NaN     50   
197                            Zimbabwe     63     93     48     30     65   

    2020.5   2019 2019.1 2019.2  ... 2001.2

In [19]:
# Drop row 1, which holds the Total/Urban/Rural sub-headers rather than country data
row_index1 = 1
df1 = df1.drop(row_index1)

# Display the updated DataFrame
print("\nUpdated DataFrame:")
print(df1)


Updated DataFrame:
                              Countries 2020 2020.1 2020.2 2020.3 2020.4  \
2                           Afghanistan   75    100     66     28     36   
3                               Albania   95     96     94     71    NaN   
4                               Algeria   94     96     90     72     74   
5                               Andorra  100    100    100     91    NaN   
6                                Angola   57     72     28    NaN    NaN   
..                                  ...  ...    ...    ...    ...    ...   
193  Venezuela (Bolivarian Republic of)   94    NaN    NaN    NaN    NaN   
194                            Viet Nam   97     99     96    NaN    NaN   
195                               Yemen   61     77     51    NaN    NaN   
196                              Zambia   65     87     48    NaN     50   
197                            Zimbabwe   63     93     48     30     65   

    2020.5 2019 2019.1 2019.2  ... 2001.2 2001.3 2001.4 2001.5 2000
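
The two leftover header rows can also be dropped in a single call. Below is a minimal sketch on a toy frame (the name `toy` and its two-column layout are assumptions for illustration; the values are copied from the preview above). Resetting the index is optional, since the notebook itself keeps the original labels.

```python
import pandas as pd

# Toy frame mimicking the layout above: rows 0 and 1 are leftover header rows.
toy = pd.DataFrame({
    "Countries": [None, "Countries, territories and areas", "Afghanistan", "Albania"],
    "2020": ["Population using at least basic drinking-water...", "Total", "75", "95"],
})

# Drop both header rows at once, then renumber the index from 0.
cleaned = toy.drop(index=[0, 1]).reset_index(drop=True)
print(cleaned)
```

Whether to renumber the index is a matter of taste: keeping the original labels makes it easier to trace a row back to the raw file.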

In [20]:
## Select the columns we want to keep
## For each year, we want the total population using at least basic drinking-water, per country
my_columns = ['Countries', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2000']
df1[my_columns]

## DataFrame shape changes from [196 rows x 127 columns] to [196 rows x 22 columns]

Unnamed: 0,Countries,2020,2019,2018,2017,2016,2015,2014,2013,2012,...,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000
2,Afghanistan,75,72,70,67,64,61,59,56,53,...,46,43,41,38,36,34,32,30,28,28
3,Albania,95,95,94,94,94,93,93,93,92,...,91,90,90,90,89,89,88,87,87,87
4,Algeria,94,94,94,94,94,93,93,93,93,...,92,92,92,91,91,91,91,90,90,90
5,Andorra,100,100,100,100,100,100,100,100,100,...,100,100,100,100,100,100,100,100,100,100
6,Angola,57,57,57,56,55,54,54,53,52,...,50,49,48,47,46,45,44,43,42,41
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,Venezuela (Bolivarian Republic of),94,94,94,94,94,95,95,95,95,...,96,96,96,96,96,97,97,97,97,97
194,Viet Nam,97,96,96,95,94,93,92,91,90,...,88,87,86,86,85,84,83,82,81,81
195,Yemen,61,60,59,58,57,56,55,54,53,...,49,48,47,46,45,44,43,41,41,41
196,Zambia,65,65,64,63,62,61,61,60,59,...,56,55,54,54,53,52,51,50,49,48
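
Instead of typing every year by hand, the same columns can be picked by name pattern. This is a sketch under one assumption taken from the preview above: the column named with a bare year (for example `2020`) holds the "at least basic, Total" figure, while the `.1`, `.2`, ... suffixes mark the other breakdowns. The toy frame and the name `year_columns` are illustrative only.

```python
import pandas as pd

# Toy frame with the same column naming pattern as df1 (values from the preview above).
toy = pd.DataFrame(
    [["Afghanistan", "75", "100", "66", "72", "98", "64"],
     ["Albania", "95", "96", "94", "95", "96", "93"]],
    columns=["Countries", "2020", "2020.1", "2020.2", "2019", "2019.1", "2019.2"],
)

# Keep only the columns whose name is a bare four-digit year.
year_columns = [c for c in toy.columns if c.isdigit() and len(c) == 4]
selected = toy[["Countries"] + year_columns]
print(selected.columns.tolist())
```

The handpicked list in the cell above stays more explicit; the pattern-based version mainly helps if the year range changes.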


In [26]:
# Save the modified DataFrame to a new CSV file
# df1[my_columns].to_csv('new_drinking_water.csv', index=False)

### Step 2 - Load and clean the second dataset about under-five child mortality
The original data is separated by WHO region: Region of the Americas, African Region, Eastern Mediterranean Region, European Region, South-East Asia Region and Western Pacific Region. We will preprocess the data one region at a time, starting with the Region of the Americas.

In [41]:
## Load 2nd CSV file containing data for Region of the Americas

url2 = 'child_mortality_americas.csv'
df2 = pd.read_csv(url2, delimiter=',')
print(f"df2 type: {type(df2)}")   ## df2 type: <class 'pandas.core.frame.DataFrame'>
print(f"df2 shape: {df2.shape}")  ## df2 shape: (2587, 8)

df2 type: <class 'pandas.core.frame.DataFrame'>
df2 shape: (2587, 8)


In [42]:
## Check the details of the dataset 
df2.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2587 entries, 0 to 2586
Data columns (total 8 columns):
 #   Column                                                          Non-Null Count  Dtype 
---  ------                                                          --------------  ----- 
 0   Unnamed: 0                                                      2587 non-null   object
 1   Unnamed: 1                                                      2587 non-null   object
 2   Under-five mortality rate (per 1000 live births) (SDG 3.2.1)    2587 non-null   object
 3   Under-five mortality rate (per 1000 live births) (SDG 3.2.1).1  2561 non-null   object
 4   Under-five mortality rate (per 1000 live births) (SDG 3.2.1).2  2561 non-null   object
 5   Number of deaths among children under-five                      2366 non-null   object
 6   Number of deaths among children under-five.1                    2366 non-null   object
 7   Number of deaths among children under-five.2                 

In [43]:
## Check the first rows of the dataset
df2.head(8)

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Under-five mortality rate (per 1000 live births) (SDG 3.2.1),Under-five mortality rate (per 1000 live births) (SDG 3.2.1).1,Under-five mortality rate (per 1000 live births) (SDG 3.2.1).2,Number of deaths among children under-five,Number of deaths among children under-five.1,Number of deaths among children under-five.2
0,"Countries, territories and areas",Year,Both sexes,Male,Female,Both sexes,Male,Female
1,British Virgin Islands,2021,10.46 [4.75-22.78],11.35 [5.15-24.98],9.49 [4.31-20.63],2 [1-5],1 [1-3],1 [0-2]
2,British Virgin Islands,2020,10.85 [5.05-22.86],11.73 [5.49-24.99],9.85 [4.59-20.78],2 [1-5],1 [1-3],1 [1-2]
3,British Virgin Islands,2019,11.18 [5.39-22.97],12.16 [5.86-25.04],10.14 [4.87-20.86],2 [1-5],1 [1-3],1 [1-2]
4,British Virgin Islands,2018,11.55 [5.71-23.09],12.55 [6.24-25.18],10.49 [5.19-21.02],2 [1-5],1 [1-3],1 [1-2]
5,British Virgin Islands,2017,11.89 [6.05-23.36],12.94 [6.57-25.36],10.86 [5.5-21.29],3 [1-5],1 [1-3],2 [1-2]
6,British Virgin Islands,2016,12.29 [6.4-23.42],13.37 [6.94-25.52],11.15 [5.82-21.38],3 [2-6],2 [1-3],1 [1-3]
7,British Virgin Islands,2015,12.69 [6.82-23.42],13.77 [7.38-25.51],11.51 [6.19-21.36],3 [2-6],2 [1-4],1 [1-3]
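
Each mortality cell combines a point estimate with a bracketed range, which we assume is an uncertainty interval around the estimate. The notebook keeps only the estimate later on, but all three numbers can be separated with a regular expression. A minimal sketch, where the group names `estimate`, `lower` and `upper` are our own labels:

```python
import pandas as pd

# Sample strings copied from the preview above.
rates = pd.Series(["10.46 [4.75-22.78]", "10.85 [5.05-22.86]", "11.18 [5.39-22.97]"])

# Capture the point estimate and the two bounds inside the brackets.
pattern = r"^\s*(?P<estimate>[\d.]+)\s*\[(?P<lower>[\d.]+)-(?P<upper>[\d.]+)\]"
parsed = rates.str.extract(pattern).astype(float)
print(parsed)
```

Cells that do not match the pattern come back as `NaN` in all three columns, so missing values survive the conversion.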


In [44]:
# Rename columns to be more descriptive (one call, one mapping of old name -> new name)
df2.rename(columns={
    'Unnamed: 0': 'Countries',
    'Unnamed: 1': 'Year',
    'Under-five mortality rate (per 1000 live births) (SDG 3.2.1)': 'Under-five mortality rate per 1000',
    'Number of deaths among children under-five': 'Under-five number of deaths',
}, inplace=True)

# Display the updated column names
print("\nUpdated column names:")
print(df2.columns)



Updated column names:
Index(['Countries', 'Year', 'Under-five mortality rate per 1000',
       'Under-five mortality rate (per 1000 live births) (SDG 3.2.1).1',
       'Under-five mortality rate (per 1000 live births) (SDG 3.2.1).2',
       'Under-five number of deaths',
       'Number of deaths among children under-five.1',
       'Number of deaths among children under-five.2'],
      dtype='object')


In [45]:
# Drop row 0, which holds the sub-headers (Year, Both sexes, Male, Female) rather than data
my_row_index = 0
df2 = df2.drop(my_row_index)

# Display the updated DataFrame
print("\nUpdated DataFrame:")
print(df2)


Updated DataFrame:
                               Countries  Year  \
1                 British Virgin Islands  2021   
2                 British Virgin Islands  2020   
3                 British Virgin Islands  2019   
4                 British Virgin Islands  2018   
5                 British Virgin Islands  2017   
...                                  ...   ...   
2582  Venezuela (Bolivarian Republic of)  1955   
2583  Venezuela (Bolivarian Republic of)  1954   
2584  Venezuela (Bolivarian Republic of)  1953   
2585  Venezuela (Bolivarian Republic of)  1952   
2586  Venezuela (Bolivarian Republic of)  1951   

     Under-five mortality rate per 1000  \
1                    10.46 [4.75-22.78]   
2                    10.85 [5.05-22.86]   
3                    11.18 [5.39-22.97]   
4                    11.55 [5.71-23.09]   
5                    11.89 [6.05-23.36]   
...                                 ...   
2582                97.5 [78.45-122.21]   
2583              101.92 [79.64-132

In [46]:
## Select the columns we want to keep
## For each country and year, we want the under-five mortality rate (both sexes)
our_columns = ['Countries', 'Year', 'Under-five mortality rate per 1000']
df2[our_columns]
## DataFrame shape changes from [2586 rows x 8 columns] to [2586 rows x 3 columns]

Unnamed: 0,Countries,Year,Under-five mortality rate per 1000
1,British Virgin Islands,2021,10.46 [4.75-22.78]
2,British Virgin Islands,2020,10.85 [5.05-22.86]
3,British Virgin Islands,2019,11.18 [5.39-22.97]
4,British Virgin Islands,2018,11.55 [5.71-23.09]
5,British Virgin Islands,2017,11.89 [6.05-23.36]
...,...,...,...
2582,Venezuela (Bolivarian Republic of),1955,97.5 [78.45-122.21]
2583,Venezuela (Bolivarian Republic of),1954,101.92 [79.64-132.42]
2584,Venezuela (Bolivarian Republic of),1953,106.68 [80.47-143.83]
2585,Venezuela (Bolivarian Republic of),1952,111.61 [80.98-157.68]


In [47]:
## Pivot the DataFrame so that the second DataFrame has the same layout as the first one:
## one row per country, one column per year
df2_pivot = df2.pivot(index='Countries', columns='Year', values='Under-five mortality rate per 1000')
# Reset the index and rename the columns
df2_pivot = df2_pivot.reset_index().rename_axis(None, axis=1)

# Display the updated DataFrame
df2_pivot.head(10)

Unnamed: 0,Countries,1934,1935,1936,1937,1938,1939,1940,1941,1942,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Anguilla,,,,,,,,,,...,5.55 [3.31-9.42],5.33 [3.09-9.28],5.14 [2.91-9.19],4.97 [2.72-9.1],4.8 [2.56-9.08],4.64 [2.41-8.96],4.49 [2.27-8.83],4.33 [2.13-8.72],4.18 [2.03-8.62],4.05 [1.92-8.52]
1,Antigua and Barbuda,,,,,,,,,,...,9.02 [7.78-10.43],8.55 [7.23-10.06],8.13 [6.72-9.78],7.75 [6.24-9.59],7.43 [5.79-9.49],7.12 [5.37-9.43],6.84 [5.01-9.43],6.59 [4.65-9.43],6.36 [4.31-9.47],6.14 [4-9.52]
2,Argentina,,,,,,,,,,...,13.34 [13.09-13.6],12.75 [12.51-12.99],12.17 [11.94-12.4],11.59 [11.38-11.82],10.97 [10.76-11.19],10.25 [10.06-10.46],9.45 [9.27-9.63],8.58 [8.4-8.77],7.7 [7.46-7.96],6.92 [6.58-7.29]
3,Bahamas,,,,,,,,,,...,14.72 [13.75-15.71],14.41 [13.44-15.41],14.15 [13.21-15.16],13.93 [13-14.95],13.76 [12.8-14.8],13.63 [12.62-14.72],13.53 [12.4-14.72],23.42 [21.09-25.89],13.32 [11.72-15.11],13.16 [11.29-15.36]
4,Barbados,,,,,,,,,,...,14.74 [13.12-16.5],14.55 [12.66-16.73],14.35 [12.16-16.92],14.07 [11.53-17.12],13.77 [10.88-17.29],13.43 [10.24-17.51],13.04 [9.58-17.79],12.66 [8.98-17.97],12.29 [8.38-18.19],11.91 [7.82-18.27]
5,Belize,,,,,,,,,,...,17.68 [16.75-18.64],16.93 [16.02-17.89],16.12 [15.22-17.11],15.32 [14.38-16.34],14.51 [13.43-15.65],13.73 [12.48-15.07],12.98 [11.52-14.59],12.3 [10.62-14.23],11.7 [9.79-13.93],11.19 [9.04-13.75]
6,Bolivia (Plurinational State of),,,,,,,,,,...,37.3 [32.36-42.83],35.22 [30.1-41.01],33.33 [28.05-39.31],31.64 [26.13-37.92],30.2 [24.45-36.84],28.92 [22.87-36.11],27.72 [21.42-35.65],26.59 [19.97-35.33],25.63 [18.65-35.18],24.69 [17.38-35.16]
7,Brazil,266.51 [195.85-364.17],263.67 [202.16-343.62],260.95 [208.39-326.18],258.44 [212.53-313.25],255.46 [214.83-301.59],252.91 [216.39-293.95],250.12 [216.02-288.87],247.06 [213.45-286.1],244.08 [210.01-283.63],...,17.24 [16.9-17.59],16.72 [16.36-17.08],16.3 [15.87-16.74],15.95 [15.39-16.51],16.75 [15.99-17.53],15.39 [14.49-16.34],15.16 [14-16.41],14.94 [13.48-16.58],14.7 [12.91-16.75],14.41 [12.28-16.87]
8,British Virgin Islands,,,,,,,,,,...,13.8 [8.08-23.45],13.44 [7.64-23.54],13.06 [7.23-23.43],12.69 [6.82-23.42],12.29 [6.4-23.42],11.89 [6.05-23.36],11.55 [5.71-23.09],11.18 [5.39-22.97],10.85 [5.05-22.86],10.46 [4.75-22.78]
9,Canada,,,,,,,,,,...,5.58 [5.48-5.68],5.51 [5.42-5.6],5.45 [5.35-5.54],5.39 [5.29-5.48],5.33 [5.24-5.42],5.28 [5.19-5.37],5.23 [5.13-5.32],5.18 [5.06-5.29],5.11 [4.97-5.27],5.04 [4.82-5.26]
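
To make the reshaping step easier to follow, here is the same pivot on a four-row toy frame (values copied from the output above; `long_df`, `wide_df` and the short column name `Rate` are illustrative names, not part of the dataset):

```python
import pandas as pd

# Long format: one row per (country, year) pair.
long_df = pd.DataFrame({
    "Countries": ["Canada", "Canada", "Brazil", "Brazil"],
    "Year": ["2021", "2020", "2021", "2020"],
    "Rate": ["5.04 [4.82-5.26]", "5.11 [4.97-5.27]",
             "14.41 [12.28-16.87]", "14.7 [12.91-16.75]"],
})

# Wide format: one row per country, one column per year.
wide_df = long_df.pivot(index="Countries", columns="Year", values="Rate")
# Turn the country index back into a column and drop the "Year" axis label.
wide_df = wide_df.reset_index().rename_axis(None, axis=1)
print(wide_df)
```

Note that `pivot` raises a `ValueError` if the same (country, year) pair appears more than once, which doubles as a quick check for duplicate rows in the raw file.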


In [48]:
## Select the columns we want to keep
## For each year from 2000 to 2020, we want the under-five mortality rate, per country
keep_columns = ['Countries', '2020', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2000']
df2_pivot[keep_columns]

## DataFrame shape changes from [2586 rows x 3 columns] to [39 rows x 22 columns]

Unnamed: 0,Countries,2020,2019,2018,2017,2016,2015,2014,2013,2012,...,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000
0,Anguilla,4.18 [2.03-8.62],4.33 [2.13-8.72],4.49 [2.27-8.83],4.64 [2.41-8.96],4.8 [2.56-9.08],4.97 [2.72-9.1],5.14 [2.91-9.19],5.33 [3.09-9.28],5.55 [3.31-9.42],...,6.34 [4.03-9.94],6.62 [4.3-10.28],6.91 [4.55-10.63],7.23 [4.78-11],7.56 [5.03-11.4],7.92 [5.28-11.88],8.31 [5.55-12.35],8.71 [5.86-12.87],9.12 [6.2-13.41],9.57 [6.57-13.97]
1,Antigua and Barbuda,6.36 [4.31-9.47],6.59 [4.65-9.43],6.84 [5.01-9.43],7.12 [5.37-9.43],7.43 [5.79-9.49],7.75 [6.24-9.59],8.13 [6.72-9.78],8.55 [7.23-10.06],9.02 [7.78-10.43],...,10.6 [9.33-12.04],11.16 [9.85-12.69],11.75 [10.4-13.3],12.34 [10.96-13.93],12.94 [11.52-14.56],13.53 [12.09-15.17],14.1 [12.65-15.79],14.65 [13.15-16.37],15.12 [13.62-16.85],15.53 [14.02-17.29]
2,Argentina,7.7 [7.46-7.96],8.58 [8.4-8.77],9.45 [9.27-9.63],10.25 [10.06-10.46],10.97 [10.76-11.19],11.59 [11.38-11.82],12.17 [11.94-12.4],12.75 [12.51-12.99],13.34 [13.09-13.6],...,14.96 [14.67-15.25],15.37 [15.08-15.66],15.72 [15.43-16.03],16.11 [15.8-16.43],16.59 [16.27-16.9],17.13 [16.8-17.46],17.72 [17.38-18.05],18.34 [17.99-18.69],18.99 [18.63-19.37],19.68 [19.3-20.04]
3,Bahamas,13.32 [11.72-15.11],23.42 [21.09-25.89],13.53 [12.4-14.72],13.63 [12.62-14.72],13.76 [12.8-14.8],13.93 [13-14.95],14.15 [13.21-15.16],14.41 [13.44-15.41],14.72 [13.75-15.71],...,15.82 [14.85-16.81],16.2 [15.21-17.23],16.53 [15.54-17.57],16.79 [15.79-17.82],16.92 [15.93-17.95],16.91 [15.93-17.95],16.79 [15.82-17.85],16.6 [15.62-17.66],16.41 [15.44-17.47],16.3 [15.33-17.35]
4,Barbados,12.29 [8.38-18.19],12.66 [8.98-17.97],13.04 [9.58-17.79],13.43 [10.24-17.51],13.77 [10.88-17.29],14.07 [11.53-17.12],14.35 [12.16-16.92],14.55 [12.66-16.73],14.74 [13.12-16.5],...,15.14 [13.99-16.4],15.31 [14.17-16.54],15.47 [14.34-16.68],15.63 [14.49-16.83],15.74 [14.62-16.93],15.78 [14.66-16.97],15.72 [14.62-16.89],15.59 [14.49-16.73],15.38 [14.31-16.51],15.16 [14.1-16.3]
5,Belize,11.7 [9.79-13.93],12.3 [10.62-14.23],12.98 [11.52-14.59],13.73 [12.48-15.07],14.51 [13.43-15.65],15.32 [14.38-16.34],16.12 [15.22-17.11],16.93 [16.02-17.89],17.68 [16.75-18.64],...,19.45 [18.48-20.44],19.88 [18.91-20.88],20.26 [19.29-21.27],20.6 [19.64-21.63],20.92 [19.94-21.97],21.28 [20.28-22.34],21.7 [20.66-22.79],22.21 [21.08-23.4],22.85 [21.59-24.2],23.61 [22.13-25.21]
6,Bolivia (Plurinational State of),25.63 [18.65-35.18],26.59 [19.97-35.33],27.72 [21.42-35.65],28.92 [22.87-36.11],30.2 [24.45-36.84],31.64 [26.13-37.92],33.33 [28.05-39.31],35.22 [30.1-41.01],37.3 [32.36-42.83],...,44.44 [39.7-49.5],47.13 [42.41-52.15],50 [45.29-54.97],53.07 [48.47-57.99],56.34 [51.82-61.19],59.79 [55.29-64.62],63.46 [58.92-68.26],67.32 [62.74-72.1],71.37 [66.74-76.24],75.63 [70.94-80.62]
7,Brazil,14.7 [12.91-16.75],14.94 [13.48-16.58],15.16 [14-16.41],15.39 [14.49-16.34],16.75 [15.99-17.53],15.95 [15.39-16.51],16.3 [15.87-16.74],16.72 [16.36-17.08],17.24 [16.9-17.59],...,19.55 [18.86-20.27],20.61 [19.68-21.6],21.84 [20.67-23.07],23.24 [21.85-24.7],24.8 [23.21-26.49],26.53 [24.8-28.42],28.43 [26.53-30.43],30.41 [28.39-32.57],32.52 [30.36-34.8],34.73 [32.47-37.14]
8,British Virgin Islands,10.85 [5.05-22.86],11.18 [5.39-22.97],11.55 [5.71-23.09],11.89 [6.05-23.36],12.29 [6.4-23.42],12.69 [6.82-23.42],13.06 [7.23-23.43],13.44 [7.64-23.54],13.8 [8.08-23.45],...,14.8 [9.45-23.05],15.05 [9.93-22.76],15.25 [10.39-22.34],15.33 [10.84-21.72],15.4 [11.24-21.11],15.45 [11.62-20.53],15.47 [11.94-20.11],15.55 [12.22-19.81],15.63 [12.49-19.55],15.74 [12.71-19.42]
9,Canada,5.11 [4.97-5.27],5.18 [5.06-5.29],5.23 [5.13-5.32],5.28 [5.19-5.37],5.33 [5.24-5.42],5.39 [5.29-5.48],5.45 [5.35-5.54],5.51 [5.42-5.6],5.58 [5.48-5.68],...,5.8 [5.7-5.9],5.88 [5.78-5.98],5.96 [5.86-6.07],6.03 [5.93-6.15],6.1 [5.99-6.21],6.14 [6.03-6.25],6.17 [6.06-6.27],6.18 [6.08-6.29],6.2 [6.09-6.3],6.23 [6.12-6.34]


In [49]:
### Keep only the point estimate: drop everything from the "[" onward in every column
final_df2 = df2_pivot[keep_columns].copy()  # explicit copy, so the assignment below does not trigger SettingWithCopyWarning
# Split columns
for column in final_df2.columns:
    final_df2[column] = final_df2[column].str.split('[').str[0]

print(final_df2)

                             Countries    2020    2019    2018    2017  \
0                             Anguilla   4.18    4.33    4.49    4.64    
1                  Antigua and Barbuda   6.36    6.59    6.84    7.12    
2                            Argentina    7.7    8.58    9.45   10.25    
3                              Bahamas  13.32   23.42   13.53   13.63    
4                             Barbados  12.29   12.66   13.04   13.43    
5                               Belize   11.7    12.3   12.98   13.73    
6     Bolivia (Plurinational State of)  25.63   26.59   27.72   28.92    
7                               Brazil   14.7   14.94   15.16   15.39    
8               British Virgin Islands  10.85   11.18   11.55   11.89    
9                               Canada   5.11    5.18    5.23    5.28    
10                               Chile   6.85    7.15     7.4    7.61    
11                            Colombia  13.25   13.72   14.14   14.62    
12                          Costa Rica


In [50]:
## Check the dataset after all the changes
final_df2.head(10)

Unnamed: 0,Countries,2020,2019,2018,2017,2016,2015,2014,2013,2012,...,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000
0,Anguilla,4.18,4.33,4.49,4.64,4.8,4.97,5.14,5.33,5.55,...,6.34,6.62,6.91,7.23,7.56,7.92,8.31,8.71,9.12,9.57
1,Antigua and Barbuda,6.36,6.59,6.84,7.12,7.43,7.75,8.13,8.55,9.02,...,10.6,11.16,11.75,12.34,12.94,13.53,14.1,14.65,15.12,15.53
2,Argentina,7.7,8.58,9.45,10.25,10.97,11.59,12.17,12.75,13.34,...,14.96,15.37,15.72,16.11,16.59,17.13,17.72,18.34,18.99,19.68
3,Bahamas,13.32,23.42,13.53,13.63,13.76,13.93,14.15,14.41,14.72,...,15.82,16.2,16.53,16.79,16.92,16.91,16.79,16.6,16.41,16.3
4,Barbados,12.29,12.66,13.04,13.43,13.77,14.07,14.35,14.55,14.74,...,15.14,15.31,15.47,15.63,15.74,15.78,15.72,15.59,15.38,15.16
5,Belize,11.7,12.3,12.98,13.73,14.51,15.32,16.12,16.93,17.68,...,19.45,19.88,20.26,20.6,20.92,21.28,21.7,22.21,22.85,23.61
6,Bolivia (Plurinational State of),25.63,26.59,27.72,28.92,30.2,31.64,33.33,35.22,37.3,...,44.44,47.13,50.0,53.07,56.34,59.79,63.46,67.32,71.37,75.63
7,Brazil,14.7,14.94,15.16,15.39,16.75,15.95,16.3,16.72,17.24,...,19.55,20.61,21.84,23.24,24.8,26.53,28.43,30.41,32.52,34.73
8,British Virgin Islands,10.85,11.18,11.55,11.89,12.29,12.69,13.06,13.44,13.8,...,14.8,15.05,15.25,15.33,15.4,15.45,15.47,15.55,15.63,15.74
9,Canada,5.11,5.18,5.23,5.28,5.33,5.39,5.45,5.51,5.58,...,5.8,5.88,5.96,6.03,6.1,6.14,6.17,6.18,6.2,6.23
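
After the split, the year columns still hold text (note the trailing space left in front of the removed bracket), so they need converting before any correlation can be computed. A minimal sketch on a toy frame shaped like `final_df2`; the name `toy` is an assumption for illustration:

```python
import pandas as pd

# Toy frame shaped like final_df2: values from the preview above, still stored as text.
toy = pd.DataFrame({
    "Countries": ["Anguilla", "Argentina"],
    "2020": ["4.18 ", "7.7 "],
    "2019": ["4.33 ", "8.58 "],
})

year_columns = [c for c in toy.columns if c != "Countries"]
for column in year_columns:
    # Strip stray whitespace, then convert; unparseable cells become NaN.
    toy[column] = pd.to_numeric(toy[column].str.strip(), errors="coerce")

print(toy.dtypes)
```

Using `errors="coerce"` keeps the conversion from failing on empty or malformed cells, at the cost of silently turning them into `NaN`, so a quick `isna().sum()` check afterwards is worthwhile.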


In [51]:
# Save the modified DataFrame to a new CSV file
# final_df2.to_csv('new_child_mortality_americas.csv', index=False)

### Step 3 - Repeat the data transformations under Step 2 for all the other WHO regions
We've prepared the data for the Region of the Americas.
The same steps still need to be applied to the African Region, Eastern Mediterranean Region, European Region, South-East Asia Region and Western Pacific Region. Go ahead and preprocess the data one region at a time.
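
Since every region goes through the same transformations, Step 2 can be wrapped in a function. The sketch below is one possible way to do it, not the notebook's own code: `clean_region` is a hypothetical helper, it assumes the other regional files share the column layout of the Americas file, and it additionally strips the whitespace left in front of the bracket.

```python
import pandas as pd

RATE_RAW = "Under-five mortality rate (per 1000 live births) (SDG 3.2.1)"
RATE = "Under-five mortality rate per 1000"

def clean_region(raw, years):
    """Apply the Step 2 transformations to one raw regional DataFrame (hypothetical helper)."""
    df = raw.rename(columns={"Unnamed: 0": "Countries", "Unnamed: 1": "Year", RATE_RAW: RATE})
    df = df.drop(index=0)  # leftover sub-header row
    wide = df.pivot(index="Countries", columns="Year", values=RATE)
    wide = wide.reset_index().rename_axis(None, axis=1)
    wide = wide[["Countries"] + years].copy()
    for year in years:
        # Keep the point estimate only, dropping the bracketed range.
        wide[year] = wide[year].str.split("[").str[0].str.strip()
    return wide

# Toy raw frame with the same layout as the Americas file (values from the outputs above).
raw = pd.DataFrame({
    "Unnamed: 0": ["Countries, territories and areas", "Canada", "Canada"],
    "Unnamed: 1": ["Year", "2021", "2020"],
    RATE_RAW: ["Both sexes", "5.04 [4.82-5.26]", "5.11 [4.97-5.27]"],
})
result = clean_region(raw, ["2020"])
print(result)
```

In practice the function would be called once per file, for example on `pd.read_csv('child_mortality_americas.csv')` and on the corresponding file for each other region, whose names are not given in this notebook.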

In [None]:
## See the next Notebook