# Importing Python packages

In [1]:
import pandas as pd

# Importing the data sets
## Births registration

First we need to import the dataset. Because it is in CSV format, the easiest way is to use the pandas `read_csv` function.

Source: UNICEF (2022) Birth registration data. UNICEF DATA, data.unicef.org. Available at: https://data.unicef.org/resources/dataset/percentage-children-age-5-whose-births-registered-sex-place-residence-household-wealth-quintile/ (Accessed: December 30, 2022).

NOTE: The originally generated CSV file had to be renamed because Git raised a 'Filename too long' error when the file was added to the repository.

In [2]:
CB_percent = pd.read_csv('Percent of children_UNICEF_1.0_all.csv',low_memory=False)

### Exploring the data set

In [13]:
CB_percent.shape

(1648747, 22)

In [15]:
CB_percent.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1648747 entries, 0 to 1648746
Data columns (total 22 columns):
 #   Column                                                                          Non-Null Count    Dtype  
---  ------                                                                          --------------    -----  
 0   DATAFLOW                                                                        1648747 non-null  object 
 1   REF_AREA:Geographic area                                                        1648747 non-null  object 
 2   INDICATOR:Indicator                                                             1648747 non-null  object 
 3   SEX:Sex                                                                         1648747 non-null  object 
 4   TIME_PERIOD:Time period                                                         1648747 non-null  object 
 5   OBS_VALUE:Observation Value                                                     1648747 non-null  object 

We can see that there are a lot of null values; however, no column is completely empty, so we need to analyse the columns further to determine what can be discarded.

#### Analysing which columns can be discarded from the original data set

In [None]:
CB_percent.groupby('DATAFLOW').nunique()

In [None]:
CB_percent.groupby('REF_AREA:Geographic area').nunique()

In [None]:
CB_percent.groupby('INDICATOR:Indicator').nunique()

In [None]:
CB_percent.groupby('SEX:Sex').nunique()

In [None]:
CB_percent.groupby('TIME_PERIOD:Time period').nunique()

In [None]:
CB_percent.groupby('UNIT_MULTIPLIER:Unit multiplier').nunique()

In [None]:
CB_percent.groupby('OBS_STATUS:Observation Status').nunique()

In [None]:
CB_percent.groupby('OBS_CONF:Observation confidentaility').nunique()

In [None]:
CB_percent.groupby('LOWER_BOUND:Lower Bound').nunique()

In [None]:
CB_percent.groupby('UPPER_BOUND:Upper Bound').nunique()

In [None]:
CB_percent.groupby('WGTD_SAMPL_SIZE:Weighted Sample Size').nunique()

In [None]:
CB_percent.groupby('OBS_FOOTNOTE:Observation footnote').nunique()

In [None]:
CB_percent.groupby('SERIES_FOOTNOTE:Series footnote').nunique()

In [None]:
CB_percent.groupby('DATA_SOURCE:Data Source').nunique()

In [None]:
CB_percent.groupby('SOURCE_LINK:Citation of or link to the data source').nunique()

In [None]:
CB_percent.groupby('CUSTODIAN:Custodian').nunique()

In [None]:
CB_percent.groupby('TIME_PERIOD_METHOD:Time period activity related to when the data are collected').nunique()

In [None]:
CB_percent.groupby('REF_PERIOD:Reference Period').nunique()

In [None]:
CB_percent.groupby('COVERAGE_TIME:The period of time for which data are provided').nunique()

In [None]:
CB_percent.groupby('AGE:Current age').nunique()

- `DATAFLOW` has only one value, which adds nothing to this project, so the column can be dropped from the original data set.
- `REF_AREA:Geographic area` contains 330 different geographical areas (countries and regional aggregates) and is one of the main columns we expected to find.
- `INDICATOR:Indicator` lists the indicators that the collected data represents.
- `SEX:Sex` indicates whether the figures refer to *F: Female*, *M: Male* or *_T: Total*. I am not looking into the split by sex at this stage, so I will only use the *_T: Total* values.
- `TIME_PERIOD:Time period` indicates the year the data is reported for, but some fields contain a specific date rather than just a year. This needs further analysis and filtering.
- `UNIT_MULTIPLIER:Unit multiplier` indicates whether the data represents *0: Units* or *3: Thousands*.
- `OBS_STATUS:Observation Status` has 8 unique values describing how the values were obtained. I have decided to discard all rows whose status is not *A: Normal value* or *RP: Reported*.
- `OBS_CONF:Observation confidentaility` indicates whether the data is *F: Free* or *N: Not for publication, restricted for internal use only*.
- `LOWER_BOUND:Lower Bound` contains 289582 different values.
- `UPPER_BOUND:Upper Bound` contains 297085 different values.
- `WGTD_SAMPL_SIZE:Weighted Sample Size` contains 16445 different values.
- `OBS_FOOTNOTE:Observation footnote` contains 1037 different notes about the data points.
- `SERIES_FOOTNOTE:Series footnote` contains 21 different values that provide additional notes about the data points.
- `DATA_SOURCE:Data Source` contains 5007 different references to where the data comes from; they appear to be written mainly in the local language.
- `SOURCE_LINK:Citation of or link to the data source` is self-explanatory and contains 51 unique values.
- `CUSTODIAN:Custodian` contains information about 7 different data custodians.
- `TIME_PERIOD_METHOD:Time period activity related to when the data are collected` has 3 unique values, *EOF: End of fieldwork*, *MOF: Middle of fieldwork* and *OTHER: Other*. This does not seem relevant here, so the column will be dropped from the final data set.
- `REF_PERIOD:Reference Period` contains two unique values, *2000-2019* and *2000-2020*.
- `COVERAGE_TIME:The period of time for which data are provided` contains 377 unique values for the time period the data points cover.
- `AGE:Current age` indicates the age range of the population counted and contains 29 unique values.
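
The per-column counts listed above can also be collected in one pass instead of one `groupby` cell per column. The following is only a minimal sketch on a small toy frame (the `toy` data is hypothetical and stands in for `CB_percent`); it counts distinct and missing values per column and flags columns that hold a single value in every row.

```python
import pandas as pd

# Toy stand-in for CB_percent (hypothetical values, not the UNICEF data).
toy = pd.DataFrame({
    "DATAFLOW": ["flow"] * 4,
    "SEX:Sex": ["_T: Total", "F: Female", "M: Male", "_T: Total"],
    "OBS_FOOTNOTE:Observation footnote": [None, "note", None, None],
})

# One row per column: number of distinct non-null values and of missing values.
summary = pd.DataFrame({
    "unique_values": toy.nunique(),
    "missing": toy.isna().sum(),
})
print(summary)

# A column with exactly one value and no gaps carries no information.
is_constant = (summary["unique_values"] == 1) & (summary["missing"] == 0)
constant_columns = summary[is_constant].index.tolist()
print(constant_columns)
```

On the real data the same two calls would be `CB_percent.nunique()` and `CB_percent.isna().sum()`.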

In [None]:
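# Keep only the columns needed so far. The list is not final: it is still
# undecided whether `TIME_PERIOD:Time period` stays and what else is added.
CB_percent_opt = CB_percent[["REF_AREA:Geographic area", "INDICATOR:Indicator", "TIME_PERIOD:Time period"]]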

In [None]:
CB_percent.head(5)
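
The row-level decisions noted in the column list (keep only *_T: Total* and only the *A: Normal value* or *RP: Reported* statuses) could be applied with a boolean mask. This is a sketch on hypothetical toy rows, not the actual filtering step of this notebook; the names `toy`, `keep_status` and `filtered` are made up for illustration.

```python
import pandas as pd

# Hypothetical rows that mimic the relevant CB_percent columns.
toy = pd.DataFrame({
    "REF_AREA:Geographic area": ["AFG: Afghanistan", "AFG: Afghanistan",
                                 "ALB: Albania", "ALB: Albania"],
    "SEX:Sex": ["_T: Total", "F: Female", "_T: Total", "_T: Total"],
    "OBS_STATUS:Observation Status": ["RP: Reported", "RP: Reported",
                                      "E: Estimated value", "A: Normal value"],
})

# Statuses that are kept; everything else is discarded.
keep_status = ["A: Normal value", "RP: Reported"]

# Both conditions must hold for a row to survive.
mask = (toy["SEX:Sex"] == "_T: Total") & toy["OBS_STATUS:Observation Status"].isin(keep_status)
filtered = toy[mask].reset_index(drop=True)
print(filtered)
```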

#### Further analysis and filtering out non-needed column data

Currently, we are only interested in the percentage of births registered, but we can see that for `AFG: Afghanistan` the data set also contains values for life expectancy. The `DATAFLOW` column also appears to offer little value. Let's explore the values in those (and other) columns.

In [7]:
CB_percent['DATAFLOW'].values

array(['UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators',
       'UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators',
       'UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators', ...,
       'UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators',
       'UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators',
       'UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators'],
      dtype=object)

The value `UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators` seems to repeat throughout the `DATAFLOW` column. Let's check whether the column contains any other values.

In [8]:
Examine = CB_percent[~CB_percent["DATAFLOW"].isin(["UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators"])]
Examine

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age


No, `UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators` is the only value in the `DATAFLOW` column, so we should drop this column from the `Filtered_CB_percent` dataset that we plan to create later on.
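
A shorter way to run the same kind of check is to count the distinct values directly. The sketch below uses a hypothetical three-row frame; `is_constant` and `trimmed` are illustrative names and not part of the notebook.

```python
import pandas as pd

# Hypothetical miniature of the data with a single DATAFLOW value.
toy = pd.DataFrame({
    "DATAFLOW": ["UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indicators"] * 3,
    "OBS_VALUE:Observation Value": [42.3, 98.4, 99.6],
})

# dropna=False counts a missing value as a value of its own, so a column
# with one label plus gaps is not mistaken for a constant column.
is_constant = toy["DATAFLOW"].nunique(dropna=False) == 1

# Drop the column only when it really holds a single value.
trimmed = toy.drop(columns=["DATAFLOW"]) if is_constant else toy
print(trimmed.columns.tolist())
```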

In [9]:
Examine = CB_percent[~CB_percent["SEX:Sex"].isin(["_T: Total"])]
Examine

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age


The same is true for `SEX:Sex`: it contains only one value (i.e. `_T: Total`), so we should drop it from `Filtered_CB_percent` as well.

In [10]:
Examine1 = CB_percent[~CB_percent["UNIT_MULTIPLIER:Unit multiplier"].isin(["0: Units"])]
Examine1

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
62,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2015,42.3,,PCNT: %,RP: Reported,F: Free,,,DHS 2015,,EOF: End of fieldwork,2015,Y0T4: Under 5 years old
125,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ALB: Albania,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,98.4,,PCNT: %,RP: Reported,F: Free,,,DHS 2017-18,,EOF: End of fieldwork,2017-18,Y0T4: Under 5 years old
188,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,DZA: Algeria,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2019,99.6,,PCNT: %,RP: Reported,F: Free,,,MICS 2018-19,,EOF: End of fieldwork,2018-19,Y0T4: Under 5 years old
375,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AGO: Angola,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2016,25.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2015-16,,EOF: End of fieldwork,2015-16,Y0T4: Under 5 years old
562,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ARG: Argentina,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2020,99.7,,PCNT: %,RP: Reported,F: Free,The sample was national and urban,,MICS 2019-20,,EOF: End of fieldwork,2019-20,Y0T4: Under 5 years old
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15234,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_WE: Western Europe,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,100.0,,PCNT: %,E: Estimated value,F: Free,Based on 32 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
15297,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,WORLD: World,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,75.6,,PCNT: %,E: Estimated value,F: Free,Based on 166 countries with a population cover...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
15360,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,YEM: Yemen,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2013,30.7,,PCNT: %,RP: Reported,F: Free,,,DHS 2013,,EOF: End of fieldwork,2013,Y0T4: Under 5 years old
15423,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZMB: Zambia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,14.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2018,,EOF: End of fieldwork,2018,Y0T4: Under 5 years old


In [11]:
Examine1 = Examine1[~Examine1["UNIT_MULTIPLIER:Unit multiplier"].isna()]
Examine1

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age


The same is true for `UNIT_MULTIPLIER:Unit multiplier`, which has only one value, `0: Units`, and 142 empty rows.
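
The two filtering steps used for this column can be condensed into a single call: `value_counts(dropna=False)` lists every label together with the number of missing entries. This is a minimal sketch on a hypothetical series, not on the real column.

```python
import pandas as pd

# Hypothetical column with one label and some missing entries.
toy = pd.Series(["0: Units", None, "0: Units", None, None],
                name="UNIT_MULTIPLIER:Unit multiplier")

# dropna=False keeps the missing entries as their own row in the result.
counts = toy.value_counts(dropna=False)
print(counts)

# The number of empty rows on its own.
missing = int(toy.isna().sum())
print(missing)
```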

In [12]:
Examine1 = CB_percent[~CB_percent["UNIT_MEASURE:Unit of measure"].isin(["YR: Years"])]
Examine1

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
31,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1992,21.355,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
32,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1993,21.444,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
33,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1994,21.534,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
34,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1995,21.624,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
35,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1996,21.714,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15482,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2019,32.210,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15483,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2020,32.242,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15484,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2021,32.303,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15485,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2022,32.395,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total


In [13]:
Examine2 = Examine1[~Examine1["UNIT_MEASURE:Unit of measure"].isin(["PCNT: %"])]
Examine2

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age


For `UNIT_MEASURE:Unit of measure`, we need to check whether every row with the value `PCNT: %` has `INDICATOR:Indicator` equal to `PT_CHLD_Y0T4_REG: Percentage of children under age 5 whose births are registered`.

In [14]:
Examine1 = CB_percent[~CB_percent["UNIT_MEASURE:Unit of measure"].isin(["YR: Years"])]
Examine2 = Examine1[~Examine1["INDICATOR:Indicator"].isin(["PT_CHLD_Y0T4_REG: Percentage of children under age 5 whose births are registered"])]
Examine2

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
31,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1992,21.355,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
32,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1993,21.444,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
33,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1994,21.534,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
34,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1995,21.624,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
35,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,DM_POP_URBN: Share of urban population,_T: Total,1996,21.714,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15481,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2018,32.209,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15482,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2019,32.210,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15483,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2020,32.242,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total
15484,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZWE: Zimbabwe,DM_POP_URBN: Share of urban population,_T: Total,2021,32.303,0: Units,PCNT: %,,,,"233 countries from WUP2018, used WPP2019 total...","United Nations, Department of Economic and Soc...",https://population.un.org/wup/,,,_T: Total


As per the check above, every row where `UNIT_MEASURE:Unit of measure` equals `PCNT: %` also has `INDICATOR:Indicator` equal to `PT_CHLD_Y0T4_REG: Percentage of children under age 5 whose births are registered`, so the `UNIT_MEASURE:Unit of measure` column can also be discarded.
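
Whether a unit maps to a single indicator can also be verified directly by listing the indicators found for each unit. This is worth doing because the output printed for the previous cell includes `DM_POP_URBN: Share of urban population` rows with `PCNT: %`. The sketch below runs on hypothetical toy rows with shortened indicator labels; `unit_is_redundant` is an illustrative name.

```python
import pandas as pd

unit_col = "UNIT_MEASURE:Unit of measure"
ind_col = "INDICATOR:Indicator"

# Hypothetical rows; indicator labels are shortened for readability.
toy = pd.DataFrame({
    unit_col: ["PCNT: %", "PCNT: %", "YR: Years", "PCNT: %"],
    ind_col: ["PT_CHLD_Y0T4_REG", "PT_CHLD_Y0T4_REG", "DM_LIFE_EXP", "DM_POP_URBN"],
})

# Distinct indicators observed for each unit of measure.
indicators_per_unit = toy.groupby(unit_col)[ind_col].unique()
pcnt_indicators = sorted(indicators_per_unit["PCNT: %"])
print(pcnt_indicators)

# The unit column is redundant only if "PCNT: %" maps to one indicator.
unit_is_redundant = len(pcnt_indicators) == 1
print(unit_is_redundant)
```

In this toy case the check fails, because two indicators share the percentage unit. On the real data the column would only be safe to drop if the list held the birth registration indicator alone.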

In [15]:
CB_percent["OBS_STATUS:Observation Status"].value_counts()

RP: Reported          120
E: Estimated value     13
RA: Reanalysed          9
Name: OBS_STATUS:Observation Status, dtype: int64

In [16]:
Examine = CB_percent[~CB_percent["OBS_STATUS:Observation Status"].isna()]
Examine

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
62,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2015,42.3,,PCNT: %,RP: Reported,F: Free,,,DHS 2015,,EOF: End of fieldwork,2015,Y0T4: Under 5 years old
125,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ALB: Albania,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,98.4,,PCNT: %,RP: Reported,F: Free,,,DHS 2017-18,,EOF: End of fieldwork,2017-18,Y0T4: Under 5 years old
188,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,DZA: Algeria,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2019,99.6,,PCNT: %,RP: Reported,F: Free,,,MICS 2018-19,,EOF: End of fieldwork,2018-19,Y0T4: Under 5 years old
375,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AGO: Angola,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2016,25.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2015-16,,EOF: End of fieldwork,2015-16,Y0T4: Under 5 years old
562,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ARG: Argentina,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2020,99.7,,PCNT: %,RP: Reported,F: Free,The sample was national and urban,,MICS 2019-20,,EOF: End of fieldwork,2019-20,Y0T4: Under 5 years old
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15234,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_WE: Western Europe,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,100.0,,PCNT: %,E: Estimated value,F: Free,Based on 32 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
15297,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,WORLD: World,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,75.6,,PCNT: %,E: Estimated value,F: Free,Based on 166 countries with a population cover...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
15360,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,YEM: Yemen,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2013,30.7,,PCNT: %,RP: Reported,F: Free,,,DHS 2013,,EOF: End of fieldwork,2013,Y0T4: Under 5 years old
15423,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZMB: Zambia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,14.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2018,,EOF: End of fieldwork,2018,Y0T4: Under 5 years old


In [17]:
Examine2 = CB_percent[CB_percent["OBS_STATUS:Observation Status"].isna()]
Examine2["OBS_STATUS:Observation Status"].value_counts()

Series([], Name: OBS_STATUS:Observation Status, dtype: int64)
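
The empty result above is expected regardless of how many rows have no status: `value_counts()` skips missing values by default, so applying it to a selection that contains only missing values always returns an empty series. A small sketch on a hypothetical series shows the difference and two ways to get the actual count.

```python
import pandas as pd

# Hypothetical status column with two missing entries.
status = pd.Series(["RP: Reported", None, "E: Estimated value", None],
                   name="OBS_STATUS:Observation Status")

missing_only = status[status.isna()]

# Empty, because value_counts() drops missing values by default.
empty_counts = missing_only.value_counts()
print(empty_counts)

# Either of these reports the number of rows without a status.
n_missing = len(missing_only)
n_missing_alt = int(missing_only.value_counts(dropna=False).sum())
print(n_missing, n_missing_alt)
```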

In [18]:
Estimated_CB_percent = CB_percent[CB_percent["OBS_STATUS:Observation Status"].isin(["E: Estimated value"])]
Estimated_CB_percent

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
4063,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_EECA: Eastern Europe and Central Asia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,99.25,,PCNT: %,E: Estimated value,F: Free,Based on 19 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
4157,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_ESA: Eastern and Southern Africa,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,40.19,,PCNT: %,E: Estimated value,F: Free,Based on 19 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
4690,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_ECA: Europe and Central Asia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,99.62,,PCNT: %,E: Estimated value,F: Free,Based on 51 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
5377,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,DEU: Germany,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2020,100.0,,PCNT: %,E: Estimated value,F: Free,Estimated coverage of birth registration was o...,,Federal Statistical Office,,EOF: End of fieldwork,2020,Y0T4: Under 5 years old
7567,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_LAC: Latin America and Caribbean,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,94.85,,PCNT: %,E: Estimated value,F: Free,Based on 27 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
7692,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNSDG_LDC: Least developed countries,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,45.59,,PCNT: %,E: Estimated value,F: Free,Based on 40 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
9036,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_MENA: Middle East and North Africa,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,91.73,,PCNT: %,E: Estimated value,F: Free,Based on 15 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
10164,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_NA: North America,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,100.0,,PCNT: %,E: Estimated value,F: Free,Based on 2 countries with a population coverag...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
12944,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_SA: South Asia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,70.11,,PCNT: %,E: Estimated value,F: Free,Based on 6 countries with a population coverag...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old
13196,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,UNICEF_SSA: Sub-Saharan Africa,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2021,46.87,,PCNT: %,E: Estimated value,F: Free,Based on 40 countries with a population covera...,,"DHS, MICS, other national surveys, censuses an...",,EOF: End of fieldwork,2012-21,Y0T4: Under 5 years old


In [19]:
Reported_CB_percent = CB_percent[CB_percent["OBS_STATUS:Observation Status"].isin(["RP: Reported"])]
Reported_CB_percent

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
62,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AFG: Afghanistan,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2015,42.3,,PCNT: %,RP: Reported,F: Free,,,DHS 2015,,EOF: End of fieldwork,2015,Y0T4: Under 5 years old
125,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ALB: Albania,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,98.4,,PCNT: %,RP: Reported,F: Free,,,DHS 2017-18,,EOF: End of fieldwork,2017-18,Y0T4: Under 5 years old
188,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,DZA: Algeria,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2019,99.6,,PCNT: %,RP: Reported,F: Free,,,MICS 2018-19,,EOF: End of fieldwork,2018-19,Y0T4: Under 5 years old
375,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,AGO: Angola,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2016,25.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2015-16,,EOF: End of fieldwork,2015-16,Y0T4: Under 5 years old
562,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ARG: Argentina,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2020,99.7,,PCNT: %,RP: Reported,F: Free,The sample was national and urban,,MICS 2019-20,,EOF: End of fieldwork,2019-20,Y0T4: Under 5 years old
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14828,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,VUT: Vanuatu,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2013,43.4,,PCNT: %,RP: Reported,F: Free,Data refer only to children with a birth certi...,,DHS 2013,,EOF: End of fieldwork,2013,Y0T4: Under 5 years old
14953,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,VNM: Viet Nam,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2014,96.1,,PCNT: %,RP: Reported,F: Free,,,MICS 2014,,EOF: End of fieldwork,2014,Y0T4: Under 5 years old
15360,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,YEM: Yemen,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2013,30.7,,PCNT: %,RP: Reported,F: Free,,,DHS 2013,,EOF: End of fieldwork,2013,Y0T4: Under 5 years old
15423,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZMB: Zambia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,14.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2018,,EOF: End of fieldwork,2018,Y0T4: Under 5 years old


In [20]:
Rd_CB_percent_sorted = Reported_CB_percent.sort_values(["OBS_VALUE:Observation Value"])
Rd_CB_percent_sorted

Unnamed: 0,DATAFLOW,REF_AREA:Geographic area,INDICATOR:Indicator,SEX:Sex,TIME_PERIOD:Time period,OBS_VALUE:Observation Value,UNIT_MULTIPLIER:Unit multiplier,UNIT_MEASURE:Unit of measure,OBS_STATUS:Observation Status,OBS_CONF:Observation confidentaility,OBS_FOOTNOTE:Observation footnote,SERIES_FOOTNOTE:Series footnote,DATA_SOURCE:Data Source,SOURCE_LINK:Citation of or link to the data source,TIME_PERIOD_METHOD:Time period activity related to when the data are collected,COVERAGE_TIME:The period of time for which data are provided,AGE:Current age
4658,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ETH: Ethiopia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2016,2.7,,PCNT: %,RP: Reported,F: Free,,,DHS 2016,,EOF: End of fieldwork,2016,Y0T4: Under 5 years old
12788,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,SOM: Somalia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2006,3.0,,PCNT: %,RP: Reported,F: Free,,,MICS 2006,,EOF: End of fieldwork,2006,Y0T4: Under 5 years old
8255,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,MWI: Malawi,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2014,5.6,,PCNT: %,RP: Reported,F: Free,Data refer only to children with a birth certi...,,MICS 2013-14,,EOF: End of fieldwork,2013-14,Y0T4: Under 5 years old
10726,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,PNG: Papua New Guinea,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,13.4,,PCNT: %,RP: Reported,F: Free,,,DHS 2016-18,,EOF: End of fieldwork,2016-18,Y0T4: Under 5 years old
15423,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,ZMB: Zambia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2018,14.0,,PCNT: %,RP: Reported,F: Free,,,DHS 2018,,EOF: End of fieldwork,2018,Y0T4: Under 5 years old
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1500,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,BTN: Bhutan,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2010,99.9,,PCNT: %,RP: Reported,F: Free,,,MICS 2010,,EOF: End of fieldwork,2010,Y0T4: Under 5 years old
14075,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,TKM: Turkmenistan,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2019,99.9,,PCNT: %,RP: Reported,F: Free,,,MICS 2019,,EOF: End of fieldwork,2019,Y0T4: Under 5 years old
999,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,BHR: Bahrain,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2019,100.0,,PCNT: %,RP: Reported,F: Free,,,Information and e-Government Authority,,EOF: End of fieldwork,2019,Y0T4: Under 5 years old
12600,UNICEF:GLOBAL_DATAFLOW(1.0): Cross-sector indi...,SVK: Slovakia,PT_CHLD_Y0T4_REG: Percentage of children under...,_T: Total,2020,100.0,,PCNT: %,RP: Reported,F: Free,,,"Vital statistics, Statistical Office of Slovak...",,EOF: End of fieldwork,2020,Y0T4: Under 5 years old
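The sort above only orders rows numerically if `OBS_VALUE:Observation Value` holds numbers. The original `info()` output listed that column as `object`, so the following is a minimal sketch, on made-up toy rows, of converting such a column before sorting. The name `toy` and its values are illustrative assumptions, not part of the UNICEF data.

```python
import pandas as pd

# Toy rows standing in for the UNICEF table (assumption); the values are
# deliberately stored as strings to mimic an object-dtype column.
toy = pd.DataFrame({
    "REF_AREA:Geographic area": ["ZMB: Zambia", "ETH: Ethiopia", "SVK: Slovakia"],
    "OBS_VALUE:Observation Value": ["14.0", "2.7", "100.0"],
})

# Convert to numbers first; unparseable entries become NaN instead of raising.
toy["OBS_VALUE:Observation Value"] = pd.to_numeric(
    toy["OBS_VALUE:Observation Value"], errors="coerce"
)

# With a numeric column, sort_values orders by magnitude, not alphabetically.
toy_sorted = toy.sort_values("OBS_VALUE:Observation Value")
sorted_areas = toy_sorted["REF_AREA:Geographic area"].tolist()
```

Without the conversion, the strings would sort lexicographically and `"100.0"` would come before `"14.0"`.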


#### 2.1.1.3 Filtering out important data columns

In [21]:
Filtered_Reported_CB_percent = Rd_CB_percent_sorted[["REF_AREA:Geographic area","TIME_PERIOD:Time period","OBS_VALUE:Observation Value"]]
Filtered_Reported_CB_percent

Unnamed: 0,REF_AREA:Geographic area,TIME_PERIOD:Time period,OBS_VALUE:Observation Value
4658,ETH: Ethiopia,2016,2.7
12788,SOM: Somalia,2006,3.0
8255,MWI: Malawi,2014,5.6
10726,PNG: Papua New Guinea,2018,13.4
15423,ZMB: Zambia,2018,14.0
...,...,...,...
1500,BTN: Bhutan,2010,99.9
14075,TKM: Turkmenistan,2019,99.9
999,BHR: Bahrain,2019,100.0
12600,SVK: Slovakia,2020,100.0
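The long SDMX-style column labels make later filters verbose. As an optional sketch, the selected columns could be renamed right after selection. The short names `area`, `year` and `value` and the toy row are hypothetical choices for illustration only.

```python
import pandas as pd

# One toy row with the same column labels as the notebook (values are made up).
toy = pd.DataFrame({
    "REF_AREA:Geographic area": ["ETH: Ethiopia"],
    "TIME_PERIOD:Time period": [2016],
    "OBS_VALUE:Observation Value": [2.7],
    "AGE:Current age": ["Y0T4: Under 5 years old"],
})

# Columns to keep, mapped to shorter (hypothetical) names.
short_names = {
    "REF_AREA:Geographic area": "area",
    "TIME_PERIOD:Time period": "year",
    "OBS_VALUE:Observation Value": "value",
}

# Select by the dict keys, then rename in one chained step.
subset = toy[list(short_names)].rename(columns=short_names)
```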


In [22]:
Filtered_Reported_CB_percent_lessthan50 = Filtered_Reported_CB_percent[Filtered_Reported_CB_percent["OBS_VALUE:Observation Value"]<50]
Filtered_Reported_CB_percent_lessthan50

Unnamed: 0,REF_AREA:Geographic area,TIME_PERIOD:Time period,OBS_VALUE:Observation Value
4658,ETH: Ethiopia,2016,2.7
12788,SOM: Somalia,2006,3.0
8255,MWI: Malawi,2014,5.6
10726,PNG: Papua New Guinea,2018,13.4
15423,ZMB: Zambia,2018,14.0
375,AGO: Angola,2016,25.0
2565,TCD: Chad,2019,25.7
14577,TZA: United Republic of Tanzania,2016,26.4
15360,YEM: Yemen,2013,30.7
14327,UGA: Uganda,2016,32.2


In [23]:
Filtered_Reported_CB_percent_morethan80 = Filtered_Reported_CB_percent[Filtered_Reported_CB_percent["OBS_VALUE:Observation Value"]>80]
Filtered_Reported_CB_percent_morethan80

Unnamed: 0,REF_AREA:Geographic area,TIME_PERIOD:Time period,OBS_VALUE:Observation Value
9475,MMR: Myanmar,2016,81.3
13761,TGO: Togo,2017,82.9
2126,BDI: Burundi,2017,83.5
8568,MHL: Marshall Islands,2017,83.8
6127,HTI: Haiti,2017,84.8
...,...,...,...
1500,BTN: Bhutan,2010,99.9
14075,TKM: Turkmenistan,2019,99.9
999,BHR: Bahrain,2019,100.0
12600,SVK: Slovakia,2020,100.0


From `Filtered_Reported_CB_percent_morethan80` we can see that 75 countries report a birth registration rate above `80%`.

### Observation: 75 countries report a birth registration rate above `80%`
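A count like this can be computed rather than read off the displayed table. The sketch below uses made-up toy rows and assumes a country could appear more than once, in which case the row count and the number of distinct countries differ.

```python
import pandas as pd

# Toy rows (assumption); Togo is repeated on purpose to show the difference
# between counting rows and counting distinct countries.
toy = pd.DataFrame({
    "REF_AREA:Geographic area": ["MMR: Myanmar", "TGO: Togo", "TGO: Togo", "ETH: Ethiopia"],
    "OBS_VALUE:Observation Value": [81.3, 82.9, 85.0, 2.7],
})

# Same boolean-mask filter as in the notebook.
above_80 = toy[toy["OBS_VALUE:Observation Value"] > 80]

n_rows = len(above_80)                                         # rows passing the filter
n_countries = above_80["REF_AREA:Geographic area"].nunique()   # distinct countries
```

If every country appears exactly once in the filtered frame, both numbers agree.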

---

---

## Age of mothers at childbirth

In [24]:
## Importing a data file into a pandas DataFrame

In [25]:
Age_mothers_raw = pd.read_csv('SF_2_3_Age_mothers_childbirth.csv')

### Let's explore this data set

Investigating the DataFrame and cleaning the data

In [26]:
Age_mothers_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 94 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   51 non-null     object 
 1   Unnamed: 1   51 non-null     object 
 2   Unnamed: 2   51 non-null     object 
 3   Unnamed: 3   51 non-null     object 
 4   Unnamed: 4   51 non-null     object 
 5   Unnamed: 5   51 non-null     object 
 6   Unnamed: 6   51 non-null     object 
 7   Unnamed: 7   51 non-null     object 
 8   Unnamed: 8   51 non-null     object 
 9   Unnamed: 9   51 non-null     object 
 10  Unnamed: 10  51 non-null     object 
 11  Unnamed: 11  51 non-null     object 
 12  Unnamed: 12  51 non-null     object 
 13  Unnamed: 13  51 non-null     object 
 14  Unnamed: 14  51 non-null     object 
 15  Unnamed: 15  51 non-null     object 
 16  Unnamed: 16  51 non-null     object 
 17  Unnamed: 17  51 non-null     object 
 18  Unnamed: 18  51 non-null     object 
 19  Unnamed: 1

From `info()` I learned that the data set has 51 rows and 94 columns.

In [27]:
Age_mothers_raw.head(5)

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93
0,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,,,,,,,,,,
1,OECD-Average,..,..,..,..,..,..,..,..,..,...,,,,,,,,,,
2,Australia,27.5,27.5,27.5,27.5,27.5,27.4,27.3,27.3,27.2,...,,,,,,,,,,
3,Austria,27.6,27.5,27.5,27.4,27.4,27.3,27.1,27.0,26.8,...,,,,,,,,,,
4,Belgium,28.0,27.9,27.9,27.8,27.7,27.6,27.5,27.4,27.3,...,,,,,,,,,,


Observations:
1. The first row could be used as the column index and the first column as the row index.
2. The first few rows contain null values, so the question is whether some of the rows or columns can be discarded.

#### Setting first row as column index and first column as row index

In [28]:
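# Work on an explicit copy so the raw DataFrame is left untouched
Age_mothers = Age_mothers_raw.copy()
# Use the first row (Country, 1960, 1961, ...) as the column labels
Age_mothers.columns = Age_mothers.iloc[0]
Age_mothers.head(3)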

Unnamed: 0,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,NaN,NaN.1,NaN.2,NaN.3,NaN.4,NaN.5,NaN.6,NaN.7,NaN.8,NaN.9
0,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,,,,,,,,,,
1,OECD-Average,..,..,..,..,..,..,..,..,..,...,,,,,,,,,,
2,Australia,27.5,27.5,27.5,27.5,27.5,27.4,27.3,27.3,27.2,...,,,,,,,,,,


In [29]:
Age_mothers_index = Age_mothers.set_index('Country')
Age_mothers_index.head(3)

Unnamed: 0_level_0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,,,,,,,,,,
OECD-Average,..,..,..,..,..,..,..,..,..,..,...,,,,,,,,,,
Australia,27.5,27.5,27.5,27.5,27.5,27.4,27.3,27.3,27.2,27.2,...,,,,,,,,,,
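In the output above, the header row (`Country, 1960, 1961, ...`) is still present as a data row. The following is a minimal sketch of one way to promote the first row to column labels and drop it from the data at the same time. The frame `raw` is a made-up stand-in for `Age_mothers_raw`.

```python
import pandas as pd

# Toy stand-in (assumption) for the raw table: the real header sits in row 0.
raw = pd.DataFrame([
    ["Country", "1960", "1961"],
    ["OECD-Average", "..", ".."],
    ["Australia", "27.5", "27.5"],
])

# Keep every row except the header row, as an independent copy.
tidy = raw.iloc[1:].copy()
# Use the header row's values as column labels (a plain list, so no stray name).
tidy.columns = raw.iloc[0].tolist()
# Make the country names the row index.
tidy = tidy.set_index("Country")
```

After this, `tidy` holds only country rows, which avoids the header text being mixed into the year columns.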


In [30]:
#Age_mothers.set_index('Country',inplace=True)

In [31]:
Age_mothers_index.index.duplicated()

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False])
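Scanning a long boolean array by eye is error-prone. As a sketch, the same check can be reduced to a single answer, and any offending labels listed. The index below is a toy example with one deliberate duplicate.

```python
import pandas as pd

# Toy index (assumption) with "Austria" repeated on purpose.
idx = pd.Index(["Australia", "Austria", "Belgium", "Austria"])

# True if at least one label occurs more than once.
has_duplicates = bool(idx.duplicated().any())

# The labels flagged as repeats (second and later occurrences).
duplicate_labels = idx[idx.duplicated()].tolist()
```

`idx.is_unique` gives the inverse answer as a single attribute.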

#### Checking for columns with Null values that can be discarded

In [32]:
Age_mothers_index.isnull().sum()

0
1960     0
1961     0
1962     0
1963     0
1964     0
        ..
NaN     51
NaN     51
NaN     51
NaN     51
NaN     51
Length: 93, dtype: int64

In [33]:
Age_mothers_index.dropna(axis=1,how='all').shape

(51, 61)
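To illustrate what `dropna(axis=1, how='all')` removes, here is a small sketch on a toy frame. Only columns in which every value is null are dropped; partly filled columns are kept. The column names are made up for the example.

```python
import pandas as pd

nan = float("nan")

# Toy frame (assumption): one full column, one partly empty, one fully empty.
toy = pd.DataFrame({
    "1960": [27.5, 27.6],
    "partly_empty": [nan, 1.0],
    "empty": [nan, nan],
})

# Columns where every entry is null.
all_null_cols = toy.columns[toy.isnull().all()].tolist()

# Drop only those fully empty columns; returns a new frame.
trimmed = toy.dropna(axis=1, how="all")
```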

In [34]:
Age_mothers_index.dropna(axis=1,how='all',inplace=True)
Age_mothers_index

Unnamed: 0_level_0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2011.0,2012,2013,2014,2015,2016,2017,2018,2019,2020
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2011.0,2012,2013,2014,2015,2016,2017,2018,2019,2020
OECD-Average,..,..,..,..,..,..,..,..,..,..,...,29.7,29.8,29.9,30.0,30.2,30.3,30.4,30.5,30.6,30.7
Australia,27.5,27.5,27.5,27.5,27.5,27.4,27.3,27.3,27.2,27.2,...,30.0,30.1,30.1,30.2,30.3,30.5,30.6,30.7,30.8,30.7
Austria,27.6,27.5,27.5,27.4,27.4,27.3,27.1,27.0,26.8,26.8,...,30.0,30.2,30.3,30.4,30.6,30.6,30.7,30.9,31.0,31.0
Belgium,28.0,27.9,27.9,27.8,27.7,27.6,27.5,27.4,27.3,27.2,...,29.8,30.0,30.2,30.3,30.4,30.5,30.6,30.7,30.8,30.8
Canada,27.9,27.8,27.8,27.8,27.9,27.8,27.7,27.5,27.3,27.3,...,30.2,30.3,30.4,30.5,30.6,30.7,30.9,31.0,31.2,31.3
Chile,29.3,29.3,29.3,29.2,29.1,29.1,29.0,28.8,28.7,28.6,...,28.0,28.1,28.3,28.5,28.8,29.1,29.4,29.6,29.9,30.1
Czech Republic,25.7,25.6,25.5,25.7,25.8,25.5,25.2,25.0,24.9,24.8,...,29.7,29.8,29.9,29.9,30.0,30.0,30.0,30.1,30.2,30.2
Colombia,29.4,29.4,29.5,29.5,29.6,29.7,29.8,29.8,29.9,29.8,...,26.1,25.9,25.9,26.0,26.2,26.2,26.4,26.5,26.5,26.6
Costa Rica,29.1,29.3,29.3,29.3,29.3,29.3,29.3,29.2,29.1,28.9,...,26.6,26.5,26.7,26.8,27.1,27.2,27.4,27.6,27.9,28.4


In [35]:
Age_mothers_index.info()

<class 'pandas.core.frame.DataFrame'>
Index: 51 entries, Country to Romania
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   1960    51 non-null     object 
 1   1961    51 non-null     object 
 2   1962    51 non-null     object 
 3   1963    51 non-null     object 
 4   1964    51 non-null     object 
 5   1965    51 non-null     object 
 6   1966    51 non-null     object 
 7   1967    51 non-null     object 
 8   1968    51 non-null     object 
 9   1969    51 non-null     object 
 10  1970    51 non-null     object 
 11  1971    51 non-null     object 
 12  1972    51 non-null     object 
 13  1973    51 non-null     object 
 14  1974    51 non-null     object 
 15  1975    51 non-null     object 
 16  1976    51 non-null     object 
 17  1977    51 non-null     object 
 18  1978    51 non-null     object 
 19  1979    51 non-null     object 
 20  1980    51 non-null     object 
 21  1981    51 non-null     object 
 22

In [36]:
# Look up the dtype of the column at position 42 explicitly by position
Age_mothers_index.dtypes.iloc[42]

dtype('float64')
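The `info()` output shows many year columns as `object`, most likely because missing values are written as `..` in the source. The sketch below, on a made-up toy frame, shows one way to turn such columns into floats with `pd.to_numeric`, where `errors="coerce"` maps anything unparseable to `NaN`.

```python
import pandas as pd

# Toy frame (assumption) mimicking the year columns, with ".." as a
# missing-value placeholder.
toy = pd.DataFrame(
    {"1960": ["..", "27.5", "27.6"], "2020": ["30.7", "30.7", "31.0"]},
    index=["OECD-Average", "Australia", "Austria"],
)

# Convert each column; ".." cannot be parsed and becomes NaN.
numeric = toy.apply(pd.to_numeric, errors="coerce")

# Number of missing values per column after conversion.
missing_per_column = numeric.isna().sum()
```

An alternative would be to pass `na_values=".."` to `pd.read_csv` so the placeholder is treated as missing at load time.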

---

---

# Visualisation