# The World Bank EdStats Data Analysis

The World Bank EdStats All Indicator Query holds over 4,000 internationally comparable indicators that describe education access, progression, completion, literacy, teachers, population, and expenditures. The indicators cover the education cycle from pre-primary to vocational and tertiary education. The query also holds learning outcome data from international and regional learning assessments (e.g. PISA, TIMSS, PIRLS), equity data from household surveys, and projection/attainment data to 2050.

With 189 member countries, staff from more than 170 countries, and offices in over 130 locations, the World Bank Group is a unique global partnership: five institutions working for sustainable solutions that reduce poverty and build shared prosperity in developing countries.

The World Bank Group works in every major area of development. They provide a wide array of financial products and technical assistance, and they help countries share and apply innovative knowledge and solutions to the challenges they face.

__Topics:__ Education, Gender

__Granularity:__ National

__Geographical Coverage:__ World East Asia & Pacific American Samoa Australia Brunei Darussalam Cambodia China Fiji French Polynesia Guam Hong Kong SAR, China Indonesia Japan Kiribati Korea, Dem. People's Rep. Korea, Rep. Lao PDR Macao SAR, China Malaysia Marshall Islands Mongolia Myanmar Nauru New Caledonia New Zealand Northern Mariana Islands Palau Papua New Guinea Philippines Samoa Singapore Solomon Islands Thailand Timor-Leste Tonga Tuvalu Vanuatu Vietnam Europe & Central Asia Albania Andorra Armenia Austria Azerbaijan Belarus Belgium Bosnia and Herzegovina Bulgaria Croatia Cyprus Czech Republic Denmark Estonia Faroe Islands Finland France Georgia Germany Gibraltar Greece Greenland Hungary Iceland Ireland Isle of Man Italy Kazakhstan Kyrgyz Republic Latvia Liechtenstein Lithuania Luxembourg Moldova Monaco Montenegro Netherlands North Macedonia Norway Poland Portugal Romania Russian Federation San Marino Serbia Slovak Republic Slovenia Spain Sweden Switzerland Tajikistan Turkey Turkmenistan Ukraine United Kingdom Uzbekistan Latin America & Caribbean Antigua and Barbuda Aruba Argentina Bahamas, The Barbados Belize Bolivia Brazil Cayman Islands Chile Costa Rica Colombia Cuba Curaçao Dominica Dominican Republic Ecuador El Salvador Grenada Guatemala Guyana Haiti Honduras Jamaica Mexico Nicaragua Panama Paraguay Peru Puerto Rico Sint Maarten (Dutch part) St. Kitts and Nevis St. Martin (French part) St. Lucia St. Vincent and the Grenadines Suriname Trinidad and Tobago Turks and Caicos Islands Uruguay Venezuela, RB Virgin Islands (U.S.) Middle East & North Africa Algeria Bahrain Egypt, Arab Rep. Djibouti Iraq Iran, Islamic Rep. Israel Jordan Kuwait Lebanon Libya Malta Morocco Oman Qatar Saudi Arabia Syrian Arab Republic United Arab Emirates Tunisia Yemen, Rep. 
Bermuda Canada United States South Asia Afghanistan Bangladesh Bhutan India Pakistan Nepal Maldives Sri Lanka Angola Benin Botswana Burkina Faso Burundi Cabo Verde Cameroon Central African Republic Chad Comoros Congo, Dem. Rep. Congo, Rep. Côte d'Ivoire Ethiopia Eritrea Equatorial Guinea Gabon Gambia, The Ghana Guinea Guinea-Bissau Kenya Lesotho Liberia Madagascar Malawi Mali Mauritania Mauritius Mozambique Namibia Niger Nigeria Rwanda São Tomé and Principe Seychelles Senegal Sierra Leone Somalia South Africa South Sudan Sudan Eswatini Tanzania Togo Uganda Zambia Zimbabwe

__Economy Coverage:__ High Income IBRD IDA Low Income

__Number of Economies:__ 214

__Periodicity:__ Annual

__Temporal Coverage:__ 1970 - 2100

_________________________________________________________________________________________________________

## Importing libraries

In [102]:
import numpy as np  # linear algebra
import pandas as pd  # data processing
import matplotlib.pyplot as plt  # for Visualization
%matplotlib inline

import seaborn as sns

## Reading Data

In [103]:
Data = pd.read_csv('EdStatsData.csv')
Data.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,...,2060,2065,2070,2075,2080,2085,2090,2095,2100,Unnamed: 69
0,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2,,,,,,,...,,,,,,,,,,
1,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.F,,,,,,,...,,,,,,,,,,
2,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.GPI,,,,,,,...,,,,,,,,,,
3,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.M,,,,,,,...,,,,,,,,,,
4,Arab World,ARB,"Adjusted net enrolment rate, primary, both sex...",SE.PRM.TENR,54.822121,54.894138,56.209438,57.267109,57.991138,59.36554,...,,,,,,,,,,


The preview already shows many missing values, so we check the data in more detail.

### Missing values

In [101]:
Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 886930 entries, 0 to 886929
Data columns (total 70 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Country Name    886930 non-null  object 
 1   Country Code    886930 non-null  object 
 2   Indicator Name  886930 non-null  object 
 3   Indicator Code  886930 non-null  object 
 4   1970            72288 non-null   float64
 5   1971            35537 non-null   float64
 6   1972            35619 non-null   float64
 7   1973            35545 non-null   float64
 8   1974            35730 non-null   float64
 9   1975            87306 non-null   float64
 10  1976            37483 non-null   float64
 11  1977            37574 non-null   float64
 12  1978            37576 non-null   float64
 13  1979            36809 non-null   float64
 14  1980            89122 non-null   float64
 15  1981            38777 non-null   float64
 16  1982            37511 non-null   float64
 17  1983      

__Number of null values per column, sorted in decreasing order__

In [122]:
Data.isnull().sum().sort_values(ascending=False).head(66)

Unnamed: 69    886930
2017           886787
2016           870470
1971           851393
1973           851385
                ...  
2011           740918
2012           739666
2000           710254
2005           702822
2010           644488
Length: 66, dtype: int64

The data set is clearly incomplete. Even 2010, the most complete year, has 644488 missing values out of 886930 rows (about 73%). However, the file covers 3665 indicators, so what matters is completeness for the indicators relevant to our problem, which we check per indicator next.

In [105]:
Data.groupby('Indicator Code').count()


Unnamed: 0_level_0,Country Name,Country Code,Indicator Name,1970,1971,1972,1973,1974,1975,1976,...,2060,2065,2070,2075,2080,2085,2090,2095,2100,Unnamed: 69
Indicator Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
BAR.NOED.1519.FE.ZS,242,242,242,144,0,0,0,0,144,0,...,0,0,0,0,0,0,0,0,0,0
BAR.NOED.1519.ZS,242,242,242,144,0,0,0,0,144,0,...,0,0,0,0,0,0,0,0,0,0
BAR.NOED.15UP.FE.ZS,242,242,242,144,0,0,0,0,144,0,...,0,0,0,0,0,0,0,0,0,0
BAR.NOED.15UP.ZS,242,242,242,144,0,0,0,0,144,0,...,0,0,0,0,0,0,0,0,0,0
BAR.NOED.2024.FE.ZS,242,242,242,144,0,0,0,0,144,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UIS.XUNIT.USCONST.3.FSGOV,242,242,242,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
UIS.XUNIT.USCONST.4.FSGOV,242,242,242,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
UIS.XUNIT.USCONST.56.FSGOV,242,242,242,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
XGDP.23.FSGOV.FDINSTADM.FFD,242,242,242,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
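The per-indicator counts above can be reduced to a completeness ranking for a single year. The sketch below uses a tiny made-up frame with the same column layout as `Data`; the helper name `indicator_coverage` and the toy values are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def indicator_coverage(df, year):
    """Share of rows (countries) with a non-null value, per indicator, for one year column."""
    coverage = df[year].notna().groupby(df["Indicator Code"]).mean()
    return coverage.sort_values(ascending=False)

# Toy data (hypothetical values) mimicking the EdStatsData layout
toy = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "AAA", "BBB"],
    "Indicator Code": ["IND.1", "IND.1", "IND.2", "IND.2"],
    "2010": [1.0, 2.0, np.nan, 3.0],
})

coverage_2010 = indicator_coverage(toy, "2010")
print(coverage_2010)
```

On the real data, the same call with `Data` and a chosen year would rank all 3665 indicators by how many countries report them.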


### Checking duplicated data

In [38]:
Data.duplicated().sum()

0
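`Data.duplicated()` only flags rows that are identical in every column. A stricter check, sketched here on a toy frame with hypothetical values, asks whether any (country, indicator) pair appears more than once:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB"],
    "Indicator Code": ["IND.1", "IND.1", "IND.1"],
    "2010": [1.0, 2.0, 3.0],  # the first two rows differ only in the value
})

# Rows identical in every column
full_row_dupes = toy.duplicated().sum()
# Rows that repeat an earlier (country, indicator) key
key_dupes = toy.duplicated(subset=["Country Code", "Indicator Code"]).sum()
print(full_row_dupes, key_dupes)
```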

### The number of rows & columns of our Data

In [4]:
Data.shape

(886930, 70)

__Features of our Data__

In [4]:
Data.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978',
       '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987',
       '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996',
       '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005',
       '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2020', '2025', '2030', '2035', '2040', '2045',
       '2050', '2055', '2060', '2065', '2070', '2075', '2080', '2085', '2090',
       '2095', '2100', 'Unnamed: 69'],
      dtype='object')
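The column listing shows three kinds of columns: four identifier columns, year columns (annual from 1970 to 2017, then five-year steps from 2020 to 2100), and an empty `Unnamed: 69`. A minimal sketch for separating them, assuming that year columns are exactly the all-digit labels and that 2017 is the last observed year (the cutoff and the helper name `split_columns` are assumptions for illustration):

```python
def split_columns(columns, last_observed=2017):
    """Split column labels into id columns, observed years and projection years."""
    years = [c for c in columns if c.isdigit()]
    observed = [c for c in years if int(c) <= last_observed]
    projected = [c for c in years if int(c) > last_observed]
    ids = [c for c in columns if not c.isdigit() and not c.startswith("Unnamed")]
    return ids, observed, projected

# A shortened version of the label list printed above
cols = ["Country Name", "Country Code", "Indicator Name", "Indicator Code",
        "1970", "2016", "2017", "2020", "2100", "Unnamed: 69"]
ids, observed, projected = split_columns(cols)
print(ids)
print(observed)
print(projected)
```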

## Reading Country Data

In [77]:
Country = pd.read_csv('C:/Users/azade/Desktop/OC/Projet 2/EdStatsCountry.csv')
Country.head()

Unnamed: 0,Country Code,Short Name,Table Name,Long Name,2-alpha code,Currency Unit,Special Notes,Region,Income Group,WB-2 code,...,IMF data dissemination standard,Latest population census,Latest household survey,Source of most recent Income and expenditure data,Vital registration complete,Latest agricultural census,Latest industrial data,Latest trade data,Latest water withdrawal data,Unnamed: 31
0,ABW,Aruba,Aruba,Aruba,AW,Aruban florin,SNA data for 2000-2011 are updated from offici...,Latin America & Caribbean,High income: nonOECD,AW,...,,2010,,,Yes,,,2012.0,,
1,AFG,Afghanistan,Afghanistan,Islamic State of Afghanistan,AF,Afghan afghani,Fiscal year end: March 20; reporting period fo...,South Asia,Low income,AF,...,General Data Dissemination System (GDDS),1979,"Multiple Indicator Cluster Survey (MICS), 2010/11","Integrated household survey (IHS), 2008",,2013/14,,2012.0,2000.0,
2,AGO,Angola,Angola,People's Republic of Angola,AO,Angolan kwanza,"April 2013 database update: Based on IMF data,...",Sub-Saharan Africa,Upper middle income,AO,...,General Data Dissemination System (GDDS),1970,"Malaria Indicator Survey (MIS), 2011","Integrated household survey (IHS), 2008",,2015,,,2005.0,
3,ALB,Albania,Albania,Republic of Albania,AL,Albanian lek,,Europe & Central Asia,Upper middle income,AL,...,General Data Dissemination System (GDDS),2011,"Demographic and Health Survey (DHS), 2008/09",Living Standards Measurement Study Survey (LSM...,Yes,2012,2010.0,2012.0,2006.0,
4,AND,Andorra,Andorra,Principality of Andorra,AD,Euro,,Europe & Central Asia,High income: nonOECD,AD,...,,2011. Population figures compiled from adminis...,,,Yes,,,2006.0,,


### Number of missing values

In [125]:
Country.isnull().sum().sort_values(ascending=False).head(20)

Unnamed: 31                                          241
National accounts reference year                     209
Alternative conversion factor                        194
Other groups                                         183
Latest industrial data                               134
Vital registration complete                          130
External debt Reporting status                       117
Latest household survey                              100
Latest agricultural census                            99
Lending category                                      97
PPP survey year                                       96
Special Notes                                         96
Source of most recent Income and expenditure data     81
Government Accounting concept                         80
Latest water withdrawal data                          62
Balance of Payments Manual in use                     60
IMF data dissemination standard                       60
Latest trade data              

### Checking duplicated data

In [85]:
Country.duplicated().sum()

0

### The number of rows & columns

In [86]:
Country.shape

(241, 32)

__Features of Country Data__

In [21]:
Country.columns

Index(['Country Code', 'Short Name', 'Table Name', 'Long Name', '2-alpha code',
       'Currency Unit', 'Special Notes', 'Region', 'Income Group', 'WB-2 code',
       'National accounts base year', 'National accounts reference year',
       'SNA price valuation', 'Lending category', 'Other groups',
       'System of National Accounts', 'Alternative conversion factor',
       'PPP survey year', 'Balance of Payments Manual in use',
       'External debt Reporting status', 'System of trade',
       'Government Accounting concept', 'IMF data dissemination standard',
       'Latest population census', 'Latest household survey',
       'Source of most recent Income and expenditure data',
       'Vital registration complete', 'Latest agricultural census',
       'Latest industrial data', 'Latest trade data',
       'Latest water withdrawal data', 'Unnamed: 31'],
      dtype='object')

__Create a DataFrame with country_code, name, region and income_group__

In [112]:

# Keep the four columns of interest and rename them
countries_income = Country[["Country Code", "Short Name", "Region", "Income Group"]].rename(
    columns={"Country Code": "Country_Code", "Short Name": "Name", "Income Group": "Income_group"})

countries_income

Unnamed: 0,Country_Code,Name,Region,Income_group
0,ABW,Aruba,Latin America & Caribbean,High income: nonOECD
1,AFG,Afghanistan,South Asia,Low income
2,AGO,Angola,Sub-Saharan Africa,Upper middle income
3,ALB,Albania,Europe & Central Asia,Upper middle income
4,AND,Andorra,Europe & Central Asia,High income: nonOECD
...,...,...,...,...
236,XKX,Kosovo,Europe & Central Asia,Lower middle income
237,YEM,Yemen,Middle East & North Africa,Lower middle income
238,ZAF,South Africa,Sub-Saharan Africa,Upper middle income
239,ZMB,Zambia,Sub-Saharan Africa,Lower middle income


We have 241 entries in countries_income, one per row of the Country table.

__Check how many countries we have in our Data__

In [115]:
Data["Country Name"].describe()

count           886930
unique             242
top       South Africa
freq              3665
Name: Country Name, dtype: object

__Check how many indicators the data contains and whether any are duplicated__

In [116]:
Data["Indicator Name"].describe()

count                                                886930
unique                                                 3665
top       DHS: Average years of schooling by age group. ...
freq                                                    242
Name: Indicator Name, dtype: object
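The two summaries fit together: there are 242 country names and 3665 indicator names, no name occurs more often than 242 and 3665 times respectively, and 242 × 3665 equals the total row count. This implies a complete country-by-indicator grid in which missing observations are stored as NaN rather than as absent rows. A quick check of the arithmetic, using only the figures printed above:

```python
# Figures taken from the describe() outputs and Data.shape above
n_countries = 242
n_indicators = 3665
n_rows = 886930

is_full_grid = n_countries * n_indicators == n_rows
print(is_full_grid)
```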

__Create a file with the list of countries in Data__

In [117]:
# One row per distinct country name found in Data
countries = pd.DataFrame({"Country Name" : Data["Country Name"].unique()})
countries.to_csv("countries.csv")
print(countries)


                                    Country Name
0                                     Arab World
1                            East Asia & Pacific
2    East Asia & Pacific (excluding high income)
3                                      Euro area
4                          Europe & Central Asia
..                                           ...
237                        Virgin Islands (U.S.)
238                           West Bank and Gaza
239                                  Yemen, Rep.
240                                       Zambia
241                                     Zimbabwe

[242 rows x 1 columns]


__Merging__ Data with the countries_income table created above.

In [129]:
data_plus_country = pd.merge(Data,countries_income, left_on='Country Code', right_on='Country_Code')
data_plus_country.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1970,1971,1972,1973,1974,1975,...,2080,2085,2090,2095,2100,Unnamed: 69,Country_Code,Name,Region,Income_group
0,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2,,,,,,,...,,,,,,,ARB,Arab World,,
1,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.F,,,,,,,...,,,,,,,ARB,Arab World,,
2,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.GPI,,,,,,,...,,,,,,,ARB,Arab World,,
3,Arab World,ARB,"Adjusted net enrolment rate, lower secondary, ...",UIS.NERA.2.M,,,,,,,...,,,,,,,ARB,Arab World,,
4,Arab World,ARB,"Adjusted net enrolment rate, primary, both sex...",SE.PRM.TENR,54.822121,54.894138,56.209438,57.267109,57.991138,59.36554,...,,,,,,,ARB,Arab World,,
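Rows such as Arab World are regional aggregates rather than individual countries, and in the merged preview above they have no `Region`. One way to keep only individual countries, assuming that a missing `Region` reliably marks an aggregate (an assumption worth verifying against the country table), is sketched below on a toy frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Country Name": ["Arab World", "Aruba", "Afghanistan"],
    "Country Code": ["ARB", "ABW", "AFG"],
    "Region": [np.nan, "Latin America & Caribbean", "South Asia"],
})

# Keep rows that have a region assigned, then renumber the index
countries_only = toy[toy["Region"].notna()].reset_index(drop=True)
print(countries_only["Country Code"].tolist())
```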


In [130]:
data_plus_country.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 883265 entries, 0 to 883264
Data columns (total 74 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Country Name    883265 non-null  object 
 1   Country Code    883265 non-null  object 
 2   Indicator Name  883265 non-null  object 
 3   Indicator Code  883265 non-null  object 
 4   1970            72278 non-null   float64
 5   1971            35508 non-null   float64
 6   1972            35594 non-null   float64
 7   1973            35514 non-null   float64
 8   1974            35708 non-null   float64
 9   1975            87268 non-null   float64
 10  1976            37445 non-null   float64
 11  1977            37562 non-null   float64
 12  1978            37564 non-null   float64
 13  1979            36771 non-null   float64
 14  1980            89109 non-null   float64
 15  1981            38738 non-null   float64
 16  1982            37504 non-null   float64
 17  1983      
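The merged frame has 883265 rows against 886930 in `Data`, a difference of 3665, which is exactly one country's worth of indicators. The default inner join therefore dropped one country code that has no match in `countries_income`. A sketch for locating such codes with a left join and `indicator=True`, on toy frames with hypothetical codes:

```python
import pandas as pd

data = pd.DataFrame({"Country Code": ["AAA", "AAA", "BBB", "CCC"],
                     "Indicator Code": ["I1", "I2", "I1", "I1"]})
countries_income = pd.DataFrame({"Country_Code": ["AAA", "BBB"],
                                 "Region": ["R1", "R2"]})

# A left join keeps every row of `data`; the _merge column records where each row matched
merged = pd.merge(data, countries_income, how="left",
                  left_on="Country Code", right_on="Country_Code",
                  indicator=True)

# Codes present in `data` but absent from `countries_income`
unmatched = merged.loc[merged["_merge"] == "left_only", "Country Code"].unique()
print(unmatched)
```

Whether to keep the unmatched rows (left join) or drop them (inner join) depends on whether the missing entry is an aggregate or a real country.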

In [134]:
data_plus_country.isnull().sum().sort_values(ascending=False).head(20)

Unnamed: 69    883265
2017           883122
2016           866819
1971           847757
1973           847751
1972           847671
1974           847557
1979           846494
1976           845820
1982           845761
1989           845732
1977           845703
1978           845701
1983           844837
1988           844720
1984           844701
1987           844631
1981           844527
1986           843900
2050           831829
dtype: int64

In [131]:
s1 = data_plus_country.dtypes
s1

Country Name       object
Country Code       object
Indicator Name     object
Indicator Code     object
1970              float64
                   ...   
Unnamed: 69       float64
Country_Code       object
Name               object
Region             object
Income_group       object
Length: 74, dtype: object

## Reading Country_Series

In [82]:
Country_Series = pd.read_csv('C:/Users/azade/Desktop/OC/Projet 2/EdStatsCountry-Series.csv')
Country_Series.head()

Unnamed: 0,CountryCode,SeriesCode,DESCRIPTION,Unnamed: 3
0,ABW,SP.POP.TOTL,Data sources : United Nations World Population...,
1,ABW,SP.POP.GROW,Data sources: United Nations World Population ...,
2,AFG,SP.POP.GROW,Data sources: United Nations World Population ...,
3,AFG,NY.GDP.PCAP.PP.CD,Estimates are based on regression.,
4,AFG,SP.POP.TOTL,Data sources : United Nations World Population...,


### Number of missing values in Country_Series

In [83]:
Country_Series.isnull().sum().sort_values(ascending=False)

Unnamed: 3     613
DESCRIPTION      0
SeriesCode       0
CountryCode      0
dtype: int64
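Every file read so far ends with one completely empty `Unnamed: N` column. A likely cause is a trailing comma at the end of each CSV line, though the notebook does not confirm this. Under that assumption, such columns can be skipped at load time by passing a callable to `usecols`; the inline CSV text below is a made-up stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical sample with a trailing comma on every line
csv_text = (
    "CountryCode,SeriesCode,DESCRIPTION,\n"
    "ABW,SP.POP.TOTL,Some description,\n"
    "AFG,SP.POP.GROW,Another description,\n"
)

# Keep only columns whose label does not start with "Unnamed"
df = pd.read_csv(io.StringIO(csv_text),
                 usecols=lambda name: not name.startswith("Unnamed"))
print(df.columns.tolist())
```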

### Duplicated Country_Series Data

In [84]:
Country_Series.duplicated().sum()

0

### The number of rows & columns

In [71]:
Country_Series.shape

(613, 4)

## Reading FootNote Data

In [87]:
FootNote = pd.read_csv('C:/Users/azade/Desktop/OC/Projet 2/EdStatsFootNote.csv')
FootNote.head()

Unnamed: 0,CountryCode,SeriesCode,Year,DESCRIPTION,Unnamed: 4
0,ABW,SE.PRE.ENRL.FE,YR2001,Country estimation.,
1,ABW,SE.TER.TCHR.FE,YR2005,Country estimation.,
2,ABW,SE.PRE.TCHR.FE,YR2000,Country estimation.,
3,ABW,SE.SEC.ENRL.GC,YR2004,Country estimation.,
4,ABW,SE.PRE.TCHR,YR2006,Country estimation.,
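The `Year` values in the preview are strings such as `YR2001`, whereas the year columns in `Data` are labelled `2001`. To line the two up, the prefix can be stripped and the rest converted to a number. This sketch assumes every label has the form `YR` followed by the year, which holds for the rows shown but is not verified for the whole file; the column name `year_num` is made up for the example:

```python
import pandas as pd

footnote = pd.DataFrame({"Year": ["YR2001", "YR2005", "YR2000"]})

# Drop the two-character prefix; values that cannot be parsed become NaN
footnote["year_num"] = pd.to_numeric(footnote["Year"].str[2:], errors="coerce")
print(footnote)
```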


### Number of missing values in FootNote

In [90]:
FootNote.isnull().sum().sort_values(ascending=False)

Unnamed: 4     643638
DESCRIPTION         0
Year                0
SeriesCode          0
CountryCode         0
dtype: int64

### Duplicated data

In [91]:
FootNote.duplicated().sum()

0

### The number of rows & columns

In [70]:
FootNote.shape

(643638, 5)

## Reading Series

In [96]:
Series = pd.read_csv('C:/Users/azade/Desktop/OC/Projet 2/EdStatsSeries.csv')
Series.head()

Unnamed: 0,Series Code,Topic,Indicator Name,Short definition,Long definition,Unit of measure,Periodicity,Base Period,Other notes,Aggregation method,...,Notes from original source,General comments,Source,Statistical concept and methodology,Development relevance,Related source links,Other web links,Related indicators,License Type,Unnamed: 20
0,BAR.NOED.1519.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 15-19 with...,Percentage of female population age 15-19 with...,,,,,,...,,,Robert J. Barro and Jong-Wha Lee: http://www.b...,,,,,,,
1,BAR.NOED.1519.ZS,Attainment,Barro-Lee: Percentage of population age 15-19 ...,Percentage of population age 15-19 with no edu...,Percentage of population age 15-19 with no edu...,,,,,,...,,,Robert J. Barro and Jong-Wha Lee: http://www.b...,,,,,,,
2,BAR.NOED.15UP.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 15+ with n...,Percentage of female population age 15+ with n...,,,,,,...,,,Robert J. Barro and Jong-Wha Lee: http://www.b...,,,,,,,
3,BAR.NOED.15UP.ZS,Attainment,Barro-Lee: Percentage of population age 15+ wi...,Percentage of population age 15+ with no educa...,Percentage of population age 15+ with no educa...,,,,,,...,,,Robert J. Barro and Jong-Wha Lee: http://www.b...,,,,,,,
4,BAR.NOED.2024.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 20-24 with...,Percentage of female population age 20-24 with...,,,,,,...,,,Robert J. Barro and Jong-Wha Lee: http://www.b...,,,,,,,


### Number of missing values in Series Data

In [97]:
Series.isnull().sum().sort_values(ascending=False)

Unnamed: 20                            3665
Related indicators                     3665
Other web links                        3665
Unit of measure                        3665
License Type                           3665
Notes from original source             3665
Development relevance                  3662
General comments                       3651
Limitations and exceptions             3651
Statistical concept and methodology    3642
Aggregation method                     3618
Periodicity                            3566
Related source links                   3450
Base Period                            3351
Other notes                            3113
Short definition                       1509
Source                                    0
Long definition                           0
Indicator Name                            0
Topic                                     0
Series Code                               0
dtype: int64

### Duplicated data

In [98]:
Series.duplicated().sum()

0

### The number of rows & columns

In [99]:
Series.shape

(3665, 21)

In [23]:
Series.columns

Index(['Series Code', 'Topic', 'Indicator Name', 'Short definition',
       'Long definition', 'Unit of measure', 'Periodicity', 'Base Period',
       'Other notes', 'Aggregation method', 'Limitations and exceptions',
       'Notes from original source', 'General comments', 'Source',
       'Statistical concept and methodology', 'Development relevance',
       'Related source links', 'Other web links', 'Related indicators',
       'License Type', 'Unnamed: 20'],
      dtype='object')

In [28]:
Series.shape

(3665, 5)

In [41]:
Series.head(50)

Unnamed: 0,Series Code,Topic,Indicator Name,Long definition,Source
0,BAR.NOED.1519.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 15-19 with...,Robert J. Barro and Jong-Wha Lee: http://www.b...
1,BAR.NOED.1519.ZS,Attainment,Barro-Lee: Percentage of population age 15-19 ...,Percentage of population age 15-19 with no edu...,Robert J. Barro and Jong-Wha Lee: http://www.b...
2,BAR.NOED.15UP.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 15+ with n...,Robert J. Barro and Jong-Wha Lee: http://www.b...
3,BAR.NOED.15UP.ZS,Attainment,Barro-Lee: Percentage of population age 15+ wi...,Percentage of population age 15+ with no educa...,Robert J. Barro and Jong-Wha Lee: http://www.b...
4,BAR.NOED.2024.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 20-24 with...,Robert J. Barro and Jong-Wha Lee: http://www.b...
5,BAR.NOED.2024.ZS,Attainment,Barro-Lee: Percentage of population age 20-24 ...,Percentage of population age 20-24 with no edu...,Robert J. Barro and Jong-Wha Lee: http://www.b...
6,BAR.NOED.2529.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 25-29 with...,Robert J. Barro and Jong-Wha Lee: http://www.b...
7,BAR.NOED.2529.ZS,Attainment,Barro-Lee: Percentage of population age 25-29 ...,Percentage of population age 25-29 with no edu...,Robert J. Barro and Jong-Wha Lee: http://www.b...
8,BAR.NOED.25UP.FE.ZS,Attainment,Barro-Lee: Percentage of female population age...,Percentage of female population age 25+ with n...,Robert J. Barro and Jong-Wha Lee: http://www.b...
9,BAR.NOED.25UP.ZS,Attainment,Barro-Lee: Percentage of population age 25+ wi...,Percentage of population age 25+ with no educa...,Robert J. Barro and Jong-Wha Lee: http://www.b...


In [50]:
Series['Indicator Name'].value_counts()


Gross enrolment ratio, upper secondary, male (%)                                                                                      1
Barro-Lee: Percentage of population age 20-24 with primary schooling. Total (Incomplete and Completed Primary)                        1
Percentage of students enrolled in Engineering, Manufacturing and Construction programmes in tertiary education who are female (%)    1
Adult illiterate population, 15+ years, male (number)                                                                                 1
SABER: (Engaging the Private Sector, Government funded) Policy Goal 8 Lever 5: Funding                                                1
                                                                                                                                     ..
Barro-Lee: Percentage of population age 35-39 with no education                                                                       1
PIAAC: Mean Young Adult Numeracy Proficiency. Fe

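All columns that contain missing values are dropped, which leaves the five complete columns. This cell was executed before the Series.shape and Series.head(50) cells shown above, which is why those outputs already have five columns.

In [26]:
# Drop every column that has at least one missing value (see the null counts above)
Series.drop(columns=['Unnamed: 20', 'Related indicators', 'Other web links', 'Unit of measure', 'License Type',
                     'Notes from original source', 'Development relevance', 'General comments',
                     'Limitations and exceptions', 'Statistical concept and methodology', 'Aggregation method',
                     'Periodicity', 'Related source links', 'Base Period', 'Other notes', 'Short definition'],
            inplace=True)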
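The five columns that remain are exactly those with no missing values in the null-count summary above, so the same result could be reached without listing names, by dropping every column that contains a null. A sketch on a toy frame with hypothetical values; note that this variant would silently drop more columns if the input data changed:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Series Code": ["A", "B"],
    "Topic": ["Attainment", "Attainment"],
    "Short definition": ["text", np.nan],   # partly missing
    "Unnamed: 20": [np.nan, np.nan],        # entirely missing
})

# axis=1 works on columns; how="any" drops a column as soon as it has one null
complete = toy.dropna(axis=1, how="any")
print(complete.columns.tolist())
```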

### Q1: Which countries have a strong potential customer base for our services?