# JSON exercise

Download the data from [**here**](https://drive.google.com/file/d/1DGaX5AVfYhmWeb15lI-MzUbSKTYSz9fQ/view?usp=sharing) and answer the following questions:
1. Find the 10 countries with the most projects.
2. What are the top 10 sectors with projects?
3. Find the top 10 major project themes (using column 'mjtheme_namecode').
4. In question 3, you will notice that some entries have only the code and are missing the name. Create a dataframe with the missing names filled in.

In [203]:
from pprint import pprint
import json
import pandas as pd
import numpy as np

### 1. Find the 10 countries with the most projects

In [204]:
with open('./_data/world_bank_projects.json') as f:
  data = json.load(f)
pprint(data)

[{'_id': {'$oid': '52b213b38594d8a2be17c780'},
  'approvalfy': 1999,
  'board_approval_month': 'November',
  'boardapprovaldate': '2013-11-12T00:00:00Z',
  'borrower': 'FEDERAL DEMOCRATIC REPUBLIC OF ETHIOPIA',
  'closingdate': '2018-07-07T00:00:00Z',
  'country_namecode': 'Federal Democratic Republic of Ethiopia!$!ET',
  'countrycode': 'ET',
  'countryname': 'Federal Democratic Republic of Ethiopia',
  'countryshortname': 'Ethiopia',
  'docty': 'Project Information Document,Indigenous Peoples Plan,Project '
           'Information Document',
  'envassesmentcategorycode': 'C',
  'grantamt': 0,
  'ibrdcommamt': 0,
  'id': 'P129828',
  'idacommamt': 130000000,
  'impagency': 'MINISTRY OF EDUCATION',
  'lendinginstr': 'Investment Project Financing',
  'lendinginstrtype': 'IN',
  'lendprojectcost': 550000000,
  'majorsector_percent': [{'Name': 'Education', 'Percent': 46},
                          {'Name': 'Education', 'Percent': 26},
                          {'Name': 'Public Administrati

In [205]:
df = pd.json_normalize(data)
pprint(df.columns[df.columns.str.contains('country')])
pprint(df.columns[df.columns.str.contains('project')])

Index(['countrycode', 'countryshortname', 'country_namecode', 'countryname'], dtype='object')
Index(['supplementprojectflg', 'projectfinancialtype', 'project_name',
       'projectdocs', 'lendprojectcost', 'projectstatusdisplay',
       'project_abstract.cdata'],
      dtype='object')
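A possibly tidier way to find these columns is `DataFrame.filter(like=...)`, which keeps the columns whose labels contain the given substring. The sketch below runs on a small made-up frame (`sample_df`, hypothetical) rather than the World Bank file, so only the column names matter.

```python
import pandas as pd

# Made-up, empty frame with a few column names like those in the flattened data
sample_df = pd.DataFrame(
    columns=['countrycode', 'countryshortname', 'project_name', 'lendprojectcost', 'sector']
)

# Same idea as df.columns[df.columns.str.contains('country')]
country_cols = list(sample_df.filter(like='country').columns)
project_cols = list(sample_df.filter(like='project').columns)
print(country_cols)
print(project_cols)
```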


In [206]:
print(df['countryname'])
print('\n')
print(df['project_name'])

0      Federal Democratic Republic of Ethiopia
1                          Republic of Tunisia
2                                       Tuvalu
3                            Republic of Yemen
4                           Kingdom of Lesotho
                        ...                   
495                                    Jamaica
496           Lao People's Democratic Republic
497                         Republic of Guinea
498                      Republic of Indonesia
499                          Republic of Kenya
Name: countryname, Length: 500, dtype: object


0      Ethiopia General Education Quality Improvement...
1              TN: DTF Social Protection Reforms Support
2      Tuvalu Aviation Investment Project - Additiona...
3       Gov't and Civil Society Organization Partnership
4      Second Private Sector Competitiveness and Econ...
                             ...                        
495    Technological Scale Up for Youth-led Urban Orn...
496                  Lao Eight Pover

In [207]:
# Count projects (non-null project names) per country and keep the 10 largest counts
df[['countryshortname', 'project_name']].groupby('countryshortname').count().sort_values(by='project_name', ascending=False).head(10)

| countryshortname | project_name |
|---|---|
| China | 19 |
| Indonesia | 19 |
| Vietnam | 17 |
| India | 16 |
| Yemen, Republic of | 13 |
| Nepal | 12 |
| Bangladesh | 12 |
| Morocco | 12 |
| Mozambique | 11 |
| Africa | 11 |
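The same tally can likely be produced with `Series.value_counts()`, which counts rows per value and sorts them in descending order. A minimal sketch on made-up rows follows; on the real frame the analogous call would be `df['countryshortname'].value_counts().head(10)`. It counts rows rather than non-null `project_name` values, so it should match the table above as long as no project name is missing, although tied countries (for example Mozambique and Africa at 11) may come out in a different order.

```python
import pandas as pd

# Made-up sample: one row per project
sample = pd.DataFrame({
    'countryshortname': ['China', 'Indonesia', 'China', 'Kenya', 'Indonesia', 'China'],
    'project_name': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6'],
})

# value_counts() counts rows per country and sorts from most to fewest
top_countries = sample['countryshortname'].value_counts().head(10)
print(top_countries)
```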


### 2. What are the top 10 sectors with projects?

In [208]:
df.columns[df.columns.str.contains('sector')]

Index(['sector', 'mjsector_namecode', 'sectorcode', 'majorsector_percent',
       'sector_namecode', 'sector1.Percent', 'sector1.Name', 'sector2.Percent',
       'sector2.Name', 'sector4.Percent', 'sector4.Name', 'sector3.Percent',
       'sector3.Name'],
      dtype='object')
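`pd.json_normalize` turns nested dictionaries into dotted column names, which is where `sector1.Name`, `sector1.Percent` and the other `sectorN.*` columns come from. Here is a small sketch with two made-up records, assuming the shape seen in the printed data; a record without a `sector2` key simply gets `NaN` in those columns.

```python
import pandas as pd

# Two made-up records: sector1/sector2 are nested dicts, and the second
# record has no sector2 at all
records = [
    {'id': 'A',
     'sector1': {'Name': 'Health', 'Percent': 60},
     'sector2': {'Name': 'Water supply', 'Percent': 40}},
    {'id': 'B',
     'sector1': {'Name': 'Crops', 'Percent': 100}},
]

# Nested keys become "parent.child" columns; absent keys become NaN
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(flat)
```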

In [209]:
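# One row per entry in each project's 'mjtheme_namecode' list, with the selected
# project-level fields repeated on every row. errors='ignore' fills NaN for
# projects that have no sector2/sector3/sector4.
mjsector_df = pd.json_normalize(
  data,
  record_path='mjtheme_namecode',
  meta=[
    'countryshortname',
    'project_name',
    ['sector1', 'Name'],
    ['sector1', 'Percent'],
    ['sector2', 'Name'],
    ['sector2', 'Percent'],
    ['sector3', 'Name'],
    ['sector3', 'Percent'],
    ['sector4', 'Name'],
    ['sector4', 'Percent'],
  ],
  errors='ignore')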
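To see what `record_path` and `meta` do, here is a toy version of the call above on two made-up projects. Each entry of `mjtheme_namecode` becomes its own row, and the `meta` fields are copied onto every row of the same project. That repetition matters for question 2: counting sector names on this frame counts a project once per theme, not once per project.

```python
import pandas as pd

# Made-up projects: the first has two themes (one with a blank name) and a
# sector2; the second has one theme and no sector2 key
records = [
    {'project_name': 'Road upgrade',
     'sector1': {'Name': 'Rural roads'}, 'sector2': {'Name': 'Transport'},
     'mjtheme_namecode': [{'code': '10', 'name': 'Rural development'},
                          {'code': '5', 'name': ''}]},
    {'project_name': 'Clinics',
     'sector1': {'Name': 'Health'},
     'mjtheme_namecode': [{'code': '8', 'name': 'Human development'}]},
]

themes = pd.json_normalize(
    records,
    record_path='mjtheme_namecode',   # one output row per theme entry
    meta=['project_name', ['sector1', 'Name'], ['sector2', 'Name']],
    errors='ignore',                  # missing meta keys become NaN instead of raising KeyError
)
print(themes)
```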

In [210]:
# For each sector slot, count theme rows per sector name and keep the 10 most frequent
sector1, sector2, sector3, sector4 = (
  mjsector_df[['name', f'sector{i}.Name']].groupby(f'sector{i}.Name').count()
  .sort_values(by='name', ascending=False).head(10)
  for i in range(1, 5))

In [211]:
sectors = pd.concat([sector1, sector2, sector3, sector4], axis=0).reset_index()
print(sectors.shape)
sectors.groupby('index').sum().sort_values(by='name', ascending=False).head(10)

(40, 2)


| index | name |
|---|---|
| Other social services | 341 |
| Central government administration | 295 |
| Sub-national government administration | 233 |
| General agriculture, fishing and forestry sector | 204 |
| Health | 161 |
| General public administration sector | 132 |
| Agricultural extension and research | 109 |
| Public administration- Agriculture, fishing and forestry | 104 |
| Rural and Inter-Urban Roads and Highways | 102 |
| Public administration- Health | 65 |
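The table above is built from `mjsector_df`, which has one row per theme entry, so each project's sectors are counted once per theme. Taking `head(10)` of each slot before summing could also drop a sector that ranks just outside the top 10 in every slot but high overall. Below is a sketch of a per-project count, assuming a frame with one row per project (such as `df = pd.json_normalize(data)`) and that the `sector1.Name` to `sector4.Name` columns are the sectors of interest: all slots are stacked into one column first and ranked afterwards. The sample rows are made up, and the de-duplication step is a guard in case a project lists the same sector in two slots.

```python
import pandas as pd

# Made-up project-level frame: one row per project, like pd.json_normalize(data)
projects = pd.DataFrame({
    'project_name': ['P1', 'P2', 'P3'],
    'sector1.Name': ['Health', 'Crops', 'Health'],
    'sector2.Name': ['Water supply', None, 'Crops'],
    'sector3.Name': [None, None, 'Health'],   # P3 repeats a sector in a second slot
})

sector_cols = [c for c in projects.columns if c.startswith('sector') and c.endswith('.Name')]

long = (
    projects.reset_index()                            # row position serves as a project key
    .melt(id_vars='index', value_vars=sector_cols, value_name='sector')
    .dropna(subset=['sector'])                        # drop empty sector slots
    .drop_duplicates(subset=['index', 'sector'])      # count a sector at most once per project
)
top_sectors = long['sector'].value_counts().head(10)
print(top_sectors)
```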


### 3. Find the top 10 major project themes (using column 'mjtheme_namecode')

In [212]:
mjsector_df[['code', 'name']].groupby('name').count().sort_values(by='code', ascending=False).head(10)

| name | code |
|---|---|
| Environment and natural resources management | 223 |
| Rural development | 202 |
| Human development | 197 |
| Public sector governance | 184 |
| Social protection and risk management | 158 |
| Financial and private sector development | 130 |
| `''` | 122 |
| Social dev/gender/inclusion | 119 |
| Trade and integration | 72 |
| Urban development | 47 |
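Because the blank names form their own group (the 122 row above), those entries are left out of their real themes' counts. One option, sketched below on made-up rows, is to count by `code`, which is always present, and then attach a name to each code from the entries that do have one.

```python
import pandas as pd

# Made-up theme rows: some entries have a blank name, as in the real data
themes = pd.DataFrame({
    'code': ['10', '10', '8', '5', '10', '8'],
    'name': ['Rural development', '', 'Human development',
             'Trade and integration', 'Rural development', ''],
})

# Count by code, which is never blank, so unnamed entries stay with their theme
counts = themes['code'].value_counts()

# Look up one name per code from the rows that have a non-blank name
names = (
    themes.loc[themes['name'] != '']
    .drop_duplicates('code')
    .set_index('code')['name']
)

top_themes = counts.rename(index=names.to_dict()).head(10)
print(top_themes)
```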


In [213]:
# Map each theme code to its name, skipping entries whose name is blank
mjsector_lst = {}
for k, v in mjsector_df[['code', 'name']].values:
  if v:
    mjsector_lst[k] = v
mjsector_lst

{'8': 'Human development',
 '1': 'Economic management',
 '6': 'Social protection and risk management',
 '5': 'Trade and integration',
 '2': 'Public sector governance',
 '11': 'Environment and natural resources management',
 '7': 'Social dev/gender/inclusion',
 '4': 'Financial and private sector development',
 '10': 'Rural development',
 '9': 'Urban development',
 '3': 'Rule of law'}
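The loop keeps whichever name it sees last for a code, so it would silently hide a code that appears with two different names. A quick check, sketched on made-up rows, is to confirm that every code has exactly one distinct non-blank name before building the mapping; `dict(zip(...))` then builds the same kind of mapping without an explicit loop.

```python
import pandas as pd

# Made-up theme rows with a few blank names
themes = pd.DataFrame({
    'code': ['8', '8', '11', '11', '1'],
    'name': ['Human development', '', 'Environment and natural resources management',
             '', 'Economic management'],
})

named = themes[themes['name'] != '']

# Sanity check: every code should map to exactly one distinct non-blank name
names_per_code = named.groupby('code')['name'].nunique()
assert (names_per_code == 1).all(), names_per_code[names_per_code > 1]

code_to_name = dict(zip(named['code'], named['name']))
print(code_to_name)
```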

In [218]:
mjsector_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1499 entries, 0 to 1498
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   code              1499 non-null   object
 1   name              1499 non-null   object
 2   countryshortname  1499 non-null   object
 3   project_name      1499 non-null   object
 4   sector1.Name      1499 non-null   object
 5   sector1.Percent   1499 non-null   object
 6   sector2.Name      1227 non-null   object
 7   sector2.Percent   1227 non-null   object
 8   sector3.Name      922 non-null    object
 9   sector3.Percent   922 non-null    object
 10  sector4.Name      630 non-null    object
 11  sector4.Percent   630 non-null    object
dtypes: object(12)
memory usage: 140.7+ KB
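`info()` reports 1499 non-null values for `name` even though some names are blank: an empty string is a value, not a missing one. Here is a sketch, on made-up rows, of converting blanks to `NaN` so that `isna()` and `info()` reflect them.

```python
import numpy as np
import pandas as pd

# Made-up theme rows: two of the three names are blank strings
themes = pd.DataFrame({'code': ['8', '11', '6'], 'name': ['Human development', '', '']})

before = themes['name'].notna().sum()   # blank strings still count as present

# Treat blank strings as missing so that isna() and info() report them
themes['name'] = themes['name'].replace('', np.nan)
after = themes['name'].isna().sum()
print(before, after)
```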


In [215]:
mjsector_missing_idx = mjsector_df[mjsector_df['name'] == ''].index

In [216]:
# Rebuild every name from its code, which also fills in the blank ones
mjsector_df['name'] = mjsector_df['code'].map(mjsector_lst)
mjsector_df

|  | code | name | countryshortname | project_name | sector1.Name | sector1.Percent | sector2.Name | sector2.Percent | sector3.Name | sector3.Percent | sector4.Name | sector4.Percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Human development | Ethiopia | Ethiopia General Education Quality Improvement... | Primary education | 46 | Secondary education | 26 | Public administration- Other social services | 16 | Tertiary education | 12 |
| 1 | 11 | Environment and natural resources management | Ethiopia | Ethiopia General Education Quality Improvement... | Primary education | 46 | Secondary education | 26 | Public administration- Other social services | 16 | Tertiary education | 12 |
| 2 | 1 | Economic management | Tunisia | TN: DTF Social Protection Reforms Support | Public administration- Other social services | 70 | General public administration sector | 30 |  |  |  |  |
| 3 | 6 | Social protection and risk management | Tunisia | TN: DTF Social Protection Reforms Support | Public administration- Other social services | 70 | General public administration sector | 30 |  |  |  |  |
| 4 | 5 | Trade and integration | Tuvalu | Tuvalu Aviation Investment Project - Additiona... | Rural and Inter-Urban Roads and Highways | 100 |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1494 | 10 | Rural development | Indonesia | Sustainable Management of Agricultural Researc... | Agricultural extension and research | 80 | Public administration- Agriculture, fishing an... | 15 | Agro-industry, marketing, and trade | 5 |  |  |
| 1495 | 9 | Urban development | Kenya | KENYA: NATIONAL URBAN TRANSPORT IMPROVEMENT PR... | Urban Transport | 79 | Public administration- Transportation | 21 |  |  |  |  |
| 1496 | 8 | Human development | Kenya | KENYA: NATIONAL URBAN TRANSPORT IMPROVEMENT PR... | Urban Transport | 79 | Public administration- Transportation | 21 |  |  |  |  |
| 1497 | 5 | Trade and integration | Kenya | KENYA: NATIONAL URBAN TRANSPORT IMPROVEMENT PR... | Urban Transport | 79 | Public administration- Transportation | 21 |  |  |  |  |
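The cell above rebuilds every name from its code, which works as long as the mapping is consistent. A more conservative variant, sketched on made-up rows, fills only the blank names and leaves existing ones untouched. Note that `map` returns `NaN` for any code that is not in the mapping, so a blank row with an unknown code would end up missing rather than filled.

```python
import pandas as pd

# Made-up rows: the last two names are blank
themes = pd.DataFrame({'code': ['8', '11', '8'], 'name': ['Human development', '', '']})
code_to_name = {'8': 'Human development', '11': 'Environment and natural resources management'}

# Fill only the blank names; rows that already have a name are left unchanged
blank = themes['name'] == ''
themes.loc[blank, 'name'] = themes.loc[blank, 'code'].map(code_to_name)
print(themes)
```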


In [217]:
mjsector_df[mjsector_df.index.isin(mjsector_missing_idx)]

|  | code | name | countryshortname | project_name | sector1.Name | sector1.Percent | sector2.Name | sector2.Percent | sector3.Name | sector3.Percent | sector4.Name | sector4.Percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 11 | Environment and natural resources management | Ethiopia | Ethiopia General Education Quality Improvement... | Primary education | 46 | Secondary education | 26 | Public administration- Other social services | 16 | Tertiary education | 12 |
| 13 | 6 | Social protection and risk management | Kenya | Additional Financing for Cash Transfers for Or... | Other social services | 100 |  |  |  |  |  |  |
| 17 | 8 | Human development | China | China Renewable Energy Scale-Up Program Phase II | Other Renewable Energy | 100 |  |  |  |  |  |  |
| 19 | 7 | Social dev/gender/inclusion | India | Rajasthan Road Sector Modernization Project | Rural and Inter-Urban Roads and Highways | 100 |  |  |  |  |  |  |
| 24 | 2 | Public sector governance | South Sudan | Southern Sudan Emergency Food Crisis Response ... | Crops | 50 | Other social services | 30 | General agriculture, fishing and forestry sector | 20 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1457 | 4 | Financial and private sector development | Mongolia | Capacity Builiding for Emerging Infectious Dis... | Animal production | 40 | Health | 30 | General public administration sector | 30 |  |  |
| 1477 | 11 | Environment and natural resources management | West Bank and Gaza | Water Supply and Sanitation Improvements for W... | Sanitation | 70 | Water supply | 30 |  |  |  |  |
| 1481 | 5 | Trade and integration | Bangladesh | Revision and Alignment of NAP with UNCCD 10-ye... | Central government administration | 100 |  |  |  |  |  |  |
| 1483 | 8 | Human development | Nepal | Nepal: Pilot Project for Seismic School Safety... | General education sector | 100 |  |  |  |  |  |  |
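With the names filled in, question 3 can be answered again without the unnamed group. Here is a sketch on made-up rows: check that no blank or missing names remain, then rank with `value_counts()`. On the real data the analogous call would be `mjsector_df['name'].value_counts().head(10)`.

```python
import pandas as pd

# Made-up theme rows after the fill: every entry now has a name
filled = pd.DataFrame({
    'code': ['10', '10', '8', '5', '10'],
    'name': ['Rural development', 'Rural development', 'Human development',
             'Trade and integration', 'Rural development'],
})

# Sanity check: no blank or missing names remain
assert filled['name'].notna().all() and not (filled['name'] == '').any()

# Re-rank the themes now that the unnamed group is gone
top_themes = filled['name'].value_counts().head(10)
print(top_themes)
```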
