# JSON Mini-Project 

1. Find the 10 countries with the most projects.
2. Find the top 10 major project themes (using column 'mjtheme_namecode').
3. In item 2 above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.

**Import Libraries**

In [272]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

**Import JSON**

In [273]:
import json
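# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize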

Import the World Bank data and load it as a pandas DataFrame. Calling .info() shows the data type of each column, how many values are missing, and the shape of the DataFrame.

In [274]:
world_bank_df = pd.read_json('world_bank_projects.json')
world_bank_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 50 columns):
sector                      500 non-null object
supplementprojectflg        498 non-null object
projectfinancialtype        500 non-null object
prodline                    500 non-null object
mjtheme                     491 non-null object
idacommamt                  500 non-null int64
impagency                   472 non-null object
project_name                500 non-null object
mjthemecode                 500 non-null object
closingdate                 370 non-null object
totalcommamt                500 non-null int64
id                          500 non-null object
mjsector_namecode           500 non-null object
docty                       446 non-null object
sector1                     500 non-null object
lendinginstr                495 non-null object
countrycode                 500 non-null object
sector2                     380 non-null object
totalamt                    500 n

**1. Find the 10 countries with most projects.**

In [275]:
countryname = world_bank_df['countryname'].value_counts()[:13]
countryname

Republic of Indonesia              19
People's Republic of China         19
Socialist Republic of Vietnam      17
Republic of India                  16
Republic of Yemen                  13
People's Republic of Bangladesh    12
Kingdom of Morocco                 12
Nepal                              12
Africa                             11
Republic of Mozambique             11
Islamic Republic of Pakistan        9
Burkina Faso                        9
Federative Republic of Brazil       9
Name: countryname, dtype: int64

In [276]:
shortname = world_bank_df['countryshortname'].value_counts()[:13]
shortname

Indonesia             19
China                 19
Vietnam               17
India                 16
Yemen, Republic of    13
Bangladesh            12
Nepal                 12
Morocco               12
Mozambique            11
Africa                11
Brazil                 9
Pakistan               9
Burkina Faso           9
Name: countryshortname, dtype: int64

"Africa" is not a country. After visiting the URLs associated with the "Africa" rows, I determined that these projects are not supported by one specific country but by the region of "West Africa". Excluding that entry, the countries with the most projects are:
1. Republic of Indonesia
2. People's Republic of China
3. Socialist Republic of Vietnam
4. Republic of India
5. Republic of Yemen
6. People's Republic of Bangladesh
7. Kingdom of Morocco
8. Nepal
9. Republic of Mozambique
10. Islamic Republic of Pakistan, Burkina Faso, and Federative Republic of Brazil (tied for tenth with nine projects each)
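
The manual ranking above (drop the regional entry, then keep everyone tied at the cutoff) can also be done in code. The following is only a sketch on made-up data: the `names` series and its counts are hypothetical stand-ins for `world_bank_df['countryname']`, and it assumes a pandas version in which `Series.nlargest` accepts `keep="all"`.

```python
import pandas as pd

# Hypothetical stand-in for world_bank_df['countryname']; counts are made up.
names = pd.Series(
    ["A"] * 5 + ["Africa"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 2 + ["E"] * 1,
    name="countryname",
)

# Drop the regional label, then count projects per country.
counts = names[names != "Africa"].value_counts()

# keep="all" retains every country tied at the cutoff instead of
# cutting the tie arbitrarily.
top3 = counts.nlargest(3, keep="all")
print(top3)
```

With the real data the same pattern would use `nlargest(10, keep="all")`, which would return more than ten rows whenever several countries tie for tenth place.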

**2. Find the top 10 major project themes (using column 'mjtheme_namecode')**

In [277]:
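# json_normalize works on the raw records, so load the file with the json module
with open('world_bank_projects.json') as f:
    world_bank = json.load(f)
themes = json_normalize(world_bank, 'mjtheme_namecode')
top_themes = themes['name'].value_counts()[:11]
top_themes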

Environment and natural resources management    223
Rural development                               202
Human development                               197
Public sector governance                        184
Social protection and risk management           158
Financial and private sector development        130
                                                122
Social dev/gender/inclusion                     119
Trade and integration                            72
Urban development                                47
Economic management                              33
Name: name, dtype: int64
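
To make the flattening step less opaque, here is a small sketch of what `json_normalize` does with a nested list column. The `records` list is hypothetical and only mimics the shape of the file (code "1" is taken from this notebook, code "99" is a placeholder), and the sketch assumes pandas >= 1.0, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical records shaped like the World Bank file: each project holds
# a list of {code, name} dicts under 'mjtheme_namecode'.
records = [
    {"id": "P1", "mjtheme_namecode": [
        {"code": "1", "name": "Economic management"},
        {"code": "99", "name": ""},  # placeholder code with a missing name
    ]},
    {"id": "P2", "mjtheme_namecode": [
        {"code": "1", "name": ""},
    ]},
]

# record_path flattens the nested lists into one row per theme entry.
flat = pd.json_normalize(records, record_path="mjtheme_namecode")
print(flat)
```

Each project contributes one row per theme, which is why the flattened frame has more rows than there are projects.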

The counts from In [277] include 122 entries whose 'name' is an empty string, so the missing names need to be filled in before concluding the top 10.
      


**3. In problem 2 above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in.**

In [278]:
themes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1499 entries, 0 to 1498
Data columns (total 2 columns):
code    1499 non-null object
name    1499 non-null object
dtypes: object(2)
memory usage: 23.5+ KB
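
Note that .info() reports all 1499 names as non-null even though some are blank, because an empty string is not a missing value as far as pandas is concerned. A minimal sketch on a hypothetical three-row frame (`demo` is not part of the notebook) shows the difference:

```python
import pandas as pd

# Hypothetical miniature of the themes frame.
demo = pd.DataFrame({
    "code": ["1", "1", "2"],
    "name": ["", "Economic management", ""],
})

# Empty strings are not counted as missing by isna().
n_null = int(demo["name"].isna().sum())

# An explicit comparison against '' finds them.
n_empty = int((demo["name"] == "").sum())

print(n_null, n_empty)
```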


Sorting by code and name makes it easier to fill in the names that correspond to each code, because within a code the rows with an empty name sort first.

In [279]:
themes = themes.sort_values(['code', 'name'])
themes.head(10)

      code                 name
212      1
363      1
1024     1
1114     1
1437     1
2        1  Economic management
88       1  Economic management
175      1  Economic management
204      1  Economic management
205      1  Economic management


Fill methods such as 'fillna' only act on NaN values, so all empty strings first need to be converted to NaN.

In [280]:
# Use .loc to avoid chained assignment, which may silently modify a copy
themes.loc[themes['name'] == '', 'name'] = np.nan
themes.head(10)

      code                 name
212      1                  NaN
363      1                  NaN
1024     1                  NaN
1114     1                  NaN
1437     1                  NaN
2        1  Economic management
88       1  Economic management
175      1  Economic management
204      1  Economic management
205      1  Economic management


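Because the rows with a missing name come first within each code, a backward fill copies the correct name into each gap.

In [281]:
themes = themes.bfill()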
themes.head(10)

      code                 name
212      1  Economic management
363      1  Economic management
1024     1  Economic management
1114     1  Economic management
1437     1  Economic management
2        1  Economic management
88       1  Economic management
175      1  Economic management
204      1  Economic management
205      1  Economic management


In [282]:
top_themes_revised = themes['name'].value_counts()[:11]
top_themes_revised

Environment and natural resources management    250
Rural development                               216
Human development                               210
Public sector governance                        199
Social protection and risk management           168
Financial and private sector development        146
Social dev/gender/inclusion                     130
Trade and integration                            77
Urban development                                50
Economic management                              38
Rule of law                                      15
Name: name, dtype: int64
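
The backward fill used in problem 3 depends on the sort order and on every code having at least one named row. A possible alternative, sketched here on a hypothetical miniature frame (`demo`, code "2" and "Placeholder theme" are assumptions for illustration, not values from the data), builds an explicit code-to-name lookup and maps it back onto the codes:

```python
import pandas as pd

# Hypothetical miniature of the themes frame; code "1" matches the notebook,
# code "2" and its name are placeholders.
demo = pd.DataFrame({
    "code": ["1", "2", "1", "2", "1"],
    "name": ["Economic management", "", "", "Placeholder theme",
             "Economic management"],
})

# Build a code -> name lookup from the rows that do have a name.
named = demo[demo["name"] != ""]
lookup = named.drop_duplicates("code").set_index("code")["name"]

# Replace every name with the lookup value for its code.
filled = demo.assign(name=demo["code"].map(lookup))
print(filled["name"].value_counts())
```

This version does not require sorting, and a code with no named row anywhere would show up as NaN after the map rather than silently borrowing the name of a neighbouring code.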