# Investment Flow Type Classification

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/unkletam/Investment_Flow_Type_Classification/master) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
### Aim
This dataset geolocates Chinese Government-financed projects that were implemented between 2000 and 2014. It captures 3,485 projects worth $273.6 billion in total official financing. The dataset includes both Chinese aid and non-concessional official financing.

### Dataset
This dataset is made available by AidData.
For any licensing-related queries, please refer to AidData's website: https://www.aiddata.org

In [1]:
import numpy as np 
import pandas as pd 
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objects as go
import plotly.express as px
import plotly as py

import warnings
warnings.filterwarnings("ignore")


In [2]:
init_notebook_mode(connected=True)

# Forward slashes work on Windows, Linux and macOS (including Binder)
master = pd.read_csv('dataset/all_flow_classes.csv')
ms = master.copy() #working dataframe

#### Filter Sectors Related to Infrastructure

In [3]:
mask = ms['ad_sector_names'].isin(['Transport and Storage', 'Water Supply and Sanitation', 'Industry, Mining, Construction', 'Communications', 'Energy Generation and Supply'])
ms = ms[mask]

## Analysis
### Visualize Geo-located data

Each record denotes a location on the map that corresponds to a project, so it is useful to see how the data is spread across locations. While we are at it, a slider lets us comb through the years one by one.

*You can select a particular category to visualize by clicking on the corresponding labels in the legend.*

In [4]:
ms.rename(columns = {'recipient_condensed':'country', 'transactions_start_year':'start_year', 'ad_sector_names':'sector' }, inplace = True) 

In [5]:
dfmap = ms[['project_id', 'gazetteer_adm_name', 'start_year', 'project_title','recipients',  'location_type_name', 'longitude', 'latitude','place_name', 'sector', 'usd_current']]

#### Compute cumulative investment by year - for animation

In [6]:
li = list()
dfmap  = dfmap.sort_values(by=['start_year'], ascending=True)

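for year in range(2000, 2015):  # 2015 is exclusive, so the last frame is 2014
    mask = dfmap['start_year'].lt(year + 1)
    di = dfmap[mask].copy()  # copy so the new column is not written to a view
    di['cumulative_by_year'] = year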
    li.append(di)
    
dfmap = pd.concat(li, axis=0, ignore_index=True, copy=False, sort=False)
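
The cumulative frames built above can be checked on a small example. The following is a minimal sketch on made-up data (the `toy` frame and its values are assumptions, not rows from the AidData file): every animation frame contains all projects that started in or before that year, so the frames can only grow.

```python
import pandas as pd

# Toy stand-in for dfmap; the values are invented for illustration.
toy = pd.DataFrame({
    "project_id": [1, 2, 3, 4],
    "start_year": [2000, 2000, 2001, 2002],
})

frames = []
for year in range(2000, 2003):
    # Keep every project that started in or before this year.
    snapshot = toy[toy["start_year"] <= year].copy()
    snapshot["cumulative_by_year"] = year
    frames.append(snapshot)

cumulative = pd.concat(frames, ignore_index=True)

# Number of points drawn in each animation frame.
frame_sizes = cumulative.groupby("cumulative_by_year")["project_id"].count().to_dict()
```

Here the three frames hold 2, 3 and 4 rows, which is why the concatenated frame is larger than the original data.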

In [7]:
mapbox_access_token  =  'pk.eyJ1IjoidW5rbGV0YW0iLCJhIjoiY2tkNnljemFxMG1mYTJ6cmE4bW1yYjczeiJ9.rBvwQf6Zw4BTA_f_O9dKbg'


dfmap['Project'] = " "+ dfmap['project_title'] + " [" + dfmap['location_type_name']+ "]"
fig = px.scatter_mapbox(dfmap, 
                     lon = dfmap['longitude'],
                     lat = dfmap['latitude'],
                     
                     opacity = 0.5,
                     color="sector", 
                     hover_name="place_name",
                    # remove longitude, latitude and add amount, start year 
                     animation_frame="cumulative_by_year",
                     )

fig.update_layout(
    width=1850, height=550,
    title = "Chinese Investment [2000 - 2014] - Click on Legend to Filter by Sector",
    legend_title_text='Sector', 
    hovermode='closest',
    mapbox=dict(
        
        accesstoken=mapbox_access_token,
        bearing=0,
        pitch=0,
        zoom=2,
        style = "dark"
    )
)
#fig.show()
iplot(fig)

#### Transform the data

In [8]:
sample_countries = ms[['country', 'project_id', 'usd_current', 'flow_class']].drop_duplicates(['project_id'])

ms_country = sample_countries.groupby('country').agg( amount_USD_by_country=pd.NamedAgg(column='usd_current', aggfunc='sum'), 
                          project_count_by_country=pd.NamedAgg(column='project_id', aggfunc='count') 
                         )
ms_country = ms_country.reset_index()
ms_country['amount_USD_by_country'] = ms_country['amount_USD_by_country'].div(1000000).round(0)

In [9]:
sample_sectors = ms[['sector', 'project_id', 'usd_current', 'flow_class']].drop_duplicates(['project_id'])
ms_sector = sample_sectors.groupby('sector').agg( amount_USD_by_sector=pd.NamedAgg(column='usd_current', aggfunc='sum'), 
                          project_count_by_sector=pd.NamedAgg(column='project_id', aggfunc='count') 
                         )
ms_sector = ms_sector.reset_index()
ms_sector['amount_USD_by_sector'] = ms_sector['amount_USD_by_sector'].div(1000000).round(0)
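
The two cells above drop duplicate `project_id` rows before aggregating. A minimal sketch of why that matters, on invented data: it assumes a project geocoded at several locations appears once per location with the same `usd_current` value, which is an assumption about the file and not something verified here.

```python
import pandas as pd

# Toy rows: project 10 is geocoded at two locations, so it appears twice.
toy = pd.DataFrame({
    "sector": ["Communications", "Communications", "Transport and Storage"],
    "project_id": [10, 10, 11],
    "usd_current": [2_000_000.0, 2_000_000.0, 5_000_000.0],
})

# One row per project, so an amount is not summed once per location.
projects = toy.drop_duplicates(["project_id"])

by_sector = (
    projects.groupby("sector")
    .agg(
        amount_usd=pd.NamedAgg(column="usd_current", aggfunc="sum"),
        project_count=pd.NamedAgg(column="project_id", aggfunc="count"),
    )
    .reset_index()
)
# Express amounts in millions of USD, as in the cells above.
by_sector["amount_usd_mm"] = by_sector["amount_usd"].div(1_000_000).round(0)
```

Without the `drop_duplicates` step, the Communications total in this toy frame would come out as 4 million instead of 2 million.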

In [10]:
top_countries = ms_country.sort_values(by='amount_USD_by_country', ascending=False).head(10)['country'].tolist()

#### The following Pie Chart gives us a look into the proportion of aid / investment in each sector.

Most investments fall under the *Transport and Storage* sector. 





In [11]:
# Share of total investment (USD millions) per sector, read directly from ms_sector
fig = px.pie(ms_sector, values='amount_USD_by_sector', names='sector')
fig.update_layout(
    width=1850, height=550,
    title = "Chinese Investment (USD) By Sector - Click on Legend to Filter",
    legend_title_text='Sector',
    showlegend=True
    
)

fig.show()

#### The following Donut Chart illustrates how the invested amount (USD) is split across the top 10 countries.



In [13]:
mask = ms_country['country'].isin(top_countries)
ms_country_top = ms_country[mask]

# Plot only the top 10 countries, using their columns by name
fig = px.pie(ms_country_top, values='amount_USD_by_country', names='country', hole=.7)
fig.update_layout(
    width=1850, height=550,
    title = "Chinese Investment (USD) in the Top 10 Countries",
    legend_title_text='Country',
    showlegend=True
    
)

fig.show()

#### Next, we would like to see which countries host a high number of projects backed by Chinese investors, and the amount spent in each country.

Cambodia, Pakistan and Zimbabwe are among the countries with the highest number of projects, whereas by total amount spent in USD, Pakistan and Russia come out on top as the countries that received the largest investments.


In [15]:
#Creating a TreeMap

fig = px.treemap(ms_country, 
                  path = ['country', ], values = 'project_count_by_country',
                  color='amount_USD_by_country',
                  color_continuous_scale='Tropic',
                 color_continuous_midpoint=np.average( ms_country['amount_USD_by_country']),
                  
                  
                )

fig.update_layout(
    width=1850, height=550,
    title = "MAP OF PROJECTS BACKED BY CHINA BY COUNTRY [Size: Number of Projects, Color: Amount Invested]",
    coloraxis_colorbar_title_text='Amount Invested (USD mm)',  # the treemap uses a color bar, not a legend
    
)

fig.show()

#fig.write_html('first_figure.html', auto_open=True)

#### Furthermore, we can visualize the proportions of each category by country.


In [16]:
ms_c = sample_countries.sort_values(by=['usd_current'], ascending=False)
ms_c['flow_class'] = ms_c['flow_class'].replace({'ODA-like': 'Aid', 'OOF-like': 'Non-Aid', 'Vague (Official Finance)': 'N/A'})

ms_c = ms_c.loc[ms_c['country'].isin(top_countries)]


fig = px.histogram(ms_c, x = 'flow_class', title = 'NO. OF PROJECTS BY AID/NON-AID CATEGORY (IN TOP 10 COUNTRIES BY $USD AMOUNT)', color ='flow_class', facet_col ='country', facet_col_wrap=5, log_y=False )
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# Clear the per-facet axis titles; shared titles are added as annotations below
fig.update_yaxes(title_text='')
fig.update_xaxes(title_text='')
        
fig.update_layout(
    width=1850, height=550,
    # keep the original annotations and add a list of new annotations:
    annotations = list(fig.layout.annotations) + [
        dict(
            x=0.5,
            y=-0.15,
            font=dict(
                size=14
            ),
            showarrow=False,
            text="Flow_Class",
            xref="paper",
            yref="paper"
        ),
        dict(
            x=-0.05,
            y=0.5,
            font=dict(
                size=14
            ),
            showarrow=False,
            text="Count",
            textangle=-90,
            xref="paper",
            yref="paper"
        )
    ]
)

fig.show()
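
The counts behind the faceted histogram can also be read as a table. The sketch below uses invented rows (the `toy` frame is an assumption, not real data) and `pd.crosstab` to count projects per country and flow class after applying the same relabeling as above.

```python
import pandas as pd

# Toy projects; countries and classes are illustrative only.
toy = pd.DataFrame({
    "country": ["Pakistan", "Pakistan", "Russia", "Russia", "Russia"],
    "flow_class": [
        "ODA-like", "OOF-like", "OOF-like", "OOF-like", "Vague (Official Finance)",
    ],
})

# Same relabeling as in the cell above.
toy["flow_class"] = toy["flow_class"].replace(
    {"ODA-like": "Aid", "OOF-like": "Non-Aid", "Vague (Official Finance)": "N/A"}
)

# Rows are countries, columns are flow classes, cells are project counts.
counts = pd.crosstab(toy["country"], toy["flow_class"])
```

Each row of `counts` corresponds to one facet of the histogram, and each column to one bar.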

#### Visualizing Investment across various Sectors.

While creating visualizations, I noticed that most investments were related to infrastructure projects. That is not out of the ordinary in itself, but China's 'Belt and Road Initiative', which costs a whopping $900 billion, would benefit from all these investments made in the Transport sector.
The disparity between Transport and Storage and the other sectors was so strong that I had to use the log attribute (`log_y`).

In [20]:
#pd.set_option("display.precision", 3)
ms_sector = ms_sector.sort_values(by='amount_USD_by_sector', ascending=False)
fig = px.bar(ms_sector, y='amount_USD_by_sector', x='sector', log_y = False )

fig.update_layout(
                 title="Chinese Investments By Sectors in USD",
                 xaxis_title="Sectors",
                 yaxis_title="Amount Spent (USD mm)",
                width=1850, height=550,
                 yaxis = dict(showexponent = 'all',exponentformat = 'e'))

fig.update_yaxes(nticks=3)
fig.update_xaxes(tickangle=45)

fig.update_traces(
                  marker_line_color='blue',
                  marker_line_width=1.5,opacity=0.6,
                  marker_color='indianred')



fig.show()
