# NLP Modeling to Predict Chinese Project Financing

### Because the Chinese government is not transparent about its aid contributions, many projects in William and Mary's AidData project are missing funding amounts. The dataset does have several descriptive text columns, so I thought it would be interesting to see how well NLP modeling could predict the amount contributed to each project.

In [24]:
# Importing libraries needed for modeling
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
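# The standalone stop_words module was removed in scikit-learn 0.24;
# the text module exposes the same ENGLISH_STOP_WORDS set
from sklearn.feature_extraction import text as stop_words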
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, ExtraTreesRegressor, AdaBoostRegressor 

In [25]:
# Reading in cleaned Chinese data
china_aid = pd.read_csv('./aid_data/aid_data_wm/chinese_official_finance_clean.csv')

In [26]:
# Looking at the dataframe I read in 
china_aid.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,null_amounts_as_zero
0,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,18.08581,-15.9785,Nouakchott,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0
1,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,15.15846,-12.1843,Sélibaby,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0
2,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,16.61659,-11.40453,Kiffa,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0
3,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,0.0,0.0,Mauritania,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0
4,Mauritania,Transport and Storage,Loan (excluding debt rescheduling),ODA-like,"Export-Import Bank of China, Government Agency",Development,Nouakchott,18.08581,-15.9785,Nouakchott,396886331.0,2008,China issues 2 billion yuan loan to fund Port ...,Mauritania,ChinatoAfrica,396886331.0


In [27]:
# Combining all the text columns into a single column
text_cols = ['funding_agency', 'all_recipients', 'crs_sector_name', 'flow', 'flow_class',
             'intent', 'location_details', 'place_name', 'project_title',
             'recipient_condensed', 'round_coded']
china_aid['words'] = china_aid[text_cols].agg(' '.join, axis=1)
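To make the later modeling steps concrete, here is a minimal sketch of what a TF-IDF vectorizer does with a combined text column like `words`. The two sample strings are made up for illustration (loosely modeled on the rows shown in this notebook), not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample strings standing in for the 'words' column
sample_words = [
    "Export-Import Bank of China Mauritania Transport and Storage Loan",
    "Unspecified Chinese Government Institution Mauritania Health Grant",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(sample_words)

# One row per document, one column per distinct token kept after stop-word removal
print(tfidf_matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

Each row of the resulting sparse matrix is the numeric feature vector that a tree-based regressor would see in place of the raw text.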

In [28]:
# Checking to see how my dataframe looks 
china_aid.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,null_amounts_as_zero,words
0,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,18.08581,-15.9785,Nouakchott,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
1,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,15.15846,-12.1843,Sélibaby,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
2,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,16.61659,-11.40453,Kiffa,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
3,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,0.0,0.0,Mauritania,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
4,Mauritania,Transport and Storage,Loan (excluding debt rescheduling),ODA-like,"Export-Import Bank of China, Government Agency",Development,Nouakchott,18.08581,-15.9785,Nouakchott,396886331.0,2008,China issues 2 billion yuan loan to fund Port ...,Mauritania,ChinatoAfrica,396886331.0,"Export-Import Bank of China, Government Agency..."


In [29]:
# Looking at the info for missing values and data types 
china_aid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3644 entries, 0 to 3643
Data columns (total 17 columns):
all_recipients          3644 non-null object
crs_sector_name         3644 non-null object
flow                    3644 non-null object
flow_class              3644 non-null object
funding_agency          3644 non-null object
intent                  3644 non-null object
location_details        3644 non-null object
latitude                3644 non-null float64
longitude               3644 non-null float64
place_name              3644 non-null object
usd_defl_2014           2208 non-null float64
year                    3644 non-null int64
project_title           3644 non-null object
recipient_condensed     3644 non-null object
round_coded             3644 non-null object
null_amounts_as_zero    3644 non-null float64
words                   3644 non-null object
dtypes: float64(4), int64(1), object(12)
memory usage: 484.1+ KB


In [30]:
# Creating a data frame with known values for modeling 
china_aid_known = china_aid[china_aid['null_amounts_as_zero'] != 0] 

In [31]:
# Creating a dataframe with all the unknown values
china_aid_unknown = china_aid[china_aid['null_amounts_as_zero'] == 0.0]

In [32]:
# Looking at the shape of my unknown values
china_aid_unknown.shape

(1436, 17)
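The split treats a zero in `null_amounts_as_zero` as "amount unknown". An equivalent way to express it, assuming zeros were filled in only where `usd_defl_2014` was missing, is a mask on the nulls themselves. A sketch on a made-up frame (the values and the `demo` names are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up frame mirroring the two amount columns
demo = pd.DataFrame({
    "usd_defl_2014": [np.nan, 396886331.0, np.nan, 1364094.0],
})
demo["null_amounts_as_zero"] = demo["usd_defl_2014"].fillna(0)

# Mask on the missing amounts rather than on the zero sentinel
is_unknown = demo["usd_defl_2014"].isna()
known_demo = demo[~is_unknown].copy()
unknown_demo = demo[is_unknown].copy()

# Every row lands in exactly one of the two frames
print(len(known_demo), len(unknown_demo))
```

In this notebook the counts are consistent with that reading: 2208 rows have a non-null `usd_defl_2014`, 1436 rows are unknown, and together they make up the 3644 rows of the full frame.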

In [33]:
# Looking at my unknown aid dataframe 
china_aid_unknown.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,null_amounts_as_zero,words
0,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,18.08581,-15.9785,Nouakchott,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
1,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,15.15846,-12.1843,Sélibaby,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
2,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,16.61659,-11.40453,Kiffa,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
3,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,0.0,0.0,Mauritania,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."
6,Angola,Government and Civil Society,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,-8.83682,13.23432,Luanda,,2001,Computers and Sewing Machines,Angola,ChinatoAfrica,0.0,"Unspecified Chinese Government Institution, Go..."


In [34]:
# Resetting my index so that the numbers are consecutive
china_aid_unknown = china_aid_unknown.reset_index(drop=True)

In [35]:
# Dropping the column of zeros 
china_aid_unknown = china_aid_unknown.drop(columns=['null_amounts_as_zero'])

In [36]:
# Changing the name of the column to aid amount
# (reassigning instead of inplace=True avoids pandas' SettingWithCopyWarning on the filtered frame)
china_aid_known = china_aid_known.rename(columns={'null_amounts_as_zero': 'aid_amount'})


In [37]:
# Looking at the info for the known aid 
china_aid_known.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2208 entries, 4 to 3643
Data columns (total 17 columns):
all_recipients         2208 non-null object
crs_sector_name        2208 non-null object
flow                   2208 non-null object
flow_class             2208 non-null object
funding_agency         2208 non-null object
intent                 2208 non-null object
location_details       2208 non-null object
latitude               2208 non-null float64
longitude              2208 non-null float64
place_name             2208 non-null object
usd_defl_2014          2208 non-null float64
year                   2208 non-null int64
project_title          2208 non-null object
recipient_condensed    2208 non-null object
round_coded            2208 non-null object
aid_amount             2208 non-null float64
words                  2208 non-null object
dtypes: float64(4), int64(1), object(12)
memory usage: 310.5+ KB


In [38]:
# Looking at how the aid amounts are distributed 
china_aid_known['aid_amount'].describe()

count    2.208000e+03
mean     1.764717e+08
std      3.838847e+08
min      1.760000e+02
25%      4.109015e+06
50%      4.100000e+07
75%      1.346656e+08
max      2.846909e+09
Name: aid_amount, dtype: float64
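The amounts span about seven orders of magnitude (from 176 to roughly 2.8 billion USD), and the mean is more than four times the median, so the target is heavily right-skewed. One option, not used in this notebook, is to fit the regressor on a log-transformed target. The sketch below uses scikit-learn's `TransformedTargetRegressor` on synthetic data; the data and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic, right-skewed target (log-normal), standing in for aid amounts
X_demo = rng.normal(size=(200, 3))
y_demo = np.exp(10 + 2 * X_demo[:, 0] + rng.normal(scale=0.1, size=200))

# The regressor is trained on log1p(y); predictions are mapped back with expm1
log_model = TransformedTargetRegressor(
    regressor=RandomForestRegressor(n_estimators=50, random_state=42),
    func=np.log1p,
    inverse_func=np.expm1,
)
log_model.fit(X_demo, y_demo)

preds_demo = log_model.predict(X_demo)
print(preds_demo[:3])
```

Whether this helps here would need to be checked against the untransformed model on the same split; it is a possible variation, not a result from this analysis.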

In [39]:
# Using the words column as the feature (X) and aid_amount as the target (y)
X = china_aid_known['words']
y = china_aid_known['aid_amount']

In [40]:
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    random_state=42)

In [41]:
# Setting the pipeline for random forest 
pipe = Pipeline([('tfidf', TfidfVectorizer()),
                     ('rf', RandomForestRegressor())
                ])
# Pipeline parameters
pipe_params = {
    'tfidf__max_features': [10000],
    'tfidf__ngram_range': [(1,1)],
    # Passed as a list because newer scikit-learn versions reject a frozenset here
    'tfidf__stop_words': [list(stop_words.ENGLISH_STOP_WORDS)],
    'rf__n_estimators': [100, 150],
    'rf__max_depth': [None, 5, 6]
}
# Instantiating a grid search
gs = GridSearchCV(pipe, 
                  param_grid=pipe_params) 
# Fitting my model
gs.fit(X_train, y_train)

GridSearchCV(cv=None, error_score=nan,
             estimator=Pipeline(memory=None,
                                steps=[('tfidf',
                                        TfidfVectorizer(analyzer='word',
                                                        binary=False,
                                                        decode_error='strict',
                                                        dtype=<class 'numpy.float64'>,
                                                        encoding='utf-8',
                                                        input='content',
                                                        lowercase=True,
                                                        max_df=1.0,
                                                        max_features=None,
                                                        min_df=1,
                                                        ngram_range=(1, 1),
                                                      

In [42]:
# Looking at the best score 
gs.best_score_

0.9174359749937853
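For a regressor, `best_score_` is the mean cross-validated R² of the best parameter combination, averaged over the folds (five by default since scikit-learn 0.22), not a score on the held-out test set. A small sketch on synthetic data, with all names hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = 3 * X_demo[:, 0] + rng.normal(scale=0.5, size=120)

demo_gs = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid={"max_depth": [None, 3]},
)
demo_gs.fit(X_demo, y_demo)

# best_score_ is the mean test-fold score for the winning parameter set
mean_scores = demo_gs.cv_results_["mean_test_score"]
print(demo_gs.best_params_, demo_gs.best_score_)
print(mean_scores[demo_gs.best_index_])
```

Both printed scores are the same number, which is why `best_score_` is best read as a validation estimate and compared with the test score separately.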

In [43]:
# Setting the model as the best estimator 
gs_model = gs.best_estimator_

In [44]:
# Scoring my model on the training data
gs_model.score(X_train, y_train)

0.9901172610766108

In [45]:
# Scoring my model on the testing data. R^2 drops from about 0.99 on the
# training data to about 0.93 here, so the model is somewhat overfit.
gs_model.score(X_test, y_test)

0.9264090347217876
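`score` on a regression pipeline returns R², which is unitless. Since the target is in dollars, an absolute-error metric can make the size of the misses easier to judge. The sketch below uses made-up numbers, not this notebook's predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted amounts in USD
y_true_demo = np.array([1_000_000.0, 5_000_000.0, 40_000_000.0, 400_000_000.0])
y_pred_demo = np.array([2_000_000.0, 4_000_000.0, 50_000_000.0, 380_000_000.0])

r2_demo = r2_score(y_true_demo, y_pred_demo)
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)

print(f"R^2: {r2_demo:.4f}")
print(f"MAE: {mae_demo:,.0f} USD")
```

With the fitted pipeline, the same call would take `y_test` and `gs_model.predict(X_test)` as its two arguments. Note how a high R² can coexist with errors of millions of dollars when a few very large projects dominate the variance.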

In [46]:
# Setting the pipeline for tfidf and extra trees
pipe2 = Pipeline([('tfidf', TfidfVectorizer()),
                     ('xt', ExtraTreesRegressor())
                ])
# Setting the pipeline parameters
pipe_params2 = {
    'tfidf__max_features': [10000],
    'tfidf__ngram_range': [(1,1)],
    # Passed as a list because newer scikit-learn versions reject a frozenset here
    'tfidf__stop_words': [list(stop_words.ENGLISH_STOP_WORDS)],
    'xt__n_estimators': [100, 150],
    'xt__max_depth': [None, 5, 6]
}
# Instantiating the grid search
gs2 = GridSearchCV(pipe2, 
                  param_grid=pipe_params2) 
# Fitting the model
gs2.fit(X_train, y_train)

GridSearchCV(cv=None, error_score=nan,
             estimator=Pipeline(memory=None,
                                steps=[('tfidf',
                                        TfidfVectorizer(analyzer='word',
                                                        binary=False,
                                                        decode_error='strict',
                                                        dtype=<class 'numpy.float64'>,
                                                        encoding='utf-8',
                                                        input='content',
                                                        lowercase=True,
                                                        max_df=1.0,
                                                        max_features=None,
                                                        min_df=1,
                                                        ngram_range=(1, 1),
                                                      

In [47]:
# Checking my best score 
gs2.best_score_

0.9356264698065665

In [48]:
# Setting the model as the best estimator 
gs_model2 = gs2.best_estimator_

In [49]:
# Checking the training score 
gs_model2.score(X_train, y_train)

0.9999999999906333

In [50]:
# Checking the testing score. Training R^2 is essentially 1.0 versus about
# 0.95 here, so the model is overfit.
gs_model2.score(X_test, y_test)

0.945507191198287
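One caveat on these scores: several rows in this dataset describe the same project at different locations (for example, a Botswana loan is listed once per city with the same title and amount), so a random split can place near-identical text in both the training and test sets. A hedged sketch of a group-aware split, using `GroupShuffleSplit` on a tiny made-up frame where `project_title` serves as the group key:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Made-up frame: each project is repeated once per location
demo = pd.DataFrame({
    "project_title": ["loan A", "loan A", "loan A", "grant B",
                      "grant B", "team C", "team C", "road D"],
    "aid_amount": [51.0, 51.0, 51.0, 1.3, 1.3, 2.0, 2.0, 396.0],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(demo, groups=demo["project_title"]))

train_titles = set(demo.iloc[train_idx]["project_title"])
test_titles = set(demo.iloc[test_idx]["project_title"])

# Titles that appear on both sides of the split (there should be none)
print(train_titles & test_titles)
```

Scores from a split like this would likely be lower than the ones reported here, but they would better reflect how the model handles projects it has never seen, which is the situation for the rows with unknown amounts.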

In [51]:
# Generating predictions from the model for the unknown data
predictions = gs2.predict(china_aid_unknown['words'])

In [52]:
# Looking at the shape of my predictions
predictions.shape

(1436,)

In [53]:
# Putting my predicted value into a data frame
china_aid_predicted = pd.DataFrame(predictions, columns=['aid_amount'])

In [54]:
# Looking at the shape of my data frame
china_aid_predicted.shape

(1436, 1)

In [55]:
# Checking out my dataframe
china_aid_predicted.head()

Unnamed: 0,aid_amount
0,2096398.68
1,2207918.52
2,2486903.2
3,3548931.7
4,5394323.98


In [56]:
# Checking the shape of the unknown aid dataframe
china_aid_unknown.shape

(1436, 16)

In [57]:
# Looking at my dataframe 
china_aid_unknown.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,words
0,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,18.08581,-15.9785,Nouakchott,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go..."
1,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,15.15846,-12.1843,Sélibaby,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go..."
2,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,16.61659,-11.40453,Kiffa,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go..."
3,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,0.0,0.0,Mauritania,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go..."
4,Angola,Government and Civil Society,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,-8.83682,13.23432,Luanda,,2001,Computers and Sewing Machines,Angola,ChinatoAfrica,"Unspecified Chinese Government Institution, Go..."


In [58]:
# Merging my predictions with my unknown dataframe 
china_aid_modeled = china_aid_unknown.merge(china_aid_predicted, left_index=True, right_index=True)

In [59]:
# Looking at the shape of the merged data frame 
china_aid_modeled.shape

(1436, 17)
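The merge works because both frames share a consecutive index starting at 0: `china_aid_unknown` had its index reset earlier, and the predictions frame gets a fresh `RangeIndex` when it is created. A small sketch with made-up values shows what happens if the reset is skipped:

```python
import pandas as pd

# Filtered frame keeps its original, non-consecutive index labels
unknown_demo = pd.DataFrame({"project": ["a", "b", "c"]}, index=[0, 3, 6])
preds_demo = pd.DataFrame({"aid_amount": [10.0, 20.0, 30.0]})  # index 0, 1, 2

# Without a reset, only label 0 is present in both frames
misaligned = unknown_demo.merge(preds_demo, left_index=True, right_index=True)

# After reset_index(drop=True), every row lines up with its prediction
aligned = unknown_demo.reset_index(drop=True).merge(
    preds_demo, left_index=True, right_index=True
)

print(len(misaligned), len(aligned))
```

Checking that the merged shape still has 1436 rows, as done here, is a quick guard against this kind of silent row loss.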

In [60]:
# Making a column to distinguish the values predicted by modeling
# This way I can exclude the predicted data if I would like
china_aid_modeled['predicted_by_modeling']= True

In [61]:
# Looking at my modeled data 
china_aid_modeled.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,words,aid_amount,predicted_by_modeling
0,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,18.08581,-15.9785,Nouakchott,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go...",2096398.68,True
1,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,15.15846,-12.1843,Sélibaby,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go...",2207918.52,True
2,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,16.61659,-11.40453,Kiffa,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go...",2486903.2,True
3,Mauritania,Health,Free-standing technical assistance,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,0.0,0.0,Mauritania,,2010,29th medical team to Mauritania to assist loca...,Mauritania,ChinatoAfrica,"Unspecified Chinese Government Institution, Go...",3548931.7,True
4,Angola,Government and Civil Society,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,unknown,-8.83682,13.23432,Luanda,,2001,Computers and Sewing Machines,Angola,ChinatoAfrica,"Unspecified Chinese Government Institution, Go...",5394323.98,True


In [62]:
# Making a column for the known values
# (assign returns a new frame, which avoids pandas' SettingWithCopyWarning)
china_aid_known = china_aid_known.assign(predicted_by_modeling=False)


In [63]:
# Looking at my dataframe 
china_aid_known.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,usd_defl_2014,year,project_title,recipient_condensed,round_coded,aid_amount,words,predicted_by_modeling
4,Mauritania,Transport and Storage,Loan (excluding debt rescheduling),ODA-like,"Export-Import Bank of China, Government Agency",Development,Nouakchott,18.08581,-15.9785,Nouakchott,396886331.0,2008,China issues 2 billion yuan loan to fund Port ...,Mauritania,ChinatoAfrica,396886331.0,"Export-Import Bank of China, Government Agency...",False
5,Angola,Emergency Response,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,Bie Province,-12.34989,17.3031,Província do Bié,1364094.0,2001,"China grants $600,000 USD in food aid for floo...",Angola,ChinatoAfrica,1364094.0,"Unspecified Chinese Government Institution, Go...",False
10,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.60166,24.7281,Jwaneng,51378371.0,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,"Unspecified Chinese Government Institution, Go...",False
11,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-25.22435,25.67728,Lobatse,51378371.0,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,"Unspecified Chinese Government Institution, Go...",False
12,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.65451,25.90859,Gaborone,51378371.0,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,"Unspecified Chinese Government Institution, Go...",False


In [64]:
# Dropping the columns for words and usd_defl_2014 (since the latter is a repeat)
china_aid_known = china_aid_known.drop(columns=['words', 'usd_defl_2014'])

In [65]:
# Setting the columns to be the same order as the known data for appending them
china_aid_modeled = china_aid_modeled[['all_recipients', 'crs_sector_name', 'flow', 'flow_class', 'funding_agency', 'intent', 'location_details', 'latitude', 'longitude', 'place_name', 'year', 'project_title', 'recipient_condensed', 'round_coded', 'aid_amount', 'predicted_by_modeling']]

In [66]:
# Appending the two data frames
# (DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent)
china_aid_all = pd.concat([china_aid_known, china_aid_modeled])

In [67]:
# Looking at the combined new data frame
china_aid_all.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,year,project_title,recipient_condensed,round_coded,aid_amount,predicted_by_modeling
4,Mauritania,Transport and Storage,Loan (excluding debt rescheduling),ODA-like,"Export-Import Bank of China, Government Agency",Development,Nouakchott,18.08581,-15.9785,Nouakchott,2008,China issues 2 billion yuan loan to fund Port ...,Mauritania,ChinatoAfrica,396886331.0,False
5,Angola,Emergency Response,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,Bie Province,-12.34989,17.3031,Província do Bié,2001,"China grants $600,000 USD in food aid for floo...",Angola,ChinatoAfrica,1364094.0,False
10,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.60166,24.7281,Jwaneng,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False
11,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-25.22435,25.67728,Lobatse,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False
12,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.65451,25.90859,Gaborone,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False


In [68]:
# Looking for null values and data types. Everything seems good
china_aid_all.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3644 entries, 4 to 1435
Data columns (total 16 columns):
all_recipients           3644 non-null object
crs_sector_name          3644 non-null object
flow                     3644 non-null object
flow_class               3644 non-null object
funding_agency           3644 non-null object
intent                   3644 non-null object
location_details         3644 non-null object
latitude                 3644 non-null float64
longitude                3644 non-null float64
place_name               3644 non-null object
year                     3644 non-null int64
project_title            3644 non-null object
recipient_condensed      3644 non-null object
round_coded              3644 non-null object
aid_amount               3644 non-null float64
predicted_by_modeling    3644 non-null bool
dtypes: bool(1), float64(3), int64(1), object(11)
memory usage: 459.1+ KB


In [69]:
# Making a column for aid in millions
china_aid_all['aid_in_millions'] = china_aid_all['aid_amount'].div(1000000)

In [70]:
# Looking to see the new column 
china_aid_all.head()

Unnamed: 0,all_recipients,crs_sector_name,flow,flow_class,funding_agency,intent,location_details,latitude,longitude,place_name,year,project_title,recipient_condensed,round_coded,aid_amount,predicted_by_modeling,aid_in_millions
4,Mauritania,Transport and Storage,Loan (excluding debt rescheduling),ODA-like,"Export-Import Bank of China, Government Agency",Development,Nouakchott,18.08581,-15.9785,Nouakchott,2008,China issues 2 billion yuan loan to fund Port ...,Mauritania,ChinatoAfrica,396886331.0,False,396.886331
5,Angola,Emergency Response,Grant,ODA-like,"Unspecified Chinese Government Institution, Go...",Development,Bie Province,-12.34989,17.3031,Província do Bié,2001,"China grants $600,000 USD in food aid for floo...",Angola,ChinatoAfrica,1364094.0,False,1.364094
10,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.60166,24.7281,Jwaneng,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False,51.378371
11,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-25.22435,25.67728,Lobatse,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False,51.378371
12,Botswana,Other Social infrastructure and services,Loan (excluding debt rescheduling),ODA-like,"Unspecified Chinese Government Institution, Go...",Development,"Maun, Jwaneng, Gaborone, Lobatse, Francistown ...",-24.65451,25.90859,Gaborone,2004,China loans 117 million BWP for medium and low...,Botswana,ChinatoAfrica,51378371.0,False,51.378371


In [71]:
# Saving my dataframe to a csv 
china_aid_all.to_csv('./aid_data/aid_data_wm/chinese_aid_modeled.csv', index=False)