# Are Kiva loans growing over time?

### Import packages

At the beginning of our notebooks we always import all the libraries we will use.

In [2]:
import pandas as pd
import numpy as np
from ggplot import *
import matplotlib.pyplot as plt
from datetime import datetime
import dateutil.parser
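# pd.options.display.mpl_style is deprecated in pandas; set the plot style through matplotlib instead
plt.style.use('ggplot')

# The %matplotlib inline command is important: it tells Jupyter Notebook to render charts directly below the cell that creates them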
%matplotlib inline


In [None]:
# The setting below makes Jupyter display the result of every statement in a cell, not just the last one.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [None]:
# The options below tell pandas to display up to 80 columns so that every column stays visible
pd.set_option('display.max_columns', 80)
pd.set_option('expand_frame_repr', True)

### Import dataset

We read in our dataset from notebook 1.2 below.

In [None]:
data_path = '~/intro_course_data_science_for_good/data'

In [3]:
df = pd.read_csv(data_path + '/df.csv', low_memory=False)


In [None]:
df.sample(2)

## How many years do we have data for?

A CSV file stores dates as plain text, so reading the dataset back in loses the date conversions we did in notebook 1.2. We redo them below.

In [None]:
# Parse the date strings into pandas datetimes
df['posted_datetime'] = pd.to_datetime(df['posted_date'])
df['funded_datetime'] = pd.to_datetime(df['funded_date'])
df['planned_expiration_datetime'] = pd.to_datetime(df['planned_expiration_date'])
df['dispursal_datetime'] = pd.to_datetime(df['terms.disbursal_date'])

# Keep a date-only version (no time of day) of each timestamp
df['posted_date'] = df['posted_datetime'].dt.date
df['planned_expiration_date'] = df['planned_expiration_datetime'].dt.date
df['funded_date'] = df['funded_datetime'].dt.date
df['dispersal_date'] = df['dispursal_datetime'].dt.date

# Extract the year and month in which each loan was posted
df['posted_year'] = df['posted_datetime'].dt.year
df['posted_month'] = df['posted_datetime'].dt.month

# Time between posting and funding, first as a timedelta and then as a whole number of days
df['time_to_fund'] = df['funded_datetime'] - df['posted_datetime']
df['time_to_fund'] = df['time_to_fund'].dt.days
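
As a small illustration of the `time_to_fund` step above, the sketch below uses a made-up two-row frame (the timestamps are invented, not Kiva data) to show that `.dt.days` keeps only the whole-day part of a timedelta. If fractional days matter, dividing `.dt.total_seconds()` by the number of seconds in a day is one possible alternative.

```python
import pandas as pd

# Hypothetical toy data: both loans are posted at the same time,
# one is funded just under two days later and one just over.
toy = pd.DataFrame({
    'posted_datetime': pd.to_datetime(['2017-01-01 08:00', '2017-01-01 08:00']),
    'funded_datetime': pd.to_datetime(['2017-01-03 07:00', '2017-01-03 09:00']),
})

elapsed = toy['funded_datetime'] - toy['posted_datetime']

# .dt.days drops the hours: 1 day 23 hours becomes 1, 2 days 1 hour becomes 2
whole_days = elapsed.dt.days

# Fractional days keep the hours as a decimal part
fractional_days = elapsed.dt.total_seconds() / 86400

print(pd.DataFrame({'whole_days': whole_days, 'fractional_days': fractional_days}))
```

Whole days are usually enough for a summary chart; the fractional version is only worth the extra column if loans that fund within hours need to be told apart.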

We can check whether the conversion worked by looking at the type again. If it did, `posted_datetime` has a datetime64 dtype rather than `object`. We were successful!

In [None]:
df['posted_datetime'].head(1)
df['posted_datetime'].dtype
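
To see why the conversion has to be repeated after every reload, here is a minimal sketch on a made-up two-row frame (the values are invented). It writes the frame to an in-memory CSV, reads it back, and compares the dtypes. The `parse_dates` argument of `pd.read_csv` is shown as one possible way to restore the dtype at load time; it is not what this notebook does.

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for the loan data
toy = pd.DataFrame({'posted_date': ['2016-03-01 10:00:00', '2017-01-15 08:30:00']})
toy['posted_datetime'] = pd.to_datetime(toy['posted_date'])

# Round-trip through CSV in memory instead of writing to disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Plain reload: the datetime column comes back as text
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Reload with parse_dates: pandas converts the column while reading
buffer.seek(0)
reparsed = pd.read_csv(buffer, parse_dates=['posted_datetime'])

is_datetime = pd.api.types.is_datetime64_any_dtype
before_ok = is_datetime(toy['posted_datetime'])
reloaded_ok = is_datetime(reloaded['posted_datetime'])
reparsed_ok = is_datetime(reparsed['posted_datetime'])
print(before_ok, reloaded_ok, reparsed_ok)
```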

## Number of loans posted on Kiva is increasing year over year

Now that we have successfully converted our date fields, we can plot the time range of our data. This is an important sanity check: if we see an unexpected year, or a big spike in a single year, we should be suspicious about the quality of our data.
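
The same range check can also be done numerically. The sketch below is only an illustration on made-up dates: the frame `toy` and the bounds `first_expected_year` and `last_expected_year` are assumptions chosen for the example, and the 1970 row mimics the kind of bad value a broken date field might produce.

```python
import pandas as pd

# Hypothetical toy data; the last row imitates a corrupted date
toy = pd.DataFrame({'posted_datetime': pd.to_datetime(
    ['2006-04-02', '2011-09-30', '2017-05-09', '1970-01-01'])})

# Assumed plausible range for this example
first_expected_year, last_expected_year = 2006, 2017

# Earliest and latest timestamps give the overall range at a glance
earliest = toy['posted_datetime'].min()
latest = toy['posted_datetime'].max()

# Rows whose year falls outside the expected range deserve a closer look
years = toy['posted_datetime'].dt.year
unexpected = toy[(years < first_expected_year) | (years > last_expected_year)]

print(earliest, latest)
print(unexpected)
```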

In [None]:
df['posted_year'].value_counts().sort_index().plot(kind='bar',title='Number of loans on KIVA each year', fontsize=20, figsize=(8, 8))

In [None]:
# Show the yearly loan counts behind the chart above
df['posted_year'].value_counts().sort_index()

In the chart above we can see that we have loan data from 2006 through 2017. This seems reasonable and means Kiva has been lending in Kenya for 11 years. We also see that the number of loans has grown each year until a dip in 2017. However, this dip isn't that suspicious: we are only in May, so the 2017 total will likely increase by December.
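
One way to compare a partial year fairly is to count only the months that every year has in common. The sketch below is a toy illustration: the dates are made up, and `last_month_in_data = 5` is an assumption standing in for "the data runs through May".

```python
import pandas as pd

# Hypothetical toy data: a full 2016 and a 2017 that stops in May
toy = pd.DataFrame({'posted_datetime': pd.to_datetime([
    '2016-02-10', '2016-04-05', '2016-08-20', '2016-11-01',
    '2017-01-12', '2017-03-03', '2017-05-01'])})
toy['posted_year'] = toy['posted_datetime'].dt.year
toy['posted_month'] = toy['posted_datetime'].dt.month

# Assumption for this example: the most recent year only covers January to May
last_month_in_data = 5

# Full-year counts make the partial year look like a decline
full_year_counts = toy['posted_year'].value_counts().sort_index()

# Restricting every year to the same months gives a like-for-like comparison
same_period = toy[toy['posted_month'] <= last_month_in_data]
same_period_counts = same_period['posted_year'].value_counts().sort_index()

print(full_year_counts)
print(same_period_counts)
```

In this toy data the full-year counts fall from 2016 to 2017, while the January to May counts rise, which is the situation the paragraph above describes.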

## Total loan dollar amount posted to Kiva is increasing year over year

We see a similar pattern in the sum of dollars loaned. It mirrors the distribution of the number of loans, which is a good sign. The total dollar amount posted to Kiva has grown each successive year except for 2007.

In [None]:
df.groupby('posted_year')['loan_amount'].sum().plot(kind="bar", title='Loan Total Dollar Amount on KIVA By Year', fontsize=20, figsize=(8, 8))
plt.xticks(rotation=90)

In [None]:
df.groupby('posted_year')['loan_amount'].sum()
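
To put a number on "increasing year over year", the yearly totals can be turned into growth rates with `pct_change`. This is a minimal sketch on invented amounts, not Kiva figures; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical toy data with made-up loan amounts
toy = pd.DataFrame({
    'posted_year': [2014, 2014, 2015, 2015, 2016],
    'loan_amount': [100, 300, 200, 400, 900],
})

# Same aggregation as in the cell above: total dollars per posting year
yearly_totals = toy.groupby('posted_year')['loan_amount'].sum()

# Fractional change from the previous year; the first year has no predecessor, so it is NaN
yoy_growth = yearly_totals.pct_change()

print(pd.DataFrame({'total': yearly_totals, 'yoy_growth': yoy_growth}))
```

A negative value in `yoy_growth` would flag a year in which the total fell, which is quicker to spot in a table than on a bar chart.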

In [None]:
# Reuse data_path and skip the index so that reloading does not add an extra unnamed column
df.to_csv(data_path + '/df.csv', index=False)