# Data Science for Good: DonorsChoose.org

This kernel investigates donor modelling using the data publicly available on Kaggle: https://www.kaggle.com/donorschoose/io/data


The problem description from Kaggle:

A good solution will enable DonorsChoose.org to build targeted email campaigns recommending specific classroom requests to prior donors. Part of the challenge is to assess the needs of the organization, uncover insights from the data available, and build the right solution for this problem. Submissions will be evaluated on the following criteria:

- **Performance:** How well does the solution match donors to project requests to which they would be motivated to donate? DonorsChoose.org will not be able to live test every submission, so a strong entry will clearly articulate why it will be effective at motivating repeat donations.
- **Adaptable:** The DonorsChoose.org team wants to put the winning submissions to work, quickly. Therefore a good entry will be easy to implement in production.
- **Intelligible:** A good entry should be easily understood by the DonorsChoose.org team should it need to be updated in the future to accommodate a changing marketplace.


In [46]:
import pandas as pd
import numpy as np

columns = ['Project ID',
          'Donation ID',
          'Donor ID',
          'Donation Included Optional Donation',
          'Donation Amount',
          'Donor Cart Sequence',
          'Donation Received Date']

dtypes = {'Project ID': object,
          'Donation ID': object,
          'Donor ID': object,
          'Donation Included Optional Donation': object,
          'Donation Amount': np.float64,
          'Donor Cart Sequence': np.float64,
          'Donation Received Date': object}

# names= supplies our own column labels, but the first line of the CSV already holds column titles.
# header=0 tells pandas to replace that title row rather than read it in as a row of data.
df = pd.read_csv('Donations.csv', names=columns, header=0, dtype=dtypes)
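Passing both `names=` and `header=0` can be confusing. The minimal sketch below shows why `header=0` matters by reading a tiny in-memory CSV twice. The rows and the short column names are made up for illustration and are not taken from Donations.csv.

```python
import io
import pandas as pd

# Tiny stand-in for Donations.csv (made-up rows, for illustration only).
csv_text = """Project ID,Donor ID,Donation Amount
p1,d1,10.0
p2,d2,25.5
"""

new_names = ['project', 'donor', 'amount']

# header=0: row 0 is treated as a header and replaced by new_names.
with_header0 = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# No header argument: passing names= makes pandas behave as if header=None,
# so the original title row is read in as an extra data row.
without_header0 = pd.read_csv(io.StringIO(csv_text), names=new_names)

print(len(with_header0), len(without_header0))  # 2 3
print(without_header0.iloc[0].tolist())  # ['Project ID', 'Donor ID', 'Donation Amount']
```

In the second frame, the stray title row also forces the `amount` column to `object` dtype. That is why the `dtype` mapping in the cell above only works when `header=0` is set.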

In [49]:
# as_index=False keeps 'Donor ID' as an ordinary column (by default groupby makes it the index), so the frames can be merged on it.
df_donor_count = df[['Donation ID', 'Donor ID']].groupby('Donor ID', as_index=False).count()
df_donor_recency = df[['Donor ID', 'Donation Received Date']].groupby('Donor ID', as_index=False).max()
# Named aggregation gives flat, single-level column names; pandas uses 'mean' (not 'avg') for the average.
df_donor_donations = df[['Donor ID', 'Donation Amount']].groupby('Donor ID').agg(**{
    'Min Donation': ('Donation Amount', 'min'), 'Max Donation': ('Donation Amount', 'max'),
    'Mean Donation': ('Donation Amount', 'mean'), 'Total Donation': ('Donation Amount', 'sum'),
}).reset_index()


In [None]:
df_donor_count.rename(columns={'Donation ID': 'Donation Count'}, inplace=True)
df_donor_recency.rename(columns={'Donation Received Date': 'Most Recent Donation'}, inplace=True)

In [None]:
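# Sanity check: both frames should have a plain RangeIndex, confirming 'Donor ID' is a column and not the index.
print(df_donor_recency.index, df_donor_count.index)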
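The index check is there because of how `as_index` changes the shape of a groupby result. The minimal sketch below compares the default behaviour with `as_index=False` on a made-up three-row frame. The values are toy data; only the column names mirror Donations.csv.

```python
import pandas as pd

# Toy donations (made-up values) using the same column names as Donations.csv.
toy = pd.DataFrame({
    'Donor ID': ['d1', 'd1', 'd2'],
    'Donation ID': ['x1', 'x2', 'x3'],
})

# Default: the grouping key becomes the index of the result.
by_index = toy.groupby('Donor ID').count()

# as_index=False: the key stays an ordinary column and the index is a RangeIndex.
by_column = toy.groupby('Donor ID', as_index=False).count()

print(by_index.index.tolist())            # ['d1', 'd2']
print(by_column.columns.tolist())         # ['Donor ID', 'Donation ID']
print(by_column['Donation ID'].tolist())  # [2, 1]
```

Only the second form has a `Donor ID` column available for `pd.merge(..., on='Donor ID')` without a `reset_index()` first.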

In [None]:
df_donor_int = pd.merge(df_donor_count, df_donor_recency, how='inner', on='Donor ID')
df_donor = pd.merge(df_donor_int, df_donor_donations, how='inner', on='Donor ID')
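Each per-donor aggregate above is computed in its own groupby and then merged. On the full donations table that means several passes over the data. The sketch below instead builds the same kind of donor table with a single groupby and named aggregation, under some assumptions:

- The three-row frame is made-up data standing in for `df`.
- The output column labels are hypothetical choices.
- Dates are parsed with `pd.to_datetime` so that "most recent" is a true datetime comparison rather than a string comparison.

This is an alternative approach, not the one used in the cells above.

```python
import pandas as pd

# Made-up donations for illustration; in the notebook this role is played by `df`.
toy = pd.DataFrame({
    'Donor ID': ['d1', 'd1', 'd2'],
    'Donation ID': ['x1', 'x2', 'x3'],
    'Donation Amount': [10.0, 30.0, 5.0],
    'Donation Received Date': ['2017-01-05 10:00:00', '2018-03-01 09:30:00', '2016-12-31 23:59:59'],
})

# Parse dates so max() compares datetimes instead of strings.
toy['Donation Received Date'] = pd.to_datetime(toy['Donation Received Date'])

# One groupby pass: each output column is (source column, aggregation).
# The output labels are hypothetical; reset_index() turns 'Donor ID' back into a column.
donor_summary = toy.groupby('Donor ID').agg(**{
    'Donation Count': ('Donation ID', 'count'),
    'Most Recent Donation': ('Donation Received Date', 'max'),
    'Min Donation': ('Donation Amount', 'min'),
    'Max Donation': ('Donation Amount', 'max'),
    'Mean Donation': ('Donation Amount', 'mean'),
    'Total Donation': ('Donation Amount', 'sum'),
}).reset_index()

print(donor_summary)
```

Because every statistic comes out of the same grouping, the result already has one row per donor, and no merges are required.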