# DONATION PREDICTOR

## PROBLEM STATEMENT

What is the problem that you’re solving? Who are you solving it for? 
* Reaching out to people and asking them to sign up as donors or to donate money is difficult. There are millions of potential donors, but most of them decline to donate to charities, so contacting every single person is inefficient in both cost and time. To lower expenses and save time, organizations need a way to focus their outreach on high-potential donors who are willing to give. We can use machine learning to predict which individuals are likely to respond or donate, and offer this model as a tool to organizations such as nonprofits, charities, or hospitals.

What are potential features, whether already built or that you’ll need to make? What is your target variable? What transformations will you need to perform on your input data?
* The dataset comes with about 50 features that may help. Relevant features include past donations, responses to surveys, donation amount, income, and donation frequency. We'll need a boolean feature column indicating whether someone donated in 1997 (the year of the dataset). This column would be derived from 'TARGET_D', which is the frequency of donations in 1997; the existing 'TARGET_B' column appears to already hold such a 0/1 indicator, since in the preview it is 1 exactly where 'TARGET_D' is filled in. The boolean column and 'TARGET_D' will be our target variables (whether someone donated in 1997 and how many times they donated in 1997). Several categorical columns will also need to be one-hot encoded, such as whether the donor lives in the suburbs, a city, or a rural area (likely the 'URBANICITY' column).
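
A minimal pandas sketch of these two transformations, run on a few toy rows modeled on the dataset preview. `DONATED_1997` and the `URBAN` prefix are hypothetical names chosen for illustration, and treating a missing `TARGET_D` as "did not donate" is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows mirroring a few columns of the real data (which has about 50 columns).
df = pd.DataFrame({
    "TARGET_D": [np.nan, 10.0, np.nan, np.nan],
    "URBANICITY": ["?", "R", "S", "U"],
    "LAST_GIFT_AMT": [15.0, 17.0, 19.0, 15.0],
})

# Boolean target (hypothetical name): 1 if TARGET_D is present, else 0.
# Assumption: a missing TARGET_D means the person did not donate in 1997.
df["DONATED_1997"] = df["TARGET_D"].notna().astype(int)

# One-hot encode the area-type codes; '?' (unknown) becomes its own indicator column.
df = pd.get_dummies(df, columns=["URBANICITY"], prefix="URBAN", dtype=int)

print(df)
```

`pd.get_dummies` drops the original `URBANICITY` column and adds one 0/1 column per observed code (`URBAN_?`, `URBAN_R`, `URBAN_S`, `URBAN_U` here).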

What modeling approach(es) will you use for your problem? Is this classification or regression?
How will you evaluate your model’s performance? This should be in terms of quantitative metrics, qualitative evaluation, and visualizations.
* Predicting 'TARGET_D' (how many times an individual will donate in the current year) is a regression problem, while predicting the boolean donated/did-not-donate column is a classification problem. Modeling approaches we can take include logistic regression (for the classification target), neural networks, regression trees, and random forests. Metrics we can use to measure regression performance are:
1. Mean Squared Error (MSE)
2. Root Mean Squared Error (RMSE)
3. Mean Absolute Error (MAE)
4. R² (Coefficient of Determination)
5. Adjusted R²

Some visualizations we can use to understand the data and our results are:
1. Histogram of donors' income demographics
2. Scatter plot of average donation amount versus donation frequency
3. Heatmap of feature correlations
4. Box-and-whisker plots of donation distributions
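
A sketch of these four plots with matplotlib, drawn on synthetic data. `INCOME`, `AVG_DONATION` and `DONATION_FREQUENCY` are hypothetical column names standing in for whichever columns the real dataset provides.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "INCOME": rng.normal(55_000, 15_000, 500),
    "AVG_DONATION": rng.gamma(2.0, 8.0, 500),
    "DONATION_FREQUENCY": rng.poisson(3, 500),
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# 1. Histogram of income
axes[0, 0].hist(df["INCOME"], bins=30)
axes[0, 0].set_title("Income distribution")

# 2. Scatter plot of average donation versus frequency
axes[0, 1].scatter(df["DONATION_FREQUENCY"], df["AVG_DONATION"], s=8, alpha=0.5)
axes[0, 1].set_xlabel("Donation frequency")
axes[0, 1].set_ylabel("Average donation")

# 3. Heatmap of pairwise feature correlations
corr = df.corr()
im = axes[1, 0].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[1, 0].set_xticks(range(len(corr.columns)))
axes[1, 0].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[1, 0].set_yticks(range(len(corr.columns)))
axes[1, 0].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[1, 0])

# 4. Box-and-whisker plot of donation amounts
axes[1, 1].boxplot(df["AVG_DONATION"].to_numpy())
axes[1, 1].set_title("Average donation")

fig.tight_layout()
```

In a notebook, drop the `matplotlib.use("Agg")` line to display the figure inline, or call `fig.savefig(...)` to write it to a file.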

# PROJECT TIMELINE

NOTE - Please also include a rough project timeline, dividing up the work among your group members and demonstrating how you will complete both the analysis and the presentation by May 10th. 

## Denny Liang

Complete by May 5th

* Data preprocessing
* Create a logistic regression model
* Create a histogram of donors' income demographics

## Mohammed Bhuiyan

Complete by May 5th

* Create a regression tree model
* Create a scatter plot of average donation amount versus frequency
* Measure performance using Mean Squared Error (MSE)
* Measure performance using Root Mean Squared Error (RMSE)

## Myriam Yumbla

Complete by May 5th

* Create a neural network model
* Handle hyperparameter tuning
* Create a heatmap of feature correlations

## Ravid Rahman

Complete by May 5th

* Create a random forest model
* Measure performance using Mean Absolute Error (MAE)
* Measure performance using R² (Coefficient of Determination)
* Measure performance using Adjusted R²

After May 5th, we will start working on the presentation and should have it completed before May 10th.

# Dataset 

```python
import pandas as pd

raw_train_data = pd.read_csv('Raw_Data_for_train_test.csv')
raw_train_data
```

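| Unnamed: 0 | TARGET_B | TARGET_D | CONTROL_NUMBER | MONTHS_SINCE_ORIGIN | DONOR_AGE | IN_HOUSE | URBANICITY | SES | CLUSTER_CODE | HOME_OWNER | ... | LIFETIME_GIFT_RANGE | LIFETIME_MAX_GIFT_AMT | LIFETIME_MIN_GIFT_AMT | LAST_GIFT_AMT | CARD_PROM_12 | NUMBER_PROM_12 | MONTHS_SINCE_LAST_GIFT | MONTHS_SINCE_FIRST_GIFT | FILE_AVG_GIFT | FILE_CARD_GIFT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | NaN | 5 | 101 | 87.0 | 0 | ? | ? | . | H | ... | 15.0 | 20.0 | 5.0 | 15.0 | 5 | 12 | 26 | 92 | 8.49 | 7 |
| 1 | 1 | 10.0 | 12 | 137 | 79.0 | 0 | R | 2 | 45 | H | ... | 20.0 | 25.0 | 5.0 | 17.0 | 7 | 21 | 7 | 122 | 14.72 | 12 |
| 2 | 0 | NaN | 37 | 113 | 75.0 | 0 | S | 1 | 11 | H | ... | 23.0 | 28.0 | 5.0 | 19.0 | 11 | 32 | 6 | 105 | 16.75 | 16 |
| 3 | 0 | NaN | 38 | 92 | NaN | 0 | U | 2 | 4 | H | ... | 14.0 | 17.0 | 3.0 | 15.0 | 11 | 33 | 6 | 92 | 11.76 | 12 |
| 4 | 0 | NaN | 41 | 101 | 74.0 | 0 | R | 2 | 49 | U | ... | 20.0 | 25.0 | 5.0 | 25.0 | 6 | 19 | 18 | 92 | 8.83 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19367 | 0 | NaN | 191687 | 89 | 66.0 | 1 | U | 1 | 3 | H | ... | 22.0 | 25.0 | 3.0 | 15.0 | 6 | 13 | 18 | 81 | 17.17 | 3 |
| 19368 | 0 | NaN | 191710 | 137 | 77.0 | 1 | C | 1 | 24 | H | ... | 9.0 | 10.0 | 1.0 | 10.0 | 6 | 13 | 21 | 130 | 7.81 | 13 |
| 19369 | 0 | NaN | 191746 | 29 | NaN | 1 | S | 1 | 11 | U | ... | 0.0 | 15.0 | 15.0 | 15.0 | 3 | 9 | 23 | 23 | 15.00 | 0 |
| 19370 | 0 | NaN | 191775 | 129 | 78.0 | 1 | ? | ? | . | U | ... | 20.0 | 25.0 | 5.0 | 25.0 | 7 | 24 | 8 | 129 | 18.33 | 11 |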
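
The preview shows `?` and `.` in `URBANICITY`, `SES` and `CLUSTER_CODE`, which look like placeholders for missing values, and an `Unnamed: 0` column that looks like a row index saved by an earlier `to_csv` call. Both readings are assumptions. If they hold, `pd.read_csv` can handle them at load time through its `na_values` and `index_col` parameters. The sketch below uses a few rows copied from the preview in place of the real CSV file.

```python
import io

import pandas as pd

# A few rows and columns copied from the preview, standing in for
# 'Raw_Data_for_train_test.csv'.
csv_text = """Unnamed: 0,TARGET_B,TARGET_D,CONTROL_NUMBER,DONOR_AGE,URBANICITY,SES,CLUSTER_CODE
0,0,,5,87.0,?,?,.
1,1,10.0,12,79.0,R,2,45
2,0,,37,75.0,S,1,11
3,0,,38,,U,2,4
"""

# Assumptions: 'Unnamed: 0' is a saved row index, and '?' / '.' mean "missing".
df = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values=["?", "."])

# Count missing values per column after the placeholders are converted to NaN.
print(df.isna().sum())

# Sanity check: TARGET_B should be 1 exactly where TARGET_D is present.
assert (df["TARGET_B"] == df["TARGET_D"].notna().astype(int)).all()
```

With the placeholders read as `NaN`, `SES` and `CLUSTER_CODE` load as numeric columns instead of strings, and missing values can be imputed or flagged consistently across all columns.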
