# Donor Analysis
We use data about donors and about their donations to create a picture of the donor-nonprofit relationship. The goal is to understand the effect of the experiences that donors have on their donation habits. 

# Data Gathering
In this first step, data about donors are collected. Important data include the appeals donors have received, donor demographics, and donation history. Good sources include a nonprofit's Customer Relationship Management (CRM) database, donation receipts, and records of appeals sent and responded to. Nonprofits may also turn to people-search sites or census data for additional demographic information. As much as possible, nonprofits should gather information at the level of individual donors, so that the results apply to individuals.

In our study with a US-based nonprofit, the nonprofit had a CRM database in a SQL server. We wrote SQL queries to gather data on donation history, volunteer experiences, and also the appeals households had responded to.

In [46]:
data <- read.csv("synthetic_data.csv")
head(data)

| | ID | Donations | Volunteering | Appeal1 | Appeal2 | Appeal3 |
|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 1 | 11137.51 | 3 | 0 | 0 | 0.0 |
| 2 | 2 | 64737.03 | 16 | 7 | 0 | 0.77716865 |
| 3 | 3 | 7140.81 | 2 | 1 | 0 | 0.796440907 |
| 4 | 4 | 8601.0 | 1 | 0 | 1 | 0.37429541 |
| 5 | 5 | 5699.45 | 2 | 2 | 0 | 1.091352021 |
| 6 | 6 | 15301.12 | 3 | 1 | 0 | 0.007265313 |


# Data Cleaning and Shaping
Once the data has been gathered, it can be cleaned and shaped. Data will often have missing or meaningless values, which are dealt with in the data cleaning step. Records with null values are often excluded from the study, or the null values are set to zero. This step needs careful consideration for the study to produce accurate results.
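
The two ways of handling null values mentioned above can be sketched in base R. This is a minimal illustration on a made-up table: the `toy` data frame and its values are hypothetical, and only the column names mirror the synthetic data used in this notebook.

```r
# Hypothetical toy table with a few missing values (not the study data)
toy <- data.frame(
  ID = 1:5,
  Donations = c(120.5, NA, 40, 0, 310),
  Volunteering = c(2, 1, NA, 0, 5)
)

# Count the missing values in each column before deciding what to do
colSums(is.na(toy))

# Option 1: exclude every record that has a missing value in any column
complete <- toy[complete.cases(toy), ]

# Option 2: treat a missing volunteering count as "none recorded" and set it to zero
zero_filled <- toy
zero_filled$Volunteering[is.na(zero_filled$Volunteering)] <- 0

# Compare how many records each approach keeps
nrow(complete)
nrow(zero_filled)
```

Which option is appropriate depends on what a missing value means in the source system. Setting a missing count to zero is only reasonable if "not recorded" really does mean "did not happen".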

In data shaping, the goal is to create a table which can be used by a modeling algorithm such as Linear Regression or Supervised Machine Learning. The general idea is to have a row for every individual donor, a column for donations, and another column for each variable being used to predict donations, such as various appeals or volunteer experiences or demographics.
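
As a rough sketch of this shaping step, the base R code below collapses transaction-level records into one row per donor. The `gifts` and `shifts` tables, their columns, and their values are hypothetical stand-ins for whatever a CRM export actually contains.

```r
# Hypothetical raw records: one row per gift and one row per volunteer shift
gifts <- data.frame(
  ID = c(1, 1, 2, 3, 3, 3),
  Amount = c(50, 25, 100, 10, 20, 30)
)
shifts <- data.frame(
  ID = c(1, 3, 3),
  Shifts = c(1, 1, 1)
)

# Collapse each source to one row per donor
donations <- aggregate(Amount ~ ID, data = gifts, FUN = sum)
names(donations)[2] <- "Donations"
volunteering <- aggregate(Shifts ~ ID, data = shifts, FUN = sum)
names(volunteering)[2] <- "Volunteering"

# Join on ID, keeping donors who never volunteered (all.x = TRUE)
shaped <- merge(donations, volunteering, by = "ID", all.x = TRUE)

# Donors with no volunteer records get a count of zero
shaped$Volunteering[is.na(shaped$Volunteering)] <- 0
shaped
```

The same pattern extends to appeals or demographics: aggregate each source to one row per `ID`, then merge the results into a single modeling table.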

Some useful tools for this step are R, Tableau Data Prep, and several Python libraries such as NumPy.

In [47]:
# Here Appeal3 has more digits than we need, so we use R's built-in round() to round it to two decimal places
data$Appeal3 <- round(data$Appeal3, digits = 2)
head(data)

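| | ID | Donations | Volunteering | Appeal1 | Appeal2 | Appeal3 |
|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 1 | 11137.51 | 3 | 0 | 0 | 0.0 |
| 2 | 2 | 64737.03 | 16 | 7 | 0 | 0.78 |
| 3 | 3 | 7140.81 | 2 | 1 | 0 | 0.8 |
| 4 | 4 | 8601.0 | 1 | 0 | 1 | 0.37 |
| 5 | 5 | 5699.45 | 2 | 2 | 0 | 1.09 |
| 6 | 6 | 15301.12 | 3 | 1 | 0 | 0.01 |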


In [48]:
# We also want to restrict our study to donors
donors <- data[data$Donations > 0, ]
nrow(data)
nrow(donors)

# Modeling
Once the data is in the proper shape and has no null values, it can be used for modeling. The goal of modeling is to get a picture of how various factors affect donations. The simplest modeling tool is Linear Regression. Nonprofits may also use supervised machine learning tools, which are demonstrated in another notebook.

In [49]:
model <- lm(Donations ~ Volunteering + Appeal1 + Appeal2 + Appeal3, data)
summary(model)


Call:
lm(formula = Donations ~ Volunteering + Appeal1 + Appeal2 + Appeal3, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-11564.4  -1914.7    121.2   1848.3  12145.8 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   305.467     85.108   3.589 0.000335 ***
Volunteering 3372.980     22.995 146.683  < 2e-16 ***
Appeal1      1409.754     52.862  26.669  < 2e-16 ***
Appeal2      -603.933     96.248  -6.275  3.8e-10 ***
Appeal3         5.755     51.889   0.111 0.911697    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3236 on 4995 degrees of freedom
Multiple R-squared:  0.9528,	Adjusted R-squared:  0.9528 
F-statistic: 2.523e+04 on 4 and 4995 DF,  p-value: < 2.2e-16


# Interpretation
In the interpretation step we consider what the model results mean. In this example, Volunteering, Appeal1, and Appeal2 are all statistically significant predictors of donations, as shown by the `***` symbol to the right of their p-values (p < 0.001). Appeal3, on the other hand, is not statistically significant (p ≈ 0.91), so it should not be considered as a predictor variable.

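The Estimate column shows that volunteering has a strong positive association with donations: each additional unit of Volunteering corresponds to an estimated increase of about 3,373 in Donations, holding the other variables fixed. Appeal1 is also positively associated with donations (about 1,410 per unit), while Appeal2 is negatively associated (about -604 per unit). Given these associations, the nonprofit may decide to discontinue Appeal3, which shows no detectable relationship with donations, and focus their efforts on encouraging their donors to volunteer.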
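
To make this kind of reading easier to repeat, the estimates, confidence intervals, and p-values can be pulled out of a fitted model into one table. The sketch below is illustrative only. It fits a model to simulated data, because `synthetic_data.csv` is not reproduced here, so the `sim` data frame, its assumed effect sizes, and the 0.05 threshold are assumptions rather than study results.

```r
set.seed(1)
n <- 500

# Simulated predictors (hypothetical; not the study data)
sim <- data.frame(
  Volunteering = rpois(n, 3),
  Appeal1 = rpois(n, 1),
  Appeal3 = runif(n)
)

# Assumed data-generating process: Appeal3 has no effect on Donations
sim$Donations <- 300 + 3000 * sim$Volunteering + 1400 * sim$Appeal1 +
  rnorm(n, sd = 3000)

fit <- lm(Donations ~ Volunteering + Appeal1 + Appeal3, data = sim)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# 95% confidence intervals for each coefficient
ci <- confint(fit, level = 0.95)

# One tidy table: estimate, interval, p-value, and a significance flag
results <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  lower = ci[, 1],
  upper = ci[, 2],
  p_value = coefs[, "Pr(>|t|)"],
  row.names = NULL
)
results$significant <- results$p_value < 0.05
results
```

A confidence interval that spans zero tells the same story as a large p-value: the data do not distinguish that variable's effect from no effect at all.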