# Prescriptive Models and Data Analytics Problem Set #1

## Table 1

### Question 1

**Load the “charitable giving.csv” dataset and run a regression to assess whether the average “Number of months since last donation” is significantly different between treatment and control. Interpret the relevant regression coefficients and compare the regression-based comparison to the group-specific means reported in Table 1 of the paper.**

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
```

```python
df = pd.read_csv('charitable_giving.csv')

df.head(5)
```

| Unnamed: 0 | donation_amount | donation_dummy | control | treatment | match_ratio | ratio1 | ratio2 | ratio3 | red_state_dummy | months_since_last_donation | highest_previous_donation | prior_donations_num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1 | 0.0 | 0.0 | 1.0 | 19.0 | 500.0 | 32.0 |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 1.0 | 29.0 | 300.0 | 22.0 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 1.0 | 3.0 | 500.0 | 22.0 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 | 0 | 0.0 | 1.0 | 0.0 | 4.0 | 250.0 | 29.0 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0 | 1.0 | 0.0 | 0.0 | 8.0 | 50.0 | 17.0 |
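
The first rows suggest a coding scheme that every later regression relies on: `control` and `treatment` are complementary, `ratio1`, `ratio2` and `ratio3` split the treatment group by match ratio, and `match_ratio` is 0 for controls and 1, 2 or 3 otherwise. These are assumptions inferred from the five rows above, so a quick check over all rows is worthwhile. A minimal NumPy sketch (the helper `check_dummy_coding` is hypothetical):

```python
import numpy as np

def check_dummy_coding(cols):
    """Consistency checks for the arm and match-ratio dummies.

    `cols` maps column names to 1-D arrays. Rows with a missing value in any
    checked column are ignored.
    """
    names = ["control", "treatment", "match_ratio", "ratio1", "ratio2", "ratio3"]
    data = np.column_stack([np.asarray(cols[n], dtype=float) for n in names])
    data = data[~np.isnan(data).any(axis=1)]  # drop rows containing any NaN
    control, treatment, match_ratio, r1, r2, r3 = data.T
    return {
        # every respondent is in exactly one arm
        "arms_exclusive": bool(np.all(control + treatment == 1)),
        # treated respondents have exactly one ratio dummy, controls have none
        "ratios_partition_treatment": bool(np.all(r1 + r2 + r3 == treatment)),
        # match_ratio agrees with the dummies (0 for the control group)
        "match_ratio_consistent": bool(np.all(match_ratio == r1 + 2 * r2 + 3 * r3)),
    }

# The five rows printed by df.head(5)
head_rows = {
    "control":     [0, 1, 1, 0, 0],
    "treatment":   [1, 0, 0, 1, 1],
    "match_ratio": [1, 0, 0, 3, 2],
    "ratio1":      [1, 0, 0, 0, 0],
    "ratio2":      [0, 0, 0, 0, 1],
    "ratio3":      [0, 0, 0, 1, 0],
}
print(check_dummy_coding(head_rows))
```

On the full data, `check_dummy_coding({c: df[c].to_numpy() for c in df.columns})` should return `True` for every check if the coding matches these assumptions.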


```python
number_month_reg = smf.ols(formula = 'months_since_last_donation ~ treatment', data = df)
result = number_month_reg.fit()
print(result.summary())
```

```
                                OLS Regression Results                                
Dep. Variable:     months_since_last_donation   R-squared:                       0.000
Model:                                    OLS   Adj. R-squared:                 -0.000
Method:                         Least Squares   F-statistic:                   0.01428
Date:                        Fri, 26 Jan 2024   Prob (F-statistic):              0.905
Time:                                23:54:13   Log-Likelihood:            -1.9585e+05
No. Observations:                       50082   AIC:                         3.917e+05
Df Residuals:                           50080   BIC:                         3.917e+05
Df Model:                                   1                                         
Covariance Type:                    nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------
```

**Interpretation**: The intercept is the average number of months since last donation in the control group, 12.9981. The treatment coefficient is the difference between the treatment-group and control-group averages: treated individuals last donated 0.0137 months longer ago on average, a difference that is not statistically significant.


**Comparison with the group-specific means in the paper**: Table 1 reports an average of 12.998 months since last donation for the control group, identical to the regression intercept (12.998). For the treatment group, Table 1 reports 13.012, which matches the regression-implied mean of 13.0118 (= 12.9981 + 0.0137) up to rounding. This agreement is expected: in a regression on an intercept and a single 0/1 dummy, OLS reproduces the two group means exactly.
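
To see why the regression reproduces Table 1, the sketch below fits OLS on an intercept and a 0/1 dummy with NumPy and compares the coefficients with the two group means. It uses synthetic data, not the charitable-giving file, and `ols_binary` is a hypothetical helper.

```python
import numpy as np

def ols_binary(y, d):
    """OLS of y on an intercept and a 0/1 dummy d; returns (intercept, slope)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones_like(d), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Synthetic stand-in for months_since_last_donation and the treatment dummy
rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=1_000)
y = rng.poisson(13, size=1_000).astype(float)

b0, b1 = ols_binary(y, d)
print(b0, y[d == 0].mean())       # intercept equals the control mean
print(b0 + b1, y[d == 1].mean())  # intercept + slope equals the treatment mean
```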

### Question 2

**Is the difference in “Number of month since last donation” between treatment and control statistically significant (at the usual 95% confidence level)? Is this the result you expected?**

The treatment coefficient, which measures the difference between the treatment and control groups, is 0.0137. It is not statistically significant at the 95% confidence level, because its p-value (0.905) is far above 0.05.

Yes, this result is expected. The number of months since last donation is measured before the experiment, so the matching-grant treatment cannot have affected it. Because individuals were randomly assigned to treatment and control, pre-treatment characteristics such as this one should not differ systematically between the groups, and an insignificant difference is exactly what a successful randomization predicts.
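
The p-value of 0.905 can be recovered from the F-statistic in the regression output. With a single regressor, F equals the squared t-statistic of the treatment coefficient, and with about 50,000 residual degrees of freedom the t distribution is practically standard normal. The sketch below relies on that normal approximation (an assumption that becomes less accurate in small samples); the helper name is hypothetical.

```python
import math

def p_from_f_one_regressor(f_stat):
    """Two-sided p-value of the single slope, from the regression F-statistic.

    With one regressor F = t**2; for large residual degrees of freedom the
    t distribution is close to standard normal, so a normal approximation is used.
    """
    t = math.sqrt(f_stat)
    return math.erfc(t / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))

print(p_from_f_one_regressor(0.01428))  # months_since_last_donation ~ treatment
print(p_from_f_one_regressor(9.618))    # donation_dummy ~ treatment
```

Both values should agree with the reported `Prob (F-statistic)` entries (0.905 and 0.00193) to the printed precision.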

### Question 3

**More generally, describe the take-away from Table 1 in the paper.**

The treatment and control groups are similar in key characteristics measured before the experiment began (member activity, census demographics, and state-level activity of the organization). This balance is important because it supports the conclusion that differences in donating behavior observed during the experiment are due to the treatment (matching grants) rather than to pre-existing differences between the groups.
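
Table 1 is essentially a balance check: for each pre-treatment characteristic, compare the treatment and control means. A minimal sketch of such a table computes the difference in means and a Welch (unequal-variance) t-statistic per covariate with NumPy. The helper name and the synthetic covariates are hypothetical, and Welch's statistic is one common choice, not necessarily the one used in the paper.

```python
import numpy as np

def balance_table(covariates, treatment):
    """Map each covariate name to (treatment mean - control mean, Welch t-stat).

    covariates: dict of name -> 1-D array; treatment: 0/1 array.
    Missing values (NaN) are dropped covariate by covariate.
    """
    treatment = np.asarray(treatment)
    table = {}
    for name, values in covariates.items():
        x = np.asarray(values, dtype=float)
        keep = ~np.isnan(x)
        x_t = x[keep & (treatment == 1)]
        x_c = x[keep & (treatment == 0)]
        diff = x_t.mean() - x_c.mean()
        se = np.sqrt(x_t.var(ddof=1) / x_t.size + x_c.var(ddof=1) / x_c.size)
        table[name] = (diff, diff / se)
    return table

# Synthetic example: treatment drawn independently of two made-up covariates
rng = np.random.default_rng(4)
treat = rng.integers(0, 2, size=5_000)
fake = {
    "months_since_last_donation": rng.integers(1, 37, size=5_000),
    "prior_donations_num": rng.poisson(8, size=5_000),
}
for name, (diff, t_stat) in balance_table(fake, treat).items():
    print(f"{name}: diff = {diff:+.3f}, t = {t_stat:+.2f}")
```

On the real data one could pass `{c: df[c].to_numpy() for c in ['months_since_last_donation', 'highest_previous_donation', 'prior_donations_num']}` together with `df['treatment'].to_numpy()`; under successful randomization most |t| values should be small.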

## Response rate regressions

### Question 1

**Run a linear regression of response rate (the donation dummy) on the treatment dummy (and an intercept). Interpret both coefficients and compare them to the results presented in the first row of Table 2a.**

```python
donation_dummy_reg = smf.ols(formula = 'donation_dummy ~ treatment', data = df)
result = donation_dummy_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:         donation_dummy   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     9.618
Date:                Fri, 26 Jan 2024   Prob (F-statistic):            0.00193
Time:                        23:54:13   Log-Likelihood:                 26630.
No. Observations:               50083   AIC:                        -5.326e+04
Df Residuals:                   50081   BIC:                        -5.324e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0179      0.001     16.225      0.0
```

**Interpretation**: This is a linear probability model (a 0/1 dependent variable), so the coefficients are probabilities and differences in probabilities. The intercept is the donation probability in the control group, 1.79%. The treatment coefficient is the increase in donation probability in the treatment group relative to the control group, 0.42 percentage points. The treatment effect is positive and highly significant (p = 0.002).

Table 2a reports a response rate of 1.8% for the control group and a slightly higher 2.2% for the treatment group. This matches the regression results: the implied response rate is 1.79% for the control group and 2.21% (= 1.79% + 0.42 percentage points) for the treatment group.
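
Because the outcome is a 0/1 variable, the coefficients are in probability units, so the treatment effect is naturally stated in percentage points; expressing it relative to the control rate gives a different, larger-looking number. The arithmetic below uses the rounded coefficients quoted in the interpretation (0.0179 and 0.0042), so the relative figure is only approximate.

```python
# Rounded coefficients quoted in the interpretation above
control_rate = 0.0179      # intercept: control-group response rate
treatment_effect = 0.0042  # treatment coefficient

treatment_rate = control_rate + treatment_effect
pp_change = 100 * treatment_effect                       # percentage points
relative_change = 100 * treatment_effect / control_rate  # percent of the control rate

print(f"treatment-group rate: {100 * treatment_rate:.2f}%")
print(f"absolute change: {pp_change:.2f} percentage points")
print(f"relative change: {relative_change:.1f}% of the control rate")
```

A 0.42 percentage-point increase therefore corresponds to roughly a 23% increase over the control group's response rate.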

### Question 2

**Run a regression on three dummies for match ratio treatment (1:1, 2:1, and 3:1 and an intercept). Interpret all four regression coefficients.**

```python
donation_dummy_3_reg = smf.ols(formula = 'donation_dummy ~ ratio1 + ratio2 + ratio3', data = df)
result = donation_dummy_3_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:         donation_dummy   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     3.665
Date:                Fri, 26 Jan 2024   Prob (F-statistic):             0.0118
Time:                        23:54:13   Log-Likelihood:                 26630.
No. Observations:               50083   AIC:                        -5.325e+04
Df Residuals:                   50079   BIC:                        -5.322e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0179      0.001     16.225      0.0
```

**Interpretation**: the intercept represents the donation probability in the control group, which is still 1.79%. 

The ratio1 coefficient is the increase in donation probability in the 1:1 match-ratio group relative to the control group: 0.29 percentage points, which is not statistically significant at the 95% confidence level.

The ratio2 coefficient is the corresponding increase for the 2:1 match-ratio group: 0.48 percentage points, statistically significant at the 95% confidence level.

The ratio3 coefficient is the corresponding increase for the 3:1 match-ratio group: 0.49 percentage points, statistically significant at the 95% confidence level.
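
Reading each coefficient as a difference from the control group works because the three dummies are mutually exclusive: with an intercept and one dummy per treated arm, OLS sets the intercept to the control-group mean and each dummy coefficient to that arm's mean minus the control mean. A NumPy check on synthetic data (the arm labels and the outcome are made up):

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 4_000
arm = rng.integers(0, 4, size=n)          # 0 = control; 1, 2, 3 = 1:1, 2:1, 3:1 arms
y = (rng.random(n) < 0.02).astype(float)  # synthetic 0/1 donation outcome

# Design: intercept plus one dummy per treated arm (control is the omitted group)
X = np.column_stack([np.ones(n)] + [(arm == k).astype(float) for k in (1, 2, 3)])
coef = ols(y, X)

control_mean = y[arm == 0].mean()
group_gaps = [y[arm == k].mean() - control_mean for k in (1, 2, 3)]
print(coef[0], control_mean)  # intercept vs. control mean
print(coef[1:], group_gaps)   # dummy coefficients vs. arm-minus-control gaps
```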

### Question 3

**Calculate the response rate difference between the 1:1 and 2:1 match ratios.**

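```python
# Difference between the rounded ratio2 and ratio1 coefficients
0.0048 - 0.0029
```

```
0.0018999999999999998
```

The response rate in the 2:1 match-ratio group is 0.19 percentage points higher than in the 1:1 group (the trailing digits above are floating-point rounding of 0.0019).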
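
The 0.19 percentage-point figure is computed from rounded coefficients and comes without a standard error. A sketch of estimating the contrast ratio2 minus ratio1 directly from an OLS fit, with its classical (homoskedastic, like statsmodels' default `nonrobust`) standard error and a normal-approximation p-value; the helper `ols_contrast` and the synthetic arms are hypothetical.

```python
import math
import numpy as np

def ols_contrast(y, X, c):
    """Estimate c @ beta from an OLS fit, with its classical standard error
    and a two-sided p-value based on the normal approximation (large n)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)  # residual variance
    est = float(c @ beta)
    se = math.sqrt(sigma2 * float(c @ XtX_inv @ c))
    p = math.erfc(abs(est / se) / math.sqrt(2))
    return est, se, p

rng = np.random.default_rng(5)
n = 4_000
arm = rng.integers(0, 4, size=n)          # 0 = control; 1, 2, 3 = 1:1, 2:1, 3:1
y = (rng.random(n) < 0.02).astype(float)  # synthetic 0/1 outcome
X = np.column_stack([np.ones(n)] + [(arm == k).astype(float) for k in (1, 2, 3)])

# Columns are [intercept, ratio1, ratio2, ratio3]; c picks out ratio2 - ratio1
est, se, p = ols_contrast(y, X, np.array([0.0, -1.0, 1.0, 0.0]))
print(f"ratio2 - ratio1 = {est:.4f} (se {se:.4f}, p = {p:.3f})")
```

With the fitted statsmodels result from the three-dummy regression, `result.t_test('ratio2 - ratio1 = 0')` should run the same test using the exact t distribution.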

### Question 4

**Based on the regressions you just ran and more generally the results in Table 2a, what do you conclude regarding the effectiveness of using matched donations?**

Based on these regressions and the results in Table 2a, offering a matching grant raises the probability of donating: all three match-ratio coefficients are positive relative to the control group, although the 1:1 coefficient is not statistically significant on its own. However, the differences between the match ratios are small (0.29, 0.48, and 0.49 percentage points above control), so raising the match ratio above 1:1 adds little to the response rate. This suggests that the main lever is offering a match at all, rather than the size of the match.

## Response rates in red/blue states

### Question 1

**Repeat the regression of response rate on treatment and an intercept (do not include separate match ratio dummies). But this time, base the regression only on respondents in blue states or red states. I.e. run two regressions, one on each of the two sub-samples of data. Interpret the coefficients in both regressions. Is the treatment more effective in red or blue states?**

```python
donation_dummy_red_reg = smf.ols(formula = 'donation_dummy ~ treatment', data = df[df['red_state_dummy'] == 1])
result = donation_dummy_red_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:         donation_dummy   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     17.24
Date:                Fri, 26 Jan 2024   Prob (F-statistic):           3.31e-05
Time:                        23:54:13   Log-Likelihood:                 10839.
No. Observations:               20242   AIC:                        -2.167e+04
Df Residuals:                   20240   BIC:                        -2.166e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0146      0.002      8.398      0.0
```

**Interpretation**: the intercept is the donation probability of control-group respondents in red states, 1.46%.

The treatment coefficient is the increase in donation probability for treated red-state respondents relative to control red-state respondents: 0.88 percentage points, statistically significant at the 95% confidence level (p < 0.001).

```python
donation_dummy_blue_reg = smf.ols(formula = 'donation_dummy ~ treatment', data = df[df['red_state_dummy'] == 0])
result = donation_dummy_blue_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:         donation_dummy   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.3567
Date:                Fri, 26 Jan 2024   Prob (F-statistic):              0.550
Time:                        23:54:13   Log-Likelihood:                 15783.
No. Observations:               29806   AIC:                        -3.156e+04
Df Residuals:                   29804   BIC:                        -3.155e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0200      0.001     14.085      0.0
```

**Interpretation**: the intercept is the donation probability of control-group respondents in blue states, 2.00%.

The treatment coefficient is the increase in donation probability for treated blue-state respondents relative to control blue-state respondents: 0.1 percentage points, not statistically significant at the 95% confidence level (p = 0.550).

Comparing the treatment coefficients, 0.88 percentage points in red states versus 0.1 percentage points in blue states, the treatment is more effective in red states.
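
Rather than comparing two separate coefficients informally, the difference in treatment effects can be estimated in one regression with an interaction term, which also yields a standard error for the difference. In the saturated model y ~ 1 + treat + red + treat×red, the interaction coefficient equals the red-state effect minus the blue-state effect exactly, and the `treat` coefficient equals the blue-state effect. The sketch checks this with NumPy on synthetic data; all variable names are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 6_000
treat = rng.integers(0, 2, size=n).astype(float)
red = rng.integers(0, 2, size=n).astype(float)  # hypothetical red-state indicator
y = (rng.random(n) < 0.015 + 0.008 * treat * red).astype(float)  # synthetic outcome

# Pooled model: intercept, treat, red, treat x red
b = ols(y, np.column_stack([np.ones(n), treat, red, treat * red]))

def subsample_effect(mask):
    """Treatment coefficient from y ~ 1 + treat on one sub-sample."""
    X = np.column_stack([np.ones(mask.sum()), treat[mask]])
    return ols(y[mask], X)[1]

effect_red = subsample_effect(red == 1)
effect_blue = subsample_effect(red == 0)
print(b[3], effect_red - effect_blue)  # interaction = difference in effects
print(b[1], effect_blue)               # main effect = blue-state effect
```

On the real data, `smf.ols('donation_dummy ~ treatment * red_state_dummy', data=df).fit()` should report this difference as the `treatment:red_state_dummy` coefficient, together with its standard error.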

### Question 2

**States are of course not randomly assigned. Does the treatment coefficient have a causal interpretation in each of the two regressions? Does the difference in the treatment effect between states have a causal interpretation?**

Within each regression, the treatment coefficient does have a causal interpretation. Treatment was randomly assigned to all individuals, and a respondent's state is determined before treatment, so within the red-state sub-sample (and within the blue-state sub-sample) treated and control respondents remain comparable. Each coefficient therefore estimates the causal effect of the matching-grant offer for respondents in that group of states.

Statistical significance does not change this. The red-state estimate (p = 3.31e-05) is a precisely estimated causal effect; the blue-state estimate (p = 0.550) is also a causal estimate, but it is too imprecise to distinguish from zero.

The difference in treatment effects between states does not have a causal interpretation as the effect of living in a red state. States are not randomly assigned, so red-state and blue-state respondents may differ in many unobserved ways (for example, income or prior giving habits) that also shape how they respond to a matching offer. The difference tells us where the treatment works better, not why.

## Response rates and donation amount

### Question 1

**Run a regression of dollars given on a treatment dummy and an intercept. Interpret the regression coefficients. Does the treatment coefficient have a causal interpretation?**

```python
dollar_reg = smf.ols(formula = 'donation_amount ~ treatment', data = df)
result = dollar_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:        donation_amount   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     3.461
Date:                Fri, 26 Jan 2024   Prob (F-statistic):             0.0628
Time:                        23:54:13   Log-Likelihood:            -1.7946e+05
No. Observations:               50083   AIC:                         3.589e+05
Df Residuals:                   50081   BIC:                         3.589e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.8133      0.067     12.063      0.0
```

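**Interpretation**: the intercept is the average amount given in the control group, 0.8133 dollars per person (an average that includes the many non-donors who gave nothing).

The treatment coefficient is the increase in the average amount given in the treatment group relative to the control group, 0.1536 dollars per person. It is not statistically significant at the 95% confidence level (p = 0.063), although it is significant at the 90% level.

The treatment coefficient does have a causal interpretation. Because treatment was randomly assigned, the difference in average giving between the two groups estimates the causal effect of the matching-grant offer on dollars given. Statistical insignificance means that this effect is imprecisely estimated, not that it lacks a causal interpretation.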
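
Since a respondent who does not donate gives nothing, average giving factors into the response rate times the average gift among donors: E[amount] = P(donate) × E[amount | donate]. The check below assumes that `donation_amount` is zero exactly when `donation_dummy` is zero, and it combines the rounded control-group intercepts from this regression, the response-rate regression, and the donor-only regression in the next question.

```python
# Control-group intercepts as printed in the regression outputs (rounded)
mean_all = 0.8133       # donation_amount ~ treatment, all respondents
mean_donors = 45.5403   # donation_amount ~ treatment, donors only
response_rate = 0.0179  # donation_dummy ~ treatment

implied_rate = mean_all / mean_donors   # P(donate) implied by the decomposition
implied_mean = response_rate * mean_donors

print(round(implied_rate, 5))  # close to the 0.0179 response rate
print(round(implied_mean, 4))  # close to the 0.8133 intercept
```

The small gaps come from rounding the printed coefficients; the decomposition shows that the unconditional effect mixes a change in who donates with a change in how much donors give.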

### Question 2

**Next, regress dollars given on a treatment dummy and an intercept, but base the regression only on respondents that made a donation (i.e. donation dummy is equal to 1). This regression allows you to analyze how much respondents donate conditional on donating some positive amount. Interpret the regression coefficients. Does the treatment coefficient have a causal interpretation?**

```python
dollar_reg = smf.ols(formula = 'donation_amount ~ treatment', data = df[df['donation_dummy'] == 1])
result = dollar_reg.fit()
print(result.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:        donation_amount   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                    0.3374
Date:                Fri, 26 Jan 2024   Prob (F-statistic):              0.561
Time:                        23:54:13   Log-Likelihood:                -5326.8
No. Observations:                1034   AIC:                         1.066e+04
Df Residuals:                    1032   BIC:                         1.067e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     45.5403      2.423     18.792      0.0
```

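**Interpretation**: the intercept is the average amount given by control-group donors, 45.5403 dollars, conditional on donating a positive amount.

The treatment coefficient is the difference in the average amount given by treatment-group donors relative to control-group donors: a decrease of 1.6684 dollars, not statistically significant at the 95% confidence level (p = 0.561).

The treatment coefficient does not have a causal interpretation. The regression conditions on donating, which is itself an outcome of the treatment: the matching offer changes who donates, so donors in the treatment group are not comparable to donors in the control group. The coefficient mixes any effect on gift size with a change in the composition of donors (for example, marginal donors drawn in by the match may give smaller amounts). This selection problem would remain even if the coefficient were statistically significant.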
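
The selection problem can be illustrated with a small simulation under made-up assumptions: each person has a fixed gift size if they give (the hypothetical `generosity`), the treatment never changes anyone's gift size, and the match only lowers the threshold for giving. Even though the treatment raises average giving across all respondents, the donor-only comparison comes out negative, purely because the treatment adds marginal, smaller donors. This illustrates the mechanism; it is not a model fitted to the charitable-giving data.

```python
import numpy as np

def simulate_selection(n=200_000, seed=0):
    """Return (all-respondent difference, donor-only difference) in mean giving."""
    rng = np.random.default_rng(seed)
    treat = rng.integers(0, 2, size=n)
    # Hypothetical gift size if a person gives; same distribution in both arms
    generosity = rng.lognormal(mean=3.0, sigma=1.0, size=n)
    # Assumption: the match lowers the bar for giving from 100 to 80
    threshold = np.where(treat == 1, 80.0, 100.0)
    donated = generosity > threshold
    amount = np.where(donated, generosity, 0.0)  # gift size itself is unaffected

    uncond = amount[treat == 1].mean() - amount[treat == 0].mean()
    cond = (amount[(treat == 1) & donated].mean()
            - amount[(treat == 0) & donated].mean())
    return uncond, cond

uncond, cond = simulate_selection()
print(f"all respondents: {uncond:+.2f}; donors only: {cond:+.2f}")
```

The all-respondent comparison is a valid causal estimate because it compares randomized groups; the donor-only comparison is negative here even though, by construction, no one's gift shrinks.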