# Business Experimentation and Causal Methods

## Assignment 2: Rocket Fuel

Your Name Here

Due Date: 12:30pm, February 6, 2024

__Instructions:__

Please read the Rocket Fuel case from the HBS Case Pack, then answer the questions in this assignment. You can ignore the questions on the HBS website.


__Important Tips:__

- Remember to write your name in the above markdown cell.

- Remember to write out your answers in words; don’t just output Python statistics.

- Before you submit the notebook, please make sure that the text is readable and does not spill over the right side of the screen. To prevent this from happening, write your verbal answers in markdown cells.
  
- The definitions for the columns in the data are in the case! Please read them carefully.


In [11]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# import modules and functions

import numpy as np
import pandas as pd
import seaborn as sns
sns.set(rc={'figure.figsize':(15, 8)})

from statsmodels.stats.weightstats import ttest_ind # for t-test

# read data

ads_data = pd.read_csv('rocketfuel_data.csv')

ads_data

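| Unnamed: 0 | user_id | test | converted | tot_impr | mode_impr_day | mode_impr_hour |
|---|---|---|---|---|---|---|
| 0 | 1069124 | 1 | 0 | 130 | 1 | 20 |
| 1 | 1119715 | 1 | 0 | 93 | 2 | 22 |
| 2 | 1144181 | 1 | 0 | 21 | 2 | 18 |
| 3 | 1435133 | 1 | 0 | 355 | 2 | 10 |
| 4 | 1015700 | 1 | 0 | 276 | 5 | 14 |
| ... | ... | ... | ... | ... | ... | ... |
| 588043 | 1496403 | 1 | 0 | 24 | 2 | 19 |
| 588044 | 1496404 | 1 | 0 | 199 | 6 | 19 |
| 588045 | 1496405 | 1 | 0 | 211 | 6 | 15 |
| 588046 | 1496406 | 1 | 0 | 98 | 5 | 19 |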


### 1. ATE and statistical significance.
#### 1.a What is the estimated ATE ($\widehat{\text{ATE}}$) of the ads on purchases (conversions)?
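
As a reminder of the mechanics: in a randomized experiment, the estimated ATE is the difference in mean outcomes between the treated and control groups. The sketch below shows this on a small made-up DataFrame. The `toy` frame and its values are hypothetical and are not the case data; only the column names `test` and `converted` mirror the data set.

```python
import pandas as pd

# Hypothetical toy data (NOT the case data): test = 1 is treated, 0 is control
toy = pd.DataFrame({
    "test":      [1, 1, 1, 1, 0, 0, 0, 0],
    "converted": [1, 0, 1, 0, 0, 0, 1, 0],
})

# Mean outcome in each arm; the estimated ATE is the difference between them
mean_by_arm = toy.groupby("test")["converted"].mean()
ate_hat = mean_by_arm.loc[1] - mean_by_arm.loc[0]
print(f"Estimated ATE on the toy data: {ate_hat}")
```

The same pattern should carry over to `ads_data`, but you still need to interpret the number in words.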


In [2]:
# your code here

#### 1.b Did the campaign cause more purchases? Is this difference statistically significant? 
Hint: Use the `ttest_ind` function imported above from `statsmodels`. For example, the code below conducts a t-test on the number of impressions.

In [12]:
tstat, pvalue, df = ttest_ind(ads_data.loc[ads_data['test'] == 1, 'tot_impr'], 
                              ads_data.loc[ads_data['test'] == 0, 'tot_impr'],
                              alternative = 'two-sided', usevar = 'pooled', value = 0)

print(f"t-score (t): {tstat}")
print(f"P-value (p): {pvalue}")

t-score (t): 0.2127032030589101
P-value (p): 0.8315585419840091


Modify the function call above to get the right answer. Write your answer in the code cell below.

In [4]:
# your code here

### 2. Was the campaign profitable?
#### 2.a How much more profit did TaskaBella make by running the campaign (excluding advertising costs)?
Hint: the profit per conversion is given on page 2 of the case.

#### 2.b What was the cost of the campaign (including the control group)?  
Hint: The cost per thousand impressions (CPM) is $9.

#### 2.c Calculate the ROI of the campaign (including the control group). Was the campaign profitable?  
The ROI is calculated as
$$\text{ROI} = \frac{\text{Effect on Profits per Person in Campaign} - \text{Cost of Ads per Person in Campaign}}{\text{Cost of Ads per Person in Campaign}}$$
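
To make the arithmetic concrete, here is a minimal sketch of the ROI formula using made-up inputs. All numbers below (`n_people`, `total_impressions`, `profit_effect_per_person`) are hypothetical placeholders, not values from the case; only the \$9 CPM comes from the hint above.

```python
# Hypothetical inputs, chosen only to illustrate the formula
n_people = 1_000                    # people in the campaign (assumed)
total_impressions = 25_000          # impressions served to them (assumed)
cpm = 9.0                           # cost per thousand impressions, from the hint
profit_effect_per_person = 0.45     # effect on profit per person, in dollars (assumed)

# Cost of ads: impressions are billed per thousand
total_ad_cost = total_impressions / 1000 * cpm
ad_cost_per_person = total_ad_cost / n_people

# ROI = (effect on profit per person - ad cost per person) / ad cost per person
roi = (profit_effect_per_person - ad_cost_per_person) / ad_cost_per_person
print(f"Ad cost per person: ${ad_cost_per_person:.3f}")
print(f"ROI: {roi:.2f}")
```

An ROI above zero means the profit effect more than covers the advertising cost; an ROI below zero means the campaign lost money.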

#### 2.d What was the opportunity cost of including a control group? That is, how much more could TaskaBella have made by not having a control group at all?

### 3. Did the number of impressions seen by each user influence the effectiveness of advertising?

#### 3.a Plot the conversion rate by treatment group and by the number of impressions seen by users.

In [13]:
# Useful code for creating bins (modify as you like)
bins = pd.IntervalIndex.from_tuples([(0,100), (100,500), (500, 10000)])

ads_data['group_tot_impr'] = pd.cut(ads_data['tot_impr'], bins).astype(str)
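
One possible way to get from the bins to something you can plot is to compute the conversion rate for each combination of bin and treatment group. The sketch below does this on a hypothetical toy DataFrame (the `toy` values and the `conv_rate` column name are assumptions for illustration, not the case data); the resulting table can then be passed to a plotting function of your choice.

```python
import pandas as pd

# Hypothetical toy data (NOT the case data)
toy = pd.DataFrame({
    "test":      [1, 1, 0, 0, 1, 0],
    "tot_impr":  [10, 150, 20, 300, 600, 50],
    "converted": [0, 1, 0, 0, 1, 1],
})

# Same binning approach as in the cell above
bins = pd.IntervalIndex.from_tuples([(0, 100), (100, 500), (500, 10000)])
toy["group_tot_impr"] = pd.cut(toy["tot_impr"], bins).astype(str)

# Conversion rate for each (impression bin, treatment group) pair
rates = (
    toy.groupby(["group_tot_impr", "test"])["converted"]
       .mean()
       .reset_index(name="conv_rate")
)
print(rates)
```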


#### 3.b Based on the above figure, can we say that more impressions cause more conversions? (No more than 2 sentences)

Answer here:

### 4. Calculate the power of this experiment.

#### 4.a Calculate Cohen’s D. Cohen’s D, in this case, is the estimated average treatment effect on conversion divided by the standard deviation of conversion.
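
As an illustration of the definition above, the sketch below computes Cohen's d on a hypothetical toy DataFrame (not the case data). It assumes that "the standard deviation of conversion" means the sample standard deviation of `converted` across all users; a pooled within-group standard deviation is a common alternative.

```python
import pandas as pd

# Hypothetical toy data (NOT the case data)
toy = pd.DataFrame({
    "test":      [1, 1, 1, 1, 0, 0, 0, 0],
    "converted": [1, 0, 1, 0, 0, 0, 1, 0],
})

# Estimated ATE: difference in mean conversion between treated and control
ate_hat = (
    toy.loc[toy["test"] == 1, "converted"].mean()
    - toy.loc[toy["test"] == 0, "converted"].mean()
)

# Assumption: use the sample standard deviation of the outcome over all users
sd_converted = toy["converted"].std()

cohens_d = ate_hat / sd_converted
print(f"Cohen's d on the toy data: {cohens_d:.3f}")
```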

#### 4.b Use the `power_ttest2n` function in `pingouin` to calculate the power of the experiment. 
Note that this is very similar to the `TTestPower` function you've been shown previously, but this one also allows for treatment arms of different sizes.

In [14]:
##  requirements: pingouin-0.5.4
from pingouin import power_ttest2n

cohens_d = 0.05 #(modify this value as you like)
power = power_ttest2n(nx = 100, ny = 100, d = cohens_d, 
                      alpha = 0.05, alternative = 'two-sided')
power

0.06429839333874966

#### 4.c What would the power be instead if the true effect had a Cohen's D of 0.01?

#### 4.d What would the power be instead if the true effect had a Cohen's D of 0.01 and the sample was equally split between treatment and control?
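
For intuition on why the split matters, the sketch below uses a normal approximation to the power of a two-sided, two-sample test. This is not the `pingouin` implementation, which uses the noncentral t distribution. The helper `approx_power` and the sample sizes are hypothetical, chosen only to compare an unequal and an equal split of the same total sample.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(nx, ny, d, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality: effect size scaled by the "effective" sample size,
    # nx * ny / (nx + ny), which is largest when nx == ny
    ncp = d * sqrt(nx * ny / (nx + ny))
    # Probability of rejecting in either tail
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

# Same total sample of 2,000 (hypothetical sizes), different splits
unequal = approx_power(nx=1900, ny=100, d=0.05)
equal = approx_power(nx=1000, ny=1000, d=0.05)
print(f"unequal split: {unequal:.3f}, equal split: {equal:.3f}")
```

For a fixed total sample, the effective sample size is maximized by an equal split, so the approximation gives higher power for the equal split. For the graded answer, use `power_ttest2n` as instructed.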

### 5. Case Discussion in Class
Please write what you would discuss in your presentation to TaskaBella. Your answer should be one paragraph of five or fewer sentences. Be prepared to discuss in class. Think about the most important thing to say to TaskaBella.
No additional analysis is needed to answer this question.

### How long did this problem set take you in hours? How did you find the level of difficulty?
