# Subgroup B: Measuring Campaign ROI

## Exploratory Data Analysis

In this segment, we load the data relevant to the business question and prepare it for exploratory data analysis.

In [68]:
import pandas as pd
import numpy as np
import os

# loading in relevant datasets
os.chdir("../updated_data/")
print(os.getcwd())
campaigns = pd.read_csv("campaigns.csv")
customer_engagement = pd.read_csv("customer_engagement.csv")

print(campaigns.head())
print(customer_engagement.head())
print("\n")
print("===========================================================")
print("Information on campaigns dataset")
campaigns.info()
print("\n")
print("===========================================================")
print("Information on customer_engagement dataset")
customer_engagement.info()

C:\Users\rusya\Desktop\NUS\Y3S2\DSA3101\updated_data
   campaign_id     customer_segment  \
0            1              Retired   
1            2        Middle-Market   
2            3              Retired   
3            4     Budget-Conscious   
4            5  Young Professionals   

                                    campaign_name             campaign_type  \
0          Wealth Management Campaign for Retired                     Email   
1            Auto Loan Campaign for Middle-Market  Mobile App Notifications   
2                   Mortgage Campaign for Retired                     Email   
3     Personal Loan Campaign for Budget-Conscious                     Email   
4  Personal Loan Campaign for Young Professionals  Mobile App Notifications   

   start_date    end_date  recommended_product_id recommended_product_name  \
0  2022-03-17  2022-05-04                     107        Wealth Management   
1  2022-03-23  2022-05-02                     106                Auto Loan   
2  

## Feature Engineering
Before addressing the subquestion in Section B: Measuring Campaign ROI, we create the columns needed to answer our business question.
The code blocks below generate the following features: `conversion_rate`, `customer_lifetime_value` and `acquisition_cost`.
The `ROI` of each past marketing campaign is also computed and serves as the response variable for our model.

### Generating `conversion_rate`

`conversion_rate` is obtained by first counting, for each campaign, the number of customers the campaign engaged with (i.e. the number of rows where `sent` == "Yes"). This count is `total_engagements`, and it is derived from the `customer_engagement` dataset.

The number of successful conversions (rows where `conversion_status` == "Yes") is also derived from the `customer_engagement` dataset. `conversion_rate` for each campaign is then computed as follows:

$$
\text{conversion rate} = \frac{\text{successful conversions}}{\text{total engagements}}
$$

In [49]:
# number of engagements (rows with sent == "Yes") per campaign
total_engagements = customer_engagement[customer_engagement["sent"] == "Yes"] \
    .groupby("campaign_id").size().reset_index(name="total_engagements")
print(total_engagements.head())

# number of successful conversions per campaign
successful_conversions = customer_engagement[customer_engagement["conversion_status"] == "Yes"] \
    .groupby("campaign_id").size().reset_index(name="successful_conversions")
print(successful_conversions.head())

# left merge keeps campaigns that had engagements but no conversions
conversion_data = total_engagements.merge(successful_conversions, on="campaign_id", how="left")
print(conversion_data.head())

# in case no conversions happened for a campaign, fill NA as 0
conversion_data["successful_conversions"] = conversion_data["successful_conversions"].fillna(0)

# computing conversion rate
conversion_data["conversion_rate"] = conversion_data["successful_conversions"] / conversion_data["total_engagements"]
conversion_data.head()

   campaign_id  total_engagements
0            1                473
1            2                740
2            3                946
3            4                570
4            5                829
   campaign_id  successful_conversions
0            1                      52
1            2                      59
2            3                     113
3            4                      48
4            5                      66
   campaign_id  total_engagements  successful_conversions
0            1                473                      52
1            2                740                      59
2            3                946                     113
3            4                570                      48
4            5                829                      66


   campaign_id  total_engagements  successful_conversions  conversion_rate
0            1                473                      52         0.109937
1            2                740                      59         0.079730
2            3                946                     113         0.119450
3            4                570                      48         0.084211
4            5                829                      66         0.079614
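The cell above counts conversions over all rows of `customer_engagement`, whether or not `sent` is "Yes". If conversions should only be counted among sent messages, both counts can come from a single `groupby`. The following is a minimal sketch on toy data; the frame `toy_engagement` and its values are hypothetical, and restricting conversions to sent rows is an assumption, not something the dataset description confirms.

```python
import pandas as pd

# Toy stand-in for customer_engagement (hypothetical values, not project data)
toy_engagement = pd.DataFrame({
    "campaign_id": [1, 1, 1, 2, 2, 3],
    "sent": ["Yes", "Yes", "No", "Yes", "Yes", "Yes"],
    "conversion_status": ["Yes", "No", "No", "No", "No", "Yes"],
})

# Keep only rows where a message was actually sent
sent_rows = toy_engagement[toy_engagement["sent"] == "Yes"]

# One pass: count rows (engagements) and sum a boolean flag (conversions)
toy_conversion = (
    sent_rows
    .assign(converted=sent_rows["conversion_status"].eq("Yes"))
    .groupby("campaign_id")
    .agg(
        total_engagements=("converted", "size"),
        successful_conversions=("converted", "sum"),
    )
    .reset_index()
)
toy_conversion["conversion_rate"] = (
    toy_conversion["successful_conversions"] / toy_conversion["total_engagements"]
)
print(toy_conversion)
```

Because every campaign with at least one sent row appears in the grouped result, campaigns without conversions get a count of 0 directly and no `fillna` step is needed.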


### Generating `acquisition_cost`

`acquisition_cost` is generated with the following formula:

$$
\text{acquisition cost} = \frac{\text{total\_campaign\_cost}}{\text{successful\_conversions}}
$$


In [47]:
acquisition_data = conversion_data.merge(campaigns[["campaign_id","total_campaign_cost"]], on = "campaign_id")
print(acquisition_data.head())

# computing acquisition_cost
acquisition_data["acquisition_cost"] = acquisition_data["total_campaign_cost"] / acquisition_data["successful_conversions"]
acquisition_data.head()

   campaign_id  total_engagements  successful_conversions  conversion_rate  \
0            1                473                      52         0.109937   
1            2                740                      59         0.079730   
2            3                946                     113         0.119450   
3            4                570                      48         0.084211   
4            5                829                      66         0.079614   

   total_campaign_cost  
0             70597.33  
1             87461.14  
2             56053.99  
3             22910.89  
4             24374.42  


   campaign_id  total_engagements  successful_conversions  conversion_rate  \
0            1                473                      52         0.109937
1            2                740                      59         0.079730
2            3                946                     113         0.119450
3            4                570                      48         0.084211
4            5                829                      66         0.079614

   total_campaign_cost  acquisition_cost
0             70597.33       1357.640962
1             87461.14       1482.392203
2             56053.99        496.053009
3             22910.89        477.310208
4             24374.42        369.309394
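Since missing `successful_conversions` are filled with 0 earlier, a campaign with no conversions would produce an infinite `acquisition_cost` under this formula. One possible guard, sketched below, is to treat zero conversions as missing before dividing. The first toy row reuses the campaign 1 figures shown above; the second row is hypothetical and exists only to trigger the zero case.

```python
import numpy as np
import pandas as pd

toy_acquisition = pd.DataFrame({
    "campaign_id": [1, 2],
    "successful_conversions": [52, 0],        # second value is hypothetical
    "total_campaign_cost": [70597.33, 1000.0],  # second value is hypothetical
})

# Replace zero conversions with NaN so the ratio is NaN rather than inf
safe_conversions = toy_acquisition["successful_conversions"].replace(0, np.nan)
toy_acquisition["acquisition_cost"] = (
    toy_acquisition["total_campaign_cost"] / safe_conversions
)
print(toy_acquisition)
```

Whether NaN is the right encoding for "no conversions" depends on how the downstream model handles missing values, so this is a design choice rather than a fixed rule.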


### Generating `customer_lifetime_value`

### Generating `ROI` for past marketing campaigns

ROI is calculated using data from the `campaigns` dataset. The formula used is as follows:

$$
\text{ROI} = \frac{\text{revenue} - \text{cost}}{\text{cost}}
$$
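
As a quick worked check of the formula, take the first campaign in the `campaigns` dataset, which reports a total campaign cost of 70597.33 and a total revenue of 94151.78:

$$
\text{ROI}_1 = \frac{94151.78 - 70597.33}{70597.33} = \frac{23554.45}{70597.33} \approx 0.3336
$$

That is, roughly a 33% return on the amount spent. ROI under this definition is a unitless ratio, not a currency amount.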

In [55]:
# .copy() gives an independent DataFrame and avoids SettingWithCopyWarning
roi_data = campaigns[["total_campaign_cost","total_revenue_generated"]].copy()
# ROI = (revenue - cost) / cost, matching the formula above
roi_data["roi"] = (roi_data["total_revenue_generated"] - roi_data["total_campaign_cost"]) / roi_data["total_campaign_cost"]
roi_data.head()

   total_campaign_cost  total_revenue_generated       roi
0             70597.33                 94151.78  0.333645
1             87461.14                184133.52  1.105318
2             56053.99                160828.61  1.869173
3             22910.89                 58342.54  1.546498
4             24374.42                 63738.91  1.614992


## Data Preparation for Machine Learning

The dataset combining all generated features, together with the machine learning models, is to be implemented here.

In [77]:
## Manipulation of dataframes to create X here, data cleaning etc.
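
One way the feature matrix could be assembled is sketched below. It is only an illustration under stated assumptions: `toy_features` is a hypothetical frame, the choice of `campaign_type` and `customer_segment` as categorical predictors is ours, and the `roi` values are rounded results of (revenue - cost) / cost for the first four campaigns. The actual feature set has not been decided in this notebook.

```python
import pandas as pd

# Toy rows modelled on the outputs shown earlier (illustrative only)
toy_features = pd.DataFrame({
    "campaign_id": [1, 2, 3, 4],
    "campaign_type": ["Email", "Mobile App Notifications", "Email", "Email"],
    "customer_segment": ["Retired", "Middle-Market", "Retired", "Budget-Conscious"],
    "conversion_rate": [0.109937, 0.079730, 0.119450, 0.084211],
    "acquisition_cost": [1357.640962, 1482.392203, 496.053009, 477.310208],
    "roi": [0.33, 1.11, 1.87, 1.55],
})

# Response variable
y = toy_features["roi"]

# Predictors: drop the identifier and the response, one-hot encode categoricals
X = pd.get_dummies(
    toy_features.drop(columns=["campaign_id", "roi"]),
    columns=["campaign_type", "customer_segment"],
)
print(X.columns.tolist())
```

Dropping `campaign_id` keeps the model from treating an arbitrary identifier as a numeric signal.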



## Model Implementation (Random Forests)


In [79]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

## X = [dataset excluding response variable here]
## y = [response variable -> roi]

#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Initialize model
#model = RandomForestRegressor(n_estimators=100, random_state=42)

## Train model
#model.fit(X_train, y_train)

## Predict on test data
#y_pred = model.predict(X_test)
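
Until `X` and `y` are available, the pipeline above can be exercised end to end on synthetic data. The sketch below does that and also uses the three metrics imported in the cell. The data, the feature ranges, and the linear relationship used to generate `y` are all made up for illustration and say nothing about the real campaigns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the project's campaigns)
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "conversion_rate": rng.uniform(0.05, 0.15, n),
    "acquisition_cost": rng.uniform(300, 1500, n),
})
# Made-up relationship plus noise so the model has something to learn
y = 20 * X["conversion_rate"] - 0.001 * X["acquisition_cost"] + rng.normal(0, 0.1, n)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the random forest and predict on the held-out rows
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate with the metrics imported in the cell above
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}, R^2: {r2:.3f}")
```

RMSE is computed as the square root of `mean_squared_error` so that it is on the same scale as the response.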

