# LockerDome Technical Assessment

## Overview
This notebook details the exploration and analysis of data from a single advertiser between 12/01/2020 and 02/28/2021.

## Project Goals
Through the lens of optimizing for LockerDome:
- Using raw historical campaign data, make concrete optimization recommendations for Creative.ID, Slot.ID, and Device variables.
- Present findings as tables and graphics
- Detail any additional observations
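One possible starting point for the Creative.ID, Slot.ID, and Device recommendations is to rank each variable's values by total and average net revenue. The sketch below is an assumption, not necessarily the approach this notebook adopts: the helper `rank_by_net_revenue` is hypothetical, and the toy rows (including the device label `Desktop`) are made up. Only the column names come from the dataset.

```python
import pandas as pd

# Toy rows using this dataset's column names; the numbers are made up for illustration
toy = pd.DataFrame({
    "Creative.ID": [1, 1, 2, 2, 3],
    "Slot.ID": [6, 7, 6, 7, 6],
    "Device": ["Mobile", "Desktop", "Mobile", "Mobile", "Desktop"],
    "net_revenue": [60.0, 0.0, 120.0, 30.0, 10.0],
})


def rank_by_net_revenue(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: total, mean, and row count of net revenue per value of `column`."""
    return (
        frame.groupby(column)["net_revenue"]
        .agg(total="sum", mean="mean", rows="count")
        .sort_values("total", ascending=False)  # best total first
    )


for col in ["Creative.ID", "Slot.ID", "Device"]:
    print(rank_by_net_revenue(toy, col), end="\n\n")
```

Ranking by total favors high-volume values, while the mean column helps spot small but efficient ones; both are worth reading before recommending a change.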

## Metrics
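The main metric in this analysis will be **Net Revenue**: gross revenue (Conversions × conversion value) minus the publisher split, which this notebook computes as Publisher.Split × Conversions.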

**Impressions** and **Referrals** may also be explored as secondary metrics, since both sit earlier in the funnel and can eventually lead to more conversions.
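To relate these secondary metrics to conversions, two funnel ratios are a natural sketch: referrals per impression and conversions per referral. The ratio names `referral_rate` and `conversion_rate` and the toy numbers are hypothetical; only the column names come from the dataset.

```python
import pandas as pd

# Toy rows with this dataset's column names; values are illustrative only
toy = pd.DataFrame({
    "Device": ["Mobile", "Mobile", "Desktop"],
    "Impressions": [1000, 500, 1000],
    "Referrals": [50, 25, 40],
    "Conversions": [2, 1, 2],
})

# Sum within each group first, then divide, so tiny rows do not dominate the ratios
funnel = toy.groupby("Device")[["Impressions", "Referrals", "Conversions"]].sum()
funnel["referral_rate"] = funnel["Referrals"] / funnel["Impressions"]
# .where() turns zero-referral groups into NaN instead of dividing by zero
funnel["conversion_rate"] = funnel["Conversions"] / funnel["Referrals"].where(funnel["Referrals"] > 0)
print(funnel)
```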

## Data
The data used in this analysis describes a single advertiser's campaigns.
The data is imported and previewed below.

In [1]:
# Import Key DataFrame Libraries
import pandas as pd
import numpy as np

In [2]:
# Import and preview data
df = pd.read_csv('data/project_data.csv')
df.head()

|   | Slot.ID | Placement | Ad.Slot.Page.Layout | Creative.ID | Content.Type | Content.Has.Play.Button | Date | Device | Size | Publisher.Split | Impressions | Referrals | Conversions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | 1 | advertorial | yes | 12/01/2020 | Other | Unknown | 0.0 | 0 | 0 | 0 |
| 1 | 2 | external_in_content | full_article | 1 | advertorial | yes | 12/01/2020 | Mobile | Responsive | 0.02 | 10 | 0 | 0 |
| 2 | 3 | external_above_content | full_article | 1 | advertorial | yes | 12/01/2020 | Mobile | Responsive | 0.32 | 94 | 0 | 0 |
| 3 | 4 | external_below_content | full_article | 1 | advertorial | yes | 12/01/2020 | Mobile | Responsive | 0.0 | 54 | 0 | 0 |
| 4 | 5 | external_below_content | full_article | 1 | advertorial | yes | 12/01/2020 | Mobile | Responsive | 0.03 | 21 | 1 | 0 |


In [3]:
# Inspect Column Datatypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5484486 entries, 0 to 5484485
Data columns (total 13 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   Slot.ID                  int64  
 1   Placement                object 
 2   Ad.Slot.Page.Layout      object 
 3   Creative.ID              int64  
 4   Content.Type             object 
 5   Content.Has.Play.Button  object 
 6   Date                     object 
 7   Device                   object 
 8   Size                     object 
 9   Publisher.Split          float64
 10  Impressions              int64  
 11  Referrals                int64  
 12  Conversions              int64  
dtypes: float64(1), int64(5), object(7)
memory usage: 544.0+ MB
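With seven `object` columns and roughly 5.5 million rows, the frame uses over 544 MB. A common way to shrink repetitive string columns such as `Device` or `Placement` is pandas' `category` dtype, which stores each distinct string once plus a small integer code per row. The sketch below demonstrates this on a synthetic frame; the row count and values are assumptions, not the real data.

```python
import pandas as pd

# Synthetic frame with a highly repetitive string column, mimicking Device
n = 100_000
toy = pd.DataFrame({
    "Device": ["Mobile", "Other"] * (n // 2),
    "Impressions": range(n),
})

before = int(toy.memory_usage(deep=True).sum())
# Convert the repetitive strings to a categorical column
toy["Device"] = toy["Device"].astype("category")
after = int(toy.memory_usage(deep=True).sum())
print(f"{before:,} bytes -> {after:,} bytes")
```

The same conversion can be requested at load time with `pd.read_csv(..., dtype={"Device": "category"})`; whether the savings matter here depends on how many distinct values each column actually has.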


In [4]:
# Inspect Missing Values
df.isna().sum()

Slot.ID                        0
Placement                  12326
Ad.Slot.Page.Layout        12330
Creative.ID                    0
Content.Type                   0
Content.Has.Play.Button        0
Date                           0
Device                         0
Size                           0
Publisher.Split                0
Impressions                    0
Referrals                      0
Conversions                    0
dtype: int64

### Feature Creation and Data Type Correction
Through 01/07/2021, the conversion value is assumed to be \$35.00; from 01/08/2021 onward, it is assumed to be \$40.00. Together with ```Conversions``` and ```Publisher.Split```, this is enough to calculate gross and net revenue.

To create these features efficiently, the ```Date``` column is first converted to the ```datetime64``` data type. A lambda function applied to the converted ```Date``` column then populates a new ```conversion_value``` column.

In [5]:
# Convert Date column from object (MM/DD/YYYY strings) to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

In [6]:
# Create conversion value column
value_change_date = pd.to_datetime('01/08/2021')
df['conversion_value'] = df['Date'].map(lambda x: 35 if x < value_change_date else 40)
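The `map`/lambda approach in the cell above calls a Python function once per row, which can be slow on about 5.5 million rows. A vectorized alternative with `numpy.where` produces the same 35/40 values; the sketch below checks it on a few toy dates around the assumed change date.

```python
import numpy as np
import pandas as pd

# Toy dates on both sides of the assumed price change (01/08/2021)
toy = pd.DataFrame({"Date": pd.to_datetime(["2021-01-06", "2021-01-07", "2021-01-08", "2021-02-28"])})
value_change_date = pd.Timestamp("2021-01-08")

# $35 strictly before the change date, $40 on or after it (same rule as the lambda)
toy["conversion_value"] = np.where(toy["Date"] < value_change_date, 35, 40)
print(toy)
```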

In [7]:
# Gross revenue = conversions * conversion value
df['gross_revenue'] = df['Conversions']*df['conversion_value']

In [8]:
# Net revenue = Gross revenue - Publisher Split*Conversions
df['net_revenue'] = df['gross_revenue'] - df['Publisher.Split']*df['Conversions']

In [9]:
# Preview current DataFrame
df.head()

|   | Slot.ID | Placement | Ad.Slot.Page.Layout | Creative.ID | Content.Type | Content.Has.Play.Button | Date | Device | Size | Publisher.Split | Impressions | Referrals | Conversions | conversion_value | gross_revenue | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | 1 | advertorial | yes | 2020-12-01 | Other | Unknown | 0.0 | 0 | 0 | 0 | 35 | 0 | 0.0 |
| 1 | 2 | external_in_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.02 | 10 | 0 | 0 | 35 | 0 | 0.0 |
| 2 | 3 | external_above_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.32 | 94 | 0 | 0 | 35 | 0 | 0.0 |
| 3 | 4 | external_below_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.0 | 54 | 0 | 0 | 35 | 0 | 0.0 |
| 4 | 5 | external_below_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.03 | 21 | 1 | 0 | 35 | 0 | 0.0 |


### Dealing with Missing Values
A small percentage of the values (roughly 0.22%) in columns ```Placement``` and ```Ad.Slot.Page.Layout``` are missing. Upon inspection, the rows with missing data generated no net revenue, leading to the assumption that keeping them would add little value and could be misleading.

For the purpose of this analysis, rows with missing values will be removed.
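The 0.22% figure follows from the missing-value counts and the total row count reported above. The arithmetic is reproduced below using only those reported numbers.

```python
# Missing counts from df.isna().sum() and the row count from df.info()
total_rows = 5_484_486
missing = {"Placement": 12_326, "Ad.Slot.Page.Layout": 12_330}

for column, count in missing.items():
    # Format the fraction as a percentage with two decimals
    print(f"{column}: {count / total_rows:.2%} missing")
```

On the full frame, `df.isna().mean()` returns the same fractions for every column at once.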

In [10]:
# Check whether rows with missing Placement values produce any revenue
df.loc[df['Placement'].isna(), 'net_revenue'].sum()

0.0

In [11]:
# Check whether rows with missing Ad.Slot.Page.Layout values produce any revenue
df.loc[df['Ad.Slot.Page.Layout'].isna(), 'net_revenue'].sum()

0.0
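Because the two columns have different missing counts (12,326 vs. 12,330), the two sets of rows cannot be identical. A sketch of a single check over their union is shown below on toy rows; counting non-zero values is slightly stricter than a zero sum, which could in principle hide offsetting positive and negative values.

```python
import numpy as np
import pandas as pd

# Toy rows: rows 0 and 2 miss Placement; rows 0 and 3 miss Ad.Slot.Page.Layout
toy = pd.DataFrame({
    "Placement": [np.nan, "external_feed", np.nan, "external_in_content"],
    "Ad.Slot.Page.Layout": [np.nan, "homepage", "full_article", np.nan],
    "net_revenue": [0.0, 59.94, 0.0, 0.0],
})

# True for any row missing at least one of the two columns
any_missing = toy[["Placement", "Ad.Slot.Page.Layout"]].isna().any(axis=1)
nonzero_rows = (toy.loc[any_missing, "net_revenue"] != 0).sum()
print(any_missing.sum(), nonzero_rows)
```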

In [12]:
# Drop Missing Data and Reset Index
df.dropna(inplace=True)
df.reset_index(inplace=True, drop=True)
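`dropna()` with no arguments drops rows with a missing value in any column. Here only ```Placement``` and ```Ad.Slot.Page.Layout``` contain NaN, so the result is the same, but passing `subset` makes the intent explicit and keeps an unexpected gap in another column from silently removing rows. A toy sketch:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Placement": [np.nan, "external_feed", "external_in_content"],
    "Ad.Slot.Page.Layout": [np.nan, "homepage", np.nan],
    "Impressions": [0, 6368, 10],
})

# Drop only rows missing one of the two known-incomplete columns
cleaned = toy.dropna(subset=["Placement", "Ad.Slot.Page.Layout"]).reset_index(drop=True)
# Sanity check: neither column should contain NaN afterwards
assert not cleaned[["Placement", "Ad.Slot.Page.Layout"]].isna().any().any()
print(len(toy), "->", len(cleaned))
```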

In [13]:
df.head()

|   | Slot.ID | Placement | Ad.Slot.Page.Layout | Creative.ID | Content.Type | Content.Has.Play.Button | Date | Device | Size | Publisher.Split | Impressions | Referrals | Conversions | conversion_value | gross_revenue | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | external_in_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.02 | 10 | 0 | 0 | 35 | 0 | 0.0 |
| 1 | 3 | external_above_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.32 | 94 | 0 | 0 | 35 | 0 | 0.0 |
| 2 | 4 | external_below_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.0 | 54 | 0 | 0 | 35 | 0 | 0.0 |
| 3 | 5 | external_below_content | full_article | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 0.03 | 21 | 1 | 0 | 35 | 0 | 0.0 |
| 4 | 6 | external_feed | homepage | 1 | advertorial | yes | 2020-12-01 | Mobile | Responsive | 5.03 | 6368 | 395 | 2 | 35 | 70 | 59.94 |
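As a hand check of the revenue features, the row with Slot.ID 6 in the preview above (2 conversions on 2020-12-01, Publisher.Split 5.03) can be recomputed directly. The snippet restates cells 6 through 8 for that single row and uses only the values shown.

```python
# Values taken from the Slot.ID 6 row of the preview
conversions = 2
conversion_value = 35      # 2020-12-01 falls before 01/08/2021, so $35 applies
publisher_split = 5.03     # Publisher.Split column

gross_revenue = conversions * conversion_value
net_revenue = gross_revenue - publisher_split * conversions
print(gross_revenue, round(net_revenue, 2))  # 70 59.94
```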


***Recap:*** At this point, every column has the correct data type, the new features (including the primary metric, ```net_revenue```) have been created, and rows with missing values have been removed.

## Exploratory Data Analysis

### Data Preprocessing

#### Train-Test Split

##### Standardization

## Baseline Models

### Model 1

### Model 2

### Model 3

## Model Tuning

## Model Evaluation

## Conclusions

## Deployment

## Future Work

- Explore why some ```Placement``` and ```Ad.Slot.Page.Layout``` values are missing.
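One way to begin this investigation is to check whether missing values cluster in particular groups. The one missing-Placement row visible in the first preview has Device `Other` and Size `Unknown`, which hints at, but does not establish, such a pattern. The sketch below computes the share of missing ```Placement``` values per Device on made-up rows; the same pattern applies to Date, Slot.ID, or Size.

```python
import numpy as np
import pandas as pd

# Made-up rows; on the real data, replace toy with df
toy = pd.DataFrame({
    "Device": ["Other", "Other", "Mobile", "Mobile", "Mobile"],
    "Placement": [np.nan, np.nan, "external_feed", np.nan, "external_in_content"],
})

# Fraction of rows with a missing Placement within each Device group
missing_share = toy["Placement"].isna().groupby(toy["Device"]).mean()
print(missing_share)
```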