#### COMPGW02 Web Economics - Coursework
# Online Advertising: Optimal Bidding Strategy
### Alexandros Baltas, Maximilian Bartolo, Gerard Cardoso Negrie
Date: 14 April 2017

## Introduction

In this section, we combine the Logistic Regression, One-Class SVM and Neural Network models to create a single bidding strategy.

### Importing Python Libraries
Let's start off by importing the libraries and packages we'll be using for our analysis, as well as setting our default options.

In [1]:
# Importing the libraries
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.stats
import glob, re, random, itertools, time
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Importing additional required libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import LinearSVC, SVC, OneClassSVM
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix
from sklearn import preprocessing
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D

%matplotlib inline

# Set options
pd.set_option("display.max_colwidth", 1000)
# pd.options.mode.chained_assignment = None  # default='warn'

# Set a random seed for repeatability
rand_seed = 27
random.seed(rand_seed)
np.random.seed(rand_seed)

### Combined Model
We note that the performance of our Logistic Regression model is also very good. We therefore attempt a combined model that ensembles the predictions of all three models.

#### Validation Function

In [2]:
glob_cash_in_bank = 25000000
glob_cash_in_bank = glob_cash_in_bank * (1)

In [3]:
def validate_results(df, cash_in_bank, random=True):
    # Note: the `random` argument shadows the `random` module inside this function
    col_name_validate = 'bidprice_validate'

    # Only consider auctions we actually bid on
    df_temp = df[df[col_name_validate] > 0]
    if random:
        # Shuffle the auction order reproducibly
        df_temp = df_temp.sample(frac=1, random_state=rand_seed).reset_index(drop=True)

    strategy_impressions = 0
    strategy_clicks = 0

    for _, row in df_temp.iterrows():
        # Stop once the budget is exhausted
        if cash_in_bank <= 0:
            break
        # We win the auction when our bid exceeds the pay price
        if row[col_name_validate] > row['payprice']:
            strategy_impressions += 1
            strategy_clicks += int(row['click'])
            cash_in_bank -= row['payprice']  # we are charged the pay price, not our bid

    return cash_in_bank, strategy_impressions, strategy_clicks

def calc_results(df_validate, df_submit, budget_ratio):
    glob_cash_in_bank = 25000000 * budget_ratio

    cash_in_bank = glob_cash_in_bank
    # Bids are aligned on the DataFrame index (row position here), so both frames must share the same row order
    df_validate['bidprice_validate'] = df_submit['bidprice'].copy()

    cash_in_bank, strategy_impressions, strategy_clicks = \
                    validate_results(df=df_validate, cash_in_bank=cash_in_bank, random=True)
    cost = (glob_cash_in_bank-cash_in_bank)/1000
    # Guard against division by zero when nothing is won or clicked
    ctr = strategy_clicks/strategy_impressions if strategy_impressions else 0.0
    cpc = cost/strategy_clicks if strategy_clicks else float('nan')

    return cost, strategy_impressions, strategy_clicks, ctr, cpc

def print_strategy_results(df_validate, df_submit):
    budget_ratios = [1, 1/2, 1/4, 1/8, 1/16]
    budget_ratio_names = ['Full', '1/2', '1/4', '1/8', '1/16']
    for i, budget_ratio in enumerate(budget_ratios):
        cost, strategy_impressions, strategy_clicks, ctr, cpc = calc_results(df_validate, df_submit, budget_ratio)
        print ("{} Budget:".format(budget_ratio_names[i]))
        print ("Cost: ${:.2f}  |  Impressions: {:.0f}   |   Clicks: {:.0f}  |  CTR: {:.5f}%  |  CPC: ${:.2f}" \
           .format(cost, strategy_impressions, strategy_clicks, ctr*100, cpc))
        print ()
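
The loop in `validate_results` checks that some budget remains before each auction, but not that the remaining cash covers the pay price, so the last purchase can overshoot the budget slightly. As a rough sketch only, a stricter variant could stop at the first won auction that no longer fits. The function name `validate_results_strict` and its `bid_col` parameter are hypothetical, it assumes non-negative pay prices, and it leaves any shuffling to the caller.

```python
import pandas as pd

def validate_results_strict(df, cash_in_bank, bid_col="bidprice_validate"):
    """Hypothetical stricter check: never spend more than cash_in_bank."""
    # Auctions we bid on and win (bid strictly above the pay price)
    won = df[(df[bid_col] > 0) & (df[bid_col] > df["payprice"])]
    # Running spend in row order; non-negative prices make it non-decreasing
    spend = won["payprice"].cumsum()
    # Keep the leading run of purchases that fits within the budget
    affordable = won[spend <= cash_in_bank]
    remaining = cash_in_bank - affordable["payprice"].sum()
    return remaining, len(affordable), int(affordable["click"].sum())
```

Because the running spend never decreases, the rows kept are always a prefix of the won auctions, which mirrors the "stop when the money runs out" behaviour of the loop without the final overshoot.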

In [4]:
# Importing the dataset to validate on
df_val = pd.read_csv("data/validation.csv")
# Total clicks in the validation data set
print ("Total clicks in the data set we are validating against is {}".format(df_val['click'].sum()))

Total clicks in the data set we are validating against is 226


In [5]:
col_names = ['bidid', 'bidprice']

In [6]:
# OCSVM Submissions
df_submit_ocsvm = pd.read_csv("data/submission_val_ocsvm.csv")
df_submit_ocsvm.columns = col_names
df_submit_ocsvm['click_predict'] = df_submit_ocsvm['bidprice'].copy()

In [7]:
# LR Submissions
df_submit_lr = pd.read_csv("data/lr_validation_results.csv")
df_submit_lr = df_submit_lr[['bidid', 'clickpred']].copy()
df_submit_lr.columns = col_names
df_submit_lr['click_predict'] = df_submit_lr['bidprice'].copy()

In [8]:
# NN Submissions
df_submit_nn = pd.read_csv("data/nn_val_preds.csv")
df_submit_nn = df_submit_nn[['bidid', 'clickprob']].copy()
df_submit_nn.columns = col_names
df_submit_nn['click_predict'] = df_submit_nn['bidprice'].copy()

Next, we combine the three models into one dataframe to facilitate validation.

In [9]:
df_submit_combined = df_submit_ocsvm.copy()
df_submit_combined['click_predict_lr'] = df_submit_lr['click_predict'].copy()
df_submit_combined['click_predict_nn'] = df_submit_nn['click_predict'].copy()
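
The column assignments above line the three prediction files up by row position, which is only correct if every file lists the bids in the same order. A minimal sketch of a more defensive alternative joins on `bidid` instead. The helper `combine_predictions` and the column names passed to it are assumptions for illustration, not part of the original pipeline.

```python
import pandas as pd

def combine_predictions(frames):
    """Join per-model predictions on 'bidid'.

    frames: dict mapping an output column name to a DataFrame that has
    'bidid' and 'click_predict' columns (hypothetical helper).
    """
    combined = None
    for name, frame in frames.items():
        part = frame[["bidid", "click_predict"]].rename(columns={"click_predict": name})
        if combined is None:
            combined = part
        else:
            # validate="one_to_one" raises if a bidid is duplicated on either side
            combined = combined.merge(part, on="bidid", how="inner", validate="one_to_one")
    return combined
```

With an inner join, a bid missing from any one file is dropped, so comparing `len(combined)` with the length of each input is a quick way to spot mismatched files.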

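We define a basic linear bidding strategy, in which each bid is the weighted sum of the three models' click predictions scaled by a constant `c`, and validate the results.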
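
Written out, the bid computed in the following cell for impression $i$ is shown below. The symbols $p_i^{\text{ocsvm}}$, $p_i^{\text{lr}}$ and $p_i^{\text{nn}}$ are notation introduced here for the `click_predict`, `click_predict_lr` and `click_predict_nn` columns.

$$
\text{bid}_i = c \left( w_{\text{ocsvm}}\, p_i^{\text{ocsvm}} + w_{\text{lr}}\, p_i^{\text{lr}} + w_{\text{nn}}\, p_i^{\text{nn}} \right)
$$

With $c = 177$ and weights $1$, $1$ and $2.5$, and assuming each prediction lies in $[0, 1]$, the largest possible bid is $177 \times 4.5 = 796.5$.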

In [10]:
c = 177
w_ocsvm = 1 # Weighting for the One-Class SVM Model
w_lr = 1 # Weighting for the Logistic Regression Model
w_nn = 2.5 # Weighting for the Neural Network Model

df_submit_combined['bidprice'] = (w_ocsvm*df_submit_combined['click_predict'] + w_lr*df_submit_combined['click_predict_lr'] \
                                 + w_nn*df_submit_combined['click_predict_nn'])*c
print_strategy_results(df_val, df_submit_combined)

Full Budget:
Cost: $5113.05  |  Impressions: 83167   |   Clicks: 172  |  CTR: 0.20681%  |  CPC: $29.73

1/2 Budget:
Cost: $5113.05  |  Impressions: 83167   |   Clicks: 172  |  CTR: 0.20681%  |  CPC: $29.73

1/4 Budget:
Cost: $5113.05  |  Impressions: 83167   |   Clicks: 172  |  CTR: 0.20681%  |  CPC: $29.73

1/8 Budget:
Cost: $3125.05  |  Impressions: 50694   |   Clicks: 102  |  CTR: 0.20121%  |  CPC: $30.64

1/16 Budget:
Cost: $1562.53  |  Impressions: 25206   |   Clicks: 58  |  CTR: 0.23010%  |  CPC: $26.94
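
The notebook does not record how `c = 177` or the model weights were chosen. One plausible approach, sketched here purely as an assumption, is to sweep candidate values of `c` over the validation set and compare clicks and cost. The function `sweep_base_bid` and its arguments are hypothetical; it takes NumPy arrays, evaluates auctions in the order given, and never spends past the budget.

```python
import numpy as np
import pandas as pd

def sweep_base_bid(score, payprice, click, budget, candidates):
    """Evaluate bid = c * score for each candidate c (hypothetical helper)."""
    rows = []
    for c in candidates:
        bid = c * score
        won = bid > payprice                       # auctions won at this c
        spend = np.cumsum(np.where(won, payprice, 0))
        in_budget = won & (spend <= budget)        # purchases that still fit
        rows.append({
            "c": c,
            "impressions": int(in_budget.sum()),
            "clicks": int(click[in_budget].sum()),
            "cost": float(payprice[in_budget].sum()),
        })
    return pd.DataFrame(rows)
```

Here `score` would be the weighted sum of the three prediction columns. Values tuned this way are fitted to the validation set, so the validation figures will tend to look better than performance on unseen data.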



### Generating the Submission File

In [11]:
col_names = ['bidid', 'bidprice']

In [12]:
# OCSVM Test Predictions
df_submit_ocsvm = pd.read_csv("data/submission_test_ocsvm.csv")
df_submit_ocsvm.columns = col_names
df_submit_ocsvm['click_predict'] = df_submit_ocsvm['bidprice'].copy()

In [13]:
# LR Test Predictions
df_submit_lr = pd.read_csv("data/lr_test_results.csv")
df_submit_lr = df_submit_lr[['bidid', 'clickpred']].copy()
df_submit_lr.columns = col_names
df_submit_lr['click_predict'] = df_submit_lr['bidprice'].copy()

In [14]:
# NN Test Predictions
df_submit_nn = pd.read_csv("data/nn_test_preds.csv")
df_submit_nn = df_submit_nn[['bidid', 'clickprob']].copy()
df_submit_nn.columns = col_names
df_submit_nn['click_predict'] = df_submit_nn['bidprice'].copy()

In [15]:
df_submit_combined = df_submit_ocsvm.copy()
df_submit_combined['click_predict_lr'] = df_submit_lr['click_predict'].copy()
df_submit_combined['click_predict_nn'] = df_submit_nn['click_predict'].copy()

In [16]:
c = 177
w_ocsvm = 1 # Weighting for the One-Class SVM Model
w_lr = 1 # Weighting for the Logistic Regression Model
w_nn = 2.5 # Weighting for the Neural Network Model

df_submit_combined['bidprice'] = (w_ocsvm*df_submit_combined['click_predict'] + w_lr*df_submit_combined['click_predict_lr'] \
                                 + w_nn*df_submit_combined['click_predict_nn'])*c

In [17]:
# Save the file with a timestamped name (time is already imported above)
t = time.localtime()
timestamp = time.strftime('%b-%d-%Y_%H%M', t)
df_submit_combined[col_names].to_csv(timestamp + " testing_bidding_price.csv", index=False)
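
Before writing the file, it can help to sanity-check the frame being saved. The following is a minimal sketch, not part of the original workflow; `check_submission` and its `expected_rows` parameter are hypothetical.

```python
import pandas as pd

def check_submission(df, expected_rows=None):
    """Return a list of problems found in a submission frame (hypothetical helper)."""
    missing = [col for col in ("bidid", "bidprice") if col not in df.columns]
    if missing:
        return ["missing columns: {}".format(missing)]
    problems = []
    if df["bidid"].duplicated().any():
        problems.append("duplicate bidid values")
    if df["bidprice"].isna().any():
        problems.append("missing bid prices")
    if (df["bidprice"] < 0).any():
        problems.append("negative bid prices")
    if expected_rows is not None and len(df) != expected_rows:
        problems.append("expected {} rows, found {}".format(expected_rows, len(df)))
    return problems
```

An empty list means none of these checks failed; for example, `check_submission(df_submit_combined[col_names])` could be run just before `to_csv`.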