# Bidding Strategies

These are some suggested bidding strategies based on the data. The goal is to sell at least 400 policies per 10,000 customers while containing bidding costs. The strategies rely on the models that predict rank, clicks, and policies sold. The data analysis shows that a higher rank (closer to rank 1) comes with a higher chance of clicks and policies sold. One thing to keep in mind is that a cost accrues only if the customer clicks on the ad.

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rank_proba_df = pd.read_pickle("rank_proba_df.pkl")
click_policies_proba_df = pd.read_pickle("click_policies_proba.pkl")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

dist_of_cust_at_rank = []
for _, row in click_policies_proba_df.iterrows():
    # Match on the row's own values (columns 0-2: insured, vehicles, drivers)
    # instead of reusing the index label as a position.
    temp = rank_proba_df[(rank_proba_df["Insured"] == row.iloc[0]) &
                         (rank_proba_df["Num_Vehicles"] == row.iloc[1]) &
                         (rank_proba_df["Num_Drivers"] == row.iloc[2])]
    dist_of_cust_at_rank.append(temp.iloc[0]["Rank " + str(row["rank"]) + " Prob"])
    
all_data = click_policies_proba_df.copy()
all_data["policies_sold_per_click"] = all_data["policies_sold"] / all_data["click"]
all_data["rank_dist"] = dist_of_cust_at_rank
all_data["Expected_Num_Cust"] = np.round(all_data["rank_dist"] * 300).astype(int)
all_data["total_bid_cost"] = all_data["Expected_Num_Cust"] * all_data["click"]* all_data["bid"]    
all_data["Expected_policies_sold"] = np.round(all_data["Expected_Num_Cust"] * all_data["policies_sold"]).astype(int)
all_data["Expected_cost_per_policies_sold"] = all_data["total_bid_cost"] / all_data["Expected_policies_sold"]
expected_gains_increase_rank = []
expected_loss_decrease_rank = []
for index, row in all_data.iterrows():
    temp = dict(row)
    if temp["rank"] == 1:
        expected_gains_increase_rank.append(0)
        temp_below = dict(all_data.iloc[index + 1])
        expected_loss_decrease_rank.append((temp_below["policies_sold"] - temp["policies_sold"])*temp["Expected_Num_Cust"])
    elif temp["rank"] == 5:
        expected_loss_decrease_rank.append(0)
        temp_above = dict(all_data.iloc[index - 1])
        expected_gains_increase_rank.append((temp_above["policies_sold"] - temp["policies_sold"])*temp["Expected_Num_Cust"])
    else:
        temp_above = dict(all_data.iloc[index - 1])
        temp_below = dict(all_data.iloc[index + 1])
        expected_gains_increase_rank.append((temp_above["policies_sold"] - temp["policies_sold"])*temp["Expected_Num_Cust"])
        expected_loss_decrease_rank.append((temp_below["policies_sold"] - temp["policies_sold"])*temp["Expected_Num_Cust"])
all_data["exp_gains_inc_rank"] = np.round(expected_gains_increase_rank).astype(int)
all_data["exp_loss_dec_rank"] = np.round(expected_loss_decrease_rank).astype(int)

insured = ["unknown", "Y", "N"]
vehicles = [1, 2, 3]
drivers = [1, 2]
marry = ["M", "S"]
bid = [10]
rank = [1,2,3,4,5]
pre_df = []
for x in insured:
    for y in vehicles:
        for z in drivers:
            for u in marry:
                temp = {}
                temp["Currently Insured"] = x
                temp["Number of Vehicles"] = y
                temp["Number of Drivers"] = z
                temp["Marital Status"] = u
                temp_df = all_data[(all_data["Currently Insured"] == x) & 
                                   (all_data["Number of Vehicles"] == y) & 
                                   (all_data["Number of Drivers"] == z) & 
                                   (all_data["Marital Status"] == u)]
                temp_df_rank = rank_proba_df[(rank_proba_df["Insured"] == x) & 
                                             (rank_proba_df["Num_Vehicles"] == y) & 
                                             (rank_proba_df["Num_Drivers"] == z)]
                # Exactly one rank_proba_df row is expected per (insured, vehicles, drivers)
                # combination; select it explicitly rather than calling float() on a Series.
                rank_row = temp_df_rank.iloc[0]
                temp["Expected Rank"] = float(rank_row["Expected Rank"])
                temp["Rank 1 Prob"] = float(rank_row["Rank 1 Prob"])
                temp["Rank 2 Prob"] = float(rank_row["Rank 2 Prob"])
                temp["Rank 3 Prob"] = float(rank_row["Rank 3 Prob"])
                temp["Rank 4 Prob"] = float(rank_row["Rank 4 Prob"])
                temp["Rank 5 Prob"] = float(rank_row["Rank 5 Prob"])
                temp["(Weighted) Click Prob"] = np.sum(temp_df["click"] * temp_df["rank_dist"])
                temp["(Weighted) Policies Sold Prob"] = np.sum(temp_df["policies_sold"] * temp_df["rank_dist"])
                temp["Num. of Customers (Assumption)"] = 300
                temp["Expected policies sold"] = np.sum(temp_df["Expected_policies_sold"])
                temp["Current Cost"] = np.sum(temp_df["total_bid_cost"])
                temp["Cost per Policies Sold"] = temp["Current Cost"] / temp["Expected policies sold"]
                temp["avg policies sold per click"] = np.average(temp_df["policies_sold_per_click"])
                temp["Exp_Gains_Inc_Rank_1"] = np.sum(temp_df["exp_gains_inc_rank"])
                temp["Exp_Losses_Dec_Rank_1"] = np.sum(temp_df["exp_loss_dec_rank"])
                temp["Exp_Loss_All_Rank_5"] = -(temp["Expected policies sold"] - np.round(300 * temp_df.iloc[4,7]).astype(int))
                temp["Cost_Increase_per_$1_increase_to_bid"] = np.sum(temp_df["Expected_Num_Cust"] * temp_df["click"])
                pre_df.append(temp)
summary_df = pd.DataFrame(pre_df)

In [12]:
summary_df

Unnamed: 0,Currently Insured,Number of Vehicles,Number of Drivers,Marital Status,Expected Rank,Rank 1 Prob,Rank 2 Prob,Rank 3 Prob,Rank 4 Prob,Rank 5 Prob,(Weighted) Click Prob,(Weighted) Policies Sold Prob,Num. of Customers (Assumption),Expected policies sold,Current Cost,Cost per Policies Sold,avg policies sold per click,Exp_Gains_Inc_Rank_1,Exp_Losses_Dec_Rank_1,Exp_Loss_All_Rank_5,Cost_Increase_per_$1_increase_to_bid
0,unknown,1,1,M,1.648584,0.523282,0.304852,0.171866,0.0,0.0,0.401995,0.202891,300,60,1205.326547,20.088776,0.549236,14,-26,-54,120.532655
1,unknown,1,1,S,1.648584,0.523282,0.304852,0.171866,0.0,0.0,0.401995,0.217835,300,66,1205.326547,18.262523,0.596609,15,-27,-60,120.532655
2,unknown,1,2,M,1.662524,0.525392,0.286692,0.187916,0.0,0.0,0.399923,0.172408,300,51,1201.178116,23.552512,0.45841,12,-22,-46,120.117812
3,unknown,1,2,S,1.662524,0.525392,0.286692,0.187916,0.0,0.0,0.399923,0.185674,300,55,1201.178116,21.839602,0.498674,13,-24,-50,120.117812
4,unknown,2,1,M,1.739222,0.47373,0.313319,0.212951,0.0,0.0,0.372206,0.160929,300,48,1116.18996,23.253958,0.463058,13,-22,-43,111.618996
5,unknown,2,1,S,1.739222,0.47373,0.313319,0.212951,0.0,0.0,0.372206,0.173484,300,52,1116.18996,21.465192,0.503889,14,-22,-47,111.618996
6,unknown,2,2,M,1.684354,0.503722,0.308202,0.188076,0.0,0.0,0.382362,0.140066,300,42,1144.440628,27.248586,0.385128,11,-20,-38,114.444063
7,unknown,2,2,S,1.684354,0.503722,0.308202,0.188076,0.0,0.0,0.382362,0.151355,300,45,1144.440718,25.432016,0.419617,11,-20,-41,114.444072
8,unknown,3,1,M,2.529486,0.184417,0.301838,0.313589,0.200157,0.0,0.23865,0.08786,300,26,715.487313,27.518743,0.386756,14,-13,-22,71.548731
9,unknown,3,1,S,2.529486,0.184417,0.301838,0.313589,0.200157,0.0,0.23865,0.095304,300,29,715.487313,24.671976,0.42149,15,-14,-25,71.548731


### Model predictions

Assuming that each of the 36 classes (all combinations of insured status, number of vehicles, number of drivers, and marital status) has 300 customers, we have 10,800 customers, which is approximately 10,000. For these customers our models predict 838 policies sold, or about 776 per 10,000, which is close to the rate of 780 per 10,000 from our analysis of the actual data.
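The conversion between the 10,800 hypothetical customers and the per-10,000 rate can be sketched as follows. This is a minimal illustration; the helper name `per_10k` is an assumption and does not appear in the notebook.

```python
def per_10k(count: float, population: int) -> float:
    """Rescale a count observed among `population` customers to a per-10,000 rate."""
    return count * 10_000 / population

n_classes = 3 * 3 * 2 * 2        # insured x vehicles x drivers x marital status
population = n_classes * 300     # 300 customers assumed per class
predicted_rate = per_10k(838, population)

print(population, round(predicted_rate))
```

The same helper can be reused for any of the policy counts quoted later, since they are all stated per 10,800 customers first.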

### Ensuring Goals
We first want to meet our requirement of 400 policies sold per 10,000 customers, which translates to about 430 policies sold among the 10,800 hypothetical customers. One main way to achieve this is to not decrease the bids on the following customer demographics:

- Insured: unknown
- Number of Vehicles: 1 or 2
- Number of Drivers: 1 or 2
- Marital Status: M or S

These are the first 8 entries in the summary_df dataframe, and together they account for a predicted 419 policies sold. Since every customer demographic still yields a trickle of policies sold even at rank 5, we can reach 400 policies sold per 10,000 customers by not decreasing the bids on those demographics. If we also include the "unknown" cases with 3 vehicles, the total goes up to an expected 500 policies sold per 10,000 customers.
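As a quick check of the numbers above, the sketch below sums the "Expected policies sold" values of rows 0 to 7 as displayed in the summary output and compares them with the target rescaled to 10,800 customers. The variable names are illustrative only.

```python
# "Expected policies sold" for rows 0-7 of summary_df (unknown, 1-2 vehicles, 1-2 drivers).
expected_policies = [60, 66, 51, 55, 48, 52, 42, 45]

target_per_10k = 400
population = 10_800
target = target_per_10k * population / 10_000   # target expressed per 10,800 customers

total = sum(expected_policies)
shortfall = target - total   # must be covered by the remaining demographics

print(total, target, shortfall)
```

The small remaining gap is what the policies sold in the other demographics, even at rank 5, are expected to cover.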

### Bidding Strategy - Cutting Cost
To optimize for cost, there are two approaches we can take. The first is to cut the bids for demographics whose expected rank is 4 or worse: they generate fewer policies sold, so cutting those costs would free up budget for more lucrative customer demographics. This is a viable option, but it may not free up much cash, because cost comes from clicks, and customers who see us at rank 4 or 5 are less likely to click and therefore already cost little.

The second, and the main thing we should consider, is policies sold per click: lower the bid to $1 for the customer demographics with low policies sold per click. This essentially guarantees rank 5 for those customers, but it lets us be more economical and shift that money into bids on customers who yield more policies sold per click (and so acquire policies more cheaply).
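To see why poorly ranked demographics already cost little, here is a minimal sketch of the cost formula used in the first cell (customers times click probability times bid). As a simplifying assumption it uses the class-level "(Weighted) Click Prob" rather than the per-rank rounded customer counts, so it only approximates the "Current Cost" column.

```python
def expected_cost(n_customers: int, click_prob: float, bid: float) -> float:
    """Expected spend: a bid is only paid when a customer clicks."""
    return n_customers * click_prob * bid

# (unknown, 1 vehicle, 1 driver, M): expected rank about 1.65
cost_top = expected_cost(300, 0.401995, 10)
# (N, 2 vehicles, 1 driver, S): expected rank about 4.32
cost_low = expected_cost(300, 0.061189, 10)

print(round(cost_top, 2), round(cost_low, 2))
```

Both values land close to the corresponding "Current Cost" entries in the summary output, and the poorly ranked class costs only a fraction of the well ranked one at the same bid.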

In [13]:
summary_df.sort_values(by = ["avg policies sold per click"], ascending= False)

Unnamed: 0,Currently Insured,Number of Vehicles,Number of Drivers,Marital Status,Expected Rank,Rank 1 Prob,Rank 2 Prob,Rank 3 Prob,Rank 4 Prob,Rank 5 Prob,(Weighted) Click Prob,(Weighted) Policies Sold Prob,Num. of Customers (Assumption),Expected policies sold,Current Cost,Cost per Policies Sold,avg policies sold per click,Exp_Gains_Inc_Rank_1,Exp_Losses_Dec_Rank_1,Exp_Loss_All_Rank_5,Cost_Increase_per_$1_increase_to_bid
1,unknown,1,1,S,1.648584,0.523282,0.304852,0.171866,0.0,0.0,0.401995,0.217835,300,66,1205.326547,18.262523,0.596609,15,-27,-60,120.532655
25,N,1,1,S,3.520301,0.0,0.189499,0.306385,0.298432,0.205684,0.129203,0.07343,300,22,388.589883,17.663176,0.589597,18,-10,-16,38.858988
0,unknown,1,1,M,1.648584,0.523282,0.304852,0.171866,0.0,0.0,0.401995,0.202891,300,60,1205.326547,20.088776,0.549236,14,-26,-54,120.532655
24,N,1,1,M,3.520301,0.0,0.189499,0.306385,0.298432,0.205684,0.129203,0.067538,300,19,388.5899,20.4521,0.542438,17,-10,-14,38.85899
5,unknown,2,1,S,1.739222,0.47373,0.313319,0.212951,0.0,0.0,0.372206,0.173484,300,52,1116.18996,21.465192,0.503889,14,-22,-47,111.618996
3,unknown,1,2,S,1.662524,0.525392,0.286692,0.187916,0.0,0.0,0.399923,0.185674,300,55,1201.178116,21.839602,0.498674,13,-24,-50,120.117812
29,N,2,1,S,4.317088,0.0,0.0,0.213218,0.256476,0.530306,0.061189,0.031456,300,9,183.628827,20.403203,0.494058,10,-3,-4,18.362883
27,N,1,2,S,2.436136,0.225783,0.281309,0.323896,0.169012,0.0,0.260772,0.119967,300,36,782.439862,21.734441,0.492161,16,-17,-31,78.243986
4,unknown,2,1,M,1.739222,0.47373,0.313319,0.212951,0.0,0.0,0.372206,0.160929,300,48,1116.18996,23.253958,0.463058,13,-22,-43,111.618996
2,unknown,1,2,M,1.662524,0.525392,0.286692,0.187916,0.0,0.0,0.399923,0.172408,300,51,1201.178116,23.552512,0.45841,12,-22,-46,120.117812


The dataframe above is sorted in descending order of "avg policies sold per click", which the code above computes, for each customer demographic, as the plain average over the five ranks of policies sold per click. Across the full ordering, we notice that the "Y" Insured demographics generally have the lowest policies sold per click.

One broad and aggressive strategy would be to set all the bids for these customers to $1. This would save approximately $4,350, which we can redistribute to demographics that yield more policies sold per click. The cost of such a strategy would be a loss of 135 policies from the current estimate of 838, so even after saving $4,350 we would still expect about 700 policies sold per 10,800 customers, or about 650 per 10,000 customers.

If we do not wish to be so aggressive, we can also be more targeted in our bid lowering by going through the demographics in ascending order of "avg policies sold per click".
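One way to make the targeted version concrete is a greedy selection: walk through the classes from the lowest "avg policies sold per click" upward and stop once a chosen loss budget is used up. The sketch below is an assumption about how this could be done, not the notebook's method; `classes_to_cut` and `max_policy_loss` are hypothetical names. Because the "Y" rows are not part of the displayed output, the demo uses four rows that are visible above purely to exercise the function, not as a recommendation to cut them.

```python
import pandas as pd

def classes_to_cut(summary: pd.DataFrame, max_policy_loss: int) -> pd.DataFrame:
    """Select classes in ascending order of policies sold per click
    while the cumulative expected policy loss stays within the budget."""
    ordered = summary.sort_values("avg policies sold per click", ascending=True)
    # Exp_Loss_All_Rank_5 is negative: expected change in policies at rank 5.
    cumulative_loss = (-ordered["Exp_Loss_All_Rank_5"]).cumsum()
    return ordered[cumulative_loss <= max_policy_loss]

# Rows 25, 24, 29 and 27 as displayed in the sorted output.
rows = pd.DataFrame(
    {
        "avg policies sold per click": [0.589597, 0.542438, 0.494058, 0.492161],
        "Exp_Loss_All_Rank_5": [-16, -14, -4, -31],
    },
    index=[25, 24, 29, 27],
)

picked = classes_to_cut(rows, max_policy_loss=40)
print(list(picked.index))
```

Applied to the full `summary_df`, the same function could be restricted to the "Y" rows first.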

In [4]:
np.sum(summary_df[summary_df["Currently Insured"] == "Y"]["Expected policies sold"])

135

In [5]:
np.sum(summary_df["Current Cost"])

20747.463612556458

### Bidding Strategy - Increasing policies sold
To increase policies sold economically, we want to maximize policies sold per click, but we also need to consider where the room for growth is greatest. The dataframe above shows that the higher policies sold per click tend to come from "unknown" and "N" Insured customers with fewer vehicles and fewer drivers. The key difference between the two classes is that "unknown" customers are mostly already at the top ranks, while "N" customers sit at numerically higher, that is worse, ranks. Thus the "N" Insured demographics have more room for growth, since we cannot do better than rank 1, where many "unknown" Insured customers already are.

So the bidding strategy to increase policies sold would be to raise the bids on the "N" Insured customers, possibly skewed towards fewer vehicles and fewer drivers, since there is generally less competition for those than for customers with more vehicles. Additionally, the cost of raising bids by $1 is lower for "N" customers than for "unknown" customers. This will change as ranks improve and we win more of these customers' clicks, but it reflects the potential for growth and shows that the downside is small if the higher bids yield no new policies.

If we are able to improve the rank of each "N" Insured customer by 1 (when possible), we expect an increase of 140 policies sold. This would offset the loss of 135 policies from setting the bids of the "Y" Insured customers to $1. Whether this estimate is realizable is a different question, since we have no further information on the relationship between bids and rank. This should be addressed in the next project, when new data is gathered under this bidding strategy.
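The gain estimate comes from summing the "Exp_Gains_Inc_Rank_1" column over the "N" rows. The sketch below shows that aggregation on the four "N" rows visible in the sorted output (a subset only, so it does not reproduce the full figure of 140) and then combines the totals quoted in the text. The function name `rank_up_gain` is hypothetical.

```python
import pandas as pd

# Subset of summary_df: only the four "N" rows visible in the sorted output.
n_rows = pd.DataFrame(
    {
        "Currently Insured": ["N", "N", "N", "N"],
        "Exp_Gains_Inc_Rank_1": [18, 17, 10, 16],
    },
    index=[25, 24, 29, 27],
)

def rank_up_gain(summary: pd.DataFrame, insured: str) -> int:
    """Expected extra policies if every class in the group moves up one rank."""
    mask = summary["Currently Insured"] == insured
    return int(summary.loc[mask, "Exp_Gains_Inc_Rank_1"].sum())

partial_gain = rank_up_gain(n_rows, "N")

# Net effect using the totals quoted in the text.
baseline, loss_y, gain_n = 838, 135, 140
net_policies = baseline - loss_y + gain_n

print(partial_gain, net_policies)
```

The net figure only holds if the higher bids actually move every "N" class up one rank, which the current data cannot confirm.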

### Bidding Strategy - Increasing information
This bidding strategy treats this project as the first part of an iterative project in which we go through cycles of collecting data and optimizing the bidding strategy. The main piece of information we lack is the relationship between bid and rank. This matters because it indirectly indicates how much the other companies value certain customer demographics, which would help us be more targeted in our bids. We can therefore consider randomizing and redistributing the bids among the "N" and "Y" customers (perhaps drawing them from uniform distributions) to gauge how the other companies value these demographics. It is already clear that the other companies value customers whose Insured status is known, leaning towards "Y", and also customers with more vehicles. Being able to put a bid value on the other companies' preferences would go far in determining whether it is worth competing for these customers.
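A minimal sketch of such a randomized assignment is shown below, drawing one bid per class from a uniform distribution. The bid ranges are placeholders chosen for illustration (below the current $10 bid for "Y", above it for "N"), and `randomized_bids` is a hypothetical helper; neither comes from the notebook.

```python
import numpy as np

def randomized_bids(n_classes: int, low: float, high: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Draw one bid per class uniformly between low and high, rounded to cents."""
    return np.round(rng.uniform(low, high, size=n_classes), 2)

rng = np.random.default_rng(0)           # fixed seed so the assignment is reproducible
n_classes_per_group = 3 * 2 * 2          # vehicles x drivers x marital status

y_bids = randomized_bids(n_classes_per_group, 1.0, 10.0, rng)    # explore lower bids
n_bids = randomized_bids(n_classes_per_group, 10.0, 15.0, rng)   # assumed upper bound

print(y_bids)
print(n_bids)
```

Recording the realized rank for each randomized bid would give the bid-versus-rank data that the current models lack.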

### Possible complete bidding strategy
We can redistribute spending between the "Y" and "N" customers by decreasing bids for "Y" customers and increasing bids for "N" customers. How we go about this depends on how much information we wish to gather for our next model. One possibility is to randomly decrease bids for "Y" customers and randomly increase them for "N" customers, with the amounts calculated to ensure a savings of at most 4350 dollars.

For example, if we want to save 4350 dollars, we can decrease the "Y" customer bids to 1 dollar and leave the "N" customer bids alone. If we want to save 2000 dollars, we can decrease the "Y" bids to 1 dollar and randomly distribute 2350 dollars to the "N" customers (maybe skewed towards those with high policies sold per click), using "Cost_Increase_per_$1_increase_to_bid" as a guide.

In all of this we keep the bids of the "unknown" customers the same, thus ensuring at least 500 policies sold per 10,000 customers.
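The sketch below shows one way to turn a budget into bid increases with "Cost_Increase_per_$1_increase_to_bid" as the guide: split the budget in proportion to "avg policies sold per click", then divide each share by the cost of a $1 increase. This is a first-order estimate under the assumption that click volume stays at its current level, and the proportional weighting is an illustrative choice. The demo spreads the full 2350 dollars over only the four "N" rows visible in the output, so the resulting increases are larger than they would be across all "N" classes.

```python
import numpy as np

def bid_increases(budget: float, weights, cost_per_dollar) -> np.ndarray:
    """Bid increase per class: the budget share divided by the cost of a $1 increase."""
    weights = np.asarray(weights, dtype=float)
    cost_per_dollar = np.asarray(cost_per_dollar, dtype=float)
    dollars = budget * weights / weights.sum()   # share of the budget per class
    return dollars / cost_per_dollar

# Rows 25, 24, 29 and 27 as displayed in the sorted output.
avg_ppc = [0.589597, 0.542438, 0.494058, 0.492161]
cost_per_dollar = [38.858988, 38.85899, 18.362883, 78.243986]

increases = bid_increases(2350, avg_ppc, cost_per_dollar)
spent = float((increases * np.asarray(cost_per_dollar)).sum())

print(np.round(increases, 2), round(spent, 2))
```

Classes with a low cost per $1 increase receive larger bid increases for the same share of the budget, which matches the point that "N" customers are cheap to bid up at their current ranks.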

### Another possible complete bidding strategy
We can set a threshold on the "avg policies sold per click" column to distinguish cost/click-efficient customer demographics from inefficient ones. We then decrease the bids for the inefficient demographics and increase the bids for the efficient ones. Extra conditions can be added to distribute more to the demographics whose expected rank is above 2 (since they have more potential for growth if they can reach rank 1). A good threshold might simply be the midpoint, but if we want to account for the customers whose expected rank is above 2, then a threshold of .36 for "avg policies sold per click" would split those demographics into 2 approximately equal groups.

The latter strategy would have a potential savings of 5000 dollars if we set all bids below the threshold to 1 dollar. This would give an expected 680 policies sold per 10,800 customers, which translates to approximately 630 policies sold per 10,000 customers. This seems to be a more effective approach, preserving more policies sold while generating more savings.
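The threshold rule can be sketched as a simple labelling step. The labels and the function name `label_classes` are illustrative assumptions; the 0.36 threshold and the expected-rank condition come from the text. The demo uses four rows from the displayed output, all of which lie above 0.36 because the lower-scoring classes are not part of the displayed rows.

```python
import pandas as pd

def label_classes(summary: pd.DataFrame, threshold: float = 0.36) -> pd.Series:
    """Label each class as 'cut', 'boost', or 'boost (priority)'."""
    col = "avg policies sold per click"
    action = pd.Series("boost", index=summary.index)
    action[summary[col] < threshold] = "cut"
    # Efficient classes with expected rank above 2 have the most room to grow.
    priority = (summary[col] >= threshold) & (summary["Expected Rank"] > 2)
    action[priority] = "boost (priority)"
    return action

# Rows 0, 6, 8 and 25 as displayed in the outputs above.
rows = pd.DataFrame(
    {
        "Expected Rank": [1.648584, 1.684354, 2.529486, 3.520301],
        "avg policies sold per click": [0.549236, 0.385128, 0.386756, 0.589597],
    },
    index=[0, 6, 8, 25],
)

labels = label_classes(rows)
print(labels)
```

Raising the threshold, for example `label_classes(rows, threshold=0.5)`, moves rows 6 and 8 into the "cut" group, which shows how sensitive the split is to that single parameter.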

### Next Step Considerations
Including the cost of policies would let us go even further and derive measures such as revenue per click, which would be a better measure of profit for the company. In terms of scaling, we can always regress some of the new features we derived from our models, like "avg policies sold per click", on the customer features and set a threshold.