# Spreadsheet columns
- Publisher - org displaying the ad
- Advertiser - brand paying for ad
- Campaign - marketing initiative
- Imps - number of times the ad was served, whether or not it was viewed
- Viewable imps - number of times the ad was visible on screen
- Clicks - number of clicks
- Dsp total cost USD - demand-side platform (DSP) cost: what the advertiser pays to place ads, including platform fees etc.
- Dsp media cost USD - what the advertiser pays to place ads, excluding platform fees
- Ssp media cost USD - supply-side platform (SSP) cost: fee (%) taken by publisher on any revenue generated by ads
    - is this separate from the demand-side platform costs?
- Pc convs - conversions after click
- Total convs - conversions
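- Adstxt verified imps - impressions sold through sellers listed in the publisher's ads.txt file (authorized inventory; not by itself proof that a real person saw the ad)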

In [77]:
import numpy as np
import pandas as pd

In [78]:
dataset = "../data/dataset.csv"
df = pd.read_csv(dataset, index_col=None, thousands=',')
df.head()
df.keys()

Index(['Datetime', 'Publisher', 'Advertiser', 'Campaign', 'Imps',
       'Viewable Imps', 'Clicks', 'Dsp Total Cost USD', 'Dsp Media Cost USD',
       'Ssp Media Cost USD', 'Pc Convs', 'Total Convs',
       'Adstxt Verified Imps'],
      dtype='object')

# Set dtypes

In [79]:
df["Datetime"] = pd.to_datetime(df["Datetime"])
df["Publisher"] = df["Publisher"].astype("category")
df["Advertiser"] = df["Advertiser"].astype("category")
df["Campaign"] = df["Campaign"].astype("category")


# Convert datetime to day of week and hour of day

In [80]:
day, hour = df["Datetime"].dt.dayofweek, df["Datetime"].dt.hour

df["Day of week"] = day.astype("category")
df["Hour of day"] = hour.astype("category")
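As a quick sanity check on the encoding, here is a minimal sketch on two made-up timestamps (not rows from the dataset). It shows that `dt.dayofweek` numbers the days from Monday = 0 to Sunday = 6, and that `dt.day_name()` is available if readable labels are preferred.

```python
import pandas as pd

# Toy timestamps, invented for illustration: a Monday and a Sunday.
ts = pd.Series(pd.to_datetime(["2024-01-01 09:30", "2024-01-07 23:15"]))

day_numbers = ts.dt.dayofweek  # Monday=0 ... Sunday=6
day_names = ts.dt.day_name()   # readable labels for the same days
hours = ts.dt.hour             # hour of day, 0-23

print(day_numbers.tolist(), day_names.tolist(), hours.tolist())
```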

# Add verified impression rate

In [81]:
df["VerifiedImpRate"] = df["Adstxt Verified Imps"].astype(int) / df["Imps"].astype(int)

# Add conversion / verified impression rate

In [82]:
df["ConvPerVerImp"] = df["Total Convs"].astype(int) / df["Adstxt Verified Imps"].astype(int)

### Ssp rates

In [83]:
df["SspCostPerConv"] = df["Ssp Media Cost USD"].astype(float) / df["Total Convs"].astype(int)

### Dsp rates

In [84]:
# Dsp total cost per conversion
df["DspTotalCostPerConv"] = df["Dsp Total Cost USD"].astype(float) / df["Total Convs"].astype(int)

# Dsp total cost per verified impression
df["DspTotalCostPerVerImp"] = df["Dsp Total Cost USD"].astype(float) / df["Adstxt Verified Imps"].astype(int)

# Overall rates

In [85]:
# Conversions and verified impressions bought per USD of Dsp total cost
df["ConversionsPerUSD"] = df["Total Convs"].astype(float) / (df["Dsp Total Cost USD"].astype(float))
df["VerImpPerUSD"] = df["Adstxt Verified Imps"].astype(float) / (df["Dsp Total Cost USD"].astype(float))


# Handle infinite values and save csv

In [86]:
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.to_csv("../data/engineered-dataset.csv", index=False)
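The replacement is needed because the rate columns divide by counts that can be zero. A minimal sketch on a toy frame (the column names `cost` and `convs` are made up for illustration) shows what pandas produces in that case and what the cleanup does.

```python
import numpy as np
import pandas as pd

# Toy data: one normal row, one with cost but no conversions, one all-zero row.
toy = pd.DataFrame({"cost": [10.0, 5.0, 0.0], "convs": [2, 0, 0]})

# 10/2 is an ordinary ratio, 5/0 becomes inf, 0/0 becomes NaN (no exception is raised).
toy["cost_per_conv"] = toy["cost"] / toy["convs"]

# Same cleanup as above, written without inplace=True.
cleaned = toy.replace([np.inf, -np.inf], np.nan)

print(cleaned)
```

After the replacement, both undefined rates are NaN, which pandas skips by default in aggregations such as `mean()`.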

# Finding key insights

In [87]:
# Optional filter: keep only rows above the 5th percentile of verified impressions
# threshold = df['Adstxt Verified Imps'].quantile(0.05)
# df_filtered = df[df['Adstxt Verified Imps'] > threshold]


In [88]:
# Mean Dsp total cost per verified impression for each publisher, cheapest first
publisher_cost = df.groupby("Publisher", observed=False)["DspTotalCostPerVerImp"].mean().sort_values(ascending=True)
publisher_cost.head(5)

Publisher
Chegg Inc                 0.000342
SpilGames                 0.000425
Hazo Digital Marketing    0.000543
OLX Group                 0.000958
NewsNow Publishing        0.001139
Name: DspTotalCostPerVerImp, dtype: float64
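One caveat when reading this ranking: it averages per-row rates, so a row with a handful of impressions counts as much as a row with many thousands. The sketch below uses a made-up publisher `P1` and invented numbers to contrast that mean of ratios with a ratio of sums; whether the second is the better metric here is a judgment call, not something the dataset settles.

```python
import pandas as pd

# Toy data: one tiny row and one large row for the same hypothetical publisher.
toy = pd.DataFrame({
    "Publisher": ["P1", "P1"],
    "cost": [1.0, 100.0],
    "ver_imps": [10, 100_000],
})
toy["cost_per_imp"] = toy["cost"] / toy["ver_imps"]  # 0.1 and 0.001

# Mean of per-row rates: every row has equal weight.
mean_of_ratios = toy.groupby("Publisher")["cost_per_imp"].mean()

# Ratio of sums: total cost over total impressions, so large rows dominate.
sums = toy.groupby("Publisher")[["cost", "ver_imps"]].sum()
ratio_of_sums = sums["cost"] / sums["ver_imps"]

print(mean_of_ratios["P1"], ratio_of_sums["P1"])
```

Here the mean of ratios is about 0.05 while the ratio of sums is about 0.001, a fifty-fold gap driven entirely by the ten-impression row.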

In [89]:
df_brand_b = df[df["Advertiser"] == "Brand B"]

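# observed=False is stated explicitly: both keys are categorical, so every Publisher x Campaign pair is kept
campaign_performance = df_brand_b.groupby(["Publisher", "Campaign"], observed=False)["ConversionsPerUSD"].mean().reset_index()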

# Sort by ConversionsPerUSD in descending order
campaign_performance_sorted = campaign_performance.sort_values(by="ConversionsPerUSD", ascending=False)

# Display the top campaigns
print(campaign_performance_sorted.head())


                   Publisher Campaign  ConversionsPerUSD
18102              SofaScore      B18         149.666582
15480          Perform Group      B18         127.607487
7683            Futbol Sites      B18          80.823139
21000  Undisclosed publisher      B18          63.428101
6855              Fandom Inc      B18          55.872926
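The large index values in this output (for example 18102 and 21000) are probably a side effect of grouping on categorical columns: with `observed=False`, pandas emits a row for every combination of categories, including pairs that never occur, and those get NaN means. A minimal sketch on toy data (publisher and campaign labels invented) shows the difference between the two settings.

```python
import pandas as pd

# Toy data: two categorical keys, only two of the four possible pairs occur.
toy = pd.DataFrame({
    "Publisher": pd.Categorical(["P1", "P2"]),
    "Campaign": pd.Categorical(["C1", "C2"]),
    "value": [1.0, 2.0],
})

# observed=False: one row per category combination, unseen pairs are NaN.
all_pairs = toy.groupby(["Publisher", "Campaign"], observed=False)["value"].mean()

# observed=True: only the pairs that actually appear in the data.
seen_pairs = toy.groupby(["Publisher", "Campaign"], observed=True)["value"].mean()

print(len(all_pairs), len(seen_pairs))
```

Because `sort_values` places NaN last by default, the top rows are the same either way; `observed=True` mainly gives a smaller result with a tidier index.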


