---

# Predicting Lift with Twitter's 1st Party Advertising Pixel (Lower Funnel Click ID)

This project explored the potential lift in conversion tracking an advertiser could see when switching from 3rd party pixels to Twitter's 1st party pixel, specifically for lower funnel events such as site visits, checkouts, and conversions.

By simulating user interactions with synthetic data and applying OLS regression modeling, we aimed to estimate how much additional lower funnel activity advertisers could expect to measure with Twitter's 1st party pixel.

---


## Generation of Synthetic Ad Interaction Data
In this section, we create a synthetic dataset that simulates user interactions with advertisements on Twitter. The focus is on the sequential stages of user engagement, from viewing an ad to converting, where each stage can only be reached through the one before it. The dataset captures the following:

* **Advertiser:** The company running the ad.
* **Device Type:** Whether the user interaction happened on an Android or iOS device.
* **Views:** Total number of times the ad was viewed.
* **Click-Through Rate (CTR):** The fraction of views that led to clicks (stored as a decimal, so 0.02 means 2%).
* **Ad Clicks:** Total number of times the ad was clicked.
* **Site Visits:** Total number of users who visited the landing page after clicking the ad.
* **Proceeded to Checkout:** Total number of users who proceeded to the checkout page after visiting the site.
* **Conversions:** Total number of users who completed a purchase or a desired action after proceeding to checkout.
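Because each column is a subset of the one before it, the funnel is easiest to read as a chain of stage-to-stage rates. The sketch below computes those rates for a single row; the counts are copied from the first row of `before_df` in the sample output further down (Paul-Adams, android), and the `stages`/`stage_rates` names are illustrative only.

```python
import pandas as pd

# Funnel counts for one row of before_df (Paul-Adams, android), copied from the sample output
row = pd.Series({
    "Views": 12835,
    "Ad_Clicks": 258,
    "Site_Visits": 251,
    "Proceeded_To_Checkout": 21,
    "Conversions": 15,
})

# Each rate is the count at one stage divided by the count at the previous stage
stages = ["Views", "Ad_Clicks", "Site_Visits", "Proceeded_To_Checkout", "Conversions"]
stage_rates = {
    f"{prev}->{curr}": row[curr] / row[prev]
    for prev, curr in zip(stages, stages[1:])
}

for name, rate in stage_rates.items():
    print(f"{name}: {rate:.2%}")
```

Note that the realized checkout-to-conversion rate for this row (15 / 21, about 71%) sits below the sampled 75-90% range: every count is truncated to an integer, and on small counts that truncation can pull a realized rate slightly under its sampled rate.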

### Column Calculations:

* **Advertisers:** Generated using the Faker library to simulate 1000 unique company names.
* **Device Type:** Two types, Android and iOS.
* **Views:** A random integer from 10,000 up to (but not including) 20,000.
* **CTR:** Randomly generated between 1% and 3% for both device types.
* **Ad Clicks:** Views × CTR, truncated to an integer (all downstream counts are truncated the same way).
* **Site Visits:** A device-dependent percentage of Ad Clicks: between 95% and 99% for Android, and a slightly lower 90% to 95% for iOS.
* **Proceeded to Checkout:** Between 5% and 15% of Site Visits.
* **Conversions:** Between 75% and 90% of users who proceeded to checkout.
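The sampling rules above can also be expressed in vectorized form. The sketch below is an alternative to the row-by-row loop in the notebook cell that follows, not the project's own code: `generate_funnel` is a hypothetical helper, the advertiser names are placeholders instead of Faker output, and it swaps the global `np.random` functions for a seeded `np.random.default_rng` Generator so the data is reproducible.

```python
import numpy as np
import pandas as pd


def generate_funnel(advertisers, seed=0):
    """Hypothetical helper: vectorized version of the per-row sampling rules above."""
    rng = np.random.default_rng(seed)  # seeded Generator, so reruns give identical data
    df = pd.DataFrame({
        "Advertiser": np.repeat(advertisers, 2),  # each advertiser appears once per device
        "Device_Type": np.tile(["android", "ios"], len(advertisers)),
    })
    n = len(df)
    is_android = (df["Device_Type"] == "android").to_numpy()

    # Views: integers in [10,000, 20,000); CTR: uniform 1-3% for both devices
    df["Views"] = rng.integers(10_000, 20_000, size=n)
    df["Click_Through_Rate"] = rng.uniform(0.01, 0.03, size=n)
    df["Ad_Clicks"] = (df["Views"] * df["Click_Through_Rate"]).astype(int)

    # Landing-page rate depends on device: 95-99% on Android, 90-95% on iOS
    landing_rate = np.where(is_android,
                            rng.uniform(0.95, 0.99, size=n),
                            rng.uniform(0.90, 0.95, size=n))
    df["Site_Visits"] = (df["Ad_Clicks"] * landing_rate).astype(int)

    # Checkout: 5-15% of site visits; conversions: 75-90% of checkouts
    df["Proceeded_To_Checkout"] = (df["Site_Visits"] * rng.uniform(0.05, 0.15, size=n)).astype(int)
    df["Conversions"] = (df["Proceeded_To_Checkout"] * rng.uniform(0.75, 0.90, size=n)).astype(int)
    return df


sample = generate_funnel(["Acme Corp", "Globex"], seed=42)
print(sample)
```

Sampling whole columns at once avoids a Python-level loop over 2,000 rows, and `astype(int)` truncates toward zero exactly as `int()` does for these positive values.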

---

```python
import pandas as pd
import numpy as np
from faker import Faker
import dataframe_image as dfi

# Initialize Faker
fake = Faker()

# Generate 1000 unique company names (fake.unique prevents duplicate names)
advertisers = [fake.unique.company() for _ in range(1000)]
device_types = ['android', 'ios']
data_list = []

for advertiser in advertisers:
    for device in device_types:

        # Generate views (random integer from 10,000 to 19,999; randint excludes the upper bound)
        views = np.random.randint(10000, 20000)

        # Generate click-through rate (CTR) uniformly for both devices (1-3%)
        click_through_rate = np.random.uniform(0.01, 0.03)

        # Ad clicks = views x CTR, truncated to an integer
        ad_clicks = int(views * click_through_rate)

        # Generate site visits based on device type
        if device == 'android':
            landing_page_rate = np.random.uniform(0.95, 0.99)
        else:  # iOS
            landing_page_rate = np.random.uniform(0.90, 0.95)
        site_visits = int(ad_clicks * landing_page_rate)

        # Generate proceeded to checkout as a percentage of site visits (5-15%)
        proceeded_to_checkout_rate = np.random.uniform(0.05, 0.15)
        proceeded_to_checkout = int(site_visits * proceeded_to_checkout_rate)

        # Generate conversions as a high percentage of proceeded_to_checkout (75-90%)
        conversion_rate = np.random.uniform(0.75, 0.90)
        conversions = int(proceeded_to_checkout * conversion_rate)

        # Append the generated row to our list
        data_list.append([advertiser, device, views, click_through_rate, ad_clicks,
                          site_visits, proceeded_to_checkout, conversions])

# Convert the list to a DataFrame
columns = ['Advertiser',
           'Device_Type',
           'Views',
           'Click_Through_Rate',
           'Ad_Clicks',
           'Site_Visits',
           'Proceeded_To_Checkout',
           'Conversions']
before_df = pd.DataFrame(data_list, columns=columns)
before_df.head()
```

|   | Advertiser | Device_Type | Views | Click_Through_Rate | Ad_Clicks | Site_Visits | Proceeded_To_Checkout | Conversions |
|---|---|---|---|---|---|---|---|---|
| 0 | Paul-Adams | android | 12835 | 0.020149 | 258 | 251 | 21 | 15 |
| 1 | Paul-Adams | ios | 16154 | 0.015738 | 254 | 235 | 34 | 26 |
| 2 | Gay-Wilson | android | 17187 | 0.012735 | 218 | 215 | 22 | 19 |
| 3 | Gay-Wilson | ios | 19078 | 0.026093 | 497 | 458 | 58 | 45 |
| 4 | Meyers-Kennedy | android | 17374 | 0.022476 | 390 | 382 | 53 | 46 |


```python
# Create a deep copy of before_df so the original data is not modified
after_df = before_df.copy()

# Lift multiplier ranges for each column and device type after the 1PP is implemented
# (e.g. 1.05 means a 5% increase)
lift_ranges = {
    'Site_Visits': {'android': (1.05, 1.10), 'ios': (1.10, 1.25)},
    'Proceeded_To_Checkout': {'android': (1.05, 1.10), 'ios': (1.07, 1.15)},
    'Conversions': {'android': (1.05, 1.10), 'ios': (1.05, 1.15)}
}

# Apply an independent random lift to each column in each row.
# `row` is a snapshot of the original values, so lifts are not compounded across columns.
for index, row in after_df.iterrows():
    device_type = row['Device_Type']
    for column in lift_ranges.keys():
        lower_bound, upper_bound = lift_ranges[column][device_type]
        lift = np.random.uniform(lower_bound, upper_bound)
        after_df.at[index, column] = int(row[column] * lift)

after_df.head()
```


|   | Advertiser | Device_Type | Views | Click_Through_Rate | Ad_Clicks | Site_Visits | Proceeded_To_Checkout | Conversions |
|---|---|---|---|---|---|---|---|---|
| 0 | Paul-Adams | android | 12835 | 0.020149 | 258 | 267 | 22 | 16 |
| 1 | Paul-Adams | ios | 16154 | 0.015738 | 254 | 274 | 36 | 29 |
| 2 | Gay-Wilson | android | 17187 | 0.012735 | 218 | 231 | 23 | 19 |
| 3 | Gay-Wilson | ios | 19078 | 0.026093 | 497 | 565 | 64 | 49 |
| 4 | Meyers-Kennedy | android | 17374 | 0.022476 | 390 | 402 | 56 | 49 |
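The `iterrows()` loop above draws one multiplier per row and column. A vectorized equivalent is sketched below as an assumption-labeled alternative, not the project's own code: `apply_lift` is a hypothetical helper, it uses a seeded `np.random.default_rng` Generator instead of the global `np.random` state, and it is run on the five `before_df` rows shown earlier (restricted to the columns the lift touches).

```python
import numpy as np
import pandas as pd

# Same multiplier ranges as lift_ranges in the notebook cell above (1.05 = +5%)
LIFT_RANGES = {
    "Site_Visits": {"android": (1.05, 1.10), "ios": (1.10, 1.25)},
    "Proceeded_To_Checkout": {"android": (1.05, 1.10), "ios": (1.07, 1.15)},
    "Conversions": {"android": (1.05, 1.10), "ios": (1.05, 1.15)},
}


def apply_lift(before, lift_ranges, seed=0):
    """Hypothetical helper: vectorized equivalent of the iterrows() loop above."""
    rng = np.random.default_rng(seed)
    after = before.copy()
    for column, by_device in lift_ranges.items():
        # Look up each row's lower/upper multiplier from its device type
        low = before["Device_Type"].map({d: b[0] for d, b in by_device.items()}).to_numpy()
        high = before["Device_Type"].map({d: b[1] for d, b in by_device.items()}).to_numpy()
        # One independent draw per row, applied to the original (un-lifted) count
        lift = rng.uniform(low, high)
        after[column] = (before[column].to_numpy() * lift).astype(int)
    return after


# The five rows of before_df shown above (only the columns the lift touches)
before = pd.DataFrame({
    "Device_Type": ["android", "ios", "android", "ios", "android"],
    "Site_Visits": [251, 235, 215, 458, 382],
    "Proceeded_To_Checkout": [21, 34, 22, 58, 53],
    "Conversions": [15, 26, 19, 45, 46],
})
after = apply_lift(before, LIFT_RANGES, seed=7)
print(after)
```

Because every multiplier is at least 1.05 and the counts are integers, truncating `count * lift` can never drop below the original count, so every lifted value is greater than or equal to its "before" value.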


--- 

## Assessing the Impact of the First-Party Pixel
To quantify the effectiveness of our first-party pixel (1PP) solution, we evaluate the difference in user interactions before and after its implementation. Because Views, Click-Through Rate, and Ad Clicks are identical in both datasets, the per-row difference isolates the additional site visits, checkouts, and conversions recorded with the 1PP; in this simulation, those differences are driven entirely by the lift ranges defined above. These differences are the input to the regression analysis that follows and indicate the uplift advertisers might experience with the 1PP.
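A simple way to summarize the uplift is the pooled percentage lift per device: total "after" count divided by total "before" count, minus one. The sketch below computes it with `groupby` on the five rows of `before_df` and `after_df` shown above; it is an illustrative summary that the original analysis does not itself compute, and `before`, `after`, and `metrics` are names chosen for this example.

```python
import pandas as pd

metrics = ["Site_Visits", "Proceeded_To_Checkout", "Conversions"]

# The five rows shown in the before_df / after_df outputs above
before = pd.DataFrame({
    "Device_Type": ["android", "ios", "android", "ios", "android"],
    "Site_Visits": [251, 235, 215, 458, 382],
    "Proceeded_To_Checkout": [21, 34, 22, 58, 53],
    "Conversions": [15, 26, 19, 45, 46],
})
after = before.assign(
    Site_Visits=[267, 274, 231, 565, 402],
    Proceeded_To_Checkout=[22, 36, 23, 64, 56],
    Conversions=[16, 29, 19, 49, 49],
)

# Pooled lift per device: sum of "after" counts over sum of "before" counts, minus 1
before_totals = before.groupby("Device_Type")[metrics].sum()
after_totals = after.groupby("Device_Type")[metrics].sum()
lift = after_totals / before_totals - 1
print(lift.round(3))
```

On these five rows, iOS site visits rise by about 21% (146 extra visits on 693) versus about 6% on Android (52 on 848), consistent with the wider iOS lift range. A ratio of sums weights large advertisers more heavily than averaging per-row ratios would; either choice is defensible, but it should be stated when reporting lift.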

---



```python
# Metric columns to compare between the two datasets
metric_columns = ['Views', 'Click_Through_Rate', 'Ad_Clicks',
                  'Site_Visits', 'Proceeded_To_Checkout', 'Conversions']

# Calculate the difference between the 'after' and 'before' datasets
difference_df = after_df.copy()
difference_df[metric_columns] = after_df[metric_columns] - before_df[metric_columns]
difference_df
```

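|   | Advertiser | Device_Type | Views | Click_Through_Rate | Ad_Clicks | Site_Visits | Proceeded_To_Checkout | Conversions |
|---|---|---|---|---|---|---|---|---|
| 0 | Paul-Adams | android | 0 | 0.0 | 0 | 16 | 1 | 1 |
| 1 | Paul-Adams | ios | 0 | 0.0 | 0 | 39 | 2 | 3 |
| 2 | Gay-Wilson | android | 0 | 0.0 | 0 | 16 | 1 | 0 |
| 3 | Gay-Wilson | ios | 0 | 0.0 | 0 | 107 | 6 | 4 |
| 4 | Meyers-Kennedy | android | 0 | 0.0 | 0 | 20 | 3 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | Briggs-Wiggins | ios | 0 | 0.0 | 0 | 36 | 3 | 1 |
| 1996 | Moore-Moran | android | 0 | 0.0 | 0 | 18 | 1 | 1 |
| 1997 | Moore-Moran | ios | 0 | 0.0 | 0 | 42 | 4 | 3 |
| 1998 | White Inc | android | 0 | 0.0 | 0 | 15 | 2 | 2 |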


## Regression Analysis on Conversion Impact

In this section, we quantify the relationship between the incremental site visits, incremental checkouts, and incremental conversions recorded with the 1PP (the columns of `difference_df`). This shows how strongly each upstream gain is associated with additional conversions, and the fitted model could potentially be used for future predictions or optimizations.

To do this, we fit an ordinary least squares (OLS) linear regression with the `statsmodels` formula API. The regression formula is:

$$
\text{Conversions} = \beta_0 + \beta_1 \times \text{Site\_Visits} + \beta_2 \times \text{Proceeded\_To\_Checkout} + \epsilon
$$

Where:
- $\beta_0$ is the intercept.
- $\beta_1$ is the coefficient that quantifies the change in conversions for a one-unit change in site visits, keeping everything else constant.
- $\beta_2$ is the coefficient that quantifies the change in conversions for a one-unit change in the number of users who proceeded to checkout, keeping everything else constant.
- $\epsilon$ represents the error term.
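To make the OLS criterion concrete, the sketch below solves the same three-coefficient model with `numpy.linalg.lstsq`, which minimizes the sum of squared residuals $\lVert y - X\beta \rVert^2$, instead of `statsmodels`. As an illustrative assumption, it uses only the nine rows of `difference_df` visible in the output above, so its coefficients will not match the full 2,000-row fit.

```python
import numpy as np

# (Site_Visits, Proceeded_To_Checkout, Conversions) differences from the nine
# rows of difference_df shown above; a tiny illustrative subset, not the full data
data = np.array([
    [16, 1, 1],
    [39, 2, 3],
    [16, 1, 0],
    [107, 6, 4],
    [20, 3, 3],
    [36, 3, 1],
    [18, 1, 1],
    [42, 4, 3],
    [15, 2, 2],
], dtype=float)

site_visits, checkout, y = data[:, 0], data[:, 1], data[:, 2]

# Design matrix: a column of ones for beta_0, then the two predictors
X = np.column_stack([np.ones_like(site_visits), site_visits, checkout])

# Least-squares solution for [beta_0, beta_1, beta_2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
print("beta_0, beta_1, beta_2 =", beta.round(4))
```

A quick sanity check on any OLS fit with an intercept: the residuals sum to zero and are orthogonal to each predictor, i.e. $X^\top (y - X\hat\beta) = 0$.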

Let's fit this model to our data and examine the results:


```python
import statsmodels.formula.api as smf

# Define the formula for the model
formula = 'Conversions ~ Site_Visits + Proceeded_To_Checkout'

# Fit the model
model = smf.ols(formula, data=difference_df).fit()

# Print the summary
print(model.summary())
```


```
                            OLS Regression Results                            
Dep. Variable:            Conversions   R-squared:                       0.571
Model:                            OLS   Adj. R-squared:                  0.571
Method:                 Least Squares   F-statistic:                     1331.
Date:                Tue, 22 Aug 2023   Prob (F-statistic):               0.00
Time:                        09:01:48   Log-Likelihood:                -2343.9
No. Observations:                2000   AIC:                             4694.
Df Residuals:                    1997   BIC:                             4711.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
Intercept                 0.14
```