### Analyzing Ad Events

This notebook combines exploratory data analysis of `ad_events.csv` with an analysis of ad performance.

In [1]:
# necessary libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# load all of the datasets
# EDA starts with ad_events alone; the tables are merged later to analyze ad performance
path = os.getcwd()

# os.path.join builds the paths portably instead of concatenating strings
events = pd.read_csv(os.path.join(path, 'data', 'ad_events.csv'))
ads = pd.read_csv(os.path.join(path, 'data', 'ads.csv'))
campaigns = pd.read_csv(os.path.join(path, 'data', 'campaigns.csv'))
users = pd.read_csv(os.path.join(path, 'data', 'users.csv'))

In [4]:
# show first five rows of events
events.head()

Unnamed: 0,event_id,ad_id,user_id,timestamp,day_of_week,time_of_day,event_type
0,1,197,2359b,2025-07-26 00:19:56,Saturday,Night,Like
1,2,51,f9c67,2025-06-15 08:28:07,Sunday,Morning,Share
2,3,46,5b868,2025-06-27 00:40:02,Friday,Night,Impression
3,4,166,3d440,2025-06-05 19:20:45,Thursday,Evening,Impression
4,5,52,68f1a,2025-07-22 08:30:29,Tuesday,Morning,Impression


From `users_eda.ipynb`, a small portion of user ids appear to each represent two different users. We first want to find what proportion of ad events involve any of these repeated user ids.

In [13]:
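# find repeated user ids (ids that occur more than once in users)
# indexing the counts directly avoids relying on the 'count' column name,
# which only exists in pandas >= 2.0
id_counts = users['user_id'].value_counts()
repeat_ids = id_counts[id_counts > 1].index.tolist()

# filter events for ad events involving a repeat id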
repeat_events = events[events['user_id'].isin(repeat_ids)]
print(f'Ad events involving a repeat user id: {len(repeat_events)}')
print(f'Proportion of ad events involving a repeat id: {(100 * len(repeat_events) / len(events)):.2f}%')

Ad events involving a repeat user id: 3967
Proportion of ad events involving a repeat id: 0.99%
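
Before dropping anything, it can help to look at the conflicting rows themselves. The following is a minimal sketch on toy data, not the real `users.csv`: the ids and the `age_group` column are made up for illustration. It uses `duplicated(keep=False)`, which flags every row whose `user_id` occurs more than once.

```python
import pandas as pd

# Toy stand-in for users.csv; the ids and 'age_group' are hypothetical
toy_users = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1', 'u3'],
    'age_group': ['18-24', '25-34', '35-44', '18-24'],
})

# keep=False marks all rows of a repeated id, not just the later copies
dup_rows = toy_users[toy_users['user_id'].duplicated(keep=False)].sort_values('user_id')

# the repeated ids themselves, one entry per id
toy_repeat_ids = dup_rows['user_id'].unique().tolist()

print(dup_rows)
print(toy_repeat_ids)
```

On the real table, the same filter would show whether the rows sharing an id really differ in their attributes.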


The proportion of ad events involving one of these repeated ids is less than one percent. Therefore, we can reasonably remove the entries involving these ids without adding a significant amount of bias.
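
The small share is the main argument, but one optional extra check is whether the events being dropped look different from the rest. The sketch below is an assumption about how such a check might look, not something the notebook itself runs: `compare_event_mix` is a hypothetical helper, and the data is a toy frame with the same `user_id` and `event_type` columns as `events`.

```python
import pandas as pd

def compare_event_mix(events: pd.DataFrame, repeat_ids) -> pd.DataFrame:
    """Share of each event_type (in %) for rows with and without a repeated user_id."""
    is_repeat = events['user_id'].isin(repeat_ids)
    mix = pd.DataFrame({
        'repeat_ids': events.loc[is_repeat, 'event_type'].value_counts(normalize=True),
        'other_ids': events.loc[~is_repeat, 'event_type'].value_counts(normalize=True),
    }).fillna(0) * 100
    # large gaps here would suggest the dropped events are not a random slice
    mix['abs_diff'] = (mix['repeat_ids'] - mix['other_ids']).abs()
    return mix

# Toy data only; user 'a' plays the role of a repeated id
toy_events = pd.DataFrame({
    'user_id': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
    'event_type': ['Impression', 'Like', 'Impression', 'Like',
                   'Impression', 'Impression', 'Impression', 'Share'],
})
toy_mix = compare_event_mix(toy_events, repeat_ids=['a'])
print(toy_mix.round(2))
```

If the two columns are close for every event type on the real data, that supports treating the removal as low risk; with only eight toy rows the differences here are naturally large.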

In [15]:
# filter events to exclude repeat user_ids
events = events[~events['user_id'].isin(repeat_ids)]

# ensure correct number of events are left (should be 396033)
print(f'Number of ad events after removing repeat user_ids: {len(events)}')

Number of ad events after removing repeat user_ids: 396033
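
As a quick consistency check on the printed numbers: the removed and remaining events should add up to the original row count, and the removed share should match the 0.99% reported earlier. The total below is implied by the two printed counts rather than shown anywhere in the notebook, so treat it as a derived figure.

```python
removed = 3967        # events involving a repeat user id (printed above)
remaining = 396033    # events left after filtering (printed above)

# implied size of the original events table, assuming the filter is a strict partition
total = removed + remaining
share_removed = 100 * removed / total

print(total, f'{share_removed:.2f}%')  # 400000 0.99%
```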
