# A/B TESTING ANALYSIS

# OBJECTIVE
This analysis determines which version of an advertisement (Ad A or Ad B) performs better on user engagement and revenue generation. It compares two metrics between the ad versions: Click-Through Rate (CTR), the number of clicks divided by the number of impressions, and Earnings per Click (EPC), the earnings divided by the number of clicks. The result supports a data-driven decision about which ad version to use to maximize overall campaign performance.
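As a small worked example of the two metrics, the sketch below computes CTR and EPC for a single observation. The function names `click_through_rate` and `earnings_per_click` are illustrative and not part of the notebook; the inputs are rounded values from the first row of the dataset preview in the notebook output.

```python
def click_through_rate(clicks: float, impressions: float) -> float:
    """Share of impressions that resulted in a click."""
    return clicks / impressions


def earnings_per_click(earnings: float, clicks: float) -> float:
    """Average revenue generated by one click."""
    return earnings / clicks


# Rounded values from the first row of the dataset preview
impressions, clicks, earnings = 82529.46, 6090.08, 2311.28

ctr = click_through_rate(clicks, impressions)
epc = earnings_per_click(earnings, clicks)
print(f"CTR: {ctr:.4f}")
print(f"EPC: {epc:.4f}")
```

Both metrics are ratios, so a row with zero impressions or zero clicks would need to be handled before dividing.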

In [12]:
import pandas as pd
# Load the dataset
df = pd.read_csv('/content/ab_testing_control.csv')

# Display the first few rows
print(df.head())


   Unnamed: 0     Impression        Click    Purchase      Earning
0           0   82529.459271  6090.077317  665.211255  2311.277143
1           1   98050.451926  3382.861786  315.084895  1742.806855
2           2   82696.023549  4167.965750  458.083738  1797.827447
3           3  109914.400398  4910.882240  487.090773  1696.229178
4           4  108457.762630  5987.655811  441.034050  1543.720179
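The `Unnamed: 0` column usually appears when a CSV was saved together with its row index. Assuming that is the case here (the file itself is not inspected in this sketch), passing `index_col=0` to `pd.read_csv` keeps it out of the data columns. The in-memory `csv_text` below is a stand-in built from the first two preview rows.

```python
import io

import pandas as pd

# Two rows in the same layout as the preview above; the leading empty
# header is what pandas reports as "Unnamed: 0".
csv_text = (
    ",Impression,Click,Purchase,Earning\n"
    "0,82529.459271,6090.077317,665.211255,2311.277143\n"
    "1,98050.451926,3382.861786,315.084895,1742.806855\n"
)

# index_col=0 tells read_csv to use the first column as the row index
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```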


In [13]:
# Display the first few rows to understand the structure
print(df.head())

# Check for missing values
print(df.isnull().sum())

# Drop any rows with missing data (a no-op when the check above reports zero missing values)
df = df.dropna()

# Verify that the data has been cleaned
print(df.isnull().sum())

   Unnamed: 0     Impression        Click    Purchase      Earning
0           0   82529.459271  6090.077317  665.211255  2311.277143
1           1   98050.451926  3382.861786  315.084895  1742.806855
2           2   82696.023549  4167.965750  458.083738  1797.827447
3           3  109914.400398  4910.882240  487.090773  1696.229178
4           4  108457.762630  5987.655811  441.034050  1543.720179
Unnamed: 0    0
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64
Unnamed: 0    0
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64


In [14]:
print(df.columns)


Index(['Unnamed: 0', 'Impression', 'Click', 'Purchase', 'Earning'], dtype='object')


In [15]:
df.columns = df.columns.str.strip()


In [16]:
print(df.head())


   Unnamed: 0     Impression        Click    Purchase      Earning
0           0   82529.459271  6090.077317  665.211255  2311.277143
1           1   98050.451926  3382.861786  315.084895  1742.806855
2           2   82696.023549  4167.965750  458.083738  1797.827447
3           3  109914.400398  4910.882240  487.090773  1696.229178
4           4  108457.762630  5987.655811  441.034050  1543.720179


In [17]:
# Placeholder split: the first half of the rows is treated as Ad A and the second half as Ad B.
# Replace this with the real group assignment for your scenario.
split_index = len(df) // 2
# .copy() gives each group its own DataFrame, so adding columns later does not trigger SettingWithCopyWarning
group_a = df.iloc[:split_index].copy()
group_b = df.iloc[split_index:].copy()


In [18]:
# Calculate Click-Through Rate (CTR) and Earnings per Click (EPC) for group A
group_a['CTR'] = group_a['Click'] / group_a['Impression']
group_a['EPC'] = group_a['Earning'] / group_a['Click']


In [19]:
# Same metrics for group B
group_b['CTR'] = group_b['Click'] / group_b['Impression']
group_b['EPC'] = group_b['Earning'] / group_b['Click']
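Before running a significance test, it can help to look at the per-group mean and spread of each metric. The sketch below is one way to do that, not the notebook's own step. It uses the five preview rows (rounded), and the `group` labels are hypothetical, added only for illustration, since the dataset has no group column.

```python
import pandas as pd

# The five preview rows from the notebook output (values rounded)
preview = pd.DataFrame({
    "Impression": [82529.46, 98050.45, 82696.02, 109914.40, 108457.76],
    "Click": [6090.08, 3382.86, 4167.97, 4910.88, 5987.66],
    "Earning": [2311.28, 1742.81, 1797.83, 1696.23, 1543.72],
})

# Hypothetical labels, for illustration only: the dataset has no group column
preview["group"] = ["A", "A", "B", "B", "B"]

preview["CTR"] = preview["Click"] / preview["Impression"]
preview["EPC"] = preview["Earning"] / preview["Click"]

# Mean, standard deviation and row count per group for each metric
summary = preview.groupby("group")[["CTR", "EPC"]].agg(["mean", "std", "count"])
print(summary)
```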


In [21]:
from scipy import stats
# Independent two-sample t-tests on per-row CTR and EPC.
# ttest_ind assumes equal variances by default (Student's t-test).
ctr_ttest = stats.ttest_ind(group_a['CTR'].dropna(), group_b['CTR'].dropna())
epc_ttest = stats.ttest_ind(group_a['EPC'].dropna(), group_b['EPC'].dropna())

In [22]:
print("CTR t-test result:", ctr_ttest)
print("EPC t-test result:", epc_ttest)

CTR t-test result: TtestResult(statistic=1.2273520826105135, pvalue=0.22723974339450542, df=38.0)
EPC t-test result: TtestResult(statistic=-1.309907079224009, pvalue=0.198091199730466, df=38.0)
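The tests above use the default equal-variance form of `stats.ttest_ind`. As a robustness check, one could also run Welch's t-test (`equal_var=False`) and report an effect size next to the p-value. The sketch below shows the idea on hypothetical CTR values; the arrays `ctr_a` and `ctr_b` are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical per-row CTR values, for illustration only
ctr_a = np.array([0.074, 0.035, 0.050, 0.045, 0.055, 0.061])
ctr_b = np.array([0.048, 0.052, 0.039, 0.044, 0.058, 0.041])

# Welch's t-test: equal_var=False drops the equal-variance assumption
welch = stats.ttest_ind(ctr_a, ctr_b, equal_var=False)

# Cohen's d with a pooled standard deviation as a simple effect size
n_a, n_b = len(ctr_a), len(ctr_b)
pooled_var = (
    (n_a - 1) * ctr_a.var(ddof=1) + (n_b - 1) * ctr_b.var(ddof=1)
) / (n_a + n_b - 2)
cohens_d = (ctr_a.mean() - ctr_b.mean()) / np.sqrt(pooled_var)

print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```

An effect size gives a sense of how large the difference is, which a p-value on its own does not convey.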


# CONCLUSIONS AND INSIGHTS


* No Significant Difference in CTR: The p-value of 0.227 for Click-Through Rate (CTR) is above the conventional 0.05 threshold, so the test does not detect a statistically significant difference between the two groups in the share of impressions that lead to clicks.

* No Significant Difference in EPC: The p-value of 0.198 for Earnings per Click (EPC) is likewise above 0.05, so no statistically significant difference in revenue per click is detected.

* Effectiveness of Ads: Since neither CTR nor EPC differs significantly, this data does not favor either ad on user engagement or revenue generation. With df = 38 (40 observations in total), the test may be unable to detect small differences, so a non-significant result is not proof that the ads perform identically.

* Reevaluate Metrics: Given the lack of significant results, consider evaluating additional metrics or testing different ad variations to better understand performance differences.
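One way to act on the last point is to derive further metrics from the columns already in the dataset. The sketch below computes a purchase conversion rate (purchases per click) and earnings per purchase on the five preview rows (rounded). The column names `CVR` and `EPP` are illustrative choices, and whether these metrics suit the campaign is an assumption to check against its goals.

```python
import pandas as pd

# The five preview rows from the notebook output (values rounded)
preview = pd.DataFrame({
    "Impression": [82529.46, 98050.45, 82696.02, 109914.40, 108457.76],
    "Click": [6090.08, 3382.86, 4167.97, 4910.88, 5987.66],
    "Purchase": [665.21, 315.08, 458.08, 487.09, 441.03],
    "Earning": [2311.28, 1742.81, 1797.83, 1696.23, 1543.72],
})

# Conversion rate: share of clicks that ended in a purchase
preview["CVR"] = preview["Purchase"] / preview["Click"]

# Earnings per purchase: average revenue of one purchase
preview["EPP"] = preview["Earning"] / preview["Purchase"]

print(preview[["CVR", "EPP"]].round(4))
```

These columns can then be compared between the two groups in the same way as CTR and EPC.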








