#**Website Report**

Link : https://699d5ea0277313333cce117b--ab-testing-marketing.netlify.app/

#**Import Library and Dataset**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("marketing_AB.csv")
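The `df.info()` output below lists an `Unnamed: 0` column whose values repeat the row index (0, 1, 2, ...). This suggests the CSV was saved together with its pandas index. The minimal sketch below assumes that is the case and shows that passing `index_col=0` to `pd.read_csv` uses that column as the index instead of loading it as data. The two-row CSV string is a hypothetical stand-in for `marketing_AB.csv`. The outputs in this report were produced without `index_col`, so they still include the column.

```python
import io

import pandas as pd

# Hypothetical two-row sample with the same header layout as marketing_AB.csv
# (blank first header, which pandas would otherwise name "Unnamed: 0")
sample_csv = io.StringIO(
    ",user id,test group,converted,total ads,most ads day,most ads hour\n"
    "0,1069124,ad,False,130,Monday,20\n"
    "1,1119715,ad,False,93,Tuesday,22\n"
)

# index_col=0 turns the unnamed first column into the index
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```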

#**Info Dataset**

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Unnamed: 0     588101 non-null  int64 
 1   user id        588101 non-null  int64 
 2   test group     588101 non-null  object
 3   converted      588101 non-null  bool  
 4   total ads      588101 non-null  int64 
 5   most ads day   588101 non-null  object
 6   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(4), object(2)
memory usage: 27.5+ MB


Every column has 588,101 non-null entries, which equals the number of rows, so the dataset has no missing values.
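`df.info()` only reports non-null counts. The sketch below runs a slightly broader check and assumes the column names shown above. A hypothetical helper, `data_quality_summary`, counts missing values and duplicated `user id` values, since the column descriptions below state that user IDs are unique. It is demonstrated on a made-up three-row frame. In the notebook it would be called as `data_quality_summary(df)`.

```python
import pandas as pd


def data_quality_summary(frame: pd.DataFrame, id_col: str = "user id") -> dict:
    """Count missing values per column and duplicated IDs (hypothetical helper)."""
    missing = frame.isna().sum()
    return {
        "missing_per_column": missing.to_dict(),
        "total_missing": int(missing.sum()),
        # duplicated() flags every repeat after the first occurrence
        "duplicate_ids": int(frame[id_col].duplicated().sum()),
    }


# Made-up toy frame: one repeated user id and one missing value
toy = pd.DataFrame({
    "user id": [1069124, 1119715, 1119715],
    "converted": [False, True, None],
})
print(data_quality_summary(toy))
```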

##**Column Descriptions**

1. **User ID** (`user id`): Unique identifier of each user.

2. **Test Group** (`test group`): `ad` if the user saw the advertisement (treatment); `psa` if the user saw only a public service announcement (control).

3. **Converted** (`converted`): `True` if the user bought the product, otherwise `False`.

4. **Total Ads** (`total ads`): Number of ads the user saw.

5. **Most Ads Day** (`most ads day`): Day of the week on which the user saw the most ads.

6. **Most Ads Hour** (`most ads hour`): Hour of the day in which the user saw the most ads.

In [None]:
df

| | Unnamed: 0 | user id | test group | converted | total ads | most ads day | most ads hour |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1069124 | ad | False | 130 | Monday | 20 |
| 1 | 1 | 1119715 | ad | False | 93 | Tuesday | 22 |
| 2 | 2 | 1144181 | ad | False | 21 | Tuesday | 18 |
| 3 | 3 | 1435133 | ad | False | 355 | Tuesday | 10 |
| 4 | 4 | 1015700 | ad | False | 276 | Friday | 14 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 588096 | 588096 | 1278437 | ad | False | 1 | Tuesday | 23 |
| 588097 | 588097 | 1327975 | ad | False | 1 | Tuesday | 23 |
| 588098 | 588098 | 1038442 | ad | False | 3 | Tuesday | 23 |
| 588099 | 588099 | 1496395 | ad | False | 1 | Tuesday | 23 |


In [None]:
df['test group'].value_counts()

| test group | count |
|---|---|
| ad | 564577 |
| psa | 23524 |


In [None]:
df['converted'].value_counts()

| converted | count |
|---|---|
| False | 573258 |
| True | 14843 |
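The two count tables above can be converted into shares. This sketch copies the counts from those outputs and adds no new data. It shows that the groups are very unbalanced, with about 96% of users in 'ad' and 4% in 'psa', and that the overall conversion rate is roughly 2.5%. The Z-test used later allows unequal group sizes, but the much smaller 'psa' group gives a less precise estimate of the control rate. In the notebook, `df['test group'].value_counts(normalize=True)` returns the same shares directly.

```python
import pandas as pd

# Counts copied from the value_counts() outputs above
group_counts = pd.Series({"ad": 564577, "psa": 23524})
n_converted, n_not_converted = 14843, 573258

# Share of users in each test group
group_share = group_counts / group_counts.sum()
# Overall conversion rate across both groups combined
overall_rate = n_converted / (n_converted + n_not_converted)

print(group_share.round(4))
print(f"Overall conversion rate: {overall_rate:.4f}")
```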


In [None]:
df['most ads day'].value_counts()

| most ads day | count |
|---|---|
| Friday | 92608 |
| Monday | 87073 |
| Sunday | 85391 |
| Thursday | 82982 |
| Saturday | 81660 |
| Wednesday | 80908 |
| Tuesday | 77479 |


**Interpretation** : Friday, Monday, and Sunday are the most common peak-exposure days. More users saw the most ads on these days than on any other day.

In [None]:
# Keep only converted users, then count them per peak-exposure day
converted_by_day = df[df['converted']].groupby('most ads day').size().sort_values(ascending=False)
display(converted_by_day)

| most ads day | converted users |
|---|---|
| Monday | 2857 |
| Tuesday | 2312 |
| Sunday | 2090 |
| Friday | 2057 |
| Wednesday | 2018 |
| Thursday | 1790 |
| Saturday | 1719 |


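**Interpretation** : Monday, Tuesday, and Sunday had the most conversions. Of these, only Monday and Sunday are also among the top three days by ad exposure (Friday, Monday, Sunday). Tuesday had the fewest users of any day (77,479) but the second-highest number of conversions. Because each day has a different number of users, these raw counts alone do not show which day converts best.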
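Dividing each day's conversions by its number of users gives a per-day conversion rate. This sketch copies both sets of counts from the outputs above. It combines the 'ad' and 'psa' groups, as those outputs do. In the notebook, `df.groupby('most ads day')['converted'].mean()` computes the same rates directly.

```python
import pandas as pd

# Users per peak-exposure day, copied from the 'most ads day' value counts above
users_by_day = pd.Series({
    "Friday": 92608, "Monday": 87073, "Sunday": 85391, "Thursday": 82982,
    "Saturday": 81660, "Wednesday": 80908, "Tuesday": 77479,
})
# Converted users per day, copied from the converted_by_day output above
conversions_by_day = pd.Series({
    "Monday": 2857, "Tuesday": 2312, "Sunday": 2090, "Friday": 2057,
    "Wednesday": 2018, "Thursday": 1790, "Saturday": 1719,
})

# Series division aligns on the day labels, so the different orders do not matter
rate_by_day = (conversions_by_day / users_by_day).sort_values(ascending=False)
print(rate_by_day.round(4))
```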

In [None]:
df['most ads hour'].value_counts()

| most ads hour | count |
|---|---|
| 13 | 47655 |
| 12 | 47298 |
| 11 | 46210 |
| 14 | 45648 |
| 15 | 44683 |
| 10 | 38939 |
| 16 | 37567 |
| 17 | 34988 |
| 18 | 32323 |
| 9 | 31004 |


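**Interpretation** : Hours 11 through 15 (11:00 to 15:59) are the hours in which the most users saw their highest number of ads, and hour 13 is the most frequent. Ad exposure is therefore concentrated between late morning and mid-afternoon.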
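The hour counts above describe exposure, not conversion. The minimal sketch below uses a hypothetical helper, `conversion_rate_by`, and a made-up toy frame. It reports users, conversions, and conversion rate for each value of a column. In the notebook, `conversion_rate_by(df, 'most ads hour')` would give per-hour rates for the full sample, with both groups combined.

```python
import pandas as pd


def conversion_rate_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Users, conversions, and conversion rate per value of `column` (hypothetical helper)."""
    summary = frame.groupby(column)["converted"].agg(
        users="count",        # number of users in each group
        conversions="sum",    # True counts as 1
        rate="mean",          # share of users who converted
    )
    return summary.sort_values("rate", ascending=False)


# Made-up toy frame with the same column names as df
toy = pd.DataFrame({
    "most ads hour": [11, 11, 13, 13, 13, 20],
    "converted": [True, False, False, False, True, True],
})
print(conversion_rate_by(toy, "most ads hour"))
```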

#**A/B Testing**

This section compares conversion rates between the 'ad' (treatment) and 'psa' (control) groups in the `df` DataFrame. It calculates each group's conversion rate, runs a two-sample Z-test for proportions, and interprets the result to decide whether the advertisement's effect on conversion is statistically significant.

##**Separate Control and Treatment Groups**

###**Subtask:**
Divide the DataFrame into two groups: 'ad' (treatment) and 'psa' (control), based on the 'test group' column.


**Reasoning**:
To separate the control and treatment groups, I will filter the main DataFrame `df` based on the 'test group' column, creating `control_group` for 'psa' and `treatment_group` for 'ad'.



In [None]:
control_group = df[df['test group'] == 'psa']
treatment_group = df[df['test group'] == 'ad']

print("Control Group (psa) head:")
print(control_group.head())
print("\nTreatment Group (ad) head:")
print(treatment_group.head())

Control Group (psa) head:
     Unnamed: 0  user id test group  converted  total ads most ads day  \
18           18   900681        psa      False        248     Saturday   
38           38   905704        psa      False         27     Thursday   
68           68   904595        psa      False         13      Tuesday   
140         140   901904        psa      False         32    Wednesday   
157         157   902234        psa      False        105      Tuesday   

     most ads hour  
18              19  
38               8  
68              19  
140             19  
157             19  

Treatment Group (ad) head:
   Unnamed: 0  user id test group  converted  total ads most ads day  \
0           0  1069124         ad      False        130       Monday   
1           1  1119715         ad      False         93      Tuesday   
2           2  1144181         ad      False         21      Tuesday   
3           3  1435133         ad      False        355      Tuesday   
4           4  
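After the split, a quick sanity check is that every row falls into exactly one of the two groups. This sketch uses a made-up toy frame in place of `df`. With the real data, the counts from `value_counts()` above already satisfy the check (564,577 + 23,524 = 588,101).

```python
import pandas as pd

# Made-up toy frame standing in for df
toy = pd.DataFrame({
    "test group": ["ad", "psa", "ad", "ad", "psa"],
    "converted": [False, True, True, False, False],
})

control = toy[toy["test group"] == "psa"]
treatment = toy[toy["test group"] == "ad"]

# Each row should land in exactly one group; an unexpected label would break this
assert len(control) + len(treatment) == len(toy)
assert set(toy["test group"].unique()) == {"ad", "psa"}
print(len(treatment), len(control))
```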

##**Calculate Conversion Rates for Each Group**

###**Subtask:**
Compute the conversion rate (number of converted users / total users) for both the 'ad' and 'psa' groups.


**Reasoning**:
Calculate the number of conversions and total users for both the control and treatment groups, then compute their respective conversion rates.



In [None]:
control_conversions = control_group['converted'].sum()
control_total = len(control_group)
control_conversion_rate = control_conversions / control_total

treatment_conversions = treatment_group['converted'].sum()
treatment_total = len(treatment_group)
treatment_conversion_rate = treatment_conversions / treatment_total

print(f"Control Group (psa) Conversion Rate: {control_conversion_rate:.4f}")
print(f"Treatment Group (ad) Conversion Rate: {treatment_conversion_rate:.4f}")

Control Group (psa) Conversion Rate: 0.0179
Treatment Group (ad) Conversion Rate: 0.0255


**Reasoning**:
Now that conversion rates have been calculated, the next logical step is to perform a two-sample Z-test for proportions to determine if the observed difference in conversion rates between the control and treatment groups is statistically significant. This requires importing the necessary statistical function.
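For reference, the pooled two-sample Z-statistic for proportions is written below. Here $T$ is the treatment ('ad') group, $C$ is the control ('psa') group, $x$ is the number of conversions, $n$ is the group size, and $\Phi$ is the standard normal CDF. For a two-sample test, `proportions_ztest` uses this pooled variance by default.

$$
\hat p_T = \frac{x_T}{n_T}, \qquad \hat p_C = \frac{x_C}{n_C}, \qquad \hat p = \frac{x_T + x_C}{n_T + n_C}
$$

$$
z = \frac{\hat p_T - \hat p_C}{\sqrt{\hat p\,(1 - \hat p)\left(\dfrac{1}{n_T} + \dfrac{1}{n_C}\right)}}, \qquad p\text{-value} = 2\bigl(1 - \Phi(|z|)\bigr)
$$

With the counts reported earlier ($n_T = 564577$, $n_C = 23524$, $x_T + x_C = 14843$), the pooled rate is $\hat p = 14843 / 588101 \approx 0.0252$.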



In [None]:
from statsmodels.stats.proportion import proportions_ztest

# Perform the Z-test
count = np.array([treatment_conversions, control_conversions])
nobs = np.array([treatment_total, control_total])

stat, pval = proportions_ztest(count, nobs)

print(f"Z-statistic: {stat:.4f}")
print(f"P-value: {pval:.4f}")

Z-statistic: 7.3701
P-value: 0.0000
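The printed P-value of 0.0000 is a result of the `:.4f` format, not an exact zero. The minimal sketch below implements the same pooled test in plain Python with a hypothetical helper, `two_proportion_ztest`, which uses only the standard library. This makes each step explicit and prints the P-value in scientific notation. The example counts are made up for illustration. In the notebook, `print(f"P-value: {pval:.3e}")` would show the actual magnitude.

```python
import math


def two_proportion_ztest(x_t: int, n_t: int, x_c: int, n_c: int) -> tuple[float, float]:
    """Pooled two-sample Z-test for proportions; returns (z, two-sided p-value)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    # Conversion rate under H0 (both groups share one rate)
    p_pool = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 60/100 conversions in treatment vs 40/100 in control
z, p = two_proportion_ztest(60, 100, 40, 100)
print(f"z = {z:.4f}, p = {p:.3e}")
```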


###**Interpretation of A/B Test Results**

The two-sample Z-test for proportions yielded a Z-statistic of **7.3701** and a P-value that rounds to **0.0000** at four decimal places, i.e. a P-value below 0.0001.

Because the P-value is far below the common significance level of 0.05, we reject the null hypothesis that both groups convert at the same rate. The difference in conversion rates between the control group ('psa') and the treatment group ('ad') is statistically significant.

Specifically, the treatment group ('ad') had a higher conversion rate (0.0255) than the control group ('psa') (0.0179). This indicates that the advertisement had a positive, statistically significant effect on conversion.
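A P-value shows that the difference is unlikely to be zero, but not how large it is. With the rounded rates above, the absolute lift is about 0.0255 - 0.0179 = 0.0076, or roughly 0.8 percentage points. The minimal sketch below uses a hypothetical helper, `lift_with_ci`, and made-up example counts. It reports the absolute and relative lift together with a Wald 95% confidence interval for the absolute difference. The Wald interval is one simple choice among several interval methods.

```python
import math


def lift_with_ci(x_t: int, n_t: int, x_c: int, n_c: int,
                 z_crit: float = 1.959963984540054) -> dict:
    """Absolute lift, relative lift, and a Wald CI for the absolute lift (hypothetical helper)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    diff = p_t - p_c
    # Unpooled standard error: the interval is not computed under H0,
    # so each group keeps its own estimated rate
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_c,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
    }


# Made-up counts: 60/100 conversions in treatment vs 40/100 in control
print(lift_with_ci(60, 100, 40, 100))
```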

##**Interpret A/B Test Results**

### Subtask:
Analyze the p-value and Z-statistic from the Z-test to conclude whether the advertisement (ad group) had a significant impact on conversion rates compared to the public service announcement (psa group).


##**Summary:**

### Q&A
The advertisement (ad group) had a statistically significant positive impact on conversion rates compared to the public service announcement (psa group).

### Data Analysis Key Findings
*   The control group ('psa') had a conversion rate of 0.0179.
*   The treatment group ('ad') had a conversion rate of 0.0255.
*   A two-sample Z-test for proportions yielded a Z-statistic of 7.3701 and a P-value below 0.0001 (printed as 0.0000).
*   The P-value is far below the common alpha level of 0.05, so the null hypothesis of equal conversion rates is rejected.
*   The advertisement in the treatment group resulted in a higher conversion rate, indicating a statistically significant positive impact on conversions.

### Insights or Next Steps
*   Given the significant positive impact of the advertisement, it is recommended to fully deploy the advertisement to the wider audience to capitalize on the higher conversion rates.
*   Further analysis could involve segmenting the 'ad' group data to identify specific characteristics of users who converted, which could inform future advertisement optimizations.
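One way to start the segmentation suggested above is to bucket users in the 'ad' group by how many ads they saw, then compare conversion rates across buckets. The bucket edges and the toy data below are hypothetical. In the notebook, the input would be `treatment_group.copy()`. Assuming the number of ads seen was not itself randomized, any differences between buckets are correlational rather than causal.

```python
import pandas as pd

# Made-up toy sample standing in for the 'ad' group
ad_users = pd.DataFrame({
    "total ads": [1, 3, 12, 25, 40, 130, 276, 355],
    "converted": [False, False, False, True, False, True, True, False],
})

# Hypothetical exposure buckets; pd.cut intervals are right-inclusive: (0, 10], (10, 50], (50, inf)
bins = [0, 10, 50, float("inf")]
labels = ["1-10", "11-50", "51+"]
ad_users["ads bucket"] = pd.cut(ad_users["total ads"], bins=bins, labels=labels)

# Users and conversion rate per bucket
segment_rates = ad_users.groupby("ads bucket", observed=True)["converted"].agg(["count", "mean"])
print(segment_rates)
```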
