
An online bookstore is looking to optimize its website design to improve user engagement and ultimately increase book purchases. The website currently offers two themes for its users: “Light Theme” and “Dark Theme.” The bookstore’s data science team wants to conduct an A/B testing experiment to determine which theme leads to better user engagement and higher conversion rates for book purchases.

The data collected by the bookstore contains user interactions and engagement metrics for both the Light Theme and Dark Theme. The dataset includes the following key features:

1. Theme: dark or light
2. Click Through Rate: The proportion of the users who click on links or buttons on the website.
3. Conversion Rate: The percentage of users who signed up on the platform after visiting for the first time.
4. Bounce Rate: The percentage of users who leave the website without further interaction after visiting a single page.
5. Scroll Depth: The depth to which users scroll through the website pages.
6. Age: The age of the user.
7. Location: The location of the user.
8. Session Duration: The duration of the user’s session on the website.
9. Purchases: Whether the user purchased the book (Yes/No).
10. Added_to_Cart: Whether the user added books to the cart (Yes/No).

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from statsmodels.stats.proportion import proportions_ztest
from scipy import stats

In [2]:
data = pd.read_csv('/content/website_ab_test.csv')
data.head()

Unnamed: 0,Theme,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Location,Session_Duration,Purchases,Added_to_Cart
0,Light Theme,0.05492,0.282367,0.405085,72.489458,25,Chennai,1535,No,Yes
1,Light Theme,0.113932,0.032973,0.732759,61.858568,19,Pune,303,No,Yes
2,Dark Theme,0.323352,0.178763,0.296543,45.737376,47,Chennai,563,Yes,Yes
3,Light Theme,0.485836,0.325225,0.245001,76.305298,58,Pune,385,Yes,No
4,Light Theme,0.034783,0.196766,0.7651,48.927407,25,New Delhi,1437,No,No


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Theme               1000 non-null   object 
 1   Click Through Rate  1000 non-null   float64
 2   Conversion Rate     1000 non-null   float64
 3   Bounce Rate         1000 non-null   float64
 4   Scroll_Depth        1000 non-null   float64
 5   Age                 1000 non-null   int64  
 6   Location            1000 non-null   object 
 7   Session_Duration    1000 non-null   int64  
 8   Purchases           1000 non-null   object 
 9   Added_to_Cart       1000 non-null   object 
dtypes: float64(4), int64(2), object(4)
memory usage: 78.2+ KB


In [4]:
data.describe()

Unnamed: 0,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Session_Duration
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.256048,0.253312,0.505758,50.319494,41.528,924.999
std,0.139265,0.139092,0.172195,16.895269,14.114334,508.231723
min,0.010767,0.010881,0.20072,20.011738,18.0,38.0
25%,0.140794,0.131564,0.353609,35.655167,29.0,466.5
50%,0.253715,0.252823,0.514049,51.130712,42.0,931.0
75%,0.370674,0.37304,0.648557,64.666258,54.0,1375.25
max,0.499989,0.498916,0.799658,79.997108,65.0,1797.0


In [5]:
data.isnull().sum()

Theme                 0
Click Through Rate    0
Conversion Rate       0
Bounce Rate           0
Scroll_Depth          0
Age                   0
Location              0
Session_Duration      0
Purchases             0
Added_to_Cart         0
dtype: int64

In [6]:
data.duplicated().sum()

0

In [7]:
data.head()

Unnamed: 0,Theme,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Location,Session_Duration,Purchases,Added_to_Cart
0,Light Theme,0.05492,0.282367,0.405085,72.489458,25,Chennai,1535,No,Yes
1,Light Theme,0.113932,0.032973,0.732759,61.858568,19,Pune,303,No,Yes
2,Dark Theme,0.323352,0.178763,0.296543,45.737376,47,Chennai,563,Yes,Yes
3,Light Theme,0.485836,0.325225,0.245001,76.305298,58,Pune,385,Yes,No
4,Light Theme,0.034783,0.196766,0.7651,48.927407,25,New Delhi,1437,No,No


In [8]:
data.dtypes

Theme                  object
Click Through Rate    float64
Conversion Rate       float64
Bounce Rate           float64
Scroll_Depth          float64
Age                     int64
Location               object
Session_Duration        int64
Purchases              object
Added_to_Cart          object
dtype: object

In [9]:
# Draw a box plot for every numeric column to inspect its spread and outliers
for i in data.columns:
  if data[i].dtype != 'object':
    fig = px.box(x=data[i], labels={'x':i})
    fig.show()

# Scatter Plot Function

In [10]:
def relationship(data, x, y, refers):
  fig = px.scatter(data_frame=data, x=x, y=y, trendline='ols', color=refers)
  fig.show()

relationship(data, 'Age', 'Session_Duration', 'Theme')

# Bar Plot Function

In [11]:
light_theme = data[data['Theme'] == 'Light Theme']
dark_theme = data[data['Theme'] == 'Dark Theme']

def bar_plot(x, xax_title, title):
  # Compare the distribution of column `x` between the two themes
  fig = go.Figure()

  # One histogram trace per theme, drawn side by side
  fig.add_trace(go.Histogram(x=light_theme[x], name='Light Theme', opacity=0.6))
  fig.add_trace(go.Histogram(x=dark_theme[x], name='Dark Theme', opacity=0.6))

  fig.update_layout(
      title_text=title,
      xaxis_title_text=xax_title,
      yaxis_title_text='Frequency',
      barmode='group',
      bargap=0.1
  )

  fig.show()

In [12]:
bar_plot('Click Through Rate','Click Through Rate', 'Click Through Rate by Theme')

In [13]:
bar_plot('Age','Age', 'Age by Theme')

In [14]:
bar_plot('Location','Location', 'Location by Theme')

# Comparison of Both Themes Based on Purchases

In [15]:
light_theme_conversion = light_theme[light_theme['Purchases'] == 'Yes'].shape[0]
light_theme_total = light_theme.shape[0]
print(f'Conversion {light_theme_conversion}, Total: {light_theme_total}')

Conversion 258, Total: 486


In [16]:
dark_theme_conversion = dark_theme[dark_theme['Purchases'] == 'Yes'].shape[0]
dark_theme_total = dark_theme.shape[0]
print(f'Conversion {dark_theme_conversion}, Total: {dark_theme_total}')

Conversion 259, Total: 514


In [18]:
light_theme_conversion_rate = light_theme_conversion / light_theme_total
dark_theme_conversion_rate = dark_theme_conversion / dark_theme_total
print(f'Crt Light: {light_theme_conversion_rate}, Crt Dark: {dark_theme_conversion_rate}')

Crt Light: 0.5308641975308642, Crt Dark: 0.5038910505836576


In [17]:
conversion_count = [light_theme_conversion, dark_theme_conversion]
sample_sizes = [light_theme_total, dark_theme_total]
print(f'Conversion Count: {conversion_count}, Sample Size: {sample_sizes}')

Conversion Count: [258, 259], Sample Size: [486, 514]


In [19]:
zstats, pval = proportions_ztest(conversion_count, sample_sizes)
print(f"Light Conversion Rate: {light_theme_conversion_rate}")
print(f"Dark Conversion Rate: {dark_theme_conversion_rate}")
print(f"A/B Testing (z-Statistic: {zstats}, P-Value: {pval})")

Light Conversion Rate: 0.5308641975308642
Dark Conversion Rate: 0.5038910505836576
A/B Testing (z-Statistic: 0.8531246206222649, P-Value: 0.39359019934127804)


To compare the purchase-based conversion rates of the two themes, we ran a two-proportion z-test to check whether the difference between them is statistically significant. The results are as follows:

**z-statistic: 0.8531, p-value: 0.3936**

The z-statistic expresses the difference between the two conversion rates in units of its standard error. Here it is approximately 0.8531. The positive sign indicates that the Light Theme's conversion rate (about 53.1%) is slightly higher than the Dark Theme's (about 50.4%).
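
As a rough cross-check, the z-statistic can be recomputed by hand from the purchase counts printed earlier (258 of 486 for the Light Theme, 259 of 514 for the Dark Theme). The sketch below assumes the pooled-proportion form of the two-proportion z-test and a two-sided normal p-value. The variable names are illustrative and not part of the original notebook.

```python
import math

# Purchase counts and group sizes taken from the notebook output.
light_purchases, light_total = 258, 486
dark_purchases, dark_total = 259, 514

p_light = light_purchases / light_total
p_dark = dark_purchases / dark_total

# Pooled proportion under the null hypothesis of equal conversion rates.
p_pooled = (light_purchases + dark_purchases) / (light_total + dark_total)

# Standard error of the difference in proportions under the null.
std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / light_total + 1 / dark_total))

z_manual = (p_light - p_dark) / std_err

# Two-sided p-value from the standard normal distribution.
p_manual = math.erfc(abs(z_manual) / math.sqrt(2))

print(f"z = {z_manual:.4f}, p = {p_manual:.4f}")
```

If the assumptions hold, the printed values should agree with the `proportions_ztest` output up to rounding.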

The p-value is the probability of observing a difference in conversion rates at least as extreme as the one in our sample, assuming the null hypothesis is true. The null hypothesis states that the two themes have the same conversion rate. In this case, the p-value is approximately 0.3936.
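
One way to build intuition for this p-value is a small simulation. The sketch below is an illustration rather than part of the original analysis: it assumes both themes share the pooled purchase rate (517 of 1000), repeatedly draws two groups of the observed sizes, and counts how often the simulated rate difference is at least as large as the observed one. The seed and number of simulations are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

light_total, dark_total = 486, 514
observed_diff = abs(258 / 486 - 259 / 514)
pooled_rate = (258 + 259) / (486 + 514)  # shared rate assumed under the null

def simulated_diff():
    """Draw two groups with the same true rate and return the absolute rate difference."""
    light = sum(random.random() < pooled_rate for _ in range(light_total))
    dark = sum(random.random() < pooled_rate for _ in range(dark_total))
    return abs(light / light_total - dark / dark_total)

n_sims = 2000
extreme = sum(simulated_diff() >= observed_diff for _ in range(n_sims))
p_simulated = extreme / n_sims
print(f"Simulated p-value: {p_simulated:.3f}")
```

The simulated fraction should land in the neighbourhood of the z-test p-value, although it will vary with the seed and the number of simulations.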


Since the p-value is greater than the 0.05 significance level commonly used in A/B testing, we do not have enough evidence to reject the null hypothesis. The observed difference in conversion rates between the two themes is not statistically significant and could plausibly be due to random variation rather than a true effect of the theme. In simpler terms, based on the current data, we cannot confidently say that one theme performs better than the other in terms of purchases.
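
A confidence interval for the difference in conversion rates gives a complementary view of the same conclusion. The sketch below is not part of the original notebook. It assumes a simple Wald interval with an unpooled standard error and the approximate critical value 1.96 for 95% coverage.

```python
import math

# Purchase rates from the notebook output (Light: 258/486, Dark: 259/514).
p_light = 258 / 486
p_dark = 259 / 514
diff = p_light - p_dark

# Unpooled (Wald) standard error of the difference in proportions.
std_err = math.sqrt(p_light * (1 - p_light) / 486 + p_dark * (1 - p_dark) / 514)

z_crit = 1.96  # approximate two-sided 95% critical value of the standard normal
ci_low = diff - z_crit * std_err
ci_high = diff + z_crit * std_err
print(f"Difference: {diff:.4f}, 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

Under these assumptions the interval contains zero, which is consistent with failing to reject the null hypothesis at the 0.05 level.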

## **Comparison of Both Themes Based on Session Duration**

Session duration is another important metric, since it shows how long users stay on the website. Now I’ll perform a two-sample t-test to compare the session duration for the two themes:

In [20]:
light_theme_session_duration = light_theme['Session_Duration']
dark_theme_session_duration = dark_theme['Session_Duration']

In [22]:
# calculate the average session duration for each theme
light_theme_avg_duration = light_theme_session_duration.mean()
dark_theme_avg_duration = dark_theme_session_duration.mean()

print(f'Average Duration Light Theme: {light_theme_avg_duration}, Average Duration dark Theme: {dark_theme_avg_duration}')

Average Duration Light Theme: 930.8333333333334, Average Duration dark Theme: 919.4824902723735


In [25]:
# perform two sample t-test for session duration
tstat, pval = stats.ttest_ind(light_theme_session_duration, dark_theme_session_duration)

print("A/B Testing for Session Duration - t-statistic:", tstat, " p-value:", pval)

A/B Testing for Session Duration - t-statistic: 0.3528382474155483  p-value: 0.7242842138292167


To compare session duration between the two themes, we ran a two-sample t-test to check whether the difference in average session duration is statistically significant. The results are as follows:

**t-statistic: 0.3528, p-value: 0.7243**

The t-statistic expresses the difference in average session duration between the two themes relative to the variability within the samples. Here it is approximately 0.3528. The positive sign indicates that the average session duration of the Light Theme (about 930.8) is slightly higher than that of the Dark Theme (about 919.5).
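
To make the calculation concrete, here is a minimal sketch of the equal-variance (pooled) two-sample t-statistic, which is the form `stats.ttest_ind` uses by default. The helper `pooled_t_statistic` and the toy samples are hypothetical and are not taken from the dataset. The p-value step is left out because it needs the t-distribution, which SciPy provides.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, n_a + n_b - 2

# Hypothetical session durations, not taken from the dataset.
toy_light = [1, 2, 3, 4, 5]
toy_dark = [2, 3, 4, 5, 6]
t_value, dof = pooled_t_statistic(toy_light, toy_dark)
print(t_value, dof)
```

A negative result here simply means the first sample has the lower mean. Applying the same function to the two `Session_Duration` columns should reproduce the t-statistic reported above, assuming the default equal-variance test was used.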

The p-value is the probability of observing a difference in average session duration at least as extreme as the one in our sample, assuming the null hypothesis is true. The null hypothesis states that the two themes have the same average session duration. In this case, the p-value is approximately 0.7243.


Since the p-value is much greater than the typical significance level of 0.05, we do not have enough evidence to reject the null hypothesis. The observed difference in average session duration between the two themes is not statistically significant and could plausibly be due to random variation rather than a true effect of the theme. In simpler terms, the average session duration is similar for both themes, and the small difference observed may be due to chance.

## **Summary**

The hypothesis tests on purchases and session duration show no statistically significant difference between the Light Theme and the Dark Theme on either metric.

This section uses the following article as a reference for the two-sample tests:

https://www.statology.org/hypothesis-test-python/