A cooking website sells cooking equipment. They want to improve their product page. The product page has a rail with media (images or videos). Up until now, they have had a horizontal rail, a UX designer suggested having a vertical one. As a data analyst in the company, you suggested doing an AB test. You now have the results of the test and need to advise the company on whether they should ship the feature (vertical media rail) or not.


What is the business problem you are trying to solve?
Why does it matter to the company?
What are the benefits and drawbacks of the new variant in your opinion?
What is the hypothesis you are trying to test? What is the null hypothesis, what is the alternative hypothesis?

Why is an AB test a good method for testing the hypothesis? 
Propose another method that could have been used and compare pros and cons of an AB test vs this method.
What metrics did you select in the AB test, and why? What is the primary metric? Are there other metrics you wished you had?
How do the distribution of your metrics compare between the two variants? Show me plots comparing the distributions.
What test did you apply? How did you specify your test and why (type of test, underlying distribution, parameters of the distribution, sample size, p-value…)
What is the result of the test? 
What do you recommend the company to do?

![image.png](attachment:image.png)

Chat

S8 Formulate and define a hypothesis for a research question:

Business Problem: The business wants to improve the product page to enhance user experience and potentially increase sales by optimizing the display of media (images or videos) for cooking equipment.

Importance to the Company: The product page is a critical touchpoint for customers in their purchasing journey. Enhancing this page can lead to better engagement, increased conversion rates, and ultimately higher revenue.

Benefits and Drawbacks of the New Variant:

Benefits:
Vertical media rail might provide a better user experience by allowing users to scroll through media content more intuitively.
It could potentially increase engagement with the media content, leading to more informed purchasing decisions.
Drawbacks:
Vertical layout may not fit well with the overall design of the product page.
It could potentially push down other important elements on the page, reducing their visibility.
Hypothesis:

Null Hypothesis (H0): There is no significant difference in user engagement metrics between the horizontal and vertical media rails.
Alternative Hypothesis (H1): User engagement metrics will be significantly different between the horizontal and vertical media rails.
S9 Apply the most appropriate statistical technique to evaluate a hypothesis, using a data analysis tool:

Why an AB Test is a Good Method:

AB testing allows for a controlled experiment where users are randomly assigned to either the control group (horizontal media rail) or the experimental group (vertical media rail).
It provides a straightforward way to measure the impact of the change by comparing key metrics between the two groups.
Comparison with Another Method:

Proposed Method: Multivariate testing, where multiple elements of the product page are changed simultaneously and their combined impact is measured.
Pros and Cons of AB Test vs. Multivariate Testing:
AB Test:
Pros: Simple setup, clear interpretation of results, focused on a single change.
Cons: Limited in assessing interactions between multiple changes, may not capture holistic impact.
Multivariate Testing:
Pros: Can assess combined impact of multiple changes, captures interactions between elements.
Cons: Complex setup, potentially harder to interpret results, requires larger sample sizes.
Metrics Selected in the AB Test:

Metrics: Click-through rate (CTR), time spent on page, and conversion rate.
Primary Metric: Conversion rate, as it directly measures the effectiveness of the product page in driving sales.
Other Metrics: Bounce rate, engagement with media content.
Comparison of Metric Distributions:

Visualize distributions of metrics (CTR, time spent on page, conversion rate) between horizontal and vertical media rails using histograms or density plots.
Test Applied:

Independent samples t-test for comparing means between two groups.
Specify test: Two-tailed test assuming unequal variances, significance level (alpha) set at 0.05.
Result of the Test:

Conduct the t-test for each metric (CTR, time spent on page, conversion rate).
Determine if there is a statistically significant difference between the two groups for each metric.
Analyze the effect size to understand the practical significance of the differences.
Recommendation:

If the test reveals a statistically significant improvement in the primary metric (conversion rate) with the vertical media rail, recommend implementing the change.
Consider any practical implications, such as design compatibility and user feedback, before making the final decision.

a two sample test determines whether or not two populations parameters such as two means or two proportions are equal to each other.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [3]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Load the data from the CSV file
data = pd.read_csv("G:\My Drive\Hyper Island\Stats\Assessment\assessment_da25.csv")

# Display the first few rows of the dataframe
print(data.head())

# Descriptive statistics
print(data.describe())

# Visualize the distributions of metrics between the two variants
metrics = ['Number of page views', 'GMV (in $)', 'Number of add to cart', 'Clicks on media', 'Time on Page (sec)']

for metric in metrics:
    plt.figure(figsize=(10, 5))
    plt.hist(data[data['Variant'] == 'horizontal'][metric], alpha=0.5, label='Horizontal', color='blue', density=True)
    plt.hist(data[data['Variant'] == 'vertical'][metric], alpha=0.5, label='Vertical', color='red', density=True)
    plt.title(metric)
    plt.legend()
    plt.show()

# Perform independent t-test for each metric
results = {}
for metric in metrics:
    t_stat, p_value = stats.ttest_ind(data[data['Variant'] == 'horizontal'][metric], data[data['Variant'] == 'vertical'][metric], equal_var=False)
    results[metric] = {'t_stat': t_stat, 'p_value': p_value}

# Print the results
for metric, result in results.items():
    print(f"T-test results for {metric}:")
    print(f"T-statistic: {result['t_stat']}")
    print(f"P-value: {result['p_value']}")
    if result['p_value'] < 0.05:
        print("Result is statistically significant at alpha = 0.05")
    else:
        print("Result is not statistically significant at alpha = 0.05")
    print()

# Recommendation
# Assuming conversion rate is represented by 'GMV (in $)'
conversion_rate_p_value = results['GMV (in $)']['p_value']
if conversion_rate_p_value < 0.05:
    print("There is a statistically significant difference in conversion rate (GMV) between horizontal and vertical media rails.")
    print("Recommendation: Implement the vertical media rail.")
else:
    print("There is no statistically significant difference in conversion rate (GMV) between horizontal and vertical media rails.")
    print("Recommendation: No need to change from the horizontal media rail.")


  data = pd.read_csv("G:\My Drive\Hyper Island\Stats\Assessment\assessment_da25.csv")


FileNotFoundError: [Errno 2] No such file or directory: 'G:\\My Drive\\Hyper Island\\Stats\\Assessment\x07ssessment_da25.csv'