In [1]:
import pandas as pd
import numpy as np

## A/B Testing for Business Decision Making

A/B testing compares two versions of a product (e.g., a webpage, feature, or campaign) to determine which performs better. It tests the null hypothesis that there is no difference between the two groups.

**Example Problem:** You are testing two versions of a webpage (A and B) to compare their conversion rates. Webpage A had 2000 visitors with 300 conversions, while Webpage B had 1800 visitors with 330 conversions. Is webpage B significantly better?

In [10]:
from statsmodels.stats.proportion import proportions_ztest

In [11]:
# conversion data
success_a, success_b = 300, 330  # conversions
n_a, n_b = 2000, 1800  # visitors

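# perform a two-proportion z-test (two-sided by default);
# a negative statistic means A's conversion rate is below B's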
stat, p_value = proportions_ztest([success_a, success_b], [n_a, n_b])
print(f"Z-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: Webpage B performs significantly better.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")

Z-Statistic: -2.76, P-Value: 0.0058
Reject the null hypothesis: Webpage B performs significantly better.
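
As a cross-check, the z-statistic above can be reproduced by hand from the pooled conversion rate. This is a minimal sketch rather than part of the original analysis: the names `p_pool`, `se`, and `z_manual` are illustrative, and the one-sided p-value assumes the question of interest is specifically whether B's rate exceeds A's.

```python
import numpy as np
from scipy.stats import norm

# conversion data from the example problem
success_a, success_b = 300, 330
n_a, n_b = 2000, 1800

# observed conversion rates
p_a, p_b = success_a / n_a, success_b / n_b

# pooled rate under the null hypothesis of equal conversion rates
p_pool = (success_a + success_b) / (n_a + n_b)

# standard error of the difference in proportions under the null
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z_manual = (p_a - p_b) / se
p_two_sided = 2 * norm.sf(abs(z_manual))  # H1: rates differ
p_one_sided = norm.cdf(z_manual)          # H1: rate A < rate B

print(f"Rate A: {p_a:.3f}, Rate B: {p_b:.3f}")
print(f"Z: {z_manual:.2f}, two-sided p: {p_two_sided:.4f}, one-sided p: {p_one_sided:.4f}")
```

Because the statistic is negative, the one-sided p-value is half the two-sided one. Passing `alternative='smaller'` to `proportions_ztest` should give the same one-sided result.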


## One-Sample t-Test for Mean Comparison

A one-sample t-test is used to determine whether the mean of a single sample differs significantly from a known or hypothesized population mean.

**Example Problem:** Your company claims that the average delivery time for online orders is 30 minutes. A random sample of 50 deliveries has an average time of 32 minutes with a standard deviation of 5 minutes. Is the claim accurate?

In [15]:
from scipy.stats import ttest_1samp
import numpy as np

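# sample data: simulated draws, since the problem gives only summary statistics
# (no seed is set, so the statistic and p-value change from run to run)
sample_times = np.random.normal(32, 5, 50)  # randomly generated data with mean 32, std 5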
population_mean = 30  # hypothesized mean

# perform one-sample t-test
stat, p_value = ttest_1samp(sample_times, population_mean)
print(f"T-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: The average delivery time is not 30 minutes.")
else:
    print("Fail to reject the null hypothesis: No evidence that the average delivery time differs from 30 minutes.")

T-Statistic: 2.04, P-Value: 0.0464
Reject the null hypothesis: The average delivery time is not 30 minutes.
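
Since the problem states the sample mean, standard deviation, and size directly, the t-statistic can also be computed from those numbers without simulating data. The sketch below assumes the quoted 5 minutes is the sample standard deviation (with an n - 1 denominator); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t

# summary statistics from the example problem
sample_mean, sample_std, n = 32, 5, 50
population_mean = 30  # hypothesized mean

# standard error of the sample mean
se = sample_std / np.sqrt(n)

# t-statistic and two-sided p-value with n - 1 degrees of freedom
t_stat = (sample_mean - population_mean) / se
dof = n - 1
p_value = 2 * t.sf(abs(t_stat), dof)

print(f"T-Statistic: {t_stat:.2f}, P-Value: {p_value:.4f}")
```

This gives a t-statistic of about 2.83, which is larger than the simulated run above because a random draw rarely reproduces the stated mean and standard deviation exactly.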


## Two-Sample t-Test for Comparing Means

A two-sample t-test is used to compare the means of two independent groups to determine if there is a statistically significant difference between them.

**Example Problem:** You want to compare the average sales of two stores (Store A and Store B). Store A’s sales data has a mean of $5000 with a standard deviation of $700 (50 observations), while Store B’s sales data has a mean of $5200 with a standard deviation of $750 (45 observations). Are the sales significantly different?

In [20]:
from scipy.stats import ttest_ind

# sample data
mean_a, std_a, n_a = 5000, 700, 50
mean_b, std_b, n_b = 5200, 750, 45

np.random.seed(42)
sales_a = np.random.normal(mean_a, std_a, n_a)
sales_b = np.random.normal(mean_b, std_b, n_b)

# perform two-sample t-test (assumes equal variances by default;
# pass equal_var=False for Welch's t-test)
stat, p_value = ttest_ind(sales_a, sales_b)
print(f"T-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: The average sales are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sales.")

T-Statistic: -2.88, P-Value: 0.0049
Reject the null hypothesis: The average sales are significantly different.
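
The simulated samples only approximate the stated means and standard deviations, so the result depends on the random draw. As a hedged alternative, SciPy's `ttest_ind_from_stats` runs the test directly on the summary statistics. The sketch below assumes the quoted standard deviations are sample standard deviations, and shows both the pooled-variance and Welch variants.

```python
from scipy.stats import ttest_ind_from_stats

# summary statistics from the example problem
mean_a, std_a, n_a = 5000, 700, 50
mean_b, std_b, n_b = 5200, 750, 45

# pooled-variance (Student's) t-test, matching the ttest_ind default
pooled = ttest_ind_from_stats(mean1=mean_a, std1=std_a, nobs1=n_a,
                              mean2=mean_b, std2=std_b, nobs2=n_b,
                              equal_var=True)

# Welch's t-test, which does not assume equal variances
welch = ttest_ind_from_stats(mean1=mean_a, std1=std_a, nobs1=n_a,
                             mean2=mean_b, std2=std_b, nobs2=n_b,
                             equal_var=False)

print(f"Pooled: T = {pooled.statistic:.2f}, P = {pooled.pvalue:.4f}")
print(f"Welch:  T = {welch.statistic:.2f}, P = {welch.pvalue:.4f}")
```

With these summary statistics the t-statistic is about -1.34 in both variants, which is not significant at the 0.05 level. The simulated draw above reaches a different conclusion, so treat it as an illustration of the API rather than as an answer to the example problem.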


## Chi-Square Test for Independence

The Chi-Square test of independence checks whether two categorical variables are associated. It is commonly used for survey or marketing data.

**Example Problem:** You are analyzing customer preferences based on two variables: Gender (Male/Female) and Preferred Product (Product A/Product B). Is there a significant association between gender and product preference?

In [22]:
from scipy.stats import chi2_contingency
import pandas as pd

# contingency table
data = {'Product A': [50, 60], 'Product B': [30, 40]}
df = pd.DataFrame(data, index=['Male', 'Female'])

# perform Chi-Square test (for a 2x2 table, SciPy applies
# Yates' continuity correction by default)
stat, p_value, dof, expected = chi2_contingency(df)
print(f"Chi-Square Statistic: {stat:.2f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: Gender and product preference are associated.")
else:
    print("Fail to reject the null hypothesis: No significant association.")

df

Chi-Square Statistic: 0.04, P-Value: 0.8508
Fail to reject the null hypothesis: No significant association.


| | Product A | Product B |
|---|---|---|
| Male | 50 | 30 |
| Female | 60 | 40 |
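
To see what the test compares, the expected counts under independence can be rebuilt from the row and column totals (row total × column total / grand total). The following is a minimal sketch with illustrative variable names; it also contrasts the default continuity-corrected statistic with the uncorrected one.

```python
import numpy as np
from scipy.stats import chi2_contingency

# observed counts: rows = Male, Female; columns = Product A, Product B
observed = np.array([[50, 30],
                     [60, 40]])

# expected counts under independence
row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
expected_manual = row_totals @ col_totals / observed.sum()

# default call: Yates' continuity correction is applied for 2x2 tables
stat_corr, p_corr, dof, expected = chi2_contingency(observed)

# same test without the continuity correction
stat_raw, p_raw, _, _ = chi2_contingency(observed, correction=False)

print("Expected counts:\n", np.round(expected_manual, 2))
print(f"With correction:    chi2 = {stat_corr:.2f}, p = {p_corr:.4f}")
print(f"Without correction: chi2 = {stat_raw:.2f}, p = {p_raw:.4f}")
```

The observed counts sit close to the expected ones, which is why the statistic is small under either setting.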


## ANOVA for Comparing Multiple Groups

One-way ANOVA compares the means of three or more groups to determine whether at least one group’s mean differs significantly from the others.

**Example Problem:** You are comparing the average monthly sales of three regions (North, South, and West). Generate sales data and check if there is a significant difference in sales across regions.

In [23]:
from scipy.stats import f_oneway

# sample data
np.random.seed(42)
north_sales = np.random.normal(5000, 500, 30)
south_sales = np.random.normal(5200, 600, 30)
west_sales = np.random.normal(4800, 400, 30)

# perform one-way ANOVA
stat, p_value = f_oneway(north_sales, south_sales, west_sales)
print(f"F-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Reject the null hypothesis: At least one region has significantly different sales.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sales across regions.")

F-Statistic: 3.64, P-Value: 0.0304
Reject the null hypothesis: At least one region has significantly different sales.
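
The F-statistic is the ratio of between-group variance to within-group variance. The sketch below recomputes it by hand on the same simulated data and compares it with `f_oneway`; the helper names (`ss_between`, `ss_within`, `f_manual`) are illustrative.

```python
import numpy as np
from scipy.stats import f, f_oneway

# same simulated data as in the cell above
np.random.seed(42)
north_sales = np.random.normal(5000, 500, 30)
south_sales = np.random.normal(5200, 600, 30)
west_sales = np.random.normal(4800, 400, 30)
groups = [north_sales, south_sales, west_sales]

grand_mean = np.mean(np.concatenate(groups))

# between-group sum of squares: group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# within-group sum of squares: observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)

# F = mean square between / mean square within
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = f_oneway(*groups)
print(f"Manual: F = {f_manual:.2f}, p = {p_manual:.4f}")
print(f"SciPy:  F = {f_scipy:.2f}, p = {p_scipy:.4f}")
```

A significant F-statistic only says that at least one mean differs; it does not identify which regions differ from each other.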


## Summary

1) A/B Testing: Comparing proportions to assess business strategies.
2) One-Sample t-Test: Validating claims about a population mean.
3) Two-Sample t-Test: Comparing means between two independent groups.
4) Chi-Square Test: Assessing associations between categorical variables.
5) ANOVA: Comparing means across multiple groups.