# Test Statistics

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## T-Test

A marketing manager wants to determine whether two advertising campaigns differ in their effectiveness at driving customer engagement. The manager selects **two independent random samples of 20 customers**, one exposed to each campaign, and measures how much time each customer spends on the company's website. For **Campaign A, the sample mean time is 25 minutes with a standard deviation of 5 minutes**; for **Campaign B, it is 30 minutes with a standard deviation of 6 minutes**.

Is there **sufficient evidence**, at a significance level of 0.05, to conclude that the **average time spent on the website differs between the two advertising campaigns**?

In [2]:
from scipy import stats
from scipy.stats import ttest_ind
from scipy.stats import t

In [3]:
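# Summary statistics for Campaign A
mean_A = 25
std_dev_A = 5
n_A = 20

# Summary statistics for Campaign B
mean_B = 30
std_dev_B = 6
n_B = 20

# Grid of t values for plotting the t-distribution
x = np.linspace(-5, 5, 1000)

# Degrees of freedom for the pooled two-sample t-test
df = n_A + n_B - 2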

In [4]:
# Two-sample t-test from summary statistics (equal_var=True by default, i.e. pooled variance)
t_statistic, p_value = stats.ttest_ind_from_stats(mean_A, std_dev_A, n_A, mean_B, std_dev_B, n_B)

print("t-statistic:", t_statistic)
print("p-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

t-statistic: -2.862991671569341
p-value: 0.006793241953421031
Reject the null hypothesis
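As a cross-check, the pooled (equal-variance) t-statistic that `ttest_ind_from_stats` computes by default can be reproduced directly from the summary statistics. The sketch below assumes equal population variances, as the default `equal_var=True` does. It also runs Welch's variant (`equal_var=False`), which drops that assumption. The names `pooled_var`, `se_diff`, `t_manual`, and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

# Summary statistics from the campaign example
mean_A, std_dev_A, n_A = 25, 5, 20
mean_B, std_dev_B, n_B = 30, 6, 20

# Pooled variance: sample variances weighted by their degrees of freedom
df = n_A + n_B - 2
pooled_var = ((n_A - 1) * std_dev_A**2 + (n_B - 1) * std_dev_B**2) / df

# Standard error of the difference in means under the equal-variance assumption
se_diff = np.sqrt(pooled_var * (1 / n_A + 1 / n_B))
t_manual = (mean_A - mean_B) / se_diff

# Two-sided p-value: twice the upper-tail probability of |t| under t(df)
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Welch's t-test: no equal-variance assumption, Welch-Satterthwaite degrees of freedom
t_welch, p_welch = stats.ttest_ind_from_stats(
    mean_A, std_dev_A, n_A, mean_B, std_dev_B, n_B, equal_var=False
)

print("pooled t:", t_manual, "p:", p_manual)
print("Welch t:", t_welch, "p:", p_welch)
```

Both samples contain 20 customers, so the pooled and Welch t-statistics coincide here. Only the degrees of freedom differ, so Welch's p-value is slightly larger.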


## Z-Test

A company is testing a new advertising strategy to determine whether it increases website traffic. It randomly selects two groups of users: Group A, which is exposed to the new advertising strategy, and Group B, which is not exposed and serves as the control group. The company wants to know whether there is a significant difference in the average number of website visits between the two groups.

| Group | Average website visits | Standard deviation | Sample size |
|:------|:----------------------:|:------------------:|:-----------:|
| A (exposed) | 120 | 15 | 50 |
| B (control) | 110 | 20 | 50 |

In [5]:
mean_A = 120
std_dev_A = 15
sample_size_A = 50

mean_B = 110
std_dev_B = 20
sample_size_B = 50

In [6]:
# Standard error of the difference in means (sample SDs stand in for population SDs)
standard_error_diff = np.sqrt((std_dev_A**2 / sample_size_A) + (std_dev_B**2 / sample_size_B))

# Z-score of the observed difference in means
Z_score = (mean_A - mean_B) / standard_error_diff

alpha = 0.05

# Two-sided critical value: the upper alpha/2 quantile of the standard normal
critical_Z = stats.norm.ppf(1 - alpha / 2)

if abs(Z_score) > critical_Z:
    print("Reject the null hypothesis. There is a significant difference in website visits between the two groups.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in website visits between the two groups.")

Reject the null hypothesis. There is a significant difference in website visits between the two groups.
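The cell above reports only the reject or fail-to-reject decision. The sketch below reuses the same summary statistics and also prints the Z-score and p-values. As in the cell above, the sample standard deviations stand in for the population values. That is a large-sample approximation, reasonable with 50 users per group. The one-sided p-value is an added variant: it assumes the alternative "Group A has more visits than Group B", which matches the question of whether the strategy *increases* traffic.

```python
import numpy as np
from scipy import stats

mean_A, std_dev_A, sample_size_A = 120, 15, 50
mean_B, std_dev_B, sample_size_B = 110, 20, 50

# Same unpooled standard error and Z-score as in the notebook cell
standard_error_diff = np.sqrt(std_dev_A**2 / sample_size_A + std_dev_B**2 / sample_size_B)
Z_score = (mean_A - mean_B) / standard_error_diff

# Two-sided p-value from the standard normal survival function
p_two_sided = 2 * stats.norm.sf(abs(Z_score))

# One-sided p-value for H1: mean_A > mean_B (assumed alternative)
p_one_sided = stats.norm.sf(Z_score)

print("Z-score:", Z_score)
print("two-sided p-value:", p_two_sided)
print("one-sided p-value:", p_one_sided)
```

Here Z = 10 / √12.5 ≈ 2.83, which exceeds the two-sided critical value of about 1.96.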


## F-Test

An analyst is comparing the performance of two investment portfolios over the past year and wants to determine whether there is a **significant difference** in the **mean annual returns of the two portfolios**. The annual returns for Portfolio A and Portfolio B are listed below:

| Portfolio | Returns                      |
|:-----------|:------------------------------|
| A         | 10%, 8%, 12%, 9%, 11%        |
| B         | 9%, 11%, 10%, 9%, 12%, 10%   |

Calculate the **F-statistic** of a one-way ANOVA to test whether there is a significant difference in the mean annual returns of Portfolio A and Portfolio B.

Determine if there is a significant difference based on the **F-statistic and a significance level of 0.05**.

In [7]:
from scipy.stats import f_oneway
from scipy.stats import f

In [8]:
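returns_a = [10, 8, 12, 9, 11]
returns_b = [9, 11, 10, 9, 12, 10]

# Grid of x values for plotting the F-distribution
x = np.linspace(0.01, 10, 1000)

# f_oneway is a one-way ANOVA with k = 2 groups and N = 11 observations, so
# dfn = k - 1 (between groups) and dfd = N - k (within groups)
k = 2
N = len(returns_a) + len(returns_b)
dfn = k - 1
dfd = N - k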

In [9]:
# One-way ANOVA: compares the mean returns of the two portfolios
f_statistic, p_value = f_oneway(returns_a, returns_b)

print("F-statistic:", f_statistic)
print("p-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in the annual returns between Portfolio A and Portfolio B.")
else:
    print("There is no significant difference in the annual returns between Portfolio A and Portfolio B.")

F-statistic: 0.0405040504050405
p-value: 0.8449732660946977
There is no significant difference in the annual returns between Portfolio A and Portfolio B.
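`f_oneway` performs a one-way ANOVA. Its F-statistic is the ratio of the between-group mean square to the within-group mean square, with k − 1 and N − k degrees of freedom (1 and 9 here). The sketch below reproduces that value by hand; the variable names are illustrative. It also checks a standard identity: with only two groups, the ANOVA F equals the square of the pooled two-sample t-statistic, and the two p-values agree.

```python
import numpy as np
from scipy import stats

returns_a = np.array([10, 8, 12, 9, 11], dtype=float)
returns_b = np.array([9, 11, 10, 9, 12, 10], dtype=float)
groups = [returns_a, returns_b]

all_returns = np.concatenate(groups)
grand_mean = all_returns.mean()
k = len(groups)           # number of groups
N = all_returns.size      # total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

dfn, dfd = k - 1, N - k
f_manual = (ss_between / dfn) / (ss_within / dfd)

# Upper-tail p-value of the F(dfn, dfd) distribution
p_manual = stats.f.sf(f_manual, dfn, dfd)

# With two groups, one-way ANOVA matches the pooled t-test: F = t**2
t_stat, p_t = stats.ttest_ind(returns_a, returns_b)

print("F:", f_manual, "p:", p_manual, "df:", (dfn, dfd))
print("t**2:", t_stat**2, "p:", p_t)
```

An F-test of the ratio of the two portfolios' *variances* is a different test and uses n_A − 1 and n_B − 1 degrees of freedom. The computation above, like `f_oneway`, compares means.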


## Chi-Square Test

A market researcher wants to determine if there is a **significant relationship between** the **age group** of customers and their preferred **mode of shopping** (online or in-store). The researcher collects data from a **sample of 200 customers** and tabulates the results in a contingency table, as shown below:

| Age group | Online Shopping | In-Store Shopping |
| :------------|:--------------:| :-------------:|
| Under 30 years old | 50 | 30 |
| 30-50 years old | 40 | 45 |
| Over 50 years old | 20 | 15 |

**Calculate the Chi-Square statistic** to test if there is a significant relationship between the **age group** of customers and their **preferred mode of shopping**.

Determine if there is a significant relationship based on the **Chi-Square statistic and a significance level of 0.05**.

In [10]:
from scipy.stats import chi2_contingency

In [11]:
observed = [[50, 30],
            [40, 45],
            [20, 15]]

In [12]:
# Returns the statistic, p-value, degrees of freedom, and expected counts
chi_square_statistic, p_value, dof, expected = chi2_contingency(observed)

print("Chi-Square Statistic:", chi_square_statistic)
print("p-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print("There is a significant relationship between the age group of customers and their preferred mode of shopping.")
else:
    print("There is no significant relationship between the age group of customers and their preferred mode of shopping.")

Chi-Square Statistic: 4.048892284186402
p-value: 0.13206696927038877
There is no significant relationship between the age group of customers and their preferred mode of shopping.
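The sketch below reproduces `chi2_contingency` by hand. Under independence, each expected count is (row total × column total) / grand total. The statistic sums (O − E)² / E over all cells and has (rows − 1)(columns − 1) = 2 degrees of freedom. `chi2_contingency` applies Yates' continuity correction only when there is 1 degree of freedom, so no correction is involved here. The check that every expected count is at least 5 is a common rule of thumb for the chi-square approximation. It is an addition, not part of the original analysis.

```python
import numpy as np
from scipy import stats

observed = np.array([[50, 30],
                     [40, 45],
                     [20, 15]], dtype=float)

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected counts under independence: (row total * column total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi_square_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail p-value of the chi-square(dof) distribution
p_manual = stats.chi2.sf(chi_square_manual, dof)

print("expected counts:\n", expected)
print("all expected >= 5:", bool((expected >= 5).all()))
print("Chi-Square:", chi_square_manual, "dof:", dof, "p:", p_manual)
```

All expected counts exceed 5 here (the smallest is 15.75), so the chi-square approximation is reasonable for this table.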
