### 1 Sample T test:

Suppose a restaurant claims that the average waiting time for customers is 8 minutes. The manager of the restaurant collects a random sample of 20 customer waiting times and wants to test whether the actual average waiting time differs significantly from the claimed 8 minutes.

-  Null Hypothesis (H0): The average waiting time is 8 minutes.
-  Alternative Hypothesis (H1): The average waiting time is not 8 minutes.

In [2]:
import numpy as np
from scipy.stats import ttest_1samp


waiting_times = np.array([9, 14, 8, 10, 8, 9, 8, 20, 10, 17, 9, 13, 10, 8, 9, 16, 8, 10, 18, 15])
claimed_average = 8

t_statistic, p_value = ttest_1samp(waiting_times, claimed_average)

print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The average waiting time is significantly different from 8 minutes.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 8 minutes.")


T-statistic: 4.012327507007837
P-value: 0.00074491808231056
Reject the null hypothesis: The average waiting time is significantly different from 8 minutes.
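
To make explicit what `ttest_1samp` computes, the statistic can be reproduced by hand as t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom. The following is a minimal sketch; names such as `t_manual` and `p_manual` are illustrative and not part of the original cell.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

waiting_times = np.array([9, 14, 8, 10, 8, 9, 8, 20, 10, 17, 9, 13, 10, 8, 9, 16, 8, 10, 18, 15])
claimed_average = 8

n = waiting_times.size
sample_mean = waiting_times.mean()
sample_std = waiting_times.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (sample_mean - claimed_average) / (sample_std / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * t.sf(abs(t_manual), df=n - 1)

# Compare against the library result
t_scipy, p_scipy = ttest_1samp(waiting_times, claimed_average)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

Note the `ddof=1` argument: NumPy's default is the population standard deviation, which would give a slightly different statistic.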


Suppose a company wants to test the claim that its batteries last more than 40 hours. It takes a random sample of 20 batteries and finds a mean battery life of 44.9 hours with a standard deviation of 8.9 hours. At a significance level of 0.05, a one-sample t-test can determine whether the sample mean is statistically different from 40 hours.

-  Null Hypothesis (H0): The mean battery life is 40 hours.
-  Alternative Hypothesis (H1): The mean battery life is different from 40 hours.

In [5]:
import numpy as np
from scipy.stats import t

# Only summary statistics are available, not the raw battery lifetimes
sample_mean = 44.9
sample_std_dev = 8.9
sample_size = 20

hypothesized_mean = 40

if sample_std_dev == 0:
    print("Cannot perform t-test: Sample standard deviation is zero.")
else:
    # ttest_1samp needs the raw observations, so compute the statistic from the summary values
    standard_error = sample_std_dev / np.sqrt(sample_size)
    t_statistic = (sample_mean - hypothesized_mean) / standard_error
    degrees_of_freedom = sample_size - 1

    # Two-sided p-value from the t distribution
    p_value = 2 * t.sf(abs(t_statistic), degrees_of_freedom)

    print(f'T-statistic: {t_statistic:.2f}')
    print(f'P-value: {p_value:.2f}')

    alpha = 0.05
    if p_value < alpha:
        print("Reject the null hypothesis: The mean battery life is significantly different from 40 hours.")
    else:
        print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 40 hours.")


T-statistic: 2.46
P-value: 0.02
Reject the null hypothesis: The mean battery life is significantly different from 40 hours.
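
The company's claim is directional ("more than 40 hours"), so a one-sided alternative (H1: mean > 40) may be closer to what it wants to show. That framing is an assumption and not what the hypotheses above state. A minimal sketch computing both p-values from the same summary statistics:

```python
import math
from scipy.stats import t

sample_mean, sample_std_dev, sample_size = 44.9, 8.9, 20
hypothesized_mean = 40

# t-statistic from summary statistics
t_statistic = (sample_mean - hypothesized_mean) / (sample_std_dev / math.sqrt(sample_size))
df = sample_size - 1

# One-sided (upper-tail) p-value: P(T >= t) under H0
p_one_sided = t.sf(t_statistic, df)
# Two-sided p-value for comparison
p_two_sided = 2 * t.sf(abs(t_statistic), df)

print(f"t = {t_statistic:.3f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

Because the t-statistic is positive, the one-sided p-value is exactly half the two-sided one.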


### 2 Sample T test:

Suppose a company wants to compare the sales performance of two sales teams, A and B, over the last quarter. The company has the following data:

- Group A: Sample size (n1) = 25, Sample mean (x1) = 120, Sample standard deviation (s1) = 15
- Group B: Sample size (n2) = 30, Sample mean (x2) = 130, Sample standard deviation (s2) = 20

-  Null Hypothesis (H0): The mean sales of the two teams are equal.
-  Alternative Hypothesis (H1): The mean sales of the two teams are not equal.

In [6]:
from scipy.stats import ttest_ind_from_stats

# Group A data
n1 = 25
x1 = 120
s1 = 15

# Group B data
n2 = 30
x2 = 130
s2 = 20

# Welch's t-test computed directly from the summary statistics;
# no raw observations are needed, so the result is the same on every run
t_statistic, p_value = ttest_ind_from_stats(mean1=x1, std1=s1, nobs1=n1,
                                            mean2=x2, std2=s2, nobs2=n2,
                                            equal_var=False)


print(f'T-statistic: {t_statistic:.2f}')
print(f'P-value: {p_value:.2f}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The mean sales of the two teams are significantly different.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in mean sales.")


T-statistic: -2.12
P-value: 0.04
Reject the null hypothesis: The mean sales of the two teams are significantly different.
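
Welch's t-statistic and its degrees of freedom can be worked out directly from the summary statistics of the two sales teams, without any raw observations. The sketch below is illustrative: `welch_t_from_stats` is a hypothetical helper written for this example, not a library function.

```python
import math
from scipy.stats import t

def welch_t_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Hypothetical helper: Welch's t-test from summary statistics."""
    # Squared standard error contributed by each group
    v1, v2 = std1**2 / n1, std2**2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Two-sided p-value
    p_value = 2 * t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Group A: n=25, mean=120, sd=15; Group B: n=30, mean=130, sd=20
t_stat, df, p_value = welch_t_from_stats(120, 15, 25, 130, 20, 30)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_value:.3f}")
```

The degrees of freedom are generally not an integer, which is expected for Welch's test.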


A company wants to compare the productivity of two different manufacturing plants. It collects data on the number of units produced by each plant over a period of time and wants to determine whether there is a significant difference in their productivity.

Plant A:
- Sample size (n1) = 2
- Sample mean (x1) = 100 units per day
- Sample standard deviation (s1) = 5 units

Plant B:
- Sample size (n2) = 30
- Sample mean (x2) = 120 units per day
- Sample standard deviation (s2) = 7 units

-  Null Hypothesis (H0): The productivity of the two manufacturing plants is equal.
-  Alternative Hypothesis (H1): The productivity of the two manufacturing plants is not equal.

In [8]:
from scipy.stats import ttest_ind_from_stats

# Plant A data
n1 = 2
x1 = 100
s1 = 5

# Plant B data
n2 = 30
x2 = 120
s2 = 7

# Welch's t-test computed directly from the summary statistics
t_statistic, p_value = ttest_ind_from_stats(mean1=x1, std1=s1, nobs1=n1,
                                            mean2=x2, std2=s2, nobs2=n2,
                                            equal_var=False)

print(f'T-statistic: {t_statistic:.2f}')
print(f'P-value: {p_value:.2f}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The productivity of the two manufacturing plants is significantly different.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in productivity.")

T-statistic: -5.32
P-value: 0.08
Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in productivity.


### 1 Sample Proportion Test:

A company claims to have a customer satisfaction rate of 80%. To validate this claim, the company collects feedback from a random sample of 200 customers and finds that 150 of them are satisfied.

- Null Hypothesis (H0): The customer satisfaction rate is 80%.
- Alternative Hypothesis (H1): The customer satisfaction rate is different from 80%.

In [10]:
pip install statsmodels


In [11]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

total_customers = 200
satisfied_customers = 150
expected_proportion = 0.80  

z_statistic, p_value = proportions_ztest(satisfied_customers, total_customers, expected_proportion)

print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The customer satisfaction rate is significantly different from 80%.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 80%.")


Z-statistic: -1.6329931618554536
P-value: 0.1024704348597491
Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 80%.
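
The z-statistic printed above corresponds to a standard error built from the sample proportion (0.75). Many textbooks instead build it from the hypothesized proportion (0.80), and the two choices give slightly different results. The sketch below computes both by hand with the standard library; the names are illustrative. As far as I know, `proportions_ztest` exposes this choice through its `prop_var` argument, but treat that as something to confirm in the statsmodels documentation.

```python
import math
from statistics import NormalDist

count, nobs, p0 = 150, 200, 0.80
p_hat = count / nobs

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Standard error from the sample proportion (matches the output above)
se_sample = math.sqrt(p_hat * (1 - p_hat) / nobs)
z_sample = (p_hat - p0) / se_sample

# Standard error from the hypothesized proportion (textbook form)
se_null = math.sqrt(p0 * (1 - p0) / nobs)
z_null = (p_hat - p0) / se_null

print(f"sample-variance z = {z_sample:.4f}, p = {two_sided_p(z_sample):.4f}")
print(f"null-variance z   = {z_null:.4f}, p = {two_sided_p(z_null):.4f}")
```

Neither version rejects the null hypothesis at alpha = 0.05 for these data, although the null-variance form gives a smaller p-value.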


A manufacturer claims that only 8% of its products are defective. To test this claim, a random sample of 200 products is taken, and 16 of them are found to be defective.

- Null Hypothesis (H0): The proportion of defective products is 8%.
- Alternative Hypothesis (H1): The proportion of defective products is different from 8%.

In [12]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

total_products = 200
defective_products = 16  
expected_proportion = 0.08


z_statistic, p_value = proportions_ztest(defective_products, total_products, expected_proportion)
print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The proportion of defective products is significantly different from 8%.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 8%.")


Z-statistic: 0.0
P-value: 1.0
Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference from 8%.


### 2 Sample Proportion Test:

A company wants to compare customer satisfaction rates between two regions (Region A and Region B). It collects feedback from customers in each region and categorizes each response as satisfied or unsatisfied.

Region A:
- Satisfied: 400; Unsatisfied: 100

Region B:
- Satisfied: 350; Unsatisfied: 150

-  Null Hypothesis (H0): The satisfaction rates are the same for both regions
-  Alternative Hypothesis (H1): The satisfaction rates are different for the two regions

In [13]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Data for Region A
satisfied_A = 400
unsatisfied_A = 100
total_A = satisfied_A + unsatisfied_A

# Data for Region B
satisfied_B = 350
unsatisfied_B = 150
total_B = satisfied_B + unsatisfied_B

count = np.array([satisfied_A, satisfied_B])
nobs = np.array([total_A, total_B])
z_statistic, p_value = proportions_ztest(count, nobs, alternative='two-sided')

print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The satisfaction rates are significantly different for the two regions.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in satisfaction rates.")


Z-statistic: 3.6514837167011107
P-value: 0.0002607296328553129
Reject the null hypothesis: The satisfaction rates are significantly different for the two regions.
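
The two-sample z-statistic above can be reproduced by hand using the pooled proportion, which estimates the common satisfaction rate under H0. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

satisfied = (400, 350)   # Region A, Region B
totals = (500, 500)

p_a = satisfied[0] / totals[0]
p_b = satisfied[1] / totals[1]

# Pooled proportion: overall satisfaction rate assuming both regions share one rate
p_pooled = sum(satisfied) / sum(totals)

# Standard error of the difference under H0
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / totals[0] + 1 / totals[1]))

z_manual = (p_a - p_b) / se_pooled
p_manual = 2 * (1 - NormalDist().cdf(abs(z_manual)))

print(f"z = {z_manual:.4f}, p = {p_manual:.6f}")
```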


A manufacturing company sources components from two different suppliers. The company wants to test whether the proportion of defective products differs between the components supplied by Supplier A and Supplier B. In samples of 200 components from each supplier, 20 from Supplier A and 15 from Supplier B are defective.

-  Null Hypothesis (H0): The proportions of defective products from Supplier A and Supplier B are the same.
-  Alternative Hypothesis (H1): The proportions of defective products from Supplier A and Supplier B are different.

In [14]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Data for Supplier A
defective_A = 20
total_A = 200

# Data for Supplier B
defective_B = 15
total_B = 200


count = np.array([defective_A, defective_B])
nobs = np.array([total_A, total_B])

z_statistic, p_value = proportions_ztest(count, nobs, alternative='two-sided')


print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The proportions of defective products from Supplier A and Supplier B are significantly different.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in proportions of defective products.")


Z-statistic: 0.884747910407618
P-value: 0.37629260924305163
Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in proportions of defective products.


### ANOVA (Analysis of Variance) Test:

A marketing team implements three different strategies to promote a product. The company wants to know if there 
are significant differences in the average conversion rates resulting from these strategies.

Strategy A: 20%, 25%, 18%, 22%, 28%

Strategy B: 15%, 18%, 16%, 20%, 25%

Strategy C: 10%, 12%, 8%, 15%, 14%

-  Null Hypothesis (H0): The mean conversion rates are equal for all three strategies.
-  Alternative Hypothesis (H1): At least one strategy has a different mean conversion rate.

In [15]:
import numpy as np
from scipy.stats import f_oneway


strategy_a = np.array([20, 25, 18, 22, 28])
strategy_b = np.array([15, 18, 16, 20, 25])
strategy_c = np.array([10, 12, 8, 15, 14])

f_statistic, p_value = f_oneway(strategy_a, strategy_b, strategy_c)

print(f'F-statistic: {f_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in the mean conversion rates for the three strategies.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in mean conversion rates.")


F-statistic: 11.340050377833752
P-value: 0.001716348097449154
Reject the null hypothesis: There is a significant difference in the mean conversion rates for the three strategies.
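
The F-statistic reported by `f_oneway` is the ratio of between-group to within-group mean squares. The sketch below rebuilds it step by step for the three strategies; the variable names are illustrative and not part of the original cell.

```python
import numpy as np
from scipy.stats import f

groups = [
    np.array([20, 25, 18, 22, 28]),  # Strategy A
    np.array([15, 18, 16, 20, 25]),  # Strategy B
    np.array([10, 12, 8, 15, 14]),   # Strategy C
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

print(f"F = {f_manual:.4f}, p = {p_manual:.6f}")
```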


A company has three different departments (Sales, Marketing, and Operations). The management wants to assess 
if there are any significant differences in the average productivity levels (e.g., sales per employee) 
across these departments.

Sales Department: $100,000, $120,000, $90,000, $110,000, $95,000
  
Marketing Department: $80,000, $85,000, $88,000, $92,000, $78,000
   
Operations Department: $75,000, $80,000, $72,000, $85,000, $88,000

-  Null Hypothesis (H0): The mean productivity levels are equal across all three departments.
-  Alternative Hypothesis (H1): At least one department has a different mean productivity level.

In [16]:
import numpy as np
from scipy.stats import f_oneway

sales = np.array([100000, 120000, 90000, 110000, 95000])
marketing = np.array([80000, 85000, 88000, 92000, 78000])
operations = np.array([75000, 80000, 72000, 85000, 88000])

f_statistic, p_value = f_oneway(sales, marketing, operations)

print(f'F-statistic: {f_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in the mean productivity levels across the departments.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in mean productivity levels.")


F-statistic: 9.994601889338732
P-value: 0.0027865503411931927
Reject the null hypothesis: There is a significant difference in the mean productivity levels across the departments.


### Paired Test:

A company introduces a new training program for employees to improve their performance. The company measures 
the performance of each employee before and after the training to determine if there is a significant improvement.

Before Training: 75, 82, 70, 78, 80
    
After Training: 85, 88, 75, 90, 92

-  Null Hypothesis (H0): The mean performance before training is equal to the mean performance after training.
-  Alternative Hypothesis (H1): The mean performance after training is greater than the mean performance before training.

In [17]:
import numpy as np
from scipy.stats import ttest_rel

before_training = np.array([75, 82, 70, 78, 80])

after_training = np.array([85, 88, 75, 90, 92])

# One-sided paired test matching H1: mean(after) > mean(before)
t_statistic, p_value = ttest_rel(after_training, before_training, alternative='greater')

print(f'T-statistic: {t_statistic:.4f}')
print(f'P-value: {p_value:.4f}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant improvement in performance after training.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant improvement in performance.")


T-statistic: 6.0678
P-value: 0.0019
Reject the null hypothesis: There is a significant improvement in performance after training.
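
A paired t-test is equivalent to a one-sample t-test on the per-employee differences, tested against zero. The sketch below checks this with the one-sided alternative stated in H1 (after > before). It assumes a SciPy version in which `ttest_1samp` and `ttest_rel` accept the `alternative` argument (1.6 or later, to my knowledge).

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

before_training = np.array([75, 82, 70, 78, 80])
after_training = np.array([85, 88, 75, 90, 92])

# Improvement for each employee
differences = after_training - before_training

# One-sample test on the differences: H1 is mean difference > 0
t_diff, p_diff = ttest_1samp(differences, 0, alternative='greater')

# Paired test with the same one-sided alternative
t_rel, p_rel = ttest_rel(after_training, before_training, alternative='greater')

print(np.isclose(t_diff, t_rel), np.isclose(p_diff, p_rel))
```

The argument order matters for the sign: passing `after_training` first makes an improvement show up as a positive t-statistic.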


A manufacturing company introduces a new quality control process to improve the performance of its suppliers.
The company measures the defect rates from suppliers before and after implementing the new quality 
control measures.

Before (Defect Rates in Percentage): 3%, 5%, 4%, 6%, 2%

After (Defect Rates in Percentage): 1%, 2%, 3%, 1.5%, 2%

-  Null Hypothesis (H0): The mean defect rates from suppliers before and after implementing the new quality control process are equal.
-  Alternative Hypothesis (H1): There is a significant difference in the mean defect rates before and after implementing the new quality control process.

In [18]:
import numpy as np
from scipy.stats import ttest_rel

before_quality_control = np.array([3, 5, 4, 6, 2])

after_quality_control = np.array([1, 2, 3, 1.5, 2])

t_statistic, p_value = ttest_rel(before_quality_control, after_quality_control)

print(f'T-statistic: {t_statistic}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in the mean defect rates before and after implementing the new quality control process.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in mean defect rates.")


T-statistic: 2.6887744785908154
P-value: 0.05472761166714993
Fail to reject the null hypothesis: There is not enough evidence to claim a significant difference in mean defect rates.


### Chi-Square Test:

A manufacturing company wants to investigate whether there is a significant association between product defects (e.g., Yes/No) and different production shifts (e.g., Morning, Afternoon, Night).

-  Null Hypothesis (H0): There is no significant association between product defects and production shifts.
-  Alternative Hypothesis (H1): There is a significant association between product defects and production shifts.

In [19]:
import numpy as np
from scipy.stats import chi2_contingency


contingency_table = np.array([[10, 5, 8],  # Yes
                              [20, 15, 12]])  # No

chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

print(f'Chi-squared statistic: {chi2_stat}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between product defects and production shifts.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant association.")


Chi-squared statistic: 1.0252852297255628
P-value: 0.5989107951793198
Fail to reject the null hypothesis: There is not enough evidence to claim a significant association.
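
The chi-squared statistic compares the observed counts with the counts expected if defects and shifts were independent, where each expected count is (row total x column total) / grand total. A minimal sketch of that calculation follows; the variable names are illustrative. No continuity correction is involved here, since `chi2_contingency` applies Yates' correction only when the table has one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([[10, 5, 8],     # Defect: Yes
                     [20, 15, 12]])  # Defect: No

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)

# Expected counts under independence
expected = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof)

print(f"chi2 = {chi2_manual:.4f}, dof = {dof}, p = {p_manual:.4f}")
```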


A customer service department collects feedback ratings (e.g., Excellent, Good, Poor) from customers using
different service channels (e.g., Phone, Email, Live Chat). The company wants to test if there is a significant 
association between customer feedback ratings and service channels.

-  Null Hypothesis (H0): There is no significant association between customer feedback ratings and service channels.
-  Alternative Hypothesis (H1): There is a significant association between customer feedback ratings and service channels.

In [20]:
import numpy as np
from scipy.stats import chi2_contingency

contingency_table = np.array([[10, 15, 5],  # Phone
                              [20, 25, 10],  # Email
                              [15, 10, 5]])  # Live Chat


chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

print(f'Chi-squared statistic: {chi2_stat}')
print(f'P-value: {p_value}')

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant association between customer feedback ratings and service channels.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to claim a significant association.")


Chi-squared statistic: 2.361952861952861
P-value: 0.669513633511435
Fail to reject the null hypothesis: There is not enough evidence to claim a significant association.
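
The chi-squared approximation is usually considered reliable only when the expected counts are not too small. A common rule of thumb, assumed here and not stated in the original example, is that every expected count should be at least 5. The sketch below checks this using the `expected` array that `chi2_contingency` returns.

```python
import numpy as np
from scipy.stats import chi2_contingency

contingency_table = np.array([[10, 15, 5],   # Phone
                              [20, 25, 10],  # Email
                              [15, 10, 5]])  # Live Chat

chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

# Rule-of-thumb threshold (an assumption; some references use other cut-offs)
min_expected_required = 5
smallest_expected = expected.min()

print(f"Smallest expected count: {smallest_expected:.2f}")
if smallest_expected >= min_expected_required:
    print("All expected counts meet the rule of thumb.")
else:
    print("Some expected counts are small; the chi-squared approximation may be unreliable.")
```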
