<a href="https://colab.research.google.com/github/basava-999/Data-Science/blob/main/statistical_tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from scipy import stats


## **( 1 ). Levene's Test**

**Null Hypothesis ::** *All groups have equal variances*

**Alternative Hypothesis ::** *At least one group has a variance different from the rest*

**It is mostly used to check the equal-variance assumption required by other statistical tests, such as ANOVA and the t-test.**
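
For reference, the statistic that the manual computation in this section works toward can be written as follows. This is the standard mean-centred form of Levene's test; the symbols $Y_{ij}$ (observation $j$ in group $i$), $n_i$ (group size), $K$ (number of groups) and $N$ (total sample size) are notation introduced here, not taken from the notebook.

$$
W = \frac{N - K}{K - 1} \cdot
\frac{\sum_{i=1}^{K} n_i \left( \bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot} \right)^2}
     {\sum_{i=1}^{K} \sum_{j=1}^{n_i} \left( Z_{ij} - \bar{Z}_{i\cdot} \right)^2},
\qquad Z_{ij} = \left| Y_{ij} - \bar{Y}_{i\cdot} \right|
$$

Here $\bar{Z}_{i\cdot}$ is the mean absolute deviation within group $i$ and $\bar{Z}_{\cdot\cdot}$ is the mean of all $Z_{ij}$. Under the null hypothesis, $W$ approximately follows an $F$ distribution with $K - 1$ and $N - K$ degrees of freedom.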

In [None]:
# random groups data
a = np.random.normal( 60, 5, 100)
b = np.random.normal( 75, 10, 100)
c = np.random.normal( 90, 2, 100)

# K, N values
K = 3
N = len(a) + len(b) + len(c)

# means of each group
mean_a = np.mean(a)
mean_b = np.mean(b)
mean_c = np.mean(c)

# absolute deviations
abs_dev_a = np.abs( a - mean_a )
abs_dev_b = np.abs( b - mean_b)
abs_dev_c = np.abs( c - mean_c)

# means of absolute deviations
mean_abs_dev_a = np.mean( abs_dev_a )
mean_abs_dev_b = np.mean( abs_dev_b )
mean_abs_dev_c = np.mean( abs_dev_c )

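# grand mean of all absolute deviations (weighted by group size,
# so it stays correct when the groups have different lengths)
grand_mean = ( len(a)*mean_abs_dev_a + len(b)*mean_abs_dev_b + len(c)*mean_abs_dev_c ) / N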

# calculation of numerator
numerator = len(a)*(mean_abs_dev_a - grand_mean)**2 + len(b)*(mean_abs_dev_b - grand_mean)**2 + len(c)*(mean_abs_dev_c - grand_mean)**2

# denominator calculation
d_a = np.sum( ( abs_dev_a - mean_abs_dev_a)**2 )
d_b = np.sum( ( abs_dev_b - mean_abs_dev_b)**2 )
d_c = np.sum( ( abs_dev_c - mean_abs_dev_c)**2 )

denominator = d_a + d_b + d_c

# df1, df2
df1 = K - 1
df2 = N - K

# Test statistics
statistics = ( df2 * numerator ) / ( df1 * denominator )

# p_value at df1, df2 & statistics
p_value = stats.f.sf( statistics, df1, df2)

statistics, p_value

(np.float64(76.16665736850754), np.float64(1.985604402173233e-27))

**scipy.stats.levene( a, b, c, center = 'mean' )**

In [None]:
stat , p = stats.levene( a, b, c, center = 'mean')
stat, p

(np.float64(68.91151441122352), np.float64(2.599686631568808e-25))
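
When both computations run on the same data, the manual statistic and `stats.levene(..., center = 'mean')` should agree. The sketch below wraps the manual steps in a function so that this can be checked directly. The name `levene_manual`, the seed, and the unequal group sizes are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def levene_manual(*groups):
    """Levene's test with mean centering, computed step by step (illustrative helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([len(g) for g in groups])
    n_total = sizes.sum()

    # absolute deviations of each observation from its own group mean
    abs_devs = [np.abs(g - g.mean()) for g in groups]
    group_means = np.array([z.mean() for z in abs_devs])

    # grand mean over all absolute deviations (implicitly size-weighted)
    grand_mean = np.concatenate(abs_devs).mean()

    between = np.sum(sizes * (group_means - grand_mean) ** 2)
    within = sum(np.sum((z - m) ** 2) for z, m in zip(abs_devs, group_means))

    df1, df2 = k - 1, n_total - k
    w = (df2 * between) / (df1 * within)
    p = stats.f.sf(w, df1, df2)
    return w, p


rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
x = rng.normal(60, 5, 80)
y = rng.normal(75, 10, 100)
z = rng.normal(90, 2, 120)

w_manual, p_manual = levene_manual(x, y, z)
w_scipy, p_scipy = stats.levene(x, y, z, center='mean')

print(w_manual, w_scipy)
print(np.isclose(w_manual, w_scipy), np.isclose(p_manual, p_scipy))
```

Fixing the random seed before generating the groups keeps the manual and SciPy cells comparable from run to run.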

## **( 2 ). Z Test**

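The Z-test is a statistical hypothesis test that determines whether there is a significant difference between a sample mean and a population mean, or between the means of two samples.

It is commonly used when the population is normally distributed and the population standard deviation is known, or when the sample size is large enough (typically n > 30) for the sample mean to be approximately normal.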
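
As a compact reference, the test statistics used in the one-sample and two-sample cells of this section can be written as below. The symbols ($\bar{x}$ for a sample mean, $\mu_0$ for the population mean, $\sigma$ for the population standard deviation, $s_1, s_2$ for sample standard deviations, $\Phi$ for the standard normal CDF) are notation chosen here for illustration.

$$
Z_{\text{one-sample}} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad
Z_{\text{two-sample}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

$$
p_{\text{two-sided}} = 2 \left( 1 - \Phi\left( |Z| \right) \right)
$$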

###### **Types of Z-Tests**

*One-Sample Z-Test:* Used to compare the sample mean to a known population mean.


*   Comparing the average height of students in a class with the national average.

*   Comparing the mean weight of a sample of fruits to the standard weight of the fruit.

*Two-Sample Z-Test:* Used to compare the means of two independent samples.


*   Comparing the average sales of two stores to determine if one store is performing better than the other.

*   Comparing the mean exam scores of two different classrooms.


*Z-Test for Proportions:* Used to test if the proportion of a sample differs significantly from a population proportion.

*   Checking if the proportion of defective products from a factory is above a certain threshold.

*   Determining whether the proportion of people in a city supporting a particular policy is different from the national average.
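
The proportion case can be sketched in the same manual style as the other tests. The example below is a one-sided, one-proportion Z-test for the defective-products scenario. The counts (18 defective out of 200) and the 5% threshold are made-up numbers, chosen only to illustrate the calculation.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 18 defective items in a sample of 200,
# tested against an assumed threshold proportion of 5%.
defective, n = 18, 200
p0 = 0.05

# H0 : true proportion <= p0
# H1 : true proportion >  p0
p_hat = defective / n

# the standard error under H0 uses the hypothesised proportion p0
se = np.sqrt(p0 * (1 - p0) / n)
z_prop = (p_hat - p0) / se

# one-sided (upper-tail) p-value
p_value_prop = stats.norm.sf(z_prop)

print(f'Statistics : {z_prop:.4f}\nP_value    : {p_value_prop:.4f}')
```

The normal approximation behind this test is usually considered reasonable when both `n * p0` and `n * (1 - p0)` are at least about 10.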

In [None]:
# One-Sample Z-Test

np.random.seed( 9 )
# H0 : no significant difference
# H1 : significant difference exists


data = np.random.normal( 52, 15, 100)      # sample data: mean = 52, std = 15, n = 100

population_mean = 60
population_std  = 9.45

sample_mean = np.mean( data )
sample_len  = len( data )

statistics  = ( sample_mean - population_mean ) / ( population_std / np.sqrt(sample_len))  # Z - score
p_value     =  2 * ( 1 - stats.norm.cdf( np.abs( statistics)) )

print( f'Statistics : { round( statistics, 4) }\nP_value    : { p_value }')

alpha = 0.05

if p_value >= alpha:
  print( '\n\nFailed to Reject Null Hypothesis')
  print( 'There is no significant difference between sample mean and population mean')
else:
  print( '\n\nNull Hypothesis Rejected')
  print( 'There is significant difference between sample mean and population mean')

Statistics : -8.6696
P_value    : 0.0


Null Hypothesis Rejected
There is significant difference between sample mean and population mean
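
The p-value prints as exactly `0.0` because `1 - stats.norm.cdf(...)` loses all precision once the CDF rounds to 1.0. One possible alternative is the survival function `stats.norm.sf`, which evaluates the upper tail directly. The helper name `one_sample_z_test` below is hypothetical and is not part of SciPy.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(data, population_mean, population_std):
    """Two-sided one-sample Z-test (illustrative helper, not a SciPy function)."""
    data = np.asarray(data, dtype=float)
    z = (data.mean() - population_mean) / (population_std / np.sqrt(len(data)))
    # sf(x) equals 1 - cdf(x) but avoids the cancellation that produces 0.0
    p = 2 * stats.norm.sf(np.abs(z))
    return z, p


np.random.seed(9)
data = np.random.normal(52, 15, 100)

z, p = one_sample_z_test(data, population_mean=60, population_std=9.45)
print(f'Statistics : {z:.4f}\nP_value    : {p:.3e}')
```

The conclusion of the test does not change; only the reported p-value becomes a small positive number instead of zero.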


#### Applications of One-Sample Z-Test

1. **Manufacturing**: Testing if the average lifespan of bulbs (sample) differs from the claimed 1,000 hours.  
2. **Medical Research**: Checking if the average weight loss in patients after taking a drug differs from the expected 5 kg.  
3. **Education**: Verifying if the average math test score in a class is different from the national average of 70.  
4. **Environmental Studies**: Evaluating if the average AQI in a city differs from the reported safe limit of 50.  

In [None]:
# two-sample Z-Test

np.random.seed( 10 )
# H0 : no significant difference between two groups
# H1 : significant difference exists

sample_1 = np.random.normal( 85, 7, 100)
sample_2 = np.random.normal( 78, 3, 100)

sample_1_mean, sample_2_mean = np.mean( sample_1 ), np.mean( sample_2 )
sample_1_std, sample_2_std   = np.std( sample_1 ), np.std( sample_2)
n1, n2 = len( sample_1 ), len( sample_2 )

numerator   = sample_1_mean - sample_2_mean
denominator = np.sqrt( (sample_1_std**2 / n1) + (sample_2_std**2 / n2) )

statistics_2 = numerator / denominator
p_value_2    = 2 * ( 1 - stats.norm.cdf( np.abs( statistics_2 )))

print( f'Statistics : { round( statistics_2, 4) }\nP_value    : { p_value_2 }')

alpha = 0.05

if p_value_2 >= alpha:
  print( '\n\nFailed to Reject Null Hypothesis')
  print( 'There is no significant difference between two groups')
else:
  print( '\n\nNull Hypothesis Rejected')
  print( 'There is significant difference between two groups')


Statistics : 9.9483
P_value    : 0.0


Null Hypothesis Rejected
There is significant difference between two groups
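
Note that `np.std` defaults to `ddof = 0` (the population formula). When the standard deviations are estimated from the samples themselves, `ddof = 1` is the more usual choice, although with 100 observations per group the difference is small. A hedged variant is sketched below. The helper name `two_sample_z_test` is hypothetical, and it uses `stats.norm.sf` so the p-value does not round to zero.

```python
import numpy as np
from scipy import stats


def two_sample_z_test(x, y):
    """Two-sided two-sample Z-test (illustrative helper, not a SciPy function)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # ddof=1 gives the unbiased sample variance (np.var defaults to ddof=0)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = (x.mean() - y.mean()) / se
    p = 2 * stats.norm.sf(np.abs(z))
    return z, p


np.random.seed(10)
sample_1 = np.random.normal(85, 7, 100)
sample_2 = np.random.normal(78, 3, 100)

z, p = two_sample_z_test(sample_1, sample_2)
print(f'Statistics : {z:.4f}\nP_value    : {p:.3e}')
```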



#### Applications of Two-Sample Z-Test

1. **Medical Research**: Comparing the mean blood pressure reduction in two groups after administering different medications.  
2. **Marketing Research**: Testing whether two different advertising campaigns lead to different average sales.  
3. **Education**: Comparing the average test scores of students in two different schools to determine if there’s a significant difference in performance.  
4. **Manufacturing**: Checking whether machines A and B produce products with different average weights.