# Hypothesis testing

Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on a sample of data. In simple terms, it helps us determine if an observed effect or difference is likely to be real and not just due to random chance.

# Key steps in hypothesis testing


State the Hypotheses:

Null Hypothesis ($H_0$): This hypothesis usually represents the default or no-effect scenario.

Alternative Hypothesis ($H_1$): This is what you are trying to test or find evidence for.

For example:

$H_0$: The average weight of the cookies is 50 grams.

$H_1$: The average weight of the cookies is different from 50 grams.

Choose the Significance Level ($\alpha$):

Typically, $\alpha$ is set to 0.05, representing a 5% chance of rejecting the null hypothesis when it is true (a Type I error). This is the threshold for considering results statistically significant.

Collect and Analyze Data:

Collect a sample of data relevant to your hypothesis.

Use statistical methods to analyze the data and calculate relevant statistics (e.g., mean, standard deviation).

Select the Appropriate Statistical Test:

The choice of test depends on factors such as the type of data (e.g., categorical or numerical) and the nature of the hypotheses. Common tests include t-tests, chi-square tests, and ANOVA.

Calculate the Test Statistic and P-value:

The test statistic is a numerical value calculated from the data that measures how far the observed results are from what the null hypothesis predicts.

The p-value is the probability of observing a result as extreme as, or more extreme than, the one obtained, assuming the null hypothesis is true.

Make a Decision:

If the p-value is less than $\alpha$, you reject the null hypothesis.

If the p-value is greater than or equal to $\alpha$, you fail to reject the null hypothesis.

Interpret Results:

If you reject the null hypothesis, you can conclude that there is enough evidence to support the alternative hypothesis.

If you fail to reject the null hypothesis, you do not have enough evidence to support the alternative hypothesis. This does not prove that the null hypothesis is true.

Draw Conclusions:

Based on your decision, draw conclusions about the population parameter you are investigating.

Consider Limitations and Further Research:

Reflect on any limitations in your study and consider whether further research is needed.

The steps may vary slightly depending on the specific hypothesis test and the context of the study. Statistical software is often used to perform the calculations and analyze the data.
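The steps above can be sketched end to end on the cookie example. This is a minimal illustration using only the Python standard library; the sample size, sample mean, and population standard deviation are hypothetical values chosen for the example, and the helper name `two_sided_z_test` is not from any library.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: mean weight == 50 g, H1: mean weight != 50 g
mu_0 = 50.0

# Step 2: significance level
alpha = 0.05

# Step 3: summary of the collected data (hypothetical values)
n = 36              # number of cookies weighed
sample_mean = 50.4  # average weight in grams
sigma = 1.2         # population standard deviation, assumed known


def two_sided_z_test(sample_mean, mu_0, sigma, n):
    """Return (z, p_value) for H0: mean == mu_0 when sigma is known."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability of a result at least this extreme under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Steps 4 and 5: test statistic and p-value
z, p_value = two_sided_z_test(sample_mean, mu_0, sigma, n)

# Step 6: decision
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these made-up numbers the sample mean sits two standard errors above 50 grams, so the p-value falls just under 0.05.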

# Z-Test:

Used when you have a large sample size (typically n≥30) and you know the population standard deviation.
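
For reference, the one-sample z statistic can be written as follows, where $\bar{x}$ is the sample mean, $\mu_0$ is the mean stated in the null hypothesis, $\sigma$ is the known population standard deviation, and $n$ is the sample size (the symbols are introduced here for illustration):

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$

Under the null hypothesis, $z$ approximately follows a standard normal distribution, which is where the p-value comes from.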

# T-Test:

One-Sample T-Test: Used when the population standard deviation is unknown, typically with a small sample size (n<30), and you're comparing the mean of a sample to a known or hypothesized population mean.
    
Independent Samples T-Test: 
Used when comparing the means of two independent groups.
    
Paired Samples T-Test: 
Used when comparing the means of two related groups (e.g., before and after measurements).
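
As a reference sketch, the three t statistics can be written as follows. The notation is introduced here for illustration: $\bar{x}$ and $s$ are a sample mean and sample standard deviation, $n$ is a sample size, and $\bar{d}$ and $s_d$ are the mean and standard deviation of the paired differences. The independent samples form shown assumes equal variances in the two groups.

$$
t_{\text{one-sample}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
$$

$$
t_{\text{independent}} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2
$$

$$
t_{\text{paired}} = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n - 1
$$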

# Chi-Square Test:

Chi-Square Test for Independence: 

Used for categorical data to determine if there is a significant association between two categorical variables.
    
Chi-Square Goodness-of-Fit Test: 

Used to test whether the observed frequency distribution of a categorical variable matches an expected distribution.
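
A goodness-of-fit test can be sketched with the standard library alone. The counts below are hypothetical, and the p-value shortcut used here is valid only for 2 degrees of freedom, where the chi-square survival function equals `exp(-x / 2)`; for other cases a statistics library would normally be used.

```python
import math

# Hypothetical observed counts for three categories (e.g. three cookie flavours)
observed = [50, 30, 20]

# H0: all three categories are equally likely
expected_props = [1 / 3, 1 / 3, 1 / 3]

total = sum(observed)
expected = [p * total for p in expected_props]

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1
df = len(observed) - 1

# Exact p-value for df == 2 only (chi-square with 2 df is exponential with mean 2)
assert df == 2
p_value = math.exp(-chi2 / 2)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.6f}, decision: {decision}")
```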

# Regression Analysis:

Used to examine the relationship between one or more independent variables and a dependent variable.
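
As a minimal sketch of the simplest case, one independent variable and one dependent variable, the least-squares line can be fitted by hand. The data points are hypothetical, and this only estimates the slope and intercept; testing whether the slope differs from zero would need an additional t-test that is not shown here.

```python
from statistics import mean

# Hypothetical data: x is the independent variable, y the dependent variable
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)

# Sums of cross-products and squares around the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares estimates for the line y = intercept + slope * x
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```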

# How to perform Z-Test



Suppose you work for a company that produces light bulbs, and the average lifespan of the bulbs is claimed to be 1200 hours. You've taken a sample of 50 bulbs and want to test if the sample mean lifespan is significantly different from the claimed average.



![04.10.2023_14.57.35_REC.png](attachment:04.10.2023_14.57.35_REC.png)

![04.10.2023_14.58.23_REC.png](attachment:04.10.2023_14.58.23_REC.png)

![04.10.2023_14.59.03_REC.png](attachment:04.10.2023_14.59.03_REC.png)

![04.10.2023_14.59.40_REC.png](attachment:04.10.2023_14.59.40_REC.png)

![04.10.2023_15.00.09_REC.png](attachment:04.10.2023_15.00.09_REC.png)

![04.10.2023_15.00.46_REC.png](attachment:04.10.2023_15.00.46_REC.png)

![04.10.2023_15.01.31_REC.png](attachment:04.10.2023_15.01.31_REC.png)

![04.10.2023_15.02.01_REC.png](attachment:04.10.2023_15.02.01_REC.png)

In [36]:
import numpy as np
import statsmodels.api as sm

# Step 1: create sample data.
# No random seed is set, so the values (and the output below) change on every run.
sample_size = 50
mean = 1200
std_dev = 10

sample_data = np.random.normal(loc=mean, scale=std_dev, size=sample_size)

# Round the lifespans to whole hours
sample_data = np.round(sample_data).astype(int)

sample_data

# Step 2: set the claimed population mean and the significance level
pop_mean = 1200
alpha = 0.05

# Step 3: two-sided one-sample z-test of H0: mean lifespan == pop_mean
z_stats, p_value = sm.stats.ztest(sample_data, value=pop_mean)

print("Z value of Sample Data : " , z_stats)
print("P value of Sample Data : " , p_value)

# Step 4: compare the p-value with alpha
if p_value < alpha:
    print("We reject the null hypothesis: the mean bulb lifespan differs from 1200 hours")
else:
    print("We fail to reject the null hypothesis: no evidence that the mean bulb lifespan differs from 1200 hours")

Z value of Sample Data :  -1.7402086599766535
P value of Sample Data :  0.081822385556228
We fail to reject the null hypothesis: no evidence that the mean bulb lifespan differs from 1200 hours


# How to perform T-Test with simple example

![04.10.2023_15.25.58_REC.png](attachment:04.10.2023_15.25.58_REC.png)

In [38]:
from scipy import stats

# One-sample t-test

mean = 200  # weight in grams claimed by the factory

sample_data = [168, 175, 152, 160, 200, 180, 120]  # sample size = 7

# H0: mean = 200 g (claimed by the factory)
# H1: mean != 200 g (we are testing the factory's claim)

t_stats, p_value = stats.ttest_1samp(sample_data, mean)

print("T stats of Sample Data : " , t_stats)
print("P value of Sample Data : " , p_value)

alpha = 0.05

if p_value < alpha:
    print("Reject H0: the mean chocolate weight differs from the 200 g claimed by the factory")
else:
    print("Fail to reject H0: the mean weight may be equal to 200 g")

T stats of Sample Data :  -3.6903003359636672
P value of Sample Data :  0.01020509379066617
Reject H0: the mean chocolate weight differs from the 200 g claimed by the factory
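
To see where the t statistic in the one-sample example comes from, it can be recomputed by hand with the standard library. This sketch reuses the same seven weights; the variable names are chosen here for illustration, and the p-value is left to `scipy` because the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

sample_data = [168, 175, 152, 160, 200, 180, 120]
claimed_mean = 200  # weight in grams stated in H0

n = len(sample_data)
sample_mean = mean(sample_data)

# Sample standard deviation (n - 1 in the denominator)
sample_sd = stdev(sample_data)

# Standard error of the mean
standard_error = sample_sd / math.sqrt(n)

# t statistic: distance between sample mean and claimed mean, in standard errors
t_stat = (sample_mean - claimed_mean) / standard_error
df = n - 1  # degrees of freedom

print(f"sample mean = {sample_mean}, t = {t_stat:.4f}, df = {df}")
```

The result agrees with the `T stats` value printed by `stats.ttest_1samp` above.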



![04.10.2023_15.26.50_REC.png](attachment:04.10.2023_15.26.50_REC.png)

In [41]:
from scipy import stats

# Independent samples t-test: scores of two independent groups
grp_A = [78, 84, 88, 74, 79]

grp_B = [65, 52, 44, 35, 64]

# H0: both groups have the same mean score
# H1: the mean scores differ
t_stats, p_value = stats.ttest_ind(grp_A, grp_B)

print("T stats of Sample Data : " , t_stats)
print("P value of Sample Data : " , p_value)

alpha = 0.05

if p_value < alpha:
    print("Group A and Group B have different mean scores")
else:
    print("No evidence that Group A and Group B have different mean scores")

T stats of Sample Data :  4.564475554331812
P value of Sample Data :  0.0018390712773260337
Group A and Group B have different mean scores
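
The two-group statistic can be checked by hand in the same way. This sketch assumes equal variances in the two groups (the pooled form, which is the default behaviour of `stats.ttest_ind`); the variable names are chosen here for illustration.

```python
import math
from statistics import mean, variance

grp_A = [78, 84, 88, 74, 79]
grp_B = [65, 52, 44, 35, 64]

n_a, n_b = len(grp_A), len(grp_B)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * variance(grp_A) + (n_b - 1) * variance(grp_B)) / (n_a + n_b - 2)

# Standard error of the difference between the two means
standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t statistic: difference in means measured in standard errors
t_stat = (mean(grp_A) - mean(grp_B)) / standard_error
df = n_a + n_b - 2  # degrees of freedom

print(f"t = {t_stat:.4f}, df = {df}")
```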
