### Hypothesis testing

In [1]:
import pandas as pd
import scipy
import numpy as np
from scipy import stats

1. Hypothesis Formulation:
- A company claims that their new energy drink increases focus and alertness.
Formulate the null and alternative hypotheses for testing this claim.

### Null and alternate hypotheses

* $H_0$: The energy drink does not increase focus and alertness
* $H_A$: The energy drink increases focus and alertness

3. Interpreting p-values:
- In a study investigating the effectiveness of a new teaching method, the calculated
p-value is 0.03. What does this p-value indicate about the null hypothesis?

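##### A p-value of 0.03 means that, if the null hypothesis were true, a result at least as extreme as the one observed would occur about 3% of the time.
##### If the significance level $\alpha$ is 0.05, we reject the null hypothesis because p is lower than $\alpha$.
##### If $\alpha$ is 0.01, we fail to reject the null hypothesis because p is higher than $\alpha$.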
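
The decision rule above can be sketched as a small helper. The function name `decide` is hypothetical and is introduced here only for illustration.

```python
def decide(p_value, alpha):
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03, 0.05))  # p below alpha
print(decide(0.03, 0.01))  # p above alpha
```

The same p-value leads to different decisions depending on the significance level chosen before the test.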

4. Type I and Type II Errors:
- Describe a scenario in which a Type I error could occur in hypothesis testing. How
does it differ from a Type II error?

##### A Type I error is also known as a false positive: the null hypothesis is actually true, but the test rejects it.
##### Example: a non-diabetic person tests positive on a diabetes test.
##### A Type II error is known as a false negative: the null hypothesis is actually false, but the test fails to reject it. Example: a diabetic person tests negative on a diabetes test.
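
A rough simulation can make the two error types concrete. This is only a sketch: the population parameters, the effect size of 0.3, the sample size of 30, and the number of trials are all assumed values chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
alpha = 0.05
n_trials, n = 2000, 30           # assumed simulation settings

# Case 1: H0 is true (population mean really is 0).
# Every rejection here is a Type I error (false positive).
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type_1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (population mean is 0.3, an assumed effect).
# Every failure to reject here is a Type II error (false negative).
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_trials, n))
p_alt = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type_2_rate = np.mean(p_alt >= alpha)

print(type_1_rate, type_2_rate)
```

The Type I rate should land close to $\alpha$, while the Type II rate depends on the effect size and the sample size.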

5. Right-tailed Hypothesis Testing:
- A manufacturer claims that their new light bulb lasts, on average, more than 1000
hours. Conduct a right-tailed hypothesis test with a significance level of 0.05, given a
sample mean of 1050 hours and a sample standard deviation of 50 hours.

### Null hypothesis
$H_0$ : $\mu$ = 1000
### Alternate hypothesis
$H_A$ : $\mu$ > 1000

### Sample standard deviation
s = 50
### Level of significance
$\alpha$ = 0.05

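Sample mean = 1050, hypothesized mean = 1000

The sample size is not given, so the standard deviation is used directly as the standard error (equivalent to n = 1).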

z = (1050-1000)/50
z

In [5]:
p_value = (1 - stats.norm.cdf(1.0))
p_value

0.15865525393145707

In [None]:
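# Since p_value (0.159) is greater than alpha (0.05), we fail to reject the null hypothesis:
# the sample does not give sufficient evidence that the mean lifetime exceeds 1000 hours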
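
The problem statement gives no sample size, and the calculation above divides by the standard deviation itself, which matches a sample of size 1. The sketch below shows how the statistic and the decision would change for other sample sizes. The values 10 and 30 are assumptions, not part of the problem, and `right_tailed_z` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

x_bar, mu_0, s, alpha = 1050, 1000, 50, 0.05

def right_tailed_z(n):
    """z statistic and right-tailed p-value for a hypothetical sample size n."""
    z = (x_bar - mu_0) / (s / np.sqrt(n))   # standard error is s / sqrt(n)
    return z, stats.norm.sf(z)              # sf(z) = 1 - cdf(z)

for n in (1, 10, 30):                       # assumed sample sizes
    z, p = right_tailed_z(n)
    print(n, round(z, 3), p < alpha)
```

With a larger sample the standard error shrinks, so the same 50-hour difference becomes much stronger evidence against $H_0$.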

6. Two-Tailed Hypothesis Testing:
- A researcher wants to determine if there is a difference in mean exam scores between
two groups of students. Formulate the null and alternative hypotheses for this study as a
two-tailed test.

### Mean exam score of Group A = $\mu_a$
### Mean exam score of Group B = $\mu_b$

### Null hypothesis : $\mu_a$ - $\mu_b$ = 0
### Alternate hypothesis : $\mu_a$ - $\mu_b$ $\neq$ 0


7. One-sample t-test:
- A manufacturer claims that the mean weight of their cereal boxes is 500 grams. A
sample of 30 cereal boxes has a mean weight of 490 grams and a standard deviation of
20 grams. Conduct a one-sample t-test to determine if there is evidence to support the
manufacturer's claim at a significance level of 0.05.

In [6]:
# Sample mean = 490 grams
# Sample size = 30
# Sample standard deviation = 20 grams

### Null hypothesis : Mean weight of cereal boxes $\mu$ = 500
### Alternate hypothesis : Mean weight of cereal boxes $\mu$ $\neq$ 500

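In [7]:
# Standard error uses the sample standard deviation (20 grams), not the sample size
T_stat = (490 - 500)/(20/np.sqrt(30))
round(T_stat, 4)

-2.7386

In [10]:
# Two-tailed p-value with df = n - 1 = 29
round(2*stats.t.cdf(T_stat, df = 29), 2)

0.01

In [None]:
# Since the p-value is less than alpha (0.05), we reject the null hypothesis:
# the sample gives evidence that the mean weight differs from 500 grams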

8. Two-sample t-test:
- A researcher wants to compare the mean reaction times of two different groups of
participants in a driving simulation study. Group A has a mean reaction time of 0.6
seconds with a standard deviation of 0.1 seconds, while Group B has a mean reaction
time of 0.55 seconds with a standard deviation of 0.08 seconds. Conduct a two-sample
t-test to determine if there is a significant difference in mean reaction times between the
groups at a significance level of 0.01.

### Null hypothesis : Mean reaction time of group A $\mu_A$ = Mean reaction time of group B $\mu_B$ 
### Alternate hypothesis : Mean reaction time of group A $\mu_A$ $\neq$ Mean reaction time of group B $\mu_B$ 

In [14]:
# The sample sizes are not given, so assume n = 30 for each group
# uA = 0.6, S.D of A = 0.1, uB = 0.55, S.D of B = 0.08, Alpha = 0.01

In [None]:
# Calculate the t-statistic and p-value

# t_statistic, p_value = stats.ttest_ind_from_stats(mean_A, std_dev_A, n_A, mean_B, std_dev_B, n_B)

In [11]:
t_statistic, p_value = stats.ttest_ind_from_stats(0.6,0.1,30,0.55,0.08,30)

In [12]:
t_statistic

2.138497306920751

In [13]:
p_value

0.03669984926105974

In [None]:
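# Since the p-value (0.037) is greater than alpha (0.01), we fail to reject the null hypothesis:
# there is no significant difference in mean reaction times at the 0.01 level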
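
The call above uses the default pooled-variance test. Because the two groups have different standard deviations, Welch's test is a reasonable cross-check. The sketch below reuses the assumed sample size of 30 per group, which is not given in the problem.

```python
from scipy import stats

alpha = 0.01

# n = 30 per group is an assumption, not part of the problem statement.
t_welch, p_welch = stats.ttest_ind_from_stats(
    mean1=0.6, std1=0.1, nobs1=30,
    mean2=0.55, std2=0.08, nobs2=30,
    equal_var=False,   # Welch's test: variances are not pooled
)
print(t_welch, p_welch, p_welch < alpha)
```

With equal group sizes the t-statistic matches the pooled version and only the degrees of freedom differ, so the decision at $\alpha$ = 0.01 is unchanged.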

9. Process Control Example:
- A call center manager implements a new training program aimed at reducing call
waiting times. The average waiting time before the training program was 4.5 minutes, and
after the program, it is measured to be 4.0 minutes with a standard deviation of 0.8
minutes. Conduct a hypothesis test to determine if there is evidence that the training
program has reduced waiting times, using a significance level of 0.05.

### Null hypothesis : Mean waiting time before training $\mu_{bt}$ = Mean waiting time after training $\mu_{at}$
### Alternate hypothesis : Mean waiting time before training $\mu_{bt}$ > Mean waiting time after training $\mu_{at}$

In [None]:
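# The sample sizes are not given, so assume n = 30 before and after training
# The standard deviation of 0.8 is assumed to apply to both periods
# ubt = 4.5, S.D = 0.8, uat = 4, Alpha = 0.05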

In [18]:
std_error = ((0.8 ** 2) / 30 + (0.8 ** 2) / 30) ** 0.5
std_error

0.2065591117977289

In [19]:
mean_difference = 4.5 - 4
mean_difference

0.5

In [20]:
t_stat = mean_difference/std_error
t_stat

2.4206145913796355

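In [23]:
# Conservative degrees of freedom: min(n1 - 1, n2 - 1) = 29
df = 30-1
# One-tailed (right-tailed) p-value, matching the alternate hypothesis
p_value = 1-stats.t.cdf(t_stat,df)
round(p_value, 4)

0.011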

In [None]:
# Since the p-value is less than alpha (0.05), we reject the null hypothesis:
# there is evidence that the training program reduced waiting times
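
The calculation above treats the pre-training mean as a second sample. Another reading of the problem treats 4.5 minutes as a fixed historical benchmark and runs a one-sample, left-tailed t-test on the post-training data. The sketch below follows that reading and keeps the assumed sample size of 30.

```python
import numpy as np
from scipy import stats

mu_before, x_bar_after, s, alpha = 4.5, 4.0, 0.8, 0.05
n = 30                                   # assumed sample size, not given in the problem

# One-sample t statistic against the fixed benchmark of 4.5 minutes
t_one = (x_bar_after - mu_before) / (s / np.sqrt(n))

# Left tail, because the alternate hypothesis says the mean decreased
p_one = stats.t.cdf(t_one, df=n - 1)
print(round(t_one, 3), p_one < alpha)
```

Under this reading the evidence against $H_0$ is stronger, because only one source of sampling variability enters the standard error.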

10. Interpreting Results:
- After conducting a hypothesis test, the calculated p-value is 0.02. What can you
conclude about the null hypothesis based on this result, assuming a significance level of
0.05?

In [None]:
# Since the p-value of 0.02 is less than the significance level of 0.05, we reject the null hypothesis:
# the result is statistically significant at the 5% level