Suppose a company is evaluating the impact of a new training program on the productivity of its employees. Before the program, average productivity was 50 units per day, with a known population standard deviation of 5 units. After implementing the program, the company measures the productivity of a random sample of 30 employees, whose average productivity is 53 units per day. The company wants to know whether the training program has significantly improved employee productivity.

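- `Null Hypothesis (H₀)`: The training program has not increased mean productivity; the mean is still at most 50 units per day. (μ ≤ 50)
- `Alternative Hypothesis (H₁)`: The training program has increased mean productivity above 50 units per day. (μ > 50)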
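As a worked check, assuming a one-sided test at the usual α = 0.05, the z statistic follows directly from the summary values, because σ is known and only the sample mean is needed:

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{53 - 50}{5 / \sqrt{30}} = 0.6\sqrt{30} \approx 3.29
$$

The one-sided critical value is $z_{0.05} \approx 1.645$. Since $3.29 > 1.645$, H₀ is rejected, and the right-tail p-value $P(Z \ge 3.29)$ is roughly 0.0005.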

In [7]:
import numpy as np
from scipy.stats import norm

# Known population parameters before training
pm = 50             # population mean (units/day)
pop_std_dev = 5     # known population standard deviation

# Sample summary after training
sample_mean = 53    # sample mean (units/day)
n = 30              # sample size

# One-sample z-test with known sigma: z = (x_bar - mu0) / (sigma / sqrt(n))
Z_statistic = (sample_mean - pm) / (pop_std_dev / np.sqrt(n))

# Right-tailed p-value, because H1 is mu > 50
p_value = norm.sf(Z_statistic)

print("Z-statistic:", round(Z_statistic, 4))
print("p-value:", round(p_value, 4))


Z-statistic: 3.2863
p-value: 0.0005


NOTE: Since the p-value is below 0.05, we reject H₀ and conclude that the training program significantly increased mean productivity.


The company claims that its new algorithm can process a specific dataset in an average of 20 minutes, faster than the current average of 22 minutes using the standard algorithm. To validate this claim, a data scientist decides to conduct a one-sample t-test.
The data scientist collects a sample of processing times using the new algorithm. The sample consists of 10 processing times (in minutes):

Sample data: 19, 18, 21, 20, 19, 22, 18, 17, 21, 20

The data scientist wants to test whether the new algorithm significantly reduces the processing time compared to the standard average of 22 minutes. The hypotheses for this one-sided (left-tailed) t-test are:
- `Null Hypothesis (H₀)`: The mean processing time using the new algorithm is not less than 22 minutes. (μ ≥ 22)
- `Alternative Hypothesis (H₁)`: The mean processing time using the new algorithm is less than 22 minutes. (μ < 22)
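To see what `ttest_1samp` computes, the t statistic can be worked out by hand as t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom. The sketch below is a cross-check for the left-tailed hypotheses above; the variable names are illustrative, and the comparison uses SciPy's `alternative="less"` option.

```python
import numpy as np
from scipy import stats

# Processing times (minutes) with the new algorithm, from the problem statement
sample = np.array([19, 18, 21, 20, 19, 22, 18, 17, 21, 20])
mu_0 = 22  # standard algorithm's mean processing time (minutes)

n = sample.size
x_bar = sample.mean()     # sample mean
s = sample.std(ddof=1)    # sample standard deviation (n - 1 denominator)
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# Left-tailed p-value: P(T <= t) with n - 1 degrees of freedom
p_manual = stats.t.cdf(t_manual, df=n - 1)

# Cross-check against SciPy's one-sided one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(sample, mu_0, alternative="less")

print(f"mean = {x_bar}, s = {s:.4f}, t = {t_manual:.4f}, p = {p_manual:.6f}")
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

Here x̄ = 19.5 and s/√n = 0.5, so t = (19.5 − 22)/0.5 = −5.0 with 9 degrees of freedom.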

In [9]:
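from scipy.stats import ttest_1samp

pm = 22  # standard algorithm's mean processing time (minutes)
sample_data = [19, 18, 21, 20, 19, 22, 18, 17, 21, 20]

# H1 is mu < 22, so request a left-tailed (one-sided) test
t_statistic, p_value = ttest_1samp(sample_data, pm, alternative='less')



In [11]:
print("t-statistic:", t_statistic)


t-statistic: -5.0


In [12]:
print("p-value:", round(p_value, 6))

p-value: 0.000369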


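NOTE: Since the p-value is below 0.05, we reject H₀: the new algorithm's mean processing time is significantly less than 22 minutes.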

#### **Two-Sample t-test**

A two-sample t-test is often used to determine if there is a significant difference between the means of two groups.

A researcher wants to evaluate whether there is a significant difference in the average test scores of students who used two different study methods (Method A and Method B) for an exam. The researcher randomly selects two independent groups of students: one group uses Method A, and the other uses Method B. The test scores for each group are recorded as follows:

- Method A Scores: [78, 84, 92, 88, 75, 80, 85, 90, 87, 79]
- Method B Scores: [82, 88, 75, 90, 78, 85, 88, 77, 92, 80]
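With equal variances assumed, the two-sample statistic divides the difference in means by a pooled standard error: t = (x̄_A − x̄_B) / √(s_p² (1/n_A + 1/n_B)), where s_p² = ((n_A − 1)s_A² + (n_B − 1)s_B²)/(n_A + n_B − 2). The sketch below computes this by hand on the scores above; variable names are illustrative, and the result should agree with the default `stats.ttest_ind` output in this section.

```python
import numpy as np
from scipy import stats

method_a = np.array([78, 84, 92, 88, 75, 80, 85, 90, 87, 79])
method_b = np.array([82, 88, 75, 90, 78, 85, 88, 77, 92, 80])

n_a, n_b = method_a.size, method_b.size
var_a, var_b = method_a.var(ddof=1), method_b.var(ddof=1)  # sample variances

# Pooled variance assumes both groups share one population variance
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the mean difference
t_pooled = (method_a.mean() - method_b.mean()) / se

df = n_a + n_b - 2
p_two_sided = 2 * stats.t.sf(abs(t_pooled), df)  # two-sided p-value

print(f"t = {t_pooled:.4f}, df = {df}, p = {p_two_sided:.4f}")
```

The group means are 83.8 and 83.5, so the 0.3-point difference is tiny relative to its standard error.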

- Null Hypothesis ($H_0$): There is no difference in the mean test scores of students who used Method A and Method B. ($\mu_A = \mu_B$)
- Alternative Hypothesis ($H_1$): The mean test scores differ between the two study methods. ($\mu_A \neq \mu_B$)

In [19]:
import numpy as np
from scipy import stats

# Sample data for Method A and Method B
Method_A_Scores = [78, 84, 92, 88, 75, 80, 85, 90, 87, 79]
Method_B_Scores = [82, 88, 75, 90, 78, 85, 88, 77, 92, 80]

# Step 1: Shapiro-Wilk test for normality (H0: the scores come from a normal distribution)
shapiro_a = stats.shapiro(Method_A_Scores)
shapiro_b = stats.shapiro(Method_B_Scores)

print(shapiro_a)
print(shapiro_b)


ShapiroResult(statistic=np.float64(0.9634209538811345), pvalue=np.float64(0.8240632802317183))
ShapiroResult(statistic=np.float64(0.9429848795756282), pvalue=np.float64(0.5866793733496176))


In [20]:
# Step 2: Levene's test for equal variances (H0: both groups have equal variances)
levene_test = stats.levene(Method_A_Scores, Method_B_Scores)
print(levene_test.pvalue)

0.6860370886859155
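`stats.levene` defaults to `center="median"`, which is the Brown-Forsythe variant of the test; `center="mean"` gives Levene's original formulation. The sketch below runs both on the study-method scores so the choice is visible; the names `bf` and `classic` are illustrative, and 0.05 is assumed as the cut-off.

```python
from scipy import stats

# Method A / Method B scores from the problem statement
method_a = [78, 84, 92, 88, 75, 80, 85, 90, 87, 79]
method_b = [82, 88, 75, 90, 78, 85, 88, 77, 92, 80]

# Default center="median": the Brown-Forsythe variant, more robust to skewed data
bf = stats.levene(method_a, method_b)
# center="mean": Levene's original formulation
classic = stats.levene(method_a, method_b, center="mean")

for label, res in [("median (default)", bf), ("mean (classic)", classic)]:
    verdict = "no evidence against equal variances" if res.pvalue > 0.05 else "variances differ"
    print(f"{label}: W = {res.statistic:.4f}, p = {res.pvalue:.4f} -> {verdict}")
```

The median-centered version is usually preferred when normality is doubtful, which is why it is SciPy's default.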


In [21]:
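# Step 3: Student's t-test with pooled variance (equal_var=True is the default),
# justified here because Levene's p-value is above 0.05
t_statistic, p_value = stats.ttest_ind(Method_A_Scores, Method_B_Scores, equal_var=True)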
print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 0.11617981913799512
p-value: 0.9087964141018375


NOTE: Since p ≈ 0.909 > 0.05, we fail to reject H₀: there is no significant difference in mean test scores between Method A and Method B.
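The choice between Student's pooled t-test and Welch's t-test (`equal_var=False`) can be driven by the Levene result, as this section does manually. A minimal sketch of that decision, assuming α = 0.05; `compare_two_means` is a hypothetical helper, not a SciPy function. Many texts instead recommend Welch's test by default, since it stays valid when variances differ.

```python
from scipy import stats

def compare_two_means(a, b, alpha=0.05):
    """Hypothetical helper: pick Student's or Welch's t-test using Levene's test."""
    # Fail to reject equal variances -> pooled (Student) test; otherwise Welch
    equal_var = stats.levene(a, b).pvalue > alpha
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    return ("Student" if equal_var else "Welch"), result.statistic, result.pvalue

method_a = [78, 84, 92, 88, 75, 80, 85, 90, 87, 79]
method_b = [82, 88, 75, 90, 78, 85, 88, 77, 92, 80]

test_name, t_stat, p = compare_two_means(method_a, method_b)
print(test_name, round(t_stat, 4), round(p, 4))
```

For these scores Levene's p-value is well above 0.05, so the helper falls back to the pooled test and reproduces the result above.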


**Problem Statement:** A researcher wants to investigate whether different diets lead to different weight loss outcomes. Three different diet plans (Diet A, Diet B, and Diet C) are tested on groups of participants, and the weight loss (in kilograms) is recorded after a month. The researcher wants to determine if there is a significant difference in mean weight loss among the three diet plans.

**Hypotheses:**
- Null Hypothesis ($H_0$): There is no difference in the mean weight loss among the three diet plans. ($\mu_A = \mu_B = \mu_C$)
- Alternative Hypothesis ($H_1$): At least one diet plan has a different mean weight loss compared to the others.
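One-way ANOVA compares variability between group means to variability within groups: $F = \dfrac{SS_B/(k-1)}{SS_W/(N-k)}$, where $k$ is the number of groups and $N$ the total sample size. The sketch below computes this from sums of squares and checks it against `f_oneway`; `one_way_anova` is a hypothetical helper, and the data are simulated with an assumed seed, so the numbers will not match the outputs in this section.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(g.size for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # right tail of F distribution
    return f_stat, p_value

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
a = rng.normal(5, 1.5, 30)
b = rng.normal(6, 1.5, 30)
c = rng.normal(4.5, 1.5, 30)

f_manual, p_manual = one_way_anova(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(np.isclose(f_manual, f_scipy), np.isclose(p_manual, p_scipy))
```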

In [24]:
import numpy as np
from scipy import stats
from scipy.stats import f_oneway

In [25]:
# Simulated sample data: weight loss (kg) for each diet plan.
# No random seed is set, so the values and all results below change on every run.
diet_A_loss = np.random.normal(loc=5, scale=1.5, size=30)  # Mean 5 kg, std dev 1.5
diet_B_loss = np.random.normal(loc=6, scale=1.5, size=30)  # Mean 6 kg, std dev 1.5
diet_C_loss = np.random.normal(loc=4.5, scale=1.5, size=30)  # Mean 4.5 kg, std dev 1.5
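Because the cell above draws from `np.random` without a seed, the Shapiro-Wilk, Levene, and ANOVA outputs in this section change each time the notebook runs. A minimal sketch of a reproducible alternative using NumPy's `Generator` API; the seed 42 and the names `sim_A`, `sim_B`, `sim_C` are assumptions chosen so the notebook's own variables are not overwritten, and the values they produce will differ from the outputs shown here.

```python
import numpy as np

# Assumption: seed 42 is arbitrary; any fixed integer gives reproducible draws.
rng = np.random.default_rng(42)
sim_A = rng.normal(loc=5, scale=1.5, size=30)    # mean 5 kg, sd 1.5 kg
sim_B = rng.normal(loc=6, scale=1.5, size=30)    # mean 6 kg, sd 1.5 kg
sim_C = rng.normal(loc=4.5, scale=1.5, size=30)  # mean 4.5 kg, sd 1.5 kg

# A second generator with the same seed reproduces exactly the same first draw
rng_again = np.random.default_rng(42)
print(np.array_equal(sim_A, rng_again.normal(loc=5, scale=1.5, size=30)))  # True
```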

In [26]:
# Step 1: Shapiro-Wilk test for normality of each group
shapiro_a = stats.shapiro(diet_A_loss)
shapiro_b = stats.shapiro(diet_B_loss)
shapiro_c = stats.shapiro(diet_C_loss)

In [27]:
print(shapiro_a)
print(shapiro_b)
print(shapiro_c)

ShapiroResult(statistic=np.float64(0.984762641470921), pvalue=np.float64(0.9330790727509193))
ShapiroResult(statistic=np.float64(0.9477378872478013), pvalue=np.float64(0.14700179115452328))
ShapiroResult(statistic=np.float64(0.9636441423559844), pvalue=np.float64(0.38248066209643683))


In [29]:
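# Step 2: Levene's test for equal variances (H0: all groups have equal variances).
# In this run p ≈ 0.096 > 0.05, so there is no evidence against equal variances
# and the equal-variance assumption of one-way ANOVA is reasonable.
levene_test = stats.levene(diet_A_loss, diet_B_loss, diet_C_loss)
print(levene_test.pvalue)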

0.09627059203395172


In [30]:
anova_statistic, p_value = f_oneway(diet_A_loss, diet_B_loss, diet_C_loss)
print("ANOVA Statistic:", anova_statistic)
print("p-value:", p_value)

ANOVA Statistic: 21.696161937431903
p-value: 2.26790250663781e-08
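In this run the ANOVA p-value is far below 0.05, so H₀ is rejected; ANOVA alone does not say which diets differ. One common follow-up, sketched here as an addition rather than part of the original analysis, is Tukey's HSD via `scipy.stats.tukey_hsd` (SciPy 1.8 and later), which compares every pair of groups while controlling the family-wise error rate. The data are regenerated with an assumed seed, so the numbers will not match the outputs above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed so the sketch is reproducible
diet_a = rng.normal(5, 1.5, 30)
diet_b = rng.normal(6, 1.5, 30)
diet_c = rng.normal(4.5, 1.5, 30)

# Tukey's HSD: all pairwise comparisons with family-wise error control
res = stats.tukey_hsd(diet_a, diet_b, diet_c)

labels = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        diff = res.statistic[i, j]  # mean of group i minus mean of group j
        p = res.pvalue[i, j]
        flag = "significant" if p < 0.05 else "not significant"
        print(f"Diet {labels[i]} vs Diet {labels[j]}: diff = {diff:.3f} kg, p = {p:.4f} ({flag})")
```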
