In [2]:
import numpy as np
import scipy.stats as stats


# Z Test

Youtube Link: https://www.youtube.com/watch?v=7234-lDesHI&t=318s

A Z-test is a statistical hypothesis test used to determine whether the means of two data sets are significantly different from each other when the sample sizes are large and the population standard deviation is known. It is a parametric test commonly used for comparing a sample mean to a known population mean or comparing the means of two independent samples.

Here are the key components and steps involved in a Z-test: <br/>
**Null Hypothesis (H0):** The null hypothesis in a Z-test typically states that there is no significant difference between the means of the two groups being compared.

**Alternative Hypothesis (Ha):** The alternative hypothesis, also known as the research hypothesis, contradicts the null hypothesis and suggests that there is a significant difference between the means of the two groups.

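**Formula:** Z = (X̄ - μ) / (σ / √n), where X̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.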

### Problem

A factory manufactures bulbs with an average warranty of 5 years and a standard deviation of 0.50 years. A worker believes that the bulbs last less than 5 years. He tests a sample of 40 bulbs and finds the average lifetime to be 4.8 years.

a) State the Null and Alternative Hypotheses. <br/>
b) At a 2% significance level, is there enough evidence to support the idea that the warranty should be revised?

Null Hypothesis: Average warranty is 5 years <br/>
Alternative Hypothesis: Average warranty is less than 5 years

In [3]:
# Values given in the problem
population_mean = 5
population_std = 0.50
sample_mean = 4.8
sample_size = 40

# Define the significance level (alpha)
alpha = 0.02

# Find the critical value for a left-tailed test (Ha: mean < 5)
z_critical = stats.norm.ppf(alpha)
print("Z Critical:", round(z_critical, 2))

# Z-score of the observed sample mean
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print("Z Score:", z_score)

# Reject H0 when the Z-score falls below the critical value
if z_score < z_critical:
    print("We Reject the Null Hypothesis")
else:
    print("We Fail to reject the Null Hypothesis")

Z Critical: -2.05
Z Score: -2.5298221281347057
We Reject the Null Hypothesis
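
The same decision can also be reached with a p-value instead of a critical value. The sketch below is an illustration rather than part of the original solution: it reuses the numbers from the bulb problem and assumes a left-tailed test, so the p-value is the area under the standard normal curve to the left of the Z-score.

```python
import numpy as np
import scipy.stats as stats

# Values from the bulb warranty problem
population_mean = 5
population_std = 0.50
sample_mean = 4.8
sample_size = 40
alpha = 0.02

# Z-score of the observed sample mean
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Left-tailed p-value: probability of a Z-score this low or lower if H0 is true
p_value = stats.norm.cdf(z_score)
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("We Reject the Null Hypothesis")
else:
    print("We Fail to reject the Null Hypothesis")
```

Comparing the p-value with alpha is equivalent to comparing the Z-score with the critical value, so both approaches lead to the same conclusion.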


# T Test

A t-test is a statistical hypothesis test used to determine whether a sample mean differs significantly from a reference value or from the mean of another group. It is typically used when the population standard deviation is unknown, and it helps you infer whether an observed difference is likely to reflect a true difference in the population means or could have occurred due to random sampling variation. There are three common variants:

1. The single-sample t-test compares one sample to a known or hypothesized population mean. <br/>
2. The independent samples t-test compares the means of two independent groups.<br/>
3. The paired samples t-test compares the means of two related or paired samples, considering the differences between them.

## One Sample TTest

Purpose: Used to determine whether the mean of a single sample is significantly different from a known or hypothesized population mean. <br/>
Scenario: You have one sample, and you want to test if its mean is significantly different from a specific value (the population mean).<br/>
Assumption: Assumes that the sample data follows a normal distribution, and the population standard deviation is usually unknown.<br/>
Formula: Uses the t-statistic formula: t = (X̄ - μ) / (s / √n), where X̄ is the sample mean, μ is the population mean, s is the sample standard deviation, and n is the sample size.<br/>

**Problem:** A coffee shop claims that the average caffeine content in their regular coffee is 100 milligrams per 8-ounce cup. To test this claim, a coffee enthusiast randomly selects 12 cups of regular coffee from the shop and measures the caffeine content in each cup (in milligrams). The data is as follows:

Caffeine Content (in milligrams):
[105, 98, 102, 100, 99, 97, 101, 103, 96, 104, 100, 98]

Using a significance level of 0.05, test whether the coffee shop's claim that the average caffeine content is 100 milligrams per cup holds true.


In [29]:
# Solution to the problem above
# Data
caffeine_content = np.array([105, 98, 102, 100, 99, 97, 101, 103, 96, 104, 100, 98])

# Hypotheses
mu = 100  # Hypothesized average caffeine content (mg)
alpha = 0.05 

# Calculate the sample statistics
sample_mean = np.mean(caffeine_content)
sample_std = np.std(caffeine_content, ddof=1)  # ddof=1 gives the sample standard deviation
sample_size = len(caffeine_content)

# Perform the single-sample t-test
t_statistic, p_value = stats.ttest_1samp(caffeine_content, mu)
print(f"Calculated t-statistic: {t_statistic:.2f}")
print(f"P-value: {p_value:.4f}")

# Degrees of freedom
degrees_of_freedom = sample_size - 1

# Find the critical t-value for a two-tailed test
critical_t_value = stats.t.ppf(1 - alpha/2, df=degrees_of_freedom)
print("Critical T Value is:", critical_t_value)

# Make a decision based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in caffeine content.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in caffeine content.")

Calculated t-statistic: 0.31
P-value: 0.7655
Critical T Value is: 2.200985160082949
Fail to reject the null hypothesis: There is no significant difference in caffeine content.
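
As a sanity check, the t-statistic can be computed directly from the formula t = (X̄ - μ) / (s / √n) and compared with the value returned by `stats.ttest_1samp`. This is a minimal sketch using the caffeine data from the problem; the names `t_manual` and `p_manual` are introduced here only for illustration.

```python
import numpy as np
import scipy.stats as stats

caffeine_content = np.array([105, 98, 102, 100, 99, 97, 101, 103, 96, 104, 100, 98])
mu = 100  # Hypothesized population mean

n = len(caffeine_content)
sample_mean = np.mean(caffeine_content)
sample_std = np.std(caffeine_content, ddof=1)  # sample standard deviation

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (sample_mean - mu) / (sample_std / np.sqrt(n))

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Reference values from SciPy
t_scipy, p_scipy = stats.ttest_1samp(caffeine_content, mu)

print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should report the same values, which confirms that `ttest_1samp` performs a two-tailed test by default.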


## Independent Sample T Test

Youtube Link: https://www.youtube.com/watch?v=NkGvw18zlGQ

A two-sample independent t-test, also known as an independent samples t-test or a two-sample t-test, is a statistical hypothesis test used to determine whether there is a significant difference between the means of two independent groups or samples. In other words, it helps you assess whether the average values (means) of a particular variable are different between two distinct and unrelated groups.

Key characteristics and details of the two-sample independent t-test include:

Two Independent Groups: The test is appropriate when you have two separate groups, and you want to compare the means of a specific variable between these groups. The groups are considered independent because the individuals or items in one group are not related to or paired with those in the other group.

Hypotheses:

Null Hypothesis (H0): The population means of the two groups are equal (μ1 = μ2). <br />
Alternative Hypothesis (H1): The population means of the two groups are not equal (μ1 ≠ μ2) or have a specific directional difference (μ1 > μ2 or μ1 < μ2).

**Problem:** A research study is conducted to compare the performance of two different study techniques, Technique A and Technique B, in improving students' math scores. A sample of 20 students is randomly selected, and half of them are taught using Technique A, while the other half are taught using Technique B. After the study period, their math scores are recorded.

The scores for the two groups are as follows:

Technique A (Group 1):
[78, 82, 75, 88, 92, 80, 85, 89, 79, 87]

Technique B (Group 2):
[85, 88, 70, 79, 92, 84, 76, 90, 78, 86]

Using a significance level of 0.05, determine whether there is a significant difference in the mean math scores between the two study techniques.

In [19]:
# Solution
# Data for Technique A (Group 1) and Technique B (Group 2)
group_1_score = np.array([78, 82, 75, 88, 92, 80, 85, 89, 79, 87])
group_2_score = np.array([85, 88, 70, 79, 92, 84, 76, 90, 78, 86])

# Hypotheses:
# H0: The population means of the two groups are equal (μ1 = μ2).
# H1: The population means of the two groups are not equal (μ1 ≠ μ2).

alpha = 0.05 # Significance Level

# Perform a two-sample independent t-test
t_statistic, p_value = stats.ttest_ind(group_1_score, group_2_score)

# Determine the degrees of freedom
degrees_of_freedom = len(group_1_score) + len(group_2_score) - 2

# Find the critical t-value for a two-tailed test
critical_t_value = stats.t.ppf(1 - alpha/2, df=degrees_of_freedom)

# Print the results
print("Group 1 Mean:", np.mean(group_1_score))
print("Group 2 Mean:", np.mean(group_2_score))
print(f"Calculated t-statistic: {t_statistic:.4f}")
print(f"Critical t-value: {critical_t_value:.4f}")
print(f"P-value: {p_value:.4f}")

# Make a decision based on the p-value
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")


Group 1 Mean: 83.5
Group 2 Mean: 82.8
Calculated t-statistic: 0.2506
Critical t-value: 2.1009
P-value: 0.8049
Fail to reject the null hypothesis
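
To see where the independent-samples t-statistic comes from, it can be rebuilt by hand. The sketch below assumes equal population variances in the two groups, which is also the default assumption of `stats.ttest_ind` (`equal_var=True`); names such as `pooled_var` and `t_manual` are illustrative only.

```python
import numpy as np
import scipy.stats as stats

group_1_score = np.array([78, 82, 75, 88, 92, 80, 85, 89, 79, 87])
group_2_score = np.array([85, 88, 70, 79, 92, 84, 76, 90, 78, 86])

n1, n2 = len(group_1_score), len(group_2_score)

# Sample variances of each group (ddof=1)
var1 = np.var(group_1_score, ddof=1)
var2 = np.var(group_2_score, ddof=1)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = difference of means / standard error
t_manual = (np.mean(group_1_score) - np.mean(group_2_score)) / standard_error

# Reference value from SciPy (pooled-variance test by default)
t_scipy, p_scipy = stats.ttest_ind(group_1_score, group_2_score)

print(f"Manual t-statistic: {t_manual:.4f}")
print(f"SciPy t-statistic:  {t_scipy:.4f}")
```

The two values should agree. If the equal-variance assumption is doubtful, passing `equal_var=False` to `stats.ttest_ind` runs Welch's t-test instead.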


## Paired Sample T Test

A paired sample t-test, also known as a dependent sample t-test or a matched-pairs t-test, is a statistical hypothesis test used to determine whether there is a significant difference between the means of two related or paired samples. In this type of t-test, the data is collected in pairs, and each pair represents a specific relationship or matching between the observations in the two samples.

Key characteristics and details of the paired sample t-test include:

Paired Samples: The test is appropriate when you have two sets of related or paired observations. These pairs could be measurements taken before and after an intervention, measurements on the same individuals or subjects under two different conditions, or any other situation where there is a natural pairing between the observations.

Hypotheses:

Null Hypothesis (H0): The population means of the two related samples are equal (μ1 = μ2). <br />
Alternative Hypothesis (H1): The population means of the two related samples are not equal (μ1 ≠ μ2) or have a specific directional difference (μ1 > μ2 or μ1 < μ2).

**Problem:** A fitness trainer wants to determine whether a new exercise program is effective in increasing the endurance levels of their clients. To assess this, the trainer measures the endurance levels (in minutes) of 12 clients before they start the program and then again after they complete the program. The data for the endurance levels before and after the program is as follows:

Endurance Levels (in minutes):

Before Program: [15, 18, 12, 14, 17, 16, 20, 13, 15, 19, 14, 18]

After Program: [18, 21, 13, 16, 20, 19, 22, 15, 17, 21, 15, 20]

Using a significance level of 0.05, determine whether there is a significant difference in the mean endurance levels before and after the exercise program.



In [25]:
# Solution
# Data for endurance levels before and after the program
endurance_before = np.array([15, 18, 12, 14, 17, 16, 20, 13, 15, 19, 14, 18])
endurance_after = np.array([18, 21, 13, 16, 20, 19, 22, 15, 17, 21, 15, 20])

# Hypotheses:
# H0: The population means of endurance levels before and after the program are equal (μ1 = μ2).
# H1: The population means of endurance levels before and after the program are not equal (μ1 ≠ μ2).

alpha = 0.05

# Calculate the differences between paired observations
differences = endurance_after - endurance_before

# Perform a paired sample t-test
t_statistic, p_value = stats.ttest_rel(endurance_after, endurance_before)

# Determine the degrees of freedom
degrees_of_freedom = len(differences) - 1

# Find the critical t-value for a two-tailed test
critical_t_value = stats.t.ppf(1- alpha/2, df=degrees_of_freedom)

# Print the result
print("Mean of Differences (After - Before):", np.mean(differences))
print("Calculated t-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)
print("P-value:", p_value)

# Make a decision based on the p-value
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject the Null Hypothesis")

Mean of Differences (After - Before): 2.1666666666666665
Calculated t-statistic: 10.457195665017968
Critical t-value: 2.200985160082949
P-value: 4.721070329086275e-07
Reject Null Hypothesis
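
A paired sample t-test is equivalent to a one-sample t-test on the pairwise differences, tested against a mean of zero. The following sketch illustrates this with the endurance data from the problem; it is a cross-check, not a replacement for the solution above.

```python
import numpy as np
import scipy.stats as stats

endurance_before = np.array([15, 18, 12, 14, 17, 16, 20, 13, 15, 19, 14, 18])
endurance_after = np.array([18, 21, 13, 16, 20, 19, 22, 15, 17, 21, 15, 20])

# Differences between paired observations
differences = endurance_after - endurance_before

# Paired test on the two samples
t_paired, p_paired = stats.ttest_rel(endurance_after, endurance_before)

# One-sample test on the differences against a hypothesized mean of 0
t_one_sample, p_one_sample = stats.ttest_1samp(differences, 0)

print(f"Paired test:     t = {t_paired:.4f}, p = {p_paired:.2e}")
print(f"One-sample test: t = {t_one_sample:.4f}, p = {p_one_sample:.2e}")
```

Both calls should give the same statistic and p-value, which is why the degrees of freedom for the paired test are the number of pairs minus one.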
