**THEORY QUESTIONS**

**Question 1: What is hypothesis testing in statistics?**

**Answer**

Hypothesis testing in statistics is a method to make decisions or draw conclusions about a population based on sample data.

In simple words: It helps us check if a claim (assumption) about a population is true or not using data.

Steps of Hypothesis Testing:

1. State the hypotheses.
   - Null hypothesis (H₀): Assumes no effect or no difference (e.g., "The average height of students = 165 cm").
   - Alternative hypothesis (H₁): Assumes there is an effect or a difference (e.g., "The average height ≠ 165 cm").
2. Choose a significance level (α).
   - Commonly 0.05, i.e., a 5% chance of rejecting H₀ when it is actually true (a Type I error).
3. Collect data and calculate a test statistic.
   - e.g., Z-test, t-test or chi-square test, depending on the data type and what is known about the population.
4. Find the p-value (or compare the test statistic with a critical value).
   - p-value: the probability of getting a result at least as extreme as the observed one, assuming H₀ is true.
5. Make a decision.
   - If p ≤ α, reject H₀ → the evidence supports H₁.
   - If p > α, fail to reject H₀ → not enough evidence against H₀.
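The five steps can be traced in a short Python sketch. Because σ is assumed unknown here, it uses a one-sample t-test (`scipy.stats.ttest_1samp`). The 20 heights are made-up illustration data, not real survey results.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mean height = 165 cm, H1: mean height != 165 cm
mu_0 = 165

# Step 2: significance level
alpha = 0.05

# Step 3: collect data (HYPOTHETICAL heights in cm) and compute the test statistic
heights = np.array([168, 171, 162, 166, 170, 169, 164, 172, 167, 165,
                    170, 163, 168, 173, 166, 169, 171, 164, 167, 170])
t_stat, p_value = stats.ttest_1samp(heights, popmean=mu_0)

# Step 4: the two-sided p-value comes from the t-distribution with n - 1 df
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Step 5: decision rule (reject when p <= alpha)
if p_value <= alpha:
    print("Reject H0: the mean height appears to differ from 165 cm.")
else:
    print("Fail to reject H0: not enough evidence against 165 cm.")
```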

Example:

Suppose a company claims the average battery life of its phone is 10 hours. You test 30 phones and find the average is 9.5 hours.

H₀: Average = 10 hours

H₁: Average ≠ 10 hours

Run a test → if p-value ≤ 0.05 → reject H₀ (the data provide significant evidence against the 10-hour claim).
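The battery example can be worked through from summary statistics alone. The text does not give the sample standard deviation, so this sketch assumes a hypothetical s = 1.2 hours. Since σ is unknown, it applies a one-sample t-test, computing the statistic by hand and taking the p-value from `scipy.stats.t`.

```python
import math
from scipy import stats

mu_0 = 10.0    # claimed mean battery life in hours (H0)
x_bar = 9.5    # observed sample mean in hours
n = 30         # number of phones tested
s = 1.2        # ASSUMED sample standard deviation (hypothetical, not given in the text)
alpha = 0.05

# t-statistic: how many standard errors the sample mean lies from the claimed mean
se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se

# Two-tailed p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The spread of the data matters as much as the difference in means. With an assumed s of 2.0 hours, the same 0.5-hour shortfall would give t ≈ -1.37, which is not significant at α = 0.05.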




**Question 2: What is the null hypothesis, and how does it differ from the alternative hypothesis?**

**Answer**

**Null Hypothesis (H₀)**

The null hypothesis is the default belief. It says there is no effect, no difference, or no change. Example: “The average exam score of students is 70.”

**Alternative Hypothesis (H₁)**

The alternative hypothesis is the opposite of H₀. It says there is an effect, a difference, or a change. Example: “The average exam score of students is not 70.”

**Main Difference**

H₀ (null): Assumes nothing unusual is happening.

H₁ (alternative): Suggests something new or different is happening.
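The wording of H₁ decides which tail(s) of the distribution count as evidence against H₀. In SciPy this is set by the `alternative` argument of `scipy.stats.ttest_1samp`, which accepts `"two-sided"`, `"less"` or `"greater"` (available in SciPy 1.6 and later). The exam scores below are hypothetical and used only for illustration.

```python
import numpy as np
from scipy import stats

# HYPOTHETICAL exam scores; H0: mean score = 70
scores = np.array([72, 75, 68, 74, 71, 77, 69, 73, 76, 70])

results = {}
for alt, symbol in [("two-sided", "!="), ("greater", ">"), ("less", "<")]:
    res = stats.ttest_1samp(scores, popmean=70, alternative=alt)
    results[alt] = res.pvalue
    print(f"H1: mean {symbol} 70 -> t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

The sample mean here (72.5) lies above 70, so the "greater" p-value is half the two-sided one. The "less" alternative finds essentially no support.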

**Simple Analogy**

Imagine a court case:

H₀: The person is innocent.

H₁: The person is guilty.

We always start by assuming innocence (H₀) and only reject it if evidence strongly supports guilt (H₁).

**Question 3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.**

**Answer**

**Significance Level (α)**

The significance level (denoted by α) is the threshold for deciding whether to reject the null hypothesis (H₀).

It represents the probability of making a Type I error → rejecting H₀ when it is actually true.

Common values: 0.05 (5%), 0.01 (1%), 0.10 (10%).

**Role in Hypothesis Testing**

Before testing, we set α (say 0.05).

After calculations, we get a p-value from the data.

Decision:

If p ≤ α, reject H₀ → evidence supports H₁.

If p > α, fail to reject H₀ → not enough evidence against H₀.

**Example**

Suppose a company claims its bulbs last 1000 hours. You test samples.

Null hypothesis: Average life = 1000 hours

Alternative: Average life ≠ 1000 hours

You choose α = 0.05 (a 5% risk of rejecting H₀ if it is actually true).

If the p-value = 0.03 → since 0.03 < 0.05, reject H₀ → the data provide statistically significant evidence that the average life differs from 1000 hours.
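The role of α can be seen by applying the same decision rule to the bulb example's p-value at several significance levels. `decide` is a hypothetical helper written for this sketch. It is not a library function.

```python
def decide(p_value, alpha):
    """Hypothetical helper: apply the decision rule, rejecting H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.03  # p-value from the bulb example
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The data are identical in every case. A stricter α (0.01) demands stronger evidence, so the same p = 0.03 no longer leads to rejection.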

**Question 4: What are Type I and Type II errors? Give examples of each.**

**Answer**

**Type I Error (False Positive)**

Happens when we reject the null hypothesis (H₀) even though it is actually true.

In simple words: We think there is an effect or difference, but in reality, there isn’t.

Probability of making a Type I error = α (significance level).

Example: A medical test says a healthy person has a disease.

H₀: Person is healthy.

H₁: Person is sick.

Rejecting a true H₀ → a healthy person is labeled sick.
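The statement "probability of a Type I error = α" can be checked by simulation. In this sketch every sample comes from a population where H₀ is true (mean 0). It runs a one-sample t-test on each and counts how often H₀ is wrongly rejected. The population, sample size and number of simulations are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims, n = 10_000, 25

# Every simulated study samples from a population where H0 is TRUE (mean = 0)
samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))

# One-sample t-test of H0: mean = 0 on each row (one row = one study)
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# Any rejection here is a Type I error (false positive)
type_i_rate = np.mean(p_values <= alpha)
print(f"Fraction of true H0s rejected: {type_i_rate:.3f} (alpha = {alpha})")
```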

**Type II Error (False Negative)**

Happens when we fail to reject the null hypothesis (H₀) even though it is false.

In simple words: We fail to detect a real effect or difference.

Probability of making a Type II error = β.

Example: A medical test fails to detect a disease in a sick person.

H₀: Person is healthy.

H₁: Person is sick.

Failing to reject a false H₀ → a sick person is labeled healthy.
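β can be estimated the same way, by simulating studies where H₀ is false. This sketch assumes a hypothetical true mean 0.5 standard deviations above the H₀ value and samples of 20. It counts how often the t-test fails to reject. β depends on the true effect size, the sample size and α. Larger samples or larger effects lower β.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_sims, n = 10_000, 20
true_mean, h0_mean = 0.5, 0.0   # ASSUMED: H0 is FALSE, the real mean is 0.5 SD above 0

samples = rng.normal(loc=true_mean, scale=1.0, size=(n_sims, n))
p_values = stats.ttest_1samp(samples, popmean=h0_mean, axis=1).pvalue

# Failing to reject here is a Type II error (false negative)
beta = np.mean(p_values > alpha)
print(f"Estimated beta (Type II error rate): {beta:.3f}")
print(f"Estimated power (1 - beta): {1 - beta:.3f}")
```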

**Question 5: What is the difference between a Z-test and a T-test? Explain when to use each.**

**Answer**

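**Z-test**

Used when the population standard deviation (σ) is known.

Also commonly used as a large-sample approximation (rule of thumb: n > 30) when σ is unknown, because the sample standard deviation (s) is then a reliable estimate of σ.

Assumes the data are normally distributed, or that n is large enough for the Central Limit Theorem to make the sample mean approximately normal.

Example: Checking if the average height of 1000 students = 165 cm, when the population standard deviation is known.

**T-test**

Used when the population standard deviation (σ) is unknown → we estimate it with the sample standard deviation (s).

Most important for small samples (n ≤ 30), where estimating σ adds noticeable uncertainty. The t-distribution's heavier tails account for this, and the test remains valid for large samples as well.

Data should be approximately normally distributed (this matters most when n is small).

Example: Checking if the average marks of a class of 25 students = 70, when the population variance is unknown.

**Key Difference**

Z-test: σ known (or a large sample); uses the standard normal distribution.

T-test: σ unknown; uses the t-distribution with n − 1 degrees of freedom, which approaches the normal distribution as n grows.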
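The practical difference shows up in the critical values. This sketch compares the two-tailed 5% critical value of the standard normal distribution with that of the t-distribution for a few sample sizes. The t value is always larger, and it shrinks toward about 1.96 as n grows.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # two-tailed Z critical value, about 1.96
print(f"Z critical value: {z_crit:.3f}")

t_crits = {}
for n in (5, 10, 30, 100, 1000):
    # t critical value with n - 1 degrees of freedom
    t_crits[n] = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:4d}: t critical value = {t_crits[n]:.3f}")
```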

**Question 6: Write a Python program to generate a binomial distribution with n=10 and p=0.5, then plot its histogram.**

```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10      # number of trials
p = 0.5     # probability of success
size = 1000 # number of samples

# Generate binomial distribution
data = np.random.binomial(n, p, size)

# Plot histogram
plt.hist(data, bins=range(n+2), edgecolor='black', align='left')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of successes")
plt.ylabel("Frequency")
plt.show()
```
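To check the simulation numerically, you can compare the observed proportion of each outcome with the exact binomial probabilities from `scipy.stats.binom.pmf`. This sketch uses NumPy's `default_rng` with a fixed seed, which is a choice made here for reproducibility.

```python
import numpy as np
from scipy.stats import binom

n, p, size = 10, 0.5, 1000
rng = np.random.default_rng(42)
data = rng.binomial(n, p, size)

k = np.arange(n + 1)
empirical = np.bincount(data, minlength=n + 1) / size   # observed proportion of each k
theoretical = binom.pmf(k, n, p)                        # exact probability of each k

for ki, e, t in zip(k, empirical, theoretical):
    print(f"k = {ki:2d}: simulated {e:.3f}, exact {t:.3f}")
print("Sample mean:", data.mean(), "| expected n*p =", n * p)
```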


**Question 7: Implement hypothesis testing using Z-statistics for a sample dataset in Python. Show the Python code and interpret the results.**


**sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5, 50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9, 50.3, 50.4, 50.0, 49.7, 50.5, 49.9]**



**Answer**

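**Interpretation of Results**

We test H₀: population mean = 50 against H₁: population mean ≠ 50 at α = 0.05. The code calculates:

Sample mean (close to 50 for this data).

Z-statistic: how many standard errors the sample mean lies from 50.

p-value: the probability of a result at least as extreme as the one observed, assuming H₀ is true.

If p-value ≤ 0.05 → reject H₀ (evidence that the mean ≠ 50).

If p-value > 0.05 → fail to reject H₀ (the data are consistent with a mean of 50, though this does not prove the mean is exactly 50).

For this dataset the sample mean is about 50.09, Z ≈ 0.99 and p ≈ 0.32, so we fail to reject H₀.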

```python
import numpy as np
from scipy.stats import norm

# Sample data
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# Hypothesized population mean (H0: mean = 50, H1: mean != 50)
mu = 50

# Calculate sample statistics
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)  # sample standard deviation
n = len(sample_data)

# Z-test statistic. σ is unknown, so the sample standard deviation is used as an
# estimate (the usual large-sample approximation; here n = 36)
z = (sample_mean - mu) / (sample_std / np.sqrt(n))

# Two-tailed p-value; norm.sf(x) equals 1 - norm.cdf(x) but is more accurate in the tails
p_value = 2 * norm.sf(abs(z))

print("Sample Mean:", sample_mean)
print("Z-statistic:", z)
print("p-value:", p_value)

# Decision (reject H0 when p <= alpha)
alpha = 0.05
if p_value <= alpha:
    print("Reject H₀: Evidence suggests the mean is not 50.")
else:
    print("Fail to Reject H₀: Not enough evidence against mean = 50.")
```
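Because σ is estimated from the data, a one-sample t-test is the textbook alternative to the Z-test here. This sketch runs both on the same data with `scipy.stats.ttest_1samp` and `scipy.stats.norm`. The two statistics are identical. Only the reference distribution differs.

```python
import numpy as np
from scipy import stats

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# One-sample t-test of H0: mean = 50 (t-distribution with n - 1 = 35 df)
t_result = stats.ttest_1samp(sample_data, popmean=50)

# Same statistic as the Z-test above, compared against the standard normal instead
z = (np.mean(sample_data) - 50) / (np.std(sample_data, ddof=1) / np.sqrt(len(sample_data)))
p_z = 2 * stats.norm.sf(abs(z))

print(f"Z-test: z = {z:.3f}, p = {p_z:.4f}")
print(f"t-test: t = {t_result.statistic:.3f}, p = {t_result.pvalue:.4f}")
```

The t-test p-value is slightly larger because the t-distribution has heavier tails. Both tests lead to the same decision for this data.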


**Question 8: Write a Python script to simulate data from a normal distribution and calculate the 95% confidence interval for its mean. Plot the data using Matplotlib.**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate normal data (mean=50, std=5, sample size=100)
data = np.random.normal(50, 5, 100)

# Calculate the sample mean
mean = np.mean(data)

# 95% confidence interval for the mean: t-distribution with n-1 degrees of freedom,
# centred on the sample mean and scaled by the standard error (stats.sem)
ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=stats.sem(data))

print("Sample Mean:", mean)
print("95% Confidence Interval:", ci)

# Plot histogram with the mean and the CI bounds marked
plt.hist(data, bins=10, edgecolor='black')
plt.axvline(mean, color='red', linestyle='dashed', label=f"Mean = {mean:.2f}")
plt.axvline(ci[0], color='green', linestyle='dashed', label=f"Lower CI = {ci[0]:.2f}")
plt.axvline(ci[1], color='green', linestyle='dashed', label=f"Upper CI = {ci[1]:.2f}")
plt.title("Simulated Normal Data with 95% CI for the Mean")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```
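`stats.t.interval` computes x̄ ± t* · s / √n, where t* is the 97.5th percentile of the t-distribution with n − 1 degrees of freedom. This sketch builds the same interval by hand and checks that it matches SciPy. A fixed-seed `default_rng` is used here only for reproducibility.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(50, 5, 100)

n = len(data)
mean = data.mean()
s = data.std(ddof=1)                    # sample standard deviation
se = s / np.sqrt(n)                     # standard error of the mean
t_star = stats.t.ppf(0.975, df=n - 1)   # critical value: 2.5% in each tail

ci_manual = (mean - t_star * se, mean + t_star * se)
ci_scipy = stats.t.interval(0.95, n - 1, loc=mean, scale=stats.sem(data))

print("Manual 95% CI:", ci_manual)
print("SciPy  95% CI:", ci_scipy)
```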


**Question 9: Write a Python function to calculate the Z-scores from a dataset and visualize the standardized data using a histogram. Explain what the Z-scores represent in terms of standard deviations from the mean.**


**Answer**


A Z-score tells us how many standard deviations a data point is away from the mean: Z = (x − μ) / σ, where μ is the mean and σ is the standard deviation.

Z = 0 → the data point = mean

Z = +1 → 1 standard deviation above the mean

Z = -2 → 2 standard deviations below the mean

This is useful to standardize data so we can compare values on the same scale.

```python
import numpy as np
import matplotlib.pyplot as plt

# Function to calculate Z-scores: (x - mean) / standard deviation
def calculate_z_scores(data):
    data = np.asarray(data, dtype=float)  # also accept plain Python lists
    mean = np.mean(data)
    std = np.std(data)  # population standard deviation (ddof=0)
    z_scores = (data - mean) / std
    return z_scores

# Example dataset
data = np.array([50, 52, 47, 49, 51, 53, 48, 50, 52, 49])

# Calculate Z-scores
z_scores = calculate_z_scores(data)

print("Original Data:", data)
print("Z-scores:", z_scores)

# Plot histogram of Z-scores
plt.hist(z_scores, bins=5, edgecolor='black')
plt.title("Histogram of Z-scores")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.axvline(0, color='red', linestyle='dashed', label="Mean (Z=0)")
plt.legend()
plt.show()
```
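Standardized data always have mean 0 and standard deviation 1. This sketch confirms that for the example dataset and checks the manual formula against `scipy.stats.zscore`, which also uses the population standard deviation (ddof=0) by default.

```python
import numpy as np
from scipy import stats

data = np.array([50, 52, 47, 49, 51, 53, 48, 50, 52, 49])

# Same formula as calculate_z_scores: (x - mean) / std, population std (ddof=0)
z_manual = (data - data.mean()) / data.std()
z_scipy = stats.zscore(data)

print("Mean of Z-scores:", round(z_manual.mean(), 10))
print("Std of Z-scores: ", round(z_manual.std(), 10))
print("Matches scipy.stats.zscore:", np.allclose(z_manual, z_scipy))
```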
