# Statistics Advanced - 2 (Assignment)

Questions Q1–Q9 with answers and runnable code.

### Q1. What is hypothesis testing in statistics?

**Answer:** Hypothesis testing is a formal procedure to evaluate claims about a population parameter using sample data. You set up a null hypothesis (H0) and an alternative (H1), compute a test statistic from the sample, and use its sampling distribution to decide whether to reject H0.

### Q2. What is the null hypothesis, and how does it differ from the alternative hypothesis?

**Answer:** The **null hypothesis (H0)** is the default statement of no effect or no difference (e.g., μ = μ0); it is assumed true unless the data provide strong evidence against it. The **alternative hypothesis (H1)** is the competing claim you seek evidence for; it can be two-sided (μ ≠ μ0) or one-sided (μ > μ0 or μ < μ0).
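
The choice of H1 changes how the p-value is computed from the same test statistic. The sketch below illustrates this for a standard normal test statistic; the value `z_stat = 1.8` is a made-up number chosen only for illustration.

```python
from scipy import stats

z_stat = 1.8  # hypothetical test statistic, not taken from any dataset

# norm.sf(z) is the upper-tail probability P(Z > z) = 1 - cdf(z)
p_two_sided = 2 * stats.norm.sf(abs(z_stat))  # H1: mu != mu0
p_greater = stats.norm.sf(z_stat)             # H1: mu > mu0
p_less = stats.norm.cdf(z_stat)               # H1: mu < mu0

print(f"H1: mu != mu0 -> p = {p_two_sided:.4f}")
print(f"H1: mu >  mu0 -> p = {p_greater:.4f}")
print(f"H1: mu <  mu0 -> p = {p_less:.4f}")
```

For a positive statistic, the two-sided p-value is twice the upper-tail one, so the same data can be significant against a one-sided alternative but not against a two-sided one.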

### Q3. Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

**Answer:** The significance level (α) is the largest probability of a Type I error (rejecting H0 when it is actually true) that you are willing to accept. It is chosen before looking at the data; common choices are 0.05 and 0.01. If the p-value ≤ α, you reject H0; otherwise you fail to reject H0.
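
As a minimal sketch of this decision rule, the helper below (the name `decide` and the p-value 0.03 are hypothetical, introduced only for illustration) shows that the same p-value can lead to different conclusions depending on the chosen α.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.03  # hypothetical p-value for illustration
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```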

### Q4. What are Type I and Type II errors? Give examples of each.

**Answer:**

- **Type I error (α):** Rejecting H0 when it is true. Example: concluding a drug works when it doesn't.
- **Type II error (β):** Failing to reject H0 when it is false. Example: concluding a drug doesn't work when it does.
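
Both error rates can be estimated by simulation. The sketch below assumes a two-sided Z-test with known σ; the sample size, effect size (true mean 0.5 under H1), and number of simulations are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, sigma, mu0 = 0.05, 30, 1.0, 0.0  # illustrative settings
n_sims = 5000
z_crit = stats.norm.ppf(1 - alpha / 2)     # two-sided critical value

def rejection_rate(true_mu):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected."""
    samples = rng.normal(true_mu, sigma, size=(n_sims, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

type1_rate = rejection_rate(true_mu=0.0)      # H0 is true: rejections are Type I errors
type2_rate = 1 - rejection_rate(true_mu=0.5)  # H0 is false: non-rejections are Type II errors

print(f"Estimated Type I error rate:  {type1_rate:.3f} (target alpha = {alpha})")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The estimated Type I rate should land close to α, while the Type II rate depends on the effect size and sample size assumed here.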

### Q5. What is the difference between a Z-test and a T-test? Explain when to use each.

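**Answer:** A **Z-test** is used when the population standard deviation σ is known. It is also used as an approximation when σ is unknown but the sample is large (a common rule of thumb is n > 30), because the sample standard deviation then estimates σ well. A **T-test** is used when σ is unknown and must be estimated from the sample; it uses the t-distribution with n-1 degrees of freedom. The distinction matters most for small samples, since the t-distribution approaches the standard normal as n grows.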
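
One way to see the difference is to compare critical values. The sketch below prints the two-sided 95% critical value of the standard normal next to those of the t-distribution for a few sample sizes; the sizes listed are arbitrary illustrations.

```python
from scipy import stats

conf_level = 0.95
q = (1 + conf_level) / 2          # upper quantile for a two-sided interval
z_crit = stats.norm.ppf(q)

sample_sizes = (5, 10, 30, 100, 1000)  # illustrative sample sizes
t_crits = {n: stats.t.ppf(q, df=n - 1) for n in sample_sizes}

print(f"z critical value: {z_crit:.3f}")
for n, t_crit in t_crits.items():
    print(f"n = {n:4d}: t critical value = {t_crit:.3f}")
```

The t critical value is always larger than the z value and shrinks toward it as n increases, which is why the two tests give nearly identical results for large samples.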

### Q6. Generate a binomial distribution with n=10 and p=0.5, then plot its histogram.

**Code and output:**

In [None]:
# Q6. Generate a binomial distribution with n=10 and p=0.5, then plot its histogram.
import numpy as np
import matplotlib.pyplot as plt

n = 10
p = 0.5
size = 1000  # number of experiments
np.random.seed(42)  # fix the seed so the output is reproducible
samples = np.random.binomial(n, p, size=size)

# Print basic stats
print("Sample size:", size)
print("Unique outcomes (0..10):", np.unique(samples))
print("Mean (empirical):", samples.mean())
print("Variance (empirical):", samples.var())

# Plot histogram
plt.figure(figsize=(8,4))
plt.hist(samples, bins=range(n+2), align='left', edgecolor='black')
plt.xticks(range(n+1))
plt.title(f'Binomial(n={n}, p={p}) - {size} samples')
plt.xlabel('Number of successes')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
plt.show()
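
As a sanity check on the histogram, the empirical frequencies can be compared with the theoretical binomial PMF. This is a sketch that draws its own sample with NumPy's `default_rng` (the seed 0 is an arbitrary choice), so its numbers will differ from the cell above.

```python
import numpy as np
from scipy import stats

n, p, size = 10, 0.5, 1000
rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
samples = rng.binomial(n, p, size=size)

k = np.arange(n + 1)
empirical = np.bincount(samples, minlength=n + 1) / size  # observed relative frequencies
theoretical = stats.binom.pmf(k, n, p)                    # exact probabilities

for ki, emp, theo in zip(k, empirical, theoretical):
    print(f"k = {ki:2d}: empirical = {emp:.3f}, theoretical = {theo:.3f}")
```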

### Q7. Implement hypothesis testing using Z-statistics for a sample dataset.

**Test:** H0: μ = 50 vs H1: μ ≠ 50

**Code and output:**

In [None]:
# Q7. Implement hypothesis testing using Z-statistics for a sample dataset.
# Provided sample_data - we will test H0: mu = 50 versus H1: mu != 50
import numpy as np
from scipy import stats

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

x = np.array(sample_data)
n = len(x)
xbar = x.mean()
s = x.std(ddof=1)  # sample standard deviation

# A Z-test strictly requires the population sigma, which is unknown here.
# With n = 36 we use the sample std as an estimate (large-sample z-approximation).
mu0 = 50.0  # null hypothesis mean
se = s / np.sqrt(n)
z_stat = (xbar - mu0) / se

# two-sided p-value using the standard normal (sf = 1 - cdf, more accurate in the tails)
p_value = 2 * stats.norm.sf(abs(z_stat))

print(f"n = {n}, sample mean = {xbar:.4f}, sample sd = {s:.4f}")
print(f"Z-statistic = {z_stat:.4f}, two-sided p-value = {p_value:.4f}")

alpha = 0.05
if p_value <= alpha:  # same rule as in Q3: reject when p-value <= alpha
    print(f"Reject H0 at alpha={alpha}: sample provides evidence that mean != {mu0}.")
else:
    print(f"Fail to reject H0 at alpha={alpha}: no strong evidence that mean != {mu0}.")
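
Because σ is estimated from the sample, a one-sample t-test is a natural cross-check on the z-approximation. The sketch below repeats the same data and compares both p-values, using `scipy.stats.ttest_1samp`; treating it as a cross-check is a suggestion here, not part of the original exercise.

```python
import numpy as np
from scipy import stats

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]
x = np.array(sample_data)
mu0 = 50.0

# z-approximation: sample std in place of sigma, standard normal reference
z_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p_z = 2 * stats.norm.sf(abs(z_stat))

# one-sample t-test: same statistic, t-distribution with n - 1 degrees of freedom
t_res = stats.ttest_1samp(x, popmean=mu0)

print(f"z-approximation: statistic = {z_stat:.4f}, p-value = {p_z:.4f}")
print(f"t-test:          statistic = {t_res.statistic:.4f}, p-value = {t_res.pvalue:.4f}")
```

The two statistics are identical by construction; the t-test p-value is slightly larger because the t-distribution has heavier tails.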

### Q8. Simulate data from a normal distribution and calculate the 95% confidence interval for its mean.

**Code and output:**

In [None]:
# Q8. Simulate data from a normal distribution and calculate a 95% confidence interval for the mean.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
mu = 100  # true mean
sigma = 15  # true std dev
n = 50
data = np.random.normal(mu, sigma, size=n)

# Sample statistics
xbar = data.mean()
s = data.std(ddof=1)

# 95% CI using t-distribution (since sigma typically unknown)
conf_level = 0.95
df = n - 1
t_crit = stats.t.ppf((1 + conf_level) / 2, df)
se = s / np.sqrt(n)
ci_lower = xbar - t_crit * se
ci_upper = xbar + t_crit * se

print(f"Sample mean = {xbar:.4f}, sample sd = {s:.4f}, n = {n}")
print(f"95% CI for mean: ({ci_lower:.4f}, {ci_upper:.4f})")

# Plot data histogram with mean and CI lines
plt.figure(figsize=(8,4))
plt.hist(data, bins=12, edgecolor='black', alpha=0.7)
plt.axvline(xbar, color='red', linestyle='--', label=f"Mean = {xbar:.2f}")
plt.axvline(ci_lower, color='green', linestyle=':', label=f"95% CI lower = {ci_lower:.2f}")
plt.axvline(ci_upper, color='green', linestyle=':', label=f"95% CI upper = {ci_upper:.2f}")
plt.title("Simulated Normal Data and 95% CI for Mean")
plt.legend()
plt.grid(alpha=0.3)
plt.show()
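
A 95% confidence level describes the procedure rather than any single interval: over many repeated samples, about 95% of the intervals should contain the true mean. The sketch below checks this by simulation with the same μ, σ, and n as above; the number of repetitions and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
mu, sigma, n = 100, 15, 50
n_sims = 2000                   # number of repeated samples (illustrative)
t_crit = stats.t.ppf(0.975, df=n - 1)

data = rng.normal(mu, sigma, size=(n_sims, n))
xbar = data.mean(axis=1)
se = data.std(axis=1, ddof=1) / np.sqrt(n)

# An interval "covers" when the true mean lies between its bounds
covered = (xbar - t_crit * se <= mu) & (mu <= xbar + t_crit * se)
coverage = covered.mean()

print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```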

### Q9. Write a Python function to calculate the Z-scores from a dataset and visualize the standardized data using a histogram.

**Code and output:**

In [None]:
# Q9. Function to calculate Z-scores and visualize standardized data
import numpy as np
import matplotlib.pyplot as plt

def plot_z_scores(data):
    x = np.array(data)
    mean = x.mean()
    std = x.std(ddof=1)
    z_scores = (x - mean) / std
    print(f"Mean = {mean:.4f}, SD = {std:.4f}")
    print("First 10 z-scores:", np.round(z_scores[:10], 3))
    plt.figure(figsize=(8,4))
    plt.hist(z_scores, bins=12, edgecolor='black', alpha=0.7)
    plt.axvline(0, color='red', linestyle='--', label='Mean (z=0)')
    plt.title('Histogram of Z-scores (standardized data)')
    plt.xlabel('Z-score')
    plt.ylabel('Frequency')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.show()
    return z_scores

# Example usage with simulated data
data_example = np.random.normal(50, 5, size=100)
z = plot_z_scores(data_example)

# Explanation:
explanation = """Z-scores indicate how many standard deviations an observation is from the mean.
A z-score of 0 means the value equals the mean; z = 1 means one standard deviation above the mean; z = -2 means two standard deviations below the mean."""
print(explanation)
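
The manual formula can be checked against `scipy.stats.zscore`. Note that `zscore` defaults to `ddof=0` (population standard deviation), so `ddof=1` is passed to match the function above. The data here is simulated with an arbitrary seed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.normal(50, 5, size=100)

manual = (x - x.mean()) / x.std(ddof=1)  # same formula as plot_z_scores
library = stats.zscore(x, ddof=1)        # SciPy equivalent with sample std

print("Manual and SciPy z-scores agree:", np.allclose(manual, library))
print(f"Mean of z-scores = {manual.mean():.4f}, SD of z-scores = {manual.std(ddof=1):.4f}")
```

By construction, standardized data has mean 0 and (sample) standard deviation 1, up to floating-point error.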