# Statistics Advanced – 2 (DA-AG-007)
**Total Marks: 100**

Questions 1–5 are answered in prose; Questions 6–9 are answered with runnable Python code cells.

### Q1: What is hypothesis testing in statistics?

Hypothesis testing is a statistical method used to make decisions about population parameters based on sample data. It sets up two competing statements (null and alternative hypotheses) and uses sample evidence to decide which is more plausible.

### Q2: What is the null hypothesis, and how does it differ from the alternative hypothesis?

- **Null hypothesis (H₀):** A statement of no effect or no difference; observed variation is attributed to chance.
- **Alternative hypothesis (H₁):** A statement that contradicts H₀, indicating a genuine effect or difference.

### Q3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

The **significance level (α)** is the decision threshold, chosen before looking at the data (commonly 0.05). If the p-value ≤ α, reject H₀; otherwise, fail to reject H₀. α is the probability of a Type I error (rejecting H₀ when it is actually true) that the analyst is willing to accept.
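The decision rule above can be sketched as a small helper. The function name `decide` and the example p-values are hypothetical, chosen only to illustrate the rule.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when p-value <= alpha, otherwise fail to reject."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


# Illustrative p-values (not taken from any real test)
for p in (0.003, 0.05, 0.2):
    print(f"p = {p}: {decide(p)}")

# A stricter threshold can reverse the decision for the same p-value
print("p = 0.03 at alpha = 0.01:", decide(0.03, alpha=0.01))
```

Note that the boundary case p = α counts as a rejection under this rule.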

### Q4: What are Type I and Type II errors? Give examples of each.

- **Type I error (False Positive):** Rejecting H₀ when it is true. Example: Concluding a drug works when it actually does not.
- **Type II error (False Negative):** Failing to reject H₀ when H₁ is true. Example: Concluding a drug does not work when it actually does.
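Both error rates can be estimated by simulation. The sketch below is a minimal illustration, assuming a two-sided Z-test with known σ; the sample size, effect size (a true mean of 0.5), seed, and number of trials are all assumptions made for the example.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
alpha = 0.05
n, sigma, mu0 = 30, 1.0, 0.0     # assumed sample size, known std, and H0 mean
n_trials = 5000


def rejects_h0(true_mean):
    """Draw one sample and run a two-sided Z-test of H0: mu = mu0."""
    sample = rng.normal(loc=true_mean, scale=sigma, size=n)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    p_value = 2 * st.norm.sf(abs(z))
    return p_value <= alpha


# Type I error: H0 is true (true mean equals mu0) but the test rejects it
type1_rate = np.mean([rejects_h0(mu0) for _ in range(n_trials)])

# Type II error: H1 is true (true mean is 0.5) but the test fails to reject H0
type2_rate = np.mean([not rejects_h0(0.5) for _ in range(n_trials)])

print("Estimated Type I error rate:", round(type1_rate, 4))
print("Estimated Type II error rate:", round(type2_rate, 4))
```

Under these assumptions the estimated Type I rate should land close to α, while the Type II rate depends on the effect size and sample size and shrinks as either grows.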

### Q5: What is the difference between a Z-test and a T-test? Explain when to use each.

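- **Z-test:** Use when the population variance is known, or as a large-sample approximation (rule of thumb: n ≥ 30) in which the sample mean is approximately normal by the central limit theorem.
- **T-test:** Use when the population variance is unknown and must be estimated from the sample, especially when the sample is small (n < 30). For a one-sample test the statistic follows a t-distribution with n − 1 degrees of freedom, assuming the data are approximately normal.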
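One way to see why the distinction matters mostly for small samples is to compare two-sided critical values. This is a small sketch; the sample sizes listed are arbitrary choices for illustration.

```python
import scipy.stats as st

alpha = 0.05

# Two-sided critical value from the standard normal distribution
z_crit = st.norm.ppf(1 - alpha / 2)
print(f"z critical value: {z_crit:.3f}")

# Two-sided critical values from the t-distribution with n - 1 degrees of freedom
t_crits = {}
for n in (5, 10, 30, 100, 1000):
    t_crits[n] = st.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:>4}: t critical value = {t_crits[n]:.3f}")
```

The t critical value is always larger than the z critical value, so the t-test demands stronger evidence from small samples; the gap closes as n grows.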

### Q6: Generate a binomial distribution (n=10, p=0.5) and plot its histogram

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate binomial distribution
data = np.random.binomial(n=10, p=0.5, size=1000)

# Plot histogram with one bin per integer outcome (0 through 10)
plt.hist(data, bins=np.arange(-0.5, 11.5, 1), edgecolor='black')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()
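As a sanity check on the simulation, the empirical frequencies can be compared with the theoretical binomial probabilities. This is a sketch that assumes a larger sample (100,000 draws) and a fixed seed, neither of which is part of the original cell.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
draws = rng.binomial(n=10, p=0.5, size=100_000)

# Relative frequency of each outcome 0..10
counts = np.bincount(draws, minlength=11)
empirical = counts / counts.sum()

# Exact probabilities from the binomial PMF
theoretical = st.binom.pmf(np.arange(11), n=10, p=0.5)

for k in range(11):
    print(f"k = {k:>2}: empirical = {empirical[k]:.4f}, theoretical = {theoretical[k]:.4f}")
```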

### Q7: Implement hypothesis testing using Z-statistics for a sample dataset

In [None]:
import numpy as np
import scipy.stats as st

# Sample data
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# H0: population mean = 50; H1: population mean != 50 (two-sided test)
mu = 50

mean_sample = np.mean(sample_data)
std_sample = np.std(sample_data, ddof=1)
n = len(sample_data)

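# Z-statistic (approximate: the sample std stands in for the unknown population std, reasonable for n = 36)
z_stat = (mean_sample - mu) / (std_sample / np.sqrt(n))
# Two-sided p-value; norm.sf(x) equals 1 - norm.cdf(x) but is more accurate in the far tail
p_value = 2 * st.norm.sf(abs(z_stat))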

print("Sample Mean:", round(mean_sample, 4))
print("Z-statistic:", round(z_stat, 4))
print("P-value:", round(p_value, 6))

if p_value <= 0.05:
    print("Decision at α=0.05: Reject H0")
else:
    print("Decision at α=0.05: Fail to reject H0")
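Because the population standard deviation is estimated from the sample here, a one-sample t-test is a natural cross-check on the Z approximation. The sketch below reuses the same data and calls `scipy.stats.ttest_1samp`; it is offered as a comparison, not as a replacement for the approach the question asks for.

```python
import numpy as np
import scipy.stats as st

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]
mu = 50

# Z approximation, computed the same way as in the cell above
n = len(sample_data)
z_stat = (np.mean(sample_data) - mu) / (np.std(sample_data, ddof=1) / np.sqrt(n))
p_z = 2 * st.norm.sf(abs(z_stat))

# One-sample t-test: same statistic, but the p-value uses the t-distribution (df = n - 1)
t_stat, p_t = st.ttest_1samp(sample_data, popmean=mu)

print("Z-statistic:", round(z_stat, 4), "| p-value (normal):", round(p_z, 6))
print("t-statistic:", round(t_stat, 4), "| p-value (t):", round(p_t, 6))
```

The two statistics are numerically identical; only the reference distribution differs, so the t-based p-value is slightly larger. With 36 observations the difference is small.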

### Q8: Simulate normal data and calculate the 95% CI for the mean; plot histogram

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

# Simulate normal data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# Mean and standard error
mean_val = np.mean(data)
sem = st.sem(data)

# 95% CI using t critical value (robust across SciPy versions)
df = len(data) - 1
t_crit = st.t.ppf(0.975, df)
margin = t_crit * sem
ci = (mean_val - margin, mean_val + margin)

print("Mean:", round(mean_val, 4))
print("95% Confidence Interval:", (round(ci[0], 4), round(ci[1], 4)))

# Plot histogram
plt.hist(data, bins=30, edgecolor='black', alpha=0.7)
plt.title("Simulated Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
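The "95%" in the interval refers to how often the procedure captures the true mean over repeated sampling. A minimal simulation sketch of that idea follows; the per-sample size of 50, the 2,000 repetitions, and the seed are assumptions made for the example.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)   # assumed seed
true_mean, sd = 100, 15          # same population parameters as the cell above
n, n_sims = 50, 2000             # assumed sample size and number of repetitions

# Each row is one simulated sample
samples = rng.normal(loc=true_mean, scale=sd, size=(n_sims, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-based interval for every sample
t_crit = st.t.ppf(0.975, df=n - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print("Empirical coverage:", round(coverage, 4))
```

The empirical coverage should come out close to 0.95, with some run-to-run variation if the seed is changed.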

### Q9: Calculate Z-scores and visualize standardized data

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def calculate_z_scores(data):
    data = np.asarray(data, dtype=float)
    mean = np.mean(data)
    std = np.std(data)  # population standard deviation (ddof=0)
    # Avoid division by zero when every value is identical
    if std == 0:
        return np.zeros_like(data)
    z_scores = (data - mean) / std
    return z_scores

# Example dataset
data = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
z_scores = calculate_z_scores(data)

print("Z-scores:", np.round(z_scores, 4))

# Plot histogram
plt.hist(z_scores, bins=10, edgecolor='black', alpha=0.7)
plt.title("Z-score Standardized Data")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()
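The manual calculation can be cross-checked against `scipy.stats.zscore`, which also uses the population standard deviation (`ddof=0`) by default. The sketch below additionally shows the sample-standard-deviation variant (`ddof=1`) for comparison; which one is appropriate depends on the use case.

```python
import numpy as np
import scipy.stats as st

data = np.array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28], dtype=float)

# Manual standardization with the population std (ddof=0), as in the cell above
manual = (data - data.mean()) / data.std()

# SciPy equivalents: ddof=0 is the default, ddof=1 uses the sample std
z_pop = st.zscore(data)
z_samp = st.zscore(data, ddof=1)

print("Matches manual calculation:", np.allclose(manual, z_pop))
print("Mean of z-scores:", round(float(z_pop.mean()), 10))
print("Std of z-scores (ddof=0):", round(float(z_pop.std()), 10))
print("Z-scores with ddof=1:", np.round(z_samp, 4))
```

Standardized data has mean 0 and standard deviation 1 under the same `ddof` convention that was used to compute it.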