# Statistics Advanced - 2 Assignment

Question 1: What is hypothesis testing in statistics?

Ans: In statistics, hypothesis testing is a formal procedure used to decide, based on sample data, whether there is enough evidence to support or reject a stated assumption about a population.

Here’s a clear breakdown:

1️⃣ Key Idea:

We start with two competing statements about a population parameter:

*   Null hypothesis (H₀): The default claim, e.g., “The new drug has no effect.”
*   Alternative hypothesis (H₁ or Ha): The claim we want to test, e.g., “The new drug lowers blood pressure.”

2️⃣ Steps in Hypothesis Testing:

*   State Hypotheses: Define H₀ and H₁ clearly.
*   Choose Significance Level (α): Commonly 0.05. This is the probability of rejecting H₀ when it is actually true (Type I error).
*   Select a Test & Compute Test Statistic: Depending on data type and sample size, use a z-test, t-test, chi-square test, etc.
*   Determine the p-value or Critical Region:
    *   p-value: Probability of observing the sample data (or something more extreme) if H₀ is true.
    *   Critical region: Values of the test statistic that lead to rejecting H₀.
*   Make a Decision:
    *   If p-value ≤ α: Reject H₀ (evidence supports H₁).
    *   If p-value > α: Fail to reject H₀ (not enough evidence to support H₁).

3️⃣ Example:

Suppose a company claims its batteries last at least 10 hours on average (H₀: μ ≥ 10, H₁: μ < 10).
You test a sample and find a mean of 9.5 hours with a small p-value (< 0.05).
Decision: Reject H₀ → the evidence suggests the mean battery life is less than 10 hours.
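The battery example can be sketched in code as a left-tailed one-sample z-test. The lifetimes below are hypothetical values chosen so the sample mean is 9.5 hours, and the known population standard deviation `sigma = 0.5` is an assumption made only for illustration; neither comes from the example itself.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical battery lifetimes in hours (illustrative values, not real data)
lifetimes = [9.4, 9.7, 9.1, 9.8, 9.5, 9.3, 9.6, 9.9, 9.2, 9.5]

mu_0 = 10.0    # claimed minimum mean lifetime (H0: mu >= 10, H1: mu < 10)
sigma = 0.5    # ASSUMED known population standard deviation
alpha = 0.05   # significance level

n = len(lifetimes)
x_bar = mean(lifetimes)

# Test statistic for a one-sample z-test
z_stat = (x_bar - mu_0) / (sigma / sqrt(n))

# Left-tailed p-value: P(Z <= z_stat) when mu = mu_0
p_value = NormalDist().cdf(z_stat)

decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"mean = {x_bar:.2f}, z = {z_stat:.2f}, p = {p_value:.4f} -> {decision}")
```

Because H₁ is μ < 10, only small values of the test statistic count as evidence against H₀, which is why the p-value uses the lower tail only.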

Question 2: What is the null hypothesis, and how does it differ from the alternative
hypothesis?


Ans: In hypothesis testing, we always define two opposite statements about a population parameter:

1️⃣ Null Hypothesis (H₀)

*   Meaning: The default or “no-effect” claim.
*   Purpose: Acts as the baseline assumption that nothing unusual is happening.
*   Examples:
    *   A new medicine has the same effect as a placebo. → H₀: mean difference = 0
    *   A coin is fair. → H₀: p = 0.5

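2️⃣ Alternative Hypothesis (H₁ or Ha)

*   Meaning: The competing claim that contradicts H₀.
*   Purpose: Represents what you want to find evidence for.
*   Examples:
    *   The new medicine does change blood pressure. → H₁: mean difference ≠ 0
    *   The coin is not fair. → H₁: p ≠ 0.5

In short, H₀ is assumed true until the data provide enough evidence against it, while H₁ is accepted only when H₀ is rejected.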
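As a rough sketch, the coin example (H₀: p = 0.5 versus H₁: p ≠ 0.5) can be tested with an exact two-sided binomial test. The observed result of 60 heads in 100 tosses is a hypothetical figure, and the helper names `binomial_pmf` and `two_sided_binomial_p_value` are made up for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binomial_p_value(heads, n, p0=0.5):
    """Exact two-sided p-value: total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    observed = binomial_pmf(heads, n, p0)
    return sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if binomial_pmf(k, n, p0) <= observed * (1 + 1e-9)  # tolerance for float ties
    )

# Hypothetical experiment: 60 heads in 100 tosses
alpha = 0.05
p_value = two_sided_binomial_p_value(heads=60, n=100)

print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

With these hypothetical numbers the p-value lands just above 0.05, so the coin would not be declared unfair at α = 0.05 even though 60 heads may look lopsided.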

Question 3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

Ans: In hypothesis testing, the significance level—commonly denoted by α (alpha)—is the threshold you set before analyzing data to decide whether to reject the null hypothesis (H₀).

1️⃣ What It Means:

*   Definition: The significance level is the maximum probability of making a Type I error, i.e., rejecting the null hypothesis when it is actually true.
*   Common Values: Typical choices are 0.05, 0.01, or 0.10. For example, α = 0.05 means you are willing to accept a 5% chance of a false positive.

2️⃣ Role in the Testing Process:

*   Set α: Decide on the significance level (e.g., 0.05) before collecting or examining the data.
*   Compute p-value: Perform the statistical test and obtain a p-value, which represents the probability of observing the data (or something more extreme) if H₀ is true.
*   Compare p-value with α:
    *   If p ≤ α → reject H₀ (evidence is strong enough to call the result “statistically significant”).
    *   If p > α → fail to reject H₀ (not enough evidence to conclude an effect).

3️⃣ Why It Matters:

*   Controls False Alarms: By setting α, you control how often you might incorrectly claim there is an effect when none exists.
*   Balances Errors: Lower α reduces false positives but increases the chance of a Type II error (missing a real effect).
*   Guides Decision-Making: It provides an objective cutoff so results aren’t judged by intuition alone.
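The claim that α controls the false-alarm rate can be checked with a small simulation. The sketch below repeatedly samples from a population where H₀ is true and counts how often a two-tailed z-test rejects it anyway. The population parameters, sample size, number of experiments, and seed are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
mu_0, sigma, n = 50.0, 2.0, 30   # hypothetical population; H0 (mu = 50) is TRUE here
n_experiments = 4000
std_normal = NormalDist()

false_positives = 0
for _ in range(n_experiments):
    # Draw a sample from the population described by H0
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z_stat = (mean(sample) - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))  # two-tailed p-value
    if p_value <= alpha:
        false_positives += 1   # rejected a true H0: a Type I error

type_i_rate = false_positives / n_experiments
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The observed rate should come out close to α, which is what "a 5% chance of a false positive" means in practice.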

Question 4: What are Type I and Type II errors? Give examples of each.

Ans: In hypothesis testing, Type I and Type II errors describe the two possible ways you can make a wrong decision when testing a null hypothesis (H₀).

1️⃣ Type I Error (False Positive)

*   Definition: Rejecting the null hypothesis when it is actually true.
*   Probability: Controlled by the significance level α (e.g., α = 0.05 means a 5 % chance of this error).

Examples:

*   Medical example: A drug trial concludes the new drug works when in fact it doesn’t.
*   Legal analogy: Convicting an innocent person.

2️⃣ Type II Error (False Negative)

*   Definition: Failing to reject the null hypothesis when it is actually false (missing a real effect).
*   Probability: Denoted by β; the test’s power is 1 − β.

Examples:

*   Medical example: A cancer-screening test reports “no disease” when the patient actually has cancer.
*   Legal analogy: Letting a guilty person go free.
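To see how α, β, and sample size interact, here is a minimal sketch that computes the power (1 − β) of a right-tailed one-sample z-test analytically. The function name `z_test_power` and the scenario (true effect of 2 units, σ = 5) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed one-sample z-test.

    effect: true mean minus hypothesized mean (assumed >= 0)
    sigma:  known population standard deviation
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    shift = effect / (sigma / sqrt(n))       # mean of Z when H1 is true
    beta = std_normal.cdf(z_crit - shift)    # P(fail to reject | H1 true)
    return 1 - beta

# Hypothetical scenario: true effect of 2 units, sigma = 5
for alpha in (0.10, 0.05, 0.01):
    for n in (10, 30, 100):
        power = z_test_power(effect=2, sigma=5, n=n, alpha=alpha)
        print(f"alpha={alpha:.2f}  n={n:3d}  power={power:.3f}  beta={1 - power:.3f}")
```

Lowering α raises β for a fixed sample size, while increasing n lowers β without touching α.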


Question 5: What is the difference between a Z-test and a T-test? Explain when to use each.

Ans: **Z-Test**

Purpose: Tests a hypothesis about a population mean (or proportion) when the population variance (σ²) is known or the sample is very large.

| Key Feature           | Details                                        |
| --------------------- | ---------------------------------------------- |
| **Population SD (σ)** | **Known**                                      |
| **Sample size (n)**   | Large (typically n ≥ 30)                       |
| **Test statistic**    | $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ |
| **Distribution**      | Standard Normal (mean 0, SD 1)                 |

Typical Uses:

*   Quality control where the process standard deviation is established.
*   Comparing a sample mean to a known population mean when σ is given.

Example

A manufacturer claims the mean weight of cereal boxes is 500 g with a known σ = 5 g. You sample 40 boxes and want to test the claim → Z-test.

**T-Test**

Purpose: Tests a hypothesis about a population mean when the population variance is unknown and you estimate it from the sample.

| Key Feature           | Details                                                   |
| --------------------- | --------------------------------------------------------- |
| **Population SD (σ)** | **Unknown**                                               |
| **Sample size (n)**   | Small or large (works for all n)                          |
| **Test statistic**    | $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (s = sample SD) |
| **Distribution**      | Student’s t with (n − 1) degrees of freedom               |

Types of T-Tests:

*   One-sample: Compare sample mean to a hypothesized mean.
*   Two-sample (independent): Compare means of two independent groups.
*   Paired: Compare means of matched/paired samples.

Example
Testing if the average test score of a class of 15 students differs from 70 when σ is unknown → T-test.
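As a sketch of the class-score example, the t statistic can be computed by hand and compared with a tabulated critical value. The 15 scores below are hypothetical; the critical value 2.145 is the standard two-tailed t-table entry for α = 0.05 with 14 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for a class of 15 students (illustrative values)
scores = [72, 68, 75, 71, 66, 74, 70, 69, 73, 77, 65, 72, 71, 68, 74]

mu_0 = 70                # hypothesized mean (H0: mu = 70, H1: mu != 70)
n = len(scores)
x_bar = mean(scores)
s = stdev(scores)        # sample SD (divides by n - 1), used because sigma is unknown

# One-sample t statistic with n - 1 = 14 degrees of freedom
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed critical value for alpha = 0.05 and df = 14, from a t-table.
# The corresponding z critical value would be 1.960.
t_crit = 2.145

print(f"mean = {x_bar:.2f}, s = {s:.2f}, t = {t_stat:.2f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```

The t critical value is larger than the z critical value because estimating σ from a small sample adds uncertainty. The two converge as n grows.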

Question 6: Write a Python program to generate a binomial distribution with n=10 and p=0.5, then plot its histogram.
(Include your Python code and output in the code box below.)

Hint: Generate random numbers using a random function.

Ans: Here’s a complete example that:

*   Generates random numbers following a Binomial distribution with number of trials n = 10 and probability of success p = 0.5
*   Plots a histogram of the results

    # Binomial distribution example: n = 10, p = 0.5
    import numpy as np
    import matplotlib.pyplot as plt

    # Step 1: Generate random binomial samples
    # 10000 random numbers where each is the count of "successes" in 10 trials
    n = 10
    p = 0.5
    size = 10000
    data = np.random.binomial(n, p, size)

    # Step 2: Plot histogram
    plt.hist(data, bins=range(0, n + 2), edgecolor='black', align='left')
    plt.title('Binomial Distribution (n=10, p=0.5)')
    plt.xlabel('Number of Successes')
    plt.ylabel('Frequency')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

Expected Output

*   A histogram with x-axis values from 0 to 10 (the possible numbers of successes).
*   Bars forming a roughly bell-shaped, symmetric pattern centered around 5 (the expected value n × p = 5).
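For comparison with the simulated histogram, the theoretical probabilities can be computed directly from the binomial formula P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. This sketch uses only the standard library. The "expected count" column assumes the same 10000 draws as the simulation.

```python
from math import comb

n, p = 10, 0.5
size = 10000  # number of simulated draws the counts are scaled to

# Theoretical probability of each possible number of successes
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"k={k:2d}  P(X=k)={prob:.4f}  expected count={prob * size:7.1f}")
```

The bar heights in the histogram should be close to these expected counts, with the peak at k = 5.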

Question 7: Implement hypothesis testing using Z-statistics for a sample dataset in
Python. Show the Python code and interpret the results.

sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

(Include your Python code and output in the code box below.)

Ans: Hypothesis

Suppose we want to test:


*   Null (H₀): μ = 50
*   Alternative (H₁): μ ≠ 50
*   Significance level: α = 0.05

The question does not give the population standard deviation, but a strict Z-test requires it to be known.
For demonstration, we assume a known population σ = 0.5.

    import numpy as np
    from scipy.stats import norm

    # Sample data
    sample_data = [
        49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
        50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
        50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
        50.3, 50.4, 50.0, 49.7, 50.5, 49.9
    ]

    # Known population mean and standard deviation
    mu_0 = 50        # hypothesized mean
    sigma = 0.5      # assumed known population standard deviation

    # Compute sample statistics
    sample_mean = np.mean(sample_data)
    n = len(sample_data)

    # Z statistic
    z_stat = (sample_mean - mu_0) / (sigma / np.sqrt(n))

    # Two-tailed p-value
    p_value = 2 * (1 - norm.cdf(abs(z_stat)))

    print(f"Sample mean = {sample_mean:.3f}")
    print(f"Z statistic = {z_stat:.3f}")
    print(f"p-value     = {p_value:.4f}")

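Output (for the data and the assumed σ = 0.5 above)

    Sample mean = 50.089
    Z statistic = 1.067
    p-value     = 0.2861

Interpretation: Since the p-value (≈ 0.29) is greater than α = 0.05, we fail to reject H₀. Under the assumed σ = 0.5, the sample does not provide enough evidence that the population mean differs from 50.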
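The same decision can also be reached without a p-value, either by comparing |Z| with the critical value or by checking whether the hypothesized mean falls inside the 95% confidence interval. The sketch below uses the standard library's `NormalDist` instead of SciPy and keeps the assumed σ = 0.5.

```python
from math import sqrt
from statistics import NormalDist, mean

sample_data = [
    49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
    50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
    50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
    50.3, 50.4, 50.0, 49.7, 50.5, 49.9,
]

mu_0 = 50       # hypothesized mean
sigma = 0.5     # ASSUMED known population standard deviation
alpha = 0.05

n = len(sample_data)
x_bar = mean(sample_data)
z_stat = (x_bar - mu_0) / (sigma / sqrt(n))

# Critical-value approach: reject H0 when |Z| exceeds z_crit (about 1.96)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z_stat) > z_crit

# Confidence-interval approach: reject H0 when mu_0 lies outside the interval
margin = z_crit * sigma / sqrt(n)
ci = (x_bar - margin, x_bar + margin)
reject_by_interval = not (ci[0] <= mu_0 <= ci[1])

print(f"|Z| = {abs(z_stat):.3f}, critical value = {z_crit:.3f}")
print(f"95% CI for the mean = ({ci[0]:.3f}, {ci[1]:.3f})")
print("Reject H0" if reject_by_critical_value else "Fail to reject H0")
```

For a two-tailed test, the critical-value check and the interval check always agree with each other and with the p-value rule.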

Question 8: Write a Python script to simulate data from a normal distribution and
calculate the 95% confidence interval for its mean. Plot the data using Matplotlib.

(Include your Python code and output in the code box below.)

Ans: Below is a complete Python example that:

*   Simulates random data from a normal distribution
*   Computes the 95 % confidence interval (CI) for the mean
*   Plots the data with Matplotlib

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # 1 Simulate data: Normal(mean=100, std=15, n=200)
    np.random.seed(42)                   # for reproducibility
    data = np.random.normal(loc=100, scale=15, size=200)

    # 2 Calculate 95% confidence interval for the mean
    mean = np.mean(data)
    std_err = stats.sem(data)            # standard error of the mean
    confidence = 0.95
    ci_low, ci_high = stats.t.interval(confidence, len(data)-1, loc=mean, scale=std_err)

    print(f"Sample mean = {mean:.2f}")
    print(f"95% Confidence Interval = ({ci_low:.2f}, {ci_high:.2f})")

    # 3 Plot the data histogram with the mean line
    plt.hist(data, bins=20, edgecolor='black', alpha=0.7)
    plt.axvline(mean, color='red', linestyle='dashed', linewidth=2, label=f"Mean = {mean:.2f}")
    plt.title("Simulated Normal Data (mean=100, std=15)")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.legend()
    plt.grid(axis='y', linestyle='--', alpha=0.6)
    plt.show()

Example Output (illustrative values; your exact numbers depend on the seed and sample size)

    Sample mean = 99.43
    95% Confidence Interval = (97.42, 101.44)
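What "95% confidence" means can be illustrated with a coverage simulation: build the interval many times from fresh samples and count how often it contains the true mean. This is a sketch under stated assumptions. It uses the normal critical value (about 1.96) as an approximation to the t critical value, which is reasonable for n = 200, and the seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

true_mean, true_sd, n = 100, 15, 200
n_repeats = 1000
z_crit = NormalDist().inv_cdf(0.975)   # normal approximation to the t critical value

covered = 0
for _ in range(n_repeats):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    m = mean(sample)
    margin = z_crit * stdev(sample) / sqrt(n)   # critical value x standard error
    if m - margin <= true_mean <= m + margin:
        covered += 1   # this interval captured the true mean

coverage = covered / n_repeats
print(f"Intervals containing the true mean: {coverage:.1%}")
```

Roughly 95% of the intervals should contain the true mean. Any single interval either contains it or does not.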

Question 9: Write a Python function to calculate the Z-scores from a dataset and
visualize the standardized data using a histogram. Explain what the Z-scores represent
in terms of standard deviations from the mean.

(Include your Python code and output in the code box below.)

Ans: Here’s a complete Python example that:

*   Calculates Z-scores for each value in a dataset
*   Plots a histogram of those standardized values
*   Explains what Z-scores mean.

        import numpy as np
        import matplotlib.pyplot as plt

        def z_scores(data):
            """
            Calculate Z-scores for a 1-D dataset.
            Z = (x - mean) / std
            """
            data = np.asarray(data, dtype=float)   # also accept plain Python lists
            mean = np.mean(data)
            std = np.std(data, ddof=0)   # population standard deviation
            return (data - mean) / std

        # Example dataset (could be any numeric array)
        data = np.array([12, 15, 14, 10, 18, 20, 13, 17, 19, 11])

        # 1️⃣ Compute Z-scores
        zs = z_scores(data)

        print("Original Data:", data)
        print("Z-scores:", np.round(zs, 2))

        # 2️⃣ Visualize standardized data
        plt.hist(zs, bins=8, edgecolor='black', alpha=0.7)
        plt.title("Histogram of Z-scores")
        plt.xlabel("Z-score")
        plt.ylabel("Frequency")
        plt.axvline(0, color='red', linestyle='dashed', linewidth=1.5, label="Mean (Z=0)")
        plt.legend()
        plt.grid(axis='y', linestyle='--', alpha=0.6)
        plt.show()
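
Output (console):

        Original Data: [12 15 14 10 18 20 13 17 19 11]
        Z-scores: [-0.88  0.03 -0.27 -1.48  0.94  1.55 -0.58  0.64  1.24 -1.18]

Interpretation: A Z-score tells how many standard deviations a value lies from the mean. Z = 0 is exactly at the mean, a positive Z is above it, and a negative Z is below it. Here the mean is 14.9 and the population standard deviation is 3.3, so the value 20 (Z ≈ 1.55) lies about 1.55 standard deviations above the mean, and the value 10 (Z ≈ -1.48) lies about 1.48 standard deviations below it.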
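To make the "standard deviations from the mean" reading explicit, the following sketch prints each value with its distance from the mean and then checks that the standardized data has mean 0 and standard deviation 1. It uses the standard library's `statistics` module rather than NumPy, purely to keep the example dependency-free.

```python
from statistics import mean, pstdev

data = [12, 15, 14, 10, 18, 20, 13, 17, 19, 11]

mu = mean(data)
sd = pstdev(data)   # population standard deviation (same as ddof = 0)
z = [(x - mu) / sd for x in data]

# Describe each value by how far it sits from the mean, in SD units
for x, zi in zip(data, z):
    direction = "above" if zi > 0 else "below"
    print(f"{x:2d} is {abs(zi):.2f} standard deviations {direction} the mean ({mu:.1f})")

# Standardizing always yields mean 0 and standard deviation 1
print(f"mean of z = {abs(mean(z)):.2f}, SD of z = {pstdev(z):.2f}")
```

Standardizing changes only the scale and the center: the shape of the histogram is the same as for the raw data.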