# **Statistics Advance - 2 Assignment Questions**





# Question1: Define the z-statistic and explain its relationship to the standard normal distribution. How is the z-statistic used in hypothesis testing?

### **Definition of the Z-Statistic:**

The **z-statistic**, also known as the **z-score**, is a standardized value that measures how many standard deviations a data point is from the population mean, or how many standard errors a sample mean is from it. For a **sample mean**, it is calculated using the formula:

$$
z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}
$$

Where:

* $\bar{x}$ = sample mean
* $\mu$ = population mean
* $\sigma$ = population standard deviation
* $n$ = sample size

For a **single data point**, the formula is:

$$
z = \frac{x - \mu}{\sigma}
$$
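
Both formulas can be sketched as small Python functions. The names `z_score` and `z_statistic` are illustrative choices rather than library functions, and the value 180 is a made-up observation.

```python
import math

def z_score(x, mu, sigma):
    """Standardize a single observation x."""
    return (x - mu) / sigma

def z_statistic(sample_mean, mu, sigma, n):
    """Standardize a sample mean using the standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

print(z_score(180, 170, 5))                      # 2.0
print(round(z_statistic(172, 170, 5, 50), 2))    # 2.83
```

The only difference between the two is the denominator: a single point is scaled by $\sigma$, while a sample mean is scaled by the smaller quantity $\sigma / \sqrt{n}$.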

---

### **Relationship to the Standard Normal Distribution:**

* The **standard normal distribution** is a special case of the normal distribution with a **mean of 0** and a **standard deviation of 1**.
* **Standardizing** a normally distributed variable (subtracting its mean and dividing by its standard deviation) converts it to the standard normal distribution. This puts values from different datasets on a common scale, so they can be compared.
* Once a normally distributed value is converted to a z-score, probabilities can be read from the standard normal curve (the bell-shaped curve) or a z-table.
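
A quick way to check this relationship is to compute the same probability on both scales. The sketch below uses `statistics.NormalDist` from the Python standard library; the values of `mu`, `sigma`, and `x` are hypothetical.

```python
from statistics import NormalDist

mu, sigma, x = 170, 5, 178          # hypothetical values
z = (x - mu) / sigma                # 1.6

p_original = NormalDist(mu, sigma).cdf(x)   # P(X <= 178) on the original scale
p_standard = NormalDist(0, 1).cdf(z)        # P(Z <= 1.6) on the standard scale

print(z)                                           # 1.6
print(round(p_original, 4), round(p_standard, 4))  # both about 0.9452
```

The two probabilities agree, which is why a single z-table is enough for every normal distribution.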

---

### **Z-Statistic in Hypothesis Testing:**

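The **z-statistic** plays a key role in **z-tests**, which are a type of hypothesis test used when:

* The population standard deviation ($\sigma$) is known.
* The sample mean is approximately normally distributed, either because the population is normal or because the sample size is large (typically $n \geq 30$).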

#### **Steps in Hypothesis Testing using Z-statistic:**

1. **State the Hypotheses:**

   * Null hypothesis ($H_0$): Usually claims no effect or no difference.
   * Alternative hypothesis ($H_1$): Indicates the effect or difference you're testing for.

2. **Calculate the Z-statistic** using the formula.

3. **Determine the Critical Value(s):**

   * Based on the chosen significance level ($\alpha$), e.g., 0.05.
   * Use the **z-table** (standard normal table) to find the cutoff value.

4. **Make a Decision:**

   * If the computed z falls in the **rejection region** (e.g., $|z| > 1.96$ for a two-tailed test at 5% significance), **reject $H_0$**.
   * Otherwise, **fail to reject $H_0$**.
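
The four steps can be sketched in Python as follows. This is a minimal illustration for a two-tailed test, assuming $\sigma$ is known; the function name `two_tailed_z_test` and the numbers in the usage line are hypothetical. The critical value comes from `NormalDist().inv_cdf`, which plays the role of the z-table.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 vs H1: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = two_tailed_z_test(sample_mean=52, mu0=50, sigma=10, n=100)
print(round(z, 2), round(z_crit, 2), reject)   # 2.0 1.96 True
```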

---

### **Example:**

Suppose you want to test if the average height of a population is 170 cm. You collect a sample of 50 people, and the sample mean height is 172 cm. The population standard deviation is 5 cm.

$$
z = \frac{172 - 170}{\frac{5}{\sqrt{50}}} \approx 2.83
$$

For a two-tailed test, this z-statistic corresponds to a p-value of about 0.0047. Since this is below 0.05, you would **reject the null hypothesis** at the 0.05 level.
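
The arithmetic in this example can be reproduced with the standard library. The sketch assumes a two-tailed alternative ($H_1: \mu \neq 170$), which the example does not state explicitly.

```python
import math
from statistics import NormalDist

z = (172 - 170) / (5 / math.sqrt(50))
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2))             # 2.83
print(round(p_two_tailed, 4))  # about 0.0047
print(p_two_tailed < 0.05)     # True
```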

---

### **Summary:**

The **z-statistic** is a tool for comparing a sample to a known population using the **standard normal distribution**. In hypothesis testing, it helps determine whether an observed effect is statistically significant.


# Question2: What is a p-value, and how is it used in hypothesis testing? What does it mean if the p-value is very small (e.g., 0.01)?


### **What is a p-value?**

A **p-value** (probability value) is the **probability of observing a test statistic as extreme as, or more extreme than, the one observed**, **assuming that the null hypothesis is true**.

It quantifies how compatible your sample data are with the null hypothesis $(H_0)$: the smaller the p-value, the more surprising the data would be if $H_0$ were correct.

---

### **How is the p-value used in hypothesis testing?**

In hypothesis testing, the **p-value** helps you decide whether to reject the null hypothesis $(H_0)$. The steps are:

1. **State your hypotheses:**

   * $H_0$: The null hypothesis (e.g., no effect, no difference).
   * $H_1$: The alternative hypothesis (e.g., there is an effect or difference).

2. **Determine the significance level ($\alpha$)**:

   * Common choices: 0.05, 0.01, or 0.10.
   * This is the maximum probability you're willing to accept for making a **Type I error** (rejecting a true $H_0$).

3. **Calculate the test statistic** (e.g., z, t, etc.) and find the **p-value**.

4. **Compare the p-value to $\alpha$**:

   * If **p-value ≤ α**: **Reject $H_0$** (evidence suggests the result is statistically significant).
   * If **p-value > α**: **Fail to reject $H_0$** (no sufficient evidence to claim a significant effect).
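
Steps 3 and 4 might look like this for a z-statistic. The helper names `p_value_from_z` and `decide`, and the `tail` labels, are illustrative assumptions and not part of any library.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two-sided"):
    """p-value of a z-statistic; `tail` is 'two-sided', 'greater', or 'less'."""
    std_normal = NormalDist()
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "greater":
        return 1 - std_normal.cdf(z)
    if tail == "less":
        return std_normal.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(p_value_from_z(2.5)))   # reject H0
print(decide(p_value_from_z(1.0)))   # fail to reject H0
```

Note that the same z-statistic gives different p-values depending on whether the alternative hypothesis is one-sided or two-sided, so the tail must be chosen before looking at the data.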

---

### **What does a small p-value (e.g., 0.01) mean?**

* A **small p-value** (like 0.01) indicates that the observed result is **highly unlikely** under the assumption that the null hypothesis is true.
* In practical terms:

  * There is **strong evidence against $H_0$**.
  * You would **reject the null hypothesis** at common significance levels (e.g., 0.05 or 0.01).

#### **Example Interpretation:**

If you get a p-value of 0.01:

* There’s only a **1% chance** of obtaining a result at least this extreme **if $H_0$ were true**. This is not the same as a 1% chance that $H_0$ is true.
* You conclude that the observed effect is **statistically significant**.
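
One way to make the "1% chance" concrete is a small simulation in which $H_0$ really is true. The sketch below repeatedly draws normal samples with the hypothesized mean, runs a two-tailed z-test on each, and counts how often the p-value is 0.01 or smaller. All parameters (`mu0`, `sigma`, `n`, `trials`, the seed) are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)
mu0, sigma, n, trials = 0.0, 1.0, 25, 20_000
std_normal = NormalDist()

small_p = 0
for _ in range(trials):
    # Data generated with H0 true: the population mean really is mu0.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - std_normal.cdf(abs(z)))
    if p <= 0.01:
        small_p += 1

print(small_p / trials)   # should land near 0.01
```

So a p-value this small does occasionally appear when nothing is going on, but only about once in a hundred tests.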

---

### **Summary:**

* The **p-value** measures the strength of evidence against the null hypothesis.
* A **very small p-value (e.g., 0.01)** means a result at least as extreme as yours would rarely occur by random chance under $H_0$, so you **reject the null hypothesis** whenever the p-value is at or below your chosen $\alpha$.
* It helps in deciding whether the result of a test is statistically significant.


# Question3: Compare and contrast the binomial and Bernoulli distributions.

### **Comparison of Binomial and Bernoulli Distributions**

The **Bernoulli** and **Binomial** distributions are both **discrete probability distributions** that model outcomes of **binary experiments** (i.e., experiments with two possible outcomes: success or failure). However, they differ in scale and application.

---

###  **Bernoulli Distribution**

#### **Definition:**

The **Bernoulli distribution** models the outcome of a **single trial** with two possible outcomes:

* **Success (1)** with probability $p$
* **Failure (0)** with probability $1 - p$

#### **Characteristics:**

* Number of trials: **1**

* Random variable $X \in \{0, 1\}$

* Probability mass function (PMF):

  $$
  P(X = x) = p^x(1 - p)^{1 - x}, \quad x \in \{0, 1\}
  $$

* Mean: $\mu = p$

* Variance: $\sigma^2 = p(1 - p)$
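
As a small sketch, the PMF can be coded directly and the mean and variance recovered from their definitions. The function name `bernoulli_pmf` and the value `p = 0.3` are assumptions made for illustration.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli(p) variable; zero unless x is 0 or 1."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)

p = 0.3  # hypothetical success probability
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(round(mean, 10))       # 0.3   (= p)
print(round(variance, 10))   # 0.21  (= p * (1 - p))
```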

---

###  **Binomial Distribution**

#### **Definition:**

The **Binomial distribution** models the number of **successes in $n$ independent Bernoulli trials**, each with the same success probability $p$.

#### **Characteristics:**

* Number of trials: **$n \geq 1$**

* Random variable $X \in \{0, 1, 2, \dots, n\}$

* PMF:

  $$
  P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n
  $$

* Mean: $\mu = np$

* Variance: $\sigma^2 = np(1 - p)$
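
The binomial PMF translates almost symbol for symbol into Python using `math.comb` for the binomial coefficient. This is a minimal sketch; `binomial_pmf` is an illustrative name, and the parameters model ten tosses of a fair coin.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5   # ten tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(sum(pmf), 10))   # 1.0
print(round(mean, 10))       # 5.0  (= n * p)
print(round(variance, 10))   # 2.5  (= n * p * (1 - p))
```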

---

###  **Key Differences**

| Feature               | **Bernoulli Distribution** | **Binomial Distribution**              |
| --------------------- | -------------------------- | -------------------------------------- |
| **Number of trials**  | 1                          | $n \geq 1$                             |
| **Range of outcomes** | 0 or 1                     | 0 to $n$                               |
| **Models**            | Single trial outcome       | Number of successes in multiple trials |
| **Mean**              | $p$                        | $np$                                   |
| **Variance**          | $p(1 - p)$                 | $np(1 - p)$                            |
| **PMF simplicity**    | Simpler, only two outcomes | Uses binomial coefficient              |

---

###  **Relationship Between Them:**

A **Bernoulli distribution is a special case** of the binomial distribution where:

$$
n = 1
$$

So, every Bernoulli distribution **is a binomial**, but not every binomial is a Bernoulli.
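
To see why, substitute $n = 1$ into the binomial PMF. Because $\binom{1}{0} = \binom{1}{1} = 1$, the binomial coefficient drops out and the Bernoulli PMF remains:

$$
P(X = k) = \binom{1}{k} p^k (1 - p)^{1 - k} = p^k (1 - p)^{1 - k}, \quad k \in \{0, 1\}
$$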

---

###  **Example:**

* **Bernoulli Example**: Toss a coin once — success = heads (1), failure = tails (0).
* **Binomial Example**: Toss a coin 10 times — count how many heads (successes) you get.

---

### **Summary:**

| Distribution  | Best For                                                                         |
| ------------- | -------------------------------------------------------------------------------- |
| **Bernoulli** | Modeling a single yes/no event                                                   |
| **Binomial**  | Modeling the total number of successes across multiple independent yes/no events |


# Question4: Under what conditions is the binomial distribution used, and how does it relate to the Bernoulli distribution?


### **Conditions for Using the Binomial Distribution:**

The **binomial distribution** is used when the following **four conditions** (called the **BINS** conditions) are met:

####  **1. Binary Outcomes**

Each trial results in only **two possible outcomes**:

* **Success** (often coded as 1)
* **Failure** (often coded as 0)

####  **2. Independent Trials**

The outcome of one trial does **not affect** the outcome of another. All trials are **statistically independent**.

####  **3. Number of Trials is Fixed (n)**

The number of trials, $n$, is **fixed in advance**.

####  **4. Same Probability (p)**

Each trial has the **same probability** of success, denoted by $p$.

---

### **How Binomial Distribution Relates to the Bernoulli Distribution:**

The **binomial distribution is built on the Bernoulli distribution**.

* A **Bernoulli distribution** models the outcome of a **single trial** (with success probability $p$).
* A **binomial distribution** models the **sum of multiple independent Bernoulli trials** with the same probability of success.

####  **In other words:**

$$
\text{If } X_i \sim \text{Bernoulli}(p), \text{ for } i = 1, 2, ..., n \quad \text{(independent)}
$$

$$
\text{Then } X = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)
$$

So, the **binomial distribution is the distribution of the total number of successes** in $n$ independent Bernoulli trials.
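
This sum-of-Bernoullis construction can be checked by simulation. The sketch below adds up `n` simulated Bernoulli trials many times and compares the observed frequencies with the binomial PMF. The parameters, the seed, and the helper `bernoulli_trial` are hypothetical choices.

```python
import random
from math import comb

random.seed(1)
n, p, trials = 10, 0.3, 50_000   # hypothetical parameters

def bernoulli_trial(p):
    """One Bernoulli(p) draw: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

counts = [0] * (n + 1)
for _ in range(trials):
    successes = sum(bernoulli_trial(p) for _ in range(n))
    counts[successes] += 1

# Compare simulated frequencies with the exact binomial probabilities.
for k in range(n + 1):
    empirical = counts[k] / trials
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, round(empirical, 3), round(exact, 3))
```

The two columns should agree to within sampling noise, which shrinks as `trials` grows.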

---

###  **Example:**

* Tossing a fair coin:

  * **Bernoulli**: Toss once → Heads (1) or Tails (0)
  * **Binomial**: Toss 10 times → Count how many heads

---

### **Summary Table:**

| Feature      | **Bernoulli Distribution** | **Binomial Distribution**          |
| ------------ | -------------------------- | ---------------------------------- |
| Trials       | 1                          | $n \geq 1$                         |
| Outcome      | 0 or 1                     | Count of successes (0 to $n$)      |
| Use Case     | Single yes/no event        | Multiple independent yes/no events |
| Relationship | Base case                  | Sum of $n$ Bernoulli trials        |

---

**Conclusion:**
Use the **binomial distribution** when you're repeating the **same Bernoulli trial multiple times independently**, and you want to know the **number of successes**.


# Question5: What are the key properties of the Poisson distribution, and when is it appropriate to use this distribution?


###  **Key Properties of the Poisson Distribution:**

The **Poisson distribution** is a **discrete probability distribution** that models the number of times an **event occurs in a fixed interval of time or space**, given a constant mean rate of occurrence.

####  **Key Properties:**

1. **Discrete Events**

   * It counts the **number of occurrences** of an event (0, 1, 2, ...).

2. **Single Parameter (λ)**

   * The distribution is defined by one parameter:

     $$
     \lambda = \text{average rate of occurrence (mean number of events per interval)}
     $$
   * λ is both the **mean** and the **variance** of the distribution.

3. **Probability Mass Function (PMF):**

   $$
   P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, ...
   $$

   where:

   * $k$ = number of occurrences
   * $e \approx 2.718$

4. **Events are Independent**

   * The occurrence of one event does not affect the probability of another.

5. **Constant Rate**

   * Events occur at a **constant average rate** (λ), not in bursts or clusters.

6. **No Simultaneous Events**

   * In a very small interval, the probability of two or more events occurring is negligible, so events effectively happen one at a time.
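
The PMF and the "mean equals variance" property can be checked numerically. This is a sketch with a hypothetical rate `lam = 4.0`; the infinite sums are truncated at $k = 99$, where the remaining probability is negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0          # hypothetical rate: 4 events per interval on average
ks = range(100)    # truncation point; the tail beyond it is negligible here
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(round(mean, 6), round(variance, 6))   # 4.0 4.0
```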

---

###  **When is the Poisson Distribution Appropriate?**

Use the Poisson distribution when:

* You are **counting events** (not measuring continuous quantities).
* Events occur **independently** and at a **constant average rate**.
* You are considering a **fixed interval** of **time, space, distance, or area**.

---

###  **Common Examples:**

* Number of emails received in an hour
* Number of phone calls at a call center per minute
* Number of cars passing through a toll booth in 10 minutes
* Number of typos on a single page of a book
* Number of accidents at an intersection per week
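
As a worked illustration of the call-center case, suppose calls arrive at an average of 3 per minute. That rate is a made-up figure, not data from a real call center. The sketch computes the probability of a quiet minute (at most 1 call) and of a busy one (more than 5 calls).

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 3   # hypothetical: 3 calls per minute on average
p_at_most_1 = poisson_cdf(1, lam)
p_more_than_5 = 1 - poisson_cdf(5, lam)

print(round(p_at_most_1, 4))     # about 0.1991
print(round(p_more_than_5, 4))   # about 0.0839
```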

---

###  **Summary Table:**

| Property            | Poisson Distribution                          |
| ------------------- | --------------------------------------------- |
| **Type**            | Discrete                                      |
| **Parameter**       | $\lambda$ (mean rate of occurrence)           |
| **Mean**            | $\lambda$                                     |
| **Variance**        | $\lambda$                                     |
| **PMF**             | $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$  |
| **Use When**        | Counting events in fixed time/space intervals |
| **Key Assumptions** | Independence, constant rate, rare events      |

---

###  **Tip:**

The **Poisson distribution** is often used to model **rare events** over a large population or time span when each event is **independent and random**.
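
The "rare events over a large population" idea corresponds to a standard approximation: when $n$ is large and $p$ is small, $\text{Binomial}(n, p)$ is close to $\text{Poisson}(\lambda)$ with $\lambda = np$. The sketch below compares the two PMFs for hypothetical values $n = 1000$ and $p = 0.003$.

```python
import math

n, p = 1000, 0.003      # hypothetical: many trials, rare success
lam = n * p             # matching Poisson rate, about 3

def binomial_pmf(k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Largest pointwise difference between the two PMFs over k = 0..20
max_gap = max(abs(binomial_pmf(k) - poisson_pmf(k)) for k in range(21))

for k in range(6):
    print(k, round(binomial_pmf(k), 4), round(poisson_pmf(k), 4))
print(max_gap)
```

The printed columns differ only in the third or fourth decimal place, which is why the Poisson distribution is a convenient stand-in for such binomial problems.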


# Question6: Define the terms "probability distribution" and "probability density function" (PDF). How does a PDF differ from a probability mass function (PMF)?


###  **1. Probability Distribution**

A **probability distribution** describes how the probabilities of a random variable are **distributed across its possible values**.

* It tells you **what values a random variable can take** and **how likely each value is**.
* There are two main types of probability distributions:

  * **Discrete** (uses a PMF)
  * **Continuous** (uses a PDF)

---

###  **2. Probability Density Function (PDF)**

A **Probability Density Function (PDF)** is used to describe the **probability distribution of a continuous random variable**.

#### **Properties:**

* The PDF, denoted $f(x)$, **does not give probabilities directly**.

* Instead, the **area under the curve** of the PDF over an interval gives the probability that the variable falls within that interval:

  $$
  P(a \leq X \leq b) = \int_a^b f(x)\, dx
  $$

* The **total area under the curve** is 1:

  $$
  \int_{-\infty}^{\infty} f(x)\, dx = 1
  $$

* $f(x) \geq 0$ for all $x$
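
These properties can be verified numerically for the standard normal distribution. The sketch below integrates the PDF with a simple trapezoidal rule; `integrate` is a hand-written helper, not a library function, and the integration over $[-10, 10]$ stands in for the full real line.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

print(round(integrate(dist.pdf, -1, 1), 4))     # 0.6827, area under the curve
print(round(dist.cdf(1) - dist.cdf(-1), 4))     # 0.6827, same probability via the CDF
print(round(integrate(dist.pdf, -10, 10), 4))   # 1.0, total area
print(NormalDist(0, 0.1).pdf(0) > 1)            # True: a density value can exceed 1
```

The last line shows why $f(x)$ is not itself a probability: for a narrow distribution the density at the peak is larger than 1, yet the total area is still 1.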

---

###  **3. Probability Mass Function (PMF)**

A **Probability Mass Function (PMF)** is used for **discrete random variables** and gives the **exact probability** that the random variable equals a specific value:

$$
P(X = x) = p(x)
$$

#### **Properties:**

* $p(x) \geq 0$
* $\sum_{x} p(x) = 1$
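
A fair six-sided die gives a simple PMF to check these properties against. The sketch uses `fractions.Fraction` so the probabilities are exact; the die is an illustrative example.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(pmf[3])                # 1/6, the exact probability that X = 3
print(sum(pmf.values()))     # 1, the probabilities sum to one
print(pmf[1] + pmf[2])       # 1/3, P(X = 1 or X = 2)
```

Unlike a PDF, each value of the PMF is already a probability, so probabilities of events are found by adding rather than integrating.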

---

###  **Difference Between PDF and PMF**

| Feature                        | **PDF (Probability Density Function)**       | **PMF (Probability Mass Function)** |
| ------------------------------ | -------------------------------------------- | ----------------------------------- |
| **Used For**                   | Continuous random variables                  | Discrete random variables           |
| **Gives**                      | Density (area under curve gives probability) | Exact probability of a value        |
| **Probability of exact value** | 0 (e.g., $P(X = 3) = 0$)                     | Non-zero (e.g., $P(X = 3) = 0.2$)   |
| **Total (Sum/Area)**           | Area under curve = 1                         | Sum of probabilities = 1            |
| **Example**                    | Normal distribution, Exponential             | Binomial, Poisson, Bernoulli        |

---

###  **In Simple Terms:**

* **PDF**: Used for continuous variables like height or weight; probability is found **over intervals**.
* **PMF**: Used for discrete variables like number of heads in coin tosses; probability is found **at exact values**.


# Question7: Explain the Central Limit Theorem (CLT) with example


###  **Central Limit Theorem (CLT): Definition**

The **Central Limit Theorem (CLT)** is a fundamental principle in statistics that states:

> **"For independent samples from a population with a finite mean and variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size becomes large, regardless of the shape of the original population distribution."**

---

###  **Key Points of CLT:**

1. **Applies to Sample Means** (or sums).
2. **Works even if the original population is not normally distributed**, as long as the sample size is sufficiently large.
3. The **larger the sample size**, the **closer the sampling distribution of the mean** is to a normal distribution.
4. The **mean of the sampling distribution** is equal to the **population mean (μ)**.
5. The **standard deviation** of the sampling distribution (called **standard error**) is:

   $$
   \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
   $$

   Where:

   * $\sigma$ = population standard deviation
   * $n$ = sample size
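
The standard error formula follows from the variance rules for independent random variables. A short derivation, assuming the observations $X_1, \dots, X_n$ are independent and each has variance $\sigma^2$:

$$
\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
$$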

---

###  **Why is CLT Important?**

* It allows us to use **normal distribution techniques (like z-scores)** even when the population is not normal.
* It's the **basis for many statistical tests and confidence intervals**.

---

###  **Example:**

Let’s say the **population** of students’ **scores** in a national exam is **skewed**, with:

* Mean $\mu = 70$
* Standard deviation $\sigma = 15$

####  Step 1: Take One Sample

If you take **one sample of 5 students**, its average can land far from 70, and with samples this small the distribution of sample means is still noticeably skewed.

####  Step 2: Take Many Samples

Now, suppose you:

* Take **1000 random samples**, each of **size 30**.
* Compute the **mean score** for each sample.

According to the **CLT**, the distribution of these 1000 sample means will:

* Be approximately **normal** (even if original scores were skewed)
* Have a **mean ≈ 70**
* Have a **standard error** of:

  $$
  \sigma_{\bar{x}} = \frac{15}{\sqrt{30}} \approx 2.74
  $$

So, even though individual scores are **not normally distributed**, the **sample means are approximately normal** (thanks to the CLT).
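
This experiment can be simulated. The example does not say what the skewed population looks like, so the sketch below assumes one: a shifted exponential, $55 + \text{Exp}(\text{mean } 15)$, which is right-skewed with mean 70 and standard deviation 15. The seed and the helper `draw_score` are likewise illustrative.

```python
import random
from statistics import fmean, stdev

random.seed(42)

def draw_score():
    """One score from an assumed right-skewed population (mean 70, SD 15)."""
    return 55 + random.expovariate(1 / 15)

sample_size, num_samples = 30, 1000
sample_means = [
    fmean(draw_score() for _ in range(sample_size))
    for _ in range(num_samples)
]

print(round(fmean(sample_means), 2))   # close to 70
print(round(stdev(sample_means), 2))   # close to 15 / sqrt(30), about 2.74
```

Plotting a histogram of `sample_means` would show a roughly symmetric bell shape, even though a histogram of individual scores from `draw_score` is strongly skewed.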

---

###  **Visual Summary:**

| Aspect | Population Distribution       | Sample Means Distribution |
| ------ | ----------------------------- | ------------------------- |
| Shape  | Could be skewed or non-normal | Approximately normal      |
| Mean   | $\mu$                         | $\mu$                     |
| Spread | $\sigma$                      | $\frac{\sigma}{\sqrt{n}}$ |

---

###  **Conclusion:**

The **Central Limit Theorem** makes it possible to:

* Use **normal distribution tools** for **inference**, even when working with non-normal populations.
* **Estimate probabilities** and **construct confidence intervals** for population parameters using sample statistics.


# Question8: Compare z-scores and t-scores. When should you use a z-score, and when should a t-score be applied instead?


###  **Z-score vs T-score: Overview**

Both **z-scores** and **t-scores** are used in **hypothesis testing and confidence intervals** to measure how far a sample statistic (like a sample mean) is from the hypothesized population mean, **in units of standard errors**.

| Feature            | **Z-Score**                                                  | **T-Score**                                                       |
| ------------------ | ------------------------------------------------------------ | ----------------------------------------------------------------- |
| Used with          | **Large samples** or **known population standard deviation** | **Small samples** and **unknown population standard deviation**   |
| Distribution       | **Standard Normal Distribution** (mean = 0, SD = 1)          | **Student’s t-distribution** (wider, mean = 0)                    |
| Formula            | $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$                | $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$                          |
| Variability        | Less variability, lighter tails                              | More variability, heavier tails (accounts for sample uncertainty) |
| Degrees of Freedom | Not applicable                                               | $df = n - 1$                                                      |

---

###  **When to Use a Z-score:**

Use the **z-score** when:

1. The **population standard deviation (σ)** is known.
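2. The **sample size is large** (typically $n \geq 30$) and $\sigma$ is unknown. In this case, a common shortcut is to substitute the sample standard deviation $s$ for $\sigma$ and treat the result as approximately standard normal.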
3. The **sampling distribution** is approximately normal.

---

###  **When to Use a T-score:**

Use the **t-score** when:

1. The **population standard deviation (σ)** is **unknown**.
2. The **sample size is small** (typically $n < 30$).
3. The population is **normally distributed** or approximately normal.

---

###  **Why the Difference Matters:**

* **T-scores** are more **conservative** (wider confidence intervals and larger critical values) to adjust for the **extra uncertainty** due to estimating $\sigma$ using the sample.
* As sample size increases, the **t-distribution approaches the standard normal distribution**, and **t-scores converge to z-scores**.
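
The heavier tails of the t-distribution, and their disappearance as $n$ grows, can be seen in a simulation. The sketch below draws normal samples with true mean 0, computes the t-statistic for each, and measures how often $|t|$ exceeds the z cutoff of 1.96. The sample sizes, trial count, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random

random.seed(7)

def t_from_normal_sample(n):
    """t-statistic of one N(0, 1) sample of size n, tested against mu = 0."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return mean / (s / math.sqrt(n))

def tail_fraction(n, cutoff=1.96, trials=5000):
    """Share of simulated t-statistics with |t| beyond the z cutoff."""
    hits = sum(abs(t_from_normal_sample(n)) > cutoff for _ in range(trials))
    return hits / trials

frac_small = tail_fraction(5)     # df = 4
frac_large = tail_fraction(100)   # df = 99
print(frac_small)   # noticeably above 0.05
print(frac_large)   # close to 0.05
```

With $n = 5$, using 1.96 as the cutoff would reject a true null hypothesis far more often than 5% of the time, which is why the larger t critical values are needed for small samples.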

---

###  **Example Comparison:**

#### Situation A: Known σ, $n = 50$

Use **z-score**:

$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
$$

#### Situation B: Unknown σ, $n = 15$

Use **t-score**:

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \quad df = 14
$$
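
For Situation B, the t-statistic can be computed directly from the data, since $s$ comes from the sample itself. The sketch below uses `statistics.stdev`, which divides by $n - 1$; the function name `t_statistic`, the 15 measurements, and the hypothesized mean of 12.0 are all made up for illustration.

```python
import math
from statistics import fmean, stdev

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    s = stdev(sample)   # sample standard deviation (divides by n - 1)
    t = (fmean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sample of 15 measurements
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9,
          12.2, 12.4, 11.7, 12.6, 12.0, 12.1, 12.3]
t, df = t_statistic(sample, mu0=12.0)
print(round(t, 2), df)
```

The resulting `t` would then be compared with a critical value from the t-distribution with `df = 14` degrees of freedom rather than with 1.96.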

---

###  **Summary Table**

| Condition                             | Use Z-score              | Use T-score                      |
| ------------------------------------- | ------------------------ | -------------------------------- |
| Population standard deviation known   | ✅                        | ❌                                |
| Population standard deviation unknown | ❌                        | ✅                                |
| Sample size ≥ 30                      | ✅ (even if σ is unknown) | ✅ (still preferred if σ unknown) |
| Sample size < 30                      | ❌                        | ✅                                |

---

###  **Conclusion:**

* **Z-score**: Use when $\sigma$ is known, or as a large-sample approximation when it is not.
* **T-score**: Use when $\sigma$ is unknown and must be estimated from the sample, especially with small samples. This is the usual situation with real-world data.
