Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

ans.
### Probability Mass Function (PMF)

* **Definition**: The Probability Mass Function (PMF) is used for **discrete random variables**. It gives the **probability that a discrete random variable is exactly equal to some value**.

* **Mathematical Form**:
  If $X$ is a discrete random variable, then the PMF is defined as:

  $$
  P(X = x) = p(x)
  $$

  where $p(x) \geq 0$ for every $x$ and $\sum_x p(x) = 1$, with the sum taken over all possible values of $X$.

* **Example**:
  Consider the roll of a fair six-sided die. Let $X$ be the number that appears.

  $$
  P(X = x) = \frac{1}{6} \quad \text{for } x = 1, 2, 3, 4, 5, 6
  $$

  This is the PMF of the die roll.
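As a minimal sketch, the die PMF above can be written as a dictionary and checked against the two PMF conditions. The name `die_pmf` is chosen here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF is non-negative everywhere and sums to 1.
all_non_negative = all(p >= 0 for p in die_pmf.values())
total = sum(die_pmf.values())

print(die_pmf[3])        # 1/6
print(all_non_negative)  # True
print(total)             # 1
```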



### Probability Density Function (PDF)

* **Definition**: The Probability Density Function (PDF) is used for **continuous random variables**. It represents the **density of probability rather than the actual probability**. The **probability that the variable falls within a certain range** is given by the **area under the curve** of the PDF over that range.

* **Mathematical Form**:
  If $X$ is a continuous random variable, its PDF $f(x)$ is the function that satisfies:

  $$
  P(a \leq X \leq b) = \int_a^b f(x) \, dx
  $$

  where $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x) \, dx = 1$

* **Example**:
  Suppose $X$ follows a uniform distribution between 0 and 1. Then, the PDF is:

  $$
  f(x) =
  \begin{cases}
  1 & \text{if } 0 \leq x \leq 1 \\
  0 & \text{otherwise}
  \end{cases}
  $$

  The probability that $X$ lies between 0.2 and 0.5 is:

  $$
  P(0.2 \leq X \leq 0.5) = \int_{0.2}^{0.5} 1 \, dx = 0.3
  $$
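The area-under-the-curve idea can be checked numerically. The sketch below approximates the integral with a simple midpoint rule; the helper names `uniform_pdf` and `integrate` are made up for this illustration, and any numerical integration method would do.

```python
def uniform_pdf(x, a=0.0, b=1.0):
    """Density of Uniform(a, b): 1/(b - a) inside [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# P(0.2 <= X <= 0.5) is the area under the PDF between 0.2 and 0.5.
prob = integrate(uniform_pdf, 0.2, 0.5)
print(round(prob, 6))  # 0.3
```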



Q2: What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

ans.

### **Definition:**

The **Cumulative Distribution Function (CDF)** gives the **probability that a random variable $X$ takes a value less than or equal to a certain value $x$**.

* For a **discrete** random variable:

  $$
  F(x) = P(X \leq x) = \sum_{t \leq x} P(X = t)
  $$

* For a **continuous** random variable:

  $$
  F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt
  $$

  where $f(t)$ is the probability density function (PDF).



### **Why is CDF used?**

1. **To find cumulative probability**: It helps determine the probability that a value is **less than or equal to** a specific point.
2. **To compare distributions**: CDFs allow easy comparison between different probability distributions.
3. **To define probability intervals**: You can compute probabilities between two values:

   $$
   P(a < X \leq b) = F(b) - F(a)
   $$
4. **Used in statistical inference** and simulations (e.g., inverse transform sampling).



### **Example 1: Discrete Case (Rolling a die)**

Let $X$ be the outcome of rolling a fair six-sided die.

The PMF is:

$$
P(X = x) = \frac{1}{6} \text{ for } x = 1, 2, 3, 4, 5, 6
$$

The CDF $F(x) = P(X \leq x)$ is:

* $F(1) = P(X = 1) = \frac{1}{6}$
* $F(2) = P(X \leq 2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}$
* $F(3) = \frac{3}{6}$, and so on until
* $F(6) = 1$
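The running sums above can be reproduced in a few lines. This is a small sketch using exact fractions; note that `Fraction` prints values in lowest terms, so $\frac{2}{6}$ appears as `1/3`.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)
pmf = [Fraction(1, 6)] * 6

# The running total of the PMF gives F(x) = P(X <= x) at each face.
cdf = dict(zip(faces, accumulate(pmf)))

for face, value in cdf.items():
    print(face, value)
```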



### **Example 2: Continuous Case (Uniform Distribution)**

Let $X \sim \text{Uniform}(0, 1)$, so the PDF is:

$$
f(x) = 1 \text{ for } 0 \leq x \leq 1 \text{, and } f(x) = 0 \text{ otherwise}
$$

The CDF is:

$$
F(x) =
\begin{cases}
0 & \text{if } x < 0 \\
\int_0^x 1 \, dt = x & \text{if } 0 \leq x \leq 1 \\
1 & \text{if } x > 1
\end{cases}
$$

So:

* $F(0.2) = 0.2$
* $F(0.5) = 0.5$
* $F(1) = 1$
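A short sketch of this CDF, including the values outside $[0, 1]$, shows how an interval probability follows from $F(b) - F(a)$. The function name `uniform_cdf` is an illustrative choice.

```python
def uniform_cdf(x, a=0.0, b=1.0):
    """CDF of Uniform(a, b): 0 below a, linear on [a, b], 1 above b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


print(uniform_cdf(0.2), uniform_cdf(0.5), uniform_cdf(1.0))

# P(0.2 < X <= 0.5) = F(0.5) - F(0.2)
interval_prob = uniform_cdf(0.5) - uniform_cdf(0.2)
print(round(interval_prob, 6))  # 0.3
```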



Q3: What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

ans.

### **Examples of Situations Using Normal Distribution**

The **normal distribution** is commonly used in statistics to model real-world phenomena because many natural processes tend to cluster around a central value with symmetrical variation.

Here are some typical examples:

1. **Heights of people** in a large population (e.g., adult males or females in a country).
2. **Test scores** in standardized exams (e.g., SAT, IQ tests), where most students score around the average.
3. **Measurement errors** or noise in scientific experiments.
4. **Blood pressure or body temperature** of healthy individuals.
5. **Daily returns of stocks or assets** in finance (approximate under certain conditions).


### **Parameters of the Normal Distribution**

The **normal distribution** is defined by **two parameters**:

1. **Mean (μ)** – determines the **center** (location) of the distribution.
2. **Standard Deviation (σ)** – determines the **spread** (width) of the distribution.

The probability density function (PDF) of a normal distribution is:

$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 }
$$


### **Effect of Parameters on the Shape**

| Parameter                  | Role in the Distribution                                                                                                                        |
| -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **μ (mean)**               | Shifts the entire curve left or right. The **peak** of the bell curve is located at $x = \mu$.                                                  |
| **σ (standard deviation)** | Controls how **wide or narrow** the bell curve is: a **larger σ** gives a flatter, wider curve; a **smaller σ** gives a taller, narrower curve. |



### **Visual Understanding (Conceptual)**

* A **normal distribution with μ = 0 and σ = 1** is called the **standard normal distribution**.
* Changing **μ** moves the curve along the x-axis.
* Changing **σ** stretches or compresses the curve.
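To make the effect of σ concrete, the sketch below evaluates the normal PDF at its peak for a few parameter pairs, using `statistics.NormalDist` from the Python standard library (Python 3.8+). The parameter values are arbitrary choices for illustration.

```python
from statistics import NormalDist

# The PDF peaks at x = mu with height 1 / (sigma * sqrt(2 * pi)),
# so a smaller sigma gives a taller, narrower curve.
for mu, sigma in [(0, 1), (0, 2), (3, 0.5)]:
    dist = NormalDist(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: height at peak = {dist.pdf(mu):.4f}")

narrow_peak = NormalDist(0, 0.5).pdf(0)
wide_peak = NormalDist(0, 2).pdf(0)
print(narrow_peak > wide_peak)  # True
```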



Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.

ans.

### **Importance of Normal Distribution**

The **normal distribution** is one of the most important and widely used probability distributions in statistics and data analysis. Its importance lies in the following reasons:

1. **Natural Occurrence**: Many real-world phenomena are approximately normally distributed. The **Central Limit Theorem** explains why: the sum (or average) of a large number of independent random variables with finite variance tends to be approximately normally distributed, regardless of the shape of the original distribution.

2. **Statistical Inference**: Many statistical methods (confidence intervals, hypothesis testing, regression analysis) assume that the data, the model errors, or the sampling distribution of the statistic is approximately normal.

3. **Predictability**: In a normal distribution, we know the proportion of data within certain standard deviations:

   * About **68%** of the data lies within 1 standard deviation (σ) of the mean (μ)
   * About **95%** within 2σ
   * About **99.7%** within 3σ
     This is known as the **Empirical Rule**.

4. **Basis for Standard Scores (Z-scores)**: The normal distribution allows us to calculate Z-scores, which help in comparing data points from different distributions.

5. **Model Simplicity**: The bell-shaped curve is smooth, symmetric, and mathematically tractable, making it easier to use for theoretical and practical applications.
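The Empirical Rule percentages in point 3 can be recovered from the standard normal CDF. This is a small sketch using `statistics.NormalDist`; the helper name `within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1


def within(k):
    """P(-k <= Z <= k) for a standard normal Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```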



### **Real-Life Examples of Normal Distribution**

Here are some everyday scenarios where normal distribution is often observed or assumed:

1. **Human Heights**: The heights of adult males and females in a population tend to follow a normal distribution, centered around an average height.

2. **IQ Scores**: Intelligence Quotient (IQ) scores are designed to follow a normal distribution with a mean of 100 and a standard deviation of 15.

3. **Exam Scores**: In large-scale testing (like SAT, GRE), scores are often approximately normally distributed across the large and varied population of test-takers.

4. **Measurement Errors**: In scientific experiments and engineering, errors in repeated measurements tend to be normally distributed due to small random fluctuations.

5. **Blood Pressure**: The systolic blood pressure of healthy adults tends to follow a normal distribution in medical studies.

6. **Manufacturing Defects**: Variations in product dimensions (like the length of a metal rod) in a controlled manufacturing process can be modeled using a normal distribution.


Q5: What is the Bernoulli Distribution? Give an example. What is the difference between the Bernoulli Distribution and the Binomial Distribution?

ans.

### **What is Bernoulli Distribution?**

A **Bernoulli distribution** is a **discrete probability distribution** for a **random variable that has only two possible outcomes**:

* **Success (1)** with probability $p$
* **Failure (0)** with probability $1 - p$

It is used to model a **single trial** or **experiment** with only two outcomes.

**Probability Mass Function (PMF):**

$$
P(X = x) =
\begin{cases}
p, & \text{if } x = 1 \\
1 - p, & \text{if } x = 0
\end{cases}
$$

where $0 \leq p \leq 1$
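As a minimal sketch, the Bernoulli PMF translates directly into a small function. The name `bernoulli_pmf` and the value $p = 0.3$ are illustrative choices only.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return p if x == 1 else 1 - p


p = 0.3
print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure

# The two probabilities always sum to 1.
total = bernoulli_pmf(0, p) + bernoulli_pmf(1, p)
```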



### **Example of Bernoulli Distribution**

* Tossing a coin once:

  * Success = Head (1), Failure = Tail (0)
  * $p = 0.5$, $1 - p = 0.5$
* Checking if a bulb works or not:

  * Success = Working (1), Failure = Defective (0)


### **What is Binomial Distribution?**

A **Binomial distribution** is the probability distribution of the **number of successes in $n$ independent Bernoulli trials**, each with the same probability of success $p$.

**Probability Mass Function:**

$$
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
$$

where:

* $n$ = number of trials
* $k$ = number of successes (0 ≤ $k$ ≤ $n$)
* $p$ = probability of success
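The binomial PMF can be sketched with `math.comb` from the standard library. The example values (10 fair coin tosses, exactly 5 heads) are chosen here for illustration, and the function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 fair coin tosses.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(round(p_five_heads, 4))  # 0.2461

# With n = 1 the binomial PMF reduces to the Bernoulli PMF.
bernoulli_success = binomial_pmf(1, 1, 0.3)

# The PMF sums to 1 over k = 0, ..., n.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```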



### **Key Differences: Bernoulli vs Binomial Distribution**

| Feature             | **Bernoulli Distribution**      | **Binomial Distribution**                   |
| ------------------- | ------------------------------- | ------------------------------------------- |
| Number of Trials    | 1                               | $n \geq 1$ ($n = 1$ is the Bernoulli case)  |
| Possible Outcomes   | 0 or 1                          | 0 to $n$ (number of successes)              |
| Probability Formula | $P(X = x) = p^x(1 - p)^{1 - x}$ | $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ |
| Used For            | Single yes/no experiment        | Repeated, independent Bernoulli trials      |
| Example             | Tossing 1 coin                  | Tossing 10 coins and counting heads         |



Q6. Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.

ans.

### Probability That a Random Observation Is Greater Than 60

**Given:**

* Mean $\mu = 50$
* Standard deviation $\sigma = 10$
* We want: $P(X > 60)$
* Assuming the dataset is **normally distributed**



### **Step 1: Convert to Z-score**

We use the **Z-score formula**:

$$
Z = \frac{X - \mu}{\sigma}
$$

Substitute the values:

$$
Z = \frac{60 - 50}{10} = \frac{10}{10} = 1
$$

So, $P(X > 60) = P(Z > 1)$



### **Step 2: Use the Standard Normal Distribution Table**

From the **Z-table**, we find:

$$
P(Z < 1) = 0.8413
$$

Therefore,

$$
P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413 = 0.1587
$$



### **Final Answer:**

$$
\boxed{P(X > 60) = 0.1587}
$$

So, there is approximately a **15.87%** chance that a randomly selected observation from this normal distribution will be greater than 60.
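The table lookup can be cross-checked in code. This sketch uses `statistics.NormalDist` in place of the Z-table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 50, 10
x = 60

# Step 1: convert to a Z-score.
z = (x - mu) / sigma

# Step 2: P(X > 60) = 1 - P(X <= 60), using the normal CDF instead of a table.
p_greater = 1 - NormalDist(mu, sigma).cdf(x)

print(z, round(p_greater, 4))  # 1.0 0.1587
```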




Q7: Explain Uniform Distribution with an example.

ans.



A **uniform distribution** is a type of probability distribution in which **all outcomes are equally likely** within a given interval. There are two main types:

1. **Discrete Uniform Distribution** – used when a finite number of outcomes are equally likely.
2. **Continuous Uniform Distribution** – used when any value within an interval is equally likely.



### **Probability Density Function (PDF):**

If a continuous random variable $X$ follows a **uniform distribution** over the interval $[a, b]$, then the probability density function (PDF) is:

$$
f(x) =
\begin{cases}
\frac{1}{b - a}, & \text{for } a \leq x \leq b \\
0, & \text{otherwise}
\end{cases}
$$

* The **mean** of the distribution is $\mu = \frac{a + b}{2}$
* The **variance** is $\sigma^2 = \frac{(b - a)^2}{12}$


### **Example:**

Suppose the arrival time of a bus is **uniformly distributed between 10:00 AM and 10:30 AM**. Let $X$ represent the number of minutes past 10:00 AM that the bus arrives. So:

* $a = 0$, $b = 30$
* The PDF is: $f(x) = \frac{1}{30}$, for $0 \leq x \leq 30$

#### Probability the bus arrives after 10:15 AM (i.e., $X > 15$):

$$
P(X > 15) = \int_{15}^{30} \frac{1}{30} dx = \frac{30 - 15}{30} = \frac{15}{30} = 0.5
$$

So, there is a **50% chance** the bus arrives after 10:15 AM.
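The bus example can be sketched as follows, together with the mean and variance formulas given earlier. The helper `uniform_prob` is a made-up name for this illustration.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b), clipping the range to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)


a, b = 0, 30  # minutes past 10:00 AM

p_after_1015 = uniform_prob(15, 30, a, b)
mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(p_after_1015)    # 0.5
print(mean, variance)  # 15.0 75.0
```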



Q8: What is the z score? State the importance of the z score.

ans.

### What is a Z-Score?

A **Z-score** is a statistical measure that indicates how many standard deviations a data point $X$ is away from the mean $\mu$ of a dataset. It is calculated using the formula:

$$
z = \frac{X - \mu}{\sigma}
$$

where $\sigma$ is the standard deviation.

* If $z = 0$, the data point is exactly at the mean.
* If $z > 0$, the data point is above the mean.
* If $z < 0$, the data point is below the mean.
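A minimal sketch of the calculation, using a small data set invented for illustration and the population standard deviation from the `statistics` module:

```python
from statistics import mean, pstdev

# Hypothetical data, made up for this example.
scores = [40, 50, 50, 60]

mu = mean(scores)
sigma = pstdev(scores)  # population standard deviation

z_scores = [(x - mu) / sigma for x in scores]

for x, z in zip(scores, z_scores):
    print(x, round(z, 3))
```

Values below the mean get negative z-scores, values at the mean get 0, and values above the mean get positive z-scores. After standardization the z-scores themselves have mean 0 and standard deviation 1.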



### Importance of the Z-Score

1. **Standardization**: It converts different data points to a common scale, allowing comparison across different datasets or units.

2. **Outlier Detection**: Data points with very high or very low z-scores (usually above 3 or below -3) are considered outliers.

3. **Probability Calculation**: Z-scores are used with the standard normal distribution to find probabilities and percentiles.

4. **Statistical Testing**: Used in hypothesis testing, such as z-tests, to determine how unusual a sample statistic is.

5. **Data Normalization**: Helps in normalizing data for machine learning models and other statistical analyses.


Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.

ans.

### What is the Central Limit Theorem (CLT)?

The **Central Limit Theorem** states that:

> When you take sufficiently large random samples from any population with a finite mean and variance, the **distribution of the sample means** will approximate a **normal distribution**, regardless of the original population's distribution.

More formally:
If $X_1, X_2, \dots, X_n$ are independent and identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2$, then as the sample size $n$ becomes large, the sampling distribution of the sample mean $\bar{X}$ approaches a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
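A small simulation can illustrate the statement. The sketch below assumes a Uniform(0, 1) population (which has $\mu = 0.5$ and $\sigma^2 = \frac{1}{12}$); the sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # size of each sample
num_samples = 5000  # number of samples drawn

# Draw many samples from Uniform(0, 1) and record each sample mean.
sample_means = [
    sum(random.random() for _ in range(n)) / n for _ in range(num_samples)
]

observed_mean = mean(sample_means)
observed_sd = pstdev(sample_means)
expected_sd = (1 / 12 / n) ** 0.5  # sigma / sqrt(n)

# Fraction of sample means within one standard error of mu (about 68% if normal).
frac_within_1sd = sum(
    abs(m - 0.5) <= expected_sd for m in sample_means
) / num_samples

print(round(observed_mean, 3), round(observed_sd, 4), round(expected_sd, 4))
print(round(frac_within_1sd, 3))
```

Although the population is flat rather than bell-shaped, the sample means should cluster around 0.5 with a spread close to $\frac{\sigma}{\sqrt{n}}$, and roughly 68% of them should fall within one standard error of the mean.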



### Significance of the Central Limit Theorem

1. **Justifies the Use of Normal Distribution:**

   * It allows statisticians to use normal distribution approximations for many problems, even when the original data is not normally distributed.

2. **Foundation for Inferential Statistics:**

   * Enables estimation of population parameters and hypothesis testing using sample data.

3. **Simplifies Complex Problems:**

   * Instead of dealing with complicated original distributions, we focus on the distribution of sample means, which is easier to handle.

4. **Allows for Probability Calculations:**

   * With the sample mean distribution being normal, we can calculate probabilities and confidence intervals easily.

5. **Widely Applicable:**

   * The theorem applies to various fields like economics, psychology, engineering, and natural sciences where sample data is analyzed.


Q10: State the assumptions of the Central Limit Theorem.

ans.

The Central Limit Theorem (CLT) relies on a few key assumptions for it to hold true:

1. **Independence**
   The samples must be independent of each other. This means the selection of one observation should not influence another.

2. **Identically Distributed**
   The random variables (data points) should come from the same probability distribution with the same mean $\mu$ and variance $\sigma^2$.

3. **Sample Size**
   The sample size $n$ should be sufficiently large. Although there’s no strict rule, a common guideline is $n \geq 30$ for the CLT to apply well.

4. **Finite Mean and Variance**
   The population from which samples are drawn must have a finite mean and variance. If the variance is infinite, the CLT does not apply.

