# Statistics Advanced Part 1

1. What is a random variable in probability theory ?
- In **probability theory**, a **random variable** is a numerical quantity whose value is determined by the outcome of a **random experiment**.

### Simple definition:
A **random variable** assigns a number to each possible outcome of a random process.

---

### Two main types:

1. **Discrete Random Variable**  
   - Takes on **countable** values (like integers).  
   - Example:  
     If you roll a die, the outcome can be 1, 2, 3, 4, 5, or 6.  
     Let \( X \) be the random variable representing the die roll:  
     \( X \in \{1, 2, 3, 4, 5, 6\} \)

2. **Continuous Random Variable**  
   - Takes on values from an **interval or range** (uncountably infinite).  
   - Example:  
     Let \( Y \) be the random variable for the time (in seconds) it takes for a light to turn green.  
     \( Y \in [0, \infty) \)

---

### Notation:
Random variables are usually denoted by uppercase letters like \( X \), \( Y \), or \( Z \).  
- For example:  
  \( P(X = 3) \) is the probability that random variable \( X \) takes the value 3.

---

### Important:
Even though it's called a "variable," it's not *variable* in the usual algebraic sense. It's more like a function that maps outcomes to numbers.
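
The "function" view can be made concrete with a small sketch. The two-coin experiment and the name `num_heads` are illustrative assumptions, not part of the definition; the code lists a sample space, applies a random variable to each outcome, and collects the resulting probabilities.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: outcomes such as ('H', 'T').
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Distribution of X, assuming all four outcomes are equally likely.
pmf = {}
for outcome in sample_space:
    x = num_heads(outcome)
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(sample_space))

for x in sorted(pmf):
    print(f"P(X = {x}) = {pmf[x]}")
```

The outcomes themselves are not numbers; only after applying `num_heads` do we get values we can attach probabilities to.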

2. What are the types of random variables?
- In **probability theory**, random variables are mainly classified into **two types**, based on the kind of values they can take:

---

### 1. **Discrete Random Variable**

- **Definition**: A random variable that takes on **a finite or countably infinite set** of values.  
- **Values**: Typically whole numbers (integers).  
- **Examples**:  
  - Number of heads in 5 coin tosses  
  - Roll of a standard 6-sided die  
  - Number of customers arriving at a store in an hour  
- **Probability** is described by a **probability mass function (PMF)**  
  - Example:  
    \[
    P(X = x)
    \]

---

### 2. **Continuous Random Variable**

- **Definition**: A random variable that takes on values from an **uncountably infinite set**, typically an **interval** of real numbers.  
- **Values**: Real numbers within a range.  
- **Examples**:  
  - Height of a person  
  - Time taken to complete a task  
  - Temperature in a day  
- **Probability** is described by a **probability density function (PDF)**  
  - You don’t find \( P(X = x) \) (it's always 0), but rather:  
    \[
    P(a \le X \le b) = \int_a^b f(x) \, dx
    \]

---

### (Less Common) Extended Types:
There are also some special cases or combinations, like:

- **Mixed Random Variable**: Has both discrete and continuous components.
- **Multivariate Random Variables**: Random vectors with multiple variables (e.g., \( (X, Y) \)).

---

3. What is the difference between discrete and continuous distributions?
- The key difference between **discrete** and **continuous** distributions lies in the **type of values** the random variable can take and **how probability is assigned**.

Here's a clear breakdown:

---

### 🔹 **Discrete Distribution**

- **Values**: Takes **specific, countable** values (e.g., integers)
- **Probability Representation**:  
  Uses a **Probability Mass Function (PMF)**  
  \[
  P(X = x)
  \]
- **Probability at a Point**:  
  Probability **can be assigned** to exact values  
  (e.g., \( P(X = 2) = 0.25 \))
- **Example Distributions**:
  - Binomial distribution  
  - Poisson distribution  
  - Geometric distribution  
- **Example Scenario**:  
  Number of heads in 10 coin flips

---

### 🔹 **Continuous Distribution**

- **Values**: Takes **any value in a range** (uncountably infinite)  
  Example: \( X \in [0, 1] \) or \( X \in (-\infty, \infty) \)
- **Probability Representation**:  
  Uses a **Probability Density Function (PDF)**  
  \[
  P(a \le X \le b) = \int_a^b f(x) \, dx
  \]
- **Probability at a Point**:  
  Probability of any **exact value is 0**  
  (e.g., \( P(X = 2) = 0 \))
- **Example Distributions**:
  - Normal (Gaussian) distribution  
  - Exponential distribution  
  - Uniform distribution (continuous version)
- **Example Scenario**:  
  Time taken for a task to complete (e.g., between 2 and 5 minutes)

---

### ✅ Summary Table:

| Feature              | Discrete Distribution         | Continuous Distribution           |
|----------------------|-------------------------------|------------------------------------|
| Values Taken         | Countable (e.g., 1, 2, 3)     | Uncountable (real numbers)         |
| Probability of Value | \( P(X = x) > 0 \)            | \( P(X = x) = 0 \)                 |
| Function Used        | PMF (Probability Mass Function) | PDF (Probability Density Function) |
| Example              | Rolling a die                 | Measuring height                   |

---

4. What are probability distribution functions (PDF)?
- A **Probability Density Function (PDF)**, sometimes loosely called a probability distribution function, describes how probability is spread over the values of a **continuous random variable**.

- It gives the **density** of probability at each point, not the exact probability.
- For a PDF \( f(x) \), the probability that the variable falls in an interval \([a, b]\) is:
  \[
  P(a \le X \le b) = \int_a^b f(x) \, dx
  \]
- The **total area under the curve** of a PDF is always **1**.

 Note: For **discrete** variables, we use a **Probability Mass Function (PMF)** instead.
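
As a rough numerical check of the interval formula and the "total area is 1" property, the sketch below integrates a density with the midpoint rule. The exponential density, the helper names `exp_pdf` and `integrate`, and the step count are all assumptions made for illustration.

```python
import math

def exp_pdf(x, rate=1.0):
    """Density of an exponential distribution; used here only as an example PDF."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

prob = integrate(exp_pdf, 1.0, 2.0)      # P(1 <= X <= 2)
exact = math.exp(-1) - math.exp(-2)      # closed form for rate = 1
total = integrate(exp_pdf, 0.0, 50.0)    # area over almost the whole support

print(f"P(1 <= X <= 2) ~ {prob:.6f} (exact {exact:.6f})")
print(f"total area ~ {total:.6f}")
print(f"density at 0 with rate 2: {exp_pdf(0.0, rate=2.0)}")  # a density value can exceed 1
```

The last line is a reminder that a density is not a probability: its value at a point may be larger than 1 as long as the area under the curve stays at 1.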

 ----
5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?
- Here's a short comparison:

---

### 🔹 **PDF (Probability Density Function)**
- Used for **continuous** random variables.
- Describes the **probability density** at a specific point.
- You use it to calculate probability over an **interval**:
  \[
  P(a \le X \le b) = \int_a^b f(x)\, dx
  \]
- The **value** of the PDF at a point isn't a probability—only the **area under the curve** is.

---

### 🔹 **CDF (Cumulative Distribution Function)**
- Used for **both discrete and continuous** random variables.
- Describes the **cumulative probability** up to a point:
  \[
  F(x) = P(X \le x)
  \]
- It's a **non-decreasing** function: it never decreases, although it can stay flat over some ranges.
- You can get the PDF from the CDF (for continuous variables) by differentiating:
  \[
  f(x) = \frac{d}{dx}F(x)
  \]

---

### ✅ Summary Table:

| Feature | PDF | CDF |
|--------|-----|-----|
| Stands for | Probability Density Function | Cumulative Distribution Function |
| Type of variable | Continuous | Discrete or Continuous |
| Gives | Probability **density** at a point | Probability **up to** a point |
| Range | Can be >1 (but area under curve = 1) | Always between 0 and 1 |
| Relation | \( f(x) = \frac{d}{dx}F(x) \) | \( F(x) = \int_{-\infty}^x f(t)dt \) |
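
The derivative relation between the two functions can be checked numerically. This is a minimal sketch assuming an exponential distribution with rate 1; the helper `numerical_derivative`, the point `x0`, and the step size `h` are illustrative choices.

```python
import math

def cdf(x):
    """CDF of an exponential distribution with rate 1 (example choice)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def pdf(x):
    """Matching density for the same distribution."""
    return math.exp(-x) if x >= 0 else 0.0

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.5
slope = numerical_derivative(cdf, x0)
print(f"slope of CDF at {x0}: {slope:.6f}")
print(f"PDF at {x0}:          {pdf(x0):.6f}")
```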

---

6. What is a discrete uniform distribution ?
- Ah yes, the **discrete uniform distribution** — the statistical equivalent of "I don’t care who wins, just pick one."

---

### 🔹 **Definition:**
A **discrete uniform distribution** is a probability distribution where **all outcomes are equally likely** over a **finite set** of values.

---

### 🔸 **Example:**
Rolling a fair 6-sided die:
- Possible outcomes: \( \{1, 2, 3, 4, 5, 6\} \)
- Each value has the **same probability**:
  \[
  P(X = x) = \frac{1}{n} \quad \text{for } x = 1, 2, ..., n
  \]
  Here, \( n = 6 \), so each number has a \( \frac{1}{6} \) chance.

---

### 🔸 **Key Properties:**
- All values in the range have **equal probability**.
- PMF (Probability Mass Function):
  \[
  P(X = x) = \frac{1}{b - a + 1}
  \]
  where \( X \in \{a, a+1, ..., b\} \)

- **Mean**:
  \[
  \mu = \frac{a + b}{2}
  \]
- **Variance**:
  \[
  \sigma^2 = \frac{(b - a + 1)^2 - 1}{12}
  \]
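
The mean and variance formulas can be compared against a direct computation from the PMF. The sketch below assumes a fair six-sided die (so `a = 1`, `b = 6`) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

a, b = 1, 6                    # a fair six-sided die
values = range(a, b + 1)
n = b - a + 1                  # number of equally likely values
p = Fraction(1, n)             # probability of each value

# Direct computation from the definition of mean and variance.
mean = sum(x * p for x in values)
variance = sum((x - mean) ** 2 * p for x in values)

# Closed-form expressions from the properties listed above.
mean_formula = Fraction(a + b, 2)
variance_formula = Fraction(n ** 2 - 1, 12)

print(mean, mean_formula)
print(variance, variance_formula)
```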

---

So, it's the distribution for when the universe shrugs and says, *“Eh, they all seem fine.”*

-----

7.  What are the key properties of a Bernoulli distribution ?
- Ah, the **Bernoulli distribution** — the simplest and most fundamental distribution in probability. It's all about **yes/no**, **success/failure**, or **1/0** outcomes.

---

### 🔹 **Definition:**
A **Bernoulli distribution** models a random experiment with exactly **two outcomes**:
- Success (1)
- Failure (0)

It’s like flipping a biased coin **once**.

---

### 🔑 **Key Properties:**

| Property          | Description |
|------------------|-------------|
| **Outcomes**      | \( X \in \{0, 1\} \) |
| **Parameter**     | \( p \), the probability of success (1), where \( 0 \le p \le 1 \) |
| **PMF (Probability Mass Function)** | \( P(X = 1) = p \), \( P(X = 0) = 1 - p \); equivalently \( P(X = x) = p^x (1 - p)^{1 - x} \) for \( x \in \{0, 1\} \) |
| **Mean (Expected value)** | \( E[X] = p \) |
| **Variance**     | \( \text{Var}(X) = p(1 - p) \) |
| **Mode**         | 0 if \( p < 0.5 \), 1 if \( p > 0.5 \), both if \( p = 0.5 \) |

---

### 🧠 Real-life Examples:
- Flipping a (possibly unfair) coin once  
- Did the customer buy the product? (yes = 1, no = 0)  
- Is a light bulb defective? (defective = 1, working = 0)

---

If you repeat Bernoulli trials multiple times, you get the **Binomial distribution** — like its big sibling!

----

8. What is the binomial distribution, and how is it used in probability?
- The **binomial distribution** is one of the most widely used distributions in probability and statistics. It describes the number of successes across **repeated, independent Bernoulli trials**.

---

### 🔹 **Definition:**
A **binomial distribution** models the number of **successes** in a fixed number of **independent** Bernoulli trials (yes/no experiments), each with the **same probability of success**.

---

### 📘 **Scenario:**
You perform an experiment **n times**, each with:
- Probability of success = \( p \)
- Probability of failure = \( 1 - p \)

Let the random variable \( X \) be the number of **successes** in those \( n \) trials.

---

### 🔸 **Probability Mass Function (PMF):**
\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
\]
Where:
- \( k \): number of successes
- \( \binom{n}{k} \): combinations ("n choose k")

---

### 🔸 **Key Properties:**

| Property | Formula |
|----------|---------|
| **Mean** | \( \mu = np \) |
| **Variance** | \( \sigma^2 = np(1 - p) \) |
| **Range of X** | \( X \in \{0, 1, \dots, n\} \) |
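
A short sketch can tie the PMF to the mean and variance in the table. It assumes `n = 10` and `p = 0.5` as example parameters; `binomial_pmf` is a hypothetical helper name written from the formula above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"P(X = 5) = {pmf[5]:.4f}")
print(f"mean = {mean:.2f} (np = {n * p:.2f})")
print(f"variance = {variance:.2f} (np(1-p) = {n * p * (1 - p):.2f})")
```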

---

### 🧠 **Real-Life Examples:**
- Flipping a coin 10 times: how many heads?
- Sending 100 emails: how many get replies?
- Testing 20 products: how many are defective?

---

9. What is the Poisson distribution and where is it applied?
- The **Poisson distribution** is used when you are counting how often an event occurs **randomly over time or space**, for example calls arriving at a help desk per hour.

---

### 🔹 **Definition:**
The **Poisson distribution** models the **number of events** occurring in a fixed interval of **time or space**, assuming:
- Events occur **independently**,
- At a **constant average rate** \( \lambda \) (lambda),
- And **not simultaneously** (no two at the exact same instant).

---

### 🔸 **Probability Mass Function (PMF):**
\[
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
\]
Where:
- \( k \): number of events (0, 1, 2, ...)
- \( \lambda \): average rate of occurrence
- \( e \): Euler’s number (≈ 2.718)

---

### 🔸 **Key Properties:**

| Property | Formula |
|----------|---------|
| **Mean** | \( \mu = \lambda \) |
| **Variance** | \( \sigma^2 = \lambda \) |
| **Support** | \( k = 0, 1, 2, ... \) (non-negative integers) |
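
The claim that the mean and variance both equal \( \lambda \) can be checked from the PMF. In this sketch `lam = 4.0` is an example rate and the support is truncated at 60 terms, an arbitrary cutoff beyond which the remaining probability is negligible for this rate.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0
pmf = [poisson_pmf(k, lam) for k in range(60)]  # truncated support (assumption)

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"total probability ~ {total:.6f}")
print(f"mean ~ {mean:.6f}, variance ~ {variance:.6f}")
```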

---

10. What is a continuous uniform distribution ?
- A **continuous uniform distribution** is a distribution where a continuous random variable is **equally likely to take any value** within a given interval \([a, b]\).

---

### 🔹 **Key Properties:**
- **PDF (Probability Density Function):**
  \[
  f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b
  \]
- **Mean:**  
  \[
  \mu = \frac{a + b}{2}
  \]
- **Variance:**  
  \[
  \sigma^2 = \frac{(b - a)^2}{12}
  \]

---

 **Example:**
If a bus arrives at **any time between 0 and 10 minutes**, and all times are equally likely, then the wait time follows a **uniform distribution** on \([0, 10]\).
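
For the bus example, probabilities come from interval lengths. The sketch below assumes we want the chance of waiting between 2 and 5 minutes; `uniform_cdf` is an illustrative helper, not a library function.

```python
def uniform_cdf(x, a, b):
    """CDF of a continuous uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0.0, 10.0

# P(2 <= wait <= 5) is the difference of the CDF at the two endpoints.
p_2_to_5 = uniform_cdf(5, a, b) - uniform_cdf(2, a, b)
mean = (a + b) / 2
variance = (b - a) ** 2 / 12

print(f"P(2 <= wait <= 5) = {p_2_to_5:.2f}")
print(f"mean = {mean:.2f} minutes, variance = {variance:.2f}")
```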

---

**In short**: It's like a flat, horizontal line over the interval — everything is equally likely!

----

11. What are the characteristics of a normal distribution ?
- The **normal distribution** (also called Gaussian distribution) is a **bell-shaped**, symmetric probability distribution used widely in statistics.

---

### 🔹 **Key Characteristics:**

- **Symmetric** around the **mean** \( \mu \)
- **Mean = Median = Mode**
- **Shape**: Bell curve  
- **Spread** determined by **standard deviation** \( \sigma \)
- **Total area under the curve = 1**
- Most values lie within:
  - \( \mu \pm 1\sigma \) → ~68%
  - \( \mu \pm 2\sigma \) → ~95%
  - \( \mu \pm 3\sigma \) → ~99.7% (**Empirical Rule**)
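
The three percentages of the Empirical Rule can be reproduced with the error function from the standard library. This sketch relies on the identity \( P(|Z| \le k) = \operatorname{erf}(k / \sqrt{2}) \) for a standard normal \( Z \); the helper name `prob_within` is an assumption.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```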

---

### 📘 **PDF:**
\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{- \frac{(x - \mu)^2}{2\sigma^2}}
\]

---

**Applications:**
- Heights, weights, test scores, and measurement errors: quantities shaped by many small, independent influences are often approximately normal.

---

**In short**: It’s the go-to distribution for modeling **natural, symmetric data** centered around an average.

----

12.  What is the standard normal distribution, and why is it important ?

- The **standard normal distribution** is a special case of the normal distribution with:

- **Mean \( \mu = 0 \)**
- **Standard deviation \( \sigma = 1 \)**

---

### 🔹 **Why it's important:**

- It allows **any normal distribution** to be **standardized** using:
  \[
  Z = \frac{X - \mu}{\sigma}
  \]
- This **Z-score** tells how many standard deviations a value is from the mean.
- Used for:
  - **Probability calculations**
  - **Comparing scores across different normal distributions**
  - **Statistical inference and hypothesis testing**
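
Standardizing should not change a probability, only the scale it is computed on. The sketch below checks this with `statistics.NormalDist` from the standard library; the parameters 100 and 15 and the value 115 are example numbers.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # example normal distribution
x = 115

z = (x - mu) / sigma                        # standardized value
p_direct = NormalDist(mu, sigma).cdf(x)     # P(X <= 115) on the original scale
p_standard = NormalDist().cdf(z)            # P(Z <= z) on the standard normal scale

print(f"z = {z:.2f}")
print(f"P(X <= {x}) = {p_direct:.4f}")
print(f"P(Z <= {z:.2f}) = {p_standard:.4f}")
```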

---

 **In short**:  
It’s the foundation for many statistical methods and makes complex data easier to analyze using a single reference curve.

----

13. What is the Central Limit Theorem (CLT), and why is it critical in statistics ?

- The **Central Limit Theorem (CLT)** states that:

> **The sampling distribution of the sample mean** (or sum) of a large number of **independent, identically distributed (i.i.d.) random variables** with finite mean and variance approaches a **normal distribution**, regardless of the shape of the original population's distribution.

---

### 🔹 **Why it’s critical:**

- It **justifies using normal distribution** in many statistical methods.
- Allows **inference** (e.g., confidence intervals, hypothesis tests) even when the data is **not normally distributed**.
- Makes analyzing **real-world data** easier and more reliable.
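
A quick simulation can show the theorem at work on a skewed population. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1), samples of size 30, and a fixed seed; theory then predicts sample means centered near 1 with spread close to \( 1/\sqrt{30} \).

```python
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable

n = 30                # observations per sample
num_samples = 2000    # number of sample means to collect

# Draw many samples from a skewed population and keep each sample's mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"mean of sample means: {center:.3f} (theory: 1)")
print(f"std of sample means:  {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
```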

---

 **In short**:  
The CLT is the reason the **normal distribution shows up everywhere** in statistics: averages of messy data behave like manageable bell curves, even when the data themselves do not.

----

14.  How does the Central Limit Theorem relate to the normal distribution ?

- The **Central Limit Theorem (CLT)** relates to the **normal distribution** by stating that:

> As the sample size increases, the **distribution of the sample mean** approaches a **normal distribution**, even if the original data is not normal.

---

### 🔹 Key Connection:

- The CLT explains **why** the **normal distribution** is so common in statistics.
- It allows us to **approximate probabilities** using the **standard normal distribution** for large samples.

---

**In short**:  
The CLT shows that **normal distribution naturally arises** when averaging large samples — it's the bridge between real-world data and the bell curve.

----

15.  What is the application of Z statistics in hypothesis testing ?
- **Z-statistics** are used in **hypothesis testing** to measure how far a sample statistic is from the hypothesized population parameter, in units of **standard errors**.

---

### 🔹 **Applications:**

- Testing **means** when population standard deviation is known
- **Large sample sizes** (typically \( n \geq 30 \))
- Used in **Z-tests** for:
  - One-sample tests
  - Two-sample comparisons
  - Proportions

---

### 🔸 **Formula:**
\[
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
\]
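
To see the formula in use, here is a minimal sketch of a two-sided one-sample Z-test. All numbers are hypothetical, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values chosen for illustration.
sample_mean = 102.0
mu_0 = 100.0      # mean under the null hypothesis
sigma = 15.0      # population standard deviation (assumed known)
n = 36            # sample size

z = (sample_mean - mu_0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```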

---

 **In short**:  
Z-statistics help decide whether to **reject or retain a null hypothesis** by comparing how unusual the sample result is under the assumption the null is true.

----

16. How do you calculate a Z-score, and what does it represent ?
- ### 🔹 **Z-score Formula:**
\[
Z = \frac{X - \mu}{\sigma}
\]

Where:  
- \( X \) = value  
- \( \mu \) = mean  
- \( \sigma \) = standard deviation  

---

### 🔸 **What It Represents:**

A **Z-score** tells you how many **standard deviations** a value \( X \) is from the **mean**.

- \( Z > 0 \): Above the mean  
- \( Z < 0 \): Below the mean  
- \( Z = 0 \): Exactly at the mean  

---

 **In short**:  
The Z-score shows how **unusual or typical** a value is compared to the overall distribution.

-----

17. What are point estimates and interval estimates in statistics ?
- ### 🔹 **Point Estimate:**
A **single value** used to estimate a population parameter.

- Example: Sample mean (\( \bar{x} \)) as an estimate of population mean (\( \mu \))

---

### 🔹 **Interval Estimate:**
A **range of values** (with a confidence level) likely to contain the population parameter.

- Example: 95% confidence interval for \( \mu \):  
  \[
  \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right)
  \]
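
The interval formula can be evaluated with the standard library. In this sketch the sample mean, the known standard deviation, and the sample size are made-up example values; `inv_cdf` supplies the critical value for the chosen confidence level.

```python
from math import sqrt
from statistics import NormalDist

# Example inputs (assumptions for illustration).
x_bar, sigma, n = 50.0, 10.0, 100
confidence = 0.95

# Critical value: the point leaving (1 - confidence) / 2 in each tail.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_crit * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"critical value: {z_crit:.3f}")
print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```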

---

**In short**:  
- **Point estimate** gives a best guess.  
- **Interval estimate** gives a range with a confidence level, which conveys how uncertain that guess is.

----

18. What is the significance of confidence intervals in statistical analysis ?

- ### 🔹 **Significance of Confidence Intervals:**

Confidence intervals provide a **range of values** that likely contain the **true population parameter**, with a certain **level of confidence** (e.g., 95%).

---

### 🔸 **Why It's Important:**

- Quantifies the **precision** of point estimates  
- Reflects **uncertainty** in sample data  
- Helps in **decision-making** and **hypothesis testing**

---

 **In short**:  
Confidence intervals show **how reliable** your estimate is and give a **range of plausible values** for the true parameter.

----

19. What is the relationship between a Z-score and a confidence interval ?

- ### 🔹 **Relationship Between Z-score & Confidence Interval:**

The **Z-score** determines the **critical value** used to calculate a **confidence interval** for a population parameter when the population standard deviation is known.

---

### 🔸 **Example:**
For a **95% confidence interval**, the critical Z-value is **1.96**, so the interval is:
\[
\text{CI} = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right)
\]

---

 **In short**:  
The Z-score sets the **margin of error** in a confidence interval, based on the desired **confidence level**.

----

20. How are Z-scores used to compare different distributions ?

- ### 🔹 **Z-scores & Comparing Distributions:**

**Z-scores standardize values** from different distributions by converting them to a **common scale** (mean = 0, standard deviation = 1).

---

### 🔸 **Why It Helps:**
- Allows comparison of values from **different units**, **scales**, or **distributions**
- Shows which value is more **extreme** or **typical** relative to its own distribution
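
A small worked example, using invented exam results on two differently scaled tests, shows how the comparison works. The helper `z_score` and all numbers are assumptions for illustration.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

# Hypothetical scores on two tests with different means and spreads.
z_math = z_score(85, mean=70, std=10)
z_physics = z_score(80, mean=60, std=8)

print(f"math z-score:    {z_math:.2f}")
print(f"physics z-score: {z_physics:.2f}")
print("stronger relative result:", "physics" if z_physics > z_math else "math")
```

Although the raw math score is higher, the physics score sits further above its own class average once both are put on the same scale.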

---

 **In short**:  
Z-scores let you **fairly compare** values across different datasets by showing how far each is from its mean in standard deviation units.

----

21. What are the assumptions for applying the Central Limit Theorem ?

- ### 🔹 **Assumptions of the Central Limit Theorem (CLT):**

1. **Independent observations** – Each sample must be independent of others.  
2. **Identically distributed** – All observations come from the same population.  
3. **Sample size** – Should be **large** (usually \( n \geq 30 \)); smaller if the population is already normal.  
4. **Finite mean and variance** – The population must have a defined mean and variance.

---

**In short**:  
CLT requires **independent, identically distributed** data with a **large enough sample size** to ensure the sample mean approximates a **normal distribution**.

----
22. What is the concept of expected value in a probability distribution?
- ### 🔹 **Expected Value (EV):**

The **expected value** is the **long-run average** or **mean** of a random variable in a probability distribution.

---

### 🔸 **Formula:**

- **Discrete**:  
  \[
  E(X) = \sum x_i \cdot P(x_i)
  \]

- **Continuous**:  
  \[
  E(X) = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
  \]
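
For the discrete formula, a worked example helps. The sketch below assumes a hypothetical game that pays 10 with probability 1/10 and loses 1 otherwise, and uses exact fractions so the result is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical game: payoff -> probability.
distribution = {
    10: Fraction(1, 10),   # win 10
    -1: Fraction(9, 10),   # lose 1
}

# Probabilities must sum to 1 for this to be a valid distribution.
total_probability = sum(distribution.values())

# E(X) = sum of x * P(x) over all possible values.
expected_value = sum(x * p for x, p in distribution.items())

print(f"total probability = {total_probability}")
print(f"E(X) = {expected_value}")
```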

---

 **In short**:  
The expected value tells you what you’d **expect on average** if you repeated a random experiment **many times**.

----

23. How does a probability distribution relate to the expected outcome of a random variable ?

- ### 🔹 **Relationship:**

A **probability distribution** assigns probabilities to all possible values of a **random variable**, and the **expected outcome** (expected value) is the **weighted average** of those values.

---

### 🔸 **Key Idea:**

The **expected value** uses the **probabilities** from the distribution to predict the **average result** over many trials.

---

 **In short**:  
The **probability distribution** shapes how likely each outcome is, and the **expected value** summarizes the average result you'd expect in the long run.

----


# Practical Questions Solutions

1. Write a Python program to generate a random variable and display its value .

# Example: From a Normal Distribution

In [None]:
import random

# Generate a random variable (normal distribution with mean=0, std=1)
random_variable = random.gauss(0, 1)

# Display the value
print(f"Generated Random Variable (Normal): {random_variable:.2f}")


2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF) .

# Python Code: Discrete Uniform PMF Plot

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

# Define parameters for the discrete uniform distribution
low = 1    # inclusive lower bound
high = 7   # exclusive upper bound, so this gives values from 1 to 6 (like a die)

# Generate values (PMF is same for all in range)
x = np.arange(low, high)
pmf = randint.pmf(x, low, high)

# Plotting the PMF
plt.stem(x, pmf, basefmt=" ")  # use_line_collection was removed in newer Matplotlib releases
plt.title('PMF of Discrete Uniform Distribution (1 to 6)')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.ylim(0, 1)
plt.grid(True)
plt.show()


3.  Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution .

# Python Function for Bernoulli PDF

In [None]:
def bernoulli_pdf(x, p):
    """
    Calculate the PDF (PMF) of a Bernoulli distribution.

    Parameters:
        x (int): Outcome (0 or 1)
        p (float): Probability of success (1), where 0 <= p <= 1

    Returns:
        float: Probability of observing outcome x
    """
    if x not in [0, 1]:
        return 0  # Invalid value for Bernoulli
    return p if x == 1 else (1 - p)

# Example usage:
p = 0.7
print("P(X=0):", bernoulli_pdf(0, p))
print("P(X=1):", bernoulli_pdf(1, p))


# Explanation:
- The Bernoulli distribution has only two outcomes:
   - P(X=1)=p
   - P(X=0)=1−p
- This function handles both and returns the correct probability; any other value of x gets probability 0.

4.  Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram .

# Python Script: Simulate and Plot Binomial Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters
n = 10      # number of trials
p = 0.5     # probability of success
size = 1000 # number of simulations

# Simulate binomial distribution
data = np.random.binomial(n, p, size)

# Plot histogram
plt.hist(data, bins=range(n+2), align='left', rwidth=0.8, color='skyblue', edgecolor='black')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Frequency')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.xticks(range(n + 1))
plt.show()


5.  Create a Poisson distribution and visualize it using Python .

# Python Script: Poisson Distribution Visualization



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Parameter for Poisson distribution
lambda_ = 4  # average rate (mean number of events)

# Generate range of x values
x = np.arange(0, 15)

# Calculate PMF for each x
pmf = poisson.pmf(x, mu=lambda_)

# Plot the PMF
plt.stem(x, pmf, basefmt=" ")  # use_line_collection was removed in newer Matplotlib releases
plt.title(f'Poisson Distribution (λ = {lambda_})')
plt.xlabel('Number of Events (k)')
plt.ylabel('Probability P(X = k)')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution.

# Python Program: Discrete Uniform CDF Plot

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

# Parameters for discrete uniform distribution (like a fair die)
low = 1    # inclusive lower bound
high = 7   # exclusive upper bound → gives values from 1 to 6

# Define x values (possible outcomes)
x = np.arange(low, high)

# Compute the CDF
cdf = randint.cdf(x, low, high)

# Plotting the CDF
plt.step(x, cdf, where='post', color='blue', label='CDF')
plt.title('CDF of Discrete Uniform Distribution (1 to 6)')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.xticks(x)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.show()


7.  Generate a continuous uniform distribution using NumPy and visualize it .

# Python Script: Continuous Uniform Distribution Visualization

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

# Define parameters for the uniform distribution
a = 0     # lower bound
b = 10    # upper bound
size = 1000  # number of samples

# Generate data from continuous uniform distribution
data = np.random.uniform(a, b, size)

# Plot histogram of the generated data
plt.hist(data, bins=30, density=True, alpha=0.6, color='skyblue', edgecolor='black', label='Simulated Data')

# Plot theoretical PDF
x = np.linspace(a, b, 1000)
pdf = uniform.pdf(x, loc=a, scale=b - a)
plt.plot(x, pdf, 'r-', lw=2, label='Theoretical PDF')

# Customize plot
plt.title('Continuous Uniform Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend()
plt.show()


8. Simulate data from a normal distribution and plot its histogram .

# Python Script: Normal Distribution Histogram

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters for the normal distribution
mean = 0       # Mean (μ)
std_dev = 1    # Standard deviation (σ)
size = 1000    # Number of samples

# Generate random samples from a normal distribution
data = np.random.normal(loc=mean, scale=std_dev, size=size)

# Plot histogram
plt.hist(data, bins=30, density=True, alpha=0.6, color='lightgreen', edgecolor='black')
plt.title('Histogram of Simulated Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


9. Write a Python function to calculate Z-scores from a dataset and plot them .

# Python Function: Calculate & Plot Z-scores

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def calculate_and_plot_zscores(data):
    """
    Calculates Z-scores for the dataset and plots them.

    Parameters:
        data (list or np.array): Dataset (numeric values)
    """
    data = np.array(data)
    mean = np.mean(data)
    std = np.std(data)

    # Calculate Z-scores
    z_scores = (data - mean) / std

    # Plot Z-scores
    plt.figure(figsize=(10, 4))
    plt.plot(z_scores, marker='o', linestyle='-', color='teal', label='Z-score')
    plt.axhline(0, color='gray', linestyle='--')
    plt.axhline(1, color='red', linestyle='--', label='Z = ±1')
    plt.axhline(-1, color='red', linestyle='--')
    plt.title('Z-scores of Dataset')
    plt.xlabel('Index')
    plt.ylabel('Z-score')
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.legend()
    plt.show()

    return z_scores

# Example usage
sample_data = np.random.normal(50, 10, 100)
z_scores = calculate_and_plot_zscores(sample_data)


10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution .

# Python Code: CLT Simulation Using Exponential Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters
sample_size = 30     # Number of data points in each sample
num_samples = 1000   # Number of samples to draw
lambda_exp = 1.0     # Rate for exponential distribution

# Simulate CLT
sample_means = []

for _ in range(num_samples):
    sample = np.random.exponential(scale=1/lambda_exp, size=sample_size)
    sample_means.append(np.mean(sample))

# Plot histogram of sample means
plt.hist(sample_means, bins=30, density=True, color='skyblue', edgecolor='black')
plt.title(f'Central Limit Theorem (n={sample_size}, samples={num_samples})')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem .

# Python Script: CLT Simulation Using Normal Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters
population_mean = 100
population_std = 15
sample_size = 30
num_samples = 1000

# Simulate multiple samples and compute their means
sample_means = []

for _ in range(num_samples):
    sample = np.random.normal(loc=population_mean, scale=population_std, size=sample_size)
    sample_mean = np.mean(sample)
    sample_means.append(sample_mean)

# Plot histogram of the sample means
plt.hist(sample_means, bins=30, density=True, color='lightcoral', edgecolor='black')
plt.title(f'CLT Verification (Normal Dist, n={sample_size}, Samples={num_samples})')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


12. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1) .

# Python Function: Plot Standard Normal Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_standard_normal_distribution():
    """
    Calculates and plots the standard normal distribution (mean = 0, std = 1).
    """
    # Generate x values (covering most of the distribution)
    x = np.linspace(-4, 4, 1000)

    # Calculate the PDF of the standard normal distribution
    pdf = norm.pdf(x, loc=0, scale=1)

    # Plot the PDF
    plt.plot(x, pdf, color='navy', lw=2, label='Standard Normal PDF')
    plt.title('Standard Normal Distribution (μ=0, σ=1)')
    plt.xlabel('Z')
    plt.ylabel('Probability Density')
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.legend()
    plt.fill_between(x, pdf, alpha=0.2, color='navy')  # Shade under the curve
    plt.show()

# Call the function
plot_standard_normal_distribution()


13.  Generate random variables and calculate their corresponding probabilities using the binomial distribution .

# Python Script: Binomial Random Variables and Probabilities

In [None]:
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

# Parameters
n = 10      # number of trials
p = 0.5     # probability of success
size = 10   # number of random variables to generate

# Step 1: Generate binomial random variables
random_vars = np.random.binomial(n, p, size)
print("Random Binomial Variables:", random_vars)

# Step 2: Calculate PMF (probability) for each unique value
unique_values = np.unique(random_vars)
probabilities = binom.pmf(unique_values, n, p)

# Display results
for val, prob in zip(unique_values, probabilities):
    print(f"P(X = {val}) = {prob:.4f}")

# Optional: Plot the PMF
x = np.arange(0, n + 1)
pmf = binom.pmf(x, n, p)

plt.stem(x, pmf, basefmt=" ")  # use_line_collection was removed in newer Matplotlib releases
plt.title(f'Binomial Distribution PMF (n={n}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()


14. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution.

# Python Program: Z-score Calculation and Visualization

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def calculate_z_score(x, mean, std):
    """
    Calculate Z-score for a given data point.

    Parameters:
        x (float): The data point
        mean (float): The mean of the dataset
        std (float): The standard deviation of the dataset

    Returns:
        float: Z-score
    """
    z = (x - mean) / std
    return z

def plot_z_score(z):
    """
    Plot the standard normal distribution and mark the Z-score.

    Parameters:
        z (float): Z-score to plot
    """
    x_vals = np.linspace(-4, 4, 1000)
    y_vals = norm.pdf(x_vals)

    plt.plot(x_vals, y_vals, label='Standard Normal Curve', color='blue')
    plt.fill_between(x_vals, 0, y_vals, where=(x_vals <= z), color='skyblue', alpha=0.5, label=f'Area ≤ Z={z:.2f}')
    plt.axvline(z, color='red', linestyle='--', label=f'Z = {z:.2f}')
    plt.title('Z-score on Standard Normal Distribution')
    plt.xlabel('Z')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.show()

# Example usage
x = 115          # data point
mean = 100       # population mean
std_dev = 15     # standard deviation

z_score = calculate_z_score(x, mean, std_dev)
print(f"Z-score for x = {x}: {z_score:.2f}")

# Plot the Z-score on the standard normal distribution
plot_z_score(z_score)
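
One way to make the comparison with the standard normal distribution numeric is to convert the Z-score into a cumulative probability. The sketch below reuses the example values from above (x = 115, mean = 100, standard deviation = 15); the helper name `z_to_percentile` is a hypothetical name chosen for illustration.

```python
from scipy.stats import norm

def z_to_percentile(z):
    """Return P(Z <= z) and P(Z > z) for a standard normal Z."""
    below = norm.cdf(z)   # area to the left of z
    above = norm.sf(z)    # survival function, equal to 1 - cdf(z)
    return below, above

z = (115 - 100) / 15      # same example values as above, so z = 1.0
below, above = z_to_percentile(z)
print(f"P(Z <= {z:.2f}) = {below:.4f}")
print(f"P(Z >  {z:.2f}) = {above:.4f}")
```

For z = 1.0 the left-hand area is about 0.84, so under a normal model the data point sits above roughly 84% of values.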


15. Implement hypothesis testing using Z-statistics for a sample dataset.

#Implement hypothesis testing using Z-statistics for a sample dataset

In [None]:
import numpy as np
from scipy.stats import norm

# Sample data
sample = np.array([101, 98, 95, 100, 104, 99, 97, 103, 96, 100])
sample_mean = np.mean(sample)
sample_size = len(sample)
sample_std = np.std(sample, ddof=1)  # sample standard deviation (with Bessel's correction)

# Known population values
population_mean = 100
population_std = 15  # If known, use this; otherwise, use sample_std for large n

# Z-test calculation
z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# p-value (two-tailed test); norm.sf is the survival function, 1 - cdf
p_value = 2 * norm.sf(abs(z_stat))

# Results
print(f"Sample Mean: {sample_mean:.2f}")
print(f"Z-Statistic: {z_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

# Significance level
alpha = 0.05

# Decision
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")
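
The script above runs a two-tailed test only. A possible way to package the same calculation so that one-tailed alternatives are also available is sketched below; the function name `z_test` and its `alternative` argument are assumptions made for illustration, and the population standard deviation is still treated as known.

```python
import numpy as np
from scipy.stats import norm

def z_test(sample, mu0, sigma, alternative="two-sided"):
    """One-sample Z-test with a known population standard deviation sigma."""
    sample = np.asarray(sample, dtype=float)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(sample.size))
    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))
    elif alternative == "greater":   # H1: mean > mu0
        p = norm.sf(z)
    elif alternative == "less":      # H1: mean < mu0
        p = norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p

sample = [101, 98, 95, 100, 104, 99, 97, 103, 96, 100]
for alt in ("two-sided", "greater", "less"):
    z, p = z_test(sample, mu0=100, sigma=15, alternative=alt)
    print(f"{alt:>9}: z = {z:.4f}, p = {p:.4f}")
```

The Z-statistic is the same in all three cases; only the tail area used for the p-value changes.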


16. Create a confidence interval for a dataset using Python and interpret the result.

#Python Script: Confidence Interval for the Mean

In [None]:
import numpy as np
import scipy.stats as stats

# Example dataset (sample)
data = np.array([101, 98, 95, 100, 104, 99, 97, 103, 96, 100])

# Sample statistics
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # sample standard deviation
n = len(data)

# Confidence level
confidence = 0.95
alpha = 1 - confidence

# Calculate the margin of error using the t-distribution
t_critical = stats.t.ppf(1 - alpha/2, df=n-1)
margin_of_error = t_critical * (sample_std / np.sqrt(n))

# Confidence Interval
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

# Print result
print(f"Sample Mean: {sample_mean:.2f}")
print(f"{int(confidence*100)}% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
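
As a cross-check, the same interval can be obtained from SciPy's `t.interval`, sketched below on the same dataset. On interpretation: a 95% confidence level describes the procedure, meaning that intervals built this way contain the true population mean in about 95% of repeated samples; it is not a 95% probability statement about this one interval.

```python
import numpy as np
from scipy import stats

data = np.array([101, 98, 95, 100, 104, 99, 97, 103, 96, 100])
n = len(data)
mean = np.mean(data)
sem = stats.sem(data)   # standard error of the mean, uses ddof=1 by default

# Confidence level and degrees of freedom are passed positionally;
# loc is the centre of the interval and scale is the standard error
ci_lower, ci_upper = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)
print(f"95% CI via t.interval: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The bounds should match the manual calculation, since both use the t critical value with n - 1 degrees of freedom.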


17. Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean.

#Python Code: Generate Normal Data & Confidence Interval

In [None]:
import numpy as np
from scipy import stats

# Step 1: Generate data from a normal distribution
np.random.seed(0)  # For reproducibility
population_mean = 100
population_std = 15
sample_size = 50

data = np.random.normal(loc=population_mean, scale=population_std, size=sample_size)

# Step 2: Calculate sample statistics
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # Sample standard deviation
confidence = 0.95
alpha = 1 - confidence

# Step 3: Compute confidence interval using t-distribution
t_critical = stats.t.ppf(1 - alpha / 2, df=sample_size - 1)
margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))

ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

# Step 4: Output
print(f"Sample Mean: {sample_mean:.2f}")
print(f"{int(confidence * 100)}% Confidence Interval for the Mean: ({ci_lower:.2f}, {ci_upper:.2f})")
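
To illustrate what the confidence level means, the sketch below repeats the sampling many times and counts how often the t-based interval contains the true mean. The number of trials (2000), the seed, and the helper name `ci_coverage` are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

def ci_coverage(mu, sigma, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of t-based intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    samples = rng.normal(mu, sigma, size=(trials, n))   # one sample per row
    means = samples.mean(axis=1)
    margins = t_crit * samples.std(axis=1, ddof=1) / np.sqrt(n)
    covered = (means - margins <= mu) & (mu <= means + margins)
    return covered.mean()

coverage = ci_coverage(mu=100, sigma=15, n=50)
print(f"Observed coverage over 2000 samples: {coverage:.3f}")
```

The observed fraction should land close to 0.95, which is the sense in which a single interval is called a "95% confidence interval".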


18. Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution.


#Python Script: PDF of Normal Distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters for the normal distribution
mean = 100
std_dev = 15

# Generate x values (covering most of the distribution)
x = np.linspace(mean - 4*std_dev, mean + 4*std_dev, 1000)

# Calculate PDF
pdf = norm.pdf(x, loc=mean, scale=std_dev)

# Plot the PDF
plt.plot(x, pdf, color='darkgreen', lw=2, label=f'N({mean}, {std_dev}²)')
plt.title('Probability Density Function (Normal Distribution)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend()
plt.fill_between(x, pdf, alpha=0.2, color='green')
plt.show()
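
For reference, the density that `norm.pdf` evaluates is f(x) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²)). The sketch below computes it directly and compares the result with SciPy; the helper name `normal_pdf` is a hypothetical name, and the area is approximated with a simple trapezoid sum.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mean, std_dev):
    """Evaluate the normal density directly from its closed-form formula."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (std_dev * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)

x = np.linspace(40, 160, 1000)   # mean ± 4 standard deviations, as in the plot
manual = normal_pdf(x, mean=100, std_dev=15)

# Trapezoid rule: average adjacent heights times the step width
area = np.sum((manual[1:] + manual[:-1]) / 2 * np.diff(x))

print("Matches scipy:", np.allclose(manual, norm.pdf(x, loc=100, scale=15)))
print(f"Area under the curve on [40, 160]: {area:.4f}")
```

The area is slightly below 1 because the tails beyond four standard deviations are cut off.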


19. Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution.

#Python Script: Poisson CDF Calculation & Interpretation

In [None]:
from scipy.stats import poisson

# Parameters for the Poisson distribution
lambda_rate = 4  # average rate (mean number of occurrences)
x = 6            # value to calculate CDF at

# Calculate CDF
cdf_value = poisson.cdf(x, mu=lambda_rate)

# Display result
print(f"CDF at x = {x} for Poisson(λ = {lambda_rate}) is: {cdf_value:.4f}")


#Optional: Visualize the CDF

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Range of values
x_vals = np.arange(0, 15)
cdf_vals = poisson.cdf(x_vals, mu=lambda_rate)

# Plot
plt.step(x_vals, cdf_vals, where='post', label='Poisson CDF', color='blue')
plt.title('Cumulative Distribution Function (Poisson)')
plt.xlabel('Number of Events (x)')
plt.ylabel('Cumulative Probability')
plt.grid(True, linestyle='--', alpha=0.6)
plt.axvline(x, color='red', linestyle='--', label=f'x = {x}')
plt.axhline(cdf_value, color='green', linestyle='--', label=f'CDF(x={x}) = {cdf_value:.2f}')
plt.legend()
plt.show()
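
To interpret the CDF value: for a count X with rate λ = 4, the CDF at 6 is P(X ≤ 6), which is the sum of the PMF over 0 through 6, and its complement is the probability of more than 6 events. A minimal sketch of that relationship, with variable names chosen for illustration:

```python
from scipy.stats import poisson

lam = 4   # average number of events per interval
k = 6     # threshold

cdf_k = poisson.cdf(k, mu=lam)                               # P(X <= 6)
pmf_sum = sum(poisson.pmf(i, mu=lam) for i in range(k + 1))  # same value, term by term
tail = poisson.sf(k, mu=lam)                                 # P(X > 6) = 1 - CDF

print(f"P(X <= {k}) = {cdf_k:.4f}")
print(f"Sum of PMF for 0..{k} = {pmf_sum:.4f}")
print(f"P(X > {k}) = {tail:.4f}")
```

Here the CDF is about 0.89, so under this model roughly 89% of intervals would see six or fewer events and about 11% would see more.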


20. Simulate a random variable using a continuous uniform distribution and calculate its expected value.

#Python Script: Continuous Uniform Random Variable & Expected Value

In [None]:
import numpy as np

# Define the interval [a, b]
a = 10
b = 20

# Step 1: Simulate a random variable from Uniform(a, b)
random_value = np.random.uniform(a, b)

# Step 2: Calculate the expected value of Uniform(a, b)
expected_value = (a + b) / 2

# Output
print(f"Simulated Random Value: {random_value:.4f}")
print(f"Expected Value of Uniform({a}, {b}): {expected_value:.2f}")
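
A single draw says little about the expected value. The sketch below simulates many draws and compares the sample mean with the theoretical value (a + b) / 2; the seed and the sample size of 100,000 are assumptions chosen for illustration.

```python
import numpy as np

a, b = 10, 20
rng = np.random.default_rng(0)        # seeded generator (assumed seed)
samples = rng.uniform(a, b, size=100_000)

sample_mean = samples.mean()          # empirical estimate of E[X]
sample_var = samples.var(ddof=1)      # empirical estimate of Var[X]
expected_value = (a + b) / 2          # theoretical E[X] for Uniform(a, b)
expected_var = (b - a) ** 2 / 12      # theoretical Var[X] for Uniform(a, b)

print(f"Sample mean:     {sample_mean:.4f}  (theory: {expected_value:.4f})")
print(f"Sample variance: {sample_var:.4f}  (theory: {expected_var:.4f})")
```

As the number of draws grows, the sample mean settles near 15, which is the law of large numbers at work.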



21. Write a Python program to compare the standard deviations of two datasets and visualize the difference.

#Python Program: Compare Standard Deviations

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create two datasets (you can also use your own data)
np.random.seed(42)
data1 = np.random.normal(loc=50, scale=5, size=100)   # Dataset with lower std dev
data2 = np.random.normal(loc=50, scale=15, size=100)  # Dataset with higher std dev

# Step 2: Calculate standard deviations
std1 = np.std(data1, ddof=1)
std2 = np.std(data2, ddof=1)

print(f"Standard Deviation of Dataset 1: {std1:.2f}")
print(f"Standard Deviation of Dataset 2: {std2:.2f}")

# Step 3: Visualize the difference

# Histogram comparison
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(data1, bins=15, alpha=0.7, label=f'Dataset 1 (std={std1:.2f})', color='skyblue')
plt.hist(data2, bins=15, alpha=0.7, label=f'Dataset 2 (std={std2:.2f})', color='orange')
plt.title('Histogram Comparison')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)

# Boxplot comparison
plt.subplot(1, 2, 2)
plt.boxplot([data1, data2])
plt.xticks([1, 2], ['Dataset 1', 'Dataset 2'])  # label the boxes; the labels= keyword is deprecated in newer Matplotlib
plt.title('Boxplot Comparison')
plt.ylabel('Value')
plt.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()
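
Beyond the visual comparison, one possible way to quantify the difference is the ratio of the two standard deviations together with Levene's test for equal variances. This is a sketch that goes slightly beyond the original program; the seed and the choice of Levene's test are assumptions, and other tests could be used instead.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(42)                  # seeded generator (assumed seed)
data1 = rng.normal(loc=50, scale=5, size=100)    # lower spread
data2 = rng.normal(loc=50, scale=15, size=100)   # higher spread

std1 = np.std(data1, ddof=1)
std2 = np.std(data2, ddof=1)
ratio = std2 / std1                              # how many times larger the second spread is

# Levene's test: the null hypothesis is that both groups have equal variances
stat, p_value = levene(data1, data2)

print(f"Std ratio (dataset 2 / dataset 1): {ratio:.2f}")
print(f"Levene statistic = {stat:.2f}, p-value = {p_value:.4g}")
```

A small p-value suggests the difference in spread is unlikely to be due to sampling variation alone.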


22. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution.

#Python Script: Range & IQR from Normal Distribution

In [None]:
import numpy as np

# Step 1: Generate dataset from a normal distribution
np.random.seed(0)
data = np.random.normal(loc=100, scale=15, size=100)

# Step 2: Calculate range
data_range = np.max(data) - np.min(data)

# Step 3: Calculate IQR (Q3 - Q1)
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1

# Step 4: Print results
print(f"Range: {data_range:.2f}")
print(f"IQR (Interquartile Range): {iqr:.2f}")
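
For a normal distribution the IQR has a known theoretical value of about 1.349σ, which gives a reference point for the sample result. The sketch below compares the percentile-based IQR with `scipy.stats.iqr` and with that theoretical value; the larger sample size of 10,000 and the seed are assumptions chosen so the estimate is stable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # seeded generator (assumed seed)
sigma = 15
data = rng.normal(loc=100, scale=sigma, size=10_000)

iqr_percentile = np.percentile(data, 75) - np.percentile(data, 25)
iqr_scipy = stats.iqr(data)           # same quantity via SciPy
# Theoretical IQR of a normal distribution: distance between its 25% and 75% quantiles
iqr_theory = sigma * (stats.norm.ppf(0.75) - stats.norm.ppf(0.25))

print(f"IQR from percentiles: {iqr_percentile:.2f}")
print(f"IQR from scipy:       {iqr_scipy:.2f}")
print(f"Theoretical IQR:      {iqr_theory:.2f}")
```

Unlike the IQR, the range keeps growing as the sample size increases, because larger samples are more likely to contain extreme values.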


23. Implement Z-score normalization on a dataset and visualize its transformation.

# Python Script: Z-Score Normalization & Visualization

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Generate original dataset (e.g., from normal distribution with offset)
np.random.seed(42)
data = np.random.normal(loc=70, scale=10, size=100)

# Step 2: Apply Z-score normalization
mean = np.mean(data)
std = np.std(data, ddof=1)
z_scores = (data - mean) / std

# Step 3: Plot original vs normalized data
plt.figure(figsize=(12, 5))

# Original data histogram
plt.subplot(1, 2, 1)
sns.histplot(data, kde=True, color='skyblue')
plt.title('Original Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.axvline(mean, color='red', linestyle='--', label=f'Mean = {mean:.2f}')
plt.legend()

# Z-score normalized data histogram
plt.subplot(1, 2, 2)
sns.histplot(z_scores, kde=True, color='salmon')
plt.title('Z-score Normalized Data')
plt.xlabel('Z-score')
plt.ylabel('Frequency')
plt.axvline(0, color='red', linestyle='--', label='Mean = 0')
plt.legend()

plt.tight_layout()
plt.show()
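
A quick numerical check complements the histograms: after normalization the data should have mean 0 and standard deviation 1. The sketch below verifies this and compares the manual formula with `scipy.stats.zscore`; the seed is an assumption, and `ddof=1` is passed so that SciPy uses the same sample standard deviation as the script above.

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(42)       # seeded generator (assumed seed)
data = rng.normal(loc=70, scale=10, size=100)

manual = (data - data.mean()) / data.std(ddof=1)
library = zscore(data, ddof=1)        # SciPy defaults to ddof=0, so set it explicitly

print(f"Mean after normalization: {manual.mean():.6f}")
print(f"Std after normalization:  {manual.std(ddof=1):.6f}")
print("Manual and scipy agree:", np.allclose(manual, library))
```

Normalization shifts and rescales the data but does not change the shape of the distribution, which is why the two histograms look alike apart from the axis values.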


24. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution.

#Python Function: Skewness & Kurtosis

In [None]:
import numpy as np
from scipy.stats import skew, kurtosis

def analyze_distribution_stats(mean=0, std_dev=1, size=1000):
    # Generate data from a normal distribution
    data = np.random.normal(loc=mean, scale=std_dev, size=size)

    # Calculate skewness and kurtosis
    data_skewness = skew(data)
    data_kurtosis = kurtosis(data, fisher=True)  # Fisher=True gives excess kurtosis

    # Print results
    print(f"Skewness: {data_skewness:.4f}")
    print(f"Kurtosis (excess): {data_kurtosis:.4f}")

    # Interpretation
    if abs(data_skewness) < 0.5:
        print("→ The distribution is approximately symmetric.")
    else:
        print("→ The distribution is skewed.")

    if abs(data_kurtosis) < 0.5:
        print("→ The distribution has normal-like tails (mesokurtic).")
    elif data_kurtosis > 0:
        print("→ The distribution has heavy tails (leptokurtic).")
    else:
        print("→ The distribution has light tails (platykurtic).")

# Example usage
analyze_distribution_stats(mean=100, std_dev=15, size=1000)
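
To show what `skew` and `kurtosis` compute with their default settings, the sketch below derives both from central moments: skewness is m3 / m2^1.5 and excess kurtosis is m4 / m2² - 3, where mk is the k-th central moment. The helper name `moment_skew_kurtosis` and the seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_skew_kurtosis(data):
    """Return (skewness, excess kurtosis) computed from central moments."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean()
    m2 = np.mean(centered ** 2)   # second central moment (biased variance)
    m3 = np.mean(centered ** 3)   # third central moment
    m4 = np.mean(centered ** 4)   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

data = np.random.default_rng(0).normal(loc=100, scale=15, size=1000)
manual_skew, manual_kurt = moment_skew_kurtosis(data)

print(f"Manual skewness: {manual_skew:.4f}  scipy: {skew(data):.4f}")
print(f"Manual kurtosis: {manual_kurt:.4f}  scipy: {kurtosis(data, fisher=True):.4f}")
```

For normally distributed data both values should be close to 0, with some sampling noise.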


# NOTE: The questions are not numbered in sequence, so please check the question numbering first.


I hope you understand what I am trying to say.