---
---
#Statistics Advanced 1 Theoretical
---
---
###1. What is a random variable in probability theory?
A random variable is a function that assigns a numerical value to each outcome of a random process. Together with its probability distribution, it describes the possible values of an experiment and how likely each one is.

Example:

- Rolling a die → The outcome (1-6) is a random variable.
- Measuring temperature → The temperature in °C is a random variable.

Random variables are categorized as discrete (countable values) or continuous (any value within an interval).

---
###2. What are the types of random variables?
There are two main types of random variables:

- Discrete Random Variable – Takes countable values (e.g., number of customers arriving at a store).
- Continuous Random Variable – Takes any value within a range (e.g., height of students in a class).

---
###3. What is the difference between discrete and continuous distributions?

| Feature               | Discrete Distribution            | Continuous Distribution           |
|-----------------------|--------------------------------|----------------------------------|
| **Definition**       | Takes countable values        | Takes infinite values in a range |
| **Probability Computation** | Summation (ΣP(X))        | Integration (∫f(x)dx)            |
| **Example**         | Number of customers in a store | Height of students in a class   |
| **Common Distributions** | Binomial, Poisson          | Normal, Exponential             |
| **Probability Value** | Exact probability \( P(X = x) \) | Probability of a range \( P(a \leq X \leq b) \) |

Example: Binomial Distribution (Discrete) vs. Normal Distribution (Continuous).
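
The contrast in the table can be sketched in a few lines. This is a minimal illustration using only the standard library, with a fair die as the discrete case and a standard normal as the continuous case (both chosen here for illustration): discrete probabilities are read off or summed from the PMF, while continuous probabilities come from an interval of the CDF.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair die. Each face has an exact probability, and
# P(X <= 2) is a sum of PMF values.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
p_exact = pmf[4]
p_at_most_2 = sum(prob for face, prob in pmf.items() if face <= 2)

# Continuous: a standard normal. A single point has probability 0,
# so we ask for an interval: P(-1 <= Z <= 1) = F(1) - F(-1).
z = NormalDist(mu=0, sigma=1)
p_interval = z.cdf(1) - z.cdf(-1)

print(f"P(X = 4)        = {p_exact}")
print(f"P(X <= 2)       = {p_at_most_2}")
print(f"P(-1 <= Z <= 1) = {p_interval:.4f}")
```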

---
###4. What are probability distribution functions (PDF)?
A Probability Distribution Function (PDF) describes the likelihood of a random variable taking specific values.

For discrete variables, it is called the Probability Mass Function (PMF): \( P(X = x) \).

For continuous variables, it is the Probability Density Function (PDF): \( f(x) \).

The probability is obtained by integrating the density over an interval:

\[ P(a \leq X \leq b) = \int_a^b f(x)\,dx \]

Example: The normal distribution's PDF is:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
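
As a quick check, the normal density can be evaluated directly from its formula and compared with a library implementation. The sketch below assumes the standard form of the normal PDF with mean `mu` and standard deviation `sigma`; the helper name `normal_pdf` and the sample values (60, 50, 10) are illustrative only.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x, written out from the formula."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Compare the hand-written formula with the standard library's NormalDist.
by_formula = normal_pdf(60, mu=50, sigma=10)
by_library = NormalDist(mu=50, sigma=10).pdf(60)

print(f"Formula: {by_formula:.6f}")
print(f"Library: {by_library:.6f}")
```

Note that a density value is not itself a probability; it only becomes one after integrating over an interval.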

---
###5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?
The CDF gives the probability that a random variable is less than or equal to a given value:
\[ F(x) = P(X \leq x) \]

For a continuous variable, it is obtained by integrating the PDF:

\[ F(x) = \int_{-\infty}^{x} f(t)\,dt \]

✅ Key Difference:

- PDF describes the density of probability at each point.
- CDF describes cumulative probability (the total probability up to a point).

Example: For a normal distribution, the CDF tells us the probability of a value being below a threshold.
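
The relationship between the two functions can be checked numerically. The following is a rough sketch, assuming a standard normal distribution and a simple trapezoid rule (the helper `integrate_pdf` and the lower cutoff of -8 standing in for minus infinity are choices made for this illustration): integrating the PDF up to a point should reproduce the CDF at that point.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def integrate_pdf(a, b, steps=10_000):
    """Approximate the integral of the PDF over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h

x = 1.0
approx_cdf = integrate_pdf(-8.0, x)  # -8 stands in for minus infinity
exact_cdf = dist.cdf(x)

print(f"Integral of PDF up to {x}: {approx_cdf:.6f}")
print(f"CDF at {x}:                {exact_cdf:.6f}")
```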

---
###6. What is a discrete uniform distribution?
A discrete uniform distribution is one where all possible outcomes have equal probability.

Example: A fair die roll (1-6) follows a discrete uniform distribution, with probability:

\[ P(X = x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\} \]

✅ Used in random sampling and simulations.

---
###7. What are the key properties of a Bernoulli distribution?
A Bernoulli distribution models a single trial with two outcomes:

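- Success (\( X = 1 \)) with probability \( p \).
- Failure (\( X = 0 \)) with probability \( 1 - p \).

Properties:

- Mean: \( E[X] = p \)
- Variance: \( \mathrm{Var}(X) = p(1 - p) \)

Example: Flipping a fair coin, where \( P(\text{Heads}) = 0.5 \) and \( P(\text{Tails}) = 0.5 \).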
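
The mean and variance properties can be checked by simulation. This is a small sketch with an assumed success probability of 0.3 and a fixed seed for repeatability; the sample mean should land close to p and the sample variance close to p(1 - p).

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable
p = 0.3         # assumed success probability for this illustration

# Each trial is 1 (success) with probability p, otherwise 0 (failure).
trials = [1 if random.random() < p else 0 for _ in range(50_000)]

sample_mean = fmean(trials)
sample_var = pvariance(trials)

print(f"Sample mean:     {sample_mean:.4f} (theory: {p})")
print(f"Sample variance: {sample_var:.4f} (theory: {p * (1 - p):.2f})")
```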

---
###8. What is the binomial distribution, and how is it used in probability?
A binomial distribution represents the number of successes in n independent Bernoulli trials.

Formula:
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \]

where \( k \) is the number of successes, \( p \) is the probability of success on each trial, and \( n \) is the number of trials.

Example:

Probability of getting 3 heads in 5 coin flips.

✅ Used in A/B testing, quality control, and risk analysis.
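
The coin-flip example can be worked through directly from the formula. The sketch below assumes a fair coin (p = 0.5) and uses a hypothetical helper named `binomial_pmf`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), written out from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 5 flips of a fair coin.
prob_3_heads = binomial_pmf(3, n=5, p=0.5)

# The probabilities over all possible k should add up to 1.
total = sum(binomial_pmf(k, n=5, p=0.5) for k in range(6))

print(f"P(3 heads in 5 flips) = {prob_3_heads}")  # 10 * (1/2)^5 = 0.3125
print(f"Sum over all k        = {total}")
```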

---
###9. What is the Poisson distribution, and where is it applied?
The Poisson distribution models the number of events occurring in a fixed interval of time (or space) when events happen independently at a constant average rate.

Formula:
\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \]

where \( \lambda \) is the average number of events per unit time.

Example:

Number of customer arrivals at a store per hour.
Number of defects in a batch of products.

✅ Used in queue theory, network traffic, and event prediction.
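
To make the formula concrete, here is a minimal sketch for the store-arrivals example, assuming an average of 4 customers per hour (an illustrative rate, not taken from real data) and a hypothetical helper named `poisson_pmf`.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), written out from the formula."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4  # assumed average number of arrivals per hour

# Probability of exactly 2 arrivals, and of at most 2 arrivals, in one hour.
p_exactly_2 = poisson_pmf(2, lam)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
```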

---
###10. What is a continuous uniform distribution?
A continuous uniform distribution means all values in a range [a, b] have equal probability density.

PDF:

\[ f(x) = \frac{1}{b - a}, \quad a \leq x \leq b \]

Example: Generating random numbers between 0 and 1 in Python.

✅ Used in simulation and Monte Carlo methods.

---
###11. What are the characteristics of a normal distribution?
The normal distribution is bell-shaped and symmetric.

Properties:

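- Mean = Median = Mode.
- Defined by its mean (\( \mu \)) and standard deviation (\( \sigma \)).
- 68-95-99.7 Rule:
  - About 68% of values lie within \( 1\sigma \) of the mean.
  - About 95% lie within \( 2\sigma \).
  - About 99.7% lie within \( 3\sigma \).

✅ Used in: Modeling natural processes and measurements (heights, IQ scores, and, approximately, stock returns).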
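
The 68-95-99.7 rule can be verified from the CDF. This sketch uses an assumed mean of 100 and standard deviation of 15 (an IQ-style scale chosen for illustration); the percentages do not depend on those values.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed parameters for illustration

coverage = {}
for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = dist.cdf(upper) - dist.cdf(lower)
    print(f"Within {k} sigma: {coverage[k]:.4f}")
```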

---
###12. What is the standard normal distribution, and why is it important?
A standard normal distribution has:

- Mean \( \mu = 0 \)
- Standard deviation \( \sigma = 1 \)

✅ Used for Z-score calculations to compare data from different normal distributions.

---
###13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?
The CLT states that, for a sufficiently large sample size, the distribution of the sample mean is approximately normal, regardless of the shape of the original population distribution (provided it has finite variance).

Why is it important?

Enables hypothesis testing and confidence intervals.                 
Justifies using the normal distribution for many real-world datasets.

---
###14. How does the Central Limit Theorem relate to the normal distribution?
The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean will approach a normal distribution as the sample size increases, regardless of the original population distribution.

Even if the population is not normal, the sample means approximately follow a normal distribution when \( n \) is large (\( n \geq 30 \) is a common rule of thumb).
This property allows us to apply normal-based inferential statistics (like confidence intervals and hypothesis tests).

---
###15. What is the application of Z statistics in hypothesis testing?
Z-statistics (Z-tests) are used when:

- The sample size is large (\( n \geq 30 \)).
- The population variance is known or assumed.

Applications in Hypothesis Testing:

- One-Sample Z-Test → Tests whether the sample mean differs from a known population mean.
- Two-Sample Z-Test → Compares the means of two independent samples.
- Z-Test for Proportions → Compares proportions in categorical data.

---
###16. How do you calculate a Z-score, and what does it represent?
The Z-score (also called the standard score) measures how many standard deviations a data point is from the mean.

Formula:

\[ Z = \frac{X - \mu}{\sigma} \]

Where:

- \( X \) = observed value
- \( \mu \) = population mean
- \( \sigma \) = population standard deviation

Interpretation:

- \( Z > 0 \) → Value is above the mean
- \( Z < 0 \) → Value is below the mean
- \( |Z| > 2 \) → Value is unusual (a possible outlier)

---
###17. What are point estimates and interval estimates in statistics?
- Point Estimate → A single-value estimate of a population parameter.
  Example: The sample mean (\( \bar{x} \)) is a point estimate of the population mean (\( \mu \)).
- Interval Estimate → A range of values within which the true parameter is expected to lie, usually stated with a confidence level.
  Example: A confidence interval gives a range such as \( 50 \pm 2 \) (48 to 52).

---
###18. What is the significance of confidence intervals in statistical analysis?
A confidence interval (CI) gives a range of plausible values for a population parameter instead of a single value.

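Formula for CI (for mean):

\[ CI = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}} \]

Where:

- \( \bar{x} \) = Sample mean
- \( Z \) = Z-score (critical value) from the standard normal table
- \( \sigma \) = Population standard deviation
- \( n \) = Sample size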
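
A worked instance of the interval formula, as a minimal sketch: the sample mean of 50, population standard deviation of 10, and sample size of 100 are made-up inputs, and `z_confidence_interval` is a hypothetical helper name. It assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided Z interval for a mean, assuming a known population sigma."""
    # Critical value: the point leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=50, sigma=10, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these inputs the margin of error is about 1.96, so the interval is roughly 50 ± 2.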

---
###19. What is the relationship between a Z-score and a confidence interval?
The Z-score determines the critical value for constructing a confidence interval.            

Common Z-scores for CIs:

- 90% CI → \( Z = 1.645 \)
- 95% CI → \( Z = 1.96 \)
- 99% CI → \( Z = 2.576 \)

Relationship:

- Higher Z-score → Wider confidence interval → Higher confidence but less precision.
- Lower Z-score → Narrower confidence interval → Lower confidence but more precision.
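
The critical values come from the inverse CDF of the standard normal distribution rather than from memorization. A short sketch using the standard library, with confidence levels written as integer percentages for convenience:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# For a two-sided interval at a given level, the critical value leaves
# (100 - level) / 2 percent in each tail.
critical = {
    level: std_normal.inv_cdf(0.5 + level / 200)
    for level in (90, 95, 99)
}

for level, z in critical.items():
    print(f"{level}% CI -> Z = {z:.3f}")
```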

---
###20. How are Z-scores used to compare different distributions?
Z-scores standardize different distributions so they can be compared on the same scale.

Uses:

Comparing student test scores from different exams with different means.
Comparing financial returns across different investment options.
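
As an illustration of the exam comparison, the sketch below uses invented numbers: a score of 85 on a math exam (mean 70, standard deviation 10) and a score of 90 on a physics exam (mean 80, standard deviation 20). The raw physics score is higher, but the Z-scores tell a different story.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical exam results, chosen only to illustrate the comparison.
z_math = z_score(85, mu=70, sigma=10)
z_physics = z_score(90, mu=80, sigma=20)

print(f"Math Z-score:    {z_math}")
print(f"Physics Z-score: {z_physics}")
print("Stronger relative result:", "math" if z_math > z_physics else "physics")
```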

---
###21. What are the assumptions for applying the Central Limit Theorem?
Key Assumptions:

- Random Sampling → Data should be randomly selected.
- Independence → Each observation should be independent of the others.
- Sample Size → The sample should be large enough; \( n \geq 30 \) is a common rule of thumb.
- Finite Variance → The population should have finite variance.

---
###22. What is the concept of expected value in a probability distribution?
The expected value (EV) represents the long-run average outcome of a random variable.


Formula (for a discrete random variable):

\[ E(X) = \sum_{x} x \, P(X = x) \]
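
For a fair die, the weighted sum can be computed exactly and compared with a simulated long-run average. This is a small sketch; the seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact expected value: sum of each value times its probability.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected = sum(face * prob for face, prob in pmf.items())

# Long-run average from simulated rolls should be close to the exact value.
random.seed(42)
rolls = [random.randint(1, 6) for _ in range(100_000)]
simulated_mean = sum(rolls) / len(rolls)

print(f"Exact E(X):        {expected} = {float(expected)}")
print(f"Simulated average: {simulated_mean:.3f}")
```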

---
###23. How does a probability distribution relate to the expected outcome of a random variable?
The probability distribution defines the likelihood of different outcomes, and the expected value gives the weighted average outcome based on those probabilities.

---
---
#Practical Answers:-
---
---
###1. Write a Python program to generate a random variable and display its value.



In [None]:
import numpy as np

random_variable = np.random.rand()  # Generates a random value between 0 and 1
print("Random Variable:", random_variable)

###2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import randint

low, high, size = 1, 6, 1000  # Rolling a fair die (1 to 6)
data = randint.rvs(low, high + 1, size=size)

# PMF Plot
sns.histplot(data, bins=range(low, high + 2), discrete=True, stat="probability", edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Probability')
plt.title('PMF of Discrete Uniform Distribution')
plt.show()

###3. Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution.

In [None]:
from scipy.stats import bernoulli

def bernoulli_pdf(p, x):
    # The Bernoulli distribution is discrete, so its "PDF" is the PMF.
    return bernoulli.pmf(x, p)

p = 0.5  # Probability of success
x_values = [0, 1]

for x in x_values:
    print(f"P(X={x}) = {bernoulli_pdf(p, x)}")

###4. Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 10, 0.5
data = binom.rvs(n, p, size=1000)

plt.hist(data, bins=range(n+2), density=True, alpha=0.75, edgecolor="black")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.show()

###5. Create a Poisson distribution and visualize it using Python.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 4  # Average occurrences
data = poisson.rvs(lam, size=1000)

plt.hist(data, bins=range(min(data), max(data) + 2), density=True, alpha=0.7, edgecolor='black')
plt.xlabel("Occurrences")
plt.ylabel("Probability")
plt.title("Poisson Distribution (λ=4)")
plt.show()

###6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

low, high = 1, 6
x = np.arange(low, high + 1)
cdf_values = randint.cdf(x, low, high + 1)

plt.step(x, cdf_values, where='post', marker="o", color="red")
plt.xlabel("Values")
plt.ylabel("Cumulative Probability")
plt.title("CDF of Discrete Uniform Distribution")
plt.grid()
plt.show()

###7. Generate a continuous uniform distribution using NumPy and visualize it.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.uniform(0, 1, 1000)  # Uniform distribution between 0 and 1
plt.hist(data, bins=30, density=True, alpha=0.7, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Continuous Uniform Distribution")
plt.show()

###8. Simulate data from a normal distribution and plot its histogram.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 1  # Mean and standard deviation
data = np.random.normal(mu, sigma, 1000)

plt.hist(data, bins=30, density=True, alpha=0.7, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Histogram of Normally Distributed Data")
plt.show()

###9.  Write a Python function to calculate Z-scores from a dataset and plot them.

In [None]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

data = np.random.normal(50, 10, 100)  # Sample dataset
z_scores = stats.zscore(data)

plt.hist(z_scores, bins=20, density=True, alpha=0.7, edgecolor="black")
plt.xlabel("Z-Score")
plt.ylabel("Density")
plt.title("Z-Scores of the Dataset")
plt.show()

print("Z-Scores:\n", z_scores)

###10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

sample_size = 30
num_samples = 1000

# Generating skewed distribution (exponential)
original_data = np.random.exponential(scale=2, size=10000)

# Applying CLT: Taking means of samples
sample_means = [np.mean(np.random.choice(original_data, sample_size)) for _ in range(num_samples)]

# Plotting original and sample mean distributions
plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.hist(original_data, bins=30, density=True, alpha=0.7, edgecolor="black")
plt.title("Original Skewed Distribution")

plt.subplot(1,2,2)
plt.hist(sample_means, bins=30, density=True, alpha=0.7, edgecolor="black")
plt.title("Sample Means (CLT Applied)")

plt.show()

###11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 50, 10  # Mean and Standard Deviation
sample_size = 30
num_samples = 1000

# Generate normal samples
sample_means = [np.mean(np.random.normal(mu, sigma, sample_size)) for _ in range(num_samples)]

plt.hist(sample_means, bins=30, density=True, alpha=0.7, edgecolor="black")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.title("Sample Means Distribution (CLT Verification)")
plt.show()

###12. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-4, 4, 1000)
pdf_values = norm.pdf(x, 0, 1)

plt.plot(x, pdf_values, label="Standard Normal Distribution")
plt.xlabel("Z-Score")
plt.ylabel("Probability Density")
plt.title("Standard Normal Distribution")
plt.legend()
plt.show()

###13. Generate random variables and calculate their corresponding probabilities using the binomial distribution.

In [None]:
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
x_values = np.arange(0, n + 1)
probabilities = binom.pmf(x_values, n, p)

for x, prob in zip(x_values, probabilities):
    print(f"P(X={x}) = {prob:.4f}")

###14. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution.

In [None]:
import numpy as np
from scipy.stats import norm

def calculate_z_score(x, mu, sigma):
    return (x - mu) / sigma

x_value = 60
mu, sigma = 50, 10

z_score = calculate_z_score(x_value, mu, sigma)
p_value = norm.cdf(z_score)  # P(X <= x_value): a cumulative probability, not a test p-value

print(f"Z-score: {z_score:.2f}, Probability: {p_value:.4f}")

###15. Implement hypothesis testing using Z-statistics for a sample dataset.

In [None]:
import numpy as np
from scipy.stats import norm

sample_mean = 52
population_mean = 50
sigma = 10
n = 30

z_score = (sample_mean - population_mean) / (sigma / np.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z_score)))

print(f"Z-score: {z_score:.2f}, P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

###16. Create a confidence interval for a dataset using Python and interpret the result.

In [None]:
import numpy as np
from scipy.stats import norm

data = np.random.normal(50, 10, 30)
mean = np.mean(data)
std_error = np.std(data, ddof=1) / np.sqrt(len(data))

z_critical = norm.ppf(0.975)
margin_of_error = z_critical * std_error
conf_interval = (mean - margin_of_error, mean + margin_of_error)

print(f"95% Confidence Interval: {conf_interval}")

###17. Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean.

In [None]:
import numpy as np
from scipy.stats import norm

data = np.random.normal(50, 10, 100)
mean, std_dev = np.mean(data), np.std(data, ddof=1)
std_error = std_dev / np.sqrt(len(data))

z_critical = norm.ppf(0.975)
margin_of_error = z_critical * std_error
conf_interval = (mean - margin_of_error, mean + margin_of_error)

print(f"Mean: {mean:.2f}, 95% Confidence Interval: {conf_interval}")

###18. Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma = 50, 10
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
pdf_values = norm.pdf(x, mu, sigma)

plt.plot(x, pdf_values, label="Normal PDF")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.title("Probability Density Function of Normal Distribution")
plt.legend()
plt.show()

###19. Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution.

In [None]:
import numpy as np
from scipy.stats import poisson

lambda_value = 4
x_values = np.arange(0, 10)
cdf_values = poisson.cdf(x_values, lambda_value)

for x, prob in zip(x_values, cdf_values):
    print(f"P(X ≤ {x}) = {prob:.4f}")

###20. Simulate a random variable using a continuous uniform distribution and calculate its expected value.

In [None]:
import numpy as np

data = np.random.uniform(0, 10, 1000)
expected_value = np.mean(data)  # sample mean estimates E[X] = (0 + 10) / 2 = 5

print(f"Expected Value: {expected_value:.2f}")

###21. Write a Python program to compare the standard deviations of two datasets and visualize the difference.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data1 = np.random.normal(50, 5, 1000)
data2 = np.random.normal(50, 15, 1000)

std1, std2 = np.std(data1), np.std(data2)

plt.hist(data1, bins=30, alpha=0.5, label=f"Std Dev = {std1:.2f}")
plt.hist(data2, bins=30, alpha=0.5, label=f"Std Dev = {std2:.2f}")
plt.legend()
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Comparison of Standard Deviations")
plt.show()

###22. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution.

In [None]:
import numpy as np

data = np.random.normal(50, 10, 100)

range_value = np.ptp(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"Range: {range_value:.2f}, IQR: {iqr:.2f}")

###23. Implement Z-score normalization on a dataset and visualize its transformation.

In [None]:
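import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

data = np.random.normal(50, 10, 100)
normalized_data = zscore(data)

# density=True so the y-axis matches the "Density" label
plt.hist(normalized_data, bins=30, density=True, alpha=0.7, edgecolor="black")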
plt.xlabel("Z-Score")
plt.ylabel("Density")
plt.title("Z-Score Normalized Data")
plt.show()

###24. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution.

In [None]:
import numpy as np
from scipy.stats import skew, kurtosis

data = np.random.normal(50, 10, 1000)
skewness = skew(data)
kurt = kurtosis(data)  # Fisher (excess) kurtosis by default: about 0 for normal data

print(f"Skewness: {skewness:.2f}, Kurtosis: {kurt:.2f}")