# STATISTICS ADVANCE PART 1

1. What is a random variable in probability theory?
-  A random variable is a numerical value determined by the outcome of a random experiment. It maps outcomes of a sample space to real numbers.
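The mapping from outcomes to numbers can be made concrete with a small sketch. It assumes a two-coin-toss experiment, and the helper name `num_heads` is hypothetical, chosen only for illustration.

```python
from itertools import product

# Sample space of two coin tosses: each outcome is a tuple such as ('H', 'T')
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """Random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Apply X to every outcome in the sample space
X = {outcome: num_heads(outcome) for outcome in sample_space}
for outcome, value in X.items():
    print("".join(outcome), "->", value)
```

Each of the four outcomes is sent to one real number (0, 1, or 2), which is what makes X a random variable rather than a single fixed value.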

2. What are the types of random variables?

- Discrete Random Variable – Takes countable values (e.g., number of heads in 5 coin tosses).

- Continuous Random Variable – Takes any value within a range (e.g., height, weight).

3. Difference between discrete and continuous distributions:

- Discrete: Probability is assigned to specific values; represented by a Probability Mass Function (PMF).

- Continuous: Probability is spread over intervals; represented by a Probability Density Function (PDF).

4. What are probability density functions (PDF)?
-  For a continuous variable, the PDF describes the relative likelihood of values near a given point. The probability of any single exact value is zero; the area under the curve over an interval gives the probability of falling in that interval.

5. How do CDFs differ from PDFs?

- PDF/PMF: Shows the likelihood density at each point (continuous, PDF) or the probability at each value (discrete, PMF).

- CDF: Shows the cumulative probability that the variable is less than or equal to a certain value.
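The link between the two functions can be checked numerically. The sketch below uses the standard library's `statistics.NormalDist` (an assumption made for self-containment; `scipy.stats.norm` exposes `pdf` and `cdf` as well) and a standard normal chosen only for illustration. It compares the area under the PDF over an interval with the difference of two CDF values.

```python
from statistics import NormalDist

# Standard normal distribution, chosen only for illustration
dist = NormalDist(mu=0.0, sigma=1.0)

a, b = -1.0, 1.0

# Probability from the CDF: P(a < X <= b) = F(b) - F(a)
prob_from_cdf = dist.cdf(b) - dist.cdf(a)

# Same probability as the area under the PDF (midpoint Riemann sum)
steps = 10_000
width = (b - a) / steps
prob_from_pdf = sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(f"F(b) - F(a):        {prob_from_cdf:.6f}")
print(f"Area under the PDF: {prob_from_pdf:.6f}")
```

The two numbers agree to several decimal places, which reflects that the CDF is the running integral of the PDF.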

6. What is a discrete uniform distribution?
-  A distribution where all possible discrete outcomes have equal probability.

7. Key properties of a Bernoulli distribution:

- Two possible outcomes (success/failure).

- Parameter $p$ = probability of success.

- Mean = $p$, Variance = $p(1-p)$.
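The mean and variance formulas can be recovered directly from the PMF. This is a minimal sketch; the function name `bernoulli_moments` and the value `p = 0.3` are assumptions for illustration.

```python
def bernoulli_moments(p):
    """Return (mean, variance) of a Bernoulli(p) variable, computed from its PMF."""
    pmf = {0: 1 - p, 1: p}
    # Mean: probability-weighted sum of the outcomes
    mean = sum(x * prob for x, prob in pmf.items())
    # Variance: probability-weighted squared deviation from the mean
    variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())
    return mean, variance

p = 0.3
mean, variance = bernoulli_moments(p)
print("Mean:", mean, "expected p =", p)
print("Variance:", variance, "expected p(1-p) =", p * (1 - p))
```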

8. Binomial distribution & usage:
-  Represents the number of successes in $n$ independent Bernoulli trials, each with success probability $p$. Used to model scenarios such as the number of defective items in a batch.
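For the defective-items scenario, the probabilities follow from the binomial PMF, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes a hypothetical batch of 20 items with a 5% defect rate; `binomial_pmf` is an illustrative helper, not a library function.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.05  # hypothetical batch of 20 items, 5% defect rate

# Probability of each possible defect count, 0 through n
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
p_at_most_one = probs[0] + probs[1]

print(f"P(no defects)       = {probs[0]:.4f}")
print(f"P(at most 1 defect) = {p_at_most_one:.4f}")
```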

9. Poisson distribution & application:
-  Models the number of events in a fixed interval given a constant average rate and independence of events. Example: Number of customer arrivals per hour.
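The customer-arrival example can be worked through with the Poisson PMF, P(X = k) = e^(-λ) λ^k / k!. The rate of 4 arrivals per hour and the helper `poisson_pmf` are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4  # hypothetical average of 4 arrivals per hour

p_zero = poisson_pmf(0, lam)
# P(X <= 2) is the sum of the PMF over k = 0, 1, 2
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

print(f"P(no arrivals in an hour)  = {p_zero:.4f}")
print(f"P(at most 2 arrivals)      = {p_at_most_two:.4f}")
```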

10. Continuous uniform distribution:
- A continuous variable equally likely to take any value between two bounds $a$ and $b$.

- The PDF is constant over $[a, b]$.

11. Characteristics of a normal distribution:

- Symmetrical, bell-shaped curve.

- Mean = Median = Mode.

- Defined by mean (μ) and standard deviation (σ).

- Total area under curve = 1.

12. Standard normal distribution & importance:
-  A normal distribution with μ = 0 and σ = 1. Used to standardize values (Z-scores) for comparison.

13. Central Limit Theorem (CLT) & importance:
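-  States that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has finite variance. This is what justifies normal-based inference about means.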

14. CLT relation to normal distribution:
-  Even if the original data isn't normal, the mean of sufficiently large samples approximately follows a normal distribution.

15. Application of Z statistics in hypothesis testing:
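-  A Z-statistic measures how many standard errors the sample mean lies from the hypothesized population mean. It applies when the population standard deviation is known (or the sample is large). The null hypothesis is rejected when the statistic falls beyond the critical value for the chosen significance level.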

16. Z-score calculation & meaning:
- Formula: $Z = \dfrac{X - \mu}{\sigma}$

    Represents how many standard deviations a value is from the mean.

17. Point estimates & interval estimates:

- Point estimate: Single value estimate of a population parameter.

- Interval estimate: Range of values (confidence interval) within which the parameter is likely to lie.

18. Significance of confidence intervals:
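-  They give a range of plausible values for the true population parameter. The confidence level (e.g., 95%) describes the procedure: across repeated samples, about 95% of the intervals built this way would contain the true value.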
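The meaning of a confidence level can be illustrated by simulation. The sketch below assumes a normal population with known σ, and a fixed seed so the run is repeatable. All parameter values are hypothetical. It builds many 95% intervals and counts how often they contain the true mean.

```python
import random
from math import sqrt
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

true_mu, sigma, n = 50.0, 5.0, 40  # hypothetical population and sample size
z_crit = 1.96                      # critical value for 95% confidence
trials = 2000

margin = z_crit * sigma / sqrt(n)  # same margin for every sample (sigma known)
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    m = mean(sample)
    # Count the interval if it contains the true mean
    if m - margin <= true_mu <= m + margin:
        covered += 1

coverage = covered / trials
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The observed fraction should land close to 0.95, though it varies slightly from seed to seed.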

19. Relationship between Z-score & confidence interval:
-  Z-scores determine the margin of error in a confidence interval calculation.
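As a sketch of that relationship, the margin of error is the critical Z-value times the standard error, z · σ/√n. The helper `z_confidence_interval` and its inputs are hypothetical. The critical value comes from the standard library's `NormalDist().inv_cdf` (`scipy.stats.norm.ppf` would serve the same purpose).

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, z_crit) for a mean with known population sigma."""
    alpha = 1 - confidence
    # Two-sided critical value: leaves alpha/2 in each tail
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, z_crit

lower, upper, z_crit = z_confidence_interval(sample_mean=105, sigma=15, n=50)
print(f"Critical Z-value: {z_crit:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level increases the critical value, which widens the interval; increasing `n` shrinks the standard error, which narrows it.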

20. Using Z-scores to compare distributions:
-  By standardizing, Z-scores allow comparison of values from different distributions.
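A short illustration, using made-up exam scores from two tests with different scales (all numbers are hypothetical):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical results on two differently scaled exams
z_math = z_score(82, mu=70, sigma=8)
z_physics = z_score(75, mu=60, sigma=12)

print("Math Z-score:", z_math)
print("Physics Z-score:", z_physics)
print("Stronger relative result:", "math" if z_math > z_physics else "physics")
```

The physics score is further above its mean in raw points (15 versus 12), yet the math score is the stronger result relative to its own distribution.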

21. Assumptions for CLT:

- Random sampling.

- Independent observations.

- Finite variance.

- Large enough sample size.

22. Expected value in a probability distribution:
-  The long-run average value of a random variable, calculated as $E(X) = \sum x\,P(x)$ (discrete) or $E(X) = \int x\,f(x)\,dx$ (continuous).

23. Probability distribution relation to expected outcome:
-  The expected value is the weighted average of all possible outcomes, with probabilities from the distribution as weights.
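The weighted-average view can be sketched for a fair six-sided die (an assumed example), using exact fractions so no rounding is involved:

```python
from fractions import Fraction

# PMF of a fair die: each face has probability 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: each outcome weighted by its probability
expected_value = sum(x * p for x, p in pmf.items())

print("E(X) =", expected_value, "=", float(expected_value))
```

The result, 7/2, is not itself a possible roll, which shows that the expected value is a long-run average rather than a typical outcome.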

Q1. Generate a random variable and display its value

In [None]:
import random

# Generate a random integer between 1 and 100
random_variable = random.randint(1, 100)
print("Random Variable Value:", random_variable)


Q2. Generate a discrete uniform distribution and plot PMF

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Discrete uniform distribution (like rolling a die)
values = np.arange(1, 7)
probabilities = np.ones(len(values)) / len(values)

# use_line_collection was deprecated in Matplotlib 3.6 and later removed
plt.stem(values, probabilities)
plt.xlabel("Value")
plt.ylabel("Probability")
plt.title("Discrete Uniform Distribution (PMF)")
plt.show()


Q3. Bernoulli distribution PMF function

In [None]:
def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli(p) variable; 0 outside {0, 1}."""
    if x not in (0, 1):
        return 0
    return p**x * (1 - p)**(1 - x)

p = 0.6
print("P(X=1):", bernoulli_pmf(1, p))
print("P(X=0):", bernoulli_pmf(0, p))


Q4. Simulate binomial distribution (n=10, p=0.5) and plot histogram

In [None]:
from scipy.stats import binom

n, p = 10, 0.5
data = binom.rvs(n, p, size=1000)

plt.hist(data, bins=np.arange(-0.5, n+1.5, 1), edgecolor='black')
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.title("Binomial Distribution Simulation")
plt.show()


Q5. Create and visualize Poisson distribution

In [None]:
from scipy.stats import poisson

mu = 3
x = np.arange(0, 10)
pmf = poisson.pmf(x, mu)

plt.stem(x, pmf)  # use_line_collection was deprecated in Matplotlib 3.6 and later removed
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.title("Poisson Distribution")
plt.show()


Q6. Calculate and plot CDF of a discrete uniform distribution

In [None]:
cdf = np.cumsum(probabilities)

plt.step(values, cdf, where='post')
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.title("CDF of Discrete Uniform Distribution")
plt.show()


Q7. Generate and visualize continuous uniform distribution

In [None]:
from scipy.stats import uniform

a, b = 0, 10
data_uniform = uniform.rvs(a, b-a, size=1000)

plt.hist(data_uniform, bins=20, edgecolor='black')
plt.title("Continuous Uniform Distribution")
plt.show()


Q8. Simulate data from a normal distribution and plot histogram

In [None]:
mu, sigma = 0, 1
data_normal = np.random.normal(mu, sigma, 1000)

plt.hist(data_normal, bins=30, edgecolor='black')
plt.title("Normal Distribution")
plt.show()


Q9. Calculate Z-scores from a dataset and plot them

In [None]:
from scipy.stats import zscore

data = np.random.randint(50, 100, size=10)
z_scores = zscore(data)

plt.plot(data, z_scores, marker='o')
plt.xlabel("Data")
plt.ylabel("Z-score")
plt.title("Z-scores of Data")
plt.show()


Q10. Implement CLT for a non-normal distribution

In [None]:
samples = 1000
sample_size = 30
means = []

for _ in range(samples):
    sample = np.random.exponential(scale=2, size=sample_size)
    means.append(np.mean(sample))

plt.hist(means, bins=30, edgecolor='black')
plt.title("Central Limit Theorem Simulation")
plt.show()
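Beyond the bell shape of the histogram, the CLT also predicts the spread of the sample means: their standard deviation should be close to σ/√n. The sketch below checks this numerically for the same exponential setup, using only the standard library and a fixed seed. The seed and the number of samples are assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

scale = 2.0        # mean (and standard deviation) of the exponential population
sample_size = 30
num_samples = 2000

# expovariate takes the rate, which is 1 / scale
sample_means = [
    mean(random.expovariate(1 / scale) for _ in range(sample_size))
    for _ in range(num_samples)
]

observed_se = stdev(sample_means)
theoretical_se = scale / sqrt(sample_size)

print(f"Mean of sample means: {mean(sample_means):.3f} (population mean {scale})")
print(f"Observed std of means: {observed_se:.3f}")
print(f"Theoretical sigma/sqrt(n): {theoretical_se:.3f}")
```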


Q11. Simulate multiple samples from a normal distribution and verify CLT

In [None]:
import numpy as np
import matplotlib.pyplot as plt

sample_means = []
for _ in range(1000):
    sample = np.random.normal(loc=50, scale=10, size=30)
    sample_means.append(np.mean(sample))

plt.hist(sample_means, bins=30, edgecolor='black')
plt.title("CLT Verification from Normal Distribution")
plt.show()


Q12. Plot the standard normal distribution (mean=0, std=1)

In [None]:
from scipy.stats import norm

x = np.linspace(-4, 4, 1000)
plt.plot(x, norm.pdf(x, 0, 1))
plt.title("Standard Normal Distribution")
plt.xlabel("Z")
plt.ylabel("Probability Density")
plt.show()


Q13. Generate random variables and calculate probabilities using the binomial distribution

In [None]:
from scipy.stats import binom

n, p = 10, 0.5
k = np.arange(0, n+1)
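binom_probs = binom.pmf(k, n, p)

# Draw a few random variates, then report the exact probability of each outcome
random_draws = binom.rvs(n, p, size=5)
print("Random Draws:", random_draws)
print("Binomial Probabilities:", binom_probs)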


Q14. Calculate the Z-score for a given data point

In [None]:
value = 75
mu = 70
sigma = 5

z_score = (value - mu) / sigma
print("Z-score:", z_score)


Q15. Hypothesis testing using Z-statistics

In [None]:
import numpy as np

# Example values
pop_mean = 100
sigma = 15
sample_mean = 105
n = 50

z_stat = (sample_mean - pop_mean) / (sigma / np.sqrt(n))
print("Z-statistic:", z_stat)
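The Z-statistic becomes a decision only once it is compared against a threshold. A minimal sketch of that last step follows, reusing the same example values and assuming a two-sided test at α = 0.05. The significance level and the choice of a two-sided alternative are assumptions, not stated in the exercise.

```python
from math import sqrt
from statistics import NormalDist

# Same example values as the cell above
pop_mean, sigma, sample_mean, n = 100, 15, 105, 50
alpha = 0.05  # assumed significance level

z_stat = (sample_mean - pop_mean) / (sigma / sqrt(n))

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_null = p_value < alpha

print(f"Z-statistic: {z_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```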


Q16. Create a confidence interval for a dataset

In [None]:
data = np.random.normal(50, 5, 100)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))
z_value = 1.96  # 95% confidence level

ci_lower = mean - z_value * std_err
ci_upper = mean + z_value * std_err

print(f"95% Confidence Interval: ({ci_lower}, {ci_upper})")


Q17. Generate normal distribution data and calculate confidence interval for its mean

In [None]:
data = np.random.normal(100, 15, 200)
mean = np.mean(data)
std_err = np.std(data, ddof=1) / np.sqrt(len(data))
z_value = 1.96

ci_lower = mean - z_value * std_err
ci_upper = mean + z_value * std_err

print(f"Mean: {mean}")
print(f"95% Confidence Interval: ({ci_lower}, {ci_upper})")


Q18. Calculate and visualize the PDF of a normal distribution

In [None]:
x = np.linspace(-4, 4, 1000)
plt.plot(x, norm.pdf(x, 0, 1))
plt.title("PDF of Normal Distribution")
plt.xlabel("X")
plt.ylabel("Probability Density")
plt.show()


Q19. Calculate and interpret the CDF of a Poisson distribution

In [None]:
from scipy.stats import poisson

x = np.arange(0, 10)
cdf = poisson.cdf(x, mu=3)

plt.step(x, cdf, where='post')
plt.title("CDF of Poisson Distribution")
plt.xlabel("X")
plt.ylabel("Cumulative Probability")
plt.show()


Q20. Simulate a continuous uniform distribution and calculate its expected value

In [None]:
from scipy.stats import uniform

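a, b = 0, 10
data_uniform = uniform.rvs(loc=a, scale=b - a, size=1000)

# The sample mean estimates the expected value; the exact value is (a + b) / 2
sample_mean = np.mean(data_uniform)
print("Estimated Expected Value (sample mean):", sample_mean)
print("Theoretical Expected Value:", (a + b) / 2)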


Q21. Compare the standard deviations of two datasets and visualize

In [None]:
data1 = np.random.normal(50, 5, 100)
data2 = np.random.normal(50, 10, 100)

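# ddof=1 gives the sample standard deviation (np.std defaults to the population formula)
std1 = np.std(data1, ddof=1)
std2 = np.std(data2, ddof=1)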

print("Std Dev Dataset 1:", std1)
print("Std Dev Dataset 2:", std2)

plt.bar(['Dataset 1', 'Dataset 2'], [std1, std2], color=['blue', 'orange'])
plt.title("Standard Deviation Comparison")
plt.show()


Q22. Calculate the range and interquartile range (IQR) of a dataset

In [None]:
data = np.random.normal(50, 5, 1000)

data_range = np.ptp(data)
q75, q25 = np.percentile(data, [75, 25])
iqr = q75 - q25

print("Range:", data_range)
print("IQR:", iqr)


Q23. Calculate skewness and kurtosis of a dataset

In [None]:
from scipy.stats import skew, kurtosis

data = np.random.normal(0, 1, 1000)

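print("Skewness:", skew(data))
# kurtosis() returns excess (Fisher) kurtosis by default, so normal data gives a value near 0
print("Excess Kurtosis:", kurtosis(data))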
