
### 1. **What is a random variable in probability theory?**
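   A random variable is a function that assigns a numerical value to each possible outcome of a random process or experiment. Its value is uncertain before the experiment is run, and how likely each value is gets described by a probability distribution.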
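The "function" view can be made concrete with a small sketch. It assumes a hypothetical experiment of two fair coin tosses and defines a random variable `X` that counts the heads. The names `outcomes`, `X`, and `pmf` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space of a hypothetical experiment: two fair coin tosses
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

# The random variable X maps each outcome to a number: the count of heads
def X(outcome):
    return outcome.count("H")

# Every outcome is equally likely, so P(X = k) adds 1/4 for each outcome with k heads
pmf = {}
for o in outcomes:
    pmf[X(o)] = pmf.get(X(o), Fraction(0)) + Fraction(1, len(outcomes))

print(pmf)  # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1.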

---

### 2. **What are the types of random variables?**
   - **Discrete Random Variables**: Take specific, countable values (e.g., 1, 2, 3).
   - **Continuous Random Variables**: Take any value within a given range (e.g., 1.5, 2.7, π).

---

### 3. **What is the difference between discrete and continuous distributions?**
   - **Discrete distributions** represent probabilities of discrete random variables (e.g., Binomial, Poisson).
   - **Continuous distributions** represent probabilities over continuous ranges, described by probability density functions (e.g., Normal, Uniform).

---

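### 4. **What are probability density functions (PDF)?**
   A PDF describes how probability is spread over the values of a continuous random variable. The density \( f(x) \) is not itself a probability. The probability that the variable falls in an interval is the area under the curve, \( P(a \le X \le b) = \int_a^b f(x)\,dx \), and the probability of any single exact value is 0. For discrete random variables, the analogous function is the probability mass function (PMF), which gives \( P(X = x) \) directly.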

---

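### 5. **How do cumulative distribution functions (CDF) differ from probability density functions (PDF)?**
   - **PMF/PDF**: For a discrete variable, the PMF gives the probability of each specific value, \( P(X = x) \). For a continuous variable, the PDF gives a density, and integrating it over an interval yields the probability of falling in that interval.
   - **CDF**: Gives the cumulative probability that the variable is less than or equal to a given value, \( F(x) = P(X \le x) \). You get it by summing the PMF, or integrating the PDF, up to \( x \).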
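For a discrete variable, the CDF is the running total of the PMF. The sketch below shows this for a fair six-sided die, which is chosen here only as an illustrative example.

```python
from fractions import Fraction
from itertools import accumulate

# A fair six-sided die: X takes the values 1..6
values = list(range(1, 7))

# PMF: P(X = x) = 1/6 for each face
pmf = [Fraction(1, 6) for _ in values]

# CDF: F(x) = P(X <= x), the cumulative sum of the PMF
cdf = list(accumulate(pmf))

for x, p, F in zip(values, pmf, cdf):
    print(f"x={x}  P(X=x)={p}  P(X<=x)={F}")
```

The CDF never decreases, and it reaches 1 at the largest value.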

---

### 6. **What is a discrete uniform distribution?**
   A discrete uniform distribution assigns equal probabilities to all possible outcomes in a finite set (e.g., rolling a fair die).

---

### 7. **What are the key properties of a Bernoulli distribution?**
   - Only two outcomes: success (1) or failure (0).
   - Probability of success: \( P(X=1) = p \), and failure: \( P(X=0) = 1-p \).
   - Mean: \( p \), Variance: \( p(1-p) \).
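You can check the mean and variance formulas straight from the definitions \( E(X) = \sum x \cdot P(x) \) and \( \mathrm{Var}(X) = E(X^2) - E(X)^2 \). Below is a minimal sketch; the value `p = 0.3` is an arbitrary illustration.

```python
# Bernoulli(p): X = 1 with probability p, X = 0 with probability 1 - p
p = 0.3  # illustrative value
pmf = {0: 1 - p, 1: p}

# E[X] = sum of x * P(X = x)
mean = sum(x * prob for x, prob in pmf.items())

# Var(X) = E[X^2] - (E[X])^2
second_moment = sum(x**2 * prob for x, prob in pmf.items())
variance = second_moment - mean**2

print(mean, variance)  # compare with p and p * (1 - p), up to floating-point rounding
```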

---

### 8. **What is the binomial distribution, and how is it used in probability?**
   The binomial distribution models the number of successes in \( n \) independent trials with probability \( p \) of success in each trial.  
   - Mean: \( np \), Variance: \( np(1-p) \).
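The binomial PMF is \( P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \). The sketch below evaluates it with the standard library. It then recovers the mean and variance numerically, using the same \( n = 10, p = 0.5 \) as the practical examples. The helper `binomial_pmf` is a hypothetical name introduced for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the PMF itself
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"P(X = 5) = {pmf[5]:.4f}")  # C(10, 5) / 2**10 = 252 / 1024
print(f"mean = {mean:.2f}, variance = {variance:.2f}")  # np = 5, np(1 - p) = 2.5
```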

---

### 9. **What is the Poisson distribution and where is it applied?**
   The Poisson distribution models the number of events occurring in a fixed interval of time or space, assuming a constant average rate and independence of events.  
   Example: the number of calls a call center receives per hour.
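With average rate \( \lambda \), the Poisson PMF is \( P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \). This sketch assumes a hypothetical call center that averages 3 calls per hour. It cuts the infinite support off at \( k = 49 \), where the remaining tail is negligible for \( \lambda = 3 \).

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam = 3  # hypothetical average: 3 calls per hour
pmf = [poisson_pmf(k, lam) for k in range(50)]  # truncated support 0..49

print(f"P(exactly 2 calls) = {pmf[2]:.4f}")
print(f"P(at most 4 calls) = {sum(pmf[:5]):.4f}")
print(f"mean ~ {sum(k * pk for k, pk in enumerate(pmf)):.4f}")  # a Poisson mean equals lam
```

`P(at most 4 calls)` is the same quantity that `poisson.cdf(4, 3)` returns in the practical section.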

---

### 10. **What is a continuous uniform distribution?**
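   A continuous uniform distribution on an interval \([a, b]\) has a constant density \( f(x) = \frac{1}{b-a} \) for \( a \le x \le b \) (and 0 elsewhere). As a result, any two subintervals of the same length are equally likely. Its mean is \( \frac{a+b}{2} \) and its variance is \( \frac{(b-a)^2}{12} \).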
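Because the density is flat, an interval's probability is just its length divided by \( b - a \). The following sketch assumes the illustrative interval \([10, 20]\); the helper names are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): constant 1 / (b - a) inside [a, b], zero outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d): length of the overlap of [c, d] with [a, b], divided by (b - a)."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0) / (b - a)

a, b = 10, 20  # illustrative interval
print(uniform_pdf(12, a, b))           # 0.1
print(uniform_prob(12, 15, a, b))      # 0.3
print((a + b) / 2, (b - a) ** 2 / 12)  # mean and variance
```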

---

### 11. **What are the characteristics of a normal distribution?**
   - Symmetrical and bell-shaped.
   - Defined by mean (\( \mu \)) and standard deviation (\( \sigma \)).
   - Total area under the curve equals 1.
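For any normal distribution, the share of probability within \( k \) standard deviations of the mean equals \( \operatorname{erf}(k/\sqrt{2}) \), and this holds for every \( \mu \) and \( \sigma \). The short sketch below uses that identity to reproduce the familiar 68-95-99.7 rule.

```python
from math import erf, sqrt

# For X ~ N(mu, sigma^2): P(|X - mu| < k * sigma) = erf(k / sqrt(2)),
# independent of mu and sigma.
for k in (1, 2, 3):
    print(f"within {k} sigma: {erf(k / sqrt(2)):.4f}")  # 0.6827, 0.9545, 0.9973
```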

---

### 12. **What is the standard normal distribution, and why is it important?**
   - A standard normal distribution is a normal distribution with mean = 0 and standard deviation = 1.
   - It allows for standardizing values (Z-scores) and is widely used in hypothesis testing and confidence intervals.

---

### 13. **What is the Central Limit Theorem (CLT), and why is it critical in statistics?**
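   - The CLT states that, for independent samples from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, whatever the shape of the population's distribution.
   - It is critical because it justifies normal-based confidence intervals and hypothesis tests for means, even when the population itself is not normal.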
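This sketch illustrates two consequences of the CLT for a strongly skewed population. It assumes an exponential population with scale 2, a fixed seed, and 10,000 simulated samples per size; all of these values are arbitrary. As \( n \) grows, the standard deviation of the sample means follows \( \sigma/\sqrt{n} \), and their skewness moves toward 0, the skewness of a normal distribution.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=0)  # fixed seed so reruns match
scale = 2.0  # exponential population: mean = sd = 2, skewness = 2

for n in (5, 30, 200):
    # 10,000 independent samples of size n, one per row; average each row
    means = rng.exponential(scale=scale, size=(10_000, n)).mean(axis=1)
    print(f"n={n:>3}  sd of means={means.std():.3f}  "
          f"sigma/sqrt(n)={scale / np.sqrt(n):.3f}  skewness={skew(means):.3f}")
```

For the mean of \( n \) exponential draws the theoretical skewness is \( 2/\sqrt{n} \), so the printed values should fall roughly from 0.9 to 0.14.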

---

### 14. **How does the Central Limit Theorem relate to the normal distribution?**
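   The CLT explains why the normal distribution appears so often. The average (or sum) of many independent, identically distributed variables with finite variance is approximately normally distributed, and the approximation improves as the sample size grows. The sample mean \( \bar{X} \) has mean \( \mu \) and standard deviation \( \sigma/\sqrt{n} \).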

---

### 15. **What is the application of Z statistics in hypothesis testing?**
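   Z statistics are used to test a sample mean against a hypothesized population mean, or to compare two sample means. They apply when the population standard deviation is known, or when the sample is large enough for the CLT to apply. The statistic \( Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \) is compared with the standard normal distribution to obtain a p-value.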

---

### 16. **How do you calculate a Z-score, and what does it represent?**
   \( Z = \frac{(X - \mu)}{\sigma} \), where:
   - \( X \) = observed value
   - \( \mu \) = mean
   - \( \sigma \) = standard deviation

   It represents how many standard deviations a data point lies from the mean. Positive values are above the mean, and negative values are below it.
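Here is a worked example of the formula. The mean of 70 and standard deviation of 10 are illustrative numbers only, and `z_score` is a hypothetical helper.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Illustrative numbers: mean 70, standard deviation 10
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```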

---

### 17. **What are point estimates and interval estimates in statistics?**
   - **Point Estimate**: A single value (e.g., the sample mean or sample proportion) used to estimate a population parameter.
   - **Interval Estimate**: A range of values (confidence interval) likely containing the parameter.

---

### 18. **What is the significance of confidence intervals in statistical analysis?**
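   A confidence interval gives a range of plausible values for a population parameter, together with a confidence level (e.g., 95%). The level is a property of the procedure: if sampling were repeated many times, about 95% of the intervals built this way would contain the true parameter. Wider intervals signal more uncertainty about the estimate.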
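The "repeated sampling" meaning of the confidence level can be checked by simulation. The sketch below builds many 95% intervals from fresh samples and counts how many contain the true mean. The population parameters, sample size, trial count, and seed are illustrative assumptions, and \( \sigma \) is treated as known.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
mu, sigma, n, trials = 50, 10, 50, 10_000  # illustrative values
z_crit = norm.ppf(0.975)  # two-sided 95% critical value, about 1.96

# One row per repeated study: draw a sample and compute its mean
samples = rng.normal(loc=mu, scale=sigma, size=(trials, n))
means = samples.mean(axis=1)
half_width = z_crit * sigma / np.sqrt(n)  # sigma treated as known

# Fraction of intervals [mean - h, mean + h] that contain the true mu
covered = (means - half_width <= mu) & (mu <= means + half_width)
print(f"coverage: {covered.mean():.3f}")  # should be close to 0.95
```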

---

### 19. **What is the relationship between a Z-score and a confidence interval?**
   Z-scores determine the critical values for confidence intervals, based on the standard normal distribution.
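A two-sided interval at level \( 1 - \alpha \) uses the critical value \( z^* \) that leaves \( \alpha/2 \) in each tail, giving the interval \( \bar{X} \pm z^* \sigma/\sqrt{n} \). This sketch computes \( z^* \) for common levels with the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
for level in (0.90, 0.95, 0.99):
    # Two-sided interval: put (1 - level) / 2 of the probability in each tail
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%} confidence: z* = {z_crit:.3f}")  # 1.645, 1.960, 2.576
```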

---

### 20. **How are Z-scores used to compare different distributions?**
   Z-scores standardize data from different distributions, allowing direct comparison by expressing values in terms of standard deviations from their respective means.
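Here is a small illustration with two hypothetical exams on different scales; all numbers are made up for the example. The exam with the lower raw score turns out to be the stronger result once both scores are standardized.

```python
# Hypothetical scores from two exams with different means and spreads
exam_a = {"score": 85, "mean": 70, "sd": 10}
exam_b = {"score": 80, "mean": 60, "sd": 8}

z_a = (exam_a["score"] - exam_a["mean"]) / exam_a["sd"]  # 1.5
z_b = (exam_b["score"] - exam_b["mean"]) / exam_b["sd"]  # 2.5

better = "A" if z_a > z_b else "B"
print(f"z_A = {z_a}, z_B = {z_b}; exam {better} is the stronger relative result")
```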

---

### 21. **What are the assumptions for applying the Central Limit Theorem?**
   - Independent and identically distributed random variables.
   - Finite variance and mean.
   - Large enough sample size for approximation to normality.

---

### 22. **What is the concept of expected value in a probability distribution?**
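   The expected value is the long-run average of a random variable: the value that the average of the outcomes approaches as the experiment is repeated more and more times.  
   \( E(X) = \sum x \cdot P(X = x) \) for discrete variables and \( E(X) = \int x \, f(x)\,dx \) for continuous variables.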
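The sketch below computes the expected value of a fair die exactly from the discrete formula. It then checks the "long-run average" reading with simulated rolls; the seed and the 100,000 rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact: E(X) = sum of x * P(X = x) for a fair die
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expected = sum(x * p for x, p in die_pmf.items())
print(expected)  # 7/2, i.e. 3.5

# Simulated: the average of many rolls approaches E(X)
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```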

---

### 23. **How does a probability distribution relate to the expected outcome of a random variable?**
   The expected value is calculated using the probability distribution, showing the long-term average of the random variable.

---

## Practical Questions


In [None]:
import random

# A single draw of a continuous random variable, uniformly distributed on [0, 1)
random_variable = random.random()
print("Random Variable:", random_variable)


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters for discrete uniform distribution
low, high, size = 1, 6, 1000  # Simulating a fair die

# Generate random variables
data = np.random.randint(low, high + 1, size)

# PMF calculation
values, counts = np.unique(data, return_counts=True)
pmf = counts / size

# Plotting the PMF
plt.bar(values, pmf, color='skyblue')
plt.title("PMF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.show()


In [None]:
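def bernoulli_pmf(p, x):
    """Probability mass function of a Bernoulli(p) variable: P(X = x)."""
    if x not in (0, 1):
        return 0  # a Bernoulli variable only takes the values 0 and 1
    return p if x == 1 else 1 - p

# Test the Bernoulli PMF
p = 0.7  # Probability of success
x = 1    # Outcome
print("PMF of Bernoulli Distribution:", bernoulli_pmf(p, x))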


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Parameters for Binomial distribution
n, p = 10, 0.5
data = binom.rvs(n, p, size=1000)

# One bin per integer outcome 0..n, centred on the integer
bins = np.arange(-0.5, n + 1.5, 1)

# Plot the histogram
plt.hist(data, bins=bins, density=True, color='orange', alpha=0.7)
plt.title("Binomial Distribution Histogram")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.show()


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

# Parameters for Poisson distribution
mu = 3
data = poisson.rvs(mu, size=1000)

# One bin per observed integer count, centred on the integer
bins = np.arange(data.min() - 0.5, data.max() + 1.5, 1)

# Plotting the histogram
plt.hist(data, bins=bins, density=True, alpha=0.7, color='green')
plt.title("Poisson Distribution")
plt.xlabel("Number of Events")
plt.ylabel("Probability")
plt.show()


In [None]:
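import numpy as np
import matplotlib.pyplot as plt

# CDF of discrete uniform distribution: simulate rolls of a fair die (values 1..6)
size = 1000
rolls = np.random.randint(1, 7, size)

# Empirical CDF: proportion of rolls less than or equal to each face value
values, counts = np.unique(rolls, return_counts=True)
cdf = np.cumsum(counts) / size

# Plot CDF
plt.step(values, cdf, color='purple', where='post')
plt.title("CDF of Discrete Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Cumulative Probability")
plt.show()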


In [None]:
from scipy.stats import uniform

# Continuous uniform distribution
data = uniform.rvs(size=1000)
plt.hist(data, bins=20, density=True, color='blue', alpha=0.7)
plt.title("Continuous Uniform Distribution")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()


In [None]:
data = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(data, bins=30, density=True, color='pink', alpha=0.7)
plt.title("Normal Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()


In [None]:
from scipy.stats import zscore

data = np.random.normal(loc=50, scale=10, size=1000)
z_scores = zscore(data)

# Plotting
plt.hist(z_scores, bins=30, color='red', alpha=0.7)
plt.title("Z-scores of the Dataset")
plt.xlabel("Z-score")
plt.ylabel("Frequency")
plt.show()


In [None]:
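import numpy as np
import matplotlib.pyplot as plt

# Non-normal (right-skewed) population: exponential with scale 2
sample_size, n_samples = 30, 1000

# Draw 1000 independent samples of size 30 from the population and average each one
sample_means = np.random.exponential(scale=2, size=(n_samples, sample_size)).mean(axis=1)

# Plot sample means: roughly bell-shaped despite the skewed population
plt.hist(sample_means, bins=30, color='cyan', alpha=0.7)
plt.title("Central Limit Theorem Demonstration")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.show()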


In [None]:
import numpy as np
from scipy.stats import norm

# Generate data from a normal distribution
data = np.random.normal(loc=50, scale=10, size=1000)

# Calculate the mean and standard error
mean = np.mean(data)
std_error = np.std(data, ddof=1) / np.sqrt(len(data))

# Confidence interval (95%)
confidence_level = 0.95
z_value = norm.ppf((1 + confidence_level) / 2)
margin_of_error = z_value * std_error
ci_lower, ci_upper = mean - margin_of_error, mean + margin_of_error

print(f"Mean: {mean:.2f}")
print(f"Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Standard normal distribution parameters
x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, loc=0, scale=1)

# Plotting the standard normal distribution
plt.plot(x, pdf, color='blue')
plt.title("Standard Normal Distribution (Mean=0, Std=1)")
plt.xlabel("Z-score")
plt.ylabel("Probability Density")
plt.show()


In [None]:
from scipy.stats import binom

# Parameters
n, p = 10, 0.5

# Generate random variables and calculate probabilities
data = binom.rvs(n, p, size=1000)
probabilities = binom.pmf(range(n + 1), n, p)

# Output
print("Random Variables:", data[:10])
print("Probabilities:", probabilities)


In [None]:
import numpy as np
from scipy.stats import zscore

# Generate a dataset
data = np.random.normal(loc=50, scale=10, size=100)

# Calculate Z-score
z_scores = zscore(data)

# Compare a given data point
data_point = 55
z_score_point = (data_point - np.mean(data)) / np.std(data)

print(f"Z-scores of dataset: {z_scores[:5]}")
print(f"Z-score of data point {data_point}: {z_score_point:.2f}")


In [None]:
import numpy as np
from scipy.stats import norm

# Hypothesis testing using Z-statistics
sample_data = np.random.normal(loc=60, scale=10, size=30)
sample_mean = np.mean(sample_data)
population_mean = 55
population_std = 10
n = len(sample_data)

# Z-test
z_stat = (sample_mean - population_mean) / (population_std / np.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z_stat)))

print(f"Z-statistic: {z_stat:.2f}")
print(f"P-value: {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")


In [None]:
import numpy as np
from scipy.stats import norm

# Dataset
data = np.random.normal(loc=70, scale=15, size=100)

# Confidence interval calculation
mean = np.mean(data)
std_error = np.std(data, ddof=1) / np.sqrt(len(data))

# 95% confidence interval
z_value = norm.ppf(0.975)
margin_of_error = z_value * std_error
ci_lower, ci_upper = mean - margin_of_error, mean + margin_of_error

print(f"Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")


In [None]:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Normal distribution
mean, std_dev = 0, 1
x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, loc=mean, scale=std_dev)

# Plot PDF
plt.plot(x, pdf, color='blue')
plt.title("Probability Density Function of Normal Distribution")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.show()


In [None]:
from scipy.stats import poisson

# Parameters
mu = 3
x = 4

# CDF of Poisson distribution
cdf = poisson.cdf(x, mu)
print(f"CDF for x = {x} in Poisson distribution with mu = {mu}: {cdf:.4f}")


In [None]:
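import numpy as np

# Continuous uniform distribution on [0, 1)
data = np.random.uniform(0, 1, 1000)

# The sample mean estimates the expected value; the exact value is (0 + 1) / 2 = 0.5
expected_value = np.mean(data)
print(f"Estimated Expected Value: {expected_value:.2f} (theoretical: 0.50)")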


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate two datasets
data1 = np.random.normal(50, 10, 100)
data2 = np.random.normal(70, 15, 100)

# Calculate standard deviations
std1, std2 = np.std(data1, ddof=1), np.std(data2, ddof=1)

# Plot histograms
plt.hist(data1, alpha=0.5, label=f"Data1 (std={std1:.2f})")
plt.hist(data2, alpha=0.5, label=f"Data2 (std={std2:.2f})")
plt.legend()
plt.title("Comparison of Standard Deviations")
plt.show()


In [None]:
import numpy as np
from scipy.stats import norm

# Generate data
data = np.random.normal(loc=100, scale=15, size=500)

# Calculate mean and standard error
mean = np.mean(data)
std_error = np.std(data, ddof=1) / np.sqrt(len(data))

# Confidence interval (95%)
confidence_level = 0.95
z_value = norm.ppf((1 + confidence_level) / 2)
margin_of_error = z_value * std_error
ci_lower, ci_upper = mean - margin_of_error, mean + margin_of_error

print(f"Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")


In [None]:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Normal distribution parameters
mean, std_dev = 0, 1
x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, loc=mean, scale=std_dev)

# Plot PDF
plt.plot(x, pdf, color='blue')
plt.title("Probability Density Function of Normal Distribution")
plt.xlabel("X-values")
plt.ylabel("Probability Density")
plt.show()


In [None]:
from scipy.stats import poisson

# Parameters
mu = 4
x = 6

# Calculate the CDF
cdf = poisson.cdf(x, mu)
print(f"CDF for x = {x} in Poisson distribution with mean (mu) = {mu}: {cdf:.4f}")


In [None]:
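import numpy as np

# Simulate random variable: uniform on [10, 20)
data = np.random.uniform(low=10, high=20, size=1000)

# The sample mean estimates the expected value; the exact value is (10 + 20) / 2 = 15
expected_value = np.mean(data)
print(f"Estimated Expected Value: {expected_value:.2f} (theoretical: 15.00)")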


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate two datasets
data1 = np.random.normal(loc=50, scale=10, size=500)
data2 = np.random.normal(loc=70, scale=15, size=500)

# Calculate standard deviations
std1 = np.std(data1, ddof=1)
std2 = np.std(data2, ddof=1)

# Plot histograms
plt.hist(data1, bins=20, alpha=0.5, label=f"Data1 (Std: {std1:.2f})", color='blue')
plt.hist(data2, bins=20, alpha=0.5, label=f"Data2 (Std: {std2:.2f})", color='green')
plt.title("Comparison of Standard Deviations")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.legend()
plt.show()


In [None]:
import numpy as np

# Generate dataset
data = np.random.normal(loc=60, scale=12, size=500)

# Calculate range and IQR
data_range = np.max(data) - np.min(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"Range: {data_range:.2f}")
print(f"Interquartile Range (IQR): {iqr:.2f}")


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

# Generate dataset
data = np.random.normal(loc=70, scale=15, size=500)

# Z-score normalization
normalized_data = zscore(data)

# Plot original and normalized datasets
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.hist(data, bins=20, color='blue', alpha=0.7, label='Original Data')
plt.title("Original Data")
plt.legend()

plt.subplot(1, 2, 2)
plt.hist(normalized_data, bins=20, color='green', alpha=0.7, label='Z-score Normalized Data')
plt.title("Normalized Data")
plt.legend()

plt.tight_layout()
plt.show()


In [None]:
import numpy as np
from scipy.stats import skew, kurtosis

# Generate dataset
data = np.random.normal(loc=50, scale=10, size=500)

# Calculate skewness and kurtosis
data_skewness = skew(data)
data_kurtosis = kurtosis(data)  # SciPy returns excess (Fisher) kurtosis by default: 0 for a normal distribution

print(f"Skewness: {data_skewness:.2f}")
print(f"Excess Kurtosis: {data_kurtosis:.2f}")
