1. What is a random variable in probability theory?
  - In probability theory, a random variable is a function that maps outcomes of a random experiment to numerical values, allowing us to quantify uncertainty and analyze probabilistic events mathematically. It serves as a bridge between abstract sample spaces and real-world numerical analysis. Random variables are typically classified as either discrete, taking on countable values (like the number of heads in coin tosses), or continuous, assuming values within a range (such as the time it takes for a webpage to load). By assigning numbers to outcomes, random variables enable the construction of probability distributions, which are essential for calculating expectations, variances, and making predictions in statistics and data science.
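As a minimal sketch of the "function from outcomes to numbers" idea, the snippet below enumerates the sample space of two fair coin tosses and maps each outcome to its number of heads. The experiment and the names `sample_space`, `number_of_heads`, and `pmf` are illustrative assumptions, not part of the original question.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of the experiment: every ordered outcome of two coin tosses.
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: maps an outcome to a number."""
    return outcome.count("H")

# Outcomes are equally likely, so P(X = x) is the share of outcomes mapped to x.
counts = Counter(number_of_heads(outcome) for outcome in sample_space)
pmf = {x: Fraction(count, len(sample_space)) for x, count in sorted(counts.items())}

for x, probability in pmf.items():
    print(f"P(X = {x}) = {probability}")
```

The resulting `pmf` dictionary is the probability distribution of X; its values sum to one.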
2. What are the types of random variables?
  - Random variables in probability theory are classified into two main types: discrete and continuous. A discrete random variable takes on a countable number of distinct values, which may be finite (rolling a die, counting the number of heads in a series of coin tosses) or countably infinite (a Poisson-distributed count). These variables are described using a probability mass function, which assigns a probability to each possible value. On the other hand, a continuous random variable can assume any value within an interval of real numbers, typically representing measurements like height, temperature, or time. Because the probability of any single exact value is zero, continuous random variables use a probability density function to describe the likelihood of outcomes within intervals. In some cases, a random variable may exhibit both discrete and continuous characteristics, known as a mixed random variable, though such cases are less common and more complex to analyze.
3. Explain the difference between discrete and continuous distributions.
  - The key difference between discrete and continuous distributions lies in the type of values their associated random variables can take and how probabilities are assigned to those values. A discrete distribution describes the probability of outcomes for a discrete random variable, which can only take on a countable number of distinct values. Examples include the binomial distribution, which models the number of successes in a fixed number of independent trials, and the Poisson distribution, which represents the number of events occurring in a fixed interval of time or space. In discrete distributions, probabilities are assigned to specific values, and the sum of all probabilities equals one.
  
    In contrast, a continuous distribution describes the behavior of a continuous random variable, which can take on infinitely many values within a given range. Examples include the normal distribution, which models many natural phenomena like heights or test scores, and the exponential distribution, often used to model waiting times between events. In continuous distributions, probabilities are not assigned to individual values but to intervals, since the probability of a continuous variable taking an exact value is zero. Instead, a probability density function is used to determine the likelihood that the variable falls within a certain range, and the total area under the curve of the density function equals one.
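The contrast between the two kinds of distribution can be sketched with the standard library alone. The fair die and the Normal(170, 10) height model below are illustrative assumptions chosen for the example, not data from the text.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair six-sided die. The PMF gives each value its own probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_probability = sum(die_pmf.values())   # probabilities sum to one
p_roll_is_three = die_pmf[3]                # P(X = 3) is a positive number

# Continuous: heights modeled as Normal(mean=170, sd=10) (assumed numbers).
heights = NormalDist(mu=170, sigma=10)
# Probability belongs to an interval: the area under the density curve.
p_between_160_and_180 = heights.cdf(180) - heights.cdf(160)
# The density at a single point is a height of the curve, not a probability.
density_at_170 = heights.pdf(170)

print("Sum of die probabilities:", total_probability)
print("P(X = 3):", p_roll_is_three)
print(f"P(160 < height < 180): {p_between_160_and_180:.4f}")
print(f"Density at 170: {density_at_170:.4f}")
```

The interval probability comes from the cumulative distribution function, which is why no probability is ever read off at a single exact height.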
4. What is a binomial distribution, and how is it used in probability?
  - A binomial distribution is a discrete probability distribution that describes the likelihood of a given number of successes in a fixed number of independent trials, where each trial has only two possible outcomes: success or failure. It is characterized by two parameters: n, the total number of trials, and p, the probability of success, which is the same in every trial. The probability of observing exactly k successes is P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, where \binom{n}{k} counts the ways to place k successes among n trials. Binomial distributions are commonly used to model real-world situations involving repeated experiments with binary outcomes, such as flipping a coin, testing products for defects, or surveying people for a yes/no response. They are especially useful in quality control, risk assessment, and statistical inference, where the distribution of outcomes informs decisions and predictions.
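The binomial probability of exactly k successes can be sketched directly from its definition. The helper name `binomial_pmf` and the ten-coin-toss scenario are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_trials, p_success = 10, 0.5   # ten tosses of a fair coin (assumed scenario)

# Probability of exactly three heads: C(10, 3) / 2**10 = 120 / 1024.
p_three_heads = binomial_pmf(3, n_trials, p_success)

# The PMF over k = 0..n sums to one, and its mean equals n * p.
pmf = [binomial_pmf(k, n_trials, p_success) for k in range(n_trials + 1)]
mean_successes = sum(k * prob for k, prob in enumerate(pmf))

print(f"P(X = 3) = {p_three_heads:.4f}")
print(f"Sum of PMF = {sum(pmf):.4f}, mean = {mean_successes:.2f}")
```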
5. What is the standard normal distribution, and why is it important?
  - The standard normal distribution is a specific type of normal distribution that has a mean of zero and a standard deviation of one. It is symmetric and bell-shaped, centered around the mean, and follows the empirical rule where approximately 68% of the data falls within one standard deviation, 95% within two, and 99.7% within three. The standard normal distribution is important because it serves as a reference model for many statistical analyses and simplifies calculations involving probabilities and z-scores. By converting any normal distribution to the standard normal form through a process called standardization, statisticians can use standard normal tables to find probabilities and critical values without recalculating for each unique distribution. This makes it a foundational tool in hypothesis testing, confidence interval estimation, and many other applications in probability and statistics.
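Standardization can be sketched as z = (x - mean) / standard deviation. The example below assumes a Normal(50, 5) variable and the value x = 58 purely for illustration; it checks that the probability computed on the original scale matches the one read from the standard normal, and then reproduces the empirical rule.

```python
from statistics import NormalDist

mean, std_dev = 50, 5        # assumed parameters for illustration
x = 58

# Standardize: how many standard deviations x lies above the mean.
z = (x - mean) / std_dev

original = NormalDist(mu=mean, sigma=std_dev)
standard = NormalDist()      # mean 0, standard deviation 1

p_original = original.cdf(x)   # P(X <= 58) on the original scale
p_standard = standard.cdf(z)   # P(Z <= z) on the standard scale

print(f"z = {z:.2f}")
print(f"P(X <= {x}) = {p_original:.4f}, P(Z <= z) = {p_standard:.4f}")

# Empirical rule: share of probability within k standard deviations.
for k in (1, 2, 3):
    within_k = standard.cdf(k) - standard.cdf(-k)
    print(f"Within {k} SD: {within_k:.4f}")
```

Because both probabilities agree, a single standard normal table (or one `NormalDist()` object) serves every normal distribution once values are converted to z-scores.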
6. What is the Central Limit Theorem (CLT), and why is it critical in statistics?
  - The Central Limit Theorem (CLT) is a fundamental principle in statistics that states that, regardless of the original distribution of a population, the sampling distribution of the sample mean will approach a normal distribution as the sample size becomes sufficiently large. This holds true even if the population itself is not normally distributed, provided the samples are independent and identically distributed with a finite mean and variance. The CLT is critical because it justifies the widespread use of the normal distribution in inferential statistics, enabling analysts to make reliable predictions and construct confidence intervals using sample data. It underpins many statistical methods, including hypothesis testing and regression analysis, by allowing for the approximation of probabilities and the use of standard normal tables. In essence, the CLT bridges the gap between raw data and meaningful statistical inference, making it one of the most powerful and widely applied concepts in the field.
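A small simulation can make the theorem concrete. The sketch below assumes an exponential population with rate 1 (strongly right-skewed, mean 1, standard deviation 1), a sample size of 50, and 2000 repeated samples; all of these numbers, and the fixed seed, are illustrative choices.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)       # fixed seed so the sketch is repeatable

sample_size = 50
num_samples = 2000

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_se = 1 / sample_size**0.5    # population sd / sqrt(n)

# For a roughly normal sampling distribution, about 68% of the sample means
# should fall within one standard error of the population mean.
share_within_one_se = fmean(
    [abs(m - 1.0) <= theoretical_se for m in sample_means]
)

print(f"Mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {theoretical_se:.3f})")
print(f"Share within one SE:  {share_within_one_se:.3f}")
```

Even though individual exponential draws are far from bell-shaped, the sample means cluster around the population mean with a spread close to the theoretical standard error.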
7. What is the significance of confidence intervals in statistical analysis?
  - Confidence intervals are a crucial concept in statistical analysis because they provide a range of plausible values for a population parameter, based on sample data. Rather than offering a single point estimate, such as a sample mean, a confidence interval expresses the uncertainty associated with that estimate by accounting for variability in the data. The interval is constructed around the point estimate and is accompanied by a confidence level, commonly 95% or 99%. The confidence level describes how often the procedure captures the true parameter over repeated sampling; it is not the probability that one particular computed interval contains it. For example, a 95% confidence level means that if the same sampling process were repeated many times, approximately 95% of the calculated intervals would capture the true population parameter.
  
    The significance of confidence intervals lies in their ability to convey both the estimate and the precision of that estimate, helping researchers and decision-makers assess the reliability of their conclusions. They are widely used in fields such as medicine, economics, and engineering to support evidence-based decisions, evaluate risks, and compare groups. Confidence intervals also play a key role in hypothesis testing, where they can indicate whether a parameter differs significantly from a hypothesized value. Overall, they enhance transparency and rigor in statistical reporting by quantifying uncertainty in a meaningful and interpretable way.
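The repeated-sampling meaning of a 95% confidence level can be checked with a simulation. This sketch assumes a Normal(50, 5) population whose standard deviation is known, so it uses a z-interval rather than a t-interval; the population, sample size, repeat count, and seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)       # fixed seed so the sketch is repeatable

true_mean, true_sd = 50, 5   # assumed population parameters
sample_size, num_repeats = 30, 2000

# Critical value for a two-sided 95% interval (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)
margin = z_critical * true_sd / sample_size**0.5

hits = 0
for _ in range(num_repeats):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
    sample_mean = fmean(sample)
    # Count the interval as a hit when it contains the true mean.
    if sample_mean - margin <= true_mean <= sample_mean + margin:
        hits += 1

coverage = hits / num_repeats
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true mean or it does not; the 95% figure describes the long-run share of intervals that do.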
8. What is the concept of expected value in a probability distribution?
  - The concept of expected value in a probability distribution refers to the long-run average or mean value of a random variable over many repeated trials of an experiment. It represents the theoretical center of the distribution and provides a measure of the central tendency. For a discrete random variable, the expected value is calculated by summing the products of each possible value and its corresponding probability. Mathematically, it is expressed as E(X) = \sum x_i \cdot P(x_i), where x_i are the possible values and P(x_i) are their probabilities. For a continuous random variable, the expected value is found using an integral: E(X) = \int x \cdot f(x) \, dx, where f(x) is the probability density function.
  
    The expected value is significant because it provides a single summary statistic that predicts the average outcome of a random process. It is widely used in decision-making, economics, finance, and risk analysis to evaluate scenarios involving uncertainty. For example, in gambling or investment, the expected value helps determine whether a strategy is likely to yield a profit or loss over time. While it does not guarantee what will happen in any single trial, it offers a powerful tool for understanding and comparing probabilistic outcomes.
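The discrete formula for E(X) can be sketched in a few lines. The helper `expected_value`, the fair die, and the $1 bet (win $35 with probability 1/38, otherwise lose the stake) are hypothetical examples chosen to illustrate the calculation.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X): sum of value * probability over a {value: probability} mapping."""
    return sum(value * probability for value, probability in distribution.items())

# Fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical $1 bet: win $35 with probability 1/38, otherwise lose $1.
bet = {35: Fraction(1, 38), -1: Fraction(37, 38)}

die_ev = expected_value(fair_die)
bet_ev = expected_value(bet)

print("Expected die roll:", die_ev)          # 7/2
print("Expected profit per $1 bet:", bet_ev) # -1/19
```

A negative expected value means the bet loses money on average over many plays, even though any single play can still win.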
9. Write a Python program to generate 1000 random numbers from a normal
distribution with mean = 50 and standard deviation = 5. Compute its mean and standard
deviation using NumPy, and draw a histogram to visualize the distribution.
(Include your Python code and output in the code box below.)
  -

In [1]:
# python code
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")

# Parameters of the normal distribution to sample from
mean = 50
std_dev = 5
num_samples = 1000

# Draw the samples (no seed is set, so the numbers differ from run to run)
data = np.random.normal(loc=mean, scale=std_dev, size=num_samples)

# np.std uses the population formula (ddof=0) by default
computed_mean = np.mean(data)
computed_std = np.std(data)

print(f"Computed Mean: {computed_mean:.2f}")
print(f"Computed Standard Deviation: {computed_std:.2f}")

# Histogram with a kernel density estimate overlaid
plt.figure(figsize=(10, 6))
sns.histplot(data, bins=30, kde=True, color='skyblue')
plt.title("Histogram of Normally Distributed Data (Mean=50, SD=5)", fontsize=14)
plt.xlabel("Value", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.tight_layout()

# Save a copy in the working directory, then display the figure
plt.savefig("normal_distribution_histogram.png")
plt.show()

Computed Mean: 50.17
Computed Standard Deviation: 4.82


10. You are working as a data analyst for a retail company. The company has
collected daily sales data for 2 years and wants you to identify the overall sales trend.
daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
235, 260, 245, 250, 225, 270, 265, 255, 250, 260]
● Explain how you would apply the Central Limit Theorem to estimate the average sales
with a 95% confidence interval.
● Write the Python code to compute the mean sales and its confidence interval.

  - To estimate the average daily sales using the Central Limit Theorem (CLT), we treat the provided sales data as a random sample from a larger population. The CLT tells us that, for sufficiently large samples, the sampling distribution of the sample mean will be approximately normal—even if the original data is not. Since our sample size is relatively small (n = 20), we use the t-distribution to account for additional uncertainty. By calculating the sample mean and standard error, and applying the t-distribution with 95% confidence, we can construct a confidence interval that likely contains the true average daily sales.
  
    Results:
  - Sample Mean of Daily Sales: 248.25 units
  - 95% Confidence Interval: Between 240.17 and 256.33 units
  - This interval suggests that we can be 95% confident the true average daily sales falls within this range.


In [2]:
# python code
import numpy as np
import scipy.stats as stats

# Daily sales sample exactly as given in the question
daily_sales = np.array([220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
                        235, 260, 245, 250, 225, 270, 265, 255, 250, 260])

sample_mean = np.mean(daily_sales)
sample_std = np.std(daily_sales, ddof=1)  # sample standard deviation
n = len(daily_sales)
standard_error = sample_std / np.sqrt(n)

# Two-sided 95% critical value from the t-distribution with n - 1 degrees of freedom
t_critical = stats.t.ppf(0.975, df=n-1)
margin_of_error = t_critical * standard_error
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"Sample Mean Sales: {sample_mean:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")

Sample Mean Sales: 248.25
95% Confidence Interval: (240.17, 256.33)
