Question 1: What is a random variable in probability theory?
  A random variable in probability theory is a function that assigns a numerical value to each possible outcome of a random experiment, allowing outcomes to be analyzed quantitatively using mathematical methods.
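
To make the "function on outcomes" idea concrete, here is a minimal sketch. It assumes an experiment of two fair coin tosses; the names `sample_space`, `num_heads`, and `distribution` are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses: outcomes are tuples such as ('H', 'T').
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """Random variable X: maps an outcome to the number of heads it contains."""
    return outcome.count("H")

# Build the distribution of X; each of the 4 outcomes has probability 1/4.
distribution = {}
for outcome in sample_space:
    x = num_heads(outcome)
    distribution[x] = distribution.get(x, Fraction(0)) + Fraction(1, len(sample_space))

for x in sorted(distribution):
    print(f"P(X = {x}) = {distribution[x]}")
```

The random variable itself is just `num_heads`; the probabilities come from the underlying experiment, not from the function.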


Question 2: What are the types of random variables?
   There are two main types of random variables: *discrete random variable* and *continuous random variable*.
- *Discrete random variable*: Takes countable, distinct values (e.g., number of heads in coin tosses).
- *Continuous random variable*: Can take any value within a given interval, so it has uncountably many possible values (e.g., height or weight measurements).



Question 3: Explain the difference between discrete and continuous distributions.
  The difference lies in the nature of possible values[3][4]:
- *Discrete distribution*: Probability is assigned to specific, separated values (like the number of clicks on a webpage). Probabilities are calculated for exact outcomes.
- *Continuous distribution*: Probabilities are described over intervals because there are infinitely many possible values within any range (like people's heights). The probability of any exact value is zero, but the probability of a range can be non-zero.
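
As a rough illustration of this contrast, the sketch below assumes a fair six-sided die for the discrete case and, for the continuous case, heights modelled as a normal distribution with mean 170 and standard deviation 10. Both sets of numbers are made up for this example.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair six-sided die. Each exact outcome has positive probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
p_die_is_3 = die_pmf[3]

# Continuous: heights modelled as Normal(170, 10); the parameters are illustrative.
heights = NormalDist(mu=170, sigma=10)

# Probability over an interval is the difference of two CDF values.
p_160_to_180 = heights.cdf(180) - heights.cdf(160)

# An "interval" of zero width carries zero probability.
p_exactly_170 = heights.cdf(170) - heights.cdf(170)

print("P(die = 3):", p_die_is_3)
print("P(160 <= height <= 180):", round(p_160_to_180, 4))
print("P(height = 170):", p_exactly_170)
```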



Question 4: What is a binomial distribution, and how is it used in probability?
  A *binomial distribution* describes the probability of having exactly x successes in n independent trials, each with probability p of success.
- Formula: $$ P(X = x) = \binom{n}{x} \, p^x \, (1-p)^{n-x} $$
- Used for scenarios with two possible outcomes per trial (e.g., yes/no, success/failure).
- Common applications include quality control, genetics, and predicting number of successes over multiple attempts.
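
The formula can be evaluated directly with the standard library. This is a minimal sketch; `binomial_pmf` is a helper name introduced here, and the coin-toss numbers are only an example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: exactly 3 heads in 5 tosses of a fair coin.
prob_3_of_5 = binomial_pmf(3, 5, 0.5)

# Sanity check: probabilities over all possible x should sum to 1.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))

print("P(X = 3):", prob_3_of_5)
print("Sum over x = 0..5:", total)
```

Here $$ \binom{5}{3} = 10 $$ and $$ 0.5^5 = 0.03125 $$, so the probability is 0.3125.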




Question 5: What is the standard normal distribution, and why is it important?
   The *standard normal distribution* is a normal distribution with mean $$ \mu = 0 $$ and standard deviation $$ \sigma = 1 $$.
- Importance: It allows *comparison* between data sets using 'z-scores', and underpins many statistical tests and probability calculations.
- It is symmetric around the mean and widely applicable because, by the Central Limit Theorem, standardized sample means are approximately standard normal for large samples.
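
A short sketch of how z-scores put values from different scales on the standard normal scale. The exam scores, means, and standard deviations below are hypothetical, and `z_score` is a helper defined for this example.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (or below) the mean."""
    return (value - mean) / std_dev

# Hypothetical scores from two exams with different scales.
z_math = z_score(85, mean=70, std_dev=10)
z_physics = z_score(80, mean=60, std_dev=16)

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist()

# Proportion of a normal population expected to fall below each score.
pct_math = standard_normal.cdf(z_math)
pct_physics = standard_normal.cdf(z_physics)

print("z (math):", z_math, "percentile:", round(pct_math, 4))
print("z (physics):", z_physics, "percentile:", round(pct_physics, 4))
```

Although the raw physics margin over the mean is larger (20 points versus 15), the math score is further above its mean in standard-deviation units.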






Question 6: What is the Central Limit Theorem (CLT), and why is it critical in statistics?
  The Central Limit Theorem states that the sampling distribution of the mean of independent, identically distributed random variables with finite variance approaches a normal distribution as the sample size becomes large, regardless of the shape of the original distribution.
- Critical because it justifies normal approximation for averages and enables statistical inference even with non-normal data.
- Allows calculation of confidence intervals and hypothesis testing using the normal distribution.
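
The theorem can be checked empirically with a small simulation. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed distribution chosen only for illustration), a sample size of 50, and 2000 repetitions. The seed is fixed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = 1.0 / n**0.5  # population sd / sqrt(n)

# Share of sample means within 1.96 standard errors of the true mean;
# under a normal approximation this should be close to 0.95.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)

print("Mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3), "theory:", round(theoretical_se, 3))
print("Share within 1.96 SE:", round(within, 3))
```

Even though individual draws are far from normal, the sample means cluster around 1 with a spread close to the theoretical standard error.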




Question 7: What is the significance of confidence intervals in statistical analysis?
  
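A confidence interval (CI) is a range of values, computed from sample data, that is likely to contain the true population parameter at a given level of confidence (usually 95% or 99%): if the sampling were repeated many times, about that percentage of the resulting intervals would contain the parameter.
- Confidence intervals quantify the uncertainty and precision of sample estimates.
- Narrow intervals suggest precise estimates; an interval that excludes the null value (for example, 0 for a difference in means) indicates statistical significance at the corresponding level.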





Question 8: What is the concept of expected value in a probability distribution?
  The *expected value (EV)* of a random variable is its long-run average value, weighted by the probabilities of all possible outcomes.
- Formula: For a discrete random variable $$X$$, $$E[X] = \sum_i x_i \, P(X = x_i)$$
- Represents the 'center' or average outcome if the experiment is repeated many times.
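
A minimal sketch of both readings of the expected value: the probability-weighted sum and the long-run average. The game below (win 10 with probability 1/5, lose 2 otherwise) is hypothetical and exists only for this example.

```python
import random
from fractions import Fraction

# Hypothetical game: payoff -> probability.
pmf = {10: Fraction(1, 5), -2: Fraction(4, 5)}

# Expected value as a probability-weighted sum: 10 * 1/5 + (-2) * 4/5 = 2/5.
expected_value = sum(x * p for x, p in pmf.items())

# Long-run average: simulate many plays and average the payoffs.
rng = random.Random(1)  # fixed seed so the run is repeatable
payoffs = rng.choices(
    population=list(pmf.keys()),
    weights=[float(p) for p in pmf.values()],
    k=100_000,
)
simulated_average = sum(payoffs) / len(payoffs)

print("Expected value:", expected_value)
print("Simulated average:", round(simulated_average, 3))
```

The simulated average is not exactly 2/5 on any single run, but it moves closer as the number of plays grows.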




Question 9: Write a Python program to generate 1000 random numbers from a normal distribution with mean = 50 and standard deviation = 5. Compute its mean and standard deviation using NumPy, and draw a histogram to visualize the distribution.
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random numbers from normal distribution with mean=50, std_dev=5
data = np.random.normal(loc=50, scale=5, size=1000)

# Compute mean and standard deviation
mean = np.mean(data)
std_dev = np.std(data)

# Output results
print("Mean:", mean)
print("Standard Deviation:", std_dev)

# Draw histogram
plt.hist(data, bins=30, color='blue', edgecolor='black', alpha=0.7)
plt.title("Histogram of Normal Distribution (mean=50, std=5)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```


Sample output (rounded to two decimals; no seed is set, so the exact values differ on each run):
Mean: 49.97
Standard Deviation: 4.98

The histogram is displayed as a plot.



Question 10: You are working as a data analyst for a retail company. The company has collected daily sales data for 2 years and wants you to identify the overall sales trend.
daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
235, 260, 245, 250, 225, 270, 265, 255, 250, 260]
● Explain how you would apply the Central Limit Theorem to estimate the average sales with a 95% confidence interval.
● Write the Python code to compute the mean sales and its confidence interval.
    
```python
import numpy as np
from scipy.stats import norm

daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
               235, 260, 245, 250, 225, 270, 265, 255, 250, 260]

# Sample mean, sample standard deviation (ddof=1), and standard error of the mean
mean_sales = np.mean(daily_sales)
std_sales = np.std(daily_sales, ddof=1)
n = len(daily_sales)
SE = std_sales / np.sqrt(n)

# 95% Confidence Interval: two-sided critical value of the standard normal (about 1.96)
z_score = norm.ppf(0.975)
CI_lower = mean_sales - z_score * SE
CI_upper = mean_sales + z_score * SE

# Print results rounded to two decimals
print(f"Mean sales: {mean_sales:.2f}")
print(f"95% Confidence Interval: ({CI_lower:.2f}, {CI_upper:.2f})")
```


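Output:
Mean sales: 248.25
95% Confidence Interval: (240.68, 255.82)


By the Central Limit Theorem, the sample mean is approximately normal with standard error $$ s/\sqrt{n} $$, so the 95% interval is $$ \bar{x} \pm 1.96 \, s/\sqrt{n} $$. With 95% confidence, the average daily sales is estimated to fall between 240.68 and 255.82[12][11]. Because the sample has only 20 observations, a t-based interval would be slightly wider.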
