Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with
an example.

### Probability Mass Function (PMF)
The PMF applies to discrete random variables. It gives the probability that a discrete random variable is exactly equal to some value.

**Example:** 
For a fair six-sided die, the PMF for the random variable \( X \) (the number rolled) is:
\[ P(X = k) = \frac{1}{6} \]
for \( k = 1, 2, 3, 4, 5, 6 \).
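The die example can be written out as a small lookup table. This is only an illustrative sketch; the names `die_pmf` and `pmf` are made up for this example, and exact fractions are used so the sums come out without rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face k in 1..6 has probability 1/6.
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def pmf(k):
    """Return P(X = k); values outside 1..6 have probability 0."""
    return die_pmf.get(k, Fraction(0))

# A valid PMF sums to 1 over all possible values.
total = sum(die_pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# e.g. P(X is even) = P(X=2) + P(X=4) + P(X=6).
p_even = sum(pmf(k) for k in (2, 4, 6))

print(total, p_even)  # prints: 1 1/2
```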

### Probability Density Function (PDF)
The PDF applies to continuous random variables. It describes the relative likelihood (density) of the variable near a given value. The probability of any single exact value is zero; instead, the area under the curve over an interval gives the probability that the variable falls in that interval.

**Example:**
For a standard normal distribution (mean \( \mu = 0 \), standard deviation \( \sigma = 1 \)), the PDF is:
\[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \]

The probability that the variable falls within an interval, say \( a \) to \( b \), is found by integrating the PDF over that interval:
\[ P(a \leq X \leq b) = \int_a^b f(x) \, dx \]
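As a rough numerical check of the integral above, the sketch below approximates \( P(-1 \leq X \leq 1) \) for the standard normal with a simple midpoint rule and compares it with the closed form based on the error function. The helper names `std_normal_pdf` and `integrate`, and the choice of interval, are assumptions made for illustration.

```python
import math

def std_normal_pdf(x):
    """PDF of the standard normal distribution (mu = 0, sigma = 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Area under the PDF between -1 and 1.
p_numeric = integrate(std_normal_pdf, -1.0, 1.0)

# Closed form for a symmetric interval: P(-c <= X <= c) = erf(c / sqrt(2)).
p_exact = math.erf(1 / math.sqrt(2))

print(round(p_numeric, 4), round(p_exact, 4))  # prints: 0.6827 0.6827
```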


Q2: What is Cumulative Density Function (CDF)? Explain with an example. Why CDF is used?

### Cumulative Distribution Function (CDF)
The CDF (sometimes loosely called the cumulative density function) applies to both discrete and continuous random variables. It gives the probability that a random variable \( X \) takes on a value less than or equal to \( x \).

**Definition:**
For a random variable \( X \), the CDF \( F(x) \) is defined as:
\[ F(x) = P(X \leq x) \]

### Example
For a standard normal distribution (mean \( \mu = 0 \), standard deviation \( \sigma = 1 \)), the CDF is:
\[ F(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} \, dt \]

### Why CDF is Used?
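- **Probability Calculation:** It gives the probability that a variable falls within a range: \( P(a < X \leq b) = F(b) - F(a) \).
- **Comparison:** Two distributions can be compared by comparing their CDFs, for example to see which one places more probability below a given value.
- **Quantiles:** Inverting the CDF gives percentiles and other quantiles, e.g. the median is the value \( x \) with \( F(x) = 0.5 \).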
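The uses listed above can be tried out with `statistics.NormalDist` from the Python standard library, which provides `cdf` and `inv_cdf` methods. The sketch below assumes a standard normal variable; the specific cut-off values are chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mu = 0, sigma = 1).
Z = NormalDist(mu=0.0, sigma=1.0)

# F(x) = P(X <= x): half of the probability lies below the mean.
f_at_zero = Z.cdf(0.0)

# Range probability from two CDF values: P(a < X <= b) = F(b) - F(a).
p_range = Z.cdf(1.96) - Z.cdf(-1.96)

# Quantile: the inverse CDF returns the x with F(x) = 0.95.
q95 = Z.inv_cdf(0.95)

print(f_at_zero, round(p_range, 3), round(q95, 3))
```

Here `p_range` is close to 0.95 and `q95` is close to 1.645, the familiar values from a z table.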



Q3: What are some examples of situations where the normal distribution might be used as a model?
Explain how the parameters of the normal distribution relate to the shape of the distribution.

### Examples of Normal Distribution Use
- **Heights of People:** Human heights often follow a normal distribution.
- **Test Scores:** Standardized test scores are typically modeled using a normal distribution.


### Parameters of the Normal Distribution
The normal distribution is characterized by two parameters:
1. **Mean (\( \mu \))**: 
   - **Location Parameter:** It determines the center of the distribution. The peak of the bell curve is at \( \mu \).
2. **Standard Deviation (\( \sigma \))**: 
   - **Scale Parameter:** It measures the spread of the distribution. A smaller \( \sigma \) results in a taller, narrower curve, while a larger \( \sigma \) results in a lower, flatter curve.

**Shape Relation:**
- **Mean (\( \mu \))** shifts the distribution left or right.
- **Standard Deviation (\( \sigma \))** affects the width of the bell curve. Larger \( \sigma \) means more spread out, while smaller \( \sigma \) means more concentrated around the mean.
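A small sketch of how the two parameters act on the curve, again using `statistics.NormalDist`. The three distributions and their parameter values are hypothetical, picked only to contrast location and scale. The peak height of a normal PDF is \( \frac{1}{\sigma\sqrt{2\pi}} \), so doubling \( \sigma \) halves the peak, while changing \( \mu \) only moves it.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)     # same center, twice the spread
shifted = NormalDist(mu=3.0, sigma=1.0)  # same spread, center moved right

# Peak height is the PDF evaluated at the mean.
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

def within_one_sigma(dist):
    """P(mu - sigma <= X <= mu + sigma) for a NormalDist."""
    return dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)

print(round(peak_narrow, 4), round(peak_wide, 4), round(peak_shifted, 4))
print(round(within_one_sigma(narrow), 4), round(within_one_sigma(wide), 4))
```

The probability within one standard deviation of the mean is the same (about 0.68) for every normal distribution; only the width of that interval changes with \( \sigma \).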


Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal
Distribution.

### Importance of Normal Distribution
- **Central Limit Theorem:** Many statistical methods rely on the normal distribution because of the Central Limit Theorem, which states that the suitably standardized sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution.
- **Statistical Inference:** It is widely used in hypothesis testing, confidence intervals, and regression analysis.
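- **Predictive Modeling:** Some machine learning methods assume normality, for example linear regression inference (normally distributed errors), linear discriminant analysis, and Gaussian naive Bayes (normally distributed features within each class).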

### Real-Life Examples of Normal Distribution
1. **Human Heights:** Heights of people within a specific population tend to follow a normal distribution.
2. **Blood Pressure:** Blood pressure readings among a healthy population are approximately normally distributed.
3. **IQ Scores:** IQ scores are designed to follow a normal distribution, with most people scoring around the average.
4. **Measurement Errors:** Errors in physical measurements in scientific experiments often follow a normal distribution.
5. **Daily Stock Returns:** The daily returns of stock prices are often modeled using a normal distribution for financial analyses, although real returns tend to have heavier tails than the normal distribution predicts.

These examples demonstrate how the normal distribution appears naturally in various fields, making it a crucial concept in statistics and data analysis.


Q5: What is Bernoulli Distribution? Give an Example. What is the difference between Bernoulli
Distribution and Binomial Distribution?

### Bernoulli Distribution
The Bernoulli distribution is a discrete distribution with two possible outcomes, labeled 0 and 1, where 1 typically represents "success" and 0 represents "failure". It is characterized by a single parameter \( p \), the probability of success, so that \( P(X = 1) = p \) and \( P(X = 0) = 1 - p \).

**Example:**
Flipping a biased coin where the probability of getting heads (success) is 0.3. The random variable \( X \) that equals 1 if heads and 0 if tails follows a Bernoulli distribution.

### Difference Between Bernoulli Distribution and Binomial Distribution
- **Bernoulli Distribution** is a special case of the Binomial distribution where a single trial is conducted (\( n = 1 \)).
- **Binomial Distribution** describes the number of successes in a fixed number \( n \) of independent Bernoulli trials with the same probability of success \( p \). It generalizes Bernoulli distribution to more than one trial.

**In essence:**
- Use **Bernoulli** for the outcome of a single trial (e.g., success/failure).
- Use **Binomial** for the number of successes in multiple trials (e.g., 3 successes in 10 trials).
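The relationship between the two distributions can be sketched with their PMFs. The function names `bernoulli_pmf` and `binomial_pmf` are made up for this example; the value \( p = 0.3 \) is the biased coin from the example above, and the binomial PMF used is the standard formula \( \binom{n}{k} p^k (1-p)^{n-k} \).

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) for a single trial with success probability p."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.3  # probability of heads for the biased coin

# With n = 1 the binomial PMF reduces to the Bernoulli PMF.
single_trial_matches = all(
    abs(binomial_pmf(x, 1, p) - bernoulli_pmf(x, p)) < 1e-12 for x in (0, 1)
)

# Probability of exactly 3 heads in 10 flips.
p_3_of_10 = binomial_pmf(3, 10, p)

print(single_trial_matches, round(p_3_of_10, 4))  # prints: True 0.2668
```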


Q6. Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset
is normally distributed, what is the probability that a randomly selected observation will be greater
than 60? Use the appropriate formula and show your calculations.

In [4]:
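mean = 50   # mean of the dataset (mu)
std = 10    # standard deviation of the dataset (sigma)
xi = 60     # observation of interest

# z-score: how many standard deviations xi lies above the mean
z_score = (xi - mean) / std
print(z_score)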

1.0


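According to the standard normal (z) table, \( P(Z \leq 1.0) = 0.8413 \), so about 84.13% of the data is less than 60. The probability that an observation is greater than 60 is therefore \( 1 - 0.8413 = 0.1587 \), i.e. about 15.87%.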
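The table lookup can also be checked numerically. The sketch below reuses the values from the cell above and evaluates the standard normal CDF with `statistics.NormalDist` from the standard library instead of a printed z table; the variable names `p_less` and `p_greater` are introduced here for illustration.

```python
from statistics import NormalDist

mean, std, xi = 50, 10, 60

# Standardize the observation.
z_score = (xi - mean) / std

# P(X <= 60) is the standard normal CDF evaluated at the z-score.
p_less = NormalDist().cdf(z_score)

# P(X > 60) is the complement.
p_greater = 1 - p_less

print(round(p_less, 4), round(p_greater, 4))  # prints: 0.8413 0.1587
```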

Q7: Explain uniform Distribution with an example.

### Uniform Distribution
The uniform distribution is a probability distribution where all outcomes are equally likely. It can be either discrete or continuous.

### Example
- **Discrete Uniform Distribution:** Rolling a fair six-sided die. Each face (1 through 6) has an equal probability of \( \frac{1}{6} \).
- **Continuous Uniform Distribution:** Selecting a random point along a ruler from 0 to 1 meter. The density is constant along the ruler, \( f(x) = \frac{1}{b - a} = 1 \) for \( 0 \leq x \leq 1 \), so no region is favored over another.

**Key Features:**
- In a discrete uniform distribution, all outcomes have the same probability.
- In a continuous uniform distribution, every interval of the same length has the same probability of containing the random variable.
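The second key feature can be illustrated with a short sketch. The helper `uniform_interval_prob` is hypothetical; it assumes \( X \) is uniform on \( [a, b] \) and returns the length of the overlap between the query interval and \( [a, b] \), divided by \( b - a \).

```python
def uniform_interval_prob(c, d, a=0.0, b=1.0):
    """P(c <= X <= d) for X uniformly distributed on [a, b]."""
    lo, hi = max(a, c), min(b, d)       # clip the query interval to [a, b]
    return max(0.0, hi - lo) / (b - a)  # overlap length times the density 1/(b - a)

# Two intervals of equal length on the 0-1 m ruler have equal probability.
p_first = uniform_interval_prob(0.2, 0.5)
p_second = uniform_interval_prob(0.6, 0.9)

# Only the part of an interval that lies on the ruler counts.
p_clipped = uniform_interval_prob(0.9, 1.5)

print(round(p_first, 2), round(p_second, 2), round(p_clipped, 2))  # prints: 0.3 0.3 0.1
```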



Q8: What is the z score? State the importance of the z score.

### Z Score
The z score, also known as a standard score, quantifies the number of standard deviations a data point is from the mean of the data set. It is calculated using the formula:

\[ z = \frac{(X - \mu)}{\sigma} \]

where \( X \) is the data point, \( \mu \) is the mean, and \( \sigma \) is the standard deviation.

### Importance of the Z Score
- **Standardization:** Z scores transform the data into a standard form, making different data sets comparable by removing the effects of the scale and location of the original data.
- **Outlier Detection:** Z scores can help identify outliers, as data points with very high or very low z scores (e.g., more than 3 or less than -3) are typically considered outliers.
- **Normalization:** In statistical analysis and machine learning, z scores are used to normalize data, ensuring that each feature contributes equally to the analysis.
- **Probabilistic Interpretation:** Z scores directly relate to the normal distribution, allowing for the determination of probabilities and critical values in statistical tests.
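A minimal sketch of standardization with the formula above. The data values are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed for \( \sigma \). The example also shows why z scores make data sets comparable: the same heights expressed in centimeters and in inches give the same z scores.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Return the z score (x - mu) / sigma for every value in data."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    if sigma == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in data]

heights_cm = [150, 160, 170, 180, 190]     # hypothetical sample
heights_in = [h / 2.54 for h in heights_cm]  # same heights in inches

z_cm = z_scores(heights_cm)
z_in = z_scores(heights_in)

print([round(z, 3) for z in z_cm])  # prints: [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After standardization the values have mean 0 and standard deviation 1, regardless of the original unit.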


Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.

### Central Limit Theorem (CLT)
The Central Limit Theorem is a fundamental statistical principle. It states that, for a sufficiently large sample size, the distribution of the sample mean is approximately normal, regardless of the shape of the original population distribution. This holds provided the observations are independent and identically distributed (i.i.d.) with finite variance.

### Significance of the Central Limit Theorem
- **Statistical Inference:** CLT allows for the use of normal probability distributions to estimate probabilities and conduct hypothesis tests about means, even when the original population is not normally distributed.
- **Confidence Intervals:** It underpins the creation of confidence intervals around sample means, which are critical for making decisions based on sample data.
- **Simplification in Analysis:** Because sample means are approximately normally distributed for large samples, CLT simplifies the analysis of data and the application of various statistical tools that assume normality.
- **Applicability:** The theorem is crucial for practical applications because it justifies the use of the normal distribution in many scenarios involving averages and means, thereby broadening the scope of various statistical methodologies.

Overall, the Central Limit Theorem is essential for performing accurate and meaningful statistical analyses when dealing with real-world data.
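The theorem can be illustrated with a small simulation. This is a sketch under assumed settings, not a proof: the population is taken to be exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed), and the sample size `n`, the number of samples, and the seed are arbitrary choices. If the CLT applies, the sample means should center near 1, have spread near \( 1/\sqrt{n} \), and roughly 68% of them should fall within one standard deviation of their center.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 50              # observations per sample
num_samples = 2000  # number of repeated samples

# Draw repeated samples from a skewed population and record each sample mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)    # expected to be close to the population mean, 1
spread = pstdev(sample_means)  # expected to be close to 1 / sqrt(n)

# Share of sample means within one standard deviation of the center;
# for a normal distribution this share is about 0.68.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / num_samples

print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```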


Q10: State the assumptions of the Central Limit Theorem.

### Assumptions of the Central Limit Theorem
1. **Independence:** The sampled observations must be independent of each other. This means the sampling should be done with replacement, or if without replacement, the sample size should not exceed 10% of the population to avoid dependence.

2. **Identically Distributed:** The data points in the population from which samples are drawn must be identically distributed, meaning they all come from the same distribution, with the same mean and variance.

3. **Sample Size:** The sample size must be sufficiently large. Although "large" is subjective, a common rule of thumb is that the sample size should be at least 30 to sufficiently approximate a normal distribution, particularly if the underlying distribution is not normal.

4. **Finite Variance:** The population from which samples are drawn must have a finite variance. If the variance is infinite, as for the Cauchy distribution, the classical central limit theorem does not apply.

These assumptions ensure that the theorem applies, so that the distribution of the sample means approximates a normal distribution regardless of the shape of the population distribution.
