## What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

__Probability Mass Function (PMF)__ The PMF is used for discrete random variables, meaning variables that take values from a countable set (finite or countably infinite). The PMF gives the probability that the variable takes each specific value, P(X = x). Each probability lies between 0 and 1, and all the probabilities in the PMF sum to 1.
* For example, consider the case of rolling a fair six-sided die. The PMF for this scenario can be written as:

> value on die = {1,2,3,4,5,6}

> pmf = {1/6 , 1/6 , 1/6, 1/6, 1/6, 1/6}
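
The die PMF above can be sketched in a few lines of Python. This is only an illustration: the names `faces`, `pmf`, `total`, and `p_even` are chosen here and do not come from the text, and exact fractions are used so the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

# Faces of a fair six-sided die; each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = {face: Fraction(1, 6) for face in faces}

# A valid PMF has non-negative probabilities that sum to exactly 1.
total = sum(pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# for example the probability of rolling an even number.
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)

print(total)   # 1
print(p_even)  # 1/2
```

The same pattern (sum the PMF over the outcomes of interest) works for any discrete event.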

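__Probability Density Function (PDF)__ The PDF, on the other hand, is used for continuous random variables, meaning variables that can take on any value within a certain range or interval. The PDF gives the probability density at each point, not a probability: the probability of any single exact value is 0, and the probability that the variable falls in an interval is the area under the PDF over that interval. The total area under the PDF is 1.
* An example of a continuous random variable is height, which can take on any value within a certain range. Height is often modeled with the normal distribution, whose PDF is: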

> f(x) = (1 / (σ√(2π))) * e^(-(x-μ)^2 / (2σ^2))

> where μ is the mean and σ is the standard deviation
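
As a rough sketch, the normal PDF can be evaluated directly from the formula and compared with Python's standard-library `statistics.NormalDist`. The height parameters used here (mean 170 cm, standard deviation 10 cm) are assumptions made purely for illustration; they are not taken from the text.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Hypothetical height model (assumed values): mean 170 cm, std 10 cm.
mu, sigma = 170.0, 10.0
density_at_mean = normal_pdf(170.0, mu, sigma)

# A density is not a probability. Probabilities come from the area under
# the PDF, which the CDF provides: P(160 <= X <= 180) = F(180) - F(160).
heights = NormalDist(mu, sigma)
p_160_to_180 = heights.cdf(180.0) - heights.cdf(160.0)

print(density_at_mean)
print(p_160_to_180)
```

Under these assumed parameters the interval from 160 to 180 is one standard deviation either side of the mean, so the printed probability should be close to 0.68.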


## What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

The Cumulative Distribution Function (CDF) gives the probability that a random variable takes a value less than or equal to a given value: F(x) = P(X ≤ x). It is non-decreasing and rises from 0 to 1.

Suppose a random variable X represents a student's score on a math test (from 0 to 10). Because X is discrete, its distribution is described by a PMF whose probabilities sum to 1:


> X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

> f(X) = {0.02, 0.05, 0.08, 0.1, 0.12, 0.15, 0.16, 0.14, 0.1, 0.05, 0.03}

* To find the CDF of X, we calculate the probability that X is less than or equal to each value x.
* For example, to find F(5), we add up the probabilities of getting a score of 0, 1, 2, 3, 4, or 5:
  
  F(5) = P(X ≤ 5) = f(0) + f(1) + f(2) + f(3) + f(4) + f(5)
  
  = 0.02 + 0.05 + 0.08 + 0.1 + 0.12 + 0.15
  
  = 0.52

Repeating this for every score gives the full CDF of X, which ends at 1 for the largest possible score:

> X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

> F(X) = {0.02, 0.07, 0.15, 0.25, 0.37, 0.52, 0.68, 0.82, 0.92, 0.97, 1}

__Utilization:__ The value of F(x) at any point x is the probability that the student gets a score less than or equal to x. For example, F(5) = 0.52 means that there is a 52% chance that the student scores 5 or less on the test. Beyond such probabilities, the CDF is also used to find percentiles, to compute the probability of a range via P(a < X ≤ b) = F(b) - F(a), and in estimation and hypothesis testing.
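
The "running sum" idea behind a discrete CDF can be sketched as follows. To keep the numbers exact, this sketch reuses the fair-die PMF from the first question rather than the test-score table; the helper name `F` and the variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die: each face has probability 1/6.
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# F(x) = P(X <= x) is the running (cumulative) sum of the PMF.
cdf = list(accumulate(pmf))

def F(x):
    """CDF of the die: total probability of all faces <= x."""
    return sum(p for v, p in zip(values, pmf) if v <= x)

# Probability of a range: P(2 < X <= 5) = F(5) - F(2).
p_range = F(5) - F(2)

print(cdf[-1])  # 1
print(F(4))     # 2/3
print(p_range)  # 1/2
```

The last entry of the cumulative sum is always 1 for a valid PMF, which is a quick sanity check on any hand-built table.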


## What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

The normal distribution is a commonly used probability distribution in statistics due to its many useful properties. Some examples of situations where the normal distribution might be used as a model are:

* Heights and weights
* IQ scores
* Error terms in regression analysis
* Stock returns over short periods (approximately; prices themselves are more often modeled as log-normal)
* Quality control
* Natural phenomena

The normal distribution is a continuous probability distribution that is characterized by two parameters, mean (μ) and standard deviation (σ). 

* The mean of the normal distribution determines the location of the center of the distribution. The distribution is symmetric about the mean, which is also its median and mode.
* If the mean is shifted to the right, the entire distribution will shift to the right. If the mean is shifted to the left, the entire distribution will shift to the left.
* The standard deviation of the normal distribution determines the width of the distribution. 
* If the standard deviation is large, the distribution will be more spread out, and if it is small, the distribution will be more compact.
* A smaller standard deviation results in a taller, more narrow peak, while a larger standard deviation results in a shorter, flatter peak.
* Together, the mean and standard deviation completely determine the normal distribution, so their values can be used to compute the probability of any range of values. For example, about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
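
A small sketch using the standard-library `statistics.NormalDist` illustrates these effects. The three parameter settings below are arbitrary values picked for illustration only.

```python
from statistics import NormalDist

# Three illustrative normal distributions (parameter values are assumed).
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)     # same center, larger spread
shifted = NormalDist(mu=3.0, sigma=1.0)  # same spread, center moved right

# The peak height is the density at the mean: 1 / (sigma * sqrt(2 * pi)).
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)
peak_shifted = shifted.pdf(3.0)

# Doubling sigma halves the peak; shifting mu leaves the peak height alone.
print(peak_narrow, peak_wide, peak_shifted)

# The share of probability within one sigma of the mean is the same
# for every normal distribution.
within_one_sigma = wide.cdf(2.0) - wide.cdf(-2.0)
print(within_one_sigma)
```

Changing only `mu` slides the curve left or right without changing its shape, while changing only `sigma` stretches or squeezes it around the same center.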





## Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.

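The normal distribution is one of the most important probability distributions in statistics, for several reasons:

* It is a well-understood distribution
* It is a convenient model for many natural phenomena (height, weight)
* It has a simple mathematical form, fully described by its mean and standard deviation
* It is symmetric, unimodal, and bell-shaped
* By the Central Limit Theorem, sample means from many other distributions are approximately normal, which underpins many hypothesis tests and confidence intervals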

Some real-life quantities that are often approximately normally distributed are:

* Heights and weights
* IQ scores
* Stock returns over short periods (approximately; prices themselves are more often modeled as log-normal)
* Quality control
* Blood Pressure
* Test Scores

## What is the Bernoulli Distribution? Give an Example. What is the difference between Bernoulli Distribution and Binomial Distribution?

The Bernoulli distribution is a probability distribution that models a single trial of a binary experiment, where the outcome can either be success or failure.

The Bernoulli distribution is characterized by a single parameter p, which represents the probability of success in a single trial. The probability of failure in a single trial is therefore 1 - p. The probability mass function of the Bernoulli distribution is given by:

P(X = 1) = p

P(X = 0) = 1 - p

__Example__ 

A situation that can be modeled using the Bernoulli distribution is whether a customer buys a product after seeing an advertisement.
If we define success as the customer buying the product and failure as the customer not buying the product, then the probability of success depends on the effectiveness of the advertisement, and can be modeled using the Bernoulli distribution with a probability parameter p.


The differences between the Bernoulli distribution and the binomial distribution are:

* __Number of Trials:__ The Bernoulli distribution models a single trial, while the binomial distribution models a series of independent, identical trials.

* __Parameterization:__ The Bernoulli distribution has a single parameter p, which represents the probability of success in a single trial. The binomial distribution has two parameters: n, which represents the number of trials, and p, which represents the probability of success in each trial.

* __Probability Mass Function:__ PMF of the Bernoulli distribution is given by P(X=1) = p and P(X=0) = 1-p. The PMF of the binomial distribution is given by P(X=k) = (n choose k) * p^k * (1-p)^(n-k), where X is the random variable representing the number of successes in n independent, identical trials.

* __Mean and Variance:__ The mean of the Bernoulli distribution is p, and the variance is p(1-p). The mean of the binomial distribution is np, and the variance is np(1-p).

The Bernoulli distribution is a special case of the binomial distribution where n=1. It models a single trial of a binary experiment, while the binomial distribution models a series of independent, identical trials with a fixed probability of success.
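
The relationship between the two distributions can be checked numerically with a short sketch. The value p = 0.3 and the choice of n = 10 customers are assumptions made for illustration, and `binomial_pmf` is a helper written here from the formula above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.3  # assumed probability that one customer buys the product

# With n = 1 the binomial PMF reduces to the Bernoulli PMF.
bernoulli_fail = binomial_pmf(0, 1, p)     # P(X = 0) = 1 - p
bernoulli_success = binomial_pmf(1, 1, p)  # P(X = 1) = p

# With n = 10 customers, X counts how many of them buy.
n = 10
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the PMF should match np and np(1-p).
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(bernoulli_success, bernoulli_fail)
print(mean, n * p)
print(variance, n * p * (1 - p))
```

Computing the mean and variance from the PMF, rather than from the closed-form formulas, gives an independent check that the two agree.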

## Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.
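
One way to work this out is to standardize the value and use the standard normal CDF, written here as Φ (a notation assumed for this sketch):

z = (x - μ) / σ = (60 - 50) / 10 = 1

P(X > 60) = P(Z > 1) = 1 - Φ(1) ≈ 1 - 0.8413 = 0.1587

So roughly 15.87% of observations are expected to exceed 60. The same calculation can be reproduced with Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 50.0, 10.0, 60.0

# Standardize: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

# P(X > 60) = 1 - Phi(z), where Phi is the standard normal CDF.
p_greater = 1.0 - NormalDist().cdf(z)

print(z)                    # 1.0
print(round(p_greater, 4))  # 0.1587
```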

## Explain the Uniform Distribution with an example.

The uniform distribution is a probability distribution that models the probability of a continuous random variable taking on any value within a specific range, where each value within that range is equally likely to occur. This means that the probability density function (PDF) of the uniform distribution is constant within the range and zero outside the range.

The PDF of the uniform distribution is given by:

f(x) = 1 / (b - a) if a <= x <= b

f(x) = 0 otherwise

where a and b are the lower and upper bounds of the range.

__Example:__ The time it takes for a traffic light to turn green. If we assume that the time it takes for the traffic light to turn green is equally likely to be any value between 10 and 20 seconds, we can model this situation using the uniform distribution with a = 10 and b = 20.
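
For the traffic-light example, the PDF and the probability of a sub-interval can be sketched as below. The helper names `uniform_pdf` and `uniform_cdf`, and the 12 to 15 second interval, are illustrative choices rather than part of the original example.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 10.0, 20.0  # waiting time between 10 and 20 seconds

# The density is the same constant everywhere inside the range.
density = uniform_pdf(15.0, a, b)

# Probability that the wait is between 12 and 15 seconds:
# the interval length divided by the total range length.
p_12_to_15 = uniform_cdf(15.0, a, b) - uniform_cdf(12.0, a, b)

print(density)
print(p_12_to_15)
```

Because the density is constant, the probability of any sub-interval depends only on its length: here 3 seconds out of a 10-second range.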

## What is the z score? State the importance of the z score.

A z-score, also known as a standard score, is a measure of how many standard deviations a data point is from the mean of its distribution.

The formula for calculating the z-score for a data point x in a distribution with mean μ and standard deviation σ is:

z = (x - μ) / σ

The importance of the z-score lies in its ability to standardize data and make it easier to interpret and compare. By converting data to z-scores, we can easily identify extreme values, calculate probabilities, and compare different datasets. Additionally, the z-score is used in hypothesis testing and statistical inference, as it allows us to calculate the probability of obtaining a particular value or set of values given the distribution of the data.
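
A minimal sketch of standardizing a dataset is shown below. The data values are made up for illustration, and the population standard deviation (`pstdev`) is used on the assumption that the list is the whole population rather than a sample.

```python
from statistics import mean, pstdev

# Made-up data, treated here as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
sigma = pstdev(data)

# z = (x - mu) / sigma for every data point.
z_scores = [(x - mu) / sigma for x in data]

# After standardizing, the data has mean 0 and standard deviation 1,
# which is what makes different datasets comparable.
z_mean = mean(z_scores)
z_std = pstdev(z_scores)

for x, z in zip(data, z_scores):
    print(x, z)
```

A point with a large absolute z-score (a common informal threshold is beyond 2 or 3) lies far from the mean relative to the spread of the data.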

## What is Central Limit Theorem? State the significance of the Central Limit Theorem.

The Central Limit Theorem (CLT) says that the sampling distribution of the mean is approximately normal when the sample size is large enough, regardless of whether the population has a normal, Poisson, binomial, or any other distribution with finite variance.

As the sample size increases, the distribution of the sample mean approaches a normal distribution with

* mean  = μ and
* standard deviation =  σ/sqrt(n), 

where μ is the population mean, σ is the population standard deviation, and n is the sample size.

The significance of the Central Limit Theorem is that it provides a powerful tool for statistical inference and hypothesis testing. It allows us to use the normal distribution to make inferences about a population mean, even when the population distribution is not known. This is because the CLT implies that the sample mean will be approximately normally distributed even if the underlying population distribution is not normal, provided that the sample size is sufficiently large. The CLT is used extensively in hypothesis testing, confidence interval estimation, and parameter estimation.
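
A small simulation can make the theorem concrete. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), which is strongly skewed and clearly not normal; the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mean = 1 and std = 1.
pop_mean, pop_std = 1.0, 1.0
n = 50              # size of each sample
num_samples = 5000  # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

observed_mean = mean(sample_means)
observed_std = stdev(sample_means)
predicted_std = pop_std / math.sqrt(n)  # sigma / sqrt(n) from the CLT

# For a normal distribution, about 95% of values fall within
# 1.96 standard deviations of the mean.
share_within = sum(
    abs(m - pop_mean) <= 1.96 * predicted_std for m in sample_means
) / num_samples

print(observed_mean, pop_mean)
print(observed_std, predicted_std)
print(share_within)
```

If the theorem applies, the observed mean and standard deviation of the sample means should land close to μ and σ/sqrt(n), and the share within 1.96 standard deviations should be near 95%, even though individual draws are far from normal.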


## State the assumptions of the Central Limit Theorem.

The Central Limit Theorem (CLT) is based on some important assumptions that must be satisfied for the theorem to hold. These assumptions are:

* __Independence:__ The sample observations must be independent of each other.

* __Sample size:__ The sample size must be sufficiently large. A common rule of thumb is a sample size of 30 or more, although strongly skewed or heavy-tailed populations may need larger samples.

* __Population distribution:__ The population from which the sample is drawn may have any distribution, but it must have a finite mean and a finite variance.

* __Random sampling:__ The sample should be drawn at random from the population, so that every member of the population has an equal chance of being selected.