Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

Ans. The Probability Mass Function (PMF) and Probability Density Function (PDF) are both functions used in probability theory to describe the likelihood of different outcomes of a random variable.

The PMF is a function that gives the probability of each possible value that a discrete random variable can take. It is defined as:

P(X = x) = f(x)

where X is the random variable, x is a possible value that X can take, and f(x) is the probability of X taking the value x. The PMF assigns probabilities to discrete values of the random variable, such as the number of heads obtained when flipping a coin a certain number of times.

For example, consider a fair six-sided die. The PMF of this die is given by:

P(X = 1) = 1/6
P(X = 2) = 1/6
P(X = 3) = 1/6
P(X = 4) = 1/6
P(X = 5) = 1/6
P(X = 6) = 1/6

This means that the probability of rolling a 1 on the die is 1/6, the probability of rolling a 2 is 1/6, and so on.
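
The die PMF above can be written as a small lookup table. The following is a minimal sketch; the names `die_pmf` and `pmf` are illustrative and not part of the original answer. It checks the two defining properties of a PMF and shows that the probability of an event is a sum of PMF values.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def pmf(x):
    """Return P(X = x); values that the die cannot show have probability 0."""
    return die_pmf.get(x, Fraction(0))

# A valid PMF is non-negative everywhere and sums to 1 over all possible values.
total = sum(die_pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# e.g. P(X is even) = P(X = 2) + P(X = 4) + P(X = 6).
p_even = sum(pmf(x) for x in (2, 4, 6))

print(total)   # 1
print(p_even)  # 1/2
```

Exact fractions are used here only to avoid floating-point rounding in such a small example.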

The PDF, on the other hand, is a function that gives the probability density of a continuous random variable at each possible value. It is defined as:

f(x) = dF(x) / dx

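where F(x) is the cumulative distribution function (CDF) of the random variable, which gives the probability that X is less than or equal to a given value x. The value of the PDF at a single point is a density, not a probability: for a continuous random variable, P(X = x) = 0 for every x. Probabilities come from integrating the PDF over an interval, for example the probability that a person's height falls in a certain range.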

For example, consider the heights of adult women in a population. The distribution of these heights may be modeled by a normal distribution with mean μ and standard deviation σ. The PDF of this distribution is given by:

f(x) = (1 / (σ * sqrt(2π))) * exp(-(x - μ)² / (2σ²))

This means that the probability density at a height x is given by the above formula, which takes into account the mean and standard deviation of the distribution. The PDF can be used to calculate the probability that a randomly chosen woman's height falls within a certain range, such as between 5'5" and 5'10".
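
To make the range calculation concrete, the sketch below computes the probability that a height falls between 5'5" (65 inches) and 5'10" (70 inches). The values μ = 64 inches and σ = 2.5 inches are assumptions chosen only for illustration; the text does not specify them. The probability is obtained two ways, from the CDF and by numerically integrating the PDF, to show that they agree.

```python
import math

# Hypothetical parameters in inches; the text does not give actual values.
MU = 64.0
SIGMA = 2.5

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x (the formula given above)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

low, high = 65.0, 70.0  # 5'5" and 5'10" expressed in inches

# Route 1: difference of two CDF values.
p_cdf = normal_cdf(high, MU, SIGMA) - normal_cdf(low, MU, SIGMA)

# Route 2: midpoint-rule integration of the PDF over [low, high].
steps = 10_000
width = (high - low) / steps
p_integral = sum(
    normal_pdf(low + (i + 0.5) * width, MU, SIGMA) for i in range(steps)
) * width

print(round(p_cdf, 4), round(p_integral, 4))
```

Under these assumed parameters roughly a third of the probability mass lies in the range; a different μ or σ would give a different answer.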

Q2: What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

Ans. The Cumulative Distribution Function (CDF), sometimes loosely called the cumulative density function, gives the probability that a random variable takes a value less than or equal to a given value. It is defined as:

F(x) = P(X ≤ x)

where X is the random variable, x is a possible value that X can take, and F(x) is the probability that X is less than or equal to x. The CDF is defined for both discrete and continuous random variables.

The CDF is useful because any interval probability can be computed from it. The probability that X exceeds a value a is P(X > a) = 1 - F(a), and the probability that X falls between two values a and b is P(a < X ≤ b) = F(b) - F(a). Unlike the PMF and the PDF, a single definition covers both discrete and continuous random variables.

For example, consider a fair six-sided die. The CDF of this die is given by:

F(x) = P(X ≤ x)

where X is the random variable representing the outcome of a single roll of the die, and x is any real number. The CDF of the die is a step function: it equals 0 for x < 1, jumps by 1/6 at each of the values 1 to 6, and equals 1 for x ≥ 6. At the possible values of X it takes the values:

F(1) = 1/6
F(2) = 2/6
F(3) = 3/6
F(4) = 4/6
F(5) = 5/6
F(6) = 6/6 = 1

This means that the probability of rolling a value less than or equal to 1 is 1/6, the probability of rolling a value less than or equal to 2 is 2/6 or 1/3, and so on.
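
The step-function CDF of the die can be sketched in a few lines. The function name `die_cdf` is illustrative. The example also uses the standard identities P(X > a) = 1 - F(a) and P(a < X ≤ b) = F(b) - F(a) to derive other probabilities from the CDF.

```python
import math
from fractions import Fraction

def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die, defined for any real x."""
    # Count how many faces are <= x, clamped to the range 0..6.
    faces_at_or_below = min(max(math.floor(x), 0), 6)
    return Fraction(faces_at_or_below, 6)

# Probability of rolling more than 4: P(X > 4) = 1 - F(4).
p_greater_than_4 = 1 - die_cdf(4)

# Probability of rolling 3, 4 or 5: P(2 < X <= 5) = F(5) - F(2).
p_three_to_five = die_cdf(5) - die_cdf(2)

print(p_greater_than_4)  # 1/3
print(p_three_to_five)   # 1/2
print(die_cdf(3.7))      # 1/2, the CDF is flat between 3 and 4
```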

Q3: What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

Ans. The normal distribution, also known as the Gaussian distribution, is one of the most widely used probability distributions in statistics. It is used to model continuous variables whose values cluster symmetrically around a central value, with large deviations from that value becoming increasingly rare. Some examples of situations where the normal distribution might be used as a model are:

1. Heights of people: The distribution of heights in a population is approximately normally distributed.

2. Weights of objects: The distribution of weights of objects produced by a manufacturing process may be normally distributed.

3. Errors in measurements: The errors in measurements obtained from instruments may be modeled by a normal distribution.

4. Exam scores: The distribution of exam scores in a large class may be approximately normally distributed.

The normal distribution is characterized by two parameters: the mean (μ) and the standard deviation (σ). The mean sets the center of the distribution and therefore the location of its peak. The standard deviation sets the spread: it controls how wide and how tall the curve is, without changing its basic bell shape.

If the standard deviation is small, the distribution will be tall and narrow, with most of the values clustered around the mean. If the standard deviation is large, the distribution will be flatter and wider, with more values spread out from the mean.

The shape of the normal distribution is symmetric, with the mean and the median being the same value. The distribution is also bell-shaped, with most of the values concentrated around the mean and fewer values further away from the mean.
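
The effect of the two parameters can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the specific values of μ and σ are arbitrary choices for illustration. It shows that the peak height is 1 / (σ * sqrt(2π)), so halving σ doubles the peak, and that the curve is symmetric about μ.

```python
from statistics import NormalDist

def peak_height(sigma, mu=0.0):
    """Height of the normal PDF at its peak, which is located at x = mu."""
    return NormalDist(mu=mu, sigma=sigma).pdf(mu)

# Smaller sigma gives a taller, narrower curve; larger sigma a flatter one.
peaks = {sigma: peak_height(sigma) for sigma in (0.5, 1.0, 2.0)}

# Changing mu only shifts the curve; it stays symmetric about the new center.
shifted = NormalDist(mu=10.0, sigma=1.0)
symmetric = abs(shifted.pdf(10.0 - 1.5) - shifted.pdf(10.0 + 1.5)) < 1e-12

for sigma, peak in peaks.items():
    print(f"sigma={sigma}: peak height = {peak:.4f}")
print("symmetric about the mean:", symmetric)
```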

Q5: What is the Bernoulli Distribution? Give an example. What is the difference between the Bernoulli Distribution and the Binomial Distribution?

Ans. Bernoulli distribution is a probability distribution that represents a random experiment with two possible outcomes: success with probability p and failure with probability 1-p. It is named after Swiss mathematician Jacob Bernoulli. A common example of Bernoulli distribution is flipping a coin, where the probability of getting heads (success) is p and the probability of getting tails (failure) is 1-p.

The probability mass function (PMF) of the Bernoulli distribution is given by:

P(X = 1) = p
P(X = 0) = 1-p

where X is a random variable representing the outcome of the experiment.

The main difference between Bernoulli distribution and binomial distribution is that Bernoulli distribution models a single trial or experiment with two possible outcomes, while binomial distribution models the number of successes in a fixed number of independent Bernoulli trials.
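
The relationship between the two distributions can be illustrated with their PMFs. The sketch below uses the standard binomial formula P(X = k) = C(n, k) * p^k * (1-p)^(n-k), which is not stated in the answer above; the function names are illustrative. With n = 1 the binomial PMF reduces to the Bernoulli PMF.

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) for a single trial: 1 is success, 0 is failure."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: probability of exactly 2 heads in 4 flips of a fair coin.
p_two_heads_in_four = binomial_pmf(2, 4, 0.5)
print(p_two_heads_in_four)  # 0.375

# A binomial distribution with n = 1 is a Bernoulli distribution.
print(binomial_pmf(1, 1, 0.3), bernoulli_pmf(1, 0.3))
```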

Q6. Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.

Ans. To find the probability that a randomly selected observation from a normally distributed dataset with mean 50 and standard deviation 10 will be greater than 60, we need to calculate the z-score first. The z-score measures how many standard deviations an observation is away from the mean.

The formula for the z-score is:

z = (x - μ) / σ

where x is the value we are interested in, μ is the mean, and σ is the standard deviation.

Substituting the values, we get:

z = (60 - 50) / 10 = 1

Now, we can use a standard normal distribution table or calculator to find the probability that a z-score is greater than 1. The table gives P(Z ≤ 1) ≈ 0.8413, so P(Z > 1) = 1 - 0.8413 ≈ 0.1587.

Therefore, the probability that a randomly selected observation from the given dataset will be greater than 60 is approximately 0.1587 or 15.87%.
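
The table lookup can be reproduced in code as a check. This is a minimal sketch using only the standard library; it relies on the identity P(Z > z) = 0.5 * erfc(z / sqrt(2)) for the standard normal distribution, and the function name is illustrative.

```python
import math

def standard_normal_upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, x = 50, 10, 60

z = (x - mu) / sigma                      # number of standard deviations above the mean
p_greater = standard_normal_upper_tail(z)

print(z, round(p_greater, 4))  # 1.0 0.1587
```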

Q7. Explain uniform Distribution with an example.

Ans. The uniform distribution is a probability distribution in which all outcomes in its range are equally likely. In the continuous case, the probability density function (PDF) is constant within an interval [a, b] and zero outside it, so any two sub-intervals of the same length have the same probability. The uniform distribution is often used when there is no reason to favor one part of the range of possible outcomes over another.

A familiar example is the roll of a fair six-sided die. Each of the six numbers has an equal chance of being rolled, so the probability of rolling any given number is 1/6. Strictly, this is a discrete uniform distribution on the values {1, 2, 3, 4, 5, 6} rather than on the whole interval [1, 6], and it is described by a PMF. A continuous example is a random number generator that returns a real number between 0 and 1, where every sub-interval of a given length is equally likely.

The PDF of a uniform distribution over the interval [a, b] can be expressed as:

f(x) = 1 / (b - a) if a ≤ x ≤ b
f(x) = 0 otherwise

where x is a value of the random variable X, and a < b are the endpoints of the interval.
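
The formula can be sketched directly in code. The example below assumes a hypothetical waiting time that is uniformly distributed between 0 and 10 minutes; this scenario and the function names are illustrative only. The CDF used here, F(x) = (x - a) / (b - a) on [a, b], follows from integrating the constant density.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical example: a waiting time uniform on [0, 10] minutes.
a, b = 0.0, 10.0

density = uniform_pdf(4.0, a, b)          # constant 1 / (b - a) inside the interval
outside = uniform_pdf(12.0, a, b)         # zero outside the interval

# P(2 <= X <= 5) depends only on the length of the sub-interval: 3 / 10.
p_2_to_5 = uniform_cdf(5.0, a, b) - uniform_cdf(2.0, a, b)

print(density, outside, round(p_2_to_5, 4))
```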


Q8: What is the z score? State the importance of the z score.

Ans. The z-score, also known as the standard score, is a statistical measurement that indicates how many standard deviations an observation or data point is from the mean of a distribution. It is calculated as the difference between the value of the observation and the mean of the distribution, divided by the standard deviation of the distribution.

The formula for calculating the z-score is:

z = (x - μ) / σ

where x is the observation, μ is the mean of the distribution, and σ is the standard deviation of the distribution.

The z-score is important because it allows us to compare observations from different distributions on a common scale. By standardizing the data using the z-score, we can easily see how many standard deviations a particular observation is from the mean of its distribution. A z-score of 0 indicates that the observation is at the mean of the distribution, while a positive z-score indicates that the observation is above the mean, and a negative z-score indicates that the observation is below the mean.

The z-score is also used to calculate probabilities. Standardizing converts any normal distribution into the standard normal distribution (mean 0, standard deviation 1), so a single table or software function covers every normal distribution. To find the probability that an observation falls below, above, or between given values, we convert those values to z-scores and look up the corresponding probabilities for the standard normal distribution.
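
As an illustration of comparing observations on a common scale, the sketch below standardizes two exam scores. The exams, means, and standard deviations are hypothetical values invented for the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical data: the same student sits two different exams.
z_exam_a = z_score(85, mean=70, std_dev=10)  # raw score 85
z_exam_b = z_score(70, mean=60, std_dev=5)   # raw score 70

print(z_exam_a)  # 1.5
print(z_exam_b)  # 2.0

# The lower raw score is the stronger result relative to its own distribution.
better_exam = "B" if z_exam_b > z_exam_a else "A"
print(better_exam)  # B
```

Although 85 is the higher raw score, it is only 1.5 standard deviations above its mean, while 70 is 2 standard deviations above its mean.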

Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.

Ans. The Central Limit Theorem (CLT) is a fundamental result in probability theory and statistics. It states that when independent random samples of size n are drawn from a population with a finite mean and variance, the sampling distribution of the sample mean approaches a normal distribution as n grows, regardless of the shape of the population distribution. Specifically, for large n the sample mean is approximately normally distributed, with a mean equal to the population mean and a standard deviation (the standard error) equal to the population standard deviation divided by the square root of the sample size.

The significance of the Central Limit Theorem is that it allows us to make statistical inferences about a population based on a sample, even if we do not know the distribution of the population. For example, we can estimate the population mean or the proportion of the population with a certain characteristic using a sample mean or proportion, respectively, and calculate the margin of error and confidence intervals around these estimates. This works because, by the CLT, the sampling distribution of the sample mean or proportion is approximately normal for sufficiently large samples, regardless of the distribution of the population.
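
The theorem can be illustrated with a small simulation. The sketch below is not a proof; it draws repeated samples from an exponential distribution with rate 1 (a strongly skewed population with mean 1 and standard deviation 1) and examines the resulting sample means. The sample size of 50, the 5000 repetitions, and the fixed seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # observations per sample (n)
NUM_SAMPLES = 5000   # number of repeated samples

def sample_mean(n):
    """Mean of n independent draws from an exponential(1) population."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# CLT prediction: standard error = population sd / sqrt(n) = 1 / sqrt(50).
predicted_sd = 1.0 / SAMPLE_SIZE ** 0.5

# For a normal distribution, about 95% of values lie within 1.96 sd of the mean.
within = sum(abs(m - 1.0) <= 1.96 * predicted_sd for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

Even though individual exponential draws are far from normal, the sample means cluster around the population mean with a spread close to the predicted standard error.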

Q10: State the assumptions of the Central Limit Theorem.

Ans. The Central Limit Theorem (CLT) relies on the following assumptions:

1. Random sampling: The samples must be drawn at random from the population of interest, with each individual having an equal chance of being included in the sample.

2. Independence: The observations in the sample must be independent of each other. This means that the value of one observation does not affect the value of another observation in the sample.

3. Sampling from a finite population: When sampling without replacement from a finite population, the observations are not strictly independent, so the sample size must be small relative to the population size for the dependence to be negligible. In practice, a common guideline is that the sample should be less than 10% of the population.

4. Sample size: The sample size should be large enough for the sampling distribution of the sample mean to be close to a normal distribution. A commonly used rule of thumb is a sample size of at least 30, although strongly skewed populations may need more. The population must also have a finite variance.

If these assumptions are met, then the CLT can be applied to estimate the parameters of interest for the population, such as the population mean or proportion, using the sample mean or proportion.