# Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

Probability Mass Function (PMF):

PMF is used for discrete random variables.
It assigns probabilities to each possible outcome.
Example: Rolling a fair six-sided die, the PMF would assign a probability of 1/6 to each of the six possible outcomes (1, 2, 3, 4, 5, 6).
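
Probability Density Function (PDF):

PDF is used for continuous random variables.
It gives the relative likelihood (density) of each value. The probability that the variable falls within a range is the area under the PDF over that range, and the probability of any single exact value is zero.
Example: The PDF of a standard normal distribution is the bell-shaped curve centered at 0. The area under the curve between -1 and 1 is about 0.68, so roughly 68% of values fall within one standard deviation of the mean.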

PMF and PDF are both fundamental concepts in probability and statistics used to describe the probability distribution of random variables.
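
The difference between a PMF and a PDF can be sketched in a few lines. This is a minimal illustration that uses Python's standard-library `statistics.NormalDist` for the normal curve; the variable names are chosen here for illustration only.

```python
from fractions import Fraction
from statistics import NormalDist

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_total = sum(die_pmf.values())  # the probabilities of a PMF sum to 1

# PDF of the standard normal: the value is a density, not a probability.
std_normal = NormalDist(mu=0.0, sigma=1.0)
density_at_zero = std_normal.pdf(0.0)  # height of the bell curve at 0

# Probability of a range = area under the PDF = difference of CDF values.
prob_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print("Sum of die PMF:", pmf_total)
print("Normal density at 0:", round(density_at_zero, 4))
print("P(-1 <= Z <= 1):", round(prob_within_one_sd, 4))
```

The die probabilities are exact fractions, while the normal example only yields a probability once a range is specified.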

# Q2: What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

Cumulative Distribution Function (CDF):

The CDF, F(x) = P(X ≤ x), gives the probability that a random variable takes on a value less than or equal to x.
It's used for both discrete and continuous random variables.
Example: For a fair six-sided die, the CDF rises in steps of 1/6: F(1) = 1/6, F(2) = 2/6, and so on up to F(6) = 1. For a continuous variable like height, the CDF at a given height is the probability that a randomly selected person is that height or shorter.

Why the CDF is used:

CDF provides a comprehensive view of the distribution, allowing you to find probabilities for specific ranges or values.
It's useful for calculating percentiles, determining the probability of an event occurring within a certain range, and making various statistical inferences.
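
As a rough sketch of these uses, the snippet below builds the CDF of a fair die by accumulating its PMF, then uses a normal CDF for heights. The height parameters (mean 170 cm, standard deviation 10 cm) are hypothetical values picked only for illustration.

```python
from fractions import Fraction
from itertools import accumulate
from statistics import NormalDist

# Discrete case: the CDF of a fair die is the running total of its PMF.
faces = range(1, 7)
pmf = [Fraction(1, 6)] * 6
die_cdf = dict(zip(faces, accumulate(pmf)))

# Probability of a range from the CDF: P(2 <= X <= 4) = F(4) - F(1).
prob_2_to_4 = die_cdf[4] - die_cdf[1]

# Continuous case: heights modeled as normal (hypothetical parameters).
heights = NormalDist(mu=170, sigma=10)
prob_shorter_than_180 = heights.cdf(180)        # P(height <= 180)
height_90th_percentile = heights.inv_cdf(0.90)  # inverse CDF gives percentiles

print("Die CDF:", {face: str(p) for face, p in die_cdf.items()})
print("P(2 <= X <= 4):", prob_2_to_4)
print("P(height <= 180):", round(prob_shorter_than_180, 4))
print("90th percentile height:", round(height_90th_percentile, 1))
```

Range probabilities come from subtracting two CDF values, and percentiles come from inverting the CDF.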

# Q3: What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

Examples of situations where the normal distribution might be used as a model include:

Height of individuals: The heights of people in a population often follow a roughly normal distribution.
Test scores: Test scores on standardized exams, when many people take the test, often approximate a normal distribution.
Measurement errors: In scientific experiments, measurement errors can often be modeled as normally distributed.
Stock market returns: Daily returns of stock prices are often modeled as approximately normal, although real returns tend to have heavier tails than the normal distribution predicts.

The shape of the normal distribution is characterized by two parameters:

Mean (μ): It represents the central location of the distribution. It's the average value, and it's also the peak of the bell-shaped curve.
Standard Deviation (σ): It determines the spread or dispersion of the distribution. A larger standard deviation leads to a wider and flatter curve, while a smaller standard deviation results in a narrower and taller curve.

Together, the mean and standard deviation uniquely define the normal distribution, and they determine how data is distributed around the mean.
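
The effect of each parameter can be checked numerically. The sketch below compares three normal curves using the standard-library `statistics.NormalDist`; the specific parameter values are arbitrary choices for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)   # baseline curve
wide = NormalDist(mu=0, sigma=2)     # same center, twice the spread
shifted = NormalDist(mu=5, sigma=1)  # same spread, center moved to 5

# The peak of the curve sits at the mean; its height is 1 / (sigma * sqrt(2*pi)).
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)
peak_shifted = shifted.pdf(5)

# The share of values within one standard deviation of the mean
# is the same for every normal distribution.
share_narrow = narrow.cdf(1) - narrow.cdf(-1)
share_wide = wide.cdf(2) - wide.cdf(-2)

print("Peak height, sigma=1:", round(peak_narrow, 4))
print("Peak height, sigma=2:", round(peak_wide, 4))
print("Peak height, mu=5, sigma=1:", round(peak_shifted, 4))
print("Share within one sigma:", round(share_narrow, 4), round(share_wide, 4))
```

Doubling σ halves the peak height and widens the curve, while changing μ only slides the curve along the axis.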

# Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.

The importance of the Normal Distribution lies in its widespread applicability in various real-life scenarios, thanks to the Central Limit Theorem. Here are some key points:

Statistical Inference: Many statistical methods and hypothesis tests are based on the assumption of a normal distribution, allowing us to make accurate predictions and draw meaningful conclusions.

Risk Assessment: It's used in risk analysis and financial modeling to understand the likelihood of certain events, such as stock price movements or loan default rates.

Quality Control: In manufacturing, it helps in monitoring and controlling product quality by modeling the distribution of measurements, such as product dimensions or defects.

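Biological and Natural Phenomena: Many biological measurements, such as human height, are approximately normally distributed, and IQ scores are scaled to follow a normal distribution by design. Other natural quantities, such as rainfall amounts, are often right-skewed, so the normal model fits them less well.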

Psychology and Social Sciences: In psychology, test scores and personality traits are often assumed to be normally distributed for research and assessment purposes.

# Q5: What is the Bernoulli Distribution? Give an example. What is the difference between the Bernoulli Distribution and the Binomial Distribution?

Bernoulli Distribution:

The Bernoulli Distribution is a probability distribution that represents a random experiment with two possible outcomes: success (usually denoted as 1) and failure (usually denoted as 0). It's characterized by a single parameter, denoted as "p," which represents the probability of success. The probability mass function (PMF) of a Bernoulli distribution is given by:

P(X = 1) = p
P(X = 0) = 1 - p

Example: Consider flipping a fair coin, where "Heads" is considered a success (1) and "Tails" is considered a failure (0). The Bernoulli distribution for this scenario has p = 0.5 because there's an equal probability of getting Heads or Tails.

Difference between Bernoulli and Binomial Distributions:

Number of Trials:

Bernoulli Distribution: Represents a single trial or experiment with two possible outcomes (success or failure).
Binomial Distribution: Represents the number of successes (1s) in a fixed number of independent Bernoulli trials (experiments).

Parameters:

Bernoulli Distribution: Has one parameter (p) representing the probability of success in a single trial.
Binomial Distribution: Has two parameters: "n" (the number of trials) and "p" (the probability of success in each trial).

Random Variable:

Bernoulli Distribution: Represents the outcome of a single trial (0 or 1).
Binomial Distribution: Represents the total count of successes in "n" trials (can take values from 0 to n).

Probability Mass Function (PMF):

Bernoulli Distribution: Describes the probability of success (1) and failure (0) in a single trial.
Binomial Distribution: Describes the probability of getting a specific number of successes (k) in "n" trials.
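
The two PMFs can be written out directly. The sketch below uses the standard binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k); the function names `bernoulli_pmf` and `binomial_pmf` are illustrative helpers, not part of any library.

```python
from math import comb, isclose


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for a single trial with success probability p."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # a Bernoulli variable only takes the values 0 and 1


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Single fair coin flip: p = 0.5.
print("P(Heads on one flip):", bernoulli_pmf(1, 0.5))

# Ten fair coin flips: probability of exactly 5 heads.
print("P(5 heads in 10 flips):", binomial_pmf(5, 10, 0.5))

# With n = 1 the binomial distribution reduces to the Bernoulli distribution.
same = isclose(binomial_pmf(1, 1, 0.3), bernoulli_pmf(1, 0.3))
print("Binomial(n=1) matches Bernoulli:", same)
```

The last check shows the relationship between the two: a Bernoulli trial is the special case of a binomial distribution with n = 1.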

# Q6: Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.

In [1]:
import scipy.stats as stats

mean = 50
std_dev = 10
value = 60

z = (value - mean) / std_dev

# Calculate the probability using the cumulative distribution function (CDF)
probability = 1 - stats.norm.cdf(z)

# Print the result
print("The probability that a randomly selected observation will be greater than 60 is:", probability)

The probability that a randomly selected observation will be greater than 60 is: 0.15865525393145707
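
The same calculation can be written out by hand. The worked steps below standardize the value and use the standard normal CDF, written here as Φ, with Φ(1) ≈ 0.8413 taken from a standard normal table.

```latex
\begin{aligned}
z &= \frac{x - \mu}{\sigma} = \frac{60 - 50}{10} = 1 \\
P(X > 60) &= P(Z > 1) = 1 - \Phi(1) \approx 1 - 0.8413 = 0.1587
\end{aligned}
```

So about 15.87% of observations are expected to exceed 60, which agrees with the computed output.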


# Q7: Explain uniform Distribution with an example.

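Uniform Distribution is a probability distribution where all possible outcomes are equally likely. For a discrete uniform distribution, every outcome has the same probability. For a continuous uniform distribution on an interval [a, b], the density is constant at 1/(b - a), so intervals of equal length within the range are equally probable.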

Example: Rolling a fair six-sided die is an example of a discrete uniform distribution. Each of the six sides has an equal probability of 1/6 of showing up, making it a uniform distribution.
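
Both the discrete and the continuous case can be sketched briefly. The helper functions below are illustrative, and the continuous example (a waiting time uniform on 0 to 10 minutes) is a hypothetical scenario, not taken from the text above.

```python
from fractions import Fraction


def discrete_uniform_pmf(x: int, low: int, high: int) -> Fraction:
    """P(X = x) when every integer from low to high is equally likely."""
    if low <= x <= high:
        return Fraction(1, high - low + 1)
    return Fraction(0)


def continuous_uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for a continuous uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Discrete: a fair six-sided die.
die_probabilities = {face: discrete_uniform_pmf(face, 1, 6) for face in range(1, 7)}

# Continuous (hypothetical): a waiting time uniform on [0, 10] minutes.
# The probability of an interval depends only on its length: 3 / 10.
prob_wait_2_to_5 = continuous_uniform_cdf(5, 0, 10) - continuous_uniform_cdf(2, 0, 10)

print("Die probabilities:", {face: str(p) for face, p in die_probabilities.items()})
print("P(2 <= wait <= 5):", round(prob_wait_2_to_5, 4))
```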

# Q8: What is the z score? State the importance of the z score.

A z-score is a statistical measure that indicates how many standard deviations a data point is from the mean of a dataset. Calculated as (X - μ) / σ, where X is the data point, μ is the mean, and σ is the standard deviation, it standardizes data for comparison and analysis. 

The importance of z-scores lies in their role in standardizing data, facilitating cross-dataset comparisons, identifying outliers, aiding in statistical analysis and hypothesis testing, and normalizing data for machine learning. They provide a standardized way to assess the relative position and significance of data points, making them a fundamental tool in statistics and data analysis.
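
A short sketch of the formula in use follows. The dataset, the two exam scenarios, and the cutoff of |z| ≥ 2 for flagging unusual points are all assumptions made for illustration; other cutoffs such as 3 are also common.

```python
from statistics import mean, pstdev


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical dataset with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(data)
sigma = pstdev(data)
z_scores = [z_score(x, mu, sigma) for x in data]

# Comparing across datasets (hypothetical exams with different scales).
z_exam_a = z_score(85, mu=70, sigma=10)  # 85 on an exam with mean 70, sd 10
z_exam_b = z_score(60, mu=50, sigma=5)   # 60 on an exam with mean 50, sd 5

# Flagging unusual points with an assumed cutoff of |z| >= 2.
unusual = [x for x, z in zip(data, z_scores) if abs(z) >= 2]

print("z-scores:", z_scores)
print("Exam A z:", z_exam_a, "Exam B z:", z_exam_b)
print("Unusual values:", unusual)
```

Although 85 is the higher raw score, the score of 60 on exam B is further above its own mean in standard-deviation terms, which is the kind of comparison z-scores make possible.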

# Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.

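The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean (or sum) of a sufficiently large number of independent, identically distributed observations approaches a normal distribution, regardless of the shape of the population distribution, provided the population has a finite mean and variance. Specifically, for samples of size n from a population with mean μ and standard deviation σ, the sample mean is approximately normal with mean μ and standard deviation σ/√n.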

Significance:

Statistical Inference: It enables the use of normal distribution for making inferences about population parameters, even when the population distribution is unknown.

Hypothesis Testing: It forms the basis for many hypothesis tests and confidence interval calculations.

Real-world Data: The CLT helps explain why quantities that arise as sums or averages of many small independent effects, such as measurement errors, are often approximately normal, which is why normal-based methods apply so widely.

Quality Control: In manufacturing and quality control, it helps monitor and maintain product quality.

Predictive Modeling: In fields like finance, it underpins risk assessment and predictive modeling.

In summary, the Central Limit Theorem is a fundamental concept with wide-ranging applications in statistics, enabling more robust and meaningful data analysis and inference.
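
The theorem can be checked with a small simulation. The sketch below draws repeated samples from a strongly skewed exponential population (mean 1, standard deviation 1) and looks at the distribution of the sample means. The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # observations per sample
NUM_SAMPLES = 5000  # number of repeated samples
POP_MEAN = 1.0      # mean of an exponential population with rate 1
POP_SD = 1.0        # standard deviation of that population

# Mean of each sample drawn from the skewed exponential population.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(NUM_SAMPLES)
]

# CLT prediction: sample means are roughly normal with sd = sigma / sqrt(n).
expected_se = POP_SD / SAMPLE_SIZE**0.5
observed_mean = fmean(sample_means)
observed_sd = stdev(sample_means)

# Under normality, about 95% of sample means fall within 1.96 standard errors.
z_95 = NormalDist().inv_cdf(0.975)
share_within = sum(
    abs(m - POP_MEAN) <= z_95 * expected_se for m in sample_means
) / NUM_SAMPLES

print("Mean of sample means:", round(observed_mean, 3))
print("SD of sample means:", round(observed_sd, 3), "expected:", round(expected_se, 3))
print("Share within 1.96 standard errors:", round(share_within, 3))
```

Even though individual observations are far from normal, the sample means cluster around the population mean with a spread close to σ/√n.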

# Q10: State the assumptions of the Central Limit Theorem.

The Central Limit Theorem (CLT) relies on several key assumptions:

Random Sampling: Samples must be drawn randomly from the population, ensuring each item in the population has an equal chance of being included.

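Independence: Observations should be independent of each other. The selection of one item should not influence the selection of another.

Sample Size: The sample size should be sufficiently large. There's no fixed rule; a common guideline is a sample size of at least 30, though strongly skewed populations may need more.

Finite Variance: The population must have a finite mean and variance. Heavy-tailed distributions without a finite variance (for example, the Cauchy distribution) do not satisfy the CLT.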

These assumptions are essential for the CLT to hold and for the sampling distribution of the sample mean to approximate a normal distribution, regardless of the population's shape.
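
The role of the sample size assumption can be illustrated with a simulation. The sketch below measures the skewness of the sample means for several sample sizes, drawing from an exponential population. For this population the skewness of the sample mean is 2/√n in theory, so it should shrink toward 0 (the value for a normal distribution) as n grows. The sample sizes, trial count, and seed are arbitrary choices.

```python
import random
from statistics import fmean


def skewness(values: list) -> float:
    """Sample skewness: third central moment over variance to the power 1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2**1.5


def sample_mean_skewness(n: int, trials: int, rng: random.Random) -> float:
    """Skewness of the means of `trials` samples of size n from Exp(1)."""
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return skewness(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_n = {n: sample_mean_skewness(n, 5000, rng) for n in (2, 30, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.3f}")
```

With very small samples the sample means remain visibly skewed, which is why the normal approximation is only trusted once the sample size is reasonably large.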