## Q1. What is the Probability density function?


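The Probability Density Function (PDF) is a function that describes how the probability of a continuous random variable is spread over its possible values. It gives the relative likelihood of the variable taking values near a particular point, not the probability of any exact value. Here are the key theoretical points to understand about the PDF: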

1. **Definition**:
   - The PDF for a continuous random variable is denoted as \( f(x) \). 
   - The PDF is defined so that the probability that the random variable falls within an interval \([a, b]\) is the integral of the PDF over that interval: \( P(a \le X \le b) = \int_a^b f(x)\,dx \). Intuitively, this is the area under the curve between \(a\) and \(b\), the continuous analogue of summing probabilities.

2. **Properties**:
   - **Non-negativity**: The PDF is always greater than or equal to zero for all possible values of the random variable. This means that the function never dips below the horizontal axis.
   - **Normalization**: The total area under the PDF curve is equal to one. This ensures that the total probability across all possible values of the random variable is one, meaning that the random variable must take on some value within the defined range.

3. **Interpretation**:
   - The value of the PDF at a specific point does not give the probability that the random variable is exactly equal to that point. Instead, it indicates how likely the random variable is to be near that point compared to other points. The higher the value of the PDF at a point, the more likely the random variable is to be close to that point.
   - For continuous random variables, probabilities are given for intervals rather than specific values. This is because the probability of the random variable taking any exact value is theoretically zero.

4. **Examples**:
   - **Normal Distribution**: For a normally distributed random variable with a given mean and standard deviation, the PDF describes the familiar bell-shaped curve. The highest point of the curve corresponds to the mean of the distribution.
   - **Exponential Distribution**: For an exponentially distributed random variable with a given rate parameter, the PDF describes a curve that starts high at zero and gradually decreases, indicating that smaller values are more likely.

5. **Usage**:
   - The PDF is used in many fields such as statistics, physics, engineering, and finance to model and analyze continuous data.
   - It is a crucial concept in probability theory and statistical inference, helping to understand the behavior and distribution of continuous random variables.

In summary, the Probability Density Function is a fundamental theoretical tool for describing how continuous random variables are distributed and for calculating probabilities associated with them.
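As a minimal standard-library sketch, the interval definition and the normalization property can be checked numerically for the standard normal distribution. Integrating its PDF with a simple trapezoidal rule over \([-1, 1]\) should reproduce the familiar value of about 0.6827, and the area over a wide interval should be about 1. The helper names `normal_pdf` and `integrate` are illustrative, and the trapezoidal rule is just one convenient numerical method. The last value shows that a density can exceed 1, which is why a PDF value is not itself a probability.

```python
import math

def normal_pdf(x, mean=0.0, std_dev=1.0):
    # Density of a normal distribution with the given mean and standard deviation
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    # Composite trapezoidal rule: approximates the area under f on [a, b]
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# P(-1 <= X <= 1) for a standard normal = area under the PDF between -1 and 1
p_within_one_sd = integrate(normal_pdf, -1.0, 1.0)
# Exact value for comparison, via the error function
exact = math.erf(1 / math.sqrt(2))
# Normalization: the area over a wide interval should be (very nearly) 1
total_area = integrate(normal_pdf, -10.0, 10.0)
# A narrow normal (std_dev = 0.1) has a peak density well above 1
narrow_peak = normal_pdf(0.0, 0.0, 0.1)

print(f"P(-1 <= X <= 1) ≈ {p_within_one_sd:.4f} (exact {exact:.4f})")
print(f"Total area ≈ {total_area:.6f}")
print(f"Peak density with std_dev = 0.1: {narrow_peak:.4f}")
```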


## Q2. What are the types of Probability distribution?

Probability distributions are mathematical functions that describe the likelihood of different outcomes in an experiment. They can be broadly categorized into two main types: discrete and continuous distributions. Here are some of the key types under each category:

### Discrete Probability Distributions

1. **Binomial Distribution**:
   - Describes the number of successes in a fixed number of independent Bernoulli trials (yes/no experiments), each with the same probability of success.
   - Example: Flipping a coin 10 times and counting the number of heads.

2. **Poisson Distribution**:
   - Describes the number of events occurring in a fixed interval of time or space, assuming the events occur independently and at a constant average rate.
   - Example: The number of emails received in an hour.

3. **Geometric Distribution**:
   - Describes the number of trials needed to get the first success in a series of independent Bernoulli trials.
   - Example: The number of times you roll a die until you get a six.

4. **Negative Binomial Distribution**:
   - Generalizes the geometric distribution to describe the number of trials needed to achieve a specified number of successes.
   - Example: The number of times you roll a die until you get three sixes.

### Continuous Probability Distributions

1. **Normal Distribution**:
   - Describes data that cluster around a mean. The curve is symmetric and bell-shaped.
   - Example: Heights of people, standardized test scores.

2. **Exponential Distribution**:
   - Describes the time between events in a Poisson process, where events occur continuously and independently at a constant average rate.
   - Example: The time until the next earthquake in a region.

3. **Uniform Distribution**:
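   - Assigns a constant density to every value in an interval \([a, b]\), so subintervals of equal length are equally likely.
   - Example: A random number generator producing values between 0 and 1. (Rolling a fair die is the discrete analogue, with each face having probability 1/6.)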

4. **Gamma Distribution**:
   - Generalizes the exponential distribution and is used to model waiting times with shape and scale parameters.
   - Example: The time required for a certain number of events to occur.

5. **Beta Distribution**:
   - Used to model random variables that are bounded between 0 and 1, useful in Bayesian statistics.
   - Example: The distribution of probabilities in a Bayesian context.

6. **Chi-Square Distribution**:
   - Used primarily in hypothesis testing and in constructing confidence intervals for variance.
   - Example: The test statistic of a chi-square test for independence approximately follows a chi-square distribution.

### Special Cases and Derived Distributions

1. **Bernoulli Distribution**:
   - The simplest discrete distribution, describing a single experiment with two possible outcomes (success/failure).
   - Example: Flipping a single coin.

2. **Multinomial Distribution**:
   - Generalizes the binomial distribution for experiments with more than two possible outcomes.
   - Example: Counting how many times each face appears when rolling a die multiple times.

3. **Log-Normal Distribution**:
   - Describes a variable whose logarithm is normally distributed.
   - Example: Modeling stock prices.

4. **Cauchy Distribution**:
   - A distribution with very heavy tails, so extreme values are far more likely than under a normal distribution; its mean and variance are undefined.
   - Example: Certain resonance behaviors in physics.

These distributions form the basis of many statistical methods and are essential for understanding and modeling random phenomena in various fields.
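The discrete/continuous split shows up directly in `scipy.stats`: discrete distributions expose a probability mass function (`pmf`), while continuous ones expose a density (`pdf`). A minimal sketch follows. The parameter values are chosen only for illustration (for example, the rate of 4 emails per hour is hypothetical). Note that SciPy's `geom` counts trials up to and including the first success, matching the definition above.

```python
from scipy import stats

# Discrete distributions: pmf(k) = P(X = k)
binom = stats.binom(n=10, p=0.5)      # heads in 10 fair coin flips
poisson = stats.poisson(mu=4)         # hypothetical rate: 4 emails per hour
geom = stats.geom(p=1 / 6)            # rolls of a die until the first six (support starts at 1)

# Continuous distributions: pdf(x) is a density, not a probability
norm = stats.norm(loc=0, scale=1)     # standard normal
expon = stats.expon(scale=1 / 2)      # rate lambda = 2, so scale = 1 / lambda
uniform = stats.uniform(loc=0, scale=10)  # uniform on [0, 10]

print(f"P(exactly 5 heads)          = {binom.pmf(5):.4f}")
print(f"P(no emails in an hour)     = {poisson.pmf(0):.4f}")
print(f"P(first six on first roll)  = {geom.pmf(1):.4f}")
print(f"Normal density at the mean  = {norm.pdf(0):.4f}")
print(f"Uniform density on [0, 10]  = {uniform.pdf(3.0):.4f}")
print(f"Mean of Exponential(rate=2) = {expon.mean():.4f}")
```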

## Q3. Write a Python function to calculate the probability density function of a normal distribution with given mean and standard deviation at a given point.

```python
import math

def normal_pdf(x, mean, std_dev):
    """
    Calculate the probability density function of a normal distribution.

    Parameters:
    x (float): The point at which to evaluate the PDF.
    mean (float): The mean (μ) of the normal distribution.
    std_dev (float): The standard deviation (σ) of the normal distribution; must be positive.

    Returns:
    float: The value of the PDF at point x.
    """
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    # f(x) = 1 / (σ * sqrt(2π)) * exp(-(x - μ)² / (2σ²))
    coefficient = 1 / (std_dev * math.sqrt(2 * math.pi))
    exponential_term = math.exp(-0.5 * ((x - mean) / std_dev) ** 2)
    return coefficient * exponential_term

# Example usage
x = 1.0
mean = 0.0
std_dev = 1.0
pdf_value = normal_pdf(x, mean, std_dev)
print(f"The PDF value at x = {x} for a normal distribution with mean = {mean} and std_dev = {std_dev} is {pdf_value}")
```
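A hand-written `normal_pdf` is easy to get subtly wrong, so it can be worth cross-checking against `scipy.stats.norm.pdf`. The sketch below uses a vectorized NumPy variant of the same formula, `normal_pdf_vectorized` (a hypothetical name), so that a whole grid of points can be compared at once.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf_vectorized(x, mean, std_dev):
    # Same normal density formula, but x may be a scalar or a NumPy array
    x = np.asarray(x, dtype=float)
    z = (x - mean) / std_dev
    return np.exp(-0.5 * z**2) / (std_dev * np.sqrt(2 * np.pi))

xs = np.linspace(-3, 3, 7)                       # points -3, -2, ..., 3
ours = normal_pdf_vectorized(xs, mean=0.0, std_dev=1.0)
reference = norm.pdf(xs, loc=0.0, scale=1.0)     # SciPy: loc = mean, scale = std dev
print(np.allclose(ours, reference))
```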


## Q4. What are the properties of Binomial distribution? Give two examples of events where binomial distribution can be applied.
### Properties of Binomial Distribution

1. **Discrete Distribution**:
   - The number of successes is a discrete random variable: it can take only whole-number values \(0, 1, \dots, n\).

2. **Two Possible Outcomes**:
   - Each trial in a binomial experiment results in one of two outcomes, often referred to as "success" and "failure."

3. **Fixed Number of Trials**:
   - The number of trials is predetermined and does not change. For example, you might flip a coin 10 times or inspect 50 items.

4. **Independence**:
   - The outcome of any given trial does not affect the outcomes of the other trials. Each trial is independent.

5. **Constant Probability of Success**:
   - The probability of achieving a "success" is the same for each trial. This probability does not change from trial to trial.

6. **Probability Mass Function (PMF)**:
   - The probability of obtaining exactly \(k\) successes in \(n\) trials, each with success probability \(p\), is \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \) for \(k = 0, 1, \dots, n\). This gives the likelihood of exactly 0 successes, exactly 1 success, and so on, up to the total number of trials.

7. **Mean and Variance**:
   - The mean number of successes is \( \mu = np \), the average count you would expect over many repeated sets of \(n\) trials. The variance \( \sigma^2 = np(1 - p) \) measures how widely the count spreads around this mean.

8. **Symmetry**:
   - If \(p = 0.5\), the distribution is symmetric about its mean. If \(p < 0.5\), it is skewed to the right (a longer tail toward larger counts); if \(p > 0.5\), it is skewed to the left. The skew becomes less pronounced as \(n\) grows.
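The PMF, mean, variance, and symmetry properties can be checked directly from the formula with the standard library. This is a minimal sketch with illustrative values \(n = 10\), \(p = 0.5\); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the PMF itself
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"Total probability: {sum(pmf):.6f}")
print(f"Mean: {mean:.4f} (n*p = {n * p})")
print(f"Variance: {variance:.4f} (n*p*(1-p) = {n * p * (1 - p)})")
# With p = 0.5 the PMF is symmetric: P(X = k) == P(X = n - k)
print(all(abs(pmf[k] - pmf[n - k]) < 1e-12 for k in range(n + 1)))
```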

### Examples of Events Where Binomial Distribution Can Be Applied

1. **Quality Control in Manufacturing**:
   - Imagine a factory that produces light bulbs, each of which is either defective or not. If you test a fixed number of bulbs, say 100, the binomial distribution can model the number of defective bulbs, treating "defective" as the counted outcome (a "success" in the technical sense). This assumes each bulb is an independent trial and the probability of a bulb being defective stays constant.

2. **Survey Responses**:
   - Suppose a political survey asks a yes/no question about whether respondents support a particular policy. If you survey a fixed number of people, say 500, and each person independently gives a yes or no answer, the binomial distribution can be used to model the number of people who say "yes". Each response is an independent trial, and the probability of a "yes" response is constant for each person surveyed.
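Both examples can be turned into concrete probability questions with `scipy.stats.binom`. The defect rate of 2% and the 55% chance of a "yes" answer are hypothetical values chosen only for illustration; in practice they would be estimated from data.

```python
from scipy.stats import binom

# Quality control: 100 bulbs, hypothetical defect rate of 2% per bulb
n_bulbs, p_defect = 100, 0.02
p_none = binom.pmf(0, n_bulbs, p_defect)        # P(no defective bulbs)
p_at_most_2 = binom.cdf(2, n_bulbs, p_defect)   # P(2 or fewer defective bulbs)

# Survey: 500 respondents, hypothetical 55% chance that each says "yes"
n_people, p_yes = 500, 0.55
survey = binom(n_people, p_yes)

print(f"P(no defects) = {p_none:.4f}")
print(f"P(at most 2 defects) = {p_at_most_2:.4f}")
print(f"Expected 'yes' count = {survey.mean():.1f}, std dev = {survey.std():.2f}")
print(f"P(more than 300 say yes) = {survey.sf(300):.4f}")   # sf(k) = P(X > k)
```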

### Summary

The binomial distribution is used to model scenarios where you have a fixed number of trials, each with two possible outcomes (success or failure), with each trial being independent and having a constant probability of success. It helps in understanding and calculating the probabilities of different numbers of successes in such scenarios, making it applicable to various real-world situations like quality control in manufacturing and analyzing survey responses.


## Q5. Generate a random sample of size 1000 from a binomial distribution with probability of success 0.4 and plot a histogram of the results using matplotlib.


```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters for the binomial distribution
n_trials = 10       # number of trials per draw (not given in the question; assumed here)
p_success = 0.4     # probability of success on each trial
sample_size = 1000  # size of the sample

# Generate random sample (seeded so the histogram is reproducible)
rng = np.random.default_rng(seed=42)
sample = rng.binomial(n=n_trials, p=p_success, size=sample_size)

# Plot histogram: one bar per possible count 0..n_trials, centred on the integer
plt.hist(sample, bins=range(n_trials + 2), edgecolor='black', align='left')
plt.title('Histogram of Binomial Distribution Sample')
plt.xlabel('Number of Successes')
plt.ylabel('Frequency')
plt.xticks(range(n_trials + 1))
plt.show()
```
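The histogram can also be checked numerically: the observed frequency of each count should be close to `sample_size * P(X = k)`. This sketch mirrors the `n_trials = 10` assumption used in the histogram cell (the question itself does not fix the number of trials) and uses an arbitrary seed.

```python
import numpy as np
from scipy.stats import binom

n_trials, p_success, sample_size = 10, 0.4, 1000   # n_trials = 10 is an assumption
rng = np.random.default_rng(seed=0)
sample = rng.binomial(n=n_trials, p=p_success, size=sample_size)

# Observed counts for k = 0..n_trials, and the counts the binomial PMF predicts
observed = np.bincount(sample, minlength=n_trials + 1)
expected = sample_size * binom.pmf(np.arange(n_trials + 1), n_trials, p_success)

for k in range(n_trials + 1):
    print(f"k={k:2d}  observed={int(observed[k]):4d}  expected={expected[k]:7.1f}")
print(f"Sample mean {sample.mean():.3f} vs n*p = {n_trials * p_success}")
```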


## Q6. Write a Python function to calculate the cumulative distribution function of a Poisson distribution with given mean at a given point.

```python
from scipy.stats import poisson

def poisson_cdf(mean, k):
    """
    Calculate the cumulative distribution function of a Poisson distribution.

    Parameters:
    mean (float): The mean (λ) of the Poisson distribution.
    k (int): The point at which to evaluate the CDF.

    Returns:
    float: The value of the CDF at point k.
    """
    return poisson.cdf(k, mean)

# Example usage
mean = 3.0
k = 5
cdf_value = poisson_cdf(mean, k)
print(f"The CDF value at k = {k} for a Poisson distribution with mean = {mean} is {cdf_value:.4f}")
```
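To show what `poisson.cdf` computes, here is a hedged standard-library sketch that sums the PMF \( e^{-\lambda}\lambda^i / i! \) for \(i = 0, \dots, k\) directly; `poisson_cdf_manual` is a hypothetical name. For very large \( \lambda \), `math.exp(-mean)` underflows to 0, so a library routine is the safer choice there.

```python
import math

def poisson_cdf_manual(mean, k):
    # P(X <= k) = sum over i = 0..k of exp(-mean) * mean**i / i!
    if k < 0:
        return 0.0
    term = math.exp(-mean)          # i = 0 term
    total = term
    for i in range(1, math.floor(k) + 1):
        term *= mean / i            # recurrence: term_i = term_(i-1) * mean / i
        total += term
    return total

print(f"{poisson_cdf_manual(3.0, 5):.4f}")
```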


## Q7. How is the Binomial distribution different from the Poisson distribution?

The Binomial and Poisson distributions are both used to model different types of discrete random events, but they have distinct characteristics and are used in different scenarios.

### Key Differences Between Binomial and Poisson Distributions

1. **Nature of Trials**:
   - **Binomial Distribution**: Deals with a fixed number of trials or experiments. Each trial has two possible outcomes (success or failure), and the trials are independent of each other. It is used when the number of trials is known and fixed.
   - **Poisson Distribution**: Models the number of events occurring within a fixed interval of time or space. The number of events is not fixed in advance, and it is used when events happen independently at a constant average rate.

2. **Probability of Success**:
   - **Binomial Distribution**: The probability of success is constant across all trials. Each trial has the same probability of success.
   - **Poisson Distribution**: There is no per-trial probability of success; the only parameter is the average rate of occurrence \( \lambda \). The Poisson distribution can be viewed as the limit of a binomial distribution as \(n \to \infty\) and \(p \to 0\) with \(np = \lambda\) held fixed.

3. **Number of Trials**:
   - **Binomial Distribution**: Requires a specified number of trials. For instance, you might flip a coin 10 times and want to know the probability of getting exactly 4 heads.
   - **Poisson Distribution**: Does not require a fixed number of trials. It is used for counting the number of occurrences of an event over a continuous range, such as the number of phone calls received at a call center in an hour.

4. **Application**:
   - **Binomial Distribution**: Useful in scenarios where there is a fixed number of trials and a known probability of success, such as quality control testing, survey responses, or any situation with a predetermined number of attempts.
   - **Poisson Distribution**: Suitable for modeling rare events that happen over a fixed interval of time or space, such as the number of accidents at a traffic intersection, the number of emails received in a day, or the number of mutations in a given length of DNA.

5. **Distribution Shape and Behavior**:
   - **Binomial Distribution**: The shape can vary depending on the number of trials and the probability of success. It can be symmetric or skewed.
   - **Poisson Distribution**: Typically skewed, especially when the average rate of occurrence is low. As the mean rate increases, the distribution becomes more symmetric and resembles a normal distribution.

In summary, the Binomial distribution is used for a fixed number of trials with a constant probability of success, while the Poisson distribution is used for counting occurrences of events over a continuous interval with a constant average rate.
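The Poisson distribution is the limit of a binomial with many trials and a small success probability (\(n \to \infty\), \(p \to 0\), \(np = \lambda\) fixed), which is why the two are often confused. A short sketch comparing their PMFs with `scipy.stats`, using hypothetical values \(n = 1000\), \(p = 0.003\) (so \( \lambda = 3 \)):

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.003      # hypothetical: many trials, small success probability
lam = n * p             # matching Poisson rate, lambda = 3
ks = np.arange(0, 11)

binom_probs = binom.pmf(ks, n, p)
poisson_probs = poisson.pmf(ks, lam)
max_gap = np.max(np.abs(binom_probs - poisson_probs))

for k, b, q in zip(ks, binom_probs, poisson_probs):
    print(f"k={k:2d}  binomial={b:.5f}  poisson={q:.5f}")
print(f"Largest absolute difference: {max_gap:.5f}")
```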

## Q8. Generate a random sample of size 1000 from a Poisson distribution with mean 5 and calculate the sample mean and variance.

```python
import numpy as np

# Parameters for the Poisson distribution
mean = 5
sample_size = 1000

# Generate random sample from Poisson distribution
sample = np.random.poisson(lam=mean, size=sample_size)

# Calculate sample mean and (unbiased) sample variance; ddof=1 divides by n - 1
# For a Poisson distribution both should be close to the rate parameter, 5
sample_mean = np.mean(sample)
sample_variance = np.var(sample, ddof=1)

print(f"Sample Mean: {sample_mean:.4f}")
print(f"Sample Variance: {sample_variance:.4f}")
```


## Q9. How are the mean and variance related in the Binomial and Poisson distributions?
In both the Binomial and Poisson distributions, the mean and variance are related, but their relationship differs due to the nature of each distribution.

### Binomial Distribution

1. **Mean and Variance**:
   - **Mean**: The mean (or expected value) of a binomial distribution is the number of trials \(n\) times the probability of success \(p\): \( \mu = np \).
   - **Variance**: The variance is the product of \(n\), \(p\), and the probability of failure \((1 - p)\): \( \sigma^2 = np(1 - p) \).

   **Relationship**:
   - Substituting \( \mu = np \) gives \( \sigma^2 = \mu(1 - p) \): for a fixed \(p\), the variance is proportional to the mean with factor \((1 - p)\). Since \(0 < 1 - p < 1\) whenever \(0 < p < 1\), the variance is always smaller than the mean.

### Poisson Distribution

1. **Mean and Variance**:
   - **Mean**: The mean (or expected value) of a Poisson distribution is equal to the rate parameter \( \lambda \), which represents the average number of occurrences in a fixed interval of time or space.
   - **Variance**: The variance of a Poisson distribution is also equal to the rate parameter \( \lambda \).

   **Relationship**:
   - In a Poisson distribution, the mean and variance are exactly equal. This means that the dispersion of the distribution is directly related to the average rate of occurrences.

### Summary

- **Binomial Distribution**: The variance \( \sigma^2 = \mu(1 - p) \) is the mean scaled by the probability of failure, so it is smaller than the mean whenever \(0 < p < 1\); the larger \(p\), the smaller the variance relative to the mean.

- **Poisson Distribution**: The mean and variance are always equal (both are \( \lambda \)), so a single parameter controls both the center and the spread of the distribution.

In summary, both distributions tie the variance to the mean, but differently: the Poisson distribution has equal mean and variance, while the Binomial distribution's variance is the mean multiplied by \((1 - p)\), so it also depends on the probability of success.
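A quick simulation sketch makes the contrast concrete: for a binomial sample the variance-to-mean ratio should sit near \(1 - p\), while for a Poisson sample it should sit near 1. The parameters (\(n = 20\), \(p = 0.3\), \( \lambda = 6 \)) and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
size = 100_000

# Binomial(n=20, p=0.3): theory gives mean n*p = 6.0 and variance n*p*(1-p) = 4.2
binom_sample = rng.binomial(n=20, p=0.3, size=size)
# Poisson(lambda=6): theory gives mean = variance = 6.0
poisson_sample = rng.poisson(lam=6.0, size=size)

for name, s in [("Binomial(20, 0.3)", binom_sample), ("Poisson(6)", poisson_sample)]:
    m, v = s.mean(), s.var(ddof=1)
    print(f"{name:18s} mean={m:.3f} variance={v:.3f} variance/mean={v / m:.3f}")
```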

## Q10. In a normal distribution, where does the least frequent data appear relative to the mean?

In a normal distribution, the least frequent data points appear at the extremes, farthest from the mean. 

### Explanation:

- **Mean Position**: In a normal distribution, the mean is at the center of the distribution, and the data is symmetrically distributed around this mean.
  
- **Frequency Distribution**: The frequency of data points follows a bell-shaped curve. Data points that are closer to the mean occur more frequently, while those further away from the mean occur less frequently.

- **Least Frequent Data**: The data points that are farthest from the mean (i.e., in the tails of the distribution) are the least frequent. As you move away from the mean towards either end of the distribution, the probability density decreases. This results in fewer occurrences of extreme values compared to values closer to the mean.
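How quickly the tails thin out can be quantified with the error function: the fraction of a normal distribution lying more than \(k\) standard deviations from the mean is \(1 - \operatorname{erf}(k/\sqrt{2})\), whatever the mean and standard deviation are. Here is a small standard-library sketch; `two_tailed_fraction` is a hypothetical helper name.

```python
import math

def two_tailed_fraction(k):
    # Fraction of a normal distribution lying more than k standard deviations
    # from the mean: P(|X - mu| > k * sigma) = 1 - erf(k / sqrt(2))
    return 1 - math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"Beyond ±{k} sd: {two_tailed_fraction(k):.4%}")
```

This reproduces the familiar 68-95-99.7 rule: roughly 31.7% of values lie beyond ±1 standard deviation, but only about 0.27% lie beyond ±3.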

### Summary

In a normal distribution, the least frequent data points are located at the tails, which are far from the mean. The frequency of data points decreases as you move away from the center of the distribution towards the extremes.