# Probability Distribution

**Agenda**
1. Introduction to Probability Distributions

2. Types of Probability Distributions

3. Discrete Probability Distribution
   * Bernoulli Distribution
   * Binomial Distribution
   * Poisson Distribution
   * Geometric Distribution
   * Negative Binomial Distribution

4. Continuous Probability Distribution
   * Uniform Distribution
   * Normal (Gaussian) Distribution
   * Exponential Distribution
   * Gamma Distribution
   * Beta Distribution
   * Chi-Square Distribution

5. Advanced Topics
   * Central Limit Theorem
   * Relationship Between Distributions
   * Sampling Distributions
   * Student's t-Distribution
   * F-Distribution
   * Bayesian Statistics
   * Lognormal Distribution
   * Pareto Distribution

### In-class Notes
- A random variable is a variable that takes on numerical values as a result of a random experiment or measurement; it associates a numerical value with each possible outcome.
- The differences between a variable and a random variable are:
    - A random variable always takes numerical values.
    - There is a probability associated with each possible value.
- Types of random variables:
    - Discrete Random Variable
    - Continuous Random Variable
- Examples of random variables:
    - Discrete Random Variable:
        1. X= Number of correct answers in a 100-MCQ test= 0, 1, 2, …, 100
        2. X= Number of cars passing a toll booth in a day= 0, 1, 2, …, ∞
        3. X= Number of balls required to take the first wicket = 1, 2, 3, …, ∞
    - Continuous Random Variable:
        1. X= Weight of a person. 0<X<∞
        2. X= Monthly Profit. -∞<X<∞
- Probability Distribution: 
    - Distribution of the probabilities among the different values of a random variable.
    - A mathematical function, which yields the probabilities of different possible outcomes for an experiment.
- Probability distributions describe how the values of a random variable are distributed. They can be broadly categorized into two types: discrete and continuous distributions, depending on whether the random variable is discrete or continuous.

    - Discrete Probability Distribution: 
        - probability distribution of a discrete random variable
        - Deals with discrete random variables (e.g., number of heads in a coin toss).
    - Continuous Probability Distribution: 
        - probability distribution of a continuous random variable
        - Deals with continuous random variables (e.g., height, weight).
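
As a minimal sketch of a discrete probability distribution, the Python snippet below builds the PMF of the number of heads by enumerating every equally likely outcome. The choice of three fair coin tosses and the names `N_TOSSES`, `outcomes`, and `pmf` are assumptions made for illustration, not part of the class notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption for illustration: three tosses of a fair coin.
N_TOSSES = 3

# Enumerate all equally likely outcomes, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=N_TOSSES))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: p(x) = (number of outcomes with x heads) / (total number of outcomes).
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")

# A valid PMF must sum to exactly 1.
total = sum(pmf.values())
print("Total probability:", total)
```

Exact fractions are used instead of floats so that the check that the probabilities sum to 1 is not affected by rounding.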

- Types of probability distributions:
    - The type of probability distribution depends on the data.
    - Data can either be discrete or continuous
    - Probability Mass Function (PMF):
        - The probability distribution function of a discrete random variable X is called a pmf and is denoted by p(x).
        - Used to calculate the probability of each possible value of discrete data.
        - e.g., the probability that a user clicks a particular ad out of three available ads.
    - Probability Density Function (PDF):
        - The probability distribution function of a continuous random variable X is called a pdf and is denoted by f(x)
        - Returns the relative likelihood (density) of a continuous random variable at a value; the probability of any single exact value is 0.
        - Integrating the density over an interval gives the probability of that interval.
        - e.g., the probability that a stock price falls within a certain range.
    - Cumulative Distribution Function (CDF): describes the probability of a random variable being less than or equal to a specified value.
        - CDFs are used to determine probabilities, calculate percentiles, and understand the distribution of data.
        - The upper limit of the CDF is 1, because the total probability cannot exceed 1. Equivalently, the total area under the PDF curve (not the CDF curve) is always 1.
        - Answers the question of how much of the data's probability has been accumulated.
        - The gradient of the CDF is the PDF
        - The area to the left of a point on the PDF is the CDF value corresponding to that point.
        - ex: The percentage of customers spending below the threshold amount.
    - Relationship between PDF and CDF
        - PDF => f(x)
        - CDF => F(x)
        - $\frac{dF(x)}{dx} = f(x)$
            - The formula $\frac{dF(x)}{dx} = f(x)$ expresses the relationship between the cumulative distribution function (CDF), $F(x)$, and the probability density function (PDF), $f(x)$, for a continuous random variable. 

            - The CDF, $F(x)$, gives the probability that the random variable $X$ is less than or equal to $x$.
            - The PDF, $f(x)$, describes the likelihood of $X$ taking a specific value (in a density sense).

            **Relationship:**  
            The PDF is the derivative of the CDF. In other words, the rate at which the cumulative probability increases with $x$ is given by the value of the PDF at $x$. Conversely, the CDF is the integral (accumulation) of the PDF up to $x$:
            $$
            F(x) = \int_{-\infty}^{x} f(t) \, dt
            $$

            This means:
            - The PDF shows how probability is distributed locally (at each point).
            - The CDF shows the total probability accumulated up to a point.  
            - The area under the PDF curve up to $x$ equals the CDF at $x$.
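        - Worked example (an illustrative choice of density, not taken from the class notes): assume $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ and $f(x) = 0$ otherwise, with $\lambda > 0$. Integrating the PDF gives the CDF, and differentiating the CDF recovers the PDF:
            $$
            F(x) = \int_{0}^{x} \lambda e^{-\lambda t} \, dt = 1 - e^{-\lambda x}, \qquad \frac{dF(x)}{dx} = \lambda e^{-\lambda x} = f(x), \quad x \geq 0
            $$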
    - Expectation = Weighted average of possible outcomes.
    - Mathematical Expectations of Discrete Random variables
        - For a discrete random variable X with pmf p(x), the mathematical expectation of X is:
            $$
            E[X] = \sum_{x} x \cdot p(x)
            $$
    - Mathematical Expectations of Continuous Random Variable
        - For a continuous random variable X with pdf f(x), the mathematical expectation of X is:
            $$
            E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
            $$
        - Here, values vary continuously rather than in jumps, so the sum is replaced with an integral.
    - Mathematical expectation is also known as the population mean or expected value.
    - The discrete case uses sums and probabilities; the continuous case uses integrals and densities.
    - Both give the “center”, or average value, you expect in the long run.
    - Mathematical Expectation of $X^2$:
        - For a discrete random variable $X$ with pmf $p(x)$:
            $$
            E[X^2] = \sum_{x} x^2 \cdot p(x)
            $$
        - For a continuous random variable $X$ with pdf $f(x)$:
            $$
            E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f(x) \, dx
            $$
    - Variance:
        $$
        \mathrm{Var}(X) = E[X^2] - (E[X])^2
        $$
    - Standard Deviation:
        $$
        \sigma_X = \sqrt{\mathrm{Var}(X)}
        $$
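    - Worked example (a fair six-sided die, chosen here only as an illustration), with $p(x) = \frac{1}{6}$ for $x = 1, \dots, 6$:
        $$
        E[X] = \frac{1 + 2 + \dots + 6}{6} = 3.5, \qquad E[X^2] = \frac{1 + 4 + \dots + 36}{6} = \frac{91}{6}
        $$
        $$
        \mathrm{Var}(X) = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \approx 2.92, \qquad \sigma_X = \sqrt{\frac{35}{12}} \approx 1.71
        $$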
    - Properties of Mathematical Expectations
        - ![image.png](attachment:image.png)

    - Common Types of Discrete Value Distribution:
        1. Bernoulli Distribution:
            - Bernoulli trial: a trial that has only two possible outcomes (often called ‘Success’ and ‘Failure’).
            - Single event with a binary outcome.
            - ex: Success or failure of a product in the market (1 or 0).
        2. Binomial Distribution
            - Number of successes in a fixed number of Bernoulli trials.
            - Multiple independent events, each with a binary outcome and the same success probability.
            - ex: Number of defective products in a batch.
        3. Poisson Distribution
            - Models the count of rare events over time or space.
            - Events occur independently at a constant average rate over a fixed interval.
            - ex: Number of attacks in the next hour.
        4. Negative Binomial Distribution
            - Sequence of independent Bernoulli trials, continued until a fixed number of successes occurs.
            - Useful when the number of trials is not fixed.
            - ex: Customer churn
    - Common Types of Continuous Value Distribution:
        1. Uniform Distribution
            - ![image-3.png](attachment:image-3.png)
            - f(x) is constant over the range of x.
        2. Normal (Gaussian) Distribution
            - ![image-2.png](attachment:image-2.png)
            - Follows a bell curve.
            - PDF
                - $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
                - The two parameters are:
                    - $\mu$ or the mean
                    - $\sigma$ or the standard deviation
        3. Exponential Distribution
            - ![image-4.png](attachment:image-4.png)
            - A continuous random variable X is said to follow an exponential distribution with rate $\lambda > 0$ if its pdf is:
            $$
            f(x) = 
            \begin{cases}
            \lambda e^{-\lambda x}, & x \geq 0 \\
            0, & x < 0
            \end{cases}
            $$
            - The Exponential distribution is a continuous probability distribution often used to model the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. It is widely used in various fields, including reliability analysis, queuing theory, and survival analysis.
            - Memoryless Property: The Exponential distribution is memoryless, meaning that the probability of an event occurring in the future is independent of how much time has already passed.
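            - In symbols (a standard derivation, added here as an illustration): since $P(X > t) = e^{-\lambda t}$ for $t \geq 0$, for any $s, t \geq 0$:
            $$
            P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)
            $$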

            - X is usually the time until a certain event occurs.
        4. Lognormal Distribution
            - Models positively skewed data; X is lognormal when ln(X) is normally distributed.
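
To make the common distributions above concrete, here is a hedged Python sketch that evaluates a few of them using only the standard library. The binomial and Poisson PMF formulas are the standard textbook ones and are not written out in the notes. The helper names (`binomial_pmf`, `poisson_pmf`, `exponential_cdf`) and all numeric inputs are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def exponential_cdf(x, lam):
    """P(X <= x) for an exponential waiting time with rate lam."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


# Hypothetical numbers, chosen only for illustration.
# Binomial: exactly 2 defective items in a batch of 10, assuming each item
# is defective independently with probability 0.1.
p_two_defects = binomial_pmf(2, n=10, p=0.1)

# Poisson: no attacks in the next hour, assuming 3 attacks per hour on average.
p_no_attacks = poisson_pmf(0, lam=3)

# Exponential: the next event arrives within half an hour at a rate of 3 per hour.
p_within_half_hour = exponential_cdf(0.5, lam=3)

# Normal: share of values within one standard deviation of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
p_within_one_sigma = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"Binomial    P(X = 2)       = {p_two_defects:.4f}")
print(f"Poisson     P(X = 0)       = {p_no_attacks:.4f}")
print(f"Exponential P(X <= 0.5)    = {p_within_half_hour:.4f}")
print(f"Normal      P(|Z| <= 1)    = {p_within_one_sigma:.4f}")
```

The discrete helpers return probabilities of exact values, while the continuous cases are evaluated through the CDF, because a continuous random variable has zero probability at any single point.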

- Probabilities and percentiles are found by integrating the probability density function.
- Deriving the mean and variance also requires integration.
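
The two points above can be checked numerically. The sketch below approximates the integrals with a simple midpoint rule for an exponential density. The rate `LAM = 2.0`, the cutoff `UPPER = 50.0` standing in for infinity, and the helper `integrate` are all assumptions made for illustration; a numerical library would normally be used instead of a hand-written rule.

```python
import math

LAM = 2.0  # assumed rate parameter, for illustration only


def pdf(x):
    """Exponential density: LAM * exp(-LAM * x) for x >= 0, else 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


UPPER = 50.0  # stands in for infinity; the tail beyond this point is negligible

# Probability: P(X <= 1) is the area under the PDF from 0 to 1.
prob_le_1 = integrate(pdf, 0.0, 1.0)

# Mean: E[X] is the integral of x * f(x).
mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)

# Variance: Var(X) = E[X^2] - (E[X])^2.
second_moment = integrate(lambda x: x**2 * pdf(x), 0.0, UPPER)
variance = second_moment - mean**2

print(f"P(X <= 1) ~ {prob_le_1:.4f} (exact: {1 - math.exp(-LAM):.4f})")
print(f"E[X]      ~ {mean:.4f} (exact: {1 / LAM:.4f})")
print(f"Var(X)    ~ {variance:.4f} (exact: {1 / LAM**2:.4f})")
```

The exact values printed alongside come from the closed forms for the exponential distribution, namely $1 - e^{-\lambda}$, $1/\lambda$, and $1/\lambda^2$.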

- Central Limit Theorem
    - "As n increases, the distribution of the sample mean or sum approaches a normal distribution."
    - The Central Limit Theorem states that the sampling distribution of the sample mean approximates a normal distribution as the sample size increases, regardless of the population's distribution, provided the observations are independent and the population has finite variance.
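
A small simulation can illustrate the theorem. The sketch below repeatedly draws samples from a skewed exponential population and looks at how the sample means are distributed. The rate, sample size, number of repetitions, and seed are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

RATE = 1.0         # assumed exponential rate; population mean and sd are both 1 / RATE
SAMPLE_SIZE = 50   # n, the number of observations in each sample
N_SAMPLES = 5_000  # number of repeated samples


def sample_mean(n):
    """Mean of n independent draws from a skewed (exponential) population."""
    return statistics.fmean(random.expovariate(RATE) for _ in range(n))


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

# The sample means should centre on the population mean ...
mean_of_means = statistics.fmean(sample_means)

# ... with a spread close to (population sd) / sqrt(n).
sd_of_means = statistics.pstdev(sample_means)
predicted_sd = (1 / RATE) / SAMPLE_SIZE**0.5

# For a normal distribution, roughly 68% of values lie within one sd of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / N_SAMPLES

print(f"Mean of sample means: {mean_of_means:.3f} (population mean: {1 / RATE:.3f})")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted: {predicted_sd:.3f})")
print(f"Share within one SD:  {within_one_sd:.3f}")
```

Although the exponential population is strongly skewed, the share of sample means within one standard deviation should come out near the 68% expected for a normal distribution. Increasing `SAMPLE_SIZE` should tighten the approximation.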




In [1]:
1**0

1