## Probability For Data Science And ML:

![alt text](../Images/math/probability/probability.png)

**Contents:**

- Introduction to Probability
- Probability Rules
- Conditional Probability
- Bayes Theorem
- Combination
- Permutation
- Probability Distribution

### Introduction To Probability:

**Probability:** A measure of how likely an event is to occur. When all outcomes are equally likely:

$$
P(A) = \frac{\text{number of outcomes in which } A \text{ occurs}}{\text{total number of possible outcomes}}
$$

- P(A): denotes the probability of event A occurring
- A probability always lies between 0 (the event is impossible) and 1 (the event is certain)

**Basic Concepts:**

- **Experiment**: Any process that generates well-defined outcomes
- **Outcome**: A result of an experiment
- **Sample Space (S)**: The set of all possible outcomes of an experiment
- **Event**: A subset of the sample space, representing a collection of outcomes
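
As a minimal sketch of these definitions, assuming a fair six-sided die (an illustrative choice, not part of the notes above), the sample space and an event can be modelled as Python sets, and the probability computed as a ratio of counts:

```python
from fractions import Fraction

# Sample space of one roll of a fair six-sided die (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

# Event A: the roll is even. An event is a subset of the sample space.
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}

# Classical probability: outcomes in A / all equally likely outcomes.
p_a = Fraction(len(event_a), len(sample_space))
print(p_a)  # 1/2
```

`Fraction` keeps the result exact, which makes it easy to compare against hand calculations.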

### **Probability Rules**:

1. **Addition Rule:**
- **Disjoint Events (Mutually Exclusive)**: If two events, A and B, are mutually exclusive (they cannot both occur simultaneously), then the probability of either event A or event B occurring is the sum of their individual probabilities:
*P*(*A*∪*B*)=*P*(*A*)+*P*(*B*)
    - *P*(*A*∪*B*) is the probability that either event A or event B (or both) occurs
    - *P*(*A*) is the probability of event A occurring
    - *P*(*B*) is the probability of event B occurring
- **Non-Disjoint Events**: If events A and B are not mutually exclusive (they can both occur simultaneously), then the probability of either event A or event B occurring is the sum of their individual probabilities minus the probability of their intersection:
*P*(*A*∪*B*)=*P*(*A*)+*P*(*B*)−*P*(*A*∩*B*)
    - *P*(*A*∪*B*) is the probability that either event A or event B (or both) occurs
    - *P*(*A*) is the probability of event A occurring
    - *P*(*B*) is the probability of event B occurring
    - *P*(*A*∩*B*) is the probability of both event A and event B occurring simultaneously

2. **Multiplication Rule:**
- **Independent Events**: If events A and B are independent (the occurrence of one event does not affect the probability of the other; for example, when drawing with replacement, the number of possible outcomes does not shrink), then the probability of both events A and B occurring is the product of their individual probabilities:
*P*(*A*∩*B*)=*P*(*A*)×*P*(*B*)
    - *P*(*A*∩*B*) is the probability of both A and B happening
    - *P*(*A*) is the probability of event A occurring
    - *P*(*B*) is the probability of event B occurring
- **Dependent Events**: If events A and B are dependent (the occurrence of one event affects the occurrence of the other), then the probability of both events A and B occurring is the product of the probability of A and the conditional probability of B given A:
*P*(*A*∩*B*)=*P*(*A*)×*P*(*B*∣*A*)
    - *P*(*A*∩*B*) is the probability of both A and B happening
    - *P*(*A*) is the probability of event A occurring
    - *P*(*B*∣*A*) is the conditional probability of event B occurring given that event A has already occurred

![probchest.png](attachment:probchest.png)
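
The rules above can be checked by brute-force counting. The sketch below assumes a standard 52-card deck and one uniformly random draw (an illustrative setup, not taken from the notes); the helper `prob` is a hypothetical name introduced here:

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

def prob(event):
    """Probability of an event (a set of cards) for one uniformly random draw."""
    return Fraction(len(event), len(deck))

hearts = {card for card in deck if card[1] == "hearts"}
spades = {card for card in deck if card[1] == "spades"}
kings = {card for card in deck if card[0] == "K"}

# Addition rule, disjoint events: a card cannot be both a heart and a spade.
assert prob(hearts | spades) == prob(hearts) + prob(spades)

# Addition rule, non-disjoint events: the king of hearts must be counted once.
assert prob(hearts | kings) == prob(hearts) + prob(kings) - prob(hearts & kings)

# Multiplication rule, independent events: rank and suit are independent.
assert prob(hearts & kings) == prob(hearts) * prob(kings)

# Multiplication rule, dependent events: two kings drawn without replacement,
# P(first king) * P(second king | first king).
p_two_kings = Fraction(4, 52) * Fraction(3, 51)
print(p_two_kings)  # 1/221
```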

### **Conditional Probability:**

Conditional probability is the likelihood of an event occurring given that another event has already occurred.

**Formally:**

$$
P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0
$$

- *P*(*B*∣*A*): The conditional probability of event B occurring given that event A has already occurred
- *P*(*A*∩*B*): The probability of both events A and B occurring simultaneously
- *P*(*A*): The probability of event A occurring
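
A small worked example may help here. The sketch assumes two fair dice, with A = "the first die shows 6" and B = "the sum is at least 10" (both events are illustrative choices), and applies the definition directly by counting outcomes:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of the outcomes for which event(outcome) is True."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(outcome):
    return outcome[0] == 6          # A: first die shows 6

def b(outcome):
    return sum(outcome) >= 10       # B: sum is at least 10

p_a = prob(a)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_b_given_a = p_a_and_b / p_a       # P(B|A) = P(A and B) / P(A)

print(prob(b), p_b_given_a)  # 1/6 1/2
```

Knowing that A occurred raises the probability of B from 1/6 to 1/2, which also shows that A and B are dependent.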

### Bayes' Theorem:

Bayes' theorem describes how to update the probability of an event A after observing evidence B, by relating *P*(*A*∣*B*) to *P*(*B*∣*A*).

Bayes' Theorem states:

$$
P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}
$$

Where:

- *P*(*A*∣*B*) is the probability of event A occurring given that event B has occurred. This is called the posterior probability
- *P*(*B*∣*A*) is the probability of event B occurring given that event A has occurred. This is called the likelihood
- *P*(*A*) is the prior probability of event A occurring before observing event B
- *P*(*B*) is the overall (marginal) probability of observing event B, which must be greater than 0
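
To make the roles of prior, likelihood, and posterior concrete, here is a minimal sketch for a diagnostic test. All numbers are hypothetical, and *P*(*B*) is obtained with the law of total probability, which is an extra step not spelled out above:

```python
# Hypothetical numbers for a diagnostic test (not from the notes above).
p_disease = 0.01             # P(A): prior probability of having the disease
p_pos_given_disease = 0.95   # P(B|A): likelihood of a positive test if diseased
p_pos_given_healthy = 0.05   # P(B|not A): false positive rate

# P(B) via the law of total probability.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small.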

### **Combination**:

A combination is a selection of items from a larger set, where the order of selection doesn't matter. Combinations are used to count the number of ways to choose a subset of items without regard to the order in which they are chosen.

For example, if you have a set of 5 distinct items and you want to choose 3 of them, there are *C*(5, 3) = 5! / (3! × 2!) = 10 combinations.

![comb.png](attachment:comb.png)
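
The count for the 5-choose-3 example can be checked in Python. This sketch uses `math.comb` for the formula and `itertools.combinations` to list the subsets; the item labels are placeholders:

```python
from itertools import combinations
from math import comb

items = ["a", "b", "c", "d", "e"]   # placeholder labels for 5 distinct items

n_combinations = comb(5, 3)          # 5! / (3! * 2!)
subsets = list(combinations(items, 3))

print(n_combinations)  # 10
print(subsets[0])      # ('a', 'b', 'c')
```

Since order does not matter, `('a', 'b', 'c')` appears once and no reordering of it is listed.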


### **Permutation:**

A permutation is an arrangement of items from a larger set in a specific order. Permutations are used to count the number of ways to order a subset of items, taking into account the order in which the items are chosen.

For example, if you have a set of 5 distinct items and you want to arrange 3 of them in a specific order, there are *P*(5, 3) = 5! / (5 − 3)! = 60 permutations.

![perm.png](attachment:perm.png)
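
The same 5-items, choose-3 setup can be counted with order taken into account. This sketch uses `math.perm` and `itertools.permutations`; the item labels are placeholders:

```python
from itertools import permutations
from math import comb, factorial, perm

items = ["a", "b", "c", "d", "e"]   # placeholder labels for 5 distinct items

n_permutations = perm(5, 3)          # 5! / (5 - 3)!
arrangements = list(permutations(items, 3))

print(n_permutations)  # 60

# Each 3-item combination can be ordered in 3! = 6 ways.
assert n_permutations == comb(5, 3) * factorial(3)
```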

### Probability Distribution

A probability distribution gives the probability of each possible outcome of a random experiment or event.

### Binomial Distribution

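- Describes the number of successes in a fixed number of independent trials
- Each trial has only two possible outcomes (success/failure), with the same success probability on every trial
- Understanding the binomial distribution is essential for analyzing experiments with binary outcomes and making predictions based on repeated trials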

The formula for the probability mass function (PMF) of the binomial distribution, which gives the probability of observing exactly *k* successes in *n* trials, is:

![alt text](<../Images/math/probability/binomial formula.png>)

The formula above lets you calculate the probability of each possible number of successes in a fixed number of independent trials with two possible outcomes.

Example: 

Suppose we have a biased coin, where the probability of getting heads (*H*) on any single flip is *p* = 0.6. We want to find the probability of getting exactly 3 heads in 5 flips.

- *n* = 5 (total number of flips)
- *k* = 3 (number of heads)
- *p* = 0.6 (probability of getting heads)
- *q* = 1 − *p* = 1 − 0.6 = 0.4 (probability of getting tails)

$$
P(X=3) = \binom{5}{3} \times (0.6)^3 \times (0.4)^{5-3}
$$

Using the binomial coefficient formula:

$$
\binom{n}{k} = \frac{n!}{k! \times (n-k)!}
$$

we get:

$$
\binom{5}{3} = \frac{5!}{3! \times (5-3)!} = \frac{5 \times 4}{2 \times 1} = 10
$$

Now, plugging in the values:

$$
P(X=3) = 10 \times (0.6)^3 \times (0.4)^2 = 0.3456
$$

`Note`: The probability of getting exactly 3 heads in 5 flips of the biased coin is 0.3456, or 34.56%.
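
The coin example can be reproduced in a few lines. This is a sketch of the binomial PMF written directly from the formula; `binomial_pmf` is a name chosen here for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 flips with P(heads) = 0.6.
prob_3_heads = binomial_pmf(3, 5, 0.6)
print(round(prob_3_heads, 4))  # 0.3456

# The probabilities over k = 0..5 cover every case, so they sum to 1.
total = sum(binomial_pmf(k, 5, 0.6) for k in range(6))
```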

### Bernoulli Distribution

- Describes a single trial
- It has two possible outcomes (success/failure)
- It is the special case of the binomial distribution with *n* = 1

**Probability Mass Function (PMF):** The probability mass function of the Bernoulli distribution gives the probability of observing outcome *k* (1 for success, 0 for failure), where *p* is the probability of success and *q* = 1 − *p*:

$$
 P(X=k) =
  \begin{cases}
    p     & \text{if $k = 1$}, \\
    q & \text{if $k = 0$}.
  \end{cases}
$$
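
A minimal sketch of this PMF, together with a small simulation of repeated single trials. The success probability 0.6 and the seed are arbitrary choices made for the example:

```python
import random

def bernoulli_pmf(k, p):
    """P(X = k) for a Bernoulli trial with success probability p."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0  # outcomes other than 0 and 1 are impossible

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.6                 # assumed success probability
trials = [1 if rng.random() < p else 0 for _ in range(10_000)]

# The observed fraction of successes should land close to p.
success_rate = sum(trials) / len(trials)
print(bernoulli_pmf(1, p), success_rate)
```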

### Probability Density Function (PDF)

- The PDF is the derivative of the CDF
- Describes the probability distribution of a continuous random variable
- Provides information about the relative likelihood of the random variable taking values near a given point; the probability of any single exact value is 0
- It's commonly denoted as f(x), where *x* is a value of the variable and f(x) is the probability density at that point. The probability of the variable falling within a certain range is the area under the PDF curve over that range


![alt text](../Images/math/probability/Probability-Density-Function.png)

Mathematically, the PDF *f*(*x*) of a continuous random variable *X* is defined such that the probability of *X* falling in any interval [*a*, *b*] is given by the integral of *f*(*x*) over that interval:

$$
P(a \le X \le b) = \int_{a}^{b} f(x)\; dx
$$
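
The integral can be approximated numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (chosen only because its integrals are easy to check by hand) and a simple midpoint rule; `integrate` is an illustrative helper, not a library function:

```python
def pdf(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(0.2 <= X <= 0.5) is the area under the curve between 0.2 and 0.5.
print(round(integrate(pdf, 0.2, 0.5), 4))  # 0.21

# The total area under any PDF is 1.
print(round(integrate(pdf, 0.0, 1.0), 4))  # 1.0
```

Note that `pdf(0.9)` is 1.8: a density value can exceed 1, because only areas under the curve are probabilities.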

### Cumulative Distribution Function (CDF)

- Describes the probability distribution of a random variable, whether it is discrete or continuous
- The CDF of a random variable X is the function F(x) that gives the probability that X takes on a value less than or equal to x

Mathematically, for a continuous random variable, the CDF is given by:

$$
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\;dt
$$

where:

- *F*(*x*) is the cumulative distribution function
- *f*(*t*) is the probability density function (PDF) of *X*
- *x* is the value at which the CDF is evaluated

For a discrete random variable, the CDF is defined as:

$$
F(x) = P(X \le x) = \sum_{t \le x} P(X = t)
$$

- Where the sum is taken over all possible values t of X that are less than or equal to x

`Note:` The CDF gives the probability accumulated up to a point *x*. It never decreases, approaches 0 as x decreases without bound, and approaches 1 as x increases.

![alt text](../Images/math/probability/fdf_cdf.png)
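
For the discrete case, the CDF is just a running total of the individual probabilities. A minimal sketch, assuming a fair six-sided die as the example variable:

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6            # fair die: each face has probability 1/6

# F(x) = P(X <= x): cumulative sum of the PMF up to and including x.
cdf = dict(zip(values, accumulate(pmf)))

print(cdf[4])  # 2/3
print(cdf[6])  # 1
```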

### Probability Mass Function

- Describes the probability distribution of a discrete random variable
- It gives the probability that a discrete random variable is exactly equal to some value
- The Probability Mass Function (PMF) of a discrete random variable X is a function that gives the probability that X takes on a specific value x. Mathematically, it's denoted as p(x) = *P*(*X*=*x*), and it assigns a probability to each possible value that the random variable can take

![alt text](../Images/math/probability/pmf.png)
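
A PMF can be built by counting outcomes. This sketch assumes the random variable X is the sum of two fair dice (an illustrative choice) and tabulates p(x) = P(X = x) for every possible value:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: probability of each exact value of the sum.
pmf = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

print(pmf[7])   # 1/6
print(pmf[2])   # 1/36
```

The values of a PMF always sum to 1, which is a quick sanity check for any table like this.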

### Uniform Distribution

- Also known as the rectangular distribution
- The probability is the same for all outcomes
- The distribution is flat, with no peaks

![alt text](<../Images/math/probability/unifrom distribution.jpg>)
![alt text](../Images/math/probability/pdfformulas.webp)
![alt text](../Images/math/probability/cumluformula.webp)
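
A minimal sketch of the PDF and CDF, assuming the continuous uniform distribution on an interval [a, b]; the function names and the interval [2, 10] are illustrative:

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# P(3 <= X <= 5) for X uniform on [2, 10]: interval length 2 out of 8.
print(uniform_cdf(5, 2, 10) - uniform_cdf(3, 2, 10))  # 0.25
```

The density is the constant 1 / (b − a) inside the interval, which is what makes the curve flat.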

### Normal Distribution or Gaussian Distribution (Bell Curve)

- Values near the mean occur more frequently than values far from the mean
- Mean: center of the bell
- Standard Deviation: spread of the bell

The probability density function f(x) of the normal distribution is given by:

$$
f(x) = \frac{1}{{\sigma \sqrt {2\pi } }}e ^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}
$$

![alt text](../Images/math/probability/nd.png)
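
The density formula translates directly into code. The sketch below implements it by hand and compares it with `statistics.NormalDist` from the standard library; the mean 170 and standard deviation 10 are hypothetical values:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density, written term by term from the formula."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

mu, sigma = 170.0, 10.0                 # hypothetical mean and standard deviation
at_mean = normal_pdf(170, mu, sigma)    # peak of the bell
far_away = normal_pdf(200, mu, sigma)   # three standard deviations out

# Density is highest at the mean and drops off quickly in the tails.
assert at_mean > far_away

# The hand-written formula agrees with the standard library.
assert abs(at_mean - NormalDist(mu, sigma).pdf(170)) < 1e-12
```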

### Standard Normal Distribution

- A normal distribution whose mean is zero and whose standard deviation is one
- The standard normal distribution is extensively used in statistical hypothesis testing, particularly in calculating *p*-values

When *μ* = 0 and *σ* = 1, the normal distribution is referred to as the standard normal distribution. Its probability density function simplifies to:

$$
f(x) = \frac{1}{{\sqrt {2\pi } }}e ^{-\frac{1}{2}x^2}
$$

![alt text](../Images/math/probability/snd.png)
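
As a rough illustration of the link to *p*-values, the sketch below standardizes an observation with z = (x − μ) / σ (a standard step not stated above) and computes a two-sided *p*-value from the standard normal CDF. The observation 185 and the parameters 170 and 10 are hypothetical:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

z = z_score(185, 170, 10)   # hypothetical observation, mean, and std. dev.
print(z)  # 1.5

# Two-sided p-value: probability of a value at least this far from 0.
p_two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
print(round(p_two_sided, 4))  # 0.1336
```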

### Log-Normal Distribution

- This distribution is applicable when the logarithm of the variable is normally distributed
- The distribution is skewed to the right

![alt text](../Images/math/probability/lognormal-distribution.png)
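
Both properties can be seen in a simulation. This sketch draws samples with `random.lognormvariate`, using arbitrary parameters (0 and 1 for the underlying normal) and a fixed seed, then checks the logarithms and the skew:

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Parameters 0.0 and 1.0 describe the underlying normal distribution (assumed).
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]

# Taking logs should recover roughly normal data with mean 0 and std. dev. 1.
logs = [math.log(x) for x in samples]
log_mean = statistics.mean(logs)
log_stdev = statistics.stdev(logs)

# Right skew: a few very large values pull the mean above the median.
sample_mean = statistics.mean(samples)
sample_median = statistics.median(samples)
print(sample_mean > sample_median)
```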

### Chi-Squared Distribution

- The chi-squared distribution with *k* degrees of freedom is the distribution of the sum of the squares of *k* independent standard normal random variables. It is widely used in hypothesis testing and confidence interval construction, and it is a special case of the gamma distribution
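
A simulation sketch, assuming the usual definition of a chi-squared variable with *k* degrees of freedom as the sum of the squares of *k* independent standard normal draws. The choice *k* = 3 and the seed are arbitrary; the theoretical mean *k* and variance 2*k* are standard results not stated above:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
k = 3                   # degrees of freedom (arbitrary choice)

def chi_squared_sample(k):
    """One draw: sum of squares of k independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

samples = [chi_squared_sample(k) for _ in range(20_000)]

sample_mean = statistics.mean(samples)      # should be close to k
sample_var = statistics.variance(samples)   # should be close to 2 * k
print(sample_mean, sample_var)
```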

### Pareto Distribution

- The Pareto distribution is characterized by a power-law tail, which gives it a heavy right tail
- It is often used to model the distribution of wealth, income, or other quantities where a small number of entities hold the majority of the resources
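
The "small number holds the majority" idea can be quantified. The sketch below relies on a standard result for the Pareto (Type I) distribution with shape α > 1, which is not derived in these notes: the top fraction *p* of entities holds a share *p*^(1 − 1/α) of the total. The function name is illustrative:

```python
from math import log

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a
    Pareto (Type I) distribution with shape alpha > 1."""
    return p ** (1 - 1 / alpha)

# Shape that produces the familiar 80/20 split: alpha = log(5) / log(4).
alpha = log(5) / log(4)
print(round(alpha, 3))                   # 1.161
print(round(top_share(0.2, alpha), 3))   # 0.8

# A larger alpha means a lighter tail and a less extreme concentration.
print(round(top_share(0.2, 3.0), 3))     # 0.342
```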