# <center>PROBABILITY</center>

## Introduction to Probability

<b>What is Probability</b>
- Probability is simply how likely an event will occur.
- Whenever we're unsure about the outcome of an event, we can talk about the probabilities of how likely certain outcomes will occur.
- Reasoning about probability is an important skill for data scientists, since most data is affected by chance.
  
P(A) = $\frac{n}{N}$ = $\frac{outcomes\ in\ A}{outcomes\ in\ Sample\ Space}$

<br>

<b>Why Probability?</b>  
- With randomness existing everywhere, the use of probability theory allows for the analysis of chance events.
- Our aim is to determine the likelihood of an event occurring, using a numerical scale from 0 to 1, where 0 indicates impossibility and 1 indicates certainty.
- Example: Tossing a fair coin has a 50% probability of landing heads.

<br>

<b>Examples:</b>

- There are 6 balls, 3 are red, 2 are yellow and 1 is blue.  
    What is the probability of picking a yellow ball?  
    <u>solution</u>  
    P(Yellow) = $\frac{no.\ of\ Yellow\ balls}{total\ no.\ of\ balls}$
                = $\frac{2}{6}$ = 0.33
                
<br>
 
- There is a container full of coloured bottles, red, blue, green and orange. 
    Some of the bottles are picked out and replaced. A person did this 1000 times and got the following results:
    - No. of blue bottles picked: 300
    - No. of red bottles: 200
    - No. of green bottles: 450
    - No. of orange bottles: 50  
    What is the probability that he will pick a green bottle?  
    <u>solution</u>  
    P(Green) = $\frac{frequency\ of\ Green\ bottles}{total\ frequency\ of\ bottles}$
                = $\frac{450}{1000}$ = 0.45  
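
Both worked examples above divide a count of favourable outcomes by a total count. A minimal Python sketch of that calculation follows; the helper name `probability` is an illustrative choice, not something defined in these notes.

```python
from fractions import Fraction

def probability(favourable, total):
    """Return favourable / total as an exact fraction."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need total > 0 and 0 <= favourable <= total")
    return Fraction(favourable, total)

# Example 1: 6 balls, of which 2 are yellow.
balls = {"red": 3, "yellow": 2, "blue": 1}
p_yellow = probability(balls["yellow"], sum(balls.values()))

# Example 2: observed frequencies over 1000 picks.
bottles = {"blue": 300, "red": 200, "green": 450, "orange": 50}
p_green = probability(bottles["green"], sum(bottles.values()))

print(p_yellow, round(float(p_yellow), 2))  # 1/3 0.33
print(p_green, float(p_green))              # 9/20 0.45
```

Using `Fraction` keeps the ratios exact, so rounding only happens when the value is displayed.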
                               
<br>

<b>Probability Theory: Case Study</b>  
- Probability theory is a tool employed by researchers, businesses, investment analysts and countless others for risk management and scenario analysis.
- Small Business:
    - If a business enterprise expects to receive between 500k and 700k in revenue each month, a graph of the possible revenue values will run from 500k at the low end to 700k at the high end. For a typical probability distribution, the graph will resemble a bell curve, where the least likely outcomes fall near the extreme ends of the range and the most likely fall near the midpoint.  
    
<br>

<b>Usage of Probability in Data Science:</b>
- Confidence intervals in statistics, which give a range of plausible values for an estimated quantity at a chosen confidence level.
- Probability distribution:
    - Not every type of data can be analyzed in the same way. Some data follow a Normal, Poisson, Bernoulli or Binomial distribution, while others follow an Exponential distribution.
- Bayes Theorem - Naive Bayes Algorithm
- Conditional Probability
- Central Limit Theorem
- Markov Chains and Hidden Markov Models.

##  Key Terminologies

- Random Experiment:
    - In ML, we mostly deal with uncertain events. A random experiment is an experiment whose outcome is not known with certainty.

- Sample Space: 
    - It is the universal set that consists of all possible outcomes of an experiment.
    - Example: Outcomes of College Application. S = {admitted, not admitted}

- Event:
    - Subset of sample space. Probability is usually calculated with respect to an event.
    - Example: Getting a head on a coin toss.

<br>

<b>Random Variable:</b>
- A variable whose value is unknown, or more formally a function that assigns a numerical value to each of an experiment's outcomes.
- It can be classified as discrete or continuous depending on the values it can take.
- If a random variable X can take only a finite or countably infinite set of values, it is called a discrete random variable.
- If a random variable X can take any value in an interval (an uncountably infinite set of values), it is called a continuous random variable.
- Example: Outcome of a coin toss. If the random variable X is the number of heads we get from tossing two coins, then X could be 0, 1 or 2.
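
The two-coin example can be checked by listing the sample space and applying the random variable to each outcome. This is a small sketch assuming fair coins; the function name `number_of_heads` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two fair coin tosses: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: maps an outcome to a real number."""
    return outcome.count("H")

# Count how many equally likely outcomes map to each value of X.
counts = Counter(number_of_heads(outcome) for outcome in sample_space)
distribution = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```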

<br>

<b>Examples of Discrete Random Variable:</b>
- Credit Rating (finite category)
- No. of orders received.
- Customer churn: whether a customer stopped using your company's product. (Yes or No)
- Fraud Detection (Yes or No)

<br>

<b>Examples of Continuous Random Variable:</b>
- Market share of a company (any value from an infinite set between 0 and 100).
- Percentage of attrition (reduction) of employees.
- Time to failure of an Engineering System.
- Time taken to complete the order.

## Rules of Probability

<b>Rule 1:</b>   
The probability of an impossible event is 0. The probability of a certain event is 1.  
Therefore, for any event A, the probability lies between 0 and 1, i.e. 0 ≤ P(A) ≤ 1.

<br>

<b>Rule 2:</b>  
For S, the sample space of all possibilities, P(S) = 1.  
i.e. the sum of the probabilities of all possible outcomes is equal to 1.

<br>

<b>Rule 3:</b>  
For any event A, P(A') = 1 - P(A).  
Similarly, P(A) = 1 - P(A').

<br>

<b>Rule 4 (Addition Rule OR):</b>  
This is the probability that either one or both events occur.
- If two events, say A and B, are mutually exclusive,  
    i.e. A and B have no outcomes in common then,  
    P(A ∪ B) = P(A) + P(B)
    
- If two events are NOT mutually exclusive then,   
    P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

<br>

<b>Rule 5 (Multiplication Rule AND):</b>  
This is the probability that both events occur.
- P(A ∩ B) = P(A)*P(B|A) or P(B)*P(A|B)
- If A and B are independent, i.e. neither event influences or affects the probability that the other event occurs, then  
    P(A ∩ B) = P(A)*P(B).  
    This particular rule extends to more than two independent events.  
    For example, P(A ∩ B ∩ C) = P(A)*P(B)*P(C)
    
<br>

<b>Rule 6 (Conditional Probability):</b>  
- P(A|B) = $\frac{P(A\ ∩\ B)}{P(B)}$ or P(B|A) = $\frac{P(A\ ∩\ B)}{P(A)}$  
    Note: this straight line symbol, |, does not mean divide!  
    This symbol means "conditional" or "given".  
    For instance P(A|B) means the probability that event A occurs given event B has occurred.
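
Rules 3 to 6 can be verified by enumeration on a small sample space. The sketch below assumes one roll of a fair die, with events A (roll is even) and B (roll is greater than 3) chosen purely for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)

def p(event):
    """Probability of an event, given equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

# Note: in Python, `A | B` is set union and `A & B` is set intersection;
# neither is the "given" bar used in P(A|B).
assert p(sample_space - A) == 1 - p(A)      # Rule 3: complement
assert p(A | B) == p(A) + p(B) - p(A & B)   # Rule 4: addition
p_A_given_B = p(A & B) / p(B)               # Rule 6: conditional
assert p(A & B) == p(B) * p_A_given_B       # Rule 5: multiplication

print(p_A_given_B)  # 2/3
```

Knowing that the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3, because two of the three remaining outcomes (4 and 6) are even.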

## Marginal, Joint & Conditional Probability

<b>Case Study: Netflix:</b>  
Let us look at the frequency table below.  
The table shows the frequency of male and female population that watch the mentioned tv shows.  
<img src='assets/Netflix - Case Study.png' width=400>  

<br>

Dividing every entry in the table by the total sample population, 500, gives values between 0 and 1.  
The table thus obtained is called a Probability Distribution Table.  
<img src='assets/Netflix - Probability Distribution Table.png' width=400>  

<br>

Here, the ones that are highlighted in green are called <u>Marginal Probability</u> and the ones that are highlighted in blue are called <u>Joint Probability</u>.  
For example, the value 0.16 represents the chance of two events occurring together: a subscriber being Male and liking Money Heist. It is therefore called a Joint Probability.  
The value 0.46 is a Marginal Probability because it is the probability of selecting a Male, regardless of show preference. It describes the probability of occurrence of a single event.

<img src='assets/Netflix - Joint & Marginal Probability.png' width=400>

<br>

<b>Joint Probability vs Marginal Probability vs Conditional Probability:</b>
- Joint Probability:
    - It is a statistical measure of the likelihood of two events occurring together.

- Marginal Probability:
    - It is the probability of an event irrespective of the outcome of another variable.

- Conditional Probability:
    - It is the probability of occurrence of an event given that another event has already occurred.

<br>

<u>Question 1:</u>  
What is the probability of a Netflix Subscriber being Male?
- P(Male) = 0.46


<u>Question 2:</u>  
What is the probability of a Netflix Subscriber preferring Money Heist?
- P(Money Heist) = 0.4

<u>Question 3:</u>  
What is the probability of a Netflix Subscriber being a Male and preferring Breaking Bad?
- P(Male ∩ Breaking Bad) = 0.2

<u>Question 4:</u>  
What is the probability of a Netflix Subscriber being a Female or preferring Breaking Bad?
- P(Female ∪ Breaking Bad) = P(Female) + P(Breaking Bad) - P(Female ∩ Breaking Bad)  
   = 0.54 + 0.25 - 0.05 = 0.74
   
<u>Question 5:</u>  
Samrat is a new Netflix Subscriber, what is the chance that he would like Breaking Bad?
- We know Samrat is a Male and we need to know the probability of Samrat preferring Breaking Bad.  
    Here, we use Conditional Probability.  
    P(Breaking Bad | Male) = $\frac{P(Breaking\ Bad\ ∩\ Male)}{P(Male)}$  = $\frac{0.2}{0.46}$ = 0.43

<u>Question 6:</u>  
Hinata is a new Netflix Subscriber, what is the chance that she would like Money Heist?
- Using Conditional Probability,  
    P(Money Heist | Female) = $\frac{P(Money\ Heist\ ∩\ Female)}{P(Female)}$  = $\frac{0.24}{0.54}$ = 0.44
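
The marginal and conditional probabilities in the questions above can be recomputed from a joint table. The sketch below uses only the joint values quoted in the text. Since the table image is not reproduced here, the third column is an assumption: it is inferred as the remainder of each row total (0.46 and 0.54), and the name "Other show" is a placeholder.

```python
from fractions import Fraction

# Keys are (gender, show). "Other show" values are inferred, not quoted.
joint = {
    ("Male", "Money Heist"): Fraction("0.16"),
    ("Male", "Breaking Bad"): Fraction("0.20"),
    ("Male", "Other show"): Fraction("0.10"),
    ("Female", "Money Heist"): Fraction("0.24"),
    ("Female", "Breaking Bad"): Fraction("0.05"),
    ("Female", "Other show"): Fraction("0.25"),
}

def marginal(index, value):
    """Sum joint probabilities over the other variable (0 = gender, 1 = show)."""
    return sum(p for key, p in joint.items() if key[index] == value)

def conditional(show, gender):
    """P(show | gender) = P(show and gender) / P(gender)."""
    return joint[(gender, show)] / marginal(0, gender)

p_male = marginal(0, "Male")
p_money_heist = marginal(1, "Money Heist")
p_female_or_bb = (marginal(0, "Female") + marginal(1, "Breaking Bad")
                  - joint[("Female", "Breaking Bad")])

print(float(p_male), float(p_money_heist), float(p_female_or_bb))  # 0.46 0.4 0.74
print(round(float(conditional("Breaking Bad", "Male")), 2))        # 0.43
print(round(float(conditional("Money Heist", "Female")), 2))       # 0.44
```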

## Disjoint & Non Disjoint Events

 <b>Disjoint Events:</b>
- Events that cannot happen at the same time.
- Also known as <u>Mutually Exclusive</u> events.
- Examples:
    - The outcome of a single coin toss cannot be a head and a tail.
    - A student cannot both fail and pass a subject.
    - A single card drawn from a deck cannot be an Ace and a Queen at the same time.
<img src='assets/Disjoint Set.png' width=400>

<br>

<u>Union of Disjoint Events:</u>
- For disjoint events A and B, the probability of A or B happening is:
    - P(A ∪ B) = P(A) + P(B)
- Example: Probability of drawing a Jack or a Queen from a deck of cards is:
    = P(Jack) + P(Queen) = 4/52 + 4/52 = 0.154
    
<b>Non-Disjoint Events:</b>
- Events can happen at the same time.
- Also known as <u>Mutually Inclusive</u> events.
- Examples:
    - A student can get an A in Statistics and A in History at the same time.
    - A single card drawn from a deck can be both a Jack and a Diamond.
<img src='assets/Non-Disjoint Set.png' width=400>


<br>

<u>Union of Non-Disjoint Events:</u>
- For non-disjoint events A and B, the probability of A or B happening is:
    - P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- Example: Probability of drawing an Ace or a Heart from a deck of cards is:
    = P(Ace) + P(Heart) - P(Ace ∩ Heart) = 4/52 + 13/52 - 1/52 = 0.308
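
Both union formulas can be confirmed by building a standard 52-card deck and counting cards. This is a sketch; the rank and suit labels are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

def p(event):
    """Probability of drawing a card that belongs to the event."""
    return Fraction(len(event), len(deck))

jacks = {card for card in deck if card[0] == "J"}
queens = {card for card in deck if card[0] == "Q"}
aces = {card for card in deck if card[0] == "A"}
hearts = {card for card in deck if card[1] == "Hearts"}

# Disjoint events: no card is both a Jack and a Queen.
assert not (jacks & queens)
p_jack_or_queen = p(jacks | queens)
assert p_jack_or_queen == p(jacks) + p(queens)

# Non-disjoint events: the Ace of Hearts is in both sets,
# so it must be subtracted once to avoid double counting.
p_ace_or_heart = p(aces | hearts)
assert p_ace_or_heart == p(aces) + p(hearts) - p(aces & hearts)

print(p_jack_or_queen, round(float(p_jack_or_queen), 3))  # 2/13 0.154
print(p_ace_or_heart, round(float(p_ace_or_heart), 3))    # 4/13 0.308
```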

## Dependent and Independent Events

In probability and statistics, events are often classified as dependent or independent. As a basic rule of thumb, the existence or absence of an event can provide clues about other events. 


<b>Dependent Events:</b>
- For events to be considered as dependent, one must have an influence over how probable another is.
- In some cases, a dependent event can only occur if another event occurs first.
- Provides Information about other events.
- Examples: 
    - You want to go on a vacation at the end of next month, but that depends on having enough money to cover the trip. You may be counting on a bonus.
    - One must buy a lottery ticket to have a chance at winning. Your odds of winning are increased if you buy more than one ticket. 
    
<b>Independent Events:</b>
- An event is considered as independent, when the event is not connected to another event.
- Its probability of happening, or conversely, of not happening is not affected by another event.
- Provides no information about other events.
- Examples:
    - The color of your hair has absolutely no effect on where you work. The two events of 'having black hair' and 'working in Google' are completely independent of one another.
    - Getting a 6 on a die and getting a promotion at work are completely independent.
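
One way to test independence numerically is to check whether P(A ∩ B) equals P(A) * P(B). The sketch below applies that check to two assumed examples (a coin tossed together with a die, and two events on a single die); the helper `is_independent` is illustrative.

```python
from fractions import Fraction
from itertools import product

def p(event, space):
    """Probability of an event over equally likely outcomes."""
    return Fraction(len(event & space), len(space))

def is_independent(a, b, space):
    """A and B are independent exactly when P(A and B) = P(A) * P(B)."""
    return p(a & b, space) == p(a, space) * p(b, space)

# Independent: the coin result tells us nothing about the die.
coin_and_die = set(product("HT", range(1, 7)))
heads = {o for o in coin_and_die if o[0] == "H"}
six = {o for o in coin_and_die if o[1] == 6}
print(is_independent(heads, six, coin_and_die))  # True

# Dependent: knowing the roll is at most 3 changes the chance it is even.
die = set(range(1, 7))
even = {2, 4, 6}
at_most_three = {1, 2, 3}
print(is_independent(even, at_most_three, die))  # False
```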

## Multiplicative Rule of Probability

- The multiplication rule is a way to find the probability of two events happening at the same time.
- There are two forms of this Rule:
    - Specific Multiplication Rule.
    - General Multiplication Rule.
    
<b>Specific Multiplication Rule:</b>
- This is used to calculate the joint probability of independent events.
- To use this rule, multiply the probabilities for independent events.
- With independent events, the occurrence of event A does not affect the likelihood of event B.
- P(A ∩ B) = P(A)*P(B)

Examples:
- Calculate the probability of obtaining 'heads' during two consecutive coin flips.  
    - P(Head ∩ Head) = P(Head) * P(Head) = 0.5*0.5 = 0.25
    
- You have 10 pants, out of which 3 are Tan. You have 16 shirts, out of which 5 are Red. Selection of pants does not affect the likelihood of drawing a Red shirt.
    - P(Tan Pants ∩ Red Shirt) = P(Tan Pants) * P(Red Shirt) = 0.3*0.3125 = 0.094
    
<br>

<b>General Multiplication Rule:</b>
- Use the general Multiplication rule to calculate joint probabilities for either independent or dependent events.
- Helps us to factor in how the occurrence of Event A affects the likelihood of Event B.
- P(A ∩ B) = P(A) * P(B|A)
- The joint probability of A and B occurring equals the probability of A occurring multiplied by the conditional probability of B occurring given that A has occurred.
- For independent events, P(B|A) = P(B) so,  
    P(A ∩ B) = P(A) * P(B)

Examples:
- Draw cards from a deck of cards without replacement, and find the probability of drawing hearts on two consecutive draws. 
    - P(H1 ∩ H2) = P(H1) * P(H2|H1) = 13/52 * 12/51 = 0.25*0.235 = 0.059
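
The two-hearts example can be computed with the general multiplication rule and cross-checked by counting ordered pairs of distinct cards. This is a sketch in which the deck is reduced to "heart" and "not heart" labels for brevity.

```python
from fractions import Fraction
from itertools import permutations

# General multiplication rule: P(H1 and H2) = P(H1) * P(H2 | H1).
p_h1 = Fraction(13, 52)           # 13 hearts among 52 cards
p_h2_given_h1 = Fraction(12, 51)  # 12 hearts left among 51 cards
p_both = p_h1 * p_h2_given_h1

# Cross-check: enumerate every ordered draw of two different cards.
deck = ["H"] * 13 + ["X"] * 39    # "X" stands for any non-heart
draws = list(permutations(deck, 2))
favourable = sum(1 for first, second in draws if first == "H" and second == "H")
p_enumerated = Fraction(favourable, len(draws))

print(p_both, round(float(p_both), 3))  # 1/17 0.059
print(p_enumerated == p_both)           # True
```

With replacement, the second factor would stay at 13/52 and the result would be 1/16 instead of 1/17.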

## Bayes Theorem

- Bayes' Theorem, named after 18th-century British mathematician, Thomas Bayes, is a mathematical formula for determining conditional probability.
- Conditional probability is the likelihood of an outcome occurring, given that another event has already occurred.  
P(A|B) = $\frac{P(B|A)\ *\ P(A)}{P(B)}$


Example:
- Given that first card was an ace, what is the probability of drawing a second ace from a deck?
- Given that test was positive, what is the probability of having a disease?
- Given that a person likes fiction, what is the probability that he would like Game of Thrones?
    
<br>

<b>Usage of Bayes Theorem:</b>
- Bayes Theorem shows the relation between a conditional probability and its reverse form.
- If we know P(A|B), we can use Bayes Theorem to find the reverse probability, i.e. P(B|A).
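
As a sketch of this reversal, the function below applies Bayes' formula to the Netflix values used earlier in these notes: P(Breaking Bad | Male) = 0.2/0.46, P(Male) = 0.46 and P(Breaking Bad) = 0.25. The function name `bayes` is an illustrative choice.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

p_bb_given_male = Fraction("0.2") / Fraction("0.46")  # known direction
p_male = Fraction("0.46")
p_bb = Fraction("0.25")

# Reverse direction: probability that a Breaking Bad fan is Male.
p_male_given_bb = bayes(p_bb_given_male, p_male, p_bb)
print(p_male_given_bb, float(p_male_given_bb))  # 4/5 0.8
```

The result agrees with the direct calculation P(Male ∩ Breaking Bad) / P(Breaking Bad) = 0.2 / 0.25 = 0.8.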

## Applications of Bayes Theorem in Data Science

<b>App I: Testing Hypothesis:</b>
- Hypothesis approximates a target function which needs to be tested on data.  
  For example: mean weight of newborn baby is 3.5 kg.
- Bayes Theorem allows us to test whether the hypothesis holds true for the given data as P(h|D),  
  which means, probability of "hypothesis" being true for given "Data".
  
<br>

<b>App II: Classification:</b>
- When the possible values are categorical, Bayes Theorem can be applied for classification problems.  
  For example: whether a customer defaults on credit card payment or not based on the account balance.
- Naive Bayes Classifier is one such implementation of Bayes Theorem.
 
 <br>
 
 <b>App III: Model Optimization:</b>
 - Optimizing machine learning models involves finding an input that minimizes or maximizes an objective function.
 - Bayes Theorem applies probability to find out these values.
 - Bayesian Optimization is a technique used to improve the performance of a machine learning model.

## Random Variables

- A function that maps every outcome in the sample space to a real number is called a random variable.
- Example:   
    We perform an experiment of tossing a coin. The sample space contains a head and a tail, i.e. {Head, Tail}.  
    A random variable X maps every outcome in the sample space to a real number, for example by counting heads.  
    For the outcome Head, X(Head) = 1  
    For the outcome Tail, X(Tail) = 0  
    For a fair coin, each value occurs with probability 0.5, i.e. P(X = 1) = P(X = 0) = 0.5.
- There are two types of random variables:
    - Discrete Random Variable.
    - Continuous Random Variable.
    
<br>

<b>Discrete Random Variables:</b>
- Described using probability mass function (PMF) & cumulative distribution function (CDF).
- PMF is the probability that a random variable X takes a specific value k.
    - Probability that the number of fraudulent transactions at an e-commerce platform is exactly 10.
- CDF is the probability that a random variable X takes a value less than or equal to k.
    - Probability that the number of fraudulent transactions at an e-commerce platform is less than or equal to 10.
    
<br>

<b>Continuous Random Variables:</b>
- Described using probability density function (PDF) & cumulative distribution function (CDF).
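- The PDF gives the probability density at each value of a continuous random variable X. The probability that X falls in a range is the area under the PDF over that range; the probability of any single exact value is 0.
    - Probability that the price of a house is between 100k and 125k.
- CDF is the probability that a continuous random variable X takes a value less than or equal to k.
    - Probability that the price of a house is less than or equal to 125k.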

## Probability Distribution Functions

#### a) Probability Distribution Functions for Discrete Random Variables

<b>Binomial Distribution:</b>
- This is a discrete probability distribution.
- Used in scenarios where each trial has only two possible outcomes (success or failure).
- Objective is to find probability of getting x successes out of n-trials.
- If the probability of success is p, then the probability of failure is (1-p).
- PMF and CDF are used.
- Example: Loan Repayment Default.
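
A minimal sketch of the binomial PMF and CDF using only the standard library. The loan figures (10 loans, each defaulting independently with probability 0.1) are assumed purely for illustration and do not come from these notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): at most k successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n_loans = 10      # assumed number of loans
p_default = 0.1   # assumed default probability per loan

print(round(binomial_pmf(2, n_loans, p_default), 4))  # 0.1937
print(round(binomial_cdf(2, n_loans, p_default), 4))  # 0.9298
```

Here "success" means a default, so the first value is the probability of exactly two defaults and the second is the probability of at most two.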

<br>

<b>Geometric Distribution:</b>
- Occurs when we count the number of independent Bernoulli Trials until the first success.
- X = no. of trials until the first success.
- X will be distributed with Geometric Distribution.
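
For a success probability p, the first success lands on trial k with probability (1 - p)^(k - 1) * p. The sketch below uses an assumed example, rolling a fair die until the first six, and keeps results exact with `Fraction`.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k): the first success happens on trial k (k = 1, 2, ...)."""
    if k < 1:
        raise ValueError("k counts trials, so it must be at least 1")
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """P(X <= k): at least one success within the first k trials."""
    if k < 1:
        raise ValueError("k counts trials, so it must be at least 1")
    return 1 - (1 - p) ** k

p_six = Fraction(1, 6)  # assumed example: success = rolling a six

print(geometric_pmf(3, p_six))  # 25/216
print(geometric_cdf(3, p_six))  # 91/216
```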

#### b) Probability Distribution Functions for Continuous Random Variables

<b>Uniform Distribution:</b>
- It is a probability distribution with constant probability density over an interval, so sub-intervals of equal length are equally likely.
- Also known as Rectangular Distribution.

<br>

<b>Exponential Distribution:</b>
- This models the time that elapses between events in a process.
- Traditionally used for modelling time-to-failure of electronic components.
- This represents a process in which events occur continuously and independently at a constant average rate.
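
With a constant average rate λ, the exponential CDF is 1 - e^(-λt) and the mean waiting time is 1/λ. The sketch below assumes a component failure rate of 0.5 per year, a made-up figure for illustration only.

```python
import math

def exponential_pdf(t, rate):
    """Density of the waiting time at t (zero for negative t)."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0

def exponential_cdf(t, rate):
    """P(T <= t): the event has happened by time t."""
    return 1 - math.exp(-rate * t) if t >= 0 else 0.0

failure_rate = 0.5  # assumed: 0.5 failures per year on average

p_fail_within_2_years = exponential_cdf(2, failure_rate)
mean_time_to_failure = 1 / failure_rate

print(round(p_fail_within_2_years, 4))  # 0.6321
print(mean_time_to_failure)             # 2.0
```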

<br>

<b>Normal Distribution:</b>
- One of the most important probability distributions in the field of statistics.
- Fits several natural phenomena.
- Many ML algorithms and statistical methods assume that the data follows a Normal Distribution.
- Example: Measurement error, heights, IQ scores and blood pressure approximately follow a Normal Distribution.

Graph of Normal Distribution depends on Mean and Standard Deviation.  
Mean - Determines the location of the center of the graph.  
Standard Deviation - Determines the spread (width) of the graph, i.e. how far the data are from the mean. A larger standard deviation gives a wider and flatter curve.
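
The effect of the two parameters can be seen with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The parameter values below are arbitrary and chosen only to contrast the curves.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)     # same center, larger spread
shifted = NormalDist(mu=3, sigma=1)  # same spread, different center

# A larger standard deviation lowers the peak and widens the curve.
print(round(standard.pdf(0), 4))  # 0.3989
print(round(wide.pdf(0), 4))      # 0.1995

# Changing the mean moves the peak without changing its height.
print(round(shifted.pdf(3), 4))   # 0.3989

# Probabilities come from the CDF: area within one standard deviation.
within_one_sd = standard.cdf(1) - standard.cdf(-1)
print(round(within_one_sd, 4))    # 0.6827
```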