## Introduction to Conditional Probability and Bayes theorem for data science professionals

[Reference](https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/)

### Introduction

Solutions to many data science problems are often probabilistic in nature. Hence, a better understanding of probability will help you understand and implement the underlying algorithms more effectively.

In this article, we will focus on conditional probability. We strongly recommend going through [this article](https://www.analyticsvidhya.com/blog/2017/02/basic-probability-data-science-with-examples/?utm_source=blog&utm_medium=ConditionalProbabilityBayesTheoremarticle) before proceeding further.

A predictive model can easily be understood as a statement of conditional probability. For example, the probability of a customer from segment A buying a product of category Z in the next 10 days is 0.80. In other words, the probability of a customer buying a product from category Z, given that the customer is from segment A, is 0.80.

**Table of Contents**
1. Events – Union, Intersection & Disjoint events
2. Independent, Dependent and Exclusive events
3. Conditional Probability
4. Bayes Theorem
5. Probability trees
6. Frequentist vs Bayesian definitions of probability
7. Example Challenges

### 1. Events – Union, Intersection & Disjoint events

Before we explore conditional probability, let us define some common terminology:

**Events**

An event is an outcome (or a set of outcomes) of a random experiment. Getting heads when we toss a coin is an event. <br>
We assign probabilities to events by defining the event and the sample space.

The __sample space__ is the collection of all possible outcomes of an experiment. <br>
For example, the sample space for a single throw of a die is {1,2,3,4,5,6}.

An event can also be a combination of different events.

**Union of Events**

Let event C be getting a 4 or a 6 when we roll a fair die. Event C is the __union__ of two events:

    Event A = Getting a 4

    Event B = Getting a 6

    P (C) = P (A ∪ B)
   
![image.png](attachment:image.png)

In simple terms, we consider P (A ∪ B) when we are interested in the probability that at least one of two (or more) events occurs.
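
To make the union concrete, here is a minimal Python sketch that models events as sets of outcomes. It assumes a fair die, so all six outcomes are equally likely. The helper `prob` and the constant `SAMPLE_SPACE` are illustrative names and do not come from the original article. The sketch also checks the inclusion-exclusion rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

```python
from fractions import Fraction

# Sample space for one throw of a fair die; all outcomes are equally likely
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

A = {4}      # getting a 4
B = {6}      # getting a 6
C = A | B    # union: getting a 4 or a 6

print(prob(C))  # 1/3
# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(prob(A) + prob(B) - prob(A & B) == prob(C))  # True
```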

**Intersection of Events**

For two events A and B in general, let event C be the __intersection__ of A and B, i.e., the event that both A and B occur. Its probability is written as:

    P (C) = P (A ∩ B)
    
![image.png](attachment:image.png)

The shaded region represents the outcomes in which both events A and B occur together. Its probability is P (A ∩ B).
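
The intersection can be sketched in the same way. The original text keeps A and B abstract here, so the two events below are hypothetical choices made for illustration: rolling an even number, and rolling a number greater than 3. Both are evaluated on a fair die, and `prob` is again an illustrative helper.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

A = {2, 4, 6}   # hypothetical event: an even number
B = {4, 5, 6}   # hypothetical event: a number greater than 3
C = A & B       # intersection: both A and B occur

print(sorted(C))  # [4, 6]
print(prob(C))    # 1/3
```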

**Disjoint Events**

What if two particular events cannot occur at the same time?

For example: Let’s say you have a fair die and you have only one throw.

    Event A = Getting a multiple of 3

    Event B = Getting a multiple of 5

You want both events A and B to occur together.

Let's list the outcomes that make up events A and B.

    Event A = {3,6}

    Event B = {5}

    Sample Space= {1,2,3,4,5,6}

As you can see, there is no outcome for which events A and B occur together, so P (A ∩ B) = 0. Such events are called disjoint events. To represent this using a Venn diagram:

![image.png](attachment:image.png)
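
The disjoint case can be checked directly in Python, again assuming a fair die and using the illustrative helper `prob`. The two sets have no outcome in common, so P(A ∩ B) = 0 and the probability of the union reduces to a plain sum.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

# Build each event from its defining condition
A = {x for x in SAMPLE_SPACE if x % 3 == 0}  # multiple of 3 -> {3, 6}
B = {x for x in SAMPLE_SPACE if x % 5 == 0}  # multiple of 5 -> {5}

print(A.isdisjoint(B))  # True: no outcome is in both events
print(prob(A & B))      # 0
# For disjoint events, the probability of the union is just the sum
print(prob(A | B) == prob(A) + prob(B))  # True
```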

Now that we are familiar with the terms union, intersection and disjoint events, we can talk about the independence of events.

### 2. Independent, Dependent & Exclusive Events

Suppose we have two events – event A and event B.

If the occurrence of event A doesn't affect the probability of event B, the two events are called __independent events__. In other words, the probability of the second event's outcome is not affected at all by the outcome of the first event.

**Probability of independent events**

In this case, P (A ∩ B) = P (A) * P (B).

Let’s take an example here. Suppose we win the game if we pick a red marble from a jar containing __4 red and 3 black marbles__ and we get heads on the toss of a coin. What is the probability of winning?

Let's define event A as getting a red marble from the jar.

Event B is getting heads on the toss of a coin.

We need to find the probability of getting both a red marble and heads on the coin toss.

        P (A) = 4/7

        P (B) = 1/2

We know that the color of the marble has no effect on the outcome of the coin toss, so the two events are independent:

    P (A ∩ B) = P (A) * P (B)

    P (A ∩ B) = (4/7) * (1/2) = (2/7)
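
As a sanity check on the product rule, the sketch below counts outcomes in the joint sample space instead of multiplying probabilities. It assumes that each of the 7 marbles is equally likely to be picked and that the coin is fair. The variable names are our own.

```python
from fractions import Fraction
from itertools import product

marbles = ["red"] * 4 + ["black"] * 3  # jar with 4 red and 3 black marbles
coin = ["heads", "tails"]              # fair coin

# Joint sample space: every (marble, coin) pair is equally likely
outcomes = list(product(marbles, coin))
wins = [o for o in outcomes if o == ("red", "heads")]

p_win = Fraction(len(wins), len(outcomes))
print(p_win)                                     # 2/7
print(p_win == Fraction(4, 7) * Fraction(1, 2))  # True: P(A ∩ B) = P(A) * P(B)
```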

**Probability of dependent events**

Next, can you think of examples of dependent events?

In the above example, let's define event A as getting a red marble from the jar. We keep that marble out (no replacement) and then draw another marble from the jar.

**Is the probability of getting a red marble on the second draw still the same as on the first?**

Let's see. On the first draw, the probability of getting a red marble is 4/7. Suppose you got a red marble on the first draw. For the second draw, 3 red marbles remain among 6, so the probability of getting red is 3/6.

If instead we got a black marble on the first draw, 4 red marbles would remain among 6, so the probability of getting red on the second draw would be 4/6. **Therefore, the probability in the second case depends on what happened the first time.**
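
The dependence can be verified by enumerating every ordered pair of draws without replacement. This is a minimal sketch that assumes each marble is equally likely to be drawn. `p_second_red_given_first` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations

# Jar with 4 red and 3 black marbles; positions make each marble distinguishable
jar = ["red"] * 4 + ["black"] * 3

# All ordered pairs of two different marbles (drawing without replacement)
draws = list(permutations(range(len(jar)), 2))

def p_second_red_given_first(color):
    """P(second marble is red | first marble has the given color)."""
    given = [(i, j) for i, j in draws if jar[i] == color]
    both = [(i, j) for i, j in given if jar[j] == "red"]
    return Fraction(len(both), len(given))

print(p_second_red_given_first("red"))    # 1/2, i.e. 3/6
print(p_second_red_given_first("black"))  # 2/3, i.e. 4/6
```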

**Mutually exclusive and Exhaustive events**

Mutually exclusive events are events that cannot happen together.

The easiest example to understand this is the toss of a coin. Getting heads and getting tails are mutually exclusive, because in a single coin toss we get either heads or tails, never both at the same time.

A set of events is collectively exhaustive if together the events cover all the possible outcomes of the experiment. In other words, one of the events must occur when the experiment is performed.

For example, in a throw of a die, {1,2,3,4,5,6} is an exhaustive collection because it covers the entire range of possible outcomes.

Consider the events “even” (2, 4 or 6) and “not-6” (1, 2, 3, 4 or 5) in a throw of a fair die. They are collectively exhaustive but not mutually exclusive.
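
Both properties are easy to test with sets. The sketch below models a single throw of a die and uses two hypothetical helper functions, `collectively_exhaustive` and `mutually_exclusive`, to confirm the claim about “even” and “not-6”.

```python
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a die

even = {2, 4, 6}
not_six = {1, 2, 3, 4, 5}

def collectively_exhaustive(events, space):
    # Together, the events must cover every possible outcome
    return set().union(*events) == space

def mutually_exclusive(events):
    # No outcome may belong to two different events
    return all(a.isdisjoint(b) for i, a in enumerate(events) for b in events[i + 1:])

events = [even, not_six]
print(collectively_exhaustive(events, SAMPLE_SPACE))  # True
print(mutually_exclusive(events))                     # False: 2 and 4 are in both
```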

### 3. Conditional Probability

Conditional probabilities arise naturally in the investigation of experiments where an outcome of a trial may affect the outcomes of the subsequent trials.

We try to calculate the probability of one event given that another event has already happened. If the probability of the event changes when we take the other event into consideration, we can say that the two events are dependent.

Here we define two events:

* Event A is the event whose probability we're trying to calculate.
* Event B is the condition that we know, i.e., the event that has already happened.

We can write the conditional probability as $P(A \mid B)$, the probability of the occurrence of event A given that B has already happened.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
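
The definition P(A | B) = P(A ∩ B) / P(B) can be checked by counting outcomes. This minimal sketch assumes a fair die. A (getting a 6) and B (getting an even number) are illustrative events that are not taken from the original text, and `cond_prob` is a hypothetical helper. The ratio is undefined when P(B) = 0.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

A = {6}        # illustrative event: getting a 6
B = {2, 4, 6}  # illustrative event: getting an even number

print(prob(A))          # 1/6
print(cond_prob(A, B))  # 1/3: knowing the roll is even changes the probability
```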

To understand this, let's play a simple card game. Suppose you draw two cards from a deck without replacement, and you win if you get a jack followed by an ace. What is the probability of winning, given that you got a jack in the first turn?

Let event A be getting a jack in the first turn

Let event B be getting an ace in the second turn.

We need to find $P(B \mid A)$.

    P(A) = 4/52

    P(B | A) = 4/51 {no replacement: 4 aces remain among 51 cards}

    P(A and B) = 4/52 * 4/51 ≈ 0.006
    
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{(4/52)(4/51)}{4/52} = \frac{4}{51} \approx 0.078$$
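
Rather than relying on the multiplication, we can enumerate every ordered pair of distinct cards and count. There are 52 × 51 = 2652 equally likely draws without replacement. This brute-force sketch assumes a standard 52-card deck with four jacks and four aces. The deck representation is our own.

```python
from fractions import Fraction
from itertools import permutations

# A standard 52-card deck: 13 ranks x 4 suits (suits only make cards distinct)
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for rank in ranks for suit in "SHDC"]

# Every ordered pair of two different cards: drawing without replacement
pairs = list(permutations(deck, 2))

jack_first = [p for p in pairs if p[0][0] == "J"]
jack_then_ace = [p for p in jack_first if p[1][0] == "A"]

p_a = Fraction(len(jack_first), len(pairs))           # P(A)
p_a_and_b = Fraction(len(jack_then_ace), len(pairs))  # P(A ∩ B)
p_b_given_a = p_a_and_b / p_a                          # P(B | A)

print(p_a, p_a_and_b, p_b_given_a)  # 1/13 4/663 4/51
print(float(p_b_given_a))           # ≈ 0.078
```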

Suppose you have a jar containing 6 marbles: 3 black and 3 white. You draw two marbles without replacement. What is the probability that the second marble is black, given that the first one was black too?

    A = getting a black marble in the first turn

    B = getting a black marble in the second turn

    P (A) = 3/6

    P (B | A) = 2/5

    P (A and B) = 1/2 * 2/5 = 1/5
    
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/5}{1/2} = \frac{2}{5}$$
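
A simulation offers another view of the marble example. We draw two marbles many times, keep only the runs where the first marble is black, and see how often the second is black too. This is a Monte Carlo sketch. The seed and trial count are arbitrary choices of ours, so the estimate comes close to 2/5 but does not match it exactly.

```python
import random

def estimate_second_black_given_first_black(trials=100_000, seed=0):
    """Monte Carlo estimate of P(B | A) for a jar of 3 black and 3 white marbles.

    A = first marble is black, B = second marble is black (no replacement).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    jar = ["black"] * 3 + ["white"] * 3
    first_black = 0
    both_black = 0
    for _ in range(trials):
        first, second = rng.sample(jar, 2)  # two ordered draws without replacement
        if first == "black":
            first_black += 1
            if second == "black":
                both_black += 1
    # Relative frequency of B among the trials where A happened
    return both_black / first_black

estimate = estimate_second_black_given_first_black()
print(estimate)  # should land close to 2/5 = 0.4
```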

#### Reversing the condition

Example: Rahul’s favorite breakfast is bagels and his favorite lunch is pizza. The probability of Rahul having bagels for breakfast is 0.6. The probability of him having pizza for lunch is 0.5. The probability of him having a bagel for breakfast, given that he eats pizza for lunch, is 0.7.

Let’s define event A as Rahul having a bagel for breakfast and event B as Rahul having pizza for lunch.

    P (A) = 0.6

    P (B) = 0.5
    
    P (A | B) = 0.7

Looking at the numbers, the probability of having a bagel (0.6) differs from the probability of having a bagel given that he has pizza for lunch (0.7). This means that whether Rahul has a bagel depends on whether he has pizza for lunch.

Now, what if we need the probability of Rahul having pizza for lunch given that he had a bagel for breakfast, i.e., $P(B \mid A)$?
This is where Bayes' theorem comes into the picture.

### 4. Bayes Theorem

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. If we know the conditional probability $P(A \mid B)$, we can use Bayes' rule to find the reverse probability $P(B \mid A)$.

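How can we do that? Both conditional probabilities can be written in terms of the same joint probability:

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

Dividing both sides by $P(A)$ gives

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

This is the standard form of Bayes' rule.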

For the previous example, the probability of Rahul having pizza for lunch given that he had a bagel for breakfast is $P(B \mid A) = 0.7 \times 0.5 / 0.6 \approx 0.583$.
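
Once the three inputs are known, Bayes' rule takes one line to compute. The function name `bayes` below is a hypothetical helper. The numbers are the ones given for Rahul.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Reverse a conditional probability: P(B | A) = P(A | B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Rahul example: A = bagel for breakfast, B = pizza for lunch
p_a = 0.6          # P(A)
p_b = 0.5          # P(B)
p_a_given_b = 0.7  # P(A | B)

p_b_given_a = bayes(p_a_given_b, p_b, p_a)
print(round(p_b_given_a, 4))  # 0.5833
```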

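Suppose events $A_1, A_2, \ldots, A_n$ are mutually exclusive and collectively exhaustive, so exactly one of them must occur. The denominator can then be expanded with the law of total probability, $P(B) = \sum_j P(B \mid A_j)\,P(A_j)$.

We can then write the equation as

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j} P(B \mid A_j)\,P(A_j)}$$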
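
The general form can be tried on the Rahul example. Here {pizza, no pizza} plays the role of the exhaustive, mutually exclusive events $A_i$, and the bagel is the observed event. One input is not stated in the original: P(bagel | no pizza). It follows from the given numbers, because P(bagel) = P(bagel | pizza) P(pizza) + P(bagel | no pizza) P(no pizza). That gives 0.6 = 0.7 × 0.5 + x × 0.5, so x = 0.5. In this minimal sketch, `posterior` is a hypothetical helper.

```python
def posterior(priors, likelihoods, i):
    """P(A_i | B) for mutually exclusive, exhaustive A_1..A_n, given P(A_j) and P(B | A_j)."""
    # Law of total probability: P(B) = sum_j P(B | A_j) * P(A_j)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return likelihoods[i] * priors[i] / evidence

# Rahul example recast with the events {pizza, no pizza} and evidence "bagel"
priors = [0.5, 0.5]       # P(pizza), P(no pizza)
likelihoods = [0.7, 0.5]  # P(bagel | pizza), P(bagel | no pizza), the latter derived above

print(round(posterior(priors, likelihoods, 0), 4))  # 0.5833
```

The denominator evaluates to 0.6, which recovers P(bagel), so the result agrees with the two-event calculation.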

### 5. Example of Bayes Theorem and Probability trees



## References

1. https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
2. https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/
3. https://stattrek.com/probability/bayes-theorem.aspx
4. https://towardsdatascience.com/bayes-theorem-the-holy-grail-of-data-science-55d93315defb
5. https://www.mathsisfun.com/data/bayes-theorem.html