 # Set Theory and Probability 

This lesson introduces data science students to fundamental concepts of set theory and probability, provides hands-on applications of these theoretical ideas, and aims to make students comfortable using terms from set theory and probability to talk about events.

## Learning Goals 

After this lesson you should be able to ... 

* [ ] Explain how the study of probability is fundamental to research questions in data science
* [ ] Define sets, subsets, the universal set, union, and intersection as they relate to set theory
* [ ] Explain how sample space and event space form the basis of the study of probability
* [ ] Explain the three axioms of probability 
* [ ] Explain the difference between permutations and combinations
* [ ] Give examples of independent and dependent events

## Probability 

One of the most important jobs of a data scientist (statistician) is to attempt to quantify uncertainty.
This is not an easy task, and many factors are at play when we try to do it. 
To give data scientists a common language for this task, we use mathematical notation to formalize our assumptions and to make sure we understand each other when solving problems that involve quantifying uncertainty.

This means that terms such as...

* Probability
* Odds
* Likelihood

all take on very specific meanings, and you should make sure that you are using the correct term when describing a concept.

Before jumping into the maths, let's start with a short discussion question.

> What does it mean to say that there is a 45% chance of it raining later today?
> Turn to your partner and try to tackle this question.
> In your discussion, try to cover the following points:
> What does the word "chance" mean in this context?
> Where does the number 45 come from here? What does it reflect?
> What is the 45% a reflection of? The day? November 11th? The weather? Our experience of the weather?
> How does the claim that there is a 45% chance of rain "today" compare and contrast with saying 
> that a six-sided die has a 50% chance of landing on an even number?
> How do discussions about a week of rain compare to a few rolls of a die?

Post-Discussion Points

* Frequentism 
* Bayes 
* Likelihood
* Discrete vs. Continuous 
* Independent vs. Dependent Events


## Sets

[Set Theory](https://en.wikipedia.org/wiki/Set_theory) is a branch of maths dealing with collections of objects.
The labs you worked on this morning gave you an overview of these concepts.
Here we will review those concepts and discuss how they relate to the world of data science. 



## Set Theory
In probability theory, a set is a well-defined collection of objects.
Mathematically, we denote a set by a capital letter such as $S$. If an element $x$ belongs to a set $S$, we write $x \in S$; if $x$ does not belong to $S$, we write $x \notin S$.

__2.1 Subsets__ <br>
Set $T$ is a subset of set $S$ if every element in set $T$ is also in set $S$. The mathematical notation for a subset is $T \subset S$.

__2.2 Set Operations__ <br>

- **Union of two sets:** the union $S \cup T$ is the set of elements that are in $S$, in $T$, or in both.
- **Intersection of two sets:** the intersection $S \cap T$ is the set of elements of $S$ that also belong to $T$.
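
A minimal sketch of both operations with Python's built-in `set` type. The sets `S` and `T` are made-up example values; the `|` and `&` operators are shorthand for the `.union()` and `.intersection()` methods used in the cells below.

```python
# Two small example sets (hypothetical values, for illustration only)
S = {1, 2, 3, 4}
T = {3, 4, 5}

union = S | T                 # same as S.union(T)
intersection = S & T          # same as S.intersection(T)

print(sorted(union))          # [1, 2, 3, 4, 5]
print(sorted(intersection))   # [3, 4]

# Subset check: every element of {3, 4} is also in S
print({3, 4} <= S)            # True (same as {3, 4}.issubset(S))
```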
    
    
We are trying to create rooming arrangements for a staff trip based on staff interests. <br>
Who should room with whom based on interests?

This is another way to look at sets.<br>
And we can still use the math!



```python
Robin = set(["art", "traveling", "wine", "doodling", "tech", "gadgets"])
Rob = set(["rock-climbing", "traveling", "dad jokes", "ice cream"])
Alison = set(["wine", "traveling", "schitts creek", "dogs"])
Su = set(["schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"])
Molly = set(["wine", "ice cream", "dogs", "zookeeping", "traveling"])
```


```python
Robin.intersection(Alison)
# {'traveling', 'wine'}

Rob.intersection(Alison)
# {'traveling'}

Alison.intersection(Su)
# {'dogs', 'schitts creek'}
```


```python
Molly.intersection(Rob).intersection(Robin)
# {'traveling'}
```


**Task**:

- In groups of 2-3, draw a Venn diagram of each person's interests and how they overlap. 
- Then use the set operations from the learn.co curriculum to find the overlaps in Python.

## Probability 

As discussed in the curricula, the important jump from Set Theory to probability comes in thinking about the relationship between Sample Space and Event Space.

In probability, we usually denote the sample space (the set of all possible outcomes) with the Greek letter $\Omega$ (omega); some texts use $S$ instead.
For a single roll of a six-sided die, the sample space is $\Omega = \{1,2,3,4,5,6\}$.

This sample space contrasts with the idea of an event.
An event is a subset of the sample space ($E \subseteq \Omega$), and the event space is the collection of all the events we can assign probabilities to.

In our die example, what would the event "roll an even number" look like?
What about rolling a one? 
What about rolling a one or a six?
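
One way to check your answers: represent the sample space and each event as Python sets. This sketch assumes a fair die, so every outcome is equally likely and the probability of an event is its size divided by the size of the sample space; `prob` is a hypothetical helper name.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
even = {x for x in omega if x % 2 == 0}   # {2, 4, 6}
one = {1}
one_or_six = {1, 6}

def prob(event):
    """Probability of an event, assuming every outcome is equally likely."""
    return Fraction(len(event & omega), len(omega))

print(prob(even), prob(one), prob(one_or_six))  # 1/2 1/6 1/3
```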

> The idea here is that if we repeated the roll over and over again, the proportion of rolls that land in an event would settle on a fixed number: the probability of that event.

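This brings us to the axioms of probability (also covered in the labs):

1. The probability of any event is non-negative: $P(E) \geq 0$.
2. The probability of the whole sample space is 1: $P(\Omega) = 1$.
3. Probabilities of mutually exclusive (disjoint) events add: if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.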

> Discuss how these three relate to our die example.
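
After your discussion, here is a small sketch that checks each axiom numerically for the die. It assumes a fair die (each face gets probability 1/6); the events `A` and `B` are arbitrary example choices.

```python
from fractions import Fraction

# Fair die: each outcome gets probability 1/6 (fairness is an assumption)
p = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Sum the probabilities of the outcomes in an event."""
    return sum((p[x] for x in event), Fraction(0))

# Axiom 1: every probability is non-negative
assert all(value >= 0 for value in p.values())

# Axiom 2: the whole sample space has probability 1
assert prob(p.keys()) == 1

# Axiom 3: for disjoint events, probabilities add
A, B = {1, 2}, {5}
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)   # 1/2 == 1/3 + 1/6
```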

## Conditional Probability 

So far, we have assumed that events are independent.
Not all events are independent.

The CLASSIC example that you will run into in stats textbooks as you continue your education is the idea of balls in an urn.

Imagine you have an urn with 10 balls in it.
Seven of the balls are red.
Three of the balls are green.

Given what we know so far...

* What is the sample space of this example?
* What is the probability that we pull out a red ball with all ten balls in?
* What is the probability that we pull out a green ball with all ten balls in it?
* What is the probability after pulling out a red ball and keeping it out that we will pull out a green ball?
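
A quick sketch of the arithmetic, treating the sample space of a single draw as {red, green} weighted by how many balls of each colour are in the urn, and assuming every ball is equally likely to be drawn.

```python
from fractions import Fraction

red, green = 7, 3
total = red + green

p_red = Fraction(red, total)      # 7/10
p_green = Fraction(green, total)  # 3/10

# Draw a red ball and keep it out: 9 balls remain, still 3 of them green
p_green_after_red = Fraction(green, total - 1)  # 3/9 = 1/3

print(p_red, p_green, p_green_after_red)  # 7/10 3/10 1/3
```

Notice that removing a red ball changes the chance of green from 3/10 to 1/3: the second draw depends on the first.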

We will explore this example more in the Bayes lecture!!!


### Dependent Events 

**Events $A$ and $B$ are dependent when the occurrence of $A$ somehow has an effect on whether $B$ will occur (or not).**

Now things start getting a bit more interesting. 

Let's look at an example. Let's say event $A$ is taking an orange or purple marble out of a jar. The jar contains 3 orange and 2 purple marbles. 

<img src="images/Image_69_Marb.png" width="300">

The probability of getting a purple marble is $\dfrac{2}{5}$ and getting an orange marble is $\dfrac{3}{5}$.

<img src="images/Image_70_Cond3.png" width="300">

With that first marble still out of the jar, we now take a second marble (event $B$).

Here you can see that our second event is dependent on the outcome of the first draw.

- If we drew an orange marble first, the probability of getting a purple marble for event B is $\dfrac{2}{4}$. 
- If we saw a purple marble first, however, the probability of seeing a purple in the second trial is $\dfrac{1}{4}$. 

In simple terms, the probability of event $B$ on the second draw depends on the outcome of event $A$ on the first draw. We say that $B$ is **conditional** on $A$.

A **tree diagram** can be used to explore all possible events.

<img src="images/Image_71_TreeDiag.png" width = 500>
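
The same tree can be walked in code. This minimal sketch enumerates every branch of the two-draw tree for the jar of 3 orange and 2 purple marbles, drawing without replacement, and multiplies the probabilities along each path; the dictionary layout is just one possible representation.

```python
from fractions import Fraction

jar = {"orange": 3, "purple": 2}

# Walk every branch of the two-draw tree, drawing without replacement
branches = {}
for first, n_first in jar.items():
    p_first = Fraction(n_first, sum(jar.values()))
    remaining = dict(jar)
    remaining[first] -= 1                      # the first marble stays out
    for second, n_second in remaining.items():
        p_second = Fraction(n_second, sum(remaining.values()))
        branches[(first, second)] = p_first * p_second

for path, p in branches.items():
    print(path, p)
```

The four leaf probabilities sum to 1, just as the leaves of the tree diagram do.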

## Conditional Probability 

**Conditional probability arises when the outcome of one trial may influence the results of the trials that follow.**

When we calculate the probability of a second event ($B$) given that a first event ($A$) has already happened, we say that the probability of $B$ depends on the occurrence of $A$.

Here are some more examples: 

* Drawing a 2nd Ace from a deck of cards given that the first card you drew was an Ace.
* Finding the probability of liking "The Matrix" given that you know this person likes science fiction.


Let's say that $A$ is the event we are interested in, and that it depends on an event $B$ that has already happened. 

The conditional probability of $A$ **given** $B$ (defined when $P(B) > 0$) can be written as:
$$ P (A \mid B) = \dfrac{P(A \cap B)}{P(B)}$$



$P(A \mid B)$ is the probability of $A$ **given** that $B$ has happened. 

<img src="images/Image_72_Cond4.png" width="300">
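
To see the formula in action, here is a sketch on a fair die (equally likely outcomes assumed) with two example events: $A$ = "the roll is greater than 3" and $B$ = "the roll is even". The helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction

omega = set(range(1, 7))            # fair die, equally likely outcomes

def prob(event):
    return Fraction(len(event), len(omega))

def conditional(a, b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

A = {4, 5, 6}   # roll is greater than 3
B = {2, 4, 6}   # roll is even
print(conditional(A, B))   # 2/3
```

Knowing the roll is even shrinks the sample space to {2, 4, 6}, and two of those three outcomes are greater than 3, so the answer is 2/3 rather than the unconditional 1/2.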


Understanding this formula may be easier if you look at two simple Venn Diagrams and use the multiplication rule. Here's how to derive this formula:

Step 1: Write out the multiplication rule:
* $P(A \cap B) = P(B) \cdot P(A \mid B)$

Step 2: Divide both sides of the equation by $P(B)$:
* $\dfrac{P(A \cap B)}{P(B)} = \dfrac{P(B) \cdot P(A \mid B)}{P(B)}$

Step 3: Cancel $P(B)$ on the right side of the equation:
* $\dfrac{P(A \cap B)}{P(B)} = P(A \mid B)$

Step 4: Swap the two sides:
* $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

And this is our conditional probability formula. 

There are a few variations and theorems that are related to, or follow from, this conditional probability formula. The most important ones are the **product rule**, the **chain rule**, and **Bayes' theorem**.



### Theorem 1 - Product Rule

The **product rule** was used to derive the conditional probability formula above, but is often used in situations where the conditional probability is easy to compute, but the probability of intersections of events isn't. 

The intersection of events $A$ and $B$ can be given by

\begin{align}
    P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)
\end{align}

Remember that if $A$ and $B$ are independent, then knowing that $B$ happened tells us nothing about $A$ (and vice versa), so $P(A \mid B) = P(A)$ and $P(A \cap B) = P(A) P(B)$.
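
Here is a sketch of the product rule applied to the marble jar: the probability that the first marble is orange **and** the second is purple. The labels `O1`…`P2` are hypothetical names that make marbles of the same color distinguishable so we can also count outcomes directly.

```python
from fractions import Fraction
from itertools import permutations

# Label each marble so that same-colored marbles are still distinct objects
marbles = ["O1", "O2", "O3", "P1", "P2"]

# Product rule: P(first orange and second purple)
#             = P(first orange) * P(second purple | first orange)
by_rule = Fraction(3, 5) * Fraction(2, 4)

# Brute force: count ordered draws of two marbles without replacement
draws = list(permutations(marbles, 2))
hits = [d for d in draws if d[0].startswith("O") and d[1].startswith("P")]
by_counting = Fraction(len(hits), len(draws))

print(by_rule, by_counting)   # 3/10 3/10
```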

### Theorem 2 - Chain Rule

The **chain rule** (also called the **general product rule**) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities. 

Recall the product rule: 

$P(A \cap B) = P(A \mid B) P(B)$

When you extend this for three variables:

$P(A\cap B \cap C) = P(A\cap( B \cap C)) = P(A\mid B \cap C) P(B \cap C) = P(A \mid B \cap C) P(B \mid C) P(C)$

And you can keep extending this to $n$ variables:

$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1 \mid A_2 \cap \ldots \cap A_n)\, P(A_2 \mid A_3 \cap \ldots \cap A_n) \cdots P(A_{n-1} \mid A_n)\, P(A_n)$$

This idea is known as the **chain rule**.
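
A sketch of the chain rule on the marble jar: the probability that three draws in a row (without replacement) are all orange. Here the factors are written in draw order, $P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)$, which is the same rule with the events listed in the opposite order from the formula above. The marble labels are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["O1", "O2", "O3", "P1", "P2"]

# Chain rule with A_i = "the i-th draw is orange", drawing without replacement:
# P(A1 and A2 and A3) = P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
by_chain = Fraction(3, 5) * Fraction(2, 4) * Fraction(1, 3)

# Brute force over all ordered draws of three marbles
draws = list(permutations(marbles, 3))
all_orange = [d for d in draws if all(m.startswith("O") for m in d)]
by_counting = Fraction(len(all_orange), len(draws))

print(by_chain, by_counting)   # 1/10 1/10
```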

A closely related result is the **law of total probability**. If the events $C_1, C_2, \ldots, C_m$ are disjoint and together make up the whole sample space ($C_1 \cup C_2 \cup \cdots \cup C_m = \Omega$), the probability of any event $A$ can be decomposed as:

\begin{align}
P(A) = P(A \mid C_1)P(C_1) + P(A \mid C_2)P(C_2) + \ldots + P(A \mid C_m)P(C_m)
\end{align}
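
Applied to the marble jar, a sketch: partition on the color of the first draw ($C_1$ = first is orange, $C_2$ = first is purple) and ask for the probability that the second marble is purple. The conditional values come from the jar with one marble removed, as in the dependent-events example.

```python
from fractions import Fraction

# Partition on the first draw: C1 = "first is orange", C2 = "first is purple"
p_c = {"orange": Fraction(3, 5), "purple": Fraction(2, 5)}

# P(second is purple | first draw), from the jar with one marble removed
p_a_given_c = {"orange": Fraction(2, 4), "purple": Fraction(1, 4)}

# Law of total probability: P(A) = sum over i of P(A | C_i) * P(C_i)
p_second_purple = sum(p_a_given_c[c] * p_c[c] for c in p_c)
print(p_second_purple)   # 2/5
```

The result, 2/5, equals the chance that the *first* marble is purple: without seeing the first draw, the second draw is no more or less likely to be purple.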

### Theorem 3 - Bayes Theorem

**Bayes' theorem** is the culmination of this section. Below is the formula, which we will dig deeper into in upcoming lessons.

\begin{align}
    P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)} \qquad \text{(follows from Theorem 1)}
\end{align}
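
As a preview, here is a sketch with the marble jar that reverses the direction of conditioning: given that the second marble is purple, how likely is it that the first was orange? $A$ = first draw is orange, $B$ = second draw is purple, and $P(B) = \frac{2}{4}\cdot\frac{3}{5} + \frac{1}{4}\cdot\frac{2}{5} = \frac{2}{5}$ by the law of total probability. The marble labels are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

p_a = Fraction(3, 5)              # A: first draw is orange
p_b_given_a = Fraction(2, 4)      # B: second draw is purple, given A
p_b = Fraction(2, 5)              # P(B) = 2/4 * 3/5 + 1/4 * 2/5

# Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)                # 3/4

# Check by counting ordered pairs of labeled marbles
draws = list(permutations(["O1", "O2", "O3", "P1", "P2"], 2))
second_purple = [d for d in draws if d[1].startswith("P")]
first_orange_too = [d for d in second_purple if d[0].startswith("O")]
assert Fraction(len(first_orange_too), len(second_purple)) == p_a_given_b
```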


## Permutations and Combinations

What's the probability that a staff person likes wine?

That's a very specific probability example.

But there are other applications and terminology that are important for probability. 


In this section, we return to the foundations of probability for independent events and introduce tools for counting outcomes.

__Terminology Alert__ 
- Random Variable
    - A random variable is a variable whose value is determined by the outcome of a random phenomenon, so it can take on different values
    - A random variable can be either discrete or continuous
        - __Discrete__ : takes on a countable set of values, such as integer counts
        - __Continuous__ : can take on any value within an interval
        
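#### Probability of A and B (independent events)
<center>$P(A \text{ and } B) = P(A) \cdot P(B)$</center>

This multiplication rule holds only when $A$ and $B$ are independent; in general, $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$.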

What's the probability that a staff person likes wine *and* likes dogs?

#### Probability of A or B
<center>$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$</center>

What's the probability that someone likes ice cream *or* traveling?

What happens when you have multiple events? 

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C)$$
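
The staff questions above can be answered by counting, assuming a staff member is picked uniformly at random from the five people in the rooming example. This sketch reuses those interests in a dictionary (`likes` and `p` are hypothetical helper names) and checks each formula against direct counting.

```python
from fractions import Fraction

# Staff interests, copied from the rooming example
staff = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "schitts creek", "dogs"},
    "Su": {"schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}

def likes(interest):
    """Set of staff members who list this interest."""
    return {name for name, interests in staff.items() if interest in interests}

def p(people):
    """Chance that a staff member picked uniformly at random is in `people`."""
    return Fraction(len(people), len(staff))

wine, dogs, travel, ice = (likes(x) for x in ("wine", "dogs", "traveling", "ice cream"))

# A and B: the observed share vs. the independence product P(A) * P(B)
print(p(wine & dogs), p(wine) * p(dogs))              # 2/5 9/25

# A or B: the formula vs. counting the union directly
print(p(ice) + p(travel) - p(ice & travel), p(ice | travel))   # 4/5 4/5

# A or B or C: three-event formula vs. the union
three = (p(wine) + p(dogs) + p(travel)
         - p(wine & dogs) - p(wine & travel) - p(dogs & travel)
         + p(wine & dogs & travel))
print(three, p(wine | dogs | travel))                 # 1 1
```

The "and" answers disagree (2/5 observed vs. 9/25 from multiplying), which suggests that liking wine and liking dogs are not independent in this small group, so the independent-events multiplication rule does not apply here.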

## Permutations & Combinations
Permutations and combinations help us count the full *set* of possible outcomes behind a probability.

**Permutation:**
- Order matters.
- How many different arrangements can you make from a number of elements?
- The number of ordered arrangements of $r$ elements chosen from a total of $n$ elements is $n! / (n - r)!$
    
```python
from itertools import permutations

# All orderings of the three elements 1, 2, 3
l = list(permutations(range(1, 4)))
print(l)
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```
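
The formula $n!/(n-r)!$ can be checked against brute-force enumeration. A small sketch with example values $n = 5$, $r = 2$ (for instance, ordered pairs drawn from 5 marbles); `math.perm` needs Python 3.8 or later.

```python
import math
from itertools import permutations

n, r = 5, 2
formula = math.factorial(n) // math.factorial(n - r)   # n! / (n - r)!

print(formula, math.perm(n, r), len(list(permutations(range(n), r))))  # 20 20 20
```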


#### Scenario:

You are trying to break the code - to hack into the mainframe, and stop the KGB from launching US missiles remotely.

You know the password is a 5-letter anagram of a subset of the letters in the word "pochemuchka".

How many potential passwords are there? That is, how large is the **set** of password options?

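```python
from itertools import permutations

# Every ordered pick of 5 of the 11 letter positions in "pochemuchka"
l = list(permutations("pochemuchka", 5))
len(l)
# 55440

# "c" and "h" each appear twice, so some tuples repeat; a set keeps the distinct ones
len(set(l))
# 22050
```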

What's the probability that the password starts with p?
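
One way to answer, as a sketch: this assumes every *distinct* 5-letter arrangement is equally likely to be the password, which the scenario does not actually state.

```python
from fractions import Fraction
from itertools import permutations

# Distinct 5-letter arrangements drawn from the letters of "pochemuchka"
passwords = set(permutations("pochemuchka", 5))
starts_with_p = [pw for pw in passwords if pw[0] == "p"]

# Assumes every distinct arrangement is equally likely to be the password
p_start = Fraction(len(starts_with_p), len(passwords))
print(len(passwords), len(starts_with_p), p_start)   # 22050 2190 73/735
```

The answer (about 0.099) is slightly above 1/11 because "p" appears only once while "c" and "h" appear twice, so removing duplicates affects the passwords that start with each letter differently.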

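**Combination:**
- Order does not matter.
- How many different selections can you make from a number of elements?
- The number of selections of $r$ elements from a total of $n$ elements is $n! / (r! \, (n - r)!)$, often written $\binom{n}{r}$.
- Example: choosing 2 of the 5 staff members to share a room gives $\binom{5}{2} = 10$ possible pairs, because Robin-and-Rob is the same room as Rob-and-Robin.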
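
A sketch that lists the room pairs with `itertools.combinations` and checks the count with `math.comb` (Python 3.8 or later); the staff names come from the rooming example.

```python
import math
from itertools import combinations

staff = ["Robin", "Rob", "Alison", "Su", "Molly"]

# Order does not matter: ("Rob", "Su") and ("Su", "Rob") are the same room
pairs = list(combinations(staff, 2))
print(len(pairs), math.comb(5, 2))   # 10 10
print(pairs[:3])   # [('Robin', 'Rob'), ('Robin', 'Alison'), ('Robin', 'Su')]
```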

#### Scenario A
- Combinatorics in a specific scenario
    - What is the probability of getting exactly 3 heads out of 5 fair coins? 
    - What is the probability of getting at least 3 heads out of 5 tosses?
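
A sketch of both answers, assuming fair and independent coins so that all $2^5 = 32$ head/tail sequences are equally likely. The number of sequences with exactly $k$ heads is $\binom{5}{k}$, and "at least 3" adds up the cases $k = 3, 4, 5$.

```python
from fractions import Fraction
from math import comb

n = 5
total = 2 ** n                       # 32 equally likely head/tail sequences

def p_exactly(k):
    """P(exactly k heads in n fair tosses) = C(n, k) / 2**n."""
    return Fraction(comb(n, k), total)

p_three = p_exactly(3)                                          # 10/32
p_at_least_three = sum(p_exactly(k) for k in range(3, n + 1))   # (10 + 5 + 1)/32

print(p_three, p_at_least_three)     # 5/16 1/2
```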
    
