# Tutorial 2.2: Basic Probability using Python

This notebook was largely adapted from The Dark Art of Coding ([GitHub](https://github.com/chalmerlowe)).

# Main objectives
---

You will be able to:

* Understand how to calculate probabilities of independent events
* Explore probabilities using Venn diagrams
* Explore probabilities using Probability Trees




# Calculating the probability of an event
---

There are many definitions of probability; let's use the one found in **Statistics in a Nutshell** (Sarah Boslaugh):

*"... probability tells us how often something is likely to occur when an experiment is repeated..."*
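To make the "repeated experiment" idea concrete, here is a small simulation sketch (not part of the original notebook). It rolls a simulated fair die many times with Python's standard `random` module and reports how often a `1` comes up. The function name `estimate_probability`, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random

def estimate_probability(trials=100_000, seed=42):
    """Estimate P(rolling a 1 on a six-sided die) by repeating the experiment."""
    rng = random.Random(seed)          # seeded so the result is reproducible
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6)       # one trial: randint includes both endpoints
        if roll == 1:
            hits += 1
    return hits / trials               # relative frequency of the event

freq = estimate_probability()
print(freq)   # close to 1/6 (about 0.1667); the exact digits depend on the seed
```

The more trials you run, the closer the relative frequency tends to get to the exact probability we calculate below.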


## Definitions

To get started, let's put together some additional definitions:

### Trials (or experiments or observations)

A **trial** is an action or experiment whose outcome is not known in advance.

Trials can be very simple OR extremely complex:

* as simple as flipping a coin or rolling a die
* as complex as determining whether someone born with a low birth weight will graduate from college 22 years later

### Sample space

The **sample space**, often denoted by `S`, is the set of **all possible outcomes** of a trial.

Thus, if our trial is a single roll of a six-sided die, then the sample space will be:

`S = {1, 2, 3, 4, 5, 6}`

AND if our trial is choosing a coin from a bag, then the sample space might be:

`S = {quarter, dime, nickel, penny}`


### Events

An **event** specifies the outcome(s) of a trial that we are interested in. An event can be a single outcome OR a set of several outcomes; either way, it is a subset of the sample space.

Events are often designated with an `E`.

For example, if we roll a die and want to specify our event to be that we get a one (1), then our event might be defined:

`E = {1}`

Similarly, if we want to specify our event to be that we get an even number, then our event might be defined:

`E = {2, 4, 6}`
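Python's built-in `set` type is a natural fit for these definitions. This sketch stores the d6 sample space and two events as sets, builds the "even number" event with a set comprehension, and checks with `<=` (the subset test) that both events lie inside the sample space.

```python
# The sample space for one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
E_one = {1}
E_even = {x for x in S if x % 2 == 0}   # builds {2, 4, 6}

print(E_even)                     # {2, 4, 6}
print(E_one <= S, E_even <= S)    # True True: both events are subsets of S
```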

### Calculating a probability

To calculate the simple probability of an event, labeled `P(E)`, we divide the number of ways of getting the event, `n(E)`, by the number of elements in the sample space, `n(S)`. This assumes that every outcome in the sample space is equally likely, as with a fair die.

$\Large P(E) = \frac{n(E)}{n(S)}$

For example, to calculate the probability of rolling a `1` on a six-sided die:

```
E = {1}
S = {1, 2, 3, 4, 5, 6}


n(E) = 1      # number of ways to get the event
n(S) = 6      # number of elements in the sample space

P(E) = 1 / 6
```
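The same calculation can be written in Python by storing `E` and `S` as sets and counting them with `len()`. This sketch uses the standard-library `fractions.Fraction` type (our choice, not something the original notebook uses) so that the result stays an exact fraction rather than a rounded float.

```python
from fractions import Fraction

E = {1}                      # event: rolling a 1
S = {1, 2, 3, 4, 5, 6}       # sample space of a d6

p_exact = Fraction(len(E), len(S))   # n(E) / n(S) as an exact fraction
print(p_exact)          # 1/6
print(float(p_exact))   # 0.16666666666666666
```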

### Using python to calculate some probabilities

Let's start with a simple function that allows us to calculate a simple probability of getting a `1` when we roll a six-sided die.

NOTE: in the gaming community, dice are often referred to with the prefix `d` followed by the number of sides:
* d6 for a six-sided die
* d8 for an eight-sided die
* etc.
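The `dN` naming maps neatly onto a tiny helper that builds the sample space for any die. The function `d` below is hypothetical (it is not used elsewhere in this notebook) and assumes the faces are numbered `1` through `sides`.

```python
def d(sides):
    """Hypothetical helper: return the sample space of a die with `sides` faces."""
    return set(range(1, sides + 1))   # faces are numbered 1..sides

print(len(d(6)), len(d(8)), len(d(20)))   # 6 8 20
print(1 / len(d(8)))                      # P(rolling a 1 on a d8): 0.125
```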

```python
def one_on_a_d6():
    '''Return the probability of rolling a 1 on a d6'''
    
    return 1 / 6
```

```python
one_on_a_d6()
```

```
0.16666666666666666
```

As you can imagine, this is pretty limited. So let's put together a function that is a bit more capable.

```python
def probability(sample_size, event):
    '''Return the probability of an event given the event and the sample size.
    NOTE: the event can only be composed of a single element.'''
    
    count = 1   # hard-coded: assumes the event contains exactly one element
    return count / sample_size
```

Presuming that our event is to roll a `4` on a six-sided die...

```python
probability(sample_size=6, event=4)
```

```
0.16666666666666666
```

The astute observer will notice that `event` doesn't actually do anything in our function: because we assumed the event contains a single element, `count` is hard-coded to `1` and `event` is never used.


### What about multiple elements in an event?

What if we wanted to calculate the probability of rolling a `1, 2, or 3`?

Recall that our probability formula also works for an event with multiple elements. To calculate this probability, labeled `P(E)`, we use the same formula and divide the number of ways of getting the event, `n(E)`, by the number of elements in the sample space, `n(S)`.

$\Large P(E) = \frac{n(E)}{n(S)}$

For example, to calculate the probability of rolling a `1, 2, or 3` on a six-sided die, note that there are 3 elements in our event:

```
n(E) = 3      # number of ways to get the event
n(S) = 6      # number of elements in the sample space

P(E) = 3 / 6 = 0.5
```
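Another way to see why counting works: when outcomes are equally likely, the probability of a multi-element event equals the sum of the probabilities of its individual outcomes. This small check (again using `fractions.Fraction` for exact arithmetic, our own choice) compares the two approaches for rolling a `1, 2, or 3`.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
E = {1, 2, 3}

# Probability of each individual outcome of a fair d6: 1/6 apiece
p = {outcome: Fraction(1, len(S)) for outcome in S}

p_sum = sum(p[outcome] for outcome in E)   # 1/6 + 1/6 + 1/6
p_count = Fraction(len(E), len(S))         # n(E) / n(S) = 3/6

print(p_sum, p_count)   # 1/2 1/2
```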

Our previous function presumed we were only looking at a single element out of a sample space and thus used a hard-coded value of `1` as our `count` for the number of elements in the event. That function doesn't help us if we have multiple elements in our event. Let's look at the probability of rolling a `1, 2, or 3`:

```python
event = [1, 2, 3]   # rolling a 1, 2, or 3

def probability(sample_size, event):
    '''Return the probability of an event given the event and the sample size.
    The event may be composed of either a single element OR multiple elements
    (i.e. a tuple, list, or set).
    '''
    if isinstance(event, (list, tuple, set)):
        count = len(event)   # multiple elements: count them
    else:
        count = 1            # a single element counts as one outcome
    return count / sample_size

probability(6, event)
```

```
0.5
```

For simplicity, let's alias our function as `P`:

```python
P = probability
```

```python
# We want to make sure this works in the base case of a single element in an event,
# i.e. rolling a two on a six-sided die

P(6, 2)
```

```
0.16666666666666666
```

```python
# probability of rolling a four, five, OR a six on a six-sided die

P(6, [4, 5, 6])
```

```
0.5
```

```python
# probability of rolling an odd number on a six-sided die

P(6, [1, 3, 5])
```

```
0.5
```
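One limitation of `probability` is that it trusts the caller: a list such as `[6, 6]` is counted as two outcomes, and values that cannot occur (a `7` on a d6) are counted too. The sketch below is a hypothetical alternative, `probability_from_sets`, that takes the sample space itself instead of its size and uses set intersection (`&`) to count only valid, distinct outcomes.

```python
def probability_from_sets(sample_space, event):
    """Hypothetical variant: P(E) = n(E) / n(S), computed from the sets themselves.

    Only outcomes that actually belong to the sample space are counted,
    and duplicates are ignored because both arguments are converted to sets.
    """
    sample_space = set(sample_space)
    event = set(event) & sample_space   # keep only valid outcomes
    return len(event) / len(sample_space)

d6 = range(1, 7)
print(probability_from_sets(d6, [4, 5, 6]))   # 0.5
print(probability_from_sets(d6, [1, 3, 5]))   # 0.5
print(probability_from_sets(d6, [6, 6, 7]))   # 7 is not on a d6, 6 counts once: 0.16666666666666666
```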

## Venn diagrams

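Venn diagrams are often used to represent elements in a sample space using simple pictures.

For example, a Venn diagram of rolling an odd number on a d10 would show a rectangle containing all ten outcomes, with a circle inside it enclosing the odd numbers `1, 3, 5, 7, 9`; the even numbers `2, 4, 6, 8, 10` sit in the rectangle, outside the circle.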

**NOTE**: 

* When dealing with Venn diagrams, typically the rectangle represents all possible values.
* Values in the circles/ellipses are values that you are interested in.
* For convenience, I explicitly stated that the rectangle holds all the other values, but eventually, I will stop doing so (as that is generally a given).
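Without drawing anything, the regions of that Venn diagram map directly onto Python sets: the rectangle is the sample space, the circle is the event, and set difference (`-`) gives the part of the rectangle outside the circle. This is an illustrative sketch, not code from the original notebook.

```python
S = set(range(1, 11))               # rectangle: every outcome of a d10
odd = {x for x in S if x % 2 == 1}  # circle: the outcomes we care about
outside = S - odd                   # the rest of the rectangle

print("inside the circle: ", sorted(odd))       # [1, 3, 5, 7, 9]
print("outside the circle:", sorted(outside))   # [2, 4, 6, 8, 10]
print("P(odd) =", len(odd) / len(S))            # 0.5
```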


## Experience Points
---

### Complete the following exercises:

It may be useful to create a Venn diagram to help you characterize the Sample Space and the Event.

**A**. Use Python to calculate the probability of rolling a 20 on a d20. To help you, we suggest some inputs representing the element(s) in the event and elements in the sample space.

$E = \{20\}$

$S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20\}$

$\large P(E) = \frac{n(E)}{n(S)}$


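```python
E = {20}                  # event: rolling a 20
S = set(range(1, 21))     # sample space: 1 through 20
prob = len(E) / len(S)    # P(E) = n(E) / n(S); named `prob` so our function alias `P` isn't overwritten
print("probability of rolling a 20 on a d20:", prob)
```

```
probability of rolling a 20 on a d20: 0.05
```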


**B**. Use Python to calculate the probability of rolling an even number on a d10. To help you, we suggest some inputs representing the element(s) in the event and elements in the sample space.

$E = \{2, 4, 6, 8, 10\}$

$S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$

$\large P(E) = \frac{n(E)}{n(S)}$

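```python
E = {2, 4, 6, 8, 10}      # event: rolling an even number
S = set(range(1, 11))     # sample space: 1 through 10
prob = len(E) / len(S)    # 5 / 10; `prob` keeps the alias `P` intact
print("probability of rolling an even number on a d10:", prob)
```

```
probability of rolling an even number on a d10: 0.5
```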


**C**. Use Python to calculate the probability of picking an Ace from a standard 52-card deck. To help you, we suggest some inputs representing the element(s) in the event and elements in the sample space.

As a shortcut, we will represent the cards using this nomenclature:

Ace = A, King = K, Queen = Q, Jack = J, numbered cards are represented by number

Club = C, Spade = S, Diamond = D, Heart = H

Some example cards would be:

* Ace of Spades: AS
* King of Hearts: KH
* Queen of Clubs: QC
* Ten of Diamonds: 10D
* Nine of Spades: 9S

Thus... our Event and Sample Space would look like this:

$E = \{AC, AS, AD, AH\}$

$$
\begin{aligned}
S = \{&2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C, JC, QC, KC, AC, \\
&2S, 3S, 4S, 5S, 6S, 7S, 8S, 9S, 10S, JS, QS, KS, AS, \\
&2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, JD, QD, KD, AD, \\
&2H, 3H, 4H, 5H, 6H, 7H, 8H, 9H, 10H, JH, QH, KH, AH\}
\end{aligned}
$$

$\large P(E) = \frac{n(E)}{n(S)}$



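```python
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['C', 'S', 'D', 'H']

S = {rank + suit for suit in suits for rank in ranks}   # all 52 cards, e.g. 'AS', '10D'
E = {card for card in S if card.startswith('A')}        # the four Aces
prob = len(E) / len(S)                                  # 4 / 52; `prob` keeps the alias `P` intact
print("probability of picking an Ace from a standard 52-card deck:", prob)
```

```
probability of picking an Ace from a standard 52-card deck: 0.07692307692307693
```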


**D**. Use Python to calculate the probability of a day in May 2018 falling on a weekend. Practice figuring out what the Event and the Sample Space look like, then use your answers to calculate the probability.

$E = ?$

$S = ?$

$\large P(E) = \frac{n(E)}{n(S)}$

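```python
import calendar
import datetime

days_in_may = calendar.monthrange(2018, 5)[1]   # number of days in May 2018 (31)

# Sample space: every date in May 2018 (range() excludes its end, hence the + 1)
S = [datetime.date(2018, 5, day) for day in range(1, days_in_may + 1)]
# Event: dates falling on Saturday (weekday 5) or Sunday (weekday 6)
E = [date for date in S if date.weekday() >= 5]

prob = len(E) / len(S)   # 8 / 31
print("probability of a day in May 2018 falling on a weekend:", prob)
```

```
probability of a day in May 2018 falling on a weekend: 0.25806451612903225
```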


**E**. Use Python to calculate the probability of rolling an even number on a d10.

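```python
E = {2, 4, 6, 8, 10}      # event: an even roll
S = set(range(1, 11))     # sample space of a d10
prob = len(E) / len(S)
print("probability of rolling an even number on a d10:", prob)
```

```
probability of rolling an even number on a d10: 0.5
```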


**F**. Use Python to calculate the probability of drawing a face card (Jack, Queen, King) from a deck of cards that includes not only the standard 52 cards, but two jokers, as well.

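```python
n_E = 12        # face cards: Jack, Queen, King in each of 4 suits (3 x 4)
n_S = 54        # 52 standard cards + 2 jokers
prob = n_E / n_S
print("probability of drawing a face card:", prob)
```

```
probability of drawing a face card: 0.2222222222222222
```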


# Exclusive events and intersections
---

Sometimes events are mutually exclusive (they share no outcomes), and adding their probabilities works as expected.

In other cases, events are not exclusive (i.e. they have some degree of overlap or intersection), and simply adding their probabilities together gives the wrong answer.

## Exclusive events

Here is a simple example of a pair of exclusive events:

On a d10, what is the probability of rolling an even number OR an odd number?

$P(even) = 0.5$

$P(odd) = 0.5$

$P(even \ OR \ odd) = 0.5 + 0.5 = 1.0 \ (i.e. \ this \ outcome \ is \ guaranteed)$

**NOTE**: you can't get higher than a **1.0** when calculating probabilities. Probabilities range from **zero** (it will not happen) to **one** (it will happen, guaranteed).
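Python's sets can confirm that two events are exclusive: `set.isdisjoint()` returns `True` when they share no outcomes, and in that case the probability of the union equals the sum of the individual probabilities. The helper `exact_p` is our own illustrative name (chosen so it doesn't clash with the notebook's `P` alias), and it uses `fractions.Fraction` to avoid float rounding.

```python
from fractions import Fraction

S = set(range(1, 11))                 # outcomes of a d10
even = {x for x in S if x % 2 == 0}
odd = {x for x in S if x % 2 == 1}

def exact_p(event):
    """Exact probability of an event on this sample space (equally likely outcomes)."""
    return Fraction(len(event), len(S))

print(even.isdisjoint(odd))         # True: the events are exclusive
print(exact_p(even) + exact_p(odd)) # 1
print(exact_p(even | odd))          # 1: same as the sum, since nothing is double-counted
```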


## Intersecting events

Consider the probability you roll a number higher than 5 OR roll an even number on a d10. Let's start by creating our Event sets...
    
$E(greater\_than\_five) = \{6, 7, 8, 9, 10\}$

$E(even) = \{2, 4, 6, 8, 10\}$

$S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$



$P(greater\_than\_five) = \frac{5}{10} = 0.5$

$P(even) = \frac{5}{10} = 0.5$

Simply adding these two probabilities, as we did for exclusive events, implies that...

$P(greater\_than\_five \ OR \ even) = 1.0$

...which is clearly not correct. The **correct answer is**: 

$P(greater\_than\_five \ OR \ even) = \frac{7}{10} = 0.7$







### Wait, what went wrong?

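Why does this fail? Simply adding probabilities only works for exclusive events. Here the two events overlap: `6`, `8`, and `10` are both greater than five AND even, so adding the two probabilities counts those three outcomes twice. The combined event contains each outcome only once: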

$E(greater\_than\_five \ OR \ even) = \{2, 4, 6, 7, 8, 9, 10\}$

The general formula (often called the addition rule) subtracts the overlap so that it is only counted once:


$\large P(A \ \cup \ B) = \frac{n(A) \ + \ n(B) \ - \ n(A \cap B)}{n(S)}$    

In the formula above, the intersection between two events is symbolized by this shape:

$\Large \cap$

And thus the intersection (or overlapping portion in a Venn diagram) is represented by:

$\large n(A \cap B)$

Similarly, the union between two events is symbolized by this shape:

$\Large \cup$

And thus the number of outcomes in the union (everything in A, in B, or in both) is represented by:

$\large n(A \cup B)$


With this in mind, our previous calculation looks more like this:
    
$\large P(A \ \cup \ B) = \frac{n(A) \ + \ n(B) \ - \ n(A \cap B)}{n(S)}$   

$\large P(A \ \cup \ B) = \frac{5 \ + \ 5 \ - \ 3}{10} = \frac{7}{10}$

$\large P(A \ \cup \ B) = 0.7$
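Python's built-in `set` type implements these operations directly: `&` computes the intersection ($\cap$) and `|` the union ($\cup$). This short sketch reproduces the worked example above and confirms that the addition rule and a direct count of the union agree.

```python
S = set(range(1, 11))                    # sample space of a d10
A = {x for x in S if x > 5}              # greater than five: {6, 7, 8, 9, 10}
B = {x for x in S if x % 2 == 0}         # even: {2, 4, 6, 8, 10}

print(sorted(A & B))   # intersection A ∩ B: [6, 8, 10]
print(sorted(A | B))   # union A ∪ B: [2, 4, 6, 7, 8, 9, 10]

# The addition rule and a direct count of the union agree
n_union = len(A) + len(B) - len(A & B)    # 5 + 5 - 3
print(n_union, len(A | B))                # 7 7
print(len(A | B) / len(S))                # 0.7
```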
    

### Putting this into practice

Let's explore this a bit using our Python function:

What is the probability of rolling greater than a 5 on a d10 OR rolling an even number?

```python
answer = P(10, [6, 7, 8, 9, 10]) + P(10, [2, 4, 6, 8, 10]) - P(10, [6, 8, 10])
print(answer)
```

```
0.7
```


What is the probability of drawing a red card from a standard deck OR drawing a King?

As you can imagine, listing every element becomes tedious as our events grow (there are 26 red cards to type out)...

```python
answer = P(52, ['AD', '2D', '3D', '4D', ...]) + P(52, ['KS', 'KC', 'KD', 'KH']) - P(52, ['KD', 'KH'])
print(answer)
```
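Alternatively, rather than typing out every card, we could let Python generate the deck and filter it. This sketch is an illustration, not part of the original notebook: it builds the 52 cards with `itertools.product`, using the same rank-plus-suit notation as the exercises (`'KH'`, `'10D'`), then applies the addition rule to the red-or-King question.

```python
from fractions import Fraction
from itertools import product

ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['C', 'S', 'D', 'H']

# Build all 52 cards, e.g. '10D' or 'KH'
deck = {rank + suit for rank, suit in product(ranks, suits)}

red = {card for card in deck if card[-1] in 'DH'}      # Diamonds and Hearts
kings = {card for card in deck if card[:-1] == 'K'}    # rank is everything but the suit letter

n = len(red) + len(kings) - len(red & kings)           # 26 + 4 - 2
print(n, Fraction(n, len(deck)))                       # 28 7/13
```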

To get around this, we can alter our probability function so that it also accepts the size of the event directly, instead of a list of its elements.

```python
def probability(sample_size, event=None, event_size=None):
    '''Return the probability of an event given the event and the sample size.
    
    event: the event argument may be composed of either
    * multiple elements (i.e. a tuple OR list) (which are then counted) OR
    * a single element which is equated internally to a count of one
    
    event_size: As an alternative to providing an event, you can simply provide your own count using
    the event_size argument:
    * an integer representing the number of elements in an event.
    
    Only one of these two (event OR event_size) should be provided.
    
    '''
        
    if isinstance(event, (list, tuple)):
        count = len(event)
    elif isinstance(event_size, int):
        count = event_size
    else:
        count = 1
    return count / sample_size

# Again, let's provide an alias for our probability function...

P = probability
```

```python
# red card
# OR
# king

answer = P(52, event_size=26) + P(52, event_size=4) - P(52, event_size=2)
```

As a sanity check, let's take a quick look at each of the components of the formula.

```python
print('P(red)', P(52, event_size=26))
print('P(king)', P(52, event_size=4))
print('P(red kings)', P(52, event_size=2))
```

```
P(red) 0.5
P(king) 0.07692307692307693
P(red kings) 0.038461538461538464
```

```python
# So, what is the answer?

print(answer)
```

```
0.5384615384615384
```


### Let's try it again...

What is the probability of drawing a red card from a standard deck OR drawing a face card (King, Queen, Jack)?

```python
answer = P(52, event_size=26) + P(52, event_size=12) - P(52, event_size=6)
print(answer)
```

```
0.6153846153846154
```

```python
# If the problem is simple enough to do the arithmetic in your head, then that is fine, too:

# 26 + 12 - 6 = 32

P(52, event_size=32)
```

```
0.6153846153846154
```

Imagine that the 46 players in your board game group have favorite games in the following categories:

* Pandemic ONLY (12 players)
* Pandemic AND Carcassonne (6)
* Carcassonne ONLY (16)
* Carcassonne AND Clank! (4)
* Clank! ONLY (8)

What is the probability of finding a **Pandemic** player OR a **Carcassonne** player?

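```python
# Pandemic players: 12 + 6 = 18; Carcassonne players: 6 + 16 + 4 = 26; both: 6
# Every term uses the group size, 46, as the sample size
answer = P(46, event_size=18) + P(46, event_size=26) - P(46, event_size=6)
print(answer)
```

```
0.8260869565217391
```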


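What is the probability of finding a **Carcassonne** player OR a **Clank!** player?

```python
# Carcassonne players: 6 + 16 + 4 = 26; Clank! players: 4 + 8 = 12; both: 4
# so the union holds 26 + 12 - 4 = 34 players
answer = P(46, event_size=34)
print(answer)
```

```
0.7391304347826086
```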
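Because the five categories above are the non-overlapping regions of a Venn diagram, another way to check these answers is to add up every region that touches either game; each player is then counted exactly once, so no subtraction is needed. The dictionary layout and the function name `players_who_like_either` are illustrative choices, not part of the original notebook.

```python
# Counts for each region of the Venn diagram (from the list above)
regions = {
    ('Pandemic',): 12,
    ('Pandemic', 'Carcassonne'): 6,
    ('Carcassonne',): 16,
    ('Carcassonne', 'Clank!'): 4,
    ('Clank!',): 8,
}
total = sum(regions.values())   # 46 players

def players_who_like_either(game_a, game_b):
    """Count players in every region that touches game_a OR game_b (no double counting)."""
    return sum(count for games, count in regions.items()
               if game_a in games or game_b in games)

print(players_who_like_either('Pandemic', 'Carcassonne'), total)   # 38 46
print(players_who_like_either('Carcassonne', 'Clank!'), total)     # 34 46
```

Dividing each count by `total` gives the same probabilities as the addition-rule calculations (38/46 and 34/46).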


## Experience Points
---

### Complete the following exercises:

**A**. Use Python to calculate the probability of drawing a Black card OR an Ace, King, Queen, Jack, or 10

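```python
# 26 black cards + 16 (K, Q, J, 10) + 4 Aces - 10 black cards that are A, K, Q, J, or 10
answer = P(52, event_size=26) + P(52, event_size=16) + P(52, event_size=4) - P(52, event_size=10)
print(answer)
```

```
0.6923076923076923
```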


**B**. Use Python to calculate the probability of drawing a Diamond card OR an odd numeric card (3, 5, 7, 9)

```python
# 13 Diamonds + 16 odd numeric cards (3, 5, 7, 9 in 4 suits) - 4 odd numeric Diamonds
answer = P(52, event_size=13) + P(52, event_size=16) - P(52, event_size=4)
print(answer)
```

```
0.4807692307692308
```


**C**. Use Python to calculate the probability of drawing a Diamond or Spade card OR a numeric card (2 - 10)

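```python
# 13 Diamonds + 13 Spades (exclusive, so no overlap between them)
# + 36 numeric cards (2-10 in 4 suits) - 18 numeric Diamonds or Spades
answer = P(52, event_size=13) + P(52, event_size=13) + P(52, event_size=36) - P(52, event_size=18)
print(answer)
```

```
0.8461538461538461
```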
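As a cross-check on exercises A to C, here is a sketch that builds the 52-card deck as a set (using the same rank-plus-suit notation as before) and counts each union directly with the `|` operator. The helpers `rank`, `suit`, and `exact_p` are our own illustrative names; results are shown as exact fractions, which you can compare with the decimal answers above (for example, 9/13 ≈ 0.6923).

```python
from fractions import Fraction
from itertools import product

ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['C', 'S', 'D', 'H']
deck = {rank + suit for rank, suit in product(ranks, suits)}

def rank(card):
    return card[:-1]   # everything except the final suit letter

def suit(card):
    return card[-1]

def exact_p(event):
    """Exact probability of drawing a card from `event` (52 equally likely cards)."""
    return Fraction(len(event), len(deck))

black = {c for c in deck if suit(c) in 'CS'}
high = {c for c in deck if rank(c) in ('A', 'K', 'Q', 'J', '10')}
diamonds = {c for c in deck if suit(c) == 'D'}
odd_numeric = {c for c in deck if rank(c) in ('3', '5', '7', '9')}
diamonds_or_spades = {c for c in deck if suit(c) in 'DS'}
numeric = {c for c in deck if rank(c) in ranks[:9]}   # '2' through '10'

# Each union is counted directly, so the overlap can never be double-counted
print(exact_p(black | high))                   # A: 9/13
print(exact_p(diamonds | odd_numeric))         # B: 25/52
print(exact_p(diamonds_or_spades | numeric))   # C: 11/13
```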
