# Introduction to probability

We covered a bit of probability in the last mission, but we'll go into more depth here and build a strong foundation. Before we do that, let's introduce our dataset. It contains information on the flags of countries around the world. Each row is a country. Here are the relevant columns:

* `name` -- name of the country
* `landmass` -- which continent the country is in (1=N.America, 2=S.America, 3=Europe, 4=Africa, 5=Asia, 6=Oceania)
* `area` -- country area, in thousands of square kilometers
* `population` -- rounded to the nearest million
* `bars` -- Number of vertical bars in the flag
* `stripes` -- Number of horizontal stripes in the flag
* `colors` -- Number of different colors in the flag
* `red, green, blue, gold, white, black, orange` -- 0 if color absent, 1 if color present in the flag

This data was collected from Collins Gem Guide to Flags. It was written in 1986, so some flag information may be out of date!

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [2]:
flags = pd.read_csv('data/flags.csv')
flags.head()

Unnamed: 0,name,landmass,zone,area,population,language,religion,bars,stripes,colors,...,saltires,quarters,sunstars,crescent,triangle,icon,animate,text,topleft,botright
0,Afghanistan,5,1,648,16,10,2,0,3,5,...,0,0,1,0,0,1,0,0,black,green
1,Albania,3,1,29,3,6,6,0,0,3,...,0,0,1,0,0,0,1,0,red,red
2,Algeria,4,1,2388,20,8,2,2,0,3,...,0,0,1,1,0,0,0,0,green,white
3,American-Samoa,6,3,0,0,1,1,0,0,5,...,0,0,0,0,1,1,1,0,blue,red
4,Andorra,3,1,0,0,6,0,3,0,3,...,0,0,0,0,0,0,0,0,blue,red


In [3]:
flags.sort_values(by='bars', ascending=False).iloc[:5]

Unnamed: 0,name,landmass,zone,area,population,language,religion,bars,stripes,colors,...,saltires,quarters,sunstars,crescent,triangle,icon,animate,text,topleft,botright
161,St-Vincent,1,4,0,0,1,1,5,0,4,...,0,0,0,0,0,1,1,1,blue,green
143,Rwanda,4,2,26,5,10,5,3,0,4,...,0,0,0,0,0,0,0,1,red,green
88,Ivory-Coast,4,4,323,7,3,5,3,0,3,...,0,0,0,0,0,0,0,0,red,green
85,Ireland,3,4,70,3,1,0,3,0,3,...,0,0,0,0,0,0,0,0,green,orange
30,Cameroon,4,1,474,8,3,1,3,0,3,...,0,0,1,0,0,0,0,0,green,gold


* Find the country with the most bars in its flag. Assign the name of the country to most_bars_country.
* Find the country with the highest population (as of 1986). Assign the name of the country to highest_population_country.

In [7]:
most_bars_country = flags.sort_values(by='bars', ascending=False).iloc[0]['name']
most_bars_country

'St-Vincent'

In [8]:
highest_population_country = flags.sort_values(by='population', ascending=False).iloc[0]['name']
highest_population_country

'China'

* Determine the probability of a country having a flag with the color orange in it. Assign the result to orange_probability.
* Determine the probability of a country having a flag with more than 1 stripe in it. Assign the result to stripe_probability.

In [5]:
orange_probability = len(flags[flags['orange']==1])/(flags.shape[0])
orange_probability

0.13402061855670103

In [7]:
stripe_probability = len(flags[flags['stripes']>1])/flags.shape[0]
stripe_probability

0.41237113402061853

### Conjunctive probabilities

Let's say we have a coin that we flip 5 times, and we want to find the probability that it will come up heads every time. This is called **a conjunctive probability, because it involves a sequence of events**. We want to find the probability that the first flip is heads and the second flip is heads, and so on. Because each flip is independent of the others, we can multiply the individual probabilities: `.5 * .5 * .5 * .5 * .5`, or `.5 ** 5`, which is about .031.
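As a rough sketch of the five-flip case, the snippet below multiplies the single-flip probability and then cross-checks the result by listing every possible sequence of flips. It assumes a fair coin, and the variable names are illustrative only.

```python
from itertools import product

# Assumption: a fair coin, so heads has probability 0.5 on every flip.
p_heads = 0.5
n_flips = 5

# The flips are independent, so multiply the single-flip probability n_flips times.
five_heads = p_heads ** n_flips

# Cross-check: enumerate all equally likely sequences of 5 flips
# and count the ones made up entirely of heads.
sequences = list(product("HT", repeat=n_flips))
all_heads = [seq for seq in sequences if all(flip == "H" for flip in seq)]
five_heads_enumerated = len(all_heads) / len(sequences)

print(five_heads, five_heads_enumerated)
```

Only one of the 32 possible sequences is all heads, which is why the two numbers agree.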

* Find the probability that 10 flips in a row will all turn out heads. Assign the probability to ten_heads.
* Find the probability that 100 flips in a row will all turn out heads. Assign the probability to hundred_heads.

In [8]:
ten_heads = (.5)**10
hundred_heads = (.5)**100

### Dependent probabilities

Let's say that we're picking countries from the sample, and removing them when we pick. Each time we pick a country, we reduce the sample size for the next pick. **The events are dependent -- the number of countries available to pick depends on the previous pick**. We can't just calculate the probability upfront and take a power in this case -- we need to recompute the probability after each selection happens. <br>

Let's simplify the example a bit by saying that we're eating some M&Ms. **There are 10 M&Ms left in the bag: 5 are green, and 5 are blue. What is the probability of getting 3 blue candies in a row?** The probability of getting the first blue candy is 5/10, or 1/2. When we pick a blue candy, though, we remove it from the bag, so the probability of getting another is 4/9. The probability of picking a third blue candy is 3/8. This means our final probability is 1/2 * 4/9 * 3/8, or about .0833. So, there is an 8.3% chance of picking three blue candies in a row.
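The M&M arithmetic can be sketched with exact fractions, recomputing the probability after each pick. The helper name `prob_all_matching_without_replacement` is made up for this illustration.

```python
from fractions import Fraction

def prob_all_matching_without_replacement(matching, total, picks):
    """Probability that `picks` draws in a row all come from the `matching`
    items, when every drawn item is removed from the pool."""
    probability = Fraction(1)
    for drawn in range(picks):
        # After `drawn` successful picks, both counts have shrunk by `drawn`.
        probability *= Fraction(matching - drawn, total - drawn)
    return probability

# 5 blue candies out of 10, picking 3 in a row: 5/10 * 4/9 * 3/8
three_blue = prob_all_matching_without_replacement(matching=5, total=10, picks=3)
print(three_blue, float(three_blue))
```

The exact result is 1/12, which rounds to the .0833 quoted above.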

* Let's say that we're picking countries from our dataset, and removing each one that we pick.
* What is the probability of picking three countries with red in their flags in a row? Assign the resulting probability to three_red.

In [9]:
red_countries_count = flags[flags['red']==1].shape[0]
all_countries_count = flags.shape[0]

# each pick removes one red country from both the red pool and the total pool
first_red = red_countries_count/all_countries_count
second_red = (red_countries_count-1)/(all_countries_count-1)
third_red = (red_countries_count-2)/(all_countries_count-2)

three_red = first_red * second_red * third_red
three_red

0.4884855242775493

### Disjunctive probability

Let's say we're rolling a six-sided die -- the probability of rolling a `2` is `1/6`.<br>

What if we want to know the probability of rolling a 2 or a 3 on a single roll? **We can just add the probabilities, because the two events are mutually exclusive**: a single roll can't come up both 2 and 3, so there is no overlap to double count. Thus, the probability is `1/6` + `1/6`, or `1/3`.
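A small sketch of the die example: since a single roll cannot show both a 2 and a 3, counting the faces that satisfy "2 or 3" directly should give the same answer as adding the two separate probabilities. The `probability` helper is a hypothetical name and assumes a fair six-sided die.

```python
from fractions import Fraction

# Assumption: a fair die, so all six faces are equally likely.
faces = [1, 2, 3, 4, 5, 6]

def probability(event):
    """Fraction of faces for which `event(face)` is true."""
    favorable = [face for face in faces if event(face)]
    return Fraction(len(favorable), len(faces))

p_two = probability(lambda face: face == 2)
p_three = probability(lambda face: face == 3)

# Counting the combined event directly, without adding.
p_two_or_three = probability(lambda face: face in (2, 3))

print(p_two + p_three, p_two_or_three)
```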

* Let's say we have a random number generator that generates numbers from `1` to `18000`.
* What is the probability of getting a number evenly divisible by `100`, with no remainder? (i.e. `100`, `200`, `300`, etc). Assign the result to `hundred_prob`.
* What is the probability of getting a number evenly divisible by `70`, with no remainder? (i.e. `70`, `140`, `210`, etc). Assign the result to `seventy_prob`.

In [10]:
hundred_prob = (1/18000)*(18000/100)
hundred_prob

0.01

In [11]:
18000/70

257.14285714285717

In [13]:
seventy_prob = (1/18000)*int(18000/70)
seventy_prob

0.014277777777777778
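As a sanity check on the two results above, one option is to count the multiples by brute force. This sketch assumes the generator picks uniformly from the integers `1` through `18000` inclusive; the `*_check` names are illustrative.

```python
# Assumption: every integer from 1 to 18000 (inclusive) is equally likely.
upper = 18000
numbers = range(1, upper + 1)

# Count the multiples directly instead of dividing.
hundred_count = sum(1 for n in numbers if n % 100 == 0)
seventy_count = sum(1 for n in numbers if n % 70 == 0)

hundred_check = hundred_count / upper
seventy_check = seventy_count / upper

print(hundred_count, hundred_check)
print(seventy_count, seventy_check)
```

The counts should match the floor divisions `18000 // 100` and `18000 // 70`, which is why the fractional part of `18000/70` is dropped.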

### Disjunctive dependent probabilities

Let's think about a slightly more complex case with dependencies. **What if we have a set of 10 cars -- 5 are red and 5 are blue. 5 of the 10 are convertibles, and 5 are sport utility vehicles**. <br>

If we wanted to find cars that `were red or were convertibles`, we might try to **add the probability of the car being red to the probability of the car being a convertible**. This would give us `1/2` + `1/2` == `1`. But, this is wrong, as it tells us that all 10 cars are either red or convertibles. <br>

**It's wrong because it assumes that the two traits (color and vehicle type) never occur together**, when in fact they do. Some of the cars are `red and convertibles`. If we don't account for this **overlap**, we count those cars twice and end up with an **inflated count**. <br>

Let's say that we have **3 cars that are red and convertibles**. Our probability for red or convertible then comes out to (1/2 + 1/2) - 3/10. We subtract 3/10 to account for the cars we double counted when we computed (1/2 + 1/2). This gives us a .7 probability of a car being a convertible or red.
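The double-counting correction can be sketched with Python sets. The car IDs below are hypothetical, chosen only so that the counts match the example: 5 red cars, 5 convertibles, and 3 cars that are both.

```python
# Hypothetical car IDs 0-9, arranged to match the counts in the text.
total_cars = 10
red = {0, 1, 2, 3, 4}
convertible = {2, 3, 4, 5, 6}

# Cars counted twice if we simply add the two probabilities.
both = red & convertible

# Add the two probabilities, then subtract the overlap once.
p_red_or_convertible = (len(red) + len(convertible) - len(both)) / total_cars

# Cross-check: count the union directly.
p_direct = len(red | convertible) / total_cars

print(len(both), p_red_or_convertible, p_direct)
```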

* Find the probability of a flag having red or orange as a color. Assign the result to red_or_orange.
* Find the probability of a flag having at least one stripe or at least one bar. Assign the result to stripes_or_bars.

In [23]:
all_count = flags.shape[0]
red_count = flags[flags['red']==1].shape[0]
orange_count = flags[flags['orange']==1].shape[0]
red_orange_count = flags[(flags['red']==1) &\
                         (flags['orange']==1)].shape[0]


red_or_orange = (red_count/all_count)\
                + (orange_count/all_count)\
                - (red_orange_count/all_count)
red_or_orange

0.8247422680412371

In [24]:
stripe_gte1_count = flags[flags['stripes'] > 0].shape[0]
bar_gte1_count = flags[flags['bars'] > 0].shape[0]
stripe_bar_gte1_count = flags[(flags['stripes'] > 0) &\
                             (flags['bars'] > 0)].shape[0]

stripes_or_bars = (stripe_gte1_count/all_count) +\
                    (bar_gte1_count/all_count) -\
                    (stripe_bar_gte1_count/all_count)
        
stripes_or_bars

0.5927835051546392

### Disjunctive probabilities with multiple conditions

We've looked at disjunctive probabilities in cases where there are only two conditions (A or B). **But what if we have three or more conditions?** <br>

Let's say we have `10 cars` again. 
* 5 are red and 5 are blue. 
* 5 are convertibles and 5 are sport utility vehicles. 
* 5 have a top speed of 130mph, and 5 have a top speed of 110mph.

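Let's say we want to find all cars that are `red` or `convertibles` or have a top speed of `130mph`. Let's say 2 cars meet all three criteria. If we tried to apply the formula from before and subtracted only that overlap, we would end up with 1/2 + 1/2 + 1/2 - 1/5, or a 1.3 probability. This is clearly false, as we can't have a probability greater than 1. The two-condition formula breaks down here because it doesn't account for the pairwise overlaps (red convertibles, red 130mph cars, and 130mph convertibles). <br>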

One easy way to solve cases like this is to work backwards from the cars that match none of the criteria:
1. Find everything that doesn't match our criteria first.
   * In this case, we'd look for blue sport utility vehicles with a top speed of 110mph.
2. Subtract that probability from 1 to get the probability of red or convertible or 130mph top speed.
   * Let's say there are 2 vehicles that are blue and sport utility vehicles and have a 110mph top speed.
   * We would get a 1 - .2 or .8 probability for red or convertible or 130mph top speed.
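The steps above can be sketched on a made-up list of 10 cars. The exact mix of cars is an assumption, picked so that it matches the numbers in the text: 5 red, 5 convertibles, 5 with a 130mph top speed, 2 that meet all three criteria, and 2 that meet none.

```python
# Each hypothetical car is a (color, body, top_speed_mph) tuple.
cars = (
    [("red", "convertible", 130)] * 2   # meet all three criteria
    + [("red", "convertible", 110)]
    + [("red", "suv", 130)]
    + [("blue", "convertible", 130)]
    + [("red", "suv", 110)]
    + [("blue", "convertible", 110)]
    + [("blue", "suv", 130)]
    + [("blue", "suv", 110)] * 2        # meet none of the criteria
)

def matches_any(car):
    """True if the car is red, or a convertible, or has a 130mph top speed."""
    color, body, speed = car
    return color == "red" or body == "convertible" or speed == 130

# Step 1: find the cars that match none of the criteria.
none_match = [car for car in cars if not matches_any(car)]

# Step 2: subtract their probability from 1.
p_any_via_complement = 1 - len(none_match) / len(cars)

# Cross-check by counting the matching cars directly.
p_any_direct = sum(matches_any(car) for car in cars) / len(cars)

print(len(none_match), p_any_via_complement, p_any_direct)
```

Working from the complement avoids tracking every pairwise and three-way overlap between the conditions.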

* Let's say we have a coin that we're flipping. Find the probability that **at least one of the first three flips comes up heads**. Assign the result to heads_or.

In [25]:
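# probability of no heads in the first three flips: tails three times in a row
no_head_first3 = (.5)**3

# at least one head is the complement of no heads at all
heads_or = 1 - no_head_first3
heads_or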

0.875