# Probability refresher

Suggested reading before class:<br>
[Math is Fun: Probability](https://www.mathsisfun.com/data/probability.html)


Probability is all about the **chances of an event occurring**, or how likely an event is to occur, in a set of events.

If you really think about it, you've probably been thinking about probability all of your life, such as if you've ever wondered about:


> -  The chances of it raining today
> -  The chances of winning the lottery
> -  The chances of getting hired at Google


To really make sense of **the chances** of an event occurring, we need to look at a bit of math through **probability**.

<br>
In math, probability is modeled by the expression:<br><br>
$P(A) = \frac{\text{count of } A}{\text{sample space}}$<br><br>

$ P(A) $ is the probability of the event $ A $ occurring in a set of observed events (the sample space)<br>
$ \text{count of } A $ is the number of observed events in which $ A $ occurred<br>
$ \text{sample space} $ is the total number of observed events


The result is always a number between 0 and 1. The closer it is to 0, the less likely the event is to occur, with a value of 0 meaning it didn't happen at all. The closer it is to 1, the more likely the event is to occur, with a value of 1 meaning it happened in every single case.


<img src="https://www2.southeastern.edu/Academics/Faculty/dgurney/Math241/StatTopics/PrbScl4.jpg" />


You'll often see this represented in data sets in a number of formats. Here are some examples:<br><br>

| Hired|
|------|
| false|
| true |
| true |
| false|

<br>
<br>

| Won Lottery |
| ----|
| yes |
| no  |
| no  |
| no  |

<br>
<br>

| Survived |
| ----|
| 0 |
| 1  |
| 0  |
| 1 |
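
Whatever the format, the probability is computed the same way: count the rows where the event happened and divide by the total number of rows. A minimal sketch, assuming the three tables above are stored as plain Python lists (the helper name `probability_of` is made up for this illustration):

```python
# Hypothetical columns mirroring the three example tables
hired = [False, True, True, False]
won_lottery = ["yes", "no", "no", "no"]
survived = [0, 1, 0, 1]


def probability_of(values, event_value):
    """Share of observations that equal event_value."""
    event_count = sum(1 for v in values if v == event_value)
    return event_count / len(values)


p_hired = probability_of(hired, True)        # 2 of 4 rows
p_won = probability_of(won_lottery, "yes")   # 1 of 4 rows
p_survived = probability_of(survived, 1)     # 2 of 4 rows

print(p_hired, p_won, p_survived)  # 0.5 0.25 0.5
```

The only thing that changes between formats is which value marks the event as having happened (`True`, `"yes"`, or `1`).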


Let's look at a short example by examining some hiring numbers at Google.<br><br>
[Click here to read an article about the hiring stats at Google](https://qz.com/285001/heres-why-you-only-have-a-0-2-chance-of-getting-hired-at-google/)<br>
> __Google gets around 3 million applications a year now, according to HR head Laszlo Bock, and hires 7,000 .... making it far more selective than institutions like Harvard, Yale, and Stanford.__<br>


<br><br>



So we have 3 million **observed events** (in this case, the event is submitting an application).<br>
In 7,000 of those 3 million, a hiring occurred.


Let's model that with some code!




In [2]:

# total number of applicants to Google
num_of_applicants = 3000000

# here we have the number of people that actually got hired out of that group
total_hires = 7000


def probability(event_count, sample_space):
    p =  event_count/sample_space
    return p

prob = probability(total_hires, num_of_applicants)

This number can be turned into more human-readable forms, such as a percentage, by multiplying it by 100.

In [5]:
def percentage(prob):
    # scale the 0-1 probability to a 0-100 percentage
    pct = prob * 100
    return '{:.4f}% chance of occurrence'.format(pct)
percentage(prob)

'0.2333% chance of occurrence'

To get the fractional form of the probability, divide both the count of $ A $ and the sample space by the count of $ A $, then (if needed) round the denominator. Because of the rounding, the result is an approximation.

In [8]:
def fraction_probability(event_count, sample_space):
    # dividing both parts by event_count gives 1 / (sample_space / event_count)
    denominator = round(sample_space / event_count)
    numerator = int(event_count / event_count)
    return '{}/{} chance of occurrence'.format(numerator, denominator)

fraction_probability(total_hires, num_of_applicants)

'1/429 chance of occurrence'

### Let's plug our numbers from Google into these functions to find the chances of getting hired

In [58]:
# probability of being hired
prob_hired = probability(total_hires, num_of_applicants)
display(prob_hired)

# Percentage of people hired
display(percentage(prob_hired))

# Fraction of people hired
display(fraction_probability(total_hires, num_of_applicants))

0.0023333333333333335

'0.2333% chance of occurrence'

'1/429 chance of occurrence'
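
The `1/429` figure is rounded, since 3,000,000 divided by 7,000 is not a whole number. As an optional cross-check, here is a minimal sketch using `fractions.Fraction` from Python's standard library, which reduces the ratio exactly. The names `exact` and `approx` are only illustrative, and the two counts are repeated so the cell runs on its own.

```python
from fractions import Fraction

# Counts repeated from the Google example so this cell is self-contained
num_of_applicants = 3000000
total_hires = 7000

# Fraction reduces 7000/3000000 to lowest terms without rounding
exact = Fraction(total_hires, num_of_applicants)

# The rounded "1 in N" form, built the same way as fraction_probability
approx = Fraction(1, round(num_of_applicants / total_hires))

print(exact)   # 7/3000
print(approx)  # 1/429
```

The two values are close but not identical, which is worth remembering whenever the "1 in N" form is quoted.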

Use the formulas above to solve the following questions:

1. It rained 3 days last month. What was the probability of it raining? Express this as a percentage and as a fraction as well.
2. You had 28 days last year where your website had over 100,000 unique visitors. What was the probability of any one day having over 100k visitors? What percentage of days had over 100k visitors?
3. Your website crashes every 3rd Tuesday and every 2nd Thursday. What is the probability of a crash occurring in a 31-day month? Express this as a percentage and as a fraction.

# Conditional Probability

Conditional probability takes this a bit further in that it gets more descriptive. The conditional probability of an event $ B $ is the probability that $ B $ will occur, given the knowledge that another event $ A $ has already occurred.

What this means is that we only look at the cases where $ A $ happened, and ask how often $ B $ happened among them.<br>

Example:

> Of those 7000 hired, let's say that 1500 of them have brown hair.<br>
> Given that you were hired, what are the chances that you have brown hair?

Our sample space is still 3 million; the number of applicants hasn't changed.<br>
Our count of hires, $ A $, is still the same at 7000.<br>
The 1500 brown-haired hires, $ B $, are a subset of $ A $ ($ A $ must occur for $ B $ to be possible).
> You'll see this expressed as:<br>
$ B $ is a subset of $ A $<br> or<br>  $B ⊂ A$

The conditional probability of $ B $ given $ A $ is written in mathematics as:

$ P(B|A) = \frac{P(A \text{ and } B)}{P(A)} $

Let's head to Python to see this in action!



In [11]:
# To get the conditional probability P(B|A) = P(A and B) / P(A)
def conditional_prob(subset_count, event_count, sample_space):
    subset_prob = subset_count / sample_space  # P(A and B)
    event_prob = event_count / sample_space    # P(A)
    # return a number (not a string) so it can be passed to the helpers below
    return subset_prob / event_prob



# To get the percentage of the conditional probability
def conditional_prob_percentage(cond_prob):
    percentage = cond_prob * 100
    return '{:.2f}% chance of occurring'.format(percentage)



# To get the fractional representation (rounded to "1 in N")
def conditional_prob_fraction(cond_prob):
    denominator = round(1 / cond_prob)
    return '1 / {} chance of occurrence'.format(denominator)




### To really make this stick, we're going to plug the numbers from our hiring example into the functions!

In [13]:
# Number of hires with brown hair is 1500
num_of_brown_hair = 1500

cond_prob = conditional_prob(num_of_brown_hair, total_hires, num_of_applicants)
display(cond_prob)

cond_percent = conditional_prob_percentage(cond_prob)
display(cond_percent)

fraction = conditional_prob_fraction(cond_prob)
display(fraction)

0.21428571428571427

'21.43% chance of occurring'

'1 / 5 chance of occurrence'
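
It is easy to mix up the joint probability (hired *and* brown-haired, out of all applicants) with the conditional probability (brown-haired, out of those hired). Here is a minimal sketch contrasting the two, with the counts repeated so it runs on its own. The names `p_joint`, `p_hired`, and `p_brown_given_hired` are illustrative and not part of the functions above.

```python
import math

num_of_applicants = 3000000  # sample space
total_hires = 7000           # count of A (hired)
num_of_brown_hair = 1500     # count of A and B (hired with brown hair)

# Joint probability: hired AND brown-haired, out of all applicants
p_joint = num_of_brown_hair / num_of_applicants

# Probability of being hired
p_hired = total_hires / num_of_applicants

# Conditional probability: brown-haired GIVEN hired
p_brown_given_hired = p_joint / p_hired

# The sample space cancels out, so this matches count(A and B) / count(A)
assert math.isclose(p_brown_given_hired, num_of_brown_hair / total_hires)

print(p_joint)                        # 0.0005
print(round(p_brown_given_hired, 4))  # 0.2143
```

The joint probability is tiny because it is measured against every applicant, while the conditional probability only looks inside the group of 7000 hires.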

In [None]:
# Given that a passenger survived, what is the probability that they are male?

DATA = "train.csv"
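
One way to approach this question is sketched below with a few made-up rows rather than the real file. The column names `Survived` and `Sex` are assumptions about `train.csv`, and the six passengers are invented purely for illustration, so the resulting number says nothing about the actual data.

```python
# Hypothetical rows standing in for train.csv; the column names
# "Survived" and "Sex" are assumptions about that file.
passengers = [
    {"Survived": 1, "Sex": "female"},
    {"Survived": 0, "Sex": "male"},
    {"Survived": 1, "Sex": "male"},
    {"Survived": 1, "Sex": "female"},
    {"Survived": 0, "Sex": "male"},
    {"Survived": 0, "Sex": "female"},
]

# Event A: the passenger survived
survivors = [p for p in passengers if p["Survived"] == 1]

# Event A and B: the passenger survived and is male
male_survivors = [p for p in survivors if p["Sex"] == "male"]

# P(male | survived) = count(A and B) / count(A)
p_male_given_survived = len(male_survivors) / len(survivors)

print(round(p_male_given_survived, 4))  # 0.3333
```

With the real file, the same two filters would be applied to the loaded rows instead of the hand-written list.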