# Calculating probabilities

In the last mission, we looked at `probability` and the ideas of `disjunctive`, `dependent`, and `conjunctive` probabilities.

In this mission, we'll go deeper and calculate more complex probabilities. But first, let's look at the dataset we'll be using.

In many countries, there are bikesharing programs where anyone can rent a bike from a depot and return it to another depot elsewhere in the city. There is one such program in Washington, D.C., in the US. We'll be looking at the number of bikes that were rented each day. Here are the relevant columns:

* `dteday` -- the date that we're looking at.
* `casual` -- the number of casual riders (people who hadn't previously signed up with the bikesharing program) who rented bikes that day.
* `registered` -- the number of registered riders (people who had signed up previously) who rented bikes that day.
* `cnt` -- the total number of bikes rented that day.

This data was collected by `Hadi Fanaee-T` at the `University of Porto`, and can be downloaded [here](http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset).

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [2]:
bikes = pd.read_csv('data/bike_rental_day.csv')
print(bikes.shape)
bikes.head()

(731, 16)


Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [4]:
# Count the days on which more than 2000 bikes were rented.
days_over_threshold = bikes[bikes["cnt"] > 2000].shape[0]

# Find the total number of days we have data for.
total_days = bikes.shape[0]

# Get the probability that more than 2000 bikes were rented on any given day.
probability_over_2000 = days_over_threshold / total_days
print(probability_over_2000)

0.86593707250342


Find the probability that more than 4000 bikes were rented on any given day.
* Assign the result to `probability_over_4000`.

In [6]:
days_over_threshold4000 = bikes[bikes["cnt"] > 4000].shape[0]

probability_over_4000 = days_over_threshold4000 / total_days
probability_over_4000

0.6183310533515732
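
The two calculations above differ only in the threshold, so they could be wrapped in a small helper. The following is a minimal sketch, not part of the original mission: `probability_over` and `toy_bikes` are hypothetical names, and the toy rental counts are made up so that the block runs without the CSV file.

```python
import pandas as pd

def probability_over(df, threshold, column="cnt"):
    """Fraction of rows whose `column` value is strictly greater than `threshold`."""
    # Comparing a column to a number gives a boolean Series;
    # the mean of that Series is the fraction of True values.
    return (df[column] > threshold).mean()

# Toy data with made-up rental counts, used only to demonstrate the call.
toy_bikes = pd.DataFrame({"cnt": [985, 801, 1349, 4500, 5200]})

# 2 of the 5 toy days exceed 4000 rentals.
toy_probability = probability_over(toy_bikes, 4000)
print(toy_probability)
```

With the real data loaded as `bikes`, `probability_over(bikes, 4000)` should give the same value as `probability_over_4000`.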

### `Up to` or `greater`

Let's say we flip three coins and want to know the probability of getting `2 or more heads`. To find it, we add the probability of getting exactly 2 heads to the probability of getting exactly 3 heads. The probability that any single coin will be heads is .5 (the probability that the coin will be tails is the same, .5).

The probability of `3 heads` is easy to calculate -- this can only happen in one situation, where all three coins are heads, so the probability is `.5 * .5 * .5`, which equals `.125`.

The probability of `2 heads` is a little trickier -- there are three different combinations of the three coins that result in exactly 2 heads. We show them in the table below, using H for heads and T for tails.

Coin1|Coin2|Coin3
---|---|---
H|H|T
T|H|H
H|T|H

Each one of these combinations has a probability of `.5 * .5 * .5`, so we multiply `3 * .125` to get `.375`, the probability that we'll get exactly 2 heads.

We then add the probability of getting 2 heads to the probability of getting 3 heads to get .5, the probability of getting 2 or more heads when we flip 3 coins.
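
One way to double-check this result is to list every possible outcome and count the ones with at least two heads. The sketch below uses `itertools.product` for the enumeration; the variable names are illustrative. Counting outcomes this way only works because each of the 8 outcomes is equally likely for fair coins.

```python
from itertools import product

# All 2**3 = 8 equally likely outcomes of flipping three fair coins.
outcomes = list(product("HT", repeat=3))

# Keep the outcomes that contain at least two heads.
two_or_more = [outcome for outcome in outcomes if outcome.count("H") >= 2]

prob_two_or_more = len(two_or_more) / len(outcomes)
print(prob_two_or_more)  # 0.5
```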

Find the probability that exactly `1` coin out of `3` is heads.
* Assign the result to `coin_1_prob`.

In [7]:
coin_1_prob = 3 * (.5)**3
coin_1_prob

0.375

### Number of combinations

In the last screen, we found that there were exactly 3 combinations in which 2 of the 3 coins are heads, and exactly 1 combination in which all three coins are heads.

Let's scale this example up a little bit. Let's say that we live in Los Angeles, CA, and the chance of any single day being sunny is .7. The chance of a day not being sunny is .3.

If we have a sample of 5 days, and we want to find the chance that all 5 of them will be sunny, there's only one combination that allows this to happen -- the sunny outcome has to occur on all 5 days:

Day1|Day2|Day3|Day4|Day5
---|---|---|---|---
S|S|S|S|S

If we want to find the probability that exactly 4 of the days will be sunny, there are 5 possible combinations, one for each day that could be the non-sunny one.

Day1|Day2|Day3|Day4|Day5
---|---|---|---|---
S|S|S|S|N
S|S|S|N|S
S|S|N|S|S
S|N|S|S|S
N|S|S|S|S

You may notice a pattern here. The most extreme cases -- a given outcome happening every time or never -- can only occur in one combination. One step away from the extremes, a given outcome happening every time except once, or happening only once, can occur in as many combinations as there are total events.
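
This pattern can be checked by brute force. The sketch below (with illustrative variable names) lists all 32 possible 5-day sequences and tallies how many of them contain each number of sunny days.

```python
from collections import Counter
from itertools import product

# Every possible 5-day sequence of sunny (S) and not sunny (N) days: 2**5 = 32 in total.
sequences = list(product("SN", repeat=5))

# Tally how many sequences contain each possible number of sunny days.
combinations_by_sunny_days = Counter(seq.count("S") for seq in sequences)

# Print the number of sunny days next to the number of sequences that produce it.
for sunny_days in range(5, -1, -1):
    print(sunny_days, combinations_by_sunny_days[sunny_days])
```

The counts for 5 and 0 sunny days are both 1, and the counts for 4 and 1 sunny days are both 5, matching the pattern described above.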


Find the number of combinations in which 1 day will be sunny.
* Assign the result to `sunny_1_combinations`.

In [None]:
sunny_1_combinations = 5

In fact, there's an easily quantifiable pattern with the number of combinations. We can calculate the number of combinations in which an outcome can occur k times in a set of events with a formula:

$$\frac{N!}{k!(N-k)!}$$

In this formula, N is the total number of events we have, and k is the target number of times we want our desired outcome to occur. So if we wanted to find the number of combinations in which 4 out of 5 days can be sunny, we'd set N to 5, and k to 4. The ! symbol means factorial. A factorial means "multiply every number from 1 to this number together". So 4! is `1 * 2 * 3 * 4`, which is 24. Plugging in N = 5 and k = 4 gives $\frac{5!}{4!(5-4)!} = \frac{120}{24 \cdot 1} = 5$, which matches the 5 combinations we listed for 4 sunny days.
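
The formula translates directly into Python with `math.factorial`. The following is a minimal sketch: `n_combinations` is a hypothetical helper name, and the last step reuses the .7 and .3 chances from the Los Angeles example to turn the count into a probability.

```python
from math import comb, factorial

def n_combinations(N, k):
    """Number of ways an outcome can occur exactly k times in N events."""
    # Integer division is exact here because k! * (N - k)! always divides N!.
    return factorial(N) // (factorial(k) * factorial(N - k))

four_of_five = n_combinations(5, 4)
print(four_of_five)  # 5

# math.comb (Python 3.8+) computes the same quantity directly.
print(comb(5, 4))  # 5

# Probability of exactly 4 sunny days out of 5: each of the 5 combinations
# has 4 sunny days (.7 each) and 1 non-sunny day (.3).
prob_4_sunny = four_of_five * 0.7**4 * 0.3
print(round(prob_4_sunny, 5))  # 0.36015
```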

