# Welcome to the Dark Art of Coding:
## Introduction to Data Science Fundamentals
Probability distributions and Combinations, Permutations, Factorials


<img src='../images/logos.3.600.wide.png' height='250' width='300' style="float:right">

# Main objectives
---

You will be able to:

* Understand probability distributions
* Understand permutations and combinations, including:
    * factorials
    * calculating the number of arrangements of items
    * dealing with duplicates
    * arranging by individual items versus types of items
    * examining permutations
    * examining combinations

# Discrete Probability Distributions
---

Often, it is useful to simply understand the probability of an event. But sometimes it just isn't enough.

For example, sometimes we need to know the consequences of a fairly likely event OR the results if an unlikely event occurs.

To start this conversation, let's consider a classic example: the slot machine.
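As a minimal sketch of what a discrete probability distribution looks like, the snippet below models a made-up slot machine. The payouts and probabilities are hypothetical values invented for illustration; they do not describe a real machine. It pairs each possible net gain with its probability, then weighs the likely loss against the unlikely wins.

```python
from fractions import Fraction

# Hypothetical slot machine: net gain per play (in dollars) -> probability.
# These numbers are invented for illustration only.
slot_machine = {
    -1: Fraction(90, 100),   # fairly likely: lose the $1 stake
    4: Fraction(8, 100),     # unlikely: small win
    19: Fraction(2, 100),    # very unlikely: big win
}

# A valid probability distribution must sum to 1.
total_probability = sum(slot_machine.values())

# Expected gain per play: each gain weighted by its probability.
expected_gain = sum(gain * p for gain, p in slot_machine.items())

print('Total probability:', total_probability)
print('Expected gain per play:', float(expected_gain))
```

With these made-up numbers, the rare big win is not enough to offset the frequent small loss, which is the kind of consequence a single probability cannot show on its own.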




## Experience Points
---

### Complete the following exercises:

If you (and your partner, if you're working in pairs) are done, then you can put your green sticky up! This is how we know you're done.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

# Combinations & Permutations with a sprinkling of Factorials

In looking at the sample space... a primary concern is being able to accurately identify or calculate the number of arrangements of items.

Two difficulties that we run into include:

* dealing with duplicates
* arranging by individual items versus types of items

To help us calculate the number of arrangements in a sample space, we will look at factorials, permutations, and combinations (with and without replacement).

# Factorials

A **factorial** is the product of every whole number from `n` down to 1 (by convention, `0!` is 1). Factorials are written in the following manner:

$\Large n!$

For example:

$\Large 2! = 2 \times 1 = 2$

$\Large 3! = 3 \times 2 \times 1 = 6$

$\Large 4! = 4 \times 3 \times 2 \times 1 = 24$

$\Large 10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 3628800$

Factorials present an easy way to write certain large numbers and are often used in statistics, especially in terms of calculating permutations and combinations.

A great attribute of factorials is that, when doing math with them, you can often simplify your work greatly if you need to do things by hand OR want a rough order of magnitude. In these examples, the factors shared by the numerator and denominator cancel:

$\Large \frac{5!}{3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = \frac{5 \times 4}{1}  = 20$

$\Large \frac{5! \times 4!}{3! \times 2!} =
 \frac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} \times \frac{4 \times 3 \times 2 \times 1}{2 \times 1}
 = \frac{5 \times 4}{1} \times \frac{4 \times 3}{1} = 20 \times 12 = 240$
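The two simplifications above can be checked numerically. This is a small sketch using `math.factorial` from the standard library; the variable names `first` and `second` are illustrative only.

```python
import math

# 5! / 3!: the shared factors 3 x 2 x 1 cancel, leaving 5 x 4
first = math.factorial(5) // math.factorial(3)

# (5! x 4!) / (3! x 2!): cancel 3! from 5! and 2! from 4!
second = (math.factorial(5) * math.factorial(4)) // (
    math.factorial(3) * math.factorial(2)
)

print(first, first == 5 * 4)
print(second, second == (5 * 4) * (4 * 3))
```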



# Permutations and Combinations

## Definitions:





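In the formulas below, `n` is the number of items to choose from and `k` is the number of items chosen.

* permutations (without replacement)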

$\Large\frac{n!}{(n - k)!}$

* combinations

$\Large\frac{n!}{k!(n - k)!}$

* product (permutations with replacement)

$\Large n^k$

* combinations with replacement

$\Large\frac{(k + n - 1)!}{k!(n - 1)!}$

||Without replacement|With replacement|
|:-|:-|:-|
|**Order matters**|permutations|product|
|**Order doesn't matter**|combinations|combinations with replacement|

In [None]:
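# handmade calculations

# factorial

def factorial(n):
    '''Return n! for a non-negative integer n.'''

    if n < 0:
        raise ValueError('factorial() is not defined for negative values')
    if n <= 1:
        return 1      # 0! and 1! are both 1
    return n * factorial(n - 1)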

In [None]:
# permutations: 3 items, taken 2 at a time
# (// keeps the count as an integer)
factorial(3) // factorial(3 - 2)

In [None]:
# combinations: 3 items, taken 2 at a time

factorial(3) // (factorial(2) * factorial(3 - 2))

In [None]:
# product: 3 items, 2 picks with replacement, order matters

3 ** 2

In [None]:
# combinations with replacement: 3 items, 2 picks

factorial(2 + 3 - 1) // (factorial(2) * factorial(3 - 1))
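The handmade calculations above can be cross-checked against the standard library. This is a minimal sketch, assuming Python 3.8 or newer, where `math.perm` and `math.comb` are available; the variable names are illustrative only.

```python
import math

n, k = 3, 2   # 3 items, 2 picks, matching the handmade cells

permutations_count = math.perm(n, k)                      # n! / (n - k)!
combinations_count = math.comb(n, k)                      # n! / (k! (n - k)!)
product_count = n ** k                                    # n^k
combos_with_replacement_count = math.comb(n + k - 1, k)   # (k + n - 1)! / (k! (n - 1)!)

print('permutations:', permutations_count)
print('combinations:', combinations_count)
print('product:', product_count)
print('combinations with replacement:', combos_with_replacement_count)
```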

In [None]:
# Warning... a roll-your-own solution is often not the best plan
#            if performance is the goal, but it can be useful for learning new things.
#            # fixed-width 32-bit integers max out around 12!
#            # fixed-width 64-bit integers max out around 20!
#            # (Python's own ints grow as needed, so they don't overflow)


from math import factorial as f
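The limits quoted in the warning above can be verified directly. The sketch below assumes signed fixed-width integers, as found in C or NumPy arrays; Python's built-in integers are arbitrary precision and are not affected. The helper name `largest_factorial_that_fits` is hypothetical.

```python
import math

INT32_MAX = 2 ** 31 - 1   # largest signed 32-bit integer
INT64_MAX = 2 ** 63 - 1   # largest signed 64-bit integer

def largest_factorial_that_fits(limit):
    '''Return the largest n for which n! is still <= limit.'''
    n = 0
    while math.factorial(n + 1) <= limit:
        n += 1
    return n

n32 = largest_factorial_that_fits(INT32_MAX)
n64 = largest_factorial_that_fits(INT64_MAX)

print('Largest factorial in 32 bits:', n32)
print('Largest factorial in 64 bits:', n64)
```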

In [None]:
%timeit factorial(40)



In [None]:
%timeit f(40)

In [None]:
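import itertools

l = [1, 2, 3]

# without replacement: order matters, then order doesn't matter
print(list(itertools.permutations(l, 2)))
print(list(itertools.combinations(l, 2)))

# with replacement: order matters, then order doesn't matter
print(list(itertools.product(l, repeat=2)))
print(list(itertools.combinations_with_replacement(l, 2)))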

In [None]:
i = itertools.product(l, repeat=2)

In [None]:
print(list(itertools.permutations(l, 3)))
print(list(itertools.combinations(l, 3)))


print(list(itertools.product(l, repeat=3)))
print(list(itertools.combinations_with_replacement(l, 3)))
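One way to tie the listings above back to the formulas is to count what `itertools` enumerates and compare it with each formula. This is a sketch under the assumption that all items are distinct; the names `items`, `enumerated`, `formulas`, and `matches` are illustrative only.

```python
import itertools
import math

items = [1, 2, 3]
n = len(items)
matches = {}

for k in (2, 3):
    # Count the arrangements that itertools actually produces.
    enumerated = {
        'permutations': len(list(itertools.permutations(items, k))),
        'combinations': len(list(itertools.combinations(items, k))),
        'product': len(list(itertools.product(items, repeat=k))),
        'combinations_with_replacement': len(
            list(itertools.combinations_with_replacement(items, k))
        ),
    }
    # Compute the same counts from the formulas.
    formulas = {
        'permutations': math.factorial(n) // math.factorial(n - k),
        'combinations': math.factorial(n)
        // (math.factorial(k) * math.factorial(n - k)),
        'product': n ** k,
        'combinations_with_replacement': math.factorial(k + n - 1)
        // (math.factorial(k) * math.factorial(n - 1)),
    }
    matches[k] = enumerated == formulas
    print('k =', k, enumerated, 'matches formulas:', matches[k])
```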

In [None]:
# Standard library ... 

import itertools
# combinations
# permutations
# product
# combos with replacement

In [None]:
itertools.combinations
itertools.combinations_with_replacement
itertools.permutations
itertools.product


What does all this mean, anyway...


Thus far, we have looked at a lot of examples where we generally just focused on one or more specific elements out of a sample space:

* probability of a 3 out of 6 on a d6
* probability of an even number on a d20

Much of probability is not that simple...

For example, in a superhero foot race, asking how many different ways the superheroes can cross the finish line is a problem where order matters:

||First, Second, Third|
|:--|:--|
|Order 1|iron man, black widow, black panther|
|Order 2|iron man, black panther, black widow|
|Order 3|black widow, iron man, black panther|
|Order 4|black widow, black panther, iron man|
|Order 5|black panther, iron man, black widow|
|Order 6|black panther, black widow, iron man|

Let's use our newfound skills to calculate this:

In [None]:
heroes = ['iron man', 'black widow', 'black panther']

First we will look at all the possible outcomes and count those outcomes:

* What are the possible orderings (permutations)?
* How many possible orderings are there (i.e. what is the size of the sample space)?

In this case, we can find out how many possible outcomes there are by using our function `factorial()`.

In [None]:
print(factorial(len(heroes)))

The count alone is somewhat limiting, in that it doesn't tell us what the orderings actually are.

To actually get the orderings, we can use `itertools.permutations()`:

In [None]:
import pprint as pp
heroes = ['iron man', 'black widow', 'black panther']
S = list(itertools.permutations(heroes, 3))
pp.pprint(S)
print()
print('Total number of permutations:', len(S))

In [None]:
import pprint as pp
heroes = ['iron man', 'black widow', 'black panther', 'captain america', 'spiderman']
S = list(itertools.combinations(heroes, 3))
pp.pprint(S)
print()
print('Total number of combinations:', len(S))
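Duplicates were named earlier as one of the difficulties in counting arrangements. A common approach is to divide `n!` by the factorial of each item's repeat count. The sketch below checks that approach against a brute-force enumeration; the word `'BOOK'` is a hypothetical example chosen because it contains one repeated letter.

```python
import itertools
import math
from collections import Counter

letters = 'BOOK'   # hypothetical example with a duplicated 'O'

# Brute force: itertools treats the two O's as different items...
all_orderings = list(itertools.permutations(letters))
# ...so collapse orderings that look identical.
distinct_orderings = set(all_orderings)

# Formula: n! divided by the factorial of each letter's count.
distinct_by_formula = math.factorial(len(letters))
for repeat_count in Counter(letters).values():
    distinct_by_formula //= math.factorial(repeat_count)

print('Orderings if every letter is unique:', len(all_orderings))
print('Distinct orderings (brute force):', len(distinct_orderings))
print('Distinct orderings (formula):', distinct_by_formula)
```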

# BACKUP:

# Continuous Distributions

In [None]:
def uniform_pdf(x):
    return 1 if 0 <= x < 1 else 0

In [None]:
# Let's test our uniform_pdf function

for n in range(-4, 20):
    n = round(n * 0.1, 1)
    print(uniform_pdf(n), '\t', n)

In [None]:
def uniform_cdf(x):
    '''Returns the probability that a uniform random variable
    is <= x'''

    if x < 0:
        return 0     # uniform random var is never less than 0
    elif x < 1:
        return x     # e.g. P(X <= 0.4) = 0.4
    else:
        return 1     # uniform random var is always less than 1

In [None]:
# Let's test our uniform_cdf function

for n in range(-4, 20):
    n = round(n * 0.1, 1)
    print(uniform_cdf(n), '\t', n)
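The CDF is the accumulated area under the PDF, and that relationship can be checked numerically. This is a rough sketch using a midpoint Riemann sum; the helper `approximate_cdf` and its `start` and `steps` parameters are hypothetical, and the two uniform functions are repeated so the block runs on its own.

```python
def uniform_pdf(x):
    return 1 if 0 <= x < 1 else 0

def uniform_cdf(x):
    if x < 0:
        return 0
    elif x < 1:
        return x
    else:
        return 1

def approximate_cdf(x, start=-1.0, steps=100_000):
    '''Approximate the area under uniform_pdf between start and x.'''
    if x <= start:
        return 0.0
    width = (x - start) / steps
    # Sample the PDF at the midpoint of each thin slice.
    heights = (uniform_pdf(start + (i + 0.5) * width) for i in range(steps))
    return sum(heights) * width

differences = {}
for x in (-0.5, 0.25, 0.4, 1.5):
    differences[x] = abs(approximate_cdf(x) - uniform_cdf(x))
    print(x, '\t', uniform_cdf(x), '\t', round(approximate_cdf(x), 4))
```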

In [None]:
import math

def normal_pdf(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(2 * math.pi)
    return (math.exp(-(x-mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma))

In [None]:
xs = [x/ 10.0 for x in range(-50, 50)]

In [None]:
import matplotlib.pyplot as plt

plt.plot(xs, [normal_pdf(x, sigma=1) for x in xs], '-', label='mu=0, sigma=1')
plt.plot(xs, [normal_pdf(x, sigma=2) for x in xs], '--', label='mu=0, sigma=2')
plt.plot(xs, [normal_pdf(x, sigma=0.5) for x in xs], ':', label='mu=0, sigma=0.5')
plt.plot(xs, [normal_pdf(x, mu=-1) for x in xs], '-.', label='mu=-1, sigma=1')
plt.legend()

If mu = 0 and sigma = 1, then the distribution is called the standard normal distribution.

If Z is a standard normal random variable, then X = sigma * Z + mu is also normal, but with a mean of mu and a standard deviation of sigma.
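The rescaling X = sigma * Z + mu can be illustrated with a quick simulation. This is a sketch, not a proof: it draws standard normal samples with `random.gauss`, rescales them, and checks the sample mean and standard deviation. The values `mu = 5.0` and `sigma = 2.0` and the fixed seed are arbitrary choices for illustration, and `statistics.fmean` assumes Python 3.8 or newer.

```python
import random
import statistics

random.seed(42)          # fixed seed so the sketch is repeatable

mu, sigma = 5.0, 2.0     # arbitrary illustrative values

# Draw Z from a standard normal, then rescale to get X.
z_samples = [random.gauss(0, 1) for _ in range(100_000)]
x_samples = [sigma * z + mu for z in z_samples]

sample_mean = statistics.fmean(x_samples)
sample_std = statistics.pstdev(x_samples)

print('Sample mean of X:', round(sample_mean, 2))
print('Sample standard deviation of X:', round(sample_std, 2))
```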
    

In [None]:
def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

In [None]:
plt.plot(xs, [normal_cdf(x, sigma=1) for x in xs], '-', label='mu=0, sigma=1')
plt.plot(xs, [normal_cdf(x, sigma=2) for x in xs], '--', label='mu=0, sigma=2')
plt.plot(xs, [normal_cdf(x, sigma=0.5) for x in xs], ':', label='mu=0, sigma=0.5')
plt.plot(xs, [normal_cdf(x, mu=-1) for x in xs], '-.', label='mu=-1, sigma=1')
plt.legend(loc=4)
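A CDF turns "how much probability lies in this range" into a subtraction. As a small sketch, the snippet below uses the same `normal_cdf` definition (repeated so the block runs on its own) to find the probability of landing within 1, 2, and 3 standard deviations of the mean. The dictionary name `within` is illustrative only.

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

within = {}
for k in (1, 2, 3):
    # P(-k <= Z <= k) = CDF(k) - CDF(-k) for a standard normal Z
    within[k] = normal_cdf(k) - normal_cdf(-k)
    print('Within', k, 'standard deviation(s):', round(within[k], 4))
```

These values are the basis of the familiar rule of thumb that roughly 68%, 95%, and 99.7% of a normal distribution lies within 1, 2, and 3 standard deviations of the mean.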

In [None]:
def inverse_normal_cdf(p, mu=0, sigma=1, tolerance=0.00001):
    '''find approximate inverse, by running a binary search'''
    
    # if the curve is not a Standard curve, compute a 
    # standard curve and rescale appropriately
    
    if mu != 0 or sigma != 1:
        return mu + sigma * inverse_normal_cdf(p, tolerance=tolerance)
    
    low_z, low_p = -10.0, 0
    hi_z, hi_p = 10.0, 1
    mid_z = (low_z + hi_z) / 2
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2      # midpoint of the current interval
        mid_p = normal_cdf(mid_z)       # CDF value at the midpoint
        if mid_p < p:
            # midpoint is still too low, search higher numbers
            low_z, low_p = mid_z, mid_p
        elif mid_p > p:
            # midpoint is still too high, search lower numbers
            hi_z, hi_p = mid_z, mid_p
        else:
            break 
    return mid_z
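A binary search for the inverse CDF can be sanity-checked against the standard library. The sketch below assumes Python 3.8 or newer, where `statistics.NormalDist` provides `inv_cdf`. The helper `bisect_inverse_normal_cdf` is a hypothetical, simplified stand-in for the standard-normal case, written so the block runs on its own.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def bisect_inverse_normal_cdf(p, tolerance=0.00001):
    '''Approximate the standard normal z with normal_cdf(z) == p.'''
    low_z, hi_z = -10.0, 10.0
    while hi_z - low_z > tolerance:
        mid_z = (low_z + hi_z) / 2
        if normal_cdf(mid_z) < p:
            low_z = mid_z      # answer lies in the upper half
        else:
            hi_z = mid_z       # answer lies in the lower half
    return (low_z + hi_z) / 2

errors = {}
for p in (0.025, 0.5, 0.975):
    ours = bisect_inverse_normal_cdf(p)
    reference = NormalDist().inv_cdf(p)
    errors[p] = abs(ours - reference)
    print(p, '\t', round(ours, 4), '\t', round(reference, 4))
```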
    
    

* Examine and use special probability distributions, such as:
    * geometric
    * binary
    * poisson
    * normal
    
* Combinations & Permutations 