# Counts and probabilities
## Data Science 350

In this notebook we will explore count data and the related probabilities. Event data is typically analyzed as counts, that is, the number of times each type of event occurs. Using count data we can compute the probability that each type of event will occur in the future.

![](img/Boom.jpg)

### Counting and Combinatorics

Combinatorics is one of the biggest areas of mathematics. We apply combinatorics to compute the number of possible combinations or permutations of a set of events.

For example, we can use combinatorics to compute the number of possible sandwiches we can order at a sandwich shop with a limited menu: 4 bread choices, 5 meat choices, and 4 toppings. How many unique sandwiches can we order by picking one item from each category?

$$4 * 5 * 4 = 80$$

You can see that for this problem we just need to multiply the number of choices in each category. This is an example of the **multiplication principle** of combinatorics.

In the above example our choice in one category does not depend on our choice in any other category. Consequently, we can find all of the possible combinations by simple multiplication.
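In R the multiplication principle is just a product over the number of options in each category. A minimal sketch; the named vector `choices` is an illustrative name of our own, not part of the original notebook:

```r
# Number of options in each independent category of the sandwich menu
choices = c(bread = 4, meat = 5, topping = 4)

# Multiplication principle: multiply the option counts together
nSandwiches = prod(choices)
nSandwiches  # 80
```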

This is not always the case. Let's look at an example where each choice changes the choices that remain. Let's say I go to a pub and I want to order a 4-beer taster, with each beer being unique. The pub has 10 beers on tap. How many possible choices do I have for my taster? Fortunately I know R, so I can use the R `combn` function to build a table of all possible combinations for my 4-beer taster!

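In [1]:
# Each column of the result is one 4-beer combination chosen from 10 taps
# (named beerCombos rather than c, to avoid masking R's c() function)
beerCombos = combn(10, 4)
beerCombos
dim(beerCombos)[2]   # number of columns = number of combinations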

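| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] | [,8] | [,9] | [,10] | … | [,201] | [,202] | [,203] | [,204] | [,205] | [,206] | [,207] | [,208] | [,209] | [,210] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [1,] | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | … | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 7 |
| [2,] | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | … | 6 | 7 | 7 | 7 | 8 | 7 | 7 | 7 | 8 | 8 |
| [3,] | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | … | 9 | 8 | 8 | 9 | 9 | 8 | 8 | 9 | 9 | 9 |
| [4,] | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 5 | 6 | 7 | … | 10 | 9 | 10 | 10 | 10 | 9 | 10 | 10 | 10 | 10 |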


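The `combn` function builds a matrix of all combinations of 4 items chosen from a list of 10, with one combination per column (only the first and last 10 columns are shown above). The second dimension, the number of columns, tells me how many combinations there are: 210.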
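`combn` also accepts a vector of labels instead of a number, which makes the tasters easier to read. A small sketch; the tap names `tap1` to `tap10` are hypothetical placeholders for the pub's actual beers:

```r
# Hypothetical tap names; any 10 distinct labels would do
beers = paste0("tap", 1:10)

# Each column of the result is one unordered 4-beer taster
tasters = combn(beers, 4)
tasters[, 1:3]   # the first three tasters
ncol(tasters)    # 210 combinations
```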

### Sandwich combinatorics
 
Let's investigate the sandwich shop example in more detail. The code in the cell below creates three vectors containing the possible choices for bread, meat, and topping. Execute this code.

In [2]:
##-----Sandwich Count----
breads = c('white', 'wheat', 'italian', 'sevengrain')
meats = c('ham', 'turkey', 'chicken', 'pastrami', 'meatballs')
toppings = c('mustard', 'mayo', 'salt_pepper', 'oil_vinegar')

To make our calculations simple, we can create a table, or grid, of all the possible sandwich choices. Execute the code in the cell below to create this grid using the `expand.grid` function.

In [3]:
sandwiches = expand.grid(breads,
                         meats,
                         toppings)
nrow(sandwiches)
head(sandwiches, 20)

Var1,Var2,Var3
white,ham,mustard
wheat,ham,mustard
italian,ham,mustard
sevengrain,ham,mustard
white,turkey,mustard
wheat,turkey,mustard
italian,turkey,mustard
sevengrain,turkey,mustard
white,chicken,mustard
wheat,chicken,mustard


As expected, there are 80 possible sandwich types enumerated in the table.

***
**Your turn:** In the cell below, redo the sandwich shop example with three types of cheese added to the menu: cheddar, American, and Swiss. How many unique sandwiches can you now order, and does the table show all the combinations?
***

In [4]:
cheese = c('chedar', 'american', 'swiss')
sandwiches = expand.grid(breads,
                         meats,
                         toppings,
                         cheese)
nrow(sandwiches)
head(sandwiches, 20)

Var1,Var2,Var3,Var4
white,ham,mustard,chedar
wheat,ham,mustard,chedar
italian,ham,mustard,chedar
sevengrain,ham,mustard,chedar
white,turkey,mustard,chedar
wheat,turkey,mustard,chedar
italian,turkey,mustard,chedar
sevengrain,turkey,mustard,chedar
white,chicken,mustard,chedar
wheat,chicken,mustard,chedar


### Factorials and permutations

Factorials count the number of ways to order $N$ things. We use the term **permutations** to describe the possible orderings of a set of objects or events. This is where **factorials** arise:

$$Number\ of\ ways\ to\ order\ N\ things = N!$$  

Let's say you have 5 new books on probability you wish to put on a shelf (having read them cover-to-cover, no doubt!). How many ways can you order them?

$$5 * 4 * 3 * 2 * 1 = 5! = 120$$

This is another application of the multiplication principle. 
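To see that $5!$ really counts every ordering, we can enumerate the orderings by brute force and compare with R's `factorial`. A minimal sketch; `orderings` is a hypothetical helper written here for illustration (base R has no built-in permutation enumerator), and the book labels are placeholders:

```r
# Hypothetical helper: list every ordering of the elements of x
orderings = function(x) {
  if (length(x) <= 1) return(list(x))   # zero or one item: a single ordering
  out = list()
  for (i in seq_along(x)) {
    # put item i first, then append each ordering of the remaining items
    for (rest in orderings(x[-i])) {
      out[[length(out) + 1]] = c(x[i], rest)
    }
  }
  out
}

books = c("A", "B", "C", "D", "E")   # five hypothetical book labels
shelves = orderings(books)
length(shelves)   # 120
factorial(5)      # 120
```

The recursion mirrors the multiplication principle: 5 choices for the first slot, times the orderings of the remaining 4 books, and so on.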

Easy enough so far. But let's say we want to find the number of permutations of $k$ unique items chosen from $N$ total items. We can compute the number of possible permutations as:

$$\frac{N!}{(N - k)!}$$

Let's revisit our beer example. The order in which I drink the 4 beers in my taster might matter. Maybe the tastes will be a bit different if I drink a stout before a red ale? We saw the number of combinations previously. But, since order matters, I have many more permutations:

$$\frac{10!}{(10 - 4)!} = 10 * 9 * 8 * 7 = 5040$$
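The permutation formula translates directly into R. A sketch under our own naming: `nPk` is a hypothetical helper that multiplies only the $k$ largest factors, so it never forms $N!$ itself:

```r
# Hypothetical helper: number of ordered arrangements of k items from n
nPk = function(n, k) {
  if (k == 0) return(1)      # one way to arrange nothing
  prod((n - k + 1):n)        # n * (n - 1) * ... * (n - k + 1)
}

nPk(10, 4)                          # 5040, the 4-beer taster
factorial(10) / factorial(10 - 4)   # same value via the factorial formula
```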

****
**Your turn:** Let's say I am going to order a 5-beer taster and I care about order. In the cell below, write the R code to compute how many permutations there are. Can you see how quickly the number of permutations grows?
****



In [5]:
10 * 9 * 8 * 7 * 6

### Computing factorials

Computing factorials can be tricky because they grow very quickly. A 64-bit integer can hold values no larger than $2^{64} - 1 \approx 1.8 \times 10^{19}$ (unsigned) or $2^{63} - 1 \approx 9.2 \times 10^{18}$ (signed), yet $21! \approx 5.1 \times 10^{19}$. In practice, factorials are computed as ratios, cancelling the common terms, to keep the problem tractable. For example, we already wrote our beer example in a tractable form:

$$\frac{10!}{6!} = \frac{10!}{(10-4)!} = 10 * 9 * 8 * 7$$

We never had to actually compute the large number $10!$. In fact, we just multiplied 4 numbers.
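R stores ordinary numbers as double-precision floating point rather than 64-bit integers, so `factorial()` overflows to `Inf` (with a warning) for arguments above 170. The sketch below contrasts a naive ratio of huge factorials with the cancelled product, and with the log-scale alternative `lfactorial`, which is one common workaround; the value 200 is an arbitrary illustrative choice:

```r
# Naive: both factorials overflow, and Inf / Inf is NaN
naive = suppressWarnings(factorial(200) / factorial(196))
naive                      # NaN

# Cancelling the common terms first keeps the numbers small:
# 200! / 196! = 200 * 199 * 198 * 197
cancelled = prod(197:200)
cancelled                  # 1552438800

# Alternatively, work on the log scale and exponentiate at the end
logScale = exp(lfactorial(200) - lfactorial(196))
logScale
```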

### Combinations

What if order does not matter? I may just want to find all unique combinations of $k$ items from $N$ choices. For example, in the beer example when order does not matter, there are $10$ choices and I want to pick $4$ unique ones. In the language of combinatorics, we say that this quantity is $10$ **choose** $4$, which can be written:

$$\frac{10!}{4!(10 - 4)!} = \binom{10}{4}$$

We say that $N$ choose $k$ counts **combinations**, since order does not matter. More generally, we compute the number of combinations with the formula:

$$\frac{N!}{k!(N - k)!} = \binom{N}{k}$$

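From these formulas you can see that there are $k!$ times fewer combinations than permutations, since each combination of $k$ items can be ordered in $k!$ ways. For the beer example, $5040 / 210 = 24 = 4!$.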
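The ratio between permutations and combinations can be checked numerically with R's `factorial` and `choose`. A short sketch; the variable names are our own:

```r
n = 10
k = 4
nPerm = factorial(n) / factorial(n - k)   # ordered 4-beer tasters: 5040
nComb = choose(n, k)                      # unordered 4-beer tasters: 210
nPerm / nComb                             # 24
factorial(k)                              # 24 = 4!, the orderings of each taster
```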

For our example, we can visualize how this process works with **Pascal's triangle**. You can see an example below. 

![](img/Pascal.jpg)

In this case we find $10$ choose $4$ by counting down to row 10 and over to element 4, where the top row and the first element of each row are both numbered 0. Voilà! We have the value we expect, 210.

Notice that Pascal's triangle is symmetric. This illustrates an important symmetry property of combinations:

$$\binom{N}{k} = \binom{N}{N-k}$$
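Pascal's triangle can be generated row by row, since each entry is the sum of the two entries above it. A minimal sketch; `pascalRow` is a hypothetical helper, and we check its output against `choose` and against its own reverse to confirm the symmetry property:

```r
# Hypothetical helper: row n of Pascal's triangle (the top row is row 0).
# Each new row adds the previous row to a copy of itself shifted by one.
pascalRow = function(n) {
  row = 1
  for (i in seq_len(n)) row = c(row, 0) + c(0, row)
  row
}

row10 = pascalRow(10)
row10                      # 1 10 45 120 210 252 210 120 45 10 1
row10[4 + 1]               # element 4 (counting from 0): 210 = choose(10, 4)
all(row10 == rev(row10))   # TRUE: choose(10, k) == choose(10, 10 - k)
```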



***
**Your turn:** Use the R 'choose' function to compute the number of 4-beer tasters you could create from 10 taps.
***

In [6]:
choose(10,4)

***
**Fun note:** there are $52! \approx 8.07 \times 10^{67}$ ways to shuffle a deck of cards; these are permutations, since the order of the cards matters. It is likely that each well-shuffled deck is unique in the history of the world!
***

### Probability for dice

Once we can use combinatorics to enumerate all possible states following a series of events, we can also compute probabilities of these events. 

Let's start by enumerating all of the possible end states from throwing two dice. We will assume that these dice are 'fair'. In other words, each face of a die has an equal probability of landing face up. In terms of probability, we say that the score of each die follows a **Uniform distribution**. Further, we assume that the dice are identical and that the score of each die does not depend on the other die. In the terminology of probability, we can now say that the scores of the dice are **Independent and Identically Distributed**, or **iid**.
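A quick simulation makes the iid assumption concrete: each die is drawn uniformly from 1 to 6, and the two dice are drawn separately. A sketch only; the seed and the number of rolls are arbitrary choices of ours:

```r
set.seed(350)                                # arbitrary seed for reproducibility
nRolls = 100000
die1 = sample(1:6, nRolls, replace = TRUE)   # uniform: each face equally likely
die2 = sample(1:6, nRolls, replace = TRUE)   # drawn separately, so independent of die1

# Empirical face frequencies should all be close to 1/6
round(table(die1) / nRolls, 3)

# Independence: P(die1 = 6 and die2 = 6) should be close to (1/6)^2 = 1/36
mean(die1 == 6 & die2 == 6)
```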

The code in the cell below creates a table with all possible outcomes. Run this code and examine the results.

In [None]:
##-----Two Dice Example ------
twoDice = expand.grid(1:6,1:6)
twoDice

As a first step in computing the probabilities, we need to find how many ways we can reach each state. In the case of the dice: how many ways can we roll each score (the sum of the numbers on the upper faces of the two dice)? The figure below shows the number of ways we can roll a 7 or a 10.

![](img/dice.jpg)

The code in the cell below computes the score for each state in our table, and determines whether the state is a double. Execute this code and examine the results.

In [None]:
twoDice$sum = twoDice$Var1 + twoDice$Var2
twoDice$isdouble = twoDice$Var1 == twoDice$Var2 ## == is logical equals
twoDice

Next, we need to transform this table enumerating the states to a frequency table with the counts of states. Execute the code in the cell below to do just this.

In [None]:
# Count different sums
sumCounts = table(twoDice$sum)
sumCounts

Examine this table. Which score is the most likely, and which scores are the least likely?

Now, compute the probability of rolling a double by executing the code in the cell below, which uses the 'fractions' function from the 'MASS' library.

In [None]:
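library(MASS) # Contains the function 'fractions()'
# Count the doubles: summing a logical vector counts its TRUE values
doubles = sum(twoDice$isdouble)
# Probability of a double:
fractions(doubles/nrow(twoDice)) # type ?fractions for detail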

Finally, we can compute and plot the probabilities of the possible scores from rolling two dice. When the underlying outcomes are discrete and equally likely, the probability of each score is just the number of ways that score can occur divided by the total number of possible outcomes. Dividing by the total ensures that all the probabilities sum to 1.0.

Execute the code in the cell below and examine the result.

In [None]:
# Probabilities of sums:
sumProb = fractions(table(twoDice$sum)/nrow(twoDice)) 
barplot(sumProb)

Examine this result. Notice that the probabilities are proportional to the frequencies, but are scaled to ensure they add to 1.0.
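The same probabilities can be computed without building a data frame, by letting `outer` form all 36 sums at once. A brief sketch (the names `scores` and `scoreProb` are ours) that also confirms the probabilities add to 1 and picks out the most likely score:

```r
# All 36 (die1, die2) outcomes, with the scores in a 6 x 6 matrix
scores = outer(1:6, 1:6, "+")

# Probability of each score = ways to roll it / total outcomes
scoreProb = table(as.vector(scores)) / length(scores)

sum(scoreProb)                 # 1, as required
names(which.max(scoreProb))    # "7", the most likely score
```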



***
**Your turn:** In the cell below write and execute the R code to determine how many of the 36 possible states of our dice are doubles. Is the result consistent with the properties of dice?
***

## Basics of Probability

We will now investigate some basics of probability in a bit more detail.  

### Discrete probability

A **discrete distribution** is a probability distribution describing a process with discrete outcomes. We have already investigated an example of a discrete process when we examined the outcome of dice throws. Each die lands with a certain side up, and the sum of the two numbers is the total score.

For a discrete distribution with equally likely outcomes, the probability of an event, A, is the number of ways A can occur divided by the total number of possible outcomes in our Sample Space, S. Let's make this concrete with an example.

![](img/Prob1.png)

The probability of the events in a subset, A, given a set of possible events in the sample space, S, can be computed as follows:

$$P(A) = \frac{N(A)}{N(S)}$$

In this case there are 10 events in S, 6 events in subset A and 4 in subset B. We can compute the probabilities of A and B like this:

$$P(A) = \frac{6}{10} = \frac{3}{5} = 0.6\\
P(B) = \frac{4}{10} = \frac{2}{5} = 0.4$$
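The counting rule can be written as a tiny R function. Since the figure's individual outcomes are not listed, the sketch assumes a hypothetical labelling of the 10 outcomes as 1 to 10, with A = 1:6 and B = 5:8; only the counts (10 in S, 6 in A, 4 in B, 2 shared) come from the text:

```r
# Hypothetical labelling of the 10 outcomes in the figure; only the
# counts (|S| = 10, |A| = 6, |B| = 4, 2 shared by A and B) come from the text
S = 1:10
A = 1:6
B = 5:8

# P(event) = N(event) / N(S) for equally likely outcomes
prob = function(event) length(event) / length(S)

prob(A)   # 0.6
prob(B)   # 0.4
```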

***
**Your turn:** in the cell below find the probability that the score from a throw of a pair of dice will be less than or equal to 3. Hints: use filters on the `twoDice` data frame and the `nrow` function to get the counts. 
***

In [None]:
nrow(twoDice[twoDice$sum <= 3, ]) / nrow(twoDice)

### Axioms of probability

All probability distributions must have certain properties, which we refer to as the **axioms of probability**. These are:

- Probability is bounded between 0 and 1:  
$$0 \le P(A) \le 1$$

- The probability of the Sample Space is 1:  
$$P(S) = \sum_{All\ i}P(a_i) = 1$$

- The probability of a union of disjoint (mutually exclusive) events is the sum of their probabilities:

$$P(A \cup B) = P(A) + P(B)\\ if\\ A \cap B = \emptyset$$

***
**Your turn:** In the cell below, write and execute the R code to show that $P(S) = 1$ for the simple set example we have been using.
***

### Set operations and probability

Set operations can be readily applied to probability problems. Continuing with our example, we can apply the following common set operations.

- **Intersection:** 
$$P(A \cap B)  = \frac{2}{10} = \frac{1}{5} = 0.2$$

- **Union:** 
$$P(A \cup B) = \frac{8}{10} = \frac{4}{5} = 0.8$$

- **Negation:** 
$$P(A') = \frac{4}{10} = \frac{2}{5} = 0.4$$

You can use these basic operations to build more complex ones. For example:

$$P((A \cup B)') = P(A' \cap B') = P(C) = \frac{2}{10} = 0.2$$

We can also write an expression like the following:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
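Base R's `intersect`, `union`, and `setdiff` map directly onto these set operations. A sketch using a hypothetical labelling of the figure's 10 outcomes (A = 1:6, B = 5:8) that reproduces the counts above; the individual labels are our assumption:

```r
# Hypothetical labelling consistent with the counts in the figure
S = 1:10
A = 1:6
B = 5:8
prob = function(event) length(event) / length(S)

prob(intersect(A, B))              # 0.2, intersection
prob(union(A, B))                  # 0.8, union
prob(setdiff(S, A))                # 0.4, negation P(A')
prob(setdiff(S, union(A, B)))      # 0.2, P((A or B)') = P(C)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
prob(A) + prob(B) - prob(intersect(A, B))   # 0.8
```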

### Independence vs. mutual exclusivity

In probability there are two concepts which are quite different but often confused, mutual exclusivity and independence.

Events A and B are **independent** if the occurrence of one has no effect on the probability of the other. For independent events, we can write the following:

$$P(A \cap B) = P(A) P(B)\\
and\\
P(A \cup B) = P(A) + P(B) - P(A) P(B)$$

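If A and B are **mutually exclusive**, an event in B means there can be no event in A, and vice versa. For mutually exclusive events, we can write:

$$A \cap B = \emptyset,\ so\ P(A \cap B) = 0\\
and\\
P(A \cup B) = P(A) + P(B)$$

Mutually exclusive events with nonzero probabilities are therefore never independent, since $P(A \cap B) = 0 \ne P(A) P(B)$.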
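The two dice give a concrete contrast between the concepts. A sketch with events of our own choosing: "first die even" and "second die shows 6" are independent but can occur together, while "sum is 2" and "sum is 12" are mutually exclusive and therefore not independent:

```r
# All 36 equally likely outcomes of two fair dice
dice = expand.grid(die1 = 1:6, die2 = 1:6)
p = function(event) mean(event)   # fraction of outcomes where event is TRUE

# Independent (not exclusive): first die even, second die shows 6
evenFirst = dice$die1 %% 2 == 0
sixSecond = dice$die2 == 6
p(evenFirst & sixSecond)          # 1/12
p(evenFirst) * p(sixSecond)       # 1/12, equal, so independent

# Mutually exclusive (not independent): sum is 2, sum is 12
sumIs2 = dice$die1 + dice$die2 == 2
sumIs12 = dice$die1 + dice$die2 == 12
p(sumIs2 & sumIs12)               # 0, they never occur together
p(sumIs2) * p(sumIs12)            # 1/1296, not 0, so not independent
p(sumIs2 | sumIs12)               # 2/36 = p(sumIs2) + p(sumIs12)
```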



***
**Your turn:** Consider the diagram of event sub-sets in the figure below:

![](img/ME.jpg)

Write and execute the R code to compute the following:
$$P(A \cup B)\\
P(A \cap B)\\
P((A \cup B)')$$
***

### Conditional probability

**Conditional probability** is the probability that event A occurs given that event B has occurred. We write conditional probability as follows, which we read as the probability of A given B:

$$P(A|B)$$

We can work out this conditional probability for our example as follows:

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{2}{10}}{\frac{4}{10}} = \frac{2}{4} = \frac{1}{2}$$
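The same calculation in R, reusing a hypothetical labelling of the figure's outcomes (A = 1:6, B = 5:8) that matches the stated counts. Conditioning the other way around shows the two conditional probabilities differ:

```r
# Hypothetical labelling consistent with the counts in the figure
S = 1:10
A = 1:6
B = 5:8
prob = function(event) length(event) / length(S)

# P(A|B) = P(A and B) / P(B)
pAgivenB = prob(intersect(A, B)) / prob(B)
pAgivenB   # 0.5

# P(B|A) = P(A and B) / P(A)
pBgivenA = prob(intersect(A, B)) / prob(A)
pBgivenA   # 1/3
```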

If event A is **independent** of B, then:

$$P(A|B) = P(A)$$

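Independence is symmetric: if A is independent of B and both events have nonzero probability, then B is also independent of A, since both statements are equivalent to $P(A \cap B) = P(A) P(B)$. Conditional probability itself, however, is not symmetric. In general:

$$P(A|B) \ne P(B|A)$$

In our example, $P(B|A) = \frac{2/10}{6/10} = \frac{1}{3}$, while $P(A|B) = \frac{1}{2}$.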

