# Introduction to Probability

Refer to [chapter 5 of onlinestatbook](http://onlinestatbook.com/2/probability/probability_intro.html), specifically sections 2, 3, 6, 8, 10, and 11, to become familiar with the concepts.

Many random experiments have no definite outcome, so the result cannot be predicted with total certainty.
What we can say is how likely each outcome is to happen, using the idea of probability. 
Inferential statistics is built on the foundation of probability theory and has been remarkably successful in guiding opinion about the conclusions to be drawn from data. 

One conception of probability is drawn from the idea of symmetrical outcomes. 
For example, the two possible outcomes of tossing a fair coin seem not to be distinguishable in any way that affects which side will land up or down. 
Therefore the probability of heads is taken to be 1/2, as is the probability of tails. 
In general, if there are N symmetrical outcomes, the probability of any given one of them occurring is taken to be 1/N. 
Thus, if a six-sided die is rolled, the probability of any one of the six sides coming up is 1/6.
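
The symmetric-outcomes idea can be checked informally by simulation. The following is a minimal base-R sketch (not part of the `prob` package) that rolls a simulated fair die many times with `sample()` and compares the observed relative frequencies with $1/6$; the seed and the number of rolls are arbitrary choices.

```r
# Simulate rolls of a fair six-sided die and compare relative frequencies with 1/6
set.seed(1)          # arbitrary seed, for reproducibility
n_rolls <- 60000     # arbitrary number of simulated rolls
rolls <- sample(1:6, size = n_rolls, replace = TRUE)

# Relative frequency of each face; table() counts how often each face appeared
rel_freq <- table(factor(rolls, levels = 1:6)) / n_rolls
print(round(rel_freq, 3))

# Largest deviation from the symmetric-outcome probability 1/N = 1/6
max_dev <- max(abs(rel_freq - 1/6))
print(max_dev)
```

With more rolls, the relative frequencies tend to settle closer to $1/6$.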


### Sample Spaces

For a random experiment $E$, the set of all possible outcomes of $E$ is called the sample space and is denoted by the letter $S$. 
For a coin-toss experiment, $S$ would be the outcomes of “Head” and “Tail”, which we may represent by $S = \{H, T\}$. Formally, the performance of a random experiment is the unpredictable selection of an outcome in $S$.

The R package `prob` provides functions for building sample spaces and for working with events and their probabilities. 
A sample space is (usually) represented by a data frame.
Each row of the data frame corresponds to an outcome of the experiment.

Consider the random experiment of tossing a coin.
The outcomes are H and T. 
We can set up the sample space quickly with the `tosscoin()` function:

```r
library(prob)
tosscoin(1)
```


| toss1 |
|-------|
| H     |
| T     |


The number 1 tells `tosscoin()` that we only want to toss the coin once. We could toss it more times; for example, `tosscoin(3)` gives the output below:

```r
tosscoin(3)
```

| toss1 | toss2 | toss3 |
|-------|-------|-------|
| H     | H     | H     |
| T     | H     | H     |
| H     | T     | H     |
| T     | T     | H     |
| H     | H     | T     |
| T     | H     | T     |
| H     | T     | T     |
| T     | T     | T     |
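
For readers without the `prob` package, a similar data frame can be built in base R with `expand.grid()`. This is a minimal sketch under that assumption: `coin_space()` is a hypothetical helper, not a `prob` function. Its row order happens to match the `tosscoin(3)` listing above because `expand.grid()` varies the first column fastest.

```r
# Hypothetical base-R helper: sample space of `times` coin tosses,
# one row per outcome, one column per toss (toss1, toss2, ...)
coin_space <- function(times) {
  sides <- rep(list(c("H", "T")), times)         # each toss can land H or T
  names(sides) <- paste0("toss", seq_len(times)) # column names like tosscoin()
  expand.grid(sides)                             # all combinations, as factors
}

S <- coin_space(3)
print(S)
nrow(S)  # 2^3 = 8 outcomes
```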


---

## Events

An event $A$ is merely a collection of outcomes, or in other words, a subset of the sample space. 
After the performance of a random experiment $E$, we say that the event $A$ occurred if the experiment's outcome belongs to $A$. We say that events $A_1, A_2, A_3, \dots$
are mutually exclusive or disjoint if $A_i \cap A_j = \emptyset$ for every distinct pair $i \ne j$. 
For instance, in the coin-toss experiment the events $A = \{H\}$ and $B = \{T\}$ are mutually exclusive.
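
Because events are just sets of outcomes, disjointness can be checked with set operations. Below is a minimal base-R sketch for one roll of a die; the events (even faces, odd faces, faces above 3) and the helper `is_disjoint()` are illustrative, not from the source. `base::intersect()` is written explicitly because loading `prob` masks `intersect`.

```r
# Sample space for one roll of a die, and three events as subsets of it
S <- 1:6
A <- S[S %% 2 == 0]   # even faces: 2, 4, 6
B <- S[S %% 2 == 1]   # odd faces: 1, 3, 5
C <- S[S > 3]         # faces greater than 3: 4, 5, 6

# Hypothetical helper: two events are disjoint when their intersection is empty
is_disjoint <- function(E1, E2) length(base::intersect(E1, E2)) == 0

is_disjoint(A, B)  # TRUE: no face is both even and odd
is_disjoint(A, C)  # FALSE: 4 and 6 are in both
```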

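For example, `rolldie()` from `prob` builds the sample space for rolling a die, and `subset()` extracts an event from it. Here is the event that the sum of three rolls is greater than 16:

```r
subset(rolldie(3), X1 + X2 + X3 > 16)
```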

|     | X1 | X2 | X3 |
|----:|---:|---:|---:|
| 180 | 6  | 6  | 5  |
| 210 | 6  | 5  | 6  |
| 215 | 5  | 6  | 6  |
| 216 | 6  | 6  | 6  |


Since the die is rolled 3 times, the total number of possible outcomes is $6^3=216$. Four of these outcomes, listed above, have a sum greater than 16, so for a fair die the probability of this event is $4/216 \approx 0.0185$. 
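
The same event can be found without `prob` by building the sample space with base R's `expand.grid()`. This minimal sketch also computes the probability as a ratio of counts, which assumes (as for a fair die) that all 216 outcomes are equally likely. The row names it prints match the table above, since both list outcomes with `X1` varying fastest.

```r
# Sample space of three die rolls: 6^3 = 216 equally likely outcomes
S <- expand.grid(X1 = 1:6, X2 = 1:6, X3 = 1:6)

# Event: the sum of the three rolls exceeds 16
A <- subset(S, X1 + X2 + X3 > 16)
print(A)

# Equally likely outcomes, so P(A) = |A| / |S|
p_A <- nrow(A) / nrow(S)
p_A  # 4/216, about 0.0185
```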

---

#### Functions for Finding Subsets

**The `%in%` function**

The `%in%` operator tests whether each value of one vector lies somewhere inside another vector.

```r
x <- 1:10
y <- 8:12
y %in% x   # TRUE TRUE TRUE FALSE FALSE
```

Notice that the returned value is a vector of length 5 which tests whether each element of `y` is in `x`, in turn.

----

**The `isin` function**

It is more common to want to know whether the whole vector `y` is contained in `x`. We can do this with the `isin()` function.

```r
isin(x, y)   # FALSE: 11 and 12 are not in x
```

Note that `isin()` has an optional argument `ordered`, which tests whether the elements of `y` appear in `x` in the same order in which they appear in `y`. For example:

```r
isin(x, c(3, 4, 5), ordered = TRUE)   # TRUE
isin(x, c(3, 5, 4), ordered = TRUE)   # FALSE
```
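
To see what these checks involve, here is a minimal base-R sketch of two hypothetical helpers, `all_in()` and `in_order()`. They are simplified stand-ins written for illustration, not the `prob` implementation; in particular they ignore repeated values, which `isin()` may treat differently.

```r
x <- 1:10

# Hypothetical helper: TRUE if every element of y appears somewhere in x
all_in <- function(x, y) all(y %in% x)

# Hypothetical helper: TRUE if the elements of y appear in x in the same order
# as in y, i.e. their positions in x (found with match()) strictly increase.
# match() returns only the first position of each value, so repeats are ignored.
in_order <- function(x, y) {
  pos <- match(y, x)
  !anyNA(pos) && !is.unsorted(pos, strictly = TRUE)
}

all_in(x, 8:12)          # FALSE: 11 and 12 are not in x
in_order(x, c(3, 4, 5))  # TRUE
in_order(x, c(3, 5, 4))  # FALSE: 5 comes after 4 in x
```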

---

## Properties of Probability


There are three axioms that establish the foundation of probability theory. Axioms are statements accepted as true without proof; they serve as building blocks. For example, "if $a=b$ then $b=a$" is an algebraic axiom. 

**Axioms:**

Let $S$ be a sample space with a probability measure $P$, so that the probability of any event $A \subseteq S$ is given by $P(A)$. The probability measure obeys the following axioms: 


 1. $P(A)\ge0$: a probability can never be negative. 

 2. $P(S)=1$: the sample space as a whole has probability 1 (some outcome in $S$ always occurs). 

 3. If $A_1, A_2, \dots$ is a sequence of mutually exclusive events, so that $A_i\cap A_j = \emptyset$ for all $i \ne j$, then the probability of their union is the sum of their probabilities: $P(A_1\cup A_2\cup \cdots) = P(A_1)+P(A_2)+\cdots$.
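
For a finite sample space, a probability measure can be specified by assigning a probability to each outcome. The minimal base-R sketch below assumes, for illustration, a fair die; it checks the first two axioms directly and the third for one pair of disjoint events. `P()` is a hypothetical helper, and `base::union()` is written explicitly because `prob` masks `union`.

```r
# A probability model for one roll of a fair die: one probability per outcome
S <- 1:6
p <- setNames(rep(1/6, 6), S)

# Hypothetical helper: P(E) for an event E (a subset of S) sums its outcome probabilities
P <- function(E) sum(p[as.character(E)])

# Axiom 1: no probability is negative
all(p >= 0)
# Axiom 2: P(S) = 1 (compared with a tolerance because of floating point)
isTRUE(all.equal(P(S), 1))
# Axiom 3 for two disjoint events: P(A union B) = P(A) + P(B)
A <- c(1, 2)
B <- c(5)
isTRUE(all.equal(P(base::union(A, B)), P(A) + P(B)))
```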
 
 
Every other property or theorem can be proved by using these axioms. For example, the following theorem can be proved: 
 
*The probability that either A or B will happen or that both will happen 
is the probability of A happening plus the probability of B happening 
less the probability of the joint occurrence of A and B.*
   
which can be expressed like this: 
 
 $P(A\cup B) = P(A) + P(B) - P(A \cap B)$ 
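
A quick numerical check of this formula, assuming one roll of a fair die and the illustrative events $A$ = "even face" and $B$ = "face greater than 3", which share the outcomes 4 and 6. The sketch uses base R set functions; `P()` is a hypothetical helper.

```r
# One roll of a fair die; events as subsets of the sample space
S <- 1:6
A <- c(2, 4, 6)   # even face
B <- c(4, 5, 6)   # face greater than 3

# Hypothetical helper: with equally likely outcomes, P(E) = |E| / |S|
P <- function(E) length(E) / length(S)

lhs <- P(base::union(A, B))                    # P(A or B) = 4/6
rhs <- P(A) + P(B) - P(base::intersect(A, B))  # 3/6 + 3/6 - 2/6
c(lhs = lhs, rhs = rhs)
```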
 
 
See this link for axioms and more properties: https://www.le.ac.uk/users/dsgp1/COURSES/LEISTATS/STATSLIDE2.pdf  
Refresh your memory about the **set theory** symbols: https://www.rapidtables.com/math/symbols/Set_Symbols.html  
Rules of probability: https://www2.isye.gatech.edu/~brani/isyebayes/bank/handout1.pdf

---



**Some rules:**

For any events $A$ and $B$,

I. $P(A^c) = 1 - P(A)$.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Proof: Since $A \cup A^c = S$ and $A \cap A^c = \emptyset$, we have

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $P(A) + P(A^c) = P(A \cup A^c) = P(S)  = 1$.

This is called the **Complement Rule**. It states that for an event $A$ and its complement $A^c$, the probability of $A$ is one minus the probability of $A^c$. What are complementary events? In probability theory, the complement of an event $A$ is the event _not_ $A$; this complementary event is often denoted $A'$ or $A^c$.   
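
The Complement Rule is often the easiest route when an event is described by "at least one". As an illustration (an example chosen here, not taken from the source), the probability of at least one six in three rolls of a fair die is $1 - (5/6)^3$; the base-R sketch below checks this against direct enumeration of the 216 outcomes.

```r
# Complement of "at least one six" is "no six in three rolls": (5/6)^3
p_complement <- 1 - (5/6)^3

# Direct enumeration of all 6^3 equally likely outcomes
S <- expand.grid(X1 = 1:6, X2 = 1:6, X3 = 1:6)
at_least_one_six <- S$X1 == 6 | S$X2 == 6 | S$X3 == 6
p_direct <- mean(at_least_one_six)   # proportion of outcomes containing a six

c(complement = p_complement, direct = p_direct)  # both equal 91/216
```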



II. $P(\varnothing) = 0$.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Proof: $\varnothing$ is the complement of $S$, so by the Complement Rule $P(\varnothing) = 1 - P(S) = 1 - 1 = 0$. See also page 3 of https://www.le.ac.uk/users/dsgp1/COURSES/LEISTATS/STATSLIDE2.pdf.

The probability of the null event (i.e., an event that can never occur) is 0.
    



III. If $A \subset B$, then $P(A) \le P(B)$. That is, if $A$ is a subset of $B$, then the probability of $A$ is less than or equal to the probability of $B$. 
    



IV. $0 \le P(A) \le 1$. Probability of an event is between 0 and 1 inclusive. This seems like an axiom but it can actually be proved by using the three axioms. 



V. The General Addition Rule.

If the events are mutually exclusive:

$P(A \cup B) = P(A) + P(B)$

If the events are *not* mutually exclusive (this form holds for any events $A$ and $B$, since $P(A \cap B) = 0$ when they are disjoint):

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$



VI. The Theorem of Total Probability. Let $B_1, B_2, \dots, B_n$ be mutually exclusive and exhaustive (their union is $S$).

Then $P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n)$

The following diagram should provide intuition for this theorem. 

<img src='../images/total.png'>
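
A small numerical check, again assuming one roll of a fair die (an illustrative choice): take $A$ = "face is 5 or 6" and the partition $B_1$ = odd faces, $B_2$ = even faces, which are mutually exclusive and together cover $S$. `P()` is a hypothetical helper.

```r
S <- 1:6
P <- function(E) length(E) / length(S)   # equally likely outcomes

A  <- c(5, 6)
B1 <- c(1, 3, 5)   # odd faces
B2 <- c(2, 4, 6)   # even faces; B1 and B2 partition S

# Theorem of Total Probability: P(A) = P(A and B1) + P(A and B2)
total <- P(base::intersect(A, B1)) + P(base::intersect(A, B2))
c(direct = P(A), total = total)  # both equal 1/3
```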

---

### Counting Methods

When all outcomes in the sample space $S$ are equally likely, computing the probability of an event $A$ requires the number of outcomes $N = |S|$ in the sample space and the number of outcomes $M = |A|$ in the event; so we need to do some counting. 


$$ P(A) = \frac{|A|}{|S|}=\frac{M}{N}$$

Finding the probability of $A$ reduces to a **counting problem**.

There are different counting methods. 

**Fundamental Counting Principle: the multiplication principle**: 
The simplest way to count is to multiply the number of ways in which each step of an experiment can be performed. 
Suppose that an experiment is composed of two successive steps. 
Further suppose that the first step may be performed in $n_1$ distinct ways while the second step may be performed in $n_2$ distinct ways. 
Then the experiment may be performed in $n_1 \times n_2$ distinct ways.

More generally, if the experiment is composed of $k$ successive steps which may be performed in $n_1, n_2, \dots, n_k$ distinct ways, respectively, then the experiment may be performed in $n_1\times n_2 \times \dots \times n_k$ distinct ways.





**Example:** We would like to order a pizza. 
It will be sure to have cheese (and marinara sauce), but we may elect to add any number (possibly none) of the following five (5) available toppings: 

        pepperoni, sausage, anchovies, olives, and green peppers.

How many distinct pizzas are possible?


**Answer:** There are many ways to approach the problem, but the quickest avenue employs the Multiplication Principle directly.
We will separate the action of ordering the pizza into a series of stages. At the first stage, we will decide whether or not to include pepperoni on the pizza (two possibilities). 
At the next stage, we will decide whether or not to include sausage on the pizza (again, two possibilities). 
We will continue in this fashion until at last we will decide whether or not to include green peppers on the pizza.

At each stage we have two options: include the topping or leave it off. 
The Multiplication Principle says that we should multiply the 2’s to find the total number of possible pizzas:
$$2 \times 2 \times 2 \times 2 \times 2 = 2^5 = 32$$
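
The Multiplication Principle count can be confirmed by brute force. In this minimal base-R sketch, each pizza is encoded (an illustrative choice) as one `TRUE`/`FALSE` column per topping, and `expand.grid()` lists every combination.

```r
toppings <- c("pepperoni", "sausage", "anchovies", "olives", "green peppers")

# One yes/no (TRUE/FALSE) choice per topping: 2 ways at each of 5 stages
choices <- setNames(rep(list(c(FALSE, TRUE)), length(toppings)), toppings)
pizzas <- expand.grid(choices)

nrow(pizzas)                     # 32 distinct pizzas
prod(rep(2, length(toppings)))   # Multiplication Principle: 2^5 = 32
sum(rowSums(pizzas) == 0)        # 1: the plain cheese pizza, with no toppings
```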

**Another example:** How many different ways can four people line up? 

**Answer:** We choose any one of the **four** people to be first. Then there are **three** people who can be second, and **two** people who can be third. At this point there is only **one** person left to be last. Using the multiplication principle: 

$$4\times 3\times 2\times 1 = 24$$ 

There are 24 ways for four people to line up. This type of counting comes up often, and there is a notation for the product of the integers from $1$ to $n$: it is called the **factorial** and is written as follows:

$$n! = n\times (n-1) \times (n-2) \times \dots \times 2 \times 1$$

So there are $4!$ ways for four people to line up. 
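
Base R computes factorials with `factorial()`. As a sanity check, this minimal sketch also enumerates the line-ups by brute force: it lists all $4^4 = 256$ ways to fill four positions from people 1 to 4 and keeps only those in which no person appears twice.

```r
factorial(4)  # 24
prod(4:1)     # the same product written out

# Brute force: fill 4 positions from people 1..4, then drop repeated people
grid <- expand.grid(rep(list(1:4), 4))
lineups <- grid[apply(grid, 1, function(row) length(unique(row)) == 4), ]
nrow(lineups)  # 24
```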

---

**Now, we need some terminology:**

**Sampling:** sampling from a set means choosing an element from that set. We often **draw** a sample at random from a given set in which each element of the set has equal chance of being chosen.

**With or without replacement:** usually, we draw multiple samples from a set. If we **put** each object **back** after each draw, we call this sampling with replacement. In this case a single object can possibly be chosen multiple times. For example, if $A=\{a_1,a_2,a_3,a_4\}$ and we pick 3 elements with replacement, a possible choice might be $(a_3,a_1,a_3)$. Thus "with replacement" means "repetition is allowed." On the other hand, if repetition is not allowed, we call it sampling without replacement.

*The example above with four people lining up is without replacement: once we choose a person, we can only choose from the remaining people.*  

**Ordered or unordered:** If ordering matters (i.e.: $(a_1,a_2,a_3)\ne (a_2,a_3,a_1)$), this is called ordered sampling. Otherwise, it is called unordered.


*The example above with four people lining up is ordered: $(Person_1,Person_2,Person_3,Person_4)$ and $(Person_2,Person_1,Person_3,Person_4)$ are different outcomes (distinct sequences), for example.* 


Thus when we talk about sampling from sets, we can talk about four possibilities.

 - ordered sampling with replacement
 - ordered sampling without replacement
 - unordered sampling without replacement
 - unordered sampling with replacement



### Ordered Samples

The number of ways in which one may select an **ordered** sample of $k$ subjects from a population that has $n$ distinguishable members is

* $n^k$ if sampling is done with replacement


* $n(n - 1)(n - 2) \cdots (n - k + 1)$ if sampling is done without replacement


**Example:** We rent five movies to watch over the span of two nights. 
We wish to watch 3 movies on the first night. 
How many distinct sequences of 3 movies could we possibly watch?

**Answer:** $5 \times 4 \times \dots \times (5-3+1) = 5 \times 4 \times 3 = 60$. Here, we sample 3 out of 5 movies without replacement; once we watch a movie, we don't want to watch it again. The question asks for **distinct** sequences, so the order in which we pick and watch the three movies matters. 
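
In base R the product $n(n-1)\cdots(n-k+1)$ can be written as `prod(n:(n - k + 1))`, or equivalently as $n!/(n-k)!$. This is a small sketch, not a `prob` function; the factorial version is compared with `all.equal()` because `factorial()` returns a floating-point value.

```r
# Ordered sample of k = 3 movies from n = 5, without replacement
n <- 5
k <- 3
prod(n:(n - k + 1))               # 5 * 4 * 3 = 60
factorial(n) / factorial(n - k)   # the same count: n! / (n - k)!
```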

### Unordered Samples

The number of ways in which one may select an **unordered** sample of $k$ subjects from a population that has $n$ distinguishable members is:

 - if sampling is done with replacement:
 $$\frac{(n - 1 + k)!}{(n - 1)!k!}$$ 

 - if sampling is done without replacement: 
 $$\frac{n!}{k!(n - k)!}$$ 

The quantity $\frac{n!}{k!(n-k)!}$ is called a binomial coefficient and plays a special role in mathematics. 
It is denoted  

$$\binom nk  = \frac{n!}{k!(n-k)!}$$
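
The four counting formulas can be collected into one function. The sketch below is a hypothetical base-R helper named `count_samples()`; its argument names mimic those of `nsamp()` from the `prob` package, but it is not that function. It uses `choose()` for the binomial coefficients instead of computing large factorials directly.

```r
# Hypothetical helper: number of samples of size k from n distinguishable items
count_samples <- function(n, k, replace = FALSE, ordered = FALSE) {
  if (replace && ordered) {
    return(n^k)                      # ordered, with replacement: n^k
  }
  if (!replace && ordered) {
    # n(n-1)...(n-k+1); equals 1 when k = 0 and 0 when k > n
    return(prod(n - seq_len(k) + 1))
  }
  if (!replace && !ordered) {
    return(choose(n, k))             # n! / (k! (n-k)!)
  }
  choose(n - 1 + k, k)               # unordered, with replacement: (n-1+k)! / ((n-1)! k!)
}

# Urn with n = 3 balls, samples of size k = 2
c(ordered_repl   = count_samples(3, 2, replace = TRUE,  ordered = TRUE),   # 9
  ordered_norepl = count_samples(3, 2, replace = FALSE, ordered = TRUE),   # 6
  unord_norepl   = count_samples(3, 2, replace = FALSE, ordered = FALSE),  # 3
  unord_repl     = count_samples(3, 2, replace = TRUE,  ordered = FALSE))  # 6
```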


**Example:** 
You rent five movies to watch over the span of two nights, 
but only wish to watch 3 movies the first night. 
Your friend, Fred, wishes to borrow some movies to watch at his house on the first night. 
You owe Fred a favor, and allow him to select 2 movies from the set of 5. How many choices does Fred have? 

**Answer:** $\binom 52 = 10$. He chooses two movies (without replacement, of course), and the order in which he chooses them is irrelevant, so the sample is unordered. 
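
In base R, `choose()` computes the binomial coefficient and `combn()` lists the unordered selections themselves, one per column. This is a small sketch; `utils::combn()` is written explicitly because the `combinat` package, which `prob` loads, masks `combn`.

```r
choose(5, 2)   # 10

# List Fred's choices explicitly: each column is one unordered pair of movies
pairs <- utils::combn(5, 2)
pairs
ncol(pairs)    # 10
```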

---


**Which formula did we actually use for the "four people" problem?**

We know that it is without replacement; we can't choose the same person twice to get in line. We also know that the order matters. So we have to use:

$n(n - 1)(n - 2) \cdots (n - k + 1)$

**What is n and k?**

n=4, no question there. What is k? Is it 1? Does the formula work? 


Remember that we are looking to see how many different possibilities there are for four people to line up. So we are actually **choosing sequences of 4 people out of 4 people**: $n=k=4$. 

$4(4 - 1)(4 - 2) \cdots (4 - 4 + 1) = 4\times 3\times 2\times 1$

---

### Let's try to solve a problem using R commands:

**Example:** 

Let our urn contain three balls, labeled 1, 2, and 3, respectively. 
We are going to take a sample of size 2 from the urn.

**Answer:** 

The `prob` package counts the possible samples with the `nsamp()` function, which has arguments `n`, `k`, `replace`, and `ordered`. 

The argument `n` is the number of distinguishable balls in the urn from which sampling is done. 
The `k` argument is the sample size. 
The `ordered` and `replace` arguments are logical and specify how sampling is performed. 


If sampling is with replacement, then we can get any outcome 1, 2, or 3 on any draw. 
Further, by "ordered" we mean that we keep track of the order of the draws that we observe. 
Here we take an ordered sample of size 2, with replacement, from an urn with three distinguishable elements.

```r
nsamp(n = 3, k = 2, replace = TRUE, ordered = TRUE)
```

There are $3^2 = 9$ possible ordered samples with replacement. 

What if we did not allow replacement? 

```r
nsamp(n = 3, k = 2, replace = FALSE, ordered = TRUE)
```

Without replacement, the count drops to $3 \times 2 = 6$.
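
`nsamp()` reports only how many samples there are. To see the samples themselves, one option (a base-R sketch, not the `prob` approach) is to list every ordered pair of draws with `expand.grid()` and then drop the pairs that reuse a ball when replacement is not allowed.

```r
urn <- 1:3

# Ordered samples of size 2 with replacement: every (draw1, draw2) pair
with_repl <- expand.grid(draw1 = urn, draw2 = urn)
nrow(with_repl)      # 9

# Without replacement: the same ball cannot be drawn twice
without_repl <- subset(with_repl, draw1 != draw2)
nrow(without_repl)   # 6
```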

---

Take a look at the following solutions for the above problems. 


```r
# The movie example: distinct sequences of three movies out of five
nsamp(n = 5, k = 3, replace = FALSE, ordered = TRUE)   # 60
```

```r
# Let Fred choose two:
nsamp(n = 5, k = 2, replace = FALSE, ordered = FALSE)  # 10
```

```r
# Four people problem
nsamp(n = 4, k = 4, replace = FALSE, ordered = TRUE)   # 24
```

**Pizza problem:**

Not so obvious. We are choosing 5 toppings out of 5 toppings, no replacement, order does not matter, right?  

```r
# Pizza problem
nsamp(n = 5, k = 5, replace = FALSE, ordered = FALSE)
```

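**What just happened??**

The count is 1: there is only one way to choose all five toppings out of five, so this setup describes only the pizza with every topping, not all the possible pizzas.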

What we are really choosing is a yes/no answer for each of the five toppings. So we need a **sequence of yes/no answers**. 

     yes,yes,no,no,yes --> three toppings 

     no,yes,no,no,yes --> two toppings, etc. 

Does the order matter? Yes: each position in the sequence stands for one particular topping, so yes,no,... and no,yes,... describe different pizzas. 

Is there replacement? Of course, we should be able to choose "yes" or "no" multiple times. 

So, we have two choices, yes or no, n=2

and we choose the answers five times, k=5

they are ordered, and there is replacement. 



```r
# Pizza problem
nsamp(n = 2, k = 5, replace = TRUE, ordered = TRUE)   # 32 = 2^5
```
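
Another way to reach the same answer (a supplementary check, not part of the original solution): a pizza is an unordered selection, without replacement, of some number $j$ of the five toppings, for $j = 0, 1, \dots, 5$. Summing the binomial coefficients $\binom 5j$ over all sizes gives the same total as $2^5$.

```r
# Pizzas with exactly j toppings: choose(5, j), for j = 0 (cheese only) to 5
by_size <- choose(5, 0:5)
by_size        # 1 5 10 10 5 1
sum(by_size)   # 32, the same as 2^5
```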