# Conditional Probability

In this lab notebook, we will look at independent and dependent events, 
conditional probability, permutations, and combinations. 


Conditional probability is the probability of an event given that some other event has occurred. For example, weather forecasting is based on conditional probabilities. 
When the forecast says that there is a 30% chance of rain, 
that probability is conditioned on all the information the meteorologists have up to that point.

Let's look into what independent and dependent events are. 

**Reference:** [Elementary Probability and the prob Package](https://cran.r-project.org/web/packages/prob/vignettes/prob.pdf)

---

<span style="color:#1871d6; font-size:20px; font-weight:500">  Independent and Dependent Events</span>


**Independent:** Events $A$ and $B$ are said to be independent if the fact that one event has occurred does not affect the probability that the other event will occur. In that case, the probability of $A$ **and** $B$ (the intersection) is the product of the individual probabilities:  

$$ P(A \cap B) = P(A) \times P(B) $$
            
           
If $P(A \cap B) = P(A) \times P(B)$, then $A$ and $B$ are independent events; otherwise, they are dependent events.

If the occurrence of Event $A$ changes the probability of Event $B$, then Events $A$ and $B$ are **dependent.**  

A **conditional probability** is one that denotes the probability that Event $A$ occurs, **given** that Event $B$ _has occurred_.  Here is how we represent a conditional probability:

$$ P(A|B) = \frac{P(A\cap B)}{P(B)} $$
    

If $A$ and $B$ are **independent**, knowing that $B$ has occurred tells us nothing new about $A$: the numerator factors into $P(A) \times P(B)$, and $P(B)$ cancels out:

$$ P(A|B) = \frac{P(A)\times P(B)}{P(B)} =  P(A)$$
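
As a quick numerical check of this cancellation, the sketch below enumerates two fair coin tosses in base R. The two events (first toss is a Head, second toss is a Head) and all variable names are illustrative choices, not part of the original lab.

```r
# All four equally likely outcomes of two fair coin tosses
space <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                     stringsAsFactors = FALSE)

first_head  <- space$toss1 == "H"   # event A: first toss is a Head
second_head <- space$toss2 == "H"   # event B: second toss is a Head

p_A       <- mean(first_head)                 # P(A)
p_B       <- mean(second_head)                # P(B)
p_A_and_B <- mean(first_head & second_head)   # P(A and B)

# Independence: the joint probability factors into P(A) * P(B)
isTRUE(all.equal(p_A_and_B, p_A * p_B))

# So the conditional probability collapses to P(A)
p_A_given_B <- p_A_and_B / p_B
isTRUE(all.equal(p_A_given_B, p_A))
```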

                                                     

Let's consider the example of tossing ten coins to illustrate the nature of independent events. 
So, what is the probability of observing at least one Head? 

Imagine that we are tossing the coins in such a way that they do not interfere with each other; i.e. they are independent events. 

The only way there will not be at least one Head is if **all** tosses are Tails. 
Therefore,
         
$$ P(\text{at least one H}) = 1 - P(\text{all T}) = 1 - \left(\frac{1}{2}\right)^{10} = 0.9990234375 $$
         
---

In [None]:
# Let's do the same thing in R

library(prob)

Space <- tosscoin(10, makespace = TRUE)

# The isrep function in the prob package tests each row of Space to see whether the value T appears 10 times 
# and returns TRUE or FALSE for each row it checks. The subset function then keeps only the rows for which 
# the test is TRUE.

head(isrep(Space, vals = "T", nrep = 10))

subset(Space, isrep(Space, vals = "T", nrep = 10))

A <- subset(Space, isrep(Space, vals = "T", nrep = 10))

1 - Prob(A)
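
The same answer can be cross-checked without the `prob` package. The following base R sketch (variable names are illustrative) enumerates all $2^{10}$ equally likely outcomes and counts those that contain at least one Head.

```r
# Enumerate every outcome of 10 fair coin tosses (2^10 = 1024 rows)
tosses <- expand.grid(rep(list(c("H", "T")), 10), stringsAsFactors = FALSE)

# TRUE for rows that contain at least one Head
at_least_one_head <- rowSums(tosses == "H") >= 1

# Every row is equally likely, so the proportion of TRUE rows is the probability
p_enumerated <- mean(at_least_one_head)

# Closed-form value from the complement rule
p_formula <- 1 - (1 / 2)^10

c(enumerated = p_enumerated, formula = p_formula)
```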

---

<span style="color:#1871d6; font-size:16px; font-weight:700"> Repeated Experiments with Independent Events </span>

Experiments are repeated when we want to estimate the probability of an event more reliably. 
Often, a single experiment does not yield sufficient data. 
Therefore, it is common to repeat an experiment multiple times under identical 
conditions and in an independent manner. 
Tossing a coin repeatedly or rolling one or more dice several times are examples of repeated experiments.

The `iidspace` function in the prob library in R (note `library(prob)` in the code above) implements repeated experiments. 
It takes three arguments: 
`x`, which is a vector of outcomes, 
`ntrials`, which is an integer telling how many times to repeat the experiment, and 
`probs` to specify the probabilities of the outcomes of x in a single trial.

In [None]:
iidspace(c("H","T"), ntrials = 3, probs = c(0.5, 0.5))
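
To see what kind of object such a call describes, here is a rough base R sketch of an independent, identically distributed product space for a biased coin. It is not how `iidspace` is implemented; the bias of 0.7 and all variable names are assumptions made for illustration.

```r
outcomes <- c("H", "T")
p_single <- c(H = 0.7, T = 0.3)   # assumed single-trial probabilities
ntrials  <- 3

# Every possible sequence of ntrials outcomes
space <- expand.grid(rep(list(outcomes), ntrials), stringsAsFactors = FALSE)

# Independence: the probability of a sequence is the product of
# the single-trial probabilities of its outcomes
space$probs <- apply(space[, seq_len(ntrials)], 1,
                     function(row) prod(p_single[row]))

space
sum(space$probs)   # probabilities over the whole space add up to 1
```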

---

<span style="color:#1871d6; font-size:20px; font-weight:700"> Dependent Events / Conditional Probability</span>


Consider drawing cards from a full deck of 52 standard playing cards as an example of *dependent events* and *conditional probability*. 


We will select two cards from the deck, in succession. Let's define two events $A$ and $B$ as follows: 

                   A = {first card drawn is an Ace}; B = {second card drawn is an Ace}. 

Since there are **four** Aces in the deck, it is natural to assign $P(A) = 4/52$. 


Are $A$ and $B$ dependent? 


Let's see how $B$ depends on $A$: after the first card is drawn, there are only **51** cards remaining. 
What is the probability of **B** now? The answer depends on the value of the first card. 
If the first card is an Ace, then the probability that the second also is an Ace should be 3/51, 
but if the first card is not an Ace, then the probability that the second is an Ace should be 4/51. 

So the probability of $B$ is conditioned on $A$. 
For the situation in which Event $B$ is a drawn Ace **after** Event $A$ is a drawn Ace, we write:
      
$$ P(B|A) = 3/51$$

$$ P(A) = 4/52$$
    
The probability of $B$ happening, if $A$ happened, is 3/51. 

    
**Definition:** The conditional probability of $B$ given $A$ (i.e., the probability of $B$ given that $A$ occurs), denoted $P(B|A)$, is defined by:
    

$$ P(B|A) = \frac{P(A\cap B)}{P(A)}  $$

                
We can't factor the numerator and cancel the $P(A)$'s here, though, because the events are not independent.

$P(A \cap B)$ means **the probability that A and B intersect**; the probability that both **A and B** occur.

The events are _dependent_ because the occurrence of Event $A$ changes the probability of the occurrence of Event $B$. 
    
    
From 
$$ P(B|A) = \frac{P(A\cap B)}{P(A)}  $$

we can rewrite:

$$ P(A\cap B) = P(B|A)\times P(A) = \frac{3}{51}\times \frac{4}{52} \approx 0.0045 $$



In [None]:
(3/51)*(4/52)
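
The multiplication rule can be double-checked by brute force. The base R sketch below (the deck encoding and variable names are illustrative assumptions) lists every ordered pair of distinct cards and counts the pairs that satisfy each event.

```r
# Encode the deck by rank only: 4 cards of each of the 13 ranks
ranks <- rep(c("A", 2:10, "J", "Q", "K"), times = 4)

# All ordered draws of two different cards (52 * 51 = 2652 pairs)
draws <- expand.grid(first = 1:52, second = 1:52)
draws <- draws[draws$first != draws$second, ]

first_ace  <- ranks[draws$first]  == "A"   # event A: first card is an Ace
second_ace <- ranks[draws$second] == "A"   # event B: second card is an Ace

p_A         <- mean(first_ace)                # 4/52
p_A_and_B   <- mean(first_ace & second_ace)   # (4 * 3) / (52 * 51)
p_B_given_A <- p_A_and_B / p_A                # 3/51

c(p_A = p_A, p_B_given_A = p_B_given_A, p_A_and_B = p_A_and_B)
```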

---

**Example:** 

Let's work out an example. 

Toss a six-sided die **twice**. 
The sample space consists of all ordered pairs $(i, j)$ of the numbers $1, 2,\dots , 6$, that is, $S = \{(1, 1), (1, 2), \dots ,(6, 6)\}$. 

Here, $i$ is the outcome of the first toss and $j$ is the outcome of the second toss. 

Let $A$ = {outcomes match} and $B$ = {sum of outcomes at least 8}.


In [None]:
# The first thing to do is set up the probability space with the 
# rolldie function inside the prob library.

# S is the probability space
# rolldie is the function that builds it
# 2 is the number of rolls
# makespace = TRUE adds a column of probabilities to the outcomes

S <- rolldie(2, makespace = TRUE)

# S contains all the 36 possible outcomes {(1,1),(1,2)....(6,6)} with 
# each outcome having an identical probability of 0.02777778
head(S)

In [None]:
# Subset the sample space S to the outcomes in event A (outcomes match). 
# This results in a set where both rolls are the same (i and j are equal).
A <- subset(S, X1 == X2)
A

In [None]:
# Subsetting sample space S, 
# for outcomes matching event B(sum of outcomes at least 8). 
# The die total must be 8 or more
B <- subset(S, X1 + X2 >= 8)
B

In [None]:
# To calculate a conditional probability, use the 
# "given" argument of the Prob function as shown below.

# A is the event that both rolls match: {(1,1), (2,2), ..., (6,6)}

# B is the event that the sum is at least 8: {(2,6), (3,5), (3,6), ..., (6,6)}

paste('P(A|B): ', Prob(A, given = B))

paste('P(B|A): ', Prob(B, given = A))

# Instead of defining events A and B first, you can compute the conditional probability directly by passing 
# the original probability space S as the first argument of Prob, as shown below:
paste('P(A|B): ', Prob(S, X1 == X2, given = (X1 + X2 >= 8)))

paste('P(B|A): ', Prob(S, X1 + X2 >= 8, given = (X1 == X2)))

The examples above are simple applications of conditional probability to dice rolls. 
The `prob` package can be extended to multivariate datasets, where events are defined 
in terms of columns and supplied as arguments, as in the previous examples.
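
As a cross-check that does not rely on `prob`, the dice example can be reproduced in base R by counting rows. This is only a sketch; the names `rolls`, `is_match` and `is_sum8` are local to the snippet.

```r
# All 36 equally likely outcomes of two rolls
rolls <- expand.grid(X1 = 1:6, X2 = 1:6)

is_match <- rolls$X1 == rolls$X2        # event A: outcomes match
is_sum8  <- rolls$X1 + rolls$X2 >= 8    # event B: sum is at least 8

# P(A|B) = P(A and B) / P(B); with equally likely outcomes
# this reduces to a ratio of counts
p_A_given_B <- sum(is_match & is_sum8) / sum(is_sum8)    # 3 / 15
p_B_given_A <- sum(is_match & is_sum8) / sum(is_match)   # 3 / 6

c(p_A_given_B = p_A_given_B, p_B_given_A = p_B_given_A)
```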

---

<span style="color:#1871d6; font-size:20px; font-weight:700"> Permutations and Combinations</span>

The main difference between combinations and permutations is that a combination does not take into account the order, whereas a permutation does.

Consider a simple example from [Math is Fun](http://www.mathsisfun.com/combinatorics/combinations-permutations.html). 
When we say "My fruit salad is a **combination** of apples, grapes and bananas", we do not care what order the fruits are in. No matter in which order you mention the fruits, it's the same fruit salad.

But when we say "You need the combination 123 to open the safe", 
we care about the order of numbers. 
No other combination will work to open the safe. 
It has to be exactly 1-2-3. 
This is a **permutation**.

  * When the order doesn't matter, it is a Combination.
	
  * When the order does matter, it is a Permutation.
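
Counting makes the distinction concrete. Base R's `choose()` and `factorial()` give the number of combinations and permutations directly. The numbers below (choosing 3 fruits out of 5) are an illustrative assumption, not taken from the linked page.

```r
n <- 5   # fruits available (assumed for illustration)
k <- 3   # fruits chosen

# Combinations: order is ignored, n! / (k! * (n - k)!)
n_combinations <- choose(n, k)

# Permutations: order matters, n! / (n - k)!
n_permutations <- factorial(n) / factorial(n - k)

c(combinations = n_combinations, permutations = n_permutations)

# Each combination can be arranged in k! different orders
n_permutations == n_combinations * factorial(k)
```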
    

There are many ways to create permutations and combinations in R. 
We will be using the `combinat` package for this. 

**combn():**

`combn()` is used to generate combinations. Its usage is illustrated below. 

`Usage`

    combn(x, m, fun=NULL, simplify=TRUE, ...)


`Arguments`

    x         vector source for combinations, i.e. the vector of elements used to generate the combinations 
    m         number of elements in each combination. If you specify 2, combinations of size two are generated.
    fun       function to be applied to each combination (may be NULL). It can be any function, such as sum() or mean().
    simplify  logical; if FALSE, returns a list, otherwise returns a vector or array. 
    ...       arguments passed to fun

It generates all combinations of the elements of x taken m at a time. 
In the code snippet below, we pass 4 as x and 2 as m. 
So, the function returns all combinations of size 2 from the numbers {1,2,3,4}, like {{1,2},{1,3}....}. 

If the argument `fun` is not NULL, the function it names is applied to each combination. 
We will supply sum() as the function. 
If `simplify` is FALSE, it returns a list; otherwise, it returns an array, typically a matrix. 
Arguments given in "..." are passed unchanged to `fun`, if specified.

In [None]:
library(combinat)
# Generate all possible combinations of size 2 using the numbers {1,2,3,4}
combn(4, 2)

print("sum of elements of each combination ")
# Generate all possible combinations of size 2 using the numbers {1,2,3,4} and return their sums.
combn(4, 2, sum)
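
Base R also ships its own `combn()` in the `utils` package, whose function argument is spelled `FUN`. The sketch below calls it explicitly as `utils::combn` so that it behaves the same whether or not `combinat` is loaded, and confirms that the number of combinations matches `choose(4, 2)`.

```r
# Columns are the combinations of size 2 drawn from 1:4
combos <- utils::combn(4, 2)
combos

# The number of columns equals "4 choose 2"
ncol(combos) == choose(4, 2)

# Apply sum() to each combination
combo_sums <- utils::combn(4, 2, FUN = sum)
combo_sums
```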

**permn():** `permn()` is used to generate permutations. 

`Usage`

    permn(x, fun=NULL, ...)
        
        
`Arguments`

    x    vector source for permutations, i.e. the vector of elements used to generate the permutations 
    fun  if non-null, applied to each permutation

`permn()` generates all permutations of the elements of x. 
In the example below we pass 3 as the input, which generates the permutations of the numbers 1 to 3: {{1,2,3},{1,3,2},{2,1,3}...}. 
If the argument `fun` is not NULL, the function it names is applied to each permutation. 

In [None]:
# Generate all possible permutations of the numbers (1,2,3)
permn(3)

# Generate all possible permutations of the numbers (1,2,3) and return the standard deviation of each permutation.
permn(3, sd)

In [None]:
# You can find the number of permutations generated using length function. 
length(permn(3))
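
The count returned above should equal $n!$. As a sketch of why, here is a small recursive generator written in base R. `all_perms` is a hypothetical helper introduced for illustration, not a function from `combinat`, and unlike `permn(3)` it expects the vector itself (for example `1:3`).

```r
# Hypothetical helper: returns a list of all orderings of the vector x
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    # Put x[i] first, then append every ordering of the remaining elements
    for (p in all_perms(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], p)
    }
  }
  out
}

perms <- all_perms(1:3)
perms

# n choices for the first slot, n - 1 for the second, and so on: n! in total
length(perms) == factorial(3)
```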

---

### Extensions of probability to multivariate data

We have seen how conditional probability applies to simple dice events. 
Let's extend our discussion to multivariate data. 
We will work with the motor vehicle thefts dataset. 
The data is a combination of both factor and continuous variables. 
The `table()` command is used extensively when dealing with conditional probability.

Load the dataset into a dataframe called `vehicle_thefts`. 
The dataset is located in the '/dsa/data/all_datasets/motor_vehicle_thefts/' directory. 

**NOTE:** This is a variation of the mvt.csv file that is used in other courses.

In [None]:
vehicle_thefts <- read.csv("/dsa/data/all_datasets/motor_vehicle_thefts/mvt.csv", header = TRUE)

head(vehicle_thefts)

In [None]:
# Extract month, weekday, hour etc. values from a date variable. Convert the format of date variable into a 
# standard format so that day, month, year etc. values can be extracted from a date.

# There are two internal implementations of date/time: POSIXct, which stores seconds since UNIX 
# epoch (+ some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.

# strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.
# In this dataset, the date variable was originally stored as a character vector.

DateConvert <- strptime(vehicle_thefts$Date, "%m/%d/%Y")

# Extract the month and the day of the week and add these variables to the data frame vehicle_thefts.
# months() and weekdays() functions help you extract the values from a "POSIXlt" object

vehicle_thefts$Month <- months(DateConvert)

vehicle_thefts$Weekday <- weekdays(DateConvert)

**Reference:** [strptime()](http://rfunction.com/archives/1912)                
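
A small sketch of the conversion on a single made-up date string (the value `"12/31/2012"` is an assumption, not a row from the dataset). The names returned by `months()` and `weekdays()` depend on the session locale, so the numeric fields of the `POSIXlt` object are shown as well.

```r
d <- strptime("12/31/2012", "%m/%d/%Y")

class(d)        # "POSIXlt" "POSIXt"
months(d)       # month name in the current locale
weekdays(d)     # weekday name in the current locale

# Locale-independent fields stored in the POSIXlt list
d$mon + 1       # month number; mon is stored as 0-11
d$wday          # day of the week; 0 = Sunday, so 1 = Monday
d$year + 1900   # years are stored as an offset from 1900
```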

In [None]:
head(vehicle_thefts)

In [None]:
# What is the probability that an arrest has occurred for domestic motor vehicle theft?

# We have to find the distribution of thefts based on whether they are domestic or not. We will use a 2-way 
# table to generate these frequencies and to use these frequencies to determine probabilities.

# with() is used to avoid referring to the dataframe every time we refer to one of its variables in the table command. 
with(vehicle_thefts, table(Arrest, Domestic))

In [None]:
# We want the probability that an arrest happened given that the theft is of the Domestic type. 
# Mathematically, P(Arrest|Domestic) = P(Arrest & Domestic)/P(Domestic). Both probabilities share the same 
# denominator (the total number of thefts), so the ratio reduces to a ratio of counts: 65 / 415.

# Count of (Arrest & Domestic) = 65. Look at the table above for the cell where Arrest is TRUE and Domestic is TRUE.
# Count of Domestic = 350 + 65 = 415. Add up the column where Domestic is TRUE in the table above.

65 / (350 + 65)

**Reference:** [with()](http://www.statmethods.net/stats/withby.html). 
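
Rather than reading counts off the table by hand, `prop.table()` can turn a two-way table into conditional proportions. The sketch below uses a tiny made-up data frame (`toy`, eight rows invented for illustration) in place of `vehicle_thefts`, so its numbers are not results from the real data.

```r
# Made-up data for illustration only
toy <- data.frame(
  Arrest   = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  Domestic = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

counts <- with(toy, table(Arrest, Domestic))
counts

# margin = 2 divides each column by its total, so each column holds
# P(Arrest = row | Domestic = column)
cond <- prop.table(counts, margin = 2)
cond

# P(Arrest | Domestic): row "TRUE", column "TRUE"
p_arrest_given_domestic <- cond["TRUE", "TRUE"]

# Same value as a ratio of counts read off the table
p_by_counts <- counts["TRUE", "TRUE"] / sum(counts[, "TRUE"])
c(prop_table = p_arrest_given_domestic, by_counts = p_by_counts)
```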

You can use the `attach()` function as an alternative to `with()`. 
`attach()` makes the variables of a dataframe accessible in R with fewer keystrokes. 
Once you attach the dataframe, you can refer to its variables without referring to the dataframe.

**Reference:** [attach()](https://www.r-bloggers.com/to-attach-or-not-attach-that-is-the-question/)

In [None]:
# What is the probability that an arrest was made for a motor vehicle theft, given that the year was 2001?
with(vehicle_thefts,table(Arrest, Year))

In [None]:
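# P(Arrest|Year == 2001) = P(Arrest & Year == 2001)/P(Year == 2001)
# The total number of thefts cancels in this ratio, so counts can be used in place of probabilities.

# Count of (Arrest & Year == 2001) = 2152
# Count of (Year == 2001) = 2152 + 18517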

2152 / (2152 + 18517)

In [None]:
# Cars are stolen from many different locations. We want the top 5 locations by number of thefts, 
# excluding the "Other" category. sort() orders the counts from smallest to largest, so the top 5 
# appear at the bottom of the output below.

sort(table(vehicle_thefts$LocationDescription))

In [None]:
# Create a subset of the data containing only the observations for which the theft happened in one of the 
# top five locations. Call this new data set "Top5".

top5_locations <- c("STREET",
                    "PARKING LOT/GARAGE(NON.RESID.)",
                    "ALLEY",
                    "DRIVEWAY - RESIDENTIAL",
                    "GAS STATION")

Top5 <- subset(vehicle_thefts, LocationDescription %in% top5_locations)


str(Top5)

Take a look at **LocationDescription** in the output of `str()`. 
Ideally, in the new dataframe `Top5` it should be a factor with only five levels: 
STREET, PARKING LOT/GARAGE(NON.RESID.), ALLEY, DRIVEWAY - RESIDENTIAL and GAS STATION. 
However, `str()` reports **LocationDescription** as a character vector.


In [None]:
# Convert LocationDescription in Top5 to a factor built from the subsetted data. If you skip this step, 
# Top5$LocationDescription will either keep all 78 original levels from vehicle_thefts$LocationDescription 
# (if the column was read as a factor) or remain a character vector (if it was read as chr). 

Top5$LocationDescription <- factor(Top5$LocationDescription)

str(Top5)
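
The effect of re-creating the factor is easiest to see on a small example. The vector below is made up for illustration; `droplevels()` is a base R alternative that does the same job.

```r
# Made-up factor with three levels
loc <- factor(c("STREET", "ALLEY", "OTHER", "STREET"))

# Subsetting keeps every original level, even ones no longer present
kept <- loc[loc != "OTHER"]
levels(kept)

# Re-creating the factor (or calling droplevels) discards unused levels
levels(factor(kept))
levels(droplevels(kept))
```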

In [None]:
# What is the probability that an arrest is made given that the place was a street?

with(Top5,table(LocationDescription, Arrest))

In [None]:
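# P(Arrest|LocationDescription == "STREET") = P(Arrest & STREET)/P(STREET)
# The total number of thefts cancels in this ratio, so counts can be used in place of probabilities.

# Count of (Arrest & location is 'STREET') = 11595

# Count of (location is 'STREET') = 11595 + 144969 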
 
(11595) / (144969 + 11595)

In [None]:
# What is the probability that no arrest was made, given that the theft happened on a Monday?
# (This is computed within the Top5 subset.)

with(Top5, table(Weekday, Arrest))

In [None]:
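# P(!Arrest|Weekday == "Monday") = P(!Arrest & Weekday == "Monday")/P(Weekday == "Monday")
# The total number of thefts cancels in this ratio, so counts can be used in place of probabilities.

# Count of (no arrest & Weekday is 'Monday') = 23334
# Count of (Weekday is 'Monday') = 23334 + 1954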

23334 / (23334 + 1954)
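
Because an arrest either happens or it does not, this result is the complement of the arrest probability on Mondays. A short check using the two counts quoted above (variable names are illustrative):

```r
no_arrest_monday <- 23334   # Monday thefts without an arrest (count from the table)
arrest_monday    <- 1954    # Monday thefts with an arrest (count from the table)
total_monday     <- no_arrest_monday + arrest_monday

p_no_arrest <- no_arrest_monday / total_monday
p_arrest    <- arrest_monday / total_monday

c(p_no_arrest = p_no_arrest, p_arrest = p_arrest)

# Complement rule: P(no arrest | Monday) = 1 - P(arrest | Monday)
isTRUE(all.equal(p_no_arrest, 1 - p_arrest))
```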