# Probability Fundamentals

![xkcd](img/increased_risk_2x.png)

[xkcd comic 1252](https://xkcd.com/1252/)

Learning goals for today:
- Develop a definition of probability beyond coin tosses
- Describe set theory and its terminology
- Define the size of sets with permutations and combinations


## Learning Goal 1:

### Opening Task:
- In groups of 5, read the following 5 scenarios.
- Each person is responsible for one scenario and describes it to the rest of the group.
- Each group will make a list of commonalities between the five scenarios.
- The group will make a list of new terms from the articles that they have not seen before.
- The three groups of 5 should then consolidate their lists of new terms on the board.

### Scenario 1:

> Adequate prediction of the concrete bridge deck deterioration rate is necessary for maintenance and rehabilitation decisions. The stochastic deterioration of bridge decks can be most accurately modeled with a time-based probabilistic approach. In this work, a semi-Markov time-based model, based on accelerated failure time (AFT) Weibull fitted-parameters, was used to estimate the transition probabilities and sojourn times for the deterioration of concrete bridge decks. Approximately 30 years of in-service performance data for over 22,000 bridges in Pennsylvania were used in the model development. The proposed approach attempts to relate deck deterioration rates to various explanatory factors, such as structural system attributes, average daily traffic (ADT), route type, and environmental conditions. The following factors were found to be statistically significant with respect to the rate of bridge deck deterioration: type of rebar protection, continuous versus simply supported spans, overall bridge deck length, number of spans, bridge location, type of overlay, and whether or not the deck was located on interstate routes. Furthermore, the effects of remediation on bridge deck deterioration and service life were also evaluated and quantified, based on in-service performance data.


[Stochastic Analysis and Time-Based Modeling of Concrete Bridge Deck Deterioration](https://ascelibrary.org/doi/10.1061/(ASCE)BE.1943-5592.0001285?utm_source=TrendMD&utm_medium=cpc&utm_campaign=Journal_of_Bridge_Engineering_TrendMD_0&)

### Scenario 2:

>Occupancy has significant impacts on building performance. However, in current building performance simulation programs, occupancy inputs are static and lack diversity, contributing to discrepancies between the simulated and actual building performance. This paper presents an Occupancy Simulator that simulates the stochastic behavior of occupant presence and movement in buildings, capturing the spatial and temporal occupancy diversity. Each occupant and each space in the building are explicitly simulated as an agent with their profiles of stochastic behaviors. The occupancy behaviors are represented with three types of models: (1) the status transition events (e.g., first arrival in office) simulated with probability distribution model, (2) the random moving events (e.g., from one office to another) simulated with a homogeneous Markov chain model, and (3) the meeting events simulated with a new stochastic model. A hierarchical data model was developed for the Occupancy Simulator, which reduces the amount of data input by using the concepts of occupant types and space types. Finally, a case study of a small office building is presented to demonstrate the use of the Simulator to generate detailed annual sub-hourly occupant schedules for individual spaces and the whole building. The Simulator is a web application freely available to the public and capable of performing a detailed stochastic simulation of occupant presence and movement in buildings. Future work includes enhancements in the meeting event model, consideration of personal absent days, verification and validation of the simulated occupancy results, and expansion for use with residential buildings.

[An agent-based stochastic Occupancy Simulator](https://link.springer.com/article/10.1007/s12273-017-0379-7?utm_source=TrendMD&utm_medium=cpc&utm_campaign=Building_Simulation_TrendMD_1)

Occupancy Simulator: simulates the stochastic (random) behavior of building occupants to depict building performance more accurately.

New terms:
- stochastic
- Markov chain model
- semi-Markov time-based model
- Weibull fitted parameters
- Monte Carlo simulation
- geometric probability
- genetic algorithms

### Scenario 3

> Paul has the choice between a high deductible plan and a low deductible plan for health insurance.<br>
> If Paul chooses the low deductible plan, he will pay the first 1000 dollars of any medical costs. The low deductible plan costs 8000 dollars.<br>
> If Paul chooses the high deductible plan, he will pay the first 2500 dollars of any medical costs. The high deductible plan costs 7500 dollars.<br>
> Paul found a table of data on the frequency of medical costs. Based on this table, which plan should he choose?

| Medical cost ($) | Probability |
|:----:|:-------:|
|0 | 30% |
|1000 | 25%|
|4000 | 20% |
|7000 | 20% |
|15000 | 5% |
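One way to answer Scenario 3 is to compare the expected total cost of each plan. The sketch below assumes that the plan costs are the premiums Paul pays regardless of usage, and that once the deductible is met the plan covers the rest of the bill; the scenario does not state either assumption explicitly.

```python
# Table from Scenario 3: medical cost ($) -> probability (in percent)
cost_distribution = {0: 30, 1000: 25, 4000: 20, 7000: 20, 15000: 5}


def expected_total_cost(premium, deductible):
    """Premium plus expected out-of-pocket cost.

    Assumes Paul pays min(cost, deductible) and the plan pays the rest.
    """
    expected_out_of_pocket = sum(
        pct * min(cost, deductible) for cost, pct in cost_distribution.items()
    ) / 100
    return premium + expected_out_of_pocket


low = expected_total_cost(premium=8000, deductible=1000)
high = expected_total_cost(premium=7500, deductible=2500)

print(f"low deductible:  ${low:,.0f}")   # $8,700
print(f"high deductible: ${high:,.0f}")  # $8,875
```

Under these assumptions the low deductible plan is cheaper on average, by 175 dollars per year.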

### Scenario 4

>A drawer contains red socks and black socks. When two socks are drawn at random, the probability that both are red is 1/2. <br> 
>a) How small can the number of socks in the drawer be?<br>
>b) How small if the number of black socks is even?

[the sock problem](https://engineering-math.org/2017/05/10/the-sock-drawer-probability-and-statistics-problem/)
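A brute-force search is a quick way to check answers to Scenario 4. This sketch assumes the two socks are drawn without replacement and that the drawer holds at least one black sock; it tries every drawer size in increasing order and uses exact fractions to avoid rounding errors.

```python
from fractions import Fraction


def prob_both_red(red, black):
    """P(both red) when drawing two socks without replacement."""
    total = red + black
    return Fraction(red, total) * Fraction(red - 1, total - 1)


def smallest_drawer(require_even_black=False, max_socks=1000):
    """Return (red, black) for the smallest drawer with P(both red) = 1/2."""
    for total in range(2, max_socks + 1):
        for black in range(1, total):  # at least one black sock
            if require_even_black and black % 2:
                continue
            red = total - black
            if prob_both_red(red, black) == Fraction(1, 2):
                return red, black
    return None


print(smallest_drawer())                         # (3, 1)  -> 4 socks
print(smallest_drawer(require_even_black=True))  # (15, 6) -> 21 socks
```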

### Scenario 5

>Among engineers, risk is defined as a product of probability of the occurrence of an undesired event and the expected consequences in terms of human, economic, and environmental loss. These two components are equally important; therefore, the appropriate estimation of these values is a matter of great significance. This paper deals with one of these two components—the assessment of the probability of vessels colliding, presenting a new approach for the geometrical probability of collision estimation on the basis of maritime and aviation experience. The geometrical model that is being introduced in this paper takes into account registered vessel traffic data and generalised vessel dynamics and uses advanced statistical and optimisation methods (Monte Carlo and genetic algorithms). The results obtained from the model are compared with registered data for maritime traffic in the Gulf of Finland and a good agreement is found.

[Probability modelling of vessel collisions](https://www.sciencedirect.com/science/article/pii/S0951832010000256)

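__What is probability?__<br>
Probability theory studies how likely an event is to occur, relative to all possible outcomes. When every outcome is equally likely, the probability of an event is the number of outcomes in the event divided by the total number of outcomes. Probability should not be confused with odds, which compare the number of favorable outcomes to the number of unfavorable ones. In this section we discuss event spaces and sets, calculate the probability of events, and look at the relevance of probability theory in data science. <br>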
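To make the difference between probability and odds concrete, here is a minimal sketch for a fair six-sided die (an illustrative example, not one of the scenarios above), assuming all six faces are equally likely.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # equally likely faces of a fair die
event = {6}                    # the event "roll a six"

favorable = len(event & outcomes)
unfavorable = len(outcomes) - favorable

probability = Fraction(favorable, len(outcomes))  # favorable / total
odds = (favorable, unfavorable)                   # favorable : unfavorable

print(probability)             # 1/6
print(f"{odds[0]}:{odds[1]}")  # 1:5
```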

__Why should I care about probability?__<br>
Studying probability allows us to make better, more informed decisions based on previously collected data. For example, understanding that, from a probabilistic standpoint, we are extremely unlikely ever to win the lottery deters us from treating it as a source of income. <br>
Probability theory also lies at the heart of making inferences from our data, which is what statistics is all about!
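To put a number on the lottery example, consider a hypothetical lottery (not a specific real one) in which you pick 6 distinct numbers from 1 to 49 and order does not matter. The count of possible tickets is a combination, covered later in this lesson.

```python
import math

# Hypothetical 6-of-49 lottery: choose 6 distinct numbers, order ignored
tickets = math.comb(49, 6)
p_jackpot = 1 / tickets

print(tickets)    # 13983816
print(p_jackpot)  # roughly 7 in 100 million
```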

## II. Set Theory
In probability theory, a set is a well-defined collection of objects.
Mathematically, you can denote a set by $S$. If an element $x$ belongs to a set $S$, then you write $x \in S$. On the other hand, if $x$ does not belong to $S$, then you write $x \notin S$.

__2.1 Subsets__ <br>
Set $T$ is a subset of set $S$ if every element in set $T$ is also in set $S$. The mathematical notation for a subset is $T \subset S$.
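Python's built-in `set` type can test the subset relation directly. The sets `S` and `T` below are arbitrary examples. Note that Python distinguishes `<=` (subset, possibly equal) from `<` (proper subset), a distinction some texts write as $\subseteq$ versus $\subset$.

```python
S = {1, 2, 3, 4, 5}
T = {2, 4}

print(T.issubset(S))  # True: every element of T is in S
print(T <= S)         # True: the same test written as an operator
print(T < S)          # True: T is a proper subset (T != S)
print(S <= S)         # True: every set is a subset of itself
print(S < S)          # False: but not a proper subset of itself
```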

__2.2 Set Operations__ <br>

- **Union**: the union of two sets $S$ and $T$, written $S \cup T$, is the set of elements that are in $S$, in $T$, or in both.
- **Intersection**: the intersection of two sets $S$ and $T$, written $S \cap T$, is the set that contains all elements of $S$ that also belong to $T$.

We are trying to create rooming arrangements based on staff interests for a staff trip. <br>
Who should room with whom based on interests?

This is another way to look at sets.<br>
And we can still use the math!

```python
Robin = set(["art", "traveling", "wine", "doodling", "tech", "gadgets"])
Rob = set(["rock-climbing", "traveling", "dad jokes", "ice cream"])
Alison = set(["wine", "traveling", "shitts creek", "dogs"])
Su = set(["shitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"])
Molly = set(["wine", "ice cream", "dogs", "zookeeping", "traveling"])
```

```python
A = Robin | Rob  # union: interests of Robin or Rob (or both)
A
```

```
{'art',
 'dad jokes',
 'doodling',
 'gadgets',
 'ice cream',
 'rock-climbing',
 'tech',
 'traveling',
 'wine'}
```

```python
B = Alison.intersection(Su)  # interests shared by Alison and Su
B
```

```
{'dogs', 'shitts creek'}
```

```python
# interests shared by Molly, Rob, and Robin
Molly.intersection(Rob).intersection(Robin)
```

```
{'traveling'}
```

```python
from itertools import permutations

# every ordered selection of 5 letter positions from "pochemuchka"
l = list(permutations("pochemuchka", 5))
print(len(l))  # 11!/(11-5)! arrangements
print(l[:3])   # show the first few instead of all of them
```

```
55440
[('p', 'o', 'c', 'h', 'e'), ('p', 'o', 'c', 'h', 'm'), ('p', 'o', 'c', 'h', 'u')]
```

**Task**:

- In groups of 2-3, draw the Venn diagram of each person's interests and how they overlap.
- Then try the set notation learned in the learn.co curriculum to find the overlaps with Python.
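One possible way to turn the rooming question into code is to score every pair of people by the size of the intersection of their interests. Scoring pairs by shared-interest count is an assumption made for illustration, not the only reasonable rule.

```python
from itertools import combinations

interests = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "shitts creek", "dogs"},
    "Su": {"shitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}

# Number of shared interests for every unordered pair of people
overlap = {
    (a, b): len(interests[a] & interests[b])
    for a, b in combinations(interests, 2)
}

# Print pairs from most to fewest shared interests
for pair, shared in sorted(overlap.items(), key=lambda kv: -kv[1]):
    print(pair, shared)

best_pair = max(overlap, key=overlap.get)
print("most shared interests:", best_pair)  # ('Alison', 'Molly')
```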


## Foundations of (Independent) Probabilities 

What's the probability that a staff person likes wine?

That's a very specific probability example.

But there are other applications and terminology that are important for probability. 
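For the wine question, a minimal sketch assuming "a staff person" means one of the five staff members above, chosen uniformly at random: the probability is the number of people who like wine divided by the number of people.

```python
from fractions import Fraction

interests = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "shitts creek", "dogs"},
    "Su": {"shitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}

likes_wine = {name for name, likes in interests.items() if "wine" in likes}
p_wine = Fraction(len(likes_wine), len(interests))  # favorable / total

print(sorted(likes_wine))  # ['Alison', 'Molly', 'Robin']
print(p_wine)              # 3/5
```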


In this section, we will introduce you to the foundation of independent probability theory. Later on in the course, you will be introduced to concepts such as conditional probability and probability of dependent events.

__Terminology Alert__ 
- Random Variable
    - A random variable is a variable whose value is determined by the outcome of a random phenomenon, so it can take on different values
    - A random variable can be either discrete or continuous
        - __Discrete__: the variable takes on a countable set of distinct values (for example, integers)
        - __Continuous__: the variable can take on any value within an interval
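As an illustration (the die and the interval are arbitrary examples), the standard-library `random` module can draw one value of a discrete random variable and one of a continuous random variable. Computer floats only approximate a true continuum, but they are the usual stand-in.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Discrete random variable: the face shown by a fair six-sided die
die_roll = random.randint(1, 6)      # one of 1, 2, 3, 4, 5, 6

# Continuous random variable: a point drawn uniformly from [0, 1]
uniform_draw = random.uniform(0, 1)  # any real number in [0, 1]

print(die_roll, uniform_draw)
```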

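#### Probability of A and B
<center>$P(A \text{ and } B) = P(A) \cdot P(B)$</center>

This product rule holds only when $A$ and $B$ are independent, that is, when knowing that one event occurred does not change the probability of the other.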

What's the probability that a staff person likes wine *and* likes dogs?
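A sketch comparing a direct count with the product rule, again assuming one of the five staff members is picked uniformly at random. Counting gives $2/5$, while the product rule gives $3/5 \cdot 3/5 = 9/25$; because these differ, liking wine and liking dogs are not independent in this small group, so the product rule does not apply here.

```python
from fractions import Fraction

interests = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "shitts creek", "dogs"},
    "Su": {"shitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}
n = len(interests)

wine = {name for name, s in interests.items() if "wine" in s}
dogs = {name for name, s in interests.items() if "dogs" in s}

p_wine = Fraction(len(wine), n)                  # 3/5
p_dogs = Fraction(len(dogs), n)                  # 3/5
p_both_counted = Fraction(len(wine & dogs), n)   # 2/5, counted directly
p_both_if_independent = p_wine * p_dogs          # 9/25, product rule

print(p_both_counted, p_both_if_independent)
```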

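#### Probability of A or B
<center>$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$</center>

The last term is subtracted because outcomes in both $A$ and $B$ are counted once in $P(A)$ and again in $P(B)$.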

What's the probability that someone likes ice cream *or* traveling?
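A sketch checking the "or" formula against a direct count of the union, under the same assumption of picking one of the five staff members at random.

```python
from fractions import Fraction

interests = {
    "Robin": {"art", "traveling", "wine", "doodling", "tech", "gadgets"},
    "Rob": {"rock-climbing", "traveling", "dad jokes", "ice cream"},
    "Alison": {"wine", "traveling", "shitts creek", "dogs"},
    "Su": {"shitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"},
    "Molly": {"wine", "ice cream", "dogs", "zookeeping", "traveling"},
}
n = len(interests)

ice_cream = {name for name, s in interests.items() if "ice cream" in s}
traveling = {name for name, s in interests.items() if "traveling" in s}

p_ice = Fraction(len(ice_cream), n)                # 2/5
p_travel = Fraction(len(traveling), n)             # 4/5
p_both = Fraction(len(ice_cream & traveling), n)   # 2/5

p_either_formula = p_ice + p_travel - p_both               # P(A) + P(B) - P(A and B)
p_either_counted = Fraction(len(ice_cream | traveling), n)  # size of the union

print(p_either_formula, p_either_counted)  # 4/5 4/5
```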

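What happens when you have multiple events? For three events, the inclusion-exclusion principle gives:

$$ P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C) $$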
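A minimal check of the three-event formula on one roll of a fair die. The events `A` (even), `B` (greater than 3), and `C` (1 or 2) are arbitrary choices for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # equally likely outcomes of one fair die roll
A = {2, 4, 6}               # even
B = {4, 5, 6}               # greater than 3
C = {1, 2}                  # 1 or 2


def P(event):
    return Fraction(len(event), len(omega))


inclusion_exclusion = (
    P(A) + P(B) + P(C)
    - P(A & B) - P(A & C) - P(B & C)
    + P(A & B & C)
)
direct = P(A | B | C)

print(inclusion_exclusion, direct)  # 5/6 5/6
```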

## Permutations & Combinations
Permutations and combinations help us count the full *set* of possible outcomes in a probability problem.

**Permutation**
- Ordering matters
- How many different arrangements can you get out of a number of elements?
- The number of possible arrangements of $r$ elements out of a total of $n$ elements is $\frac{n!}{(n - r)!}$

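```python
from itertools import permutations

# all orderings of the three elements 1, 2, 3
l = list(permutations(range(1, 4)))
print(l)  # [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```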
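The formula $n!/(n - r)!$ can be checked three ways: from factorials, with `math.perm` (Python 3.8+), and by listing every arrangement. The values $n = 11$, $r = 5$ match the 11 letter positions of "pochemuchka" used earlier; note that this counts positions, so repeated letters are treated as distinct.

```python
import math
from itertools import permutations

n, r = 11, 5

by_formula = math.factorial(n) // math.factorial(n - r)  # n! / (n - r)!
by_builtin = math.perm(n, r)                              # Python 3.8+
by_listing = sum(1 for _ in permutations(range(n), r))    # brute force

print(by_formula, by_builtin, by_listing)  # 55440 55440 55440
```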

#### Scenario:

You are trying to crack the code: hack into the mainframe and stop the KGB from launching US missiles remotely.

You know the password is a 5-letter anagram of some subset of the letters in the word "pochemuchka".

How many potential passwords are there? That is, how large is the **set** of password options?

What's the probability that the password starts with p?
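A sketch of one way to size the password set. Because "c" and "h" each appear twice in "pochemuchka", many positional permutations spell the same string, so the sketch collapses them into a set of distinct strings. It also assumes every distinct string is equally likely to be the password.

```python
from fractions import Fraction
from itertools import permutations

word = "pochemuchka"

# Every ordered choice of 5 letter positions; repeated letters ('c', 'h')
# make some of these tuples spell identical strings
positional = list(permutations(word, 5))

# Collapse to distinct 5-letter strings: the actual set of candidate passwords
passwords = {"".join(p) for p in positional}

starts_with_p = sum(1 for pw in passwords if pw.startswith("p"))
p_start = Fraction(starts_with_p, len(passwords))

print(len(positional))          # 55440
print(len(passwords))           # 22050
print(p_start, float(p_start))  # 73/735, about 0.099
```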

**Combination:**
- Ordering does not matter
- How many different selections can you get out of a number of elements?
- The number of possible selections of $r$ elements out of a total of $n$ elements is $\frac{n!}{r!\,(n - r)!}$
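The combination formula can be checked the same way as the permutation formula. The values $n = 5$, $r = 3$ are an arbitrary small example.

```python
import math
from itertools import combinations

n, r = 5, 3

by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
by_builtin = math.comb(n, r)                     # Python 3.8+
chosen = list(combinations(range(1, n + 1), r))  # each selection listed once

print(by_formula, by_builtin, len(chosen))  # 10 10 10
print(chosen[:3])                           # [(1, 2, 3), (1, 2, 4), (1, 2, 5)]
```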

#### Scenario A
- Combinatorics in a specific scenario
    - What is the probability of getting exactly 3 heads out of 5 fair coin tosses?
    - What is the probability of getting at least 3 heads out of 5 tosses?
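For Scenario A, each of the $2^5 = 32$ head/tail sequences of a fair coin is equally likely, and $\binom{5}{k}$ of them contain exactly $k$ heads. A short sketch:

```python
import math
from fractions import Fraction

n = 5
total = 2 ** n  # 32 equally likely sequences

p_exactly_3 = Fraction(math.comb(n, 3), total)
p_at_least_3 = Fraction(sum(math.comb(n, k) for k in range(3, n + 1)), total)

print(p_exactly_3)   # 5/16
print(p_at_least_3)  # 1/2
```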
    

#### Scenario B
If you used combinations rather than permutations to figure out the KGB password, how many passwords would you potentially miss?
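One way to frame Scenario B, assuming "using combinations" means trying only one ordering for each distinct selection of letters: count distinct strings, count distinct letter selections, and subtract. Sorting each combination matters, because `combinations` keeps letters in the order they appear in the word, so two selections of the same letters can still produce different tuples.

```python
from itertools import combinations, permutations

word = "pochemuchka"

# Distinct ordered 5-letter strings
arrangements = {"".join(p) for p in permutations(word, 5)}

# Distinct unordered selections: sort each combination so the same
# multiset of letters always produces the same key
selections = {tuple(sorted(c)) for c in combinations(word, 5)}

missed = len(arrangements) - len(selections)
print(len(arrangements), len(selections), missed)  # 22050 245 21805
```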

### Assessment:
- Message the Coaches and me together:
  - How is probability used?
  - Why do we care about sets?
  - How is the union of two sets defined?

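```python
from itertools import combinations

# 5-letter selections by position: C(11, 5) = 462
l = list(combinations("pochemuchka", 5))
len(l)
```

```
462
```

```python
# drop tuples that repeat because "c" and "h" each appear twice
len(set(l))
```

```
452
```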

Probability is used to determine the chance that a selected or desired outcome will occur, given the total set of possible outcomes. We care about sets because they let us work with probabilities, permutations, combinations, and other statistical methods in Python using a built-in data type. The union of two sets is the set of all values that appear in either set (or in both), with no repeats.