# Probability and combinatorics

> "... probability tells us how often something is likely to occur when an experiment is repeated..." (Sarah Boslaugh, Statistics in a Nutshell)

Probability theory describes what properties our sample should have, given the properties of the underlying population. It is a purely theoretical discipline, telling us how likely an event is to happen and does not require data. 

There are two related terms:
- **Probability** is the proportion of times an event would occur in infinitely many repetitions; it quantifies predictions of events yet to happen (the future);
- **Likelihood** measures how well a hypothesis (e.g. a parameter value) explains events that have already been observed (the past);

---

- Relative frequency / empirical probability / experimental probability - probability estimated from past experience and observation, over a finite number of trials.
  - Relative frequency: $\frac{3}{9}, \frac{6}{9}$
  - Empirical probability: $0.333, 0.667$
- Probability (frequentist definition) - the limit of the relative frequency as the number of trials approaches infinity;

---

Some important terms:
| Term | Definition | Example |
| - | - | - |
| **Trial** (experiment) | A procedure whose outcome is not known in advance. | A roll of a die. |
| **Event (E)** | An outcome, or a set of outcomes, of a trial. | The event that the sum of two dice is 11: $E = \{(5,6), (6,5)\}$ |
| **Sample space** (S, set) | A set of all possible outcomes of a trial. | For a roll of a six-sided die, $S = \{1,2,3,4,5,6 \}$ |

When all outcomes are equally likely, the probability of an event is the number of favourable outcomes divided by the total number of outcomes in the sample space:
$$ P(E) = \frac{n(E)}{n(S)} $$


**Theoretical probability** is what we expect from reasoning about the experiment itself, such as flipping a coin. E.g. the probability of a fair coin landing on Heads is $P(H) = 0.5$; for a 6-sided die, the probability of rolling 3 or more is $P(\ge3) = 4/6 = 2/3$

**Experimental probability** is an estimate we make based on previous experience. E.g. if we played 16 games in the past, and we make a histogram with number of points and count for each bin, we can later make a prediction for the 17th game based on past data, for example, the probability of obtaining a score that is more than a certain number.

In probability, **The Law of Large Numbers** states that experimental probability gets closer to the theoretical probability with a large number of experiments. 
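
A minimal simulation sketch of this law, assuming a fair coin and Python's standard `random` module. The function name `running_heads_frequency` and the seed are illustrative choices, not part of the original notes. It tracks the relative frequency of heads as the number of flips grows:

```python
import random

def running_heads_frequency(n_flips, seed=0):
    """Return the relative frequency of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    heads = 0
    frequencies = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1
        frequencies.append(heads / i)
    return frequencies

frequencies = running_heads_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The experimental probability should drift towards the theoretical 0.5
    print(n, round(frequencies[n - 1], 4))
```

With few flips the frequency can be far from 0.5; with many flips it should settle close to it.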

Types of probabilities:
- **Marginal probability**: probability of a single event $P(X)$
- **Joint probabilities**: probability of two or more events happening at the same time: $P(A \text{ AND } B)$

Example problems:

---

There is a promotion that states that each box of cereal in their line has 1 of 6 toys to collect. Estimate how many boxes, on average, it would take to get all 6 prizes. 
- A) Randomly generate digits 1-6 and calculate how many boxes it would take to collect all unique toys; 
- B) Repeat step A many times. 
- C) Mean of the distribution of boxes that it takes to collect the toys is an approximation that we need;  
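
Steps A to C can be sketched as a small simulation. This assumes every toy is equally likely in each box; the function names, the number of runs and the seed are hypothetical choices. The closed-form value used for comparison is the standard coupon-collector result, which is not stated in the notes above:

```python
import random

def boxes_needed(n_toys, rng):
    """Step A: open boxes until every toy has been seen at least once."""
    seen = set()
    boxes = 0
    while len(seen) < n_toys:
        seen.add(rng.randint(1, n_toys))  # each toy assumed equally likely
        boxes += 1
    return boxes

def estimate_mean_boxes(n_runs=20_000, n_toys=6, seed=1):
    """Steps B and C: repeat the experiment many times and average."""
    rng = random.Random(seed)
    return sum(boxes_needed(n_toys, rng) for _ in range(n_runs)) / n_runs

estimate = estimate_mean_boxes()
exact = sum(6 / k for k in range(1, 7))  # coupon-collector formula, 14.7 for 6 toys
print(round(estimate, 2), round(exact, 2))
```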

# Sets

Datasets:
- $A = \{1,5,7,18,19\}$
- $B = \{1,7,18\}$
- $C = \{19,20\}$
- Count of items in a set: $|A| = 5$ (*written like an absolute value*)
- Intersection ("and"): $A \cap B = \{1,7,18\}$
- Union ("or"): $A \cup B = \{1,5,7,18,19\}$
- Difference: $A \setminus B = A - B = \{5,19\}$. *Subtract B from A; relative complement of B in A; not in B but in A*
- $A \setminus A = A - A = \{\} = \varnothing$. *The empty (null) set, with no objects in it*
- Membership: $1 \in A$ (1 is a member of A)
- $2 \notin A$
- Subset - every member of your set also belongs to another set: $B \subseteq A$
- Every set is its own subset: $A \subseteq A$;
- Strict subset - all objects in B belong to A, but not vice versa - so they do not equal: $B \subsetneq A$
- A set is never its own strict subset.
- Superset - the opposite of subset: $A \supseteq B$
- Strict superset - the opposite of strict subset: $A \supsetneq B$
- U - universal set: the set that contains all objects or elements, and of which all other sets are subsets. E.g. $U = A \cup B \cup C$
- Absolute complement: the set of all things in U (the universal set) that aren't in C. $C' = U - C = U \setminus C = \{1,5,7,18\}$
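
The operations above map directly onto Python's built-in `set` type. A short sketch using the same sets $A$, $B$ and $C$, and assuming the universal set is their union:

```python
A = {1, 5, 7, 18, 19}
B = {1, 7, 18}
C = {19, 20}
U = A | B | C  # universal set, assumed here to be the union of A, B and C

print(len(A))          # count of items: 5
print(sorted(A & B))   # intersection: [1, 7, 18]
print(sorted(A | B))   # union: [1, 5, 7, 18, 19]
print(sorted(A - B))   # difference: [5, 19]
print(1 in A, 2 in A)  # membership: True False
print(B <= A, B < A)   # subset and strict subset: True True
print(A <= A, A < A)   # a set is its own subset, but not a strict one: True False
print(sorted(U - C))   # absolute complement of C: [1, 5, 7, 18]
```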

## Venn diagram

There are two different types of diagram that show sets and their intersections - the Venn diagram and the Euler diagram. The difference is that a Venn diagram shows all possible intersections, including impossible ones (which then contain zero data points), while an Euler diagram shows only the intersections that actually occur.

<img src="Media/math/venn-euler.png">

**Venn Diagram** visualises sets and their relationships

<img src="Media/Venn.png">

# Odds

Odds are the number of times something occurred divided by the number of times it didn't occur. 

Odds are meaningful: odds of 2 mean that an event is two times more likely to happen than not to happen.

> Examples:
> 
> Odds that a team will win are 3 to 1 -> probability of winning = 3/(3+1) = 3/4
>
> The probability of obtaining 1 when we roll a die is $P = \cfrac{1}{6}$, but the odds are $\text{Odds} = \cfrac{1}{5}$
>
> If a horse wins 3 out of every 4 races, the probability of that horse winning a race is $P = \cfrac{3}{4}$, but odds are $\text{Odds} = \cfrac{3}{1} = 3$
>
> While probability is a number between 0 and 1, odds are a number between 0 and infinity

Odds and probability can be interchanged:
> $P(X) = 0.7, P(Y) = 0.3$
>
> Odds: $O(X) = \cfrac{7}{3}$
>
> $P(X) = \cfrac{O(X)}{1 + O(X)}$
>
> $O(X) = \cfrac{P(X)}{1 - P(X)}$
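
The two conversion formulas can be sketched as a pair of helper functions. The function names and the input checks are illustrative assumptions:

```python
def odds_from_probability(p):
    """O = P / (1 - P); undefined for P = 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)

def probability_from_odds(odds):
    """P = O / (1 + O)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

print(odds_from_probability(0.75))  # horse example: 3.0
print(probability_from_odds(3))     # back to the probability: 0.75
```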



# Rules of probability

$\cup \text{ = OR} \\ 
\cap \text{ = AND}$

**Rule of Complementary Probabilities**: for an event $E$, the probability of a complement of the event ($E^{c}$) is 1 minus the probability of $E$:
$$ P(E^{c}) = 1 - P(E) $$

**Addition (sum) rule for probability**: the probability of event A OR event B occurring equals the sum of the probabilities of each event minus the probability of both events occurring at the same time (for mutually exclusive events this last term is zero):

$$ \text{ Sum rule of probability (general form): } P(A \cup B) = P(A \text{ OR } B) = P(A) + P(B) - P(A \cap B) $$

| Exclusivity of events | Explanation | Example | Venn diagram | AND | OR (the addition rule) |
| - | - | - | - | - | - |
| Not Mutually Exclusive (nonmutually exclusive) | A and B are **not mutually exclusive** if they intersect, i.e. both can occur at the same time. | On a d6 die, the probability of rolling more than 1 or rolling odd ($P(>1 \text{ OR odd})$). <br> Probability of getting heads (on a coin flip) OR a 6 (on a d6 die). | <img src="Media/not-mutually-exclusive-events.png"> | An intersection does exist. | $$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$ |
| Mutually Exclusive | A and B are **mutually exclusive** if the two events cannot occur at the same time. E.g. on a d6 die, $P(\text{even OR odd}) = P(\text{even}) + P(\text{odd}) = 0.5 + 0.5 = 1$ | Probability of getting a 4 or a 6 on a die roll. | <img src="Media/mutually-exclusive-events.png"> | No intersection: $A \cap B = \varnothing$, so $P(A \cap B) = 0$ | The Sum rule for disjoint probabilities: $$ P(A \cup B) = P(A) + P(B) $$ |

**Multiplication / product rule**: the probability of event 1 AND event 2 occurring is the probability of event 1 multiplied by the probability of event 2 given that event 1 has occurred:
$$ \text{ Product rule (general form): } P(A\cap B) = P(A \text{ AND } B) = P(A) * P(B | A) $$

Examples of not mutually exclusive events:
- Out of a deck of playing cards, pulling out an ace OR a spade: $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{4}{13}$

Examples of mutually exclusive events:
- Out of a deck of playing cards, pulling out an ace OR a king: $P(E_1 \cup E_2) = P(E_1) + P(E_2) = \frac{1}{13} + \frac{1}{13} = \frac{2}{13}$

| Independence of events | Explanation | Example | The Multiplication / Product Rule | 
| - | - | - | - |
| Independent events | Events are independent if P of one event is not affected by the occurrence of the other event. IOW, if two events A and B are independent (occurrence of one doesn't influence in any way the occurrence of the other one), then the probability of both happening (the intersection of events) is the product of the probabilities of each of the events. | E.g. flipping a coin and getting H/T every trial. P of getting T on the first trial doesn't affect the subsequent probabilities. | In independent events, $P(B\|A)=P(B)$, therefore, *Product rule of independent events*: $$P(A\cap B) = P(A)*P(B)$$ | 
| Dependent (conditional) events | Events are dependent when the occurrence of one changes the probability of the other. | E.g. consecutively drawing marbles of a particular colour from a bag of differently coloured marbles, without putting them back. | *Product rule of dependent events*: $$P(A\cap B) = P(A) * P(B\| A)$$ |

Examples of independent events:
- Events of "fifth coin flip lands on heads" and "sixth coin flip lands on tails": $P(E_{1} \cap E_{2}) = P(E_1) P(E_2)$
- $P(\text{A is alive in 20 years}) = 0.7, P(\text{B is alive in 20 years}) = 0.5; \text{Probability that both are alive in 20 years: } P(A \cap B) = 0.7*0.5 = 0.35 $

<img src="Media/independent-events.png" width=900>


## Examples

```text
Example 1
What is the probability of getting three sixes on three consecutive rolls of a six-sided die?

P = (1/6)^3 = 1/216
```

What is the probability of rolling 6 on a d6 die OR flipping heads on a coin? 

$P(heads \text{ OR } 6) = 1/2 + 1/6 - (1/2 * 1/6) = 0.58333$
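
The same answer can be checked by listing the whole sample space. This is a sketch that assumes a fair coin and a fair die, so that all 12 (coin, die) outcomes are equally likely; the helper name `probability` is hypothetical:

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes
sample_space = list(product(["H", "T"], range(1, 7)))

def probability(event):
    """Favourable outcomes divided by all outcomes in the sample space."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_heads = probability(lambda o: o[0] == "H")
p_six = probability(lambda o: o[1] == 6)
p_both = probability(lambda o: o[0] == "H" and o[1] == 6)
p_either = probability(lambda o: o[0] == "H" or o[1] == 6)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p_either == p_heads + p_six - p_both)  # True
print(p_either)                              # 7/12
```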

# Conditional Probability

*Conditional probability* - the probability of an event A given that another event B has occurred:

$$ p(A|B) = \frac{p(A\cap B)}{p(B)} $$

In independent events, $ p(A|B) = \frac{p(A\cap B)}{p(B)} = \frac{p(A)p(B)}{p(B)} = p(A) $

**The Bayes theorem** reverses the direction of the dependencies (this formula is useful for flipping probabilities):
$$ p(A|B) = \frac{p(B|A) p(A)}{p(B)} = \frac{p(B|A) p(A)}{ p(B|A)p(A) + p(B|A_{c})p(A_{c}) } $$

Bayes' theorem allows us to revise the predicted probability of an event based on new information; that is, it computes the conditional probability of the event given that information.

In the formula above, 
- $p(A)$ - the *prior probability* of $A$ (probability of an event before introduction of the new data);
- $p(B|A)$ - the conditional probability of the new information $B$ given $A$;
- $p(B)$ - the overall probability of the new information $B$;
- $p(A|B)$ - the *posterior probability* of $A$ given $B$ (the updated probability based on the new information).


*Rule of complementary probabilities* - for an event E, its complement ($E_{c}$) has the probability of 1 minus p of E: $p(E_{c}) = 1 - p(E)$


<img src="Media/Bayes.png" width="400px">

$ p(B\cap A) = p(B|A) * p(A) = 0.3*0.2 = 0.06 $

$ p(B|A) = 0.3 $


$ p(A|B) = \frac{p(B|A)p(A)}{p(B)} = \frac{p(B|A) p(A)}{p(B|A) p(A) + p(B|A_{c}) p(A_{c})} = \frac{0.3*0.2}{0.3*0.2 + 0.05*0.8} = 0.6$
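
A minimal sketch of this calculation as a function, using the expanded denominator from Bayes' theorem. The function and parameter names are illustrative assumptions:

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """p(A|B) from p(A), p(B|A) and p(B|A_c)."""
    # Total probability of B over the two cases: A and its complement
    evidence = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / evidence

# Values from the worked example: p(A) = 0.2, p(B|A) = 0.3, p(B|A_c) = 0.05
print(round(posterior(0.2, 0.3, 0.05), 4))  # 0.6
```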

---

<u>Example problems for conditional probability</u>:

There are 8 coins in a bag, 3 of which are unfair (60% chance of H) and the rest are fair. If you randomly choose one coin from the bag and flip it 2 times, what is the probability of getting 2H?
- Upon drawing a coin, we have $p=5/8$ of drawing a fair coin and $p=3/8$ of drawing an unfair coin. 
- $P(Fair \cap HH) = P(Fair) * P(HH | Fair) = 5/8 * 0.5^{2} = 5/8 * 0.25 = 0.15625$
- $P(Unfair \cap HH) = P(Unfair) * P(HH | Unfair) = 3/8 * 0.6^{2} = 3/8 * 0.36 = 0.135$
- $P(HH) = 0.15625 + 0.135 = 0.29125 = 29.125$%

<u>Example</u>

A disease test is 99% accurate. Your friend has tested positive with this test. If this disease affects 1 out of 10,000 people on average, what is the probability that your friend has the disease, given that he tested positive?
- IOW, the question is "We have two possibilities - friend is sick or healthy. If the test is positive, what is the probability that the friend is sick as opposed to healthy?"
- Let's say that $n = 1,000,000$ is our sample;
- Disease prevalence is $\cfrac{1}{10,000} = 0.0001$ 
- From the prevalence of the disease, for $n$ people $0.0001 * n = 100$ have the disease and $0.9999 * n = 999,900$ don't;
- If our test is 99% accurate for both classes, then
  - for the diseased people $100 * 0.99 = 99$ test positive (correctly) and $100 * 0.01 = 1$ tests negative (incorrectly)
  - for the healthy people $999,900 * 0.99 = 989,901$ test negative (correctly) and $999,900 * 0.01 = 9,999$ test positive (incorrectly)
  - $P(sick | positive) = \cfrac{99}{99 + 9,999} = 0.0098$, which is 0.98%
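
The counting argument above can be reproduced with exact fractions. This sketch assumes, as the example does, that the 99% accuracy applies equally to sick and healthy people; the variable names are illustrative:

```python
from fractions import Fraction

n = 1_000_000
prevalence = Fraction(1, 10_000)
accuracy = Fraction(99, 100)  # assumed identical for sick and healthy people

sick = n * prevalence                       # 100 people
healthy = n - sick                          # 999,900 people
true_positives = sick * accuracy            # 99 sick people test positive
false_positives = healthy * (1 - accuracy)  # 9,999 healthy people test positive

# Share of sick people among everyone who tested positive
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(p_sick_given_positive, round(float(p_sick_given_positive), 4))
```

Even with a 99% accurate test, most positives come from the much larger healthy group, which is why the posterior stays below 1%.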

<u>Example</u>

There are 100 emails, 20 of which are spam and 80 - ham. The number of emails containing the word 'sale' is 6 for spam emails and 4 for ham. 

```text
                'sale' (6/20)
                /
      spam (0.2) 
    /          \
  /             no word 'sale' (14/20)
/
\
  \               'sale' (4/80)
    \           /
      ham (0.8)
                \
                  no word 'sale' (76/80)
```

- $P('sale'|spam) = \cfrac{P('sale' \cap spam)}{P(spam)} = \cfrac{0.2 * 6/20}{0.2} = 6/20$
- $P('sale'|ham) = \cfrac{P('sale' \cap ham)}{P(ham)} = \cfrac{0.8 * 4/80}{0.8} = 4/80$
- $P(spam|'sale') = \cfrac{P(spam \cap 'sale')}{P('sale')} = \cfrac{0.2*6/20}{0.2*6/20 + 0.8*4/80} = 0.6$

# Expected value

In essence, the expected value is the average outcome per trial after a very large number of trials.

Expected value of a discrete random variable.

If we define $X$ as a discrete random variable that can take on values $X_1, X_2, ..., X_K$ with the respective probabilities $p_1, p_2, ..., p_K$, where $p_1 + p_2 + ... + p_K = 1$, the expected value of $X$ is denoted as $E(X)$ and is calculated as follows:
$$E(X) = p_1 X_1 + p_2 X_2 + ... + p_K X_K = \sum_{i=1}^{K} p_i X_i$$

---

$X$ - number of workouts in a week. Below is the probability distribution of this variable:

| $X$ | $P(X)$ |
| - | - |
| 0 | 0.1 |
| 1 | 0.15 |
| 2 | 0.4 |
| 3 | 0.25 |
| 4 | 0.1 |

In this case (discrete random variable), it's also equal to the weighted mean (weighted sum):

$E(X) = 0*0.1 + 1*0.15 + 2*0.4 + 3*0.25 + 4*0.1 = 2.1 $

---

Another example - betting:

| | Win | Lose |
| - | - | - |
| X (net gain) | \$35 | -\$1 |
| P(X) | 1/38 | 37/38 |

Expected value of a player's net gain on a \$1 bet on a single slot: $E(X) = 35 * \frac{1}{38} + (-1) * \frac{37}{38} = -0.053$ dollars.

We could also interpret it as follows: over many bets, the average return would be about -\$0.053 per bet.
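
Both tables above (weekly workouts and the \$1 bet) can be run through the same weighted sum. A small sketch, where the function name and the dictionary representation of a distribution are assumptions made for illustration:

```python
def expected_value(distribution):
    """E(X) = sum of p * x over a {value: probability} mapping."""
    total = sum(distribution.values())
    if abs(total - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())

workouts = {0: 0.1, 1: 0.15, 2: 0.4, 3: 0.25, 4: 0.1}
bet = {35: 1 / 38, -1: 37 / 38}  # net gain in dollars on a $1 bet

print(round(expected_value(workouts), 2))  # 2.1
print(round(expected_value(bet), 3))       # -0.053
```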

---

Example: lottery ticket expected payoff.

There is a lottery with 10,000 possible selections. The lottery pays \$4500 on a \$1 bet if all 4 digits of a selection match the lottery result. Calculate the expected value $E(X)$, where $X$ is a player's net gain on a \$1 straight bet.
- If he wins (probability 1/10000), his net gain is \$4499;
- If he loses (probability 9999/10000), his net gain is -\$1;
- $E(X) = 4499 * \frac{1}{10000} + (-1) * \frac{9999}{10000} = -0.55$
- So if we play 10000 times, we pay \$10000 and expect to win \$4500 - a net gain of -\$5500

---

Example: insurance expected payoff.

An electronics store gives customers the option of purchasing a protection plan when they buy a new refrigerator. The customer pays \$125 for the plan, and if their refrigerator is damaged or stops working, the store will replace it at no additional charge. The store knows that 3% of customers who buy this plan end up needing a replacement that costs the store \$1500 each. Calculate the store's expected net gain $E(X)$ from one of these plans.
- Replacement, probability = 0.03, net gain = -\$1375;
- No replacement, probability = 0.97, net gain = \$125;
- $E(X) = 0.03*(-1375) + 0.97*(125) = 80$ dollars

# Pascal's triangle

Row $N$ of Pascal's triangle (counting rows from 0) lists how many distinct sequences of a binary event repeated $N$ times contain exactly $k$ occurrences of one outcome, for $k = 0, 1, ..., N$; these counts are the binomial coefficients ${N \choose k}$.

<img src="Media/math/pascal-triangle.png">
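
A short sketch that builds the triangle by adding neighbouring entries of the previous row. The function name `pascal_rows` is a hypothetical choice:

```python
from math import comb

def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle (row 0 is [1])."""
    rows = []
    row = [1]
    for _ in range(n_rows):
        rows.append(row)
        # Each inner entry is the sum of the two entries above it
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return rows

for row in pascal_rows(6):
    print(row)

# Entry k of row n should equal "n choose k"
print(pascal_rows(6)[4] == [comb(4, k) for k in range(5)])  # True
```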

# Combinatorics

Permutations: order matters.

Combinations: order doesn't matter. 

| Type | Repetitions | Formula | Explanation | Example | 
| - | - | - | - | - |
| Permutations | With repetitions | $$n^{r}$$, where $n$ is the number of things to choose from, and we choose $r$ of them | Each of the $r$ positions can be filled independently in $n$ ways. | How many 4-digit codes are there if each position can hold a digit from 0-9? ($=10^{4}$) |
| Permutations | No repetitions | $$ _{n} P _{r} = \frac{n!}{(n-r)!}$$ | Consider the following problem: in how many ways can 6 students sit in 3 chairs, if order matters? Any of the 6 students can take the first chair, any of the remaining 5 the second chair, and any of the remaining 4 the third chair. Therefore, there are $P=6*5*4=120$ permutations available. | In how many ways can 5 people sit in 3 chairs? (=60); In how many ways can 5 people sit on 5 chairs? (=5!); How many ordered sequences of 4 people can be chosen from a team of 6 people? (=360); How many unique ways are there to arrange 5 runners in 5 lanes? (=5!=120); In how many different ways could 23 children sit on 23 chairs in a math class? (=23!) |
| Combinations | With repetitions | $$\frac{(r + n - 1)!}{r! (n-1)!} \Leftrightarrow {r + n - 1 \choose r}$$, where n is the number of things to choose from, and we choose r of them, repetition allowed, order doesn't matter | ... | ... |
| Combinations | No repetitions | $$  _{n} C _{r} = \frac{n!}{r!(n-r)!} \Leftrightarrow {n \choose r} $$ | <u>(n choose r)</u> The same example as for permutations without repetitions, but now we do NOT care about the order. We take the total number of permutations (120) and divide it by the number of ways each chosen subgroup of 3 can be arranged ($3! = 6$), as we only care about one of all possible ways to arrange the subgroup: $C=120/6=20$ | How many different sets of 3 color bottles can be chosen from an available set of 8 paint color bottles? (=56); How many different 4-flower bouquets can be combined from 7 available flowers? (=35) |
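
The four formulas in the table can be evaluated with Python's `math.perm` and `math.comb` (available from Python 3.8). A sketch using the 6-students, 3-chairs numbers; the variable names are illustrative:

```python
from math import comb, perm

n, r = 6, 3  # choose r items out of n

permutations_with_repetition = n ** r              # 216
permutations_no_repetition = perm(n, r)            # 6 * 5 * 4 = 120
combinations_with_repetition = comb(n + r - 1, r)  # "8 choose 3" = 56
combinations_no_repetition = comb(n, r)            # 120 / 3! = 20

print(permutations_with_repetition, permutations_no_repetition,
      combinations_with_repetition, combinations_no_repetition)
```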

---

*Some examples*:

In how many ways can you arrange 4 reindeer in a line?
- All 4 out of 4 are placed and order matters - permutations, no repetitions, so $P = \frac{4!}{(4-4)!} = 4! = 24$

You need to put your reindeer, Lancer, Gloopin, Rudy, and Bloopin, in a single-file line to pull your sleigh. However, Gloopin and Rudy are best friends, so you have to put them next to each other, or they won't fly.
- We can count the number of arrangements where Gloopin and Rudy are together by treating them as one "double-reindeer". Using the same idea as before, this gives $P = 3*2*1 = 6$ different arrangements, times 2 to account for the two ways to order the pair inside the "double-reindeer": $P=6*2=12$

You need to put your reindeer, Quentin, Jebediah, Lancer, and Gloopin, in a single-file line to pull your sleigh. However, Gloopin and Lancer are fighting, so you have to keep them apart, or they won't fly.
- Total number of permutations, no repetition: $P_{total} = \frac{4!}{(4-4)!} = 4! = 24$
- Total number of permutations where the two reindeer are together - we calculate this by counting them as one "double-reindeer": $P_{double} = 3*2*1*2 = 12$
- Therefore, final answer = $24-12 = 12$
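
For such small line-ups the counts can be verified by brute force. A sketch that enumerates every ordering of the four reindeer from the last problem; the helper name `adjacent` is hypothetical:

```python
from itertools import permutations

reindeer = ["Quentin", "Jebediah", "Lancer", "Gloopin"]

def adjacent(line, a, b):
    """True if a and b stand next to each other in the line."""
    return abs(line.index(a) - line.index(b)) == 1

lines = list(permutations(reindeer))
together = [line for line in lines if adjacent(line, "Lancer", "Gloopin")]
apart = [line for line in lines if not adjacent(line, "Lancer", "Gloopin")]

print(len(lines), len(together), len(apart))  # 24 12 12
```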

---

How many numbers between 1-100 (inclusive) are divisible by 3 or 10?
- Numbers divisible by 3: $\lfloor \frac{100}{3} \rfloor = 33$;
- Numbers divisible by 10: $\frac{100}{10} = 10$;
- Numbers divisible by both 3 and 10 are the multiples of $30$: $\lfloor \frac{100}{30} \rfloor = 3$
- These numbers were counted twice in the total (33+10), so we subtract them once:
- $Answer = 33+10-3 = 40$

How many numbers between 1-100 (inclusive) are divisible by 3 or 2?
- Divisible by 3: $\frac{100}{3} \approx 33$
- Divisible by 2: $\frac{100}{2} = 50$
- Divisible by both, i.e. by 6: $\lfloor \frac{100}{6} \rfloor = 16$
- Answer = $33+50-16 = 67$

How many numbers between 1 and 100 (inclusive) are divisible by 5 or 8?
- Divisible by 5: $100/5 = 20$
- Divisible by 8: $100/8 \approx 12$
- Divisible by both, i.e. by the least common multiple $5*8=40$: $\lfloor 100/40 \rfloor = 2$
- Answer = $20 + 12 - 2 = 30$

How many numbers between 1-100 (inclusive) are divisible by 10 or 7?
- Answer = $10+14 - 1 = 23$
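
The counting pattern used in these four problems (add both counts, subtract the overlap) can be sketched as one function. The function name is an illustrative assumption, and the overlap uses the least common multiple of the two divisors:

```python
from math import gcd

def count_divisible_by_either(a, b, upper=100):
    """Count the numbers in 1..upper divisible by a or by b."""
    lcm = a * b // gcd(a, b)  # numbers divisible by both are multiples of the lcm
    return upper // a + upper // b - upper // lcm

print(count_divisible_by_either(3, 10))  # 40
print(count_divisible_by_either(3, 2))   # 67
print(count_divisible_by_either(5, 8))   # 30
print(count_divisible_by_either(10, 7))  # 23
```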

---

How many unique ways are there to arrange the letters in the word PRETTY?
- Total number of permutations without repetition: $P = \frac{6!}{(6-6)!} = 6! = 720$
- However, these 720 sequences contain repeats, as two letters in our word are the same (T);
- Therefore, we divide our answer by $2!$
- Answer = $\frac{6!}{2!} = \frac{720}{2} = 360$

How many unique ways are there to arrange the letters in the word ERROR?
- Answer = $5!/3! = 120/6 = 20$
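
A sketch that generalises the two answers above: divide the factorial of the word length by the factorial of each letter's count. The function name is hypothetical, and `math.prod` requires Python 3.8 or later:

```python
from collections import Counter
from math import factorial, prod

def distinct_arrangements(word):
    """Number of unique orderings of the letters in word."""
    counts = Counter(word)  # how many times each letter occurs
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(distinct_arrangements("PRETTY"))  # 360
print(distinct_arrangements("ERROR"))   # 20
```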


# Probability with permutations and combinations

We have 8 coin flips. What's the probability of having exactly 3 coins land as Heads?
- $P(\text{3 heads in 8 flips}) = ?$
- Total number of equally likely outcomes = $2^8 = 256$
- Combinations 8 choose 3 = $\frac{8!}{3! (8-3)!} = 56$
- $P(\text{3 heads in 8 flips}) = \frac{56}{256} = 0.219$

Probability of making exactly 3 out of 5 freethrows? $p(FT) = 80\%$
- Total number of combinations in which we make 3 out of 5 freethrows: $C = \frac{5!}{3! *2!} = 10$
- Probability of one sequence with 3 out of 5 freethrows: $p = 0.8^{3} * 0.2^{2}$
- Probability of all possible sequences with 3/5 freethrows: $p(3/5) = 10 * 0.8^3 * 0.2^2 = 20.48\%$

Probability of making at least 3 out of 5 freethrows (3/5 or more)? $p(FT) = 80\%$
- 5 choose 3 = 10;
- 5 choose 4 = $\frac{5!}{4!*1!} = 5$
- 5 choose 5 = 1;
- $p(\ge 3/5) = 10*0.8^{3}*0.2^{2} + 5*0.8^{4}*0.2^{1} + 1*0.8^{5} = 0.942 $
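
All three answers follow the same pattern: the number of sequences with $k$ successes times the probability of one such sequence. A minimal sketch, assuming independent trials with a constant success probability; the function name `binomial_pmf` is illustrative:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 3 heads in 8 fair coin flips
print(binomial_pmf(3, 8, 0.5))            # 0.21875
# Exactly 3 of 5 freethrows at an 80% success rate
print(round(binomial_pmf(3, 5, 0.8), 4))  # 0.2048
# At least 3 of 5 freethrows: add the cases k = 3, 4, 5
print(round(sum(binomial_pmf(k, 5, 0.8) for k in range(3, 6)), 3))  # 0.942
```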

---

Each card in a standard deck of 52 playing cards is unique and belongs to 1 of 4 suits: 13 cards are clubs, 13 are diamonds, 13 are hearts, and 13 are spades. Suppose that Luisa randomly draws 4 cards without replacement. What is the probability that Luisa gets 2 diamonds and 2 hearts (in any order)?
- Any order - combinations; 
- Combinations (2 diamonds) = $_{13}C_{2}$
- Combinations (2 hearts) = $_{13}C_{2}$
- Total number of 4-card combinations = $_{52}C_{4}$
- Probability: $p = \frac{(_{13}C_{2}) (_{13}C_{2})}{_{52}C_{4}}$

Declan's friend Luka claims that he can read minds. To test Luka's abilities, Declan draws 5 cards without replacement from a standard deck of 52 playing cards. Declan then asks Luka to identify in any order which 5 cards he drew without looking. Assume that Luka has no special abilities and is randomly guessing the cards.
What is the probability that Luka correctly identifies all 5 cards in any order?
- There is only 1 correct set available to make from the 52 cards;
- Any order, so combinations; 
- Total number of combinations of cards = $_{52}C_{5}$
- Therefore, $p = \frac{1}{_{52}C_{5}}$
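
The two card answers above are left as expressions; they can be evaluated exactly with `math.comb` and `fractions.Fraction`. A sketch with illustrative variable names:

```python
from fractions import Fraction
from math import comb

# 2 diamonds and 2 hearts in a 4-card draw without replacement
p_two_diamonds_two_hearts = Fraction(comb(13, 2) * comb(13, 2), comb(52, 4))

# Guessing one specific 5-card set out of all possible 5-card sets
p_guess_all_five = Fraction(1, comb(52, 5))

print(p_two_diamonds_two_hearts, float(p_two_diamonds_two_hearts))
print(p_guess_all_five, float(p_guess_all_five))
```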

---

A club of 9 people wants to choose a board of three officers: President, VP, and Secretary. Assuming the officers are chosen at random, what is the probability that the people chosen for the roles are Marsha for President, Sabita for VP, and Robert for Secretary?
- $p = 1/9 * 1/8 * 1/7 = 1/504$

Nia is 1 of 24 students in a class. Every month, the teacher randomly selects 4 students from their class to act as president, vp, secretary, and treasurer. Students cannot hold different positions at once. What's the probability that Nia is chosen as president in a given month?
- Order matters - permutations; 
- Total number of 4-student arrangements: $_{24} P _{4}$
- All the arrangements that include Nia as the president, which is equivalent to how many arrangements of 3 students are possible from Nia's 23 classmates: $_{23} P _{3}$ 
- Answer = $\frac{_{23} P_{3}}{_{24} P_{4}} = \frac{1}{24}$

---

What is the probability of guessing a 4-digit passcode consisting of non-repeating digits (0-9)?
- Answer = $\frac{1}{10*9*8*7} = \frac{1}{_{10} P _{4}}$


# Random variable

A random variable is a variable whose value is unknown in advance; formally, it is a function that assigns a value to each of the experiment's outcomes.

Random variables differ from ordinary variables in that the former can take on many values, each with some probability, while the latter has a single, clearly defined value.

E.g. a random variable X representing an outcome of a coin flip:

$$ X = \begin{cases} 1,&Heads \\ 0,&Tails \end{cases} $$

Random variables' varieties:
- Discrete: can take on a distinct set of values; e.g. outcomes of a coin flip; 
- Continuous: can take on any value in an interval; e.g. the mass of a random animal; 

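Example: let's define a discrete random variable $X$ - the number of heads after 3 flips of a fair coin. We can tabulate or plot a probability distribution for this random variable.

The table below shows such a distribution for a different discrete random variable (values 0 to 4; the same distribution as the weekly workouts in the expected value section):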

| $X$ | $P(X)$ |
| - | - |
| 0 | 0.1 |
| 1 | 0.15 |
| 2 | 0.4 |
| 3 | 0.25 |
| 4 | 0.1 |

Expected value: $E(X) = 0*0.1 + 1*0.15 + 2*0.4 + 3*0.25 + 4*0.1 = 2.1$

Variance of the random variable X: $Var(X) = 0.1*(0-2.1)^{2} + 0.15*(1-2.1)^{2} + 0.4*(2-2.1)^{2} + 0.25*(3-2.1)^{2} + 0.1*(4-2.1)^{2} = 1.19 $

Standard deviation of X: $ \sigma(X) = \sqrt{Var(X)} = 1.09 $
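
The mean, variance and standard deviation above can be recomputed from the table in a few lines. A sketch in which the dictionary representation of the distribution is an assumption made for illustration:

```python
from math import sqrt

distribution = {0: 0.1, 1: 0.15, 2: 0.4, 3: 0.25, 4: 0.1}  # {value: probability}

mean = sum(x * p for x, p in distribution.items())
# Variance: probability-weighted squared distance from the mean
variance = sum(p * (x - mean) ** 2 for x, p in distribution.items())
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 2.1 1.19 1.09
```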

Transforming a random variable: let's say we have a normally-distributed random variable $X$, and $Y$ is also a random variable produced from $X$:
- Shift: $Y = X + k$
  - $\mu_{Y} = \mu_{X} + k$
  - $\sigma_{Y} = \sigma_{X}$
- Scale: $Y = X*k$
  - $\mu_{Y} = k * \mu_{X}$
  - $\sigma_{Y} = |k| * \sigma_{X}$


## Bernoulli random variable

$Y = \begin{cases} 1, & \text{pick a yellow ball} \\ 0, & \text{pick a ball that is not yellow} \end{cases}$ 

Expected value or mean: $E(Y) = \mu_{Y} = p$, where $p$ is the probability of picking a yellow ball (here $p = 0.6$)

Standard deviation: $\sigma_{Y} = \sqrt{p(1-p)}$

## Binomial random variable

*How many successes in a SPECIFIED number of trials?* A sequence of Bernoulli random variables, in a sense. 

E.g. if <u>Bernoulli random variable</u> $Y = \begin{cases} 1, & \text{pick a yellow ball} \\ 0, & \text{pick a ball that is not yellow} \end{cases}$ , then a <u>binomial random variable</u> is $X$ - sum of 10 independent trials of $Y$.

Conditions for a binomial variable:
- Made up of independent trials; 
- Each trial can be classified as either success or a failure; 
- Fixed number of trials; 
- Probability of success on each trial is constant;
- When sampling without replacement, the sample constitutes less than 10% of the population, so that the trials can be treated as independent;

> Examples of binomial random variables:
> - $X$ - number of heads after 10 flips of a coin.
> - $L$ - the number of house tours that result in a sale in a sample of 30 tours. 
> - $T$ - the number of plants that live; we transplant plants from a bigger garden, and each plant has a 60% chance to survive upon transplantation; 

Expected value or mean ($n$ = number of trials, $p$ - probability of success):

$$ E(X) = \mu_{X} = np $$

Standard deviation:

$$ \sigma_{X} = \sqrt{ n p (1-p) } $$

> Example:
> - If $X$ - number of successes after $n=10$ trials, where for each trial $P(success) = 0.3$; then, $E(X) = 10 * 0.3 = 3$
> - If $X$ - number of correctly guessed questions (randomly) of 20 questions with 5 options (1 correct option), then $E(X) = 20*0.2 = 4$ and $\sigma_{X} = \sqrt{ 20 * 0.2 * 0.8} = 1.8 $
> - If $X$ - the number of blue candies that a random customer gets in a purchase ($P(blue) = 0.3$, $n=15$ candies), then $E(X) = 15*0.3 = 4.5$ and $\sigma_{X} = \sqrt{ 15*0.3*0.7 } = 1.775 $



## Geometric random variable

*How many trials until success?*

Conditions:
- Binary trial outcome - success or failure;
- Trial results independent;
- Same probability on each trial; 
- The conditions above are the same as for the binomial variable, except that the <u>number of trials is not fixed</u>

> Examples:
>
> Jeremiah makes 25% of his three-point shots. $M$ - the number of shots it takes him to successfully make his first three-point shot.
>
> Probability that Jeremiah's first successful shot occurs on his 3rd attempt: $P(M=3) = 0.75 * 0.75 * 0.25 = 0.14$

> Example - cumulative geometric probability (greater than)
>
> 10% of the shoes sold at a store are defective. $C$ - the number of shoes sold until a defective pair is sold. Find the probability that it takes more than 4 pairs to sell a defective pair. 
>
> $ P(C>4)$ = P(first 4 pairs sold are not defective) = $0.9^{4} = 0.66$

> Example - cumulative geometric probability (less than)
>
> 10% of the shoes sold at a store are defective. $C$ - the number of shoes sold until a defective pair is sold. Find the probability that it takes fewer than 5 pairs to get the first defective pair.
>
> $ P(C<5) = P(C=1) + P(C=2) + P(C=3) + P(C=4) = 0.1 + 0.9*0.1 + 0.9^{2}*0.1 + 0.9^{3}*0.1 = 0.34 $
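
The three geometric examples above can be sketched with two small functions, assuming independent trials with a constant success probability $p$. The function names are hypothetical:

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k): k - 1 failures, then one success."""
    return (1 - p) ** (k - 1) * p

def geometric_tail(k, p):
    """P(first success takes more than k trials): the first k trials all fail."""
    return (1 - p) ** k

print(geometric_pmf(3, 0.25))                # P(M = 3) = 0.140625
print(round(geometric_tail(4, 0.1), 4))      # P(C > 4) = 0.6561
print(round(1 - geometric_tail(4, 0.1), 4))  # P(C < 5) = 0.3439
```

The "fewer than 5" case is computed as the complement of "more than 4", which gives the same value as summing $P(C=1)$ to $P(C=4)$.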

# Stochastic / random processes / models

A stochastic or random process is a mathematical object usually defined as a sequence of random variables, where the index of the sequence has the interpretation of time.


## Markov chains

**Markov Property**: the future behavior of a process depends only on its current state, regardless of how it reached that state. This distinguishes Markov processes from processes that may depend on the entire history of past events.

**Property 2**: the sum of the probabilities of all outgoing arrows from any state equals 1.

We can simulate $n \rightarrow \infty$ steps of a random walk, which leads to a probability distribution over the states (how often each state was observed) called the Stationary Distribution (the equilibrium state), which no longer changes with time.

<img src="Media/math/markov-chains.png">
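
A minimal sketch of how a stationary distribution emerges. Instead of simulating a single random walk, it repeatedly pushes a probability distribution through the transition matrix, which is a deterministic shortcut to the same limit. The two-state matrix `P` below is a made-up example, not the chain from the figure:

```python
# Hypothetical transition matrix: P[i][j] = probability of moving from state i to state j.
# Each row sums to 1 (Property 2).
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

def step(dist, P):
    """One step of the chain: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary_distribution(P, n_steps=1000):
    """Start in state 0 and apply the transition matrix n_steps times."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

pi = stationary_distribution(P)
print([round(x, 4) for x in pi])
# Applying one more step should leave the distribution essentially unchanged
print([round(x, 4) for x in step(pi, P)])
```

For this particular matrix the balance equation $0.1 \pi_0 = 0.5 \pi_1$ gives $\pi = (5/6, 1/6)$, which the iteration should approach.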

## Monte Carlo simulations