# Probability Review


Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty. 

##  Random Variables

A random variable is a variable that can take on different values randomly. We typically denote the random variable itself with a lowercase letter in plain typeface, and the values it can take on with lowercase script letters. For example, $x_1$ and $x_2$ are both possible values that the random variable $\mathrm{x}$ can take on. For vector-valued variables, we would write the random variable as $\mathbf{x}$ and one of its values as $\boldsymbol{x}$. On its own, a random variable is just a description of the states that are possible; it must be coupled with a probability distribution that specifies how likely each of these states is.

Random variables may be discrete or continuous. A discrete random variable is one that has a finite or countably infinite number of states. Note that these states are not necessarily the integers; they can also just be named states that are not considered to have any numerical value. A continuous random variable is associated with a real value.

## Probability Distributions

A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states. The way we describe probability distributions depends on whether the variables are discrete or continuous.

![image.png](attachment:image.png)
 <h5><center>Discrete vs Continuous</center></h5>                                            

### Discrete Variables and Probability Mass Functions

A probability distribution over discrete variables can be described using a probability mass function (PMF). We typically denote probability mass functions with a capital P. Usually, we associate each random variable with a different probability mass function, and the reader must determine which PMF to use based on the identity of the random variable rather than the name of the function; P(x) is generally not the same as P(y).
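As a minimal sketch of the idea, a PMF over a finite set of states can be stored as a mapping from state to probability. The two distributions below (`die_pmf`, `weather_pmf`) and the helper `is_valid_pmf` are hypothetical names chosen for illustration; the validity check uses the standard requirements that every probability lies in [0, 1] and that the probabilities sum to 1.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, stored as state -> probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# States need not be numeric; named states work just as well (illustrative values).
weather_pmf = {
    "sunny": Fraction(1, 2),
    "cloudy": Fraction(3, 10),
    "rainy": Fraction(1, 5),
}


def is_valid_pmf(pmf):
    """Return True if every probability is in [0, 1] and the total is exactly 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    return in_range and sum(pmf.values()) == 1


print(is_valid_pmf(die_pmf))      # True
print(is_valid_pmf(weather_pmf))  # True
```

Exact fractions are used here only so that the sum-to-one check is not affected by floating-point rounding.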

### Continuous Variables and Probability Density Functions

When working with continuous random variables, we describe probability distributions using a probability density function (PDF) rather than a probability mass function. To be a probability density function, a function p must satisfy the following properties:

1. The domain of $p$ must be the set of all possible states of $\mathrm{x}$.
2. $\forall x \in \mathrm{x},\ p(x) \geq 0$. Note that we do not require $p(x)\leq1$.
3. $\int p(x)\,dx = 1$, where the integral runs over the entire domain of $p$.
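The following sketch illustrates the last two properties with a uniform density on the interval [0, 0.5], an example chosen here as an assumption. The density takes the value 2 on that interval, which exceeds 1, yet it still integrates to 1. The helper `integrate` is a simple midpoint Riemann sum written for this illustration, not a library routine.

```python
def uniform_pdf(x, a=0.0, b=0.5):
    """Density of a uniform distribution on [a, b]; zero outside the interval."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def integrate(f, lo, hi, steps=100_000):
    """Approximate the integral of f over [lo, hi] with a midpoint Riemann sum."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# A density value may exceed 1 as long as the total area is 1.
print(uniform_pdf(0.25))  # 2.0

total = integrate(uniform_pdf, 0.0, 0.5)
print(abs(total - 1.0) < 1e-6)  # True
```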

## Marginal and Conditional Probability

### Marginal Probability

The probability distribution of one random variable, regardless of the outcomes of the other random variables it is jointly distributed with, is called the *marginal probability* or the *marginal probability distribution*. It is obtained by summing (or, for continuous variables, integrating) the joint distribution over all outcomes of the other variables.

It is called the marginal probability because if all outcomes and probabilities for the two variables were laid out together in a table (X as columns, Y as rows), then the marginal probability of each value of X would be the sum of its column over all Y rows, written in the margin of the table.

 <h2><center>$P(X=A) = \sum_{i} P(X=A, Y=y_i)$</center></h2>
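The summation can be sketched in a few lines, assuming a small hypothetical joint table keyed by `(x, y)` pairs. The values in `joint` and the helper name `marginal_x` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P(X, Y), keyed by (x, y).
joint = {
    ("A", "y1"): Fraction(1, 10), ("A", "y2"): Fraction(3, 10),
    ("B", "y1"): Fraction(2, 10), ("B", "y2"): Fraction(4, 10),
}


def marginal_x(joint):
    """Sum the joint probabilities over every value of Y for each value of X."""
    totals = defaultdict(Fraction)  # Fraction() is 0, so sums start at zero
    for (x, _y), p in joint.items():
        totals[x] += p
    return dict(totals)


p_x = marginal_x(joint)
print(p_x["A"])  # 2/5
print(p_x["B"])  # 3/5
```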

### Conditional Probability

The probability of one event given the occurrence of another event is called the *conditional probability*. The conditional probability of one random variable given one or more other random variables is referred to as the conditional probability distribution.

 <h2><center>$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$</center></h2>
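As a rough illustration, a conditional distribution can be computed from a joint table by dividing each joint probability by the marginal probability of the conditioning event, which must be nonzero. The joint table and the function name `conditional_x_given_y` below are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y), keyed by (x, y).
joint = {
    ("A", "y1"): Fraction(1, 10), ("A", "y2"): Fraction(3, 10),
    ("B", "y1"): Fraction(2, 10), ("B", "y2"): Fraction(4, 10),
}


def conditional_x_given_y(joint, y):
    """Return P(X | Y=y) as a dict, dividing joint entries by the marginal P(Y=y)."""
    p_y = sum(p for (_x, y_val), p in joint.items() if y_val == y)
    if p_y == 0:
        raise ValueError("cannot condition on an event with probability zero")
    return {x: p / p_y for (x, y_val), p in joint.items() if y_val == y}


given_y1 = conditional_x_given_y(joint, "y1")
print(given_y1["A"])  # 1/3
print(given_y1["B"])  # 2/3
```

The resulting values sum to 1, as any probability distribution over X must.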

## Independence

If one variable is not dependent on a second variable, this is called *independence* or *statistical independence*. For example, we may be interested in the joint probability of independent events A and B, which is fully determined by the individual probabilities of A and B.

Probabilities of independent events combine by multiplication, so the joint probability is calculated as the probability of event A multiplied by the probability of event B:

<h2><center>$P(A \text{ and } B) = P(A)\,P(B)$</center></h2>
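One way to check the product rule on a finite joint table is sketched below. Both example tables (`two_coins`, `same_coin`) and the helper `is_independent` are assumptions made for illustration: the first models two separate fair coin flips, the second models reading the same coin twice.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """Return True if P(x, y) == P(x) * P(y) for every pair of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginals, treating missing pairs as probability zero.
    p_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x, y in product(xs, ys))


# Two separate fair coin flips: every pair has probability 1/4.
two_coins = {(a, b): Fraction(1, 4) for a, b in product("HT", repeat=2)}

# The same coin read twice: the two readings always agree.
same_coin = {("H", "H"): Fraction(1, 2), ("T", "T"): Fraction(1, 2)}

print(is_independent(two_coins))  # True
print(is_independent(same_coin))  # False
```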

## Expectation, Variance, Covariance

### Expectation

The expectation, or expected value, of some function f(x) with respect to a probability distribution P(x) is the average, or mean value, that f takes on when x is drawn from P. For discrete variables this can be computed with a summation:

<h1><center>$\mathrm{E}_{x\sim P}[f(x)]=\sum_{x}P(x)f(x)$</center></h1>

while for continuous variables, it is computed with an integral:

<h1><center>$\mathrm{E}_{x\sim p}[f(x)]=\int p(x)f(x)\,dx$</center></h1>
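The discrete case can be sketched as a probability-weighted sum. The fair die used here and the helper name `expectation` are hypothetical choices for illustration; `f` defaults to the identity, so the call without `f` gives the ordinary mean of x.

```python
from fractions import Fraction

# Hypothetical PMF: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expectation(pmf, f=lambda x: x):
    """Expected value of f(x) under a discrete PMF: the sum of P(x) * f(x)."""
    return sum(p * f(x) for x, p in pmf.items())


mean = expectation(die_pmf)
mean_square = expectation(die_pmf, lambda x: x ** 2)
print(mean)         # 7/2
print(mean_square)  # 91/6
```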


### Variance

The variance gives a measure of how much the values of a function of a random variable x vary as we sample different values of x from its probability distribution:

<h1><center>$\mathrm{Var}[f(x)]= \mathrm{E}\left[(f(x)-\mathrm{E}[f(x)])^2\right]$</center></h1>

When the variance is low, the values of f(x) cluster near their expected value. The square root of the variance is known as the *standard deviation*.
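A short sketch of both quantities, again assuming a hypothetical fair die and taking f to be the identity by default. The helper names `expectation` and `variance` are illustrative.

```python
import math
from fractions import Fraction

# Hypothetical PMF: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expectation(pmf, f=lambda x: x):
    """Expected value of f(x) under a discrete PMF."""
    return sum(p * f(x) for x, p in pmf.items())


def variance(pmf, f=lambda x: x):
    """Expected squared deviation of f(x) from its own expected value."""
    mean = expectation(pmf, f)
    return expectation(pmf, lambda x: (f(x) - mean) ** 2)


var = variance(die_pmf)
std = math.sqrt(float(var))  # standard deviation is the square root of the variance
print(var)  # 35/12
```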

### Covariance

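In probability, covariance measures how strongly two random variables are linearly related to each other. It describes how the two variables change together: it is positive when they tend to lie on the same side of their respective means, and negative when they tend to lie on opposite sides.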

It is written $\mathrm{Cov}[x, y]$, where x and y are the two random variables being considered.

<h1><center>$\mathrm{Cov}[x,y]=\mathrm{E}\left[(x-\mathrm{E}[x])(y-\mathrm{E}[y])\right]$</center></h1>
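The definition can be applied directly to a joint table over numeric values. The joint distribution below is a made-up example in which x and y tend to take the same value, so the covariance comes out positive; `covariance` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary variables that tend to agree.
joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}


def covariance(joint):
    """E[(x - E[x]) * (y - E[y])] computed from a joint PMF keyed by (x, y)."""
    mean_x = sum(p * x for (x, _y), p in joint.items())
    mean_y = sum(p * y for (_x, y), p in joint.items())
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())


cov = covariance(joint)
print(cov)  # 3/20
```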

## Bayes' Rule 

In probability theory and statistics, Bayes's theorem (alternatively Bayes's law or Bayes's rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes's theorem allows the risk to an individual of a known age to be assessed more accurately than simply assuming that the individual is typical of the population as a whole.

We often find ourselves in a situation where we know P(y | x) and need to know P(x | y). Fortunately, if we also know P(x), we can compute the desired quantity using Bayes' rule:

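<h1><center>$P(x \mid y)=\frac{P(y \mid x)P(x)}{P(y)}$</center></h1>

Although $P(y)$ appears in the formula, it can usually be computed as $P(y)=\sum_{x}P(y \mid x)P(x)$, so we do not need to know it in advance.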
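A minimal worked sketch of the rule follows, using invented numbers for a diagnostic test: a prior of 1% for the condition, a 95% chance of a positive result when the condition is present, and a 5% chance of a positive result when it is absent. These figures and the function name `posterior` are assumptions for illustration, not data from any real test.

```python
from fractions import Fraction

# Hypothetical prior P(x) over the hidden state.
prior = {"sick": Fraction(1, 100), "healthy": Fraction(99, 100)}

# Hypothetical likelihood P(y = positive | x) for each hidden state.
likelihood_positive = {"sick": Fraction(19, 20), "healthy": Fraction(1, 20)}


def posterior(prior, likelihood):
    """Apply Bayes' rule: P(x | y) = P(y | x) * P(x) / P(y) for every state x."""
    # P(y) is obtained by summing P(y | x) * P(x) over all states x.
    evidence = sum(likelihood[x] * prior[x] for x in prior)
    return {x: likelihood[x] * prior[x] / evidence for x in prior}


post = posterior(prior, likelihood_positive)
print(post["sick"])         # 19/118
print(float(post["sick"]))  # roughly 0.16
```

Even with a fairly accurate test, the posterior stays low here because the prior probability of the condition is small.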
