# **Concentration Inequalities** #

### What are **Concentration Inequalities?** ###
* They are inequalities that bound how far a random variable $X$ is likely to be from its expectation $E[X]$

### **Why are they useful?** ###
* Expectations are often easy to compute, so if $X$ is close to $E[X]$ with high probability, we have a lot of information about $X$
* They allow us to quantify how tightly a random variable concentrates around its mean, or expected value


#### **How are concentration Inequalities different from Variance?** ####
* Variance is a measure of the *spread* or *dispersion* of a random variable around its mean, providing information about how much individual observations vary from the expected value 

* Concentration Inequalities provide bounds on the probability a random variable deviates from its mean by a fixed amount
    * They essentially **quantify** a probabilistic guarantee on how tightly a random variable concentrates around its mean






### **Markov's Inequality** ###

Consider the scenario: 
* We've defined a non-negative random variable $X$ (so $X \geq 0$ always)
* The expectation of $X$ is $E[X]$

What can we say about $P[X \geq c]$ for a constant $c > 0$? In other words, what is an upper bound on the probability that $X$ is at least $c$?

By **Markov's Inequality**, we know that the probability of observing $X \geq c$ is **at most**:

$$P[X \geq c] \leq \frac{1}{c} \cdot E[X]$$


**NOTE:** this only holds for *non-negative* random variables

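The intuition for this is that $E[X]$ must be **at least as large as** the contribution to the expectation from the outcomes where $X \geq c$, and that contribution is at least $c \cdot P[X \geq c]$ (since $X$ is never negative, the remaining outcomes cannot pull the expectation below this). So $E[X] \geq c \cdot P[X \geq c]$, and dividing both sides by $c$ gives us Markov's Inequality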
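The bound can be checked empirically. The following is a minimal sketch, not part of the original notes: the exponential distribution with mean 2, the sample size, and the helper name `markov_check` are all illustrative assumptions.

```python
import random

def markov_check(c, mean=2.0, n_samples=100_000, seed=0):
    """Compare the empirical tail P[X >= c] with Markov's bound E[X] / c.

    X is drawn from an exponential distribution (always non-negative),
    chosen purely for illustration.
    """
    rng = random.Random(seed)
    samples = [rng.expovariate(1.0 / mean) for _ in range(n_samples)]
    empirical_mean = sum(samples) / n_samples
    empirical_tail = sum(x >= c for x in samples) / n_samples
    markov_bound = empirical_mean / c
    return empirical_tail, markov_bound

for c in (1.0, 4.0, 10.0):
    tail, bound = markov_check(c)
    print(f"c={c:>4}: P[X >= c] ~ {tail:.4f}  <=  E[X]/c ~ {bound:.4f}")
```

For small $c$ the bound can exceed 1 and says nothing useful; it only becomes informative once $c$ is larger than $E[X]$.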

<br>


### **Chebyshev's Inequality** ###

**Chebyshev's Inequality** takes into account the *variance* of our random variable, thus allowing us to get tighter bounds than Markov's Inequality

* Applies to *any* random variable with finite mean and variance (not just non-negative ones)


#### Some terms in **Chebyshev's Inequality** ####
* $X$ is a random variable 
* $E[X]$ is the finite mean of $X$
* $\sigma ^2$ is the finite variance of $X$
* $k$ is a positive constant representing the number of standard deviations from the mean
* $|X - E[X]|$ represents the absolute deviation of $X$ from its mean

The probability that $X$ deviates from its mean by *at least* $k$ standard deviations is bounded by: 

$$P[|X - E[X]| \geq k\sigma] \leq \frac{1}{k^2}$$

In other words, we can **quantify** the probability that $X$ lands at least $k$ standard deviations away from its mean (to the left or to the right)

<br>

Let's look at a form of Chebyshev's that is stated in terms of a fixed distance instead of standard deviations: 

For *any* positive constant $c$, the probability that $X$ deviates from its mean by at least $c$ units is bounded by 

$$P[|X - E[X]| \geq c] \leq \frac{\text{Var}(X)}{c^2}$$

* This form is useful when the deviation is measured in the same units as $X$ rather than in standard deviations; setting $c = k\sigma$ recovers the $\frac{1}{k^2}$ form above
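As a rough empirical check of this form, here is a minimal sketch. The standard normal distribution, the sample size, and the helper name `chebyshev_check` are assumptions made for illustration only.

```python
import random

def chebyshev_check(c, n_samples=100_000, seed=0):
    """Compare the empirical P[|X - E[X]| >= c] with Var(X) / c^2.

    X is drawn from a standard normal distribution (takes both signs),
    chosen purely for illustration.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    variance = sum((x - mean) ** 2 for x in samples) / n_samples
    empirical_tail = sum(abs(x - mean) >= c for x in samples) / n_samples
    chebyshev_bound = variance / c ** 2
    return empirical_tail, chebyshev_bound

for c in (1.0, 2.0, 3.0):
    tail, bound = chebyshev_check(c)
    print(f"c={c}: P[|X - E[X]| >= c] ~ {tail:.4f}  <=  Var(X)/c^2 ~ {bound:.4f}")
```

The observed tail is typically much smaller than the bound, because Chebyshev's Inequality uses only the mean and variance and must hold for every distribution that shares them.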

##### **NOTES** ##### 
* This can be applied to random variables that take both positive and negative values
* Gives a two-sided bound (above and below $E[X]$) 
<br>

### **Statistical Estimation** ### 

#### **Key Question** ####
* How large does $N$ have to be in order to ensure accuracy $\pm 1\%$ and confidence $z\%$?


#### **Definitions** ####

We define $\delta$ to be $1 - z$ where $z$ is our confidence level 
* So if we have $95\%$ confidence, our $\delta$ would be $1 - 0.95 = 0.05$

We define our **accuracy** as $\pm \epsilon \mu$
* $\epsilon$ represents the margin of error as a *fraction* of the mean (so $\epsilon = 0.01$ for $\pm 1\%$)



<br>

Let $X_1$, $X_2$, ..., $X_N$ be independent and identically distributed random variables with expectation $E[X_i] = \mu$ and variance $\text{Var}(X_i) = \sigma^2$

Our estimate of $\mu$ is $\hat{\mu} = \frac{1}{N} (X_1 + ... + X_N)$
* You can think of this as a *sample mean*
* $\text{Var}(\hat{\mu}) = \frac{\sigma^2}{N}$
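The shrinking variance of the sample mean can be seen in a small simulation. This is a sketch under assumed settings: draws from Uniform[0, 1) (which has variance $\frac{1}{12}$), 20,000 repetitions, and a hypothetical helper `variance_of_sample_mean`.

```python
import random

def variance_of_sample_mean(n, trials=20_000, seed=0):
    """Empirical variance of the mean of n Uniform[0, 1) draws, over many trials."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / trials

sigma_squared = 1.0 / 12.0  # variance of a single Uniform[0, 1) draw
for n in (1, 4, 25):
    simulated = variance_of_sample_mean(n)
    print(f"N={n:>2}: simulated {simulated:.5f}, sigma^2/N = {sigma_squared / n:.5f}")
```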

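If we want an accuracy of $\pm \epsilon\mu$ and confidence $1 - \delta$, we apply Chebyshev's Inequality to $\hat{\mu}$ and require the resulting bound to be at most $\delta$: 

$$ P[|\hat{\mu} - \mu | \geq \epsilon \cdot \mu] \leq \frac{\text{Var}(\hat{\mu})}{\epsilon^2 \mu^2} = \frac{\sigma^2}{N \epsilon^2 \mu^2} \leq \delta $$

Solving the last inequality for $N$, to ensure confidence of $1 - \delta$ we need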

$$N \geq \frac{\sigma ^2}{\mu ^2} \cdot \frac{1}{\epsilon ^2 \delta}$$
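Plugging numbers into this bound is mechanical, so a small helper can do it. The function name `required_sample_size` and the example values below are hypothetical, chosen only to illustrate the formula.

```python
import math

def required_sample_size(mu, sigma, epsilon, delta):
    """Smallest integer N with N >= (sigma^2 / mu^2) * 1 / (epsilon^2 * delta)."""
    if mu == 0:
        raise ValueError("relative accuracy is undefined when mu is 0")
    return math.ceil((sigma ** 2 / mu ** 2) / (epsilon ** 2 * delta))

# Made-up numbers: mean 10, standard deviation 5, accuracy +/-1%, confidence 95%.
n = required_sample_size(mu=10.0, sigma=5.0, epsilon=0.01, delta=0.05)
print(n)
```

With these made-up numbers the bound asks for about 50,000 samples. Note how the requirement scales: halving $\epsilon$ quadruples $N$, while halving $\delta$ only doubles it.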

### **Law of Large Numbers** ###

Theorem: Let $X_1$, $X_2$, ..., $X_N$ be independent and identically distributed random variables with expectation $E[X_i] = \mu$. Then 

$$\text{Sample Mean} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

satisfies 

$$P[ |\text{Sample Mean} - \mu| \geq \epsilon] \rightarrow 0 \text{ as } N \rightarrow \infty$$

for any $\epsilon > 0$

<br>

#### **In English:** #### 
* We can achieve any desired accuracy $\epsilon > 0$ with any desired confidence $1 - \delta < 1$ by taking the sample size $N$ large enough
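The theorem can be illustrated with a short simulation. This is only a sketch: the Uniform[0, 1) distribution (so $\mu = 0.5$), the choice $\epsilon = 0.1$, the number of trials, and the helper name `deviation_frequency` are all assumptions.

```python
import random

def deviation_frequency(n, epsilon=0.1, trials=1_000, seed=0):
    """Estimate P[|sample mean - mu| >= epsilon] for n Uniform[0, 1) draws."""
    rng = random.Random(seed)
    mu = 0.5  # true mean of Uniform[0, 1)
    misses = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - mu) >= epsilon:
            misses += 1
    return misses / trials

for n in (10, 100, 1000):
    freq = deviation_frequency(n)
    print(f"N={n:>4}: P[|sample mean - mu| >= 0.1] ~ {freq:.4f}")
```

The estimated probability of missing by at least $\epsilon$ should drop toward 0 as $N$ grows, which is exactly what the theorem promises.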