# Probability and Statistics



## Combinatorics


* Permutation: the number of ordered arrangements (lists) that can be formed from $n$ distinct elements: $n!$
  * Variation: a special case of permutation where only $r$ of the $n$ elements are picked (order still matters): $\frac{n!}{(n-r)!}$ (this is the permutation formula divided by $(n-r)!$, i.e. we discard the orderings of the elements that are not picked)

* Combination: the number of ways to pick $r$ unordered elements (a set) from $n$ elements: $\frac{n!}{r!(n-r)!}$ (this is the variation formula divided by $r!$, i.e. we discard the orderings of the picked elements because their order is ignored)
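
A minimal sketch of the three formulas using only the Python standard library. The function names are illustrative; `math.perm` and `math.comb` (Python 3.8+) serve only as a cross-check.

```python
import math

def permutations(n: int) -> int:
    """Ordered arrangements of n distinct elements: n!"""
    return math.factorial(n)

def variations(n: int, r: int) -> int:
    """Ordered selections of r out of n elements: n! / (n-r)!"""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n: int, r: int) -> int:
    """Unordered selections of r out of n elements: n! / (r! (n-r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

n, r = 5, 2
print(permutations(n), variations(n, r), combinations(n, r))  # 120 20 10

# Cross-check against the standard library helpers.
assert variations(n, r) == math.perm(n, r)
assert combinations(n, r) == math.comb(n, r)
```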



For more information, see the summary table of these formulas [here](http://users.telenet.be/vdmoortel/dirk/Maths/PermVarComb.html).







## Probability

* Probability of A or B
$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$

* Probability of A and B if they are independent.
$$
P(A \cap B) =  P(A) * P(B)
$$

* Probability of A and B if they are dependent:
$$
P(A \cap B) =  P(A) * P(B|A)
$$

* Probability of B, knowing that A is true:
$$
P(B|A) = \frac{P(A \cap B)}{P(A)}
$$
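
The rules above can be checked by brute force on a small sample space. This sketch uses a fair six-sided die as a hypothetical example, with $A$ = "even" and $B$ = "greater than 3". It uses exact fractions so the equalities hold without rounding.

```python
from fractions import Fraction

# Sample space of one fair die (illustrative example, not from the text).
omega = {1, 2, 3, 4, 5, 6}
A = {x for x in omega if x % 2 == 0}  # even: {2, 4, 6}
B = {x for x in omega if x > 3}       # greater than 3: {4, 5, 6}

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

# Union rule: P(A or B) = P(A) + P(B) - P(A and B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# A and B are dependent here: P(A and B) = 1/3, but P(A) * P(B) = 1/4.
# Conditional probability: P(B|A) = P(A and B) / P(A)
p_b_given_a = P(A & B) / P(A)
# Product rule for dependent events: P(A and B) = P(A) * P(B|A)
assert P(A & B) == P(A) * p_b_given_a

print(P(A | B), p_b_given_a)  # 2/3 2/3
```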


## Descriptive statistics
### Univariate analysis

#### Measures of Central Tendency
Measures that summarize the center (the typical value) of a distribution.
* Mean: Computed as $\frac{\sum_{i=1}^{N} x_i}{N}$
* Median: the middle value after sorting the data. This measure tends to be more robust than the mean when there are extreme values.
* Mode: The most frequent value.
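
A quick illustration with Python's `statistics` module on a small hypothetical dataset containing one extreme value. The mean is pulled toward the outlier, while the median and mode are not.

```python
import statistics

data = [1, 2, 2, 3, 100]  # hypothetical data with one extreme value

mean = statistics.mean(data)      # (1 + 2 + 2 + 3 + 100) / 5
median = statistics.median(data)  # middle value after sorting
mode = statistics.mode(data)      # most frequent value
print(mean, median, mode)  # 21.6 2 2
```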

#### Measures of Dispersion
Measures of how far the data points are from their mean:

* Variance: Average of the squared differences from the mean. Computed as $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
* Standard Deviation: Computed as the square root of the variance.
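
A direct translation of the two definitions, dividing by $n$ as in the variance formula above. The data are hypothetical. `statistics.pvariance` computes the same population variance and is used here as a cross-check.

```python
import math
import statistics

def variance(xs):
    """Average squared deviation from the mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def std_dev(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variance(data), std_dev(data))  # 4.0 2.0
assert math.isclose(variance(data), statistics.pvariance(data))
```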

#### Shape
* Asymmetry (skewness): measures how skewed the distribution is. Positive (right-skewed) distributions have a longer right tail. In a skewed distribution, the mean (and, usually to a lesser extent, the median) lies closer to the tail than the mode.
* Kurtosis: measures how heavy the tails of the distribution are, i.e. how prone it is to producing extreme values.
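
One common way to compute both shape measures uses central moments $m_k=\frac{1}{n}\sum (x_i-\bar{x})^k$: skewness $m_3/m_2^{3/2}$ and excess kurtosis $m_4/m_2^2-3$. Other definitions exist (for example, sample-size-adjusted versions), so treat this as one sketch among several. The data are hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

def skewness(xs):
    """Moment-based skewness m3 / m2^(3/2); positive means a longer right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis m4 / m2^2, minus 3 so that a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]  # hypothetical data with a long right tail
print(skewness(right_skewed) > 0)        # True
print(skewness([1, 2, 3, 4, 5]))         # 0.0 (symmetric data)
print(excess_kurtosis([1, 2, 3, 4, 5]))  # about -1.3 (lighter tails than a normal)
```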

### Bivariate analysis
Studies the relationship between two variables. Note that a relationship (e.g. a correlation) does not by itself establish causation.

Covariance measures how two variables vary together. It can be computed as:

$$
cov(X,Y)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})*(y_i-\bar{y})
$$
If $cov(X,Y)=0$, there is no *linear* relationship between the variables (they may still be related non-linearly). If it is positive, the relationship is direct (both increase together); if it is negative, the relationship is inverse.

Correlation (Pearson's $r$) is a normalized version of the covariance, in the range $[-1,1]$, computed as:
$$
r=\frac{cov(X,Y)}{SD(X)SD(Y)}
$$
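
A sketch of both formulas, using the same $1/n$ normalization for the covariance and the standard deviations. The data are hypothetical. The last line shows a case where the covariance is zero even though $y=x^2$ clearly depends on $x$, because the relationship is not linear.

```python
import math

def cov(xs, ys):
    """Covariance with the 1/n normalization used above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def sd(xs):
    """Standard deviation; cov(X, X) is the variance of X."""
    return math.sqrt(cov(xs, xs))

def corr(xs, ys):
    """Pearson correlation r = cov(X, Y) / (SD(X) SD(Y))."""
    return cov(xs, ys) / (sd(xs) * sd(ys))

x = [1, 2, 3, 4, 5]
print(corr(x, [2 * v + 1 for v in x]))  # approximately 1.0 (perfect direct relationship)
print(corr(x, [-v for v in x]))         # approximately -1.0 (perfect inverse relationship)

u = [-2, -1, 0, 1, 2]
print(cov(u, [v * v for v in u]))       # 0.0, although y = x^2 depends on x
```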





## Probability Distributions
### Discrete probabilities



#### Binomial distribution
A binomial distribution is a discrete probability distribution that counts the number of successes in $n$ trials of an experiment with only two outcomes (success or failure), where each trial succeeds with the same probability $p$. The trials must be independent. For example, for "obtaining two aces when drawing 5 cards", we must put the card back after each draw. The probability of obtaining exactly $x$ successes is:

$$P(x)=\binom{n}{x}p^x(1-p)^{n-x}$$

The binomial distribution takes two parameters, $n$ and $p$, so it is often written as $B(n,p)$.

The mean of a binomial is $\mu=np$ and its variance is $\sigma^2=npq$, where $q=1-p$. When the sample is big enough (a common rule of thumb is $n \geq 30$, or $np \geq 5$ and $nq \geq 5$), the binomial can be approximated by a Normal distribution:

$$B(n,p)\approx N(np,\sqrt{npq})$$
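
A sketch of the binomial probability and its normal approximation using `math.comb` and `statistics.NormalDist`. The card example uses $p=4/52$. The $B(100, 0.3)$ numbers are hypothetical. The $+0.5$ continuity correction in the approximation is a common refinement that is not mentioned in the text above.

```python
import math
from statistics import NormalDist

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly two aces in 5 draws, putting the card back each time: p = 4/52.
p_two_aces = binom_pmf(2, 5, 4 / 52)
print(round(p_two_aces, 4))  # 0.0465

# Normal approximation for a larger n (hypothetical numbers): B(100, 0.3) ~ N(30, sqrt(21)).
n, p = 100, 0.3
q = 1 - p
exact = sum(binom_pmf(x, n, p) for x in range(0, 26))       # P(X <= 25)
approx = NormalDist(n * p, math.sqrt(n * p * q)).cdf(25.5)  # with a 0.5 continuity correction
print(round(exact, 3), round(approx, 3))
```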

#### Distribution of a Sample Proportion
In some cases, we want the distribution of the proportion of successes in a sample of size $n$. The number of successes follows $B(n,p)$. Dividing it by $n$, and using the normal approximation above, gives:

$$N(\frac{np}{n},\frac{\sqrt{npq}}{n}) = N(p,\sqrt{\frac{pq}{n}})$$
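
A minimal sketch of this result with `statistics.NormalDist`. The survey numbers ($p=0.3$, $n=100$) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_distribution(n: int, p: float) -> NormalDist:
    """Normal approximation of the sample proportion: N(p, sqrt(p q / n))."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))

# Hypothetical example: 30% of a population prefers a product, and we survey n = 100 people.
dist = proportion_distribution(100, 0.3)
print(round(dist.stdev, 4))          # 0.0458
print(round(1 - dist.cdf(0.35), 3))  # approximate P(sample proportion > 0.35)
```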


#### Poisson
The Poisson distribution gives the probability of a given number of successes (events) in a fixed region or period of time, based on the average number of successes in that interval. It is useful when we do not know the probability of a single success, only the average rate. It is modeled as:

$$P(x,\lambda)=\frac{\lambda^x e^{-\lambda}}{x!}$$

where $\lambda$ is the average number of successes and $x$ the number of successes in a region/period of time.

>For example: "In the shop there are on average 10 clients per day. What is the probability that exactly 12 clients will come tomorrow?"
The answer is: $P(x=12,\lambda=10)=\frac{10^{12} e^{-10}}{12!}\approx 0.0948$
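
The shop example can be reproduced with a one-line function. The sum over all $x$ is truncated at 100, where the remaining terms are negligible for $\lambda=10$.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with an average of lam successes per interval."""
    return lam**x * math.exp(-lam) / math.factorial(x)

p12 = poisson_pmf(12, 10)
print(round(p12, 4))  # 0.0948

# The probabilities over all x sum to 1 (truncated here at x = 100).
print(round(sum(poisson_pmf(x, 10) for x in range(101)), 6))  # 1.0
```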


#### Multinomial Distribution 

The multinomial distribution is used to compute probabilities when each trial has more than two possible outcomes (for example, the result of a match can be "win", "draw" or "lose").

If an experiment is repeated $n$ independent times, and outcome $i$ has probability $p_i$, the probability of getting $n_1$ times outcome 1, $n_2$ times outcome 2, ..., $n_k$ times outcome $k$ (with $n_1+n_2+\dots+n_k=n$) is:


$$\frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}$$

>For example (taken from [[onlinestatbook]](http://onlinestatbook.com/Online_Statistics_Education.pdf)): "If they played 12 games, what is the probability that Player A would win 7 games (win prob: 0.4), Player B would win 2 games (win prob: 0.35), and the remaining 3 games would be drawn (draw prob: 0.25)?" Answer: $\frac{12!}{7!2!3!}0.4^7 0.35^{2} 0.25^{3}\approx 0.0248$
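
A direct sketch of the multinomial formula that reproduces the example. With two outcomes, it reduces to the binomial formula.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing counts[i] times outcome i in sum(counts) independent trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # every partial quotient is an integer
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

# Player A wins 7, Player B wins 2, and 3 games are drawn, out of 12 games.
print(round(multinomial_pmf([7, 2, 3], [0.4, 0.35, 0.25]), 4))  # 0.0248
```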

#### Hypergeometric distribution
We use the hypergeometric distribution when samples are drawn without replacement:

$$P(x)=\frac{\binom{d}{x}\binom{N-d}{n-x}}{\binom{N}{n}}$$

* $N$: total number of elements
* $n$: number of elements drawn
* $d$: number of elements of "type A"
* $N-d$: number of elements of "type B"


$P(x)$ is the probability of drawing exactly $x$ elements of type A.
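
A sketch of the formula with `math.comb`. The example is the card question from the binomial section, but this time the cards are not put back. The result (about 0.0399) differs from the binomial value (about 0.0465).

```python
import math

def hypergeom_pmf(x: int, N: int, d: int, n: int) -> float:
    """P(exactly x type-A elements) when drawing n of N elements without replacement,
    d of which are of type A."""
    return math.comb(d, x) * math.comb(N - d, n - x) / math.comb(N, n)

# Exactly two aces when drawing 5 cards from a 52-card deck without putting them back.
print(round(hypergeom_pmf(2, N=52, d=4, n=5), 4))  # 0.0399
```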




### Continuous probabilities
#### Normal distribution

The probability density function of the normal distribution $N(\mu,\sigma)$, with mean $\mu$ and standard deviation $\sigma$, is:
$$
f(x)=\frac{1}{\sigma \sqrt{2\pi } } e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} 
$$

In many cases the standardized normal distribution is used (i.e. $N(0,1)$). A variable X is standardized as: $Z=\frac{X-\mu}{\sigma}$
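
A sketch that writes the density formula by hand, checks it against `statistics.NormalDist.pdf`, and standardizes a value. The heights example ($\mu=170$, $\sigma=10$) is hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma), written as in the formula above."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical example: heights ~ N(170, 10). How unusual is 185?
z = standardize(185, 170, 10)
print(z)                              # 1.5
print(round(NormalDist().cdf(z), 4))  # P(Z <= 1.5) = 0.9332
print(math.isclose(normal_pdf(185, 170, 10), NormalDist(170, 10).pdf(185)))  # True
```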




## Estimations

A main use of statistics is to estimate parameters of a population from the statistics of a sample (a parameter is any value that describes the population, such as its mean or variance, while a statistic is a value that describes the sample).

#### Sampling Distribution

A sampling distribution is the distribution of a statistic (e.g. the mean) obtained when we take many independent random samples of the same size from a given population.

Consider a population with mean $\mu$ and standard deviation $\sigma$. If we form simple random samples of size $n$, the sampling distribution of the mean still has a mean of $\mu$, but its standard deviation becomes $\sigma/\sqrt{n}$. [[ThoughtCo]](https://www.thoughtco.com/what-is-a-sampling-distribution-3126417)

The central limit theorem states that the sampling distribution of the mean tends to a normal distribution as the sample size $n$ grows, with mean $\mu$ and standard deviation $\sigma_{\bar{x}}=\sigma/\sqrt{n}$.


* The distribution of the mean of a sample (size $n$) is approximately 
$N(\mu,\frac{\sigma}{\sqrt{n}})$
* The distribution of the sum of a sample (size $n$) is approximately 
$N(n\mu,\sigma \sqrt{n})$
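
A small simulation to check these claims empirically. It assumes a uniform population on $[0,1]$ (so $\mu=0.5$ and $\sigma=\sqrt{1/12}$), which is far from normal. The sample size and number of repetitions are arbitrary choices, and the seed is fixed for reproducibility.

```python
import math
import random
import statistics

random.seed(0)
n, reps = 30, 5000
# Population: uniform on [0, 1], with mu = 0.5 and sigma = sqrt(1/12).
mu, sigma = 0.5, math.sqrt(1 / 12)

means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]
sums = [m * n for m in means]

print(round(statistics.fmean(means), 3), round(mu, 3))                    # both close to 0.5
print(round(statistics.stdev(means), 4), round(sigma / math.sqrt(n), 4))  # both close to 0.0527
print(round(statistics.stdev(sums), 3), round(sigma * math.sqrt(n), 3))   # both close to 1.581
```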

#### Degrees of freedom
An estimate of the variance based on a sample size of 100 is based on more information than an estimate of the variance based on a sample size of 5. The degrees of freedom of an estimate is the number of independent pieces of information on which the estimate is based.[[OnlineStatBook]](http://onlinestatbook.com/2/estimation/df.html)

If we sample two elements (independently!) to compute the mean, this estimate has two degrees of freedom because we used two independent pieces of information. However, when we compute the variance, the pieces of information are no longer completely independent. Computing the variance requires computing the mean first, and the deviations from that mean must sum to zero. Once the mean is known, only one of the two deviations is free, and the other is determined by it.

The degrees of freedom of an estimate equal the number of values minus the number of parameters that must be estimated to compute it. Therefore, in the two-element example, the variance estimate has $2-1=1$ degree of freedom (2 pieces of information minus 1 estimated parameter, the mean).
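
A tiny sketch of the constraint behind the lost degree of freedom, using two hypothetical observations. The deviations from the sample mean always sum to zero, so the last deviation is fixed by the others.

```python
import statistics

sample = [3.0, 7.0]                      # two independent observations (hypothetical)
mean = statistics.fmean(sample)          # 5.0
deviations = [x - mean for x in sample]  # [-2.0, 2.0]
print(sum(deviations))                   # 0.0: the deviations are constrained

# Knowing the mean and the first deviation fixes the last one, so only n - 1 = 1 is free.
last = -sum(deviations[:-1])
print(last == deviations[-1])            # True
print(statistics.variance(sample))       # 8.0, i.e. ((-2)^2 + 2^2) / (2 - 1)
```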


#### Bias
Bias: A statistic is biased if, on average over samples, its value differs from the population parameter it estimates. The sample mean is an unbiased estimate of the population mean because it has no tendency to overestimate or underestimate it. The variance, on the other hand, is estimated without bias as $s^2=\frac{\sum (x-\bar{x})^2 }{N-1}$, where $N-1$ is the degrees of freedom (dividing by $N$ would systematically underestimate it).
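
A simulation sketch of this bias. It assumes a standard normal population (true variance 1), samples of size 5, and a fixed seed, all arbitrary choices. On average, dividing by $n$ gives about $(n-1)/n = 0.8$, while dividing by $n-1$ gives about 1.

```python
import random

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

random.seed(1)
n, reps = 5, 20000
# Population N(0, 1), so the true variance is 1.
biased, unbiased = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ss = sum_sq_dev(xs)
    biased += ss / n          # divides by n
    unbiased += ss / (n - 1)  # divides by the degrees of freedom
print(round(biased / reps, 2))    # close to (n - 1) / n = 0.8
print(round(unbiased / reps, 2))  # close to 1.0
```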






## Confidence interval


A confidence interval is an interval that we expect to contain an unknown value with a given probability $p$. It is an interval $[A,B]$ such that $P(A\leqslant X \leqslant B) > p$.

### Mean estimation
We have a sample of size $n$. We know the sample mean $\bar{x}$ and the population standard deviation $\sigma$, so the sample mean has standard deviation $\frac{\sigma}{\sqrt{n}}$. We don't know the population mean $\mu$, but we can find an interval $[A,B]$ where it is very likely to be: $P(A\leqslant \mu \leqslant B) > 1-\alpha$

We call $\alpha$ the error (significance level), so the confidence level is $1-\alpha$. Also, we know that the standardized variable $Z$ follows $N(0,1)$, so:
$P(-z_{\alpha/2}  \leqslant Z \leqslant z_{\alpha/2}) > 1-\alpha$

The value $z_{\alpha/2}$ can be looked up in standard normal tables. Standardizing the sample mean gives the actual interval:
$$P(-z_{\alpha/2}  \leqslant \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}} \leqslant z_{\alpha/2}) > 1-\alpha$$
Therefore, the confidence interval is:
$$(\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}) $$
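
A minimal sketch of this interval. It uses `NormalDist().inv_cdf` in place of a printed table to obtain $z_{\alpha/2}$. The sample values ($\bar{x}=50$, $\sigma=10$, $n=25$) are hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(xbar: float, sigma: float, n: int, alpha: float = 0.05):
    """(xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)) with z = z_{alpha/2}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical numbers: sample mean 50, known sigma 10, n = 25.
lo, hi = mean_ci_known_sigma(50, 10, 25)
print(round(lo, 2), round(hi, 2))  # 46.08 53.92
```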

### Mean estimation (with unknown $\sigma$ )

If we don't know the value of $\sigma$, we cannot use the previous formulation. Instead, we can use Student's t distribution:
$$P(-t_{\alpha/2;n-1}  \leqslant \frac{\bar{X}-\mu}{\frac{s}{\sqrt{n}}} \leqslant t_{\alpha/2;n-1}) > 1-\alpha$$

The formulation is similar to the previous section, but this time we use $t_{\alpha/2;n-1}$ (Student's t with $n-1$ degrees of freedom) instead of $z_{\alpha/2}$, and the sample standard deviation $s$ instead of $\sigma$:

$$s=\sqrt{\frac{ \sum_{i=1}^{n} (x_i-\bar{x})^2 }{n-1}}$$
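
The Python standard library has no Student's t quantile function, so this sketch takes the critical value as a parameter, read from a t table ($t_{0.025;9}\approx 2.262$ for 95% confidence with 10 observations). The sample values are hypothetical. `statistics.stdev` divides by $n-1$, matching the formula for $s$.

```python
import math
import statistics

def mean_ci_unknown_sigma(sample, t_crit: float):
    """(xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)); t_crit = t_{alpha/2; n-1} from a t table."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.9, 5.1, 5.0]  # hypothetical, n = 10
# t_{0.025; 9} is about 2.262 (95% confidence, 9 degrees of freedom), taken from a t table.
# If SciPy is installed, scipy.stats.t.ppf(0.975, 9) gives the same value.
lo, hi = mean_ci_unknown_sigma(sample, t_crit=2.262)
print(round(lo, 3), round(hi, 3))
```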

### Variance estimation 
In this case, we need to use the $\chi^2$ distribution with $n-1$ degrees of freedom, where $\chi^{2}_{n-1;q}$ denotes its $q$-quantile, and the following formulation:
$$P(\chi^{2}_{n-1;\alpha/2}  \leqslant \frac{(n-1)s^{2}_{n-1}}{\sigma^2} \leqslant \chi^{2}_{n-1;1-\alpha/2}) > 1-\alpha$$
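
Solving the inequality for $\sigma^2$ gives the interval $\left(\frac{(n-1)s^2}{\chi^2_{n-1;1-\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{n-1;\alpha/2}}\right)$, where $\chi^2_{n-1;q}$ is read as the $q$-quantile. As with the t case, the standard library has no $\chi^2$ quantiles, so this sketch takes both table values as parameters ($2.700$ and $19.023$ for 9 degrees of freedom at 95%). The sample values are hypothetical.

```python
import statistics

def variance_ci(sample, chi2_low: float, chi2_high: float):
    """Interval for sigma^2: ((n-1) s^2 / chi2_high, (n-1) s^2 / chi2_low).

    chi2_low  = chi^2_{n-1; alpha/2}    (lower quantile, from a chi-square table)
    chi2_high = chi^2_{n-1; 1-alpha/2}  (upper quantile, from a chi-square table)
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # s^2, dividing by n - 1
    return (n - 1) * s2 / chi2_high, (n - 1) * s2 / chi2_low

sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.9, 5.1, 5.0]  # hypothetical, n = 10
# 95% confidence with 9 degrees of freedom (table values).
# If SciPy is installed, scipy.stats.chi2.ppf(0.025, 9) and chi2.ppf(0.975, 9) give them.
lo, hi = variance_ci(sample, chi2_low=2.700, chi2_high=19.023)
print(round(lo, 4), round(hi, 4))
```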


### Estimation of mean difference
When we have two populations with known variances, we may want to estimate the difference between their means, $\mu_x-\mu_y$, in order to compare them:

$$P(-z_{\alpha/2}  \leqslant \frac{\bar{X}-\bar{Y} - (\mu_x-\mu_y) }{\sqrt{\frac{\sigma^2_x}{n_x}+\frac{\sigma^2_y}{n_y}} } \leqslant z_{\alpha/2}) > 1-\alpha$$
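
Rearranging as in the single-mean case gives the interval $(\bar{x}-\bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma^2_x}{n_x}+\frac{\sigma^2_y}{n_y}}$. This sketch computes it. All the input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def mean_diff_ci(xbar, ybar, var_x, var_y, n_x, n_y, alpha=0.05):
    """Interval for mu_x - mu_y when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(var_x / n_x + var_y / n_y)
    diff = xbar - ybar
    return diff - z * se, diff + z * se

# Hypothetical example: measurements from two production lines.
lo, hi = mean_diff_ci(xbar=52.0, ybar=49.0, var_x=16.0, var_y=25.0, n_x=40, n_y=50)
print(round(lo, 2), round(hi, 2))  # 1.14 4.86
```

Here the interval does not contain 0, which suggests that the two population means differ at this confidence level.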

## Statistical Hypothesis Test


A statistical hypothesis test is a procedure to decide whether a sample supports (or not) what we assume about the population the sample was drawn from.

For a generic hypothesis test, there are two hypotheses: (i) the null hypothesis (no effect); and (ii) the alternative hypothesis (there is an effect). The sample data must provide enough evidence to reject the null hypothesis, i.e. to conclude that an effect exists in the population.


For performing a statistical hypothesis test, we follow the steps:

1. Formulate the null hypothesis ($H_0$, what we assume about the population) and the alternative hypothesis.
2. Build an acceptance interval for the test statistic under $H_0$.
3. If the test statistic computed from the sample falls inside the interval, we accept (do not reject) the null hypothesis; otherwise we reject it.
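
The three steps can be sketched as a two-sided z-test for a mean with known $\sigma$, which is one possible instance of the procedure. The hypotheses and sample values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with known sigma."""
    # Step 2: acceptance interval for the standardized statistic under H0.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    acceptance = (-z_crit, z_crit)
    # Step 3: standardize the sample mean and check whether it falls inside the interval.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    decision = "accept H0" if acceptance[0] <= z <= acceptance[1] else "reject H0"
    return z, decision

# Step 1 (hypothetical): H0: mu = 50 versus H1: mu != 50, with sigma = 10 and n = 25.
print(z_test_mean(xbar=54, mu0=50, sigma=10, n=25))  # (2.0, 'reject H0')
```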


There are two errors:

* Type I ($\alpha$): the null hypothesis is rejected when it is true. The results show an effect (p-value lower than the significance level) even though there is none. This may happen by chance (a lucky draw).
* Type II ($\beta$): the null hypothesis is not rejected when it is false (the alternative is true). The results do not show an effect even though there is one (unlucky draw, small effect, high variability, small sample).
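
A simulation sketch that estimates both error rates for a two-sided z-test with known $\sigma$. It assumes hypothetical values ($\mu_0=50$, $\sigma=10$, $n=25$) and a true mean of 54 for the Type II case. The Type I rate should be close to $\alpha=0.05$. For these numbers, the theoretical $\beta$ is about 0.48.

```python
import math
import random
from statistics import NormalDist

def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

random.seed(42)
mu0, sigma, n, reps = 50, 10, 25, 10000

# H0 is true (mu = 50): every rejection is a Type I error.
type1 = sum(rejects_h0([random.gauss(50, sigma) for _ in range(n)], mu0, sigma)
            for _ in range(reps)) / reps
# H0 is false (true mu = 54): every non-rejection is a Type II error.
type2 = sum(not rejects_h0([random.gauss(54, sigma) for _ in range(n)], mu0, sigma)
            for _ in range(reps)) / reps
print(round(type1, 3))  # close to alpha = 0.05
print(round(type2, 3))  # close to 0.48 for this effect size and sample size
```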


