# 1. Bernoulli distribution

## 1.1 Click-through rates
- Click-through rate (CTR) is a ratio commonly used in web analytics
    + Number of ad clicks / number of visitors
    + Number of signups / number of visits
- Each visit can be modeled by a Bernoulli distribution
    + Binary outcomes: user clicked/did not click, bought/did not buy
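As a rough illustration of the ratio above, the sketch below computes a click-through rate from a list of binary outcomes. The function name `click_through_rate` and the `outcomes` list are hypothetical and made up for this example; they are not real traffic data.

```python
def click_through_rate(clicks: int, visitors: int) -> float:
    """Return the fraction of visitors who clicked."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return clicks / visitors


# Hypothetical log of 10 visits: 1 = clicked, 0 = did not click
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Each entry is one binary (Bernoulli) outcome; the CTR is their average
ctr = click_through_rate(sum(outcomes), len(outcomes))
print(ctr)
```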

## 1.2 Bernoulli distribution
- A probabilistic model of a single trial with two outcomes; repeating it gives `a series of n independent trials` (Bernoulli trials)
- Each trial must produce exactly one of 2 outcomes
    + `Success`: P(Success) = p
    + `Failure`: P(Failure) = 1 - p

#### Properties
- Outcome set for n trials: $|\Omega| = 2^n$
- The Success/Failure events of different trials are `mutually independent`

#### Example
- Toss an unfair coin `n` times
    + $H_i$: Get a head on the i-th toss
        + $P(H_i) = p$
    + $T_i$: Get a tail on the i-th toss
        + $P(T_i) = 1 - p$

#### PMF
- Let X be a random variable that
    + Takes value 1 with probability p
    + Takes value 0 with probability 1-p

$$p(x) = p^x(1-p)^{1-x},\ x \in \{0,1\}$$
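The PMF above can be written directly as a small function. This is a minimal sketch; the name `bernoulli_pmf` is chosen here for illustration only.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Evaluate p(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # assumed success probability, for illustration
print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure
```

The exponent trick simply selects `p` when `x = 1` and `1 - p` when `x = 0`, so the two probabilities always sum to 1.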

# 2. Maximum Likelihood

## 2.1 Maximum Likelihood problem
- Suppose we toss an unfair coin and want to estimate its probability of heads `p`
    + Collect data from `N` tosses as: $\{ x_1, x_2, \dots, x_N \}$, $x_i \in \{ 0,1 \}$
    + Example: Toss the coin 3 times and get results: $\{ 1, 0, 1 \}$

#### Solve
- Model: treat the unknown p as the model parameter $\theta$
    + $p(x) = \theta^x(1-\theta)^{1-x},\ x \in \{0,1\}$

- Define the likelihood function; the product form follows from the independence of the tosses
    
    
$$\begin{split}
L(\theta) &= p(\text{data} \mid \theta) \\
    &= \prod\limits_{i=1}^{N} p(x_i \mid \theta) \\
    &= \prod\limits_{i=1}^{N} \theta ^ {x_i} (1 - \theta)^{1 - x_i}
\end{split}$$

- With $x_1 = 1, x_2 = 0, x_3 = 1$, $L(\theta)$ becomes

$$L(\theta) = \theta^2(1-\theta)$$
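To make the product form concrete, here is a minimal sketch that evaluates the likelihood for arbitrary 0/1 data and compares it with the closed form $\theta^2(1-\theta)$ for the three tosses above. The function name `likelihood` is an illustrative choice.

```python
from math import prod


def likelihood(theta: float, data: list[int]) -> float:
    """L(theta) = product over i of theta^x_i * (1 - theta)^(1 - x_i)."""
    return prod(theta**x * (1 - theta) ** (1 - x) for x in data)


data = [1, 0, 1]  # the three tosses from the example
theta = 0.5       # an arbitrary candidate value, for illustration

print(likelihood(theta, data))    # 0.125
print(theta**2 * (1 - theta))     # closed form gives the same value
```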

- Maximum Likelihood problem: find the value of $\theta$ that maximizes $L(\theta)$
    + Find $\theta$ such that $\frac{dL}{d\theta} = 0$
    + The result is $\hat{\theta}$, where $\hat{\theta} = \text{argmax}_{\theta}L(\theta)$


$$\begin{split}
\frac{dL}{d\theta} &= 0 \\
    \Rightarrow 2 \theta - 3 \theta^2 &= 0 \\
    \Rightarrow \theta (2 - 3\theta) &= 0 \\
    \Rightarrow \theta &= 0\ \text{or}\ \theta = \frac{2}{3}
\end{split}$$

- Choose $\hat{\theta} = \frac{2}{3}$, since $L(0) = 0$ while $L(\hat{\theta}) = \frac{4}{27} > 0$
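The choice of $\hat{\theta}$ can be sanity-checked numerically. The sketch below assumes the example likelihood $L(\theta) = \theta^2(1-\theta)$ and checks three things: the derivative vanishes at $2/3$, the second derivative is negative there, and a coarse grid search over $[0, 1]$ lands next to the same value. The grid size of 1001 points is an arbitrary choice.

```python
def L(theta: float) -> float:
    """Likelihood for the data {1, 0, 1}."""
    return theta**2 * (1 - theta)


def dL(theta: float) -> float:
    """First derivative: 2*theta - 3*theta^2."""
    return 2 * theta - 3 * theta**2


def d2L(theta: float) -> float:
    """Second derivative: 2 - 6*theta."""
    return 2 - 6 * theta


theta_hat = 2 / 3
print(dL(theta_hat))   # approximately 0: a stationary point
print(d2L(theta_hat))  # negative: the stationary point is a maximum

# Brute-force check on a grid of candidate values in [0, 1]
grid = [i / 1000 for i in range(1001)]
theta_grid = max(grid, key=L)
print(theta_grid, L(theta_grid))
```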

## 2.2 Log-likelihood
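- Usually we maximize the log-likelihood instead of the likelihood
    + Take the log of L before differentiating
    + Maximize log(L) by solving $\frac{d \log(L)}{d \theta} = 0$
- Reason:
    + Log is a monotonically increasing function, so log(L) and L are maximized at the same $\theta$
    + The product becomes a sum, which is easier to differentiate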

#### Solve 
- log(L)

$$\begin{split}
\log(L) &= \log \left( \prod\limits_{i=1}^N \theta ^ {x_i} (1 - \theta)^{1 - x_i}  \right) \\
    &= \sum\limits_{i=1}^{N} \left( x_i \log \theta + (1-x_i) \log (1 - \theta)\right)
\end{split}$$

- Solve $\frac{d \log(L)}{d\theta} = 0$

$$\begin{split}
\frac{d \log(L)}{d\theta} &= 0 \\
    \Rightarrow \frac{1}{\theta} \sum\limits_{i=1}^{N} x_i - \frac{1}{1-\theta} \sum\limits_{i=1}^{N} (1 - x_i) &= 0 \\
    \Rightarrow (1-\theta) \sum\limits_{i=1}^{N} x_i &= \theta \left( N - \sum\limits_{i=1}^{N} x_i \right) \\
    \Rightarrow \theta &= \frac{1}{N} \sum\limits_{i=1}^{N} x_i
\end{split}$$
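The closed-form result says the estimate is simply the sample mean of the 0/1 data. A minimal sketch, with illustrative function names `bernoulli_mle` and `log_likelihood`, computes it for the three tosses and compares the log-likelihood there with a few other candidate values. Note that `log_likelihood` is only defined for $0 < \theta < 1$.

```python
import math


def log_likelihood(theta: float, data: list[int]) -> float:
    """Sum of x_i*log(theta) + (1 - x_i)*log(1 - theta); needs 0 < theta < 1."""
    return sum(
        x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in data
    )


def bernoulli_mle(data: list[int]) -> float:
    """Maximum likelihood estimate: the fraction of ones in the data."""
    if not data:
        raise ValueError("data must not be empty")
    return sum(data) / len(data)


data = [1, 0, 1]
theta_hat = bernoulli_mle(data)
print(theta_hat)

# The log-likelihood at theta_hat should be at least as large as elsewhere
for candidate in (0.3, 0.5, 0.6, 0.7, 0.9):
    print(candidate, log_likelihood(candidate, data) <= log_likelihood(theta_hat, data))
```

This agrees with the value $\hat{\theta} = \frac{2}{3}$ obtained from the plain likelihood, as expected since the log does not move the maximizer.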