In [1]:
import numpy as np
import pandas as pd

## Bernoulli

The Bernoulli distribution is a discrete law with a single parameter $p \in [0,1]$.


From now on, we denote a random variable that follows a Bernoulli distribution in either of the following ways:


$ X_i \leadsto Bernoulli(p) $


$X_i \leadsto B(p)$



Its probability mass function is $P_{p}(X=x)= p^{x}(1-p)^{1-x}$


where $x$ takes its values in $ \{0;1\}$


The idea behind this variable is that we run an experiment with exactly two possible outcomes. Every other outcome has probability zero, so after the experiment you observe one result or the other.


Let's deduce some fundamental concepts about this law:

In [3]:
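def Bernoulli(parameter,x):
    """Return P(X = x) for X following a Bernoulli law with the given parameter."""
    if x==1:
        probabilite = parameter      # success
    elif x==0:
        probabilite = 1-parameter    # failure
    else:
        probabilite = 0              # x is outside the support {0, 1}

    return probabilite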

In [7]:
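def Bernoulli_accumulated_probability(parameter,k):
    """Return the cumulative probability P(X <= k) for a Bernoulli law."""
    soma=0                   # k < 0: no probability mass accumulated yet
    if k>=0 :
        soma = 1-parameter   # 0 <= k < 1: only the outcome 0 is included
    if k>= 1:
        soma = 1             # k >= 1: both outcomes are included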
    return soma
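
As a quick sanity check of the probabilities above, they can be compared with simulated frequencies. The sketch below is only illustrative: the value `p = 0.3`, the sample size, the seed and all variable names are assumptions, and the Bernoulli draws are generated as Binomial(1, p) with NumPy's `Generator.binomial`.

```python
import numpy as np

p = 0.3        # illustrative parameter (assumption, not from the text)
n = 100_000    # illustrative sample size (assumption)

rng = np.random.default_rng(0)         # fixed seed so the run is reproducible
sample = rng.binomial(1, p, size=n)    # Bernoulli(p) draws, i.e. Binomial(1, p)

freq_one = sample.mean()               # empirical estimate of P(X = 1)
freq_le_zero = np.mean(sample <= 0)    # empirical estimate of P(X <= 0)

print(f"P(X = 1):  theory {p:.3f}, empirical {freq_one:.3f}")
print(f"P(X <= 0): theory {1 - p:.3f}, empirical {freq_le_zero:.3f}")
```

With a large sample, both empirical frequencies should land close to the theoretical values, without matching them exactly.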

## Some important results

By the definition:


<div style="margin-bottom: 30px;">
    
<br>
    
<center>
$E[X_i] = \sum_{k=0}^{\infty} k \cdot P(X_i=k)$
</center>
<br> 
<br>
<center>
But $P(X_i=0) = 1-p$, $P(X_i=1) = p$ and $P(X_i=k)=0$ for all $k \notin \{0;1\}$, so only two terms remain: $E[X_i] = 0 \cdot (1-p) + 1 \cdot p$
</center>
<br>
<center>
$E[X_i] = p$
</center>
<br>
   
For $E[(X_i)^{2}]:$
    
<br>
    
<center>$E[(X_i)^{2}] = \sum_{k=0}^{\infty} k^{2} \cdot P(X_i=k) = 0^{2} \cdot (1-p) + 1^{2} \cdot p = p$</center>
    
<br>
    
Then
    
<br>
    
<center>$ Var[X_i] = E[(X_i)^{2}] - (E[X_i])^{2} = p - p^{2} = p(1-p)$</center>

<br>

<center>$Var[X_i] = p(1-p)$</center>
    
<br>

    
    
</div>
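
The two moments and the variance can also be checked by summing directly over the support $\{0;1\}$. This is a minimal sketch: the helper `pmf`, the value `p = 0.3` and the variable names are illustrative assumptions.

```python
def pmf(p, x):
    """P(X = x) for X ~ Bernoulli(p): p^x (1-p)^(1-x) on {0, 1}, 0 elsewhere."""
    return p**x * (1 - p)**(1 - x) if x in (0, 1) else 0.0

p = 0.3            # illustrative parameter (assumption)
support = (0, 1)   # the only values with non-zero probability

mean = sum(x * pmf(p, x) for x in support)               # E[X]
second_moment = sum(x**2 * pmf(p, x) for x in support)   # E[X^2]
variance = second_moment - mean**2                       # Var[X] = E[X^2] - E[X]^2

print(mean, second_moment, variance, p * (1 - p))
```

Up to floating-point rounding, `mean` and `second_moment` both equal `p`, and `variance` equals `p * (1 - p)`.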



### Some Bernoulli estimators

$\mu_1(1) = \overline{X_n}$

The first estimator comes from the method of moments, which matches the theoretical first moment $E[X_i] = p$ with the empirical mean $\overline{X_n}$:

<div style="margin-bottom: 30px;">
<br>
    <center>$ E[\overline{X_n}] =  E[\frac{\sum_{i=1}^{n} X_i}{n}] = \frac{\sum_{i=1}^{n} E[X_i]}{n} = \frac{np}{n} = p$</center>
<br>
    <center>$\tilde{p} = \overline{X_n}$</center>
<br>
   The second estimator comes from maximum likelihood estimation (MLE): 
<br>
    
Because the variables are independent, the likelihood $Ln$ of the sample can be written as a product, a form that is easy to maximize:
    
<br>
    
<center>$Ln(X,p)= \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_{i=1}^{n} x_i}(1-p)^{n-\sum_{i=1}^{n} x_i}$</center>    
<br>
    
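<center>$\frac{\partial Ln(X,p)}{\partial p} = (\sum_{i=1}^{n} x_i)p^{(\sum_{i=1}^{n} x_i)-1}(1-p)^{n-(\sum_{i=1}^{n} x_i)}-p^{(\sum_{i=1}^{n} x_i)}(n-(\sum_{i=1}^{n} x_i))(1-p)^{n-(\sum_{i=1}^{n} x_i)-1} =0$</center>
    
<br>
    
<center>Dividing by $p^{(\sum_{i=1}^{n} x_i)-1}(1-p)^{n-(\sum_{i=1}^{n} x_i)-1}$ for $0<p<1$: $(\sum_{i=1}^{n} x_i)(1-p) - (n-\sum_{i=1}^{n} x_i)p = 0 \iff \sum_{i=1}^{n} x_i = np \implies$</center>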
    
<br>
    <center>$\hat{p} = \overline{X_n}$</center>
<br>  
Both methods therefore give the same estimator, so the analysis below applies to both.
<br>
    
</div>
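
The maximum likelihood result can be checked numerically. The sketch below evaluates the log-likelihood on a grid instead of the likelihood itself; this is a substitution made for numerical stability, and since the logarithm is increasing both functions have the same maximizer. The true parameter, sample size, seed, grid and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)             # fixed seed (assumption)
p_true = 0.3                               # illustrative true parameter
x = rng.binomial(1, p_true, size=500)      # simulated Bernoulli sample

s, n = x.sum(), x.size                     # number of ones and sample size

# Candidate values for p, excluding 0 and 1 where the logarithm diverges.
grid = np.linspace(0.001, 0.999, 999)

# Log of p^s (1-p)^(n-s), evaluated at every grid point.
log_lik = s * np.log(grid) + (n - s) * np.log(1 - grid)

p_grid = grid[np.argmax(log_lik)]          # grid point with the highest likelihood
p_bar = x.mean()                           # closed-form estimator, the sample mean

print(p_grid, p_bar)
```

The grid maximizer should agree with the sample mean up to the grid spacing of 0.001.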


#### Estimators List: 



1. $\tilde{p} = \overline{X_n}$ (MM)

2. $\hat{p} = \overline{X_n}$ (MLE)


#### Bias of an estimator

<div style="margin-bottom: 30px;">
    <br>
    <center>$b_{p}[\hat{p}] = E[\hat{p}] - p = 0$</center>
    <br>
    <center>Therefore the estimator is unbiased: $b_{p}[\hat{p}]=0$</center>
    <br>
</div>
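
The zero bias can be verified exactly for a given $n$, without simulation. The sketch below relies on one fact not stated in the text, that the sum of $n$ iid Bernoulli($p$) variables follows a Binomial($n$, $p$) law; the function name `expected_p_hat` and the values of `n` and `p` are illustrative assumptions.

```python
from math import comb

def expected_p_hat(n, p):
    """Exact E[p_hat]: sum of (k/n) * P(S_n = k) for k = 0..n,
    where S_n = X_1 + ... + X_n is assumed to follow Binomial(n, p)."""
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

n, p = 20, 0.3                       # illustrative values (assumption)
bias = expected_p_hat(n, p) - p      # b_p[p_hat] = E[p_hat] - p

print(bias)
```

The printed bias is zero up to floating-point rounding.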

#### Risk of an estimator

<div style="margin-bottom: 30px;">
    <br>
    <center>$R_{p}[\hat{p}] = (b_{p}[\hat{p}])^{2} + Var[\hat{p}]$</center>
    <br>
    <center>By independence of the $X_i$: $Var[\hat{p}]= Var[\frac{\sum_{i=1}^{n}X_i}{n}]= \frac{\sum_{i=1}^{n}Var[X_i]}{n^{2}}= \frac{np(1-p)}{n^{2}}= \frac{p(1-p)}{n}$</center>
    <br>
    <center>$R_{p}[\hat{p}] =  \frac{p(1-p)}{n}$</center>
    <br>
    <br>
</div>
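
The risk formula can be compared with a Monte Carlo estimate of the mean squared error. This is a sketch under illustrative assumptions: the values of `p`, `n`, the number of repetitions `reps`, the seed and the variable names are not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)       # fixed seed (assumption)
p, n, reps = 0.3, 50, 20_000         # illustrative values (assumption)

# Each row is one independent sample of size n; p_hat is the mean of each row.
samples = rng.binomial(1, p, size=(reps, n))
p_hat = samples.mean(axis=1)

mc_risk = np.mean((p_hat - p) ** 2)  # Monte Carlo estimate of E[(p_hat - p)^2]
theory = p * (1 - p) / n             # risk derived above

print(mc_risk, theory)
```

The two values should be close, with a gap that shrinks as `reps` grows.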



#### Consistency of an estimator

Hypotheses and conditions

1. $(X_n)$ is a collection of iid samples of a random variable 
2. $E[|X_i|]<\infty$

<div style="margin-bottom: 30px;">
    <br>
    <center>Using the strong law of large numbers (LLN), we get:</center>
    <br>
    <center>${\displaystyle {\overline {X}}_{n}\ {\overset {\text{a.s.}}{\longrightarrow }}\ E[X_i]=p \qquad {\textrm {when}}\ n\to \infty .} $</center>
    <br>
    <center>Moreover, the function $f(x) = x$ is continuous, so by the continuous mapping theorem we have the following:</center>
    <br>
    <center>${\displaystyle f({\overline {X}}_{n}\ ){\overset {\text{a.s.}}{\longrightarrow }}\ f(p) \qquad {\textrm {when}}\ n\to \infty .} \iff {\displaystyle \hat{p}{\overset {\text{a.s.}}{\longrightarrow }}{p} \qquad {\textrm {when}}\ n\to \infty .} \implies {\displaystyle \hat{p}{\overset {\text{Prob}}{\longrightarrow }}{p} \qquad {\textrm {when}}\ n\to \infty .}$</center>
    <br>
    <center>Then $\hat{p}$   is a consistent estimator. </center>
    <br>
</div>
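
Consistency can be illustrated by following the running mean of a single simulated sequence. This is only a sketch: one simulated path does not prove almost sure convergence, and the value of `p`, the seed, the sequence length and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)            # fixed seed (assumption)
p = 0.3                                   # illustrative parameter (assumption)
x = rng.binomial(1, p, size=100_000)      # one long Bernoulli sequence

# running_mean[n - 1] is the estimator p_hat computed from the first n draws.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, abs(running_mean[n - 1] - p))
```

The absolute error typically shrinks as `n` grows, although not necessarily at every step.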



#### Convergence speed

Hypotheses and conditions

1. $(X_n)$ is a collection of iid samples of a random variable 
2. $E[X_i]=p$
3. $Var[X_i]=p(1-p)<\infty $


<div style="margin-bottom: 30px;">
    <br>
    <center>Using the central limit theorem (CLT), we get:</center>
    <br>
    <center>${\displaystyle \frac{{\sqrt {n}}\left({\bar {X}}_{n}-E[X_i] \right)}{\sqrt{Var[X_i]}}\mathrel {\overset {d}{\longrightarrow }} {\mathcal {N}}\left(0,1\right).}$</center>
    <br>
    <center>Since $\hat{p} = \overline{X_n}$, $E[X_i] = p$ and $Var[X_i] = p(1-p)$, substituting these values gives:</center>
    <br>
    <center>${\displaystyle \frac{{\sqrt {n}}\left(\hat{p}-p \right)}{\sqrt{p(1-p)}}\mathrel {\overset {d}{\longrightarrow }} {\mathcal {N}}\left(0,1\right).}$</center>
    <br>
    <center>Hence $\hat{p}$ converges to $p$ at rate $\sqrt{n}$:</center>
    <br>
    <center>${\displaystyle {\sqrt {n}}\left(\hat{p}- p \right)\mathrel {\overset {d}{\longrightarrow }} {\mathcal {N}}\left(0,p(1-p)\right).}$</center>
    <br>
</div>
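
The normal approximation can be checked by simulating the standardized statistic many times. This is a sketch under illustrative assumptions: the values of `p`, `n`, `reps`, the seed and the variable names are not from the text, and 1.96 is the usual 97.5% quantile of the standard normal law.

```python
import numpy as np

rng = np.random.default_rng(4)       # fixed seed (assumption)
p, n, reps = 0.3, 400, 20_000        # illustrative values (assumption)

# One estimator p_hat per simulated sample of size n.
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

# Standardized statistic sqrt(n) (p_hat - p) / sqrt(p (1 - p)).
z = np.sqrt(n) * (p_hat - p) / np.sqrt(p * (1 - p))

# Under the N(0, 1) limit, roughly 95% of the values fall in [-1.96, 1.96].
coverage = np.mean(np.abs(z) <= 1.96)

print(z.mean(), z.std(), coverage)
```

The empirical mean and standard deviation of `z` should be near 0 and 1. The coverage is only approximately 0.95, because $\hat{p}$ is discrete and $n$ is finite.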
