In [1]:
# RUN THIS CELL: it loads some style files
from IPython.display import HTML, display, Math
with open( './style/custom.css', 'r' ) as f: html_style = f.read()
HTML( html_style )

# Bernoulli random variables

A r.v. $X$ taking values in $R$ is called a <mark>Bernoulli</mark> r.v. if $R=\{0,1\}$.

A Bernoulli r.v. models an experiment that has only two possible outcomes, conventionally called failure and success.

Usually $0$ stands for <mark>failure</mark> and $1$ stands for <mark>success</mark>.  

We call $\Pr(X{=}1)$ the <mark>probability of success</mark> of $X$.
    
(In some examples we allow $R$ to be any set that contains exactly two elements; a very common choice is $R=\{H,T\}$, where $H$ stands for heads and $T$ for tails. We then understand that one of the two values is identified with success and the other with failure. In this course, $H$ is success and $T$ is failure.)

A Bernoulli r.v. $X$ is canonically associated to the event $\{\omega\in\Omega\ :\ X(\omega)=1\}$.

Conversely, the Bernoulli r.v. canonically associated to the event $E$ is

$\quad$<mark>$1_E(x)$</mark>$\ \ =\ \ \left\{\begin{array}{ll}1&\textrm{ if }\ x\in E\\0&\textrm{ if }\ x\notin E\end{array}\right.$

This is called the <mark>indicator function</mark> (or <mark>characteristic function</mark>) of $E$.
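
The correspondence between events and Bernoulli r.v.s can be sketched in a few lines of Python. The sample space (one roll of a fair die), the event `E`, and the helper name `indicator` are illustrative assumptions, not part of the course material.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

# Illustrative event E = "the outcome is even".
E = {2, 4, 6}

def indicator(event):
    """Return the indicator function 1_E of the given event."""
    return lambda w: 1 if w in event else 0

X = indicator(E)  # Bernoulli r.v. canonically associated to E

# The probability of success Pr(X = 1) coincides with Pr(E).
p_success = sum(prob[w] for w in omega if X(w) == 1)
p_event = sum(prob[w] for w in E)
print(p_success, p_event)
```

Exact fractions are used here only to make the equality $\Pr(X{=}1)=\Pr(E)$ visible without rounding.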

We write <mark>$X\sim B(1,p)$</mark> to say that $X$ is a Bernoulli r.v. with probability of success $p$.

# Binomial random variables

We say that $X\in\{0,\dots,n\}$ is a <mark>binomial random variable</mark> if there are independent Bernoulli r.v.s $X_1,\dots,X_n$, all with success probability $p$, such that

$\displaystyle\quad X\ \ =\ \ \sum^n_{i=1}X_i$

More precisely we say $X$ is a binomial random variable <mark>with parameters $n$ and $p$</mark>.

We write for short <mark>$X\sim B(n,p)$</mark>.

Note that $X$ counts the number of successes in a sequence of $n$ independent experiments (all Bernoulli trials with success probability $p$).
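
A minimal simulation sketch of this construction, assuming NumPy is available: each simulated value of $X$ is obtained by summing $n$ independent Bernoulli trials. The parameter values, the seed, and the number of samples are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # the seed is an arbitrary choice
n, p = 20, 0.3                        # illustrative parameters
n_samples = 100_000                   # number of simulated values of X

# Each row holds n independent Bernoulli(p) trials X_1, ..., X_n.
trials = (rng.random((n_samples, n)) < p).astype(int)

# X is the row sum: the number of successes among the n trials.
X = trials.sum(axis=1)

print("smallest / largest value:", X.min(), X.max())
print("empirical mean:", X.mean(), " vs  n*p =", n * p)
```

The simulated values always lie in $\{0,\dots,n\}$, and their average should be close to $np$.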

The parameters $n$ and $p$ determine the probability mass function of $X$, which is called the <mark>binomial distribution</mark>:

$\displaystyle\quad P(X{=}k)\ \ =\ \ {n\choose k}p^k(1-p)^{n-k}$
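
As a sanity check, the formula can be evaluated directly and compared with `scipy.stats.binom.pmf`. The helper name `binomial_pmf` and the parameter values are assumptions made for this sketch.

```python
from math import comb
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed directly from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3   # illustrative parameters
by_formula = [binomial_pmf(k, n, p) for k in range(n + 1)]
by_scipy = binom.pmf(range(n + 1), n, p)

# The two computations agree up to rounding, and the probabilities sum to 1.
max_gap = max(abs(a - b) for a, b in zip(by_formula, by_scipy))
print("largest discrepancy:", max_gap)
print("total probability:", sum(by_formula))
```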

Then the cumulative distribution function is

$\displaystyle\quad P(X{\le}k)\ \ =\ \ \sum^k_{i=0}{n\choose i}p^i(1-p)^{n-i}$
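
The cumulative distribution function is just a partial sum of the probability mass function. The sketch below (with a hypothetical helper `binomial_cdf` and arbitrary parameters) compares that partial sum with `scipy.stats.binom.cdf`.

```python
from math import comb
from scipy.stats import binom

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), as a partial sum of the PMF."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 20, 0.3   # illustrative parameters
for k in (0, 5, 10, 20):
    # The two columns should agree up to rounding.
    print(k, binomial_cdf(k, n, p), binom.cdf(k, n, p))
```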

Below we plot the probability mass function of $X \sim B(n,p)$ for $n\le 20$ and different values of $p$ (use the sliders below the graph).

    Run the cell below and click on the chart to view the distribution.


In [2]:
from scipy.stats import binom, norm                       # libraries for statistical functions
from ipywidgets import interact, FloatSlider, IntSlider   # libraries for interactions with the graphic
from bokeh.io import push_notebook, show, output_notebook, output_file # libraries for graphic output
from bokeh.plotting import figure
output_notebook()
options = dict(plot_height=400,plot_width=700,tools="pan,wheel_zoom,reset,save,crosshair,box_select")

In [4]:
p = .5
n = n_max = 20            # maximal number of trials
x = range(n_max+1)   # initialization
plot1 = figure(title="PMF of X ~ B(n,p)", x_axis_label = "number of successes", # create an empty figure
               y_axis_label = "probability", x_range=(0,n_max), y_range=(0,0.4), **options )
plot1.title.text_font="times"
plot1.title.text_font_size="16pt"
r1 = plot1.vbar(x, top=[0 for i in x], bottom=0, # initialize barplot
                width=0.8, color="#111188", alpha=0.5 
               )
show(plot1, notebook_handle=True)
    
def update1(n, p):
    x = range(n+1)
    data = {'x':x, 'top': binom.pmf(x,n,p) }
    r1.data_source.data = data
    push_notebook()

interact(update1, 
         n = IntSlider  (description="n", min=10,  max=n_max, step=1,    value=n_max), 
         p = FloatSlider(description="p", min=0.1, max=0.95,  step=0.05, value=0.5));


As $n$ increases, the distribution shifts to the right. The <mark>mode</mark> (the maximum) of the distribution is located at, or within one unit of, $np$.
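
A quick numerical check of where the PMF peaks, assuming SciPy and NumPy are available. The helper `binomial_mode` and the parameter pairs are illustrative choices; the helper simply looks for the largest value of the PMF.

```python
import numpy as np
from scipy.stats import binom

def binomial_mode(n, p):
    """Return a value k in {0, ..., n} where the PMF of B(n, p) is largest."""
    k = np.arange(n + 1)
    return int(np.argmax(binom.pmf(k, n, p)))

for n, p in [(10, 0.5), (20, 0.5), (20, 0.3), (20, 0.85)]:
    print(f"n={n}, p={p}: mode={binomial_mode(n, p)}, n*p={n * p}")
```

In each case the peak sits less than one unit away from $np$.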

If instead of $X$ we plot the r.v. $X-np$, we obtain distributions that are all centered at $0$. This makes them easier to compare. This operation is called <mark>centering</mark>.

Let us see how to interpret this r.v. $X-np$. Above we built $X$ from $n$ independent Bernoulli r.v.s $X_1,\dots, X_n$. Consider the variables $X_i-p$. We can think of them as the outcome of a game in which I win $1-p$ in case of success and $-p$ in case of failure, that is, I lose $p$. After $n$ rounds the total winnings (gains $-$ losses) will be

$\quad\displaystyle\sum^n_{i=1}(X_i-p)\ \ =\ \ \bigg(\sum^n_{i=1}X_i\bigg) - np$

$\quad\displaystyle\phantom{\sum^n_{i=1}(X_i-p)}\ \ =\ \ X - np$
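
The game can also be simulated. The sketch below, with arbitrary parameters and seed, replays the $n$-round game many times, checks that the total winnings equal the number of successes minus $np$ in every replay, and prints the average total winnings.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed
n, p = 20, 0.3                        # illustrative parameters
n_games = 100_000                     # number of times the n-round game is replayed

# X_i = 1 on success and 0 on failure; the winnings of round i are X_i - p.
X_i = (rng.random((n_games, n)) < p).astype(float)
winnings = (X_i - p).sum(axis=1)      # sum of (X_i - p) over the n rounds
X = X_i.sum(axis=1)                   # number of successes

# The identity sum(X_i - p) = X - n*p holds for every replay,
# and the total winnings average out to roughly zero.
print("identity holds:", np.allclose(winnings, X - n * p))
print("average total winnings:", winnings.mean())
```

An average close to zero reflects the fact that the centered variable has expected value $0$.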

In [None]:
x = range(0,n_max+1)
p = .5
# create an empty figure
plot2 = figure(title="PMF of X-np for  X ~ B(n,p)", 
               x_axis_label = "#successes - np",y_axis_label = "probability", 
               x_range=(-n_max,n_max), y_range=(0,0.3), **options )  # range wide enough for every p
plot2.title.text_font="times"
plot2.title.text_font_size="16pt"

# initialize and show barplot
r2 = plot2.vbar(x, top=[0 for i in x], bottom=0, width=0.9, 
                color="green", alpha=0.5)
show(plot2, notebook_handle=True)

def update2(n, p):
    x = range(n+1)
    data = {'x':[ i - n*p for i in x], 'top':  binom.pmf(x,n,p) }
    r2.data_source.data = data
    push_notebook()
    
interact(update2, 
         n = IntSlider  (description="n", min=10, max=n_max, step=1, value=n_max), 
         p = FloatSlider(description="p", min=0.1, max=0.95, step=0.05, value=0.5));

In [None]:
#!jupyter nbconvert 08_Bernoulli\&Binomial.ipynb --to html