#### Run the code below to import all the libraries required by the sample code in this notebook

In [1]:
import numpy as np
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly import graph_objs as go
init_notebook_mode(connected=True)

import random

from IPython.display import display, Math

from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from scipy.stats import norm
from bokeh import plotting as pl
from bokeh.models import HoverTool, Arrow, OpenHead, NormalHead, VeeHead, Span


output_notebook()

### Solution code

```python
# Just run above code
```

This notebook is short, but the topic it covers, likelihood, is one of the most important ideas needed to understand many aspects of statistics and probability. Most courses and textbooks don't tackle likelihood explicitly, and when they do, it is usually just a single equation. Here we will try to build an intuitive understanding of likelihood. <br>

The plan for this notebook is: <br> 

1) Definition of the likelihood function 

2) Likelihood function with multiple points 

Throughout this notebook we will mostly use canned examples rather than real data. 


## Definition of likelihood 

To understand likelihood, we will start with a simple example. Suppose you work for a manufacturing company that makes nails. You are given 1000 nails and asked to check the precision of the manufacturing process. How would you do this? 

As a first step, you would measure the length of every nail and look at the distribution of nail lengths. It turns out the data is nicely described by a normal distribution with a mean of 10 cm and a standard deviation of 0.5 cm. 

Let's plot this distribution: 


In [2]:
xrange = np.linspace(5, 15, 1000)
pdf = norm(10, 0.5).pdf(xrange)
tools_to_show = 'box_zoom,pan,save,hover,reset,tap,wheel_zoom'


fig = pl.figure(x_range=[5, 15],
                y_range=[0, 1.2],
                height=400,
                tools=tools_to_show,
                title="Distribution of nail lengths",
                x_axis_label="Nail lengths",
                y_axis_label="PDF")

fig.line(x=xrange, y=pdf, line_width=4, legend_label="Normal distribution")

fig.xgrid.grid_line_color = None
fig.y_range.start = 0

# Shade the area under the curve between 9.0 cm and 9.5 cm.
# Setting the first and last y-values to 0 closes the polygon along the x-axis.
shade_x = np.arange(9, 9.5, 0.001)
shade_region = norm(10, 0.5).pdf(shade_x)

shade_region[0] = 0
shade_region[-1] = 0

fig.patch(shade_x, shade_region, color="red", alpha=0.4, legend_label="Area under the curve for 9.0 < L < 9.5")
fig.legend.location = "top_right"

hover = fig.select(dict(type=HoverTool))
hover.tooltips = [("xvalue", "@x"), ("yvalue", "@y")]


show(fig)



The plot above shows the normal distribution of nail lengths. 

Question: What do you think the red region, i.e. the area under the curve between 9.0 cm and 9.5 cm, represents? 

Answer: The area under the curve is the probability of finding a nail whose length is between 9.0 cm and 9.5 cm. Why do we care about this? Well, your boss asked! 

So we can write:   <br>
probability of finding a nail between 9.0 cm and 9.5 cm $= p(9.0 < x < 9.5 \mid \mu = 10.0, \sigma = 0.5)$

In words, the right-hand side reads:  <br>
p(9.0 cm < length of nail < 9.5 cm GIVEN mean of distribution = 10.0 cm, standard deviation = 0.5 cm)

In other words, given the parameters of the distribution (the mean and the standard deviation), this term gives the probability of observing data in a particular range. 
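The area under a PDF between two points equals the difference of the cumulative distribution function (CDF) at those points, so the shaded probability can be computed directly. Below is a minimal sketch using `scipy.stats.norm`. The variable names `nail_dist` and `prob` are illustrative.

```python
from scipy.stats import norm

# Nail-length distribution from the example: mean 10 cm, std 0.5 cm
nail_dist = norm(loc=10.0, scale=0.5)

# p(9.0 < x < 9.5 | mu = 10.0, sigma = 0.5) = CDF(9.5) - CDF(9.0)
prob = nail_dist.cdf(9.5) - nail_dist.cdf(9.0)
print(f"p(9.0 cm < x < 9.5 cm) = {prob:.4f}")
```

This prints roughly 0.1359. If the normal model is accurate, about 13.6% of nails would fall in the shaded range.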

Now let's flip this around. Given a fixed data point, how does the value of the distribution change as we change the parameter values? To explore this, let's fix the nail length at 9.3 cm and use the calculator below to see what value the distribution takes at that point for different parameter values. 

In [3]:
tools_to_show = 'box_zoom,pan,save,hover,reset,tap,wheel_zoom'


def get_sig_lvl(nail_length, mean_dist, std_dist):
    # Plot the normal PDF for the chosen parameters and mark the PDF value
    # (the likelihood) at the fixed nail length
    xrange = np.linspace(5, 15, 1000)
    dist = norm(mean_dist, std_dist)
    pdf = dist.pdf(xrange)
    pdf_at_point = dist.pdf(nail_length)

    fig = pl.figure(x_range=[5, 15],
                    height=400,
                    tools=tools_to_show,
                    title="PDF",
                    x_axis_label="Nail lengths",
                    y_axis_label="PDF")

    fig.line(x=xrange, y=pdf, line_width=4)
    # Red marker at (nail_length, PDF value) plus guide lines to both axes
    fig.scatter(x=[nail_length], y=[pdf_at_point], size=10, color="red")
    fig.line(x=[nail_length, nail_length], y=[0, pdf_at_point], line_width=2, color="red")
    fig.line(x=[0, nail_length], y=[pdf_at_point, pdf_at_point], line_width=2, color="red")

    fig.xgrid.grid_line_color = None
    fig.y_range.start = 0
    hover = fig.select(dict(type=HoverTool))
    hover.tooltips = [("xvalue", "@x"), ("yvalue", "@y")]
    print("Value of distribution is {}".format(pdf_at_point))
    show(fig)


# BoundedFloatText (unlike FloatText) actually enforces min/max;
# std_dist must stay strictly positive for the normal PDF to be defined
interact(get_sig_lvl,
         nail_length=widgets.BoundedFloatText(value=9.3, min=5, max=15, step=0.001),
         mean_dist=widgets.BoundedFloatText(value=10, min=5, max=15, step=0.001),
         std_dist=widgets.BoundedFloatText(value=0.5, min=0.01, max=1, step=0.001));



In the figure above, we can change the distribution parameters (mean and std) and see the value of the distribution at the chosen nail length. 
This value is what is called the likelihood. This can be a bit confusing, since it is numerically the same as the value of the probability density function. What matters is what we hold fixed: we fix the data point and vary the parameters. That is, we ask how likely this data point is under a distribution with particular parameter values. As a result, the likelihood is a function of the parameters, not a probability distribution over them; for example, it need not integrate to 1 over the parameters. 

Mathematically we would write it in the following way- 

$L(\theta)  = p(x|\theta)$

where- <br>
$\theta$ represents the parameters of the distribution; in our case, the mean and the std. <br>
$p(x|\theta)$ is the probability density function (PDF) for a continuous variable, or the probability mass function (PMF) for a discrete variable. 
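To make the distinction concrete, here is a small sketch that evaluates the same normal density in two ways. First, as a function of $x$ with the parameters fixed. Second, as a function of $\mu$ with the data point fixed at 9.3 cm. The helper names `pdf_of_x` and `likelihood_of_mu` are hypothetical and are not part of the notebook.

```python
import numpy as np
from scipy.stats import norm

SIGMA = 0.5  # standard deviation held fixed, as in the text


def pdf_of_x(x, mu=10.0, sigma=SIGMA):
    """PDF view: parameters fixed, x varies."""
    return norm.pdf(x, loc=mu, scale=sigma)


def likelihood_of_mu(mu, x=9.3, sigma=SIGMA):
    """Likelihood view: data point fixed, mu varies."""
    return norm.pdf(x, loc=mu, scale=sigma)


# Same number, read two different ways
print(pdf_of_x(9.3), likelihood_of_mu(10.0))

# Likelihood of the fixed point x = 9.3 for a few candidate means
for mu in [9.0, 9.3, 10.0, 11.0]:
    print(f"mu = {mu:4.1f}  L(mu) = {likelihood_of_mu(mu):.4f}")
```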

In $L(\theta) = p(x|\theta)$, the unknown is the general parameter $\theta$. Suppose we take $\theta$ to be just the mean of the distribution, $\mu$, and fix the standard deviation at $\sigma = 0.5$. The equation then becomes 


$L(\mu)  = p(x|\mu)$

Without going into the mathematical details: in practice we usually work with the log likelihood rather than the likelihood itself. The log turns products of probabilities into sums, which are easier to work with and avoid numerical underflow. Because the log is monotonically increasing, it does not change where the maximum is. Hence our equation becomes 

$l(\mu) =\log L(\mu)  =\log p(x|\mu)$ <br>

For our nail-length measurements we have an explicit formula for the distribution, so we can pick a range of values for the mean and plot the log likelihood. We will use the range $0 \lt \mu \lt 20$ (in cm). 


In [14]:
mean_range = np.arange(0, 20, 0.001)
x_value = 9.3
std = 0.5

# Log likelihood of the single observation x_value for every candidate mean
log_likelihood = norm.logpdf(x_value, loc=mean_range, scale=std)

likelihood_max_mean = np.argmax(log_likelihood)

tools_to_show = 'box_zoom,pan,save,hover,reset,tap,wheel_zoom'

fig = pl.figure(x_range=[0, 20],
                height=400,
                tools=tools_to_show,
                title="Plot of log likelihood",
                x_axis_label="Nail length mean value",
                y_axis_label="log likelihood")

fig.line(x=mean_range, y=log_likelihood)
fig.xgrid.grid_line_color = None
# Vertical red line at the mean that maximizes the log likelihood
v_line = Span(location=mean_range[likelihood_max_mean], dimension="height", line_color="red", line_width=3)

fig.add_layout(v_line)
hover = fig.select(dict(type=HoverTool))
hover.tooltips = [("xvalue", "@x"), ("yvalue", "@y")]

show(fig)
print("Mean value for maximum for likelihood {} cm ".format(mean_range[likelihood_max_mean]))

Mean value for maximum for likelihood 9.3 cm 



The plot above is interesting. Why? The log likelihood peaks at one value and falls off on both sides. Where is the peak? At a mean of 9.3 cm. Wait a minute, that's the same value as our data point! So if we treat the mean as the parameter and maximize the likelihood, the maximum lands exactly on the observed value. For a single data point the sample mean is just that point. We have therefore inadvertently shown, for this one-point case, that the sample mean is the maximum likelihood estimate of the mean of a normal distribution. The maximum likelihood estimate is a very important concept in statistics and in machine learning in general. It needs a whole notebook of its own, so we will cover it in detail separately. Let's get back to the likelihood function. 
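For the normal distribution with $\sigma$ held fixed, the single-point log likelihood has a closed form that explains where the peak falls:

$$l(\mu) = \log\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right] = -\log\left(\sigma\sqrt{2\pi}\right) - \frac{(x-\mu)^2}{2\sigma^2}$$

The first term does not depend on $\mu$. The second is a downward-opening parabola in $\mu$ that reaches its largest value, zero, only at $\mu = x$. So $l(\mu)$ is maximized at $\mu = x = 9.3$ cm, which matches the plot.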

The maximum also means that, among normal distributions with a std of 0.5 cm, the one with a mean of 9.3 cm best fits our data. That conclusion is not very meaningful, though, since our data is just a single point. 

The next question is how to compute the likelihood when we have multiple data points rather than a single one.
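Instead of scanning a grid of candidate means, the peak can also be located with a numerical optimizer by minimizing the negative log likelihood. Below is a minimal sketch using `scipy.optimize.minimize_scalar`. Note that this swaps the grid search for a different technique: Brent's method, the default when no bounds are given. The function name `negative_log_likelihood` is just illustrative.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import norm

x_value = 9.3  # the single observed nail length (cm)
std = 0.5      # standard deviation held fixed (cm)


def negative_log_likelihood(mu):
    # Minimizing -l(mu) is the same as maximizing l(mu)
    return -norm.logpdf(x_value, loc=mu, scale=std)


result = minimize_scalar(negative_log_likelihood)
print(f"Maximum likelihood mean: {result.x:.4f} cm")
```

An optimizer avoids choosing a grid resolution in advance, which matters more once there are several parameters to fit.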
 

## Likelihood with multiple points 

Calculating the likelihood with multiple points takes one more idea. If the data points are independent, the likelihood of all of them together is the product of the individual likelihoods: <br> 
$L(\theta) = p(x_1|\theta)\,p(x_2|\theta)\,p(x_3|\theta)\,p(x_4|\theta)\cdots$

where- <br>
$(x_1, x_2, x_3, x_4, \dots)$ are the data points. <br>

In our case, this means that instead of taking a single nail, we pick many nails and ask what the likelihood looks like as we vary our parameter. 

We usually write this more concisely as: 

$L(\theta) = \prod\limits_{i=1}^{n} p(x_i|\theta)$

where- <br>
$\prod\limits_{i=1}^{n} p(x_i|\theta) = p(x_1|\theta)\,p(x_2|\theta)\,p(x_3|\theta)\,p(x_4|\theta)\cdots p(x_n|\theta)$ 


That's the math. How do we do this in code? First we need data points. Suppose I give you the list of nail lengths data_points = (8.5, 9.2, 10.1, 11.4). Can you tell me which distribution best fits this data? Again, we will assume a std of 0.5 cm. 

To do that, we will break the problem down into the following tasks: <br>
1) For a given value of the distribution parameter, find the value of the probability density function for each data point

2) Take their product (this is the likelihood), then take the log of this value 

3) Repeat steps 1 and 2 for each value of the mean 

4) Plot the results of step 3 and identify the maximum value of the log likelihood

In [15]:
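values = [8.5, 9.2, 10.1, 11.4]
# Log likelihood for a single candidate mean (mu = 10.0, sigma = 0.5):
# add up the log PDF of each data point
log_likelihood = 0
for x in values:
    log_likelihood += np.log(norm(10, 0.5).pdf(x))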



Above, we used a log rule: <br> 
$\log \prod\limits_{i=1}^{n} p(x_i|\theta) = \sum\limits_{i=1}^{n} \log p(x_i|\theta) = \log p(x_1|\theta) + \dots + \log p(x_n|\theta)$ <br>

to calculate the log likelihood: the log of a product is the sum of the logs. It's a handy rule whenever you work with likelihoods.
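The sketch below checks the rule numerically and shows one practical reason for working with logs: a product of many densities underflows in floating point. The 2000 points are synthetic data generated only for illustration. They are not nail measurements from the text.

```python
import numpy as np
from scipy.stats import norm

MU, SIGMA = 10.0, 0.5  # the single candidate mean used in cell In [15]

# The four nail lengths from the example
values = np.array([8.5, 9.2, 10.1, 11.4])
pdfs = norm.pdf(values, loc=MU, scale=SIGMA)

# log(product) and sum(logs) agree up to floating-point rounding
log_of_product = np.log(np.prod(pdfs))
sum_of_logs = np.sum(np.log(pdfs))
print(log_of_product, sum_of_logs)

# Synthetic data for illustration only (not nail measurements from the text)
rng = np.random.default_rng(0)
many = rng.normal(MU, SIGMA, size=2000)

# The raw product of 2000 densities underflows to 0.0 in float64 ...
raw_product = np.prod(norm.pdf(many, loc=MU, scale=SIGMA))
# ... while the sum of log densities remains an ordinary finite number
log_sum = np.sum(norm.logpdf(many, loc=MU, scale=SIGMA))
print(raw_product, log_sum)
```

Once the product has underflowed to 0.0, every candidate mean would look equally bad. The sum of logs keeps the comparison meaningful.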

The `log_likelihood` computed in cell In [15] is for just one value of the mean, $\mu = 10.0$. 

Question: Can you calculate the log likelihood for a range of means between 0 and 20, and find the mean that maximizes it? 

In [23]:
#answer

def get_llikelihood(mean_value, input_values=(8.5, 9.2, 10.1, 11.4), std=0.5):
    # Sum of log PDFs = log of the product of PDFs (the likelihood)
    llhood = 0
    for x in input_values:
        llhood += norm.logpdf(x, loc=mean_value, scale=std)
    return llhood


mean_range = np.arange(0, 20, 0.01)
log_likelihood = [get_llikelihood(mu) for mu in mean_range]

# Index and value of the mean with the largest log likelihood
loc_mean_val = np.argmax(log_likelihood)
fit_mean = mean_range[loc_mean_val]
tools_to_show = 'box_zoom,pan,save,hover,reset,tap,wheel_zoom'

fig = pl.figure(x_range=[0, 20],
                height=400,
                tools=tools_to_show,
                title="Plot of log likelihood",
                x_axis_label="Nail length mean value",
                y_axis_label="log likelihood")

fig.line(x=mean_range, y=log_likelihood)
fig.xgrid.grid_line_color = None
v_line = Span(location=fit_mean, dimension="height", line_color="red", line_width=4)

fig.add_layout(v_line)
hover = fig.select(dict(type=HoverTool))
hover.tooltips = [("xvalue", "@x"), ("yvalue", "@y")]

show(fig)
print("Mean value for maximum for likelihood {} cm ".format(fit_mean))


Mean value for maximum for likelihood 9.8 cm 



What does the plot above tell us? The log likelihood reaches its maximum at a mean of 9.8 cm. This means that, among normal distributions with a standard deviation of 0.5 cm, the one with a mean of 9.8 cm best fits our data points. 
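The grid result can be checked with a short calculus argument. Assume independent points and a fixed $\sigma$. The log likelihood and its derivative with respect to $\mu$ are:

$$l(\mu) = \sum_{i=1}^{n} \log p(x_i|\mu) = -n\log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

$$\frac{dl}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For our four nails, $\hat{\mu} = (8.5 + 9.2 + 10.1 + 11.4)/4 = 39.2/4 = 9.8$ cm, the same value the grid search found. Note that $\hat{\mu}$ does not depend on $\sigma$. The assumed 0.5 cm changes the shape of the log likelihood curve, but not the location of its peak.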

So, in essence, given a set of data points, we can use the likelihood to find the best values of our distribution parameters. This is very useful, since we often have no idea in advance which normal distribution to use for some data. Here we used it to find the optimal mean. The same approach works for the standard deviation, but we will not work through it in detail here. 

With that, we close this topic. You will see much more of the likelihood function when we discuss Bayes' rule and maximum likelihood estimation, so jump over to those notebooks if you want to learn more about why likelihood matters. 
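For completeness, `scipy.stats.norm.fit` returns maximum likelihood estimates of both parameters: `loc` (the mean) and `scale` (the standard deviation). Below is a minimal sketch on the same four points, compared with the closed-form estimates. Note that the maximum likelihood estimate of the standard deviation divides by $n$, not $n - 1$.

```python
import numpy as np
from scipy.stats import norm

values = np.array([8.5, 9.2, 10.1, 11.4])

# Maximum likelihood estimates of both parameters of a normal distribution
mu_hat, sigma_hat = norm.fit(values)

# Closed-form MLEs for comparison: sample mean and the divide-by-n std
print(mu_hat, values.mean())
print(sigma_hat, values.std(ddof=0))
```

For these four nails the estimated mean is still 9.8 cm. The estimated standard deviation comes out at about 1.08 cm, noticeably larger than the 0.5 cm assumed throughout.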

In [7]:
# End of notebook
