##Maximum Likelihood Estimation:
###MLE is a method of obtaining the parameter values that maximize the joint probability mass function (or probability density function, PDF) of the observed values. For example, if we have a sample from a normal distribution, MLE gives the values of 'MU' (mean) and 'SIGMA^2' (variance) that maximize the joint PDF of the sample, i.e. the values under which the observed sample is most likely to have occurred.
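
###As a small illustration of the normal-sample case, the sketch below draws a synthetic sample (the seed, the true mean 5 and the true standard deviation 2 are assumptions made only for this illustration) and computes the closed-form maximum likelihood estimates: the sample mean and the variance with divisor n. It then checks that nudging either estimate increases the negative log-likelihood.

```python
import numpy as np

# Synthetic sample; seed, mean and scale are illustrative assumptions.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

# Closed-form maximum likelihood estimates for a normal sample.
mu_hat = sample.mean()
var_hat = sample.var()  # ddof=0, i.e. divide by n (not n-1)


def neg_log_likelihood(mu, var):
    """Negative log of the joint normal PDF of the sample."""
    n = len(sample)
    return 0.5 * n * np.log(2 * np.pi * var) + np.sum((sample - mu) ** 2) / (2 * var)


nll_at_mle = neg_log_likelihood(mu_hat, var_hat)

# Moving away from the estimates in any direction should not lower the value.
nll_nearby = [
    neg_log_likelihood(mu_hat + 0.1, var_hat),
    neg_log_likelihood(mu_hat - 0.1, var_hat),
    neg_log_likelihood(mu_hat, var_hat * 1.1),
    neg_log_likelihood(mu_hat, var_hat * 0.9),
]

print("mu_hat:", mu_hat, "var_hat:", var_hat)
print("NLL at MLE:", nll_at_mle)
print("NLL nearby:", nll_nearby)
```

###Every value in `nll_nearby` is larger than `nll_at_mle`, which is what it means for the estimates to maximize the likelihood.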

###MLE is a useful method for estimating population parameters. It only requires us to assume which distribution the sample data came from.

###Let us understand how MLE is performed with an example

In [1]:
import pandas as pd
df=pd.read_csv("MLE.csv")
df

Unnamed: 0,id,Y,X
0,1,2,1
1,2,6,4
2,3,7,5
3,4,9,6
4,5,15,9


In [8]:
#Usual method: fit the model by ordinary least squares
import statsmodels.api as sm
import numpy as np

x=df['X']
y=df['Y']
x=sm.add_constant(x)  #add the intercept column
model=sm.OLS(y,x).fit()
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:                      Y   R-squared:                       0.980
Model:                            OLS   Adj. R-squared:                  0.973
Method:                 Least Squares   F-statistic:                     145.9
Date:                Wed, 19 Jan 2022   Prob (F-statistic):            0.00122
Time:                        10:54:09   Log-Likelihood:                -4.5811
No. Observations:                   5   AIC:                             13.16
Df Residuals:                       3   BIC:                             12.38
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2882      0.755     -0.382      0.7



In [9]:
#np.std divides by n (ddof=0), which is the same divisor the MLE of sigma uses
print("Standard Deviation of error term is:",np.std(model.resid))

Standard Deviation of error term is: 0.6048820983804833


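###In the given example, we fitted a model by the least squares method.
###Now, to find the same model using MLE, we need to know the PMF/PDF of the distribution of e (the error term). Least squares does not need a distributional assumption to compute the coefficients, although the usual t and F tests in its summary assume normally distributed errors. MLE, in contrast, requires us to specify a distribution for e, and that can be any distribution whose PMF/PDF we can write down. Let us assume that e has a normal distribution with mean 0 and standard deviation sigma.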
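
###For reference, the likelihood under this assumption can be written out as follows. This is a sketch that assumes the errors are independent with e_i ~ N(0, sigma^2); m denotes the slope and b the intercept, and n is the number of observations.

```latex
L(m, b, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left( -\frac{(y_i - m x_i - b)^{2}}{2\sigma^{2}} \right)

-\ln L(m, b, \sigma) = \frac{n}{2}\,\ln\!\left(2\pi\sigma^{2}\right)
    + \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (y_i - m x_i - b)^{2}
```

###Maximizing L is equivalent to minimizing -ln L, and the second expression is the quantity a numerical minimizer can work with directly.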

In [17]:
from scipy.optimize import minimize

x=df['X']
y=df['Y']
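#Define the negative log-likelihood (returned as L).
#The likelihood is the product of the normal PDF values of the sample data;
#minimizing its negative log is equivalent to maximizing the likelihood.
def f(parameters):
  m=parameters[0]    #slope
  b=parameters[1]    #intercept
  sig=parameters[2]  #standard deviation of the error term
  y_exp=m*x+b        #fitted values
  L=((len(x)*np.log(2*np.pi*sig**2))/2)+((np.sum((y-y_exp)**2))/(2*(sig**2)))
  return L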
model1=minimize(f,np.array([2,2,2]),method="L-BFGS-B")
model1

      fun: 4.581084072761883
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([-1.08357767e-05, -2.39808173e-06, -7.72715225e-06])
  message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 112
      nit: 18
   status: 0
  success: True
        x: array([ 1.61764705, -0.28823541,  0.60488181])

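###The procedure involves maximizing the likelihood function. The parameter values which maximize the function are the maximum likelihood estimates. Because `minimize` can only minimize, the code works with the negative log-likelihood, whose minimum lies at the same parameter values as the maximum of the likelihood.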

###In the code above, we used an iterative numerical method called "L-BFGS-B", which starts from the initial guess [2,2,2] and improves it step by step until it converges. For many models the likelihood has no convenient closed-form maximizer and the calculations are too laborious to do by hand, so an iterative procedure is used instead.
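
###One practical detail of the iterative search, shown here as a hedged sketch rather than as part of the original notebook: the function is symmetric in sig, so an unconstrained search could in principle return a negative sig. L-BFGS-B accepts box constraints through the `bounds` argument of `minimize`, so sig can be kept positive. The data are the five (X, Y) pairs from the table above, typed in by hand so the block runs without MLE.csv; the lower bound 1e-6 and the second starting point are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

# The five observations from the table above, entered by hand.
x = np.array([1.0, 4.0, 5.0, 6.0, 9.0])
y = np.array([2.0, 6.0, 7.0, 9.0, 15.0])


def neg_log_likelihood(params):
    """Negative log-likelihood of y = m*x + b + e with e ~ N(0, sig^2)."""
    m, b, sig = params
    resid = y - (m * x + b)
    return 0.5 * len(x) * np.log(2 * np.pi * sig**2) + np.sum(resid**2) / (2 * sig**2)


# No bounds on slope and intercept; keep sig strictly positive.
bounds = [(None, None), (None, None), (1e-6, None)]

# Two different starting guesses to see whether the search ends in the same place.
starts = [np.array([2.0, 2.0, 2.0]), np.array([1.0, 0.0, 1.0])]
fits = [
    minimize(neg_log_likelihood, start, method="L-BFGS-B", bounds=bounds)
    for start in starts
]

for start, fit in zip(starts, fits):
    print("start:", start, "estimates:", fit.x, "negative log-likelihood:", fit.fun)
```

###If both runs report the same estimates, that is some reassurance that the optimizer found the minimum rather than stalling near its starting point.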

###model1.x stores the parameter values (m, b, sig, in that order) at which the function is minimized, i.e. the maximum likelihood estimates.

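###Now let's look back at the least squares method. From the result, we can see that the parameter values corresponding to the model
###y=B0+B1x+e are
###B0=-0.2882
###B1=1.6176
###std. dev. of e=0.6048
###Notice that these match the values obtained from the MLE method (model1.x lists them in the order B1, B0, std. dev.), and that the minimized value fun=4.5811 is the negative of the Log-Likelihood of -4.5811 reported in the OLS summary.
###This agreement is expected: when e is assumed to be normally distributed, the least squares estimates and the maximum likelihood estimates of B0 and B1 coincide. It is a consistency check on the MLE calculation rather than a general proof of the method's validity.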
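
###For this particular model the estimates can also be computed in closed form, which makes the agreement easy to verify. The sketch below again types in the five (X, Y) pairs from the table by hand; the variable names are chosen for this illustration. It also shows why the standard deviation matches: the MLE divides the residual sum of squares by n, as `np.std` does by default, whereas the unbiased estimate divides by n-2.

```python
import numpy as np

# The five observations from the table above, entered by hand.
x = np.array([1.0, 4.0, 5.0, 6.0, 9.0])
y = np.array([2.0, 6.0, 7.0, 9.0, 15.0])
n = len(x)

# Closed-form least squares (and normal-error MLE) estimates of slope and intercept.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Residual sum of squares at those estimates.
resid = y - (intercept + slope * x)
rss = np.sum(resid**2)

sigma_mle = np.sqrt(rss / n)             # divisor n: the maximum likelihood estimate
sigma_unbiased = np.sqrt(rss / (n - 2))  # divisor n-2: the usual unbiased version

# Log-likelihood evaluated at the maximum likelihood estimates.
log_lik = -0.5 * n * np.log(2 * np.pi * sigma_mle**2) - rss / (2 * sigma_mle**2)

print("B1 (slope):", slope)
print("B0 (intercept):", intercept)
print("sigma (MLE, divisor n):", sigma_mle)
print("sigma (divisor n-2):", sigma_unbiased)
print("log-likelihood:", log_lik)
```

###The slope, intercept, divisor-n sigma and log-likelihood can be compared directly with model1.x, the OLS coefficients and the Log-Likelihood line of the OLS summary.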