Commit

docs updates
MatthewReid854 committed Nov 19, 2021
1 parent 3d69005 commit a933bae
Showing 6 changed files with 156 additions and 12 deletions.
5 changes: 3 additions & 2 deletions docs/Changelog.rst
@@ -10,6 +10,7 @@ Changelog

**Summary of changes**

This will be written closer to the release date.

**New features**

@@ -26,8 +27,8 @@ Changelog
- Improvements to the API documentation for Convert_data, Datasets, PoF, and Utils modules. This has been a long term body of work to reformat the documentation, and it is finally complete.
- The required version of matplotlib has been upgraded to 3.5.0 to enable the above bugfix for the computed_zorder in ALT life stress plots.

**Version: 0.7.1 --- Released 26 Oct 2021**
'''''''''''''''''''''''''''''''''''''''''''
**Version: 0.7.1 --- Released: 26 Oct 2021**
''''''''''''''''''''''''''''''''''''''''''''

**Summary of changes**

5 changes: 2 additions & 3 deletions docs/Development roadmap.rst
@@ -17,16 +17,15 @@ The current release schedule is approximately every 6 to 8 weeks for minor relea

**Planned for version 0.8.0 (around Dec 2021)**

- Correct the formatting in the API docs for every function - still need to do Convert_data, Datasets, PoF, and Utils
- Within all fitters, use the FNRN format to give speed improvements in the same way as Fit_Weibull_2P_grouped works internally. This will subsequently result in the deprecation of Fit_Weibull_2P_grouped once its advantage is integrated in Fit_Weibull_2P. Need to confirm this method does not introduce too much cumulative error due to floating point precision limitations.
- Improvement to the online documentation for how some of these methods work, including the addition of more formulas, algorithms, and better referencing.
- Make tests for everything that doesn't have a test yet.
- Add plotting to all things that can plot in order to increase test coverage.

**Planned for version 0.9.0 (around Mar 2022)**

- Add confidence intervals for Weibull_Mixture, Weibull_CR, Weibull_DS, Weibull_ZI, and Weibull_DSZI
- Enable the confidence intervals for CDF, SF, CHF to be extracted programatically from the distribution object.
- Make tests for everything that doesn't have a test yet.
- Add plotting to all things that can plot in order to increase test coverage.

**Low priority (more of a wish list at this point)**

145 changes: 144 additions & 1 deletion docs/How does Maximum Likelihood Estimation work.rst
@@ -5,4 +5,147 @@
How does Maximum Likelihood Estimation work
'''''''''''''''''''''''''''''''''''''''''''

This is a placeholder for a theory document which will be written soon.
Maximum Likelihood Estimation (MLE) is a method of estimating the parameters of a model using a set of data.
While MLE can be applied to many different types of models, this article will explain how MLE is used to fit the parameters of a probability distribution for a given set of failure and right censored data.

MLE works by calculating the probability of occurrence of each data point (we call this the likelihood) for a model with a given set of parameters.
These probabilities are combined across all the data points, either by multiplying them or, more commonly, by summing their logarithms.
We then use an optimizer to change the parameters of the model in order to maximise this combined value (the log-likelihood).
This is easiest to understand with an example which is provided below.

There are two major challenges with MLE. These are the need to use an optimizer (making hand calculations almost impossible for distributions with more than one parameter), and the need for a relatively accurate initial guess for the optimizer.
The initial guess for MLE is typically provided using `Least Squares Estimation <https://reliability.readthedocs.io/en/latest/How%20does%20Least%20Squares%20Estimation%20work.html>`_.
A variety of `optimizers <https://reliability.readthedocs.io/en/latest/Optimizers.html>`_ are suitable for MLE, though some may perform better than others so trying a few is sometimes the best approach.

There are several advantages of MLE which make it the standard method for fitting probability distributions in most software.
The first is that MLE does not need the equation to be linearizable (which is required for Least Squares Estimation), so any model can be fitted.
The other advantage of MLE is that, unlike Least Squares Estimation (which uses the plotting positions and does not directly use the right censored data), MLE uses the failure data and right censored data directly, making it more suitable for heavily censored datasets.

The MLE algorithm
"""""""""""""""""

The MLE algorithm is as follows:

1. Obtain an initial guess for the model parameters (typically done using least squares estimation).
2. Calculate the probability of occurrence of each data point (f(t) for failures, R(t) for right censored, F(t) for left censored).
3. Multiply the probabilities together (or sum their logarithms, which is much more computationally efficient).
4. Use an optimizer to change the model parameters and repeat steps 2 and 3 until the total probability is maximized.

As mentioned in step 2, different types of data need to be handled differently:

+------------------------+-----------------------------------------------------------------+
| Type of observation | Likelihood function |
+========================+=================================================================+
| Failure data | :math:`L_i(\theta|t_i)=f(t_i|\theta)` |
+------------------------+-----------------------------------------------------------------+
| Right censored data | :math:`L_i(\theta|t_i)=R(t_i|\theta)` |
+------------------------+-----------------------------------------------------------------+
| Left censored data | :math:`L_i(\theta|t_i)=F(t_i|\theta)` |
+------------------------+-----------------------------------------------------------------+
| Interval censored data | :math:`L_i(\theta|t_i)=F(t_i^{RI}|\theta) - F(t_i^{LI}|\theta)` |
+------------------------+-----------------------------------------------------------------+

In words, the first equation above means "the likelihood of the parameters (:math:`\theta`) given the data (:math:`t_i`) is equal to the probability of failure at that time (the PDF (:math:`f(t)`)) with that given set of parameters (:math:`\theta`)".

Once we have the likelihood (:math:`L_i`) for each data point, we need to combine them. This is done by multiplying them together (think of this as an AND condition).
If we just had failures and right censored data then the equation would be:

:math:`L(\theta|D) = \prod_{i=1}^{n} f(t_{i_{failures}}|\theta) \times \prod_{j=1}^{m} R(t_{j_{right\ censored}}|\theta)`

In words this means that the likelihood of the parameters of the model (:math:`\theta`) given the data (D) is equal to the product of the values of the PDF (:math:`f(t)`) with the given set of parameters (:math:`\theta`) evaluated at each failure (:math:`t_{i_{failures}}`), multiplied by the product of the values of the SF (:math:`R(t)`) with the given set of parameters (:math:`\theta`) evaluated at each right censored value (:math:`t_{j_{right\ censored}}`).

Probabilities are numbers between 0 and 1, and when you have many of these values, their product becomes extremely small.
A loss of precision occurs because computers can only store a limited number of decimals. Multiplication is also slower than addition for computers.
To overcome this problem, we can use a logarithm rule to add the log-likelihoods rather than multiply the likelihoods.
We just need to take the log of the likelihood function (the PDF for failure data and the SF for right censored data), evaluate the probability, and sum the values.
The parameters that will maximize the log-likelihood function are the same parameters that will maximize the likelihood function.
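
Before working through an example by hand, the algorithm above can be sketched in a few lines of numpy and scipy. This is only a minimal illustration of the structure (it is not how `reliability` implements MLE internally), and the failure and right censored values below are hypothetical:

.. code:: python

    import numpy as np
    from scipy.optimize import minimize

    # hypothetical data, used only to illustrate the structure of the algorithm
    failures = np.array([14.0, 27.0, 38.0, 53.0, 66.0])
    right_censored = np.array([70.0, 80.0])

    def neg_log_likelihood(params):
        lam = params[0]
        if lam <= 0:
            return np.inf  # keep the optimizer inside the valid parameter space
        ll = np.sum(np.log(lam) - lam * failures)  # failures contribute log(f(t))
        ll += np.sum(-lam * right_censored)  # right censored data contribute log(R(t))
        return -ll  # minimizing the negative log-likelihood maximizes the log-likelihood

    initial_guess = [0.1]  # step 1 - would normally come from Least Squares Estimation
    result = minimize(neg_log_likelihood, initial_guess, method='Nelder-Mead')  # steps 2 to 4
    print(result.x[0])  # the fitted value of lambda

With more parameters the structure stays the same; only the log-likelihood function and the number of values being optimized change.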

An example using the Exponential Distribution
"""""""""""""""""""""""""""""""""""""""""""""

Let's say we have some failure times: t = [27, 64, 3, 18, 8]

We need an initial estimate for the model parameter (:math:`\lambda`) which we would typically get using Least Squares Estimation. For this example, let's start with 0.1 as our first guess for :math:`\lambda`.

For each of these values, we need to calculate the value of the PDF (with the given value of :math:`\lambda`).

Exponential PDF: :math:`\lambda {\rm e}^{-\lambda t}`

Exponential Log-PDF: :math:`\ln(\lambda)-\lambda t`

Now we substitute in :math:`\lambda=0.1` and :math:`t = [27, 64, 3, 18, 8]`

:math:`\ln L(\lambda=0.1|t=[27, 64, 3, 18, 8]) = (\ln(0.1)-0.1 \times 27) + (\ln(0.1)-0.1 \times 64) + (\ln(0.1)-0.1 \times 3) + (\ln(0.1)-0.1 \times 18) + (\ln(0.1)-0.1 \times 8) = -23.512925`
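
This hand calculation can be verified with a couple of lines of numpy:

.. code:: python

    import numpy as np

    t = np.array([27, 64, 3, 18, 8])
    log_likelihood = np.sum(np.log(0.1) - 0.1 * t)
    print(log_likelihood)  # approximately -23.512925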

Here's where the optimization part comes in. We need to vary :math:`\lambda` until we maximize the log-likelihood.
The following graph shows how the log-likelihood varies as lambda varies.

.. image:: images/LL_range.png

This was produced using the following Python code:

.. code:: python

    import matplotlib.pyplot as plt
    import numpy as np

    data = np.array([27, 64, 3, 18, 8])  # failure times
    lambda_array = np.geomspace(0.01, 0.1, 100)  # range of lambda values to evaluate
    LL = []
    for l in lambda_array:
        loglik = np.log(l) - l * data  # log-PDF evaluated at each failure time
        LL.append(loglik.sum())  # sum of the log-likelihoods for this value of lambda
    plt.plot(lambda_array, LL)
    plt.xlabel(r'$\lambda$')
    plt.ylabel('Log-likelihood')
    plt.title(r'Log-likelihood over a range of $\lambda$ values')
    plt.show()

The optimization process can be done in Python (using scipy) or in Excel (using Solver), or a variety of other software packages. It could even be done by hand, though this is not only tedious, but also limited to single parameter distributions.
In the next section, we will look at how the optimization process becomes much harder when there are 2 or more parameters that need to be optimized simultaneously.
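
Staying with the Python option for now, a minimal sketch of the scipy route (shown only as an illustration; this is not the routine used internally by `reliability`) is:

.. code:: python

    import numpy as np
    from scipy.optimize import minimize_scalar

    data = np.array([27, 64, 3, 18, 8])

    def neg_log_likelihood(lam):
        return -np.sum(np.log(lam) - lam * data)  # negative so that minimizing maximizes

    result = minimize_scalar(neg_log_likelihood, bounds=(0.001, 1), method='bounded')
    print(result.x)  # approximately 0.0417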

So, using the above method, we see that the maximum for the log-likelihood occurred when lambda was around 0.041.
We can check the value using `reliability` like this:

.. code:: python

    from reliability.Fitters import Fit_Exponential_1P
    import matplotlib.pyplot as plt

    data = [27, 64, 3, 18, 8]
    Fit_Exponential_1P(failures=data)
    plt.show()

    '''
    Results from Fit_Exponential_1P (95% CI):
    Analysis method: Maximum Likelihood Estimation (MLE)
    Optimizer: TNC
    Failures / Right censored: 5/0 (0% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
       Lambda       0.0416667       0.0186339 0.0173428  0.100105
     1/Lambda              24         10.7331   9.98947   57.6607

    Goodness of fit    Value
     Log-likelihood -20.8903
               AICc  45.1139
                BIC    43.39
                 AD  2.43793
    '''

.. image:: images/MLE_expon.png
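
As a side note (this derivation is not part of the `reliability` output above), for the single parameter Exponential Distribution with no censored data, the maximum of the log-likelihood can also be found analytically by setting its derivative to zero, which confirms the fitted value:

.. math::

    \ln L(\lambda) = \sum_{i=1}^{n}\left(\ln(\lambda) - \lambda t_i\right) = n \ln(\lambda) - \lambda \sum_{i=1}^{n} t_i

.. math::

    \frac{d \ln L}{d \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i} = \frac{5}{120} = 0.0416667

This matches the value of Lambda reported by Fit_Exponential_1P.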

An example using the Weibull Distribution
"""""""""""""""""""""""""""""""""""""""""

Let's use a new dataset that includes both failures and right censored values.

failures = []
right_censored = []

Once again, we need an initial estimate for the model parameters, and for that we would typically use Least Squares Estimation.
For the purposes of this example, we will use an initial guess of :math:`\alpha = 20` and :math:`\beta = 2`.
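
In the meantime, the general shape of the two parameter optimization can be sketched with scipy. This is only an illustration (not the method used internally by `reliability`), and the failure and right censored values below are hypothetical placeholders since the dataset above has not been filled in yet:

.. code:: python

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import weibull_min

    # hypothetical placeholder data
    failures = np.array([17.0, 21.0, 28.0, 33.0, 45.0])
    right_censored = np.array([50.0, 50.0, 50.0])

    def neg_log_likelihood(params):
        alpha, beta = params  # scale (alpha) and shape (beta) of the Weibull Distribution
        if alpha <= 0 or beta <= 0:
            return np.inf  # invalid parameters
        ll = np.sum(weibull_min.logpdf(failures, beta, scale=alpha))  # failures use log(f(t))
        ll += np.sum(weibull_min.logsf(right_censored, beta, scale=alpha))  # right censored use log(R(t))
        return -ll

    initial_guess = [20, 2]  # alpha, beta - would normally come from Least Squares Estimation
    result = minimize(neg_log_likelihood, initial_guess, method='Nelder-Mead')
    print(result.x)  # the fitted values of alpha and beta

Note that scipy's weibull_min uses the shape parameter c for :math:`\beta` and scale for :math:`\alpha`.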

The rest of this will be written soon.
Binary file added docs/images/LL_range.png
Binary file added docs/images/MLE_expon.png
13 changes: 7 additions & 6 deletions reliability/Utils.py
@@ -96,10 +96,11 @@ def round_to_decimals(number, decimals=5, integer_floats_to_ints=True):
Notes
-----
Examples (with decimals = 5):
1234567.1234567 ==> 1234567.12345
0.0001234567 ==> 0.00012345
1234567 ==> 1234567
0.00 ==> 0
- 1234567.1234567 ==> 1234567.12345
- 0.0001234567 ==> 0.00012345
- 1234567 ==> 1234567
- 0.00 ==> 0
"""

if np.isfinite(number): # check the input is not NaN
@@ -5434,7 +5435,7 @@ class MLE_optimization:
Not all of the above returns are always returned. It depends on which model
is being used.
I the MLE method fails then the initial guess (from least squares) will be
If the MLE method fails then the initial guess (from least squares) will be
returned with a printed warning.
"""

@@ -5876,7 +5877,7 @@ class ALT_MLE_optimization:
Not all of the above returns are always returned. It depends on which model
is being used.
I the MLE method fails then the initial guess (from least squares) will be
If the MLE method fails then the initial guess (from least squares) will be
returned with a printed warning.
"""

