Skip to content

Commit

Permalink
updating docs
Browse files Browse the repository at this point in the history
  • Loading branch information
MatthewReid854 committed Oct 13, 2021
1 parent e0aa12e commit 6c6a5b5
Show file tree
Hide file tree
Showing 9 changed files with 158 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/Changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ Version 0.7.0 has a few really useful enhancements. The first of these is the ad
**Other**

- Changed the method used by curve_fit within least_squares. Previously was 'dogleg' which was very slow. Changed to 'trf'. This significantly speeds up the location shifted distributions (Weibull_3P, etc.)
- Changed the group splitting algorithm used in Fit_Weibull_Mixture and Fit_Weibull_CR. The new method is more robust and provides better a better initial guess of the parameters for MLE.
- Changed the group splitting algorithm used in Fit_Weibull_Mixture and Fit_Weibull_CR. The new method is more robust and provides a better initial guess of the parameters for MLE.
- Completed the reformatting of the API docs for all the ALT_Fitters. Still need to do this for the Convert_data, Datasets, PoF, Utils modules. Reformatted API docs for these remaining modules will be part of a future release.

**Version: 0.6.0 --- Released: 23 July 2021**
Expand Down
156 changes: 155 additions & 1 deletion docs/How are the plotting positions calculated.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,158 @@
How are the plotting positions calculated
'''''''''''''''''''''''''''''''''''''''''

This is a placeholder for a theory document which will be written soon.
When we want to fit a probability distribution to a dataset (such as failure times), there are a variety of methods we can use.
The most popular of these methods are Least Squares estimation (LS) and Maximum Likelihood Estimation (MLE).
As a prerequisite to Least Squares Estimation, we need an estimate of the CDF (y-values) for a given dataset (x-values).
Once we have both the x-values and the y-values we can plot the points (x,y) on a graph.
These are called the plotting positions.

There are a variety of different algorithms for obtaining the plotting positions, but the most popular is the rank adjustment method which will be described in detail below.
To introduce the algorithm, we will start with complete data (ie. no censoring) and then we will see how the algorithm needs to be modified when we have censored data.

Rank adjustment for complete data
"""""""""""""""""""""""""""""""""

The plotting positions algorithm for complete data is as follows:

1. sort the data in ascending order
2. create a column (i) for the rank from 1 to n.
3. estimate the CDF using :math:`y=\frac{i-a}{n+1-2a}`.

Where "n" is the number of items (len(x)) and "a" is the heuristic constant. For this example we will let a = 0.3 which will give Benard's approximation of the median rank plotting positions (the default in most software). Other heuristics are discussed below.
Something you may notice about the formula for y is that it is independent of x. You will always obtain the same y values for any array of x values of the same length.

Let's do an example using the dataset x = [25, 43, 53, 65, 76, 86, 95, 115, 132, 150]

.. image:: images/plotting_positions_1.PNG

We can now plot the x and y values to obtain the plotting positions as shown in the image below.
The dashed blue line is a Weibull_2P distribution that has been fitted to the data.
This is just for illustrative purposes to show that the empirical CDF (the calculated y-values) and the CDF of the fitted model should roughly align.

.. image:: images/plotting_positions_2.png

Rank adjustment for censored data
"""""""""""""""""""""""""""""""""

The algorithm above provides the rank (i) simply by using the item number (1 to n) when the x-values are sorted.
When we have right censored data, the ranks need to be adjusted using a few modifications to the original algorithm.
The rank adjustment algorithm for right censored data is as follows:

1. sort the data in ascending order
2. create a column (i) for the rank from 1 to n.
3. create a column (m) of the reverse rank from n to 1.
4. calculate the adjusted rank as :math:`j = j_{i-1}+\frac{n+1-j_{i-1}}{1+m}`. If the first item is a failure, then the adjusted rank of the first failure is j = 1. If the first item is not a failure, the the adjusted rank of the first failure is :math:`j=\frac{number of leading censored values}{n - 1}`. Leave the rows with censored items blank.
5. estimate the CDF using :math:`\frac{j-a}{n+1-2a}`.

Let's do an example using the dataset x = [150, 340+, 560, 800, 1130+, 1720, 2470+, 4210+, 5230, 6890]. In this dataset the values with + are right censored.

.. image:: images/plotting_positions_3.PNG

You can check this using Python like this:

.. code:: python
from reliability.Probability_plotting import plotting_positions
failures = [150,560,800,1720,5230,6890]
right_censored = [340,1130,2470,4210]
x,y=plotting_positions(failures=failures,right_censored=right_censored)
print('x =',x)
print('y =',y)
'''
x = [ 150 560 800 1720 5230 6890]
y = [0.06730769 0.1741453 0.28098291 0.40562678 0.61336657 0.82110636]
'''
We can now plot the x and y values to obtain the plotting positions as shown in the image below.
The dashed blue line is an Exponential_1P distribution that has been fitted to the data.
This is just for illustrative purposes to show that the empirical CDF (the calculated y-values) and the CDF of the fitted model should roughly align.
Note that only the failures are plotted as the right censored data does not have an empirical estimate for the CDF.

.. image:: images/plotting_positions_4.png

Plotting heuristics
"""""""""""""""""""

The plotting positions alogrithm uses the formula :math:`y=\frac{i-a}{n+1-2a}`.
We can set the heuristic constant "a" to be any value from 0 to 1 and we will get different estimates.
Some of these are better than others, but the most popular is a = 0.3 (Benard's approximation of the median ranks (typically just called "median rank")) as this is generally the most accurate.

Published literature has been produced on the following Heuristics:

+-------------------------------+------------+
| Method | a |
+===============================+============+
| Blom | 0.375 |
+-------------------------------+------------+
| Benard (Median) | 0.3 |
+-------------------------------+------------+
| Hazen (Modified Kaplan Meier) | 0.5 |
+-------------------------------+------------+
| Herd-Johnson (Mean) | 0 |
+-------------------------------+------------+
| Modal | 1 |
+-------------------------------+------------+
| Beard | 0.31 |
+-------------------------------+------------+
| Gringorten | 0.44 |
+-------------------------------+------------+
| Larsen | 0.567 |
+-------------------------------+------------+
| One-Third | 1/3 |
+-------------------------------+------------+
| Cunane | 0.4 |
+-------------------------------+------------+

There is another modification to the :math:`y=\frac{i-a}{n+1-2a}` formula to make it :math:`y=\frac{i-a}{n+b}` which allows "b" to be independent of "a".
The Kaplan Meier method uses this formula with a=0 and b=0 (making it :math:`y=\frac{i}{n}`).
The `Filliben estimate <https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot#Filliben's_estimate>`_ also uses this method with further modifications to the first and last items of the CDF.

The formula of :math:`y=\frac{i-a}{n+1-2a}` is not the only way to obtain plotting positions. There are other methods involving `Beta and F distributions <https://www.reliawiki.com/index.php/Parameter_Estimation#Beta_and_F_Distributions_Approach>`_.

Within reliability, the heuristic constant "a" is accepted for all the probability plots as well as in the `Nonparametric.RankAdjustment <https://reliability.readthedocs.io/en/latest/Rank%20Adjustment.html>`_ method. The median ranks method is generally the default for most software (including in Reliasoft and MINITAB).

Transformations for PDF, SF, HF, CHF
""""""""""""""""""""""""""""""""""""

The algorithms described above provide the empirical estimate of the CDF.
With some simple `transformations <https://reliability.readthedocs.io/en/latest/Equations%20of%20supported%20distributions.html#relationships-between-the-five-functions>`_ it is possible to obtain the empirical estimate of the SF and CHF.
Less commonly (but still mathematically possible) we can obtain the empirical estimate of the PDF and HF.
As you can see in the image below, the PDF and HF do not form smooth curves due to the need to take the derivative of a non-continuous function.
The following example illustrates how plot_points can be used to generate a scatterplot of the plotting positions for any of the five functions.
The Weibull distribution used to generate the data is also overlayed for comparison.

.. code:: python
from reliability.Distributions import Weibull_Distribution
from reliability.Probability_plotting import plot_points
import matplotlib.pyplot as plt
dist = Weibull_Distribution(alpha=100,beta=2)
data = dist.random_samples(1000,seed=1)
functions = ['PDF','CDF','SF','HF','CHF']
i = 0
for function in functions:
plt.subplot(151+i)
if function == 'PDF':
dist.PDF()
elif function == 'CDF':
dist.CDF()
elif function == 'SF':
dist.SF()
elif function == 'HF':
dist.HF()
elif function == 'CHF':
dist.CHF()
plot_points(failures=data,func=function)
plt.title(function)
i+=1
plt.gcf().set_size_inches(12,4)
plt.tight_layout()
plt.show()
.. image:: images/plotting_positions_5.png

Binary file added docs/images/plotting_positions_1.PNG
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/plotting_positions_2.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/plotting_positions_3.PNG
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/plotting_positions_4.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/plotting_positions_5.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion reliability/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
from reliability import Convert_data

__title__ = 'reliability'
__version__ = "0.7.0"
__version__ = "0.8.0"
__description__ = 'A Python library for reliability engineering'
__url__ = 'https://reliability.readthedocs.io/en/latest/index.html'
__author__ = 'Matthew Reid'
Expand Down
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

setuptools.setup(
name="reliability",
version="0.7.0",
version="0.8.0",
description="Reliability Engineering toolkit for Python",
author="Matthew Reid",
author_email="alpha.reliability@gmail.com",
Expand Down

0 comments on commit 6c6a5b5

Please sign in to comment.