
A tool to investigate the statistical distribution of univariate observations. Plots skewness and kurtosis (together with bootstrapped observations) on a Pearson diagram, and compares them to common statistical distributions.

SchildCode/PearsonPlot


PearsonPlot

A tool to investigate the statistical distribution of univariate observations. It plots the skewness vs. kurtosis of your data (together with bootstrapped samples) on a diagram for comparison with common statistical distributions. Use it to investigate which statistical model might describe your data. Three variants of the plot can be generated, depending on the range of skewness and kurtosis in your data.

The workbook is a further development of Karl Pearson's diagram (www.wikipedia.org/wiki/Pearson_distribution), which was revived by Cullen & Frey. This workbook vastly improves upon the "Cullen and Frey plot" in R by comparing the observations with many more common distributions: those with fixed skewness and kurtosis (shown as a point in the chart, e.g. Normal, Exponential) and those with varying skewness and kurtosis (shown as a line in the chart, e.g. Log-normal, Weibull).
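Each reference point or line on such a diagram follows from closed-form moment formulas. The Python sketch below is not the workbook's VBA code; the function names and the selection of distributions are illustrative assumptions, and the workbook's own list may differ. It computes fixed points and traces one-parameter families as lines. For Student's t it also shows how skewness and kurtosis stop existing below a threshold number of degrees of freedom, which is relevant to the heavy-tail caveat below.

```python
import math

# Fixed points: (skewness, excess kurtosis) of distributions whose shape has
# no free parameter (standard textbook values).
FIXED_POINTS = {
    "Normal": (0.0, 0.0),
    "Uniform": (0.0, -1.2),
    "Logistic": (0.0, 1.2),
    "Laplace": (0.0, 3.0),
    "Exponential": (2.0, 6.0),
}


def gamma_dist(k):
    """Gamma distribution with shape k > 0 (scale does not affect shape)."""
    return 2.0 / math.sqrt(k), 6.0 / k


def lognormal(sigma):
    """Log-normal distribution; sigma > 0 is the standard deviation of log(X)."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0), w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 6.0


def weibull(k):
    """Weibull distribution with shape k > 0, using raw moments Gamma(1 + i/k)."""
    g1, g2, g3, g4 = (math.gamma(1.0 + i / k) for i in range(1, 5))
    var = g2 - g1 ** 2
    skew = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / var ** 1.5
    kurt = (g4 - 4.0 * g1 * g3 + 6.0 * g1 ** 2 * g2 - 3.0 * g1 ** 4) / var ** 2 - 3.0
    return skew, kurt


def student_t(nu):
    """Student's t with nu > 0 degrees of freedom: nan = undefined, inf = infinite."""
    skew = 0.0 if nu > 3 else math.nan      # third moment exists only for nu > 3
    if nu > 4:
        kurt = 6.0 / (nu - 4.0)
    elif nu > 2:
        kurt = math.inf                      # finite variance, infinite fourth moment
    else:
        kurt = math.nan                      # variance itself is infinite or undefined
    return skew, kurt


# One-parameter families become lines when their shape parameter is swept.
LINES = {
    "Gamma": [gamma_dist(k) for k in (0.5, 1, 2, 5, 10, 50)],
    "Log-normal": [lognormal(s) for s in (0.05, 0.1, 0.2, 0.3, 0.4)],
    "Weibull": [weibull(k) for k in (1, 1.5, 2, 3.6, 5, 10)],
    "Student's t": [student_t(nu) for nu in (5, 6, 10, 30, 100)],
}
```

As a consistency check, Gamma and Weibull with shape 1 both reduce to the Exponential point (2, 6).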

A note of caution

  • Skewness and kurtosis statistics depend strongly on the sample size. Even hundreds of observations do not give a reliable estimate of the true population skewness and kurtosis. Therefore, the proximity of the observed skewness and kurtosis (the black circle on the diagrams below) to one of the standard distributions in the diagrams is only indicative, and might be coincidental. This issue is mitigated by the bootstrapped samples (the red dots in the diagrams), which show the plausible region of the true population skewness and kurtosis. The 1500 bootstrapped-sample dots indicate roughly the 99.9% confidence bounds of the population skewness and kurtosis.
  • Many statistical distributions with heavy tails (e.g. Cauchy, Lévy, Student's t with ν ≤ 4 degrees of freedom) have undefined or infinite skewness or kurtosis, irrespective of the number of observations. This means that individual random extreme values can throw off the skewness or kurtosis of the whole data set. Such "undefined" skewness or kurtosis can be seen as a vertical or horizontal spread of bootstrapped observations along the border in Chart 3.
  • Provided you are aware of these limitations, these diagrams are very useful. For example, they helped me discover that the thermal conductivity of mineral-wool insulation (which I initially assumed to be log-normal) is actually log-logistic, as it is a mixture of two components (air and fibres), not a homogeneous material.
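The bootstrap idea behind the red dots can be sketched in Python. This is a minimal illustration, not the workbook's VBA macro: it assumes plain moment-based estimators (g1 for skewness, g2 for excess kurtosis) and simple resampling with replacement, and the function names are hypothetical. Only the default of 1500 resamples comes from the text above.

```python
import random
import statistics


def skew_kurt(xs):
    """Moment-based sample skewness g1 and excess kurtosis g2 (no bias correction)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def bootstrap_skew_kurt(xs, n_boot=1500, seed=None):
    """Resample xs with replacement n_boot times; return one (skew, kurt) pair per resample."""
    rng = random.Random(seed)
    n = len(xs)
    points = []
    for _ in range(n_boot):
        resample = rng.choices(xs, k=n)
        if max(resample) > min(resample):  # skip constant resamples (zero variance)
            points.append(skew_kurt(resample))
    return points
```

Because every resample is itself a valid (empirical) distribution, each bootstrapped point satisfies excess kurtosis ≥ skewness² − 2, so the cloud never enters the impossible region; with heavy-tailed data it simply stretches further along the axes.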

User instructions

This workbook contains a Visual Basic for Applications (VBA) macro that analyzes the statistical distribution of your data and generates the plots, so you must enable macros the first time you open the workbook.

  • STEP 1: Paste your observations into column A of sheet "InputData". There is no limit to the number of values.
  • STEP 2: The data is analyzed and plotted automatically when you click on the tab for one of the plots (Chart1, Chart2 or Chart3). The sample skewness and kurtosis of your data are plotted as a black circle, and the bootstrapped values as small red dots, to show the range of possible values of the population skewness and kurtosis.
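The README does not state which estimator the macro uses for the black circle. If it follows Excel's built-in SKEW and KURT worksheet functions (an assumption, not confirmed by the source), the bias-adjusted formulas look like this in Python; the function names are hypothetical.

```python
import statistics


def excel_skew(xs):
    """Bias-adjusted sample skewness, same formula as Excel's SKEW (needs n >= 3)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Bias-adjusted sample excess kurtosis, same formula as Excel's KURT (needs n >= 4)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    total = sum(((x - mean) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
```

For the values 1, 2, 3, 4, 5 these give a skewness of 0 and an excess kurtosis of -1.2. With small samples the bias adjustment shifts the point noticeably compared with plain moment estimators.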

Output options

Chart 1
Chart 1: Distributions with positive skewness in the range 0 to +2 (plotted as skewness squared), and excess kurtosis up to +6.
(The example observations plotted on this chart are normally distributed.)


Chart 2
Chart 2: Distributions with skewness in the range -3 to +3, and excess kurtosis up to +8.
(The example observations plotted on this chart are from a negative-exponential distribution.)


Chart 3
Chart 3: This chart can show all valid values of skewness (-∞ to +∞) and excess kurtosis (-2 to +∞). Both parameters are squashed into the range ±1. Skewness-squared is subtracted from excess kurtosis before squashing, which turns distributions whose excess kurtosis equals skewness² plus a constant into horizontal lines (e.g. Poisson, Beta with α→0). The "impossible region" then lies below a horizontal line (squashed excess kurtosis -0.5), and so can be omitted from the chart.
(The example observations plotted on this chart are from a power-law distribution, which rarely occurs in the physical world, and coincidentally have the same kurtosis as the discrete Poisson distribution.)
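The README does not give the squashing function used for Chart 3. The Python sketch below uses x/(|x| + c) as a hypothetical stand-in: c = 2 on the kurtosis axis so that the edge of the impossible region lands at -0.5 as described, and c = 1 on the skewness axis (an arbitrary choice). It relies only on the general bound excess kurtosis ≥ skewness² − 2, and checks that the Poisson family collapses onto a horizontal line.

```python
import math


def squash(x, c):
    """Hypothetical squashing: maps any real x into (-1, 1), with x = c mapped to 0.5."""
    return x / (abs(x) + c)


def chart3_coords(skew, excess_kurt):
    """Hypothetical Chart 3 coordinates: squashed skewness and squashed (excess kurtosis - skewness**2)."""
    return squash(skew, 1.0), squash(excess_kurt - skew ** 2, 2.0)


# Every distribution obeys excess_kurt >= skew**2 - 2, so the edge of the
# impossible region maps to y = squash(-2, 2) = -0.5 for every skewness.
BOUNDARY_Y = squash(-2.0, 2.0)

# Poisson(lam): skewness 1/sqrt(lam), excess kurtosis 1/lam = skewness**2,
# so the whole family collapses onto the line y = 0 (up to rounding).
poisson_points = [chart3_coords(1 / math.sqrt(lam), 1 / lam) for lam in (0.5, 1.0, 4.0, 100.0)]
```

Any odd, monotonic squashing function would behave similarly; the specific form here was chosen only because it reproduces the -0.5 boundary stated above.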

Licence

GPL3

Author and copyright

peter.schild@oslomet.no
