2.2.4 Data Distributions

GregLawson edited this page Sep 21, 2011 · 6 revisions

Comparing a column's values with the probability distribution they are expected to follow can help verify that the column has been converted correctly. Outliers should be examined for possible parsing errors.
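As a minimal sketch of the outlier check, the Python below flags values outside Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The IQR rule, the function name `flag_outliers`, and the sample column are illustrative assumptions, not something this page prescribes. A flagged value is only a candidate for a parsing error and still needs to be inspected by hand.

```python
import statistics
from typing import List, Sequence


def flag_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Return indices of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points (Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical converted column: 1050.0 looks like a lost decimal point (10.50).
column = [10.2, 9.8, 10.5, 10.1, 9.9, 1050.0, 10.0, 10.3]
print(flag_outliers(column))
```

The IQR rule is used here because, unlike a z-score cut-off, the fences are not themselves dragged outward by the very outliers being searched for.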

  • Benford's Law (http://en.wikipedia.org/wiki/Benford's_law) - for checking whether data look like realistic, naturally occurring values. Open question: what is the binary version of this? Values of one and zero are also very common.
  • Normal distribution - error distributions, model residuals, measurement errors
  • Uniform distribution - round-off errors
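A rough sketch of the Benford's Law check, assuming Python 3 and only the standard library: it compares the observed proportions of each first significant digit in a column with Benford's expected proportions log10(1 + 1/d), and summarizes the gap as a mean absolute deviation (MAD). The function names and the MAD summary are hypothetical choices for illustration, not an established tool. The check is only meaningful for data spanning several orders of magnitude, and a large MAD is a reason to look closer, not proof of bad data.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def benford_expected() -> Dict[int, float]:
    """Benford's Law: P(first significant digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """First significant digit of a nonzero finite number, read from scientific notation.

    The 'e' format rounds to 7 significant digits, so 9.9999999 counts as 1.
    """
    return int(f"{abs(x):e}"[0])


def first_digit_proportions(values: Iterable[float]) -> Dict[int, float]:
    """Observed proportion of each first digit 1..9, ignoring zeros and non-finite values."""
    digits = [first_digit(v) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        raise ValueError("no nonzero finite values to test")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_mad(values: Iterable[float]) -> float:
    """Mean absolute deviation between observed and Benford first-digit proportions."""
    observed = first_digit_proportions(values)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


powers_of_two = [2 ** n for n in range(1000)]  # known to follow Benford's Law closely
uniform_digits = list(range(1, 10)) * 100      # every first digit equally likely
print(round(benford_mad(powers_of_two), 4))
print(round(benford_mad(uniform_digits), 4))
```

In this usage, the powers of two should give a much smaller MAD than the column whose first digits are uniformly distributed; where to draw the line between the two is left as a judgment call for the data at hand.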