Non-parametric distributions #29

cmaumet · 2014-11-21T16:38:45Z

This is a proposal to add two terms to describe non-parametric distributions:

non-parametric distribution: "Probability distribution estimated empirically on the data without assumptions on the shape of the probability distribution."
non-parametric symmetric distribution: "Probability distribution estimated empirically on the data assuming only symmetry of the probability distribution."
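
As an illustration of the two definitions above, here is a minimal sketch of one way such estimates could be built from data. It is not part of the proposal: the function names `empirical_cdf` and `symmetric_empirical_cdf`, the reflection-based symmetrisation, and the sample values are all assumptions chosen for illustration.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Estimate F(x) as the fraction of observations <= x (no shape assumed)."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        return bisect_right(xs, x) / n

    return cdf


def symmetric_empirical_cdf(sample, center=0.0):
    """Same estimate, assuming only symmetry about `center`.

    Hypothetical construction: pool each observation with its reflection
    about `center`, then take the ordinary empirical CDF of the pooled data.
    """
    reflected = [2 * center - x for x in sample]
    return empirical_cdf(list(sample) + reflected)


data = [-1.2, 0.3, 0.8, 2.5]  # made-up observations
F = empirical_cdf(data)
G = symmetric_empirical_cdf(data, center=0.0)
```

Both estimates are step functions, which is why the estimated distribution is discrete even when nothing is assumed about the underlying one.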

This proposal was drawn up with @nicholst. The definitions were also discussed with @khelm at incf-nidash/nidm-specs#191.

A few comments:

I was unsure whether those terms should go under discrete probability distribution or continuous probability distribution. The estimated non-parametric distribution is discrete, but there is no assumption on the discreteness of the underlying true distribution. @nicholst: would you like to comment on this?
I did not know how to set a new numerical identifier, so I used STAT_XXXX and STATO_YYYY. I am happy to update these if you can let me know how to proceed (@agbeltran, @proccaserra)?

Any comment is very welcome! Thank you.

nicholst · 2014-11-22T20:20:05Z

Thanks @cmaumet. Quick comment on discrete/continuous:

A nonparametric distribution can be discrete or continuous. Can there be 3 terms? "Discrete nonparametric" and "continuous nonparametric", and then "nonparametric" the parent of these two?

In my experience, many nonparametric test procedures are agnostic to whether the data are continuous or discrete (though there is a class of methods that work exclusively with discrete, i.e. categorical, data).

proccaserra · 2014-11-23T21:35:14Z

@cmaumet @nicholst: these terms could be added, but we need to watch out for one thing when developing STATO, namely avoiding an asserted multiple-parent hierarchy.
Then two things:
- distribution symmetry: we could use the notion of 'skewness' to formally define the class and set the axiom.
- non-parametric distribution: I was consulting the following link, http://reference.wolfram.com/language/guide/NonparametricStatisticalDistributions.html, to clarify how non-parametric distributions are generated and how to define them. Is this representative of what you need?

Finally, @cmaumet, we need to get back to you regarding the issue of identifiers and possibly agree on a process, as diffs on OWL files can be painful.

Many thanks for the input!

nicholst · 2014-11-23T23:20:01Z

About http://reference.wolfram.com/language/guide/NonparametricStatisticalDistributions.html, yes, that's a good description of what we're talking about.

On "multiple parent hierarchy" and "axioms", this is where I'm beyond my comfort zone :)

nicholst · 2014-11-24T09:58:33Z

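I realise I didn't respond to @proccaserra's comment on symmetry. Indeed, a symmetric distribution has zero skew (strictly, zero skew is necessary for symmetry but not sufficient, since an asymmetric distribution can also have zero skew).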

About asserting multiple parent hierarchies... does this mean that from a bucket of concepts

- discrete nonparametric
- continuous nonparametric
- symmetric nonparametric

you would model this as one concept, "nonparametric distribution", with different attributes?

proccaserra · 2014-11-24T10:14:13Z

@nicholst, this is indeed what we are looking at. Also, we were looking at the way these 'non-parametric distributions' differ from the 'parametric' ones, and it seems that we could model the difference by stating that these distributions are computed/estimated from the data (all data, binned data, censored data, kernel), whereas the parametric ones are not.

nicholst · 2014-11-24T10:44:30Z

@proccaserra... hmmm, well, in practice, parametric distributions are also estimated from the data, it's just there are parameters that define the distribution. That is, there is no one "Gaussian" distribution, there are an infinite number conveniently indexed by just two values, mean and variance.

But I don't want to turn this into a theoretical counting-the-number-of-angels-on-the-head-of-a-pin exercise. I see two reasons to represent distributions in STATO: Model assumptions and test statistic sampling distributions.

Every statistical procedure makes some sort of assumptions about the data. These assumptions take the form of an assertion that the data follow a given distribution (and also about the dependency, or lack thereof, of multiple observations; but that's a different issue). If that distribution can be described by a finite number of values, we call it "parametric"; if an infinite or uncountable number of values is needed to describe the distribution, we call it "nonparametric". Most models assume Gaussianity; models for count data typically assume Poisson, Binomial or Negative Binomial distributions.

A "Hypothesis Test" procedure produces a test statistic. That test statistic typically follows one of a small number of named distributions, like the standard Normal (aka Gaussian) (which, for once, is just one single entity with no free parameters: mean 0, variance 1), or a t distribution (which does have a special type of parameter, the "degrees of freedom").

A model makes assumptions on the data; when the data assumptions are satisfied, we can trust that the test statistic produced will follow the usual test statistic distribution. But the distribution of the data and the distribution of the test statistic are different things. E.g. a two-sample t-test assumes the data are Normally distributed, independent and have a common variance; given those assumptions, the test statistic it produces will follow a Student's t distribution with n1+n2-2 degrees of freedom.
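
To make the distinction concrete, the following is a minimal sketch of the pooled-variance two-sample t statistic and its degrees of freedom. The function name `pooled_t_statistic` and the sample values are assumptions for illustration only; the formula is the standard equal-variance one.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2  # degrees of freedom of the reference t distribution
    # Pooled estimate of the common variance assumed by the model.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


t, df = pooled_t_statistic([1, 2, 3], [2, 3, 4])  # made-up data
```

The Normality assumption concerns the inputs `a` and `b`; the Student's t distribution with `df` degrees of freedom describes the returned `t` under those assumptions.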

A nonparametric two-sample permutation test assumes the data are independent and identically distributed from some unspecified (i.e. "nonparametric") distribution. The test statistic created has no particular named sampling distribution, and thus is also nonparametric.
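
A minimal sketch of such a permutation test, assuming the absolute difference in group means as the test statistic and exact enumeration of all relabellings (practical only for small samples). The function name `permutation_p_value` and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n1, n = len(a), len(a) + len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    n_relabellings = 0
    # Every way of choosing which n1 observations are labelled "group 1".
    for idx in combinations(range(n), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = abs(s1 / n1 - (total - s1) / (n - n1))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        n_relabellings += 1
    return extreme / n_relabellings


p = permutation_p_value([1, 2, 3], [4, 5, 6])  # made-up data
```

The reference distribution here is built from the data themselves by relabelling, rather than looked up from a named family.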

Does this clarify or only muddle?

agbeltran · 2014-11-28T11:03:41Z

@nicholst thanks for all the explanations! It does clarify, thanks.
@cmaumet about the identifiers, we are planning to set up some automatic way to assign them (e.g. URIgen service), but until then, I hope it is OK if we assign the STATO ids.

So, I will merge this PR now, but I'll change the term non-parametric distribution from being a child of discrete probability distribution (http://purl.obolibrary.org/obo/STATO_0000117) (as in the commit/PR) to a child of probability distribution (http://purl.obolibrary.org/obo/STATO_0000225).

I will also assign the STATO identifiers next.

agbeltran · 2014-11-28T11:22:00Z

One more point!

@proccaserra suggests keeping non-parametric distribution and also adding symmetric distribution, but not the pre-coordinated term non-parametric symmetric distribution (as that term can be composed from the other two).

@cmaumet @nicholst it would be great to discuss your use cases for these terms further.

So, I will assign a STATO ID for non-parametric distribution only. I will keep non-parametric symmetric distribution for now (without ID) until we discuss further. I will open another item for that discussion. Thanks!

nicholst · 2014-11-28T11:54:22Z

... to keep non-parametric distribution and add also symmetric distribution, but not the pre-coordinated term non-parametric symmetric distribution.

Yup, this makes sense.

add terms for non-parametric distributions

71d30cb

agbeltran added a commit that referenced this pull request Nov 28, 2014

Merge pull request #29 from cmaumet/parametric_dist

84411b1

Non-parametric distributions

agbeltran merged commit 84411b1 into ISA-tools:dev Nov 28, 2014

agbeltran mentioned this pull request Nov 28, 2014

Non-parametric symmetric distribution #31

Closed

proccaserra added a commit that referenced this pull request Dec 4, 2014

fixing metadata closes #23, closes #29

6e83483

cmaumet mentioned this pull request Dec 8, 2014

Non-parametric statistics incf-nidash/nidm-specs#233

Open
