Non-parametric distributions #29
Thanks @cmaumet. Quick comment on discrete/continuous: a nonparametric distribution can be discrete or continuous. Could there be three terms: "discrete nonparametric" and "continuous nonparametric", with "nonparametric" as the parent of these two? In my experience, many nonparametric test procedures are agnostic to whether the data are continuous or discrete (though there is a class of methods that works exclusively with discrete, i.e. categorical, data).
@cmaumet @nicholst: these terms could be added, but we need to watch out for one thing when developing STATO, namely avoiding an asserted multiple-parent hierarchy. Finally, @cmaumet, we need to get back to you regarding the issue of identifiers and possibly agree on a process, as diffs on OWL files can be painful. Many thanks for the input!
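To make the concern about asserted multiple parents concrete, here is a small Python sketch that flags classes with more than one asserted parent. It is not STATO tooling, and the edge list is a hypothetical example built from terms mentioned in this thread: if "discrete nonparametric distribution" were asserted under both "nonparametric distribution" and "discrete probability distribution", it would end up with two asserted parents.

```python
from collections import defaultdict

# Hypothetical asserted is_a (subclass) edges as (child, parent) pairs,
# built from term labels mentioned in this thread; not actual STATO content.
ASSERTED_IS_A = [
    ("nonparametric distribution", "probability distribution"),
    ("discrete nonparametric distribution", "nonparametric distribution"),
    ("continuous nonparametric distribution", "nonparametric distribution"),
    ("discrete nonparametric distribution", "discrete probability distribution"),
]


def classes_with_multiple_parents(edges):
    """Return {class: sorted parents} for every class with more than one asserted parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


flagged = classes_with_multiple_parents(ASSERTED_IS_A)
```

One common remedy in OWL ontologies is to keep a single asserted parent and let a reasoner infer the second one from a defining axiom.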
On "multiple parent hierarchy" and "axioms", this is where I'm beyond my …

Thomas Nichols, PhD
Web: http://warwick.ac.uk/tenichols

On Sun, Nov 23, 2014 at 9:35 PM, Philippe Rocca-Serra <…>
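I realise I didn't respond to @proccaserra's comment on symmetry. Indeed, a symmetric distribution has zero skew (whenever its third moment exists); the converse does not hold in general, since some asymmetric distributions also have zero skew. About asserting multiple parent hierarchies... does this mean that from a bucket of concepts …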
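A symmetric distribution always has zero skew (when the third moment exists), but zero skew does not by itself guarantee symmetry. This Python sketch checks both facts with exact arithmetic; the helper names are hypothetical, and since skewness is the third central moment divided by the variance to the power 3/2, checking the third central moment is enough when the variance is positive. The second distribution is a constructed counterexample: asymmetric, yet with zero skewness.

```python
from fractions import Fraction


def central_moment(values, probs, k):
    """k-th central moment of a finite discrete distribution (exact arithmetic)."""
    mu = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mu) ** k for x, p in zip(values, probs))


def is_symmetric(values, probs):
    """True if the mass at x equals the mass at 2*mean - x for every support point."""
    mu = sum(p * x for x, p in zip(values, probs))
    mass = dict(zip(values, probs))
    return all(mass.get(2 * mu - x, 0) == p for x, p in mass.items())


# Symmetric about 0: the third central moment (hence the skewness) is zero.
sym_values, sym_probs = [-1, 0, 1], [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Asymmetric, yet its mean and third central moment are both exactly zero.
asym_values, asym_probs = [-3, -1, 2], [Fraction(1, 10), Fraction(1, 2), Fraction(2, 5)]
```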
@nicholst, this is indeed what we are looking at. Also, we were looking at how these 'non-parametric distributions' differ from the 'parametric' ones, and it seems that we could model this by saying that these distributions are computed/estimated from the data (all data, binned data, censored data, kernel) whereas the parametric ones are not.
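As an illustration of the "kernel" case, here is a minimal Python sketch of a Gaussian kernel density estimate, in which every observation contributes a bump to the estimated density, so the estimate is carried by the data themselves rather than by a fixed set of parameters. The function name and the fixed bandwidth are assumptions for illustration; choosing the bandwidth is a separate problem.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density that averages a Gaussian kernel centred on each observation."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Usage: the estimate keeps all the data, not a fixed-length parameter vector.
f = gaussian_kde([0.0, 1.0, 5.0], bandwidth=0.5)
step = 0.01
area = sum(f(-10.0 + i * step) for i in range(2501)) * step  # Riemann sum over [-10, 15]
```

The area under the estimate is (numerically) one, as for any probability density.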
@proccaserra... hmmm, well, in practice, parametric distributions are also estimated from the data; it's just that there are parameters that define the distribution. That is, there is no one "Gaussian" distribution; there are an infinite number of them, conveniently indexed by just two values, the mean and the variance. But I don't want to turn this into a theoretical counting-the-number-of-angels-on-the-head-of-a-pin exercise.

I see two reasons to represent distributions in STATO: model assumptions and test statistic sampling distributions.

Every statistical procedure makes some sort of assumption about the data. These assumptions take the form of an assertion that the data follow a given distribution (and also about the dependency, or lack thereof, among multiple observations; but that's a different issue). If that distribution can be described by a finite number of values, we call it "parametric"; if an infinite (possibly uncountable) number of values is needed to describe the distribution, we call it "nonparametric". Most models assume Gaussianity; models for count data typically assume Poisson, Binomial or Negative Binomial distributions.

A "Hypothesis Test" procedure produces a test statistic. That test statistic typically follows one of a small number of named distributions, like the standard Normal (aka Gaussian), which, for once, is just one single entity with no free parameters (mean 0, variance 1), or a t distribution, which does have a special type of parameter, the "degrees of freedom".

A model makes assumptions about the data; when the data assumptions are satisfied, we can trust that the test statistic produced will follow the usual test statistic distribution. But the distribution of the data and that of the test statistic are different. E.g. a two-sample t-test assumes the data are Normally distributed, independent and have a common variance; given those assumptions, the test statistic it produces will follow a Student's t distribution with n1+n2-2 degrees of freedom.
A nonparametric two-sample permutation test assumes the data are independent and identically distributed from some arbitrary (i.e. "nonparametric") distribution. The test statistic it creates has no particular named sampling distribution, and thus its distribution is also nonparametric. Does this clarify or only muddle?
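The contrast between the two tests can be sketched in Python. The function names are hypothetical, and the difference of means is an assumed choice of permutation statistic. The pooled t statistic is referred to a named Student's t distribution with n1+n2-2 degrees of freedom, whereas the permutation test builds the null distribution of its statistic from the data by enumerating every relabelling of the pooled sample. Exact enumeration is only practical for small samples; larger problems usually draw a random subset of permutations.

```python
import itertools
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled (common) variance, and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def permutation_p_value(x, y, tol=1e-12):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    observed = abs(mean(x) - mean(y))
    extreme = total = 0
    for idx in itertools.combinations(range(n), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so ties with the observed statistic survive float rounding.
        if abs(mean(g1) - mean(g2)) >= observed - tol:
            extreme += 1
    return extreme / total


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
t, df = pooled_t(x, y)
p = permutation_p_value(x, y)
```

Here only the observed split and its mirror image are as extreme as the data, so the permutation p-value is 2 out of the 20 possible relabellings.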
@nicholst, thanks for all the explanations! It does clarify. So, I will merge this PR now, but I'll change the two […] term […]. I will also assign the STATO identifiers next.
One more point! @proccaserra suggests keeping […]. @cmaumet @nicholst, it would be great to discuss your use cases for these terms further. So, I will assign a STATO ID for […].
Yup, this makes sense.
This is a proposal to add two terms to describe non-parametric distributions:
non-parametric distribution: "Probability distribution estimated empirically on the data without assumptions on the shape of the probability distribution."
non-parametric symmetric distribution: "Probability distribution estimated empirically on the data assuming only symmetry of the probability distribution."
This proposal was put together with @nicholst. Definitions were also discussed with @khelm at incf-nidash/nidm-specs#191.
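One way to read the two definitions, sketched in Python with hypothetical function names: the first corresponds to the empirical cumulative distribution function, which uses the data and nothing else. For the second, one common construction (assumed here, with the centre of symmetry taken as known, e.g. zero under a null hypothesis) pools the sample with its mirror image, so the estimate is symmetric by construction.

```python
from bisect import bisect_right


def ecdf(data):
    """Empirical CDF: F(t) = (number of observations <= t) / n."""
    xs = sorted(data)
    n = len(xs)

    def F(t):
        return bisect_right(xs, t) / n

    return F


def symmetric_ecdf(data, center):
    """Empirical CDF of the sample pooled with its reflection about `center`
    (an assumed, known centre), so the estimate is symmetric about `center`."""
    reflected = [2 * center - x for x in data]
    return ecdf(list(data) + reflected)


F = ecdf([3.0, 1.0, 2.0])
G = symmetric_ecdf([1.0, 3.0], center=0.0)
```

Apart from the optional symmetry assumption, no shape is imposed: each estimate is a step function determined entirely by the observed values.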
A few comments:
On "discrete probability distribution" or "continuous probability distribution": the estimated non-parametric distribution is discrete, but there is no assumption on the discreteness of the underlying true distribution. @nicholst: would you like to comment on this?
On the identifiers STAT_XXXX and STATO_YYYY: I am happy to update these if you can let me know how to proceed (@agbeltran, @proccaserra).
Any comment is very welcome! Thank you.
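The point about discreteness can be checked directly: whatever distribution generated the data, the empirical estimate places all of its mass on finitely many observed values. In this Python sketch, the helper name, the standard Normal sampling and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def empirical_pmf(data):
    """Empirical distribution as a probability mass function:
    mass k/n on each value observed k times."""
    n = len(data)
    return {value: count / n for value, count in Counter(data).items()}


# Data drawn from a continuous distribution (standard Normal) ...
rng = random.Random(29)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
pmf = empirical_pmf(sample)
# ... still yield a discrete estimate: at most n support points, each with mass at least 1/n.
```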