Contrasts #23

Closed
nicholst opened this issue May 12, 2014 · 18 comments

@nicholst

Thanks for quickly acting on @nicholsn's request for contrast, STATO_0000290. I'm afraid, though, the definition is highly ANOVA-dependent:

A contrast is a data item which is the output of a linear combination of 2
or more factor level means with coefficients that sum to zero.

More generally, contrasts are used for tests of the General Linear Hypothesis. As detailed in the Encyclopedia of Biostatistics, all that a contrast must satisfy is an estimability constraint (which doesn't necessarily correspond to a sum-to-zero constraint; e.g. consider a contrast used to select out a single regression slope coefficient in a GLM implementing an ANCOVA).

Further, from discussions over at NI-DM we've discovered that "contrast" has an ambiguous definition. Wikipedia gives three meanings (in my words):

  • Contrast (Vector) - Fixed and known coefficients that define the weights in the weighted sum of parameter estimates.
  • Contrast (Estimate) - Actual estimate obtained by computing the weighted sum of parameter estimates using the Contrast Vector.
  • Contrast (Variable) - Random variable consisting of the weighted sum of parameter estimators (i.e. random variables corresponding to yet-to-be-observed data).

Of these, I'd call the last of dubious value (except when doing derivations), but in my experience the term "contrast" is often used ambiguously for the first two.

So! I don't know what the best action is. To exactly mimic usage in the wild, I'd say you need two terms, "contrast vector" and "contrast estimate", each of which has "contrast" as a nickname/alias. But I don't know if that's allowed. (For NI-DM, we ended up with "ContrastWeights" and "ContrastMap", basically short for "ContrastEstimateMap", since we always get images/maps of estimates.)

For your consideration, here are definitions of each.

Contrast Estimate:  A linear combination of elements of a parameter 
vector based on values forming a Contrast Vector. 
Contrast Vector: A fixed and known set of values used for defining a 
Contrast Estimate.  In the context of the General Linear Model, a Contrast Vector
must be estimable, i.e., produce a uniquely determined Contrast Estimate.
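
To make the vector/estimate distinction concrete, here is a minimal Python sketch. The parameter values and the function name `contrast_estimate` are made up for illustration and come from neither STATO nor NI-DM. It shows one contrast vector whose weights sum to zero (the ANOVA case) and one that selects a single slope, as in the ANCOVA example above, whose weights do not.

```python
# Hypothetical parameter estimates from an ANCOVA-style GLM:
# [group A mean, group B mean, slope of a covariate]. The numbers are made up.
beta_hat = [2.0, 3.5, 0.8]


def contrast_estimate(contrast_vector, parameter_estimates):
    """Weighted sum of parameter estimates, with weights from the contrast vector."""
    if len(contrast_vector) != len(parameter_estimates):
        raise ValueError("contrast vector and parameter vector differ in length")
    return sum(c * b for c, b in zip(contrast_vector, parameter_estimates))


# ANOVA-style contrast vector: weights sum to zero (group B minus group A).
group_difference = [-1, 1, 0]
# GLM-style contrast vector: selects the slope; weights sum to one, not zero.
slope_only = [0, 0, 1]

print(contrast_estimate(group_difference, beta_hat))  # 1.5
print(contrast_estimate(slope_only, beta_hat))        # 0.8
```

The lists are the "contrast vectors" (fixed, known weights); the printed numbers are the "contrast estimates".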
@proccaserra
Member

@nicholst, great input, thanks again for putting us back on track. We'll need more time, first to do more reading and second to gather more use cases showing how those terms would be used (context, query), in order to justify the additions. Do you have examples you could point us to? bw.

@nicholst
Author

I'm not sure I understand about use cases. Do you mean examples of contrast vectors that don't sum to zero?

For whatever quirks of history, the brain imaging community 'rolls their own' design matrices and contrast vectors to do linear modelling and inference. That's the only reason we're modelling it; here are some book chapters on the G(eneral)LM and contrasts that show how that community uses contrasts.

In R, people touch contrasts only through setting of the default, i.e. options(contrasts = c("contr.treatment", "contr.poly")) or with the function C to set the contrast in situ. But I just saw there is a contrast R package and it is a feature of Frank Harrell's rms. And for kicks, I just found this SPSS example on how to specify contrast vectors.

Does this help at all?

@proccaserra
Member

@nicholst, I meant how you intend to use STATO classes covering those elements for annotation purposes. What is it that you need to tag with a STATO URI, and what kind of SPARQL query would you currently need? Do you want your users to be able to retrieve the contrasts used for their data analysis?
Thanks.

@nicholsn

Here's an example from NI-DM for fMRI results. For an overview diagram generated from the turtle file see this figure.

Say we want to find the location of all the maps generated from some ContrastEstimation Activity. We could use the following SPARQL query:

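PREFIX prov: <http://www.w3.org/ns/prov#>
# The nidm: namespace below is an assumption; use the IRI declared in the Turtle file being queried.
PREFIX nidm: <http://www.incf.org/ns/nidash/nidm#>

SELECT ?location
FROM <https://provenance.ecs.soton.ac.uk/store/documents/2299.ttl>
WHERE {?activity a nidm:ContrastEstimation .
    ?ContrastMap prov:wasGeneratedBy ?activity .
    ?ContrastMap prov:atLocation ?location .}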

Does that help?

@proccaserra
Member

@nicholsn, it definitely helps! Another question: do you need to distinguish between contrast types (e.g. profile, difference, ...)?

@nicholsn

I'll defer to @nicholst. Tom, any thoughts here?

@proccaserra
Member

Just to qualify my question further, would you be running queries such as "get all contrast estimations generated using 'polynomial contrast' or 'Helmert contrast'" (see http://stat.ethz.ch/R-manual/R-patched/library/stats/html/contrast.html)?
Also, in the SPM software (http://www.ernohermans.com/wp-content/uploads/2011/11/spm8_startersguide.pdf), so-called 'T-contrast' and 'F-contrast' are defined, but these seem to be 'local' to the software and are in fact contrasts where the coefficient sum is null (the ANOVA case we are currently confined to). thx
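
As an illustration of one of the named contrast types mentioned here, the sketch below builds a Helmert coding matrix in plain Python. The function name `helmert_contrasts` is hypothetical, and the layout follows what I understand R's `contr.helmert` to return; treat it as a sketch rather than a reference implementation. Each column is one contrast, and in this particular coding every column sums to zero.

```python
def helmert_contrasts(n_levels):
    """Return an n_levels x (n_levels - 1) Helmert coding matrix as nested lists."""
    if n_levels < 2:
        raise ValueError("need at least two factor levels")
    matrix = [[0] * (n_levels - 1) for _ in range(n_levels)]
    for col in range(n_levels - 1):
        for row in range(col + 1):
            matrix[row][col] = -1        # all earlier levels get weight -1
        matrix[col + 1][col] = col + 1   # the compared level balances them
    return matrix


coding = helmert_contrasts(3)
print(coding)  # [[-1, -1], [1, -1], [0, 2]]

# Sum each column to confirm the sum-to-zero property of this coding.
column_sums = [sum(row[c] for row in coding) for c in range(len(coding[0]))]
print(column_sums)  # [0, 0]
```

The sum-to-zero property holds for this coding scheme; it is a feature of the scheme, not of contrasts in general.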

@nicholst
Author

Hi,

I can't cite other software off-hand, but I learned about the General
Linear Hypothesis from different linear model Stats texts (e.g. Neter,
Wasserman, Kutner; Graybill), and they give estimability (i.e. contrast in
the row space of the design matrix) as the condition for a valid contrast,
not sum-to-zero.
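
The row-space condition can be checked mechanically: a contrast is in the row space of the design matrix exactly when appending it as an extra row leaves the rank unchanged. Below is a minimal sketch in plain Python; the helper names `matrix_rank` and `is_estimable` and the toy design matrix are assumptions made for illustration, not taken from any of the cited texts or packages.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by Gaussian elimination over exact fractions (no float tolerance needed)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_estimable(contrast, design):
    """True if appending the contrast to the design leaves the rank unchanged."""
    return matrix_rank(design + [contrast]) == matrix_rank(design)


# Over-parameterised one-way layout: columns are [intercept, group 1, group 2].
design = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]

print(is_estimable([0, 1, -1], design))  # True: group difference, sums to zero
print(is_estimable([1, 1, 0], design))   # True: group 1 mean, sums to two
print(is_estimable([0, 1, 0], design))   # False: not in the row space
```

In this toy design `[1, 1, 0]` is a valid contrast although its weights do not sum to zero, while `[1, -1, 0]` sums to zero yet fails the same check.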

Maybe we need to distinguish between GLH contrasts and ANOVA contrasts?!!

-Tom

(Apologies for short msg from my phone)

Thomas Nichols, PhD
Principal Research Fellow, Head of Neuroimaging Statistics
Department of Statistics & Warwick Manufacturing Group
University of Warwick, Coventry CV4 7AL, United Kingdom

Web: http://warwick.ac.uk/tenichols
Email: t.e.nichols@warwick.ac.uk
Phone, Stats: +44 24761 51086, WMG: +44 24761 50752
Fax: +44 24 7652 4532

@nicholst
Author

nicholst commented Jun 6, 2014

The last message was sent from a phone; here are the references I wanted to provide:

  • John Neter, Michael Kutner, William Wasserman, Christopher Nachtsheim. "Applied Linear Statistical Models". McGraw-Hill/Irwin, 1996.
  • Chapter 6 in: Franklin A. Graybill. "Theory and Application of the Linear Model". Cengage Learning, 2000. (Or any of the earlier editions of this classic.)

@proccaserra
Member

Suggestion:
How about having 'contrast' and 'contrast estimate' in STATO but 'contrast map' and 'contrast estimate map' in NI-DM?
The latter two classes could be defined as types of 'data item' associating spatial coordinates (defined according to a reference) with 'contrast' or 'contrast estimate' respectively, since those maps are specific to imaging techniques.

@cmaumet
Contributor

cmaumet commented Nov 24, 2014

@proccaserra: I like your proposal! One minor comment, could we be more precise and use contrast weights instead of contrast (cf. further discussion on the naming of this term as part of NIDM at incf-nidash/nidm-specs#36 and incf-nidash/nidm-specs#47)?

I agree we can keep Contrast Estimate Map in NIDM. But I think we would directly re-use the contrast weights term from STATO as this is also represented by a vector/matrix (i.e. not a map) in our context.

Example of nidm:ContrastWeights entity:

niiri:contrast_id a prov:Entity , nidm:ContrastWeights ;
    rdfs:label "Contrast: Listening > Rest" ;
    prov:value "[1, 0, 0]"^^xsd:string ;
    nidm:statisticType nidm:TStatistic ;
    nidm:contrastName "listening > rest"^^xsd:string .
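
For anyone consuming such an entity, the `prov:value` string has to be turned back into numbers. The sketch below assumes the string happens to be JSON-compatible, as `"[1, 0, 0]"` is; nothing in this thread guarantees that format in general, and `parse_contrast_weights` is a hypothetical helper name. It always returns a list of rows, so a single vector and a matrix of weights share one shape.

```python
import json


def parse_contrast_weights(value):
    """Parse a bracketed weight string into a list of rows (a matrix)."""
    parsed = json.loads(value)
    # Wrap a flat list so that vectors and matrices share one shape.
    if parsed and not isinstance(parsed[0], list):
        parsed = [parsed]
    return parsed


print(parse_contrast_weights("[1, 0, 0]"))                 # [[1, 0, 0]]
print(parse_contrast_weights("[[1, -1, 0], [0, 1, -1]]"))  # [[1, -1, 0], [0, 1, -1]]
```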

@nicholst
Author

Since there is ambiguity about what exactly a "contrast" is, I do like @cmaumet's suggestion to use contrast weights instead of just contrasts.

@proccaserra
Member

@cmaumet @nicholst: one last request. I have now added a class in STATO (STATO_0000322) with the label 'contrast weight' (singular). I just want to double check with you that when you use 'contrast weights' (plural), you refer to a 'contrast weight vector' (a set of contrast weights that define a contrast).
(Explanation: we only allow class labels in the singular form.)
I'll add an alternative term 'contrast weights' though.
If that's fine by you, I would then define it as follows: 'contrast' has_part some ('contrast weight vector' and mean).
Then I would have a 'contrast estimate' as a child of 'model parameter estimate' about some 'contrast weight', unless you prefer 'contrast estimate vector'.

Finally, as an example of usage, I could import the NI-DM 'contrast map' class to show how things are linked between spatial coordinates and 'contrast'.

@cmaumet
Contributor

cmaumet commented Dec 4, 2014

@proccaserra: thanks!

Yes, that's right: we use 'contrast weights' to refer to a 'contrast weight vector'. However, we decided to avoid the term "vector" because sometimes a matrix of weights is needed (e.g. for F-tests, more discussion at incf-nidash/nidm-specs/issues/36). So, to avoid using a plural maybe 'contrast weight matrix' would be more general?
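
To illustrate why a matrix is the more general container, here is a small Python sketch. The parameter estimates and the function name `apply_contrast_matrix` are made up for illustration. A T-test style contrast is stored as a matrix with one row, while an F-test style contrast stacks several rows that are tested jointly.

```python
def apply_contrast_matrix(weights, parameter_estimates):
    """Apply each row of a contrast weight matrix to the parameter estimates."""
    return [sum(w * b for w, b in zip(row, parameter_estimates)) for row in weights]


beta_hat = [1.0, 2.5, 4.0]  # hypothetical estimates for three conditions

t_weights = [[1, -1, 0]]    # one row: a 1 x 3 matrix
f_weights = [[1, -1, 0],
             [0, 1, -1]]    # two rows: a 2 x 3 matrix

print(apply_contrast_matrix(t_weights, beta_hat))  # [-1.5]
print(apply_contrast_matrix(f_weights, beta_hat))  # [-1.5, -1.5]
```

Storing the vector case as a one-row matrix means the same code path handles both, which is one argument for a single 'contrast weight matrix' term.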

It would be great if you link to nidm:ContrastMap (which is really a ContrastEstimateMap). Right now the link with spatial coordinates is only done through the attribute nidm:inCoordinateSpace.

@nicholst
Author

nicholst commented Dec 4, 2014

@proccaserra, contrast weight does sound odd, as it seems to refer to a single element of a contrast vector/matrix; do we want contrast weight vector & contrast weight matrix as children of contrast weight, or is that just getting too complicated?

@proccaserra
Member

@nicholst, in fact 'contrast weight' is declared with a synonym, 'contrast coefficient', and it does indeed refer to a single element associated with a specific mean in a contrast vector. As per the definition of 'contrast weight', 'contrast weight vector | matrix' cannot be children, as both would refer to sets of the parent class. Maybe the one remaining crease is the following: make 'contrast matrix' a parent of 'contrast weight vector', where a vector is defined as a matrix [1,n]? This level of granularity may not be needed unless you'd like to link to specific data structures (arrays, lists) in the definitions.

@nicholst
Author

Hi @proccaserra, sorry to revisit this, but the contrast weight matrix (STATO_0000323 in dev branch) has attracted some scrutiny over in incf-nidash/nidm-specs#305 .

Firstly, the use of the singular generates grammatical number errors in the definition; I know there are class label rules that cause this, but there must be some way to work around it to get an English-readable definition.

Secondly, as mentioned in the discussion that kicked off this issue (#23), contrasts have a more general definition than that given in the Wikipedia entry. In particular, contrasts don't have to sum to zero and they need not weight "means". In the setting of the General Linear Hypothesis, contrasts are simply weighting parameter estimates.

So, presently the definition is

  • a contrast weight matrix is a information content entity which holds a set of contrast weight, coefficient used in a weighting sum of means defining a contrast

I'd like to propose (grammar fixed, and "means" replaced by "parameter estimates")

  • a contrast weight matrix is a information content entity which holds a set of contrast weights, the coefficients used in a weighted sum of parameter estimates that defines a contrast

This of course implies that contrast http://purl.obolibrary.org/obo/STATO_0000290 also needs a tweak. Currently it is

  • A contrast is the weighted sum of group means, the c_j coefficients represent the assigned weights of the means (these must sum to 0 for orthogonal contrasts)

and wants to be

  • A contrast is the weighted sum of parameter estimates, the coefficients represent the assigned weights of the parameter estimates (these must sum to 0 for orthogonal contrasts)

I'm not sure where the orthogonal bit came from; a pair of contrasts is orthogonal if their inner product is zero, but that's a rather minor point. Also, I've dropped the "c_j" since that isn't explained anywhere (what's c? what's j?).
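
The inner-product definition of orthogonality is easy to state in code. In this minimal Python sketch the example vectors and helper names are my own choices for illustration. It shows that orthogonality is a property of a pair of contrasts and is separate from whether either one sums to zero.

```python
def inner_product(u, v):
    """Sum of element-wise products of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def are_orthogonal(u, v):
    """A pair of contrasts is orthogonal if their inner product is zero."""
    return inner_product(u, v) == 0


linear = [-1, 0, 1]      # sums to zero
quadratic = [1, -2, 1]   # sums to zero
first_only = [1, 0, 0]   # sums to one
second_only = [0, 1, 0]  # sums to one

print(are_orthogonal(linear, quadratic))        # True
print(are_orthogonal(first_only, second_only))  # True
print(are_orthogonal(linear, first_only))       # False
```

The second pair is orthogonal even though neither vector sums to zero, so the parenthetical about summing to 0 does not follow from orthogonality alone.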

If you're OK with this I'll get @cmaumet's help to make a PR.

@jbpoline

A small suggestion for contrast weight matrix to make it more general, not only for ANOVAs, and also to be able to refer to contrast of the true parameters defining the null (not only the contrast of the estimates). This follows Christensen's "Plane answers ...".

  • a contrast weight matrix is an information content entity which holds a set of contrast weights, the coefficients used in a weighted sum of parameters or parameter estimates that defines a contrast.
