Classification for analysis #222
Comments
Thanks for getting involved. Just for clarification: how is the number "95" in a measurement name related to repetition number "3"? I guess it would also make sense to discuss the mapping here, in case anyone else would like to get involved @phidahl @MartaUrb @chrherold. |
For the control measurements I had names such as "co" or "ctrl", and in this case only the first measurement is identified as a control, the others as none, and everything is identified as repeat 1. Let's say I have 3 donors:
Hope I was clearer now, and thank you for your interest! |
@Alexandru-Emil: Thank you for participating in improving ShapeOut! We are currently in the process of planning and designing bigger changes for the user interface, and the more feedback we get the better. A (semi-)automatic identification of control/treatment and repeats for the linear mixed models analysis seems like a good idea, also for the "general" user. I think it should not be derived too strictly from the name, though.
So my suggestion would be as follows, and I am more than happy to discuss this in terms of practicability to implement (@paulmueller) and to use (@Alexandru-Emil). Let's assume a dialog box that opens when pushing a button (named e.g. "auto assign").
Which leaves the identification of the repeat. Here, no fixed phrase will help, so one would need to tell the dialog at which position in the name to look for similarities to build a repeat group. After applying this search, the drop-down menus should all be set, and manual re-assignment is still possible. Another automatic assignment could then completely reset and redo the assignments. BTW: @paulmueller, is there a reason why "repetition" ends at 9? |
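The auto-assign step proposed above could be sketched roughly like this (a minimal illustration; `auto_assign`, `control_tag`, and `repeat_slice` are hypothetical names, not ShapeOut's actual code). The state is derived from a user-chosen substring, and repeat groups are built from the part of the name the user points the dialog at:

```python
def auto_assign(names, control_tag="ctrl", repeat_slice=slice(0, 6)):
    """Assign a (state, repeat) pair to each measurement name.

    A name containing ``control_tag`` is classified as "control",
    everything else as "treatment".  Repeat numbers are built by
    grouping names that share the characters at ``repeat_slice``
    (the position the user tells the dialog to look at).
    """
    assignments = {}
    group_keys = []  # one key per repeat group, in order of appearance
    for name in names:
        state = "control" if control_tag in name else "treatment"
        key = name[repeat_slice]
        if key not in group_keys:
            group_keys.append(key)
        assignments[name] = (state, group_keys.index(key) + 1)
    return assignments


names = ["donor1_ctrl", "donor1_drug", "donor2_ctrl", "donor2_drug"]
for name, (state, repeat) in auto_assign(names).items():
    print(name, state, repeat)
```

A side effect of this grouping is that the repeat number is an unbounded integer, which would also sidestep the current limit of 9.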
No, I think this was just the initial design. |
OK, I guess we should lift that limit with the automated detection, if you think this detection can be implemented without big trouble. |
The repetition selection could also be done with an integer spin box control.
Spin box sounds good! |
I just talked to a colleague who faced the issue of having only 9 repetition numbers, and now I'm happy to see that this issue is already being discussed :) Thanks again to @Alexandru-Emil for participating! |
@maikherbig I am planning extensive online documentation on readthedocs.org that will include a section "how to cite". In general, I think asking for a citation in the data output by a program is not a good idea.
… repetition for mixed model analysis (#222)
I implemented an automated classification (with similarity analysis to determine the repeat number). Please test the installer of the development version: [EDIT] |
Thanks @paulmueller for implementing the automated classification. I have played a little bit with the new feature and I like the concept a lot. I have the following feedback, which should be discussed by more frequent users of the feature than I am.
|
@chrherold |
For my test data it did work so far. I think this will be very useful and time-saving.
… On 07.08.2018 at 01:37, Maik Herbig ***@***.***> wrote:
@chrherold <https://github.com/chrherold>
1a) Setting the state and repetition number of two experiments to the same value (e.g. two times Control No. 2) means that the data of these two experiments will be pooled. I think it is a nice feature and it should be allowed.
1b) In such mixed situations, lme4 approximates the missing data by maximum likelihood estimation. In your example, this means that you would skip repetition number 2 for Treatment and repetition number 3 for Control, and lme4 would then estimate these data points.
The LMM based test should preferably be used as a paired test and the experimental design should be chosen accordingly, but in principle it is possible to do an unpaired test by giving each experiment a different repetition number. Having a different repetition number for each experiment (e.g. Control goes from 1 to 3 and Treatment from 4 to 6) means that many data points have to be estimated, and the model will likely not converge. If the LMM does not converge, it is documented in the output .txt like this:
"convergence code: 0
unable to evaluate scaled gradient
Model failed to converge: degenerate Hessian with 1 negative eigenvalues"
To answer your questions one by one:
Do you avoid numbering the repeats and give all the measurements the same number? No. For LMM we actually need repeats; otherwise it is not possible to get any information about the random error, and lmer would report the error "grouping factors must have > 1 sampled level".
Do you give a different number to every experiment? This would equal an unpaired test, which is possible, but then the LMM often does not converge. Therefore, I suggest thinking beforehand about an experimental design that permits pairing.
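The pairing argument can be made concrete with a small sketch (pure illustration, not lme4 or ShapeOut code; `missing_cells` is a hypothetical helper): every (state, repeat) combination without a measurement is a cell the mixed model would have to estimate, and unpaired numbering produces many such empty cells.

```python
from itertools import product


def missing_cells(assignments):
    """Return (state, repeat) combinations that have no measurement.

    assignments: list of (state, repeat) tuples, one per measurement.
    Each absent combination is a cell the mixed model has to estimate.
    """
    states = sorted({s for s, _ in assignments})
    repeats = sorted({r for _, r in assignments})
    present = set(assignments)
    return [c for c in product(states, repeats) if c not in present]


# Paired design: every repeat has both a control and a treatment.
paired = [("control", 1), ("treatment", 1),
          ("control", 2), ("treatment", 2)]
print(missing_cells(paired))    # no empty cells

# Unpaired numbering (control 1-2, treatment 3-4): half the cells are empty.
unpaired = [("control", 1), ("control", 2),
            ("treatment", 3), ("treatment", 4)]
print(missing_cells(unpaired))
```

The unpaired layout leaves four of eight cells empty, which illustrates why the model is far more likely to fail to converge there.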
|
@paulmueller and @maikherbig : thank you for the detailed explanations. I guess the following changes to the system might be useful:
In general it seems to work nicely for paired data, which is the purpose, so especially 2) is just cosmetics. I think it is OK to demand manual corrections if the experimental design is not ideal. |
This is now implemented. Please test again: |
I have one suggestion. It would be useful to be able to classify measurements as control or treatment according to the name given to each measurement. For example, one could specify that if the measurement name contains "co", it should be recognized as a control for the analysis. The same for the repetition (example: if the measurement name contains "95", then this is repeat number 3). It would save a lot of time when working with many measurements; right now you have to define everything one by one.
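The request could be sketched as a simple rule table (hypothetical names only, nothing here is ShapeOut code): each rule maps a substring to a state or a repeat number, and the first matching rule wins.

```python
def classify(name, state_rules, repeat_rules):
    """Classify one measurement name via substring rules.

    state_rules:  list of (substring, state) pairs, e.g. ("co", "control")
    repeat_rules: list of (substring, repeat_number) pairs
    The first rule whose substring occurs in the name wins; a name
    matching no rule gets state "none" and repeat None.
    """
    state = next((s for tag, s in state_rules if tag in name), "none")
    repeat = next((r for tag, r in repeat_rules if tag in name), None)
    return state, repeat


# Example rules from the suggestion: "co" -> control, "95" -> repeat 3.
state_rules = [("co", "control")]
repeat_rules = [("95", 3)]
print(classify("co_95_cells", state_rules, repeat_rules))    # ('control', 3)
print(classify("drug_95_cells", state_rules, repeat_rules))  # ('none', 3)
```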