full vs aggregated data #14

dendesfelder · 2020-02-14T11:48:45Z

Bug report/feedback

Module: Fitting

Describe what issue or problem you are experiencing on the application.

In which situations does biodosetools use the full data (i.e. every cell is one data point), and in which the aggregated data? My impression is that the aggregated data is used if glm(...) doesn't produce an error. Otherwise the software switches to the get_fit_maxlik_method(...) function and the full data is used. I think it should be consistent across both methods whether the aggregated or the full model is used. I noticed recently that the full model can cause substantial underestimation of the uncertainty if there are conditional dependencies between the observations. So, for the moment, I would suggest that we stick to the aggregated data for the glm as well as for the maxLik case. This part needs some thorough thinking.
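To make the comparison concrete, here is a minimal sketch (plain Python, not biodosetools code) of the two Poisson log-likelihoods under an identity link. It assumes independent cells and a linear-quadratic yield `C + alpha*D + beta*D^2`; the function names, the toy counts and the coefficient values are all made up for illustration. Under these assumptions the two log-likelihoods differ only by a constant that does not depend on the coefficients, so any difference between the full and aggregated fits has to come from how dispersion or dependence is handled, not from the Poisson likelihood itself.

```python
import math

def loglik_full(cells_by_dose, coef):
    """Per-cell Poisson log-likelihood, identity link: lambda = C + alpha*D + beta*D^2."""
    c, alpha, beta = coef
    total = 0.0
    for dose, counts in cells_by_dose.items():
        lam = c + alpha * dose + beta * dose ** 2
        for y in counts:
            total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total

def loglik_aggregated(cells_by_dose, coef):
    """One Poisson observation per dose: total count X with mean N * lambda."""
    c, alpha, beta = coef
    total = 0.0
    for dose, counts in cells_by_dose.items():
        n_cells, x_total = len(counts), sum(counts)
        mu = n_cells * (c + alpha * dose + beta * dose ** 2)
        total += x_total * math.log(mu) - mu - math.lgamma(x_total + 1)
    return total

# Hypothetical aberration counts per cell, keyed by dose in Gy.
cells_by_dose = {
    0.0: [0, 0, 0, 1, 0, 0],
    1.0: [0, 1, 0, 0, 1, 0],
    3.0: [1, 2, 0, 1, 1, 2],
    5.0: [3, 2, 4, 1, 3, 2],
}

# Two arbitrary coefficient sets (C, alpha, beta), both giving positive yields.
coef_a = (0.01, 0.03, 0.06)
coef_b = (0.05, 0.10, 0.02)

gap_a = loglik_full(cells_by_dose, coef_a) - loglik_aggregated(cells_by_dose, coef_a)
gap_b = loglik_full(cells_by_dose, coef_b) - loglik_aggregated(cells_by_dose, coef_b)

# The gap is the same for both coefficient sets, i.e. it is parameter-independent.
print(abs(gap_a - gap_b) < 1e-9)
```

Since the gap is constant, both versions have the same maximum and the same curvature around it. The uncertainty only diverges once a dispersion estimate enters (with very different residual degrees of freedom for cells versus doses) or once the independence assumption fails.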

In addition, I think that weighting by 1/disp is only performed for the glm(...) but not for the get_fit_maxlik_method(...). Maybe this should be consistent, too? Do we actually need the weights?
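For reference, a small sketch of what such weights look like, assuming that `disp` means the per-dose dispersion index (sample variance divided by mean of the per-cell counts). That reading is an assumption, and the helper name and the counts below are hypothetical rather than taken from the package.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance / mean of per-cell counts; None when the mean is zero."""
    m = mean(counts)
    if m == 0:
        return None
    return variance(counts) / m

# Hypothetical aberration counts per cell, keyed by dose in Gy.
cells_by_dose = {
    1.0: [0, 1, 0, 0, 1, 0],
    3.0: [1, 2, 0, 1, 1, 2],
    5.0: [3, 2, 4, 1, 3, 2],
}

weights = {}
for dose, counts in cells_by_dose.items():
    disp = dispersion_index(counts)
    # Fall back to weight 1 when the index is undefined or zero.
    weights[dose] = 1.0 / disp if disp else 1.0
    print(dose, disp, weights[dose])
```

With weights like these, doses whose counts look underdispersed (index below 1) are up-weighted and overdispersed doses are down-weighted, which is why applying them in only one of the two fitting paths would make the results hard to compare.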

A nice alternative to the rather complicated code for the constrained ML optimization in get_fit_maxlik_method(...) could be the package addreg, which is designed for Poisson regressions with identity link. I also have the feeling that this package is more robust than our current implementation.

Please attach an image if it helps to visualize the problem.


jorgeegm · 2021-03-11T13:50:14Z

In my opinion, as a general approach, aggregated data is safer and more sensitive for detecting dispersion via glm.disp. Weighting by 1/disp can mask the effect of sampling from a Poisson distribution that glm.disp detects. However, this modification changes previously published coefficients. More effort is needed to test the effect of sampling too few cells at the higher doses (3, 4 and 5 Gy); in general, only 60 to 80 cells are scored for 5 Gy. I think these doses need more influence: scoring 300 to 500 cells to obtain the mean would give them more influence on the beta parameter, reducing the sampling error and the variation between labs.
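As a rough illustration of the sampling-error point, the sketch below computes the relative standard error of the mean yield for different numbers of scored cells. It assumes independent Poisson counts, for which the standard error of the mean is `sqrt(yield / n)`. The yield of 1.5 aberrations per cell at 5 Gy is a made-up value, not a measured one, and the function name is hypothetical.

```python
import math

def relative_se_of_mean_yield(n_cells, yield_per_cell):
    """Relative SE of the mean yield for independent Poisson counts.

    SE(mean) = sqrt(yield / n), so SE / yield = 1 / sqrt(n * yield).
    """
    return 1.0 / math.sqrt(n_cells * yield_per_cell)

assumed_yield_5gy = 1.5  # hypothetical aberrations per cell at 5 Gy

for n_cells in (60, 80, 300, 500):
    rel_se = relative_se_of_mean_yield(n_cells, assumed_yield_5gy)
    print(n_cells, round(rel_se, 3))
```

Under these assumptions the relative error shrinks with the square root of the number of cells, so going from 60 to 500 cells reduces it by a factor of about 2.9.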

aldomann added this to the 3.4.0 milestone Oct 7, 2020

aldomann added the enhancement New feature or request label Oct 11, 2020

aldomann mentioned this issue Oct 12, 2020

GLM quasi-Poisson doesn't detect overdispersed data #20

Closed

aldomann added a commit that referenced this issue Oct 14, 2020

Stopped using weights in fitting algorithms (glm and glm.nb). Fixes #20…

98e5aff

…, addresses part of #14 as well
