full vs aggregated data #14

Open
dendesfelder opened this issue Feb 14, 2020 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@dendesfelder

Bug report/feedback

  • Module: Fitting

Describe what issue or problem you are experiencing on the application.

In which situations does biodosetools use the full data (i.e. every cell is one data point), and in which the aggregated data? My impression is that the aggregated data is used if glm(...) doesn't produce an error; otherwise the software switches to the get_fit_maxlik_method(...) function and the full data is used. I think it should be consistent across both methods whether the aggregated or the full data is used. I noticed recently that the full model can cause substantial underestimation of the uncertainty if there are conditional dependencies between the observations. So, for the moment, I would suggest that we stick to the aggregated data for the glm as well as for the maxLik case. This part needs some thorough thinking.
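
To make the full-versus-aggregated comparison concrete, here is a minimal sketch in Python (not biodosetools code). It assumes independent Poisson counts per cell and a linear-quadratic yield with identity link; the data, the parameter values and the function names are all hypothetical. It compares the log-likelihood of the cell-level data with that of the per-dose totals.

```python
import math

# Hypothetical cell-level scores: dose in Gy -> aberration count per scored cell.
CELLS = {
    0.0: [0, 0, 0, 0, 1, 0, 0, 0],
    1.0: [0, 1, 0, 0, 1, 0, 2, 0],
    3.0: [1, 2, 0, 3, 1, 2],
}


def expected_yield(dose, c, alpha, beta):
    """Assumed linear-quadratic yield per cell (identity link)."""
    return c + alpha * dose + beta * dose ** 2


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu), with mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def full_loglik(params):
    """Full data: every cell is one observation with mean yield(dose)."""
    return sum(
        poisson_logpmf(y, expected_yield(dose, *params))
        for dose, counts in CELLS.items()
        for y in counts
    )


def aggregated_loglik(params):
    """Aggregated data: one total count per dose with mean N * yield(dose)."""
    return sum(
        poisson_logpmf(sum(counts), len(counts) * expected_yield(dose, *params))
        for dose, counts in CELLS.items()
    )


# The gap between the two log-likelihoods does not depend on the parameters.
for params in [(0.05, 0.10, 0.05), (0.10, 0.30, 0.02)]:
    gap = full_loglik(params) - aggregated_loglik(params)
    print(params, round(gap, 6))
```

Under these assumptions the two log-likelihoods differ only by a constant, so they share the same maximum and curvature. Any difference in the reported uncertainty would then come from how dispersion and weights are handled, or from dependencies between cells that violate the independence assumption.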

In addition, I think that weighting by 1/disp is only performed for the glm(...) but not for the get_fit_maxlik_method(...). Maybe this should be consistent, too? Do we actually need the weights?
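
As an illustration of what weighting by 1/disp does, the sketch below (Python, not biodosetools code) assumes that disp is the per-dose dispersion index, i.e. the sample variance of the cell counts divided by their mean. That definition, the function names and the data are assumptions made for this example; the package's actual computation is not shown in this thread.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by the mean; close to 1 for Poisson-like counts."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when no aberrations were scored")
    return statistics.variance(counts) / mean


def dispersion_weight(counts):
    """Assumed weighting scheme: weight 1/disp for one dose point."""
    disp = dispersion_index(counts)
    if disp == 0:
        raise ValueError("weight is undefined when all cells have the same count")
    return 1.0 / disp


# Hypothetical counts for the cells scored at a single dose.
counts_1gy = [0, 1, 0, 0, 1, 0, 2, 0]
print(dispersion_index(counts_1gy), dispersion_weight(counts_1gy))
```

With this definition an overdispersed dose point (disp above 1) gets a weight below 1 and an underdispersed one gets a weight above 1, which is why applying the weights in only one of the two fitting paths can give inconsistent results.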

A nice alternative to the rather complicated code for the constrained ML optimization in get_fit_maxlik_method(...) could be the package addreg, which is designed for Poisson regressions with identity link. I also have the feeling that this package is more robust than our current implementation.

Please attach an image if it helps to visualize the problem.

aldomann added this to the 3.4.0 milestone Oct 7, 2020
aldomann added the enhancement label Oct 11, 2020
aldomann added a commit that referenced this issue Oct 14, 2020
@jorgeegm

My opinion is that, as a general approach, aggregated data is safer and more sensitive for detecting dispersion (glm.disp). Using 1/disp can mask the effect of sampling from a Poisson distribution that glm.disp detects. However, this modification would change previously published coefficients. More effort is needed to test the effect of sampling too few cells at the higher doses (3, 4 and 5 Gy); in general, 60 to 80 cells are scored for 5 Gy. I think these doses need more influence: scoring 300 to 500 cells to obtain the mean would give them more influence on the beta parameter, reducing the sampling error and the variation between labs.

3 participants