Revisit bias correction (or not) in log retransformation, composite correction #70

Open
appling opened this issue Jan 1, 2016 · 1 comment

appling commented Jan 1, 2016

Jack Lewis writes:

When used with log-transformed responses in regression models, I apply the composite method in log space, summing the regression predictions and interpolated residuals before retransforming back to original units. This approach reproduces the original concentrations exactly. On p. 11, your paper says residual-corrected predictions are "retransformed with the same method as for linear regressions if the residuals correction is done in log space". If I understand what you are doing, you are applying a bias correction with the retransformation of the residual-corrected prediction. In the typical situation where your interpolation data set is the same as your regression data set, the residual-corrected predictions are exactly equal to the observed log-transformed concentrations, so applying a bias correction will overpredict the original concentrations. Am I missing something?
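
For concreteness, here is Jack's workflow in toy form (a minimal sketch with made-up data and base R; all names are invented for illustration and none of this is loadflex code): fit the regression in log space, linearly interpolate the log-space residuals to the prediction dates, add them to the log-space predictions, and retransform with a plain exp().

```r
# Sketch of the composite (residual-correction) method applied in log space.
# Toy data and names only; not the loadflex implementation.
set.seed(1)
cal <- data.frame(date = 1:20, Q = exp(rnorm(20, 3, 0.5)))       # discharge
cal$C <- exp(0.2 + 0.6 * log(cal$Q) + rnorm(20, 0, 0.3))         # concentration

fit <- lm(log(C) ~ log(Q), data = cal)                           # log-log regression

pred_dates <- seq(1, 20, by = 0.25)                              # dates needing predictions
newdata <- data.frame(Q = approx(cal$date, cal$Q, xout = pred_dates)$y)

pred_log <- predict(fit, newdata = newdata)                      # predictions in log space
resid_interp <- approx(cal$date, resid(fit), xout = pred_dates)$y  # interpolated residuals

C_composite <- exp(pred_log + resid_interp)                      # plain exp(), no bias correction

# At the calibration dates the observed concentrations are reproduced exactly:
max(abs(C_composite[pred_dates %in% cal$date] - cal$C))          # ~ 0
```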

I (Alison) write:

Shoot - that line of text is indeed unclear and/or incorrect. Looking at the code right now I see that I only apply a bias correction in making the original regression predictions. In the residual correction step, if log-transformation is selected, I log the predictions, do the residual correction, and then retransform with a simple exp() and no bias correction.
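
In toy form, my reading of what the code currently does looks roughly like the sketch below (again, invented data and names, not the actual loadflex source). Duan's smearing estimator stands in for whatever bias correction loadComp actually applies, and I assume, as the discussion below implies, that the residuals used in the correction step are computed against the bias-corrected predictions rather than taken straight from the regression.

```r
# Sketch of the currently described behavior: bias-correct the regression predictions,
# take logs again, add interpolated residuals, retransform with a plain exp().
# Toy data and names only; the smearing factor is an assumed stand-in.
set.seed(1)
cal <- data.frame(date = 1:20, Q = exp(rnorm(20, 3, 0.5)))
cal$C <- exp(0.2 + 0.6 * log(cal$Q) + rnorm(20, 0, 0.3))
fit <- lm(log(C) ~ log(Q), data = cal)

pred_dates <- seq(1, 20, by = 0.25)
newdata <- data.frame(Q = approx(cal$date, cal$Q, xout = pred_dates)$y)

bcf <- mean(exp(resid(fit)))                               # smearing bias-correction factor
C_reg_cal <- exp(fitted(fit)) * bcf                        # corrected predictions, calibration dates
C_reg_new <- exp(predict(fit, newdata = newdata)) * bcf    # corrected predictions, all dates

resid_used <- log(cal$C) - log(C_reg_cal)                  # residuals of the *corrected* predictions
resid_interp <- approx(cal$date, resid_used, xout = pred_dates)$y

C_final <- exp(log(C_reg_new) + resid_interp)              # residual correction, plain exp()

max(abs(C_final[pred_dates %in% cal$date] - cal$C))        # ~ 0: observations still recovered
max(abs(resid_used - (resid(fit) - log(bcf))))             # ~ 0: but these residuals are offset
                                                           #      from the regression residuals by log(bcf)
```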

Jack Lewis writes:

If loadComp takes the predictions back to log space, the bias correction should be undone; otherwise the predictions will be biased in log space and you won't be working with the original residuals. Practically speaking, this may not make much difference because the residuals will be altered in such a way that you get the original observations back, but it could affect some types of interpolation. In general, bias correction is required when we take the anti-log of a mean, E[log(y)|x], because it is not equal to the mean of the anti-logs. With the composite method applied in log space, we're not taking the anti-log of a mean; we're adding the residual back first (which ensures that we get the original observations in both log space and arithmetic space), so the theory doesn't apply. It seems that a retransformation bias correction may not be needed at all with the composite method.
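
For anyone following along, here is a quick numeric illustration of the retransformation point Jack is making: the anti-log of a mean of logs is not the mean of the anti-logs, and for roughly lognormal residuals the gap is about a factor of exp(sigma^2/2).

```r
# Anti-log of a mean vs. mean of the anti-logs (toy numbers).
set.seed(2)
sigma <- 0.5
log_y <- 2 + rnorm(1e6, 0, sigma)    # log(y) | x, centered on E[log(y)|x] = 2

exp(mean(log_y))                     # anti-log of the mean: ~ exp(2)              ≈ 7.39
mean(exp(log_y))                     # mean of the anti-logs: ~ exp(2 + sigma^2/2) ≈ 8.37
exp(sigma^2 / 2)                     # the lognormal bias factor, ≈ 1.13
```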

I think Jack and I are in agreement that no bias correction is needed in the composite correction phase, but I am now less sure whether bias correction has a role in the regression prediction phase. On the one hand, I now share Jack's concern that the predictions are biased in log space during the composite correction. On the other hand:

  • to not bias-correct the regression predictions would make them biased in linear space and inconsistent with what I think of as the true predictions of the regression model
  • though it seems odd not to be working with the original regression residuals, I can't think of any reason why we have to stick with them. After all, the composite method correction is an intentional deviation from the regression model, and we could make any number of other arbitrary decisions - we could do the correction in arithmetic space, or using a smoothing spline, etc. So maybe it's not a problem that we've abandoned the regression residuals, though we might want to start talking about them a bit differently.
  • the currently implemented method already ensures that we get the original observations back in both log and arithmetic space, which seems to achieve the fundamental goal of the composite method (see the sketch after this list). Maybe it's a failure of my imagination, but I cannot yet think of a type of interpolation for which this would not be what we want.
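
Here is the sketch referred to in the last bullet. In the toy setup above, bias-correcting the regression predictions adds log(bcf) to every log-space prediction and subtracts log(bcf) from every residual computed against those predictions. Linear interpolation is affine in the residual values, so the offset cancels out of the composite predictions entirely; an interpolation scheme that is not affine in the residuals (e.g., one that shrinks residuals toward zero away from the observations) would not cancel it, which may be the kind of case Jack has in mind when he says it "could affect some types of interpolation".

```r
# The log(bcf) offset cancels under linear interpolation of the residuals.
# Toy residuals and an assumed offset; not loadflex code.
set.seed(3)
r     <- rnorm(12)                                    # log-space residuals at 12 calibration dates
dates <- 1:12
xout  <- seq(1, 12, by = 0.1)
c_off <- log(1.13)                                    # stand-in for log(bias-correction factor)

no_bc   <- approx(dates, r, xout = xout)$y                    # interp(r)
with_bc <- c_off + approx(dates, r - c_off, xout = xout)$y    # c + interp(r - c)

max(abs(with_bc - no_bc))                             # ~ 0: identical composite corrections
```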

I'll need to think about this more. In the meantime, further thoughts from anybody are welcome here.

@jacaronda

Alison, regarding your comment: "to not bias-correct the regression predictions would make them biased in linear space and inconsistent with what I think of as the true predictions of the regression model". For an interpolation in linear space, I believe the residuals correction obviates the need for the retransformation correction. After all, the goal is to predict the actual observations, not the mean of y for a given x. If the residuals correction is done in log space (which I suspect is generally better), again we are no longer estimating the mean of the theoretical distribution of y for a given x. We are trying to predict the actual y, so the theory of retransformation bias does not apply.

Note: for the values in between the observed y, it makes a big difference whether the interpolation is done in log or arithmetic space. See attachment. Do we have examples where interpolation in linear space gives more realistic results than interpolation in log space?
Composite method with log-transformed regression v2.docx
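
A small base-R illustration of the interpolation-space point above (toy numbers, not taken from the attachment): between two observations an order of magnitude apart, interpolating in log space follows a geometric path that sits well below the arithmetic path.

```r
# Linear interpolation between two observations, done in arithmetic vs. log space.
obs_dates <- c(0, 10)
obs_C     <- c(1, 10)                    # concentrations an order of magnitude apart
mid_dates <- 0:10

C_arith <- approx(obs_dates, obs_C, xout = mid_dates)$y            # arithmetic space
C_log   <- exp(approx(obs_dates, log(obs_C), xout = mid_dates)$y)  # log space

round(rbind(arithmetic = C_arith, log_space = C_log), 2)
# At the midpoint (day 5): arithmetic gives 5.5, log space gives sqrt(10) ≈ 3.16.
```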

Labels: none yet
Projects: maintenance (Could Do)