Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to pass weights to varbvs() #18

Open
timflutre opened this issue Nov 7, 2017 · 13 comments
Open

allow to pass weights to varbvs() #18

timflutre opened this issue Nov 7, 2017 · 13 comments

Comments

@timflutre
Copy link

Hi Peter,

In practice, it is frequent to perform a two-stage analysis:

  1. get one BLUP per individual via a somewhat sophisticated mixed model (e.g. to account for spatial and/or temporal covariance in errors);

  2. use them in a regression on marker genotypes to find QTLs.

However, some individuals may have more records than others (due to missing data or to a specific experimental design). This is reflected in the variance of the BLUPs. One then often want to use thess variances in stage 2 as weights (see the option in stats::lm and lme4::lmer for instance; it is also available in GEMMA).

Could a weights option be implemented in varbvs?

Thanks in advance,
Tim

@pcarbo
Copy link
Owner

pcarbo commented Nov 7, 2017

@timflutre Yes, I think this is possible, although I would have to redo most of the derivations, so it isn't trivial. Is this motivated by a specific project you are working on?

@timflutre
Copy link
Author

Basically everybody around me is doing such two-stage analyses, so the literature is quite large. Moreover, here is a recent preprint investigating around this. We can easily understand why it is a common scenario: fitting mixed model with "sophisticated" error covariances in addition to a large (huge) number of genetic markers as predictors is impossible with current softwares, especially when one want to perform variable selection on the genetic markers, hence the analysis being done in two steps.

I thought it would be as simple as WLS (weighted least squares) compare to simple LS, given that the weights are known. But I trust you on assessing the impact on varbvs much better than me!

@pcarbo
Copy link
Owner

pcarbo commented Nov 7, 2017

@timflutre Thanks for the background. It isn't conceptually or technically difficult to do, but would involve reworking a substantial portion of the code. I will keep this mind and return to it when I have a chance.

@timflutre
Copy link
Author

Ok, no worries, thanks!

@aksarkar
Copy link

@timflutre If the weights are w_1, ..., w_n, I think you can premultiply X (markers) and Y (BLUP) by diag(sqrt(w_1), ..., sqrt(w_n)) and apply varbvs on the transformed data (based on properties of GLS)

@pcarbo
Copy link
Owner

pcarbo commented Nov 29, 2017

@aksarkar Thanks for this suggestion. So perhaps it is simpler to implement than I thought. Do you happen to know if the likelihood is correct after this transformation? I haven't worked through the math, but it seems like there may be an additional factor that needs to be included in the likelihood after the transformation.

@aksarkar
Copy link

aksarkar commented Dec 5, 2017

The assumption in GLS is y ~ N(X beta, Sigma). Letting LL* = Sigma, L^-1 y ~ N(L^-1 X beta, I). So the likelihood should be correct.

The assumption in WLS is that Sigma is diag(w_1, ..., w_n), so L is diag(sqrt(w_1), ..., sqrt(w_n)). So my suggestion above is not quite correct (take the reciprocal of the diagonal entries).

@pcarbo
Copy link
Owner

pcarbo commented Dec 6, 2017

@aksarkar Do you need this option as well?

@pcarbo
Copy link
Owner

pcarbo commented Jan 31, 2018

@timflutre @aksarkar varbvs version 2.5-2 now has a weights option (for family = "gaussian" only), as well as a resid.vcov option to allow for residuals with a specified covariance matrix (e.g., a kinship matrix).

@pcarbo pcarbo closed this as completed Jan 31, 2018
@timflutre
Copy link
Author

Thanks @pcarbo !

However, when I launch varbvs with weights, it looks like the summary function doesn't report the proportion of variance explained anymore. Is it on purpose?

@pcarbo
Copy link
Owner

pcarbo commented Feb 1, 2018

@timflutre Yes, I haven't yet updated the PVE calculations to allow for weighted samples. Do you need the PVE estimates?

@pcarbo pcarbo reopened this Feb 1, 2018
@timflutre
Copy link
Author

That would be great! I would expect the PVE to increase when weights are used. It is also interesting to look at the PVE to get a sense of whether or not the SNPs that are available are numerous enough and well distributed to tag all LD blocks. In my case, I am not sure of that because I only have 10k SNPs and the pairwise LD quickly becomes small for short distances. I am currently in the process of genotyping more SNPs by sequencing. So I would expect the PVE to increase when using, say, 60k SNPs compare to only 10k.

@pcarbo
Copy link
Owner

pcarbo commented Feb 1, 2018

@timflutre Okay, I've reopened this issue and will work out the PVE calculations for weights samples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants