allow to pass weights to varbvs() #18

timflutre · 2017-11-07T15:13:05Z

Hi Peter,

In practice, it is frequent to perform a two-stage analysis:

get one BLUP per individual via a somewhat sophisticated mixed model (e.g. to account for spatial and/or temporal covariance in errors);
use them in a regression on marker genotypes to find QTLs.

However, some individuals may have more records than others (due to missing data or to a specific experimental design). This is reflected in the variance of the BLUPs. One then often want to use thess variances in stage 2 as weights (see the option in stats::lm and lme4::lmer for instance; it is also available in GEMMA).

Could a weights option be implemented in varbvs?

Thanks in advance,
Tim

The text was updated successfully, but these errors were encountered:

pcarbo · 2017-11-07T15:36:53Z

@timflutre Yes, I think this is possible, although I would have to redo most of the derivations, so it isn't trivial. Is this motivated by a specific project you are working on?

timflutre · 2017-11-07T16:04:25Z

Basically everybody around me is doing such two-stage analyses, so the literature is quite large. Moreover, here is a recent preprint investigating around this. We can easily understand why it is a common scenario: fitting mixed model with "sophisticated" error covariances in addition to a large (huge) number of genetic markers as predictors is impossible with current softwares, especially when one want to perform variable selection on the genetic markers, hence the analysis being done in two steps.

I thought it would be as simple as WLS (weighted least squares) compare to simple LS, given that the weights are known. But I trust you on assessing the impact on varbvs much better than me!

pcarbo · 2017-11-07T16:56:55Z

@timflutre Thanks for the background. It isn't conceptually or technically difficult to do, but would involve reworking a substantial portion of the code. I will keep this mind and return to it when I have a chance.

timflutre · 2017-11-07T16:59:54Z

Ok, no worries, thanks!

aksarkar · 2017-11-29T21:48:47Z

@timflutre If the weights are w_1, ..., w_n, I think you can premultiply X (markers) and Y (BLUP) by diag(sqrt(w_1), ..., sqrt(w_n)) and apply varbvs on the transformed data (based on properties of GLS)

pcarbo · 2017-11-29T22:02:41Z

@aksarkar Thanks for this suggestion. So perhaps it is simpler to implement than I thought. Do you happen to know if the likelihood is correct after this transformation? I haven't worked through the math, but it seems like there may be an additional factor that needs to be included in the likelihood after the transformation.

aksarkar · 2017-12-05T20:52:19Z

The assumption in GLS is y ~ N(X beta, Sigma). Letting LL* = Sigma, L^-1 y ~ N(L^-1 X beta, I). So the likelihood should be correct.

The assumption in WLS is that Sigma is diag(w_1, ..., w_n), so L is diag(sqrt(w_1), ..., sqrt(w_n)). So my suggestion above is not quite correct (take the reciprocal of the diagonal entries).

pcarbo · 2017-12-06T18:18:51Z

@aksarkar Do you need this option as well?

pcarbo · 2018-01-31T02:52:20Z

@timflutre @aksarkar varbvs version 2.5-2 now has a weights option (for family = "gaussian" only), as well as a resid.vcov option to allow for residuals with a specified covariance matrix (e.g., a kinship matrix).

timflutre · 2018-02-01T08:26:33Z

Thanks @pcarbo !

However, when I launch varbvs with weights, it looks like the summary function doesn't report the proportion of variance explained anymore. Is it on purpose?

pcarbo · 2018-02-01T13:58:48Z

@timflutre Yes, I haven't yet updated the PVE calculations to allow for weighted samples. Do you need the PVE estimates?

timflutre · 2018-02-01T14:20:27Z

That would be great! I would expect the PVE to increase when weights are used. It is also interesting to look at the PVE to get a sense of whether or not the SNPs that are available are numerous enough and well distributed to tag all LD blocks. In my case, I am not sure of that because I only have 10k SNPs and the pairwise LD quickly becomes small for short distances. I am currently in the process of genotyping more SNPs by sequencing. So I would expect the PVE to increase when using, say, 60k SNPs compare to only 10k.

pcarbo · 2018-02-01T16:04:03Z

@timflutre Okay, I've reopened this issue and will work out the PVE calculations for weights samples.

pcarbo added the enhancement label Nov 7, 2017

pcarbo closed this as completed Jan 31, 2018

pcarbo reopened this Feb 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow to pass weights to varbvs() #18

allow to pass weights to varbvs() #18

timflutre commented Nov 7, 2017

pcarbo commented Nov 7, 2017

timflutre commented Nov 7, 2017

pcarbo commented Nov 7, 2017

timflutre commented Nov 7, 2017

aksarkar commented Nov 29, 2017

pcarbo commented Nov 29, 2017 •

edited

aksarkar commented Dec 5, 2017

pcarbo commented Dec 6, 2017

pcarbo commented Jan 31, 2018

timflutre commented Feb 1, 2018

pcarbo commented Feb 1, 2018 •

edited

timflutre commented Feb 1, 2018

pcarbo commented Feb 1, 2018

allow to pass weights to varbvs() #18

allow to pass weights to varbvs() #18

Comments

timflutre commented Nov 7, 2017

pcarbo commented Nov 7, 2017

timflutre commented Nov 7, 2017

pcarbo commented Nov 7, 2017

timflutre commented Nov 7, 2017

aksarkar commented Nov 29, 2017

pcarbo commented Nov 29, 2017 • edited

aksarkar commented Dec 5, 2017

pcarbo commented Dec 6, 2017

pcarbo commented Jan 31, 2018

timflutre commented Feb 1, 2018

pcarbo commented Feb 1, 2018 • edited

timflutre commented Feb 1, 2018

pcarbo commented Feb 1, 2018

pcarbo commented Nov 29, 2017 •

edited

pcarbo commented Feb 1, 2018 •

edited