Replaced some unique() calls in power_prune() with mean() #556
…calar iv.se values
I also applied this reasoning to the n.cas and n.con variables, but have not tested that aspect since I am dealing with a continuous outcome. It might be worth checking the "method 1" approach as well; I didn't test that either.
Thanks phageghost! This looks like a good improvement. Thanks for flagging and fixing.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #556 +/- ##
===========================================
- Coverage 48.81% 38.70% -10.12%
===========================================
Files 31 31
Lines 5248 5247 -1
===========================================
- Hits 2562 2031 -531
- Misses 2686 3216 +530
☔ View full report in Codecov by Sentry.
@@ -341,14 +339,15 @@ power_prune <- function(dat,method=1,dist.outcome="binary")
r2sum<-sum(r2) # sum of the r-squares for each SNP in the instrument
# F<-r2sum*(n-1-k)/((1-r2sum*k )
if(dist.outcome == "continuous"){
iv.se<- 1/sqrt(unique(dat2$samplesize.outcome)*r2sum) #standard error of the IV should be proportional to this
iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum) #standard error of the IV should be proportional to this
It would feel a bit safer to me if you could add na.rm = TRUE to the mean() call, i.e.,
iv.se<- 1/sqrt(mean(dat2$samplesize.outcome, na.rm = TRUE)*r2sum)
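To illustrate why na.rm = TRUE is the safer choice here, below is a minimal sketch in base R. The sample sizes and the r2sum value are made-up numbers, not taken from any real dataset, and the variable names iv_se_unsafe and iv_se_safe are hypothetical.

```r
# Hypothetical per-SNP outcome sample sizes, one of them missing
samplesize <- c(10000, 12000, NA)
r2sum <- 0.05  # assumed sum of per-SNP r-squared values

# Without na.rm, a single NA propagates through mean() and on into iv.se
iv_se_unsafe <- 1 / sqrt(mean(samplesize) * r2sum)

# With na.rm = TRUE, the mean is taken over the non-missing values only
iv_se_safe <- 1 / sqrt(mean(samplesize, na.rm = TRUE) * r2sum)

print(iv_se_unsafe)
print(iv_se_safe)
```

In this sketch the first result is NA, while the second is a finite scalar based on the two observed sample sizes.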
if(any(is.na(n.cas)) || any(is.na(n.con))) {
warning("dist.outcome set to binary but number of cases or controls is missing. Will try using total sample size instead but power pruning will be less accurate")
iv.se<- 1/sqrt(unique(dat2$samplesize.outcome)*r2sum)
}
iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum)
Again, I would add na.rm = TRUE. I also wonder if we should round as well, i.e.,
iv.se<- 1/sqrt(round(mean(dat2$samplesize.outcome, na.rm = TRUE))*r2sum)
}
iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum)
} else {
iv.se<-1/sqrt(mean(n.cas)*mean(n.con)*r2sum) #standard error of the IV should be proportional to this
And same here, I would amend to
iv.se<-1/sqrt(round(mean(n.cas, na.rm = TRUE))*round(mean(n.con, na.rm = TRUE))*r2sum)
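As a rough illustration of the binary-outcome suggestion, the sketch below applies round(mean(..., na.rm = TRUE)) to case and control counts. The counts and r2sum are invented for the example; only the n.cas and n.con names come from the diff.

```r
# Hypothetical per-SNP case and control counts that vary slightly
n.cas <- c(5000, 5001, 5001)
n.con <- c(20000, 20000, NA)
r2sum <- 0.05  # assumed sum of per-SNP r-squared values

# Averaging can give a non-integer count; rounding restores a whole number
mean_cas <- mean(n.cas, na.rm = TRUE)  # not a whole number here
n_cas_rounded <- round(mean_cas)
n_con_rounded <- round(mean(n.con, na.rm = TRUE))

# One scalar iv.se for the whole outcome subset
iv.se <- 1 / sqrt(n_cas_rounded * n_con_rounded * r2sum)
print(c(n_cas_rounded, n_con_rounded, iv.se))
```

Rounding changes the result only marginally; its main effect is that the averaged counts remain integers, as sample sizes normally are.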
thanks Tom! I believe I've accepted those changes now. Did it work?
It didn't work, Philip - I think you probably ticked check boxes to say you'd viewed the comment (or something like that) - so I have just made a new release with the na.rm = TRUE arguments in.
Thanks for this. Don't worry about the codecov failures (the overall codecov % just depends upon how busy the OpenGWAS server is). If you could make my suggested amends I think the code will be a bit safer.
This ensures that iv.se will be a scalar, not a vector, even when samplesize.outcome varies within outcome subsets.
The current behavior assumes that the samplesize is the same within each outcome subset, and that unique will therefore return a scalar. When this assumption is violated, it returns a vector of samplesize values that is likely to differ in length from the total size of the subset and will therefore produce a size mismatch when assigning it as a column in that data.frame.
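The failure mode can be reproduced outside power_prune() with a small toy data frame. This is only a sketch: the SNP names, sample sizes, and r2sum are invented, and the assignment step stands in for however the real function stores iv.se.

```r
# Toy outcome subset where the sample size is NOT constant across SNPs
dat2 <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  samplesize.outcome = c(10000, 10000, 12000)
)
r2sum <- 0.05  # assumed sum of per-SNP r-squared values

# unique() returns two values here, so iv.se is a length-2 vector
iv.se <- 1 / sqrt(unique(dat2$samplesize.outcome) * r2sum)
print(length(iv.se))

# Assigning a length-2 vector as a column of a 3-row data.frame fails
result <- tryCatch(
  {
    dat2$iv.se <- iv.se
    "assigned"
  },
  error = function(e) "error"
)
print(result)
```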
Removing the unique() call and allowing iv.se to be a vector of the size of the outcome subset will not produce an error but it will cause the outcome subsets to be selected according to the largest samplesize among any of the IV SNPs in that subset, which is not the desired behavior.
Using the mean() of the samplesize ensures that iv.se is the same for every SNP in the outcome subset (and that the selection logic will therefore work properly) while taking into account all the SNPs in the subset.
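For comparison, here is a sketch of the mean()-based calculation on a toy data frame with varying sample sizes. The data and r2sum are invented for illustration, and na.rm = TRUE is included as suggested in the review.

```r
# Toy outcome subset with varying sample sizes
dat2 <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  samplesize.outcome = c(10000, 10000, 12000)
)
r2sum <- 0.05  # assumed sum of per-SNP r-squared values

# mean() always collapses the column to one value, so iv.se is a scalar
iv.se <- 1 / sqrt(mean(dat2$samplesize.outcome, na.rm = TRUE) * r2sum)

# A scalar is recycled to every row, so the assignment succeeds
dat2$iv.se <- iv.se
print(dat2)
```

Every SNP in the subset ends up with the same iv.se, which is what the selection logic expects.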