Replaced some unique() calls in power_prune() with mean() #556

phageghost · 2024-08-29T22:19:16Z

This ensures that iv.se will be a scalar not a vector even when the outcome.samplesize varies within outcome subsets.

The current behavior assumes that the samplesize is the same within each outcome subset, and that unique will therefore return a scalar. When this assumption is violated, it returns a vector of samplesize values that is likely to differ in length from the total size of the subset and will therefore produce a size mismatch when assigning it as a column in that data.frame.

Removing the unique() call and allowing iv.se to be a vector of the size of the outcome subset will not produce an error but it will cause the outcome subsets to be selected according to the largest samplesize among any of the IV SNPs in that subset, which is not the desired behavior.

Using the mean() of the samplesize ensures that iv.se is the same for every SNP in the outcome subset (and that the selection logic will therefore work properly) while taking into account all the SNPs in the subset.

…calar iv.se values

phageghost · 2024-08-29T22:27:58Z

I also applied this reasoning to the n.cas and n.con variables, but have not tested that aspect since I am dealing with a continuous outcome. Might want to check into the "method 1" approach as well, didn't test that either.

mightyphil2000

Thanks phageghost! This looks like a good improvement. Thanks for flagging and fixing.

codecov · 2024-09-02T12:06:14Z

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 38.70%. Comparing base (7976c28) to head (d7ecf52).
Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
R/format_mr_results2.R	0.00%	5 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #556       +/-   ##
===========================================
- Coverage   48.81%   38.70%   -10.12%     
===========================================
  Files          31       31               
  Lines        5248     5247        -1     
===========================================
- Hits         2562     2031      -531     
- Misses       2686     3216      +530

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

remlapmot · 2024-09-02T15:18:29Z

R/format_mr_results2.R

@@ -341,14 +339,15 @@ power_prune <- function(dat,method=1,dist.outcome="binary")
 				r2sum<-sum(r2) # sum of the r-squares for each SNP in the instrument
 				# F<-r2sum*(n-1-k)/((1-r2sum*k )
 				if(dist.outcome == "continuous"){
-					iv.se<- 1/sqrt(unique(dat2$samplesize.outcome)*r2sum) #standard error of the IV should be proportional to this
+					iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum) #standard error of the IV should be proportional to this


It would feel a bit safer to me if you could you add in na.rm = TRUE to the mean() call, i.e.,

iv.se<- 1/sqrt(mean(dat2$samplesize.outcome, na.rm = TRUE)*r2sum)

remlapmot · 2024-09-02T15:20:01Z

R/format_mr_results2.R

 					if(any(is.na(n.cas)) || any(is.na(n.con))) {
 						warning("dist.outcome set to binary but number of cases or controls is missing. Will try using total sample size instead but power pruning will be less accurate")
-						iv.se<- 1/sqrt(unique(dat2$samplesize.outcome)*r2sum) 
-					}
+						iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum) 


Again I would add the na.rm = TRUE, also I wonder if we should round as well, i.e.,

iv.se<- 1/sqrt(round(mean(dat2$samplesize.outcome, na.rm = TRUE))*r2sum)

remlapmot · 2024-09-02T15:21:13Z

R/format_mr_results2.R

-					}
+						iv.se<- 1/sqrt(mean(dat2$samplesize.outcome)*r2sum) 
+					} else {
+                    	iv.se<-1/sqrt(mean(n.cas)*mean(n.con)*r2sum) #standard error of the IV should be proportional to this


And same here, I would amend to

iv.se<-1/sqrt(round(mean(n.cas, na.rm = TRUE))*round(mean(n.con, na.rm = TRUE))*r2sum)

thanks Tom! I believe I've accepted those changes now. Did it work?

It didn't work Philip - I think you probably ticked check boxes to say you'd viewed the comment (or something like) - so I have just made a new release with the na.rm = TRUEs in.

remlapmot · 2024-09-02T15:23:52Z

Thanks for this.

Don't worry about the codecov failures (the overall codecov % just depends upon how busy the OpenGWAS server is).

If you could make my suggested amends I think the code will be a bit safer.

Replaced some unique() calls in power_prune() with mean() to ensure s…

0a0c75f

…calar iv.se values

phageghost mentioned this pull request Aug 29, 2024

[BUG]: function power_prune() does not behave correctly when SNPs in a given outcome-study subset have varying sample sizes #557

Open

Check for N/As before taking mean of n.cas and n.con

d7ecf52

mightyphil2000 approved these changes Sep 2, 2024

View reviewed changes

remlapmot reviewed Sep 2, 2024

View reviewed changes

mightyphil2000 approved these changes Sep 2, 2024

View reviewed changes

mightyphil2000 merged commit d70ec00 into MRCIEU:master Sep 3, 2024
9 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replaced some unique() calls in power_prune() with mean() #556

Replaced some unique() calls in power_prune() with mean() #556

phageghost commented Aug 29, 2024

phageghost commented Aug 29, 2024 •

edited

Loading

mightyphil2000 left a comment

codecov bot commented Sep 2, 2024 •

edited

Loading

remlapmot Sep 2, 2024

remlapmot Sep 2, 2024

remlapmot Sep 2, 2024

mightyphil2000 Sep 2, 2024

remlapmot Sep 6, 2024

remlapmot commented Sep 2, 2024

Replaced some unique() calls in power_prune() with mean() #556

Replaced some unique() calls in power_prune() with mean() #556

Conversation

phageghost commented Aug 29, 2024

phageghost commented Aug 29, 2024 • edited Loading

mightyphil2000 left a comment

Choose a reason for hiding this comment

codecov bot commented Sep 2, 2024 • edited Loading

Codecov Report

remlapmot Sep 2, 2024

Choose a reason for hiding this comment

remlapmot Sep 2, 2024

Choose a reason for hiding this comment

remlapmot Sep 2, 2024

Choose a reason for hiding this comment

mightyphil2000 Sep 2, 2024

Choose a reason for hiding this comment

remlapmot Sep 6, 2024

Choose a reason for hiding this comment

remlapmot commented Sep 2, 2024

phageghost commented Aug 29, 2024 •

edited

Loading

codecov bot commented Sep 2, 2024 •

edited

Loading