
Effective sample size or total N? #95

Closed
nievergeltlab opened this issue Dec 1, 2017 · 3 comments

Comments

@nievergeltlab

Hi,

I am working with a lot of cohort study data, which has a high ratio of controls to cases (e.g. 4:1). LDSC h2 estimates are highly dependent upon the value put in for N, which is calculated as N cases + N controls. I am fairly sure that because of the high number of controls, calculating N this way is causing my h2 to be under-estimated.

To illustrate the extent of the problem, a response to a comment on an earlier issue states: "For example, if the entries in you[r] N column are half what they should be, then you will over-estimate h2 by a factor of two."

Naturally, then, if the entries in the N column are twice what they should be, you will under-estimate h2 by a factor of two. With cohort data, N will definitely be overstated, because the excess controls do very little for the SNP odds ratio estimates or their variances (recall that beyond a 3:1 control-to-case ratio, additional controls add almost nothing to estimate precision).
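As a rough illustration of the scaling described above, here is a minimal sketch assuming the simplified LD score regression model of Bulik-Sullivan et al. (2015), E[chi^2_j] = N h2 l_j / M + N a + 1, with one constant N for every SNP. Under that model the slope of chi^2 on LD score estimates N h2 / M, so h2 is recovered as slope * M / N. The function name and all numeric values below are hypothetical and chosen only to make the arithmetic easy to follow; this is not ldsc's own code.

```python
def h2_from_slope(slope: float, n: float, m: float) -> float:
    """Convert an LD score regression slope to h2.

    Assumes the simplified model E[chi2_j] = N * h2 * l_j / M + N * a + 1
    with one constant N for every SNP, so the slope on l_j is N * h2 / M.
    """
    return slope * m / n


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
slope = 2e-3        # fitted slope of chi^2 on LD score (made up)
m_snps = 1_000_000  # M: number of SNPs h2 is defined over (made up)
n_right = 20_000    # the N the statistics actually correspond to (made up)

h2_right = h2_from_slope(slope, n_right, m_snps)
h2_n_halved = h2_from_slope(slope, n_right / 2, m_snps)   # N column half as large
h2_n_doubled = h2_from_slope(slope, n_right * 2, m_snps)  # N column twice as large

print(f"correct N: {h2_right:.3f}")
print(f"N halved:  {h2_n_halved:.3f}")   # twice the correct h2
print(f"N doubled: {h2_n_doubled:.3f}")  # half the correct h2
```

The slope is fixed by the data; only the N you supply changes, so the h2 estimate scales as 1/N. The real ldsc fit also uses regression weights, a free intercept, and per-SNP N, all of which this sketch ignores.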

Should I use the effective sample size instead of the overall N, i.e. Neff = 2 / (1/Ncases + 1/Ncontrols) from https://www.nature.com/articles/nprot.2014.071 ? Does the choice of N also bias rg estimates?
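To see what the formula quoted above does to the N column, here is a minimal sketch comparing it with the total N for a hypothetical 4:1 cohort (2,000 cases, 8,000 controls) and a hypothetical balanced one; the counts and function names are made up for illustration. Note that 2 / (1/Ncases + 1/Ncontrols) is the harmonic mean of the two group sizes, so even a balanced 1:1 design gets half the total N. A commonly used variant in meta-analysis practice scales by 4 instead, so that a 1:1 design recovers the total N; it is worth checking which constant the cited protocol uses before putting either into the N column.

```python
def neff_from_post(n_cases: float, n_controls: float) -> float:
    """Neff as written in the post: 2 / (1/Ncases + 1/Ncontrols), i.e. the harmonic mean."""
    return 2.0 / (1.0 / n_cases + 1.0 / n_controls)


def neff_times_four(n_cases: float, n_controls: float) -> float:
    """Common variant: 4 / (1/Ncases + 1/Ncontrols); equals Ncases + Ncontrols when they are equal."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


for n_cases, n_controls in [(2_000, 8_000), (5_000, 5_000)]:  # hypothetical 4:1 and 1:1 designs
    n_total = n_cases + n_controls
    print(
        f"{n_cases}:{n_controls}  total N = {n_total}, "
        f"2/(...) = {neff_from_post(n_cases, n_controls):.0f}, "
        f"4/(...) = {neff_times_four(n_cases, n_controls):.0f}"
    )
```

With the factor of 4, Neff = 4 · N · P · (1 − P), where N is the total sample size and P = Ncases / N is the sample prevalence, so a 4:1 design gives 0.64 × N (6,400 of 10,000 here), while the factor-of-2 version gives half of that.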

Thanks!
Adam

@rkwalters
Collaborator

rkwalters commented Dec 1, 2017 via email

@nievergeltlab
Author

I see, great, thanks!

@pjordab

pjordab commented Dec 2, 2021

Hi Raymond and users,

First of all, many thanks for your excellent program and your previous answers.

I'm using LDSC with the following command:

~/project/tools/ldsc/ldsc.py \
--rg sumstats1,sumstats2 \
--ref-ld-chr /eur_w_ld_chr/ \
--w-ld-chr ~/project/tools/ldsc/eur_w_ld_chr/ \
--out sumstats1-sumstats2

In my input file I'm using the total sample size for case-control studies in the N column. I understand from the previous comments that using the --samp-prev flag will account for the proportion of cases and controls, and hence the effective sample size. My question is: if I haven't used this flag, is it still appropriate to use the total sample size in the input file?
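For context on the --samp-prev question, here is a minimal sketch of the observed-to-liability-scale conversion of Lee et al. (2011), which, as we understand it, is what ldsc applies when both --samp-prev and --pop-prev are supplied. The sample prevalence P enters only through the factor 1/(P(1 − P)), which is how the case/control proportion gets accounted for while the N column stays the total sample size. The function below is a standalone re-implementation for illustration (not an import from ldsc), and the h2 and prevalence values are hypothetical.

```python
from statistics import NormalDist


def h2_obs_to_liab(h2_obs: float, samp_prev: float, pop_prev: float) -> float:
    """Observed-scale h2 -> liability-scale h2 (Lee et al. 2011).

    samp_prev: P, fraction of cases among the samples in the GWAS.
    pop_prev:  K, prevalence of the trait in the population.
    """
    std_normal = NormalDist()  # mean 0, sd 1
    # Liability threshold t above which a fraction K of the population lies.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    z = std_normal.pdf(threshold)  # standard normal density at t
    factor = (pop_prev * (1.0 - pop_prev)) ** 2 / (samp_prev * (1.0 - samp_prev) * z ** 2)
    return h2_obs * factor


# Hypothetical numbers: the same observed-scale h2 from a 1:1 and a 1:4 case:control sample.
h2_obs = 0.2
pop_prev = 0.1
for samp_prev in (0.5, 0.2):
    print(samp_prev, round(h2_obs_to_liab(h2_obs, samp_prev, pop_prev), 4))
```

Because P appears only through 1/(P(1 − P)), moving from a balanced sample (P = 0.5) to a 1:4 sample (P = 0.2) raises the converted value by a factor of 0.25 / 0.16 = 1.5625. Without --samp-prev and --pop-prev, ldsc (to our understanding) reports h2 on the observed scale, which is specific to the sample's case fraction.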

Many thanks!

Paloma


3 participants