Effective sample size or total N? #95

nievergeltlab · 2017-12-01T20:11:16Z

Hi,

I am working with a lot of cohort study data, which have a high ratio of controls to cases (e.g., 4:1). LDSC h2 estimates depend strongly on the value entered for N, which I calculate as Ncases + Ncontrols. I am fairly sure that, because of the large number of controls, calculating N this way is causing my h2 to be underestimated.

To illustrate the extent of the problem, a response to an earlier issue states: "For example, if the entries in your N column are half what they should be, then you will over-estimate h2 by a factor of two."

Naturally, then, if the entries in the N column are twice what they should be, you will underestimate h2 by a factor of two. With cohort data, N will definitely be overstated, because the excess controls contribute very little to SNP odds ratio estimates or their variances (recall that beyond a 3:1 control-to-case ratio, additional controls add almost nothing to estimate precision).

Should I use the effective sample size instead of the total N, i.e., following https://www.nature.com/articles/nprot.2014.071, Neff = 2 / (1/Ncases + 1/Ncontrols)? Does the choice of N also bias rg estimates?

Thanks!
Adam
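A quick way to see the diminishing returns from extra controls is to compute Neff directly. The helper below is hypothetical (it is not part of LDSC) and takes the numerator as a parameter: as written in the question, the formula with 2 in the numerator returns the harmonic mean of Ncases and Ncontrols, which is half the total N for a balanced 1:1 design, whereas a numerator of 4 makes Neff equal the total N in the balanced case. The case count is made up.

```python
def neff(n_cases, n_controls, numerator=4.0):
    """Effective sample size: numerator / (1/Ncases + 1/Ncontrols).

    Hypothetical helper for illustration. numerator=4 makes a balanced 1:1
    design return Ncases + Ncontrols; numerator=2 returns half of that
    (the harmonic mean of Ncases and Ncontrols).
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return numerator / (1.0 / n_cases + 1.0 / n_controls)


cases = 1000  # made-up number of cases, held fixed while controls grow
for ratio in (1, 2, 3, 4, 10, 100):
    controls = ratio * cases
    total = cases + controls
    print(f"{ratio:>3}:1  total N = {total:>7}  Neff = {neff(cases, controls):8.1f}")
```

With the number of cases held fixed and a control:case ratio of k, Neff = 4·Ncases·k/(1 + k), which climbs toward a ceiling of 4 × Ncases: a 3:1 design reaches 75% of that ceiling and 4:1 reaches 80%, so each extra control adds less and less. For the 4:1 example (1,000 cases, 4,000 controls), total N is 5,000 while Neff is 3,200.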

rkwalters · 2017-12-01T22:11:34Z

Hi Adam,

The case/control imbalance should be mostly handled by the conversion to the liability scale, by supplying the sample prevalence (--samp-prev) and the population prevalence (--pop-prev) when you run ldsc. See this post <https://groups.google.com/d/msg/ldsc_users/yJT-_qSh_44/MmKKJYsBAwAJ> for detail on how the effective N calculation is a component of the observed-to-liability transform.

Cheers,
Raymond
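One way to see why supplying the two prevalences absorbs the imbalance: the commonly used observed-to-liability transform (Lee et al. 2011) divides by P(1 − P), where P is the sample prevalence, and for a fixed set of chi-square statistics the observed-scale h2 scales as 1/N. Liability-scale h2 therefore depends on N only through N·P(1 − P) = Ncases·Ncontrols/N, which is Neff/4 when Neff uses a numerator of 4. The sketch below is a toy under stated assumptions, not LDSC's code: the "signal" (the product N × observed-scale h2 that the chi-square statistics pin down) and the helper names are hypothetical, and all numbers are made up.

```python
from statistics import NormalDist


def h2_obs_to_liab(h2_obs, samp_prev, pop_prev):
    """Convert observed-scale h2 in an ascertained sample to the liability scale.

    samp_prev = P (proportion of cases in the sample),
    pop_prev  = K (prevalence in the population).
    h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2),
    where z is the standard normal density at the liability threshold.
    """
    P, K = samp_prev, pop_prev
    if not (0 < P < 1 and 0 < K < 1):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    threshold = NormalDist().inv_cdf(1 - K)  # liability threshold for prevalence K
    z = NormalDist().pdf(threshold)
    return h2_obs * K**2 * (1 - K) ** 2 / (P * (1 - P) * z**2)


def liability_h2_from_signal(signal, n_entered, samp_prev, pop_prev):
    # Toy model (assumption): the chi^2 statistics fix signal = N * h2_obs,
    # so the observed-scale estimate is signal / N for whatever N is entered.
    return h2_obs_to_liab(signal / n_entered, samp_prev, pop_prev)


n_cases, n_controls = 1000, 4000            # made-up 4:1 cohort
n_total = n_cases + n_controls
p_sample = n_cases / n_total                # 0.2
neff4 = 4 / (1 / n_cases + 1 / n_controls)  # 3200
signal = 160.0                              # hypothetical N * h2_obs

# Total N with the actual sample prevalence ...
a = liability_h2_from_signal(signal, n_total, p_sample, 0.01)
# ... versus Neff treated as a balanced (P = 0.5) sample.
b = liability_h2_from_signal(signal, neff4, 0.5, 0.01)
print(a, b)  # the two liability-scale estimates agree up to rounding
```

Under this toy model, total N with the actual sample prevalence and Neff with a sample prevalence of 0.5 give the same liability-scale h2; mixing the two (Neff with P = 0.2, for instance) would not.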

nievergeltlab · 2017-12-03T03:12:32Z

I see, great, thanks!

pjordab · 2021-12-02T18:55:52Z

Hi Raymond and users,

First of all many thanks for your excellent program and previous answers.

I'm using LDSC with the following command:

~/project/tools/ldsc/ldsc.py \
--rg sumstats1,sumstats2 \
--ref-ld-chr /eur_w_ld_chr/ \
--w-ld-chr ~/project/tools/ldsc/eur_w_ld_chr/ \
--out sumstats1-sumstats2

In the N column of my input files I'm using the total sample size for the case-control studies. From the previous comments, I understand that the --samp-prev flag accounts for the proportion of cases and controls, and hence for the effective sample size. My question is: if I haven't used this flag, is it still appropriate to use the total sample size in the input file?

Many thanks!

Paloma
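On the rg side (Adam's last question, and Paloma's --rg run with total N): if the N column is off by the same constant factor c for every SNP, a toy calculation suggests the rg point estimate is unaffected, because that trait's h2 slope scales as 1/c, the genetic-covariance slope scales as 1/√c, and the two cancel in rg = ρg/√(h2₁·h2₂). The sketch below is not LDSC: it fits unweighted least squares to noiseless, made-up statistics, whereas LDSC uses weighted regression and a block jackknife for standard errors. All numbers and helper names are hypothetical.

```python
from math import sqrt

# Made-up toy inputs (not real data).
M = 1_000_000                                  # number of SNPs in the LD score sum
LD = [10.0, 25.0, 50.0, 80.0, 120.0, 200.0]    # LD scores
N1_TRUE, N2 = 5000, 20000                      # sample sizes that generated the stats
H2_1, H2_2, RHO_G = 0.3, 0.2, 0.1              # true h2 values and genetic covariance

# Noiseless expected statistics; no sample overlap, so the cross-trait intercept is 0.
CHI2_1 = [N1_TRUE * H2_1 * l / M + 1 for l in LD]
CHI2_2 = [N2 * H2_2 * l / M + 1 for l in LD]
Z1Z2 = [sqrt(N1_TRUE * N2) * RHO_G * l / M for l in LD]


def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def estimates(n1_entered):
    """Return (h2_1, h2_2, rg) when trait 1's N column says n1_entered."""
    # E[chi2_j] = N * h2 * l_j / M + intercept
    h1 = ols_slope([n1_entered * l / M for l in LD], CHI2_1)
    h2 = ols_slope([N2 * l / M for l in LD], CHI2_2)
    # E[z1_j * z2_j] = sqrt(N1 * N2) * rho_g * l_j / M + intercept
    rho = ols_slope([sqrt(n1_entered * N2) * l / M for l in LD], Z1Z2)
    return h1, h2, rho / sqrt(h1 * h2)


for n1 in (5000, 2500, 10000):  # correct N, N halved, N doubled
    h1, _, rg = estimates(n1)
    print(f"N1 entered = {n1:>5}: h2_1 = {h1:.3f}, rg = {rg:.3f}")
```

Halving the entered N doubles h2_1 (0.300 to 0.600) and doubling it halves h2_1 (0.150), matching the factor-of-two comment quoted in the first post, while rg stays at about 0.408 in all three runs. The toy says nothing about standard errors or about N that varies across SNPs.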

nievergeltlab closed this as completed Dec 3, 2017

char4816 mentioned this issue Aug 21, 2019: 'beta' & 'SE' works but 'OR' & 'p-value' does not, how to preprocesss summary statistics (Neff) brielin/Popcorn#7 (closed)

jaamarks mentioned this issue Aug 12, 2020: which N to use for large case/control imbalance data #191 (open)
