In [22]:
import pandas as pd

df = pd.read_csv("insomnia_female_ukb2b_EUR_sumstats_20190311_with_chrX_mac_100.txt", sep="\t")

# Delete SNP and SNPID_UKB(Because it will detect several SNP columns and get confused....)
df_clean = df.drop(columns=["SNP", "SNPID_UKB"])

df_clean.to_csv("insomnia_for_ldsc.txt", sep=" ", index=False)


## Step 1: QC and reformat GWAS summary statistics for ldsc

In [23]:
!python ldsc-2.0.1/munge_sumstats.py \
    --sumstats insomnia_for_ldsc.txt \
    --N-cas 66976 \
    --N-con 141982 \
    --signed-sumstats STAT,0 \
    --snp RSID_UKB \
    --a1 A1 \
    --a2 A2 \
    --p P \
    --out Insomnia_ldsc_output \
    --merge-alleles ldsc_inputs/w_hm3.snplist

*********************************************************************
* LD Score Regression (LDSC)
* Version 2.0.0
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./munge_sumstats.py \
--sumstats insomnia_for_ldsc.txt \
--N-cas 66976.0 \
--N-con 141982.0 \
--out Insomnia_ldsc_output \
--merge-alleles ldsc_inputs/w_hm3.snplist \
--snp RSID_UKB \
--a1 A1 \
--a2 A2 \
--p P \
--signed-sumstats STAT,0 

Interpreting column names as follows:
A1:	Allele 1, interpreted as ref allele for signed sumstat.
STAT:	Directional summary statistic as specified by --signed-sumstats.
P:	p-Value
A2:	Allele 2, interpreted as non-ref allele for signed sumstat.
RSID_UKB:	Variant ID (e.g., rs number)
MAF:	Allele frequency

Reading list of SNPs for allele merge from ldsc_inputs/w_hm3.snplist
Read 1217311 SNPs for allele merge

  merge_alleles = pd.read_csv(args.merge_alleles, compression=compression, header=0,
  dat_gen = pd.read_csv(args.sumstats, delim_whitespace=True, header=0,
  jj[ii] = match


## Step 2: run ldsc to calculate total heritability

In [24]:
!python ldsc-2.0.1/ldsc.py \
    --h2 Insomnia_ldsc_output.sumstats.gz \
    --ref-ld-chr ldsc_inputs/for_h2/eur_w_ld_chr/ \
    --w-ld-chr ldsc_inputs/for_h2/eur_w_ld_chr/ \
    --out Insomnia_ldsc_h2 

*********************************************************************
* LD Score Regression (LDSC)
* Version 2.0.0
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--out Insomnia_ldsc_h2 \
--h2 Insomnia_ldsc_output.sumstats.gz \
--ref-ld-chr ldsc_inputs/for_h2/eur_w_ld_chr/ \
--w-ld-chr ldsc_inputs/for_h2/eur_w_ld_chr/ 

Beginning analysis at Wed Apr 16 16:45:40 2025
Reading summary statistics from Insomnia_ldsc_output.sumstats.gz ...
Read summary statistics for 1136632 SNPs.
Reading reference panel LD Score from ldsc_inputs/for_h2/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1293150 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from ldsc_inputs/for_h2/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read regression w

  'i.e., \ell_j := \sum_k p_k(1-p_k)r^2_{jk}, where p_k denotes the MAF '
  'i.e., \ell_j := \sum_k (p_k(1-p_k))^a r^2_{jk}, where p_k denotes the MAF '
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=Tr

## Step 3: run ldsc to calculate heritability enrichment by functional annotation

In [28]:
!python ldsc-2.0.1/ldsc.py \
    --h2 Insomnia_ldsc_output.sumstats.gz \
    --ref-ld-chr ldsc_inputs/for_enrichment/Baseline/baseline. \
    --ref-ld-chr ldsc_inputs/for_enrichment/GenoSkylinePlus/GSplus_Tier3_1KGphase3. \
    --w-ld-chr ldsc_inputs/for_enrichment/weights/weights.hm3_noMHC. \
    --overlap-annot \
    --frqfile-chr ldsc_inputs/for_enrichment/genotype/1000G.EUR.QC. \
    --out Insomnia_ldsc_enrichment

*********************************************************************
* LD Score Regression (LDSC)
* Version 2.0.0
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--out Insomnia_ldsc_enrichment \
--h2 Insomnia_ldsc_output.sumstats.gz \
--ref-ld-chr ldsc_inputs/for_enrichment/GenoSkylinePlus/GSplus_Tier3_1KGphase3. \
--w-ld-chr ldsc_inputs/for_enrichment/weights/weights.hm3_noMHC. \
--overlap-annot  \
--frqfile-chr ldsc_inputs/for_enrichment/genotype/1000G.EUR.QC. 

Beginning analysis at Wed Apr 16 17:13:44 2025
Reading summary statistics from Insomnia_ldsc_output.sumstats.gz ...
Read summary statistics for 1136632 SNPs.
Reading reference panel LD Score from ldsc_inputs/for_enrichment/GenoSkylinePlus/GSplus_Tier3_1KGphase3.[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 11

  'i.e., \ell_j := \sum_k p_k(1-p_k)r^2_{jk}, where p_k denotes the MAF '
  'i.e., \ell_j := \sum_k (p_k(1-p_k))^a r^2_{jk}, where p_k denotes the MAF '
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=True, na_values='.', **kwargs)
  return pd.read_csv(fh, delim_whitespace=Tr