## SoS Workflow:

This section lists the options and the SoS code for running the LDSC pipeline on your own data.

## Command Interface:

In [116]:
!sos run LDSC_Code.ipynb -h

usage: sos run LDSC.ipynb [workflow_name | -t targets] [options] [workflow_options]
  workflow_name:        Single or combined workflows defined in this script
  targets:              One or more targets to generate
  options:              Single-hyphen sos parameters (see "sos run -h" for details)
  workflow_options:     Double-hyphen workflow-specific parameters

Workflows:
  make_annot
  munge_sumstats_no_sign
  munge_sumstats_sign
  calc_ld_score
  calc_enrichment

Sections
  make_annot:
    Workflow Options:
      --bed VAL (as str, required)
                        path to bed file
      --bim VAL (as str, required)
                        path to bim file
      --annot VAL (as str, required)
                        name of output annotation file
  munge_sumstats_no_sign: This option is for when the summary statistic file
                        does not contain a signed summary statistic (Z or Beta).
                        In this case,the program will calculate Z for you based

## Make Annotation File:

In [93]:

[make_annot]

# Make Annotated Bed File

# path to bed file
parameter: bed = str 
#path to bim file
parameter: bim = str
#name of output annotation file
parameter: annot = str
bash: expand = True
    python2 make_annot.py --bed-file {bed} --bimfile {bim} --annot-file {annot}
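
For orientation, the annotation step can be thought of as flagging each SNP in the bim file with 1 if it falls inside a BED interval and 0 otherwise. The sketch below illustrates that idea in plain Python. It is not the code in `make_annot.py`; the function name `annotate_snps` is hypothetical, and it assumes BED intervals are 0-based with an exclusive end, bim positions are 1-based, and both files use the same chromosome names.

```python
from bisect import bisect_right
from collections import defaultdict


def annotate_snps(bed_intervals, bim_snps):
    """Return one 0/1 flag per SNP, in input order.

    bed_intervals: iterable of (chrom, start, end), 0-based start, exclusive end (assumed).
    bim_snps: iterable of (chrom, bp) with 1-based positions (assumed).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))

    # Merge overlapping intervals per chromosome so a binary search is valid.
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])

    flags = []
    for chrom, bp in bim_snps:
        starts, ends = merged.get(chrom, ([], []))
        pos0 = bp - 1  # convert the 1-based SNP position to 0-based
        i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
        flags.append(1 if i >= 0 and pos0 < ends[i] else 0)
    return flags
```

A SNP on a chromosome with no intervals is simply flagged 0, so a naming mismatch such as `chr1` versus `1` would silently produce an all-zero annotation; it is worth checking for that before running the real step.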

## Munge Summary Statistics (Option 1: No Signed Summary Statistic):

In [None]:
#This option is for when the summary statistic file does not contain a signed summary statistic (Z or Beta).
#In this case, the program will calculate Z for you, assuming that A1 is the risk allele.
[munge_sumstats_no_sign]



#path to summary statistic file
parameter: sumst = str
#path to Hapmap3 SNPs file, keep all columns (SNP, A1, and A2) for the munge_sumstats program
parameter: alleles = "w_hm3.snplist"
#path to output file
parameter: output = str

bash: expand = True
    python2 munge_sumstats.py --sumstats {sumst} --merge-alleles {alleles} --out {output} --a1-inc
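
To make the "calculate Z for you" step concrete, here is a minimal sketch of how a two-sided p-value can be turned into a non-negative Z-score when A1 is assumed to be the risk (increasing) allele. It uses only the Python standard library; the function name `z_from_p_a1_increasing` is hypothetical, and this illustrates the arithmetic rather than reproducing the code in `munge_sumstats.py`.

```python
from statistics import NormalDist


def z_from_p_a1_increasing(p):
    """Convert a two-sided p-value to a Z-score with a positive sign.

    The sign is positive by assumption: A1 is taken to be the risk allele.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Use the lower tail (p / 2) so that very small p-values stay representable.
    return abs(NormalDist().inv_cdf(p / 2.0))
```

For example, `z_from_p_a1_increasing(0.05)` is approximately 1.96.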

## Munge Summary Statistics (Option 2: Signed Summary Statistic):

In [None]:
# This option is for when the summary statistic file does contain a signed summary statistic (Z or Beta)
[munge_sumstats_sign]



#path to summary statistic file
parameter: sumst = str
#path to Hapmap3 SNPs file, keep all columns (SNP, A1, and A2) for the munge_sumstats program
parameter: alleles = "w_hm3.snplist"
#path to output file
parameter: output = str

bash: expand = True
    python2 munge_sumstats.py --sumstats {sumst} --merge-alleles {alleles} --out {output}
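
When a signed statistic is available, the direction of the Z-score comes from that statistic rather than from an assumption about A1. The sketch below shows one way to express this: the magnitude is derived from the p-value and the sign from whether the signed statistic lies below its null value (0 for Beta or Z, 1 for an odds ratio). The function name `signed_z` and its `null_value` argument are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist


def signed_z(p, signed_stat, null_value=0.0):
    """Return a Z-score whose magnitude comes from p and whose sign comes from signed_stat.

    null_value is the value of signed_stat under no effect
    (0 for Beta or Z, 1 for an odds ratio).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    magnitude = abs(NormalDist().inv_cdf(p / 2.0))
    return -magnitude if signed_stat < null_value else magnitude
```

With an odds ratio, pass `null_value=1.0`, so that an odds ratio of 0.8 yields a negative Z and 1.3 a positive one.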

## Calculate LD Scores:

**If the annotation files contain SNP, CHR, and BP columns, delete them; otherwise this code will not work. Before deleting them, make sure that the annotation file is sorted.**
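
One way to drop the identifier columns is sketched below, using only the Python standard library. It assumes a tab-delimited annotation file with a header row and handles plain or gzipped files based on the `.gz` extension. The function name `thin_annot` is hypothetical. The default drop list also includes `CM`, because the `--thin-annot` format expects annotation columns only; adjust it if your files differ. The sketch does not check or change row order, so sort the file first.

```python
import csv
import gzip

# Identifier columns to remove; CM is included as an assumption (see the note above).
ID_COLUMNS = ("CHR", "BP", "SNP", "CM")


def _open_text(path, mode):
    """Open a plain or gzipped text file, chosen by the .gz extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t", newline="")
    return open(path, mode, newline="")


def thin_annot(src, dst, drop=ID_COLUMNS):
    """Copy src to dst without the columns named in drop; return the number of data rows."""
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        header = next(reader)
        keep = [i for i, name in enumerate(header) if name not in drop]
        if not keep:
            raise ValueError("no annotation columns left after dropping " + ", ".join(drop))
        writer.writerow([header[i] for i in keep])
        n_rows = 0
        for row in reader:
            writer.writerow([row[i] for i in keep])
            n_rows += 1
    return n_rows
```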

In [None]:
#Calculate LD Scores
#If the annotation files contain SNP, CHR, and BP columns, delete them; otherwise this code will not work. Before deleting them, make sure that the annotation file is sorted.
[calc_ld_score]

#Path prefix of the PLINK .bed/.bim/.fam files (passed to --bfile, so omit the file extension)
parameter: bim = str
#Path to annotation File. Make sure to remove the SNP, CHR, and BP columns from the annotation file if present before running.
parameter: annot_file = str
#name of output file
parameter: output = str
#path to Hapmap3 SNPs file, remove the A1 and A2 columns for the Calculate LD Scores program 
parameter: snplist = "w_hm3.snplist"

bash: expand = True
    python2 ldsc.py --bfile {bim} --l2 --ld-wind-cm 1 --annot {annot_file} --thin-annot --out {output} --print-snps {snplist}
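
The SNP list passed to `--print-snps` should contain SNP IDs only. A minimal sketch for deriving such a file from the three-column HapMap3 list is given below. It assumes a whitespace-delimited input whose header row starts with `SNP`; the function name `write_snp_ids` and the output file name in the usage note are hypothetical.

```python
def write_snp_ids(snplist_path, out_path, header_name="SNP"):
    """Write the first column of snplist_path to out_path, one SNP ID per line.

    Blank lines and the header row (first field equal to header_name) are skipped.
    Returns the number of SNP IDs written.
    """
    n_written = 0
    with open(snplist_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()
            if not fields or fields[0] == header_name:
                continue
            fout.write(fields[0] + "\n")
            n_written += 1
    return n_written
```

For example, `write_snp_ids("w_hm3.snplist", "hm3_snp_ids.txt")` would produce a one-column file that can then be supplied as `--snplist`.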

## Calculate Functional Enrichment using Annotations:

In [None]:
#Calculate Enrichment Scores for Functional Annotations

[calc_enrichment]

#Path to Summary statistics File
parameter: sumstats = str
#Path to Reference LD Scores Files (Base Annotation + Annotation you want to analyze, format like minimal working example)
parameter: ref_ld = str
#Path to LD Weight Files (Format like minimal working example)
parameter: w_ld = str
#path to frequency files (Format like minimal working example)
parameter: frq_file = str
#Output name
parameter: output = str

bash: expand = True
    python2 ldsc.py --h2 {sumstats} --ref-ld-chr {ref_ld} --w-ld-chr {w_ld} --overlap-annot --frqfile-chr {frq_file} --out {output}
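
Because `--ref-ld-chr` accepts a comma-separated list of file prefixes, with the chromosome number appended to each prefix, the `ref_ld` value for a baseline annotation plus your own annotation can be assembled as sketched below. The helper names and the example prefixes `baseline.` and `my_annot.` are hypothetical; substitute the prefixes of your own LD score files.

```python
def join_ref_ld_prefixes(*prefixes):
    """Join LD score file prefixes into one comma-separated --ref-ld-chr value."""
    cleaned = [p.strip() for p in prefixes if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one prefix is required")
    for prefix in cleaned:
        if "," in prefix:
            raise ValueError("a prefix must not contain a comma: " + prefix)
    return ",".join(cleaned)


def expected_ldscore_files(prefix, chromosomes=range(1, 23)):
    """List the per-chromosome LD score files that a prefix is expected to resolve to."""
    return [f"{prefix}{chrom}.l2.ldscore.gz" for chrom in chromosomes]
```

Here `join_ref_ld_prefixes("baseline.", "my_annot.")` gives `baseline.,my_annot.`, and `expected_ldscore_files("my_annot.")` can be used to confirm that all 22 autosomal files exist before launching the enrichment step.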