adoebley edited this page May 4, 2022 · 11 revisions

Welcome to the Griffin wiki!

This wiki contains instructions for running a demo of Griffin on a small BAM file. The demo has two steps, which can be run independently:

  1. GC correction
  2. Nucleosome profiling

To run this demo, we recommend that you use a conda environment to install the correct package versions. Testing was performed using conda 4.10.3 (https://docs.conda.io/en/latest/miniconda.html).

This demo uses relative paths that assume a specific file structure within your Griffin folder. You can use different paths for your own work (e.g., if you keep your reference genome in a different location); just update the config.yaml to reflect the new paths.
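
As a quick sanity check on the file structure, the sketch below reports whether the top-level relative paths used by the commands in this demo (Ref, demo/bam, snakemakes) exist under a given Griffin folder. It is not part of Griffin: check_layout is a hypothetical helper name, and the path list is an assumption drawn only from the commands on this page.

```shell
# Hypothetical helper (not part of Griffin): check that the relative paths
# used by the demo commands exist under the given Griffin folder.
# The path list is an assumption taken from the commands on this page.
check_layout() {
    root=${1:-.}
    missing=0
    for rel in Ref demo/bam snakemakes; do
        if [ -e "$root/$rel" ]; then
            echo "found:   $rel"
        else
            echo "missing: $rel"
            missing=1
        fi
    done
    # Non-zero exit status if any path was missing
    return "$missing"
}

# Example (run from inside your Griffin folder):
#   check_layout .
```

If you keep files elsewhere, a "missing" line is expected; it only indicates that config.yaml needs to point at your own locations.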

Instructions for initializing the conda environment (use this conda environment for all steps):
conda create --name griffin_demo python=3.7.4
conda activate griffin_demo #on older conda versions, use: source activate griffin_demo
pip install snakemake==5.19.2
conda install pandas=1.3.2
conda install scipy=1.7.1
conda install pyBigWig=0.3.17
pip install matplotlib==3.4.1
conda install -c bioconda samtools=1.13
conda install -c bioconda bedtools=2.29.2
pip install pybedtools==0.8.0 #also installs pysam-0.19.0
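
After the installs above, it can help to confirm that the command-line tools are visible on your PATH before starting the demo. The sketch below is illustrative only and not part of Griffin; require_tools is a hypothetical helper name. It checks only that each command can be found, not which version is installed.

```shell
# Hypothetical helper (not part of Griffin): report whether each named
# command can be found on PATH. It does not check versions.
require_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    # Non-zero exit status if any tool was missing
    return "$status"
}

# Example (run inside the activated griffin_demo environment):
#   require_tools python snakemake samtools bedtools
```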

GC and mappability correction

Total time to run griffin_GC_and_mappability_correction: ~45 minutes

  1. If you haven't already activated the conda environment, activate it:
    conda activate griffin_demo

  2. Copy snakemakes/griffin_GC_and_mappability_correction/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
    mkdir run_demo
    cp -r snakemakes/griffin_GC_and_mappability_correction run_demo

  3. Download the reference genome from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref (the download may take ~10 minutes):
    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
    gunzip hg38.fa.gz
    mv hg38.fa Ref/

  4. Download the mappability track (1.2 GB) from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
    wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
    mv k100.Umap.MultiTrackMappability.bw Ref/

  5. Convert the demo cram file to bam and create an index (takes ~1 minute):
    samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram

    samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam

  6. Navigate to the folder with the snakefile:
    cd run_demo/griffin_GC_and_mappability_correction/

  7. Open the samples.yaml (run_demo/griffin_GC_and_mappability_correction/config/samples.yaml) and update the path to the demo bam file:

    samples:
      Healthy_demo: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam

  8. If you do NOT want the snakemake to use the default 8 CPUs, open the cluster_slurm.yaml (run_demo/griffin_GC_and_mappability_correction/config/cluster_slurm.yaml) and edit the ncpus parameter (line 18 and line 23) for the GC_counts step (other parameters in this file are not used unless launching to a slurm cluster). Increasing the CPUs will parallelize genomic regions to make the analysis run faster, as long as your computer has the CPUs available.

  9. Run the snakemake (expected runtime: ~15 minutes with 8 CPUs):
    snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 -np #dry run to print a list of jobs
    snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 #runs one job at a time

  10. The outputs should be identical to the expected outputs in demo/griffin_GC_correction_demo_files/expected_results/:
    Healthy_demo.GC_bias.txt md5sum: 29d34798c67edad2c371cedb94b3a8b8
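
One way to check the digest listed above is sketched here. verify_md5 is a hypothetical helper, not part of Griffin; it assumes either md5sum (common on Linux) or md5 (macOS/BSD) is available. The output file's location is not stated on this page, so the path in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of Griffin): compare a file's MD5 digest with
# an expected value. Uses md5sum where available, otherwise `md5 -q`.
verify_md5() {
    file=$1
    expected=$2
    if command -v md5sum >/dev/null 2>&1; then
        # md5sum prints "<digest>  <filename>"; keep only the digest
        actual=$(md5sum "$file" | cut -d ' ' -f 1)
    else
        actual=$(md5 -q "$file")
    fi
    if [ "$actual" = "$expected" ]; then
        echo "match:    $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}

# Example (replace path/to with wherever your run wrote its results):
#   verify_md5 path/to/Healthy_demo.GC_bias.txt 29d34798c67edad2c371cedb94b3a8b8
```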

Nucleosome profiling

Total time to run the griffin_nucleosome_profiling demo: ~15 minutes

  1. If you haven't already activated the conda environment, activate it:
    conda activate griffin_demo

  2. Copy snakemakes/griffin_nucleosome_profiling/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
    mkdir run_demo #if you haven't already made this directory
    cp -r snakemakes/griffin_nucleosome_profiling run_demo

  3. If you haven't already downloaded the reference genome (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref (the download may take a few minutes; you can also symlink an existing copy into the Ref folder):
    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
    gunzip hg38.fa.gz
    mv hg38.fa Ref/

  4. If you haven't already downloaded the mappability track (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
    wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
    mv k100.Umap.MultiTrackMappability.bw Ref/

  5. If you haven't already converted the demo cram file to a bam file and indexed it, do so now (takes ~1 minute):
    samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram

    samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam

  6. Navigate to the folder with the snakefile:
    cd run_demo/griffin_nucleosome_profiling/

  7. Open the sites.yaml (run_demo/griffin_nucleosome_profiling/config/sites.yaml) and update the path to the demo sites file (if you have run the filter sites demo, you can use the path to your results instead):

    site_lists:
      CTCF_demo: ../../demo/griffin_nucleosome_profiling_demo_files/sites/CTCF.hg38.1000.txt

  8. Open the samples.GC.yaml (run_demo/griffin_nucleosome_profiling/config/samples.GC.yaml) and update the paths to the demo bam file and the GC correction file (if you have run the GC correction demo, you can use the path to your results instead):

    samples:
      Healthy_demo:
        bam: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
        GC_bias: ../../demo/griffin_GC_correction_demo_files/expected_results/Healthy_demo.GC_bias.txt

  9. Run the snakemake (expected runtime: ~1 minute):
    snakemake -s griffin_nucleosome_profiling.snakefile --cores 1 -np #dry run to print a list of jobs
    snakemake -s griffin_nucleosome_profiling.snakefile --cores 1

  10. The outputs should be identical to the expected outputs in demo/griffin_nucleosome_profiling_demo_files/expected_results/:
    Healthy_demo.GC_corrected.coverage.tsv md5: bb7c6b730d44bf201f4091836aa970d4
    Healthy_demo.uncorrected.coverage.tsv md5: 9ade7b4d069ba39b2d00a5f0609d41cd
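
As an alternative to comparing digests by eye, the outputs can be byte-compared against the expected results directory. The sketch below is not part of Griffin; compare_to_expected is a hypothetical helper, and the results directory in the usage comment is a placeholder because the output location is not stated on this page.

```shell
# Hypothetical helper (not part of Griffin): byte-compare named files in a
# results directory against the same names in an expected-results directory.
compare_to_expected() {
    results_dir=$1
    expected_dir=$2
    shift 2
    status=0
    for name in "$@"; do
        # cmp -s is silent; it exits non-zero if files differ or are missing
        if cmp -s "$results_dir/$name" "$expected_dir/$name"; then
            echo "identical: $name"
        else
            echo "differs:   $name"
            status=1
        fi
    done
    return "$status"
}

# Example (path/to/results is a placeholder for your own output folder):
#   compare_to_expected path/to/results \
#       ../../demo/griffin_nucleosome_profiling_demo_files/expected_results \
#       Healthy_demo.GC_corrected.coverage.tsv Healthy_demo.uncorrected.coverage.tsv
```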
