Home

Welcome to the Griffin wiki!

This wiki contains instructions for running a demo of Griffin on a small bam file. The demo has two steps, which can be run independently:

GC and mappability correction
Nucleosome profiling

To run this demo, we recommend that you use a conda environment to install the correct package versions. Testing was performed using conda 4.10.3 (https://docs.conda.io/en/latest/miniconda.html).

This demo uses relative paths that assume a specific file structure within your Griffin folder. You can use different paths for your own work if desired (e.g., if you keep your reference genome in a different location); you just need to update the config.yaml to reflect the new paths.

Initialize the conda environment with the commands below. Use this conda environment for all steps.
conda create --name griffin_demo python=3.7.4
conda activate griffin_demo #or: source activate griffin_demo
pip install snakemake==5.19.2
conda install pandas=1.3.2
conda install scipy=1.7.1
conda install pyBigWig=0.3.17
pip install matplotlib==3.4.1
conda install -c bioconda samtools=1.13
conda install -c bioconda bedtools=2.29.2
pip install pybedtools==0.8.0 #also installs pysam-0.19.0
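
Once the environment is built, it can help to confirm that the command-line tools are visible before starting the demo. The following is a minimal sketch and not part of Griffin: check_tools is a hypothetical helper name, and it only checks that each command is on PATH, not which version is installed.

```shell
# Report which command-line tools are visible on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Typical call after activating the environment:
#   check_tools snakemake samtools bedtools
```

If any line starts with "missing:", re-run the corresponding install command above with the griffin_demo environment active.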

GC and mappability correction

Total time to run griffin_GC_and_mappability_correction: ~45 minutes

If you haven't already activated the conda environment, activate it:
conda activate griffin_demo
Copy snakemakes/griffin_GC_and_mappability_correction/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
mkdir run_demo
cp -r snakemakes/griffin_GC_and_mappability_correction run_demo
Download the reference genome from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref (the download may take ~10 minutes):
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
mv hg38.fa Ref/
Download the mappability track (1.2 GB) from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
mv k100.Umap.MultiTrackMappability.bw Ref/
Convert the demo cram file to bam and create an index (takes ~1 minute):
samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram

samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
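
Before moving on, you may want to confirm that the downloads and the conversion produced non-empty files. This is a minimal sketch, not part of Griffin: missing_inputs is a hypothetical helper name, and the check only tests that each file exists and is non-empty, not that its contents are valid. The .bam.bai name assumes the default index name written by samtools index.

```shell
# Print every path that is missing or empty.
# Returns non-zero if at least one path fails the check.
missing_inputs() {
    local rc=0
    local path
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "$path"
            rc=1
        fi
    done
    return "$rc"
}

# From the Griffin folder:
#   missing_inputs Ref/hg38.fa Ref/k100.Umap.MultiTrackMappability.bw \
#       demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam \
#       demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam.bai
```

No output means all four inputs are in place.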
Navigate to the folder with the snakefile:
cd run_demo/griffin_GC_and_mappability_correction/
Open the samples.yaml (run_demo/griffin_GC_and_mappability_correction/config/samples.yaml) and update the path to the demo bam file:

samples:
  Healthy_demo: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
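
The ../../ prefix works because the path is read from the folder where snakemake is launched, two levels below the Griffin folder. A quick way to test a relative path before running is sketched below. It assumes that paths in the yaml files are interpreted relative to the launch directory, and resolves_from is a hypothetical helper name.

```shell
# Succeed if a relative path, interpreted from a given launch directory,
# points at a readable file.
resolves_from() {
    # $1 = directory where snakemake will be launched
    # $2 = path exactly as written in the yaml file
    ( cd "$1" 2>/dev/null && [ -r "$2" ] )
}

# From the Griffin folder:
#   resolves_from run_demo/griffin_GC_and_mappability_correction \
#       ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam && echo "path OK"
```

The subshell keeps the cd from changing your current directory.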
If you do NOT want the snakemake to use the default 8 CPUs, open the cluster_slurm.yaml (run_demo/griffin_GC_and_mappability_correction/config/cluster_slurm.yaml) and edit the ncpus parameter (line 18 and line 23) for the GC_counts step (other parameters in this file are not used unless launching to a slurm cluster). Increasing the CPUs will parallelize genomic regions to make the analysis run faster, as long as your computer has the CPUs available.
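
To pick a sensible ncpus value, you can compare the number you want with what the machine reports. This is a minimal sketch that assumes the nproc command from GNU coreutils is available (it is not installed by default on macOS); cap_cpus is a hypothetical helper name.

```shell
# Print the requested CPU count, capped at the number of CPUs
# that nproc reports for this machine.
cap_cpus() {
    local requested="$1"
    local available
    available=$(nproc)
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Example: cap_cpus 8 prints 8 on a machine with at least 8 CPUs,
# and the machine's CPU count otherwise.
```

The printed value is a reasonable upper bound for the ncpus parameter.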
Run the snakemake (expected runtime: ~15 minutes with 8 CPUs):
snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 -np #dry run to print a list of jobs
snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 #runs one job at a time
The outputs should be identical to the expected outputs in demo/griffin_GC_correction_demo_files/expected_results/:
Healthy_demo.GC_bias.txt md5sum: 29d34798c67edad2c371cedb94b3a8b8
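
One way to check your output against the checksum above is sketched here. It assumes the md5sum command is available, md5_matches is a hypothetical helper name, and the file path in the example is a placeholder for wherever your run wrote its output.

```shell
# Succeed if a file's MD5 checksum equals the expected hex digest.
md5_matches() {
    # $1 = file to check, $2 = expected digest
    local actual
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Example (replace path/to with the location of your output file):
#   md5_matches path/to/Healthy_demo.GC_bias.txt 29d34798c67edad2c371cedb94b3a8b8 \
#       && echo "matches expected output"
```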

Nucleosome profiling

Total time to run the griffin_nucleosome_profiling demo: ~15 minutes

If you haven't already activated the conda environment, activate it:
conda activate griffin_demo
Copy snakemakes/griffin_nucleosome_profiling/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
mkdir run_demo #if you haven't already made this directory
cp -r snakemakes/griffin_nucleosome_profiling run_demo
If you haven't already downloaded the reference genome (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref. The download may take a few minutes. If you already have a copy elsewhere, you can symlink it into the Ref folder instead:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
mv hg38.fa Ref/
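
If you already keep a copy of the reference genome elsewhere, the symlink option can be sketched as follows. link_into is a hypothetical helper name, and the source path in the example is a placeholder for your own copy.

```shell
# Symlink an existing file into a destination folder instead of copying it.
link_into() {
    # $1 = existing file, $2 = destination folder (for example Ref)
    local src="$1"
    local dest_dir="$2"
    local src_dir
    mkdir -p "$dest_dir"
    # Resolve the source to an absolute path so the link works
    # regardless of where the destination folder sits.
    src_dir=$(cd "$(dirname "$src")" && pwd)
    ln -s "$src_dir/$(basename "$src")" "$dest_dir/"
}

# Example (replace the first argument with the location of your copy):
#   link_into /path/to/your/hg38.fa Ref
```

The link keeps the original file name, so the demo's Ref/hg38.fa path still applies.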
If you haven't already downloaded the mappability track (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
mv k100.Umap.MultiTrackMappability.bw Ref/
If you haven't already converted the demo cram file to a bam file and indexed it, do so now (takes ~1 minute):
samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram

samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
Navigate to the folder with the snakefile:
cd run_demo/griffin_nucleosome_profiling/
Open the sites.yaml (run_demo/griffin_nucleosome_profiling/config/sites.yaml) and update the path to the demo sites file (if you have run the filter sites demo, you can use the path to your results instead):

site_lists:
  CTCF_demo: ../../demo/griffin_nucleosome_profiling_demo_files/sites/CTCF.hg38.1000.txt
Open the samples.GC.yaml (run_demo/griffin_nucleosome_profiling/config/samples.GC.yaml) and update the paths to the demo bam file and GC correction file (if you have run the GC correction demo, you can use the path to your results instead):

samples:
  Healthy_demo:
    bam: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
    GC_bias: ../../demo/griffin_GC_correction_demo_files/expected_results/Healthy_demo.GC_bias.txt
Run the snakemake (expected runtime: ~1 minute):
snakemake -s griffin_nucleosome_profiling.snakefile --cores 1 -np #dry run to print a list of jobs
snakemake -s griffin_nucleosome_profiling.snakefile --cores 1
The outputs should be identical to the expected outputs in demo/griffin_nucleosome_profiling_demo_files/expected_results/:
Healthy_demo.GC_corrected.coverage.tsv md5: bb7c6b730d44bf201f4091836aa970d4
Healthy_demo.uncorrected.coverage.tsv md5: 9ade7b4d069ba39b2d00a5f0609d41cd
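
As an alternative to comparing checksums by eye, the outputs can be compared byte for byte with the files in the expected results folder. This is a minimal sketch, not part of Griffin: compare_to_expected is a hypothetical helper name, and the first argument in the example is a placeholder for the folder where your run wrote its coverage files.

```shell
# Compare named files in an output folder with the same-named files
# in an expected-results folder, byte for byte.
# Returns non-zero if any file differs or is missing.
compare_to_expected() {
    # $1 = folder with your outputs, $2 = folder with expected outputs,
    # remaining arguments = file names to compare
    local out_dir="$1"
    local expected_dir="$2"
    local rc=0
    local name
    shift 2
    for name in "$@"; do
        if cmp -s "$out_dir/$name" "$expected_dir/$name"; then
            echo "identical: $name"
        else
            echo "DIFFERS:   $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example, from run_demo/griffin_nucleosome_profiling/:
#   compare_to_expected path/to/your/results \
#       ../../demo/griffin_nucleosome_profiling_demo_files/expected_results \
#       Healthy_demo.GC_corrected.coverage.tsv Healthy_demo.uncorrected.coverage.tsv
```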
