GitHub - bionanogenomics/virtual_karyotype_snakemake: This repository contains a Snakemake workflow for the generation and visualization of virtual karyotypes.

Bionano - Clinical Affairs Virtual Karyotype Workflow with Snakemake

This repository contains a Snakemake workflow for the generation and visualization of virtual karyotypes.

Overview

The workflow performs the following steps:

Run a virtual karyotype.
Generate plotting parameter files.
Visualize the virtual karyotype.
Rotate the ideogram images.
Merge rotated ideogram images.
Copy the input data to a separate location.
Package the results for easy sharing.

Requirements

Snakemake
Conda (for environment management) The specific Python, R, and tool dependencies are handled using Conda environments. For each rule, a conda environment is specified that will be automatically created and activated by Snakemake.

Conda installation

wget https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh
#Follow-installation prompts
bash Miniconda3-py311_23.5.2-0-Linux-x86_64.sh

Mamba installation

conda install -n base -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake

This will install snakemake into an isolated software environment, that has to be activated with

$ conda activate snakemake
$ snakemake --help

Repository setup

Clone this repository and initialize OMKar submodule (VK github repository)

git clone https://joeyestabrook@bitbucket.org/bionanoclinicalaffairs/virtual_karyotype_snakemake.git
git submodule init
git submodule update
cd OMKar 
git pull origin master

Modify the config.yaml: Specify the paths for the required resources (centro,cytoband,resolved_cytoband)

Parameters

centro

Description: Path to the file containing information on centromeres for the human genome (version hg38).

cytoband

Description: Path to the cytoband file. This file provides cytogenetic banding pattern information, which is essential for generating the virtual karyotype plots.

resolved_cytoband

Description: Path to the updated cytoband file, which contains 800 band resolution and is formatted to include gieStain and base cytoband ids.

basedir

Description: Base directory for the virtual karyotype analyses. This might be the directory where all the analysis outputs will be saved or where some primary resources are located.

Generating Input Files for `run_vk`` Rule

The input files required for the run_vk rule are derived from a de novo assembly. Here's how they are generated:

De Novo Assembly:

Before using this workflow, a de novo assembly should be carried out. The output from this assembly will provide essential files that will be used as inputs for the run_vk rule.

Directory Structure:

The relevant files from the de novo assembly can be found in the following directory structure:

Smap File: output/contigs/exp_refineFinal1_sv/merged_smaps/exp_refineFinal1_merged_filter_inversions.smap
Copy Number exp File: output/contigs/alignmolvref/copynumber/cnv_calls_exp.txt
Copy Number rcmap File: output/contigs/alignmolvref/copynumber/cnv_rcmap_exp.txt
Annotation Xmap File: output/contigs/annotation/exp_refineFinal1_merged.xmap

Please ensure that these files are in the directory structure outlined below. Once these files are in place, you can run the run_vk.

Prepare your input samples: The workflow assumes input samples are placed under the samples directory with the structure:

samples/
    sample1/
        exp_refineFinal1_merged_filter_inversions.smap
        cnv_calls_exp.txt
        cnv_rcmap_exp.txt
        exp_refineFinal1_merged.xmap
    sample2/
    ...

WORKFLOW OVERVIEW

RUNNING WORKFLOW

Do a dry-run of snakemake to ensure proper execution prior to launching the workflow.

screen

$(snakemake) snakemake -np --verbose

Execute snakemake workflow

snakemake -p --verbose --use-conda -c 1 &> snakemake_workflow.log

Detach screen

ctrl-a ctrl-d

Output

The primary output of this workflow includes:

Processed virtual karyotype results in the vk_results directory for each sample.
Packaged results for easy sharing in the packaged_results directory.

Logging

Logs for each rule are saved in the logs/scripts directory. They can be used for troubleshooting and monitoring the progress of the workflo

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
OMKar @ e54c88d		OMKar @ e54c88d
envs		envs
images		images
input		input
resources		resources
scripts		scripts
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
Snakefile		Snakefile
config.yaml		config.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bionano - Clinical Affairs Virtual Karyotype Workflow with Snakemake

Overview

Requirements

Repository setup

Parameters

Generating Input Files for `run_vk`` Rule

De Novo Assembly:

Directory Structure:

WORKFLOW OVERVIEW

RUNNING WORKFLOW

Output

Logging

About

Releases

Packages

Languages

bionanogenomics/virtual_karyotype_snakemake

Folders and files

Latest commit

History

Repository files navigation

Bionano - Clinical Affairs Virtual Karyotype Workflow with Snakemake

Overview

Requirements

Repository setup

Parameters

Generating Input Files for `run_vk`` Rule

De Novo Assembly:

Directory Structure:

WORKFLOW OVERVIEW

RUNNING WORKFLOW

Output

Logging

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages