kGWASflow

A modular, flexible, and reproducible Snakemake workflow to perform k-mers-based GWAS.

Summary

kGWASflow is a Snakemake pipeline for performing k-mers-based genome-wide association studies (GWAS), based on the kmersGWAS method developed by Voichek et al. (2020). It wraps that method in an easy-to-use, accessible workflow. The pipeline performs several pre-GWAS analyses, including read trimming, quality control, and k-mer counting. It also provides post-GWAS analyses, such as mapping k-mers to a reference genome, finding and mapping the source reads of k-mers, assembling source reads into contigs, and mapping those contigs to a reference genome. kGWASflow is highly customizable and offers multiple options to suit different needs.

More information and explanations on how to install, configure and run kGWASflow are provided in the kGWASflow Wiki.

kGWASflow is out on G3 Genes|Genomes|Genetics:

  • Corut, A. K., & Wallace, J. G. (2024). kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3: Genes, Genomes, Genetics, 14(1), jkad246. https://doi.org/10.1101/2023.07.10.548365


Installation

Installing via Bioconda

‼️This is the preferred method.‼️

This workflow requires conda. If conda is not installed yet, please follow the conda installation instructions first.

# Create a new conda environment with kgwasflow 
# and its dependencies
conda create -c bioconda -n kgwasflow kgwasflow

# Activate kGWASflow conda environment
conda activate kgwasflow

# Test kGWASflow
kgwasflow --help
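
If kgwasflow --help fails, it can help to confirm that the required commands are visible to your shell. The snippet below is a minimal sketch of such a check; have_cmd is a hypothetical helper written for this example and is not part of kGWASflow or conda.

```shell
# have_cmd is a hypothetical helper (not part of kGWASflow or conda).
# It succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which prerequisites are visible; this is purely informational.
for tool in conda kgwasflow; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If kgwasflow is reported as missing, the most likely cause is that the kgwasflow conda environment has not been activated in the current shell.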

Installing via GitHub (Alternative Method)

Alternatively, kGWASflow can be installed by cloning the GitHub repository.

# Clone this repository to your local machine
git clone https://github.com/akcorut/kGWASflow.git

# Change into the kGWASflow directory
cd kGWASflow

After cloning the GitHub repo, you can install snakemake and the other dependencies using conda as below:

# This assumes conda is installed
conda env create -f environment.yaml

# Activate kGWASflow conda environment
conda activate kGWASflow

Finally, you can install kGWASflow using the setup script as below:

# Install kgwasflow
python setup.py install

# Test kgwasflow
kgwasflow --help

Other Options:

Other ways to deploy this workflow are described in the Snakemake Workflow Catalog.


Configuration

Initializing kGWASflow

To configure kGWASflow, you first need to initialize a new kGWASflow working directory by following the steps below:

# Activating the conda environment
conda activate kgwasflow

# Initializing a new kgwasflow working dir
kgwasflow init --work-dir path/to/your/work_dir 

or

# Activating the conda environment
conda activate kgwasflow

# Change into your preferred working directory
cd path/to/your/work_dir 

# Initializing a new kgwasflow working dir
kgwasflow init

kGWASflow Working Directory Structure

This command will initialize a new kGWASflow working directory with the default configuration files. Below is the directory structure of the working directory:

path/to/your/work_dir 
├── config
│   ├── config.yaml
│   ├── phenos.tsv
│   └── samples.tsv
└── test

kGWASflow Configuration Files

Below are the configuration files generated by the kgwasflow init command:

  • config/config.yaml is a YAML file containing the workflow configuration.

  • config/samples.tsv is a TSV file containing the sample information.

  • config/phenos.tsv is a TSV file containing the phenotype information.

For more information about each configuration file, please see the kGWASflow Wiki.
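
Before editing the configuration, you may want to confirm that the three files listed above exist in the working directory. The following is a minimal sketch; check_workdir is a hypothetical helper written for this example and is not a kGWASflow command. It only checks that the files exist, not that their contents are valid.

```shell
# check_workdir is a hypothetical helper (not part of kGWASflow).
# It checks for the three configuration files that `kgwasflow init`
# is described as creating, and returns non-zero if any is missing.
check_workdir() {
    workdir="${1:-.}"   # default to the current directory
    missing=0
    for f in config/config.yaml config/samples.tsv config/phenos.tsv; do
        if [ ! -f "$workdir/$f" ]; then
            echo "missing: $workdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_workdir path/to/your/work_dir && echo "configuration files present" prints the message only when all three files are found.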


Usage

After completing the initialization step (kgwasflow init) and modifying the configuration files, kGWASflow can be run as follows:

# Activating the conda environment
conda activate kgwasflow

# Change into your preferred working directory
cd path/to/your/work_dir 

# Run kgwasflow
kgwasflow run -t 16

Run examples:

1. Run kGWASflow with the default config file, default arguments and 16 threads:

  kgwasflow run -t 16 --snake-default

2. Run kGWASflow with a custom config file and default settings:

  kgwasflow run -t 16 -c path/to/custom_config.yaml

3. Run kGWASflow with a user-defined working directory:

  kgwasflow run -t 16 --work-dir path/to/work_dir

4. Run kGWASflow in dry-run mode to see what tasks would be executed:

  kgwasflow run -t 16 -n

5. Run kGWASflow using mamba as the conda frontend:

  kgwasflow run -t 16 --conda-frontend mamba

6. Run kGWASflow and generate an HTML report:

  kgwasflow run -t 16 --generate-report

Information about how to use kGWASflow with Snakemake commands can be found in the Snakemake Workflow Catalog.
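
The dry-run and regular run examples above can be combined so that the real run starts only if the dry run succeeds. The following is a minimal sketch; dry_then_run is a hypothetical wrapper written for this example, and it uses only the run, -t, and -n options shown above.

```shell
# dry_then_run is a hypothetical wrapper (not part of kGWASflow).
# It performs a dry run first and starts the real run only if the
# dry run exits successfully.
dry_then_run() {
    threads="${1:-16}"   # default to 16 threads, as in the examples
    if kgwasflow run -t "$threads" -n; then
        kgwasflow run -t "$threads"
    else
        echo "dry run failed; not starting the real run" >&2
        return 1
    fi
}
```

Calling dry_then_run 16 from inside the working directory would then run the two steps in sequence.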


Testing

To test kGWASflow using an E. coli dataset (Earle et al. 2016, Rahman et al. 2018), first activate the conda environment:

# Activating the conda environment
conda activate kgwasflow

After activating the environment, you can perform a test run as explained below:

Test examples:

1. Run the kGWASflow test in dry-run mode to see what tasks would be executed:

   kgwasflow test -t 16 -n

2. Run the kGWASflow test using the test config file with 16 threads:

   kgwasflow test -t 16

3. Run the kGWASflow test and define the test working directory:

   kgwasflow test -t 16 --work-dir path/to/test_work_dir

Authors

kGWASflow was developed by Adnan Kivanc Corut.


Issues

Please report issues at https://github.com/akcorut/kGWASflow/issues

Contributions to the development of kGWASflow are welcome! Open a pull request to fix a bug or propose a new feature.



Citation

‼️kGWASflow is out now on G3: https://doi.org/10.1101/2023.07.10.548365 ‼️

If you use kGWASflow in your research, please cite it using the DOI https://doi.org/10.1101/2023.07.10.548365, along with the original method paper by Voichek et al. (2020):

  • Corut, A. K., & Wallace, J. G. (2024). kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3: Genes, Genomes, Genetics, 14(1), jkad246. https://doi.org/10.1101/2023.07.10.548365

  • Voichek, Y., & Weigel, D. (2020). Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nature Genetics, 52, 534–540. https://doi.org/10.1038/s41588-020-0612-7


License

kGWASflow is licensed under the MIT license.