Cycads

Description

Cycads is a tool for quality control & error profile analysis of long-read sequencing data.

Installation

git clone --depth 1 https://github.com/QYanwei/Cycads.git
conda env create --file Cycads/environment.yml --name cycads_env
conda activate cycads_env
cd Cycads && pip install .
cycads --help

Quick start

The example below generates an HTML report from test/ecoli.fq.gz:

cycads --fastq test/ecoli.fq.gz --output_dir test --sample_name fastq_output

Usage

FASTQ quality control

cycads --fastq test/ecoli.fq.gz --output_dir test --sample_name fastq_output
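
According to the help output further down, FASTQ analyses run on a random sample of reads (`--sample`, default 10000) drawn with a fixed seed (`--seed`, default 1). The sketch below illustrates the general idea of seeded, single-pass sampling using reservoir sampling. It is an illustration only: the function name `sample_reads` is hypothetical, and Cycads may sample reads in a different way.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_reads(reads: Iterable[T], n: int, seed: int = 1) -> List[T]:
    """Reservoir-sample up to n items in one pass (hypothetical helper).

    The same seed and the same input order always give the same sample.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n:
            # Fill the reservoir with the first n reads.
            reservoir.append(read)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir


# Usage: draw 10 read names out of 100 with the default seed.
read_names = [f"read{i}" for i in range(100)]
subset = sample_reads(read_names, n=10, seed=1)
```

Because the generator is seeded, rerunning with the same seed and input reproduces the same subset, which keeps repeated QC runs comparable.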

FASTQ filtering

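Users should set the filtering parameters (for example --min_base_quality and --min_length) to suit their data; any parameter left unset keeps the default shown in the help output.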

cycads --fastq test/ecoli.fq.gz --filter --output_dir test --sample_name fastq_output
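
To make the filtering options concrete, here is a minimal sketch of what trimming plus length and quality filtering could look like for a single read. The defaults mirror those in the help output (`--min_base_quality 7`, `--min_length 1`, `--max_length 1000000000`, no trimming). Several points are assumptions rather than documented Cycads behavior: the function names are hypothetical, trimming is applied before filtering, qualities use the Phred+33 encoding, and "mean base quality" is taken as the arithmetic mean of Phred scores.

```python
from typing import Optional, Tuple


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores decoded from a FASTQ quality string."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def trim_and_filter(
    seq: str,
    qual: str,
    min_base_quality: float = 7,
    min_length: int = 1,
    max_length: int = 1_000_000_000,
    trim_5_end: int = 0,
    trim_3_end: int = 0,
) -> Optional[Tuple[str, str]]:
    """Return the trimmed (seq, qual) pair, or None if the read is rejected."""
    # Assumption: trim first, then apply the length and quality thresholds.
    end = max(len(seq) - trim_3_end, trim_5_end)  # never slice backwards
    seq, qual = seq[trim_5_end:end], qual[trim_5_end:end]
    if not (min_length <= len(seq) <= max_length):
        return None
    if len(seq) == 0 or mean_phred(qual) < min_base_quality:
        return None
    return seq, qual


# Usage: 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
kept = trim_and_filter("ACGTACGT", "IIIIIIII", trim_5_end=2, trim_3_end=3)
dropped = trim_and_filter("ACGTACGT", "!!!!!!!!")
```

Averaging Phred scores directly is the simplest definition; some tools instead average the per-base error probabilities, which gives lower values for reads with a few very poor bases.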

FASTQ quality control and alignment-based error analysis

cycads --fastq test/ecoli.fq.gz --reference test/ecoli.reference.fasta --output_dir test --sample_name alignment_output

Alignment-based error analysis based on a pre-existing BAM file

cycads --bam test/test.bam --output_dir test --sample_name bam_output
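
The default minimap2 arguments listed in the help output include `--eqx`, which makes the aligner write `=` (match) and `X` (mismatch) CIGAR operations instead of a combined `M`. That is what makes per-read error counting from a BAM file straightforward. The sketch below shows the idea on a raw CIGAR string; the function name `cigar_error_profile` and the identity definition used here (matches divided by all aligned columns, gaps included) are assumptions for illustration, not a description of Cycads internals.

```python
import re
from collections import Counter
from typing import Dict, Union

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_error_profile(cigar: str) -> Dict[str, Union[int, float]]:
    """Count matched, mismatched, inserted and deleted bases in an --eqx CIGAR."""
    counts: Counter = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    matches, mismatches = counts["="], counts["X"]
    insertions, deletions = counts["I"], counts["D"]
    # Soft/hard clips (S/H) are ignored: they are not part of the alignment.
    columns = matches + mismatches + insertions + deletions
    return {
        "matches": matches,
        "mismatches": mismatches,
        "insertions": insertions,
        "deletions": deletions,
        "identity": matches / columns if columns else 0.0,
    }


# Usage: 5 soft-clipped bases, 90 matches, 3 mismatches, 2 inserted, 5 deleted.
profile = cigar_error_profile("5S90=3X2I5D")
```

Note that the counts are in bases, not events: a single 5-base deletion contributes 5 to `deletions`.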

Parameter details

                         === Cycads 0.4.0 ===
============================================================================
          Quality control & Data filtering & Error analysis
                          for Long-read sequencing
============================================================================

usage: cycads [-h] [-f FASTQ_PATH] [-b BAM_PATH] [-r REFERENCE_PATH] [-o OUTPUT_DIR] [-n SAMPLE_NAME] [-p PLATFORM] [-s N] [--seed SEED] [-T N] [-F] [-e N]
            [-Q MIN_BASE_QUALITY] [--min_length MIN_READ_LENGTH] [--max_length MAX_READ_LENGTH] [--trim_5_end N] [--trim_3_end N] [-d TARGET_DEPTH]
            [-g GENOME_SIZE] [--min_homopolymer_size MIN_HOMOPOLYMER_SIZE] [--max_homopolymer_size MAX_HOMOPOLYMER_SIZE]
            [--max_homopolymer_indel_size MAX_HOMOPOLYMER_INDEL_SIZE] [--alignment_threads THREADS] [--sort_threads THREADS] [--minimap2_arguments ARGUMENTS]
            [--minimap2 MINIMAP2] [--samtools SAMTOOLS] [--pyfastx PYFASTX]

A tool for quality control & error profile analysis of long-read sequencing data

options:
-h, --help            show this help message and exit

I/O:
Input/output arguments.

-f FASTQ_PATH, --fastq FASTQ_PATH
                      Input FASTQ file. Supported extensions include *.fastq and *.fastq.gz. (default: None)
-b BAM_PATH, --bam BAM_PATH
                      Input BAM file. (default: None)
-r REFERENCE_PATH, --reference REFERENCE_PATH
                      Reference FASTA file. (default: None)
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
                      Output directory. (default: cycads_output)
-n SAMPLE_NAME, --sample_name SAMPLE_NAME
                      Sample name displayed in output reports. (default: sample)
-p PLATFORM, --platform PLATFORM
                      Designed for CycloneSEQ data; also adapts to ONT and PB data. (default: cyclone)

FASTQ:
Arguments for FASTQ analyses. Only effective when FASTQ_PATH is supplied.

-s N, --sample N      Only include a random sample of N reads from the input FASTQ file to accelerate evaluation. (default: 10000)
--seed SEED           Random seed for sampling. (default: 1)
-T N, --check_terminal_bases N
                      Analyze N bases at both ends of each read. (default: 200)

Filtering:
Arguments for filtering the input FASTQ file. Only effective when FASTQ_PATH is supplied.

-F, --filter          Output filtered FASTQ file. Analyses are always based on the input FASTQ file. (default: False)
-e N, --extract N     Randomly extract N reads from the input FASTQ file. (default: None)
-Q MIN_BASE_QUALITY, --min_base_quality MIN_BASE_QUALITY
                      Remove reads with mean base quality less than MIN_BASE_QUALITY. (default: 7)
--min_length MIN_READ_LENGTH
                      Remove reads shorter than MIN_READ_LENGTH. (default: 1)
--max_length MAX_READ_LENGTH
                      Remove reads longer than MAX_READ_LENGTH. (default: 1000000000)
--trim_5_end N        Trim N bases from the 5' end of each read. (default: 0)
--trim_3_end N        Trim N bases from the 3' end of each read. (default: 0)
-d TARGET_DEPTH, --target_depth TARGET_DEPTH
                      Downsample FASTQ file to TARGET_DEPTH. Requires GENOME_SIZE to be supplied. (default: None)
-g GENOME_SIZE, --genome_size GENOME_SIZE
                      Genome size of sequenced sample. Required if TARGET_DEPTH is set. (default: None)

Homopolymers:
Arguments related to homopolymer analyses.

--min_homopolymer_size MIN_HOMOPOLYMER_SIZE
                      Do not analyze homopolymers shorter than MIN_HOMOPOLYMER_SIZE. (default: 2)
--max_homopolymer_size MAX_HOMOPOLYMER_SIZE
                      Do not analyze homopolymers longer than MAX_HOMOPOLYMER_SIZE. (default: 9)
--max_homopolymer_indel_size MAX_HOMOPOLYMER_INDEL_SIZE
                      Analyze homopolymer expansion/contraction up to MAX_HOMOPOLYMER_INDEL_SIZE. (default: 4)

Alignment:
Arguments for read alignment. Only effective when FASTQ_PATH and REFERENCE_PATH are supplied.

--alignment_threads THREADS
                      Number of threads used in read alignment. (default: 4)
--sort_threads THREADS
                      Number of threads used in sorting aligned segments. (default: 1)
--minimap2_arguments ARGUMENTS
                      Alignment arguments to be passed to minimap2. (default: -ax map-ont --secondary=no --MD --eqx -I 10G)

Dependencies:
Arguments for custom paths to external binary dependencies. Cycads searches for binary dependencies in the following order: 1. arguments specified here; 2.
the `dependencies` folder in Cycads installation path; 3. the system $PATH environmental variable.

--minimap2 MINIMAP2   Path to Minimap2. (default: None)
--samtools SAMTOOLS   Path to samtools. (default: None)
--pyfastx PYFASTX     Path to pyfastx. (default: None)
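
The `--target_depth` option requires `--genome_size` because sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below works through that arithmetic. The function names are hypothetical, the 4.6 Mb genome size is only an approximate figure for E. coli used for illustration, and how Cycads actually selects reads when downsampling is not described here.

```python
def bases_for_target_depth(target_depth: float, genome_size: int) -> int:
    """Number of bases needed to cover genome_size at target_depth."""
    if target_depth <= 0 or genome_size <= 0:
        raise ValueError("target_depth and genome_size must be positive")
    return int(target_depth * genome_size)


def downsample_fraction(total_bases: int, target_depth: float, genome_size: int) -> float:
    """Fraction of the input bases to keep; 1.0 if the input is already too shallow."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    needed = bases_for_target_depth(target_depth, genome_size)
    return min(1.0, needed / total_bases)


# Usage: a 4.6 Mb genome sequenced to about 100x, downsampled to 30x.
genome_size = 4_600_000          # illustrative value, not taken from the test data
total_bases = 100 * genome_size  # 460,000,000 bases in the input FASTQ
needed = bases_for_target_depth(30, genome_size)
fraction = downsample_fraction(total_bases, 30, genome_size)
```

When the input holds fewer bases than the target requires, the fraction is capped at 1.0, meaning every read would be kept.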

Example output

Result folder example:

.
├── aligned_reads.bam
├── aligned_reads.bam.bai
├── bam.pickle
├── fastq_summary.txt
├── fq.pickle
├── HTML_report
│   ├── query_all_error_item.barplot.png
│   ├── query_all_substitution_errors.barplot.png
│   ├── query_deletion_frequency.barplot.png
│   ├── query_events_curve_idy.displot.png
│   ├── query_homopolymer_length_event.lineplot.png
│   ├── query_insertion_frequency.barplot.png
│   ├── read_gc_histplot.barplot.png
│   ├── read_head_base_content.lineplot.png
│   ├── read_head_base_quality.lineplot.png
│   ├── read_homopolymer_frequency.lineplot.png
│   ├── read_length_biostat.barplot.png
│   ├── read_length_cumulative.barplot.png
│   ├── read_length_histplot_nolog.barplot.png
│   ├── read_length_quality_cross.scatterplot.png
│   ├── read_quality_histplot.barplot.png
│   ├── read_relative_position_avg_qual.lineplot.png
│   ├── read_tail_base_content.lineplot.png
│   ├── read_tail_base_quality.lineplot.png
│   └── summary.html
├── input.symlink.fastq.gz -> /path/to/test/ecoli.fq.gz
└── input.symlink.fastq.gz.fxi

1 directory, 26 files

Download the HTML_report folder and open summary.html in a web browser, such as Google Chrome, to view the results.
