czid_pipeline (formerly idseq_pipeline)

This repository has been deprecated and archived. It remains available for reproducibility purposes only.

For the latest versions of CZ ID data processing workflows, please see https://github.com/chanzuckerberg/czid-workflows.

Purpose

This is a CLI that allows you to execute the different data processing stages required in CZ ID.

Usage

To install after cloning:

$ pip install -e .

Developers

When merging a commit to master, you need to increase the version number in czid_pipeline/version/__init__.py:

- If results are expected to change, increase the 2nd number.
- If results are not expected to change, increase the 3rd number.
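
The rule above can be sketched as a small helper. This is only an illustration, not code from this repository: the function name `bump_version` and its `results_change` parameter are hypothetical, and it assumes the version is a plain string of three dot-separated integers. Resetting the 3rd number to 0 when the 2nd number increases is also an assumption, although it matches the release history below (for example, 1.7.5 was followed by 1.8.0).

```python
def bump_version(version: str, results_change: bool) -> str:
    """Return the next version string under the rule described above.

    Hypothetical helper: assumes `version` looks like "1.8.7".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if results_change:
        # Results are expected to change: increase the 2nd number.
        # Resetting the 3rd number to 0 is an assumption (see lead-in).
        return f"{major}.{minor + 1}.0"
    # Results are not expected to change: increase the 3rd number.
    return f"{major}.{minor}.{patch + 1}"
```

For example, `bump_version("1.8.7", results_change=False)` returns `"1.8.8"`, while `bump_version("1.8.7", results_change=True)` returns `"1.9.0"`.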

Release notes

1.8.7
- Bug fix for count_reads and non-host read counts.
1.8.4 ... 1.8.6
- Minor code quality, documentation, and logging improvements.
1.8.0 ... 1.8.3
- Upload a status file that indicates when a job has completed.
- Add a dedicated semaphore for S3 uploads.
- Code quality and documentation improvements.
- Restore capability to run non-host alignment from the development environment.
- Try a more relaxed LZW fraction if the initial filter leaves 0 reads.
1.7.2 ... 1.7.5
- General code style changes and code cleanup.
- Convert string exceptions and generic exceptions to RuntimeErrors.
- Change some print statements for python3.
- Add more documentation.
1.7.1
- Truncate enormous inputs to 75 million paired-end / 150 million unpaired reads.
- Support input fasta with pre-filtered host, e.g. project NID.
- Many operational improvements.
1.7.0
- Add capability to further filter out host reads by filtering all the hits from gsnapping host genomes (e.g. gsnap hg38/patron5 for humans).
1.6.1 ... 1.6.3
- Handle bogus 0-length alignments output by gsnap without crashing.
- Fix crash for reruns which reuse compatible results from a previous run.
- Fix crash for samples with unpaired reads.
- Improve hit calling performance.
1.6.0
- Fix fasta downloads broken by release 1.5.0, making sure only hits at the correct level are output in the deduped m8.
- Fix fasta download for samples with unpaired reads by eliminating merged fasta for those samples.
- Extend the partial fix in release 1.5.1 to repair more of the broken reports. Full fix requires rerun with updated webapp.
- Correctly aggregate counts for species with unclassified genera, such as genus-less species 1768803 from family 80864.
- Fix total count in samples with unpaired reads (no longer doubled).
- Fix crash when zero reads remain after host filtering.
- Fix bug in enforcing command timeouts that could lead to hangs.
- Fix performance regression in stage 2 (non-host alignment) introduced with 1.5.0.
- Deduplicate and simplify much of stage 2, and improve performance by parallelizing uploads and downloads.
1.5.1
- Fix bug introduced in 1.5.0 breaking samples with non-species-specific deuterostome hits.
1.5.0
- Identify hits that match multiple species within the same genus as "non species specific" hits to the genus.
1.4.0
- Version result folder.
1.3.0
- Fix bug causing alignment to run before host subtraction in samples with unpaired reads.
- Include ERCC gene counts from STAR.
1.2.0
- Synchronize pair order after STAR to improve sensitivity in 10% of samples with paired-end reads.
