Commit

rebrand
morsecodist committed Jan 13, 2022
1 parent a2720d7 commit 65f04c4
Showing 4 changed files with 30 additions and 30 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/cicd.yml
@@ -66,8 +66,8 @@ jobs:
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: target/release/idseq-dedup${{ matrix.os == 'windows-latest' && '.exe' || '' }}
asset_name: idseq-dedup-${{ runner.os }}
file: target/release/czid-dedup${{ matrix.os == 'windows-latest' && '.exe' || '' }}
asset_name: czid-dedup-${{ runner.os }}
tag: ${{ github.ref }}
overwrite: true

24 changes: 12 additions & 12 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,5 +1,5 @@
[package]
name = "idseq-dedup"
name = "czid-dedup"
version = "0.1.0"
authors = ["morsecodist <todd@morsecodist.io>"]
description = "Deduplicates reads from FASTA and FASTQ files"
30 changes: 15 additions & 15 deletions Readme.md
@@ -1,40 +1,40 @@
# idseq-dedup
# czid-dedup

![Rust](https://github.com/chanzuckerberg/idseq-dedup/workflows/CICD/badge.svg) [![codecov](https://codecov.io/gh/chanzuckerberg/idseq-dedup/branch/main/graph/badge.svg?token=LMcriTjfuH)](coverage) [![GitHub license](https://img.shields.io/badge/license-MIT-brightgreen.svg)](https://github.com/chanzuckerberg/idseq-web/blob/master/LICENSE) ![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)
![Rust](https://github.com/chanzuckerberg/czid-dedup/workflows/CICD/badge.svg) [![codecov](https://codecov.io/gh/chanzuckerberg/czid-dedup/branch/main/graph/badge.svg?token=LMcriTjfuH)](coverage) [![GitHub license](https://img.shields.io/badge/license-MIT-brightgreen.svg)](https://github.com/chanzuckerberg/czid-web/blob/master/LICENSE) ![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)

idseq-dedup reads single- or paired-end FASTA or FASTQ files and outputs versions of those files with duplicate reads removed. A duplicate read in this case is a read that is either identical to another read or shares a prefix of length `-l` with another read. Paired reads are only considered identical if both reads (or read prefixes, specified by `-l`) are duplicates to both reads in a previous pair.
czid-dedup reads single- or paired-end FASTA or FASTQ files and outputs versions of those files with duplicate reads removed. A duplicate read in this case is a read that is either identical to another read or shares a prefix of length `-l` with another read. Paired reads are only considered identical if both reads (or read prefixes, specified by `-l`) are duplicates to both reads in a previous pair.
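
The duplicate rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual Rust implementation: the names `dedup_key` and `dedup`, the `prefix_length` parameter (standing in for `-l`), and the use of an in-memory set are all hypothetical. It also assumes that a read shorter than the prefix length is compared in full, which the text above does not specify.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A record is a 1-tuple (single-end read) or a 2-tuple (paired-end reads)
# of sequence strings. This representation is an assumption for the sketch.
Record = Tuple[str, ...]


def dedup_key(record: Record, prefix_length: Optional[int] = None) -> Record:
    """Return the value two records must share to count as duplicates."""
    if prefix_length is None:
        # No prefix length given: compare whole sequences.
        return record
    # Compare only the first `prefix_length` bases of every read in the record.
    return tuple(seq[:prefix_length] for seq in record)


def dedup(records: Iterable[Record], prefix_length: Optional[int] = None) -> Iterator[Record]:
    """Yield each record whose key has not been seen before, in input order."""
    seen = set()
    for record in records:
        key = dedup_key(record, prefix_length)
        if key not in seen:
            seen.add(key)
            yield record
```

Because the key for a pair contains both reads, a pair is dropped only when both of its reads (or their prefixes) match both reads of an earlier pair, which mirrors the paired-end rule stated above.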

In addition to the de-duplicated FASTA or FASTQ outputs, idseq-dedup also outputs a cluster file which makes it possible to identify clusters of duplicate reads. The file lists the representative cluster read ID for each initial read ID, where the representative cluster read ID is the read ID that makes it into the output file. If a read is found to be a duplicate of a previous read, it will be filtered out of the FASTA/FASTQ output and paired with the read ID of the previous duplicate read in the cluster output file. Representative cluster read IDs are paired with themselves. The order of the input files is preserved. The representative read will always be the first read of its type.
In addition to the de-duplicated FASTA or FASTQ outputs, czid-dedup also outputs a cluster file which makes it possible to identify clusters of duplicate reads. The file lists the representative cluster read ID for each initial read ID, where the representative cluster read ID is the read ID that makes it into the output file. If a read is found to be a duplicate of a previous read, it will be filtered out of the FASTA/FASTQ output and paired with the read ID of the previous duplicate read in the cluster output file. Representative cluster read IDs are paired with themselves. The order of the input files is preserved. The representative read will always be the first read of its type.
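
The relationship between the de-duplicated output and the cluster file can be sketched as follows. This is a minimal Python illustration, not the tool's Rust code: the function name `cluster_reads`, the `(read_id, sequence)` input shape, and the `(representative_id, read_id)` row order are assumptions made for the example, and the real cluster file's column layout is not specified here.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def cluster_reads(
    reads: Iterable[Tuple[str, str]],
    prefix_length: Optional[int] = None,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (deduped reads, cluster rows) for (read_id, sequence) inputs."""
    representatives: Dict[str, str] = {}  # comparison key -> ID of first read with that key
    deduped: List[Tuple[str, str]] = []
    clusters: List[Tuple[str, str]] = []
    for read_id, sequence in reads:
        key = sequence if prefix_length is None else sequence[:prefix_length]
        if key not in representatives:
            # First read of its kind: it becomes the representative and is
            # kept in the de-duplicated output.
            representatives[key] = read_id
            deduped.append((read_id, sequence))
        # Every input read gets a row, so representatives pair with themselves
        # and duplicates pair with the earlier read they matched.
        clusters.append((representatives[key], read_id))
    return deduped, clusters
```

For the inputs `r1: ACGT`, `r2: TTTT`, `r3: ACGT`, this sketch keeps `r1` and `r2` and produces the rows `(r1, r1)`, `(r2, r2)`, `(r1, r3)`, in input order.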

FASTA/FASTQ parsing provided by [rust-bio](https://github.com/rust-bio/rust-bio).

## Installation

### Binary

We release binaries for Linux, MacOS, and Windows. To install one, download the appropriate binary for your operating system from one of our [releases](https://github.com/chanzuckerberg/idseq-dedup/releases/).
We release binaries for Linux, MacOS, and Windows. To install one, download the appropriate binary for your operating system from one of our [releases](https://github.com/chanzuckerberg/czid-dedup/releases/).

### From Source

1. [Install rust/cargo](https://www.rust-lang.org/tools/install) if you haven't already
1. `git clone https://github.com/chanzuckerberg/idseq-dedup.git`
1. `cd idseq-dedup`
1. `git clone https://github.com/chanzuckerberg/czid-dedup.git`
1. `cd czid-dedup`
1. `cargo build --release`
1. Your executable will be at `idseq-dedup/target/release/idseq-dedup` (with `.exe` if you're on windows)
1. Your executable will be at `czid-dedup/target/release/czid-dedup` (with `.exe` if you're on windows)

## Usage

Run:

```bash
idseq-dedup --help
czid-dedup --help
```

for usage information:

```
USAGE:
idseq-dedup [OPTIONS] --deduped-outputs <deduped-outputs>... --inputs <inputs>...
czid-dedup [OPTIONS] --deduped-outputs <deduped-outputs>... --inputs <inputs>...
FLAGS:
-h, --help Prints help information
@@ -52,19 +52,19 @@ OPTIONS:
Deduplicate a single-end FASTA:

```bash
idseq-dedup -i my-fasta.fasta -o my-deduped-fasta.fasta
czid-dedup -i my-fasta.fasta -o my-deduped-fasta.fasta
```

Deduplicate a single-end FASTQ (same as FASTA)

```bash
idseq-dedup -i my-fastq.fastq -o my-deduped-fastq.fastq
czid-dedup -i my-fastq.fastq -o my-deduped-fastq.fastq
```

Deduplicate paired-end reads (note, inputs are paired to outputs by order not name):

```bash
idseq-dedup \
czid-dedup \
-i my-fasta-r1.fasta \
-i my-fasta-r2.fasta \
-o my-deduped-fasta-r1.fasta \
@@ -74,12 +74,12 @@ idseq-dedup \
Deduplicate only considering a prefix of length `70`:

```bash
idseq-dedup -l 70 -i my-fasta.fasta -o my-deduped-fasta.fasta
czid-dedup -l 70 -i my-fasta.fasta -o my-deduped-fasta.fasta
```

Custom cluster file name:

```bash
idseq-dedup -i my-fasta.fasta -o my-deduped-fasta.fasta -c custom-cluster.csv
czid-dedup -i my-fasta.fasta -o my-deduped-fasta.fasta -c custom-cluster.csv
```
