# Fastq API Usage

## Import package

In [7]:
from pyBioTools import Fastq
from pyBioTools.common import jhelp

## Filter

In [8]:
jhelp(Fastq.Filter)

**Filter** (input_fn, output_fn, min_len, min_qual, remove_duplicates, qual_offset, verbose, quiet, progress, kwargs)

Filter fastq reads based on their length, mean quality and the presence of duplicates. Can also be used to concatenate reads from multiple files into a single file.

---

* **input_fn** (required) [list(str)]

Flexible input: a fastq file path, a directory containing fastq files, a list of files, a regex, or a list of regexes.

* **output_fn** (required) [str]

Destination fastq file. Automatically gzipped if the path ends with the .gz extension.

* **min_len** (default: None) [int]

Minimum read length

* **min_qual** (default: None) [float]

Minimum mean PHRED quality of a read

* **remove_duplicates** (default: False) [bool]

If True, duplicated reads with the same read id are discarded

* **qual_offset** (default: 33) [int]

Quality score offset. Nowadays pretty much everyone uses +33

* **verbose** (default: False) [bool]

* **quiet** (default: False) [bool]

* **progress** (default: False) [bool]

* **kwargs**
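
To make the roles of `min_len`, `min_qual`, `remove_duplicates` and `qual_offset` more concrete, here is a minimal pure-Python sketch of this kind of filter. It is not the pyBioTools implementation: the helper names `mean_phred` and `keep_read` are hypothetical, the mean quality is assumed to be the arithmetic mean of the per-base PHRED scores, and the order of the checks is also an assumption.

```python
def mean_phred(qual_str, qual_offset=33):
    """Arithmetic mean of the PHRED scores encoded in a FASTQ quality string."""
    if not qual_str:
        return 0.0
    # Each quality character encodes a score as ord(char) - offset
    return sum(ord(c) - qual_offset for c in qual_str) / len(qual_str)


def keep_read(read_id, seq, qual_str, seen_ids,
              min_len=None, min_qual=None,
              remove_duplicates=False, qual_offset=33):
    """Return True if a read passes the length, quality and duplicate checks.

    Assumption: this only mirrors the documented options, not the
    library's actual code or the order in which it applies the checks.
    """
    if min_len is not None and len(seq) < min_len:
        return False
    if min_qual is not None and mean_phred(qual_str, qual_offset) < min_qual:
        return False
    if remove_duplicates:
        if read_id in seen_ids:
            return False
        seen_ids.add(read_id)
    return True


seen = set()
print(mean_phred("IIII"))  # 'I' is ASCII 73, so each score is 73 - 33 = 40 -> 40.0
print(keep_read("r1", "ACGT", "IIII", seen, min_len=4, min_qual=7, remove_duplicates=True))  # True
print(keep_read("r1", "ACGT", "IIII", seen, min_len=4, min_qual=7, remove_duplicates=True))  # False (same id)
```

With the default `qual_offset` of 33, the character `!` maps to a score of 0 and `I` to a score of 40.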



### Basic usage

In [9]:
Fastq.Filter("./data/sample_1.fastq", "./output/sample_1_filtered.fastq", min_len=100, min_qual=7, remove_duplicates=True, verbose=True)

    ## Running Fastq Filter ##
        Parsing reads
        [DEBUG]: Reading file ./data/sample_1.fastq
        [DEBUG]: End of file ./data/sample_1.fastq
        Read counts summary
         total_reads: 12,000
         valid_reads: 10,882
         low_qual_reads: 643
         short_reads: 474
         source files: 1
         duplicate_reads: 1

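As a quick sanity check, the counts in the summary of this run can be added up. The sketch below only re-uses the numbers printed above; that every read falls into exactly one category is an observation from this output, not a documented guarantee.

```python
# Counts copied from the summary printed by the run above
summary = {
    "total_reads": 12_000,
    "valid_reads": 10_882,
    "low_qual_reads": 643,
    "short_reads": 474,
    "duplicate_reads": 1,
}

# Reads rejected by any of the three filters
discarded = (
    summary["low_qual_reads"]
    + summary["short_reads"]
    + summary["duplicate_reads"]
)
print(discarded)  # 1118
print(summary["valid_reads"] + discarded == summary["total_reads"])  # True
```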

### Read all fastq files from a directory and write to a compressed fastq

In [10]:
Fastq.Filter("./data/", "./output/sample_1_filtered.fastq.gz", min_len=100, min_qual=7, verbose=True)

    ## Running Fastq Filter ##
        Parsing reads
        [DEBUG]: Reading file ./data/sample_1.fastq
        [DEBUG]: End of file ./data/sample_1.fastq
        [DEBUG]: Reading file ./data/sample_2.fastq
        [DEBUG]: End of file ./data/sample_2.fastq
        Read counts summary
         total_reads: 24,000
         valid_reads: 21,809
         low_qual_reads: 1,304
         short_reads: 887
         source files: 2

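One way to check a compressed output like this is to read it back with the standard library and count the records. The helper below is a hypothetical sketch, not part of pyBioTools, and it assumes every record spans exactly four lines (no wrapped sequences).

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Use gzip for .gz paths, plain text otherwise
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


# Example call, using the output path from the cell above (the file must exist):
# count_fastq_reads("./output/sample_1_filtered.fastq.gz")
```

If the file was written by the call above, the returned count would be expected to match the `valid_reads` value from the summary.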