# Fastq API Usage

## Import package

In [8]:
from pyBioTools import Fastq
from pyBioTools.common import jhelp

## filter_reads

In [3]:
jhelp(Fastq.filter_reads)

**filter_reads** (src_fn, dest_fn, min_len, min_qual, remove_duplicates, qual_offset, kwargs)

Filter fastq reads based on their length, their mean quality and the presence of duplicates. Can also be used to concatenate reads from multiple files into a single file.

---

* **src_fn** (required) [str]

Fastq file path, directory containing fastq files, list of files, regex, or list of regex

* **dest_fn** (required) [str]

Path of the destination fastq file

* **min_len** (default: None) [int]

Minimum read length

* **min_qual** (default: None) [float]

Minimum mean PHRED quality of a read

* **remove_duplicates** (default: False) [bool]

If True, duplicated reads with the same read id are discarded

* **qual_offset** (default: 33) [int]

Quality scoring system offset. Nowadays nearly all fastq files use +33

* **kwargs**

Allows passing extra options such as verbose and quiet
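
To make `min_qual` and `qual_offset` more concrete, here is a minimal sketch of how a mean PHRED quality can be derived from a fastq quality string. The helper `mean_phred_quality` is hypothetical and not part of pyBioTools, and it assumes a plain arithmetic mean of the per-base scores; the library may compute the mean differently.

```python
def mean_phred_quality(qual_str, qual_offset=33):
    """Hypothetical helper: arithmetic mean of the PHRED scores encoded in qual_str."""
    if not qual_str:
        return 0.0
    # Each ASCII character encodes one score: score = ord(char) - offset
    scores = [ord(ch) - qual_offset for ch in qual_str]
    return sum(scores) / len(scores)

# 'I' is ASCII 73, so with the +33 offset every base scores 40
print(mean_phred_quality("IIII"))  # 40.0
# '!' is ASCII 33, so it encodes a score of 0
print(mean_phred_quality("!I"))    # 20.0
```

Under this assumption, a read would pass `min_qual=7` only if the value returned for its quality string is at least 7.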



### Basic usage

In [5]:
Fastq.filter_reads(src_fn="./data/sample_1.fastq", dest_fn="./output/sample_1_filtered.fastq", min_len=100, min_qual=7, remove_duplicates=True, verbose=True)

Parsing reads
Reads processed : 12000 reads [00:01, 6423.35 reads/s]

Read counts summary
	total_reads: 12,000
	valid_reads: 10,882
	low_qual_reads: 643
	short_reads: 474
	source files: 1
	duplicate_reads: 1
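
The counts in the summary add up to the total (10,882 + 643 + 474 + 1 = 12,000), which is consistent with each read being counted in exactly one category. The sketch below illustrates one way such a tally could work. The function `filter_records`, the order of the checks (length, then quality, then duplicates) and the arithmetic-mean quality are all assumptions made for illustration, not the actual pyBioTools implementation.

```python
from collections import Counter

def filter_records(records, min_len=None, min_qual=None,
                   remove_duplicates=False, qual_offset=33):
    """Hypothetical sketch: filter (read_id, seq, qual) tuples and tally the outcomes."""
    counts = Counter()
    seen_ids = set()
    kept = []
    for read_id, seq, qual in records:
        counts["total_reads"] += 1
        # Assumed check order: length first
        if min_len is not None and len(seq) < min_len:
            counts["short_reads"] += 1
            continue
        # Then mean quality (arithmetic mean of decoded scores, an assumption)
        mean_qual = sum(ord(c) - qual_offset for c in qual) / len(qual) if qual else 0.0
        if min_qual is not None and mean_qual < min_qual:
            counts["low_qual_reads"] += 1
            continue
        # Finally duplicates, identified by read id only
        if remove_duplicates:
            if read_id in seen_ids:
                counts["duplicate_reads"] += 1
                continue
            seen_ids.add(read_id)
        counts["valid_reads"] += 1
        kept.append((read_id, seq, qual))
    return kept, counts

records = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # passes every filter
    ("r2", "ACG", "III"),            # shorter than min_len
    ("r3", "ACGTACGT", "!!!!!!!!"),  # mean quality 0, below min_qual
    ("r1", "ACGTACGT", "IIIIIIII"),  # same read id as the first record
]
kept, counts = filter_records(records, min_len=5, min_qual=7, remove_duplicates=True)
print(dict(counts))
```

Because every rejected read hits a `continue`, the four categories are mutually exclusive and always sum to `total_reads`.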



### Read all fastq files from a directory and write to a compressed fastq

In [7]:
Fastq.filter_reads (src_fn="./data/", dest_fn="./output/sample_1_filtered.fastq.gz", min_len=100, min_qual=7, verbose=True)

Parsing reads
Reads processed : 24000 reads [00:32, 728.40 reads/s]

Read counts summary
	total_reads: 24,000
	valid_reads: 21,809
	low_qual_reads: 1,304
	short_reads: 887
	source files: 2
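
For readers curious about what merging a directory into a single compressed file involves, here is a rough standard-library sketch. The function `concat_fastq_dir` is hypothetical, it only matches files ending in `.fastq`, and it picks gzip output from the `.gz` suffix; how pyBioTools discovers source files and selects compression is not shown in this notebook and may differ.

```python
import glob
import gzip
import os
import tempfile

def concat_fastq_dir(src_dir, dest_fn):
    """Hypothetical sketch: concatenate every *.fastq file in src_dir into dest_fn."""
    src_files = sorted(glob.glob(os.path.join(src_dir, "*.fastq")))
    # Assumption: write gzip-compressed text when the destination ends with .gz
    opener = gzip.open if dest_fn.endswith(".gz") else open
    with opener(dest_fn, "wt") as dest:
        for fn in src_files:
            with open(fn) as src:
                for line in src:
                    dest.write(line)
    return len(src_files)

# Small self-contained demonstration using a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fastq", "b.fastq"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(f"@{name}\nACGT\n+\nIIII\n")
    out_fn = os.path.join(tmp, "merged.fastq.gz")
    n_files = concat_fastq_dir(tmp, out_fn)
    with gzip.open(out_fn, "rt") as fh:
        n_lines = sum(1 for _ in fh)

print(n_files, n_lines)  # 2 8
```

The sketch performs no filtering; it only shows the file discovery and compressed write steps.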

