Set of scripts for processing RNAseq data files.

Converts qseq sequence files to fastq files.

Usage: python3 convert_qseq_to_fastq -i [input qseq file] -o [output fastq file] [options]

Written in python 3.

Assumes qseq files are formatted like those from UCLA's BSCRC sequencing core: a tab-separated file whose columns are instrument name, run ID, lane number, tile number, x coordinate, y coordinate, index, end number, read, quality scores, and filter (1 = passes, 0 = fails).

Can take gzipped or uncompressed files as input.
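As an illustration of how gzipped and uncompressed input can be handled in one code path, here is a minimal sketch. The helper name `open_text` is hypothetical, and choosing the opener by the `.gz` suffix is an assumption; the script itself may detect compression differently.

```python
import gzip


def open_text(path, mode="r"):
    """Open a plain or gzipped file in text mode.

    Assumption: compression is inferred from a ".gz" suffix.
    """
    if str(path).endswith(".gz"):
        # gzip.open needs an explicit "t" to return str instead of bytes
        return gzip.open(path, mode + "t")
    return open(path, mode)
```

With a helper like this, the rest of a script can iterate over lines without caring whether the file is compressed.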

Can filter out reads that fail the Illumina quality filter (default), or keep them in the output (option --nofilter).
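The conversion of a single qseq line could look roughly like the sketch below. The function name `qseq_to_fastq`, the `keep_failed` parameter (standing in for --nofilter), and the layout of the fastq header line are assumptions for illustration and may not match what the script actually writes. The read is left unchanged, including any "." characters.

```python
def qseq_to_fastq(line, keep_failed=False):
    """Convert one qseq line to a 4-line fastq record.

    Returns None when the read fails the filter and keep_failed is False.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(fields)}")
    (instrument, run_id, lane, tile, x, y,
     index, end, read, quality, passed) = fields

    # Filter column: 1 = passes, 0 = fails
    if passed != "1" and not keep_failed:
        return None

    # Assumed header layout; the real script may name reads differently.
    header = f"@{instrument}_{run_id}:{lane}:{tile}:{x}:{y}#{index}/{end}"
    return f"{header}\n{read}\n+\n{quality}\n"
```

Calling `qseq_to_fastq(line, keep_failed=True)` mimics the behavior described for --nofilter, where failing reads stay in the output.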

Demultiplexes fastq files - reads with the same barcode sequence are output to the same fastq files.

Usage: python3 ['set_A' or 'set_B'] ['paired' or 'single'] [path to directory with files]

This works with paired-end or single-end reads, and gzipped or uncompressed files.

It will identify Illumina TruSeq LT set A or set B indices, depending on which you specify. (Info about barcode sequences here:

The script assumes names of fastq files are formatted like this:

s_*1*.fastq* file with read 1 sequence

s_*2*.fastq* file with barcodes

s_*3*.fastq* file with read 2 sequence (if paired-end)

The script tolerates one mismatched base per barcode: a read whose barcode differs from an index at a single position is written to the demultiplexed file for that index. Because barcode sequences from UCLA's BSCRC sequencing core are 7 bases long, the script will match the exact 7-base barcode, the barcode with a "." in the first position instead of the correct character (this seems to be fairly common), or the 7-base barcode with any one base incorrect. For example, all of these barcode sequences will match index 25:

ACTGATA (the actual barcode)

.CTGATA (the barcode with a "." in the first position)

ATTGATA (the barcode with one base mismatch in the second position)

The script will output files named after the index number: index_25.fastq (index25_read1.fastq and index25_read2.fastq if paired-end), as well as a file for the unmatched reads: NOMATCH.fastq.
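The one-mismatch matching rule described above can be sketched as follows. The function names `matches_barcode` and `assign_index` are hypothetical, and treating a "." in the first position as simply one mismatched base is an assumption about how the script handles it. Only index 25 is shown because it is the one barcode given in this README.

```python
def matches_barcode(observed, barcode):
    """True if observed equals barcode or differs at exactly one position."""
    if len(observed) != len(barcode):
        return False
    mismatches = sum(1 for a, b in zip(observed, barcode) if a != b)
    return mismatches <= 1


def assign_index(observed, barcodes):
    """Return the index number whose barcode matches observed, else None.

    barcodes maps index number -> 7-base barcode sequence.
    Reads matching no barcode, or more than one, are left unassigned.
    """
    hits = [idx for idx, seq in barcodes.items()
            if matches_barcode(observed, seq)]
    return hits[0] if len(hits) == 1 else None


barcodes = {25: "ACTGATA"}
for seq in ("ACTGATA", ".CTGATA", "ATTGATA"):
    print(seq, assign_index(seq, barcodes))
```

A read for which `assign_index` returns None would go to the unmatched output.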

Filters reads in fastq files by overall read quality. Keeps any reads that have at least the given percent of bases at or above the given quality.

Usage: python3 -i [input file] -o [output file] -q [quality cutoff] -p [percent]

So to keep reads with at least 80% of the bases at or above a score of 20, set -q to 20 and -p to 80.
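The per-read test can be sketched like this. The function name `passes_quality` and the `offset` parameter are hypothetical. The ASCII offset of the quality encoding is an assumption: 33 is the common fastq default, while files derived from older qseq output may use 64.

```python
def passes_quality(quality_string, min_quality, min_percent, offset=33):
    """True if at least min_percent of bases score min_quality or higher.

    offset is the ASCII offset of the quality encoding (assumed, see above).
    """
    if not quality_string:
        return False
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_quality)
    return 100.0 * good / len(quality_string) >= min_percent


# With offset 33, "5" encodes a score of 20 and "+" encodes a score of 10.
print(passes_quality("55555555++", 20, 80))  # 8 of 10 bases at Q20: True
print(passes_quality("5555555+++", 20, 80))  # 7 of 10 bases at Q20: False
```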


