This repository contains TAC-seq data analysis software.
Requirements:
- Linux-based OS
- FASTX-Toolkit
- Git
Use the following commands in a terminal to set up the TAC-seq data analysis software:
- Install FASTX-Toolkit
- Install Git
- Download the analysis software using Git:
git clone https://github.com/hindrek/TAC-seq-data-analysis
- Navigate to analysis location:
cd TAC-seq-data-analysis
- Make tacseq executable:
chmod +x tacseq
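After setup, it can help to confirm that the required tools are on the PATH. The sketch below is not part of the repository: missing_tools is a hypothetical helper, and it assumes that the FASTX-Toolkit installation provides the fastx_barcode_splitter.pl command (the FASTX Barcode Splitter on which the target file format is based). Which FASTX tools tacseq actually calls is not stated here.

```shell
#!/bin/sh
# Hypothetical helper (not part of tacseq): print every command name
# from the argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: FASTX-Toolkit provides fastx_barcode_splitter.pl.
missing=$(missing_tools git fastx_barcode_splitter.pl)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
else
    echo "All required tools found."
fi
```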
Analyze TAC-seq data.
options:
-h       display help and exit
commands:
prep     prepare samples (FASTQ files) for counting
count    count reads and molecules per sample and target
Prepare samples (FASTQ files) for counting.
mandatory:
-i    input file: gzip compressed/uncompressed FASTQ file or '-' as standard input (stdin)
-t    target file: target file format is based on FASTX Barcode Splitter barcode file format
-o    output directory
optional:
-h    display help and exit
-m    mismatches: number of allowed mismatches per target sequence (default: 5)
Count reads and molecules per sample and target.
mandatory:
-i    input directory: tacseq prep output directory
optional:
-h    display help and exit
-u    UMI threshold (default: 2)
The target file is a text file that contains a list of targets. Each line must contain a target ID (alphanumeric characters only) followed by the target sequence (only the characters A, C, G and T are allowed). The target ID and the sequence are separated by a TAB character.
Target file example:
TARGET1 TAGGATAGGTGGATTCGGGAACTCCCCGATAGTTTTGTCACATCGACATACTAA
TARGET2 CCAAAGCTTCAACGGACATAGTGTACATACCTACCGTGTTTCCCAGCACCTTCC
TARGET3 CTGCTGTTGCCGCCTGGGGTTTACGCGTGTTGGAGATTGAGTAGCCTCCTCGGC
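The format rules above can be checked before running tacseq prep. This is a minimal sketch, not part of the repository: validate_targets is a hypothetical helper, and it assumes that sequences are upper case and that blank lines are not allowed.

```shell
#!/bin/sh
# Hypothetical helper (not part of tacseq): check that every line of a
# target file is "<alphanumeric ID><TAB><sequence of A, C, G, T>".
# Prints one message per problem and returns non-zero if any was found.
validate_targets() {
    awk -F '\t' '
        NF != 2                { printf "line %d: expected 2 TAB-separated fields\n", NR; bad = 1; next }
        $1 !~ /^[A-Za-z0-9]+$/ { printf "line %d: target ID is not alphanumeric\n", NR; bad = 1 }
        $2 !~ /^[ACGT]+$/      { printf "line %d: sequence has characters other than A, C, G, T\n", NR; bad = 1 }
        END                    { exit bad }
    ' "$1"
}

# Usage:
#   validate_targets example/targets.txt && echo "target file looks valid"
```

A common failure is a file in which the ID and the sequence are separated by spaces instead of a TAB; the first rule reports such lines.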
tacseq prep outputs a directory for each sample with:
- 3 sub-directories with files for each target:
  - loci
  - umis
  - merged
- 2 intermediate files:
  - trimmed.fasta
  - umi_joined.fasta
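The layout listed above can be verified with a short script before counting. This is only a sketch: check_prep_output is a hypothetical helper, not part of tacseq, and it checks nothing beyond the presence of the five entries named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tacseq): report which of the expected
# entries are missing from a `tacseq prep` sample output directory.
# Returns non-zero if anything is missing.
check_prep_output() {
    sample_dir=$1
    rc=0
    for sub in loci umis merged; do
        [ -d "$sample_dir/$sub" ] || { echo "missing directory: $sub"; rc=1; }
    done
    for file in trimmed.fasta umi_joined.fasta; do
        [ -f "$sample_dir/$file" ] || { echo "missing file: $file"; rc=1; }
    done
    return "$rc"
}

# Usage:
#   check_prep_output example/output/sample1/
```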
tacseq count outputs read and molecule counts per target for each sample.
Example:
- Step 1 - prepare samples:
./tacseq prep -i example/samples/sample1/sample1.fastq.gz -t example/targets.txt -o example/output/sample1/ -m 5
- Step 2 - count molecules and write results to counts.tsv file:
./tacseq count -i example/output/sample1/ -u 2 > counts.tsv
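The steps above handle a single sample. For several samples, Step 1 could be wrapped in a loop. The sketch below assumes the layout <samples dir>/<sample>/<sample>.fastq.gz seen in the Step 1 path; prep_all and the TACSEQ variable are hypothetical names, not part of tacseq. Whether tacseq prep creates the output directory itself is not stated here, so the sketch leaves that to the tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of tacseq): run `tacseq prep` once for every
# <samples_dir>/<sample>/<sample>.fastq.gz file, writing to <out_dir>/<sample>/.
# Arguments: samples_dir targets_file out_dir [mismatches, default 5]
prep_all() {
    samples_dir=$1
    targets=$2
    out_dir=$3
    mismatches=${4:-5}
    for fastq in "$samples_dir"/*/*.fastq.gz; do
        [ -e "$fastq" ] || continue            # the glob matched nothing
        sample=$(basename "$fastq" .fastq.gz)
        # TACSEQ is a hypothetical override for the path to the tacseq script.
        "${TACSEQ:-./tacseq}" prep -i "$fastq" -t "$targets" \
            -o "$out_dir/$sample/" -m "$mismatches" || return 1
    done
}

# Usage:
#   prep_all example/samples example/targets.txt example/output
```

Each resulting sample directory can then be passed to tacseq count as in Step 2.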