
glossary.rst

Glossary

File formats

yaml

Language for serializing objects. Used in the CGAT testing framework (YAML).

bam

Format to store genomic alignments in a compressed format (BAM).

bed

File containing genomic intervals (BED).

vcf

Variant call format.

gtf

General transfer format. Format to store genes and transcripts.

gff

General feature format.

bigwig

Compressed format for displaying numerical values across genomic ranges (BIGWIG).

fasta

Sequence format.

wiggle

Format for displaying numerical values across genomic ranges (Wiggle).

psl

Genomic alignment format. The format is described in detail in its specification (PSL).

sam

Format to store genomic alignments (SAM).

gdl

gdl

tsv

Tab-separated values. In these tables, records are separated by newline characters and fields by tab characters. Lines starting with the # character are comments and are ignored. The first uncommented line should contain the column headers. For example:

    # This is a comment
    gene_id length
    gene1   1000
    gene2   2000
    # Another comment
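As a minimal sketch of how a table following these rules could be read in Python, the example below uses the standard csv module. The helper name read_tsv is hypothetical and is not part of CGAT; treating blank lines as ignorable is also an assumption.

```python
import csv
import io


def read_tsv(handle):
    """Yield one dict per record, skipping '#' comment lines.

    read_tsv is a hypothetical helper for illustration, not a CGAT function.
    """
    # Drop comment lines (and blank lines, by assumption) before parsing.
    lines = (line for line in handle
             if line.strip() and not line.startswith("#"))
    # The first uncommented line supplies the column headers.
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield row


example = io.StringIO(
    "# This is a comment\n"
    "gene_id\tlength\n"
    "gene1\t1000\n"
    "gene2\t2000\n"
    "# Another comment\n"
)
records = list(read_tsv(example))
```

All values are returned as strings; converting columns such as length to integers is left to the caller.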

svg

Scalable Vector Graphics, an XML-based vector image format.

edge list

pass

fastq

Sequence format containing quality scores.

sra

Sequence Read Archive format.

axt

axt

agp

AGP format

rdf

Resource description framework

Other terms

test directory

Directory that contains the test.yaml file together with the input and reference files for testing scripts.

experiment

experiment

replicate

replicate

graph

graph

track

track

submit host

pass

execution host

pass

task

pass

sphinxreport

sphinxreport

query

pass

target

pass

code directory

pass

go

Gene Ontology.

goslim

A reduced ("slim") subset of Gene Ontology terms.

tss

Transcription start site

production pipeline

A pipeline that performs common tasks on a certain type of data. The idea of a production pipeline is to provide common preprocessing of data and a first look. A project pipeline might then take data from one or more production pipelines to glean biological insight.

project pipeline

A pipeline that is project specific. Usually code is developed first inside a project pipeline. When it becomes generally useful, it may be refactored into a production pipeline.

stdin

Unix standard input. Most CGAT tools read data from stdin.

stdout

Unix standard output. Most CGAT tools output data to stdout.

stderr

Unix standard error. Error messages are written to stderr.
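The three streams can be illustrated with a small filter, sketched here under stated assumptions: the function run and its line-count message are hypothetical and do not reproduce any particular CGAT tool. Data is read from one stream, written to a second, and diagnostics go to a third.

```python
import io
import sys


def run(infile=sys.stdin, outfile=sys.stdout, errfile=sys.stderr):
    """Copy lines from infile to outfile and report a count on errfile.

    Hypothetical example; not taken from the CGAT code base.
    """
    count = 0
    for line in infile:
        outfile.write(line)
        count += 1
    # Diagnostics are kept separate from the data stream.
    errfile.write("processed %i lines\n" % count)
    return count


# In-memory streams stand in for stdin, stdout and stderr here.
source = io.StringIO("gene1\t1000\ngene2\t2000\n")
sink = io.StringIO()
errors = io.StringIO()
n_lines = run(source, sink, errors)
```

Called without arguments, run() would use the real streams, so a script built this way can sit in a shell pipeline with its input and output redirected.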

loglevel

Verbosity of logging information. The logging level is set with the --verbose option. A level of 0 means no logging output, 1 outputs informational messages only, and 2 additionally outputs debugging information.
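One way to picture the three levels is as a mapping onto Python's standard logging module. The sketch below is an assumption for illustration only: the function level_from_args, the default of 1, and the clamping of out-of-range values are hypothetical, and CGAT's own option handling may differ.

```python
import argparse
import logging

# Assumed mapping from the --verbose value to Python logging levels.
LEVELS = {
    0: logging.CRITICAL + 1,  # above every standard level: no output
    1: logging.INFO,          # informational messages only
    2: logging.DEBUG,         # informational and debugging messages
}


def level_from_args(argv):
    """Parse --verbose from argv and return the matching logging level."""
    parser = argparse.ArgumentParser()
    # The default of 1 is an assumption made for this sketch.
    parser.add_argument("-v", "--verbose", type=int, default=1)
    args = parser.parse_args(argv)
    # Out-of-range values are clamped to 0..2 (assumption).
    return LEVELS[max(0, min(args.verbose, 2))]


level = level_from_args(["--verbose", "2"])
logging.basicConfig(level=level)
```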