Formatting README
bioinfo-ut committed Sep 22, 2017
1 parent def7130 commit d6ecc55
Showing 2 changed files with 98 additions and 66 deletions.
58 changes: 46 additions & 12 deletions README.FastGT.md
@@ -11,27 +11,61 @@ FastGT has two binaries - gmer_counter and gmer_caller. Pre-compiled binaries ar

* Compilation
Change into the src subdirectory and type:
```
make gmer_counter
make gmer_caller
```

* Usage
First, prepare a database of specific k-mers for each allele of each polymorphism of interest.
K-mer databases are available from http://bioinfo.ut.ee/FastGT/

K-mers are counted from raw reads in FASTQ files using the program gmer_counter:
```
gmer_counter -db DATABASE FASTQ_FILE(S) > COUNTS_FILE.txt
```

Then the genotypes are called using the program gmer_caller:
```
gmer_caller COUNTS_FILE.txt > GENOTYPE_FILE.txt
```

The genotype file can be converted to VCF format:
```
generate_vcf.pl GENOTYPE_FILE.txt > GENOTYPE_FILE.vcf
```
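
The three steps above can be chained from a script. The following is a minimal sketch, assuming that `gmer_counter`, `gmer_caller` and `generate_vcf.pl` are on the `PATH` and executable; the function names and the output file naming (`PREFIX.counts.txt` and so on) are hypothetical choices for illustration, not part of FastGT.

```python
import subprocess


def build_steps(database, fastq_files, prefix):
    """Return (argv, output_path) pairs for the three steps described above.

    The output file names derived from `prefix` are an assumption of this sketch.
    """
    counts = f"{prefix}.counts.txt"
    genotypes = f"{prefix}.genotypes.txt"
    vcf = f"{prefix}.vcf"
    return [
        (["gmer_counter", "-db", database, *fastq_files], counts),
        (["gmer_caller", counts], genotypes),
        (["generate_vcf.pl", genotypes], vcf),
    ]


def run_steps(steps):
    """Run each step in order, redirecting its standard output to its output file."""
    for argv, output_path in steps:
        with open(output_path, "wb") as out:
            # check=True stops the pipeline as soon as one program fails
            subprocess.run(argv, stdout=out, check=True)
```

Calling `run_steps(build_steps("DATABASE", ["reads_1.fastq", "reads_2.fastq"], "sample"))` would then produce the counts, genotype and VCF files in sequence; building the command lists separately makes it easy to print them for inspection before anything is run.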

Additional options for gmer_counter:
```
-db DATABASE - SNP/KMER database file
-dbb DBBINARY - binary database file
-w FILENAME - write binary database to file
-32 - use 32-bit integers for counts (default 16-bit)
--max_kmers NUM - maximum number of kmers per node
--header - print header row
--total - print the total number of kmers per node
--unique - print the number of nonzero kmers per node
--kmers - print individual kmer counts (default if no other output)
--compile_index - Add read index to database and write it to file
--distribution NUM - print kmer distribution (up to given number)
--num_threads - number of worker threads (default 24)
--low_memory - optimize for low memory usage
-D - increase debug level
```

Additional options for gmer_caller:
```
--training_size NUM - Use NUM markers for training (default 100000)
--runs NUMBER - Perform NUMBER runs of model training (use 0 for no training)
--num_threads NUM - Use NUM threads (min 1, max 32, default 16)
--header - Print table header
--non_canonical - Output non-canonical genotypes
--prob_cutoff - probability cutoff for calling genotype (default 0)
--alternatives - Print probabilities of all alternative genotypes
--info - Print information about individual
--no_genotypes - Print only summary information, not actual genotypes
--model TYPE - Model type (full, diploid, haploid)
--params PARAMS - Model parameters (error, p0, p1, p2, coverage, size, size2)
--coverage NUM - Average coverage of reads
-D - increase debug level
```
106 changes: 52 additions & 54 deletions README.GenomeTester4.md
@@ -1,75 +1,73 @@
# GenomeTester4 package

GenomeTester4 is a toolkit for creating and manipulating k-mer lists. It
contains 3 programs: glistmaker, glistcompare and glistquery. It is
developed by the Department of Bioinformatics, University of Tartu, and
distributed under GPL version 3.0 (or later).

1. Quick usage for the impatient
```
glistmaker|glistcompare|glistquery -h
```
Prints out a quick description of the command-line arguments



```
glistmaker FASTA_FILES -w WORD_LENGTH -o OUTPUT_NAME

```
Generates a list of all unique k-mers in input files with their frequencies. The output file is
named OUTPUT_NAME_WORD_LENGTH_0_0.list
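
To make the idea of a k-mer list concrete, here is a minimal Python sketch of counting every word of a given length in a set of sequences. It is an illustration only, not glistmaker's implementation: it counts words on the forward strand as written, whereas the real tool may treat reverse complements or ambiguous bases differently. The function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequences, word_length):
    """Count every substring of length `word_length` in the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of size word_length along the sequence
        for i in range(len(seq) - word_length + 1):
            counts[seq[i:i + word_length]] += 1
    return counts


kmer_counts = count_kmers(["ACGTACGT"], 4)
for word, frequency in sorted(kmer_counts.items()):
    print(word, frequency)
```

In this toy input the word `ACGT` occurs twice and every other 4-mer once, which is the kind of word/frequency pairing a list file stores.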
```
glistquery LIST_FILE

```
Prints out all k-mers and their frequencies in the list file
```
glistquery LIST_FILE -q WORD

```
Prints out the frequency of the given word in the list file
```
glistquery LIST_FILE -f WORD_FILE

```
Prints out the list-file frequencies of all words in WORD_FILE
```
glistcompare LIST_FILE_1 LIST_FILE_2 --union -o OUTPUT_NAME

```
Generates the union of all words in both input lists. The frequencies in the final
list are the sums of both input frequencies.
```
glistcompare LIST_FILE_1 LIST_FILE_2 --intersection -o OUTPUT_NAME

```
Generates the intersection of words in both input lists. The frequency of each word
in the final list is the smaller of its frequencies in the two input lists
```
glistcompare LIST_FILE_1 LIST_FILE_2 --difference -o OUTPUT_FILE

```
Generates the complement (words in the first list and NOT in the second list) of
the two input lists. The frequencies in the final list are the original
frequencies in the first list.
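
The frequency rules of the three glistcompare modes described above can be sketched with plain Python dictionaries mapping each word to its frequency. This is a hedged illustration of the semantics as stated in this README, not the tool's actual code; the function names and the toy lists are hypothetical.

```python
def union(list_1, list_2):
    """All words from either list; frequencies are summed."""
    return {w: list_1.get(w, 0) + list_2.get(w, 0)
            for w in list_1.keys() | list_2.keys()}


def intersection(list_1, list_2):
    """Words present in both lists; the smaller frequency is kept."""
    return {w: min(list_1[w], list_2[w])
            for w in list_1.keys() & list_2.keys()}


def difference(list_1, list_2):
    """Words in the first list and not in the second; original frequency kept."""
    return {w: list_1[w] for w in list_1.keys() - list_2.keys()}


first = {"ACGT": 2, "CGTA": 1}
second = {"ACGT": 5, "TTTT": 3}

print(union(first, second))
print(intersection(first, second))
print(difference(first, second))
```

With these toy lists, `ACGT` ends up with frequency 7 in the union and 2 in the intersection, while the difference contains only `CGTA` with its original frequency of 1.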
2. Compiling
GenomeTester4 is written in standard C. The only external dependency should
be the pthreads library, which is standard on Linux systems.
Binaries compiled with full optimization are included in the directory "bin".
If you need to compile them manually for any reason, change into the
src subdirectory and type:
```
make clean
make
```
