# MentaLiST quick start

In [1]:
# Help: show all available commands
MentaLiST.jl -h

usage: MentaLiST.jl [-h]
                    {call|build_db|list_pubmlst|download_pubmlst|list_cgmlst|download_cgmlst}

commands:
  call              MLST caller, given a sample and a k-mer database.
  build_db          Build a MLST k-mer database, given a list of FASTA
                    files.
  list_pubmlst      List all available MLST schema from
                    www.pubmlst.org.
  download_pubmlst  Dowload a MLST scheme from pubmlst and build a
                    MLST k-mer database.
  list_cgmlst       List all available cgMLST schema from
                    www.cgmlst.org.
  download_cgmlst   Dowload a MLST scheme from cgmlst.org and build a
                    MLST k-mer database.

optional arguments:
  -h, --help        show this help message and exit



# Installing MLST schemes
MentaLiST can build a k-mer database from a custom scheme (your own FASTA files), or download a scheme from pubmlst.org or cgmlst.org and build the database in one step.

## List available pubmlst.org schemes

In [2]:
MentaLiST.jl list_pubmlst -h

usage: MentaLiST.jl list_pubmlst [-p PREFIX] [-h]

optional arguments:
  -p, --prefix PREFIX  Only list schema that starts with this prefix.
  -h, --help           show this help message and exit



In [3]:
MentaLiST.jl list_pubmlst -p Campylobacter

2017-07-19T15:20:59.773 - info: Downloading the MLST database xml file...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  110k  100  110k    0     0  16038      0  0:00:07  0:00:07 --:--:-- 16041
Campylobacter concisus/curvus  ID:23
Campylobacter fetus            ID:24
Campylobacter helveticus       ID:25
Campylobacter hyointestinalis  ID:26
Campylobacter insulaenigrae    ID:27
Campylobacter jejuni           ID:28
Campylobacter lanienae         ID:29
Campylobacter lari             ID:30
Campylobacter sputorum         ID:31
Campylobacter upsaliensis      ID:32
10 schema found.
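
The number after `ID:` appears to be the value that `download_pubmlst` expects for `-s` (the next section passes `-s 28` and gets Campylobacter jejuni). As a minimal sketch, the ID can be looked up by scheme name with awk. The `scheme_id` function is a hypothetical helper, not part of MentaLiST, and the sample input is a few lines copied from the listing above; it only handles the `list_pubmlst` layout.

```shell
# Sample lines copied from the list_pubmlst output above.
listing='Campylobacter hyointestinalis  ID:26
Campylobacter insulaenigrae    ID:27
Campylobacter jejuni           ID:28
Campylobacter lanienae         ID:29'

# scheme_id NAME: print the ID of the scheme whose name is exactly NAME.
# (Hypothetical helper for illustration, not a MentaLiST command.)
scheme_id() {
  awk -v name="$1" '
    {
      id = $NF                           # last field, e.g. "ID:28"
      sub(/^ID:/, "", id)                # keep only the number
      line = $0
      sub(/[ \t]+ID:[0-9]+$/, "", line)  # drop the ID column, leaving the name
      if (line == name) print id
    }'
}

printf '%s\n' "$listing" | scheme_id "Campylobacter jejuni"   # prints 28
```

In practice the function could read from `MentaLiST.jl list_pubmlst -p Campylobacter` instead of the sample text, assuming the listing is written to standard output.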


In [4]:
pwd

/projects/pathogist/pfeijao/MentaLiST/docs


## Install a pubmlst.org scheme

In [7]:
MentaLiST.jl download_pubmlst -k 31 -o Campy -s 28 --db Campy/mlst_db_31 

2017-07-18T11:56:19.043 - info: Searching for the scheme ... 
2017-07-18T11:56:19.251 - info: Downloading scheme for Campylobacter jejuni ... 
2017-07-18T11:56:19.254 - info: Downloading profile ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  35824      0  0:00:08  0:00:08 --:--:-- 36475
2017-07-18T11:56:27.852 - info: Downloading locus aspA ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  211k  100  211k    0     0  25617      0  0:00:08  0:00:08 --:--:-- 25617
2017-07-18T11:56:36.322 - info: Downloading locus glnA ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  297k  100  297k    0     0  35395      0  0:00

In [9]:
# The folder contains all the FASTA files and the profile for the scheme, plus the k-mer database file
# (mlst_db_31.jld in this example).
ls Campy

aspA.tfa           glnA.tfa  glyA.tfa        mlst_db_31.profile  tkt.tfa
campylobacter.txt  gltA.tfa  mlst_db_31.jld  pgm.tfa             uncA.tfa


## Install a custom scheme from FASTA files

In [60]:
MentaLiST.jl build_db -k 25 --db Campy/mlst_db_25 -p Campy/campylobacter.txt -f Campy/*.tfa

2017-07-18T14:10:58.704 - info: Opening FASTA files ... 
2017-07-18T14:11:00.142 - info: Combining results for each locus ...
2017-07-18T14:11:00.76 - info: Saving DB ...
2017-07-18T14:11:02.777 - info: Done!


## List available cgMLST schemes from cgmlst.org

In [52]:
MentaLiST.jl list_cgmlst

2017-07-18T14:03:37.335 - info: Downloading the cgmlist HTML to find schema...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4221    0  4221    0     0    551      0 --:--:--  0:00:07 --:--:--  1015
Clostridioides difficile       - ID:3560802
Enterococcus faecium           - ID:991893
Francisella tularensis         - ID:260204
Legionella pneumophila         - ID:1025099
Listeria monocytogenes         - ID:690488
Mycobacterium tuberculosis     - ID:741110
Staphylococcus aureus          - ID:141106
7 schema found.


## Download and install a cgMLST scheme from cgmlst.org

In [54]:
MentaLiST.jl download_cgmlst -o cgmlst/francisella -s 260204 -k 31 --db cgmlst/francisella/db_31

2017-07-18T14:04:30.784 - info: Downloading cgMLST scheme ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  478k    0  478k    0     0  66810      0 --:--:--  0:00:07 --:--:--  150k
2017-07-18T14:04:38.4 - info: Unzipping cgMLST scheme into individual FASTA files for each loci ...
......
2017-07-18T14:04:40.776 - info: 1147 loci found.
2017-07-18T14:04:40.777 - info: Building the k-mer database ...
2017-07-18T14:04:45.107 - info: Opening FASTA files ... 
2017-07-18T14:04:49.918 - info: Combining results for each locus ...
2017-07-18T14:04:55.585 - info: Saving DB ...
2017-07-18T14:04:58.782 - info: Done!


# Calling MLST alleles for a sample

In [13]:
# Help:
MentaLiST.jl call -h

usage: MentaLiST.jl call -o O -s S --db DB [-t T] [-q] [-e] [-j J]
                        [-h] files...

positional arguments:
  files       FastQ input files

optional arguments:
  -o O        Output file with MLST call
  -s S        Sample name
  --db DB     Kmer database
  -t T        A read of length L is discarded if it has at less than
              (L - k) * t hits to the same locus in the kmer database,
              where k is the kmer length. 0 <= t <= 1 (type: Float64,
              default: 0.2)
  -q          Quick filter; if middle kmer of a read are not in the
              kmer DB, the read is discarded. Disabled by default.
  -e          Use external kmc kmer counter. Disabled by default.
  -j J        Skip length between consecutive k-mers. Defaults to 1.
              (type: Int64, default: 1)
  -h, --help  show this help message and exit
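
To make the `-t` rule concrete, the threshold `(L - k) * t` from the help text can be computed for a few read lengths. This is only a worked-arithmetic sketch: the read lengths of 100 and 150 are assumed example values, and `min_hits` is a hypothetical helper, not a MentaLiST command.

```shell
# min_hits L K T: the threshold (L - k) * t described in the help text of `call`.
# A read with fewer hits than this to the same locus is discarded.
min_hits() {
  LC_ALL=C awk -v L="$1" -v k="$2" -v t="$3" 'BEGIN { print (L - k) * t }'
}

min_hits 100 31 0.2   # 100 bp read (assumed), k = 31, default t  -> 13.8
min_hits 150 31 0.2   # 150 bp read (assumed), k = 31, default t  -> 23.8
```

With `t = 0` the threshold is 0, so no read is discarded by this rule.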



In [38]:
MentaLiST.jl call -o campy_call.txt -s SRR5824107 --db Campy/mlst_db_31 ../data/SRR5824107.fastq.gz 

2017-07-18T13:49:18.03 - info: Opening kmer database ... 
2017-07-18T13:49:21.623 - info: Opening fastq file(s) ... 
2017-07-18T13:49:45.456 - info: Writing output ...
2017-07-18T13:49:46.061 - info: Done.
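
For more than one sample, the same command can be generated per FASTQ file. The sketch below is a dry run: it only prints the commands, using the flags shown above and naming each sample after its file. `print_call_cmd` is a hypothetical helper and `OTHER_SAMPLE` is a placeholder file name, not real data.

```shell
# print_call_cmd DB FASTQ: print (do not run) a `call` command for one FASTQ
# file, using the file name without ".fastq.gz" as the sample name.
# (Hypothetical helper for illustration.)
print_call_cmd() {
  call_db=$1
  call_fq=$2
  call_sample=$(basename "$call_fq" .fastq.gz)
  printf 'MentaLiST.jl call -o %s_call.txt -s %s --db %s %s\n' \
    "$call_sample" "$call_sample" "$call_db" "$call_fq"
}

# OTHER_SAMPLE is a placeholder; replace the list with your own files.
for fq in ../data/SRR5824107.fastq.gz ../data/OTHER_SAMPLE.fastq.gz; do
  print_call_cmd Campy/mlst_db_31 "$fq"
done
```

Once the printed commands look right, they can be pasted into the shell or piped to `sh`.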


In [39]:
# Results:
ls campy_call.*

campy_call.txt  campy_call.txt.ties.txt  campy_call.txt.votes.txt


In [50]:
# Allele calls and ST are in the campy_call.txt file:
column -ts $'\t' campy_call.txt

Sample      aspA  glnA  gltA  glyA  pgm  tkt  uncA  ST   clonal_complex
SRR5824107  2     17    2     3     2    1    5     883  ST-21 complex
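
Because the call file is tab-separated with a header row, a single column such as the ST can be pulled out by name for scripting. A minimal sketch, assuming the layout shown above; it recreates the two lines in a temporary file so that it runs on its own, and `call_field` is a hypothetical helper, not part of MentaLiST.

```shell
# Recreate the call file shown above (tab-separated) in a temporary file.
example=$(mktemp)
printf 'Sample\taspA\tglnA\tgltA\tglyA\tpgm\ttkt\tuncA\tST\tclonal_complex\n' >  "$example"
printf 'SRR5824107\t2\t17\t2\t3\t2\t1\t5\t883\tST-21 complex\n'               >> "$example"

# call_field FILE COLUMN: print "sample<TAB>value" for the named column.
call_field() {
  awk -F '\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # locate the column in the header
    c       { print $1 "\t" $c }                                      # data rows, only if the column exists
  ' "$1"
}

call_field "$example" ST               # prints: SRR5824107<TAB>883
call_field "$example" clonal_complex   # prints: SRR5824107<TAB>ST-21 complex
```

Looking the column up by name rather than by position keeps the helper usable for schemes with a different number of loci.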


In [41]:
# Detailed vote count for each allele:
cat campy_call.txt.votes.txt

Locus	Allele(votes),...
aspA	2(24534), 43(24248), 308(22910), 150(22535), 31(21781), 36(20619), 398(20418), 214(20050), 172(20043), 355(20028)
glnA	17(18752), 520(18064), 234(17676), 526(17676), 607(17505), 549(16828), 35(16016), 347(15669), 200(15619), 227(15613)
gltA	2(34362), 307(33256), 149(32771), 89(32739), 16(32739), 250(32601), 156(32601), 267(30422), 27(30086), 436(30080)
glyA	3(34886), 10(34601), 9(34525), 121(34240), 389(33866), 658(33411), 506(32894), 73(32482), 362(32057), 393(32035)
pgm	2(29174), 20(29033), 69(28510), 693(28261), 70(28041), 355(27996), 357(27690), 340(27633), 33(27324), 887(27285)
tkt	1(33722), 298(32902), 474(32844), 343(32754), 255(32662), 53(31980), 319(31980), 226(31878), 309(31776), 436(31776)
uncA	5(40636), 25(40289), 291(40167), 246(39801), 282(39697), 301(38655), 433(38150), 482(38013), 320(37226), 458(37225)
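
The votes file can also be summarized per locus, for example to see how far the called allele is ahead of the runner-up. A minimal sketch, assuming the alleles are listed in descending vote order as the output above suggests; the sample input is two lines from that output, shortened to three alleles each, and `top_two` is a hypothetical helper, not part of MentaLiST.

```shell
# Two loci from the votes output above, shortened to three alleles each.
votes=$(mktemp)
printf 'Locus\tAllele(votes),...\n'                  >  "$votes"
printf 'aspA\t2(24534), 43(24248), 308(22910)\n'     >> "$votes"
printf 'uncA\t5(40636), 25(40289), 291(40167)\n'     >> "$votes"

# top_two FILE: for each locus print locus, top allele, runner-up allele,
# and the difference in votes between them (tab-separated).
top_two() {
  awk -F '\t' '
    NR == 1 { next }                     # skip the header line
    {
      split($2, cand, /, */)             # "2(24534)", "43(24248)", ...
      split(cand[1], first,  /[()]/)     # first[1] = allele, first[2] = votes
      split(cand[2], second, /[()]/)
      printf "%s\t%s\t%s\t%d\n", $1, first[1], second[1], first[2] - second[2]
    }
  ' "$1"
}

top_two "$votes"
# aspA	2	43	286
# uncA	5	25	347
```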
