MetaTrass © BGI-Qingdao

Description:

MetaTrass is an abbreviation of Metagenomics Taxonomic Reads Assembly Single-Species.

MetaTrass is the first metagenome assembly tool implemented with a binning-first, assembly-later strategy, and it is freely available at https://github.com/BGI-Qingdao/MetaTrass.

MetaTrass integrates co-barcoding and reference genome information to generate high-quality genomes at the species level for human gut microbial communities.

MetaTrass uses the co-barcoding correlations between short reads to reduce false positives in conventional taxonomic binning and to improve the contiguity of draft assemblies.

Graphical abstract

Publication:

MetaTrass: A high‐quality metagenome assembler of the human gut microbiome by cobarcoding sequencing

If you find MetaTrass is useful in your research, please cite:

Yanwei Qi, Shengqiang Gu, Yue Zhang, Lidong Guo, Mengyang Xu, Xiaofang Cheng, Ou Wang, Ying Sun, Jianwei Chen, Xiaodong Fang, Xin Liu, Li Deng, and Guangyi Fan. 2022. MetaTrass: a high-quality metagenome assembler of the human gut microbiome by cobarcoding sequencing reads. iMeta e46. https://doi.org/10.1002/imt2.46

Change Log:

  • v1.3.0 (2022-05-27) - Uploaded a demo test dataset!
  • v1.2.0 (2021-08-03) - Finished a version test!
  • v1.1.0 (2021-07-09) - Fixed a few bugs
  • v1.0.0 (2021-04-13) - Initial release

Branch release:

Dependencies:

System requirement:

  • Python3 (version >3.0.0)

  • GCC (version >4.8.1)

Third-party software:

How to install:

  1. MetaTrass can be installed via git:
git clone https://github.com/BGI-Qingdao/MetaTrass.git
cd  ./MetaTrass/tools/  && g++ -std=c++11 TABrefiner.cpp -o TABrefiner
  2. You can either add MetaTrass's third-party dependencies to your system PATH or symlink their full paths into the folder MetaTrass/tools/, where MetaTrass can find them easily (see the sketch below).
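
For the second option, here is a minimal Python sketch of the symlink approach; the dependency names and install paths below are placeholders (kraken2 and Supernova are examples of third-party tools referenced later in this README), so adjust them to your own system:

# link_tools.py -- a minimal sketch of option 2: symlink third-party binaries
# into MetaTrass/tools/ so that MetaTrass can find them. The tool names and
# source paths are placeholders; replace them with the locations on your system.
from pathlib import Path

tools_dir = Path("./MetaTrass/tools")
dependencies = {                      # executable name -> installed path (placeholders)
    "kraken2": "/opt/kraken2/kraken2",
    "supernova": "/opt/supernova/supernova",
}

for name, source in dependencies.items():
    link = tools_dir / name
    if not link.exists():
        link.symlink_to(source)       # creates MetaTrass/tools/<name> -> <source>
        print(f"linked {link} -> {source}")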

Quick start with a demo dataset

Get the final assemblies for the demo dataset:

python="/path-to-your/python3"

# download the UHGG kraken dataset 
wget -r -np -nH -erobots=off http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v1.0/uhgg_kraken2-db/ 
# configure the reference database
$python ./MetaTrass/tool/fa_split_by_taxid.py -reffna ./kraken2-db/library/library*.fna -outdir ./uhgg_kraken2-fa/
$python ./MetaTrass/tool/ref_genome_size.py -refdir ./uhgg_kraken2-fa/
# assemble all species
rawfq1=./MetaTrass/Test/dir1_cleandata/split_reads.1.fq.gz.clean.gz
rawfq2=./MetaTrass/Test/dir1_cleandata/split_reads.2.fq.gz.clean.gz

sample=Demo                # sample name / output file prefix used by the TB step
outdir=./metatrass_demo    # choose any working directory for the outputs
output=$outdir/Demo
mkdir -p $output

Trass="./MetaTrass/Trass.py"
ref_db="./uhgg_kraken2-db/"
ref_fa="./uhgg_kraken2-fa/"
ref_gz="./uhgg_kraken2-fa/ref_genome_size.txt"

$python $Trass GC -rawfq1 $rawfq1 -rawfq2 $rawfq2 -outdir $output -runnow no

$python $Trass TB -cleanfq1 $output/dir1_cleandata/split_reads.1.fq.gz.clean.gz \
                  -cleanfq2 $output/dir1_cleandata/split_reads.2.fq.gz.clean.gz \
		   -thread 10 -sample $sample -ref_db $ref_db -genome_size $ref_gz -outdir $output -runnow yes
		   
$python $Trass AP -outdir $output -ref_fa $ref_fa -thread 10 -parallel 10 -runnow yes 

Usage 0.1: Configure the reference database:

  1. The kraken2 reference database is a folder containing a pre-built database, including the required hash.k2d, opts.k2d, and taxo.k2d files.

    • For Human Gut:
      We recommend the UHGG taxonomy database which can be downloaded from MGnify Genomes.
      uhgg_kraken2-db/ catalog
    			├── [472K]  database100mers.kmer_distrib  
    			├── [441K]  database150mers.kmer_distrib  
    			├── [421K]  database200mers.kmer_distrib  
    			├── [403K]  database250mers.kmer_distrib  
    			├── [540K]  database50mers.kmer_distrib  
    			├── [4.0K]  library    
    			│   ├── [ 11G]  library.fna  
    			│   └── [ 31M]  prelim_map.txt  
    			├── [ 16G]	hash.k2d
    			├── [  48]	opts.k2d  
    			├── [473K]	taxo.k2d  
    			└── [4.0K]  taxonomy  
    			    ├── [310K]  names.dmp  
    			    ├── [127K]  nodes.dmp  
    			    └── [ 31M]  prelim_map.txt  
    
    • For Zymo Community Standards 10 Mock:
      You can download the reference database from Mock Community

    • Or for a customized microbiome grouping:
      Please look up the official NCBI species taxonomic IDs and add them according to the NCBI taxonomy.
      To build a reliable species tree, please use non-redundant genomes as the references. ⚠️

  2. The reference genomes used for refining the contigs; these should be kept consistent with the reference database.

    • Split library.fna, which can be found in uhgg_kraken2-db/library/ (see above), into single-species fasta files. You can use the script MetaTrass/tool/fa_split_by_taxid.py (see also the illustrative sketch after this list).

      python3 /path/to/MetaTrass/tool/fa_split_by_taxid.py -reffna /path/to/kraken2-db/library/library*.fna -outdir /path/to/single-genome-fa/ 
      
    • If you already have single-species fasta files, please name each file taxid_genomic.fa, e.g. 1104_genomic.fa.

  3. The reference genome size information table ⚠️

    • Get each single species' genome size and write it to a configuration file with two columns. For example:

      taxid genome size(bp)
      1000 2176737
      1001 6092914
      1002 3864718
      1006 2741290
      1007 3003526
      1008 1739566
      1009 1809170
      1010 4528241
      1011 2149385
      1013 2260187
    • Please note that the genome size table file has no header line. ⚠️ ⚠️ ⚠️

    • Please refer to the table in MetaTrass/Config/all_single_species_genome_size.uhgg.txt.

    • You can use the script MetaTrass/tool/ref_genome_size.py to collect each species' genome size and generate the table above (see also the illustrative sketch after this list).

      python3 /path/to/MetaTrass/tool/ref_genome_size.py -refdir /path/to/single-genome-fa/ 
      
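As a rough illustration of what these two configuration steps produce (the bundled scripts fa_split_by_taxid.py and ref_genome_size.py remain the supported tools), the sketch below splits library.fna into per-taxid fasta files and writes the two-column genome size table. It assumes kraken-style "kraken:taxid|NNNN" tags in the fasta headers; adapt the regular expression if your headers differ.

# prepare_refs_sketch.py -- illustrative only; use the bundled scripts for real runs.
import os
import re
from collections import defaultdict

def split_and_size(reffna, outdir):
    """Write one <taxid>_genomic.fa per species and a tab-separated genome size table."""
    os.makedirs(outdir, exist_ok=True)
    sizes = defaultdict(int)   # taxid -> summed sequence length (bp)
    handles = {}               # taxid -> open per-species fasta handle
    taxid = None
    with open(reffna) as fin:
        for line in fin:
            if line.startswith(">"):
                match = re.search(r"kraken:taxid\|(\d+)", line)   # assumed header tag
                taxid = match.group(1) if match else "unknown"
                if taxid not in handles:
                    handles[taxid] = open(os.path.join(outdir, f"{taxid}_genomic.fa"), "w")
            else:
                sizes[taxid] += len(line.strip())
            handles[taxid].write(line)
    for handle in handles.values():
        handle.close()
    # two tab-separated columns, no header line (as required by -genome_size)
    with open(os.path.join(outdir, "ref_genome_size.txt"), "w") as out:
        for taxid, size in sorted(sizes.items()):
            out.write(f"{taxid}\t{size}\n")

if __name__ == "__main__":
    split_and_size("./uhgg_kraken2-db/library/library.fna", "./uhgg_kraken2-fa/")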

Usage 0.2: Assemble all species.

We provide flexible commands and detailed usage for users.
Usages:

$ python /path/to/software/MetaTrass/Trass.py -h

            ...:::=== MetaTrass v1.20.0 ===:::...
    =================================================
               Metagenomic Taxonomic Reads
                 Assembly Single-Species
    =================================================
    Combination modules:
       GC              ->  Get stLFR Cleandata. 
                           GC is the combination of SplitBarcode and GetCleandata commands.
       TB              ->  Taxonomic Reads And Co-Barcoding Reads Refining (TABrefiner)
                           TB is the combination of Kraken2Taxon, TXACBrefiner, and ReadID2Fastq commands.
       AP              ->  Single-species Assembly and Contigs Purifying
                           AP is the combination of MetaAssembly and ContigPurify commands

    Independent command :
       SplitBarcode    ->  Convert barcode sequences to digital code
       GetCleandata    ->  Clean data filtered by SOAPfilter
       Kraken2Taxon    ->  Total taxonomic reads using references database by Kraken
       TXACBrefiner    ->  Refining read IDs using taxonomic information and superior co-barcoding sets
       ReadID2Fastq    ->  Convert the refined read IDs from the total fastq file to each species
       MetaAssembly    ->  Co-barcoding genome assembly using SUPERNOVA
       ContigPurify    ->  Purify the initial assembly sequences to generate final MAGs based on the references

   # for command specific help info
       python3 Trass.py GC -h
       python3 Trass.py TB -h
       python3 Trass.py AP -h

       python3 Trass.py SplitBarcode -h
       python3 Trass.py GetCleandata -h
       python3 Trass.py Kraken2Taxon -h
       python3 Trass.py TXACBrefiner -h
       python3 Trass.py ReadID2Fastq -h
       python3 Trass.py MetaAssembly -h
       python3 Trass.py ContigPurify -h

Usage for each combination module:

    1. Getting CleanData
     > python Trass.py GC -h
     usage: Trass.py GC [-h] -rawfq1 RAWFQ1 -rawfq2 RAWFQ2 [-thread THREAD] -outdir OUTDIR [-runnow RUNNOW]
    
     Get stLFR Clean data
    
     optional arguments:
       -h, --help      show this help message and exit
       -rawfq1 RAWFQ1  Paired-end data: raw 1 fastq.gz
       -rawfq2 RAWFQ2  Paired-end data: raw 2 fastq.gz
       -thread THREAD  the number of threads
       -outdir OUTDIR  Output folder
       -runnow RUNNOW  Run this script immediately
    
    
    2. Taxonomic Reads And Co-Barcoding Reads Refining (TABrefiner)
     > python Trass.py TB -h
     usage: Trass.py TB [-h] -cleanfq1 CLEANFQ1 -cleanfq2 CLEANFQ2 [-thread THREAD] [-parallel PARALLEL] -sample SAMPLE -ref_db REF_DB -genome_size GENOME_SIZE [-max_depth MAX_DEPTH] [-min_depth MIN_DEPTH] [-pe_length PE_LENGTH] -outdir OUTDIR [-runnow RUNNOW]
    
     Taxonomic and Barcoding
    
     optional arguments:
       -h, --help            show this help message and exit
       -cleanfq1 CLEANFQ1    Paired-end data: cleanfq1 fastq.gz
       -cleanfq2 CLEANFQ2    Paired-end data: cleanfq2 fastq.gz
       -thread THREAD        Kraken parameter
       -parallel PARALLEL    The number of parallel species
       -sample SAMPLE        Output FileName Prefix
       -ref_db REF_DB        Taxonomy references database
       -genome_size GENOME_SIZE
     			Reference genome size table file
       -max_depth MAX_DEPTH  Species Maximum-Depth Required Assembly
       -min_depth MIN_DEPTH  Species Minimum-Depth Required Assembly
       -pe_length PE_LENGTH  PE read length of sequencing data
       -outdir OUTDIR        Output folder
       -runnow RUNNOW        Run this script immediately
    
    
    3. Single-species Assembly and Contigs Purifying
     > python Trass.py AP -h
     usage: Trass.py AP [-h] [-maprate MAPRATE] [-memory MEMORY] [-maxreads MAXREADS] [-pairdepth PAIRDEPTH] [-PCT PCT] [-IDY IDY] -ref_fa REF_FA [-thread THREAD]
     [-parallel PARALLEL] [-max_depth MAX_DEPTH] [-min_depth MIN_DEPTH] -outdir OUTDIR [-runnow RUNNOW]
    
     Assembly and Purifying
    
     optional arguments:
       -h, --help            show this help message and exit
       -maprate MAPRATE      mapping ratio (default = 8)
       -memory MEMORY        memory to use (GB, default = 150)
       -maxreads MAXREADS    maximum reads for supernova (default = 2140000000)
       -pairdepth PAIRDEPTH  filter barcodes with fewer than X read pairs (default = 2)
       -PCT PCT              Threshold of contig length (0-1)
       -IDY IDY              Threshold of IDY (80 - 100)
       -ref_fa REF_FA        Taxonomic reference genome fasta folder
       -thread THREAD        The number of assembly thread of each species
       -parallel PARALLEL    The number of parallel assembly of single species
       -max_depth MAX_DEPTH  Species Maximum-depth required assembly
       -min_depth MIN_DEPTH  Species Minimum-depth required assembly
       -outdir OUTDIR        Output folder
       -runnow RUNNOW        Run this script immediately
    
    

Input Sequencing files:

  1. For stLFR sequencing data

      1. Raw data

      If you are using raw data, please split the barcodes first to obtain clean data. We provide the MetaTrass GC module to generate the clean data:

      python3 /path/to/MetaTrass/Trass.py GC -rawfq1 rawfq.1.fq.gz -rawfq2 rawfq.2.fq.gz -outdir /path/to/output/ -runnow yes 
      
      
      2. Clean data

      If you already have clean data, please run MetaTrass's TB and AP steps directly.

  2. For 10X Chromium sequencing data

      1. Raw data:

      If you have 10X Chromium data, please convert them to stLFR format.

      2. Clean data: e.g. a published dataset.
        We use the Athena MOCK20 sequencing data (ATCC MOCK20 10X data) as an example.

      Converting 10X data

      • Read1 of 10X reads: SRR6760785_1.fastq.gz

      The barcode sequence is embedded in the read name of read1

      @SRR6760785.142 K00374:82:hkjyvbbxx:5:1113:28209:40702 length=112
      GCCACCTCGGCGATCTCGGGATCGACATGCGCGGTCTCGATGGTGAAGTCGGGCCCTTCCGGCAGCAGGTAGCCGATCTCGCGCAGGAAGGCGGTGTACTCTCTAAGGCTCG
      +SRR6760785.142 K00374:82:hkjyvbbxx:5:1113:28209:40702 length=112
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
      @SRR6760785.143 K00374:82:hkjyvbbxx:5:2220:18690:47172_AAACACCAGACAATAC length=111
      CAAGCAAACAGAAGAATGAACAAAATGTGCCTGCTTGTCGGACAAAGGTCTCAATATGCTATTCCCTTTATCTTAAATCTATTTTATCACGGCTCCCTATGTGGCATCCAA
      +SRR6760785.143 K00374:82:hkjyvbbxx:5:2220:18690:47172_AAACACCAGACAATAC length=111
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJFAJ
      @SRR6760785.144 K00374:82:hkjyvbbxx:5:1212:15016:44078_AAACACCAGCGATATA length=111
      TCCTCAACGAAGTGCGGGATCTGAAAGCCAAGAATTTCCGCGAGGTAACGCTGCTGGGGCAGAATGTAAACTCCTACCGATACGAACAAAACGGGCGGATCATCCGCTTCC
      +SRR6760785.144 K00374:82:hkjyvbbxx:5:1212:15016:44078_AAACACCAGCGATATA length=111
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJ
      
      • Read2 of 10X reads: SRR6760785_2.fastq.gz

      There is no barcode information within read2

      @SRR6760785.142 K00374:82:hkjyvbbxx:5:1113:28209:40702 length=132
      CAGGGGTTGCAGGTGGATGTACGTCTGGCCGAGTTCGTGGCTCGCGAGGCGCTGCCGGGCACCGGTGTCAGCGAGGAAGCCTTCTGGGGAGGCCTCGCCGACACGGTGCGTACTCTCGGGCCGCGGAACCGC
      +SRR6760785.142 K00374:82:hkjyvbbxx:5:1113:28209:40702 length=132
      FFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF<FJJFJJJAJJJJFJJJJJJJJJJJFJFFJJJJJFAJJJJJJJJJJJAAFAFJFJJJJJJJFJJJJJAJFJJJJJJJJJJJ<AJJJJJJFJFFJJJJJJ
      @SRR6760785.143 K00374:82:hkjyvbbxx:5:2220:18690:47172_AAACACCAGACAATAC length=133
      CTGCACACGCTTTTCTATTTCCTGAAAGTTATATTCCATAATCCCTGATTTGTTCGATATATTAACTACTGTTTCTTGCCACAATATTGAAAAAAGCAGTCAGCAAACACACTAACTGCTTTATTTTCAGTTT
      +SRR6760785.143 K00374:82:hkjyvbbxx:5:2220:18690:47172_AAACACCAGACAATAC length=133
      FFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJAJJJJJJJFFJJJJJJJJJJJJJJJJJFJJJJAFJJJJJJJFJFFFFJJFJJJJAA7F<AAFFFJJJFJJFJJJJ<FJJA---<FJ
      @SRR6760785.144 K00374:82:hkjyvbbxx:5:1212:15016:44078_AAACACCAGCGATATA length=100
      TGTCCCTCTTGTTGCTCTCCTCGCTCAGGCGATTCTGGAGGGCTATCATCCGATCCAGACGGGCGAGCTTTACCTCTTCGAGTACATCGTCCACAAGGTG 
      +SRR6760785.144 K00374:82:hkjyvbbxx:5:1212:15016:44078_AAACACCAGCGATATA length=100
      FFFJJJJJ-FJJJJJJJFJJ7JJFFJJJJJJJJJJJJJ<JJJJJJJJ7JJJJJJJJJJFAJJFJF<JJJ7JFJJJJ<FFJJAAJJAFFJJJJJJ<AFJAJ
      

      To stLFR data

      • Read1 of stLFR reads: SRR6760785_1.stlfr.fastq.gz

      The barcode information is appended to the header line: the part after # is the barcode, and reads without a barcode get #0_0_0

      @SRR6760785.142#0_0_0/1
      GCCACCTCGGCGATCTCGGGATCGACATGCGCGGTCTCGATGGTGAAGTCGGGCCCTTCCGGCAGCAGGTAGCCGATCTCGCGCAGGAAGGCGGTGTACTCTCTAAGGCTCG
      +
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
      @SRR6760785.143#AAACACCAGACAATAC/1
      CAAGCAAACAGAAGAATGAACAAAATGTGCCTGCTTGTCGGACAAAGGTCTCAATATGCTATTCCCTTTATCTTAAATCTATTTTATCACGGCTCCCTATGTGGCATCCAA
      +
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJFAJ
      @SRR6760785.144#AAACACCAGCGATATA/1
      TCCTCAACGAAGTGCGGGATCTGAAAGCCAAGAATTTCCGCGAGGTAACGCTGCTGGGGCAGAATGTAAACTCCTACCGATACGAACAAAACGGGCGGATCATCCGCTTCC
      +
      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJ
      
      • Read2 of stLFR reads: SRR6760785_2.stlfr.fastq.gz

      The barcode information is appended to the header line: the part after # is the barcode, and reads without a barcode get #0_0_0

      @SRR6760785.142#0_0_0/2
      CAGGGGTTGCAGGTGGATGTACGTCTGGCCGAGTTCGTGGCTCGCGAGGCGCTGCCGGGCACCGGTGTCAGCGAGGAAGCCTTCTGGGGAGGCCTCGCCGACACGGTGCGTACTCTCGGGCCGCGGAACCGC
      +
      FFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF<FJJFJJJAJJJJFJJJJJJJJJJJFJFFJJJJJFAJJJJJJJJJJJAAFAFJFJJJJJJJFJJJJJAJFJJJJJJJJJJJ<AJJJJJJFJFFJJJJJJ
      @SRR6760785.143#AAACACCAGACAATAC/2
      CTGCACACGCTTTTCTATTTCCTGAAAGTTATATTCCATAATCCCTGATTTGTTCGATATATTAACTACTGTTTCTTGCCACAATATTGAAAAAAGCAGTCAGCAAACACACTAACTGCTTTATTTTCAGTTT
      +
      FFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJAJJJJJJJFFJJJJJJJJJJJJJJJJJFJJJJAFJJJJJJJFJFFFFJJFJJJJAA7F<AAFFFJJJFJJFJJJJ<FJJA---<FJ
      @SRR6760785.144#AAACACCAGCGATATA/2
      TGTCCCTCTTGTTGCTCTCCTCGCTCAGGCGATTCTGGAGGGCTATCATCCGATCCAGACGGGCGAGCTTTACCTCTTCGAGTACATCGTCCACAAGGTG
      +
      FFFJJJJJ-FJJJJJJJFJJ7JJFFJJJJJJJJJJJJJ<JJJJJJJJ7JJJJJJJJJJFAJJFJF<JJJ7JFJJJJ<FFJJAAJJAFFJJJJJJ<AFJAJ
      

      Refer to the following shell code:

      gzip -dc  SRR6760785_1.fastq.gz |awk -F ' |_' '{ if(NR%4==1){ if(NF==4){printf("%s#%s/1\n",$1,$3); } else {printf("%s#0_0_0/1\n",$1);}} else if (NR%4==2 || NR%4==0) {print $0;} else{print "+";} } ' |gzip - > SRR6760785_1.stlfr.fastq.gz &
      gzip -dc  SRR6760785_2.fastq.gz |awk -F ' |_' '{ if(NR%4==1){ if(NF==4){printf("%s#%s/2\n",$1,$3); } else {printf("%s#0_0_0/2\n",$1);}} else if (NR%4==2 || NR%4==0) {print $0;} else{print "+";} } ' |gzip - > SRR6760785_2.stlfr.fastq.gz &
      
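      An equivalent Python sketch of the same conversion (it assumes, as in the example above, that a barcode, when present, is appended to the second header field after an underscore):

      # convert_10x_to_stlfr.py -- Python equivalent of the awk one-liners above.
      import gzip
      import sys

      def convert_10x_to_stlfr(infile, outfile, mate):
          """Rewrite 10X headers as '<read_id>#<barcode>/<mate>' (stLFR style)."""
          with gzip.open(infile, "rt") as fin, gzip.open(outfile, "wt") as fout:
              for i, line in enumerate(fin):
                  pos = i % 4
                  if pos == 0:                        # header line
                      fields = line.split()           # '@SRR... K00374:..._BARCODE length=111'
                      if len(fields) > 1 and "_" in fields[1]:
                          barcode = fields[1].rsplit("_", 1)[1]
                      else:
                          barcode = "0_0_0"           # no barcode attached to this read
                      fout.write(f"{fields[0]}#{barcode}/{mate}\n")
                  elif pos == 2:                      # '+' separator line
                      fout.write("+\n")
                  else:                               # sequence or quality line
                      fout.write(line)

      if __name__ == "__main__":
          # e.g. python convert_10x_to_stlfr.py SRR6760785_1.fastq.gz SRR6760785_1.stlfr.fastq.gz 1
          convert_10x_to_stlfr(sys.argv[1], sys.argv[2], sys.argv[3])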

MetaTrass parameters and notices:

  • Required parameter:
  1. -rawfq1 / -rawfq2 input: raw stLFR sequencing data with the barcodes not yet split.
  2. -cleanfq1 / -cleanfq2 input: paired-end clean data, with the barcodes already split from the sequences or converted to stLFR format.
  3. -outdir: the output path for storing your results.
  4. -thread: a) in the TB module, set it to the maximum number of threads; b) in the AP module, make thread * parallel equal to the maximum number of threads.
  5. -runnow: decide whether or not to run the generated command lines immediately.
  6. -ref_db: the prepared kraken2 reference database.
  7. -genome_size: the single-species genome size table matching the references of your microbiome.
  8. -parallel: the number of species to assemble at one time; choose it together with the number of threads.
  • Control parameter:
  1. -min_depth: species below this depth are skipped as extremely low-depth (default: 10).
  2. -max_depth: limits the number of reads per species, selected by barcode unit, when the depth exceeds this threshold (default: 300).
  3. -PCT: required by the purifying step; threshold of contig length (default: 0.5, range: 0-1). See the sketch at the end of this section.
  4. -IDY: required by the purifying step; threshold of IDY (default: 1, range: 80-100).
  • Notices
  1. Extremely high-depth species:

MetaTrass embeds Supernova as the assembly tool, which yields higher genome contiguity but takes longer to assemble than other tools. Ultra-high-depth species may fail to assemble because Supernova strictly monitors the input coverage. For such species, we recommend re-assembling with other linked-read assemblers, such as cloudSPAdes, or using MetaTrass_lite.

  2. Barcodes highly shared between species:

In some co-barcoding or linked-read data, barcodes may clash, i.e. be shared by several species. We developed a tool, TABrefiner_NOS, to remove reads with shared barcodes. You can alias MetaTrass/tools/TABrefiner_NOS to MetaTrass/tools/TABrefiner; the command "ln -s TABrefiner_NOS TABrefiner" is an easy way to switch. Please keep both versions of TABrefiner.

  3. Other co-barcoding data:

If you have linked-read data, please convert them to stLFR format. Since the number of long fragments sharing the same barcode is greater in linked reads than in stLFR reads, more false-positive reads are introduced into the co-barcoding refined read sets, which can cause Supernova to fail on several species. We recommend re-assembling these species with other co-barcoding assemblers, such as cloudSPAdes, or using MetaTrass_lite.
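
A minimal sketch of the kind of filtering the -PCT and -IDY thresholds imply (the real ContigPurify step aligns contigs to the reference, e.g. via QUAST, and may differ in detail; the data structures and numbers below are hypothetical):

# purify_sketch.py -- hypothetical illustration of applying PCT and IDY;
# this is NOT the actual ContigPurify implementation.
def purify(contig_lengths, alignments, pct=0.5, idy=90.0):
    """Keep contigs whose best reference alignment covers >= pct of the contig
    length at >= idy percent identity."""
    kept = []
    for contig, length in contig_lengths.items():
        for aligned_len, identity in alignments.get(contig, []):
            if aligned_len / length >= pct and identity >= idy:
                kept.append(contig)
                break
    return kept

# toy example (hypothetical numbers)
contig_lengths = {"contig_1": 120000, "contig_2": 8000}
alignments = {"contig_1": [(110000, 99.2)], "contig_2": [(2000, 97.5)]}
print(purify(contig_lengths, alignments))   # -> ['contig_1']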

Output files:

Examples of output folder structure:

	.
	├── [4.0K]  all_command_shell/ 
	│   ├── [ 911]  run.log 
	│   ├── [ 419]  stp1.1.splitbarcode.sh 
	│   ├── [ 435]  stp1.2.getcleandata.sh 
	│   ├── [ 631]  stp2.1.kraken2taxon.sh 
	│   ├── [ 355]  stp2.2.TXACBrefiner.sh 
	│   ├── [155K]  stp2.3.ReadID2Fastq.sh 
	│   ├── [155K]  stp3.1.MetaAssembly.sh 
	│   └── [101K]  stp3.2.ContigPurify.sh 
	├── [4.0K]  dir1_cleandata/ 
	│   ├── [ 106]  barcode_freq.txt 
	│   ├── [  98]  lane.lst 
	│   ├── [ 118]  split_reads.1.fq.gz.clean.gz 
	│   ├── [ 118]  split_reads.2.fq.gz.clean.gz  
	│   ├── [ 109]  split_read_stat.log  
	│   └── [  98]  stat.txt  
	├── [4.0K]  dir2_taxonomy/
	│   ├── [ 20K]  ID2FQ/
	│   ├── [4.0K]  kraken/
	│   └── [ 20K]  SSRlist/
	└── [4.0K]  dir3_assembly/
	    ├── [4.0K]  purify/
	    ├── [4.0K]  quast/
	    └── [4.0K]  supernova/

	10 directories, 14 files

MetaTrass performance

Sample         Base number (Gb)   Peak RAM usage (Gb)   CPU max (threads)   Time (min)
H_Gut_Meta01   34.48              50.2                  16                  5,145
H_Gut_Meta02   35.33              55.1                  16                  2,631
H_Gut_Meta03   37.88              55.5                  16                  3,147
P_Gut_Meta01   97.20              71.0                  16                  8,363

Contributing:

License:

  • GNU General Public License v3.0

Issue:
