Skip to content

YanCCscu/MEANGS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

72 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


MEANGS: MitoDNA extending assembler from NGS data


Function

The MEANGS is a seed-free software that applies trie-search to extend contigs from self-discovery seeds and assemble mitogenome, from NGS data. Use Python3 to run it.


Quick start

  • A compiled software is provided, and you can directly download it and use it with the following command:
git clone https://github.com/YanCCscu/MEANGS.git
cd MEANGS
./meangs.py --silence -1 1.fq.gz -2 2.fq.gz -o OutBase -t 16 -i 350
  • For MEANGS (v1.0), only paired-end data are available to assemble for a mitogenome.

Conda availability

  • Since April 2023, MEANGS is available in Anaconda and you can install it with the following command:
conda install -c yccscucib meangs

Download and Install

If you use an old version of linux or ubuntu. You may need to download, install and compile MEANGS as described following.

Requirement

  • gcc version >= 8.3.1 you can simply install a altnative version of gcc with the following command:
sudo yum -y install devtoolset-8-gcc devtoolset-8-gcc-c++ devtoolset-8-binutils
  • pcre >= 8.41 download the pcre from here and install with the following command:
tar -xzvf  pcre-8.41.tar.gz
cd pcre-8.41
./configure --enable-utf8
sudo make && sudo make install

You can install the pcre in a own path if you do not have a root permission by add the --prefix option,
and enable static compilation by adding --enable-static

./configure --enable-utf8 --enable-static --prefix /path/to/pcre

Install MEANGS

git clone https://github.com/YanCCscu/MEANGS.git
cd MEANGS/tools/assembler_v1.0/src
make
cp assembler ../../

we also offer a docker image here and you can run the example as following:

docker run -it --rm -w /home/meangs -v $PWD:/home/meangs bioinfodocker/meangs:latest meangs.py 
-1 SRR039541.3_1.clean.fq.gz -2 SRR039541.3_2.clean.fq.gz -o HumanMito -t 16 -n 2000000 -i 300 --deepin

Usage

MitoDNA extending assembler from NGS data

usage: meangs.py [-h] [-1 FQ1] [-2 FQ2] [-o OUTBASE] [-t THREADS] [-i INSERT]
                 [-q QUALITY] [-n NSAMPLE] [-s SEQSCAF]
                 [--species_class {A-worms,Arthropoda,Bryozoa,Chordata,Echinodermata,Mollusca,Nematoda,N-worms,Porifera-sponges}]
                 [--deepin] [--clip] [--keepIntMed] [--keepMinLen KEEPMINLEN]
                 [--skipassem] [--skipqc] [--skiphmm] [--skipextend]
                 [--silence]

optional arguments:
  -h, --help            show this help message and exit
  -1 FQ1, --fq1 FQ1     Input paired end _1.fq[.gz] files,seprated by ','
  -2 FQ2, --fq2 FQ2     Input paired end _2.fq[.gz] files,seprated by ','
  -o OUTBASE, --outBase OUTBASE
                        Output prefix of dir and files
  -t THREADS, --threads THREADS
                        Analysis threads
  -i INSERT, --insert INSERT
                        library insert length
  -q QUALITY, --quality QUALITY
                        Threshold value for low base quality
  -n NSAMPLE, --nsample NSAMPLE
                        Number of reads sampled from input reads, default 0
                        (keep all reads)
  -s SEQSCAF, --seqscaf SEQSCAF
                        specific a sequences files(fasta) just for annotation
  --species_class {A-worms,Arthropoda,Bryozoa,Chordata,Echinodermata,Mollusca,Nematoda,N-worms,Porifera-sponges}
                        taxon of species belong to
  --deepin              run deeper mode to assembly mitogenome
  --clip                detect circle clip point for mitogenome
  --keepIntMed          keep the intermediate files
  --keepMinLen KEEPMINLEN
                        Threshold of reads length to keep after remove low
                        quality bases
  --skipassem           skip the process of assembly
  --skipqc              skip the process of QC
  --skiphmm             skip the process of hmmer
  --skipextend          skip the process of extend in deepin mode
  --silence             run the program in silence mode, the standard output
                        will redirect to specific log file

Example:
#run meangs in a quick mode with paired-end library of insert size 350bp, 16 threads are called.
	meangs.py --silence -1 1.fq.gz -2 2.fq.gz -o OutBase -t 16 -i 350
#run meangs in a 'deepin mode' the first 2000000 reads in both input fastq files will be used the construct mito-genome
	meangs.py -1 R1.fastq.gz -2 R2.fastq.gz -o A3 -t 16 -n 2000000 -i 300 --deepin

Here, A-worms stands for Annelida segmented worms, and N-worms stands for Nemertea ribbon worms.


Example

The example directory contains pair-end NGS data for human SRA accession: SRR039541.3.
We keep the first 2000000 reads in each of the pair files after QC. The input files are uploaded to figshare,
and can be automatically downloaded with the following commands:

cd example
sh run_test.sh

the scripts will download the input files(about 340M) and run the following test scripts. typically, the running will finish in 10 minutes:
../meangs.py -1 SRR039541.3_1.clean.fq.gz -2 SRR039541.3_2.clean.fq.gz -o HumanMito -t 16 -n 2000000 -i 300 --deepin

Output

All output files were stored in one directory assigned by the -o option.
The {prefix}_deep_detected_mito.fas is the finally assembled mitochondrial genome. The mito_cliped.fas will be in the previous level folder if "--clip" works. Genes in the mitochondrial genome are annotated automatically and stored in the file {prefix}_hmmout_tbl_sorted.gff.

Note

  • The argument -n (nsample) is strongly recommended to reduce runtime and memory usage. It is wise to test different numbers of "-n" to obtain a completed mitogenome.
  • Supplementary information and more test results can be found here.
  • Based on enough reads given, for highly repetitive regions in a mitogenome, it is easy for MEANGS to assemble it as long as several reads contain the whole repetitive region. If not, MEANGS always break the contig at here, which can help the user to know a structural anomaly.