vsearch
Commands for installing vsearch
Cloning the repo. You will need Git, autoconf and automake to clone the repository and build VSEARCH. On a Debian-based Linux system, the three packages can be installed with:
$ sudo apt-get install git autoconf automake
To clone the repository and install VSEARCH, use the following commands:
$ git clone https://github.com/torognes/vsearch.git
$ cd vsearch
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
Binary distribution. If cloning or compiling fails, you can download a pre-compiled VSEARCH binary for your system directly. On a Linux system:
wget https://github.com/torognes/vsearch/releases/download/v2.3.0/vsearch-2.3.0-linux-x86_64.tar.gz
tar xzf vsearch-2.3.0-linux-x86_64.tar.gz
Or, on macOS:
wget https://github.com/torognes/vsearch/releases/download/v2.3.0/vsearch-2.3.0-osx-x86_64.tar.gz
tar xzf vsearch-2.3.0-osx-x86_64.tar.gz
You will now have the binary distribution in a folder named after the archive (for example, vsearch-2.3.0-linux-x86_64), in which you will find three subfolders: bin, man and doc. We recommend making a copy of, or a symbolic link to, the vsearch binary bin/vsearch in a folder included in your $PATH, and a copy of, or a symbolic link to, the vsearch man page man/vsearch.1 in a folder included in your $MANPATH. The PDF version of the manual is available in doc/vsearch_manual.pdf.
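The symlink step can be sketched as a small shell function. This is only an illustration: the function name `link_vsearch` and the target folders are assumptions, not part of the VSEARCH distribution, and the man page is assumed to belong in a `man1` folder under a directory listed in your $MANPATH.

```shell
# link_vsearch SRC_DIR BIN_DIR MAN1_DIR
# Hypothetical helper: symlinks the vsearch binary and man page from an
# unpacked binary distribution (SRC_DIR, absolute path) into BIN_DIR and
# MAN1_DIR. BIN_DIR is assumed to be on $PATH already.
link_vsearch() {
  mkdir -p "$2" "$3" || return 1
  # -f replaces an existing link, so the function can be re-run after an upgrade
  ln -sf "$1/bin/vsearch" "$2/vsearch" || return 1
  ln -sf "$1/man/vsearch.1" "$3/vsearch.1"
}
```

For example, after unpacking the Linux archive in the current folder, one could run `link_vsearch "$PWD/vsearch-2.3.0-linux-x86_64" "$HOME/.local/bin" "$HOME/.local/share/man/man1"`, assuming those two target folders suit your setup.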
Overview. VSEARCH includes commands to perform de novo clustering using a greedy and heuristic
centroid-based algorithm with an adjustable sequence similarity threshold specified with the --id
option (e.g., --id 0.97). The input sequences are either processed in the user-supplied order
(--cluster_smallmem) or pre-sorted based on length (--cluster_fast) or abundance (--cluster_size).
Method. Each input sequence is used as a query against an initially empty database of centroid sequences. The query sequence is clustered with the first centroid sequence found with similarity equal to or above the threshold (--id). If no matches are found, the query sequence becomes the centroid of a new cluster and is added to the database. If --maxaccepts is higher than 1 (default: 1), several centroids with sufficient sequence similarity may be found and considered. By default, the query is clustered with the centroid presenting the highest sequence similarity (distance-based greedy clustering), or, if the --sizeorder option is used, the centroid with the highest abundance (abundance-based greedy clustering).
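The procedure described above, in its simplest form (a single accepted centroid per query), can be illustrated with a toy shell and awk sketch. It is not how VSEARCH is implemented: similarity here is a naive position-by-position match fraction rather than an alignment-based identity, nothing is pre-sorted, and the function name `greedy_cluster` is made up for this example.

```shell
# greedy_cluster THRESHOLD SEQ...
# Toy illustration only. Prints "<cluster number> <sequence>" for each
# sequence, processed in the order given.
greedy_cluster() {
  threshold=$1
  shift
  printf '%s\n' "$@" | awk -v t="$threshold" '
    # Fraction of matching positions, relative to the longer sequence
    # (a stand-in for a real alignment-based similarity).
    function identity(a, b,    i, shortest, longest, matches) {
      shortest = length(a) < length(b) ? length(a) : length(b)
      longest  = length(a) > length(b) ? length(a) : length(b)
      if (longest == 0) return 0
      matches = 0
      for (i = 1; i <= shortest; i++)
        if (substr(a, i, 1) == substr(b, i, 1)) matches++
      return matches / longest
    }
    {
      assigned = 0
      # Query the centroid database; the first centroid at or above the
      # threshold takes the sequence.
      for (c = 1; c <= n; c++)
        if (identity($0, centroid[c]) >= t) { assigned = c; break }
      # No match: the query becomes the centroid of a new cluster.
      if (!assigned) { centroid[++n] = $0; assigned = n }
      print assigned, $0
    }'
}
```

Running `greedy_cluster 0.9 ACGTACGTAC ACGTACGTAA TTTTTTTTTT ACGTACGTAC` places the first, second and fourth sequences in cluster 1 and the third in a new cluster 2. Because the database starts empty and grows as sequences are processed, the input order affects the result, which is why the three clustering commands can give different centroids.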
vsearch --cluster_fast BR_cob_57ind_no_outgr.fasta --id 0.97 --centroids centroids-cf.fa --msaout msaout-cf.txt
vsearch --cluster_smallmem BR_cob_57ind_no_outgr.fasta --usersort --id 0.97 --centroids centroids-sm.fa --msaout msaout-sm.txt
vsearch --cluster_size BR_cob_57ind_no_outgr.fasta --id 0.97 --centroids centroids-sz.fa --msaout msaout-sz.txt
Note: When using `--cluster_smallmem`, the option `--usersort` indicates that sequences are not pre-sorted by length.
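A quick way to compare the three runs is to count the FASTA records in each centroids file, since each centroid corresponds to one cluster. The sketch below assumes the three clustering commands were run in the current folder; the helper name `count_records` is made up for this example.

```shell
# count_records FILE
# Hypothetical helper: number of FASTA records in FILE, counted as
# header lines starting with ">".
count_records() {
  grep -c '^>' "$1"
}

# Report the number of centroids (clusters) produced by each command.
for f in centroids-cf.fa centroids-sm.fa centroids-sz.fa; do
  if [ -f "$f" ]; then
    printf '%s\t%s\n' "$f" "$(count_records "$f")"
  else
    printf '%s\tnot found (run the clustering commands first)\n' "$f"
  fi
done
```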
| Filename | Description |
|---|---|
| CHANGE FILES | |
| Anolis.fas | Input sequences |
| Filename | Description |
|---|---|
| [centroids-cf.fa](place link) | Centroids for --cluster_fast |
| [centroids-sm.fa](place link) | Centroids for --cluster_smallmem |
| [centroids-sz.fa](place link) | Centroids for --cluster_size |
| [msaout-cf.txt](place link) | Clusters for --cluster_fast |
| [msaout-sm.txt](place link) | Clusters for --cluster_smallmem |
| [msaout-sz.txt](place link) | Clusters for --cluster_size |
Check the VSEARCH wiki page on clustering.