BIOSPHA is the acronym of BIOlogical Scripts for PHylogeny Analyses.

bioinfo-guy/biospha

BIOSPHA

The term is an acronym of BIOlogical Scripts for PHylogeny Analyses.

It consists of six separate scripts:

** BuildDB ** – This script uses the NCBI taxonomy database to classify and separate sequences from a given list of FASTA-formatted sequences. Any rank in the NCBI taxonomy database can be used to filter for the desired sequences.

** DUPWIPE ** – a shell script that uses scripts available at Scriptome to remove duplicate sequences from a file.

** SEARCH ** – searches for a complete taxonomic classification using a GI number, TAXID or scientific name.

** FASTAHDR ** – using BioPerl components, this script rebuilds the FASTA sequence header into a friendlier form and allows custom fields to be inserted. It can also replace the FASTA header with a taxonomic classification.

The last two scripts build the information needed for a character-tracing study using Mesquite (http://mesquiteproject.org/):

** BUILDCHAR ** – using a list of sequences in FASTA format as input, this script taxonomically classifies all sequences and uses the classification as character states. It builds the NEXUS block used as input to Mesquite for simulations of character evolution on a given tree.

** BUILDPTP ** – has the same function as Buildchar, but it shuffles the character states n times to build the NEXUS file used for the modified PTP test.

The scripts

BIOSPHA - Updating Databases

You can update all databases using our automated script.

Download it, save it as an executable shell script, and run:

LINUX: > ./update_databases.sh

The duration of the update depends on your connection speed and your computer's power. On average it takes about 2-3 hours to download all files and generate the local database.

BIOSPHA - BuildDB

This script uses the NCBI taxonomy database to classify and separate sequences from a given list of FASTA-formatted sequences. Any rank in the NCBI taxonomy database can be used to filter for the desired sequences.

To use it, save it as a Perl script and call:

LINUX:> builddb.pl input.file format

BuildDB accepts all sequence formats supported by BioPerl.

You should customize the following part of the script to fit your needs.

if (($node->rank eq "superkingdom") && ($node->scientific_name eq "Bacteria"))
{
    #print $node->rank, "\t", $node->scientific_name, "\n";
    # write the sequence to the output file
    $seq_out->write_seq($seq);
    # store the name in an array for counting purposes
    push(@in, $node->scientific_name);
}
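The same rank-and-name test can be tried outside BioPerl. The following is a minimal sketch, not BuildDB itself. It assumes local copies of the NCBI taxdump files `nodes.dmp` and `names.dmp`, and the function name `has_ancestor` is hypothetical. It walks from a TAXID up to the root and succeeds if any node on the way has the requested rank and scientific name.

```shell
# Hypothetical helper (not part of BIOSPHA).
# Usage: has_ancestor nodes.dmp names.dmp TAXID RANK NAME
# Exit status 0 if TAXID or one of its ancestors has the given rank and scientific name.
has_ancestor() {
    awk -v start="$3" -v want_rank="$4" -v want_name="$5" '
        BEGIN { FS = "\t[|]\t" }
        # Every taxdump line ends with "<TAB>|"; drop it so the last field is clean.
        { sub(/\t[|]$/, "") }
        # First file (nodes.dmp): remember the parent and rank of every node.
        FNR == NR { parent[$1] = $2; rank[$1] = $3; next }
        # Second file (names.dmp): keep scientific names only.
        $4 == "scientific name" { name[$1] = $2 }
        END {
            id = start
            while (id in parent) {
                if (rank[id] == want_rank && name[id] == want_name) { found = 1; break }
                if (parent[id] == id) break    # the root node is its own parent
                id = parent[id]
            }
            exit (found ? 0 : 1)
        }
    ' "$1" "$2"
}
```

For example, `has_ancestor nodes.dmp names.dmp 1224 superkingdom Bacteria && echo keep` would print `keep` only if TAXID 1224 sits below a superkingdom named Bacteria in the files supplied.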

BIOSPHA - Taxonomy search

TaxSearch is a Perl script that searches for a complete taxonomic classification using a GI number, TAXID or scientific name.

It runs from a shell and needs some additional databases to run correctly.

To run TaxSearch:

To search for the taxonomic classification of Homo sapiens:

LINUX:> perl search.pl -n "homo sapiens"

To search for the classification of a GI number:

LINUX:> perl search.pl -g 220941669 -t -c

The result will be the complete taxonomic classification (due to the -t option) and the list of common names (due to the -c option).
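To show what a name lookup of this kind involves, here is a rough sketch that works directly on the NCBI taxdump file `names.dmp`. It is not how search.pl is implemented. The function names `taxid_by_name` and `common_names` are hypothetical, and GI lookups are not covered because they need a separate GI-to-TAXID mapping.

```shell
# Hypothetical helpers (not part of BIOSPHA); both read an NCBI names.dmp file.

# Usage: taxid_by_name names.dmp "homo sapiens"
# Prints the TAXID whose scientific name matches, ignoring case.
taxid_by_name() {
    awk -v query="$2" '
        BEGIN { FS = "\t[|]\t"; query = tolower(query) }
        { sub(/\t[|]$/, "") }    # drop the "<TAB>|" that ends every line
        $4 == "scientific name" && tolower($2) == query { print $1; exit }
    ' "$1"
}

# Usage: common_names names.dmp 9606
# Prints every common name recorded for the TAXID, one per line.
common_names() {
    awk -v id="$2" '
        BEGIN { FS = "\t[|]\t" }
        { sub(/\t[|]$/, "") }
        $1 == id && ($4 == "common name" || $4 == "genbank common name") { print $2 }
    ' "$1"
}
```

The two can be chained, for instance `common_names names.dmp "$(taxid_by_name names.dmp "homo sapiens")"`.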

BIOSPHA - Fasta header

Using BioPerl components, this script rebuilds the FASTA sequence header into a friendlier form and allows custom fields to be inserted. It can also replace the FASTA header with a taxonomic classification.

FASTAHDR was primarily written for NCBI FASTA input, but it can easily be customized.

Linux> script.pl file.fasta > outfile.fasta
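As an illustration of header rebuilding, the sketch below rewrites legacy NCBI headers of the form `>gi|NUMBER|DB|ACCESSION| description`. It is an assumption-laden example, not FASTAHDR: the function name `simplify_headers` and the output layout (`>ACCESSION gi=NUMBER description`) are invented here, and no taxonomy lookup is done.

```shell
# Hypothetical helper (not part of BIOSPHA).
# Usage: simplify_headers in.fasta > out.fasta
simplify_headers() {
    awk '
        /^>gi[|]/ {
            # part[1]="gi", part[2]=GI number, part[3]=database tag,
            # part[4]=accession, part[5..n]=description (may itself contain "|")
            n = split(substr($0, 2), part, "[|]")
            desc = part[5]
            for (i = 6; i <= n; i++) desc = desc "|" part[i]
            sub(/^ +/, "", desc)
            print ">" part[4] " gi=" part[2] " " desc
            next
        }
        { print }    # sequence lines and other headers pass through unchanged
    ' "$1"
}
```

Headers that do not start with `>gi|` are left untouched, so the function can be run safely on mixed input.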

BIOSPHA - Wipe duplicated sequences

A shell script that uses scripts available at Scriptome (http://archive.sysbio.harvard.edu/csb/resources/computational/scriptome/) to remove duplicate sequences from a file.

The script generates two files: .uni contains the unique FASTA sequences and .diff lists all differences found.

To use it, save it as a shell script and call:

Linux> ./script.sh file.fasta
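The idea behind duplicate removal can be sketched with plain awk, without the Scriptome tools. This is only an approximation of the script described above. The function name `wipe_duplicates` is hypothetical, sequences are compared case-insensitively, and the content of the .diff file (one line per removed record) is an assumption, since the original format is not documented here.

```shell
# Hypothetical helper (not the DUPWIPE script itself).
# Usage: wipe_duplicates file.fasta
# Writes file.fasta.uni (first occurrence of each sequence, one line per sequence)
# and file.fasta.diff (one line per removed record; assumed format).
wipe_duplicates() {
    : > "$1.uni"     # create both outputs even when one would stay empty
    : > "$1.diff"
    awk -v uni="$1.uni" -v dup="$1.diff" '
        function emit(    key) {
            if (header == "") return
            key = toupper(seq)
            if (key in seen) {
                print header "\tduplicate of " seen[key] > dup
            } else {
                seen[key] = header
                print header > uni
                print seq > uni
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }    # join wrapped sequence lines
        END { emit() }
    ' "$1"
}
```

Only exact matches count as duplicates in this sketch; sequences that differ by a single residue are both kept.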

BIOSPHA - Build characters

Using a list of sequences in FASTA format as input, this script taxonomically classifies all sequences and uses the classification as character states.

It builds the NEXUS block used as input to Mesquite for simulations of character evolution on a given tree.

To use Buildchar:

LINUX:> perl rebuildfasta.pl infile > outfile

infile = input file in FASTA format

outfile = output file (altered FASTA file)
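To make the idea of "classification as character states" concrete, the sketch below turns a two-column table (sequence label, TAB, state name) into a small NEXUS file with one discrete character. It is a guess at the general shape, not the output of Buildchar. The function name `nexus_block`, the input table and the character name `taxonomy` are all assumptions, it supports at most ten distinct states (symbols 0-9), and it has not been checked against Mesquite.

```shell
# Hypothetical helper (not part of BIOSPHA).
# Usage: nexus_block states.tsv > characters.nex
# Input: one "label<TAB>state" pair per line, e.g. "seqA<TAB>Bacteria".
nexus_block() {
    awk '
        BEGIN { FS = "\t"; n = 0; nstates = 0 }
        NF >= 2 {
            gsub(/ /, "_", $1); gsub(/ /, "_", $2)    # NEXUS reads "_" as a space
            if (!($2 in code)) { code[$2] = nstates; state[nstates] = $2; nstates++ }
            n++; label[n] = $1; value[n] = code[$2]
        }
        END {
            symbols = ""; names = ""
            for (s = 0; s < nstates; s++) {
                sep = (s > 0) ? " " : ""
                symbols = symbols sep s
                names = names " " state[s]
            }
            print "#NEXUS"
            print "BEGIN TAXA;"
            print "  DIMENSIONS NTAX=" n ";"
            printf "  TAXLABELS"
            for (i = 1; i <= n; i++) printf " %s", label[i]
            print ";"
            print "END;"
            print "BEGIN CHARACTERS;"
            print "  DIMENSIONS NCHAR=1;"
            print "  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS=\"" symbols "\";"
            print "  CHARSTATELABELS 1 taxonomy /" names ";"
            print "  MATRIX"
            for (i = 1; i <= n; i++) print "  " label[i] " " value[i]
            print "  ;"
            print "END;"
        }
    ' "$1"
}
```

States are numbered in order of first appearance, so the first state name seen in the table becomes state 0.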

BIOSPHA - BuildPTP

It has the same function as Buildchar, but it shuffles the character states n times to build the NEXUS file used for the modified PTP test (1 and 2).

If you have a slow computer, the file generated by Buildptp may take a while to open.

  1. Wahlberg N: The phylogenetics and biochemistry of host-plant specialization in Melitaeine butterflies (Lepidoptera: Nymphalidae). Evolution 2001, 55:522-537.
  2. Faith DP, Cranston PS: Could a cladogram this short have arisen by chance alone?: On permutation tests for cladistic structure. Cladistics 1991, 5:235-258.
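The shuffling step described for BuildPTP can be illustrated as follows. This is a sketch under assumptions rather than the BuildPTP code: the function name `shuffle_states`, the two-column input table (label, TAB, state) and the three-column output (replicate number, label, state) are all hypothetical. Labels keep their order, and only the states are permuted among them with a Fisher-Yates shuffle.

```shell
# Hypothetical helper (not part of BIOSPHA).
# Usage: shuffle_states states.tsv N [SEED]
# Prints N replicates as "replicate<TAB>label<TAB>state" lines.
shuffle_states() {
    awk -v reps="$2" -v seed="${3:-}" '
        BEGIN {
            FS = OFS = "\t"
            if (seed != "") srand(seed); else srand()    # fixed seed gives repeatable runs
        }
        NF >= 2 { n++; label[n] = $1; state[n] = $2 }
        END {
            for (r = 1; r <= reps; r++) {
                # Fisher-Yates shuffle of the state column; labels stay in place.
                for (i = n; i > 1; i--) {
                    j = int(rand() * i) + 1
                    tmp = state[i]; state[i] = state[j]; state[j] = tmp
                }
                for (i = 1; i <= n; i++) print r, label[i], state[i]
            }
        }
    ' "$1"
}
```

Every replicate contains the same states as the input, just reassigned, so each one could become an extra character column in the NEXUS file.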
