Added script to extract DNA sequences (as well as 5' or 3' regions if specified) from a FASTA file using a BLAST output file #79

Merged
merged 5 commits into from Feb 4, 2016

@Buuntu
Contributor

Buuntu commented Aug 5, 2014

Added a personal script to extract a DNA sequence from a FASTA file using a BLAST output file.
Expects at least two arguments, the BLAST file and the FASTA file. There are a number of optional arguments that are explained in the script.

This script is especially useful for extracting sequences that vary between genomes (hence the BLAST search beforehand) from FASTA files.
For example, say that you are trying to extract a given gene and the 2000 base pairs 5' of it from 20 different genomes. However, all you have is the gene sequence from one of those genomes. By running a BLAST search of that gene against each genome and then passing the results to this script, you can extract the sequences you are interested in.

The script also has options to extract a specified number of bases 3' or 5' of each hit from the FASTA file, and to apply an E-value cutoff. The final output is the extracted sequence in FASTA format.
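
To make the workflow above concrete, here is a minimal Python sketch of the same idea. It is not the script from this pull request, whose code is not shown in this conversation. It assumes the BLAST results are in the 12-column tabular format (`-outfmt 6`), and the names `read_fasta`, `extract_hits`, `to_fasta`, `upstream`, `downstream`, and `max_evalue` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only; assumes BLAST tabular output (-outfmt 6) with columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_hits(blast_lines, seqs, upstream=0, downstream=0, max_evalue=10.0):
    """Return (header, sequence) pairs for each hit passing the E-value cutoff.

    upstream/downstream are the number of bases to add 5'/3' of the hit,
    relative to the strand of the hit.
    """
    records = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        sseqid = cols[1]
        sstart, send, evalue = int(cols[8]), int(cols[9]), float(cols[10])
        if evalue > max_evalue or sseqid not in seqs:
            continue
        seq = seqs[sseqid]
        if sstart <= send:
            # Plus strand: 5' flank lies at lower coordinates.
            lo = max(1, sstart - upstream)
            hi = min(len(seq), send + downstream)
            region, strand = seq[lo - 1:hi], "+"
        else:
            # Minus strand: 5' flank lies at higher coordinates.
            lo = max(1, send - downstream)
            hi = min(len(seq), sstart + upstream)
            region, strand = reverse_complement(seq[lo - 1:hi]), "-"
        records.append((f"{sseqid}:{lo}-{hi}({strand})", region))
    return records


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

In this sketch the flanks are clipped at the ends of the subject sequence, and hits on the minus strand are reverse-complemented so that the 5' flank always comes first in the output. Whether the actual script behaves the same way is not stated here.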

Is this useful/generic enough to be included in the scripts directory? The script is well tested and takes command-line arguments.

Buuntu added some commits Aug 5, 2014

Add a personal script to extract a DNA sequence from a FASTA file using a BLAST output file.

Expects at least two arguments, the BLAST file and the FASTA file.

This script is especially useful when trying to extract sequences with variance (hence the BLAST search beforehand) from FASTA files.

For example, say that you are trying to extract a given gene and 2000 base pairs 5' to it from 20 different genomes.  All you have is the gene sequence from one of those genomes, however.  By doing a BLAST search between each of the genomes and the gene you have and then using this script, you can extract the sequences that you are interested in.

The script also has options to extract a specified 3' or 5' sequence from the FASTA file, as well as an e-value cut off.

Output is the sequence in FASTA format.

Member

cjfields commented Oct 25, 2014

Can you move this into the SearchIO-specific folder (or a related one)? There is a possibility we will split out some related code into separate repos, and it would make sense to migrate scripts along with them (and these would be easier to find if they are present in the appropriate script directory). Thanks!

Member

cjfields commented Feb 4, 2016

Hi @Buuntu, I was waiting on merging this per my request above, but I'll go ahead and merge this in and will be moving the script to a SearchIO-specific directory. Thanks for the contribution!

cjfields added a commit that referenced this pull request Feb 4, 2016

Merge pull request #79 from Buuntu/master
Added script to extract DNA sequences (as well as 5' or 3' regions if specified) from a FASTA file using a BLAST output file

@cjfields cjfields merged commit 587bdea into bioperl:master Feb 4, 2016

1 check passed

continuous-integration/travis-ci The Travis CI build passed