aakanksha12/Extract_seq

Extract_seq

This pipeline extracts loci from given taxa using nhmmer and BLAST.

This program requires folders named "genomes", "queries" and "hmm_prof_align" in the current working directory (CWD). It also requires BLAST and HMMER to be available either in the root directory or in the CWD.

The script takes three command-line arguments:

perl Extract_seq.pl current_working_directory_path text_file_database_list text_file_queries_list

current_working_directory_path: The path of the current working directory, which contains the genomes and queries folders.

text_file_database_list: A text file that lists the genomes in the genomes folder, without the ".fasta" extension. Each line should contain one genome name. The file should be located in the current working directory.

text_file_queries_list: A text file that lists the queries in the queries folder, without the ".fasta" extension. Each line should contain one query name. The file should be located in the current working directory.
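Both list files described above follow the same rule: one name per line, with ".fasta" removed. As a minimal sketch (not part of Extract_seq itself), they could be generated from the folder contents with a small shell helper. The function name make_name_list and the list file names genomes_list.txt and queries_list.txt are assumptions made for illustration, and the sketch assumes every input file ends in ".fasta".

```shell
# Write the base names of all .fasta files in a folder to a list file,
# one name per line, with the ".fasta" extension removed.
# Usage: make_name_list <folder> <output_list_file>
make_name_list() {
    folder=$1
    list_file=$2
    : > "$list_file"                # start with an empty list file
    for f in "$folder"/*.fasta; do
        [ -e "$f" ] || continue     # skip the unexpanded pattern if nothing matches
        basename "$f" .fasta >> "$list_file"
    done
}

# Example calls (hypothetical list file names), run from the CWD:
# make_name_list genomes genomes_list.txt
# make_name_list queries queries_list.txt
```

Only files whose names end in ".fasta" are listed, so any other files sitting in the same folder are ignored.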

genomes folder: Contains all the genome files after they have been processed with BLAST's makeblastdb.
queries folder: Contains all the query files in ".fasta" format.
hmm_prof_align folder: Stores the alignment files in ".fasta" format (used by nhmmer) for all the loci.
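Before a run, the layout described above can be checked with a short shell function. This is only a sketch under stated assumptions: the function name preflight is hypothetical, it is not part of Extract_seq, and it checks only that the three folders and the two list files exist, not that their contents are valid.

```shell
# Check that a working directory has the folders and list files
# that the README describes.
# Usage: preflight <cwd> <database_list_file> <queries_list_file>
# Prints each missing item and returns non-zero if anything is missing.
preflight() {
    cwd=$1
    db_list=$2
    query_list=$3
    status=0
    for d in genomes queries hmm_prof_align; do
        if [ ! -d "$cwd/$d" ]; then
            echo "missing folder: $cwd/$d"
            status=1
        fi
    done
    for l in "$db_list" "$query_list"; do
        if [ ! -f "$cwd/$l" ]; then
            echo "missing list file: $cwd/$l"
            status=1
        fi
    done
    return $status
}

# Example (hypothetical list file names):
# preflight "$PWD" genomes_list.txt queries_list.txt && \
#     perl Extract_seq.pl "$PWD" genomes_list.txt queries_list.txt
```

Running the check first means a missing folder or list file is reported by name before the pipeline starts.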

The following directories are created in the CWD as part of the program:
gallus_blast: stores the BLAST results for the given loci against the gallus genome
other_blast: stores the BLAST results for the given loci against all genomes
extract_seq: stores the extracted sequence and the scaffold of the best hit
blast_again: stores the BLAST results for the extracted sequence against the gallus genome

About

Data Extraction Pipeline
