#FSH - Fast Spaced-seed Hashing

##Description

Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications need to compute the hash of each position in the input sequences with respect to a given spaced seed, or to multiple spaced seeds. While k-mer hashes can be computed rapidly by exploiting the large overlap between consecutive k-mers, spaced-seed hashes are usually computed from scratch for each position in the input sequence, which results in slower processing.
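The from-scratch computation described above can be sketched as follows. This is a minimal illustration, not the FSH source code: the 2-bit nucleotide encoding, the bit layout of the hash and the function name `spaced_seed_hashes` are all assumptions made for the example.

```python
# Illustrative 2-bit nucleotide encoding (an assumption, not taken from FSH).
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(sequence, seed):
    """Hash every position of `sequence` from scratch for one spaced seed."""
    # Offsets of the '1' symbols: the positions the seed actually reads.
    care = [k for k, symbol in enumerate(seed) if symbol == "1"]
    hashes = []
    for i in range(len(sequence) - len(seed) + 1):
        h = 0
        # Every position re-reads all care offsets; nothing is reused.
        for rank, k in enumerate(care):
            h |= ENCODING[sequence[i + k]] << (2 * rank)
        hashes.append(h)
    return hashes


print(spaced_seed_hashes("ACGT", "101"))  # [8, 13]
```

Each position costs as many symbol lookups as the seed has `1` symbols, which is the cost that reusing adjacent hashes aims to reduce.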

Fast Spaced-seed Hashing (FSH) exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. We also propose a generalized version of the algorithm for the simultaneous computation of hashes for multiple spaced seeds.
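To make the idea of reusing adjacent hashes concrete, here is a deliberately simplified sketch. It only reuses the hash of the immediately preceding position (a shift of one), and it is not the algorithm as implemented in FSH, which is described in the publication. The 2-bit encoding, the bit layout and all function names are assumptions made for the illustration.

```python
# Illustrative 2-bit nucleotide encoding (an assumption, not taken from FSH).
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def hash_from_scratch(sequence, i, care):
    """Reference hash of position i: read every care offset of the seed."""
    h = 0
    for rank, k in enumerate(care):
        h |= ENCODING[sequence[i + k]] << (2 * rank)
    return h


def fast_hashes(sequence, seed):
    """Hash all positions, reusing part of the previous position's hash."""
    care = [k for k, symbol in enumerate(seed) if symbol == "1"]
    rank = {k: r for r, k in enumerate(care)}
    # Offset k is reusable when k + 1 is also a care offset: the symbol it
    # needs was already encoded one slot higher in the previous hash.
    reusable = [k for k in care if k + 1 in rank]
    missing = [k for k in care if k + 1 not in rank]
    mask = 0
    for k in reusable:
        mask |= 3 << (2 * rank[k])
    hashes = []
    for i in range(len(sequence) - len(seed) + 1):
        if i == 0:
            h = hash_from_scratch(sequence, 0, care)
        else:
            # Shift the previous hash down one slot, keep the reusable slots,
            # then read only the symbols that could not be recovered.
            h = (hashes[-1] >> 2) & mask
            for k in missing:
                h |= ENCODING[sequence[i + k]] << (2 * rank[k])
        hashes.append(h)
    return hashes


seq = "ACGTTGCAACGTAGCT"
seed = "1101011"
care = [k for k, s in enumerate(seed) if s == "1"]
expected = [hash_from_scratch(seq, i, care) for i in range(len(seq) - len(seed) + 1)]
print(fast_hashes(seq, seed) == expected)  # True
```

In this sketch, the more runs of consecutive `1` symbols a seed has, the fewer symbols must be re-read per position, which is one way to see why the achievable speedup depends on the structure of the seed.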

In our experiments, FSH computes the hash values of spaced seeds with a speedup of between 1.6x and 5.3x over the traditional approach, depending on the structure of the spaced seed.

Spaced seed hashing is a routine task for several bioinformatics applications. FSH performs this task efficiently. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.

#Download
NB: This software only tests the computation time and writes the results to files.
FSH_download

##Compilation
Open a terminal, go to FSH/Release/ and run:
make all

To test in parallel mode, add the -fopenmp option to the compilation flags in every makefile in the subdirectories of Release.

###Compilation option
Y = number of cores
-jY

#Algorithm Options
This section describes the structure of the input files and the parameters available in the FSH algorithm.

##Accepted files and structure
Accepted files have the following structure.
Example .fna file structure:

>IDENTIFICATION
ATAATTGGCAAGTGTTTTAGTCTTAGAGAGATTCTCTAAGTCTAACTTGAACTCAATTTGGAATCATTTCCCAATTTTTA

Example .fastq file structure:

@IDENTIFICATION #0/1
CCCATGCCTTTAGCCAAATTCACGGTTTGATCACCCCTAAAACCAGCCAATATACCGAAGTGGAAGCCAGCATAAATGGCCTCAATATTACCGAAATGGAT
+
HBIIIIIIIHHDIHIGIIGGIHIIGIDIIIIBIHI@IIH@HIIHIIF5IIHEII>BDAHIBIEDBEIDG@HAEH*I@AEI=#CE?G17EEDHDEB@@?#8B

In this version, paired-end reads are passed to the algorithm in separate files, in which the reads are paired in the order in which they are written. We therefore recommend checking that the paired-end reads are paired correctly.
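One possible way to carry out such a check before running FSH is sketched below. It is not part of FSH: the function names are hypothetical, and it assumes .fna input in which the two mates of a pair share an identifier that differs at most by a trailing `/1` or `/2`.

```python
def read_ids(lines):
    """Return the identifiers found on the '>' header lines of a .fna file."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def strip_mate_suffix(read_id):
    # Assumed convention: mates are labelled with a trailing /1 or /2.
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def pairing_problems(ids_1, ids_2):
    """List the differences between the read order of two paired files."""
    problems = []
    if len(ids_1) != len(ids_2):
        problems.append(f"read counts differ: {len(ids_1)} vs {len(ids_2)}")
    for n, (a, b) in enumerate(zip(ids_1, ids_2)):
        if strip_mate_suffix(a) != strip_mate_suffix(b):
            problems.append(f"record {n}: {a!r} does not match {b!r}")
    return problems


# In-memory stand-ins for the two files of a paired-end data set.
file_1 = [">read_a/1", "ACGT", ">read_b/1", "TTGA"]
file_2 = [">read_a/2", "CCGT", ">read_c/2", "GGAT"]
for problem in pairing_problems(read_ids(file_1), read_ids(file_2)):
    print(problem)
```

With real data, the lists of lines would be replaced by the open file objects of the two paired files.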

##Parameters
-si File path of the single-end reads
-pi File paths of the paired-end reads
-dirO Path of the output directory. Default: output/
-q Path of a spaced seeds file, given after -q. Each line of the file must contain one spaced seed, e.g. 1***1*111. The symbol 1 marks a considered position; any other symbol is not valid as a considered position.
Default spaced seeds are:
1111011101110010111001011011111 -> CLARK-S paper
1111101011100101101110011011111 -> CLARK-S paper
1111101001110101101100111011111 -> CLARK-S paper
1111010111010011001110111110111 -> rasbhari minimizing overlap complexity
1110111011101111010010110011111 -> rasbhari minimizing overlap complexity
1111101001011100111110101101111 -> rasbhari minimizing overlap complexity
1111011110011010111110101011011 -> rasbhari maximizing sensitivity
1110101011101100110100111111111 -> rasbhari maximizing sensitivity
1111110101101011100111011001111 -> rasbhari maximizing sensitivity
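As an illustration of the seed format described for the -q parameter, the sketch below reads one seed per line and records which positions are considered. It is not the FSH parser: the function names are hypothetical, and skipping blank lines and rejecting seeds without any `1` are assumptions of this example.

```python
def seed_positions(seed):
    """Offsets of the '1' symbols, i.e. the positions the seed considers."""
    return [k for k, symbol in enumerate(seed) if symbol == "1"]


def load_seeds(lines):
    """Parse one spaced seed per line; blank lines are skipped (assumption)."""
    seeds = []
    for line in lines:
        seed = line.strip()
        if not seed:
            continue
        positions = seed_positions(seed)
        if not positions:
            raise ValueError(f"seed {seed!r} has no considered position")
        seeds.append((seed, positions))
    return seeds


for seed, positions in load_seeds(["1***1*111\n", "1101\n"]):
    print(seed, len(seed), positions)
```

For the example seed 1***1*111, the considered positions are 0, 4, 6, 7 and 8, so five of its nine symbols contribute to the hash.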

#Run
Call the algorithm from the directory where it is compiled:
./FSH -si ../TestInputFile/long_example_1.fna -q 11101110110110111101
./FSH -pi ../TestInputFile/short_example_1.fna.1 ../TestInputFile/short_example_1.fna.2 -q 11101110110110111101
./FSH -pi ../TestInputFile/short_example_2.fna.1 ../TestInputFile/short_example_2.fna.2 -q 11101110110110111101 -dirO /home/user/desktop/

#Publication
S. Girotto, M. Comin, C. Pizzi
FSH: fast spaced seed hashing exploiting adjacent hashes
Algorithms for Molecular Biology 2018, 13:8
DOI: https://doi.org/10.1186/s13015-018-0125-4

#License
MIT
