Skip to content

duaaa/String-Matching-Algorithms-Performance

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Perfomance Analysis of String Matching Algorithms

Description

A simple solution for the Perfomance Analysis of String Matching Algorithms (KMP, BM and Navie algorithm).

This code repository makes the task of searching for a genome sequence easier with a built in .Net MVC user interface. Intially contains in built Severe Acute Respiratory Syndrome (SARS) dataset which can be replaced within the dataset class.

Basically gives you a performance statistics based on the following parameters:
1. Running Time
2. Indexes Found
3. Number of lines executed
4. Space / Memory Consumed

Knuth Morris Pratt (KMP)

is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than once in the sub-pattern, it uses that property to improve the time complexity, as well as in the worst case scenario. The processing time of KMP is O(m) and its time complexity of KMP is O(n).

Naïve pattern searching

is the simplest method among other pattern searching algorithms. It checks for all character of the main string to the pattern. This algorithm is helpful for smaller texts. It does not need any pre-processing phases. The time complexity is O(m*n) where m is the size of pattern and n is the size of the main string

Boyer–Moore

string-search algorithm is an efficient method that is the standard benchmark for practical string-search literature. It was developed by Robert S. Boyer and J Strother Moore in 1977. This algorithm runs faster as the pattern length increases. Running time of this algorithm is O ( n m )

Our project contains three models in the Utils folder BM.cs , KMP.cs , Naive.cs. These files contains classes responsible for the actual algorithm logic.

Dataset

Our dataset class is responsible for loading the dataset from csv file. This file contains quarter of the Severe Acute Respiratory Syndrome (SARS) dataset obtained from https://www.kaggle.com/spowers/sarssequencedata?select=sars_combined_sequences.csv

Futher, Our home controller is set as the startup call to our the search models. Here user is provided access to a search interface with options for algorithms to choose. The search is carried out according to this option.

In the searchInfo.cs class we have three classes Performance in which we have parameters of no of statements executed as we are doing in list and the parameter to check the memory consumption. Search Info is also a class which contains the parameters that get the use's choice for the algorithm that is to be executed. Statement class which contain a parameter statementnumber which counts the no of times a statement is executed.

Installation

Before executing this project, you can add any other geneome sequence file by changing the value of DataSetFile key in webConfig file. And placing your dataset csv file in Models\DataSets folder.

Support

If you need any help, you can contact us through our emails. seharejaz1997@gmail.com OR mduaaas@gmail.com

Contributing

As this is a private project no one from the outside is allowed to contribute in it.

Authors

Made by Dua and Sahar.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published