This repository contains my end-of-degree project: simple implementations of three of the best-known aligners, Bowtie, BWA and BWT-SW.

DonQwerty/Genome-Aligners

Genome-Aligners

The cost of sequencing an organism's genome has dropped sharply with the advent of Next Generation Sequencing (NGS) techniques. This has led to the emergence of many new aligners, which find the position of a given NGS sequence in a reference genome. However, it is difficult to choose the aligner best suited to a given problem, because fair comparisons between aligners in terms of alignment effectiveness and computational cost are scarce. Another problem bioinformaticians commonly face is tuning an aligner's metaparameters correctly. These difficulties are compounded by the fact that aligners are usually treated as black boxes, owing to their complexity and the lack of detailed descriptions and analyses of them.

The objective of this Master's Thesis is to provide a theoretical analysis and an implementation of three aligners, so that they can be compared and the best algorithm for each type of NGS alignment problem can be identified. The work also analyses how each aligner's metaparameters influence its performance. Three aligners, Bowtie, BWA and BWT-SW, were selected so that the analysis covers both long- and short-sequence aligners, as well as aligners that consider only mutations (mismatches) and aligners that consider both mutations and gaps. These aligners share a common structure, the FM-Index, which uses the Burrows-Wheeler transform and its properties to allow efficient searching in the reference genome with low memory consumption. This Master's Thesis describes the details of each algorithm that are usually left implicit, based on a thorough study of the scientific papers in which the algorithms were proposed. Both the FM-Index and the aligners were implemented in C++, and their correct behaviour was verified with black-box and white-box unit tests.
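To make the role of the FM-Index more concrete, the following is a minimal sketch of exact-match counting by backward search over the Burrows-Wheeler transform. It is not the code in this repository: the names `FMIndex`, `build_fm_index`, `count_occurrences` and `code` are hypothetical, the alphabet is assumed to be `A`, `C`, `G`, `T` plus a `$` sentinel, and the suffix array is built by naive sorting with a full occurrence table, which is only reasonable for short texts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Map '$', 'A', 'C', 'G', 'T' to 0..4; anything else maps to -1.
int code(char c) {
    switch (c) {
        case '$': return 0;
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 3;
        case 'T': return 4;
        default:  return -1;
    }
}

// Hypothetical illustration type, not the repository's actual class.
struct FMIndex {
    std::string bwt;                              // last column of the sorted suffixes
    std::array<std::size_t, 5> C{};               // C[k]: characters in the text with code < k
    std::vector<std::array<std::size_t, 5>> occ;  // occ[i][k]: count of code k in bwt[0, i)
};

FMIndex build_fm_index(std::string text) {
    for (char c : text) assert(code(c) > 0);  // assumption: input uses only A, C, G, T
    text.push_back('$');                      // sentinel, smaller than every base in ASCII
    const std::size_t n = text.size();

    // Naive suffix array: sort suffix start positions lexicographically.
    std::vector<std::size_t> sa(n);
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });

    FMIndex fm;
    fm.bwt.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        fm.bwt[i] = text[(sa[i] + n - 1) % n];  // character preceding each sorted suffix

    // Prefix counts of every symbol in the BWT.
    fm.occ.assign(n + 1, std::array<std::size_t, 5>{});
    for (std::size_t i = 0; i < n; ++i) {
        fm.occ[i + 1] = fm.occ[i];
        ++fm.occ[i + 1][static_cast<std::size_t>(code(fm.bwt[i]))];
    }

    // C[k] is the cumulative count of all symbols smaller than k.
    std::size_t sum = 0;
    for (std::size_t k = 0; k < 5; ++k) {
        fm.C[k] = sum;
        sum += fm.occ[n][k];
    }
    return fm;
}

// Backward search: process the pattern right to left, narrowing the
// half-open suffix-array interval [lo, hi) of suffixes prefixed by it.
std::size_t count_occurrences(const FMIndex& fm, const std::string& pattern) {
    std::size_t lo = 0, hi = fm.bwt.size();
    for (auto it = pattern.rbegin(); it != pattern.rend(); ++it) {
        const int k = code(*it);
        if (k <= 0) return 0;  // symbols outside A/C/G/T cannot match
        const std::size_t s = static_cast<std::size_t>(k);
        lo = fm.C[s] + fm.occ[lo][s];
        hi = fm.C[s] + fm.occ[hi][s];
        if (lo >= hi) return 0;
    }
    return hi - lo;
}
```

For example, `count_occurrences(build_fm_index("ACGTACGT"), "ACG")` evaluates to 2. Each pattern character costs two table lookups, so the query time depends on the pattern length rather than on the genome length. A practical index would sample the occurrence table and the suffix array to keep memory low, and the aligners add mismatch or gap handling on top of this exact search.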

Once the aligners were implemented, the ART software was used to simulate NGS sequences. ART takes as parameters the NGS technology to simulate, along with values that control the sequence length, the number of mismatches, and the gap probability. The behaviour of the three aligners was compared across different values of these parameters in terms of execution time and hit rate, while also varying each aligner's metaparameters. The outcomes of this Master's Thesis are a detailed study of some of the most widely used FM-Index-based alignment algorithms, intended to complement the existing literature; a simple implementation of these aligners, which makes them easier to understand and compare; and a quantitative comparative analysis, which allows us to conclude when each aligner is more suitable than the others for a specific sequencing problem.
