GitHub - ador/mpd-alignment-sets

This repository contains selected alignment datasets from OXBench, used in "Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs" by Joseph L Herman, Ádám Novák, Rune Lyngsø, Adrienn Szabó, István Miklós and Jotun Hein (2014).

Data selection procedure

Selecting a large reference alignment: From OXBench version 1.3 we chose one of the largest alignments, dataset number 12. (After unpacking the OXBench tar.gz archive, it can be found in "oxbench_1_3/data/align/fasta/12".) It contains 122 protein sequences.
Selecting diverse subsets of growing sizes: To measure the running time of our method while limiting the influence of differing alignment characteristics, we drew all subsets from the same protein family. To avoid having highly similar proteins in a subset, we used a simple greedy algorithm that, in each round, adds the protein least similar to those already selected. We ran this algorithm on dataset 12 to produce subsets of 15, 30 and 60 sequences, and also included the full set of 122 sequences in our measurement dataset. The subset sizes thus (almost) double at each step, which makes it easier to draw conclusions from the measured running times.
Removing all-gap columns: To obtain valid sub-alignments, we removed the alignment columns that contained only gaps.
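The selection and clean-up steps above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the datasets. The similarity measure (fraction of identical residues over columns where neither sequence has a gap), the reading of "least similar" as the lowest maximum identity to any already selected sequence, and the choice of seed sequence are all assumptions. The function names are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap.

    Assumed similarity measure; the README does not name the one actually used.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_diverse_order(aln: Dict[str, str], seed: str) -> List[str]:
    """Order sequence names so that each next one is least similar to those already chosen."""
    order = [seed]  # assumption: the caller picks the starting sequence
    # closest[name] = highest identity between `name` and any chosen sequence
    closest = {n: identity(aln[n], aln[seed]) for n in aln if n != seed}
    while closest:
        nxt = min(closest, key=closest.get)
        order.append(nxt)
        del closest[nxt]
        for n in closest:
            closest[n] = max(closest[n], identity(aln[n], aln[nxt]))
    return order


def drop_all_gap_columns(rows: List[str]) -> List[str]:
    """Remove every column in which all rows contain a gap character."""
    keep = [i for i in range(len(rows[0])) if any(r[i] != "-" for r in rows)]
    return ["".join(r[i] for i in keep) for r in rows]


def sub_alignment(aln: Dict[str, str], order: List[str], size: int) -> Dict[str, str]:
    """Take the first `size` sequences of the greedy order and strip all-gap columns."""
    names = order[:size]
    rows = drop_all_gap_columns([aln[n] for n in names])
    return dict(zip(names, rows))


# Toy alignment (made-up data, not taken from OXBench)
toy = {"s1": "AC-GT", "s2": "AC-GT", "s3": "TT-AA", "s4": "ACAGA"}
order = greedy_diverse_order(toy, "s1")
print(order)
print(sub_alignment(toy, order, 2))
```

Under these assumptions, taking prefixes of a single greedy order (for example the first 15, 30 and 60 names) yields nested subsets, so each larger subset contains every smaller one.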

Results

The final subsets that we selected contain 15, 30, 60 and 122 sequences respectively. The larger ones always include all protein sequences from the smaller subsets and extend them further.

The alignments can be found in the "data" folder.
