layout | title | nav_order |
---|---|---|
default | Alignment II | 4 |
- We reviewed the algorithms for pairwise and multiple sequence alignments (Needleman-Wunsch algorithm)
At the end of today's session, you
- will be able to explain the most widely used algorithms for multiple sequence alignment
{: .note } No pre-class work.
- Needleman-Wunsch is the magic algorithm that allows us to align two sequences
- We want to expand the pairwise sequence alignment to multiple sequence alignment
- Progressive alignment: the most widely used algorithm (e.g. ClustalW)
- Consistency-based scoring: improvement over progressive alignment by using a stricter score function (e.g. T-Coffee)
- Iterative refinement: improvement over progressive alignment by repeating the alignment until the score converges (e.g. MAFFT, MUSCLE)
- Compute rooted binary tree (guide tree) from pairwise distances
- Build the MSA bottom-up, from the leaves to the root
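
The bottom-up order can be sketched with a post-order traversal of the guide tree. This is only an illustration: the tree shape, the nested-tuple representation, and the name `merge_order` are assumptions made for this sketch, not taken from ClustalW or from the lecture figure.

```python
# Guide tree as nested tuples: a leaf is a sequence name, an internal node is (left, right).
# The tree shape below is a made-up example.
guide_tree = (("a", "b"), ("c", ("d", "e")))

def merge_order(node, steps):
    """Post-order traversal: return the leaves under `node` and record each merge."""
    if isinstance(node, str):           # leaf: a single sequence, nothing to align yet
        return (node,)
    left, right = node                  # rooted binary tree: exactly two children
    left_leaves = merge_order(left, steps)
    right_leaves = merge_order(right, steps)
    # Both children are fully aligned before their parent is processed.
    steps.append((left_leaves, right_leaves))
    return left_leaves + right_leaves

steps = []
all_leaves = merge_order(guide_tree, steps)
for left_leaves, right_leaves in steps:
    print("align", left_leaves, "with", right_leaves)
```

The last recorded step is the merge at the root, which produces the full MSA.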
{: .highlight } What is a rooted binary tree?
Figure 9.9 in Warnow (2018) Computational phylogenetics
- Align all pairs of sequences using the Needleman-Wunsch algorithm
- For every pairwise alignment, we calculate its cost based on the cost of a gap (e.g. unit cost) and the cost of a substitution (e.g. unit cost)
- We estimate the tree from distances: we will learn this in Lecture 8. Let's pretend we already have the tree
- We build the alignments on the tree from the leaves to the root (bottom-up)
- For the leaves, we build the pairwise alignments for (a,b) and for (d,e) using the Needleman-Wunsch algorithm
- For internal nodes, we need to know how to align alignments
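
The first two steps above can be sketched as follows: a cost-minimizing Needleman-Wunsch with unit gap and substitution costs, applied to every pair of sequences to obtain the distances used for the guide tree. The sequences and the function name `nw_cost` are toy assumptions for illustration only.

```python
from itertools import combinations

def nw_cost(s, t, gap=1, mismatch=1):
    """Minimum alignment cost of s and t (Needleman-Wunsch, cost-minimizing form)."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):           # first column: s[:i] aligned against gaps only
        F[i][0] = i * gap
    for j in range(1, m + 1):           # first row: t[:j] aligned against gaps only
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = min(F[i - 1][j - 1] + sub,   # match or substitution
                          F[i - 1][j] + gap,       # gap in t
                          F[i][j - 1] + gap)       # gap in s
    return F[n][m]

# Toy sequences (made up for this sketch).
seqs = {"a": "ACGT", "b": "ACT", "d": "AGGT", "e": "GGT"}
distances = {(x, y): nw_cost(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
for pair, d in distances.items():
    print(pair, d)
```

With unit costs this is the edit distance between the two sequences; other cost choices change both the distances and the resulting guide tree.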
- Perform pairwise sequence alignment via Needleman-Wunsch (check!)
- Calculate the cost of a pairwise sequence alignment (check!)
- Calculate a tree from distances (Lecture 8)
- Perform alignment of alignments (missing)
We need a new concept called "profile".
- Construct profiles
- Define the cost of putting $a_i, b_j$ together. We want to minimize the expected cost between profiles
- Use Needleman-Wunsch to align $P_1$ and $P_2$ based on the costs
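
One possible reading of these steps, as a minimal sketch: a profile stores, for each alignment column, the relative frequency of each character, and the cost of pairing two columns is the expected unit cost between a character drawn from each. Treating the gap `-` as an ordinary character (so that gap against gap costs 0) is an assumption of this sketch, as are the toy alignments and the names `profile` and `expected_cost`.

```python
from collections import Counter
from fractions import Fraction

def profile(alignment):
    """One dict per column: character -> relative frequency ('-' counts as a character)."""
    nrows = len(alignment)
    return [{ch: Fraction(k, nrows) for ch, k in Counter(col).items()}
            for col in zip(*alignment)]

def expected_cost(col_a, col_b):
    """Expected unit cost between a random character of col_a and one of col_b."""
    return sum(pa * pb * (0 if x == y else 1)
               for x, pa in col_a.items()
               for y, pb in col_b.items())

# Toy alignments (made up for this sketch).
P1 = profile(["AC-T", "ACGT", "GCGT"])   # columns a1..a4
P2 = profile(["AGT", "CGT"])             # columns b1..b3

# Rows are columns of P2, columns are columns of P1.
cost = [[expected_cost(a, b) for a in P1] for b in P2]
for row in cost:
    print([str(c) for c in row])
```

Using `Fraction` keeps the entries exact, which makes them easy to compare with a cost matrix computed by hand.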
Treat
We define the cost as
{: .highlight }
In-class exercise: What is the
Instructions: Build the cost matrix for the two following profiles. This means that you want to calculate
Assume we got the following cost matrix
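
| | a1 | a2 | a3 | a4 | a5 |
|---|---|---|---|---|---|
| b1 | 1/3 | 1 | 1 | 1 | 8/15 |
| b2 | 1 | 1 | 1/4 | 2/3 | 1 |
| b3 | 1 | 0 | 3/4 | 1/3 | 1 |
| b4 | 1 | 1 | 1/4 | 2/3 | 1 |
| b5 | 1 | 0 | 3/4 | 1/3 | 1 |
| b6 | 1/3 | 1 | 9/12 | 8/9 | 11/15 |
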
and we will use it to align the two profiles
{: .note }
The video on Canvas has two errors:
{: .highlight }
In-class activity: Let's recall Needleman-Wunsch: we need the
Instructions: Finish Needleman-Wunsch on the two profiles.
- Build the F matrix
- Trace back the alignment from the bottom right corner
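
The two steps above can be sketched in code for checking hand calculations. The cost entries are copied from the cost matrix given earlier (rows b1..b6, columns a1..a5). The gap cost `GAP = 1` and the tie-breaking rule (diagonal move first) are assumptions of this sketch, so its output may differ from the in-class solution if a different gap cost was used.

```python
from fractions import Fraction as Fr

# Cost matrix from the notes: rows b1..b6, columns a1..a5.
cost = [
    [Fr(1, 3), 1, 1,         1,        Fr(8, 15)],
    [1,        1, Fr(1, 4),  Fr(2, 3), 1],
    [1,        0, Fr(3, 4),  Fr(1, 3), 1],
    [1,        1, Fr(1, 4),  Fr(2, 3), 1],
    [1,        0, Fr(3, 4),  Fr(1, 3), 1],
    [Fr(1, 3), 1, Fr(9, 12), Fr(8, 9), Fr(11, 15)],
]
GAP = 1  # assumed gap cost; not specified in this section

def align_profiles(cost, gap):
    """Needleman-Wunsch on profile columns: returns the F matrix and the aligned column pairs."""
    nb, na = len(cost), len(cost[0])
    F = [[0] * (na + 1) for _ in range(nb + 1)]
    for j in range(1, nb + 1):
        F[j][0] = j * gap
    for i in range(1, na + 1):
        F[0][i] = i * gap
    for j in range(1, nb + 1):
        for i in range(1, na + 1):
            F[j][i] = min(F[j - 1][i - 1] + cost[j - 1][i - 1],  # pair b_j with a_i
                          F[j - 1][i] + gap,                     # b_j against a gap column
                          F[j][i - 1] + gap)                     # a_i against a gap column
    # Trace back from the bottom right corner to the top left corner.
    pairs, j, i = [], nb, na
    while j > 0 or i > 0:
        if j > 0 and i > 0 and F[j][i] == F[j - 1][i - 1] + cost[j - 1][i - 1]:
            pairs.append((f"a{i}", f"b{j}"))
            j, i = j - 1, i - 1
        elif j > 0 and F[j][i] == F[j - 1][i] + gap:
            pairs.append(("-", f"b{j}"))
            j -= 1
        else:
            pairs.append((f"a{i}", "-"))
            i -= 1
    pairs.reverse()
    return F, pairs

F, pairs = align_profiles(cost, GAP)
print("total cost:", F[-1][-1])
for a_col, b_col in pairs:
    print(a_col, b_col)
```

Each printed pair is one column of the merged alignment: a profile column paired with `-` means an all-gap column is inserted in the other profile at that position.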
Solution: You should get the following alignment which we can translate back to the original sequences.
{: .important } MSA key insights: Needleman-Wunsch lies at the core of MSA. If we have two sequences, we align them with Needleman-Wunsch; if we have two alignments, we first convert them to profiles, and then align the profiles with Needleman-Wunsch. The final alignment depends on the assumed costs of substitutions and gaps.
{: .highlight } Homework recap here.
{: .highlight } For next class: Read the paper assigned to your group (on Canvas): ClustalW, MUSCLE, or T-Coffee