This project implements two solutions for the Sequence Alignment problem:
- Basic version using Dynamic Programming (DP)
- Memory-efficient version that combines DP with Divide-and-Conquer
The project aims to align two sequences of symbols (A, C, G, T) by minimizing the cost of alignment, which includes gap penalties and mismatch costs.
Given two strings X
and Y
, where X
consists of symbols x1, x2, ..., xm
and Y
consists of symbols y1, y2, ..., yn
, the goal is to find the optimal alignment between these strings. The alignment cost includes:
- Gap Penalty (δ): A fixed cost for each unmatched position.
- Mismatch Costs (αpq): Costs for matching different symbols
p
andq
.
The task is to implement the basic DP solution and a memory-efficient version, run them on provided test sets, and compare their performance.
The input strings are generated from a base string and a series of steps that iteratively insert copies of the string within itself at specified indices.
- Gap Penalty (δ): 30
- Mismatch Costs:
- A: [0, 110, 48, 94]
- C: [110, 0, 118, 48]
- G: [48, 118, 0, 110]
- T: [94, 48, 110, 0]
Uses a dynamic programming approach to compute the optimal alignment cost and sequences.
Combines dynamic programming with divide-and-conquer to reduce memory usage while maintaining the correctness of the alignment.
- Python 3.x
- Required Python packages:
psutil
./basic.sh input.txt output.txt
Running the Memory-efficient Algorithm
./efficient.sh input.txt output.txt
Results
Summary.pdf file with:
- Data points output table generated from provided input files.
- Line graphs comparing CPU time and memory usage vs. problem size for both solutions.
- Insights and observations from the results.