# Finding a Shared Spliced Motif

## Background Info

### Locating Motifs Despite Introns

* **Motif:** An interval of nucleotides (in DNA/RNA) or amino acids (in proteins) that has biological importance. It could represent an important functional unit of a protein shared by many members of the same species, or a rare gene encoding a disorder. A motif is usually represented by a substring of a genetic string that we'd like to locate.

We can search through a database containing multiple genetic strings (DNA, RNA, or amino acid sequences) to find a longest common substring of these strings, which serves as a **motif** shared by the strings. However, coding regions of DNA are often interspersed with introns that do not code for proteins. Therefore, we need to locate shared motifs that are split across exons, so a motif doesn't have to be contiguous. To model this situation, we use subsequences: a subsequence keeps the characters of a string in their original order but may skip any characters in between.
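To make the difference between a substring and a subsequence concrete, here is a minimal Python sketch. The helper name `is_subsequence` and the example strings are illustrative assumptions, not part of the problem statement.

```python
def is_subsequence(motif: str, text: str) -> bool:
    """Return True if motif's characters appear in text in order, gaps allowed."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each later character must be found after the previous one.
    return all(ch in remaining for ch in motif)


# Hypothetical example: the motif "ACG" is interrupted by "TTT" in the text.
text = "ACTTTG"
print("ACG" in text)                 # contiguous substring check
print(is_subsequence("ACG", text))   # order-preserving, gaps allowed
```

Here `"ACG"` is not a substring of `"ACTTTG"` because the three characters are not adjacent, but it is a subsequence because they still occur in the same order.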

## Problem

**Given:** Two DNA strings $s$ and $t$ (each having length at most 1 kbp) in FASTA format. The two strings are not necessarily equal in length.<br>
**Return:** A longest common subsequence of $s$ and $t$.

## Solution Explained

We can use dynamic programming to solve this problem because it has both optimal substructure and overlapping subproblems. It has optimal substructure because a longest common subsequence of $s$ and $t$ can be built from longest common subsequences of their shorter prefixes. It has overlapping subproblems because a naive recursive algorithm ends up solving the same pair of prefixes many times, so storing each result once avoids the repeated work.<br>
To obtain the length of the longest common subsequence, focus only on the very last character of $s$ and of $t$. If either string is empty, the length is 0. Otherwise, there are 2 possible scenarios:
1. If the last characters of $s$ and $t$ are the same, that character can end a longest common subsequence. Cut off the last character from both $s$ and $t$, find the length of the longest common subsequence of the shortened strings, and add 1 to it.
2. If the last characters of $s$ and $t$ are different, they cannot be matched to each other, so compute two lengths: that of the longest common subsequence of $s$ and $t$ with the last character of $t$ cut off, and that of the longest common subsequence of $s$ and $t$ with the last character of $s$ cut off. The larger of the two values is the length of the longest common subsequence of $s$ and $t$.
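
The two scenarios above can be sketched as a bottom-up table in Python. This is one possible implementation under stated assumptions, not a definitive one: the function name `longest_common_subsequence` and the table name `length` are hypothetical, FASTA parsing is left out, and the traceback step that recovers an actual subsequence (rather than only its length) is an addition beyond the length-only description above.

```python
def longest_common_subsequence(s: str, t: str) -> str:
    """Return one longest common subsequence of s and t."""
    m, n = len(s), len(t)
    # length[i][j] holds the LCS length of the prefixes s[:i] and t[:j].
    # Row 0 and column 0 stay 0: an empty prefix shares nothing.
    length = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                # Scenario 1: last characters match, extend the shorter prefixes.
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                # Scenario 2: drop the last character of s or of t, keep the better.
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner to collect the matched characters.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            result.append(s[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


# Illustrative input strings (assumed for this example).
print(longest_common_subsequence("AACCTTGG", "ACACTGTGA"))
```

Filling the table takes time and memory proportional to the product of the two lengths, which is manageable for strings of at most 1 kbp. Several longest common subsequences may exist; which one is returned depends on how ties are broken during the traceback.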