# Lab09 - Bioinformatics Algorithms
# RNA Secondary Structure Prediction with the Nussinov-Jacobson Algorithm
# ---
RNA Secondary Structure Prediction with the Nussinov-Jacobson Algorithm
In this notebook, we will implement the Nussinov-Jacobson algorithm, which uses dynamic programming (DP) to predict the secondary structure of an RNA sequence.
# Background
The RNA sequence is given by a string of bases, where Adenine (A) pairs with Uracil (U) and Cytosine (C) pairs with Guanine (G).
The algorithm maximizes the number of nested (non-crossing) base pairs and uses that count as a simple proxy for the most stable secondary structure.
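
As a sketch of what maximizing the pair count means, the score can be written as a recurrence over subsequences. The symbols $N(i,j)$ (the maximum number of pairs within positions $i$ through $j$) and $\delta(i,j)$ (equal to 1 when bases $i$ and $j$ are complementary, 0 otherwise) are notation assumed here for illustration:

```latex
N(i,j) = \max
\begin{cases}
N(i,\,j-1) & \text{(base } j \text{ left unpaired)} \\
N(i+1,\,j-1) + 1 & \text{if } \delta(i,j) = 1 \text{ (bases } i \text{ and } j \text{ pair)} \\
\max_{i \le t < j} \bigl[\, N(i,t) + N(t+1,j) \,\bigr] & \text{(bifurcation into two subsequences)}
\end{cases}
\qquad N(i,j) = 0 \ \text{for } i \ge j
```

The three cases correspond to Rules 1 to 3 in the code of Step 2.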

# Discussion09
1. Understand the basic RNA pairing rules.
2. Implement the Nussinov-Jacobson algorithm.
3. Compute the optimal structure for a given RNA sequence.
4. Visualize the RNA secondary structure.
5. After each step (code), insert a Markdown cell below it and comment on the process and its connection to the materials from Lecture09.
6. Save your notebook under a specific name (e.g., Nussinov_Discussion9 + your initials) and submit it to myCourses.

# Step 1: Define the pairing rules for RNA bases

In [None]:
def can_pair(base1, base2):
    """
    Determines if two RNA bases can form a valid pair.
    """
    pairs = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}
    return pairs.get(base1) == base2


# Step 2: Initialize and fill the DP matrix using Nussinov-Jacobson Algorithm

In [None]:
import numpy as np

def nussinov_algorithm(sequence):
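    """
    Computes the DP matrix for RNA secondary structure prediction
    using the Nussinov algorithm.

    Entry [i, j] (with i <= j) holds the maximum number of base pairs
    that can form within sequence[i..j]; the overall score is [0, n-1].
    """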
    n = len(sequence)
    dp_matrix = np.zeros((n, n), dtype=int)
    
    for k in range(1, n):          # Gap size
        for i in range(n - k):     # Start of the subsequence
            j = i + k              # End of the subsequence
            # Rule 1: Unpaired state at j
            unpaired = dp_matrix[i, j-1]
            # Rule 2: Pair i with j if possible
            if can_pair(sequence[i], sequence[j]):
                paired = dp_matrix[i+1, j-1] + 1
            else:
                paired = 0
            # Rule 3: Decomposition into two subsequences
            bifurcation = max(dp_matrix[i, t] + dp_matrix[t+1, j] for t in range(i, j))
            # Optimal substructure
            dp_matrix[i, j] = max(unpaired, paired, bifurcation)
    
    return dp_matrix
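
One way to sanity-check the filled matrix is to compute the same score top-down. The sketch below is a memoized recursive version of the three rules above. The name `max_pairs_recursive` is hypothetical, and it repeats the A-U / G-C pairing table so that it runs on its own. It is meant only as a cross-check for short sequences, since the recursion depth grows with the sequence length.

```python
from functools import lru_cache


def max_pairs_recursive(sequence):
    """Top-down cross-check: maximum number of nested base pairs."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}

    @lru_cache(maxsize=None)
    def best(i, j):
        # Empty or single-base subsequences cannot contain a pair.
        if i >= j:
            return 0
        # Rule 1: base j stays unpaired.
        score = best(i, j - 1)
        # Rule 2: pair i with j when the bases are complementary.
        if complement.get(sequence[i]) == sequence[j]:
            score = max(score, best(i + 1, j - 1) + 1)
        # Rule 3: split into two independent subsequences.
        for t in range(i, j):
            score = max(score, best(i, t) + best(t + 1, j))
        return score

    return best(0, len(sequence) - 1)


print(max_pairs_recursive("GGGAAAUCC"))  # 3
```

For the example sequence used in Step 5, this value should match `dp_matrix[0, n-1]`.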


# Step 3: Backtrack to find the optimal pairing

In [None]:
def traceback(dp_matrix, sequence, i=0, j=None):
    """
    Recursively finds the optimal base pairs for the RNA sequence
    by tracing back through the DP matrix.
    """
    if j is None:
        j = len(sequence) - 1
    if i >= j:
        return []
    
    if dp_matrix[i, j] == dp_matrix[i, j-1]:
        return traceback(dp_matrix, sequence, i, j-1)
    elif can_pair(sequence[i], sequence[j]) and dp_matrix[i, j] == dp_matrix[i+1, j-1] + 1:
        return [(i, j)] + traceback(dp_matrix, sequence, i+1, j-1)
    else:
        for k in range(i, j):
            if dp_matrix[i, j] == dp_matrix[i, k] + dp_matrix[k+1, j]:
                return traceback(dp_matrix, sequence, i, k) + traceback(dp_matrix, sequence, k+1, j)
    return []
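
Because the recurrence only combines nested or side-by-side substructures, the pairs returned by the traceback should never cross and no base should appear in more than one pair. A minimal check of that property is sketched below. The helper name `is_nested` is hypothetical, and the function assumes pairs are given as 0-based `(i, j)` tuples.

```python
def is_nested(pairs):
    """Return True if every base is used at most once and no two pairs cross."""
    used = set()
    for i, j in pairs:
        # Each pair must be ordered and must not reuse a base.
        if i >= j or i in used or j in used:
            return False
        used.update((i, j))
    for a, b in pairs:
        for c, d in pairs:
            # Pairs (a, b) and (c, d) cross when they interleave.
            if a < c < b < d:
                return False
    return True


print(is_nested([(0, 8), (1, 7), (3, 6)]))  # True
print(is_nested([(0, 2), (1, 3)]))          # False (crossing pairs)
```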


# Step 4: Visualize the RNA secondary structure

In [None]:
import matplotlib.pyplot as plt

def plot_rna_structure(sequence, pairs):
    """
    Visualizes the RNA secondary structure as a circle plot.
    """
    n = len(sequence)
    fig, ax = plt.subplots(figsize=(8, 8))
    circle = plt.Circle((0.5, 0.5), 0.45, color='gray', fill=False)
    ax.add_artist(circle)
    
    for idx, base in enumerate(sequence):
        angle = 2 * np.pi * idx / n
        x = 0.5 + 0.4 * np.cos(angle)
        y = 0.5 + 0.4 * np.sin(angle)
        ax.text(x, y, base, ha='center', va='center', fontsize=12)
    
    for (i, j) in pairs:
        angle1 = 2 * np.pi * i / n
        angle2 = 2 * np.pi * j / n
        x1, y1 = 0.5 + 0.4 * np.cos(angle1), 0.5 + 0.4 * np.sin(angle1)
        x2, y2 = 0.5 + 0.4 * np.cos(angle2), 0.5 + 0.4 * np.sin(angle2)
        ax.plot([x1, x2], [y1, y2], 'r-')
    
    ax.set_aspect('equal')
    plt.axis('off')
    plt.show()


# Step 5: Test the implementation
# Here, we will use a test RNA sequence and apply the algorithm.
# Replace the example sequence below to test your own RNA sequence.


In [None]:
sequence = "GGGAAAUCC"  # Example RNA sequence (you can replace this)
print("RNA Sequence:", sequence)
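
When entering your own sequence, note that `can_pair` only recognizes the uppercase letters A, C, G and U. The sketch below shows one possible way to normalize input before running the algorithm. The helper name `clean_rna_sequence` is hypothetical, and converting T to U assumes the input may have been copied from a DNA sequence.

```python
def clean_rna_sequence(raw):
    """Strip whitespace, uppercase, convert T to U, and validate the alphabet."""
    sequence = "".join(raw.split()).upper().replace("T", "U")
    invalid = sorted(set(sequence) - set("ACGU"))
    if invalid:
        raise ValueError(f"Unexpected characters in RNA sequence: {invalid}")
    return sequence


print(clean_rna_sequence(" gggaaatcc\n"))  # GGGAAAUCC
```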

# Compute DP matrix

In [None]:
dp_matrix = nussinov_algorithm(sequence)

# Find optimal base pairs

In [None]:
pairs = traceback(dp_matrix, sequence)

# Display results

In [None]:
print("Optimal Base Pairs:", pairs)
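
A list of index pairs can be hard to read next to the sequence. A common text representation is dot-bracket notation, where paired bases appear as matching parentheses and unpaired bases as dots. The sketch below assumes 0-based `(i, j)` pairs with `i < j`, as returned by `traceback`. The helper name `pairs_to_dot_bracket` is hypothetical.

```python
def pairs_to_dot_bracket(length, pairs):
    """Render base pairs as a dot-bracket string of the given length."""
    structure = ["."] * length
    for i, j in pairs:
        structure[i] = "("  # 5' partner opens the pair
        structure[j] = ")"  # 3' partner closes it
    return "".join(structure)


example_pairs = [(0, 8), (1, 7), (3, 6)]
print("GGGAAAUCC")
print(pairs_to_dot_bracket(9, example_pairs))  # ((.(..)))
```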

# Visualize the structure

In [None]:
plot_rna_structure(sequence, pairs)

# ---
# Activity09
1. Run the notebook and understand the output at each stage.
2. Test with a different RNA sequence and interpret the results.
3. Experiment by modifying the `can_pair` function to allow other possible base pairings.
4. Describe how this Jupyter notebook might be improved or adapted to give better results or to handle more complex RNA structures.
5. Save your notebook under a specific name (e.g., Nussinov_Activity9 + your initials) and submit it to myCourses.
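
As a possible starting point for items 3 and 4, the sketch below parameterizes the score computation by the set of allowed pairs and by a minimum number of unpaired bases enclosed by any pair. The names `CANONICAL`, `WOBBLE`, `nussinov_score` and `min_loop` are hypothetical and are not part of the notebook above. The G-U wobble pair and a minimum hairpin loop of about three bases are common refinements, included here as assumptions rather than as the required solution. Plain lists are used instead of NumPy so that the block runs on its own.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = CANONICAL | {("G", "U"), ("U", "G")}  # assumption: also allow G-U


def nussinov_score(sequence, allowed_pairs=CANONICAL, min_loop=0):
    """Maximum number of nested pairs; a pair (i, j) needs j - i > min_loop."""
    n = len(sequence)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for k in range(1, n):            # Gap size
        for i in range(n - k):       # Start of the subsequence
            j = i + k                # End of the subsequence
            best = dp[i][j - 1]      # Base j unpaired
            # Pair i with j only if enough bases lie between them.
            if k > min_loop and (sequence[i], sequence[j]) in allowed_pairs:
                best = max(best, dp[i + 1][j - 1] + 1)
            # Bifurcation into two subsequences.
            for t in range(i, j):
                best = max(best, dp[i][t] + dp[t + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


seq = "GGGAAAUCC"
print(nussinov_score(seq))                                     # 3
print(nussinov_score(seq, min_loop=3))                         # 2
print(nussinov_score(seq, allowed_pairs=WOBBLE, min_loop=3))   # 3
```

With `min_loop=3` the A-U pair in the example becomes too short to form, and allowing G-U pairs recovers a third pair. This shows how both choices change the predicted structure.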