## Codon Optimization using Ordered Reshuffling (COOR)

In the era of protein design and synthesis, codon optimization is important for achieving the desired protein expression in an organism of interest: if not optimal expression, then at least sufficient expression. Nuclear-encoded ribosomal and proteasome proteins are highly translated (PMID: 22068332). At the same time, some proteins are poorly translated, and the coding sequences (CDS) of some pseudogenes may never be translated at all. It is therefore reasonable to assume that not all coding sequences are under selective pressure for higher translation efficiency.

Codon Optimization using Ordered Reshuffling (COOR) is based on the hypothesis that the codons used by highly translated proteins provide a good template for codon optimization.

**COOR - Steps**

This tool takes the DNA coding sequence of a known protein (Protein 1), translates it into amino acids, and records which codon was used for each amino acid, in the order the codons appear. This produces a codon database in which each amino acid is linked to its list of codons, in the same order as in the original gene. When you then input a new amino acid sequence (Protein 2, the protein whose gene is to be codon optimized), the tool builds a matching DNA sequence by reusing codons from the database in the same order as in Protein 1. If an amino acid occurs more times in Protein 2 than there are codons recorded for it, the tool loops back to the start of that amino acid's list. The final output is a DNA sequence that mirrors the codon usage pattern of the original gene.
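The steps above can be condensed into a short, self-contained sketch. It is an illustration, not the notebook's code: it avoids Biopython by building the standard genetic code from the usual 64-letter string in TCAG codon order, and the names `CODON_TABLE`, `build_codon_db` and `coor` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def build_codon_db(dna):
    """Map each amino acid to the codons encoding it, in the order they occur in dna."""
    db = defaultdict(list)
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        db[CODON_TABLE[codon]].append(codon)
    return dict(db)

def coor(protein1_dna, protein2):
    """Back-translate protein2 by reusing Protein 1 codons in their original order."""
    db = build_codon_db(protein1_dna.upper())
    used = defaultdict(int)  # number of codons consumed so far, per amino acid
    out = []
    for aa in protein2.upper():
        if aa not in db:
            raise ValueError(f"amino acid {aa!r} does not occur in Protein 1")
        codons = db[aa]
        out.append(codons[used[aa] % len(codons)])  # wrap around when the list runs out
        used[aa] += 1
    return "".join(out)

print(coor("ATGGCTGCAGCCTAA", "MAAAAA*"))  # ATGGCTGCAGCCGCTGCATAA
```

In this example Protein 1 contributes three alanine codons (GCT, GCA, GCC); Protein 2 needs five, so the fourth and fifth alanines reuse GCT and GCA. The modulo on `used[aa]` is what implements the "loop back to the start of the list" rule.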

**Protein 1:** A protein, or several concatenated proteins, from the organism of interest.

Examples: ACTIN, TUBULIN, RNA or DNA polymerases, nuclear-encoded ribosomal proteins, RUBISCO, proteasome subunits, CYCLINS, or any protein that you know or suspect is well translated.
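When Protein 1 is built from several concatenated proteins, each CDS must have a length that is a multiple of 3; otherwise every codon after the junction would be read out of frame. The hypothetical helper `concatenate_cds` below is a sketch of that check, assuming each input is a plain DNA string (whitespace allowed).

```python
def concatenate_cds(cds_list):
    """Join several coding sequences into a single Protein 1 input.

    Each CDS must be non-empty, contain only A/T/G/C and have a length that is a
    multiple of 3, so that every sequence starts in frame after the previous one.
    """
    cleaned = []
    for n, cds in enumerate(cds_list, start=1):
        cds = "".join(cds.split()).upper()  # drop spaces and line breaks
        if not cds or len(cds) % 3 != 0 or set(cds) - set("ATGC"):
            raise ValueError(f"CDS {n} is empty, not a multiple of 3, or contains non-ATGC characters")
        cleaned.append(cds)
    return "".join(cleaned)

combined = concatenate_cds(["ATGGCTAAATAA", "atg ggc tga"])
print(combined)  # ATGGCTAAATAAATGGGCTGA
```

The stop codon of each gene stays in the concatenated sequence, so it is recorded under `*` in the codon database; those codons are only used if Protein 2 itself contains `*`.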

**Protein 2:**
Protein sequence to be codon optimized.




In [None]:
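# Step 1: Input and validate the DNA sequence of Protein 1
def is_valid_dna(seq):
    """Return True if seq is non-empty, contains only A/T/G/C, and its length is a multiple of 3."""
    seq = seq.upper()
    return len(seq) > 0 and len(seq) % 3 == 0 and all(base in "ATGC" for base in seq)

def read_sequence(prompt):
    """Read a sequence from the user, removing all whitespace and converting it to upper case."""
    return "".join(input(prompt).split()).upper()

protein1_dna = read_sequence(
    "Enter the coding DNA sequence of a highly translated protein from your species of interest "
    "(e.g. ACTIN, a ribosomal protein, TITIN, RUBISCO). The length must be a multiple of 3. Paste and press Enter: "
)

while not is_valid_dna(protein1_dna):
    print("❌ Invalid input. The sequence must be non-empty, contain only A, T, G and C, and have a length divisible by 3.")
    protein1_dna = read_sequence("Re-enter the coding DNA sequence: ")

print(f"✅ DNA accepted. Length: {len(protein1_dna)} nt ({len(protein1_dna) // 3} codons)")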
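Sequences copied from databases are often in FASTA format, with a `>` header line and line breaks. If the sequence is stored in a Python string or read from a file rather than typed into the prompt, a helper such as the hypothetical `clean_fasta` below could strip the header and whitespace before calling `is_valid_dna`. It assumes a single FASTA record (or a bare sequence); a second `>` header would be left in and rejected by validation.

```python
def clean_fasta(text):
    """Return the bare, upper-case sequence from a single-record FASTA string or a raw sequence."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the FASTA header line
    # Join the remaining lines and remove spaces, tabs and line breaks
    return "".join("".join(line.split()) for line in lines).upper()

record = """>example_cds hypothetical
atggctgca
gcctaa
"""
print(clean_fasta(record))  # ATGGCTGCAGCCTAA
```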


In [None]:
!pip install biopython

In [None]:
# Step 2: Translate DNA to protein using the standard codon table
from Bio.Seq import Seq
from Bio.Data import CodonTable

standard_table = CodonTable.unambiguous_dna_by_name["Standard"]
# Stop codons are kept and translated as "*", so they are also recorded in the codon database
protein1_aa = str(Seq(protein1_dna).translate(table=standard_table, to_stop=False))

print(f"Translated amino acid sequence: {protein1_aa}")
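A translated CDS normally contains `*` only at its last position. An internal `*` in `protein1_aa` usually means the sequence is out of frame or is not a complete CDS; concatenated proteins, where each gene keeps its own stop codon, are the expected exception. The hypothetical helper `internal_stop_positions` below is a standard-library check that could be run on `protein1_aa` before the database is built.

```python
def internal_stop_positions(protein):
    """Return the 1-based positions of stop symbols ('*') that are not the final residue."""
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]

print(internal_stop_positions("MA*"))     # []
print(internal_stop_positions("MK*GA*"))  # [3]
```

In the notebook this could be used as `if internal_stop_positions(protein1_aa): print("⚠️ Internal stop codon(s) found")`.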

In [None]:
# Step 3 & 4: Build the codon database from the DNA sequence
from collections import defaultdict

# Map each amino acid to the list of codons encoding it, in the order they occur in Protein 1
codon_db = defaultdict(list)
for i in range(0, len(protein1_dna), 3):
    codon = protein1_dna[i:i+3]
    aa = protein1_aa[i // 3]
    codon_db[aa].append(codon)

# Number of codons recorded for each amino acid
codon_counts = {aa: len(codons) for aa, codons in codon_db.items()}

print("✅ Codon database created.")
print(dict(sorted(codon_counts.items())))
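Because `codon_db` keeps every codon in order, including repeats, it also holds the codon usage of Protein 1. The sketch below, a hypothetical `codon_usage` helper built on `collections.Counter`, summarizes each amino acid's list as relative codon frequencies for inspection; COOR itself uses only the ordered lists, not these frequencies.

```python
from collections import Counter

def codon_usage(codon_db):
    """Return {amino acid: {codon: fraction of that amino acid's codons}}, most frequent first."""
    usage = {}
    for aa, codons in codon_db.items():
        counts = Counter(codons)
        usage[aa] = {codon: n / len(codons) for codon, n in counts.most_common()}
    return usage

example_db = {"A": ["GCT", "GCA", "GCT", "GCC"], "M": ["ATG"]}
print(codon_usage(example_db))
# {'A': {'GCT': 0.5, 'GCA': 0.25, 'GCC': 0.25}, 'M': {'ATG': 1.0}}
```

In the notebook, `codon_usage(codon_db)` would give the same summary for Protein 1.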


In [None]:
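# Step 5: Print the codon database as a table
import pandas as pd
from IPython.display import display  # available in Jupyter; imported explicitly for clarity

amino_acids = sorted(codon_db.keys())
# Number of table rows = length of the longest codon list
max_len = max(len(codon_db[aa]) for aa in amino_acids)

# One column per amino acid; shorter columns are padded with empty strings
df = pd.DataFrame({aa: codon_db[aa] + [""] * (max_len - len(codon_db[aa])) for aa in amino_acids})
print("Codon Database:")
display(df)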
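Outside Jupyter, where `display` (and possibly pandas) is not available, the same padded table can be produced with the standard library alone. In this sketch, `itertools.zip_longest` does the padding, and the helper name `format_codon_table` is hypothetical.

```python
from itertools import zip_longest

def format_codon_table(codon_db):
    """Return the codon database as text: one column per amino acid, codons in their original order."""
    amino_acids = sorted(codon_db)
    columns = (codon_db[aa] for aa in amino_acids)
    rows = [amino_acids] + [list(row) for row in zip_longest(*columns, fillvalue="")]
    # Left-align every cell in a 3-character field and trim trailing padding
    return "\n".join(" ".join(f"{cell:<3}" for cell in row).rstrip() for row in rows)

print(format_codon_table({"M": ["ATG"], "A": ["GCT", "GCA"], "*": ["TAA"]}))
```

Here the `*`, `A` and `M` columns are printed side by side, and the second row contains only the second alanine codon, GCA.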


In [None]:
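# Step 6: Input the new amino acid sequence (Protein 2)
# Valid codes: amino acids of the standard table plus "*" for a stop codon
valid_aas = set(standard_table.forward_table.values()) | {"*"}

def read_protein(prompt):
    return "".join(input(prompt).split()).upper()

protein2 = read_protein("Enter the single-letter amino acid sequence to be codon optimized (Protein 2): ")

# Reject empty input, invalid letters, and amino acids that never occur in Protein 1
while not protein2 or set(protein2) - set(codon_db):
    invalid = sorted(set(protein2) - valid_aas)
    missing = sorted((set(protein2) & valid_aas) - set(codon_db))
    print(f"❌ Error: invalid letters {invalid}, amino acids absent from Protein 1 {missing}"
          if protein2 else "❌ Error: the sequence is empty.")
    protein2 = read_protein("Re-enter Protein 2 sequence: ")

print(f"✅ Protein 2 accepted: {protein2}")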


In [None]:
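# Step 7: Reconstruct the DNA using codons from codon_db, cycling through each list if needed
reconstructed_dna = []
usage_tracker = defaultdict(int)  # how many times each amino acid has been used so far

for aa in protein2:
    codons = codon_db[aa]
    index = usage_tracker[aa] % len(codons)  # loop back to the start once the list is exhausted
    reconstructed_dna.append(codons[index])
    usage_tracker[aa] += 1

optimized_dna = "".join(reconstructed_dna)

# Sanity check: the optimized DNA must translate back to Protein 2
assert str(Seq(optimized_dna).translate(table=standard_table)) == protein2

print(f"Codon-optimized DNA sequence (length {len(optimized_dna)} nt):")
print(optimized_dna)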
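Two optional follow-ups, not part of the original notebook, are sketched below: comparing the GC content of the output with that of Protein 1, and wrapping the output as a FASTA record for saving or ordering. The helper names `gc_content` and `to_fasta`, the default 60-column line width and the header text are all assumptions.

```python
import textwrap

def gc_content(dna):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0

def to_fasta(header, seq, width=60):
    """Format a sequence as a FASTA record with lines of at most `width` characters."""
    return f">{header}\n" + "\n".join(textwrap.wrap(seq, width)) + "\n"

source = "ATGGCTGCAGCCTAA"
optimized = "ATGGCTGCAGCCGCTGCATAA"
print(f"GC content: Protein 1 {gc_content(source):.2f}, optimized {gc_content(optimized):.2f}")
print(to_fasta("protein2_COOR_optimized", optimized, width=10))
```

In the notebook, the inputs would be `protein1_dna` and the joined `reconstructed_dna`. `textwrap.wrap` splits the unbroken sequence into fixed-width chunks because long words are broken by default.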
