<img src="https://github.com/ChangLabSNU/VaxLab/blob/main/logo/VaxLab_octopus.png?raw=true" height="300" align="right" style="height:240px">

## Welcome to VaxLab! 🧬💉 <a name="page_start"></a>

VaxLab is an integrated platform for designing optimized mRNA vaccine candidates. This notebook guides you through the process of:

1. 📋 Inputting your antigen sequence (protein, DNA, or RNA)
2. 🧪 Selecting optimization strategies for the coding sequence (CDS)
3. 🔄 Choosing appropriate untranslated regions (UTRs)
4. 🎯 Generating and visualizing your optimized mRNA construct

By the end, you'll have a complete mRNA sequence optimized for stability, expression, and immunogenicity.

The [**User Guide**](#user_guide) and [**FAQ**](#FAQ) are located at the very bottom of the page.

**Note:** ⏱️ Run time is typically 1-3 minutes for average-sized proteins.



## Quick Start with Provided Example Data

If you're new to VaxLab, we recommend starting with an example run to see how the platform works:

1. ✅ Check the box below to enable the example run
2. 🔄 Click `Runtime` → `Run all` in the menu above
3. 👀 Observe the optimization process with the example sequence (NanoLuciferase)

The example run optimizes the CDS with LinearDesign (λ = 4) and takes approximately 3 minutes to complete.

In [None]:
#@markdown
EXAMPLE_RUN = False #@param {type:"boolean"}

if EXAMPLE_RUN:
  sequence_name = 'Example-run_NLuc'
  protein_sequence = 'MAVYPYDVPDYAGYPYDVPDYAGSYPYDVPDYAGSGVFTLEDFVGDWRQTAGYNLDQVLEQGGVSSLFQNLGVSVTPIQRIVLSGENGLKIDIHVIIPYEGLSGDQMGQIEKIFKVVYPVDDHHFKVILHYGTLVIDGVTPNMIDYFGRPYEGIAVFDGKKITVTGTLWNGNKIIDERLINPDGSLLFRVTINGVTGWRLCERILA'
  CDS_optimization_tool = 'LinearDesign'
  lambda_value_for_LinearDesign = 4

  UTR_optimization = False
  UTR_5prime = 'Human alpha-globin RNA with an optimized Kozak sequence (BNT161b2/BioNTech)'
  UTR_3prime = 'The amino-terminal enhancer of split (AES) mRNA and the mitochondrial encoded 12S ribosomal RNA (BNT161b2/BioNTech)'

  print('########## EXAMPLE RUN ##########')
  print('')
  print('The example run begins.')
  print('')
  print('------------------------------')
  print('')
  print('- CDS: NanoLuciferase')
  print('- CDS optimization tool: LinearDesign')
  print('- LinearDesign lambda value: 4')
  print("- 5' UTR: Human alpha-globin RNA with an optimized Kozak sequence (BNT161b2/BioNTech)")
  print("- 3' UTR: The amino-terminal enhancer of split (AES) mRNA and the mitochondrial encoded 12S ribosomal RNA (BNT161b2/BioNTech)")
  print('')
  print('------------------------------')
  print('')
  print('- ETA: 1 minute')

## ✏️ Enter values for step 1️⃣, 2️⃣, 3️⃣ and 4️⃣

In [None]:
#@markdown ## Step 1️⃣: Name Your Sequence
#@markdown Enter a name for your sequence. This identifier will be used in all output files and reports.

if EXAMPLE_RUN == False:
  sequence_name = "" #@param {type:"string"}
#@markdown - 💡 **Example:** `SARS-CoV-2_Spike`, `mGFP`, `HA_H1N1`

In [None]:
#@markdown

#@markdown ## Step 2️⃣: Enter Your Sequence
#@markdown Input either a **protein sequence** or a **DNA/RNA coding sequence** (CDS).
#@markdown - For **proteins**: Enter the amino acid sequence in single-letter code format (ACDEFGHIKLMNPQRSTVWY).
#@markdown - For **DNA/RNA**: Enter the nucleic acid sequence in single-letter code format (ACGT for DNA / ACGU for RNA).

#@markdown The system automatically validates your input and translates DNA/RNA if provided.

#@markdown **Note:** For vaccine design, protein sequences are the most common starting point.

if EXAMPLE_RUN == False:
  CDS_sequence = "" #@param {type:"string"}
#@markdown - 💡 **Example** \
#@markdown (Protein) `MAVYPYDVPDYAGYPYDVPDYAGSYPYDVPDYAGSGVFTLEDFVGDWRQTAGYNLDQVLEQGGVSSLFQNLGVSVTPIQRIVLSGENGLKIDIHVIIPYEGLSGDQMGQIEKIFKVVYPVDDHHFKVILHYGTLVIDGVTPNMIDYFGRPYEGIAVFDGKKITVTGTLWNGNKIIDERLINPDGSLLFRVTINGVTGWRLCERILA` \
#@markdown (DNA) `ATGGCCGTGTATCCTTACGACGTGCCCGACTACGCCGGTTACCCCTATGATGTGCCC` \
#@markdown (RNA) `AUGGCCGUGUAUCCUUACGACGUGCCCGACUACGCCGGUUACCCCUAUGAUGUGCCC`


  def translate_sequence(nucleotide_seq):
    """Translate a DNA or RNA CDS into a protein string; stop codons appear as '*'."""
    # Normalize RNA (U) to DNA (T); Seq must already be imported (from Bio.Seq import Seq)
    seq = Seq(nucleotide_seq.upper().replace('U', 'T'))
    try:
        protein = seq.translate(to_stop=False)
        return str(protein)
    except Exception as e:
        return f"Translation error: {e}"

  # if CDS_sequence == "" and EXAMPLE_RUN == False:
  #   !pip install biopython > /dev/null 2>&1
  #   from Bio.Seq import Seq
  #   DNA_or_RNA_CDS = "" #@param {type:"string"}
  #   protein_sequence = translate_sequence(DNA_or_RNA_CDS)

  # Check that CDS_sequence contains only valid characters
  valid_amino_acids = set("ACDEFGHIKLMNPQRSTUVWY")  # 20 amino acids, plus U so that RNA input is accepted

  # Remove whitespace and stop symbols ('*'); accept lowercase input
  cleaned_sequence = ''.join(CDS_sequence.split()).upper()
  cleaned_sequence = cleaned_sequence.replace('*', '')
  invalid_chars = set(cleaned_sequence) - valid_amino_acids

  if not cleaned_sequence:
      print("!!WARNING!! The sequence is empty. Please enter a sequence in Step 2.")
  elif invalid_chars:
      print(f"!!WARNING!! The sequence contains invalid character(s): {', '.join(sorted(invalid_chars))}. Please check for typos.")
  else:
      # Treat the input as DNA/RNA if it consists only of nucleotide letters
      if set(cleaned_sequence) <= {'A', 'C', 'G', 'T'} or set(cleaned_sequence) <= {'A', 'C', 'G', 'U'}:
          if len(cleaned_sequence) % 3 != 0:
              print("!!WARNING!! The sequence length is not a multiple of 3. Please check.")
          else:
              !pip install biopython > /dev/null 2>&1
              from Bio.Seq import Seq
              protein_sequence = translate_sequence(cleaned_sequence)
              # Drop a single trailing stop codon, then warn about any remaining (internal) stop
              if protein_sequence.endswith('*'):
                  protein_sequence = protein_sequence[:-1]
              if '*' in protein_sequence:
                  print("!!WARNING!! There is an internal stop codon in the sequence. Please check.")
      else:
          protein_sequence = cleaned_sequence

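The nucleotide branch above converts U to T, translates the CDS with Biopython, drops a trailing stop codon, and warns about internal stops. For readers who want to see those steps without Biopython, here is a minimal sketch assuming the standard genetic code (NCBI translation table 1); `translate_cds` is a hypothetical helper, not part of VaxLab, and it raises errors where the notebook only prints warnings.

```python
# Standard genetic code (NCBI table 1): amino acids for all 64 codons,
# enumerated with each codon position cycling through T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(nucleotide_seq: str) -> str:
    """Translate a DNA or RNA CDS (hypothetical helper mirroring the cell above)."""
    seq = "".join(nucleotide_seq.split()).upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    # A KeyError here means the sequence contains a non-ACGT/U character
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))
    if protein.endswith("*"):
        protein = protein[:-1]  # drop a single trailing stop codon
    if "*" in protein:
        raise ValueError("internal stop codon found")
    return protein


print(translate_cds("AUGGCCGUGUAUUAA"))  # MAVY
```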
In [None]:
#@markdown ## Step 3️⃣: Choose a CDS Optimization Tool
#@markdown Select one of the following tools to optimize your coding sequence:

#@markdown - **LinearDesign**: Balances mRNA structure and codon optimization (recommended for most cases)
#@markdown - **CodonBERT**: AI-based deep learning approach for codon optimization
#@markdown - **Simple Codon Optimizer**: Basic frequency-based codon optimization
#@markdown - **CUSTOM**: Tissue-specific optimization for targeted expression

#@markdown Each tool offers different trade-offs between translation efficiency, mRNA stability, and immunogenicity.



if EXAMPLE_RUN == False:
  CDS_optimization_tool = "LinearDesign" #@param ["LinearDesign", "CodonBERT", "Simple Codon Optimizer", "CUSTOM"]

#@markdown You can adjust the tool’s detailed settings in the “[Advanced Options](#options)” section.

In [None]:
#@markdown
#@markdown ## Step 4️⃣: Select UTR Sequences

#@markdown Choose pre-defined 5' and 3' untranslated regions (UTRs) or generate a custom 5' UTR:

#@markdown - **5' UTR**: Influences translation initiation efficiency
#@markdown - **3' UTR**: Affects mRNA stability and expression duration

#@markdown The dropdown menus offer UTRs from successful commercial vaccines (Pfizer/BioNTech, Moderna) and research constructs.

UTR_list = {'Human alpha-globin RNA with an optimized Kozak sequence (BNT161b2/BioNTech)' : 'GAAUAAACUAGUAUUCUUCUGGUCCCCACAGACUCAGAGAGAACCCGCCACC',
            'The amino-terminal enhancer of split (AES) mRNA and the mitochondrial encoded 12S ribosomal RNA (BNT161b2/BioNTech)' : 'CUCGAGCUGGUACUGCAUGCACGCAAUGCUAGCUGCCCCUUUCCCGUCCUGGGUACCCCGAGUCUCCCCCGACCUCGGGUCCCAGGUAUGCUCCCACCUCCACCUGCCCCACUCACCACCUCUGCUAGUUCCAGACACCUCCCAAGCACGCAGCAAUGCAGCUCAAAACGCUUAGCCUAGCCACACCCCCACGGGAAACAGCAGUGAUUAACCUUUAGCAAUAAACGAAAGUUUAACUAAGCUAUACUAACCCCAGGGUUGGUCAAUUUCGUGCCAGCCACACCCUGGAGCUAGC',
            '5UTR sequence of mRNA1273 (mRNA1273/Moderna)' : 'GGGAAAUAAGAGAGAAAAGAAGAGUAAGAAGAAAUAUAAGACCCCGGCGCCGCCACC',
            'Homo sapiens hemoglobin subunit alpha 1 gene (HBA1) (mRNA1273/Moderna)' : 'GCUGGAGCCUCGGUGGCCUAGCUUCUUGCCCCUUGGGCCUCCCCCCAGCCCCUCCUCCCCUUCCUGCACCCGUACCCCCGUGGUCUUUGAAUAAAGUCUGAGUGGGCGGCA',
            'Dynein Axonemal Heavy Chain 2 (DNAH2) (LIVERNA)' : 'GAGACCCAAGCUGGCUAGCGGGAGAAAGCUUACCGGCUAGCGCCGCCACC',
            'Homo sapiens hemoglobin subunit alpha 2 gene (HBA2) (LIVERNA)' : 'GCUGGAGCCUCGGUAGCCGUUCCUCCUGCCCGCUGGGCCUCCCAACGGGCCCUCCUCCCCUCCUUGCACCGGCCCUUCCUGGUCUUUGAAUAAAGUCUGAGUGGGCAGC',
            'Homo sapiens hydroxysteroid 17-beta dehydrogenase 4 gene (HSD17B4) (RiboBio)' : 'GUCCCGCAGUCGGCGUCCAGCGGCUCUGCUUGUUCGUGUGUGUGUCGUUGCAGGCCUUAUUCAGAUCUACCGGUGGUACCGCCACC',
            'Homo sapiens albumin gene (ALB) (RiboBio)' : 'AGCCAACACCCUGUCUAAAAAACAUAAAUUUCUUUAAUCAUUUUGCCUCUUUUCUCUGUGCUUCAAUUAAUAAAAAAUGGAAAGAACCU',
            '5UTR sequence from Stemirna (Stemirna)' : 'GCUCGCUUUCUUGCUGUCCAAUUUCUAUUAAAGGUUCCUUUGUUCCCUAAGUCCAAGGGGAUAUUAUGAAGGGCCUUGAGCAUCUGGAUUCUGCCUAAUA',
            '3UTR sequence from Stemirna (Stemirna)' : 'ACAUUUGCUUCUGACACAACUGUGUUCACUAGCAACCUCAAACAGACACC',
            'Scaffold virus K4 element (Seo et al., 2023)' : 'AACAUCCUCUCGAUCGGAUCGCAACGUGUUACCCAGGAAUCCACUUGGGUGUACGCGGCCGUUCUGACGUUGGAAUUCUGUAGAUGAAAGUUAGCUAGGAGCUUUUAAUUGGAAAUGAGAACAAAAAAAA',
            'Aichi virus K5 element (Seo et al., 2023)' : 'CAUGGUUGUACUGCACUAUCAUCCUAAGACGGUCCUUCUUCGGAUCGCAAUCUCACCCUGGUGCCGCGCUUCCUUCGGGAACUGCACCCGCGGACCAGGGCCGUCUUUGAACUUUUCUAACUGUUCUUAC'}


if EXAMPLE_RUN == False:
  UTR_5prime = 'Human alpha-globin RNA with an optimized Kozak sequence (BNT161b2/BioNTech)' #@param ["Human alpha-globin RNA with an optimized Kozak sequence (BNT161b2/BioNTech)", "5UTR sequence of mRNA1273 (mRNA1273/Moderna)", "Dynein Axonemal Heavy Chain 2 (DNAH2) (LIVERNA)", "Homo sapiens hydroxysteroid 17-beta dehydrogenase 4 gene (HSD17B4) (RiboBio)", "5UTR sequence from Stemirna (Stemirna)"]
  UTR_3prime = 'The amino-terminal enhancer of split (AES) mRNA and the mitochondrial encoded 12S ribosomal RNA (BNT161b2/BioNTech)' #@param ["The amino-terminal enhancer of split (AES) mRNA and the mitochondrial encoded 12S ribosomal RNA (BNT161b2/BioNTech)", "Homo sapiens hemoglobin subunit alpha 1 gene (HBA1) (mRNA1273/Moderna)", "Homo sapiens hemoglobin subunit alpha 2 gene (HBA2) (LIVERNA)", "Homo sapiens albumin gene (ALB) (RiboBio)", "3UTR sequence from Stemirna (Stemirna)", "Scaffold virus K4 element (Seo et al., 2023)", "Aichi virus K5 element (Seo et al., 2023)"]

UTR_5prime_seq = UTR_list[UTR_5prime]
UTR_3prime_seq = UTR_list[UTR_3prime]

#@markdown - ⚙️ **Optional**: Check the box to generate a custom 5' UTR with Optimus-5-Prime instead of using pre-defined sequences.
if EXAMPLE_RUN == False:
  UTR_optimization = False #@param {type:"boolean"}
if EXAMPLE_RUN == False and UTR_optimization == True:
  MRL = 4.5 # @param {type:"slider", min:1.5, max:7.5, step:0.5}


#@markdown - ⚙️ **Optional:** To enter your own UTR sequence, check the 5' and/or 3' box below and type the sequence in the field next to it.

ENTER_MY_OWN_5_UTR = False #@param {type:"boolean"}
# USE_MY_OWN_UTR
if ENTER_MY_OWN_5_UTR:
  UTR_5prime_seq = "" #@param {type:"string"}
  UTR_5prime_seq = UTR_5prime_seq.upper().replace('T', 'U')

ENTER_MY_OWN_3_UTR = False #@param {type:"boolean"}
if ENTER_MY_OWN_3_UTR:
  UTR_3prime_seq = "" #@param {type:"string"}
  UTR_3prime_seq = UTR_3prime_seq.upper().replace('T', 'U')

## 🚀 You're all set! Click **'Runtime' → 'Run all'** to run VaxLab.
#### If you’d like to adjust detailed parameters, please refer to the section below.

## ⚙️ (Optional) Advanced Options <a name="options"></a>

The settings below allow you to customize the optimization process for your specific needs. The default values work well for most applications, but adjusting these parameters can help address special requirements.

In [None]:
#@markdown

#@markdown LinearDesign
if EXAMPLE_RUN == False and CDS_optimization_tool == "LinearDesign":
  lambda_value_for_LinearDesign = 10 # @param {type:"slider", min:0, max:10, step:0.5}

#@markdown CUSTOM
if CDS_optimization_tool == "CUSTOM":
  tissue_for_CUSTOM = "Lung" #@param [    "Lung", "Breast", "Skin", "Spleen", "Heart", "Liver", "Salivarygland", "Muscle", "Tonsil", "Smallintestine", "Placenta", "Appendices", "Testis", "Rectum", "Urinarybladder", "Prostate", "Esophagus", "Kidney", "Thyroid", "Lymphnode", "Artery", "Brain", "Nerve", "Gallbladder", "Uterus", "Pituitary", "Colon", "Vagina", "Duodenum", "Fat", "Stomach", "Adrenal", "Fallopiantube", "Smoothmuscle", "Pancreas", "Ovary"]


In [None]:
#@title Installation

import subprocess as sp
import os
import shutil
import urllib.request  # needed for urllib.request.urlretrieve below
from pathlib import Path

BUILDDIR = Path('/content/build')

if not os.path.isdir(BUILDDIR):
  os.makedirs(BUILDDIR)

# VaxLab dependencies
sp.check_call(['pip', 'install', '--no-deps', 'ViennaRNA==2.7.0'])

# LinearDesign
if CDS_optimization_tool == 'LinearDesign':
  workdir = BUILDDIR / 'LinearDesign'
  LINEARDESIGN_BIN = workdir / 'bin' / 'LinearDesign_2D'
  LINEARDESIGN_BIN_URL = 'https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/lineardesign/LinearDesign_2D.gz'

  if not os.path.isfile(LINEARDESIGN_BIN):
    if not os.path.isdir(workdir):
      sp.check_call(f'git clone https://github.com/LinearDesignSoftware/LinearDesign.git {workdir}',
                    shell=True)

    os.makedirs(workdir / 'bin', exist_ok=True)
    # urllib.request.urlretrieve(LINEARDESIGN_BIN_URL, str(LINEARDESIGN_BIN) + '.gz')
    while not os.path.exists(str(LINEARDESIGN_BIN) + '.gz'):
      !cd {workdir}/bin && wget {LINEARDESIGN_BIN_URL} > /dev/null 2>&1
    sp.check_call(['gunzip', str(LINEARDESIGN_BIN) + '.gz'])

# CodonBERT
if CDS_optimization_tool == 'CodonBERT':
  workdir = BUILDDIR / 'CodonBERT'
  CODONBERT_DEPS = [
    'biopython==1.81',
    'tensorboardx==2.6',
  ]

  if not os.path.exists(workdir / '.preparation.done'):
    if not os.path.isdir(workdir):
      sp.check_call(f'git clone https://github.com/FPPGroup/CodonBERT.git {workdir}',
                    shell=True)
    sp.check_call(['pip', 'install', '--no-deps'] + CODONBERT_DEPS)
    (workdir / '.preparation.done').touch()  # marker file so setup is skipped on re-runs

# Simple Codon Optimizer
if CDS_optimization_tool == 'Simple Codon Optimizer':
  workdir = BUILDDIR / 'simple-codon-optimizer'

  if not os.path.isdir(workdir):
    sp.check_call(f'git clone https://github.com/tdseher/simple-codon-optimizer.git {workdir}',
                  shell=True)

# CUSTOM
if CDS_optimization_tool == 'CUSTOM':
  workdir = BUILDDIR / 'CUSTOM'
  CUSTOM_SCRIPT = workdir / 'CUSTOM_tissue_optimization.py'
  CUSTOM_SCRIPT_URL = 'https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/custom/CUSTOM_tissue_optimization.py'

  sp.check_call(['pip', 'install', 'custom_optimizer'])
  sp.check_call(['pip', 'install', 'viennarna'])

  while not os.path.exists(CUSTOM_SCRIPT):
    os.makedirs(workdir, exist_ok=True)
    urllib.request.urlretrieve(CUSTOM_SCRIPT_URL, CUSTOM_SCRIPT)

# Optimus5Prime
if UTR_optimization == True:
  workdir = BUILDDIR / 'Optimus5Prime'

  if not os.path.exists(workdir / '.preparation-done'):
    os.makedirs(workdir, exist_ok=True)

    if not os.path.isdir(workdir / 'paper-5utr-design'):
      sp.check_call(['git', 'clone', 'https://github.com/castillohair/paper-5utr-design.git',
                    workdir / 'paper-5utr-design'])
    if not os.path.isdir(workdir / 'genesis'):
      sp.check_call(['git', 'clone', 'https://github.com/johli/genesis.git',
                    workdir / 'genesis'])

    import subprocess as sp

    sp.check_call("wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh", shell=True)
    sp.check_call("bash miniconda.sh -b -p /opt/conda", shell=True)
    sp.check_call("rm miniconda.sh", shell=True)
    sp.check_call("/opt/conda/bin/conda init bash", shell=True)

    sp.check_call("/opt/conda/bin/conda create -y -n optimus5prime_DEN python=3.7", shell=True)
    sp.check_call("/opt/conda/bin/conda install -y -n optimus5prime_DEN tensorflow=1.15 keras=2.2.4 scipy=1.2.1 numpy=1.16.2 pandas", shell=True)
    sp.check_call("/opt/conda/bin/conda run -n optimus5prime_DEN pip install isolearn==0.2.1", shell=True)
    sp.check_call(f"/opt/conda/bin/conda run -n optimus5prime_DEN bash -c 'cd {workdir}/genesis && python setup.py install'", shell=True)

    !cd build/Optimus5Prime/genesis/ && wget https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/optimus5prime/generate_5utr.py

    (workdir / '.preparation-done').touch()  # marker file so setup is skipped on re-runs

# VaxLab-report
workdir = BUILDDIR / 'VaxLab-report'
if not os.path.isdir(workdir):
  sp.check_call(f'git clone https://github.com/ChangLabSNU/VaxLab-report.git {workdir}',
                shell=True)
  sp.check_call(['pip', 'install', '-e', 'build/VaxLab-report/'])

# # ViennaRNA
# ## Install ViennaRNA
# if os.system("pip install viennarna > /dev/null 2>&1") == 0:
#     print("(5/5) ViennaRNA installed successfully.")
# else:
#     print("Failed to install ViennaRNA.")

# xlsxwriter
!pip install xlsxwriter > /dev/null 2>&1

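The installation cell above retries `wget`/`urlretrieve` in `while` loops until the file appears, which never terminates if a URL is unreachable. A bounded-retry alternative using only the standard library is sketched below; `download_with_retries`, `attempts`, and `delay_s` are hypothetical names, not part of VaxLab.

```python
import time
import urllib.request
from pathlib import Path


def download_with_retries(url: str, dest, attempts: int = 3, delay_s: float = 2.0) -> Path:
    """Download url to dest, retrying a fixed number of times before raising."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)
            return dest
        except OSError as err:  # urllib.error.URLError and HTTPError are OSError subclasses
            last_error = err
            print(f"Download attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"Could not download {url} after {attempts} attempts") from last_error
```

For example, `download_with_retries(LINEARDESIGN_BIN_URL, str(LINEARDESIGN_BIN) + '.gz')` would fetch the LinearDesign binary and fail loudly instead of looping forever.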
## 🔄 Optimization

In [None]:
#@title UTR optimization

if UTR_optimization == True:
  !source /opt/conda/bin/activate optimus5prime_DEN && python build/Optimus5Prime/genesis/generate_5utr.py \
    --mrl $MRL \
    --predictor build/Optimus5Prime/paper-5utr-design/megatal_5utr_design/den/predictors/model_optimus5p_50bp_retrained.h5 \
    --generator build/Optimus5Prime/paper-5utr-design/megatal_5utr_design/den/saved_generators/genesis_invreg_optimus5p_50bp_ns_generator.h5 \
    > optimus5prime_result.txt 2> /dev/null

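In shell, stdout should be redirected only once: with `cmd > a > b`, file `a` is truncated and left empty while the output goes to `b`. For reference, here is a Python sketch of the intended behavior of the cell above (stdout to the result file, stderr discarded), plus the parse used later to pick the generated UTR (a line beginning with `1:` whose second field is the sequence). `run_to_file`, `best_utr_from_file`, and the stand-in generator are hypothetical; the `1:` format is assumed from the notebook's parsing code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_file(cmd: list[str], out_path) -> None:
    """Run cmd with stdout written to out_path and stderr discarded (like `cmd > out 2>/dev/null`)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, stderr=subprocess.DEVNULL, check=True)


def best_utr_from_file(path):
    """Return the sequence on the first line starting with '1:' as RNA, or None if absent."""
    for line in Path(path).read_text().splitlines():
        if line.startswith("1:"):
            return line.split()[1].upper().replace("T", "U")
    return None


# Demo with a stand-in generator (hypothetical output format "rank: sequence")
fake_generator = [sys.executable, "-c",
                  "import sys; print('1: GGACT'); sys.stderr.write('progress log')"]
result_file = Path(tempfile.mkdtemp()) / "optimus5prime_result.txt"
run_to_file(fake_generator, result_file)
print(best_utr_from_file(result_file))  # GGACU
```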
In [None]:
#@title CDS optimization

if CDS_optimization_tool == 'LinearDesign':
  !chmod +x build/LinearDesign/bin/LinearDesign_2D
  !cd build/LinearDesign && echo $protein_sequence | bin/LinearDesign_2D $lambda_value_for_LinearDesign 1 codon_usage_freq_table_human.csv | tee /content/lineardesign_result.txt > /dev/null 2>&1

if CDS_optimization_tool == 'CodonBERT':
  os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'python'
  with open("input_protein_sequence.fasta", "w") as f:
    f.write(f">{sequence_name}\n{protein_sequence}\n")
  !cd build/CodonBERT && python predict.py -m models/kidney_1_1_CodonBert_model_20230726_320_model_param.pt -f /content/input_protein_sequence.fasta -o /content/codonbert_result.fasta > /dev/null 2>&1

if CDS_optimization_tool == 'Simple Codon Optimizer':
  !curl -o build/simple-codon-optimizer/examples/human_codon_usage.html "https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606" > /dev/null 2>&1
  with open("input_protein_sequence.txt", "w") as f:
    f.write(f"{protein_sequence}")
  !cd build/simple-codon-optimizer/ && python simple-codon-optimizer.py /content/build/simple-codon-optimizer/examples/human_codon_usage.html 1 "$(cat /content/input_protein_sequence.txt)" --suppress --deterministic | awk '{print $2}' > /content/simplecodon_result.txt

if CDS_optimization_tool == 'CUSTOM':
  !python build/CUSTOM/CUSTOM_tissue_optimization.py --seq $protein_sequence --tissue $tissue_for_CUSTOM > custom_result.txt


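The next cell extracts the LinearDesign result by splitting the `mRNA sequence:` line on a colon followed by exactly two spaces, which raises an `IndexError` if the padding ever differs. A more tolerant parser is sketched below; `parse_lineardesign_output` is a hypothetical name, and the sample text is made up apart from the `mRNA sequence:` prefix.

```python
def parse_lineardesign_output(text: str) -> str:
    """Return the sequence from the first 'mRNA sequence:' line, or '' if there is none."""
    for line in text.splitlines():
        if line.startswith("mRNA sequence:"):
            # Split on the first colon only, so the amount of padding does not matter
            return line.split(":", 1)[1].strip()
    return ""


# Hypothetical output snippet; only the 'mRNA sequence:' prefix comes from the notebook's parser
sample_output = "mRNA sequence:  AUGGCCAAG\n(other output lines)\n"
print(parse_lineardesign_output(sample_output))  # AUGGCCAAG
```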
In [None]:
#@markdown
# Extract the optimized CDS from the selected tool's output, then assemble and save the full mRNA

CDS_seq = ''

if CDS_optimization_tool == 'LinearDesign':
  with open('lineardesign_result.txt', 'r') as file:
      lineardesign_result = file.readlines()
      for result_line in lineardesign_result:
        if result_line.startswith('mRNA sequence:'):
          CDS_seq = result_line.strip().split(':  ')[1]
          CDS_seqlen = len(CDS_seq)

elif CDS_optimization_tool == 'CodonBERT':
  with open('codonbert_result_fix.fasta', 'r') as file:
    for line in file:
      if line.startswith('>'):
        continue
      # Concatenate all sequence lines in case the FASTA record is wrapped
      CDS_seq += line.strip().replace('T', 'U')
    CDS_seqlen = len(CDS_seq)

elif CDS_optimization_tool == 'Simple Codon Optimizer':
  with open('simplecodon_result.txt', 'r') as file:
    simplecodon_result = file.readlines()
    CDS_seq = simplecodon_result[0].strip().replace('T', 'U')
    CDS_seqlen = len(CDS_seq)

elif CDS_optimization_tool == 'CUSTOM':
  with open('custom_result.txt', 'r') as file:
    lines = file.readlines()
    for i, line in enumerate(lines):
      if line.strip().startswith('Top optimized sequence:'):
        if i+1 < len(lines):
          CDS_seq = lines[i+1].strip()
          CDS_seqlen = len(CDS_seq)

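if CDS_seq == '':
  # Stop VaxLab here: an exception halts "Run all", so later cells do not run on missing data
  raise RuntimeError('Failed to extract the CDS sequence from the optimization output.')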
else:
  if UTR_optimization == True:
    with open('optimus5prime_result.txt', 'r') as file:
      for line in file:
        if line.startswith('1:'):
          UTR_5prime_seq = line.split()[1].upper().replace('T', 'U')  # drop the trailing newline; use the RNA alphabet

  STOP_codon = 'UGAUAAUAG'
  CDS_seq += STOP_codon

  full_seq_RNA = UTR_5prime_seq.lower() + CDS_seq.replace('T', 'U') + UTR_3prime_seq.lower()
  CDS_start_index = len(UTR_5prime_seq)
  CDS_end_index = CDS_start_index + len(CDS_seq)

  # full_seq_DNA = full_seq_RNA.upper().replace('U', 'T')
  full_seq_DNA = UTR_5prime_seq.replace('U', 'T').lower() + CDS_seq.replace('U', 'T') + UTR_3prime_seq.replace('U', 'T').lower()

  with open('optimized.fasta', 'w') as file:
    file.write(f'>5UTR\n' + UTR_5prime_seq + '\n')
    file.write(f'>{sequence_name}\n' + CDS_seq + '\n')
    file.write(f'>3UTR\n' + UTR_3prime_seq + '\n')
  with open('optimized_full.fasta', 'w') as file:
    file.write(f'>{sequence_name}\n' + full_seq_RNA.upper())
  print('Sequence optimization is done successfully.')

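The assembly steps above (convert T to U, append the `UGAUAAUAG` triple-stop cassette, concatenate the UTRs, and record where the CDS starts and ends) can be summarized in a small function. This is a sketch, not VaxLab code: `assemble_mrna` and `to_rna` are hypothetical names, the reading-frame check is an addition, and unlike the notebook it does not lowercase the UTRs.

```python
TRIPLE_STOP = "UGAUAAUAG"  # stop-codon cassette appended in the cell above


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def assemble_mrna(utr5: str, cds: str, utr3: str, stop: str = TRIPLE_STOP):
    """Join 5' UTR + CDS + stop cassette + 3' UTR.

    Returns (rna, cds_start, cds_end) with 0-based, end-exclusive CDS coordinates,
    like CDS_start_index/CDS_end_index in the notebook.
    """
    utr5, utr3 = to_rna(utr5), to_rna(utr3)
    cds = to_rna(cds) + stop
    if len(cds) % 3 != 0:
        raise ValueError("CDS plus stop cassette is not a whole number of codons")
    rna = utr5 + cds + utr3
    cds_start = len(utr5)
    return rna, cds_start, cds_start + len(cds)


rna, start, end = assemble_mrna("GCCACC", "ATGAAA", "AAUAAA")
print(start, end, rna[start:end])  # 6 21 AUGAAAUGAUAAUAG
```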
## 📊 Result

In [None]:
#@markdown
#@markdown ## Interactive Sequence Editor

#@markdown This viewer allows you to explore your complete mRNA construct with annotated features:
#@markdown - 5' UTR (orange)
#@markdown - Coding sequence (blue)
#@markdown - 3' UTR (yellow)

#@markdown You can zoom, scroll, and inspect specific regions of interest.

#@markdown Click the padlock icon (🔒) to modify the sequence or generate primers. \
#@markdown To check the melting temperature (Tm) of a selection, click **View > Melting Temp of Selection**.

#@markdown Sequence files (`GenBank`, `FASTA`, `Teselagen JSON`) can be exported from **File > Export Sequence**.

#Editor

EDITOR_SRCDIR = Path('/content/ove_editor')
os.makedirs(EDITOR_SRCDIR, exist_ok=True)

# Download the editor assets on the first run (and whenever they are missing)
while not os.path.exists(EDITOR_SRCDIR / 'style.css'):
  !cd {EDITOR_SRCDIR} && wget https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/ove-editor/style.css > /dev/null 2>&1
while not os.path.exists(EDITOR_SRCDIR / 'index.umd.js'):
  !cd {EDITOR_SRCDIR} && wget https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/ove-editor/index.umd.js > /dev/null 2>&1

editor_html = f"""
<html>
  <head>
    <link
      rel="stylesheet"
      type="text/css"
      href="ove_editor/style.css"
    />
    <style>
      html, body {{
        margin: 0;
        padding: 0;
        width: 100%;
        height: 100vh;
        overflow: visible;
      }}

      .veStatusBar {{
        position: relative !important;
        bottom: auto !important;
        left: auto !important;
        right: auto !important;
        top: auto !important;
      }}

      .veVectorInteractionWrapper,
      .veVectorInteractionWrapperToolbar {{
        position: relative !important;
        bottom: auto !important;
        display: flex !important;
        flex-direction: column !important;
        height: auto !important;
        min-height: 100vh !important;
      }}

      .tg-editor-container {{
        height: auto !important;
        min-height: calc(100vh - 140px) !important;
      }}
    </style>
  </head>
  <body>
    <script
      type="text/javascript"
      src="ove_editor/index.umd.js"
    ></script>
    <script type="text/javascript">
      const editor = window.createVectorEditor("createDomNodeForMe", {{
        withPreviewMode: false,
        editorName: "Optimized Sequence",
        showMenuBar: true
      }});

      editor.updateEditor({{
        sequenceData: {{
          name: "{sequence_name}",
          circular: false,
          sequence: "{full_seq_DNA}",
          features: [
          {{
              id: "19f0fjj",
              name: "5' UTR",
              type: "UTR",
              start: 0,
              end: {CDS_start_index - 1},
              strand: 1,
              color: "#ffa190"
            }},
          {{
              id: "24t2t",
              name: "CDS",
              type: "CDS",
              start: {CDS_start_index},
              end: {CDS_end_index - 1},
              strand: 1,
              color: "#96b4fe"
            }},
          {{
              id: "82020000",
              name: "3' UTR",
              type: "UTR",
              start: {CDS_end_index},
              end: {len(full_seq_DNA) - 1}, // 0-based inclusive end, like the CDS feature
              strand: 1,
              color: "#ffff6b"
            }}
          ]
        }}
      }});

      editor.updateEditor({{
        panelsShown: [
        [
          {{
            id: "rail",
            name: "Linear Map",
            active: true
          }},
          {{
            id: "circular",
            name: "Circular Map",
            active: false
          }},
          {{
            id: "properties",
            name: "Properties",
            active: false
          }}
        ],
        [
          {{
            id: "sequence",
            name: "Sequence Map",
            active: true
          }}
        ]
      ]
      }})

      window.addEventListener('load', function() {{
        setTimeout(function() {{
          const statusBar = document.querySelector('.veStatusBar');
          if (statusBar) {{
            statusBar.style.position = 'relative';
            statusBar.style.bottom = 'auto';
            statusBar.style.top = 'auto';

            let parent = statusBar.parentElement;
            while (parent && parent !== document.body) {{
              const style = window.getComputedStyle(parent);
              if (style.position === 'fixed' || style.position === 'absolute') {{
                parent.style.position = 'relative';
                parent.style.bottom = 'auto';
                parent.style.top = 'auto';
              }}
              parent = parent.parentElement;
            }}

            console.log('Status bar and its parents have been fixed');
          }} else {{
            console.log('Status bar not found');
          }}
        }}, 2000);
      }});
    </script>
  </body>
</html>
"""

# Save the editor html file
editor_filename = "index.html"

with open(editor_filename, 'w') as file:
    file.write(editor_html)

import os
import time
from google.colab import output
from IPython.display import HTML, display

os.system('python3 -m http.server 8000 --bind 0.0.0.0 >/dev/null 2>&1 &')

output.serve_kernel_port_as_iframe(8000)

display(HTML("""
<style>
  iframe {
    width: 100% !important;
    max-width: 2400px !important;
    height: 900px !important;
    background-color: white !important;
  }
</style>
"""))

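The editor cell formats Python values straight into JavaScript inside an f-string, so a sequence name containing a double quote would break the generated page. One alternative is to build the payload as a Python dict and serialize it with `json.dumps`. This sketch assumes the convention the cell already uses for the CDS feature (0-based `start`, inclusive `end`); `build_sequence_data` and `to_script_literal` are hypothetical helpers, and the feature ids and colors are copied from the cell.

```python
import json


def build_sequence_data(name: str, seq_dna: str, cds_start: int, cds_end: int) -> dict:
    """Editor sequenceData; cds_start/cds_end are 0-based and end-exclusive (Python slice style).

    Feature 'end' values are converted to 0-based inclusive positions.
    """
    def feature(fid, label, ftype, start, end_exclusive, color):
        return {"id": fid, "name": label, "type": ftype, "start": start,
                "end": end_exclusive - 1, "strand": 1, "color": color}

    features = [
        feature("19f0fjj", "5' UTR", "UTR", 0, cds_start, "#ffa190"),
        feature("24t2t", "CDS", "CDS", cds_start, cds_end, "#96b4fe"),
        feature("82020000", "3' UTR", "UTR", cds_end, len(seq_dna), "#ffff6b"),
    ]
    return {"name": name, "circular": False, "sequence": seq_dna, "features": features}


def to_script_literal(data: dict) -> str:
    """Serialize for embedding in <script>; escaping '</' stops a name like '</script>' closing the tag."""
    return json.dumps(data).replace("</", "<\\/")


data = build_sequence_data('My "GFP"', "gccaccATGAAATGAaataaa", 6, 15)
print([(f["name"], f["start"], f["end"]) for f in data["features"]])
# [("5' UTR", 0, 5), ('CDS', 6, 14), ("3' UTR", 15, 20)]
```

The result of `to_script_literal(data)` could then be passed as the argument of `editor.updateEditor({sequenceData: ...})` in place of the hand-written object.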
In [None]:
#@markdown ### VaxLab Quality Report

#@markdown This section generates a comprehensive report analyzing various quality metrics of your optimized sequence.

#@markdown The report will download automatically and includes:
#@markdown - Codon usage statistics
#@markdown - GC content distribution
#@markdown - Secondary structure analysis
#@markdown - Translation efficiency predictions
#@markdown - Potential sequence issues

#@markdown 💡 **Tip:** This report is useful for documentation and comparing different optimization strategies.

# Path Configuration
input_path = "/content/optimized.fasta"
output_path = "/content/VaxLab_QC"

# Clean output directory if exists
import os
import shutil
import subprocess

if os.path.exists(output_path):
  shutil.rmtree(output_path)
  print(f"🗑️ Existing output directory '{output_path}' deleted.")
# Run evaluation and report generation with the VaxLab-report scripts installed in the Installation cell
print("🔬 Starting VaxLab Quality Report generation...")
evaluate_command_list = ["python", "build/VaxLab-report/vaxlab_report/evaluate_only.py", "-i", input_path, "-o", output_path, "--preset", "/content/build/VaxLab-report/parameters.json"]
report_command_list = ["python", "build/VaxLab-report/vaxlab_report/report_only.py", "-i", input_path, "-o", output_path]

process = subprocess.run(evaluate_command_list, capture_output=True, text=True)
evaluate_result = process.returncode
evaluate_stdout = process.stdout
evaluate_stderr = process.stderr

if evaluate_result == 0:
  print("✅ Evaluation completed.")
  process_report = subprocess.run(report_command_list, capture_output=True, text=True)
  report_result = process_report.returncode
  report_stdout = process_report.stdout
  report_stderr = process_report.stderr
  if report_result == 0:
    print("✅ Report generation completed.")
    report_filename = f"{output_path}/report.html"
    if os.path.exists(report_filename):
      from google.colab import files
      files.download(report_filename)
      print('📥 Report file downloaded automatically.')
    else:
      print('❌ Error! Report file was not generated.')
  else:
    print('❌ Error! Report generation failed.')
    print(report_stderr)
else:
  print('❌ Error! Evaluation failed.')
  print(evaluate_stderr)

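GC content distribution is one of the metrics listed for the quality report. For illustration only (this is not the VaxLab-report implementation, whose window size and method are not described here), overall and sliding-window GC fraction can be computed as follows; `gc_content`, `sliding_gc`, `window`, and `step` are hypothetical names.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def sliding_gc(seq: str, window: int = 50, step: int = 1) -> list[float]:
    """GC fraction of every full-length window; empty if the sequence is shorter than the window."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


print(gc_content("gccaccAUGAAA"))    # 0.5
print(sliding_gc("AAGG", window=2))  # [0.0, 0.5, 1.0]
```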
In [None]:
#@markdown

from IPython.display import HTML

# Read the report directly; copying it over index.html would replace the sequence editor page served on port 8000
with open("VaxLab_QC/report.html") as f:
    html_content = f.read()

html_with_bg = f"""
<div style="background-color: white; padding: 0; margin: 0;">
  {html_content}
</div>
"""

HTML(html_with_bg)

In [None]:
#@markdown ### Download TWIST Order Sheet

#@markdown This section generates a sheet for ordering the optimized sequence from Twist Bioscience.
#@markdown Press the **Download Sheet** button, then upload the file at the [TWIST order site](https://ecommerce.twistdna.com/app/gene).

import xlsxwriter
import openpyxl
import ipywidgets as widgets
from IPython.display import display
from google.colab import files  # used by the download button below

!curl -L -o /content/TWIST_order_template.xlsx "https://github.com/ChangLabSNU/VaxLab/raw/refs/heads/main/resources/twist_template/TWIST_order_template.xlsx" > /dev/null 2>&1

wb = openpyxl.load_workbook("/content/TWIST_order_template.xlsx")
ws = wb.active  # Select the first worksheet

ws["A2"] = sequence_name
ws["B2"] = full_seq_DNA

wb.save("TWIST_order_filled.xlsx")

def download_order_sheet(b):
  order_sheet_filename = f"/content/TWIST_order_filled.xlsx"
  if os.path.exists(order_sheet_filename):
    files.download(order_sheet_filename)
    print('📥 Order sheet file downloaded successfully.')
  else:
    print('❌ Error! Order sheet file not found.')

download_button = widgets.Button(description="Download Sheet", button_style="success")
download_button.on_click(download_order_sheet)
display(download_button)

# 📚 VaxLab User Guide <a name="user_guide"></a>

This guide will walk you through using VaxLab, a comprehensive tool for designing optimized mRNA vaccine sequences. Each section below corresponds to a cell in the notebook, with explanations of what each step does and how to use it effectively.

[Go to Top](#page_start).

## 🧪 Introduction
VaxLab allows you to design mRNA vaccine sequences by optimizing coding sequences (CDS) and selecting appropriate untranslated regions (UTRs). The platform integrates several state-of-the-art optimization tools, making it easy to create mRNA constructs tailored to your specific needs.

## 🚀 Getting Started
### ✅ Example Run
The first cell provides a quick way to test the platform with pre-defined settings:

- Check the `EXAMPLE_RUN` box and run all cells to see VaxLab in action with a sample NanoLuciferase sequence
- This gives you a sense of the workflow and output format before working with your own sequences
- The example run takes approximately 1 minute to complete

### 📝 Step 1: Name Your Sequence

- Enter a unique identifier for your sequence (e.g., "NLuc" or "SARS-CoV-2_Spike")
- This name will be used in output files and reports for easy identification
- Keep it concise but descriptive to help you organize multiple designs

### 🧬 Step 2: Input Your Sequence

- You can enter either a protein sequence OR a DNA/RNA coding sequence
- Enter the amino acid sequence or nucleic acid sequence directly in single-letter code
- The system will automatically validate your input and warn you of any invalid characters
- Example: The test sequence is NanoLuciferase, a commonly used reporter protein
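The validation described above can be sketched as a single classification step: strip whitespace, decide whether the input is DNA, RNA, or protein, and reject unknown characters. This is a hypothetical helper (`classify_sequence`), not VaxLab's code; in particular, the notebook also accepts `U` in protein input, which this stricter sketch rejects unless the whole input is RNA.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
DNA_BASES = set("ACGT")
RNA_BASES = set("ACGU")


def classify_sequence(raw: str) -> tuple[str, str]:
    """Return (kind, cleaned), where kind is 'DNA', 'RNA' or 'protein'."""
    cleaned = "".join(raw.split()).upper().replace("*", "")
    if not cleaned:
        raise ValueError("empty sequence")
    letters = set(cleaned)
    # Nucleotide-only inputs are read as DNA/RNA first; note that a protein made only of
    # A, C, G and T residues would be misread as DNA under this rule
    if letters <= DNA_BASES:
        return "DNA", cleaned
    if letters <= RNA_BASES:
        return "RNA", cleaned
    invalid = letters - AMINO_ACIDS
    if invalid:
        raise ValueError(f"invalid character(s): {', '.join(sorted(invalid))}")
    return "protein", cleaned


print(classify_sequence("MAVY PYDV"))  # ('protein', 'MAVYPYDV')
print(classify_sequence("atggcc"))     # ('DNA', 'ATGGCC')
```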

### 🛠️ Step 3: Select a CDS Optimization Tool
VaxLab offers several optimization algorithms, each with different strengths:

- LinearDesign: Optimizes mRNA secondary structure and codon usage simultaneously
- CodonBERT: Uses a deep-learning (AI-based) model to choose codons
- Simple Codon Optimizer: Performs basic codon optimization based on host codon usage frequencies
- CUSTOM: Tissue-specific optimization that tailors codon usage to expression in specific organs
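To make the idea behind frequency-based codon optimization concrete, here is a minimal sketch. It back-translates a protein by always picking the most frequent synonymous codon for each residue. The function names and the toy usage table are hypothetical. A real optimizer, including VaxLab's Simple Codon Optimizer, would load a full host codon-usage table and may use a different strategy, such as sampling codons in proportion to their frequency.

```python
# Standard genetic code (TCAG ordering), used to group synonymous codons
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Map each amino acid to its most frequent codon in a codon-usage table."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa == "*":
            continue  # stop codons are not used for back-translation here
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def simple_optimize(protein: str, usage: dict[str, float]) -> str:
    """Back-translate a protein, always choosing the top codon for each residue."""
    best = preferred_codons(usage)
    return "".join(best[aa] for aa in protein.upper())


# Toy frequencies for Met, Ala and Val only (illustrative, not a real usage table)
toy_usage = {
    "ATG": 1.00,
    "GCC": 0.50, "GCT": 0.30, "GCA": 0.15, "GCG": 0.05,
    "GTG": 0.60, "GTC": 0.20, "GTT": 0.15, "GTA": 0.05,
}
print(simple_optimize("MAV", toy_usage))  # ATGGCCGTG
```

A residue missing from the usage table raises a `KeyError`, which is a reasonable way to surface an incomplete table.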

### 🧩 Step 4: Select UTR Sequences

- Choose pre-defined 5' and 3' UTR sequences from established mRNA vaccines
- Options include sequences from Pfizer/BioNTech (BNT162b2), Moderna (mRNA-1273), and others
- Alternatively, you can generate a custom 5' UTR using Optimus-5-Prime by checking the UTR optimization box
- If you have your own UTR sequences, check `ENTER_MY_OWN_UTR` and enter them
- Each UTR influences translation efficiency and mRNA stability differently

## ⚙️ Advanced Parameters
### 🎛️ Optimization Tool-Specific Parameters
Each optimization tool has specific parameters you can adjust:

- **LinearDesign**: λ value (0-10) balances codon adaptation vs. structure optimization

  - Higher values (>5) prioritize codon usage
  - Lower values (<5) prioritize mRNA structure
  - Default value of 4 provides a balanced optimization


- **CUSTOM**: Select the target tissue where your mRNA will be expressed

  - This aims to make the optimized sequence express efficiently in that specific tissue



### 🔄 UTR Optimization (Optimus-5-Prime)

- Mean Ribosome Load (MRL) slider (1.5-7.5) controls translation efficiency
- Higher values create UTRs that promote higher protein expression
- Lower values create UTRs with more moderate translation rates

## ▶️ Running VaxLab
After entering all your parameters, click **"Runtime" → "Run all"** to start the optimization process. The system will:

1. Install all necessary dependencies
2. Run the selected CDS optimization tool on your sequence
3. If selected, optimize the 5' UTR
4. Combine optimized sequences into a complete mRNA construct
5. Analyze the structure and properties of the final sequence
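Step 4 can be pictured as string concatenation. The sketch below joins a 5' UTR, the optimized CDS, the UGAUAAUAG stop cassette that VaxLab appends, and a 3' UTR, converting any T to U. `assemble_mrna` and `to_rna` are hypothetical helpers. The sketch assumes the CDS is supplied without its own stop codon and does not model any other construct elements.

```python
STOP_CASSETTE = "UGAUAAUAG"  # three tandem stop codons: UGA, UAA, UAG


def to_rna(seq: str) -> str:
    """Drop whitespace, uppercase, and convert T to U."""
    return "".join(seq.split()).upper().replace("T", "U")


def assemble_mrna(utr5: str, cds: str, utr3: str) -> str:
    """Hypothetical helper: 5' UTR + CDS + stop cassette + 3' UTR, as RNA."""
    utr5, cds, utr3 = to_rna(utr5), to_rna(cds), to_rna(utr3)
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not cds.startswith("AUG"):
        raise ValueError("CDS is expected to start with AUG")
    return utr5 + cds + STOP_CASSETTE + utr3


construct = assemble_mrna("GGA", "ATGGCC", "CUC")
print(construct)  # GGAAUGGCCUGAUAAUAGCUC
```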

## 📊 Results and Visualization
### 🔍 RNA Structure Visualization

- The folded RNA structure is displayed in an interactive viewer
- The minimum free energy (MFE) shows the thermodynamic stability of your mRNA
- You can zoom, pan, and explore the structure

### 📝 Sequence Editor

- An interactive editor displays your complete mRNA sequence with annotations
- Color-coded regions show the 5' UTR, CDS, and 3' UTR
- You can inspect the sequence and download it for further use

### 📄 Report Generation

- A comprehensive report is automatically generated and downloaded
- Contains optimization metrics, sequence properties, and structure information
- The report can be shared with colleagues or included in publications

## 📈 Performance Metrics
The final report includes several key metrics:

- Codon Adaptation Index (CAI): Measures how well the codon usage matches the host organism
- GC content: Important for stability and expression efficiency
- Minimum Free Energy (MFE): Indicates the stability of the mRNA structure
- Loop Length: Affects accessibility to ribosomes
- Repeat Analysis: Identifies potentially problematic repetitive sequences
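Two of these metrics are easy to reproduce by hand. The sketch below computes GC content as a percentage and flags repeats by counting overlapping k-mers. Both functions are hypothetical illustrations. The k-mer approach and the default `k=8` are assumptions chosen for simplicity; VaxLab's report may detect repeats differently.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percentage of G plus C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("Empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def repeated_kmers(seq: str, k: int = 8, min_count: int = 2) -> dict[str, int]:
    """Count every overlapping k-mer and keep those seen at least min_count times."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


print(gc_content("GGCCAAUU"))           # 50.0
print(repeated_kmers("AUGCAUGC", k=4))  # {'AUGC': 2}
```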

## 💡 Tips for Optimal Results

- 💉 For vaccine design, a balanced approach with moderate structure stability often works best
- ⚠️ Avoid sequences with very low (highly negative) MFE values, i.e., highly structured mRNAs, as they may impede translation
- 🎯 Consider tissue-specific optimization for targeted expression
- 🔄 Compare different optimization strategies to find the best for your specific antigen
- 🧩 The BioNTech/Pfizer UTRs are generally a good starting point for vaccine design

## ⚠️ Troubleshooting

- 🚫 If optimization fails, try simplifying your sequence or reducing its length
- 🔍 Check for rare amino acids or unusual sequence patterns that might cause issues
- 🔄 If the structure visualization doesn't load, try rerunning just that cell
- ⏱️ For large proteins, expect longer optimization times (can be 10+ minutes)

Happy designing with VaxLab! 🧪🧬💉

# ❓ VaxLab Frequently Asked Questions <a name="FAQ"></a>

[Go to Top](#page_start).

## ℹ️ General Questions
### What is VaxLab?
VaxLab is an integrated platform for designing optimized mRNA vaccine sequences. It combines several state-of-the-art optimization tools, allowing you to create mRNA constructs with optimized coding sequences (CDS) and appropriate untranslated regions (UTRs).

### Do I need coding experience to use VaxLab?
No! VaxLab is designed to be user-friendly for researchers without programming experience. The notebook interface guides you through each step with clear instructions.

### How long does the optimization process take?
Typically 3-5 minutes for average-sized proteins (300-500 amino acids). Larger sequences may take longer, especially with LinearDesign or CUSTOM.

## 🧬 Input Sequences
### What format should my protein sequence be in?
Enter your protein sequence in single-letter amino acid code (e.g., MAVRGHPGKGASCTL...) without spaces or special characters.

### Can I input a DNA sequence instead of a protein?
Yes. Enter the DNA or RNA coding sequence (CDS) directly; it will be translated automatically.

### What sequence length can VaxLab handle?
VaxLab works best with sequences between 50 and 1000 amino acids. Very long sequences (>1000 amino acids) may lead to longer computation times or occasional timeouts in the Colab environment. \
Optimizing a sequence longer than 600 amino acids with LinearDesign may fail on the free version of Google Colab because of insufficient RAM. In such cases, either choose a CDS optimization tool other than LinearDesign or upgrade to a paid Google Colab plan with more RAM.

### Do I need to add a stop codon to my sequence?
No. VaxLab automatically appends three tandem stop codons (UGA, UAA, UAG, written together as UGAUAAUAG) to the end of your coding sequence.

## 🛠️ Optimization Tools
### Which optimization tool should I choose?

- **LinearDesign**: Best for most applications, balancing structure and codon optimization
- **CodonBERT**: Useful when maximum expression is the primary goal
- **Simple Codon Optimizer**: Good for basic applications or when you want minimal structural manipulation
- **CUSTOM**: Best when tissue-specific expression is important

### What does the LinearDesign λ value control?
The λ parameter (0-10) balances the importance of codon optimization versus mRNA structure optimization:

- λ = 0: Focuses entirely on optimizing mRNA structure
- λ = 10: Places the strongest weight on codon adaptation (the top of the 0-10 range)
- λ = 4 (default): Balanced approach suitable for most applications
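To see how a single λ can trade structure against codon usage, the sketch below scores candidates with an assumed LinearDesign-style objective, MFE − λ·L·ln(CAI). L is the number of codons, so L·ln(CAI) equals the sum of ln(w) over the per-codon weights w. This form is our reading of how LinearDesign combines the two terms, and VaxLab's 0-10 scale may map onto it differently. `combined_objective` and both toy candidates are hypothetical. The numbers are invented for illustration and are not results.

```python
import math


def combined_objective(mfe_kcal: float, codon_weights: list[float], lam: float) -> float:
    """Lower is better. Assumed form: MFE - lam * sum(ln w_i).

    w_i is the relative adaptiveness of codon i, so sum(ln w_i) = L * ln(CAI)
    for a CDS of L codons. The sum is <= 0, so a larger lam penalizes poorly
    adapted codons more heavily.
    """
    if not codon_weights or any(not 0 < w <= 1 for w in codon_weights):
        raise ValueError("codon weights must be in (0, 1]")
    return mfe_kcal - lam * sum(math.log(w) for w in codon_weights)


# Two toy candidates for a 10-codon CDS (numbers invented for illustration)
a = {"mfe_kcal": -120.0, "codon_weights": [1.0] * 10}  # weaker structure, CAI = 1.0
b = {"mfe_kcal": -150.0, "codon_weights": [0.5] * 10}  # stronger structure, CAI = 0.5

for lam in (0, 4, 10):
    score_a = combined_objective(lam=lam, **a)
    score_b = combined_objective(lam=lam, **b)
    print(lam, round(score_a, 1), round(score_b, 1))
```

With these toy numbers, candidate B (more structured, poorer codons) scores better at λ = 0 and λ = 4. Candidate A overtakes it at λ = 10, which mirrors the trade-off described above.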

### What tissues can I target with the CUSTOM optimizer?
The CUSTOM optimizer supports 36 different human tissues, including lung, heart, liver, muscle, brain, and various immune system components. Select the tissue where your mRNA vaccine will be primarily expressed.

## 🧩 UTRs and Structure
### Which UTRs should I choose for my vaccine design?
For vaccine applications, we recommend starting with the BioNTech/Pfizer (BNT162b2) UTRs, which have proven successful in commercial vaccines. The Moderna (mRNA-1273) UTRs are also well-validated options.

### What is MRL in the Optimus-5-Prime settings?
MRL (Mean Ribosome Load) is a measure of translation efficiency. Higher values (5.5-7.5) create UTRs that promote higher protein expression, while lower values (1.5-3.5) create UTRs with more moderate translation rates.

### What is a good MFE (minimum free energy) value?
MFE is measured in kcal/mol, with more negative values indicating more stable structures:

- -200 to -300 kcal/mol: Typical for medium-sized constructs with moderate structure
- < -400 kcal/mol: Highly structured mRNA (may impede translation)
- Above -150 kcal/mol (less negative): Low structure stability (may be prone to degradation)

The optimal MFE depends on your sequence length and design goals.
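Because the bands above are absolute values for medium-sized constructs, one simple way to compare designs of different lengths is to also look at MFE per nucleotide. The sketch encodes the rule-of-thumb bands from this answer and adds that normalization. `describe_mfe` is hypothetical, and the per-nucleotide value is an illustrative addition, not a metric VaxLab is known to report.

```python
def describe_mfe(mfe_kcal: float, length_nt: int) -> tuple[float, str]:
    """Return MFE per nucleotide and a label based on the rule-of-thumb bands."""
    if length_nt <= 0:
        raise ValueError("length must be positive")
    per_nt = mfe_kcal / length_nt
    if mfe_kcal < -400:
        label = "highly structured (may impede translation)"
    elif mfe_kcal > -150:
        label = "low structure stability (may be prone to degradation)"
    elif -300 <= mfe_kcal <= -200:
        label = "typical, moderate structure"
    else:
        label = "between the reference bands"
    return per_nt, label


print(describe_mfe(-250.0, 1000))  # (-0.25, 'typical, moderate structure')
```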

## ⚠️ Troubleshooting
### The optimization process is taking too long
For lengthy sequences or when using LinearDesign, the process can take 10+ minutes. If it exceeds 20 minutes, try:

1. Restarting the runtime
2. Using a simpler optimization tool (Simple Codon Optimizer or CodonBERT)

### I'm getting an error during installation
Common installation issues can be resolved by:

1. Restarting the runtime and running all cells again
2. Ensuring you have a stable internet connection

### The structure visualization isn't loading
Try:

1. Rerunning just the visualization cell
2. Ensuring you have a stable internet connection
3. If the problem persists, download the report file, which contains a separate structure visualization

### The interactive editor isn't loading
Try:

1. Rerunning just the editor cell
2. Ensuring you have a stable internet connection
3. If you've been running VaxLab many times, please clear your Chrome cookies and try again

### How can I save my results?
There are several ways to save your results:

1. The optimized sequence can be saved from the interactive editor (`File` - `Export Sequence`)
2. A comprehensive report is generated and downloaded automatically
3. You can also manually copy sequences from the interactive editor

## 🔍 Interpretation and Next Steps
### How do I interpret the optimization reports?
The report provides several key metrics:

- CAI (Codon Adaptation Index): Higher values (closer to 1.0) indicate better codon usage
- GC content: Ideally between 40-60% for most applications
- MFE (Minimum Free Energy): Indicates structure stability
- Loop metrics: Longer loops may indicate more accessible regions for translation
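CAI is commonly defined as the geometric mean of each codon's relative adaptiveness w. Here w is a codon's frequency divided by that of its most frequent synonymous codon, so a value of 1.0 means every codon is the preferred one. The sketch below implements that definition with a toy usage table. The function names and frequencies are hypothetical. It assumes every codon in the CDS appears in the table with a positive frequency. Following the usual convention, it skips Met, Trp and stop codons, which have no synonymous alternatives.

```python
import math

# Standard genetic code (TCAG ordering), used to find synonymous codons
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def relative_adaptiveness(usage: dict[str, float]) -> dict[str, float]:
    """w(codon) = frequency / frequency of the most used synonymous codon."""
    top: dict[str, float] = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        top[aa] = max(top.get(aa, 0.0), freq)
    return {codon: freq / top[CODON_TABLE[codon]] for codon, freq in usage.items()}


def cai(cds: str, usage: dict[str, float]) -> float:
    """Geometric mean of w over the codons of a CDS (Met, Trp and stops skipped)."""
    w = relative_adaptiveness(usage)
    dna = cds.upper().replace("U", "T")
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if CODON_TABLE[c] not in "MW*"]
    if not logs:
        raise ValueError("no informative codons")
    return math.exp(sum(logs) / len(logs))


# Toy usage table (illustrative values only)
toy_usage = {"ATG": 1.0, "GCC": 0.5, "GCT": 0.25, "GTG": 0.6, "GTT": 0.3}
print(cai("AUGGCCGUG", toy_usage))  # preferred codons only -> 1.0
print(cai("ATGGCTGTT", toy_usage))  # both codons have w = 0.5 -> about 0.5
```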

### What should I do after designing my mRNA sequence?
Typical next steps include:

1. Synthesizing the designed sequence (through commercial providers)
2. Performing in vitro transcription to produce the mRNA
3. Testing expression levels in cell culture
4. Evaluating immunogenicity in model systems

### Can I compare multiple design strategies?
Yes! We recommend:

1. Creating several designs with different optimization tools
2. Keeping the same UTRs while varying the CDS optimization
3. Comparing the resulting metrics and structures
4. Testing the most promising candidates experimentally

### Is VaxLab suitable for therapeutic mRNAs beyond vaccines?
Absolutely. While developed with vaccines in mind, VaxLab's optimization strategies are equally applicable to therapeutic proteins, antibodies, and other mRNA-based therapeutics.