<a href="https://colab.research.google.com/github/GDanyi96/VHH-spA-binding-analyzer/blob/main/VHH_spA_interaction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **VHH sequence analyzer for spA binding**

## *Description*
A bioinformatics tool to aid the rational design of **Protein A-binding VHH sequences**. The algorithm is based on an open-access article [(Henry KA et al., 2016)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0163113) that describes a mutagenesis-based strategy to investigate Staphylococcal protein A (**SpA**) binding to VHH backbones.

The tool translates the article's findings, which were validated experimentally by surface plasmon resonance, into an easy-to-use tool that **accepts a VHH sequence as input and informs the user about its potential interaction with SpA**. Furthermore, it reports which residues could have destabilizing effects on the SpA binding interface.
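
The idea behind the check can be illustrated with a minimal sketch: each rule maps a 0-based sequence index to the residues tolerated there, and any other residue at that index is flagged. The helper `flag_positions` and the two-entry `RULES` table are assumptions made for illustration; the two entries are copied from the `tolerated` table in the Results cell and are not the full rule set.

```python
# Illustrative subset of the rule table: 0-based index -> tolerated residues.
# Hypothetical helper; the notebook's own check lives in check_spA_binding.
RULES = {
    18: {"R"},
    58: {"Y"},
}

def flag_positions(seq, rules=RULES):
    """Return (1-based position, residue) pairs that violate a rule."""
    flagged = []
    for index, allowed in sorted(rules.items()):
        # Skip rules that fall outside a short sequence
        if index < len(seq) and seq[index] not in allowed:
            flagged.append((index + 1, seq[index]))
    return flagged

# Toy sequence: R at index 18 (tolerated), F at index 58 (not tolerated)
demo = "A" * 18 + "R" + "A" * 39 + "F" + "A" * 25
print(flag_positions(demo))
```

Positions are reported 1-based, matching the messages shown in the Results cell.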

## *Usage*

Enter a VHH amino acid sequence and a name for it in the input form. Then, on the toolbar, click **Runtime > Run all** or press Ctrl + F9. The results of the analysis appear below the form. To analyze another sequence, enter the new name and sequence and run all cells again.
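
The input cell rejects sequences that contain anything other than letters, so a sequence pasted with spaces, line breaks, position numbers, or a FASTA header needs tidying first. A minimal sketch of such a cleanup is shown here; `clean_sequence` is a hypothetical helper and is not part of the notebook.

```python
def clean_sequence(raw):
    """Drop FASTA header lines, whitespace and digits; return upper-case letters."""
    # Header lines in FASTA format start with '>'
    lines = [line for line in raw.splitlines() if not line.startswith(">")]
    joined = "".join(lines)
    # Keep letters only, then normalize to upper case
    return "".join(ch for ch in joined if ch.isalpha()).upper()

raw = ">my_vhh\nqvqlv esggg\nLVQAG 10\n"
print(clean_sequence(raw))
```

The cleaned string can then be pasted into the `query_sequence` field.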


## *Limitations*

The tool is based solely on the article mentioned above, which does not investigate every possible amino acid substitution experimentally. Ideally, the generated results should be validated *in vitro* and/or *in silico*. The paper reports that no adverse effects on expression yield or target-binding affinity were observed when the SpA-binding-enhancing substitutions were introduced, although this is based on only 5 different VHHs against 3 different targets. Results may differ for other substitutions and for recombinant expression hosts that were not investigated in the paper. Additionally, several recombinant SpA proteins on the market may differ in their exact sequence, which could lead to different structures and altered binding to the VHH.

In [None]:
#@title Input VHH sequence
query_name = '' #@param {type:"string"}
query_sequence = '' #@param {type:"string"}

if not query_name:
  raise ValueError("No sequence name entered. Please provide a name for the sequence.")

if len(query_sequence) < 84:
  raise ValueError(f"The sequence for {query_name} is less than 84 residues. Please enter a full-length sequence.")
elif not query_sequence.isalpha():
  raise ValueError(f"The sequence for {query_name} contains invalid characters. Only letters are allowed.")







In [None]:
#@title Results
from IPython.display import display, HTML

# Global variables to hold the indices
red_idx = []
orange_idx = []

def check_spA_binding(seq_name, seq):

    # Define the indices (assuming 0-based indexing) and expected residues
    tolerated = {
        14: ['G', 'D'],
        16: ['S', 'A'],
        18: ['R'],
        56: ['K', 'R', 'T'],
        58: ['Y'],
        63: ['K', 'E'],
        64: ['G'],
        65: ['R'],
        67: ['T', 'A'],
        69: ['S'],
        74: ['A', 'E', 'K', 'Q', 'R'],
        80: ['Q'],

    }
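    # True: no residue other than those listed in `tolerated` is accepted at this index.
    # A list names the specific residues that are not tolerated (an empty list flags nothing).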
    not_tolerated = {
        14: True,
        16: [],
        18: True,
        56: True,
        58: True,
        63: [],
        64: True,
        65: True,
        67: ['S'],
        69: ['P'],
        74: ['P'],
        80: [],

    }

    not_tolerated_found = False
    # Initialize HTML output
    html_output = ""

    # Check the binding criteria and provide feedback
    for position in range(len(seq)):
        if position in not_tolerated:
            not_allowed = not_tolerated[position]
            # Check if this position is a boolean indicating 'All others' are not tolerated
            if isinstance(not_allowed, bool) and not_allowed:
                # If 'All others' are not tolerated, only the residues in tolerated are allowed
                if seq[position] not in tolerated.get(position, []):
                    allowed = ', '.join(tolerated[position])
                    html_output += f"<div style='color: darkorange;font-size: 20px;'><strong>⚠️ {position+1} ({seq[position]}) might reduce affinity to spA.</strong> Consider reverting to a tolerated residue: {allowed}</div>"
                    not_tolerated_found = True
                    orange_idx.append(position)
            # Otherwise, it's a list of specific not tolerated residues
            elif seq[position] in not_allowed:
                allowed = ', '.join(tolerated.get(position, []))
                html_output += f"<div style='color: red;font-size: 20px;'><strong>❗   {position+1} ({seq[position]}) is not a tolerated residue.</strong> Revert to a tolerated residue at this position: {allowed}</div>"
                not_tolerated_found = True
                red_idx.append(position)


    # Rules for 82a 82b positions in FR3 region
    if seq[81] in ['D', 'S', 'A'] or seq[82] =='D':
        html_output += "<br><div style='color: darkorange; font-size: 20px;'>" \
                      "<strong>⚠️ Examine your sequence's Kabat numbering for positions 82a and 82b. </strong>" \
                      "Position 82a should be N and 82b should be S or N residue. " \
                      "Check the Kabat numbering of your sequence with " \
                      "<a href='https://www.novoprolabs.com/tools/ab-numbering' target='_blank'>NovoPro Labs</a>.</div>"

    # No proline in FR3 positions 71-80 (0-based indices 70-79)
    if 'P' in seq[70:80]:
        # Index 74 is skipped here because the not_tolerated table above already marks P at that index in red
        orange_idx.extend(index for index, residue in enumerate(seq[70:80], start=70) if residue == 'P' and index != 74)
        html_output += "<div style='color: lightcoral;font-size: 20px;'><strong>⚠️ No Proline residue should be present between residues 71-80.</strong> Revert to the nearest human IGHV3 germline residue.</div>"
        not_tolerated_found = True

    if not not_tolerated_found:
        html_output += f"<div style='color: green;font-size: 20px;'><strong>✅ All residues in ' {seq_name} ' suggest binding to spA.</strong></div>"

    # Display the HTML output
    display(HTML(html_output))


check_spA_binding(query_name, query_sequence)
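
The slice arithmetic behind the proline rule is easy to get wrong, so here is a minimal sketch of how a 1-based position window maps onto a 0-based Python slice. `proline_positions` is a hypothetical helper written for illustration and is not part of the notebook; like the cell above, it assumes that the raw sequence index matches the residue numbering.

```python
def proline_positions(seq, first=71, last=80):
    """Return 1-based positions of proline within [first, last], inclusive."""
    # Positions first..last correspond to the 0-based slice [first - 1:last]
    window = seq[first - 1:last]
    return [pos for pos, residue in enumerate(window, start=first) if residue == "P"]

# Toy sequence with a single proline at 0-based index 72 (position 73)
demo = "A" * 72 + "P" + "A" * 27
print(proline_positions(demo))
```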


In [None]:
#@title Sequence display



def display_sequence(seq):

    # Split the sequence into chunks of 10 residues
    chunks = [seq[i:i+10] for i in range(0, len(seq), 10)]

    # Create the HTML representation
    html_output = "<pre style='font-family: monospace; font-size: 18px;'>"
    html_output += f"{query_name} sequence analysis\n"
    for i, chunk in enumerate(chunks):
        # Insert a blank line after every 10 chunks (100 residues)
        if i % 10 == 0 and i != 0:
            html_output += "\n"
        index_label = str(i * 10 + 1).ljust(10)  # Left-justified to align with the sequence
        # Add the chunk to the output with highlights
        chunk_output = ""
        for j, residue in enumerate(chunk):
            residue_index = i * 10 + j  # Calculate the global index of the residue
            # Check if the residue index is in red_idx or orange_idx
            if residue_index in red_idx:
                chunk_output += f"<span style='background-color: red;'>{residue}</span>"
            elif residue_index in orange_idx:
                chunk_output += f"<span style='background-color: darkorange;'>{residue}</span>"
            else:
                chunk_output += residue

        html_output += index_label + chunk_output + "\n"

    html_output += "</pre>"

    # Display the HTML output
    display(HTML(html_output))


#Call the function
display_sequence(query_sequence)
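
The layout used by the display can be sketched separately from the HTML: the sequence is cut into blocks of 10 residues and each block is labeled with the 1-based position of its first residue. `chunk_with_labels` is a hypothetical helper for illustration only and is not part of the notebook.

```python
def chunk_with_labels(seq, size=10):
    """Split seq into blocks of `size` residues, each paired with its 1-based start position."""
    return [(start + 1, seq[start:start + size]) for start in range(0, len(seq), size)]

for label, block in chunk_with_labels("QVQLVESGGGLVQAG"):
    # Left-justify the label so the blocks line up in a monospace font
    print(str(label).ljust(10) + block)
```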