<a href="https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2Byte_Biophysical_features_applied_on_proteins_3D_structure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## The Bio2Byte toolkit to analyze AlphaFold results

This Jupyter Notebook on Google Colab aims to simplify the analysis of the three-dimensional structure predicted by AlphaFold for your sequence of interest.

### About Bio2Byte

<img src="https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n.jpg" alt="b2b" width="48"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Vrije_Universiteit_Brussel_logo.svg/1200px-Vrije_Universiteit_Brussel_logo.svg.png" alt="vub" width="120"/>

Proteins are the molecular machines that make cells work. They perform a wide variety of functions through interactions with each other and many additional molecules. Traditionally, proteins are described in a single static state (a picture). It is now increasingly recognised that many proteins can adopt multiple states and move between these conformational states dynamically (a movie).

We investigate how the dynamics, conformational states and available experimental data of proteins relate to their amino acid sequence. Underlying physical and chemical principles are computationally unravelled through data integration, analysis and machine learning, connecting them to biological events and improving our understanding of the way proteins work.

Online predictors are available at https://bio2byte.be/b2btools/

#### Tools and predictors

* **DynaMine** is a fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure propensities using the same principle. This tool returns the following predictions:
  * DynaMine backbone dynamics
  * DynaMine side chain dynamics
  * DynaMine conformational propensities (polyproline II)
  * DynaMine conformational propensities (coil)
  * DynaMine conformational propensities (sheet)
  * DynaMine conformational propensities (helix)

* **EFoldMine** predicts, from the primary amino acid sequence of a protein, which amino acids are likely to be involved in early folding events. This tool returns the following predictions:
  * EFoldMine earlyFolding propensity

* **DisoMine** predicts protein disorder with recurrent neural networks, not directly from the amino acid sequence but from more generic predictions of key biophysical properties (here protein dynamics, secondary structure and early folding). This tool returns the following predictions:
  * Disomine disorder

* **AgMata** is a single-sequence based predictor of protein regions that are likely to cause beta-aggregation. This tool returns the following predictions:
  * Agmata aggregation propensity
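
As a quick reference, the predictors listed here can be summarised as a mapping from tool name to the result keys that the code cells of this notebook read later on. The sketch below is illustrative only: the lowercase tool names and the result keys are the ones used in this notebook's code, while `PREDICTIONS_BY_TOOL` and `tools_required_for` are hypothetical names that are not part of the `b2bTools` package. It also does not model any dependency between predictors.

```python
# Illustrative mapping (an assumption of this sketch, not a b2bTools API):
# tool name -> result keys used later in this notebook.
PREDICTIONS_BY_TOOL = {
    "dynamine": ["backbone", "sidechain", "ppII", "coil", "sheet", "helix"],
    "efoldmine": ["earlyFolding"],
    "disomine": ["disoMine"],
    "agmata": ["agmata"],
}


def tools_required_for(result_keys):
    """Return the sorted tool names that produce the given result keys."""
    required = set()
    for tool, keys in PREDICTIONS_BY_TOOL.items():
        if any(key in keys for key in result_keys):
            required.add(tool)
    return sorted(required)


print(tools_required_for(["backbone", "disoMine"]))
```

A lookup like this can help when only a subset of the biophysical features is of interest.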

# Environment setup

**Note:** Please be patient, this section might take several minutes to download and install the dependencies in the Jupyter context...

In [None]:
#@markdown ## Install extra Python packages
%%capture

!pip install b2bTools==3.0.5 py3Dmol biopython

In [None]:
#@markdown ## Import required libraries and define constants

%%capture
from google.colab import files
import py3Dmol
import matplotlib.pyplot as plt
import pickle
import numpy as np
import pandas as pd
from math import floor

from IPython.display import display
import ipywidgets as widgets
from ipywidgets import GridspecLayout
from ipywidgets import Output

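from os import path  # used by build_pdb_by_prediction

from b2bTools import SingleSeq
from Bio import SeqIO
from Bio.PDB import PDBIO, PDBParser  # explicit names instead of a wildcard import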

from IPython.utils import io
import tqdm.notebook
import subprocess

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

VALUES_CONSTANTS = {
    # "pLDDT": {
    #     "colors": [
    #         (0, 50, "#FF7D45"),
    #         (50, 70, "#FFDB13"),
    #         (70, 90, "#65CBF3"),
    #         (90, 100, "#0053D6"),
    #     ],
    #     "legends": [
    #         "Very low (pLDDT < 50)",
    #         "Low (50 < pLDDT < 70)",
    #         "Confident (70 < pLDDT < 90)",
    #         "Very high (pLDDT > 90)",
    #     ],
    #     "title": "AlphaFold's Model Confidence (pLDDT)",
    # },
    "backbone": {
        "colors": [
            (0, 0.69, "#FF7D45"),
            (0.69, 0.80, "#FFDB13"),
            (0.80, 1.00, "#65CBF3"),
            (1.00, 2.00, "#0053D6"),
        ],
        "legends": [
            "Flexible region (value < 0.69)",
            "Context dependent region (0.69 < value < 0.80)",
            "Rigid region (0.80 < value < 1.00)",
            "Membrane spanning region (value > 1.00)",
        ],
        "title": "DynaMine backbone dynamics",
    },
    "sidechain": {
        "colors": [
            (-1.2, -0.6, "#FF7D45"),  # lower bound assumed, mirroring the top band
            (-0.6, 0.0, "#FFDB13"),
            (0.0, 0.6, "#65CBF3"),
            (0.6, 1.2, "#0053D6"),
        ],
        "legends": [
            "Lower values",
            "Low values",
            "High values",
            "Higher values",
        ],
        "title": "DynaMine sidechain dynamics",
    },
    "ppII": {
        "colors": [   
            (0.00, 0.25, "#ff0100"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (polyproline II)",
    },
    "coil": {
        "colors": [   
            (0.00, 0.25, "#ff0100"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (coil)",
    },
    "sheet": {
        "colors": [   
            (0.00, 0.25, "#ff0100"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (sheet)",
    },
    "helix": {
        "colors": [   
            (0.00, 0.25, "#ff0100"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (helix)",
        "subtitle": "DynaMine is a fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure predictors using the same principle.",
    },
    "earlyFolding": {
        "colors": [
            (0, 0.169, "#FF7D45"),
            (0.169, 2, "#0053D6"),
        ],
        "legends": [
            "Not likely to start the protein folding process (value < 0.169)",
            "Likely to start the protein folding process (value > 0.169)",
        ],
        "title": "EFoldMine earlyFolding propensity",
    },
    "disoMine": {
        "colors": [
            (0, 0.50, "#FF7D45"),
            (0.50, 1.00, "#0053D6"),
        ],
        "legends": [
            "Not likely a disordered residue (value < 0.50)",
            "Likely a disordered residue (value > 0.50)",
        ],
        "title": "Disomine disorder",
    },
    "agmata": {
        "colors": [
            (0.00, 0.25, "#ff0100"),
            (0.50, 0.75, "#ffff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Not likely to be involved in beta-sheet aggregation (Trough)",
            "Likely to be involved in beta-sheet aggregation (Mid Peak)",
            "Most likely to be involved in beta-sheet aggregation (Higher Peak)",
        ],
        "title": "Agmata aggregation propensity",
    },
}

B2B_TOOLS = ["dynamine", "disomine", "efoldmine", "agmata"]

B2B_TOOLS_RESULTS = np.array(
    [
        ["backbone", "sidechain", "ppII"],
        ["coil", "sheet", "helix"],
        ["earlyFolding", "disoMine", "agmata"],
    ]
)

# PLDDT_COLOR_MAP = {i: bands[2] for i, bands in enumerate(VALUES_CONSTANTS["pLDDT"]["colors"])}
# PLDDT_DEFAULT_ATOM_STYLE = {"cartoon": {"colorscheme": {"prop": "b", "map": PLDDT_COLOR_MAP}}}

In [None]:
#@markdown ## Internal source code
#@markdown Please execute this cell by pressing the _Play_ button 

class Atom(dict):
    def __init__(self, line):
        self["type"] = line[0:6].strip()
        self["idx"] = line[6:11].strip()
        self["name"] = line[12:16].strip()
        self["resname"] = line[17:20].strip()
        self["resid"] = int(line[22:26])
        self["x"] = float(line[30:38])
        self["y"] = float(line[38:46])
        self["z"] = float(line[46:54])
        self["sym"] = line[76:78].strip()

    def __str__(self):
        line = list(" " * 80)

        line[0:6] = self["type"].ljust(6)
        line[6:11] = self["idx"].ljust(5)
        line[12:16] = self["name"].ljust(4)
        line[17:20] = self["resname"].ljust(3)
        line[22:26] = str(self["resid"]).ljust(4)
        line[30:38] = str(self["x"]).rjust(8)
        line[38:46] = str(self["y"]).rjust(8)
        line[46:54] = str(self["z"]).rjust(8)
        line[76:78] = self["sym"].rjust(2)
        return "".join(line) + "\n"


class Molecule(list):
    def __init__(self, file):
        for line in file:
            # Only coordinate records; other lines that merely mention ATOM are skipped
            if line.startswith(("ATOM", "HETATM")):
                self.append(Atom(line))

    def __str__(self):
        outstr = ""
        for at in self:
            outstr += str(at)

        return outstr


def check_min_max(sequence_df, former_min, former_max):
    """Widen the (former_min, former_max) range so it covers the data plus a 0.1 margin."""
    seq_max = max(sequence_df)
    seq_min = min(sequence_df)

    # Ignore NaN and infinite extremes so they cannot distort the axis limits
    if np.isfinite(seq_max) and seq_max + 0.1 > former_max:
        former_max = seq_max + 0.1
    if np.isfinite(seq_min) and seq_min - 0.1 < former_min:
        former_min = seq_min - 0.1
    return former_min, former_max


def plot_prediction(prediction_name, highlighting_regions, sequence_df):
    thresholds_dict = {'backbone': {'membrane spanning': [1., 1.5],
                                    'rigid': [0.8, 1.],
                                    'context-dependent': [0.69, 0.8],
                                    'flexible': [-1.0, 0.69]},
                       'earlyFolding': {'early folds': [0.169, 2.],
                                        'late folds': [-1., 0.169]},
                       'disoMine': {'ordered': [-1., 0.5],
                                    'disordered': [0.5, 2.]},
                       }
    ordered_regions_dict = {'backbone': ['flexible',
                                         'context-dependent',
                                         'rigid',
                                         'membrane spanning'],
                            'earlyFolding': ['late folds', 'early folds'],
                            'disoMine': ['ordered', 'disordered'],
                            }
    colors = ['yellow', 'orange', 'pink', 'red']
    ranges_dict = {
        'backbone': [-0.2, 1.2],
        'sidechain': [-0.2, 1.2],
        'ppII': [-0.2, 1.2],
        'earlyFolding': [-0.2, 1.2],
        'disoMine': [-0.2, 1.2],
        'agmata': [-0.2, 1.2],
        'helix': [-1., 1.],
        'sheet': [-1., 1.],
        'coil': [-1., 1.],
    }
    fig, ax = plt.subplots(1, 1)
    fig.set_figwidth(10)
    fig.set_figheight(5)
    ax.set_title(prediction_name + ' ' + 'prediction')
    min_value, max_value = ranges_dict[prediction_name]

    predictions = sequence_df[prediction_name]
    min_value, max_value = check_min_max(predictions, min_value, max_value)
    
    ax.plot(range(len(predictions)), predictions, label=selected_sequence)
    ax.set_xlim([0, len(predictions) - 1])
    
    legend_lines = plt.legend(
        bbox_to_anchor=(
            1.04,
            1),
        loc="upper left",
        fancybox=True,
        shadow=True)
    ax.add_artist(legend_lines)
    
    # Define regions
    if highlighting_regions:
        if prediction_name in ordered_regions_dict.keys():
            for i, prediction in enumerate(
                    ordered_regions_dict[prediction_name]):
                lower = thresholds_dict[prediction_name][prediction][0]
                upper = thresholds_dict[prediction_name][prediction][1]
                color = colors[i]
                ax.axhspan(
                    lower,
                    upper,
                    alpha=0.3,
                    color=color,
                    label=prediction)
            # Order the region labels from top to bottom for the legend
            included_in_regions_legend = list(
                reversed(ordered_regions_dict[prediction_name]))
            # Map each label to its plotted handle
            handles, labels = plt.gca().get_legend_handles_labels()
            handles_dict = dict(zip(labels, handles))
            # Add legend for regions, if available
            region_handles = [handles_dict[r] for r in included_in_regions_legend]
            region_legend = ax.legend(region_handles,
                                      included_in_regions_legend,
                                      fancybox=True,
                                      shadow=True,
                                      loc='lower left',
                                      bbox_to_anchor=(1.04, 0))
            ax.add_artist(region_legend)
    
    ax.set_ylim([min_value, max_value])
    ax.set_xlabel('residue index')
    ax.set_ylabel('prediction values')
    ax.grid(axis='y')
    plt.show()

def get_value_color(result_key, value, min_value=0, max_value=1):
    # Propensity-like results are coloured on a continuous red-to-green spectrum
    if result_key in ("ppII", "coil", "sheet", "helix", "agmata"):
        return get_spectrum_color(value, min_value, max_value)

    # Other results use the discrete bands defined in VALUES_CONSTANTS;
    # a value outside every band falls through and returns None
    for band_floor, band_ceiling, band_color in VALUES_CONSTANTS[result_key]["colors"]:
        if band_floor < value <= band_ceiling:
            return band_color

def get_spectrum_color(real, min_value, max_value):
    color_r = 0
    color_g = 0
    color_b = 0

    value = real
    base = max_value - min_value
    if (value < min_value):
        value = min_value 
    elif value > max_value:
        value = max_value

    if base == 0:
        value = 100
    else:
        value = (value - min_value) / base * 100
    
    if value < 50:
        color_r = 255
        off_g = 5.1 * value
        color_g = max(1, off_g)
    else:
        color_g = 255
        off_r = 510 - 5.1 * value
        color_r = max(1, off_r)
  
    return '#%02x%02x%02x' % (int(color_r), int(color_g), int(color_b))

def create_py3dmol_view(molecule, color_key):
    view = py3Dmol.view(width=400, height=300)
    view.setBackgroundColor("white")
    view.addModelsAsFrames(str(molecule))

    default_color_style = {"cartoon": {"color": "black"}}
    for atom_index, atom in enumerate(molecule):
        view.setStyle(
            {"model": -1, "serial": atom_index + 1}, 
            atom.get(color_key, default_color_style)
        )

    view.zoomTo()
    return view

def plot_result_legend(result_key):
    """Plots the legend for a predicted result."""

    value_constants = VALUES_CONSTANTS[result_key]
    colors = [x[2] for x in value_constants["colors"]]

    plt.figure(figsize=(2, 2))
    for c in colors:
        plt.bar(0, 0, color=c)

    plt.legend(value_constants["legends"], frameon=False, loc="center", fontsize=14)
    plt.xticks([])
    plt.yticks([])
    ax = plt.gca()
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    # plt.suptitle(value_constants["title"], fontsize=18)
    plt.title(value_constants["title"], fontsize=16, pad=16)

    return plt

def plot_structure(pdb_file, b2b_predictions_results):
    print(pdb_file)
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        b2b_agmata_result = b2b_predictions_results['agmata']
        agmata_min_value = min(b2b_agmata_result)
        agmata_max_value = max(b2b_agmata_result)

        pbar.update(2)

        with open(pdb_file, "r") as ifile:
            molecule = Molecule(ifile)
    
        for atom in molecule:
            res_id = int(atom["resid"]) - 1
            for result_key in VALUES_CONSTANTS.keys():
                if result_key != "pLDDT":
                    atom[result_key] = b2b_predictions_results[result_key][res_id]

                if result_key == "agmata":
                    atom_color = get_value_color(
                        result_key, 
                        atom[result_key], 
                        min_value=agmata_min_value, 
                        max_value=agmata_max_value
                    )
                else:
                    atom_color = get_value_color(result_key, atom[result_key])

                atom["{0}_color".format(result_key)] = {
                    "cartoon": {
                        "color": atom_color
                      }
                }
        
        pbar.update(3)
        grid = GridspecLayout(B2B_TOOLS_RESULTS.size + 1, 2)

        # output_plot = Output()
        # with output_plot:
        #     create_py3dmol_view(molecule, "pLDDT_color").show()
        # grid[0, 0] = output_plot

        # output_legend = Output()
        # with output_legend:
        #     plot_result_legend("pLDDT").show()
        # grid[0, 1] = output_legend
        pbar.update(5)

        predictors_count = B2B_TOOLS_RESULTS.size
        for index, result_key in enumerate(B2B_TOOLS_RESULTS.flatten()):
            output_plot = Output()
            with output_plot:
                create_py3dmol_view(
                    molecule, 
                    "{0}_color".format(result_key)
                ).show()

            grid[index + 1, 0] = output_plot

            output_legend = Output()
            with output_legend:
                plot_result_legend(result_key).show()

            grid[index + 1, 1] = output_legend
            pbar.update(int(90.0 / predictors_count))

        display(grid)

def build_pdb_by_prediction(pdb_file_path, predictions, predictor):
    parser = PDBParser()
    structure = parser.get_structure("A", pdb_file_path)

    # Store the per-residue prediction in the B-factor field of every atom.
    # The residue position within each chain is used as the prediction index.
    for model in structure:
        for chain in model:
            for residue_position, residue_object in enumerate(chain):
                for atom in residue_object:
                    atom.set_bfactor(round(predictions[predictor][residue_position], 3))

    output_path = path.join("/content/", f"{predictor}.pdb")
    pdb_io = PDBIO()  # named pdb_io to avoid shadowing IPython's io module
    pdb_io.set_structure(structure)
    pdb_io.save(output_path)
    return output_path


def download_pdb_by_prediction(pdb_file_path, b2b_predictions_results, predictor):
    filename = build_pdb_by_prediction(pdb_file_path, b2b_predictions_results, predictor)
    files.download(filename)

# Submit your files
Open this section to see the input cells where you can upload the sequence of interest in FASTA format and its structure in PDB format.
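
For orientation, the FASTA upload cell reads every record of the uploaded file and replaces `|` in the sequence identifiers with `_` before writing a cleaned copy. A minimal, dependency-free sketch of that clean-up follows. `normalize_fasta` is a hypothetical helper written for illustration (the notebook itself relies on Biopython's `SeqIO`), and the example record is made up.

```python
def normalize_fasta(text):
    """Parse FASTA text and return a list of (identifier, sequence) pairs.

    The identifier is the first whitespace-delimited token of the header,
    with "|" replaced by "_" (the same substitution the upload cell applies).
    """
    records = []
    current_id, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            header = line[1:].split()
            current_id = header[0].replace("|", "_") if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


# Made-up record for illustration
example = ">sp|X00000|DEMO_SEQ made-up record\nMKTAYIAK\nQRQISFVK\n"
for identifier, sequence in normalize_fasta(example):
    print(f">{identifier}\n{sequence}")
```

Sequence lines that were wrapped in the input are joined, so each record ends up as one header line and one sequence line.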

In [None]:
#@title Upload Sequence file in FASTA format { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button and upload just one FASTA file

from google.colab import files
from os import path
from Bio import SeqIO 
import ipywidgets as widgets

fasta_uploaded = files.upload()
print(fasta_uploaded.keys())

fasta_file_path = ""

for fasta_file_name in fasta_uploaded.keys():
  fasta_file_path = path.join('/content/', fasta_file_name)

  print('Uploaded FASTA file "{name}" with length {length} bytes'.format(
      name=fasta_file_path, length=len(fasta_uploaded[fasta_file_name])))

sequence_keys = []
formatted_fasta_content = ""
for sequence in SeqIO.parse(fasta_file_path, "fasta"):
  sequence_id = sequence.id.replace("|", "_")
  sequence_keys.append(sequence_id)
  # End every record with a newline so consecutive records do not run together
  formatted_fasta_content += f">{sequence_id}\n{sequence.seq}\n"

formatted_fasta_filepath = path.join("/content", "formatted_input.fasta")
with open(formatted_fasta_filepath, "w") as fasta_writer:
    fasta_writer.write(formatted_fasta_content)

sequence_picker = widgets.Dropdown(options=sequence_keys, description="Sequence:")
display(sequence_picker)

In [None]:
#@markdown Run this cell to confirm the sequence selected in the previous cell
selected_sequence = sequence_picker.value
print("Selected sequence:", selected_sequence)

In [None]:
#@title Upload your structure file in PDB format { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button and upload just one PDB file

from google.colab import files
from os import path

pdb_uploaded = files.upload()

for pdb_file_name in pdb_uploaded.keys():
  pdb_file_path = path.join("/content/", pdb_file_name)

  print('Uploaded PDB file "{name}" with length {length} bytes'.format(
      name=pdb_file_path, length=len(pdb_uploaded[pdb_file_name])))

# Processing of the input files

In [None]:
#@markdown ## Step: Predict biophysical features

with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
    with io.capture_output() as captured:
        try:
            single_seq = SingleSeq(formatted_fasta_filepath)
            pbar.update(25)
            single_seq.predict(tools=B2B_TOOLS)
            pbar.update(50)
            all_predictions = single_seq.get_all_predictions()
            print(all_predictions.keys())
            b2b_predictions_results = all_predictions[selected_sequence]
            pbar.update(25)
        except subprocess.CalledProcessError:
            print(captured)
            raise

# Analysis

In [None]:
#@markdown ## Plotting the biophysical predictions
#@markdown Please execute this cell by pressing the _Play_ button. 

#@markdown ### DynaMine backbone dynamics
#@markdown Values above _0.8_ indicate **rigid conformations**, values above _1.0_ **membrane spanning regions**, values below _0.69_ **flexible regions**. 
#@markdown Values between _0.69-0.80_ are **'context' dependent** and capable of being either rigid or flexible.

#@markdown ### DynaMine sidechain dynamics
#@markdown Higher values mean more likely rigid. These values are highly dependent on the amino acid type (e.g. a Trp will be rigid, an Asp flexible).
#@markdown ### DynaMine conformational propensities (sheet, helix, coil, ppII (polyproline II))
#@markdown Higher values indicate higher propensities.
#@markdown ### EFoldMine earlyFolding propensity
#@markdown Values above _0.169_ indicate residues that are **likely to start the protein folding process**, based on only local interactions with other amino acids.
#@markdown ### Disomine disorder
#@markdown Values above _0.5_ indicate that this is **likely a disordered residue**.
#@markdown ### Agmata aggregation propensity
#@markdown These values are divided by a factor of 20 from the original. Peaks indicate residues likely to be involved in beta-sheet aggregation.

residues = b2b_predictions_results['seq']
residues_count = len(residues)
sequence_df = pd.DataFrame(columns=b2b_predictions_results.keys(), index=range(residues_count))
sequence_df.index.name = 'residue_index'
for predictor in b2b_predictions_results.keys():
    sequence_df[predictor] = b2b_predictions_results[predictor]
sequence_df = sequence_df.rename(columns={"seq": "residue"})
sequence_df = sequence_df.round(decimals=3)

for predictor in b2b_predictions_results.keys():
    if predictor != 'seq':
        plot_prediction(prediction_name=predictor,
                        highlighting_regions=True,
                        sequence_df=sequence_df)
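
The thresholds quoted in the cell above can also be applied programmatically, for example to label each residue instead of reading the bands off a plot. The following is a minimal sketch using the backbone cut-offs stated in this notebook (0.69, 0.80 and 1.00). `classify_backbone` is a hypothetical helper, the example values are made up, and assigning a value that lies exactly on a cut-off to the lower band is a choice made here to mirror the comparison in `get_value_color`.

```python
from bisect import bisect_left

# Cut-offs and labels as described in this notebook for DynaMine backbone dynamics
BACKBONE_CUTOFFS = [0.69, 0.80, 1.00]
BACKBONE_LABELS = ["flexible", "context-dependent", "rigid", "membrane spanning"]


def classify_backbone(value):
    """Return the region label for one backbone dynamics value.

    Assumption of this sketch: a value equal to a cut-off goes to the lower band.
    """
    return BACKBONE_LABELS[bisect_left(BACKBONE_CUTOFFS, value)]


example_values = [0.55, 0.75, 0.92, 1.05]  # made-up values
for position, value in enumerate(example_values, start=1):
    print(position, value, classify_backbone(value))
```

The same pattern works for the two-band predictions (EFoldMine at 0.169, DisoMine at 0.5) by swapping in a single cut-off and two labels.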

In [None]:
#@markdown ## Render three-dimensional structures { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button

#@markdown ### DynaMine backbone dynamics
#@markdown Values above _0.8_ indicate **rigid conformations**, values above _1.0_ **membrane spanning regions**, values below _0.69_ **flexible regions**. 
#@markdown Values between _0.69-0.80_ are **'context' dependent** and capable of being either rigid or flexible.

#@markdown ### DynaMine sidechain dynamics
#@markdown Higher values mean more likely rigid. These values are highly dependent on the amino acid type (e.g. a Trp will be rigid, an Asp flexible).
#@markdown ### DynaMine conformational propensities (sheet, helix, coil, ppII (polyproline II))
#@markdown Higher values indicate higher propensities.
#@markdown ### EFoldMine earlyFolding propensity
#@markdown Values above _0.169_ indicate residues that are **likely to start the protein folding process**, based on only local interactions with other amino acids.
#@markdown ### Disomine disorder
#@markdown Values above _0.5_ indicate that this is **likely a disordered residue**.
#@markdown ### Agmata aggregation propensity
#@markdown These values are divided by a factor of 20 from the original. Peaks indicate residues likely to be involved in beta-sheet aggregation.

plot_structure(pdb_file_path, b2b_predictions_results)

In [None]:
#@markdown ## PDB download
#@markdown Select a biophysical feature and press the _Play_ button to download
#@markdown a new PDB file containing the prediction values in the B-factor column

download_pdb_by_feature = "backbone" #@param ['backbone', 'sidechain', 'ppII', 'coil', 'sheet', 'helix', 'earlyFolding', 'agmata', 'disoMine']
download_pdb_by_prediction(pdb_file_path, b2b_predictions_results, download_pdb_by_feature)
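
To illustrate what storing a prediction in the B-factor column means at the file level: in the fixed-width PDB format, the temperature factor occupies columns 61-66 of each `ATOM`/`HETATM` record. The sketch below rewrites that field on a single record with plain string slicing. It is an illustration only (the notebook itself uses Biopython's `set_bfactor`). `set_bfactor_on_line` is a hypothetical helper, and the record is a made-up example assembled from its fields so that the column layout is explicit.

```python
def set_bfactor_on_line(pdb_line, value):
    """Return the ATOM/HETATM record with columns 61-66 (B-factor) set to value."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line  # leave other record types untouched
    padded = pdb_line.rstrip("\n").ljust(80)
    # Columns are 1-based in the PDB format, so 61-66 is the slice [60:66]
    return padded[:60] + f"{value:6.2f}" + padded[66:]


# Made-up record: record name, serial, atom name, residue name, chain, residue
# number, x, y, z, occupancy and B-factor, each in its fixed-width field
record = "{:<6}{:>5} {:<4} {:>3} {}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}".format(
    "ATOM", 1, " CA", "MET", "A", 1, 11.104, 6.134, -6.504, 1.00, 50.00
)

updated = set_bfactor_on_line(record, 0.873)
print(record)
print(updated)
```

Because the field holds two decimals in this sketch, a prediction such as 0.873 is written as 0.87. Molecular viewers that colour by B-factor can then display the prediction directly on the structure.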

# Appendix

## Conclusion and notes

**Important note:** If you need to run this analysis again, please restart the runtime from the menu "Runtime > Restart Runtime".

Download the Bio2Byte tools package from PyPI: https://pypi.org/project/b2bTools/.

### Citations

DynaMine:
> Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken. From protein sequence to dynamics and disorder with DynaMine. Nature Communications 4:2741 (2013).
doi: 10.1038/ncomms3741

EFoldMine:
> Raimondi, D., Orlando, G., Pancsa, R. et al. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 7, 8826 (2017).
doi: https://doi.org/10.1038/s41598-017-08366-3

Disomine:
> Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken bioRxiv 2020.05.25.115253;
doi: https://doi.org/10.1101/2020.05.25.115253

AgMata:
> O.G, S.A, MR.S, R.D, V.W.; Accurate prediction of protein beta-aggregation with generalized statistical potentials. doi: 10.1093/bioinformatics/btz912.

Regarding AlphaFold:
```
@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)},
}
```