<a href="https://colab.research.google.com/github/Bio2Byte/public_notebooks/blob/main/Bio2Byte_AlphaFold_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Bio2Byte Toolkit to analyze AlphaFold results

This Jupyter Notebook on Google Colab simplifies the analysis of the three-dimensional structure that AlphaFold predicts for your sequence of interest.

## About Bio2Byte

<img src="https://pbs.twimg.com/profile_images/1247824923546079232/B9b_Yg7n.jpg" alt="b2b" width="48"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Vrije_Universiteit_Brussel_logo.svg/1200px-Vrije_Universiteit_Brussel_logo.svg.png" alt="vub" width="120"/>

Proteins are the molecular machines that make cells work. They perform a wide variety of functions through interactions with each other and many additional molecules. Traditionally, proteins are described in a single static state (a picture). It is now increasingly recognised that many proteins can adopt multiple states and move between these conformational states dynamically (a movie).

We investigate how the dynamics, conformational states and available experimental data of proteins relate to their amino acid sequence. We computationally unravel the underlying physical and chemical principles through data integration, analysis and machine learning, connecting them to biological events and improving our understanding of how proteins work.

Online predictors are available at https://bio2byte.be/b2btools/

### Tools and predictors

* **DynaMine** is a fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure propensities using the same principle. This tool returns the following predictions:
  * DynaMine backbone dynamics
  * DynaMine side chain dynamics
  * DynaMine conformational propensities (polyproline II)
  * DynaMine conformational propensities (coil)
  * DynaMine conformational propensities (sheet)
  * DynaMine conformational propensities (helix)

* **EFoldMine** predicts, from the primary amino acid sequence of a protein, which amino acids are likely to be involved in early folding events. This tool returns the following predictions:
  * EFoldMine earlyFolding propensity

* **DisoMine** predicts protein disorder with recurrent neural networks, not directly from the amino acid sequence but from more generic predictions of key biophysical properties (here protein dynamics, secondary structure and early folding). This tool returns the following predictions:
  * Disomine disorder

* **AgMata** is a single-sequence based predictor of protein regions that are likely to cause beta-aggregation. This tool returns the following predictions:
  * Agmata aggregation propensity
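
As a rough illustration of how the per-residue values returned by these tools can be read, the sketch below bins DynaMine backbone values with the thresholds this notebook uses later for coloring (0.69, 0.80 and 1.00). The function name `classify_backbone` and the example values are hypothetical and are not part of b2bTools.

```python
def classify_backbone(value):
    """Return a coarse label for one DynaMine backbone dynamics value."""
    if value <= 0.69:
        return "flexible"
    if value <= 0.80:
        return "context dependent"
    if value <= 1.00:
        return "rigid"
    return "membrane spanning"


# Hypothetical per-residue values, for illustration only.
example_values = [0.55, 0.72, 0.91, 1.05]
labels = [classify_backbone(v) for v in example_values]
print(labels)
```

The same pattern applies to the other predictors once their thresholds are swapped in.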

## Requirements
This section contains the commands that install the required libraries from PyPI and the import statements that make them available to the toolkit.
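
Before running the rest of the notebook it can help to confirm that the installation succeeded. The following is a minimal sketch that only uses the standard library. The helper name `missing_modules` is hypothetical, and the list of import names is an assumption based on the imports used in this notebook.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names used by this notebook; the biopython package is imported as "Bio".
required = ["py3Dmol", "b2bTools", "Bio"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module can be imported. Otherwise, rerun the installation cell.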

In [None]:
#@title Install libraries from PyPi

#@markdown Please execute this cell by pressing the _Play_ button 

import subprocess

from IPython.utils import io
import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

try:
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        with io.capture_output() as captured:
            # Reinstall each dependency so the runtime starts from a fresh copy
            !pip uninstall --yes py3Dmol
            pbar.update(15)
            !pip install py3Dmol
            pbar.update(18)
            !pip uninstall --yes b2bTools
            pbar.update(15)
            !pip install b2bTools
            pbar.update(18)
            !pip uninstall --yes biopython
            pbar.update(15)
            !pip install biopython
            pbar.update(19)
except subprocess.CalledProcessError:
    print(captured)
    raise

In [None]:
#@title Import the required libraries

#@markdown Please execute this cell by pressing the _Play_ button 

import py3Dmol
import matplotlib.pyplot as plt
import pickle
import numpy as np
from math import floor

from IPython.display import display
import ipywidgets as widgets
from ipywidgets import GridspecLayout
from ipywidgets import Output

from b2bTools import SingleSeq
from Bio import SeqIO 
from Bio.PDB import MMCIF2Dict

## Source code
This section defines all the functions that the analysis section invokes later.
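
One of the helpers defined in this section reads PDB `ATOM`/`HETATM` records by fixed column positions. The sketch below shows that idea in isolation. The function name `parse_atom_line` and the example record are made up for illustration, and only a subset of the fields is extracted.

```python
def parse_atom_line(line):
    """Extract a few fields from one PDB ATOM/HETATM record by column position."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "resid": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Made-up record, assembled piece by piece so the column widths stay visible.
example = "".join([
    "ATOM  ",    # columns 1-6: record type
    "    1",     # columns 7-11: atom serial number
    " ",         # column 12: unused
    " N  ",      # columns 13-16: atom name
    " ",         # column 17: alternate location
    "MET",       # columns 18-20: residue name
    " A",        # columns 21-22: blank + chain identifier
    "   1",      # columns 23-26: residue sequence number
    "    ",      # columns 27-30: insertion code + padding
    "  27.340",  # columns 31-38: x
    "  24.430",  # columns 39-46: y
    "   2.614",  # columns 47-54: z
])

atom = parse_atom_line(example)
print(atom)
```

Because the format is column based, splitting on whitespace is not reliable: neighbouring fields can touch when the values are wide.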

In [None]:
#@title Define the file variables

#@markdown Please execute this cell by pressing the _Play_ button 

fasta_file_path = ''
pdb_file_path = ''
pickle_file_path = ''
cif_file_path = ''

In [None]:
#@title Define the constants

#@markdown Please execute this cell by pressing the _Play_ button 

VALUES_CONSTANTS = {
    "pLDDT": {
        "colors": [
            (0, 50, "#FF7D45"),
            (50, 70, "#FFDB13"),
            (70, 90, "#65CBF3"),
            (90, 100, "#0053D6"),
        ],
        "legends": [
            "Very low (pLDDT < 50)",
            "Low (50 < pLDDT < 70)",
            "Confident (70 < pLDDT < 90)",
            "Very high (pLDDT > 90)",
        ],
        "title": "AlphaFold's Model Confidence (pLDDT)",
    },
    "backbone": {
        "colors": [
            (0, 0.69, "#FF7D45"),
            (0.69, 0.80, "#FFDB13"),
            (0.80, 1.00, "#65CBF3"),
            (1.00, 2.00, "#0053D6"),
        ],
        "legends": [
            "Flexible region (value < 0.69)",
            "Context dependent region (0.69 < value < 0.80)",
            "Rigid region (0.80 < value < 1.00)",
            "Membrane spanning region (value > 1.00)",
        ],
        "title": "DynaMine backbone dynamics",
    },
    "sidechain": {
        "colors": [
            (0, 0.69, "#FF7D45"),
            (0.69, 0.80, "#FFDB13"),
            (0.80, 1.00, "#65CBF3"),
            (1.00, 2.00, "#0053D6"),
        ],
        "legends": [
            "Flexible region (value < 0.69)",
            "Context dependent region (0.69 < value < 0.80)",
            "Rigid region (0.80 < value < 1.00)",
            "Membrane spanning region (value > 1.00)",
        ],
        "title": "DynaMine sidechain dynamics",
    },
    "ppII": {
        # One color per legend entry, sampled from get_spectrum_color
        "colors": [
            (0.00, 0.00, "#ff0100"),
            (0.00, 0.25, "#ff7f00"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Lowest propensity (value = 0.00)",
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (polyproline II)",
    },
    "coil": {
        # One color per legend entry, sampled from get_spectrum_color
        "colors": [
            (0.00, 0.00, "#ff0100"),
            (0.00, 0.25, "#ff7f00"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Lowest propensity (value = 0.00)",
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (coil)",
    },
    "sheet": {
        # One color per legend entry, sampled from get_spectrum_color
        "colors": [
            (0.00, 0.00, "#ff0100"),
            (0.00, 0.25, "#ff7f00"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Lowest propensity (value = 0.00)",
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (sheet)",
    },
    "helix": {
        # One color per legend entry, sampled from get_spectrum_color
        "colors": [
            (0.00, 0.00, "#ff0100"),
            (0.00, 0.25, "#ff7f00"),
            (0.25, 0.50, "#ffff00"),
            (0.50, 0.75, "#7fff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Lowest propensity (value = 0.00)",
            "Low propensity (0.00 < value < 0.25)",
            "Mid propensity (0.25 < value < 0.50)",
            "High propensity (0.50 < value < 0.75)",
            "Very high propensity (0.75 < value < 1.00)",
        ],
        "title": "DynaMine conformational propensities (helix)",
        "subtitle": "DynaMine is a fast predictor of protein backbone dynamics using only sequence information as input. The version here also predicts side-chain dynamics and secondary structure predictors using the same principle.",
    },
    "earlyFolding": {
        "colors": [
            (0, 0.169, "#FF7D45"),
            (0.169, 2, "#0053D6"),
        ],
        "legends": [
            "Not likely to start the protein folding process (value < 0.169)",
            "Likely to start the protein folding process (value > 0.169)",
        ],
        "title": "EFoldMine earlyFolding propensity",
    },
    "disoMine": {
        "colors": [
            (0, 0.50, "#FF7D45"),
            (0.50, 1.00, "#0053D6"),
        ],
        "legends": [
            "Not likely a disordered residue (value < 0.50)",
            "Likely a disordered residue (value > 0.50)",
        ],
        "title": "Disomine disorder",
    },
    "agmata": {
        "colors": [
            (0.00, 0.25, "#ff0100"),
            (0.50, 0.75, "#ffff00"),
            (0.75, 1.00, "#01ff00"),
        ],
        "legends": [
            "Not likely to be involved in beta-sheet aggregation (Trough)",
            "Likely to be involved in beta-sheet aggregation (Mid Peak)",
            "Most likely to be involved in beta-sheet aggregation (Higher Peak)",
        ],
        "title": "Agmata aggregation propensity",
    },
}

B2B_TOOLS = ["dynamine", "disomine", "efoldmine", "agmata"]

B2B_TOOLS_RESULTS = np.array(
    [
        ["backbone", "sidechain", "ppII"],
        ["coil", "sheet", "helix"],
        ["earlyFolding", "disoMine", "agmata"],
    ]
)

PLDDT_COLOR_MAP = {i: bands[2] for i, bands in enumerate(VALUES_CONSTANTS["pLDDT"]["colors"])}
PLDDT_DEFAULT_ATOM_STYLE = {"cartoon": {"colorscheme": {"prop": "b", "map": PLDDT_COLOR_MAP}}}

In [None]:
#@title Define the classes

#@markdown Please execute this cell by pressing the _Play_ button 

class Atom(dict):
    def __init__(self, line):
        self["type"] = line[0:6].strip()
        self["idx"] = line[6:11].strip()
        self["name"] = line[12:16].strip()
        self["resname"] = line[17:20].strip()
        self["resid"] = int(line[22:26])
        self["x"] = float(line[30:38])
        self["y"] = float(line[38:46])
        self["z"] = float(line[46:54])
        self["sym"] = line[76:78].strip()

    def __str__(self):
        line = list(" " * 80)

        line[0:6] = self["type"].ljust(6)
        line[6:11] = self["idx"].ljust(5)
        line[12:16] = self["name"].ljust(4)
        line[17:20] = self["resname"].ljust(3)
        line[22:26] = str(self["resid"]).ljust(4)
        line[30:38] = str(self["x"]).rjust(8)
        line[38:46] = str(self["y"]).rjust(8)
        line[46:54] = str(self["z"]).rjust(8)
        line[76:78] = self["sym"].rjust(2)
        return "".join(line) + "\n"


class Molecule(list):
    def __init__(self, file):
        for line in file:
            # Match on the record name only, so REMARK lines that mention ATOM are skipped
            if line.startswith(("ATOM", "HETATM")):
                self.append(Atom(line))

    def __str__(self):
        outstr = ""
        for at in self:
            outstr += str(at)

        return outstr

In [None]:
#@title Declare the methods required to analyze the sequences

#@markdown Please execute this cell by pressing the _Play_ button 

import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

def analyze_sequence(plddt, b2b_predictions_results):
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        fig, axs = plt.subplots(
            B2B_TOOLS_RESULTS.shape[0], 
            B2B_TOOLS_RESULTS.shape[1], 
            sharex=False, 
            sharey=False
        )
        fig.set_size_inches(18, 10)
        fig.legend(
            VALUES_CONSTANTS["pLDDT"]["legends"], 
            frameon=False, 
            loc="center", 
            fontsize=20
        )
        pbar.update(10)

        predictors_count = B2B_TOOLS_RESULTS.shape[0] * B2B_TOOLS_RESULTS.shape[1]
        for ix, iy in np.ndindex(B2B_TOOLS_RESULTS.shape):
            result_key = B2B_TOOLS_RESULTS[ix, iy]

            axs[ix, iy].plot(
                b2b_predictions_results[result_key], 
                "tab:blue", 
                label=result_key
            )
            axs[ix, iy].set_ylabel("{0} (blue)".format(result_key))
            if not result_key == "agmata":
                axs[ix, iy].set_ylim([0, 1.20])

            pLDDT_axis = axs[ix, iy].twinx()
            pLDDT_axis.plot(plddt, "tab:red", label="pLDDT")
            pLDDT_axis.set_ylabel("pLDDT (red)")
            pLDDT_axis.set_ylim([0, 120])
            
            axs[ix, iy].set_title("{0} vs pLDDT".format(result_key))
            axs[ix, iy].set_xlabel("Residue position")
            pbar.update(90/predictors_count)

        fig.tight_layout()
        fig.show()

In [None]:
#@title Declare the methods required to plot the 3D structures

#@markdown Please execute this cell by pressing the _Play_ button 

import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

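def get_value_color(result_key, value, min_value=0, max_value=1):
    # Propensity-like results are colored on a continuous red-to-green spectrum
    if result_key in ("ppII", "coil", "sheet", "helix", "agmata"):
        return get_spectrum_color(value, min_value, max_value)

    # All other results use the discrete bands defined in VALUES_CONSTANTS
    bands = VALUES_CONSTANTS[result_key]["colors"]
    for band_floor, band_ceiling, band_color in bands:
        if band_floor < value <= band_ceiling:
            return band_color

    # Values outside every band (for example exactly 0) get the nearest band color
    return bands[0][2] if value <= bands[0][0] else bands[-1][2]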

def get_spectrum_color(real, min_value, max_value):
    color_r = 0
    color_g = 0
    color_b = 0

    value = real
    base = max_value - min_value
    if (value < min_value):
        value = min_value 
    elif value > max_value:
        value = max_value

    if base == 0:
        value = 100
    else:
        value = (value - min_value) / base * 100
    
    if value < 50:
        color_r = 255
        off_g = 5.1 * value
        color_g = max(1, off_g)
    else:
        color_g = 255
        off_r = 510 - 5.1 * value
        color_r = max(1, off_r)
  
    return '#%02x%02x%02x' % (int(color_r), int(color_g), int(color_b))

def create_py3dmol_view(molecule, color_key):
    view = py3Dmol.view(width=400, height=300)
    view.setBackgroundColor("white")
    view.addModelsAsFrames(str(molecule))

    default_color_style = {"cartoon": {"color": "black"}}
    for atom_index, atom in enumerate(molecule):
        view.setStyle(
            {"model": -1, "serial": atom_index + 1}, 
            atom.get(color_key, default_color_style)
        )

    view.zoomTo()
    return view


def plot_result_legend(result_key):
    """Plots the legend for a predicted result."""

    value_constants = VALUES_CONSTANTS[result_key]
    colors = [x[2] for x in value_constants["colors"]]

    plt.figure(figsize=(2, 2))
    for c in colors:
        plt.bar(0, 0, color=c)

    plt.legend(value_constants["legends"], frameon=False, loc="center", fontsize=14)
    plt.xticks([])
    plt.yticks([])
    ax = plt.gca()
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    # plt.suptitle(value_constants["title"], fontsize=18)
    plt.title(value_constants["title"], fontsize=16, pad=16)

    return plt

def plot_structure(pdb_file, plddt, b2b_predictions_results):
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        b2b_agmata_result = b2b_predictions_results['agmata']
        agmata_min_value = min(b2b_agmata_result)
        agmata_max_value = max(b2b_agmata_result)

        pbar.update(2.5)

        with open(pdb_file, "r") as ifile:
            molecule = Molecule(ifile)
    
        for atom in molecule:
            res_id = int(atom["resid"]) - 1
            for result_key in VALUES_CONSTANTS.keys():
                if result_key == "pLDDT":
                    atom[result_key] = plddt[res_id]
                else:
                    atom[result_key] = b2b_predictions_results[result_key][res_id]

                if result_key == "agmata":
                    atom_color = get_value_color(
                        result_key, 
                        atom[result_key], 
                        min_value=agmata_min_value, 
                        max_value=agmata_max_value
                    )
                else:
                    atom_color = get_value_color(result_key, atom[result_key])

                atom["{0}_color".format(result_key)] = {
                    "cartoon": {
                        "color": atom_color
                      }
                }
        
        pbar.update(2.5)
        grid = GridspecLayout(B2B_TOOLS_RESULTS.size + 1, 2)

        output_plot = Output()
        with output_plot:
            create_py3dmol_view(molecule, "pLDDT_color").show()
        grid[0, 0] = output_plot

        output_legend = Output()
        with output_legend:
            plot_result_legend("pLDDT").show()
        grid[0, 1] = output_legend
        pbar.update(5)

        predictors_count = B2B_TOOLS_RESULTS.shape[0] * B2B_TOOLS_RESULTS.shape[1]
        for index, result_key in enumerate(B2B_TOOLS_RESULTS.flatten()):
            output_plot = Output()
            with output_plot:
                create_py3dmol_view(
                    molecule, 
                    "{0}_color".format(result_key)
                ).show()

            grid[index + 1, 0] = output_plot

            output_legend = Output()
            with output_legend:
                plot_result_legend(result_key).show()

            grid[index + 1, 1] = output_legend
            pbar.update(90 / predictors_count)

        display(grid)

## Files input
Please upload the sequence in FASTA format, the structure in PDB format, and the AlphaFold results in either Pickle or mmCIF format.
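
The uploaded files must describe the same protein: the analysis expects one pLDDT value per residue of the selected sequence. The sketch below shows one way to check this with plain Python. The helper names `read_fasta_lengths` and `check_lengths` are hypothetical, and the sequence and scores are invented for the example.

```python
def read_fasta_lengths(text):
    """Map each FASTA record id to the length of its sequence."""
    lengths = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The id is the first whitespace-separated token after ">"
            header = line[1:].split()
            current_id = header[0] if header else ""
            lengths[current_id] = 0
        elif current_id is not None:
            lengths[current_id] += len(line)
    return lengths


def check_lengths(sequence_length, plddt_values):
    """Return True when there is exactly one pLDDT value per residue."""
    return sequence_length == len(plddt_values)


# Invented data, for illustration only.
example_fasta = ">demo example protein\nMKTAYIAK\nQRQ\n"
example_plddt = [90.0] * 11

lengths = read_fasta_lengths(example_fasta)
lengths_match = check_lengths(lengths["demo"], example_plddt)
print(lengths, lengths_match)
```

A mismatch usually means that the FASTA file and the AlphaFold output come from different runs.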

In [None]:
#@title Upload Fasta { display-mode: "form" }

#@markdown Please execute this cell by pressing the _Play_ button and upload just one FASTA file

from google.colab import files
from os import path

fasta_uploaded = files.upload()

for fasta_file_name in fasta_uploaded.keys():
  fasta_file_path = path.join('/content/', fasta_file_name)

  print('Uploaded FASTA file "{name}" with length {length} bytes'.format(
      name=fasta_file_path, length=len(fasta_uploaded[fasta_file_name])))

# Collect the record ids so that one sequence can be picked for the analysis
sequence_keys = [record.id for record in SeqIO.parse(fasta_file_path, "fasta")]

sequence_picker = widgets.Dropdown(options=sequence_keys, description="Sequence:")
display(sequence_picker)

In [None]:
selected_sequence = sequence_picker.value
print("Selected sequence:", selected_sequence)

In [None]:
#@title Upload PDB { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button and upload just one PDB file

from google.colab import files
from os import path

pdb_uploaded = files.upload()

for pdb_file_name in pdb_uploaded.keys():
  pdb_file_path = path.join("/content/", pdb_file_name)

  print('Uploaded PDB file "{name}" with length {length} bytes'.format(
      name=pdb_file_path, length=len(pdb_uploaded[pdb_file_name])))

In [None]:
#@title Upload PKL { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button and upload just one Pickle file
from google.colab import files
from os import path

pkl_uploaded = files.upload()

for pickle_file_name in pkl_uploaded.keys():
  pickle_file_path = path.join("/content/", pickle_file_name)

  print('Uploaded Pickle file "{name}" with length {length} bytes'.format(
      name=pickle_file_path, length=len(pkl_uploaded[pickle_file_name])))

In [None]:
#@title Upload mmCIF { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button and upload an mmCIF file instead of the PKL
from google.colab import files
from os import path

cif_uploaded = files.upload()

for cif_file_name in cif_uploaded.keys():
  cif_file_path = path.join("/content/", cif_file_name)

  print('Uploaded mmCIF file "{name}" with length {length} bytes'.format(
      name=cif_file_path, length=len(cif_uploaded[cif_file_name])))


## Analysis
This section invokes the previously defined functions to analyze the information provided by AlphaFold. It plots pLDDT against the predictions of the different Bio2Byte tools and renders the predicted three-dimensional structure colored by each of those values.
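
Coloring the structure comes down to mapping each residue's value to a color band. The following is a simplified sketch of that lookup for pLDDT, using the band limits and colors defined in the constants cell. The names `PLDDT_BANDS` and `plddt_color` are hypothetical and the scores are invented.

```python
# (upper bound, hex color) pairs, ordered from lowest to highest confidence
PLDDT_BANDS = [
    (50, "#FF7D45"),   # very low
    (70, "#FFDB13"),   # low
    (90, "#65CBF3"),   # confident
    (100, "#0053D6"),  # very high
]


def plddt_color(score):
    """Return the color of the first band whose upper bound covers the score."""
    for upper_bound, color in PLDDT_BANDS:
        if score <= upper_bound:
            return color
    # Scores above the last bound fall back to the highest band
    return PLDDT_BANDS[-1][1]


# Invented per-residue scores, for illustration only.
example_scores = [42.0, 65.0, 85.0, 97.5]
example_colors = [plddt_color(score) for score in example_scores]
print(example_colors)
```

The notebook's own `get_value_color` follows the same idea for the banded predictions and switches to a continuous spectrum for the propensities.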

In [None]:
print(pickle_file_path)
print(cif_file_path)

# Print the length and residues of every sequence in the uploaded FASTA file
for r in SeqIO.parse(fasta_file_path, "fasta"):
  print(len(r.seq), r.seq)

if pickle_file_path != '':
    with open(pickle_file_path, "rb") as dbfile:
        db = pickle.load(dbfile)
    plddt = np.array(db["plddt"])
elif cif_file_path != '':
    cif_dictionary = MMCIF2Dict.MMCIF2Dict(cif_file_path)
    # The mmCIF file stores the per-residue pLDDT values as strings
    plddt = np.array([float(v) for v in cif_dictionary['_ma_qa_metric_local.metric_value']])
else:
    raise FileNotFoundError("pLDDT data is not available, please upload a PKL or mmCIF file and try running this cell again")


In [None]:
#@title Run B2B Tools { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button

import subprocess

from IPython.utils import io
import tqdm.notebook

TQDM_BAR_FORMAT = '{l_bar}{bar}| {n_fmt}/{total_fmt} [elapsed: {elapsed} remaining: {remaining}]'

try:
    with tqdm.notebook.tqdm(total=100, bar_format=TQDM_BAR_FORMAT) as pbar:
        with io.capture_output() as captured:
            single_seq = SingleSeq(fasta_file_path)
            pbar.update(25)
            single_seq.predict(tools=B2B_TOOLS)
            pbar.update(50)
            b2b_predictions_results = single_seq.get_all_predictions()[selected_sequence]
            pbar.update(25)
except subprocess.CalledProcessError:
    print(captured)
    raise

In [None]:
#@title Plot predictions vs pLDDT { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button. 

#@markdown The blue series shows the Bio2Byte predictions and the red series shows the pLDDT values.

#@markdown **About pLDDT**:
#@markdown > AlphaFold produces a per-residue confidence score (pLDDT) between 0 and 100. Some regions below 50 pLDDT may be unstructured in isolation.
analyze_sequence(plddt, b2b_predictions_results)

In [None]:
#@title Render three-dimensional structures { display-mode: "form" }
#@markdown Please execute this cell by pressing the _Play_ button

plot_structure(pdb_file_path, plddt, b2b_predictions_results)

## Conclusion and notes

**Important note:** If you need to run this analysis again, please restart the runtime from the "Runtime" menu.

Download the Bio2Byte tools package from PyPI: https://pypi.org/project/b2bTools/.

### Citations

DynaMine:
> Elisa Cilia, Rita Pancsa, Peter Tompa, Tom Lenaerts, and Wim Vranken. From protein sequence to dynamics and disorder with DynaMine. Nature Communications 4:2741 (2013).
doi: 10.1038/ncomms3741

EFoldMine:
> Raimondi, D., Orlando, G., Pancsa, R. et al. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 7, 8826 (2017).
doi: https://doi.org/10.1038/s41598-017-08366-3

DisoMine:
> Gabriele Orlando, Daniele Raimondi, Francesco Codice, Francesco Tabaro, Wim Vranken bioRxiv 2020.05.25.115253;
doi: https://doi.org/10.1101/2020.05.25.115253

AgMata:
> O.G, S.A, MR.S, R.D, V.W.; Accurate prediction of protein beta-aggregation with generalized statistical potentials. doi: 10.1093/bioinformatics/btz912.

Regarding AlphaFold:
```
@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)},
}
```