[![Open In Colab](../../_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)
[![Get Notebook](../../_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)
[![View In GitHub](../../_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)

# Using AlphaFold2

This tutorial shows you how to use the AlphaFold2 model to predict the structure of your protein sequence of interest and retrieve it in PDB format. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit [Using ESMFold](https://colab.research.google.com/drive/1moKUAeMlST9-B5rQW0qGzL1L2uX5HG26?usp=drive_link).

## What you need before getting started

Specify a sequence of interest whose structure you want to predict. The example used here is interleukin 2:

In [1]:
%pip install python-dotenv
%load_ext dotenv
%dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
import openprotein

# Login to your session
session = openprotein.connect()

# Specify your sequence
sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"

## Creating an MSA

AlphaFold2 requires evolutionary context from a multiple sequence alignment (MSA) to make structure predictions. This section demonstrates how to create an MSA based on the sequence you wish to fold.

Start by getting the alphafold model object:

In [3]:
afmodel = session.fold.get_model('alphafold2')
afmodel.fold?

Signature:
afmodel.fold(
    proteins: list[openprotein.protein.Protein] | openprotein.align.msa.MSAFuture | None = None,
    num_recycles: int | None = None,
    num_models: int = 1,
    num_relax: int = 0,
    **kwargs,
) -> openprotein.fold.future.FoldComplexResultFuture
Docstring:
Post sequences to alphafold model.

Parameters
----------
proteins : List[Protein] | MSAFuture
    List of protein sequences to fold. `Protein` objects must be tagged with an `msa`. Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.
num_recycles : int
    number of times to recycle models
num_models : int
    number of models to train - best model will be used
num_relax : int
    maximum number of iterations for relax

Returns
-------
job : Job
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py
Type:

You can review some of the metadata about the AlphaFold2 model. Note that `input_tokens` is `None` because the model accepts an MSA as input rather than raw sequences.

In [4]:
afmodel.metadata

ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=None, output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, descripti
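
The metadata above reports `max_sequence_length=2400` and begins listing amino-acid tokens. As a minimal sketch, a pre-flight check before submitting a job might look like the following. The helper `check_sequence` is hypothetical and not part of the `openprotein` package, and the 20-letter canonical alphabet is an assumption, since the token list in the output above is truncated.

```python
# Hypothetical helper for illustration; not part of the openprotein package.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet (token list above is truncated)
MAX_LEN = 2400  # max_sequence_length reported by afmodel.metadata


def check_sequence(seq: str, max_len: int = MAX_LEN) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if not seq:
        problems.append("sequence is empty")
    if len(seq) > max_len:
        problems.append(f"length {len(seq)} exceeds limit {max_len}")
    # Collect any characters outside the assumed alphabet.
    bad = sorted(set(seq.upper()) - CANONICAL_AA)
    if bad:
        problems.append(f"unexpected characters: {''.join(bad)}")
    return problems


print(check_sequence("MYRMQLLSCIALSLALVTNSAPTSSSTKK"))  # prints []
```

In the tutorial you would call `check_sequence(sequence)` before creating the MSA and stop if the returned list is not empty.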

Use your seed sequence to create an MSA:

In [5]:
msa = session.align.create_msa(sequence.encode())
print(msa)

job_id='529f6b22-0bd0-415c-b5b7-71cbac737ae5' job_type=<JobType.align_align: '/align/align'> status=<JobStatus.PENDING: 'PENDING'> created_date=datetime.datetime(2025, 8, 20, 8, 40, 42, 50046) start_date=None end_date=None prerequisite_job_id=None progress_message=None progress_counter=None sequence_length=None


Examine the outputs once the MSA is complete:

In [6]:
msa.wait_until_done(verbose=True)

print(list(msa.get())[0:3])

Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:07<00:00,  2.48s/it, status=SUCCESS]


[('101', 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'), ('UniRef100_G1RE34\t243\t0.764\t2.142E-68\t0\t138\t239\t0\t152\t153', 'MYRMQLLSCIALSLALVTNGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVQELKGSETTFMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------'), ('UniRef100_A0A2K5MA48\t234\t0.753\t1.582E-65\t0\t138\t239\t0\t153\t154', 'MYRMQLLSCIALSLALVANSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRdTKDLISNINVIVLELKGSETTLMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------')]
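
As the printed rows show, each entry from `msa.get()` is a `(name, aligned_sequence)` pair. The sketch below gives a rough idea of how much of the query each hit covers. It assumes A3M-style conventions, where `-` marks a gap and lowercase letters mark insertions relative to the query; the output above looks consistent with that, but the format is not confirmed here. The helper name and the toy rows (including the `UniRef100_EXAMPLE` identifier) are made up for illustration.

```python
def summarize_row(name: str, aligned: str) -> dict:
    """Summarize one MSA row, assuming A3M-style gaps and insertions."""
    # Lowercase letters are treated as insertions and skipped,
    # so the remaining characters line up with query columns.
    match_cols = [c for c in aligned if not c.islower()]
    gaps = sum(1 for c in match_cols if c == "-")
    coverage = 1 - gaps / len(match_cols) if match_cols else 0.0
    return {
        "id": name.split("\t")[0],  # keep only the first tab-separated field
        "columns": len(match_cols),
        "coverage": round(coverage, 3),
    }


# Toy rows shaped like the output above, shortened for illustration.
rows = [
    ("101", "MYRMQLLS"),
    ("UniRef100_EXAMPLE\t243\t0.764", "MYRMabQL--"),
]

for name, aligned in rows:
    print(summarize_row(name, aligned))
```

With a real alignment, replace `rows` with `list(msa.get())`. Rows with low coverage contribute little evolutionary signal for the gapped region.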


## Predicting your sequence

Send the MSA to the fold endpoint. The call returns a `fold` job that you can wait on:



In [9]:
fold = afmodel.fold(msa, num_models=1)

fold

FoldJob(num_records=1, job_id='3a807932-700d-43ca-abf4-49d9f17447af', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 20, 9, 39, 31, 14221, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

In [None]:
fold.wait_until_done(verbose=True, timeout=600)

Waiting:   0%|                                                                                                             | 0/100 [01:39<?, ?it/s, status=RUNNING]

Alternatively, wait for the job to complete and fetch the results in a single call with `wait()`:

In [None]:
result = fold.wait(verbose=True)
result[0][0]

Waiting: 100%|██████████| 100/100 [00:00<00:00, 980.44it/s, status=SUCCESS]


b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'

The second element of each result holds the predicted structure as PDB-formatted bytes. Print its first five lines:

In [None]:
print("\n".join(result[0][1].decode().split("\n")[:5]))

MODEL     1                                                                     
ATOM      1  N   MET A   1     -24.000   8.852  20.203  1.00 45.47           N  
ATOM      2  CA  MET A   1     -23.406   9.719  19.188  1.00 45.47           C  
ATOM      3  C   MET A   1     -22.453   8.938  18.281  1.00 45.47           C  
ATOM      4  CB  MET A   1     -22.672  10.883  19.844  1.00 45.47           C  
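
AlphaFold2 conventionally stores its per-residue confidence score (pLDDT) in the B-factor column of the PDB output, which would explain the repeated `45.47` in the lines above. Whether this service preserves that convention is an assumption. Under that assumption, a minimal sketch for averaging the score over C-alpha atoms, using the fixed column layout of PDB `ATOM` records, could look like this. The function name is hypothetical.

```python
def mean_bfactor(pdb_text: str, atom_name: str = "CA") -> float:
    """Average the B-factor column over ATOM records with the given atom name."""
    values = []
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name is columns 13-16, B-factor is columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == atom_name:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError(f"no ATOM records found for atom name {atom_name!r}")
    return sum(values) / len(values)


# The first lines of the prediction shown above, used as sample input.
sample = "\n".join([
    "MODEL     1",
    "ATOM      1  N   MET A   1     -24.000   8.852  20.203  1.00 45.47           N  ",
    "ATOM      2  CA  MET A   1     -23.406   9.719  19.188  1.00 45.47           C  ",
    "ATOM      3  C   MET A   1     -22.453   8.938  18.281  1.00 45.47           C  ",
    "ATOM      4  CB  MET A   1     -22.672  10.883  19.844  1.00 45.47           C  ",
])

print(mean_bfactor(sample))  # 45.47
```

For the full prediction, pass `result[0][1].decode()` instead of `sample`.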


## Next steps

After the PDB contents are returned, save them as a file for use with your molecular visualization system of choice.
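
A minimal sketch of saving the structure, assuming the PDB contents are the bytes object found at `result[0][1]` as shown earlier. The helper name, the file name, and the stand-in bytes are illustrative choices.

```python
import tempfile
from pathlib import Path
from typing import Union


def save_pdb(pdb_bytes: bytes, path: Union[str, Path] = "prediction.pdb") -> Path:
    """Write PDB bytes to disk and return the path that was written."""
    out = Path(path)
    out.write_bytes(pdb_bytes)
    return out


# Stand-in bytes so the example runs on its own;
# in the tutorial you would pass result[0][1] instead.
demo_dir = Path(tempfile.mkdtemp())
saved = save_pdb(b"MODEL     1\nENDMDL\n", demo_dir / "prediction.pdb")
print(saved)
```

Writing the bytes unchanged avoids any encoding round trip, and the resulting `.pdb` file can be opened directly in a molecular viewer.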