# A note on how to run AlphaFold 3

**Author**: Xiping Gong (xipinggong@hotmail.com, Department of Food Science and Technology, College of Agricultural and Environmental Sciences, University of Georgia, Griffin, GA, USA)

**Date**: 01/22/2025 (first draft); 04/24/2025 (create a modified "af3.sh" script and more)


# Introduction

AlphaFold 2 revolutionized biomolecular structure prediction by providing accurate 3D protein structures, which can be used for rapid molecular docking (DOI: https://doi.org/10.1038/s41586-021-03819-2). In 2024, AlphaFold 3 was released, extending this capability to biomolecule-ligand interactions and potentially offering improved precision in studying PFAS binding to critical toxicological targets, such as proteins (DOI: https://doi.org/10.1038/s41586-024-07487-w). Its authors reported predictive accuracy that significantly surpasses that of traditional molecular docking tools (e.g., AutoDock Vina), providing more opportunities to understand the PFAS-biomolecule binding mechanisms that drive PFAS bioaccumulation and toxicity (DOI: https://doi.org/10.1038/s41586-024-07487-w). The open-source code released in November 2024 (Link: https://github.com/google-deepmind/alphafold3) enables high-throughput use, making it possible to rapidly screen a wide array of biomolecule-ligand interactions. These advances provide a foundation for generating high-quality structural features of PFAS-biomolecule interactions.

This note uses the PFOA-human serum albumin interaction as an example to demonstrate how AlphaFold 3 can be utilized for docking. Additionally, I discuss the docking results and compare them to the outcomes obtained using AutoDock Vina from our previous note.

AlphaFold 3: https://github.com/google-deepmind/alphafold3


# An example: PFOA - human serum albumin (hSA) protein

The goal of this example is to show how we can use AlphaFold 3 to predict the binding of PFOA to the hSA protein.
To test it, I integrated all scripts (Python and Bash) so that the same workflow can be reused to automatically screen other PFAS molecules.


## Background

**Reference**
Maso, Lorenzo, et al. "Unveiling the binding mode of perfluorooctanoic acid to human serum albumin." Protein Science 30.4 (2021): 830-841. DOI: https://doi.org/10.1002/pro.4036

![Alt text](https://onlinelibrary.wiley.com/cms/asset/641b2e4e-b7a8-429b-8b78-d9238385a0ab/pro4036-fig-0001-m.jpg)

**Figure 1**. Structure of hSA in complex with PFOA and Myr. Chemical structure (top) and composite omit maps depicting the (Fo−Fc) electron density (bottom) of PFOA (a) and Myr (b) contoured at 4σ; (c) Crystal structure of hSA-PFOA-Myr complex (white) obtained using a twofold molar excess of PFOA over Myr [PDB identification code: 7AAI]; (d) Superimposition of hSA-PFOA-Myr ternary complex (white) with aligned hSA-Myr binary complex (blue white) [PDB identification code: 7AAE]. The structure of hSA is organized in homologous domains (I, II and III), subdomains (A and B), fatty acids (FA) and Sudlow's binding sites. The α-helices of hSA are represented by cylinders. Bound PFOA and Myr are shown in a ball-and-stick representation with a semi-transparent van der Waals and colored by atom type (PFOA: carbon = dark salmon, oxygen = firebrick, fluorine = palecyan; Myr: carbon = smudge green, oxygen = firebrick). The electron density of PFOA and Myr is shown as grey mesh. (Note: I switched the "7AAE" with "7AAI" after checking out both structures from the PDB database.)


## Download the repository

I have created a GitHub repository that includes the required scripts. Download it first:

```bash
$ git clone https://github.com/XipingGong/pfas_docking.git
$ cd pfas_docking # go to this directory, and we will have a test later.
```


## A general script to run the AF3 docking

AF3 only requires a JSON file that describes the protein-ligand complex (for example, the protein sequence and the ligand ID), so running AlphaFold 3 is straightforward.

+ **Step 1: Prepare the input file "af3.json" and request the AF3 model parameters**

A JSON example for hSA-PFOA is shown below; save it as "af3.json". The input format is documented in detail here: https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md

```json
{
  "name": "af3",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "HKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKCCAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAAL"
      }
    },
    {
      "ligand": {
        "id": "B",
        "ccdCodes": ["8PF"]
      }
    }
  ],
  "modelSeeds": [1],
  "bondedAtomPairs": [],
  "dialect": "alphafold3",
  "version": 2
}
```
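
Because screening other PFAS molecules only changes the ligand entry, the same JSON can also be generated programmatically. The following is a minimal sketch and not part of the pfas_docking repository: the function name `build_af3_input` and the short placeholder sequence are hypothetical, and the field names simply mirror the example above (a CCD code or, alternatively, a SMILES string for the ligand).

```python
import json


def build_af3_input(name, protein_sequence, ccd_code=None, smiles=None, seeds=(1,)):
    """Return an AF3 input dict with one protein (chain A) and one ligand (chain B)."""
    # Exactly one ligand definition must be given: a CCD code or a SMILES string.
    if (ccd_code is None) == (smiles is None):
        raise ValueError("Provide exactly one of ccd_code or smiles")
    ligand = {"id": "B"}
    if ccd_code is not None:
        ligand["ccdCodes"] = [ccd_code]
    else:
        ligand["smiles"] = smiles
    return {
        "name": name,
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_sequence}},
            {"ligand": ligand},
        ],
        "modelSeeds": list(seeds),
        "bondedAtomPairs": [],
        "dialect": "alphafold3",
        "version": 2,
    }


# Placeholder sequence for illustration only; use the full hSA sequence in practice.
job = build_af3_input("af3", "HKSEVAHRFKDLGEENFK", ccd_code="8PF")
print(json.dumps(job, indent=2))
```

Redirecting the printed text into "af3.json" (or calling `json.dump` on an open file) gives an input file with the same layout as the hand-written one.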

+ **Step 2: Check out the "af3.sh" script**


```bash
# We provide a general script, "af3.sh", to run AlphaFold 3.
# Before running it, edit the INPUT section of "scripts/af3.sh" to match your system, for example:
# >> scripts_dir='/home/xg69107/program/pfas_docking/scripts' # need to change
# >> af3_param_dir='/home/xg69107/program/alphafold3' # need to change
# >> obabel="/home/xg69107/program/anaconda/anaconda3/envs/gmxMMPBSA/bin/obabel"
# >> python="/home/xg69107/program/anaconda/anaconda3/bin/python"

```

+ **Step 3: Run the "af3.sh" script**

```bash
$ mkdir -p test/dock_dir/7AAI_8PF # create a test folder
$ cd test/dock_dir/7AAI_8PF # go to this test folder
# Make sure the "af3.json" file from Step 1 is in this folder before submitting.
$ sbatch ../../../scripts/af3.sh --input_json af3.json # submit an AF3 job; the results will be saved in an 'af3' folder.
# The job could take ~10 min once it starts running.

```

+ **Step 4: Check out the results**
```bash
$ ls -lrt af3 # see what this 'af3' folder has
# >>
# af3_data.json
# af3_model.cif
# af3_summary_confidences.json
# af3_confidences.json
# TERMS_OF_USE.md
# ranking_scores.csv
# best_pose
# seed-1_sample-0
# seed-1_sample-1
# seed-1_sample-2
# seed-1_sample-3
# seed-1_sample-4
```
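
To pick a pose programmatically, the ranking file can be read directly. The sketch below assumes that `ranking_scores.csv` has the columns `seed`, `sample`, and `ranking_score` and that a higher score is better; please check these assumptions against your own file. `best_sample` is a hypothetical helper, not a script from the repository, and the scores in the example are made up.

```python
import csv
import io


def best_sample(csv_text):
    """Return (seed, sample, ranking_score) of the top-ranked prediction."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no rows found in the ranking table")
    # Assumption: a larger ranking_score means a better prediction.
    top = max(rows, key=lambda row: float(row["ranking_score"]))
    return int(top["seed"]), int(top["sample"]), float(top["ranking_score"])


# Made-up scores for illustration only; they are not results from this note.
example = """seed,sample,ranking_score
1,0,0.81
1,1,0.79
1,2,0.84
"""
seed, sample, score = best_sample(example)
print(f"best: seed-{seed}_sample-{sample} (ranking_score = {score})")
```

For a real run, the text can be read with `open("af3/ranking_scores.csv").read()` and passed to the same function.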

# Analysis & Conclusion


## RMSD check

We can calculate the RMSD of each predicted structure with respect to the native experimental structure.
As a rule of thumb, an RMSD value of <= 0.2 nm indicates a good prediction.
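
For reference, the RMSD over N matched atoms is the square root of the mean squared distance between corresponding atom positions. The sketch below is a minimal illustration of that formula and of the 0.2 nm rule of thumb. It assumes the two coordinate lists are already superimposed and listed in the same atom order, uses toy coordinates, and is not the implementation used in `check_rmsd.py`.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in nm (made up for illustration).
reference = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.30, 0.0, 0.0)]
predicted = [(0.0, 0.1, 0.0), (0.15, 0.1, 0.0), (0.30, 0.1, 0.0)]
value = rmsd(reference, predicted)
print(f"RMSD = {value:.3f} nm, within 0.2 nm: {value <= 0.2}")
```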

+ **Obtain the native structure first**

```bash
# Download the native structure from the RCSB PDB
$ bash ../../../scripts/get_native_pdb.sh --pdbid 7AAI --ligandid 8PF # it will generate four models because of four ligands
# We can take the "7AAI_8PF_1.pdb" as the native structure
$ cp 7AAI_8PF_1.pdb native_model.pdb
```

+ **Calculate the RMSD values of AF3-predicted structures**

```bash
# Before calculating the RMSD values, we still need to clean up the AF3-predicted structures, because they could have a different format from the native structure. We also need to do a structural alignment.
$ bash ../../../scripts/af3_ana.sh # please have a look
# >>
# 📊 Protein Backbone RMSD (Direct): Min = 0.765 nm ; [0.79301757 0.7891323  0.78303146 0.7893516  0.79301757 0.7651273 ] nm
# 📊 Protein Backbone RMSD (MDTraj): Min = 0.434 nm ; [0.43501106 0.44744506 0.4344369  0.45243776 0.43501106 0.4348642 ] nm
# 📊 Ligand RMSD (Direct): Min = 0.846 nm ; [0.87349176 0.8715516  0.8641499  0.8538811  0.87349176 0.84595853] nm
# 📊 Ligand RMSD (MDTraj): Min = 0.143 nm ; [0.14293166 0.17109454 0.1693879  0.17163862 0.14293166 0.164534  ] nm
# 📊 Protein Backbone Pocket RMSD (Direct): Min = 0.077 nm ; [0.07853515 0.07989438 0.07657553 0.08200549 0.07853515 0.07801802] nm
# 📊 Protein Backbone Pocket RMSD (MDTraj): Min = 0.077 nm ; [0.0785345  0.07989517 0.07657539 0.08200631 0.0785345  0.07801855] nm


```
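
The summary lines printed by `af3_ana.sh` can be parsed for further filtering. The sketch below relies only on the line layout shown above (`label: Min = ... nm ; [values] nm`); `parse_rmsd_line` is a hypothetical helper rather than part of the repository, and the label is taken to start at the first letter so that any leading symbols are skipped.

```python
import re

# label, then "Min = <number> nm ;", then the bracketed list of per-pose values
LINE_PATTERN = re.compile(
    r"(?P<label>[A-Za-z][^:]*):\s*Min\s*=\s*(?P<min>[\d.]+)\s*nm\s*;\s*\[(?P<values>[^\]]*)\]"
)


def parse_rmsd_line(line):
    """Split one summary line into (label, list of RMSD values in nm)."""
    match = LINE_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    values = [float(token) for token in match.group("values").split()]
    return match.group("label").strip(), values


# Ligand line copied from the output above (leading symbols omitted).
line = ("# Ligand RMSD (MDTraj): Min = 0.143 nm ; "
        "[0.14293166 0.17109454 0.1693879  0.17163862 0.14293166 0.164534  ] nm")
label, values = parse_rmsd_line(line)
passing = [v for v in values if v <= 0.2]
print(f"{label}: {len(passing)} of {len(values)} poses at or below 0.2 nm")
```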


## VMD visualization

We can use the VMD software to have a look at the AF3-predicted structures, and see what protein residues interact with the PFOA molecule.

Please see the demo (Link: ).

```bash
# Useful commands to obtain the hSA-PFOA interactions
# Native structure
$ python ../../../scripts/get_interactions_from_pdb.py native_model.pdb 8PF --cutoff_distance 0.3
# AF3-predicted best pose
$ python ../../../scripts/get_interactions_from_pdb.py af3/best_pose/aligned_model.pdb 8PF --cutoff_distance 0.3

```
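
Conceptually, a contact list like this comes from a distance search around the ligand. The sketch below is a simplified stand-in, not the code in `get_interactions_from_pdb.py`: it assumes standard fixed-column PDB records, interprets the cutoff in nm (PDB coordinates are in angstroms, so the cutoff is multiplied by 10), and uses a made-up three-atom example built with the hypothetical helper `pdb_line`.

```python
import math


def parse_atoms(pdb_text):
    """Yield (residue_name, chain_id, residue_number, (x, y, z)) from ATOM/HETATM records."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[17:20].strip(), line[21], int(line[22:26]), xyz


def contacts(pdb_text, ligand_name, cutoff_nm=0.3):
    """Return sorted (chain, residue_number, residue_name) within cutoff_nm of the ligand."""
    atoms = list(parse_atoms(pdb_text))
    ligand = [atom for atom in atoms if atom[0] == ligand_name]
    others = [atom for atom in atoms if atom[0] != ligand_name]
    cutoff_angstrom = cutoff_nm * 10.0  # PDB coordinates are in angstroms
    found = set()
    for res_name, chain, res_num, xyz in others:
        if any(math.dist(xyz, lig[3]) <= cutoff_angstrom for lig in ligand):
            found.add((chain, res_num, res_name))
    return sorted(found)


def pdb_line(record, serial, atom, res_name, chain, res_num, x, y, z):
    """Format one fixed-column PDB coordinate record (illustration helper)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_num:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates: one residue 2.5 angstroms from the ligand atom, one far away.
example = "\n".join([
    pdb_line("ATOM", 1, "NZ", "LYS", "A", 10, 0.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "CA", "ALA", "A", 50, 20.0, 0.0, 0.0),
    pdb_line("HETATM", 3, "O1", "8PF", "B", 1, 2.5, 0.0, 0.0),
])
print(contacts(example, "8PF", cutoff_nm=0.3))
```

This all-pairs search is fine for a single ligand; for many structures, a neighbor-search library would be a more efficient choice.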

<img src="af3_docking_pfoa_hsa.svg" alt="Illustration of PFOA-hSA" style="width:80%;">

**Figure 2**. Comparison of PFOA-hSA interaction structures obtained experimentally and through AlphaFold 3 docking.

The results reveal a close alignment between the two methods, with the head group of PFOA showing strong similarity. Notably, no specific binding pocket was predefined in this docking example, indicating that AlphaFold 3 can accurately predict the binding pocket of PFOA in the hSA protein. However, differences are observed in the orientation of the PFOA tail. 


# Questions

## How to submit an AF3 job at Sapelo2@GACRC?

Please check the GACRC documentation: https://wiki.gacrc.uga.edu/wiki/AlphaFold3-Sapelo2

The 'af3.sh' script in this repository can also be used as a submission example.

## What if my ligand is negatively charged?

AF3 provides multiple ways to define the input ligand. For a charged ligand, you can use a SMILES string that encodes the charge explicitly, such as the deprotonated carboxylate `[O-]` of PFOA in the example below.
For more details, see: https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md

```json
{
  "name": "af3",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "HKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKCCAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAAL"
      }
    },
    {
      "ligand": {
        "id": "B",
        "smiles": "O=C([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F"
      }
    }
  ],
  "modelSeeds": [
    1
  ],
  "bondedAtomPairs": [],
  "dialect": "alphafold3",
  "version": 2
}
```

## If I have a new structure, how can I align the AF3-predicted structures to it?

```bash
# We provide a simple bash script 'af3_ana.sh' to align the AF3-predicted structures based on any structure we have.
# So, what you can do is just to replace the "native_model.pdb" file with the new structure file.
#
# Saved into an 'af3_ana.sh' bash file
#
scripts_dir="/home/xg69107/program/pfas_docking/scripts"
python="/home/xg69107/program/anaconda/anaconda3/bin/python"

NATIVE_MODEL="native_model.pdb"
NATIVE_LIGAND="native_ligand.pdb"
awk 'substr($0,22,1)=="B"' "$NATIVE_MODEL" > "$NATIVE_LIGAND"

for bp in af3/*; do
  [ -d "$bp" ] || continue

  # split out chain A (protein) and chain B (ligand)
  awk 'substr($0,22,1)=="A"' "$bp/model.pdb" > "$bp/x_prot.pdb"
  awk 'substr($0,22,1)=="B"' "$bp/model.pdb" > "$bp/x_lig.pdb"

  # rename the ligand atoms to match the native ligand (uses the interpreter set in $python above)
  echo "$python $scripts_dir/rename_atom_info_by_topology.py --ref $NATIVE_LIGAND $bp/x_lig.pdb > $bp/x_lig_convert.pdb"
  "$python" "$scripts_dir/rename_atom_info_by_topology.py" --ref "$NATIVE_LIGAND" "$bp/x_lig.pdb" > "$bp/x_lig_convert.pdb"
  echo ""

  # stitch protein + converted ligand into one PDB
  cat "$bp/x_prot.pdb" "$bp/x_lig_convert.pdb" > "$bp/model_convert.pdb"

  echo "$bp: Converted $bp/model_convert.pdb"
done
echo ""

# Alignment: default 'aligned_model.pdb' will be created for each AF3-predicted structure
$python $scripts_dir/align_pdb.py  --ref $NATIVE_MODEL "af3/*/model_convert.pdb"

# Alignment: Convert all AF3-predicted structures into a pdb file 'af3_model.pdb'
$python $scripts_dir/align_pdb.py  --ref $NATIVE_MODEL "af3/*/aligned_model.pdb" -o af3_model.pdb

# Calculate RMSD in terms of native structure
$python $scripts_dir/check_rmsd.py --ref $NATIVE_MODEL "af3/*/aligned_model.pdb"


```

## How can I save the RMSD data from multiple tests in a data file?

```bash
# Save the RMSD data into a data file
$ python ../../../scripts/check_rmsd.py --ref native_model.pdb "af3/*/aligned_model.pdb" > af3_model.rmsd

```


```bash
# If we have finished multiple tests and obtained many "af3_model.rmsd" data files, then we want to put these data together
$ cd ../ # go back to a previous directory that includes many test examples
$ python ../../scripts/extract_rmsd.py --num_per_array 6 "*/af3_model.rmsd"
```