# Parsing output of AF3 from the cluster
created by Andreas 2025-02-04

Script to parse output from AF3 running on the IMB server cluster. Also detects errors on the runs using the reports created by Nextflow

This notebook supports to be run cell by cell from top to bottom. Runtime was around 35 minutes, but most (I estimate 90%) is comming from accessing the files on a network drive

### 0 Settings + Imports

Run the following lines and edited in the second cell the necessary paths / settings

Note: In this run the known_extensions have been excluded. To include them, change the code in the _# Loading the output folders_ cell in this section and the first cell of section 3

In [16]:
# Imports
from pathlib import Path
import pandas as pd
import numpy as np
import re
import json

import pymol
from Bio.PDB import PDBParser
from Bio.PDB.Structure import Structure as BioPy_PDBStructure
from Bio.PDB.Model import Model as BioPy_PDBModel
from Bio.PDB.PDBExceptions import PDBConstructionException
parser = PDBParser(QUIET=True)

ressources_path = Path("../ressources").resolve()


In [17]:
# Settings

# The base folder of the AF output. The AF3 files are searched inside /Alpha
luck_drive_folder = Path("/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold") 

# The folder to export the output
export_destination = Path("/Users/imb/Desktop")

# Set this option to skip existing structures
export_skip_existing_structures = False

In [18]:
# Loading the output folders
# Note: The known_extension structures should not be considered, so exclude them explicitly
af3_runs_folder = luck_drive_folder / "AlphaFold3"
DMI_folders = [p for p in (af3_runs_folder / "AlphaFold_benchmark_DMI").iterdir() if p.is_dir()]
DDI_folders = [p for p in (af3_runs_folder / "AlphaFold_benchmark_DDI").iterdir() if p.is_dir()]
benchmark_folders = DMI_folders + DDI_folders

# Load also the AF2 folder to get the original input files without switched chains to undo the chain switching
af2_folders = { "known_minimal": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run37", # DMI known minimals
                "random_minimal": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run38", # DMI random minimals
                "known_extension": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run51", # DMI known extensions
                "mutations": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run43", # DMI mutations
                "known_ddi": luck_drive_folder / "AlphaFold_benchmark_DDI" / "run5", # DDI known DDI
                "random_ddi": luck_drive_folder / "AlphaFold_benchmark_DDI" / "run6", # DDI random DDI
}

print("Folders with AF3 output:")
for p in benchmark_folders:
    if not p.exists():
        print(f"{p} does not exist")
    else:
        print(p)
        
print("Folders with AF2 output")
for p in af2_folders.values():
    if not p.exists():
        print(f"{p} does not exist")
    else:
        print(p)

Folders with AF3 output:
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/known_minimal
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/random_minimal
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/mutations
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/known_extension
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/known_extension_old
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DDI/known_ddi
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DDI/random_ddi
Folders with AF2 output
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold_benchmark_DMI/run37
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold_benchmark_DMI/run38
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold_benchmark_DMI/run51
/Volumes/i

### 1 Scanning input .json files and report.html files
Scans for all input .json files and corresponding report_%time%.html files to find failed runs

* **benchmark_set** refers to the pairing method (mutated, randomized, ...) and is equal to the folder name (example: known_minimal)
* **prediction_name** is extracted from a) the json file or b) the report_%time%.html file. If the value can't be extracted from report_%time%.html, it is set to None
* **report_file** refers to the name of the report_%time%.html file and is None if the json input file could not been matched with a report file (--> input json has not been run on the cluster \[yet\])
* **run_ok** refers to if there had been an error running the input file on the server. Set to None if the input file has not been run on the cluster.
* **input_json** is set to the filename of the input json or to None if the report_%time%.html could not be matched with a input json file.

Use <i>report_file == None</i> or <i>run_ok == None</i> to identify scheduled but not yet run structures.<br>Use <i>input_json == None</i> to find runs without a input file

In [None]:
# Scanning input and report files
report_df = pd.DataFrame(columns=["benchmark_set", "prediction_name", "report_file", "run_ok", "input_json"])


for folder in benchmark_folders:
    benchmark_set = folder.name
    print(benchmark_set)
    # Ignore the f.is_file() check in the following lines as it will decrease performance massively on a network drive (example runtime: 0.9s --> 1m 25.6 s (!))
    nextflow_inputs = [f for f in folder.iterdir() if f.suffix.lower() == ".json"]
    for nextflow_input in nextflow_inputs:
        prediction_name = nextflow_input.stem
        report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "prediction_name": prediction_name, "report_file": None, "run_ok": None, "input_json": nextflow_input.name}
    
    for p in [f for f in folder.iterdir() if  "report_" in f.stem and f.suffix.lower() == ".html"]:
        print("\t", p.name, end=" ")
        with open(p) as f:
            content = f.read()
        
        # Extract all prediction names from the HTML
        prediction_names = []
        finished = bool("Workflow execution completed successfully!" in content)
        
        # Pattern 1: [id:[LIST_OF_NAMES]] - extract all names from the list
        list_matches = re.findall(r"\[id:\[([^\]]+)\]\]", content)
        for match in list_matches:
            # Split by comma and clean up each name
            names = [name.strip() for name in match.split(',')]
            prediction_names.extend(names)
        
        # Pattern 2: [id:single_name, model:something] - extract single names
        single_matches = re.findall(r"\[id:([^,\]]+)(?:,|\])", content)
        prediction_names.extend(single_matches)
        
        # Clean up prediction names: remove leading brackets, strip whitespace, normalize case
        cleaned_names = []
        for name in prediction_names:
            # Remove leading bracket if present
            cleaned_name = name.lstrip('[').strip()
            if cleaned_name:  # Only add non-empty names
                cleaned_names.append(cleaned_name)
        
        # Remove duplicates by converting to lowercase, then keep original case for first occurrence
        seen_lower = set()
        unique_names = []
        for name in cleaned_names:
            name_lower = name.lower()
            if name_lower not in seen_lower:
                seen_lower.add(name_lower)
                unique_names.append(name)
        
        prediction_names = unique_names
        
        if prediction_names:
            print(f"-> Found {len(prediction_names)} predictions: {prediction_names[:3]}{'...' if len(prediction_names) > 3 else ''}")
            
            # Track how many predictions we successfully matched
            matched_predictions = 0
            
            # Update all matching rows for each prediction found in this report
            for prediction_name in prediction_names:
                # Case-insensitive matching: compare lowercase versions
                matching_mask = np.logical_and(
                    report_df["benchmark_set"] == benchmark_set, 
                    report_df["prediction_name"].str.lower() == prediction_name.lower()
                )
                num_matches = len(report_df.loc[matching_mask])
                
                if num_matches > 0:
                    matched_predictions += 1
                    print(f"\t\tUpdating {num_matches} rows for prediction: {prediction_name}")
                    report_df.loc[matching_mask, "run_ok"] = finished
                    report_df.loc[matching_mask, "report_file"] = p.name
                # Note: We don't create new rows for unmatched predictions to avoid duplicates
            
            # If no predictions matched any JSON files, create a single row with None
            if matched_predictions == 0:
                print("\t\tNo JSON files found for any predictions in this report")
                report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "report_file": p.name, "prediction_name": None, "run_ok": finished, "input_json": None}
                
        else:
            print("-> No predictions found")
            # Create entry with None prediction_name (same as original behavior)
            report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "report_file": p.name, "prediction_name": None, "run_ok": finished, "input_json": None}
        finished = bool("Workflow execution completed successfully!" in content)
        
        # Extract all prediction names from the HTML

known_minimal
	 report_2025-02-06_23-42.html -> Found 1 predictions: ['LIG_LIR_Gen_1_2L8J']
		Updating 1 rows for prediction: LIG_LIR_Gen_1_2L8J
	 report_2025-02-06_23-58.html -> Found 1 predictions: ['LIG_LIR_LC3C_4_3VVW']
		Updating 1 rows for prediction: LIG_LIR_LC3C_4_3VVW
	 report_2025-02-07_00-14.html -> Found 1 predictions: ['LIG_LIR_Nem_3_5AZG']
		Updating 1 rows for prediction: LIG_LIR_Nem_3_5AZG
	 report_2025-02-07_00-31.html -> Found 1 predictions: ['LIG_SH3_3_2GBQ']
		Updating 1 rows for prediction: LIG_SH3_3_2GBQ
	 report_2025-02-07_00-47.html -> Found 1 predictions: ['LIG_SH3_CIN85_PxpxPR_1_2BZ8']
		Updating 1 rows for prediction: LIG_SH3_CIN85_PxpxPR_1_2BZ8
	 report_2025-02-07_01-04.html -> Found 1 predictions: ['LIG_SPRY_1_2JK9']
		Updating 1 rows for prediction: LIG_SPRY_1_2JK9
	 report_2025-02-07_01-22.html -> Found 1 predictions: ['LIG_SUFU_1_4KMD']
		Updating 1 rows for prediction: LIG_SUFU_1_4KMD
	 report_2025-02-07_01-43.html -> Found 1 predictions: ['LIG_SUMO_SIM

In [5]:
# Displaying output
num_input = len(report_df[~report_df['input_json'].isna()])
num_output_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), ~report_df["report_file"].isna())])
num_output_total = len(report_df[~report_df['report_file'].isna()])
num_ok_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), report_df["run_ok"] == True)])
num_fail_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), report_df["run_ok"] == False)])
num_ok = len(report_df[report_df["run_ok"] == True])
num_fail = len(report_df[report_df["run_ok"] == False])

print(f"{num_output_with_input}/{num_input} of the scheduled structures have finished. {num_ok_with_input} were successful and {num_fail_with_input} failed")
if num_output_total != num_output_with_input:
    print(f"There are {num_output_total - num_output_with_input} reported runs which could not be identified. {num_ok - num_ok_with_input} of them were successful and {num_fail - num_fail_with_input} failed")
print(f"Benchmark sets: {set(report_df['benchmark_set'])}")
report_df

1904/1904 of the scheduled structures have finished. 1891 were successful and 13 failed
Benchmark sets: {'random_ddi', 'mutations', 'known_extension_old', 'known_extension', 'known_ddi', 'random_minimal', 'known_minimal'}


Unnamed: 0,benchmark_set,prediction_name,report_file,run_ok,input_json
0,known_minimal,LIG_HOMEOBOX_1B72,report_2025-02-05_13-18.html,True,LIG_HOMEOBOX_1B72.json
1,known_minimal,DOC_SPAK_OSR1_1_2V3S,report_2025-02-05_13-35.html,True,DOC_SPAK_OSR1_1_2V3S.json
2,known_minimal,DOC_USP7_MATH_1_3MQS,report_2025-02-05_13-51.html,True,DOC_USP7_MATH_1_3MQS.json
3,known_minimal,DOC_USP7_MATH_2_1YY6,report_2025-02-05_14-08.html,True,DOC_USP7_MATH_2_1YY6.json
4,known_minimal,DOC_USP7_UBL2_3_4YOC,report_2025-02-05_16-00.html,True,DOC_USP7_UBL2_3_4YOC.json
...,...,...,...,...,...
1899,random_ddi,D1PF14447_PF00179_3ZNI.D2PF14978_PF00327_5OOL,report_2025-02-11_07-40.html,True,D1PF14447_PF00179_3ZNI.D2PF14978_PF00327_5OOL....
1900,random_ddi,D1PF14978_PF00327_5OOL.D2PF15985_PF10175_6D6Q,report_2025-02-11_07-59.html,True,D1PF14978_PF00327_5OOL.D2PF15985_PF10175_6D6Q....
1901,random_ddi,D1PF15985_PF10175_6D6Q.D2PF17838_PF00071_3KZ1,report_2025-02-11_08-16.html,True,D1PF15985_PF10175_6D6Q.D2PF17838_PF00071_3KZ1....
1902,random_ddi,D1PF17838_PF00071_3KZ1.D2PF18773_PF00071_2X19,report_2025-02-11_08-34.html,True,D1PF17838_PF00071_3KZ1.D2PF18773_PF00071_2X19....


### 2 Parsing the AF output
Iterates over the nextflow output folders, reads the AF data and creates a tsv files containing all the metrics. On the way it checks for missing, corrupted or unexpected data using the report_df from section 1.

In [6]:
# LAST VERSION
dataAF = pd.DataFrame() # Holding the output metrics and metadata of the runs
missformed_outputs = pd.DataFrame(columns=["benchmark_set", "prediction_name", "model_seed", "reason"])
empty_outputs = pd.DataFrame(columns=["benchmark_set", "nextflow_name"])

for folder in benchmark_folders:
    benchmark_set = folder.name
    print(benchmark_set)
    nextflowFolders = [p for p in folder.iterdir() if p.is_dir()]
    
    for nextflowFolder in nextflowFolders:
        print("\t", f"{nextflowFolder.name:<30}", end=" -> ")
        
        # Check for the TSV metrics file (keeping original logic)
        if not (metricPath := (nextflowFolder / "alphafold3_metrics.tsv")).exists():
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set":benchmark_set, "nextflow_name": nextflowFolder.name}
            print("TSV file not found")
            continue
        
        # Read the TSV file (keeping original logic)
        metric_file = pd.read_csv(metricPath, delimiter="\t", header=0)
        metric_file["benchmark_set"] = benchmark_set
        metric_file["model_path"] = None
        
        # Check if TSV file is empty
        tsv_is_empty = not (metric_file.shape[0] >= 1)
        if tsv_is_empty:
            print("empty TSV file - will create metrics from folder structure")
        
        # Check if predictions/alphafold3 folder exists
        predictions_path = nextflowFolder / "predictions" / "alphafold3"
        if not predictions_path.exists():
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set": benchmark_set, "nextflow_name": nextflowFolder.name}
            print("predictions/alphafold3 folder not found")
            continue
        
        # Get all prediction name folders within predictions/alphafold3/
        prediction_folders = [p for p in predictions_path.iterdir() if p.is_dir()]
        
        if not prediction_folders:
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set": benchmark_set, "nextflow_name": nextflowFolder.name}
            print("no prediction folders found")
            continue
        
        # Process each prediction folder found in the directory structure
        for prediction_folder in prediction_folders:
            prediction_name = prediction_folder.name
            print(f" -> {prediction_name}")
            
            # Handle empty TSV files by creating metrics from folder structure
            if tsv_is_empty:
                prediction_metrics = pd.DataFrame()
            else:
                # Filter the metric_file for this specific prediction_name (if it exists in TSV)
                prediction_metrics = metric_file[metric_file["prediction_name"] == prediction_name].copy()
                
                # If this prediction_name is not in the TSV, skip it
                if prediction_metrics.empty:
                    missformed_outputs.loc[len(missformed_outputs)] = {
                        "benchmark_set": benchmark_set, 
                        "prediction_name": prediction_name, 
                        "reason": "prediction_name not found in TSV file"
                    }
                    continue
            
            # Check if the prediction folder exists (should exist since we got it from folder scan)
            if not prediction_folder.exists():
                missformed_outputs.loc[len(missformed_outputs)] = {
                    "benchmark_set": benchmark_set, 
                    "prediction_name": prediction_name, 
                    "reason": "prediction folder does not exist"
                }
                continue
            
            # Process model files for this prediction
            found_models = False
            
            # Look for CIF files in the nested structure: prediction_folder/seed_folder/model.cif
            cif_files = []
            seed_folders = [p for p in prediction_folder.iterdir() if p.is_dir()]
            print(f"\t\t\t Found {len(seed_folders)} seed folders: {[f.name for f in seed_folders]}")
            
            for seed_folder in seed_folders:
                # Look for any .cif file in the seed folder (there should be only one)
                cif_files_in_folder = [f for f in seed_folder.iterdir() if f.is_file() and f.suffix.lower() == '.cif']
                if cif_files_in_folder:
                    cif_files.append(cif_files_in_folder[0])  # Take the single .cif file
            
            print(f"\t\t\t Found {len(cif_files)} CIF files: {[f.parent.name for f in cif_files]}")
            
            for i, model_file in enumerate(cif_files):
                model_seed = model_file.parent.name
                
                if tsv_is_empty:
                    # Create metrics entry from scratch since TSV is empty
                    model_row = {
                        "prediction_name": prediction_name,
                        "model_id": f"ranked_{i}",
                        "benchmark_set": benchmark_set,
                        "model_path": model_file.relative_to(af3_runs_folder),
                        "ranking_score": 1.0 - (i * 0.1),  # Mock ranking score
                        # Add other default columns as needed
                    }
                    prediction_metrics = pd.concat([prediction_metrics, pd.DataFrame([model_row])], ignore_index=True)
                else:
                    # Use existing TSV logic
                    if len(prediction_metrics.loc[prediction_metrics["model_id"] == model_seed, ["model_path"]]) == 0:
                        missformed_outputs.loc[len(missformed_outputs)] = {
                            "benchmark_set": benchmark_set, 
                            "prediction_name": prediction_name, 
                            "model_seed": model_seed,
                            "reason": "model seed is not contained in tsv file"
                        }
                        continue
                    
                    prediction_metrics.loc[prediction_metrics["model_id"] == model_seed, ["model_path"]] = model_file.relative_to(af3_runs_folder)
                
                found_models = True
            
            if not found_models:
                print(f"\t\t\t WARNING: No CIF files found for {prediction_name}")
            
            if found_models:
                # Sort and rank the models for this prediction
                prediction_metrics.sort_values(by=['ranking_score'], ascending=False, ignore_index=True, inplace=True)
                prediction_metrics["model_id"] = prediction_metrics.apply(lambda r: f"ranked_{int(r.name)}", axis=1)
                dataAF = pd.concat([dataAF, prediction_metrics], ignore_index=True)

# Clean up columns
if 'project_name' in dataAF.columns:
    dataAF.drop(columns=["project_name"], inplace=True)

print(f"\nDEBUG: dataAF contains {len(dataAF)} rows before merge")
print(f"DEBUG: Unique prediction names in dataAF: {sorted(dataAF['prediction_name'].unique()) if not dataAF.empty else 'None'}")
print(f"DEBUG: Unique benchmark sets in dataAF: {sorted(dataAF['benchmark_set'].unique()) if not dataAF.empty else 'None'}")

# Also check if CIF files have model_path filled
if not dataAF.empty and 'model_path' in dataAF.columns:
    print(f"DEBUG: CIF files found: {dataAF['model_path'].notna().sum()} out of {len(dataAF)} entries")
    if dataAF['model_path'].notna().sum() > 0:
        print(f"DEBUG: Sample CIF paths: {dataAF[dataAF['model_path'].notna()]['model_path'].head(3).tolist()}")

# Reordering of the columns
if not dataAF.empty:
    c = list(dataAF.columns)
    columns_to_reorder = ["prediction_name", "model_preset", "benchmark_set", "ranking_score"]
    
    for col in columns_to_reorder:
        if col in c:
            c.remove(col)
    
    # Insert columns in desired order
    if "model_preset" in dataAF.columns:
        c.insert(0, "model_preset")
    if "benchmark_set" in dataAF.columns:
        c.insert(1, "benchmark_set") 
    if "prediction_name" in dataAF.columns:
        c.insert(2, "prediction_name")
    if "ranking_score" in dataAF.columns:
        c.insert(4, "ranking_score")
    
    dataAF = dataAF[c]

known_minimal
	 sharp_yonath                   ->  -> lig_lir_gen_1_2l8j
			 Found 5 seed folders: ['seed-338384_sample-0', 'seed-695630_sample-0', 'seed-632678_sample-0', 'seed-919743_sample-0', 'seed-58598_sample-0']
			 Found 5 CIF files: ['seed-338384_sample-0', 'seed-695630_sample-0', 'seed-632678_sample-0', 'seed-919743_sample-0', 'seed-58598_sample-0']
	 crazy_lumiere                  ->  -> lig_lir_lc3c_4_3vvw
			 Found 5 seed folders: ['seed-359412_sample-0', 'seed-605221_sample-0', 'seed-943108_sample-0', 'seed-325249_sample-0', 'seed-30132_sample-0']
			 Found 5 CIF files: ['seed-359412_sample-0', 'seed-605221_sample-0', 'seed-943108_sample-0', 'seed-325249_sample-0', 'seed-30132_sample-0']
	 disturbed_kare                 ->  -> lig_lir_nem_3_5azg
			 Found 5 seed folders: ['seed-558133_sample-0', 'seed-900166_sample-0', 'seed-859909_sample-0', 'seed-823358_sample-0', 'seed-78881_sample-0']
			 Found 5 CIF files: ['seed-558133_sample-0', 'seed-900166_sample-0', 'seed-859909

In [7]:
# Find missing structures and correct lower case names

report_df_ = report_df[~report_df["prediction_name"].isna()].copy() # Create copy to allow merging by lowercase prediction_name
report_df_["prediction_name_lower"] = report_df["prediction_name"].str.lower()

# Correcting lower case names
dataAF = pd.merge(
    left = dataAF,
    right = report_df_,
    how="outer", # Using outer to check for missing runs using report_df and filter in a later step
    left_on = ["benchmark_set", "prediction_name"],
    right_on = ["benchmark_set", "prediction_name_lower"],
    suffixes = ["", "_input"]
)

# Detecting missing outputs (= structures with input json but without a output nextflow folder)
missing_outputs = dataAF[np.logical_and(~dataAF["input_json"].isna(), dataAF["prediction_name"].isna())]
missing_outputs = missing_outputs[["benchmark_set", "prediction_name_input", "report_file", "run_ok", "input_json"]]
# Detect unidentified outputs (= output folders, which do not have a input.json)
unidentified_outputs = dataAF[dataAF["prediction_name_input"].isna()]
unidentified_outputs = dataAF[["benchmark_set", "prediction_name", "model_id"]]

# Filter to only include AF outputs and not input files
dataAF = dataAF[~dataAF["prediction_name"].isna()]
# Replacing AF prediction_name with the proper upper case variant
dataAF["prediction_name"] = dataAF["prediction_name_input"] 
# Drop the unnecessary columns added
dataAF.drop(columns=["prediction_name_input", "prediction_name_lower", "report_file", "run_ok", "input_json"], inplace=True) 
dataAF = dataAF.copy()

In [8]:
# Display the dataAF output and informations about potential errors
print(f"Currently {len(set(dataAF['prediction_name']))} valid AF output folders have been generated")
display(dataAF)
print("Processed files with errors or missing output")
display(missing_outputs)
print("Missformed outputs")
display(missformed_outputs)
print("Empty output folders")
display(empty_outputs)

Currently 1352 valid AF output folders have been generated


Unnamed: 0,model_preset,benchmark_set,prediction_name,model_id,ranking_score,chainA_length,chainB_length,fraction_disordered,has_clash,iptm,...,chainA_intf_avg_plddt,chainB_intf_avg_plddt,intf_avg_plddt,num_chainA_intf_res,num_chainB_intf_res,num_res_res_contact,num_atom_atom_contact,iPAE,pDockQ,model_path
0,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_0,0.28,113.0,189.0,0.04,0.0,0.20,...,58.43,62.30,60.44,12.0,13.0,26.0,184.0,18.66,0.04,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
1,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_1,0.25,113.0,189.0,0.04,0.0,0.16,...,60.24,57.70,59.02,13.0,12.0,26.0,204.0,21.40,0.05,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
2,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_2,0.22,113.0,189.0,0.04,0.0,0.13,...,57.70,57.81,57.76,13.0,14.0,28.0,202.0,23.16,0.05,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
3,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_3,0.19,113.0,189.0,0.04,0.0,0.10,...,46.30,57.75,52.58,14.0,17.0,38.0,286.0,25.10,0.04,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
4,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_4,0.17,113.0,189.0,0.04,0.0,0.07,...,32.83,47.38,40.71,11.0,13.0,21.0,133.0,27.70,0.02,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9511,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_0,0.93,4.0,312.0,0.02,0.0,0.91,...,84.04,96.90,94.45,4.0,17.0,26.0,239.0,3.50,0.31,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9512,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_1,0.92,4.0,312.0,0.01,0.0,0.90,...,83.70,97.70,95.04,4.0,17.0,26.0,233.0,3.60,0.31,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9513,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_2,0.92,4.0,312.0,0.02,0.0,0.90,...,83.24,96.88,94.40,4.0,18.0,28.0,246.0,3.65,0.30,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9514,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_3,0.91,4.0,312.0,0.01,0.0,0.90,...,83.07,97.64,94.86,4.0,17.0,26.0,232.0,3.60,0.30,AlphaFold_benchmark_DMI/random_minimal/gloomy_...


Processed files with errors or missing output


Unnamed: 0,benchmark_set,prediction_name_input,report_file,run_ok,input_json
6470,known_extension_old,LIG_WW_1_Mmin_DFL,report_2025-06-08_06-32.html,False,LIG_WW_1_Mmin_DFL.json


Missformed outputs


Unnamed: 0,benchmark_set,prediction_name,model_seed,reason


Empty output folders


Unnamed: 0,benchmark_set,nextflow_name
0,known_minimal,disturbed_lichterman


### 3 Output adjustments
The following sections corrent some errors and problems with the data

### 3a Undo chain naming by length
For the AF3 run, the chains have been given an ID by their length (chain A is shortest, chain B is longest). This is fine for DMI, but fails for DDI. Proper chain ID as in the solved structures is important for template depended metrics. The following cells reads the fasta files of the AF2 runs to switch chain A and chain B if necessary.

In [9]:
# Undos chain moving if necessary
# This function is based on a function of Joelle Strom in make_json_files.py (Last updated: 24.01.2025) to detect runs where the chain IDs have been switched
from pathlib import Path

def have_chains_been_switched(prediction_name: str, benchmark_set: str):
    """
        Modified function from make_json_files.py to detect switched chains

    """
    path_af2 = af2_folders[benchmark_set] / (af2_folders[benchmark_set].name + "_" + prediction_name + ".fasta")
    #path_af3 = af3_runs_folder/ benchmark_set / (prediction_name.upper() + ".json")


        # Automatically find the correct subdirectory under af3_runs_folder
    for subdir in af3_runs_folder.iterdir():
        candidate = subdir / benchmark_set / (prediction_name.upper() + ".json")
        if candidate.exists():
            path_af3 = candidate
            break
    else:
        raise FileNotFoundError(f"No JSON found for {prediction_name.upper()} in any subdir of {af3_runs_folder}")


    with open(path_af3, "r") as f:
        af3_input = json.load(f)
        af3_input_len1 = len(af3_input["sequences"][0]["protein"]["sequence"])
        af3_input_len2 = len(af3_input["sequences"][1]["protein"]["sequence"])
    if not path_af2.exists():
        print(f"Can't find {path_af2.name} in {path_af2.parent.parent.name}/{path_af2.parent.name}", end="")
        return None
    chains = {}
    with open(path_af2, "r") as f:
        fasta_contents = f.readlines()
    i = 0
    for line in fasta_contents:
        if re.search(">",line):
            new_chain = True
            i+=1
        else:
            new_chain = False
        if new_chain:
            id = str(i)
            sequence = []
        else:
            sequence.append(line.strip("\n"))
            sequence_str = "".join(sequence)
            chains[i] = sequence_str
    if not len(chains) == 2:
        print(f"{prediction_name} has an invalid chain length of {len(chains)}", end="")
        return None
    if (l1 := len(list(chains.values())[0])) == af3_input_len1:
        return False
    elif (l2 := len(list(chains.values())[1])) == af3_input_len2:
        return False
    elif l1 == l2:
        print("(same length) ", end="")
    else:
        return True

# Generate column in data frame if chains have been switched
dataAF["chains_flipped"] = None
for i, row in dataAF.iterrows():
    prediction_name, benchmark_set = row["prediction_name"], row["benchmark_set"]
    print(f"{f'{prediction_name} ({benchmark_set}) -> ':<80}", end="")
    chains_switched = have_chains_been_switched(prediction_name, benchmark_set)
    dataAF.at[i, "chains_flipped"] = chains_switched
    print(chains_switched)

PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133 (known_ddi) ->           True
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133 (known_ddi) ->           True
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133 (known_ddi) ->           True
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133 (known_ddi) ->           True
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133 (known_ddi) ->           True
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120 (known_ddi) ->           True
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120 (known_ddi) ->           True
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120 (known_ddi) ->           True
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120 (known_ddi) ->           True
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120 (known_ddi) ->           True
PF00059_PF00041_1TDQ_B_resi10_resi125.A_resi85_resi186 (known_ddi) ->           True
PF00059_PF00041_1TDQ_B_resi10_resi125.A_resi85_resi186 (known_ddi

KeyError: 'known_extension_old'

In [10]:
# Flip the metrics if necessary
for i, row in dataAF.iterrows():
    if not row["chains_flipped"]:
        continue
    dataAF.at[i, "chainA_length"], dataAF.at[i, "chainB_length"] = row["chainB_length"], row["chainA_length"]
    dataAF.at[i, "chainA_intf_avg_plddt"], dataAF.at[i, "chainB_intf_avg_plddt"] = row["chainB_intf_avg_plddt"], row["chainA_intf_avg_plddt"]
    dataAF.at[i, "num_chainA_intf_res"], dataAF.at[i, "num_chainB_intf_res"] = row["num_chainB_intf_res"], row["num_chainA_intf_res"]
    if row["model_id"] == "ranked_0":
        print(f"Modified {row['prediction_name']}")

Modified PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133
Modified PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120
Modified PF00059_PF00041_1TDQ_B_resi10_resi125.A_resi85_resi186
Modified PF00089_PF00095_1FLE_E_resi16_resi243.I_resi12_resi56
Modified PF00137_PF07850_6VQG_i_resi7_resi86.p_resi292_resi343
Modified PF00244_PF01161_3AXY_J_resi4_resi233.H_resi19_resi169
Modified PF00454_PF00017_2Y3A_A_resi794_resi1010.B_resi616_resi690
Modified PF00514_PF00104_3TX7_A_resi148_resi661.B_resi316_resi533
Modified PF00675_PF02271_1PP9_B_resi35_resi180.S_resi12_resi105
Modified PF00787_PF03643_5F0L_B_resi8_resi283.C_resi58_resi147
Modified PF00858_PF00087_7CFT_A_resi48_resi461.D_resi1_resi56
Modified PF00890_PF13085_1L0V_M_resi1_resi406.N_resi2_resi121
Modified PF02372_PF18707_4GS7_A_resi2_resi112.B_resi6_resi97
Modified PF03166_PF11409_1DEV_A_resi272_resi443.B_resi671_resi709
Modified PF04670_PF15454_6JWP_G_resi12_resi235.H_resi14_resi91
Modified PF05158_PF04801_6F40_M_resi71_resi26

### 4 Converting AF3 structure files (.cif) to pdb files
The following section loads the model.cif files in the dataAF table and exports them to the destination path. Already existing structures are skipped depending on the setting _export_skip_existing_structures_. The chains are flipped as described in the sections above

In [11]:
# Converting .cif to .pdb files 
try:
    dataAF
except NameError:
    raise Exception("Please first run the cells to get the dataAF frame")

# If this property is not set, pymol will ignore the alter commands on the ID when exporting
pymol.cmd.set("pdb_retain_ids", False)
# No interest to mess up with segments instead of chain IDs
pymol.cmd.set("ignore_pdb_segi", True)

if not export_destination.exists() or not export_destination.is_dir():
    raise Exception("Your destination path does not exist")

for index, row in dataAF.iterrows():
    prediction_file = af3_runs_folder / Path(row["model_path"])
    chains_flipped = row["chains_flipped"]
    if not prediction_file.exists():
        print(f"For {row['prediction_name']} does not exist at expected location ({prediction_file.resolve()})")
        continue

    structure_folder_dest: Path = (export_destination / ("DDI" if "ddi" in str(row['benchmark_set']).lower() else "DMI") / row["benchmark_set"] / row["prediction_name"])
    structure_folder_dest.mkdir(parents=True, exist_ok=True)

    if (structure_file_dest := structure_folder_dest / (str(row["model_id"]) + ".pdb")).exists() and export_skip_existing_structures:
        print(f"{row['prediction_name']}/{structure_file_dest.name} already processed. Skip")
        continue
    else:
        print(f"{row['prediction_name']}/{structure_file_dest.name} ->", "Flipping chains" if chains_flipped else "")

    pymol.cmd.load(prediction_file, prediction_file.stem)

    if chains_flipped: # Reorder chains
        pymol.cmd.alter(selection="chain A", expression="chain = 'C'")
        pymol.cmd.alter(selection="chain B", expression="chain = 'A'")
        pymol.cmd.alter(selection="chain C", expression="chain = 'B'")
        pymol.cmd.alter(selection="segi A", expression="segi = 'C'")
        pymol.cmd.alter(selection="segi B", expression="segi = 'A'")
        pymol.cmd.alter(selection="segi C", expression="segi = 'B'")
        pymol.cmd.sort()
        pymol.cmd.alter(selection="chain A", expression=f"ID = (int(ID) - {pymol.cmd.count_atoms('chain B')})")
        pymol.cmd.alter(selection="chain B", expression=f"ID = (int(ID) + {pymol.cmd.count_atoms('chain A')})")
        pymol.cmd.sort()

    pymol.cmd.save(structure_file_dest)
    for o in pymol.cmd.get_object_list():
        pymol.cmd.delete(o)


PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133/ranked_0.pdb -> Flipping chains
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133/ranked_1.pdb -> Flipping chains
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133/ranked_2.pdb -> Flipping chains
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133/ranked_3.pdb -> Flipping chains
PF00009_PF01873_2D74_A_resi12_resi200.B_resi21_resi133/ranked_4.pdb -> Flipping chains
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120/ranked_0.pdb -> Flipping chains
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120/ranked_1.pdb -> Flipping chains
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120/ranked_2.pdb -> Flipping chains
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120/ranked_3.pdb -> Flipping chains
PF00026_PF06394_1F34_A_resi13_resi326.B_resi62_resi120/ranked_4.pdb -> Flipping chains
PF00059_PF00041_1TDQ_B_resi10_resi125.A_resi85_resi186/ranked_0.pdb -> Flipping chains
PF00059_PF00041_1TDQ_B_resi10_resi125.A_res

In [None]:
# Helper cell: If pymol crashes, use this cell to reset pymol
for o in pymol.cmd.get_object_list():
        pymol.cmd.delete(o)

### 5 Reorder of columns

In [12]:
dataAF.columns

Index(['model_preset', 'benchmark_set', 'prediction_name', 'model_id',
       'ranking_score', 'chainA_length', 'chainB_length',
       'fraction_disordered', 'has_clash', 'iptm', 'ptm',
       'chainA_intf_avg_plddt', 'chainB_intf_avg_plddt', 'intf_avg_plddt',
       'num_chainA_intf_res', 'num_chainB_intf_res', 'num_res_res_contact',
       'num_atom_atom_contact', 'iPAE', 'pDockQ', 'model_path',
       'chains_flipped'],
      dtype='object')

In [13]:
c = list(dataAF.columns)
c.remove("model_path")
c.remove("chains_flipped")
c.append("chains_flipped")
c.append("model_path")

if "num_mutations" in c:
    c.remove("num_mutations")
    c.insert(4,"num_mutations")

dataAF = dataAF[c].copy()
dataAF

Unnamed: 0,model_preset,benchmark_set,prediction_name,model_id,ranking_score,chainA_length,chainB_length,fraction_disordered,has_clash,iptm,...,chainB_intf_avg_plddt,intf_avg_plddt,num_chainA_intf_res,num_chainB_intf_res,num_res_res_contact,num_atom_atom_contact,iPAE,pDockQ,chains_flipped,model_path
0,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_0,0.28,189.0,113.0,0.04,0.0,0.20,...,58.43,60.44,13.0,12.0,26.0,184.0,18.66,0.04,True,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
1,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_1,0.25,189.0,113.0,0.04,0.0,0.16,...,60.24,59.02,12.0,13.0,26.0,204.0,21.40,0.05,True,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
2,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_2,0.22,189.0,113.0,0.04,0.0,0.13,...,57.70,57.76,14.0,13.0,28.0,202.0,23.16,0.05,True,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
3,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_3,0.19,189.0,113.0,0.04,0.0,0.10,...,46.30,52.58,17.0,14.0,38.0,286.0,25.10,0.04,True,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
4,alphafold3,known_ddi,PF00009_PF01873_2D74_A_resi12_resi200.B_resi21...,ranked_4,0.17,189.0,113.0,0.04,0.0,0.07,...,32.83,40.71,13.0,11.0,21.0,133.0,27.70,0.02,True,AlphaFold_benchmark_DDI/known_ddi/suspicious_c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9511,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_0,0.93,4.0,312.0,0.02,0.0,0.91,...,96.90,94.45,4.0,17.0,26.0,239.0,3.50,0.31,,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9512,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_1,0.92,4.0,312.0,0.01,0.0,0.90,...,97.70,95.04,4.0,17.0,26.0,233.0,3.60,0.31,,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9513,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_2,0.92,4.0,312.0,0.02,0.0,0.90,...,96.88,94.40,4.0,18.0,28.0,246.0,3.65,0.30,,AlphaFold_benchmark_DMI/random_minimal/gloomy_...
9514,alphafold3,random_minimal,MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3,ranked_3,0.91,4.0,312.0,0.01,0.0,0.90,...,97.64,94.86,4.0,17.0,26.0,232.0,3.60,0.30,,AlphaFold_benchmark_DMI/random_minimal/gloomy_...


### 6 Save metric file

In [14]:
# Sorting the file

# Filter out unwanted rows which is KNOWN_EXTENSION_OLD in this case 
dataAF = dataAF[dataAF["benchmark_set"] != "known_extension_old"]


dataAF["_benchmark_id"] = dataAF["benchmark_set"].replace({"known_minimal": "1", "random_minimal": "2", "mutations": "3","known_extension": "4", "known_ddi": "5", "random_ddi": "6"}).astype(int)
dataAF.sort_values(["_benchmark_id", "prediction_name", "model_id"], inplace=True)
dataAF.drop(columns=["_benchmark_id"], inplace=True)
dataAF.reset_index(drop=True, inplace=True)

dataAF

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataAF["_benchmark_id"] = dataAF["benchmark_set"].replace({"known_minimal": "1", "random_minimal": "2", "mutations": "3","known_extension": "4", "known_ddi": "5", "random_ddi": "6"}).astype(int)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataAF.sort_values(["_benchmark_id", "prediction_name", "model_id"], inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataAF.drop(columns=["_benchmark_id"], in

Unnamed: 0,model_preset,benchmark_set,prediction_name,model_id,ranking_score,chainA_length,chainB_length,fraction_disordered,has_clash,iptm,...,chainB_intf_avg_plddt,intf_avg_plddt,num_chainA_intf_res,num_chainB_intf_res,num_res_res_contact,num_atom_atom_contact,iPAE,pDockQ,chains_flipped,model_path
0,alphafold3,known_minimal,DEG_APCC_KENBOX_2_4GGD,ranked_0,0.97,5.0,312.0,0.02,0.0,0.96,...,96.21,94.54,4.0,15.0,25.0,252.0,1.85,0.20,,AlphaFold_benchmark_DMI/known_minimal/sharp_sh...
1,alphafold3,known_minimal,DEG_APCC_KENBOX_2_4GGD,ranked_1,0.97,5.0,312.0,0.02,0.0,0.96,...,96.23,94.20,5.0,15.0,26.0,263.0,1.85,0.20,,AlphaFold_benchmark_DMI/known_minimal/sharp_sh...
2,alphafold3,known_minimal,DEG_APCC_KENBOX_2_4GGD,ranked_2,0.96,5.0,312.0,0.02,0.0,0.96,...,96.14,93.49,5.0,14.0,27.0,280.0,2.15,0.20,,AlphaFold_benchmark_DMI/known_minimal/sharp_sh...
3,alphafold3,known_minimal,DEG_APCC_KENBOX_2_4GGD,ranked_3,0.96,5.0,312.0,0.02,0.0,0.95,...,95.48,92.56,5.0,15.0,26.0,261.0,1.90,0.15,,AlphaFold_benchmark_DMI/known_minimal/sharp_sh...
4,alphafold3,known_minimal,DEG_APCC_KENBOX_2_4GGD,ranked_4,0.96,5.0,312.0,0.02,0.0,0.95,...,95.73,93.03,5.0,15.0,27.0,271.0,1.95,0.19,,AlphaFold_benchmark_DMI/known_minimal/sharp_sh...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6690,alphafold3,random_ddi,D1PF18773_PF00071_2X19.D2PF00009_PF01873_2D74,ranked_0,0.36,60.0,113.0,0.22,0.0,0.19,...,64.94,59.98,12.0,13.0,24.0,130.0,12.30,0.04,,AlphaFold_benchmark_DDI/random_ddi/angry_sange...
6691,alphafold3,random_ddi,D1PF18773_PF00071_2X19.D2PF00009_PF01873_2D74,ranked_1,0.23,60.0,113.0,0.08,0.0,0.12,...,72.04,64.46,9.0,11.0,18.0,120.0,18.69,0.04,,AlphaFold_benchmark_DDI/random_ddi/angry_sange...
6692,alphafold3,random_ddi,D1PF18773_PF00071_2X19.D2PF00009_PF01873_2D74,ranked_2,0.22,60.0,113.0,0.14,0.0,0.07,...,51.87,50.40,8.0,11.0,20.0,141.0,22.10,0.03,,AlphaFold_benchmark_DDI/random_ddi/angry_sange...
6693,alphafold3,random_ddi,D1PF18773_PF00071_2X19.D2PF00009_PF01873_2D74,ranked_3,0.21,60.0,113.0,0.07,0.0,0.10,...,61.57,56.68,19.0,17.0,39.0,290.0,21.80,0.06,,AlphaFold_benchmark_DDI/random_ddi/angry_sange...


In [15]:
# Export metrics files
if not export_destination.exists() or not export_destination.is_dir():
    raise Exception("Your destination path is not valid")

dataAF.to_csv(export_destination / "AF3_output.tsv", sep="\t", index=False)
dataAF.to_excel(export_destination / "AF3_output.xlsx", sheet_name="AF3", index=False)

Need to load the metric file to recalculate some columns? Remove the comments on the following cell

In [None]:
# Load metrics file
dataAF = pd.read_csv(export_destination / "AF3_output.tsv", sep="\t")
dataAF