# Parsing output of AF3 from the cluster
Created by Andreas, 2025-02-04

Script to parse the output of AF3 runs on the IMB server cluster. It also detects failed runs using the reports created by Nextflow.

This notebook is meant to be run cell by cell from top to bottom. The runtime was around 35 minutes, but most of it (I estimate 90%) comes from accessing the files on a network drive.

### 0 Settings + Imports

Run the following cells and edit the necessary paths and settings in the second cell.

Note: In this run, the known_extension structures have been excluded. To include them, change the code in the _# Loading the output folders_ cell in this section and in the first cell of section 3.

In [1]:
# Imports
from pathlib import Path
import pandas as pd
import numpy as np
import re
import json

import pymol
from Bio.PDB import PDBParser
from Bio.PDB.Structure import Structure as BioPy_PDBStructure
from Bio.PDB.Model import Model as BioPy_PDBModel
from Bio.PDB.PDBExceptions import PDBConstructionException
parser = PDBParser(QUIET=True)

ressources_path = Path("../ressources").resolve()


In [2]:
# Settings

# The base folder of the AF output. The AF3 files are searched inside /Alpha
luck_drive_folder = Path("/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold") 

# The folder to export the output
export_destination = Path("/Users/imb/Desktop")

# Set this option to skip existing structures
export_skip_existing_structures = False

In [9]:
# Loading the output folders
# Note: The known_extension structures should not be considered, so exclude them explicitly
af3_runs_folder = luck_drive_folder / "AlphaFold3"
DMI_folders = [af3_runs_folder / "AlphaFold_benchmark_DMI" / "random_minimal"]
#DDI_folders = [p for p in (af3_runs_folder / "AlphaFold_benchmark_DDI").iterdir() if p.is_dir()]
benchmark_folders = DMI_folders #+ DDI_folders

# Load also the AF2 folder to get the original input files without switched chains to undo the chain switching
af2_folders = { #"known_minimal": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run37", # DMI known minimals
                "random_minimal": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run38", # DMI random minimals
                #"known_extension": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run60", # DMI known extensions
                #"mutations": luck_drive_folder / "AlphaFold_benchmark_DMI" / "run43", # DMI mutations
                #"known_ddi": luck_drive_folder / "AlphaFold_benchmark_DDI" / "run5", # DDI known DDI
                #"random_ddi": luck_drive_folder / "AlphaFold_benchmark_DDI" / "run6", # DDI random DDI
}

print("Folders with AF3 output:")
for p in benchmark_folders:
    if not p.exists():
        print(f"{p} does not exist")
    else:
        print(p)
        
print("Folders with AF2 output")
for p in af2_folders.values():
    if not p.exists():
        print(f"{p} does not exist")
    else:
        print(p)


Folders with AF3 output:
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold3/AlphaFold_benchmark_DMI/random_minimal
Folders with AF2 output
/Volumes/imb-luckgr/imb-luckgr2/projects/AlphaFold/AlphaFold_benchmark_DMI/run38


### 1 Scanning input .json files and report.html files
Scans for all input .json files and the corresponding report_%time%.html files to find failed runs.

* **benchmark_set** refers to the pairing method (mutated, randomized, ...) and is equal to the folder name (example: known_minimal)
* **prediction_name** is extracted from a) the json file or b) the report_%time%.html file. If the value can't be extracted from report_%time%.html, it is set to None
* **report_file** refers to the name of the report_%time%.html file and is None if the input json file could not be matched with a report file (--> the input json has not been run on the cluster \[yet\])
* **run_ok** indicates whether the input file ran on the cluster without errors (True) or failed (False). It is set to None if the input file has not been run on the cluster.
* **input_json** is set to the file name of the input json, or to None if the report_%time%.html could not be matched with an input json file.

Use <i>report_file.isna()</i> or <i>run_ok.isna()</i> to identify scheduled but not yet run structures.<br>Use <i>input_json.isna()</i> to find runs without an input file.
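In pandas, comparing a column element-wise with `None` returns False for every row, so empty entries in report_df have to be selected with `isna()`. A minimal sketch on a hypothetical three-row stand-in (`toy_report_df`, named so it does not overwrite the notebook's report_df):

```python
import pandas as pd

# Hypothetical stand-in for report_df with the same columns as in the notebook
toy_report_df = pd.DataFrame(
    [
        {"benchmark_set": "random_minimal", "prediction_name": "A.B", "report_file": "report_1.html", "run_ok": True, "input_json": "A.B.json"},
        {"benchmark_set": "random_minimal", "prediction_name": "C.D", "report_file": None, "run_ok": None, "input_json": "C.D.json"},
        {"benchmark_set": "random_minimal", "prediction_name": None, "report_file": "report_2.html", "run_ok": False, "input_json": None},
    ]
)

# Element-wise "== None" never matches, even though one row has no report file
print((toy_report_df["report_file"] == None).sum())  # 0

# Scheduled but not yet run (no report found for the input json)
not_yet_run = toy_report_df[toy_report_df["report_file"].isna()]
# Reported runs that could not be matched to an input json
runs_without_input = toy_report_df[toy_report_df["input_json"].isna()]
# Failed runs; rows with run_ok = None are excluded by the comparison
failed_runs = toy_report_df[toy_report_df["run_ok"] == False]
```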

In [10]:
# Scanning input and report files
report_df = pd.DataFrame(columns=["benchmark_set", "prediction_name", "report_file", "run_ok", "input_json"])


for folder in benchmark_folders:
    benchmark_set = folder.name
    print(benchmark_set)
    # The f.is_file() check is deliberately omitted in the following lines, as it decreases performance massively on a network drive (example runtime: 0.9 s --> 1 min 25.6 s (!))
    nextflow_inputs = [f for f in folder.iterdir() if f.suffix.lower() == ".json"]
    for nextflow_input in nextflow_inputs:
        prediction_name = nextflow_input.stem
        report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "prediction_name": prediction_name, "report_file": None, "run_ok": None, "input_json": nextflow_input.name}
    
    for p in [f for f in folder.iterdir() if  "report_" in f.stem and f.suffix.lower() == ".html"]:
        print("\t", p.name, end=" ")
        with open(p) as f:
            content = f.read()
        
        # Extract all prediction names from the HTML
        prediction_names = []
        finished = bool("Workflow execution completed successfully!" in content)
        
        # Pattern 1: [id:[LIST_OF_NAMES]] - extract all names from the list
        list_matches = re.findall(r"\[id:\[([^\]]+)\]\]", content)
        for match in list_matches:
            # Split by comma and clean up each name
            names = [name.strip() for name in match.split(',')]
            prediction_names.extend(names)
        
        # Pattern 2: [id:single_name, model:something] - extract single names
        single_matches = re.findall(r"\[id:([^,\]]+)(?:,|\])", content)
        prediction_names.extend(single_matches)
        
        # Clean up prediction names: remove leading brackets, strip whitespace, normalize case
        cleaned_names = []
        for name in prediction_names:
            # Remove leading bracket if present
            cleaned_name = name.lstrip('[').strip()
            if cleaned_name:  # Only add non-empty names
                cleaned_names.append(cleaned_name)
        
        # Remove duplicates by converting to lowercase, then keep original case for first occurrence
        seen_lower = set()
        unique_names = []
        for name in cleaned_names:
            name_lower = name.lower()
            if name_lower not in seen_lower:
                seen_lower.add(name_lower)
                unique_names.append(name)
        
        prediction_names = unique_names
        
        if prediction_names:
            print(f"-> Found {len(prediction_names)} predictions: {prediction_names[:3]}{'...' if len(prediction_names) > 3 else ''}")
            
            # Track how many predictions we successfully matched
            matched_predictions = 0
            
            # Update all matching rows for each prediction found in this report
            for prediction_name in prediction_names:
                # Case-insensitive matching: compare lowercase versions
                matching_mask = np.logical_and(
                    report_df["benchmark_set"] == benchmark_set, 
                    report_df["prediction_name"].str.lower() == prediction_name.lower()
                )
                num_matches = len(report_df.loc[matching_mask])
                
                if num_matches > 0:
                    matched_predictions += 1
                    print(f"\t\tUpdating {num_matches} rows for prediction: {prediction_name}")
                    report_df.loc[matching_mask, "run_ok"] = finished
                    report_df.loc[matching_mask, "report_file"] = p.name
                # Note: We don't create new rows for unmatched predictions to avoid duplicates
            
            # If no predictions matched any JSON files, create a single row with None
            if matched_predictions == 0:
                print("\t\tNo JSON files found for any predictions in this report")
                report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "report_file": p.name, "prediction_name": None, "run_ok": finished, "input_json": None}
                
        else:
            print("-> No predictions found")
            # Create entry with None prediction_name (same as original behavior)
            report_df.loc[len(report_df)] = {"benchmark_set": benchmark_set, "report_file": p.name, "prediction_name": None, "run_ok": finished, "input_json": None}

random_minimal
	 report_2025-02-12_23-07.html -> Found 1 predictions: ['MDEG_SCF_COI1_1_3OGL.DLIG_CaM_IQ_9_2IX7']
		Updating 1 rows for prediction: MDEG_SCF_COI1_1_3OGL.DLIG_CaM_IQ_9_2IX7
	 report_2025-02-12_23-26.html -> Found 1 predictions: ['MDEG_SCF_FBXO31_1_5VZU.DLIG_TRAF2_2_1CZY']
		Updating 1 rows for prediction: MDEG_SCF_FBXO31_1_5VZU.DLIG_TRAF2_2_1CZY
	 report_2025-02-12_23-44.html -> Found 1 predictions: ['MDEG_SCF_TIR1_1_2P1Q.DLIG_ULM_U2AF65_1_1O0P']
		Updating 1 rows for prediction: MDEG_SCF_TIR1_1_2P1Q.DLIG_ULM_U2AF65_1_1O0P
	 report_2025-02-13_00-02.html -> Found 1 predictions: ['MDEG_SIAH_1_2A25.DLIG_CAP-Gly_2_3RDV']
		Updating 1 rows for prediction: MDEG_SIAH_1_2A25.DLIG_CAP-Gly_2_3RDV
	 report_2025-02-13_00-18.html -> Found 1 predictions: ['MDEG_SPOP_SBC_1_3HQM.DDOC_MIT_MIM_1_2JQ9']
		Updating 1 rows for prediction: MDEG_SPOP_SBC_1_3HQM.DDOC_MIT_MIM_1_2JQ9
	 report_2025-02-13_00-35.html -> Found 1 predictions: ['MDOC_AGCK_PIF_3_1ATP.DDEG_Kelch_Keap1_1_2FLU']
		Updating

In [11]:
# Displaying output
num_input = len(report_df[~report_df['input_json'].isna()])
num_output_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), ~report_df["report_file"].isna())])
num_output_total = len(report_df[~report_df['report_file'].isna()])
num_ok_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), report_df["run_ok"] == True)])
num_fail_with_input = len(report_df[np.logical_and(~report_df["input_json"].isna(), report_df["run_ok"] == False)])
num_ok = len(report_df[report_df["run_ok"] == True])
num_fail = len(report_df[report_df["run_ok"] == False])

print(f"{num_output_with_input}/{num_input} of the scheduled structures have finished. {num_ok_with_input} were successful and {num_fail_with_input} failed")
if num_output_total != num_output_with_input:
    print(f"There are {num_output_total - num_output_with_input} reported runs which could not be identified. {num_ok - num_ok_with_input} of them were successful and {num_fail - num_fail_with_input} failed")
print(f"Benchmark sets: {set(report_df['benchmark_set'])}")
report_df

136/136 of the scheduled structures have finished. 136 were successful and 0 failed
Benchmark sets: {'random_minimal'}


| | benchmark_set | prediction_name | report_file | run_ok | input_json |
|---|---|---|---|---|---|
| 0 | random_minimal | MDOC_CYCLIN_RxL_1_1H25.DLIG_SPRY_1_2JK9 | report_2025-02-11_14-07.html | True | MDOC_CYCLIN_RxL_1_1H25.DLIG_SPRY_1_2JK9.json |
| 1 | random_minimal | MDOC_GSK3_Axin_1_1O9U.DDEG_SPOP_SBC_1_3HQM | report_2025-02-11_15-06.html | True | MDOC_GSK3_Axin_1_1O9U.DDEG_SPOP_SBC_1_3HQM.json |
| 2 | random_minimal | MDOC_MAPK_DCC_7_2B9J.DLIG_ANK_PxLPxL_1_3UXG | report_2025-02-11_15-23.html | True | MDOC_MAPK_DCC_7_2B9J.DLIG_ANK_PxLPxL_1_3UXG.json |
| 3 | random_minimal | MDOC_MAPK_GRA24_9_5ETA.DLIG_PCNA_APIM_2_5MLW | report_2025-02-11_15-47.html | True | MDOC_MAPK_GRA24_9_5ETA.DLIG_PCNA_APIM_2_5MLW.json |
| 4 | random_minimal | MDOC_MAPK_HePTP_8_2GPH.DLIG_PCNA_yPIPBox_3_1SXJ | report_2025-02-11_16-05.html | True | MDOC_MAPK_HePTP_8_2GPH.DLIG_PCNA_yPIPBox_3_1SX... |
| ... | ... | ... | ... | ... | ... |
| 131 | random_minimal | MLIG_LIR_LC3C_4_3VVW.DDOC_MIT_MIM_1_2JQ9 | report_2025-02-13_21-21.html | True | MLIG_LIR_LC3C_4_3VVW.DDOC_MIT_MIM_1_2JQ9.json |
| 132 | random_minimal | MLIG_LIR_Nem_3_5AZG.DLIG_ULM_U2AF65_1_1O0P | report_2025-02-13_21-37.html | True | MLIG_LIR_Nem_3_5AZG.DLIG_ULM_U2AF65_1_1O0P.json |
| 133 | random_minimal | MLIG_LRP6_Inhibitor_1_3SOQ.DTRG_NES_CRM1_1_3GB8 | report_2025-02-13_21-54.html | True | MLIG_LRP6_Inhibitor_1_3SOQ.DTRG_NES_CRM1_1_3GB... |
| 134 | random_minimal | MLIG_LSD1_SNAG_1_2Y48.DLIG_EF_ALG2_ABM_1_2ZNE | report_2025-02-13_22-11.html | True | MLIG_LSD1_SNAG_1_2Y48.DLIG_EF_ALG2_ABM_1_2ZNE.... |
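The name extraction in the report-scanning cell of section 1 can be packaged as one reusable function. The sketch below restates that cell's two regular expressions and its case-insensitive de-duplication; the function name `extract_prediction_names` and the report snippet are hypothetical, and real Nextflow reports may contain `[id:...]` variants not covered here.

```python
import re


def extract_prediction_names(report_html: str) -> list:
    """Collect prediction names from a Nextflow report, de-duplicated case-insensitively."""
    names = []
    # Pattern 1: [id:[NAME_1, NAME_2, ...]] lists several prediction names
    for match in re.findall(r"\[id:\[([^\]]+)\]\]", report_html):
        names.extend(name.strip() for name in match.split(","))
    # Pattern 2: [id:NAME, ...] or [id:NAME] holds a single name. On list entries it
    # captures "[NAME_1", so the leading bracket is stripped below.
    names.extend(re.findall(r"\[id:([^,\]]+)(?:,|\])", report_html))

    unique, seen = [], set()
    for name in names:
        name = name.lstrip("[").strip()
        # Keep the first spelling of each name; later case variants are dropped
        if name and name.lower() not in seen:
            seen.add(name.lower())
            unique.append(name)
    return unique


# Hypothetical report fragment with one single-name entry and one list entry
html = "task [id:ABC.DEF, model:x] done; task [id:[abc.def, GHI.JKL]] done"
print(extract_prediction_names(html))
```

Because list matches are collected first, the spelling from a `[id:[...]]` list wins when the same name also appears as a single entry, which mirrors the order used in the notebook cell.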


### 2 Parsing the AF output
Iterates over the Nextflow output folders, reads the AF data and creates a TSV file containing all the metrics. Along the way, it checks for missing, corrupted or unexpected data using the report_df from section 1.
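The folder walk and the ranking step of the next cell can be sketched as two small functions. This is a hedged restatement, assuming the layout that cell expects (`<Nextflow run>/predictions/alphafold3/<prediction>/<seed folder>/*.cif`); the helper names `find_seed_models` and `rank_models`, the run and seed names, the scores and the empty `model.cif` placeholders are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def find_seed_models(nextflow_folder: Path) -> dict:
    """Map (prediction_name, seed_folder_name) -> first .cif file found in that seed folder."""
    models = {}
    predictions_path = nextflow_folder / "predictions" / "alphafold3"
    if not predictions_path.is_dir():
        return models  # the notebook records such runs in empty_outputs
    for prediction_folder in sorted(p for p in predictions_path.iterdir() if p.is_dir()):
        for seed_folder in sorted(p for p in prediction_folder.iterdir() if p.is_dir()):
            cif_files = sorted(f for f in seed_folder.iterdir() if f.is_file() and f.suffix.lower() == ".cif")
            if cif_files:
                models[(prediction_folder.name, seed_folder.name)] = cif_files[0]
    return models


def rank_models(prediction_metrics: pd.DataFrame) -> pd.DataFrame:
    """Sort one prediction's models by ranking_score (best first) and rename them ranked_0, ranked_1, ..."""
    ranked = prediction_metrics.sort_values("ranking_score", ascending=False, ignore_index=True)
    ranked["model_id"] = [f"ranked_{i}" for i in range(len(ranked))]
    return ranked


# Build a temporary folder with the expected layout (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    run_folder = Path(tmp) / "example_run"
    for seed in ["seed-1_sample-0", "seed-2_sample-0"]:
        seed_folder = run_folder / "predictions" / "alphafold3" / "a.b" / seed
        seed_folder.mkdir(parents=True)
        (seed_folder / "model.cif").write_text("")  # empty placeholder file
    models = find_seed_models(run_folder)

# Hypothetical metrics, one row per seed folder (model_id = seed folder name as in the TSV)
metrics = pd.DataFrame({
    "prediction_name": ["a.b", "a.b"],
    "model_id": ["seed-1_sample-0", "seed-2_sample-0"],
    "ranking_score": [0.61, 0.93],
})
# Attach the model file to each seed row before the seed names are replaced by ranked_i
metrics["model_path"] = [models[("a.b", seed)] for seed in metrics["model_id"]]
ranked = rank_models(metrics)
print(ranked[["model_id", "ranking_score"]])
```

Attaching `model_path` before renaming keeps the link between a ranked model and its seed folder, which is also why the notebook stores the path before overwriting `model_id`.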

In [12]:
# LAST VERSION
dataAF = pd.DataFrame() # Holding the output metrics and metadata of the runs
missformed_outputs = pd.DataFrame(columns=["benchmark_set", "prediction_name", "model_seed", "reason"])
empty_outputs = pd.DataFrame(columns=["benchmark_set", "nextflow_name"])

for folder in benchmark_folders:
    benchmark_set = folder.name
    print(benchmark_set)
    nextflowFolders = [p for p in folder.iterdir() if p.is_dir()]
    
    for nextflowFolder in nextflowFolders:
        print("\t", f"{nextflowFolder.name:<30}", end=" -> ")
        
        # Check for the TSV metrics file (keeping original logic)
        if not (metricPath := (nextflowFolder / "alphafold3_metrics.tsv")).exists():
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set":benchmark_set, "nextflow_name": nextflowFolder.name}
            print("TSV file not found")
            continue
        
        # Read the TSV file (keeping original logic)
        metric_file = pd.read_csv(metricPath, delimiter="\t", header=0)
        metric_file["benchmark_set"] = benchmark_set
        metric_file["model_path"] = None
        
        # Check if TSV file is empty
        tsv_is_empty = not (metric_file.shape[0] >= 1)
        if tsv_is_empty:
            print("empty TSV file - will create metrics from folder structure")
        
        # Check if predictions/alphafold3 folder exists
        predictions_path = nextflowFolder / "predictions" / "alphafold3"
        if not predictions_path.exists():
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set": benchmark_set, "nextflow_name": nextflowFolder.name}
            print("predictions/alphafold3 folder not found")
            continue
        
        # Get all prediction name folders within predictions/alphafold3/
        prediction_folders = [p for p in predictions_path.iterdir() if p.is_dir()]
        
        if not prediction_folders:
            empty_outputs.loc[len(empty_outputs)] = {"benchmark_set": benchmark_set, "nextflow_name": nextflowFolder.name}
            print("no prediction folders found")
            continue
        
        # Process each prediction folder found in the directory structure
        for prediction_folder in prediction_folders:
            prediction_name = prediction_folder.name
            print(f" -> {prediction_name}")
            
            # Handle empty TSV files by creating metrics from folder structure
            if tsv_is_empty:
                prediction_metrics = pd.DataFrame()
            else:
                # Filter the metric_file for this specific prediction_name (if it exists in TSV)
                prediction_metrics = metric_file[metric_file["prediction_name"] == prediction_name].copy()
                
                # If this prediction_name is not in the TSV, skip it
                if prediction_metrics.empty:
                    missformed_outputs.loc[len(missformed_outputs)] = {
                        "benchmark_set": benchmark_set, 
                        "prediction_name": prediction_name, 
                        "reason": "prediction_name not found in TSV file"
                    }
                    continue
            
            # Check if the prediction folder exists (should exist since we got it from folder scan)
            if not prediction_folder.exists():
                missformed_outputs.loc[len(missformed_outputs)] = {
                    "benchmark_set": benchmark_set, 
                    "prediction_name": prediction_name, 
                    "reason": "prediction folder does not exist"
                }
                continue
            
            # Process model files for this prediction
            found_models = False
            
            # Look for CIF files in the nested structure: prediction_folder/seed_folder/model.cif
            cif_files = []
            seed_folders = [p for p in prediction_folder.iterdir() if p.is_dir()]
            print(f"\t\t\t Found {len(seed_folders)} seed folders: {[f.name for f in seed_folders]}")
            
            for seed_folder in seed_folders:
                # Look for any .cif file in the seed folder (there should be only one)
                cif_files_in_folder = [f for f in seed_folder.iterdir() if f.is_file() and f.suffix.lower() == '.cif']
                if cif_files_in_folder:
                    cif_files.append(cif_files_in_folder[0])  # Take the single .cif file
            
            print(f"\t\t\t Found {len(cif_files)} CIF files: {[f.parent.name for f in cif_files]}")
            
            for i, model_file in enumerate(cif_files):
                model_seed = model_file.parent.name
                
                if tsv_is_empty:
                    # Create metrics entry from scratch since TSV is empty
                    model_row = {
                        "prediction_name": prediction_name,
                        "model_id": f"ranked_{i}",
                        "benchmark_set": benchmark_set,
                        "model_path": model_file.relative_to(af3_runs_folder),
                        "ranking_score": 1.0 - (i * 0.1),  # Mock ranking score
                        # Add other default columns as needed
                    }
                    prediction_metrics = pd.concat([prediction_metrics, pd.DataFrame([model_row])], ignore_index=True)
                else:
                    # Use existing TSV logic
                    if len(prediction_metrics.loc[prediction_metrics["model_id"] == model_seed, ["model_path"]]) == 0:
                        missformed_outputs.loc[len(missformed_outputs)] = {
                            "benchmark_set": benchmark_set, 
                            "prediction_name": prediction_name, 
                            "model_seed": model_seed,
                            "reason": "model seed is not contained in tsv file"
                        }
                        continue
                    
                    prediction_metrics.loc[prediction_metrics["model_id"] == model_seed, ["model_path"]] = model_file.relative_to(af3_runs_folder)
                
                found_models = True
            
            if found_models:
                # Sort and rank the models for this prediction
                prediction_metrics.sort_values(by=['ranking_score'], ascending=False, ignore_index=True, inplace=True)
                prediction_metrics["model_id"] = prediction_metrics.apply(lambda r: f"ranked_{int(r.name)}", axis=1)
                dataAF = pd.concat([dataAF, prediction_metrics], ignore_index=True)
            else:
                print(f"\t\t\t WARNING: No CIF files found for {prediction_name}")

# Clean up columns
if 'project_name' in dataAF.columns:
    dataAF.drop(columns=["project_name"], inplace=True)

print(f"\nDEBUG: dataAF contains {len(dataAF)} rows before merge")
print(f"DEBUG: Unique prediction names in dataAF: {sorted(dataAF['prediction_name'].unique()) if not dataAF.empty else 'None'}")
print(f"DEBUG: Unique benchmark sets in dataAF: {sorted(dataAF['benchmark_set'].unique()) if not dataAF.empty else 'None'}")

# Also check if CIF files have model_path filled
if not dataAF.empty and 'model_path' in dataAF.columns:
    print(f"DEBUG: CIF files found: {dataAF['model_path'].notna().sum()} out of {len(dataAF)} entries")
    if dataAF['model_path'].notna().sum() > 0:
        print(f"DEBUG: Sample CIF paths: {dataAF[dataAF['model_path'].notna()]['model_path'].head(3).tolist()}")

# Reordering of the columns
if not dataAF.empty:
    c = list(dataAF.columns)
    columns_to_reorder = ["prediction_name", "model_preset", "benchmark_set", "ranking_score"]
    
    for col in columns_to_reorder:
        if col in c:
            c.remove(col)
    
    # Insert columns in desired order
    if "model_preset" in dataAF.columns:
        c.insert(0, "model_preset")
    if "benchmark_set" in dataAF.columns:
        c.insert(1, "benchmark_set") 
    if "prediction_name" in dataAF.columns:
        c.insert(2, "prediction_name")
    if "ranking_score" in dataAF.columns:
        c.insert(4, "ranking_score")
    
    dataAF = dataAF[c]

random_minimal
	 cheesy_mccarthy                ->  -> mdeg_scf_coi1_1_3ogl.dlig_cam_iq_9_2ix7
			 Found 5 seed folders: ['seed-916549_sample-0', 'seed-121809_sample-0', 'seed-359724_sample-0', 'seed-384611_sample-0', 'seed-179470_sample-0']
			 Found 5 CIF files: ['seed-916549_sample-0', 'seed-121809_sample-0', 'seed-359724_sample-0', 'seed-384611_sample-0', 'seed-179470_sample-0']
	 adoring_fermi                  ->  -> mdeg_scf_fbxo31_1_5vzu.dlig_traf2_2_1czy
			 Found 5 seed folders: ['seed-941496_sample-0', 'seed-760903_sample-0', 'seed-894092_sample-0', 'seed-189812_sample-0', 'seed-756800_sample-0']
			 Found 5 CIF files: ['seed-941496_sample-0', 'seed-760903_sample-0', 'seed-894092_sample-0', 'seed-189812_sample-0', 'seed-756800_sample-0']
	 happy_minsky                   ->  -> mdeg_scf_tir1_1_2p1q.dlig_ulm_u2af65_1_1o0p
			 Found 5 seed folders: ['seed-721401_sample-0', 'seed-945120_sample-0', 'seed-548855_sample-0', 'seed-777378_sample-0', 'seed-231306_sample-0']
			 Found 5

In [13]:
# Find missing structures and correct lower case names

report_df_ = report_df[~report_df["prediction_name"].isna()].copy() # Create copy to allow merging by lowercase prediction_name
report_df_["prediction_name_lower"] = report_df_["prediction_name"].str.lower()

# Correcting lower case names
dataAF = pd.merge(
    left = dataAF,
    right = report_df_,
    how="outer", # Using outer to check for missing runs using report_df and filter in a later step
    left_on = ["benchmark_set", "prediction_name"],
    right_on = ["benchmark_set", "prediction_name_lower"],
    suffixes = ["", "_input"]
)

# Detecting missing outputs (= structures with input json but without a output nextflow folder)
missing_outputs = dataAF[np.logical_and(~dataAF["input_json"].isna(), dataAF["prediction_name"].isna())]
missing_outputs = missing_outputs[["benchmark_set", "prediction_name_input", "report_file", "run_ok", "input_json"]]
# Detect unidentified outputs (= output folders, which do not have a input.json)
unidentified_outputs = dataAF[dataAF["prediction_name_input"].isna()]
unidentified_outputs = unidentified_outputs[["benchmark_set", "prediction_name", "model_id"]]

# Filter to only include AF outputs and not input files (.copy() avoids a SettingWithCopyWarning below)
dataAF = dataAF[~dataAF["prediction_name"].isna()].copy()
# Replacing AF prediction_name with the proper upper case variant
dataAF["prediction_name"] = dataAF["prediction_name_input"]
# Drop the unnecessary columns added
dataAF.drop(columns=["prediction_name_input", "prediction_name_lower", "report_file", "run_ok", "input_json"], inplace=True)
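The merge in the previous cell finds missing and unidentified outputs through NaN checks on the merged columns. As an alternative sketch, pandas can label every row with its origin through `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only` and `both`. The frames `af_output` and `inputs` and their names are hypothetical toy data; the result names are chosen so they do not overwrite the notebook's missing_outputs and unidentified_outputs.

```python
import pandas as pd

# Hypothetical toy data: AF output folders use lower-case names, the inputs keep the original case
af_output = pd.DataFrame({
    "benchmark_set": ["random_minimal", "random_minimal"],
    "prediction_name": ["mdeg_a.dlig_b", "mdoc_x.dlig_y"],
    "model_id": ["ranked_0", "ranked_0"],
})
inputs = pd.DataFrame({
    "benchmark_set": ["random_minimal", "random_minimal"],
    "prediction_name": ["MDEG_A.DLIG_B", "MLIG_Q.DLIG_R"],
    "input_json": ["MDEG_A.DLIG_B.json", "MLIG_Q.DLIG_R.json"],
})
inputs["prediction_name_lower"] = inputs["prediction_name"].str.lower()

merged = pd.merge(
    af_output,
    inputs,
    how="outer",
    left_on=["benchmark_set", "prediction_name"],
    right_on=["benchmark_set", "prediction_name_lower"],
    suffixes=("", "_input"),  # the input's original-case name becomes prediction_name_input
    indicator=True,           # adds the "_merge" column
)

toy_missing = merged[merged["_merge"] == "right_only"]       # input json but no AF output
toy_unidentified = merged[merged["_merge"] == "left_only"]   # AF output without an input json
matched = merged[merged["_merge"] == "both"].copy()
matched["prediction_name"] = matched["prediction_name_input"]  # restore the original case
```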

In [14]:
# Display the dataAF output and information about potential errors
print(f"Currently {len(set(dataAF['prediction_name']))} valid AF output folders have been generated")
display(dataAF)
print("Processed files with errors or missing output")
display(missing_outputs)
print("Missformed outputs")
display(missformed_outputs)
print("Empty output folders")
display(empty_outputs)

Currently 136 valid AF output folders have been generated


| | model_preset | benchmark_set | prediction_name | model_id | ranking_score | chainA_length | chainB_length | fraction_disordered | has_clash | iptm | ... | chainA_intf_avg_plddt | chainB_intf_avg_plddt | intf_avg_plddt | num_chainA_intf_res | num_chainB_intf_res | num_res_res_contact | num_atom_atom_contact | iPAE | pDockQ | model_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_0 | 0.65 | 5 | 237 | 0.02 | 0.0 | 0.57 | ... | 49.49 | 83.95 | 74.38 | 5 | 13 | 20 | 178 | 10.60 | 0.07 | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 1 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_1 | 0.65 | 5 | 237 | 0.02 | 0.0 | 0.56 | ... | 48.41 | 84.70 | 74.62 | 5 | 13 | 21 | 224 | 11.00 | 0.09 | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 2 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_2 | 0.65 | 5 | 237 | 0.02 | 0.0 | 0.56 | ... | 49.80 | 85.32 | 75.97 | 5 | 14 | 21 | 196 | 10.80 | 0.08 | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 3 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_3 | 0.64 | 5 | 237 | 0.02 | 0.0 | 0.56 | ... | 50.50 | 84.09 | 75.69 | 5 | 15 | 22 | 199 | 11.80 | 0.08 | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 4 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_4 | 0.62 | 5 | 237 | 0.02 | 0.0 | 0.53 | ... | 48.32 | 83.69 | 73.87 | 5 | 13 | 21 | 206 | 11.90 | 0.08 | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_0 | 0.93 | 4 | 312 | 0.02 | 0.0 | 0.91 | ... | 84.04 | 96.90 | 94.45 | 4 | 17 | 26 | 239 | 3.50 | 0.31 | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 676 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_1 | 0.92 | 4 | 312 | 0.01 | 0.0 | 0.90 | ... | 83.70 | 97.70 | 95.04 | 4 | 17 | 26 | 233 | 3.60 | 0.31 | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 677 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_2 | 0.92 | 4 | 312 | 0.02 | 0.0 | 0.90 | ... | 83.24 | 96.88 | 94.40 | 4 | 18 | 28 | 246 | 3.65 | 0.30 | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 678 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_3 | 0.91 | 4 | 312 | 0.01 | 0.0 | 0.90 | ... | 83.07 | 97.64 | 94.86 | 4 | 17 | 26 | 232 | 3.60 | 0.30 | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |


Processed files with errors or missing output


| | benchmark_set | prediction_name_input | report_file | run_ok | input_json |
|---|---|---|---|---|---|


Missformed outputs


| | benchmark_set | prediction_name | model_seed | reason |
|---|---|---|---|---|


Empty output folders


| | benchmark_set | nextflow_name |
|---|---|---|


### 3 Output adjustments
The following sections correct some errors and problems in the data.

### 3a Undo chain naming by length
For the AF3 run, the chains were assigned IDs by length (chain A is the shortest, chain B the longest). This is fine for DMI, but fails for DDI. Chain IDs that match the solved structures are important for template-dependent metrics. The following cells read the FASTA files of the AF2 runs to switch chain A and chain B back if necessary.
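The flip cell of this section swaps the chain-specific metrics row by row with `iterrows()`. As a hedged alternative sketch, the same three column pairs can be exchanged for all flipped rows at once with a boolean mask. `flip_chain_metrics`, `CHAIN_COLUMN_PAIRS` and the toy values are hypothetical; `chains_flipped` is assumed to hold True/False/None as produced by `have_chains_been_switched()`, and None is treated as "not flipped", like in the notebook.

```python
import pandas as pd

# Pairs of chain-specific columns that must be exchanged when the chains were switched
CHAIN_COLUMN_PAIRS = [
    ("chainA_length", "chainB_length"),
    ("chainA_intf_avg_plddt", "chainB_intf_avg_plddt"),
    ("num_chainA_intf_res", "num_chainB_intf_res"),
]


def flip_chain_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with chain A/B metrics swapped where chains_flipped is True."""
    out = df.copy()
    mask = out["chains_flipped"].eq(True)  # None and False both give False
    for col_a, col_b in CHAIN_COLUMN_PAIRS:
        # .to_numpy() stops pandas from re-aligning the swapped columns by name
        out.loc[mask, [col_a, col_b]] = out.loc[mask, [col_b, col_a]].to_numpy()
    return out


# Hypothetical toy metrics: only the first row is flagged as flipped
toy_metrics = pd.DataFrame({
    "prediction_name": ["p1", "p2", "p3"],
    "chains_flipped": [True, False, None],
    "chainA_length": [100, 5, 8],
    "chainB_length": [5, 100, 90],
    "chainA_intf_avg_plddt": [80.0, 50.0, 60.0],
    "chainB_intf_avg_plddt": [50.0, 80.0, 70.0],
    "num_chainA_intf_res": [12, 4, 6],
    "num_chainB_intf_res": [4, 12, 9],
})
flipped = flip_chain_metrics(toy_metrics)
print(flipped[["prediction_name", "chainA_length", "chainB_length"]])
```

Returning a copy keeps the function safe to re-run: applying the in-place loop twice to the same frame would swap the columns back.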

In [15]:
# Undo chain switching if necessary
# This function is based on a function by Joelle Strom in make_json_files.py (last updated: 24.01.2025) to detect runs where the chain IDs have been switched

def have_chains_been_switched(prediction_name: str, benchmark_set: str):
    """
    Modified function from make_json_files.py to detect switched chains.

    Returns True if the chains have been switched, False if not, and None if this
    cannot be decided (missing AF2 FASTA file, not exactly two chains, or chains of
    equal length whose lengths do not match the AF3 input).
    """
    path_af2 = af2_folders[benchmark_set] / (af2_folders[benchmark_set].name + "_" + prediction_name + ".fasta")

    # Automatically find the correct subdirectory under af3_runs_folder
    for subdir in af3_runs_folder.iterdir():
        candidate = subdir / benchmark_set / (prediction_name.upper() + ".json")
        if candidate.exists():
            path_af3 = candidate
            break
    else:
        raise FileNotFoundError(f"No JSON found for {prediction_name.upper()} in any subdir of {af3_runs_folder}")

    with open(path_af3, "r") as f:
        af3_input = json.load(f)
    af3_input_len1 = len(af3_input["sequences"][0]["protein"]["sequence"])
    af3_input_len2 = len(af3_input["sequences"][1]["protein"]["sequence"])

    if not path_af2.exists():
        print(f"Can't find {path_af2.name} in {path_af2.parent.parent.name}/{path_af2.parent.name}", end="")
        return None

    # Parse the AF2 FASTA file: each header line (">...") starts a new chain
    chains = []
    with open(path_af2, "r") as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                chains.append("")
            elif chains:
                chains[-1] += line
    if len(chains) != 2:
        print(f"{prediction_name} has an invalid number of chains ({len(chains)})", end="")
        return None

    l1, l2 = len(chains[0]), len(chains[1])
    if l1 == af3_input_len1 or l2 == af3_input_len2:
        return False
    if l1 == l2:
        print("(same length) ", end="")
        return None
    return True

# Generate column in data frame if chains have been switched
dataAF["chains_flipped"] = None
for i, row in dataAF.iterrows():
    prediction_name, benchmark_set = row["prediction_name"], row["benchmark_set"]
    print(f"{f'{prediction_name} ({benchmark_set}) -> ':<80}", end="")
    chains_switched = have_chains_been_switched(prediction_name, benchmark_set)
    dataAF.at[i, "chains_flipped"] = chains_switched
    print(chains_switched)

MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 (random_minimal) ->           True
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 (random_minimal) ->           True
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 (random_minimal) ->           True
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 (random_minimal) ->           True
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 (random_minimal) ->           True
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS (random_minimal) ->                      True
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS (random_minimal) ->                      True
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS (random_minimal) ->                      True
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS (random_minimal) ->                      True
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS (random_minimal) ->                      True
MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0 (random_minimal) ->              True
MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0 (random_minimal) -

In [16]:
# Flip the metrics if necessary
for i, row in dataAF.iterrows():
    if not row["chains_flipped"]:
        continue
    dataAF.at[i, "chainA_length"], dataAF.at[i, "chainB_length"] = row["chainB_length"], row["chainA_length"]
    dataAF.at[i, "chainA_intf_avg_plddt"], dataAF.at[i, "chainB_intf_avg_plddt"] = row["chainB_intf_avg_plddt"], row["chainA_intf_avg_plddt"]
    dataAF.at[i, "num_chainA_intf_res"], dataAF.at[i, "num_chainB_intf_res"] = row["num_chainB_intf_res"], row["num_chainA_intf_res"]
    if row["model_id"] == "ranked_0":
        print(f"Modified {row['prediction_name']}")

Modified MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30
Modified MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS
Modified MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0
Modified MDEG_Kelch_Keap1_2_3WN7.DDOC_USP7_MATH_2_1YY6
Modified MDEG_MDM2_SWIB_1_1YCR.DLIG_PCNA_APIM_2_5MLW
Modified MDEG_SCF_COI1_1_3OGL.DLIG_CaM_IQ_9_2IX7
Modified MDEG_SCF_FBXO31_1_5VZU.DLIG_TRAF2_2_1CZY
Modified MDEG_SCF_TIR1_1_2P1Q.DLIG_ULM_U2AF65_1_1O0P
Modified MDEG_SIAH_1_2A25.DLIG_CAP-Gly_2_3RDV
Modified MDEG_SPOP_SBC_1_3HQM.DDOC_MIT_MIM_1_2JQ9
Modified MDOC_AGCK_PIF_3_1ATP.DDEG_Kelch_Keap1_1_2FLU
Modified MDOC_ANK_TNKS_1_3TWU.DLIG_CAP-Gly_1_2PZO
Modified MDOC_CDC14_PxL_1_6G84.DDOC_PP1_SILK_1_2O8G
Modified MDOC_CYCLIN_RxL_1_1H25.DLIG_SPRY_1_2JK9
Modified MDOC_GSK3_Axin_1_1O9U.DDEG_SPOP_SBC_1_3HQM
Modified MDOC_MAPK_DCC_7_2B9J.DLIG_ANK_PxLPxL_1_3UXG
Modified MDOC_MAPK_GRA24_9_5ETA.DLIG_PCNA_APIM_2_5MLW
Modified MDOC_MAPK_HePTP_8_2GPH.DLIG_PCNA_yPIPBox_3_1SXJ
Modified MDOC_MAPK_JIP1_4_4H3B.DLIG_PAM2_1_1JGN
Modified MDO
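
The loop above swaps the per-chain columns row by row. As a hedged alternative, the same swap can be applied to all flipped rows at once with a boolean mask. This is a minimal sketch on a toy frame: `swap_column_pairs` is a hypothetical helper that is not part of this notebook, and only the column names are borrowed from `dataAF`.

```python
import pandas as pd


def swap_column_pairs(frame: pd.DataFrame, pairs, mask) -> pd.DataFrame:
    """Return a copy of frame in which each (left, right) column pair is
    swapped on the rows selected by the boolean mask. Hypothetical helper."""
    out = frame.copy()
    for left, right in pairs:
        # .to_numpy() drops the column labels; without it pandas would align
        # the values back onto their original columns and nothing would swap
        out.loc[mask, [left, right]] = frame.loc[mask, [right, left]].to_numpy()
    return out


# Toy data using column names from dataAF; the values are made up
toy = pd.DataFrame({
    "chains_flipped": [True, False],
    "chainA_length": [5, 300],
    "chainB_length": [237, 6],
    "num_chainA_intf_res": [5, 12],
    "num_chainB_intf_res": [13, 4],
})

pairs = [
    ("chainA_length", "chainB_length"),
    ("num_chainA_intf_res", "num_chainB_intf_res"),
]
swapped = swap_column_pairs(toy, pairs, toy["chains_flipped"])
print(swapped)
```

Working on a copy keeps the original frame untouched, which makes it easy to compare both versions before overwriting `dataAF`.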

### 4 Converting AF3 structure files (.cif) to pdb files
The following section loads the model.cif file of each row in the dataAF table and exports it as a .pdb file to the destination path. Existing structures are skipped if _export_skip_existing_structures_ is set to True. Chains are flipped as described in the sections above.
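
The atom-ID arithmetic in the chain-flipping branch of the next cell is easy to misread. The plain-Python sketch below models it on a list of hypothetical `(chain, serial)` tuples, assuming the chain A atoms come first and the serials run consecutively from 1. The old chain B atoms are shifted down by the number of old chain A atoms, and the old chain A atoms are shifted up by the number of old chain B atoms. `flip_chain_serials` is an illustrative name, not a PyMOL function.

```python
def flip_chain_serials(atoms):
    """Swap chain labels A/B and renumber serials so the new chain A comes first.

    atoms: list of (chain, serial) tuples, chain A atoms first, serials 1..N.
    Hypothetical helper that models the alter commands in the next cell.
    """
    n_old_a = sum(1 for chain, _ in atoms if chain == "A")
    n_old_b = sum(1 for chain, _ in atoms if chain == "B")
    flipped = []
    for chain, serial in atoms:
        if chain == "A":
            # old A -> new B, placed after the n_old_b atoms of the new chain A
            flipped.append(("B", serial + n_old_b))
        else:
            # old B -> new A, moved to the front
            flipped.append(("A", serial - n_old_a))
    # Order by serial so the new chain A is listed first
    return sorted(flipped, key=lambda atom: atom[1])


atoms = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5)]
print(flip_chain_serials(atoms))
# [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('B', 5)]
```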

In [17]:
# Converting .cif to .pdb files 
try:
    dataAF
except NameError:
    raise Exception("Please first run the cells to get the dataAF frame")

# If this property is not set, pymol will ignore the alter commands on the ID when exporting
pymol.cmd.set("pdb_retain_ids", False)
# Ignore segment identifiers (segi) and rely on chain IDs only
pymol.cmd.set("ignore_pdb_segi", True)

if not export_destination.exists() or not export_destination.is_dir():
    raise Exception("Your destination path does not exist")

for index, row in dataAF.iterrows():
    prediction_file = af3_runs_folder / Path(row["model_path"])
    chains_flipped = row["chains_flipped"]
    if not prediction_file.exists():
        print(f"Model file for {row['prediction_name']} does not exist at the expected location ({prediction_file.resolve()})")
        continue

    structure_folder_dest: Path = (export_destination / ("DDI" if "ddi" in str(row['benchmark_set']).lower() else "DMI") / row["benchmark_set"] / row["prediction_name"])
    structure_folder_dest.mkdir(parents=True, exist_ok=True)

    if (structure_file_dest := structure_folder_dest / (str(row["model_id"]) + ".pdb")).exists() and export_skip_existing_structures:
        print(f"{row['prediction_name']}/{structure_file_dest.name} already processed. Skip")
        continue
    else:
        print(f"{row['prediction_name']}/{structure_file_dest.name} ->", "Flipping chains" if chains_flipped else "")

    pymol.cmd.load(prediction_file, prediction_file.stem)

    if chains_flipped: # Reorder chains
        pymol.cmd.alter(selection="chain A", expression="chain = 'C'")
        pymol.cmd.alter(selection="chain B", expression="chain = 'A'")
        pymol.cmd.alter(selection="chain C", expression="chain = 'B'")
        pymol.cmd.alter(selection="segi A", expression="segi = 'C'")
        pymol.cmd.alter(selection="segi B", expression="segi = 'A'")
        pymol.cmd.alter(selection="segi C", expression="segi = 'B'")
        pymol.cmd.sort()
        pymol.cmd.alter(selection="chain A", expression=f"ID = (int(ID) - {pymol.cmd.count_atoms('chain B')})")
        pymol.cmd.alter(selection="chain B", expression=f"ID = (int(ID) + {pymol.cmd.count_atoms('chain A')})")
        pymol.cmd.sort()

    pymol.cmd.save(structure_file_dest)
    for o in pymol.cmd.get_object_list():
        pymol.cmd.delete(o)


MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30/ranked_0.pdb -> Flipping chains
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30/ranked_1.pdb -> Flipping chains
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30/ranked_2.pdb -> Flipping chains
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30/ranked_3.pdb -> Flipping chains
MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30/ranked_4.pdb -> Flipping chains
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS/ranked_0.pdb -> Flipping chains
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS/ranked_1.pdb -> Flipping chains
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS/ranked_2.pdb -> Flipping chains
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS/ranked_3.pdb -> Flipping chains
MDEG_COP1_1_5IGO.DDOC_USP7_MATH_1_3MQS/ranked_4.pdb -> Flipping chains
MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0/ranked_0.pdb -> Flipping chains
MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0/ranked_1.pdb -> Flipping chains
MDEG_Kelch_Keap1_1_2FLU.DLIG_DLG_GKlike_1_3WP0/ranked_2.pdb -> Flipping chain

In [None]:
# Helper cell: if PyMOL crashes, run this cell to delete all loaded objects
for o in pymol.cmd.get_object_list():
    pymol.cmd.delete(o)

### 5 Reordering columns

In [18]:
dataAF.columns

Index(['model_preset', 'benchmark_set', 'prediction_name', 'model_id',
       'ranking_score', 'chainA_length', 'chainB_length',
       'fraction_disordered', 'has_clash', 'iptm', 'ptm',
       'chainA_intf_avg_plddt', 'chainB_intf_avg_plddt', 'intf_avg_plddt',
       'num_chainA_intf_res', 'num_chainB_intf_res', 'num_res_res_contact',
       'num_atom_atom_contact', 'iPAE', 'pDockQ', 'model_path',
       'chains_flipped'],
      dtype='object')

In [19]:
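# Move chains_flipped and model_path to the end of the table
c = list(dataAF.columns)
c.remove("model_path")
c.remove("chains_flipped")
c.append("chains_flipped")
c.append("model_path")

# If present, place num_mutations directly after model_id (position 4)
if "num_mutations" in c:
    c.remove("num_mutations")
    c.insert(4, "num_mutations")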

dataAF = dataAF[c].copy()
dataAF

| | model_preset | benchmark_set | prediction_name | model_id | ranking_score | chainA_length | chainB_length | fraction_disordered | has_clash | iptm | ... | chainB_intf_avg_plddt | intf_avg_plddt | num_chainA_intf_res | num_chainB_intf_res | num_res_res_contact | num_atom_atom_contact | iPAE | pDockQ | chains_flipped | model_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_0 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.57 | ... | 49.49 | 74.38 | 13 | 5 | 20 | 178 | 10.60 | 0.07 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 1 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_1 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 48.41 | 74.62 | 13 | 5 | 21 | 224 | 11.00 | 0.09 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 2 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_2 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 49.80 | 75.97 | 14 | 5 | 21 | 196 | 10.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 3 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_3 | 0.64 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 50.50 | 75.69 | 15 | 5 | 22 | 199 | 11.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 4 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_4 | 0.62 | 237 | 5 | 0.02 | 0.0 | 0.53 | ... | 48.32 | 73.87 | 13 | 5 | 21 | 206 | 11.90 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_0 | 0.93 | 312 | 4 | 0.02 | 0.0 | 0.91 | ... | 84.04 | 94.45 | 17 | 4 | 26 | 239 | 3.50 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 676 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_1 | 0.92 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.70 | 95.04 | 17 | 4 | 26 | 233 | 3.60 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 677 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_2 | 0.92 | 312 | 4 | 0.02 | 0.0 | 0.90 | ... | 83.24 | 94.40 | 18 | 4 | 28 | 246 | 3.65 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 678 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_3 | 0.91 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.07 | 94.86 | 17 | 4 | 26 | 232 | 3.60 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
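
The column shuffling in the cell above can be wrapped in a small reusable function. This is a hedged sketch: `reorder_columns` and its `to_end`/`moves` parameters are hypothetical names, and the toy frame only borrows column names from `dataAF`. Unlike the `list.remove` calls in the cell, which raise a `ValueError` for a missing column, it silently skips columns that are not present.

```python
import pandas as pd


def reorder_columns(frame, to_end=(), moves=()):
    """Return a copy of frame with the columns in to_end moved to the end
    (in the given order), then each (name, index) in moves placed at index.
    Columns that are not present are skipped. Hypothetical helper."""
    cols = [c for c in frame.columns if c not in to_end]
    cols += [c for c in to_end if c in frame.columns]
    for name, index in moves:
        if name in cols:
            cols.remove(name)
            cols.insert(index, name)
    return frame[cols].copy()


# Empty toy frame: only the column order matters here
toy = pd.DataFrame(columns=["model_preset", "benchmark_set", "prediction_name",
                            "model_id", "model_path", "chains_flipped",
                            "ranking_score", "num_mutations"])
result = reorder_columns(toy,
                         to_end=["chains_flipped", "model_path"],
                         moves=[("num_mutations", 4)])
print(list(result.columns))
```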


### 6 Saving the metrics file

In [20]:
# Sort the rows by benchmark set (in the order defined below), then by prediction name and model ID
dataAF["_benchmark_id"] = dataAF["benchmark_set"].replace({
    "known_minimal": "1", "random_minimal": "2", "mutations": "3",
    "known_extension": "4", "known_ddi": "5", "random_ddi": "6",
}).astype(int)
dataAF.sort_values(["_benchmark_id", "prediction_name", "model_id"], inplace=True)
dataAF.drop(columns=["_benchmark_id"], inplace=True)
dataAF.reset_index(drop=True, inplace=True)

dataAF

| | model_preset | benchmark_set | prediction_name | model_id | ranking_score | chainA_length | chainB_length | fraction_disordered | has_clash | iptm | ... | chainB_intf_avg_plddt | intf_avg_plddt | num_chainA_intf_res | num_chainB_intf_res | num_res_res_contact | num_atom_atom_contact | iPAE | pDockQ | chains_flipped | model_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_0 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.57 | ... | 49.49 | 74.38 | 13 | 5 | 20 | 178 | 10.60 | 0.07 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 1 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_1 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 48.41 | 74.62 | 13 | 5 | 21 | 224 | 11.00 | 0.09 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 2 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_2 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 49.80 | 75.97 | 14 | 5 | 21 | 196 | 10.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 3 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_3 | 0.64 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 50.50 | 75.69 | 15 | 5 | 22 | 199 | 11.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 4 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_4 | 0.62 | 237 | 5 | 0.02 | 0.0 | 0.53 | ... | 48.32 | 73.87 | 13 | 5 | 21 | 206 | 11.90 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_0 | 0.93 | 312 | 4 | 0.02 | 0.0 | 0.91 | ... | 84.04 | 94.45 | 17 | 4 | 26 | 239 | 3.50 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 676 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_1 | 0.92 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.70 | 95.04 | 17 | 4 | 26 | 233 | 3.60 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 677 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_2 | 0.92 | 312 | 4 | 0.02 | 0.0 | 0.90 | ... | 83.24 | 94.40 | 18 | 4 | 28 | 246 | 3.65 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 678 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_3 | 0.91 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.07 | 94.86 | 17 | 4 | 26 | 232 | 3.60 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
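
The temporary `_benchmark_id` column in the sorting cell encodes a custom sort order. A hedged alternative is an ordered `pandas.Categorical`, which sorts by the given category order instead of alphabetically. This is a sketch on made-up rows; `BENCHMARK_ORDER` is a hypothetical name holding the same order as the mapping in the sorting cell. One difference to keep in mind: a `benchmark_set` value missing from the list becomes NaN instead of raising an error, so the sketch checks for that explicitly.

```python
import pandas as pd

# Same order as the mapping used in the sorting cell
BENCHMARK_ORDER = ["known_minimal", "random_minimal", "mutations",
                   "known_extension", "known_ddi", "random_ddi"]

# Made-up rows for illustration
toy = pd.DataFrame({
    "benchmark_set": ["random_ddi", "random_minimal", "known_minimal", "known_minimal"],
    "prediction_name": ["P3", "P2", "P9", "P1"],
    "model_id": ["ranked_0", "ranked_1", "ranked_0", "ranked_0"],
})

# An ordered categorical sorts by the category list, not alphabetically
order = pd.Categorical(toy["benchmark_set"], categories=BENCHMARK_ORDER, ordered=True)
if pd.isna(order).any():
    raise ValueError("benchmark_set contains values missing from BENCHMARK_ORDER")

sorted_toy = (toy.assign(_order=order)
                 .sort_values(["_order", "prediction_name", "model_id"])
                 .drop(columns="_order")
                 .reset_index(drop=True))
print(sorted_toy)
```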


In [21]:
# Export metrics files
if not export_destination.exists() or not export_destination.is_dir():
    raise Exception("Your destination path is not valid")

dataAF.to_csv(export_destination / "AF3_output_only_known_minimal.tsv", sep="\t", index=False)
dataAF.to_excel(export_destination / "AF3_output_only_known_minimal.xlsx", sheet_name="AF3", index=False)
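
Since the metrics can later be reloaded from the TSV instead of being recomputed, it may be worth checking that the export round-trips without changing values or column types. A minimal sketch, assuming a toy frame and an in-memory buffer in place of the real file; `float_precision="round_trip"` selects the pandas float parser intended to reproduce written floats exactly, and the `True`/`False` strings in `chains_flipped` are parsed back as booleans.

```python
import io

import pandas as pd

# Toy frame with a subset of the dataAF columns; the values are made up
toy = pd.DataFrame({
    "prediction_name": ["P1", "P2"],
    "model_id": ["ranked_0", "ranked_1"],
    "ranking_score": [0.65, 0.93],
    "num_res_res_contact": [20, 26],
    "chains_flipped": [True, False],
})

# Write and read back through an in-memory buffer instead of a file
buffer = io.StringIO()
toy.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t", float_precision="round_trip")

# Raises an AssertionError if values or dtypes changed on the way
pd.testing.assert_frame_equal(toy, reloaded)
print(reloaded.dtypes)
```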

Need to reload the metrics file, e.g. to recalculate some columns without rerunning the sections above? Run the following cell.

In [22]:
# Load metrics file
dataAF = pd.read_csv(export_destination / "AF3_output_only_known_minimal.tsv", sep="\t")
dataAF

| | model_preset | benchmark_set | prediction_name | model_id | ranking_score | chainA_length | chainB_length | fraction_disordered | has_clash | iptm | ... | chainB_intf_avg_plddt | intf_avg_plddt | num_chainA_intf_res | num_chainB_intf_res | num_res_res_contact | num_atom_atom_contact | iPAE | pDockQ | chains_flipped | model_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_0 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.57 | ... | 49.49 | 74.38 | 13 | 5 | 20 | 178 | 10.60 | 0.07 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 1 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_1 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 48.41 | 74.62 | 13 | 5 | 21 | 224 | 11.00 | 0.09 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 2 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_2 | 0.65 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 49.80 | 75.97 | 14 | 5 | 21 | 196 | 10.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 3 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_3 | 0.64 | 237 | 5 | 0.02 | 0.0 | 0.56 | ... | 50.50 | 75.69 | 15 | 5 | 22 | 199 | 11.80 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| 4 | alphafold3 | random_minimal | MDEG_APCC_KENBOX_2_4GGD.DTRG_AP2beta_CARGO_1_2G30 | ranked_4 | 0.62 | 237 | 5 | 0.02 | 0.0 | 0.53 | ... | 48.32 | 73.87 | 13 | 5 | 21 | 206 | 11.90 | 0.08 | True | AlphaFold_benchmark_DMI/random_minimal/distrac... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 675 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_0 | 0.93 | 312 | 4 | 0.02 | 0.0 | 0.91 | ... | 84.04 | 94.45 | 17 | 4 | 26 | 239 | 3.50 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 676 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_1 | 0.92 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.70 | 95.04 | 17 | 4 | 26 | 233 | 3.60 | 0.31 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 677 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_2 | 0.92 | 312 | 4 | 0.02 | 0.0 | 0.90 | ... | 83.24 | 94.40 | 18 | 4 | 28 | 246 | 3.65 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
| 678 | alphafold3 | random_minimal | MTRG_PTS1_2C0L.DLIG_WD40_WDR5_WIN_2_4CY3 | ranked_3 | 0.91 | 312 | 4 | 0.01 | 0.0 | 0.90 | ... | 83.07 | 94.86 | 17 | 4 | 26 | 232 | 3.60 | 0.30 | True | AlphaFold_benchmark_DMI/random_minimal/gloomy_... |
