# Using Industry Benchmark Systems

This notebook demonstrates how to use the `industry_benchmark_systems` module to access remediated benchmark system inputs ready for use with the OpenFE toolkit.

The module provides a convenient factory method to load benchmark systems with properly organized protein structures, ligands with different partial charge types, optional cofactors, and network mappings.

## Setup

First, let's import the necessary functions from the module:

In [1]:
from openfe_benchmarks.data import (
    get_benchmark_system,
    list_benchmark_sets,
    list_systems,
    get_benchmark_set_systems,
    PARTIAL_CHARGE_TYPES,
)

## Discovering Available Benchmark Sets

The module automatically discovers all available benchmark sets in the directory structure. Let's see what's available:

In [2]:
# List all available benchmark sets
benchmark_sets = list_benchmark_sets()
print(f"Available benchmark sets ({len(benchmark_sets)}):")
for bset in benchmark_sets:
    print(f"  - {bset}")

Available benchmark sets (8):
  - industry_benchmark_systems.charge_annihilation_set
  - industry_benchmark_systems.fragments
  - industry_benchmark_systems.jacs_set
  - industry_benchmark_systems.janssen_bace
  - industry_benchmark_systems.mcs_docking_set
  - industry_benchmark_systems.merck
  - industry_benchmark_systems.miscellaneous_set
  - industry_benchmark_systems.water_set
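
Directory-based discovery of this kind can be pictured as a walk over the data folder. The following is a minimal sketch, not the module's actual implementation: the function name `discover_benchmark_sets`, the rule that a system is any directory containing a `protein.pdb`, and the throwaway directory layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def discover_benchmark_sets(data_root: Path) -> list[str]:
    """Return dotted names of directories that hold at least one system.

    Assumption: a "system" is a directory containing protein.pdb, and a
    "benchmark set" is the parent directory of one or more systems.
    """
    found = set()
    for pdb in data_root.rglob("protein.pdb"):
        set_dir = pdb.parent.parent  # <set>/<system>/protein.pdb
        relative = set_dir.relative_to(data_root)
        found.add(".".join(relative.parts))
    return sorted(found)


# Build a tiny temporary layout to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = [("jacs_set", "p38"), ("jacs_set", "tyk2"), ("fragments", "mup1")]
    for set_name, system_name in layout:
        system_dir = root / "industry_benchmark_systems" / set_name / system_name
        system_dir.mkdir(parents=True)
        (system_dir / "protein.pdb").touch()
    discovered = discover_benchmark_sets(root)

print(discovered)
```

The dotted names come from joining the path components relative to the data root, which is one plausible way to arrive at identifiers such as `industry_benchmark_systems.jacs_set`.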


## Exploring Systems in a Benchmark Set

Each benchmark set contains multiple systems. Let's explore what's available in the `industry_benchmark_systems.jacs_set`:

In [3]:
# List systems in the industry_benchmark_systems.jacs_set
jacs_systems = list_systems('industry_benchmark_systems.jacs_set')
print(f"Systems in 'industry_benchmark_systems.jacs_set' ({len(jacs_systems)}):")
for system in jacs_systems:
    print(f"  - {system}")

Systems in 'industry_benchmark_systems.jacs_set' (8):
  - bace
  - cdk2
  - jnk1
  - mcl1
  - p38
  - ptp1b
  - thrombin
  - tyk2


Let's also look at the `industry_benchmark_systems.fragments` benchmark set:

In [4]:
fragment_systems = list_systems('industry_benchmark_systems.fragments')
print(f"Systems in 'industry_benchmark_systems.fragments' ({len(fragment_systems)}):")
for system in fragment_systems:
    print(f"  - {system}")

Systems in 'industry_benchmark_systems.fragments' (7):
  - hsp90_2rings
  - hsp90_single_ring
  - jak2_set1
  - jak2_set2
  - liga
  - mup1
  - p38


## Loading a Benchmark System

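Now let's load a specific benchmark system using the factory method. We'll use the HNE system from the MCS docking set. Note that the variable is named `p38_system` in the cells that follow, even though it holds the HNE system: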

In [5]:
p38_system = get_benchmark_system('industry_benchmark_systems.mcs_docking_set', 'hne')
p38_system.__dict__

2026-01-29 11:14:56 | INFO     | Loaded system 'hne' from benchmark set 'industry_benchmark_systems.mcs_docking_set' with 5 ligand file(s), 5 cofactor file(s), and 1 network file(s).


{'name': 'hne',
 'benchmark_set': 'industry_benchmark_systems.mcs_docking_set',
 'protein': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/protein.pdb'),
 'ligands': {'openeye_am1bcc': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands_openeye_am1bcc.sdf'),
  'nagl_openff-gnn-am1bcc-1.0.0.pt': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf'),
  'no_charges': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands.sdf'),
  'antechamber_am1bcc': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands_antechamber_am1bcc.sdf'),
  'openeye_am1bccelf10': PosixPath('/Us

## Accessing System Components

The `BenchmarkSystem` object provides easy access to all components:

### System Metadata

In [6]:
print(f"System name: {p38_system.name}")
print(f"Benchmark set: {p38_system.benchmark_set}")

System name: hne
Benchmark set: industry_benchmark_systems.mcs_docking_set


### Protein Structure

In [7]:
print(f"Protein PDB file: {p38_system.protein}")
print(f"File exists: {p38_system.protein.exists()}")
print(f"File size: {p38_system.protein.stat().st_size / 1024:.2f} KB")

Protein PDB file: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/protein.pdb
File exists: True
File size: 311.10 KB


### Ligands with Different Partial Charges

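The system can contain ligands with different partial charge types. Besides the types listed in `PARTIAL_CHARGE_TYPES`, the `ligands` dictionary also has a `no_charges` entry that points to the uncharged `ligands.sdf`. Let's see what's available: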

In [8]:
print(f"Available partial charge types in this module: {PARTIAL_CHARGE_TYPES}")
print(f"\nLigand files available for the {p38_system.name} system:")
for charge_type, ligand_path in p38_system.ligands.items():
    print(f"  - {charge_type}: {ligand_path.name}")

Available partial charge types in this module: ['antechamber_am1bcc', 'nagl_openff-gnn-am1bcc-1.0.0.pt', 'openeye_am1bcc', 'openeye_am1bccelf10']

Ligand files available for the hne system:
  - openeye_am1bcc: ligands_openeye_am1bcc.sdf
  - nagl_openff-gnn-am1bcc-1.0.0.pt: ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf
  - no_charges: ligands.sdf
  - antechamber_am1bcc: ligands_antechamber_am1bcc.sdf
  - openeye_am1bccelf10: ligands_openeye_am1bccelf10.sdf
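
The listing suggests a simple naming convention: `ligands_<charge_type>.sdf` for charged ligands and plain `ligands.sdf` for the uncharged file. A minimal sketch of that mapping follows. The helper `charge_type_from_filename` is hypothetical and written only to make the convention explicit; it is not taken from the module.

```python
def charge_type_from_filename(filename: str, prefix: str = "ligands") -> str:
    """Map an SDF filename to the dictionary key it would be stored under.

    Assumed convention: '<prefix>.sdf' -> 'no_charges',
    '<prefix>_<charge_type>.sdf' -> '<charge_type>'.
    """
    suffix = ".sdf"
    if not filename.endswith(suffix):
        raise ValueError(f"Not an SDF file: {filename!r}")
    # Strip only the final '.sdf' so names like '...-1.0.0.pt.sdf' survive intact.
    stem = filename[: -len(suffix)]
    if stem == prefix:
        return "no_charges"
    if stem.startswith(prefix + "_"):
        return stem[len(prefix) + 1:]
    raise ValueError(f"{filename!r} does not start with {prefix!r}")


example_keys = {
    name: charge_type_from_filename(name)
    for name in (
        "ligands.sdf",
        "ligands_openeye_am1bcc.sdf",
        "ligands_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf",
    )
}
print(example_keys)
```

Passing `prefix="cofactors"` applies the same rule to cofactor files such as `cofactors_openeye_am1bcc.sdf`.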


### Accessing Specific Charge Type

In [9]:
# Get the path to ligands with AM1-BCC charges from antechamber
am1bcc_ligands = p38_system.ligands['antechamber_am1bcc']
print(f"AM1-BCC ligands: {am1bcc_ligands}")

# Get the path to ligands with OpenEye AM1-BCC ELF10 charges
elf10_ligands = p38_system.ligands['openeye_am1bccelf10']
print(f"ELF10 ligands: {elf10_ligands}")

AM1-BCC ligands: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands_antechamber_am1bcc.sdf
ELF10 ligands: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/mcs_docking_set/hne/ligands_openeye_am1bccelf10.sdf
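
Indexing the dictionary directly raises a `KeyError` if a system lacks the requested charge type. When looping over many systems, it can help to state an order of preference and take the first type that exists. The sketch below assumes only that `ligands` behaves like a dictionary keyed by charge type; `pick_charge_type` is a hypothetical helper and the dictionary is a stand-in for `p38_system.ligands`.

```python
def pick_charge_type(available, preferences):
    """Return the first preferred charge type that is present in `available`."""
    for charge_type in preferences:
        if charge_type in available:
            return charge_type
    raise KeyError(
        f"None of {list(preferences)} found; available: {sorted(available)}"
    )


# Stand-in for a system's `ligands` mapping (charge type -> file name).
example_ligands = {
    "no_charges": "ligands.sdf",
    "antechamber_am1bcc": "ligands_antechamber_am1bcc.sdf",
}

chosen = pick_charge_type(
    example_ligands,
    preferences=["openeye_am1bccelf10", "antechamber_am1bcc", "no_charges"],
)
print(chosen, "->", example_ligands[chosen])
```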


### Cofactors

Some systems may include cofactors. Let's check:

In [10]:
if p38_system.cofactors:
    print("Cofactor files available:")
    for charge_type, cofactor_path in p38_system.cofactors.items():
        print(f"  - {charge_type}: {cofactor_path.name}")
else:
    print("No cofactors for this system.")

Cofactor files available:
  - antechamber_am1bcc: cofactors_antechamber_am1bcc.sdf
  - openeye_am1bccelf10: cofactors_openeye_am1bccelf10.sdf
  - nagl_openff-gnn-am1bcc-1.0.0.pt: cofactors_nagl_openff-gnn-am1bcc-1.0.0.pt.sdf
  - no_charges: cofactors.sdf
  - openeye_am1bcc: cofactors_openeye_am1bcc.sdf


### Networks

Systems include network files (e.g., LOMAP networks):

In [11]:
print(f"Network files ({len(p38_system.networks)}):")
for network_path in p38_system.networks:
    print(f"  - {network_path.name}")
    print(f"    Size: {network_path.stat().st_size / 1024:.2f} KB")

Network files (1):
  - industry_benchmarks_network.json
    Size: 162.91 KB


## Working with Multiple Systems

Let's load and compare multiple systems from the same benchmark set:

In [12]:
# Load multiple systems
systems = get_benchmark_set_systems('industry_benchmark_systems.jacs_set')

# Compare them
print("System comparison:")
print(f"{'System':<15} {'Charge Types':<30} {'Networks':<10} {'Cofactors'}")
print("="*70)
for name, system in systems.items():
    charge_types = ', '.join(system.ligands.keys())
    has_cofactors = 'Yes' if system.cofactors else 'No'
    print(f"{name:<15} {charge_types:<30} {len(system.networks):<10} {has_cofactors}")

2026-01-29 11:14:56 | INFO     | Loaded system 'bace' from benchmark set 'industry_benchmark_systems.jacs_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'cdk2' from benchmark set 'industry_benchmark_systems.jacs_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'jnk1' from benchmark set 'industry_benchmark_systems.jacs_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'mcl1' from benchmark set 'industry_benchmark_systems.jacs_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'p38' from benchmark set 'industry_benchmark_systems.jacs_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file

System comparison:
System          Charge Types                   Networks   Cofactors
bace            openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
cdk2            openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
jnk1            openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
mcl1            openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
p38             openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
ptp1b           openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1          No
thrombin        openeye_am1bcc, nagl_openff-gnn-am1bcc-1.0.0.pt, no_charges, antechamber_am1bcc, openeye_am1bccelf10 1         
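
The fixed `:<30` width used for the charge-type column is narrower than the joined charge-type string, so the columns drift out of alignment. One way around that, sketched here for plain-text output, is to size each column from its longest cell. `format_table` is a hypothetical helper, and the rows are shortened stand-ins for the real values.

```python
def format_table(headers, rows):
    """Render rows as left-aligned text columns sized to their widest cell."""
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def render(cells):
        padded = (str(cell).ljust(width) for cell, width in zip(cells, widths))
        return "  ".join(padded).rstrip()

    lines = [render(headers), "  ".join("=" * width for width in widths)]
    lines.extend(render(row) for row in rows)
    return "\n".join(lines)


example_rows = [
    ("bace", "openeye_am1bcc, no_charges", 1, "No"),
    ("thrombin", "antechamber_am1bcc", 1, "No"),
]
table = format_table(["System", "Charge Types", "Networks", "Cofactors"], example_rows)
print(table)
```

In the comparison cell, the rows could be collected into a list inside the loop and passed to the helper once at the end.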

## Finding Systems with Cofactors

Let's search through all benchmark sets to find systems that include cofactors:

In [13]:
systems_with_cofactors = []

for benchmark_set in list_benchmark_sets():
    for system_name in list_systems(benchmark_set):
        system = get_benchmark_system(benchmark_set, system_name)
        if system.cofactors:
            systems_with_cofactors.append((benchmark_set, system_name, system))

if systems_with_cofactors:
    print(f"Found {len(systems_with_cofactors)} system(s) with cofactors:")
    for bset, sname, system in systems_with_cofactors:
        print(f"\n  {bset}/{sname}:")
        for charge_type in system.cofactors.keys():
            print(f"    - {charge_type} charges available")
else:
    print("No systems with cofactors found.")

2026-01-29 11:14:56 | INFO     | Loaded system 'cdk2' from benchmark set 'industry_benchmark_systems.charge_annihilation_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'dlk' from benchmark set 'industry_benchmark_systems.charge_annihilation_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'egfr' from benchmark set 'industry_benchmark_systems.charge_annihilation_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'ephx2' from benchmark set 'industry_benchmark_systems.charge_annihilation_set' with 5 ligand file(s), 0 cofactor file(s), and 1 network file(s).
2026-01-29 11:14:56 | INFO     | Loaded system 'irak4_s2' from benchmark set 'industry_benchmark_systems.charge

Found 4 system(s) with cofactors:

  industry_benchmark_systems.charge_annihilation_set/thrombin:
    - antechamber_am1bcc charges available
    - openeye_am1bccelf10 charges available
    - nagl_openff-gnn-am1bcc-1.0.0.pt charges available
    - no_charges charges available
    - openeye_am1bcc charges available

  industry_benchmark_systems.mcs_docking_set/hne:
    - antechamber_am1bcc charges available
    - openeye_am1bccelf10 charges available
    - nagl_openff-gnn-am1bcc-1.0.0.pt charges available
    - no_charges charges available
    - openeye_am1bcc charges available

  industry_benchmark_systems.merck/pfkfb3:
    - antechamber_am1bcc charges available
    - openeye_am1bccelf10 charges available
    - nagl_openff-gnn-am1bcc-1.0.0.pt charges available
    - no_charges charges available
    - openeye_am1bcc charges available

  industry_benchmark_systems.merck/tnks2:
    - antechamber_am1bcc charges available
    - openeye_am1bccelf10 charges available
    - nagl_openff-gnn-am1b

## Using with OpenFE

Here's how you would typically use these paths with OpenFE to load the actual molecular structures:

In [14]:
from openfe import ProteinComponent, SmallMoleculeComponent, LigandNetwork
from rdkit import Chem

# Load a benchmark system that has network files
# Using HNE from mcs_docking_set which has networks, cofactors, and charged ligands
system = get_benchmark_system('industry_benchmark_systems.mcs_docking_set', 'hne')

2026-01-29 11:14:59 | INFO     | Loaded system 'hne' from benchmark set 'industry_benchmark_systems.mcs_docking_set' with 5 ligand file(s), 5 cofactor file(s), and 1 network file(s).


In [15]:
# Load protein
protein = ProteinComponent.from_pdb_file(str(system.protein))
print(f"Loaded protein: {protein}")
print(f"  Number of atoms: {protein.to_rdkit().GetNumAtoms()}")

Loaded protein: ProteinComponent(name=)
  Number of atoms: 3804


In [16]:
# Check available ligand files
print(f"\nAvailable ligand files for {system.name}:")
for charge_type in system.ligands.keys():
    print(f"  - {charge_type}")

# Load ligands without charges first (most reliable)
# Then OpenFE can assign charges during setup
print(f"\nLoading ligands from: {system.ligands['no_charges'].name}")

ligand_supplier = Chem.SDMolSupplier(str(system.ligands['no_charges']), removeHs=False)
ligands = []
for i, mol in enumerate(ligand_supplier):
    if mol is not None:
        try:
            ligands.append(SmallMoleculeComponent(mol))
        except Exception as e:
            print(f"  Warning: Skipping molecule {i+1} due to error: {e}")

print(f"\nSuccessfully loaded {len(ligands)} ligands")

# Show details of first few ligands
print("\nFirst 3 ligands:")
for i, ligand in enumerate(ligands[:3], 1):
    rdkit_mol = ligand.to_rdkit()
    print(f"  {i}. {ligand.name} - {rdkit_mol.GetNumAtoms()} atoms, {rdkit_mol.GetNumBonds()} bonds")



Available ligand files for hne:
  - openeye_am1bcc
  - nagl_openff-gnn-am1bcc-1.0.0.pt
  - no_charges
  - antechamber_am1bcc
  - openeye_am1bccelf10

Loading ligands from: ligands.sdf

Successfully loaded 17 ligands

First 3 ligands:
  1. 6 - 55 atoms, 58 bonds
  2. 14 - 49 atoms, 51 bonds
  3. 15 - 56 atoms, 58 bonds


In [17]:
# Load network if available
network = None
if system.networks:
    print(f"\n{'='*60}")
    print(f"Loading network from: {system.networks[0].name}")
    
    # Load network using from_json with file parameter (pass the file path directly)
    network = LigandNetwork.from_json(file=str(system.networks[0]))
    
    print(f"  Network loaded successfully!")
    print(f"  Network nodes: {len(network.nodes)}")
    print(f"  Network edges: {len(network.edges)}")
    
    # Show some example edges
    print("\nFirst 3 network edges:")
    for i, edge in enumerate(list(network.edges)[:3], 1):
        print(f"  {i}. {edge.componentA.name} <-> {edge.componentB.name}")
        
    # Note: The network contains its own SmallMoleculeComponent objects
    # which may differ slightly from the ligands we loaded above
    print(f"\nNote: Network contains {len(network.nodes)} ligand(s) with embedded molecule data")
else:
    print("\nNo network files found for this system.")


Loading network from: industry_benchmarks_network.json
  Network loaded successfully!
  Network nodes: 17
  Network edges: 23

First 3 network edges:
  1. 6 <-> 18
  2. 18 <-> 19
  3. 6 <-> 14

Note: Network contains 17 ligand(s) with embedded molecule data
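
A common sanity check on a perturbation network is that every ligand can be reached from every other one, since a disconnected network cannot relate all ligands to each other. The sketch below performs that check with a breadth-first search over plain name pairs. It deliberately avoids the `LigandNetwork` API: the pairs are stand-ins for `(edge.componentA.name, edge.componentB.name)`, and `is_connected` is a hypothetical helper.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if every node is reachable from any other via the edges."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return True

    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(neighbours)


# The three edges printed above, as name pairs.
example_edges = [("6", "18"), ("18", "19"), ("6", "14")]
example_nodes = {"6", "14", "18", "19"}

connected = is_connected(example_nodes, example_edges)
with_orphan = is_connected(example_nodes | {"15"}, example_edges)
print(connected, with_orphan)
```

With a loaded network, the inputs could be built from `network.nodes` and `network.edges` using the `name` attribute of each component, assuming ligand names are unique.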


In [18]:
# Load cofactors if available
cofactors = []
if system.cofactors:
    print(f"\n{'='*60}")
    print(f"Loading cofactors from: {system.cofactors['no_charges'].name}")
    cofactor_supplier = Chem.SDMolSupplier(str(system.cofactors['no_charges']), removeHs=False)
    for mol in cofactor_supplier:
        if mol is not None:
            print(mol)
            try:
                cofactors.append(SmallMoleculeComponent(mol, name=mol.GetProp("s_m_entry_name")))
            except Exception as e:
                print(f"  Warning: Skipping cofactor due to error: {e}")
    print(f"Successfully loaded {len(cofactors)} cofactor(s)")



Loading cofactors from: cofactors.sdf
<rdkit.Chem.rdchem.Mol object at 0x303e72c00>
Successfully loaded 1 cofactor(s)




In [19]:
# SDF properties are read with GetProp(); they are not attributes of the Mol object
SmallMoleculeComponent(mol, name=mol.GetProp("s_m_entry_name"))

In [None]:
print(f"\n{'='*60}")
print(f"System components ready for OpenFE:")
print(f"  Protein: {protein.name if protein.name else 'unnamed'} ({system.protein.name})")
print(f"  Ligands: {len(ligands)} molecules loaded from SDF")
print(f"  Cofactors: {len(cofactors)} molecule(s)")
print(f"  Network: {len(network.edges) if network else 0} transformations")


System components ready for OpenFE:
  Protein: unnamed (protein.pdb)
  Ligands: 17 molecules loaded from SDF
  Cofactors: 1 molecule(s)
  Network: 23 transformations


## Error Handling

The module raises a `ValueError` when you try to access a non-existent benchmark set or system, and the message lists the valid alternatives:

In [None]:
# Try to load a non-existent benchmark set
try:
    system = get_benchmark_system('industry_benchmark_systems.nonexistent_set', 'p38')
except ValueError as e:
    print(f"Error: {e}")

Error: Benchmark set 'industry_benchmark_systems.nonexistent_set' not found. Available benchmark sets: ['industry_benchmark_systems.charge_annihilation_set', 'industry_benchmark_systems.fragments', 'industry_benchmark_systems.jacs_set', 'industry_benchmark_systems.janssen_bace', 'industry_benchmark_systems.mcs_docking_set', 'industry_benchmark_systems.merck', 'industry_benchmark_systems.miscellaneous_set', 'industry_benchmark_systems.water_set']


In [None]:
# Try to load a non-existent system
try:
    system = get_benchmark_system('industry_benchmark_systems.jacs_set', 'nonexistent_system')
except ValueError as e:
    print(f"Error: {e}")

Error: System 'nonexistent_system' not found in benchmark set 'industry_benchmark_systems.jacs_set'. Available systems in 'industry_benchmark_systems.jacs_set': ['bace', 'cdk2', 'jnk1', 'mcl1', 'p38', 'ptp1b', 'thrombin', 'tyk2']
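
Because the error lists every valid name, a caller can go one step further and suggest the closest match for a typo. The sketch below uses the standard library's `difflib` for that. It is an illustration layered on top of the behaviour shown here, not something the module is known to provide, and `suggest_name` is a hypothetical helper.

```python
import difflib


def suggest_name(requested, available):
    """Return the closest valid name to `requested`, or None if nothing is close."""
    matches = difflib.get_close_matches(requested, available, n=1)
    return matches[0] if matches else None


# System names as listed for the JACS set.
jacs_names = ["bace", "cdk2", "jnk1", "mcl1", "p38", "ptp1b", "thrombin", "tyk2"]

suggestion = suggest_name("thrombine", jacs_names)
no_suggestion = suggest_name("zzz", jacs_names)
print(suggestion, no_suggestion)
```

In practice the list of valid names would come from `list_systems(...)` or `list_benchmark_sets()` inside the `except ValueError` branch.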
