# Using Industry Benchmark Systems

This notebook demonstrates how to use the `industry_benchmark_systems` module to access remediated benchmark system inputs ready for use with the OpenFE toolkit.

The module provides a convenient factory method to load benchmark systems with properly organized protein structures, ligands with different partial charge types, optional cofactors, and network mappings.

## Setup

First, let's import the necessary functions and the `PARTIAL_CHARGE_TYPES` constant from the module:

In [None]:
from openfe_benchmarks.data import (
    get_benchmark_system,
    list_benchmark_sets,
    list_systems,
    PARTIAL_CHARGE_TYPES,
)

## Discovering Available Benchmark Sets

The module automatically discovers all available benchmark sets in the directory structure. Let's see what's available:

In [2]:
# List all available benchmark sets
benchmark_sets = list_benchmark_sets()
print(f"Available benchmark sets ({len(benchmark_sets)}):")
for bset in benchmark_sets:
    print(f"  - {bset}")

Available benchmark sets (4):
  - fragments
  - jacs_set
  - janssen_bace
  - mcs_docking_set
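
As a rough illustration of what directory-based discovery could look like, the sketch below walks a `<root>/<benchmark_set>/<system>/` layout with `pathlib`. It is not the module's actual implementation: the helper names `discover_benchmark_sets` and `discover_systems` are hypothetical, and the rule of skipping directories that start with `_` or `.` is an assumption.

```python
from pathlib import Path
import tempfile


def discover_benchmark_sets(root: Path) -> list[str]:
    """Return the sorted names of sub-directories of root, one per benchmark set."""
    return sorted(
        p.name
        for p in root.iterdir()
        # Assumption: private/hidden directories are not benchmark sets.
        if p.is_dir() and not p.name.startswith(("_", "."))
    )


def discover_systems(root: Path, benchmark_set: str) -> list[str]:
    """Return the sorted names of system directories inside one benchmark set."""
    set_dir = root / benchmark_set
    if not set_dir.is_dir():
        raise ValueError(f"Benchmark set '{benchmark_set}' not found.")
    return sorted(p.name for p in set_dir.iterdir() if p.is_dir())


# Demonstrate on a throwaway directory tree rather than the real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("jacs_set/p38", "jacs_set/tyk2", "fragments/mup1", "__pycache__"):
        (root / rel).mkdir(parents=True)
    demo_sets = discover_benchmark_sets(root)
    demo_systems = discover_systems(root, "jacs_set")

print(demo_sets)
print(demo_systems)
```

Sorting the names keeps the listing stable across file systems, which matches the alphabetical order seen in the output above.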


## Exploring Systems in a Benchmark Set

Each benchmark set contains multiple systems. Let's explore what's available in the `jacs_set`:

In [3]:
# List systems in the jacs_set
jacs_systems = list_systems('jacs_set')
print(f"Systems in 'jacs_set' ({len(jacs_systems)}):")
for system in jacs_systems:
    print(f"  - {system}")

Systems in 'jacs_set' (8):
  - bace
  - cdk2
  - jnk1
  - mcl1
  - p38
  - ptp1b
  - thrombin
  - tyk2


Let's also look at the `fragments` benchmark set:

In [None]:
fragment_systems = list_systems('fragments')
print(f"Systems in 'fragments' ({len(fragment_systems)}):")
for system in fragment_systems:
    print(f"  - {system}")

Systems in 'fragments' (9):
  - hsp90_2rings
  - hsp90_single_ring
  - jak2_set1
  - jak2_set2
  - liga
  - mcl1
  - mup1
  - p38
  - t4_lysozyme


## Loading a Benchmark System

Now let's load a specific benchmark system using the factory method. We'll use the P38 system from the JACS set:

In [None]:
# Load the P38 system
p38_system = get_benchmark_system('jacs_set', 'p38')
vars(p38_system)  # same dictionary as p38_system.__dict__

2026-01-21 08:49:52.891 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:get_benchmark_system:254 - Loading benchmark system 'p38' from 'jacs_set'...
2026-01-21 08:49:52.891 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:151 - Found network mapping: lomap_network.graphml
2026-01-21 08:49:52.891 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with openeye_elf10 charges: ligands_openeye_elf10.sdf
2026-01-21 08:49:52.893 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with antechamber_am1bcc charges: ligands_antechamber_am1bcc.sdf
2026-01-21 08:49:52.894 | ...

{'name': 'p38',
 'benchmark_set': 'jacs_set',
 'protein': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/protein.pdb'),
 'ligands': {'openeye_elf10': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/ligands_openeye_elf10.sdf'),
  'antechamber_am1bcc': PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/ligands_antechamber_am1bcc.sdf')},
 'cofactors': {},
 'mappings': [PosixPath('/Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/lomap_network.graphml')]}

## Accessing System Components

The `BenchmarkSystem` object provides easy access to all components:

### System Metadata

In [None]:
print(f"System name: {p38_system.name}")
print(f"Benchmark set: {p38_system.benchmark_set}")

System name: p38
Benchmark set: jacs_set


### Protein Structure

In [None]:
print(f"Protein PDB file: {p38_system.protein}")
print(f"File exists: {p38_system.protein.exists()}")
print(f"File size: {p38_system.protein.stat().st_size / 1024:.2f} KB")

Protein PDB file: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/protein.pdb
File exists: True
File size: 547.42 KB


### Ligands with Different Partial Charges

The system can contain ligands with different partial charge types. Let's see what's available:

In [None]:
print(f"Available partial charge types in this module: {PARTIAL_CHARGE_TYPES}")
print("\nLigand files available for P38 system:")
for charge_type, ligand_path in p38_system.ligands.items():
    print(f"  - {charge_type}: {ligand_path.name}")

Available partial charge types in this module: ['antechamber_am1bcc', 'openeye_elf10']

Ligand files available for P38 system:
  - openeye_elf10: ligands_openeye_elf10.sdf
  - antechamber_am1bcc: ligands_antechamber_am1bcc.sdf
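
The file names above follow a `ligands_<charge_type>.sdf` pattern. The sketch below shows one way such names could be mapped back to charge types. The helper `index_by_charge_type` and the constant `KNOWN_CHARGE_TYPES` are hypothetical and only mirror the values printed above; the module's own validation logic may differ.

```python
from pathlib import Path

# Assumption: mirrors the PARTIAL_CHARGE_TYPES values printed above.
KNOWN_CHARGE_TYPES = ["antechamber_am1bcc", "openeye_elf10"]


def index_by_charge_type(files, prefix="ligands"):
    """Map charge type -> Path for files named '<prefix>_<charge_type>.sdf'."""
    found = {}
    for path in map(Path, files):
        stem = path.stem
        # Ignore anything that is not an SDF file with the expected prefix.
        if path.suffix != ".sdf" or not stem.startswith(prefix + "_"):
            continue
        charge_type = stem[len(prefix) + 1:]
        if charge_type in KNOWN_CHARGE_TYPES:
            found[charge_type] = path
    return found


demo_files = [
    "ligands_openeye_elf10.sdf",
    "ligands_antechamber_am1bcc.sdf",
    "cofactors_openeye_elf10.sdf",
    "protein.pdb",
]
demo_index = index_by_charge_type(demo_files)
for charge_type, path in sorted(demo_index.items()):
    print(f"{charge_type}: {path.name}")
```

Passing `prefix="cofactors"` applies the same pattern to cofactor files.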


### Accessing Specific Charge Type

In [None]:
# Get the path to ligands with AM1-BCC charges
am1bcc_ligands = p38_system.ligands['antechamber_am1bcc']
print(f"AM1-BCC ligands: {am1bcc_ligands}")

# Get the path to ligands with ELF10 charges
elf10_ligands = p38_system.ligands['openeye_elf10']
print(f"ELF10 ligands: {elf10_ligands}")

AM1-BCC ligands: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/ligands_antechamber_am1bcc.sdf
ELF10 ligands: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/ligands_openeye_elf10.sdf
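
Because a system is not guaranteed to ship every charge type, indexing `ligands` with a fixed key can raise `KeyError`. A minimal sketch of a fallback lookup follows; `pick_ligand_file` is a hypothetical helper and the preference order is an arbitrary assumption, not a recommendation from the module.

```python
from pathlib import Path


def pick_ligand_file(ligands, preferred=("openeye_elf10", "antechamber_am1bcc")):
    """Return (charge_type, path) for the first preferred charge type present."""
    for charge_type in preferred:
        if charge_type in ligands:
            return charge_type, ligands[charge_type]
    raise KeyError(
        f"None of {list(preferred)} found; available: {sorted(ligands)}"
    )


# Stand-in for `p38_system.ligands`, with only one charge type present.
demo_ligands = {"antechamber_am1bcc": Path("ligands_antechamber_am1bcc.sdf")}
demo_choice = pick_ligand_file(demo_ligands)
print(f"Using {demo_choice[0]} charges from {demo_choice[1].name}")
```

With a loaded system, the call would be `pick_ligand_file(p38_system.ligands)`.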


### Cofactors

Some systems may include cofactors. Let's check:

In [None]:
if p38_system.cofactors:
    print("Cofactor files available:")
    for charge_type, cofactor_path in p38_system.cofactors.items():
        print(f"  - {charge_type}: {cofactor_path.name}")
else:
    print("No cofactors for this system.")

No cofactors for this system.


### Network Mappings

Systems include network mapping files (e.g., LOMAP networks):

In [None]:
print(f"Network mapping files ({len(p38_system.mappings)}):")
for mapping_path in p38_system.mappings:
    print(f"  - {mapping_path.name}")
    print(f"    Size: {mapping_path.stat().st_size / 1024:.2f} KB")

Network mapping files (1):
  - lomap_network.graphml
    Size: 269.85 KB
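
For a quick look inside a GraphML file without installing OpenFE or gufe, the standard library's XML parser is enough to count nodes and edges. The sketch below assumes the file uses the standard GraphML namespace (`http://graphml.graphdrawing.org/xmlns`); `count_graphml_elements` is a hypothetical helper, and the inline `demo_graphml` string is a toy example, not content from `lomap_network.graphml`.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the standard GraphML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def count_graphml_elements(graphml_text: str) -> tuple[int, int]:
    """Return (number of nodes, number of edges) found in a GraphML document."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# Toy document: three ligands connected by two transformations.
demo_graphml = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="demo" edgedefault="directed">
    <node id="0"/>
    <node id="1"/>
    <node id="2"/>
    <edge source="0" target="1"/>
    <edge source="1" target="2"/>
  </graph>
</graphml>"""

n_nodes, n_edges = count_graphml_elements(demo_graphml)
print(f"{n_nodes} nodes, {n_edges} edges")
```

For a real system, the text would come from `mapping_path.read_text()` for each path in `p38_system.mappings`.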


## Working with Multiple Systems

Let's load and compare multiple systems from the same benchmark set:

In [None]:
# Load multiple systems
systems = {}
for system_name in ['p38', 'tyk2', 'ptp1b']:
    systems[system_name] = get_benchmark_system('jacs_set', system_name)

# Compare them
print("System comparison:")
print(f"{'System':<15} {'Charge Types':<36} {'Mappings':<10} {'Cofactors'}")
print("=" * 73)
for name, system in systems.items():
    charge_types = ', '.join(system.ligands.keys())
    has_cofactors = 'Yes' if system.cofactors else 'No'
    print(f"{name:<15} {charge_types:<36} {len(system.mappings):<10} {has_cofactors}")

2026-01-21 08:49:24.710 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:get_benchmark_system:254 - Loading benchmark system 'p38' from 'jacs_set'...
2026-01-21 08:49:24.711 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:151 - Found network mapping: lomap_network.graphml
2026-01-21 08:49:24.711 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with openeye_elf10 charges: ligands_openeye_elf10.sdf
2026-01-21 08:49:24.711 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with antechamber_am1bcc charges: ligands_antechamber_am1bcc.sdf
2026-01-21 08:49:24.711 | ...

System comparison:
System          Charge Types                         Mappings   Cofactors
=========================================================================
p38             openeye_elf10, antechamber_am1bcc    1          No
tyk2            openeye_elf10, antechamber_am1bcc    1          No
ptp1b           openeye_elf10, antechamber_am1bcc    1          No


## Finding Systems with Cofactors

Let's search through all benchmark sets to find systems that include cofactors:

In [None]:
systems_with_cofactors = []

for benchmark_set in list_benchmark_sets():
    for system_name in list_systems(benchmark_set):
        system = get_benchmark_system(benchmark_set, system_name)
        if system.cofactors:
            systems_with_cofactors.append((benchmark_set, system_name, system))

if systems_with_cofactors:
    print(f"Found {len(systems_with_cofactors)} system(s) with cofactors:")
    for bset, sname, system in systems_with_cofactors:
        print(f"\n  {bset}/{sname}:")
        for charge_type in system.cofactors:
            print(f"    - {charge_type} charges available")
else:
    print("No systems with cofactors found.")

2026-01-21 08:49:24.767 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:get_benchmark_system:254 - Loading benchmark system 'hsp90_2rings' from 'fragments'...
2026-01-21 08:49:24.767 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:151 - Found network mapping: lomap_network.graphml
2026-01-21 08:49:24.767 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with openeye_elf10 charges: ligands_openeye_elf10.sdf
2026-01-21 08:49:24.768 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with antechamber_am1bcc charges: ligands_antechamber_am1bcc.sdf
2026-01-21 08:49:24.768 ...

Found 1 system(s) with cofactors:

  mcs_docking_set/hne:
    - antechamber_am1bcc charges available
    - openeye_elf10 charges available


## Using with OpenFE

Here's how you would typically use these paths with OpenFE to load the actual molecular structures:

In [None]:
# Example (uncomment if you have OpenFE and RDKit installed):
# from openfe import ProteinComponent, SmallMoleculeComponent
# from rdkit import Chem

# Load the benchmark system
system = get_benchmark_system('jacs_set', 'p38')

# Load protein
# protein = ProteinComponent.from_pdb_file(str(system.protein))

# Load ligands with AM1-BCC charges (removeHs=False keeps the explicit hydrogens)
# ligand_supplier = Chem.SDMolSupplier(str(system.ligands['antechamber_am1bcc']), removeHs=False)
# ligands = [SmallMoleculeComponent(mol) for mol in ligand_supplier if mol is not None]

print("System ready to use with OpenFE:")
print(f"  Protein: {system.protein}")
print(f"  Ligands: {system.ligands['antechamber_am1bcc']}")
print(f"  Network: {system.mappings[0] if system.mappings else 'N/A'}")

2026-01-21 08:49:24.834 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:get_benchmark_system:254 - Loading benchmark system 'p38' from 'jacs_set'...
2026-01-21 08:49:24.834 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:151 - Found network mapping: lomap_network.graphml
2026-01-21 08:49:24.835 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with openeye_elf10 charges: ligands_openeye_elf10.sdf
2026-01-21 08:49:24.835 | DEBUG    | openfe_benchmarks.data.industry_benchmark_systems:_validate_and_load_system:128 - Found ligands with antechamber_am1bcc charges: ligands_antechamber_am1bcc.sdf
2026-01-21 08:49:24.835 | ...

System ready to use with OpenFE:
  Protein: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/protein.pdb
  Ligands: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/ligands_antechamber_am1bcc.sdf
  Network: /Users/jenniferclark/bin/openfe-benchmarks/openfe_benchmarks/data/industry_benchmark_systems/jacs_set/p38/lomap_network.graphml


## Error Handling

The module provides helpful error messages when you try to access non-existent benchmark sets or systems:

In [None]:
# Try to load a non-existent benchmark set
try:
    system = get_benchmark_system('nonexistent_set', 'p38')
except ValueError as e:
    print(f"Error: {e}")

Error: Benchmark set 'nonexistent_set' not found. Available benchmark sets: ['fragments', 'jacs_set', 'janssen_bace', 'mcs_docking_set']


In [None]:
# Try to load a non-existent system
try:
    system = get_benchmark_system('jacs_set', 'nonexistent_system')
except ValueError as e:
    print(f"Error: {e}")

Error: System 'nonexistent_system' not found in benchmark set 'jacs_set'. Available systems in 'jacs_set': ['bace', 'cdk2', 'jnk1', 'mcl1', 'p38', 'ptp1b', 'thrombin', 'tyk2']
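
When looping over many systems, it can be convenient to collect these `ValueError`s instead of stopping at the first one. The sketch below shows that pattern with a stand-in loader so it runs without the package installed. `load_all` and `fake_loader` are hypothetical names; in a real session `get_benchmark_system` would be passed as the loader.

```python
def load_all(loader, requests):
    """Call loader(benchmark_set, system_name) for each request.

    Returns (loaded, failures): two dicts keyed by (benchmark_set, system_name),
    holding the loaded objects and the error messages respectively.
    """
    loaded, failures = {}, {}
    for benchmark_set, system_name in requests:
        try:
            loaded[(benchmark_set, system_name)] = loader(benchmark_set, system_name)
        except ValueError as err:
            failures[(benchmark_set, system_name)] = str(err)
    return loaded, failures


def fake_loader(benchmark_set, system_name):
    """Stand-in for get_benchmark_system that only knows two systems."""
    known = {"jacs_set": {"p38", "tyk2"}}
    if benchmark_set not in known:
        raise ValueError(f"Benchmark set '{benchmark_set}' not found.")
    if system_name not in known[benchmark_set]:
        raise ValueError(
            f"System '{system_name}' not found in benchmark set '{benchmark_set}'."
        )
    return f"{benchmark_set}/{system_name}"


demo_loaded, demo_failures = load_all(
    fake_loader,
    [("jacs_set", "p38"), ("jacs_set", "nonexistent_system"), ("nonexistent_set", "p38")],
)
print(f"Loaded: {list(demo_loaded)}")
for key, message in demo_failures.items():
    print(f"Failed {key}: {message}")
```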


## Summary

The `industry_benchmark_systems` module provides:

1. **Easy discovery** of available benchmark sets and systems
2. **Organized access** to protein structures, ligands with different charge types, cofactors, and network mappings
3. **Validation** to ensure all required files are present and properly named
4. **Helpful error messages** when things go wrong

### Key Functions:
- `list_benchmark_sets()` - List all available benchmark sets
- `list_systems(benchmark_set)` - List all systems in a benchmark set
- `get_benchmark_system(benchmark_set, system_name)` - Load a specific benchmark system

### BenchmarkSystem Attributes:
- `name` - System name
- `benchmark_set` - Benchmark set name
- `protein` - Path to protein.pdb
- `ligands` - Dictionary of charge_type: path to ligands_<charge_type>.sdf
- `cofactors` - Dictionary of charge_type: path to cofactors_<charge_type>.sdf (may be empty)
- `mappings` - List of paths to network mapping files
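
To make the attribute list concrete, here is a minimal sketch of a container with the same fields. The class name `BenchmarkSystemSketch` is hypothetical and the real `BenchmarkSystem` may be defined differently; only the field names and their documented meanings are taken from the list above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class BenchmarkSystemSketch:
    """Illustrative stand-in mirroring the documented BenchmarkSystem attributes."""

    name: str
    benchmark_set: str
    protein: Path
    ligands: dict[str, Path]  # charge_type -> ligands_<charge_type>.sdf
    cofactors: dict[str, Path] = field(default_factory=dict)  # may be empty
    mappings: list[Path] = field(default_factory=list)  # network mapping files


# Relative placeholder paths; a loaded system holds absolute paths instead.
demo_system = BenchmarkSystemSketch(
    name="p38",
    benchmark_set="jacs_set",
    protein=Path("jacs_set/p38/protein.pdb"),
    ligands={"openeye_elf10": Path("jacs_set/p38/ligands_openeye_elf10.sdf")},
)
print(demo_system.name, sorted(demo_system.ligands), bool(demo_system.cofactors))
```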

## Visualizing Ligand Structures

Let's visualize the ligands in a system using RDKit:

In [None]:
# Example of visualizing ligands (uncomment if you have RDKit installed):
# from rdkit import Chem
# from rdkit.Chem import AllChem, Draw

# Load a system
system = get_benchmark_system('jacs_set', 'tyk2')

# Load ligands with AM1-BCC charges
# ligand_supplier = Chem.SDMolSupplier(str(system.ligands['antechamber_am1bcc']), removeHs=False)
# ligands = [mol for mol in ligand_supplier if mol is not None]

# Generate 2D coordinates for visualization
# for ligand in ligands:
#     AllChem.Compute2DCoords(ligand)

# Display as grid
# Draw.MolsToGridImage(ligands[:12], molsPerRow=4, subImgSize=(200, 200))

print(f"System: {system.name}")
print(f"Ligand file: {system.ligands['antechamber_am1bcc']}")
print(f"File size: {system.ligands['antechamber_am1bcc'].stat().st_size / 1024:.2f} KB")

## Loading Network Files

The benchmark systems include pre-computed network mapping files (GraphML format). Let's see how to load and inspect them:

In [None]:
# Example of loading a network file (uncomment if you have OpenFE/gufe installed):
# from gufe import LigandNetwork

# Load a system with network mappings
system = get_benchmark_system('jacs_set', 'p38')

if system.mappings:
    print(f"Found {len(system.mappings)} network mapping file(s):")
    for mapping_file in system.mappings:
        print(f"  - {mapping_file.name}")

        # To load the network:
        # with open(mapping_file, 'r') as f:
        #     graphml_str = f.read()
        # network = LigandNetwork.from_graphml(graphml_str)
        # print(f"    Network has {len(network.nodes)} nodes and {len(network.edges)} edges")
else:
    print("No network mapping files found for this system.")