# Benchmarking Experiments

This notebook provides a detailed overview of the benchmarking experiments and illustrates how the StereoMapper tool is applied to each dataset to generate the corresponding relationship-assignment results.

Note: As the pipeline's source code is subject to ongoing development, minor variations in results may occur between different versions of StereoMapper.

## Enantiomer benchmarking

In [None]:
import json
import logging
import os
import re
import sqlite3
import subprocess
from pathlib import Path

import pandas as pd

logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [3]:
## some functions for use in the notebook

def norm_chebi(x: str) -> str:
    """Return a canonical CHEBI_###### string from inputs like 'chebi:123', 'CHEBI_123', '123'."""
    s = str(x).strip()
    m = re.search(r'(\d+)$', s)  # grab trailing digits
    if not m:
        return s.upper()  # fallback: just uppercase unknowns
    return f"CHEBI_{m.group(1)}"

def safe_load_members(x):
    if isinstance(x, (list, tuple)):
        return x
    try:
        return json.loads(x)
    except Exception:
        return x  # already a string or malformed

def canonical_chebi_id(members):
    """Return one canonical ID like 'CHEBI_12345' for a single identifier or a list of identifiers."""
    if not members:
        return ""
    # a list: normalise every string member and pick the first in sorted order (deterministic)
    if isinstance(members, (list, tuple)):
        candidates = sorted(norm_chebi(m) for m in members if isinstance(m, str))
        return candidates[0] if candidates else ""
    # a single string: reuse norm_chebi so that 'chebi:123' also maps to 'CHEBI_123'
    return norm_chebi(members)

def canonical_pairkey(a, b):
    # a,b are identifiers or lists; normalise to strings then sort lexicographically
    ka = canonical_chebi_id(a)
    kb = canonical_chebi_id(b)
    if not ka and not kb:
        return ""
    # sort so order doesn't matter
    left, right = sorted([ka, kb])
    return f"{left}__{right}"
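The order-invariant pair key is what lets the control set and the pipeline output be compared later on. A minimal, self-contained sketch of the idea follows; `norm_id` and `pair_key` are illustrative names that mirror `norm_chebi` above rather than functions from StereoMapper, and the identifiers are taken from the control set only as examples.

```python
import re


def norm_id(x) -> str:
    """Map 'chebi:123', 'CHEBI:123', 'CHEBI_123' or '123' to 'CHEBI_123'."""
    s = str(x).strip()
    m = re.search(r"(\d+)$", s)  # trailing digits carry the accession number
    return f"CHEBI_{m.group(1)}" if m else s.upper()


def pair_key(a, b) -> tuple:
    """Order-invariant key: the same tuple whichever way round the pair is given."""
    return tuple(sorted((norm_id(a), norm_id(b))))


key_forward = pair_key("chebi:43796", "CHEBI_30314")
key_reverse = pair_key("CHEBI:30314", "43796")
print(key_forward, key_forward == key_reverse)
```

The sort is lexicographic on the strings, not numeric, which is why a key such as `(CHEBI_195630, CHEBI_76640)` puts the larger accession number first. That is harmless as long as both sides of the comparison use the same rule.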

In [4]:
# import csv with control pairings
control_pairs = pd.read_csv('data/enantiomer_control_set.csv')
control_pairs

Unnamed: 0,id1,id2,label
0,CHEBI_43796,CHEBI_30314,Enantiomers
1,CHEBI_30314,CHEBI_43796,Enantiomers
2,CHEBI_15570,CHEBI_16977,Enantiomers
3,CHEBI_16977,CHEBI_15570,Enantiomers
4,CHEBI_32433,CHEBI_32437,Enantiomers
...,...,...,...
2809,CHEBI_235379,CHEBI_235380,Enantiomers
2810,CHEBI_76640,CHEBI_195630,Enantiomers
2811,CHEBI_195630,CHEBI_76640,Enantiomers
2812,CHEBI_235487,CHEBI_76457,Enantiomers


In [None]:
# 1) Canonicalize + dedupe control set
dfc = control_pairs.copy()
dfc["id1_c"] = dfc["id1"].map(norm_chebi)
dfc["id2_c"] = dfc["id2"].map(norm_chebi)
# order-invariant canonical pair (tuple sorted)
dfc["pair_key"] = dfc.apply(lambda r: tuple(sorted((r["id1_c"], r["id2_c"]))), axis=1)
# keep one row per unordered pair (optionally verify labels agree before dropping)
dfc_dedup = dfc.drop_duplicates(subset=["pair_key"]).reset_index(drop=True)
control_keys = set(dfc_dedup["pair_key"])

dfc_dedup

Unnamed: 0,id1,id2,label,id1_c,id2_c,pair_key
0,CHEBI_43796,CHEBI_30314,Enantiomers,CHEBI_43796,CHEBI_30314,"(CHEBI_30314, CHEBI_43796)"
1,CHEBI_15570,CHEBI_16977,Enantiomers,CHEBI_15570,CHEBI_16977,"(CHEBI_15570, CHEBI_16977)"
2,CHEBI_32433,CHEBI_32437,Enantiomers,CHEBI_32433,CHEBI_32437,"(CHEBI_32433, CHEBI_32437)"
3,CHEBI_32447,CHEBI_32452,Enantiomers,CHEBI_32447,CHEBI_32452,"(CHEBI_32447, CHEBI_32452)"
4,CHEBI_17561,CHEBI_16375,Enantiomers,CHEBI_17561,CHEBI_16375,"(CHEBI_16375, CHEBI_17561)"
...,...,...,...,...,...,...
1402,CHEBI_234521,CHEBI_234520,Enantiomers,CHEBI_234521,CHEBI_234520,"(CHEBI_234520, CHEBI_234521)"
1403,CHEBI_234547,CHEBI_234545,Enantiomers,CHEBI_234547,CHEBI_234545,"(CHEBI_234545, CHEBI_234547)"
1404,CHEBI_235380,CHEBI_235379,Enantiomers,CHEBI_235380,CHEBI_235379,"(CHEBI_235379, CHEBI_235380)"
1405,CHEBI_76640,CHEBI_195630,Enantiomers,CHEBI_76640,CHEBI_195630,"(CHEBI_195630, CHEBI_76640)"
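The comment in the cell above mentions optionally verifying that labels agree before dropping duplicates. One way this could be done is sketched below on toy data; the frame, its values and the string form of the key are assumptions for illustration (a string key is used instead of a tuple only to keep the grouped index simple).

```python
import pandas as pd

# Toy control set: pair (1, 2) is listed twice with the same label,
# pair (3, 4) is listed twice with conflicting labels.
toy = pd.DataFrame({
    "id1": ["CHEBI_1", "CHEBI_2", "CHEBI_3", "CHEBI_4"],
    "id2": ["CHEBI_2", "CHEBI_1", "CHEBI_4", "CHEBI_3"],
    "label": ["Enantiomers", "Enantiomers", "Enantiomers", "Diastereomers"],
})
toy["pair_key"] = ["__".join(sorted(p)) for p in zip(toy["id1"], toy["id2"])]

# Count distinct labels per unordered pair; more than one means a conflict
labels_per_pair = toy.groupby("pair_key")["label"].nunique()
conflicting = labels_per_pair[labels_per_pair > 1].index.tolist()
print(conflicting)
```

If `conflicting` is empty for the real control set, `drop_duplicates` loses no information; otherwise the listed pairs would need manual review before deduplication.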


In [10]:
# run stereomapper as a subprocess
enantiomers = Path('benchmarking_data/enantiomer_benchmark_data') # downloadable from Zenodo (DOI: 10.5281/zenodo.17831412)
results = Path('enantiomer_benchmark_results.sqlite') 
cache_path = Path('enantiomer_benchmark_cache.sqlite')

In [11]:
cmd =[
    "stereomapper",
    "run",
    "-d", enantiomers.as_posix(),
    "-o", results.as_posix(),
    "-p", cache_path.as_posix(),
    "--fresh-cache"
]

subprocess.run(cmd) 

INFO    Logging initialised. File: logs/stereomapper_20251210_123713.log
Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:11<00:00]


✅ Pipeline completed in 11.2s
📦 Inputs attempted: 2,504 (skipped 0)
📊 Successes: 2,502 | Failures: 2
🔗 Inchikey groups — processed 931, skipped 0, failed 0
🧮 Relationship rows: 3,618
🧾 Unique inchikeys observed: 931
💾 Cache hit rate: 0.0%


Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:11<00:00]


CompletedProcess(args=['stereomapper', 'run', '-d', '/home/jackmcgoldrick/Downloads/benchmarking_data/enantiomer_benchmark_data', '-o', '/home/jackmcgoldrick/enantiomer_benchmark_results.sqlite', '-p', '/home/jackmcgoldrick/enantiomer_benchmark_cache.sqlite', '--fresh-cache'], returncode=0)
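The same command is assembled for each benchmark in this notebook, so it may help to build it in one place. The sketch below uses only the flags that appear in the cell above (`-d`, `-o`, `-p`, `--fresh-cache`); the helper names `build_stereomapper_cmd` and `run_stereomapper` are hypothetical and not part of StereoMapper.

```python
import subprocess
from pathlib import Path


def build_stereomapper_cmd(data_dir, results_db, cache_db, fresh_cache=True):
    """Assemble the argument list used in this notebook for `stereomapper run`."""
    cmd = [
        "stereomapper",
        "run",
        "-d", Path(data_dir).as_posix(),
        "-o", Path(results_db).as_posix(),
        "-p", Path(cache_db).as_posix(),
    ]
    if fresh_cache:
        cmd.append("--fresh-cache")
    return cmd


def run_stereomapper(data_dir, results_db, cache_db):
    """Run the pipeline; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_stereomapper_cmd(data_dir, results_db, cache_db), check=True)


# Only the command is built here; nothing is executed
cmd = build_stereomapper_cmd(
    "benchmarking_data/enantiomer_benchmark_data",
    "enantiomer_benchmark_results.sqlite",
    "enantiomer_benchmark_cache.sqlite",
)
print(cmd)
```

Passing each benchmark's own cache path explicitly avoids accidentally reusing a variable left over from an earlier cell.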

In [14]:
conn = sqlite3.connect(results)
merged_df_q = """ 
SELECT * from relationships;
"""
df_merged = pd.read_sql_query(merged_df_q, conn)
df_merged

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
0,1,2,"[""chebi:17521""]","[""chebi:36124""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
1,3,4,"[""chebi:137507""]","[""chebi:137513""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
2,5,6,"[""chebi:134198""]","[""chebi:134199""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3,7,8,"[""chebi:28651""]","[""chebi:27702""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
4,9,10,"[""chebi:145480""]","[""chebi:145483""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
...,...,...,...,...,...,...,...,...,...,...,...
3613,2456,2458,"[""chebi:133313""]","[""chebi:133312""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3614,2457,2458,"[""chebi:133311""]","[""chebi:133312""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3615,2459,2460,"[""chebi:233960""]","[""chebi:233959""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3616,2461,2462,"[""chebi:142550""]","[""chebi:9399""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
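The connection opened above is never closed. A possible pattern is to wrap the read in `contextlib.closing`, as in this sketch. The helper name `load_relationships` is hypothetical, and the toy table contains only three of the columns shown above.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_relationships(db_path):
    """Read the whole `relationships` table and close the connection afterwards."""
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query("SELECT * FROM relationships;", conn)


# Build a throwaway database so the sketch runs without StereoMapper output
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "toy_results.sqlite"
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE relationships "
            "(cluster_a_members TEXT, cluster_b_members TEXT, classification TEXT)"
        )
        conn.execute(
            "INSERT INTO relationships VALUES (?, ?, ?)",
            ('["chebi:1"]', '["chebi:2"]', "Enantiomers"),
        )
        conn.commit()
    toy_df = load_relationships(db_path)

print(toy_df)
```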


In [17]:
expanded_keys = set()
for _, row in df_merged.iterrows():
    if not row["cluster_a_members"] or not row["cluster_b_members"]:
        continue
    m1s = [norm_chebi(x) for x in json.loads(row["cluster_a_members"])]
    m2s = [norm_chebi(x) for x in json.loads(row["cluster_b_members"])]
    for a in m1s:
        for b in m2s:
            if a == b:
                continue
            expanded_keys.add(tuple(sorted((a, b))))

# --- Overlap ---
tp = control_keys & expanded_keys
fp = expanded_keys - control_keys
fn = control_keys - expanded_keys

print(f"True Positives: {len(tp)}")
print(f"False Positives: {len(fp)}")
print(f"False Negatives: {len(fn)}")


True Positives: 1243
False Positives: 2494
False Negatives: 164


These counts are not a true representation of the results: the 2494 false positives are mostly off-target relationships, that is, pairs that StereoMapper reports but that are not part of the control set. The predictions therefore need to be mapped back onto the original pairs stored in the control dataset.
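To illustrate how off-target pairs arise and how restricting to the control pairs removes them, here is a small sketch on toy rows. All identifiers and the helper names `norm_id` and `expand_pairs` are made up for illustration; the row layout mimics the `cluster_a_members`, `cluster_b_members` and `classification` columns.

```python
import json
import re
from itertools import product


def norm_id(x) -> str:
    """Normalise 'chebi:123' style identifiers to 'CHEBI_123'."""
    m = re.search(r"(\d+)$", str(x).strip())
    return f"CHEBI_{m.group(1)}" if m else str(x).strip().upper()


def expand_pairs(members_a_json, members_b_json):
    """All unordered member pairs between two clusters, skipping self-pairs."""
    a_ids = [norm_id(m) for m in json.loads(members_a_json)]
    b_ids = [norm_id(m) for m in json.loads(members_b_json)]
    return {tuple(sorted((a, b))) for a, b in product(a_ids, b_ids) if a != b}


toy_control = {("CHEBI_1", "CHEBI_2")}
toy_rows = [
    ('["chebi:1"]', '["chebi:2"]', "Enantiomers"),    # pair is in the control set
    ('["chebi:1"]', '["chebi:3"]', "Diastereomers"),  # off-target relationship
]

on_target, off_target = [], []
for a_json, b_json, classification in toy_rows:
    for pair in expand_pairs(a_json, b_json):
        bucket = on_target if pair in toy_control else off_target
        bucket.append((pair, classification))

print(on_target)
print(off_target)
```

Only the `on_target` rows say anything about how the control pairs were classified; the `off_target` rows are additional relationships among the same molecules and should not be scored against this control set.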

In [18]:
valid_pairs = control_keys

df_pred_filtered = []
for _, row in df_merged.iterrows():
    members_1 = [norm_chebi(x) for x in json.loads(row["cluster_a_members"])]
    members_2 = [norm_chebi(x) for x in json.loads(row["cluster_b_members"])]
    for a in members_1:
        for b in members_2:
            pair = tuple(sorted((a, b)))
            if pair in valid_pairs:
                df_pred_filtered.append({"id1": a, "id2": b, **row.to_dict()})

df_pred_filtered = pd.DataFrame(df_pred_filtered)

In [19]:
df_pred_filtered = df_pred_filtered.copy()
df_pred_filtered["pair_key"] = df_pred_filtered.apply(
    lambda r: tuple(sorted([norm_chebi(r["id1"]), norm_chebi(r["id2"])])), axis=1
)

In [22]:
df_eval = pd.merge(
    df_pred_filtered,
    dfc_dedup[["pair_key", "label"]],
    on="pair_key",
    how="left"
)

Dataframe `df_eval` now contains the original pairs of identifiers together with their predictions from StereoMapper. This dataframe is used to calculate the precision, recall and F1 score on the enantiomer control dataset.

In [24]:
# Only rows with a prediction are evaluated. Control pairs that received no prediction
# are not present in df_eval, so FN is expected to be 0 here; state this in the results section.
df_with_pred = df_eval.dropna(subset=["classification"])

tp = (df_with_pred["classification"] == df_with_pred["label"]).sum()
fp = (df_with_pred["classification"] != df_with_pred["label"]).sum()
fn = df_eval["classification"].isna().sum()

print(f"TP: {tp}, FP: {fp}, FN: {fn}")

TP: 1184, FP: 59, FN: 0


In [25]:
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 score:  {f1:.3f}")


Precision: 0.953
Recall:    1.000
F1 score:  0.976


These are strong results. Let's investigate the false positives to see what went wrong and whether the pipeline can be improved.

In [26]:
df_fp = df_with_pred[df_with_pred["classification"] != df_with_pred["label"]]
df_fp

Unnamed: 0,id1,id2,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pair_key,label
0,CHEBI_17521,CHEBI_36124,1,2,"[""chebi:17521""]","[""chebi:36124""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_17521, CHEBI_36124)",Enantiomers
117,CHEBI_27374,CHEBI_27372,232,233,"[""chebi:27374""]","[""chebi:27372""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_27372, CHEBI_27374)",Enantiomers
184,CHEBI_140637,CHEBI_136698,364,365,"[""chebi:140637""]","[""chebi:136698""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_136698, CHEBI_140637)",Enantiomers
202,CHEBI_28548,CHEBI_37209,400,401,"[""chebi:28548""]","[""chebi:37209""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_28548, CHEBI_37209)",Enantiomers
203,CHEBI_16002,CHEBI_47537,402,404,"[""chebi:16002""]","[""chebi:47537""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_16002, CHEBI_47537)",Enantiomers
206,CHEBI_21101,CHEBI_21398,406,409,"[""chebi:21101""]","[""chebi:21398""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_21101, CHEBI_21398)",Enantiomers
209,CHEBI_37546,CHEBI_37547,412,416,"[""chebi:37546""]","[""chebi:37547""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_37546, CHEBI_37547)",Enantiomers
210,CHEBI_21100,CHEBI_21397,413,417,"[""chebi:21100""]","[""chebi:21397""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_21100, CHEBI_21397)",Enantiomers
230,CHEBI_17924,CHEBI_28789,456,457,"[""chebi:17924""]","[""chebi:28789""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_17924, CHEBI_28789)",Enantiomers
232,CHEBI_134311,CHEBI_192698,459,461,"[""chebi:134311""]","[""chebi:192698""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_134311, CHEBI_192698)",Enantiomers


In [27]:
df_fp['classification'].value_counts()

classification
Diastereomers              37
Stereo-resolution pairs    16
Unresolved                  5
Unclassified                1
Name: count, dtype: int64

In [28]:
df_fp_dia = df_fp[df_fp['classification'] == "Diastereomers"]
df_fp_dia

Unnamed: 0,id1,id2,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pair_key,label
0,CHEBI_17521,CHEBI_36124,1,2,"[""chebi:17521""]","[""chebi:36124""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_17521, CHEBI_36124)",Enantiomers
117,CHEBI_27374,CHEBI_27372,232,233,"[""chebi:27374""]","[""chebi:27372""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_27372, CHEBI_27374)",Enantiomers
184,CHEBI_140637,CHEBI_136698,364,365,"[""chebi:140637""]","[""chebi:136698""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_136698, CHEBI_140637)",Enantiomers
202,CHEBI_28548,CHEBI_37209,400,401,"[""chebi:28548""]","[""chebi:37209""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_28548, CHEBI_37209)",Enantiomers
203,CHEBI_16002,CHEBI_47537,402,404,"[""chebi:16002""]","[""chebi:47537""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_16002, CHEBI_47537)",Enantiomers
206,CHEBI_21101,CHEBI_21398,406,409,"[""chebi:21101""]","[""chebi:21398""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_21101, CHEBI_21398)",Enantiomers
209,CHEBI_37546,CHEBI_37547,412,416,"[""chebi:37546""]","[""chebi:37547""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_37546, CHEBI_37547)",Enantiomers
210,CHEBI_21100,CHEBI_21397,413,417,"[""chebi:21100""]","[""chebi:21397""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_21100, CHEBI_21397)",Enantiomers
230,CHEBI_17924,CHEBI_28789,456,457,"[""chebi:17924""]","[""chebi:28789""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_17924, CHEBI_28789)",Enantiomers
232,CHEBI_134311,CHEBI_192698,459,461,"[""chebi:134311""]","[""chebi:192698""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_134311, CHEBI_192698)",Enantiomers


Let's investigate these cases manually to determine what went wrong.

Upon manual review, 23 of the 37 pairs were found to be correct predictions by the StereoMapper pipeline. In each of these 23 cases, the enantiomer assignment in ChEBI was identified as a mistake. Most of these pairs are indeed diastereomers, whilst others have missing stereochemistry, which the pipeline picks up. In one case the two structures are mirror images of each other but carry different formal charges, so the pipeline does not classify them as enantiomers.

The adjusted precision, recall and F1 score that account for these disagreements are calculated at the end of this section, after the remaining false-positive categories have been reviewed.

In [30]:
df_fp_parent_child = df_fp[df_fp['classification'] == "Stereo-resolution pairs"]
df_fp_parent_child

Unnamed: 0,id1,id2,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pair_key,label
275,CHEBI_37477,CHEBI_37476,547,549,"[""chebi:37477""]","[""chebi:37476""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_37476, CHEBI_37477)",Enantiomers
312,CHEBI_38969,CHEBI_3332,621,622,"[""chebi:38969""]","[""chebi:3332""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_3332, CHEBI_38969)",Enantiomers
387,CHEBI_90389,CHEBI_90391,771,772,"[""chebi:90389""]","[""chebi:90391""]",1,1,Stereo-resolution pairs,75.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_90389, CHEBI_90391)",Enantiomers
388,CHEBI_90394,CHEBI_90395,773,774,"[""chebi:90394""]","[""chebi:90395""]",1,1,Stereo-resolution pairs,75.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_90394, CHEBI_90395)",Enantiomers
402,CHEBI_189872,CHEBI_189871,801,802,"[""chebi:189872""]","[""chebi:189871""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_189871, CHEBI_189872)",Enantiomers
406,CHEBI_17426,CHEBI_137932,809,810,"[""chebi:17426""]","[""chebi:137932""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0,"(CHEBI_137932, CHEBI_17426)",Enantiomers
414,CHEBI_90386,CHEBI_90387,826,827,"[""chebi:90386""]","[""chebi:90387""]",1,1,Stereo-resolution pairs,75.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_90386, CHEBI_90387)",Enantiomers
467,CHEBI_39336,CHEBI_39335,931,932,"[""chebi:39336""]","[""chebi:39335""]",1,1,Stereo-resolution pairs,76.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_39335, CHEBI_39336)",Enantiomers
689,CHEBI_63698,CHEBI_63695,1370,1373,"[""chebi:63698""]","[""chebi:63695""]",1,1,Stereo-resolution pairs,77.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_63695, CHEBI_63698)",Enantiomers
690,CHEBI_63705,CHEBI_63700,1371,1372,"[""chebi:63705""]","[""chebi:63700""]",1,1,Stereo-resolution pairs,77.0,"{""confidence_bin"":""medium""}",,v1.0,"(CHEBI_63700, CHEBI_63705)",Enantiomers


All are incorrect predictions by StereoMapper.

In [31]:
df_fp_unresolved = df_fp[df_fp['classification'] == "Unresolved"]
df_fp_unresolved

Unnamed: 0,id1,id2,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pair_key,label
252,CHEBI_37465,CHEBI_15386,501,503,"[""chebi:37465""]","[""chebi:15386""]",1,1,Unresolved,100.0,"{""confidence_bin"":""high""}",Possible pipeline error - should be no identic...,v1.0,"(CHEBI_15386, CHEBI_37465)",Enantiomers
786,CHEBI_38139,CHEBI_44343,1564,1566,"[""chebi:38139""]","[""chebi:44343""]",1,1,Unresolved,100.0,"{""confidence_bin"":""high""}",Possible pipeline error - should be no identic...,v1.0,"(CHEBI_38139, CHEBI_44343)",Enantiomers
895,CHEBI_47008,CHEBI_47011,1778,1780,"[""chebi:47008""]","[""chebi:47011""]",1,1,Unresolved,100.0,"{""confidence_bin"":""high""}",Possible pipeline error - should be no identic...,v1.0,"(CHEBI_47008, CHEBI_47011)",Enantiomers
1119,CHEBI_83132,CHEBI_83130,2218,2219,"[""chebi:83132""]","[""chebi:83130""]",1,1,Unresolved,75.0,"{""confidence_bin"":""medium""}",Possible pipeline error - should be no identic...,v1.0,"(CHEBI_83130, CHEBI_83132)",Enantiomers
1120,CHEBI_83131,CHEBI_83129,2220,2221,"[""chebi:83131""]","[""chebi:83129""]",1,1,Unresolved,75.0,"{""confidence_bin"":""medium""}",Possible pipeline error - should be no identic...,v1.0,"(CHEBI_83129, CHEBI_83131)",Enantiomers


All five are incorrect predictions by StereoMapper. The single Unclassified case is a complex one in which both the stereochemistry and the protonation states differ; it is treated as a correct prediction in the adjusted metrics.

In [32]:
# 23 for corrected FP in diastereomers, 1 for corrected FP in unclassified
tp_act = tp + 24
fp_act = fp - 24

actual_precision = tp_act / (tp_act + fp_act) if (tp_act + fp_act) > 0 else 0
actual_recall = tp_act / (tp_act + fn) if (tp_act + fn) > 0 else 0
actual_f1 = 2 * actual_precision * actual_recall / (actual_precision + actual_recall) if (actual_precision + actual_recall) > 0 else 0

print(f"Adjusted Precision: {actual_precision:.3f}")
print(f"Adjusted Recall:    {actual_recall:.3f}")
print(f"Adjusted F1 score:  {actual_f1:.3f}")

Adjusted Precision: 0.972
Adjusted Recall:    1.000
Adjusted F1 score:  0.986
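The raw and adjusted calculations in this section repeat the same arithmetic, so they could be folded into one small helper. The sketch below plugs in the counts reported above (TP 1184, FP 59, FN 0, with 24 pairs reassigned after manual review); the function name `precision_recall_f1` is an illustrative choice.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 wherever a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Counts reported earlier in this section
raw = precision_recall_f1(tp=1184, fp=59, fn=0)
# 24 false positives reassigned as correct after manual review
adjusted = precision_recall_f1(tp=1184 + 24, fp=59 - 24, fn=0)

for name, (p, r, f1) in {"raw": raw, "adjusted": adjusted}.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because FN is 0 in both cases, recall is 1.000 by construction and the F1 score moves only with precision.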


## Diastereomer benchmarking

In [2]:
control_pairs_dia = 'data/diastereomer_control_set.csv'
df_dia = pd.read_csv(control_pairs_dia)
df_dia

Unnamed: 0,id1,id2,label
0,CHEBI_122389,CHEBI_122317,Diastereomers
1,CHEBI_216464,CHEBI_215732,Diastereomers
2,CHEBI_120133,CHEBI_120171,Diastereomers
3,CHEBI_96557,CHEBI_96022,Diastereomers
4,CHEBI_112617,CHEBI_112586,Diastereomers
...,...,...,...
3092,CHEBI_199087,CHEBI_203717,Diastereomers
3093,CHEBI_207142,CHEBI_215860,Diastereomers
3094,CHEBI_18809,CHEBI_18624,Diastereomers
3095,CHEBI_195839,CHEBI_195840,Diastereomers


In [10]:
# 1) Canonicalize + dedupe control set
dfc_dia = df_dia.copy()
dfc_dia["id1_c"] = dfc_dia["id1"].map(norm_chebi)
dfc_dia["id2_c"] = dfc_dia["id2"].map(norm_chebi)
# order-invariant canonical pair (tuple sorted)
dfc_dia["pair_key"] = dfc_dia.apply(lambda r: tuple(sorted((r["id1_c"], r["id2_c"]))), axis=1)
# keep one row per unordered pair (optionally verify labels agree before dropping)
dfc_dedup_dia = dfc_dia.drop_duplicates(subset=["pair_key"]).reset_index(drop=True)

control_keys_dia = set(dfc_dedup_dia["pair_key"])

dfc_dedup_dia

Unnamed: 0,id1,id2,label,id1_c,id2_c,pair_key
0,CHEBI_122389,CHEBI_122317,Diastereomers,CHEBI_122389,CHEBI_122317,"(CHEBI_122317, CHEBI_122389)"
1,CHEBI_216464,CHEBI_215732,Diastereomers,CHEBI_216464,CHEBI_215732,"(CHEBI_215732, CHEBI_216464)"
2,CHEBI_120133,CHEBI_120171,Diastereomers,CHEBI_120133,CHEBI_120171,"(CHEBI_120133, CHEBI_120171)"
3,CHEBI_96557,CHEBI_96022,Diastereomers,CHEBI_96557,CHEBI_96022,"(CHEBI_96022, CHEBI_96557)"
4,CHEBI_112617,CHEBI_112586,Diastereomers,CHEBI_112617,CHEBI_112586,"(CHEBI_112586, CHEBI_112617)"
...,...,...,...,...,...,...
3092,CHEBI_199087,CHEBI_203717,Diastereomers,CHEBI_199087,CHEBI_203717,"(CHEBI_199087, CHEBI_203717)"
3093,CHEBI_207142,CHEBI_215860,Diastereomers,CHEBI_207142,CHEBI_215860,"(CHEBI_207142, CHEBI_215860)"
3094,CHEBI_18809,CHEBI_18624,Diastereomers,CHEBI_18809,CHEBI_18624,"(CHEBI_18624, CHEBI_18809)"
3095,CHEBI_195839,CHEBI_195840,Diastereomers,CHEBI_195839,CHEBI_195840,"(CHEBI_195839, CHEBI_195840)"


In [6]:
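# run stereomapper as a subprocess
# NOTE: this path points at the enantiomer benchmark data, and the outputs recorded below
# (2,504 inputs, 931 InChIKey groups) match the enantiomer run. Point it at the diastereomer
# benchmark directory from the Zenodo archive before interpreting the diastereomer results.
diastereomers = Path('/home/jackmcgoldrick/Downloads/benchmarking_data/enantiomer_benchmark_data') # downloadable from Zenodo (DOI: 10.5281/zenodo.17831412)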
results = Path('diastereomer_benchmark_results.sqlite') 
cache_path = Path('diastereomer_benchmark_cache.sqlite')

In [7]:
cmd =[
    "stereomapper",
    "run",
    "-d", diastereomers.as_posix(),
    "-o", results.as_posix(),
    "-p", cache_path.as_posix(),
    "--fresh-cache"
]

subprocess.run(cmd) 

INFO    Logging initialised. File: logs/stereomapper_20251211_095128.log
Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:10<00:00]


✅ Pipeline completed in 10.8s
📦 Inputs attempted: 2,504 (skipped 0)
📊 Successes: 2,502 | Failures: 2
🔗 Inchikey groups — processed 931, skipped 0, failed 0
🧮 Relationship rows: 3,618
🧾 Unique inchikeys observed: 931
💾 Cache hit rate: 0.0%


Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:11<00:00]


CompletedProcess(args=['stereomapper', 'run', '-d', '/home/jackmcgoldrick/Downloads/benchmarking_data/enantiomer_benchmark_data', '-o', 'diastereomer_benchmark_results.sqlite', '-p', 'diastereomer_benchmark_cache.sqlite', '--fresh-cache'], returncode=0)

In [8]:
conn = sqlite3.connect(results)
merged_df_q = """ 
SELECT * from relationships;
"""
df_merged_dia = pd.read_sql_query(merged_df_q, conn)
df_merged_dia

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
0,1,2,"[""chebi:17521""]","[""chebi:36124""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
1,3,4,"[""chebi:137507""]","[""chebi:137513""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
2,5,6,"[""chebi:134198""]","[""chebi:134199""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3,7,8,"[""chebi:28651""]","[""chebi:27702""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
4,9,10,"[""chebi:145480""]","[""chebi:145483""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
...,...,...,...,...,...,...,...,...,...,...,...
3613,2456,2458,"[""chebi:133313""]","[""chebi:133312""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3614,2457,2458,"[""chebi:133311""]","[""chebi:133312""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3615,2459,2460,"[""chebi:233960""]","[""chebi:233959""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3616,2461,2462,"[""chebi:142550""]","[""chebi:9399""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0


In [11]:
# compare back to original df, to permit calculation of accuracy
expanded_keys = set()
for _, row in df_merged_dia.iterrows():
    if not row["cluster_a_members"] or not row["cluster_b_members"]:
        continue
    m1s = [norm_chebi(x) for x in json.loads(row["cluster_a_members"])]
    m2s = [norm_chebi(x) for x in json.loads(row["cluster_b_members"])]
    for a in m1s:
        for b in m2s:
            if a == b:
                continue
            expanded_keys.add(tuple(sorted((a, b))))

# --- Overlap ---
tp = control_keys_dia & expanded_keys
fp = expanded_keys - control_keys_dia
fn = control_keys_dia - expanded_keys

print(f"True Positives: {len(tp)}")
print(f"False Positives: {len(fp)}")
print(f"False Negatives: {len(fn)}")

True Positives: 38
False Positives: 3699
False Negatives: 3059
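With only 38 control pairs recovered, it may be worth checking that the identifiers in the control set actually appear in the pipeline output before interpreting these counts. A minimal sketch of such a sanity check is shown below; `id_coverage` is a hypothetical helper and the pairs are toy data.

```python
def id_coverage(control_keys, predicted_keys):
    """Fraction of identifiers in the control pairs that also occur in any predicted pair."""
    control_ids = {i for pair in control_keys for i in pair}
    predicted_ids = {i for pair in predicted_keys for i in pair}
    if not control_ids:
        return 0.0
    return len(control_ids & predicted_ids) / len(control_ids)


toy_control = {("CHEBI_1", "CHEBI_2"), ("CHEBI_3", "CHEBI_4")}
toy_predicted = {("CHEBI_1", "CHEBI_2"), ("CHEBI_8", "CHEBI_9")}

coverage = id_coverage(toy_control, toy_predicted)
print(f"Control identifiers seen in the output: {coverage:.0%}")
```

A coverage well below 100% would suggest that the input directory and the control set describe different molecules, in which case the overlap counts say little about classification quality.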


## Protomer benchmarking

In [20]:
divese_pairs_df = pd.read_csv('data/protomers_control_set.csv')
divese_pairs_df

Unnamed: 0,id_left,id_right,pairkey
0,CHEBI_23123,CHEBI_27869,CHEBI_23123__CHEBI_27869
1,CHEBI_28240,CHEBI_36386,CHEBI_28240__CHEBI_36386
2,CHEBI_27455,CHEBI_30956,CHEBI_27455__CHEBI_30956
3,CHEBI_28995,CHEBI_30748,CHEBI_28995__CHEBI_30748
4,CHEBI_19984,CHEBI_49410,CHEBI_19984__CHEBI_49410
...,...,...,...
3364,CHEBI_234122,CHEBI_234119,CHEBI_234119__CHEBI_234122
3365,CHEBI_234323,CHEBI_88950,CHEBI_234323__CHEBI_88950
3366,CHEBI_234339,CHEBI_234342,CHEBI_234339__CHEBI_234342
3367,CHEBI_234377,CHEBI_32050,CHEBI_234377__CHEBI_32050


In [None]:
protomers = Path('benchmarking_data/protomer_benchmark_data')
results = Path('protomer_benchmarking_results.sqlite')
cache_path = Path('protomer_benchmarking_cache.sqlite')

In [18]:
cmd =[
    "stereomapper",
    "run",
    "-d", protomers.as_posix(),
    "-o", results.as_posix(),
    "-p", cache_path.as_posix(),
    "--fresh-cache"
]

subprocess.run(cmd) 

INFO    Logging initialised. File: logs/stereomapper_20251211_100617.log
Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:44<00:00]


✅ Pipeline completed in 44.9s
📦 Inputs attempted: 6,730 (skipped 0)
📊 Successes: 6,728 | Failures: 2
🔗 Inchikey groups — processed 3,469, skipped 0, failed 0
🧮 Relationship rows: 3,315
🧾 Unique inchikeys observed: 3,469
💾 Cache hit rate: 0.0%


Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [00:45<00:00]


CompletedProcess(args=['stereomapper', 'run', '-d', '/home/jackmcgoldrick/Downloads/benchmarking_data/protomer_benchmark_data', '-o', 'protomer_benchmarking_results.sqlite', '-p', 'diastereomer_benchmark_cache.sqlite', '--fresh-cache'], returncode=0)

In [51]:
conn = sqlite3.connect(results)
merged_df_q = """
SELECT * from relationships;
"""
df_merged_protomer = pd.read_sql_query(merged_df_q, conn)
df_merged_protomer

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
0,1,2,"[""chebi:52504""]","[""chebi:40617""]",1,1,Protomers,74.0,"{""confidence_bin"":""medium""}",,v1.0
1,4,5,"[""chebi:17521""]","[""chebi:29751""]",1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0
2,6,7,"[""chebi:18381""]","[""chebi:58467""]",1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0
3,8,9,"[""chebi:33462""]","[""chebi:16215""]",1,1,Protomers,89.0,"{""confidence_bin"":""medium""}",,v1.0
4,10,11,"[""chebi:64808""]","[""chebi:64790""]",1,1,Protomers,93.0,"{""confidence_bin"":""high""}",,v1.0
...,...,...,...,...,...,...,...,...,...,...,...
3310,6697,6698,"[""chebi:17793""]","[""chebi:77992""]",1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0
3311,6699,6700,"[""chebi:136357""]","[""chebi:133821""]",1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0
3312,6701,6702,"[""chebi:49269""]","[""chebi:58803""]",1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0
3313,6703,6704,"[""chebi:17803""]","[""chebi:58277""]",1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0


In [63]:
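# cluster members are stored as JSON strings; parse them into lists before indexing
for col in ('cluster_a_members', 'cluster_b_members'):
    df_merged_protomer[col] = df_merged_protomer[col].apply(safe_load_members)

df_merged_protomer['id1_c'] = df_merged_protomer['cluster_a_members'].apply(lambda x: norm_chebi(x[0]))
df_merged_protomer['id2_c'] = df_merged_protomer['cluster_b_members'].apply(lambda x: norm_chebi(x[0]))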

df_merged_protomer['pairkey'] = df_merged_protomer.apply(
    lambda r: '__'.join(sorted([r['id1_c'], r['id2_c']])),
    axis=1
)

df_merged_protomer

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pairkey,id1_c,id2_c
0,1,2,[chebi:52504],[chebi:40617],1,1,Protomers,74.0,"{""confidence_bin"":""medium""}",,v1.0,CHEBI_40617__CHEBI_52504,CHEBI_52504,CHEBI_40617
1,4,5,[chebi:17521],[chebi:29751],1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17521__CHEBI_29751,CHEBI_17521,CHEBI_29751
2,6,7,[chebi:18381],[chebi:58467],1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_18381__CHEBI_58467,CHEBI_18381,CHEBI_58467
3,8,9,[chebi:33462],[chebi:16215],1,1,Protomers,89.0,"{""confidence_bin"":""medium""}",,v1.0,CHEBI_16215__CHEBI_33462,CHEBI_33462,CHEBI_16215
4,10,11,[chebi:64808],[chebi:64790],1,1,Protomers,93.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_64790__CHEBI_64808,CHEBI_64808,CHEBI_64790
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3310,6697,6698,[chebi:17793],[chebi:77992],1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17793__CHEBI_77992,CHEBI_17793,CHEBI_77992
3311,6699,6700,[chebi:136357],[chebi:133821],1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_133821__CHEBI_136357,CHEBI_136357,CHEBI_133821
3312,6701,6702,[chebi:49269],[chebi:58803],1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_49269__CHEBI_58803,CHEBI_49269,CHEBI_58803
3313,6703,6704,[chebi:17803],[chebi:58277],1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17803__CHEBI_58277,CHEBI_17803,CHEBI_58277


In [64]:
merged_protomer = df_merged_protomer.merge(divese_pairs_df[['pairkey']], on='pairkey', how='inner')
merged_protomer

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pairkey,id1_c,id2_c
0,1,2,[chebi:52504],[chebi:40617],1,1,Protomers,74.0,"{""confidence_bin"":""medium""}",,v1.0,CHEBI_40617__CHEBI_52504,CHEBI_52504,CHEBI_40617
1,4,5,[chebi:17521],[chebi:29751],1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17521__CHEBI_29751,CHEBI_17521,CHEBI_29751
2,6,7,[chebi:18381],[chebi:58467],1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_18381__CHEBI_58467,CHEBI_18381,CHEBI_58467
3,8,9,[chebi:33462],[chebi:16215],1,1,Protomers,89.0,"{""confidence_bin"":""medium""}",,v1.0,CHEBI_16215__CHEBI_33462,CHEBI_33462,CHEBI_16215
4,10,11,[chebi:64808],[chebi:64790],1,1,Protomers,93.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_64790__CHEBI_64808,CHEBI_64808,CHEBI_64790
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3197,6697,6698,[chebi:17793],[chebi:77992],1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17793__CHEBI_77992,CHEBI_17793,CHEBI_77992
3198,6699,6700,[chebi:136357],[chebi:133821],1,1,Protomers,95.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_133821__CHEBI_136357,CHEBI_136357,CHEBI_133821
3199,6701,6702,[chebi:49269],[chebi:58803],1,1,Protomers,96.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_49269__CHEBI_58803,CHEBI_49269,CHEBI_58803
3200,6703,6704,[chebi:17803],[chebi:58277],1,1,Protomers,91.0,"{""confidence_bin"":""high""}",,v1.0,CHEBI_17803__CHEBI_58277,CHEBI_17803,CHEBI_58277


In [67]:
merged_protomer['classification'].value_counts()

classification
Protomers       3166
Unclassified      36
Name: count, dtype: int64

In [68]:
df_fp_extra_info = merged_protomer[merged_protomer['classification'] != 'Protomers']
df_fp_extra_info

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag,pairkey,id1_c,id2_c
143,301,302,[chebi:90408],[chebi:87351],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_87351__CHEBI_90408,CHEBI_90408,CHEBI_87351
358,746,747,[chebi:57438],[chebi:15618],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_15618__CHEBI_57438,CHEBI_57438,CHEBI_15618
374,779,780,[chebi:87636],[chebi:87634],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_87634__CHEBI_87636,CHEBI_87636,CHEBI_87634
491,1026,1027,[chebi:3403],[chebi:59205],1,1,Unclassified,,"{""confidence_bin"":null}",Diastereomers must share protonation/charge; n...,v1.0,CHEBI_3403__CHEBI_59205,CHEBI_3403,CHEBI_59205
497,1038,1039,[chebi:60839],[chebi:60830],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_60830__CHEBI_60839,CHEBI_60839,CHEBI_60830
634,1326,1327,[chebi:6506],[chebi:234341],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_234341__CHEBI_6506,CHEBI_6506,CHEBI_234341
645,1348,1349,[chebi:16666],[chebi:29934],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_16666__CHEBI_29934,CHEBI_16666,CHEBI_29934
828,1737,1738,[chebi:77618],[chebi:7621],1,1,Unclassified,,"{""confidence_bin"":null}",Stereo and charge differ (complex); no classif...,v1.0,CHEBI_7621__CHEBI_77618,CHEBI_77618,CHEBI_7621
889,1863,1864,[chebi:25682],[chebi:197308],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_197308__CHEBI_25682,CHEBI_25682,CHEBI_197308
1021,2142,2143,[chebi:15827],[chebi:60267],1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0,CHEBI_15827__CHEBI_60267,CHEBI_15827,CHEBI_60267


The 36 unclassified rows are pairs whose relationship is too complex for StereoMapper to assign, as described in the manuscript. The extra_info column records the reason for each.
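To see how the unclassified rows split across reasons, the extra_info values can be tallied. The following is a minimal sketch on a made-up frame that reuses the notebook's column names. The reason strings are placeholders, not StereoMapper's actual messages.

```python
import pandas as pd

# Toy stand-in for merged_protomer; the reason strings are placeholders,
# not StereoMapper's actual extra_info messages.
toy = pd.DataFrame({
    "classification": ["Protomers", "Unclassified", "Unclassified", "Unclassified"],
    "extra_info": [None, "reason A", "reason A", "reason B"],
})

# Keep only rows that were not assigned the expected class, then count reasons.
unclassified = toy[toy["classification"] != "Protomers"]
reason_counts = unclassified["extra_info"].value_counts(dropna=False)
print(reason_counts)
```

On the real data, the same two lines applied to merged_protomer would give one count per distinct extra_info message.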

In [69]:
# get tp, fp, fn counts
tp = len(merged_protomer[ merged_protomer['classification'] == 'Protomers' ])
fp = len(merged_protomer[ merged_protomer['classification'] != 'Protomers' ])
fn = divese_pairs_df.shape[0] - tp

print(f"TP: {tp}, FP: {fp}, FN: {fn}")

TP: 3166, FP: 36, FN: 203


The false positives are accounted for above. The false negatives most likely arise because StereoMapper clusters the two structures of a pair together, so no comparison (and hence no mapping) is produced for that pair. Alternatively, an identifier may have no downloadable structure in ChEBI. The former is more likely, since the molfiles were taken from ChEBI and validated to exist. Wildcard structures were removed beforehand, so they cannot be the cause.
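One way to inspect which control pairs never appear in the results is an anti-join on the pair key. The sketch below uses two hypothetical frames with made-up keys; `control` and `results` stand in for the control set and the StereoMapper output.

```python
import pandas as pd

# Hypothetical stand-ins: control pairs and the pair keys found in the results.
control = pd.DataFrame({"pairkey": ["CHEBI_1__CHEBI_2", "CHEBI_3__CHEBI_4", "CHEBI_5__CHEBI_6"]})
results = pd.DataFrame({"pairkey": ["CHEBI_1__CHEBI_2", "CHEBI_5__CHEBI_6"]})

# A left merge with indicator=True marks rows present only in the control set.
flagged = control.merge(
    results.drop_duplicates("pairkey"), on="pairkey", how="left", indicator=True
)
missing = flagged.loc[flagged["_merge"] == "left_only", "pairkey"].tolist()
print(missing)
```

The resulting list can then be checked by hand, for example to see whether both identifiers of a missing pair ended up in the same cluster.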

In [70]:
# calculate precision, recall, f1
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0 

print(f"Precision: {precision:.2%}, Recall: {recall:.2%}, F1 Score: {f1:.2%}")

Precision: 98.88%, Recall: 93.97%, F1 Score: 96.36%
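The metric cells in this notebook repeat the same three formulas. A small helper makes the arithmetic explicit. The name `prf` is hypothetical and not part of StereoMapper; the counts are the ones printed above.

```python
def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Counts taken from the TP / FP / FN line printed above.
p, r, f = prf(tp=3166, fp=36, fn=203)
print(f"Precision: {p:.2%}, Recall: {r:.2%}, F1 Score: {f:.2%}")
```

Here precision is 3166 / 3202, recall is 3166 / 3369, and F1 is their harmonic mean.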


In [72]:
# drop unclassified rows whose extra_info explains why the pair is out of scope
df_filtered = merged_protomer[~(
    (merged_protomer['classification'] == 'Unclassified') &
    (merged_protomer['extra_info'].notnull())
)]

# calc tp, fp, fn again
tp_v2 = len(df_filtered[ df_filtered['classification'] == 'Protomers' ])
fp_v2 = len(df_filtered[ df_filtered['classification'] != 'Protomers' ])
fn_v2 = divese_pairs_df.shape[0] - tp_v2

print(f"TP: {tp_v2}, FP: {fp_v2}, FN: {fn_v2}")

TP: 3166, FP: 0, FN: 203


In [73]:
# calculate precision, recall, f1
precision_v2 = tp_v2 / (tp_v2 + fp_v2) if (tp_v2 + fp_v2) > 0 else 0
recall_v2 = tp_v2 / (tp_v2 + fn_v2) if (tp_v2 + fn_v2) > 0 else 0
f1_v2 = 2 * (precision_v2 * recall_v2) / (precision_v2 + recall_v2) if (precision_v2 + recall_v2) > 0 else 0

print(f"Precision: {precision_v2:.2%}, Recall: {recall_v2:.2%}, F1 Score: {f1_v2:.2%}")

Precision: 100.00%, Recall: 93.97%, F1 Score: 96.89%


## Stereo-resolution pairs

In [2]:
stereo_resolution_pairs_control = pd.read_csv('data/stereo_resolution_pairs.csv')
stereo_resolution_pairs_control

Unnamed: 0,mnxparent_label,parent_label,mnxchild_label,child_label
0,MNXM100051,chebi:187719,MNXM100344,chebi:188242
1,MNXM10010,chebi:28254,MNXM1380605,chebi:198196
2,MNXM10026,chebi:86041,MNXM1104681,chebi:62616
3,MNXM10026,chebi:86041,MNXM9497,chebi:74272
4,MNXM10039,chebi:50168,MNXM36497,chebi:50170
...,...,...,...,...
5068,MNXM9879,chebi:173438,MNXM1409533,chebi:228257
5069,MNXM9987,chebi:27389,MNXM162802,chebi:57731
5070,MNXM9987,chebi:27389,MNXM732376,chebi:58655
5071,MNXM99960,chebi:186405,MNXM101734,chebi:195754


In [3]:
stereo_resolution_pairs = Path('/home/jackmcgoldrick/Downloads/benchmarking_data/parent_child_benchmark_data')
results = Path('stereo_resolution_benchmarking_results.sqlite')
cache = Path('stereo_resolution_benchmarking_cache.sqlite')

In [32]:
cmd =[
    "stereomapper",
    "run",
    "-d", stereo_resolution_pairs.as_posix(),
    "-o", results.as_posix(),
    "-p", cache.as_posix(),
    "--fresh-cache"
]

subprocess.run(cmd) 

INFO    Logging initialised. File: logs/stereomapper_20251211_105747.log
Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [01:19<00:00]


✅ Pipeline completed in 79.7s
📦 Inputs attempted: 7,693 (skipped 0)
📊 Successes: 7,693 | Failures: 0
🔗 Inchikey groups — processed 2,541, skipped 0, failed 0
🧮 Relationship rows: 15,272
🧾 Unique inchikeys observed: 2,541
💾 Cache hit rate: 0.0%


Pipeline: 100%|██████████████████████████████████████████████████████████| , Complete! [01:20<00:00]


CompletedProcess(args=['stereomapper', 'run', '-d', '/home/jackmcgoldrick/Downloads/benchmarking_data/parent_child_benchmark_data', '-o', 'stereo_resolution_benchmarking_results.sqlite', '-p', 'stereo_resolution_benchmarking_cache.sqlite', '--fresh-cache'], returncode=0)

In [4]:
conn = sqlite3.connect(results)
merged_df_q = """ 
SELECT * from relationships;
"""
df_merged_stereo = pd.read_sql_query(merged_df_q, conn)
df_merged_stereo

Unnamed: 0,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
0,1,2,"[""chebi:184417""]","[""chebi:81326""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
1,1,3,"[""chebi:184417""]","[""chebi:81325""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
2,2,3,"[""chebi:81326""]","[""chebi:81325""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
3,4,5,"[""chebi:20604""]","[""chebi:40617""]",1,1,Unclassified,,"{""confidence_bin"":null}",Parent-child stereochemical relationships must...,v1.0
4,4,6,"[""chebi:20604""]","[""chebi:52505""]",1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
...,...,...,...,...,...,...,...,...,...,...,...
15267,7686,7689,"[""chebi:138363""]","[""chebi:138362""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
15268,7687,7688,"[""chebi:138597""]","[""chebi:138596""]",1,1,Enantiomers,100.0,"{""confidence_bin"":""high""}",,v1.0
15269,7687,7689,"[""chebi:138597""]","[""chebi:138362""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
15270,7688,7689,"[""chebi:138596""]","[""chebi:138362""]",1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0


In [5]:
df_merged_stereo['cluster_a_members'] = df_merged_stereo['cluster_a_members'].apply(json.loads)
df_merged_stereo['cluster_b_members'] = df_merged_stereo['cluster_b_members'].apply(json.loads)

In [6]:
# now the data formats match, create unique keys on both dataframes to allow for cross-referencing
stereo_resolution_pairs_control['pair_key'] = stereo_resolution_pairs_control.apply(lambda row: frozenset([row['parent_label'], row['child_label']]), axis=1)
df_merged_stereo['pair_key'] = df_merged_stereo.apply(lambda row: frozenset(row['cluster_a_members'] + row['cluster_b_members']), axis=1)

# now we can cross-reference the results back to the original set of parent-child pairs
merged_df = pd.merge(stereo_resolution_pairs_control, df_merged_stereo, on='pair_key', how='inner', suffixes=('_parent_child', '_results'))
print(f"Number of matched parent-child pairs in results: {merged_df.shape[0]}")

Number of matched parent-child pairs in results: 5029
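The pair key is a frozenset so that a pair matches regardless of which member StereoMapper lists first. The sketch below illustrates this on toy frames. The identifiers are made up and the column names mirror the ones used above.

```python
import pandas as pd

# Toy control pairs and toy results; identifiers are made up for illustration.
control = pd.DataFrame({
    "parent_label": ["chebi:1", "chebi:3"],
    "child_label": ["chebi:2", "chebi:4"],
})
results = pd.DataFrame({
    "cluster_a_members": [["chebi:2"], ["chebi:9"]],
    "cluster_b_members": [["chebi:1"], ["chebi:3"]],
    "classification": ["Stereo-resolution pairs", "Enantiomers"],
})

# frozenset ignores order, so (parent, child) matches (cluster_b, cluster_a).
control["pair_key"] = [
    frozenset(p) for p in zip(control["parent_label"], control["child_label"])
]
results["pair_key"] = [
    frozenset(a + b)
    for a, b in zip(results["cluster_a_members"], results["cluster_b_members"])
]

matched = control.merge(results, on="pair_key", how="inner")
print(matched[["parent_label", "child_label", "classification"]])
```

Only the first control pair matches, even though its members appear in the opposite order in the results. Note that a cluster with more than one member would produce a key with more than two elements, which can never equal a two-element control key.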


In [7]:
frac_found = merged_df.shape[0] / stereo_resolution_pairs_control.shape[0]
print(f"Fraction of parent-child pairs found in results: {frac_found:.2%}")

Fraction of parent-child pairs found in results: 99.13%


In [8]:
merged_df['classification'].value_counts()

classification
Stereo-resolution pairs    4332
Unclassified                695
Diastereomers                 2
Name: count, dtype: int64

The vast majority of matched pairs are classified as stereo-resolution pairs, as expected.

In [9]:
df_miss_dia = merged_df[merged_df['classification'] == 'Diastereomers']
df_miss_dia

Unnamed: 0,mnxparent_label,parent_label,mnxchild_label,child_label,pair_key,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
2129,MNXM1374129,chebi:182512,MNXM734257,chebi:31053,"(chebi:31053, chebi:182512)",7233,7234,[chebi:182512],[chebi:31053],1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0
2220,MNXM1376176,chebi:188778,MNXM1364249,chebi:32903,"(chebi:32903, chebi:188778)",5466,5467,[chebi:32903],[chebi:188778],1,1,Diastereomers,100.0,"{""confidence_bin"":""high""}",,v1.0


The pairs (chebi:182512, chebi:31053) and (chebi:188778, chebi:32903) are in fact diastereomers, so StereoMapper's assignment is correct. The unclassified rows again correspond to complex relationships that the pipeline cannot handle.

In [10]:
# get the stats: true positives, false positives, false negatives
tp = merged_df[merged_df['classification'] == 'Stereo-resolution pairs']
fp = merged_df[merged_df['classification'] != 'Stereo-resolution pairs']
fn = stereo_resolution_pairs_control[~stereo_resolution_pairs_control['pair_key'].isin(merged_df['pair_key'])]

# convert to integers
tp = int(tp.shape[0])
fp = int(fp.shape[0])
fn = int(fn.shape[0])

print(f"True Positives: {tp}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")

True Positives: 4332
False Positives: 697
False Negatives: 44


False negatives have two likely causes:

(1) The identifier has no structural representation in the ChEBI website or database.

(2) StereoMapper clusters the two structures of the pair together in the identity step (correctly or incorrectly), so no relationship can be assigned to them.

Wildcards were removed in an earlier step, so they cannot account for the false negatives.
 
-----
False positives have a few possible causes:

(1) The relationship between the two structures does not fit any classification defined by the pipeline, e.g. structures A and B differ in both stereochemistry and protonation state.

(2) Misclassifications in MetaNetX / ChEBI that StereoMapper corrects (the diastereomer cases mentioned previously).

(3) Cases where the pipeline assigns an incorrect relationship.


In [11]:
actual_fp = fp - 2 # remove the 2 diastereomer cases where stereomapper is correct  

In [12]:
# now let's calculate precision, recall and F1 score
## note: in this case all "Unclassified" rows are counted as false positives
precision = tp / (tp + actual_fp) if (tp + actual_fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"Precision: {precision:.2%}")
print(f"Recall: {recall:.2%}")
print(f"F1 Score: {f1_score:.2%}")

Precision: 86.17%
Recall: 98.99%
F1 Score: 92.14%


These are the initial results, which keep the false positives that we attribute to limitations of the pipeline. Even with these false positives included, the pipeline performs quite well. Next, we confirm our suspicion that these rows fall outside the defined classes because of charge differences, and then recompute the statistics above.

In [13]:
# now let's remove those false positives we know are due to complex classification cases
# use extra_info column: drop rows whose extra_info mentions 'charge' or 'Radioactivity'
# cast to str to avoid issues with None / lists and use a regex to match either term
mask = merged_df['extra_info'].astype(str).str.contains(r'charge|Radioactivity', case=False, na=False)
df_filtered_no_class = merged_df[~mask]
df_filtered_no_class

Unnamed: 0,mnxparent_label,parent_label,mnxchild_label,child_label,pair_key,cluster_a,cluster_b,cluster_a_members,cluster_b_members,cluster_a_size,cluster_b_size,classification,score,score_details,extra_info,version_tag
0,MNXM100051,chebi:187719,MNXM100344,chebi:188242,"(chebi:187719, chebi:188242)",6902,6903,[chebi:187719],[chebi:188242],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
1,MNXM10010,chebi:28254,MNXM1380605,chebi:198196,"(chebi:198196, chebi:28254)",6795,6796,[chebi:28254],[chebi:198196],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
2,MNXM10026,chebi:86041,MNXM1104681,chebi:62616,"(chebi:86041, chebi:62616)",2115,2117,[chebi:86041],[chebi:62616],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
3,MNXM10026,chebi:86041,MNXM9497,chebi:74272,"(chebi:74272, chebi:86041)",2115,2116,[chebi:86041],[chebi:74272],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
4,MNXM10039,chebi:50168,MNXM36497,chebi:50170,"(chebi:50168, chebi:50170)",253,257,[chebi:50168],[chebi:50170],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5021,MNXM9859,chebi:48946,MNXM1107135,chebi:145932,"(chebi:145932, chebi:48946)",6518,6520,[chebi:48946],[chebi:145932],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0
5025,MNXM9987,chebi:27389,MNXM162802,chebi:57731,"(chebi:27389, chebi:57731)",4825,4827,[chebi:27389],[chebi:57731],1,1,Stereo-resolution pairs,84.0,"{""confidence_bin"":""medium""}",,v1.0
5026,MNXM9987,chebi:27389,MNXM732376,chebi:58655,"(chebi:27389, chebi:58655)",4825,4826,[chebi:27389],[chebi:58655],1,1,Stereo-resolution pairs,84.0,"{""confidence_bin"":""medium""}",,v1.0
5027,MNXM99960,chebi:186405,MNXM101734,chebi:195754,"(chebi:186405, chebi:195754)",1058,1059,[chebi:186405],[chebi:195754],1,1,Stereo-resolution pairs,100.0,"{""confidence_bin"":""high""}",,v1.0


The vast majority of unclassified cases are complex or ambiguous pairs involving charge differences, which StereoMapper does not handle. The remainder involve isotopically labelled species, which are also not handled.
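The keyword filter on extra_info can be illustrated on a toy frame. This is a sketch under stated assumptions: the extra_info strings below are placeholders rather than StereoMapper's real messages, and it relies on `na=False` instead of casting the column to str.

```python
import pandas as pd

# Toy frame; the extra_info strings are placeholders, not StereoMapper's messages.
toy = pd.DataFrame({
    "classification": ["Stereo-resolution pairs", "Unclassified", "Unclassified", "Diastereomers"],
    "extra_info": [None, "protonation/charge differs", "Radioactivity differs", None],
})

# na=False treats a missing extra_info as "no match", so no cast to str is needed.
mask = toy["extra_info"].str.contains(r"charge|radioactivity", case=False, na=False)
kept = toy[~mask]
print(kept["classification"].value_counts())
```

Rows with a missing extra_info are kept, and only rows whose message mentions one of the two keywords are dropped.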

In [14]:
df_filtered_no_class['classification'].value_counts() 

classification
Stereo-resolution pairs    4332
Diastereomers                 2
Name: count, dtype: int64

In [17]:
# get the stats: true positives, false positives, false negatives
tp_v2 = df_filtered_no_class[df_filtered_no_class['classification'] == 'Stereo-resolution pairs']
fp_v2 = df_filtered_no_class[df_filtered_no_class['classification'] != 'Stereo-resolution pairs']
fn_v2 = stereo_resolution_pairs_control[~stereo_resolution_pairs_control['pair_key'].isin(merged_df['pair_key'])] # still use merged df to find fns

# convert to integers
tp_v2 = int(tp_v2.shape[0])
fp_v2 = int(fp_v2.shape[0])
fn_v2 = int(fn_v2.shape[0])

# take away two diastereomer cases from false positives we know to be correct by molD
fp_v2 -= 2

print(f"True Positives: {tp_v2}")
print(f"False Positives: {fp_v2}")
print(f"False Negatives: {fn_v2}")

True Positives: 4332
False Positives: 0
False Negatives: 44


In [18]:
# now let's calculate precision, recall and F1 score
## note: in this case "Unclassified" rows are excluded from the false positives
precision_v2 = tp_v2 / (tp_v2 + fp_v2) if (tp_v2 + fp_v2) > 0 else 0
recall_v2 = tp_v2 / (tp_v2 + fn_v2) if (tp_v2 + fn_v2) > 0 else 0
f1_score_v2 = 2 * (precision_v2 * recall_v2) / (precision_v2 + recall_v2) if (precision_v2 + recall_v2) > 0 else 0

print(f"Precision: {precision_v2:.2%}")
print(f"Recall: {recall_v2:.2%}")
print(f"F1 Score: {f1_score_v2:.2%}")

Precision: 100.00%
Recall: 98.99%
F1 Score: 99.49%
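For a side-by-side view, the counts printed in this section can be collected into one table. The sketch below is only a convenience summary. The benchmark and scenario labels are illustrative names, and F1 is computed with the equivalent closed form 2TP / (2TP + FP + FN).

```python
import pandas as pd

# Counts copied from the printed outputs in this notebook; labels are illustrative.
runs = pd.DataFrame(
    [
        ("protomer pairs", "all rows", 3166, 36, 203),
        ("protomer pairs", "out-of-scope rows removed", 3166, 0, 203),
        ("stereo-resolution pairs", "all rows", 4332, 695, 44),
        ("stereo-resolution pairs", "out-of-scope rows removed", 4332, 0, 44),
    ],
    columns=["benchmark", "scenario", "tp", "fp", "fn"],
)

runs["precision"] = runs["tp"] / (runs["tp"] + runs["fp"])
runs["recall"] = runs["tp"] / (runs["tp"] + runs["fn"])
# Equivalent closed form of the harmonic mean of precision and recall.
runs["f1"] = 2 * runs["tp"] / (2 * runs["tp"] + runs["fp"] + runs["fn"])

print(runs.round(4).to_string(index=False))
```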


## Generate Figure