# 🔬 Activity Cliff Calculator

**A Python tool to identify and analyze activity cliffs** — pairs of structurally similar compounds that show large potency differences.

---

## 🔍 Overview

`Activity Cliff Calculator` processes a dataset of bioactive molecules and automatically detects *activity cliffs* based on:
- Differences in biological activity (**ΔpIC50**)  
- Chemical similarity (**Tanimoto coefficient**)

The program uses modern **RDKit** fingerprint generators (Morgan, Feature Morgan, MACCS, RDK, AtomPair, Torsion, Pattern) to compute molecular similarity, and provides both **interactive visualization** and **Excel export** options.
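
For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The following is a minimal sketch in plain Python, with each fingerprint represented as a set of on-bit indices. The function name `tanimoto` and the choice to return 0.0 for two empty fingerprints are illustrative assumptions, not part of this tool, which uses RDKit for the actual calculation.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as having similarity 0.0.
        return 0.0
    return len(bits_a & bits_b) / union


fp1 = {1, 5, 9, 12}
fp2 = {1, 5, 9, 20}
print(tanimoto(fp1, fp2))  # 3 shared bits out of 5 distinct bits -> 0.6
```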

---

Requirements: `pandas`, `numpy`, `rdkit`, `xlrd`, `openpyxl`, `nbformat` and `plotly` (for `plotly.express`).

The example uses the input file https://www.mdpi.com/article/10.3390/ijms23010259/s1 from Macip G, Garcia-Segura P, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. A Review of the Current Landscape of SARS-CoV-2 Main Protease Inhibitors: Have We Hit the Bullseye Yet? Int J Mol Sci. 2021 Dec 27;23(1):259. doi: 10.3390/ijms23010259

```python
from activity_cliffs_utils import (
    fp_as_bitvect, generate_pairs,
    export_activity_cliffs_to_excel, show_molecule_table, mol_to_image_bytes, smiles_to_svg
)
from rdkit import Chem
import pandas as pd
import plotly.express as px
```

The program computes activity cliffs (pairs or groups of structurally similar compounds that are active against the same target but differ greatly in potency) from an `.xls` file containing a set of molecules. The columns `standardize_smiles`, `PubMedID` and `pIC50` are mandatory.

RDKit is used to compute a fingerprint (for example, Morgan) for each molecule. For each pair of molecules that share the same PubMedID, a disparity value is calculated as `disparity = pIC50_diff / (1 - tanimoto)`, where `pIC50_diff` is the absolute difference in pIC50 and `tanimoto` is the Tanimoto similarity of the two fingerprints.
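
As a worked example of the disparity formula, the sketch below applies it to the pIC50 values and Tanimoto similarity of one pair from the example dataset. The function name `disparity` is hypothetical, and returning infinity when the Tanimoto similarity is 1 (which would otherwise divide by zero) is an assumption; how the tool itself handles identical fingerprints is not stated here.

```python
import math


def disparity(pic50_a: float, pic50_b: float, tanimoto: float) -> float:
    """Disparity = |pIC50 difference| / (1 - Tanimoto similarity)."""
    pic50_diff = abs(pic50_a - pic50_b)
    if tanimoto >= 1.0:
        # Assumption: identical fingerprints map to infinity rather than raising.
        return math.inf
    return pic50_diff / (1.0 - tanimoto)


# pIC50 values and Tanimoto similarity of one pair from the example dataset
value = disparity(9.045757, 6.585027, 0.895833)
print(round(value, 2))  # about 23.62
```

Because the denominator shrinks as similarity approaches 1, a modest potency difference between two nearly identical compounds can score higher than a large difference between loosely related ones.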

```python
# ---------- Main execution ----------

# Fingerprint types to calculate
fp_types = ["morgan", "feature_morgan", "rdk", "maccs", "pattern", "atompair", "torsion"]
N_BITS = 2048
RADIUS = 2

# Read input data
df = pd.read_excel("M-pro_Inhibitors.xls")

# Parse each SMILES once; MolFromSmiles returns None for invalid input
mols = df["standardize_smiles"].apply(Chem.MolFromSmiles)

# Compute fingerprints for all types
for fp_type in fp_types:
    col_name = f"fp_{fp_type}"
    print(f"Computing {fp_type} fingerprints...")
    df[col_name] = mols.apply(
        lambda mol: fp_as_bitvect(mol, fp_type=fp_type, n_bits=N_BITS, radius=RADIUS)
        if mol is not None else None
    )

# Generate pairs and calculate disparities
# Choose between "morgan", "feature_morgan", "rdk", "maccs", "pattern", "atompair", "torsion"
fp_name = "feature_morgan"
fp_col = f"fp_{fp_name}"
result_df = generate_pairs(df, group_col="PubMedID", fp_col=fp_col)
result_df = result_df.sort_values(by="Disparity", ascending=False)

# Create the interactive chart
fig = px.scatter(
    result_df,
    x='Tanimoto',
    y='pIC50_diff',
    title='Activity difference vs Tanimoto Similarity',
    labels={'Tanimoto': f'Tanimoto Similarity ({fp_col})', 'pIC50_diff': 'Activity difference (ΔpIC50)'},
    hover_data=['Compound1', 'Compound2', 'Disparity']
)
# Adjust the chart dimensions
fig.update_layout(
    height=500  # Change this value to adjust the height
)
# Show the graph
fig.show()
```

```text
Computing morgan fingerprints...
Computing feature_morgan fingerprints...
Computing rdk fingerprints...
Computing maccs fingerprints...
Computing pattern fingerprints...
Computing atompair fingerprints...
Computing torsion fingerprints...
```
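
The pairing step can be pictured as grouping compounds by PubMedID and comparing every pair inside each group. The sketch below illustrates that idea in plain Python on toy data; it is not the implementation of `generate_pairs`, whose internals are not shown here. The function name `pairs_within_groups`, the record keys `name` and `bits`, and the handling of identical fingerprints are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairs_within_groups(records):
    """Return one row per pair of compounds sharing a PubMedID, sorted by disparity."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["PubMedID"]].append(rec)

    rows = []
    for pubmed_id, members in groups.items():
        for a, b in combinations(members, 2):
            # Tanimoto similarity on sets of on-bit indices (hypothetical representation)
            union = len(a["bits"] | b["bits"])
            sim = len(a["bits"] & b["bits"]) / union if union else 0.0
            diff = abs(a["pIC50"] - b["pIC50"])
            rows.append({
                "PubMedID": pubmed_id,
                "Compound1": a["name"],
                "Compound2": b["name"],
                "pIC50_diff": diff,
                "Tanimoto": sim,
                # Assumption: identical fingerprints give an infinite disparity
                "Disparity": diff / (1 - sim) if sim < 1 else float("inf"),
            })
    return sorted(rows, key=lambda r: r["Disparity"], reverse=True)


# Toy data: A and B come from the same publication, C from a different one
toy = [
    {"PubMedID": 1, "name": "A", "pIC50": 8.0, "bits": {1, 2, 3}},
    {"PubMedID": 1, "name": "B", "pIC50": 6.0, "bits": {1, 2, 4}},
    {"PubMedID": 2, "name": "C", "pIC50": 7.0, "bits": {1, 2, 3}},
]
for row in pairs_within_groups(toy):
    print(row)
```

Only the A/B pair is produced, because C has no partner with the same PubMedID. Restricting comparisons to one publication keeps the potency values within a pair from the same source.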


```python
# ---------- Filtering section ----------
tanimoto_min = 0.8         # set to None if not used
disparity_min = None       # set to None if not used
activity_diff_min = 1.5    # set to None if not used

# Image export options
image_size_excel = 120
image_size_preview = (150, 150)

# Build dynamic filter mask
mask = pd.Series(True, index=result_df.index)

if tanimoto_min is not None:
    mask &= result_df['Tanimoto'] >= tanimoto_min

if disparity_min is not None:
    mask &= result_df['Disparity'] >= disparity_min

if activity_diff_min is not None:
    mask &= result_df['pIC50_diff'] >= activity_diff_min

filtered_df = result_df[mask].copy()
print(f"✅ Pairs remaining after filtering: {len(filtered_df):,}")

# Generate PNG images (for Excel export)
filtered_df["Mol1_img"] = filtered_df["SMILES1"].apply(
    lambda s: mol_to_image_bytes(s, image_size=(image_size_excel, image_size_excel))
)
filtered_df["Mol2_img"] = filtered_df["SMILES2"].apply(
    lambda s: mol_to_image_bytes(s, image_size=(image_size_excel, image_size_excel))
)

# Generate inline SVG images (for notebook or Visual Studio Code display)
filtered_df["Mol1_svg"] = filtered_df["SMILES1"].apply(
    lambda s: smiles_to_svg(s, size=image_size_preview)
)
filtered_df["Mol2_svg"] = filtered_df["SMILES2"].apply(
    lambda s: smiles_to_svg(s, size=image_size_preview)
)

# ---------- Excel export ----------
output_xlsx = "disparity_results_feature_morgan_filtered.xlsx"
export_activity_cliffs_to_excel(filtered_df, output_xlsx, image_size=image_size_excel)

# ---------- Display preview table ----------
show_molecule_table(filtered_df, max_rows=20, img_size=image_size_preview)
```


```text
✅ Pairs remaining after filtering: 19
✅ Excel file saved with molecule images (120px): disparity_results_feature_morgan_filtered.xlsx
🧪 Displaying all 19 rows with molecule drawings.
```


| PubMedID | Compound1 | pIC50_1 | Compound2 | pIC50_2 | pIC50_diff | Tanimoto | Fingerprint | Disparity |
|---|---|---|---|---|---|---|---|---|
| 34198327 | Z-AVLD-FMK | 9.045757 | Z-ASAVLD-FMK | 6.585027 | 2.460731 | 0.895833 | feature_morgan | 23.623016 |
| 32798789 | 184904-82-3 | 7.346787 | 2488719-74-8 | 4.603801 | 2.742987 | 0.878788 | feature_morgan | 22.629641 |
| 34347470 | 2694063-54-0 | 5.069051 | 2694063-56-2 | 6.924453 | 1.855402 | 0.914894 | feature_morgan | 21.800974 |
| 34347470 | 2694063-54-0 | 5.069051 | 2694063-57-3 | 6.754487 | 1.685436 | 0.914894 | feature_morgan | 19.803877 |
| 34242027 | 2683066-40-0 | 7.197226 | 2683066-42-2 | 9.0 | 1.802774 | 0.887097 | feature_morgan | 15.967424 |
| 32798789 | 184904-82-3 | 7.346787 | 2488719-75-9 | 4.406714 | 2.940074 | 0.8125 | feature_morgan | 15.680392 |
| 33891389 | 2596275-64-6 | 4.454693 | 2596275-66-8 | 6.7 | 2.245307 | 0.853659 | feature_morgan | 15.342932 |
| 34347470 | 2694063-36-8 | 4.724228 | 2694063-37-9 | 6.356547 | 1.632319 | 0.875 | feature_morgan | 13.058554 |
| 33655614 | Gü3608 | 5.632644 | Gü3619 | 7.422508 | 1.789864 | 0.851064 | feature_morgan | 12.017659 |
| 33655614 | 1025906-65-3 | 4.66354 | PZB10620019 | 6.643974 | 1.980434 | 0.825 | feature_morgan | 11.316765 |
