# WP3 — Kernel-based Classification (SVM)

This notebook implements kernel inner products on precomputed hashed feature sets and runs
SVM classification for DRF–WL and ITS–WL across different feature types (vertex/edge/shortest-path),
dataset sizes, numbers of classes, and train/test splits.

In [1]:
# Third-party imports
import pandas as pd
import plotly.io as pio
from synkit.IO import rsmi_to_graph

# Project-local helpers
from vis_utils import plot_drf_from_counters_rsmi
from wp3_functions import load_precomputed_features

# Render Plotly figures inline in VS Code
pio.renderers.default = "vscode"

## 1) Paths to precomputed feature directories

We load precomputed feature representations (stored as `.pkl`) for:
- DRF–WL: reactant/product difference features
- ITS–WL: features from the ITS reaction graph

Each representation is available for three feature modes: vertex, edge, shortest-path.

### Load DRF–WL Features
Load precomputed DRF–WL feature sets and reaction class labels for kernel-based classification.

In [2]:
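from wp3_functions import load_precomputed_features

in_dirs = [
    "drf/precomputed_drf_edge",
    "drf/precomputed_drf_vertex",
    "drf/precomputed_drf_sp",
]

# Use `in_dir` rather than `dir` to avoid shadowing the built-in dir().
# Note: X_drf and y_drf are reassigned on every iteration, so after the loop
# they hold the features from the last directory only.
for in_dir in in_dirs:
    X_drf, y_drf = load_precomputed_features(
        in_dir,
        feature_key="drf_wl"
    )
    print(f"\nLoaded DRF features from {in_dir}")
    print("Number of reactions:", len(X_drf))
    print("Number of classes:", len(set(y_drf)))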


Loaded DRF features from drf/precomputed_drf_edge
Number of reactions: 50000
Number of classes: 50

Loaded DRF features from drf/precomputed_drf_vertex
Number of reactions: 50000
Number of classes: 50

Loaded DRF features from drf/precomputed_drf_sp
Number of reactions: 50000
Number of classes: 50


### Load ITS–WL Features
Load precomputed ITS–WL feature sets and reaction class labels derived from the ITS graph.

In [3]:
in_dirs = [
    "its/precomputed_its_edge",
    "its/precomputed_its_vertex",
    "its/precomputed_its_sp",
]

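# Use `in_dir` rather than `dir` to avoid shadowing the built-in dir().
# Note: X_its and y_its are reassigned on every iteration, so after the loop
# they hold the features from the last directory only.
for in_dir in in_dirs:
    X_its, y_its = load_precomputed_features(
        in_dir,
        feature_key="its_wl"
    )
    print(f"\nLoaded ITS features from {in_dir}")
    print("Number of reactions:", len(X_its))
    print("Number of classes:", len(set(y_its)))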


Loaded ITS features from its/precomputed_its_edge
Number of reactions: 50000
Number of classes: 50

Loaded ITS features from its/precomputed_its_vertex
Number of reactions: 50000
Number of classes: 50

Loaded ITS features from its/precomputed_its_sp
Number of reactions: 50000
Number of classes: 50


The outputs confirm that all precomputed DRF–WL and ITS–WL feature representations
(edge, vertex, and shortest-path) were loaded successfully. Each representation
contains the full dataset of 50,000 reactions across 50 reaction classes,
providing a consistent basis for kernel computation and classification.
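
Each loading loop above reassigns its `X_*`/`y_*` variables on every pass, so only the last directory's features remain in memory afterwards. One possible way to keep all three feature modes available is a dictionary keyed by mode. The sketch below is only an illustration: `load_all_modes` and `_fake_loader` are hypothetical names, and the stand-in loader merely mimics the call pattern `loader(in_dir, feature_key=...)` seen in the cells above. In the notebook one would presumably pass `load_precomputed_features` instead.

```python
def load_all_modes(loader, dirs_by_mode, feature_key):
    """Call `loader` once per feature mode and keep every result.

    `loader` is assumed to accept (in_dir, feature_key=...) and to return
    a pair (X, y), as the calls in the cells above suggest.
    """
    features = {}
    for mode, in_dir in dirs_by_mode.items():
        X, y = loader(in_dir, feature_key=feature_key)
        features[mode] = (X, y)
    return features


# Hypothetical stand-in so the sketch runs without the .pkl files.
def _fake_loader(in_dir, feature_key):
    return [f"{feature_key}:{in_dir}"], [0]


drf_dirs = {
    "edge": "drf/precomputed_drf_edge",
    "vertex": "drf/precomputed_drf_vertex",
    "sp": "drf/precomputed_drf_sp",
}

drf_features = load_all_modes(_fake_loader, drf_dirs, feature_key="drf_wl")
X_edge, y_edge = drf_features["edge"]
```

With this layout, later cells could select a mode explicitly, for example `drf_features["vertex"]`, instead of relying on whichever directory was loaded last.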

## 2) Kernel inner product on hash sets

The lab definition reduces all kernels to counting common elements of two hashed feature sets.
Given two reactions with feature hash sets \(S_G, S_H\), the kernel is:
\[
k(G,H) = |S_G \cap S_H|
\]

Our precomputed features are stored as `Counter` objects. For the required hash-set kernel, we use only the Counter keys and ignore the counts.
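
As a minimal sketch of the definition above, the kernel value for two reactions can be computed by intersecting the key sets of their Counters. The function names `hashset_kernel` and `gram_matrix` and the toy hash strings are assumptions made for illustration; they are not taken from `wp3_functions`.

```python
from collections import Counter


def hashset_kernel(c_g, c_h):
    """k(G, H) = |S_G ∩ S_H|, where S_G and S_H are the Counter keys."""
    return len(c_g.keys() & c_h.keys())


def gram_matrix(features):
    """Symmetric matrix of pairwise kernel values for a list of Counters."""
    sets = [set(c) for c in features]  # iterating a Counter yields its keys
    n = len(sets)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = len(sets[i] & sets[j])
    return K


# Toy Counters with made-up feature hashes (the counts are ignored).
c_g = Counter({"a1": 3, "b2": 1, "c3": 2})
c_h = Counter({"b2": 5, "c3": 1, "d4": 1})

k_gh = hashset_kernel(c_g, c_h)  # shared keys: "b2" and "c3"
K = gram_matrix([c_g, c_h])
```

Here `k_gh` is 2 and each diagonal entry of `K` equals the number of distinct hashes of that reaction. A matrix of this form is the kind of input that an SVM with a precomputed kernel expects, for instance scikit-learn's `SVC(kernel="precomputed")`.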