# Classification using FGW

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from fgw.custom_svc import Graph_FGW_SVC_Classifier
# import os,sys
# sys.path.append(os.path.realpath('../lib'))
from fgw.data_loader import load_local_data
from sklearn.model_selection import train_test_split


Simple training example using FGW on the mutag dataset

In [2]:
dataset_n='mutag'

We load the Mutag dataset, using the "wl" option that computes the Weisfeler-Lehman features for each nodes as shown is the notebook wl_labeling.ipynb

In [3]:
path='../data/'
X,y=load_local_data(path,dataset_n,wl=2)

We create a SVM-like classifier on the precomputed matrix $K=e^{-\gamma*FGW}$.
To compute FGW we use the shortest_path distance for the structure matrices of each graph, and the so-called 'hamming'distance between their features. It is defined as 

$$d(a_{i},b_{j})=\sum_{k=0}^{wl} \delta(\tau(a_{i}^{k}),\tau(b_{j}^{k}))$$

where $\delta(x,y)=1$ if $x\neq y$ else $\delta(x,y)=0$ and $\tau(a_{i}^{k})$ denotes the concatenated label at iteration $k$ in the Weisfeler-Lehman process.

In [4]:
graph_svc=Graph_FGW_SVC_Classifier(C=1,gamma=1,alpha=1, method='cttil',features_metric='hamming_dist')

In [5]:
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.33, random_state=42)

In [6]:
%%time
graph_svc.fit(X_train,y_train)

CPU times: user 21.9 s, sys: 2.06 ms, total: 21.9 s
Wall time: 21.9 s


In [7]:
%%time
preds=graph_svc.predict(X_test)

CPU times: user 22.9 s, sys: 0 ns, total: 22.9 s
Wall time: 22.9 s


In [8]:
np.sum(preds==y_test) / len(y_test)

0.7619047619047619