## Spectrum String Kernel basic usage

Install with pip (uncomment the first line), or compile the C core from a local checkout as done here:

In [1]:
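# Option 1: install the released package with pip
# ! pip install strkernels

# Option 2: compile the C core from a local checkout
! python ../strkernels/compile_core.py

# make the local checkout importable
import sys
sys.path.append('..')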

Compiling str_kernel.c...
Compiling locality_improved_sk.c...
Compiling str_kernel_matrix.c...
Compiling sqrt_diag_normalizer.c...
Compiling subsequence_sk.c...
Compiling fixed_degree_sk.c...
Compiling spectrum_sk.c...
Compiling normalizer.c...
Compiling weighted_degree_sk.c...
Linking object files...
Compilation completed successfully!


Import and create a kernel:

In [2]:
from strkernels import SpectrumStringKernel
spectrum_kernel = SpectrumStringKernel(order=3)

Example data:

In [3]:
import numpy as np
strings = np.array(["ATCG", "ATGG", "TACG", "GCTA"])
y = np.array([-1, -1, 1, 1])

Compute the kernel matrix for every pair of strings. The diagonal is 1, which indicates that the values are normalized:

In [4]:
kernel_matrix = spectrum_kernel(strings, strings)

print(kernel_matrix)

[[1.         0.50251891 0.55555556 0.44444444]
 [0.50251891 1.         0.40201513 0.40201513]
 [0.55555556 0.40201513 1.         0.55555556]
 [0.44444444 0.40201513 0.55555556 1.        ]]
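
To make the numbers less opaque, here is a minimal pure-Python sketch that reproduces the matrix printed above. It rests on an assumption inferred from those values, not taken from the library's source or documentation: the kernel counts shared substrings of every length from 1 up to `order`, then divides by the square root of the two self-similarities. The function names (`substring_counts`, `raw_kernel`, `spectrum_kernel_sketch`) are hypothetical and are not part of `strkernels`.

```python
from collections import Counter

import numpy as np


def substring_counts(s, max_len):
    """Count every contiguous substring of s with length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def raw_kernel(a, b, max_len):
    """Unnormalized similarity: dot product of the two substring-count vectors."""
    ca, cb = substring_counts(a, max_len), substring_counts(b, max_len)
    return sum(n * cb[sub] for sub, n in ca.items())


def spectrum_kernel_sketch(xs, ys, max_len=3):
    """Normalized kernel matrix; rows follow xs, columns follow ys.

    Assumption: normalization is k(a, b) / sqrt(k(a, a) * k(b, b)).
    """
    K = np.empty((len(xs), len(ys)))
    for i, a in enumerate(xs):
        for j, b in enumerate(ys):
            norm = np.sqrt(raw_kernel(a, a, max_len) * raw_kernel(b, b, max_len))
            K[i, j] = raw_kernel(a, b, max_len) / norm
    return K


strings = ["ATCG", "ATGG", "TACG", "GCTA"]
K_sketch = spectrum_kernel_sketch(strings, strings)
print(np.round(K_sketch, 8))
```

For example, "ATCG" and "ATGG" share the single characters A, T and G (G occurs twice in "ATGG") and the pair AT, giving a raw score of 5; the self-similarities are 9 and 11, so the normalized value is 5 / sqrt(99).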


Or pass the kernel object to scikit-learn as a callable kernel:

In [5]:
from sklearn.svm import SVC
clf = SVC(kernel=spectrum_kernel)

# train the classifier
clf.fit(strings, y)

# predict on the training strings (a sanity check, not a held-out evaluation)
predictions = clf.predict(strings)

print(predictions)

[-1 -1  1  1]
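
An alternative, shown here only as a sketch, is to compute the kernel matrix once and hand it to scikit-learn with `kernel="precomputed"`, which avoids re-evaluating the kernel inside the classifier. The matrix below is hard-coded from the output printed earlier so that the block runs without `strkernels`; in practice it would come from `spectrum_kernel(strings, strings)`. The names `K_train`, `clf_pre` and `pred_pre` are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Gram matrix copied from the printed output above (rounded to 8 decimals)
K_train = np.array([
    [1.0, 0.50251891, 0.55555556, 0.44444444],
    [0.50251891, 1.0, 0.40201513, 0.40201513],
    [0.55555556, 0.40201513, 1.0, 0.55555556],
    [0.44444444, 0.40201513, 0.55555556, 1.0],
])
y = np.array([-1, -1, 1, 1])

# With kernel="precomputed", fit() expects an (n_train, n_train) Gram matrix
clf_pre = SVC(kernel="precomputed")
clf_pre.fit(K_train, y)

# predict() expects kernel values between the new samples (rows)
# and the training samples (columns); here both are the training set
pred_pre = clf_pre.predict(K_train)
print(pred_pre)
```

To score unseen strings with this setup, the matrix passed to `predict` would need shape `(n_test, n_train)`, with the test strings as the first argument to the kernel and the training strings as the second.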


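The kernel can also be evaluated between two different sets of strings. Rows of the result correspond to `strings1` and columns to `strings2`: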

In [6]:
strings1 = np.array(["ATCG", "ATGG"])
strings2 = np.array(["ATCG", "GGGG", "CCCC"])

kernel_matrix = spectrum_kernel(strings1, strings2)

print(kernel_matrix)

[[1.         0.24759378 0.24759378]
 [0.50251891 0.61588176 0.        ]]
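
Individual entries can be checked by hand. The arithmetic below assumes, based only on the printed values and not on the library's documentation, that the kernel sums the products of substring counts for every length from 1 to `order` and then normalizes by the square root of the two self-similarities. For "ATGG" and "GGGG", the only shared substrings are G and GG:

$$
k(\text{ATGG}, \text{GGGG}) = \underbrace{2 \cdot 4}_{\text{G}} + \underbrace{1 \cdot 3}_{\text{GG}} = 11
$$

$$
k(\text{ATGG}, \text{ATGG}) = 6 + 3 + 2 = 11, \qquad
k(\text{GGGG}, \text{GGGG}) = 16 + 9 + 4 = 29
$$

$$
\frac{k(\text{ATGG}, \text{GGGG})}{\sqrt{k(\text{ATGG}, \text{ATGG}) \, k(\text{GGGG}, \text{GGGG})}}
= \frac{11}{\sqrt{11 \cdot 29}} \approx 0.61588176
$$

"ATGG" and "CCCC" have no character in common, so every product is zero and the entry is exactly 0.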
