[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/FastSK/blob/master/docs/2demo/fastDemo.ipynb)
 
[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/FastSK/blob/master/docs/2demo/fastDemo.ipynb)

# FastSK Demo

Here is a quick tutorial on how to use the methods in the FastSK package.


## Using the main FastSK Class


#### fastsk.FastSK(*int* g, *int* m, *int* t=-1, *bool* approx=False, *double* delta=0.025, *int* max_iters=-1, *bool* skip_variance=False)

Constructor of the FastSK class. This creates a FastSK object with the specified parameters.

*g*: Required. The overall sequence feature length. FastSK will extract length-*g* contiguous features (or g-mers) from each training and test sequence.

*m*: Required. The number of mismatch positions to insert into each of the g-mers.

*t*: Optional. The number of threads to use to compute the kernel matrix.

*approx*: Optional. Whether to use the FastSK approximation algorithm.

*delta*: Optional. The delta parameter for the approximation algorithm. Controls how quickly the algorithm converges.

*max_iters*: Optional. The maximum number of iterations of the approximation algorithm.

*skip_variance*: Optional. If *max_iters* is set, the *skip_variance* flag tells FastSK to run up to *max_iters* iterations without performing variance computations.
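To make *g* and *m* concrete, below is a deliberately brute-force sketch of a gapped k-mer style kernel between two integer sequences. It assumes the *m* mismatch positions act as wildcards: for every choice of *m* positions out of *g*, each g-mer is masked at those positions, and the kernel is the dot product of the two sequences' masked-g-mer counts. The helper names (`gmers`, `gapped_features`, `gapped_kernel`) are hypothetical, and this is not FastSK's optimized implementation.

```python
from collections import Counter
from itertools import combinations


def gmers(seq, g):
    """Return every contiguous length-g window (g-mer) of seq as a tuple."""
    return [tuple(seq[i:i + g]) for i in range(len(seq) - g + 1)]


def gapped_features(seq, g, m):
    """Count g-mers after masking each possible set of m positions with None."""
    counts = Counter()
    for masked_positions in combinations(range(g), m):
        for gmer in gmers(seq, g):
            masked = tuple(
                None if i in masked_positions else c for i, c in enumerate(gmer)
            )
            counts[masked] += 1
    return counts


def gapped_kernel(x, y, g, m):
    """Dot product of the masked-g-mer count vectors of x and y."""
    fx = gapped_features(x, g, m)
    fy = gapped_features(y, g, m)
    # Counter returns 0 for features that y does not contain
    return sum(n * fy[f] for f, n in fx.items())


x = [1, 0, 1, 0, 1]
y = [1, 1, 1, 0, 1]
print(gapped_kernel(x, y, g=3, m=2))  # 15
```

Equivalently, a pair of g-mers that differ in d ≤ m positions contributes C(g − d, m − d) to the sum: the number of ways to choose the masked positions so that they cover every mismatch.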


#### FastSK.compute_kernel(Xtrain, Xtest)

*Xtrain*, *Xtest*: Required. The training and test sequences. These can be either:
1. The paths to the FASTA files containing the sequences.
2. The sequences in numerical form.
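Option 1 expects FASTA files, where each record is a header line starting with `>` followed by one or more sequence lines. A minimal generic reader for that format is sketched below (the name `parse_fasta` is hypothetical). It does not reflect how FastSK itself reads its input; in particular, whether FastSK expects a class label in the header line is not covered here.

```python
def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; join them into one string
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>0
ACGT
AC
>1
GGTA
"""
print(parse_fasta(example))  # [('0', 'ACGTAC'), ('1', 'GGTA')]
```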

For example, using paths to the FASTA files as arguments:

```python
from fastsk import FastSK

fastsk = FastSK(g=3, m=2, t=4, approx=True, max_iters=50)

# Compute the kernel matrix
fastsk.compute_kernel(Xtrain="1.1.train.fasta", Xtest="1.1.test.fasta")

# Train an SVM
fastsk.fit(C=1.0, kernel_type='linear')

# Score
fastsk.score(metric='auc')
```

Alternatively, you can pass sequences that have already been converted to numerical form (e.g., NumPy arrays). This is useful if your dataset is in a different format: read the sequences with your own method, convert them to numerical form, and then pass them to FastSK. For example:

```python
from fastsk import FastSK

# Sequences already converted to numerical form
Xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
Xtest = [[1,0,1,1,1], [1,0,0,1], [0,0,1,0], [1,1,1,0,1]]

fastsk = FastSK(g=3, m=2)
fastsk.compute_kernel(Xtrain, Xtest)
```
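The numerical inputs above are plain lists of integers. If your sequences start out as strings, a hypothetical helper like `encode` below can convert them. It assumes each character is mapped to its index in a fixed alphabet shared by the training and test sets, so that the same character always gets the same code; what alphabet size or integer range FastSK accepts is not specified here.

```python
def encode(sequences, alphabet):
    """Replace each character with its index in alphabet (KeyError if absent)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [[index[ch] for ch in seq] for seq in sequences]


# Use one fixed alphabet for both sets so equal characters get equal codes
ALPHABET = "ACGT"
Xtrain = encode(["ACGTA", "AACGT"], ALPHABET)
Xtest = encode(["ACGGA", "TTACG"], ALPHABET)
print(Xtrain)  # [[0, 1, 2, 3, 0], [0, 0, 1, 2, 3]]
print(Xtest)   # [[0, 1, 2, 2, 0], [3, 3, 0, 1, 2]]
```

Building the mapping separately from each set's own characters would be a subtle bug: a character missing from one set would shift the codes of every character after it.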


#### FastSK.score(metric="auc")

Evaluates the model trained by *fit()* using the given *metric*. For example:


```python
from fastsk import FastSK

kernel = FastSK(g=3, m=2)

kernel.compute_kernel("1.1.train.fasta", "1.1.test.fasta")
# C is passed to fit(), as in the first example; the constructor does not take it
kernel.fit(C=0.7)
kernel.score(metric='auc')
```
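Assuming `metric='auc'` refers to the area under the ROC curve, the sketch below computes that quantity straight from its definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. It assumes binary 0/1 labels, and `roc_auc` is a hypothetical name; it illustrates the metric only and is not FastSK's scoring code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.5]
print(roc_auc(labels, scores))  # 0.75
```

The nested loop is O(P·N) in the number of positives and negatives; sorting-based implementations compute the same value in O(n log n).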


#### FastSK.get_train_kernel()

#### FastSK.get_test_kernel()

*get_train_kernel()* returns the training portion of the kernel matrix.
*get_test_kernel()* returns the testing portion of the kernel matrix.

For example, with the setup below:


```python
from fastsk import FastSK

kernel = FastSK(g=3, m=2)

Xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
Xtest = [[1,1,1,1,1], [1,0,1,0,1]]

kernel.compute_kernel(Xtrain, Xtest)

train_kernel = kernel.get_train_kernel()
test_kernel = kernel.get_test_kernel()
```
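One way to picture the two portions: compute one kernel matrix over the training and test sequences together, then slice it. The sketch below assumes the training portion is the train-by-train block and the testing portion is the test-by-train block (the layout a precomputed-kernel SVM expects); whether `get_test_kernel()` returns exactly that shape is an assumption. The helpers are hypothetical, and the plain dot product stands in for the FastSK kernel.

```python
def full_kernel(sequences, kernel):
    """Symmetric matrix K[i][j] = kernel(sequences[i], sequences[j])."""
    n = len(sequences)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = kernel(sequences[i], sequences[j])
    return K


def split_kernel(K, n_train):
    """Return (train-by-train block, test-by-train block)."""
    train_block = [row[:n_train] for row in K[:n_train]]
    test_block = [row[:n_train] for row in K[n_train:]]
    return train_block, test_block


def dot(x, y):
    """Stand-in kernel: ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


Xtrain = [[1, 0, 1, 0, 1], [1, 1, 1, 0, 1]]
Xtest = [[1, 1, 1, 1, 1], [1, 0, 1, 0, 1]]
K = full_kernel(Xtrain + Xtest, dot)
train_kernel, test_kernel = split_kernel(K, len(Xtrain))
print(train_kernel)  # [[3, 3], [3, 4]]
print(test_kernel)   # [[3, 4], [3, 3]]
```

Each row of the test block compares one test sequence against every training sequence, which is what a classifier needs in order to score that test sequence.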


#### FastSK.save_kernel(*string* kernel_file)

This method writes the kernel matrix to the file named by *kernel_file*.


```python
from fastsk import FastSK

kernel = FastSK(g=3, m=2)

Xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
Xtest = [[1,1,1,1,1], [1,0,1,0,1]]

kernel.compute_kernel(Xtrain, Xtest)
kernel.save_kernel('output.txt')
```
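The file format that `save_kernel` writes is not described here. If you need to write a kernel matrix yourself (for example, one you have sliced or post-processed), a plain-text format with one whitespace-separated row per line is easy to read back. The helpers `save_matrix` and `load_matrix` below are hypothetical and not part of FastSK.

```python
import os
import tempfile


def save_matrix(matrix, path):
    """Write one row per line, values separated by single spaces."""
    with open(path, "w") as f:
        for row in matrix:
            # repr() of a float round-trips exactly when parsed back
            f.write(" ".join(repr(float(v)) for v in row) + "\n")


def load_matrix(path):
    """Read the format written by save_matrix back into lists of floats."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]


K = [[3, 3], [3, 4]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kernel.txt")
    save_matrix(K, path)
    restored = load_matrix(path)
print(restored)  # [[3.0, 3.0], [3.0, 4.0]]
```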
