## **FastSK Demo**
Here is a quick tutorial on how to use the methods in the FastSK package. Please make sure fastsk is installed correctly with pip before trying the demo.

### **SVM Class**


#### **fastsk.SVM(*int* g, *int* m, *double* C=1.0, *double* nu=0.5, *double* eps=0.001, *string* kernel='linear')**


Constructor of the SVM class. This creates an SVM object with the specified parameters. <br>

*g*: Required. Length of the gapped k-mers (g-mers) compared between sequences.<br>
*m*: Required. The number of mismatched positions.<br>
*C*: Optional. SVM regularization parameter (penalty on misclassified training examples). *Default: 1.0*<br>
*nu*: Optional. The parameter nu of nu-SVC, one-class SVM, and nu-SVR (*from LIBSVM*). *Default: 0.5*<br>
*eps*: Optional. LIBSVM epsilon parameter: the tolerance of the termination criterion. *Default: 0.001*<br>
*kernel*: Optional. The kernel type used by the SVM. *Default: 'linear'*<br>
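The terminal logs later in this demo print `g = 3, k = 1` for `g=3, m=2` and report computing 3 mismatch profiles. Assuming, as those logs suggest, that k = g - m and that one mismatch profile is used per choice of the m positions allowed to mismatch, the derived quantities can be sketched as below. `kernel_shape` is a hypothetical helper, not part of fastsk.

```python
from itertools import combinations
from math import comb


def kernel_shape(g: int, m: int) -> dict:
    """Hypothetical helper: derive k and the mismatch profiles from g and m.

    Assumes k = g - m and one profile per choice of m mismatch positions
    out of g (equivalently, per choice of k positions that must match).
    """
    if not 0 <= m < g:
        raise ValueError("expected 0 <= m < g")
    k = g - m
    # Each profile is the tuple of k kept (compared) positions within a g-mer.
    profiles = list(combinations(range(g), k))
    assert len(profiles) == comb(g, m)  # choosing k kept == choosing m ignored
    return {"k": k, "num_profiles": len(profiles), "kept_positions": profiles}


info = kernel_shape(g=3, m=2)
print(info["k"], info["num_profiles"], info["kept_positions"])
# 1 3 [(0,), (1,), (2,)]
```

The number of profiles grows as C(g, m), which is one reason small m relative to g keeps the kernel cheap to compute.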

#### **SVM.fit(*string* train_file, *string* test_file, *dict* dict="", *boolean* quiet=False, *string* kernel_file="")**

*train_file*, *test_file*: Required. Paths to the files containing the training and testing sequences, respectively. Each sequence is preceded by a header line of the form ">"+*label_int*, i.e. `>` followed by the sequence's integer label.<br>
*dict*: Optional. *Default: empty string.*<br>
*quiet*: Optional. Set to True to suppress all intermediate output during training. *Default: False.*<br>
*kernel_file*: Optional. File to write the kernel matrix to. *Default: empty string.*
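`fit()` reads sequences from files in which each sequence follows a `>label` header line. A minimal sketch for writing Python lists into that layout, assuming one sequence per line and integer labels; the helper name `write_labeled_fasta` and the file name `toy.train.fasta` are hypothetical:

```python
from pathlib import Path
from typing import Sequence


def write_labeled_fasta(path: str, seqs: Sequence[str], labels: Sequence[int]) -> None:
    """Write each sequence under a '>label' header line (hypothetical helper)."""
    if len(seqs) != len(labels):
        raise ValueError("need exactly one label per sequence")
    lines = []
    for seq, label in zip(seqs, labels):
        lines.append(f">{int(label)}")  # header: '>' followed by the integer label
        lines.append(seq)               # the sequence itself on the next line
    Path(path).write_text("\n".join(lines) + "\n")


write_labeled_fasta("toy.train.fasta", ["ACACA", "AAACA"], [1, 0])
print(Path("toy.train.fasta").read_text())
```

This prints the two records `>1`/`ACACA` and `>0`/`AAACA`, one line each.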

```python
from fastsk import SVM

# Create an SVM with g = 3, m = 2 and C = 0.7
svm = SVM(g=3, m=2, C=0.7)

# Train using the training and test FASTA files; the kernel is written to output.txt
svm.fit(train_file="1.1.train.fasta", test_file="1.1.test.fasta", quiet=False, kernel_file="output.txt")

# Write the predicted labels for the test sequences to predictions.txt
svm.predict('predictions.txt')
```


#### **SVM.fit_from_arrays([*string*] Xtrain, [*int*] Ytrain, [*string*] Xtest, [*int*] Ytest, *string* kernel_file)**

*Xtrain*, *Xtest*: Required. List of string sequences for training and testing.<br>
*Ytrain*, *Ytest*: Required. List of corresponding labels for training and testing.<br>
*kernel_file*: Optional. File destination to write the kernel to.
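If the data already lives in a `>label`-style file like the one `fit()` expects, it can be loaded into the `Xtrain`/`Ytrain` lists that `fit_from_arrays()` takes. A minimal parser sketch, assuming every header is `>` followed by an integer label and that a sequence may wrap over several lines; `read_labeled_fasta` is a hypothetical helper, not part of fastsk:

```python
from typing import List, Tuple


def read_labeled_fasta(text: str) -> Tuple[List[str], List[int]]:
    """Parse '>label' headers and their sequences into (sequences, labels)."""
    seqs: List[str] = []
    labels: List[int] = []
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if chunks:  # finish the previous record
                seqs.append("".join(chunks))
                chunks = []
            labels.append(int(line[1:]))
        else:
            chunks.append(line)  # sequence lines may wrap
    if chunks:
        seqs.append("".join(chunks))
    if len(seqs) != len(labels):
        raise ValueError("every header must be followed by a sequence")
    return seqs, labels


xtrain, ytrain = read_labeled_fasta(">1\nACACA\n>0\nAAA\nCA\n")
print(xtrain, ytrain)  # ['ACACA', 'AAACA'] [1, 0]
```

For a real file, pass `open(path).read()` as `text`.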

```python
xtrain = ["ACACA", "AAACA"]
ytrain = [1, 0]
xtest = ["AAAAA", "ACACA"]
ytest = [1, 0]

svm.fit_from_arrays(xtrain, ytrain, xtest, ytest, "kernel.txt")
```

Message in terminal:
<pre><code>Dictionary: AC
Size: 3 (+1 for unknown character)
n_str = 2
Dictionary: AC
Size: 3 (+1 for unknown character)
n_str = 2
g = 3, k = 1, 12 features
Computing approximate kernel...
Computing 3 mismatch profiles using 3 threads...
Thread 0 finished in 1 iterations...
Thread 2 finished in 1 iterations...
Thread 1 finished in 1 iterations...
Writing kernel to kernel.txt...
*
optimization finished, #iter = 1
nu = 1.000000
obj = -1.393911, rho = 0.000000
nSV = 2, nBSV = 2
Total nSV = 2
*
optimization finished, #iter = 1
nu = 1.000000
obj = -1.393911, rho = 0.000000
nSV = 2, nBSV = 2
Total nSV = 2
*
optimization finished, #iter = 1
nu = 1.000000
obj = -1.393911, rho = 0.000000
nSV = 2, nBSV = 2
Total nSV = 2
*
optimization finished, #iter = 1
nu = 1.000000
obj = -1.393911, rho = 0.000000
nSV = 2, nBSV = 2
Total nSV = 2
</code></pre>



#### **SVM.fit_numerical([[*int*]] Xtrain, [*int*] Ytrain, [[*int*]] Xtest, [*int*] Ytest, *string* kernel_file="")**

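*Xtrain*, *Xtest*: Required. Lists of integer-encoded sequences (each sequence is a list of ints) for training and testing.<br>
*Ytrain*, *Ytest*: Required. Lists of integer labels corresponding to the sequences in Xtrain and Xtest.<br>
*kernel_file*: Optional. File destination to write the kernel matrix to. *Default: empty string.*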
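`fit_numerical()` expects sequences that are already integer-encoded. One way to get there from strings is to map each distinct character to an index using a single alphabet shared by the training and test sets, so the same character always gets the same integer. The sorted, 0-based mapping below is an illustrative choice, not something fastsk is known to require; `encode_sequences` is a hypothetical helper:

```python
from typing import Dict, List, Sequence, Tuple


def encode_sequences(*datasets: Sequence[str]) -> Tuple[Dict[str, int], List[List[List[int]]]]:
    """Map characters to 0-based ints using one alphabet shared by all datasets."""
    alphabet = sorted({ch for data in datasets for seq in data for ch in seq})
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = [[[index[ch] for ch in seq] for seq in data] for data in datasets]
    return index, encoded


index, (xtrain_num, xtest_num) = encode_sequences(["ACACA", "AAACA"], ["AAAAA", "ACACA"])
print(index)       # {'A': 0, 'C': 1}
print(xtrain_num)  # [[0, 1, 0, 1, 0], [0, 0, 0, 1, 0]]
print(xtest_num)   # [[0, 0, 0, 0, 0], [0, 1, 0, 1, 0]]
```

Building the alphabet from every dataset at once avoids a test-only character raising a `KeyError`.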

```python
xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
ytrain = [1, 0]
xtest = [[1,1,1,1,1], [1,0,1,0,1]]
ytest = [1, 0]

svm.fit_numerical(xtrain, ytrain, xtest, ytest, "kernel.txt")
svm.predict("preds.txt")
```

Message in terminal:
<pre><code>dictionarySize = 2
g = 3, k = 1, 12 features
Computing exact kernel...
Computing 3 mismatch profiles using 3 threads...
Thread 1 finished in 1 iterations...
Thread 2 finished in 1 iterations...
Thread 0 finished in 1 iterations...
Writing kernel to kernel.txt...
</code></pre>

#### **SVM.cv([[*int*]] X, [*int*] Y, *int* num_folds=7)**

Trains the SVM on X and Y using k-fold cross-validation.

*X*: Required. List of integer-encoded sequences.<br>
*Y*: Required. List of integer labels, one per sequence in X.<br>
*num_folds*: Optional. Number of folds in k-fold cross-validation. *Default: 7*
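For intuition about what `num_folds` controls, here is a plain k-fold split: every sequence lands in exactly one held-out fold, and the remaining folds form the training subproblem for that round. This is a generic sketch; fastsk's `cv()` may assign folds differently (for example, with shuffling or stratification by label). `kfold_indices` is hypothetical:

```python
from typing import List, Tuple


def kfold_indices(n: int, num_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into contiguous folds; return (train, held_out) pairs."""
    if not 2 <= num_folds <= n:
        raise ValueError("need 2 <= num_folds <= n")
    splits = []
    for f in range(num_folds):
        # Fold f covers [start, end); integer boundaries spread the remainder.
        start, end = f * n // num_folds, (f + 1) * n // num_folds
        held_out = list(range(start, end))
        train = [i for i in range(n) if not start <= i < end]
        splits.append((train, held_out))
    return splits


# 9 sequences split into 8 folds: most folds hold out one sequence
for train, held_out in kfold_indices(n=9, num_folds=8)[:2]:
    print(len(train), held_out)
# 8 [0]
# 8 [1]
```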

```python
from fastsk import SVM

svm = SVM(g=3, m=2, C=0.7)
xtrain = [[1,0,1,0,1], [1,1,1,0,1], [1,0,1,1,1], [1,0,0,1], [0,0,1,0], [1,1,1,0,1], [1,0,1,1,0], [1,0,1,0,0], [1,0,1]]
ytrain = [1, 0, 0, 1, 0, 0, 0, 1, 1]  # one label per sequence in xtrain

# 8-fold cross-validation on the training data
svm.cv(xtrain, ytrain, num_folds=8)
```

Output in terminal is as follows.
<pre><code>dictionarySize = 2
g = 3, k = 1, 23 features
Computing exact kernel...
Computing 3 mismatch profiles using 3 threads...
Thread 1 finished in 1 iterations...
Thread 0 finished in 1 iterations...
Thread 2 finished in 1 iterations...
subprob.l = 8
*
optimization finished, #iter = 6
nu = 0.857143
obj = -3.931428, rho = 1.637066
nSV = 6, nBSV = 6
Total nSV = 6
*
optimization finished, #iter = 2
...
</code></pre>

#### **SVM.predict(*string* predictions_file)**

*predictions_file*: Required. File destination to write the predicted labels for the test sequences to.<br>

Below is the same example as in SVM.fit_from_arrays().

```python
from fastsk import SVM

xtrain = ["ACACA", "AAACA"]
ytrain = [1, 0]
xtest = ["AAAAA", "ACACA"]
ytest = [1, 0]

svm = SVM(g=3, m=2, C=0.7)
svm.fit_from_arrays(xtrain, ytrain, xtest, ytest, "kernel.txt")
svm.predict("preds.txt")
```

In *preds.txt*:
<pre><code>1
0
</code></pre>

Message in terminal:
<pre><code>Predicting labels for 2 sequences...
Test kernel constructed...
num_sv = 2
Num sequences: 2
Num positive: 1, Num negative: 1
TPR: 1.000000
TNR: 1.000000
FNR: 0.000000
FPR: 0.000000

Accuracy: 0.500000
AUROC: 1.000000
</code></pre>
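The rates in the log above are the usual confusion-matrix ratios. A sketch that recomputes them from a predictions file, assuming one integer label per line (as in *preds.txt*) and that 1 marks the positive class; `confusion_rates` is a hypothetical helper, and its numbers are not guaranteed to match fastsk's own report:

```python
from typing import Dict, Sequence


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute TPR, TNR, FNR, FPR and accuracy for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    pos, neg = tp + fn, tn + fp
    return {
        "TPR": tp / pos if pos else 0.0,
        "TNR": tn / neg if neg else 0.0,
        "FNR": fn / pos if pos else 0.0,
        "FPR": fp / neg if neg else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Contents of preds.txt shown above (one predicted label per line). For a real
# run, read the file instead, e.g. pathlib.Path("preds.txt").read_text().
pred_text = "1\n0\n"
y_pred = [int(tok) for tok in pred_text.split()]
print(confusion_rates(y_true=[1, 0], y_pred=y_pred))
# {'TPR': 1.0, 'TNR': 1.0, 'FNR': 0.0, 'FPR': 0.0, 'accuracy': 1.0}
```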

#### **SVM.score(metrics="accuracy")**

*metrics*: Optional. Name of the evaluation metric to compute. *Default: "accuracy"*

```python
from fastsk import SVM
svm = SVM(g=3, m=2, C=0.7)

svm.fit(train_file="1.1.train.fasta", test_file="1.1.test.fasta", quiet=True, kernel_file="output.txt")
svm.score('accuracy')
```
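Besides accuracy, the `predict()` log above also reports AUROC. AUROC can be computed from real-valued decision scores as the fraction of (positive, negative) pairs in which the positive example scores higher, counting ties as one half (the Mann-Whitney formulation). This is a generic O(n_pos × n_neg) sketch, independent of fastsk, using made-up toy scores; `auroc` is a hypothetical helper:

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(auroc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.8]))  # 0.75
```

Unlike accuracy, AUROC depends only on how the scores rank the examples, not on a decision threshold.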

### **Kernel Class**

#### **fastsk.Kernel(g, m, t=-1, approx=False, delta=0.025, max_iters=-1, skip_variance=False)**

Constructor of the Kernel class. <br>

*g*: Required. Length of the gapped k-mers (g-mers) compared between sequences.<br>
*m*: Required. Number of mismatched positions.<br>
*t*: Optional. Tolerance of the stopping criterion. *Default: -1*<br>
*approx*: Optional. Whether to compute an approximate kernel instead of the exact one. *Default: False*<br>
*delta*: Optional. *Default: 0.025*<br>
*max_iters*: Optional. Maximum number of iterations allowed. Set to -1 for no limit. *Default: -1*<br>
*skip_variance*: Optional. *Default: False*<br>

#### **Kernel.compute([[*int*]] Xtrain, [[*int*]] Xtest)**
This computes the kernel matrix based on the given training set *Xtrain* and test set *Xtest*.

```python
from fastsk import Kernel

kernel = Kernel(g=3, m=2)

xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
xtest = [[1,1,1,1,1], [1,0,1,0,1]]

kernel.compute(xtrain, xtest)
```

Output in terminal:
<pre><code>shortest train sequence: 5, shortest test sequence: 5
0,1,
dictionarySize = 2
g = 3, k = 1, 12 features
Computing exact kernel...
Computing 3 mismatch profiles using 3 threads...
Thread 0 finished in 1 iterations...
Thread 2 finished in 1 iterations...
Thread 1 finished in 1 iterations...
</code></pre>

#### **Kernel.compute_train([[*int*]] Xtrain)**
This computes the training kernel matrix based on *Xtrain*.

```python
from fastsk import Kernel

kernel = Kernel(g=3, m=2)

xtrain = [[1,0,1,0,1], [1,1,1,0,1]]

kernel.compute_train(xtrain)
```

Output in terminal:
<pre><code>0,1,
dictionarySize = 2
g = 3, k = 1, 6 features
Computing exact kernel...
Computing 3 mismatch profiles using 3 threads...
Thread 0 finished in 1 iterations...
Thread 1 finished in 1 iterations...
Thread 2 finished in 1 iterations...</code></pre>

#### **Kernel.train_kernel()<br>Kernel.test_kernel()**

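*train_kernel()* returns the training portion of the kernel matrix, with one row and one column per training sequence.<br>
*test_kernel()* returns the testing portion of the kernel matrix, with one row per test sequence and one column per training sequence.<br>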

For example, given the setup below:

```python
from fastsk import Kernel

kernel = Kernel(g=3, m=2)

Xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
Xtest = [[1,1,1,1,1], [1,0,1,0,1]]

kernel.compute(Xtrain, Xtest)
```

Then run:

```python
with open('train_kernel.txt', 'w') as f:
    f.write(str(kernel.train_kernel()))

with open('test_kernel.txt', 'w') as f:
    f.write(str(kernel.test_kernel()))
```

In *train_kernel.txt*:

<pre><code>[[1.0, 0.8885233166386385], [0.8885233166386385, 1.0]]
</code></pre>

In *test_kernel.txt*:

<pre><code>[[0.7453559924999299, 0.9271726499455306], [1.0, 0.8885233166386385]]</code></pre>
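To see where these numbers can come from, below is a brute-force sketch of a gapped k-mer kernel in the style of gkm-SVM, written under the assumption (suggested by the `g = 3, k = 1` and "3 mismatch profiles" log lines) that k = g - m and that one profile is used per choice of k positions to compare. For every profile it counts the pairs of g-mers, one from each sequence, that agree on those positions, sums over profiles, and then normalizes so that K(x, x) = 1. This is not fastsk's optimized algorithm; for this toy input it produces the same values as *train_kernel.txt* and *test_kernel.txt*, but agreement in general is not guaranteed. `gapped_counts`, `raw_kernel` and `gkm_kernel` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def gapped_counts(seq: Sequence[int], g: int, m: int) -> List[Counter]:
    """For each choice of k = g - m kept positions, count the gapped k-mers in seq."""
    k = g - m
    gmers = [tuple(seq[i:i + g]) for i in range(len(seq) - g + 1)]
    return [Counter(tuple(gm[p] for p in kept) for gm in gmers)
            for kept in combinations(range(g), k)]


def raw_kernel(cx: List[Counter], cy: List[Counter]) -> int:
    """Number of (g-mer, g-mer, profile) triples that match on the kept positions."""
    return sum(sum(n * b[key] for key, n in a.items()) for a, b in zip(cx, cy))


def gkm_kernel(xtrain: Sequence[Sequence[int]], xtest: Sequence[Sequence[int]],
               g: int, m: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (train_kernel, test_kernel); rows of test_kernel are test sequences.

    Assumes every sequence has at least g symbols (otherwise K(x, x) = 0).
    """
    ctrain = [gapped_counts(s, g, m) for s in xtrain]
    ctest = [gapped_counts(s, g, m) for s in xtest]
    dtrain = [raw_kernel(c, c) for c in ctrain]
    dtest = [raw_kernel(c, c) for c in ctest]
    # Normalize: K(x, y) / sqrt(K(x, x) * K(y, y)), so identical sequences score 1.0
    train = [[raw_kernel(a, b) / sqrt(da * db) for b, db in zip(ctrain, dtrain)]
             for a, da in zip(ctrain, dtrain)]
    test = [[raw_kernel(a, b) / sqrt(da * db) for b, db in zip(ctrain, dtrain)]
            for a, da in zip(ctest, dtest)]
    return train, test


train_k, test_k = gkm_kernel([[1, 0, 1, 0, 1], [1, 1, 1, 0, 1]],
                             [[1, 1, 1, 1, 1], [1, 0, 1, 0, 1]], g=3, m=2)
print([[round(v, 4) for v in row] for row in train_k])  # [[1.0, 0.8885], [0.8885, 1.0]]
print([[round(v, 4) for v in row] for row in test_k])   # [[0.7454, 0.9272], [1.0, 0.8885]]
```

For example, the two training sequences have raw self-similarities 15 and 19 and a raw cross-similarity of 15, giving 15 / sqrt(15 × 19) ≈ 0.8885.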

#### **Kernel.save_kernel(*string* kernel_file)**
This method takes a filename string as the destination to write the kernel to.


```python
from fastsk import Kernel

kernel = Kernel(g=3, m=2)

xtrain = [[1,0,1,0,1], [1,1,1,0,1]]
xtest = [[1,1,1,1,1], [1,0,1,0,1]]

kernel.compute(xtrain, xtest)
kernel.save_kernel('output.txt')
```

Message in terminal from *save_kernel*():
<pre><code>Writing kernel to output.txt...</code></pre>