Transfer String Kernel for Cross-Context String Classification
Transfer String Kernel

Reference Paper: Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction

Running the Matlab Code:

  1. Download the code folder and open it in MATLAB
  2. Compile all the C files:
mex mexcntsrtna.c
mex mexEtractKmer.c
mex mexextractuniqena.c
mex mexextractuniquena_int.c

  3. Run the TSK code:


stringkernel_betaKMM(FASTA format file, k parameter, m parameter, number of training samples)

Note: example.fasta consists of 2000 sequences, each with a header in the following format:

>chr1:465738-478736 (position)|-1 (label)   [header]
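
Before running the kernel code, it can help to check that every header in the FASTA file follows this format. The following is a minimal Python sketch, assuming the actual header line is ">chr1:465738-478736|-1" (that is, the "(position)", "(label)" and "[header]" annotations above are explanatory and not part of the file). The helper names parse_header and label_counts are hypothetical and are not part of the TSK code.

```python
from collections import Counter


def parse_header(line):
    """Split a header such as '>chr1:465738-478736|-1' into (position, label)."""
    line = line.strip()
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    # The label is assumed to follow the last '|' in the header.
    position, sep, label = line[1:].rpartition("|")
    if not sep or not position:
        raise ValueError(f"header has no 'position|label' part: {line!r}")
    return position, int(label)


def label_counts(lines):
    """Count how many sequences carry each label (header lines start with '>')."""
    return Counter(parse_header(line)[1] for line in lines if line.startswith(">"))
```

For example, calling label_counts(open("example.fasta")) gives the number of sequences per label, which is a quick way to spot malformed headers (they raise ValueError) or an unexpectedly skewed class balance.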

This generates the following kernel files for input to the SVMLight package (the code also contains commented-out modifications that produce LIBSVM-format output):

example.fasta.10.3.1000.TESTKERNEL.txt : Test Kernel file
example.fasta.10.3.1000.TRAINKERNEL.txt : Train Kernel file (Simple SK)
example.fasta.10.3.1000.WEIGHTTRAINKERNEL.txt : Train Kernel file with weights (TSK)
example.fasta.LABELS.txt : File containing the true labels for the test set
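
The file names above suggest a naming pattern of the form <fasta>.<k>.<m>.<number of training samples>.<KIND>.txt, with the labels file named after the FASTA file alone. That reading is an assumption drawn from the example names, not something the code documentation states. Under that assumption, a short Python sketch (the helper names are hypothetical) can list the expected outputs and report any that are missing after a run:

```python
import os


def expected_outputs(fasta, k, m, n_train):
    """Build the output file names, assuming the <fasta>.<k>.<m>.<n>.<KIND>.txt pattern."""
    prefix = f"{fasta}.{k}.{m}.{n_train}"
    return {
        "test_kernel": f"{prefix}.TESTKERNEL.txt",
        "train_kernel": f"{prefix}.TRAINKERNEL.txt",
        "weighted_train_kernel": f"{prefix}.WEIGHTTRAINKERNEL.txt",
        # The labels file in the example carries no k/m/n suffix.
        "labels": f"{fasta}.LABELS.txt",
    }


def missing_outputs(fasta, k, m, n_train):
    """Return the expected output files that do not exist on disk."""
    names = expected_outputs(fasta, k, m, n_train).values()
    return [path for path in names if not os.path.exists(path)]
```

With the example above, missing_outputs("example.fasta", 10, 3, 1000) should return an empty list once the MATLAB run has finished.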

Running SVM Classifier

We use the SVMLight package to implement the SVM classifier for TSK:

Once the kernel files are generated, use the following commands to train and test the SVM.

Note: These are only example commands; tuning the C hyperparameter (-c) may be required to find the best value.


svm_learn -c 1 example.fasta.10.3.1000.WEIGHTTRAINKERNEL.txt model.tmp


svm_classify -f 1 example.fasta.10.3.1000.TESTKERNEL.txt model.tmp example.fasta.10.3.1000.WEIGHTPRED.txt
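
Tuning C amounts to repeating the two commands above for several values and comparing the resulting predictions with the true labels. The following Python sketch illustrates one way to do this; it is not part of the TSK code. It assumes that svm_learn and svm_classify are on the PATH, that the prediction file holds one decision value per line, and that the labels file holds one numeric label per line (the labels format is an assumption). The per-C model and prediction file names are hypothetical.

```python
import subprocess


def sweep_commands(prefix, c_values):
    """Yield (c, learn_argv, classify_argv, pred_path) for each C value."""
    train = f"{prefix}.WEIGHTTRAINKERNEL.txt"
    test = f"{prefix}.TESTKERNEL.txt"
    for c in c_values:
        model = f"model.c{c}.tmp"               # hypothetical per-C model file
        pred = f"{prefix}.c{c}.WEIGHTPRED.txt"  # hypothetical per-C prediction file
        learn = ["svm_learn", "-c", str(c), train, model]
        classify = ["svm_classify", "-f", "1", test, model, pred]
        yield c, learn, classify, pred


def accuracy(decision_values, labels):
    """Fraction of samples whose decision-value sign matches the label sign."""
    if len(decision_values) != len(labels):
        raise ValueError("predictions and labels differ in length")
    correct = sum((d > 0) == (y > 0) for d, y in zip(decision_values, labels))
    return correct / len(labels)


def read_floats(path):
    """Read the first number on each non-empty line of a text file."""
    with open(path) as handle:
        return [float(line.split()[0]) for line in handle if line.strip()]


def run_sweep(prefix, labels_path, c_values=(0.01, 0.1, 1, 10, 100)):
    """Train and classify once per C value; return {c: accuracy}."""
    labels = read_floats(labels_path)
    results = {}
    for c, learn, classify, pred in sweep_commands(prefix, c_values):
        subprocess.run(learn, check=True)
        subprocess.run(classify, check=True)
        results[c] = accuracy(read_floats(pred), labels)
    return results
```

A call such as run_sweep("example.fasta.10.3.1000", "example.fasta.LABELS.txt") returns the accuracy for each C value. Selecting C by test-set accuracy gives an optimistic estimate, so a held-out validation split is preferable when the final number is to be reported.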

Note: stringKernel_betaKMM.m also contains (commented-out) code to print the kernels in the format expected by the LIBSVM package.