
Make model parameters for SVCs with linear kernels accessible in SKLL #443

Merged
merged 3 commits on Dec 11, 2018

Conversation

desilinguist
Member

@desilinguist desilinguist commented Dec 7, 2018

For SVCs with linear kernels, we want to print out the primal weights - that is, the weights for each feature for each one-vs-one binary classifier. These are the weights contained in the coef_ attribute of the underlying scikit-learn model. This is a matrix with shape [n_classes * (n_classes - 1) / 2, n_features], since there are C(n_classes, 2) = n_classes * (n_classes - 1) / 2 one-vs-one classifiers and each one has a weight for each feature. According to the scikit-learn user guide and the code for the function _one_vs_one_coef() in svm/base.py, the rows are ordered as "0 vs 1", "0 vs 2", ..., "0 vs n", "1 vs 2", "1 vs 3", ..., "1 vs n", ..., "n-1 vs n".
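
To make the row ordering concrete, here is a minimal standalone sketch (plain scikit-learn on the Iris data, not SKLL code; the variable names are mine) that maps each row of coef_ to its one-vs-one class pair in the order described above:

>>> from itertools import combinations
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import SVC
>>> X, y = load_iris(return_X_y=True)
>>> clf = SVC(kernel='linear', C=1.0).fit(X, y)

# the pairs come out in the "0 vs 1", "0 vs 2", ..., "1 vs 2", ... order
>>> pairs = list(combinations(range(len(clf.classes_)), 2))
>>> clf.coef_.shape == (len(pairs), X.shape[1])
True
>>> for (i, j), weights in zip(pairs, clf.coef_):
...     print(clf.classes_[i], "vs", clf.classes_[j], weights)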

I have implemented this in the Learner.model_params() method. In order to doubly ensure that taking the coef_ values and assigning them to these class pairs in this order is correct, I wanted to check that LIBSVM (which actually underlies the SVC classifier in scikit-learn) itself also yields the same weights. To do this, I first trained an SVC model with a linear kernel on our "Iris" example and ran print_model_weights (which uses model_params()). The output of print_model_weights is as follows:

== intercept values ==
1.454174024546  setosa-vs-versicolor
1.507402364191  setosa-vs-virginica
5.730615419818  versicolor-vs-virginica

Number of nonzero features: 12
-1.921263583676 versicolor-vs-virginica f2
-1.862045422601 versicolor-vs-virginica f3
1.195963669885  versicolor-vs-virginica f1
-1.002975155027 setosa-vs-versicolor    f2
0.546299987195  versicolor-vs-virginica f0
-0.538398699463 setosa-vs-virginica f2
0.520869022653  setosa-vs-versicolor    f1
-0.464101072449 setosa-vs-versicolor    f3
-0.292290974055 setosa-vs-virginica f3
0.178903585666  setosa-vs-virginica f1
-0.046389035027 setosa-vs-versicolor    f0
-0.007134560620 setosa-vs-virginica f0

Note that in this case label 0 corresponds to setosa, label 1 corresponds to versicolor, and label 2 corresponds to virginica.

Next, I downloaded LIBSVM, compiled it, and ran the following commands that use LIBSVM to train an equivalent SVC model with the same hyperparameters.

$> cd examples/iris/train
$> skll_convert example_iris_features.jsonlines iris.libsvm
$> svm-train -s 0 -t 0 -c 1 ~/work/skll/examples/iris/train/iris.libsvm

Next, I looked at this entry in the LIBSVM FAQ that explains how to get the primal coefficients from the dual ones. To do this, I compiled the Python interface that comes with LIBSVM and then ran the following Python commands:

>>> from svmutil import svm_load_model
>>> m = svm_load_model('iris.libsvm.model') 

# note that the order of the labels is not what we would expect
>>> m.get_labels()
[1, 2, 0]

# get the number of support vectors per class
>>> m.nSV[:3]
[10, 9, 3]

# now get the dual coefficients
>>> sv_coefs = m.get_sv_coef() 
>>> svs = m.get_SV()

Now, as the entry says, in order to compute the primal coefficients for, say, label 1 (versicolor) vs label 2 (virginica), we first need to compute y_i * alpha_i for the two classes. sv_coefs is a 22x2 array: the first 10 rows contain the support vector coefficients for class 1 (label 1), the next 9 contain those for class 2 (label 2), and the last 3 contain those for class 3 (label 0). Each row has two entries, and each entry is the coefficient for the classifier trained to separate the main class from one of the other classes. For example, each of the first 10 support vector coefficients has 2 entries: the first belongs to the classifier for class 1 (label 1) vs. class 2 (label 2), and the second to the classifier for class 1 (label 1) vs. class 3 (label 0). The next 9 correspond to class 2 (label 2) vs. class 1 (label 1) and class 3 (label 0), respectively. And, finally, the last 3 correspond to class 3 (label 0) vs. class 1 (label 1) and class 2 (label 2), respectively (note that the other classes in the two columns appear in strictly increasing order). Given this setup, the coefficients we are interested in (class 1 vs class 2) can be computed as follows:

>>> our_coefs = np.array([x[0] for x in sv_coefs[:10]] + [x[0] for x in sv_coefs[10:19]])

Next, let's get the actual support vectors from the svs list of dictionaries for the same corresponding indices (the first 19):

>>> our_svs = np.array([[x[1], x[2], x[3], x[4]] for x in svs[:19]])

And, finally, let's take the dot product of the support vector coefficients (1x19) with the support vectors (19x4). This gives us the feature weight vector (1x4) for the class 1 vs. class 2 binary classifier.

>>> our_coefs2 = our_coefs.reshape(1,19)
>>> our_coefs2.dot(our_svs)
array([[ 0.54628096,  1.19553697, -1.92187359, -1.86235093]])
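
For completeness, here is a hedged sketch of a small helper (the function name and the densification details are mine, not part of LIBSVM) that generalizes this computation to any class pair, using only the interface calls already used above (get_labels(), nSV, get_sv_coef(), get_SV()):

>>> import numpy as np
>>> def pair_primal_weights(model, ci, cj, n_features):
...     """Primal weights for the one-vs-one classifier between the classes
...     at positions ci < cj of model.get_labels() (per the LIBSVM FAQ)."""
...     # number of support vectors per class, in label order
...     n_sv = list(model.nSV[:len(model.get_labels())])
...     starts = np.concatenate(([0], np.cumsum(n_sv)))
...     sv_coefs = model.get_sv_coef()
...     # densify the sparse support vectors (LIBSVM feature indices are 1-based)
...     dense = np.zeros((int(starts[-1]), n_features))
...     for row, sv in enumerate(model.get_SV()):
...         for idx, val in sv.items():
...             if idx > 0:
...                 dense[row, idx - 1] = val
...     # per the FAQ layout: SVs of class ci use dual-coefficient column cj - 1,
...     # while SVs of class cj use dual-coefficient column ci
...     rows_i = list(range(starts[ci], starts[ci + 1]))
...     rows_j = list(range(starts[cj], starts[cj + 1]))
...     alphas = np.array([sv_coefs[r][cj - 1] for r in rows_i] +
...                       [sv_coefs[r][ci] for r in rows_j])
...     return alphas.dot(dense[rows_i + rows_j])
...
>>> pair_primal_weights(m, 0, 1, 4)  # positions 0 and 1 of [1, 2, 0]

This should reproduce the vector above; note that ci and cj refer to positions in LIBSVM's label order ([1, 2, 0] here), not to the sorted class order that scikit-learn uses.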

The four versicolor-vs-virginica weights from our print_model_weights output, sorted by feature name, are as follows:

0.546299987195  versicolor-vs-virginica f0
1.195963669885  versicolor-vs-virginica f1
-1.921263583676 versicolor-vs-virginica f2
-1.862045422601 versicolor-vs-virginica f3

The two vectors match to a satisfactory number of significant digits and this confirms that the implementation is accurate.
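
As a quick numerical sanity check, one could also compare the two vectors directly (the SKLL values are copied from the print_model_weights output above; the tolerance is loose because the two models were trained by different front ends with their own default stopping criteria):

>>> skll_weights = np.array([0.546299987195, 1.195963669885,
...                          -1.921263583676, -1.862045422601])
>>> np.allclose(our_coefs2.dot(our_svs), skll_weights, atol=1e-3)
True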

@desilinguist desilinguist added this to the 1.5.3 milestone Dec 7, 2018
@desilinguist desilinguist self-assigned this Dec 7, 2018
Contributor

@mulhod mulhod left a comment

Thanks for the thorough explanation! I've followed along with what you did myself and everything seems satisfactory to me.

@desilinguist
Member Author

Thanks, @mulhod! 👍

Collaborator

@jbiggsets jbiggsets left a comment

Looks good to me! Also, really interesting. :)
