Make model parameters for SVCs with linear kernels accessible in SKLL #443
For SVCs with linear kernels, we want to print out the primal weights - that is, the weights for each feature for each one-vs-one binary classifier. These are the weights contained in the `coef_` attribute of the underlying scikit-learn model. This is a matrix with the shape `[n_classes * (n_classes - 1) / 2, n_features]`, since there are C(n_classes, 2) = n_classes * (n_classes - 1) / 2 one-vs-one classifiers and each one has a weight for each of the features. According to the scikit-learn user guide and the code for the function `_one_vs_one_coef()` in `svm/base.py`, the order of the rows is "0 vs 1", "0 vs 2", ..., "0 vs n", "1 vs 2", "1 vs 3", ..., "1 vs n", ..., "n-1 vs n".

I have implemented this in the `Learner.model_params()` method. To doubly ensure that taking the `coef_` values and assigning them to these class pairs is correct, I wanted to check that LIBSVM (which actually underlies the SVC classifier in scikit-learn) itself also gets the same weights. To do this, I first trained an SVC model with a linear kernel on our "Iris" example and ran `print_model_weights` (which uses `model_params()`). The output of `print_model_weights` is as follows:

Note that in this case label 0 corresponds to `setosa`, label 1 corresponds to `versicolor`, and label 2 corresponds to `virginica`.

Next, I downloaded LIBSVM, compiled it, and ran the following commands that use LIBSVM to train an equivalent SVC model with the same hyperparameters.
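(As an aside, the `coef_` shape and pair ordering described above can be sanity-checked directly. This is a quick sketch against scikit-learn's bundled copy of the iris data, not the SKLL "Iris" example itself:)

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear").fit(X, y)

# 3 classes -> 3 * (3 - 1) / 2 = 3 one-vs-one classifiers, 4 features each
print(clf.coef_.shape)  # (3, 4)

# The rows of coef_ follow the pair ordering "0 vs 1", "0 vs 2", "1 vs 2"
print(list(combinations(range(len(clf.classes_)), 2)))  # [(0, 1), (0, 2), (1, 2)]
```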
Next, I looked at this entry in the LIBSVM FAQ that explains how to get the primal coefficients from the dual ones. To do this, I compiled the Python interface that comes with LIBSVM and then ran the following Python commands:
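The recipe in that FAQ entry boils down to the standard identity for a linear SVM's primal weight vector: for the one-vs-one classifier separating classes i and j, the weights are a linear combination of that pair's support vectors,

```latex
w_{ij} = \sum_{k \in SV_i \cup SV_j} y_k \alpha_k x_k
```

where the products y_k * alpha_k are exactly the dual coefficients that LIBSVM stores in `sv_coef`.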
Now, as the entry says, in order to compute the primal coefficients for, say, label 1 (`versicolor`) vs. label 2 (`virginica`), we first need to compute the `y_i * alpha_i` values for the two classes. `sv_coefs` is a 22x2 array: the first 10 rows represent the support vector coefficients for class 1 (label `1`), the next 9 represent the same for class 2 (label `2`), and the last 3 represent the same for class 3 (label `0`). Each coefficient has two entries, each representing the coefficient for the classifier trained to classify between the main class and one of the other classes. For example, each of the first 10 support vector coefficients has 2 entries - the first representing those for the classifier for class 1 (label `1`) vs. class 2 (label `2`), and the second representing those for the classifier for class 1 (label `1`) vs. class 3 (label `0`). The next 9 represent those for class 2 (label `2`) vs. class 1 (label `1`) and class 3 (label `0`), respectively. And, finally, the last 3 represent those for class 3 (label `0`) vs. class 1 (label `1`) and class 2 (label `2`), respectively (note that the classes in the two columns are in strict increasing order). Given this setup, the coefficients we are interested in (class 1 vs. class 2) can be computed as follows:

Next, let's get the actual support vectors from the `svs` list of dictionaries for the same corresponding indices (the first 19):

And, finally, let's take the dot product of the support vector coefficients (1x19) with the support vectors (19x4). This will give us the feature weight vector (1x4) for the class 1 vs. class 2 binary classifier.
The four `versicolor`-vs-`virginica` weights from our `print_model_weights` output - sorted by feature name - are as follows:

The two vectors match to a satisfactory number of significant digits, which confirms that the implementation is accurate.