Add support for precomputed kernel. #73
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master      #73      +/-   ##
==========================================
+ Coverage   84.75%   86.61%   +1.86%
==========================================
  Files           5        6       +1
  Lines         223      254      +31
==========================================
+ Hits          189      220      +31
  Misses         34       34
```
Is there any sample code for the precomputed kernel in C that could help us verify these changes?
There are some tests in scikit-learn; the test that I have added is loosely based on them. When I passed the test-case problem to … Anyway, I think we can use …
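For reference, the scikit-learn behavior mentioned above can be reproduced with a short sketch: `SVC(kernel="precomputed")` takes the training Gram matrix in `fit` and the test-versus-training kernel values in `predict`. The data here is made up purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D training data: two linearly separable classes.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5], [2.5]])

# With kernel='precomputed', fit() expects the n_train x n_train Gram
# matrix instead of the raw features; here a linear kernel.
gram_train = X_train @ X_train.T
clf = SVC(kernel="precomputed")
clf.fit(gram_train, y_train)

# For prediction, the matrix holds kernel values between test and
# training instances (shape n_test x n_train).
gram_test = X_test @ X_train.T
pred = clf.predict(gram_test)
print(pred.tolist())
```

The same shapes (square Gram matrix for training, `n_test x n_train` for prediction) are what the Julia implementation has to support.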
Force-pushed from a3dee33 to 7ae09ed
Prediction now works (according to the simple test case, at least).
Force-pushed from e105f0d to d0ddc4c
I have added two more test cases. I think we can consider it working.
This commit is the necessary minimum to support precomputed kernel. I tried to keep the number of changes low, so the code could probably use some polishing. I also added a basic test (the expected values are based on the output of `svm-train` from LIBSVM).
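The `svm-train` reference output mentioned above relies on LIBSVM's precomputed-kernel input format: kernel type `-t 4`, where each line stores the instance's serial number under feature index 0 followed by the kernel values. A hypothetical Python helper (`write_precomputed` is not part of this PR) that writes such a file:

```python
import numpy as np

def write_precomputed(path, gram, labels):
    """Write a Gram matrix in LIBSVM's precomputed-kernel file format:
    <label> 0:<serial> 1:K(i,1) 2:K(i,2) ...  (serials start at 1)."""
    with open(path, "w") as f:
        for i, (row, label) in enumerate(zip(gram, labels), start=1):
            feats = " ".join(f"{j}:{v}" for j, v in enumerate(row, start=1))
            f.write(f"{label} 0:{i} {feats}\n")

X = np.array([[1.0, 0.0], [0.0, 1.0]])
gram = X @ X.T                      # linear kernel values
write_precomputed("train.txt", gram, [1, -1])
# Each line looks like: "1 0:1 1:1.0 2:0.0"
# The file can then be trained with: svm-train -t 4 train.txt
```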
We now throw `DimensionMismatch` instead of `ArgumentError` in the case of an invalid shape of the input matrix.
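The shape requirement behind that check can be sketched as follows (a Python analogue raising `ValueError`; the function name mirrors the PR's `check_train_input` but the body is hypothetical):

```python
import numpy as np

def check_train_input(K, y):
    # For a precomputed kernel, K must be a square Gram matrix whose
    # size matches the number of labels.
    n = len(y)
    if K.shape != (n, n):
        raise ValueError(
            f"Gram matrix has shape {K.shape}, expected ({n}, {n})")

K = np.zeros((3, 3))
check_train_input(K, [1, -1, 1])      # passes
try:
    check_train_input(K, [1, -1])     # wrong number of labels
except ValueError as e:
    print(e)
```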
Prediction with the precomputed kernel now works, though the code is getting uglier and uglier.
It also fixes getting the number of instances for prediction in the case of a precomputed kernel.
Force-pushed from 3a92dbf to a9fb62e
In the case of a precomputed kernel, the first column of nodes must contain the indices of the instances. This special method, `gram2nodes`, adds the indices implicitly, so we don't have to add them to the input matrix explicitly.
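The idea behind `gram2nodes` — prepending the implicit serial index to each row of the Gram matrix — can be illustrated with a small Python sketch (the `(index, value)` tuple representation here is a stand-in for `SVMNode`, not the PR's actual code):

```python
def gram2nodes(gram):
    """Convert a Gram matrix to per-instance node lists in which the
    first node carries the (1-based) instance index under feature 0,
    as LIBSVM's precomputed kernel requires."""
    nodes = []
    for i, row in enumerate(gram, start=1):
        # (feature_index, value) pairs; index 0 holds the serial number.
        inst = [(0, float(i))] + [(j, v) for j, v in enumerate(row, start=1)]
        nodes.append(inst)
    return nodes

nodes = gram2nodes([[1.0, 0.5], [0.5, 1.0]])
print(nodes[0][0])   # the implicit index node added for the first instance
```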
A method is introduced to create `SVMNode`s from ordinary instances or from a precomputed kernel matrix. So far, it only works for full (dense) matrices, not sparse ones.
Force-pushed from bc93253 to 498c1d9
This commit creates a new file, `nodes.jl`, which contains the code that converts input matrices into `SVMNode`s. Moreover, the function that converts instances represented as a sparse matrix has been changed slightly to use only functions exported from SparseArrays. When the matrix containing kernel values is sparse, we convert it to a full matrix and use the full-matrix implementation. Hopefully this does not matter too much, as there is a good chance that sparse matrices are not that useful for kernel values.
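The sparse fallback described above — densify a sparse Gram matrix and reuse the dense code path — looks roughly like this in Python with SciPy (a sketch of the design choice, not the PR's Julia code):

```python
import numpy as np
from scipy.sparse import csc_matrix, issparse

def kernel_to_dense(K):
    # Kernel values are rarely zero, so little is lost by converting a
    # sparse Gram matrix to a dense one and reusing the dense code path.
    return K.toarray() if issparse(K) else np.asarray(K)

K_sparse = csc_matrix(np.array([[1.0, 0.5], [0.5, 1.0]]))
K_dense = kernel_to_dense(K_sparse)
print(K_dense.shape)  # (2, 2)
```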
OK, I guess this is ready for review now. One thing that is not so great is that sparse matrices will be converted to full matrices in the case of a precomputed kernel. The reason I did this is that I could not figure out how to reuse the code we use for sparse matrices with non-precomputed kernels. I don't think this is a significant issue, since kernels tend to produce non-zero values anyway.
This looks good to me, but I'd obviously like another maintainer to review.
This commit introduces a constant instance of `SVMNode(-1, NaN)` that is used as every terminal `SVMNode`.
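LIBSVM marks the end of each instance's node array with a node whose index is -1, so sharing one constant terminator saves an allocation per instance. A hedged Python illustration of the idea (the tuple representation is hypothetical):

```python
import math

# A single shared sentinel, analogous to the constant SVMNode(-1, NaN):
# every instance's node list ends with the same terminal object.
TERMINAL = (-1, math.nan)

def with_terminal(pairs):
    """Append the shared terminator to an instance's (index, value) nodes."""
    return list(pairs) + [TERMINAL]

inst = with_terminal([(1, 0.5), (2, 1.0)])
print(inst[-1][0])  # -1 marks the end of the node array
```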
```diff
@@ -334,6 +309,7 @@ function svmtrain(
     weights = Float64[]
     reverse_labels = Bool[]
 else
+    check_train_input(X, y, kernel)
```
I think `EpsilonSVR` and `NuSVR` also need this check?
Good catch, I think so too. The check was originally in `indices_and_weights`, so it's been missing for `EpsilonSVR` and `NuSVR` for quite some time. Should I add it?
Yeah, adding that check in this PR is fine with me.
done
@barucden I'm going to make a new release for these changes. Are there any planned PRs you'd like to ship with the new release?
@barucden Thank you indeed for this work. I realize it was more than you bargained for. While it's fresh in your mind, could you add a brief note to the README documentation on the new option to specify the kernel? That would expedite updating the MLJ interface.
We could address #61 in this release if you are interested. I still have the code.
I added a few notes to the docs of the relevant methods ( |
This is an effort to properly support precomputed kernel.
There are a few more steps left:
- `svmpredict`
Closes #71