EdwardRaff edited this page Feb 25, 2017 · 2 revisions

FAQ

Below are some frequently asked questions about JSAT.

What is the purpose of JSAT / who is it for?

JSAT is meant to be a general purpose machine learning library that is easy to develop code for and with. This sets it apart from libraries like Weka and Orange, which are designed more for use through a GUI by non-programmers.

Why does JSAT have no dependencies? Wouldn't it be better to use package X for feature Y?

It could be! The original reason JSAT has no dependencies is that I like implementing things. But as JSAT got bigger, the lack of dependencies became an unintended feature. A number of JSAT users started using it because it had no dependencies, while other libraries were causing dependency conflicts in very large enterprise style projects. I now keep JSAT dependency free so that it is easy to include in any project and will never cause a dependency conflict.

Will you be moving JSAT to Java 8?

Yes! That is in the plans, but was delayed for medical reasons.

Will you please change the license from GPL to BSD / MIT / Apache / other?

No, I will not be changing the license of JSAT away from GPL. I have spent many years working on JSAT and made it available publicly (under the GPL) without any compensation. I consider the sharing requirements of the GPL as my "compensation" for the code I've released: if you use JSAT to make something and distribute it, you must release your code as well. I am aware that the GPL "doesn't work" for some people, and they are free to ask me about alternative licensing if they wish.

I want to contribute code/fixes to JSAT, how can I do that?

You can always ask about things that are on my TODO list, or ask whether ideas you have would work. If you have small typo / bug fixes, feel free to just open a pull request. If you are making a small change, copyright does not generally apply. If you are going to contribute more significant code, I follow a policy similar to the GNU projects: I ask that you either contribute under the Public Domain or ask me about signing an ownership and licensing agreement. I'll end up with ownership of the code, and you will have a license to do as you please with the code you contributed. This makes my life much easier.

What algorithms are in JSAT that might not be in other packages?

JSAT has a number of slightly more niche algorithms implemented in it that are often not available in other libraries. The table below lists some of the ones I think are particularly useful, along with the other implementations that I'm aware of. I don't stalk other projects, so if I missed something please assume it is an accident - and let me know so I can update the table!

| Algorithm | Utility | Also Available In |
| --- | --- | --- |
| Extra Random Trees | Classification and Regression problems | Scikit-learn, Weka |
| DC-SVM | Fast multi-threaded approximate and exact SVM solver | only author's webpage |
| NewGLMNET | Fast L1 and Elastic Net regularized logistic regression | LibLinear has L1 version, but not elastic net |
| Support Passive Aggressive | Native multi-class version of popular Passive Aggressive classifier | no other implementation exists |
| t-SNE | Popular data visualization algorithm | Scikit-learn |
| LargeViz | New and easier to use data visualization algorithm, related to t-SNE | No other larger libraries |
| Elkan & Hamerly k-means | Faster exact k-means clustering | Scikit-learn has Elkan version |
| Elkan Kernel K-Means | Faster exact kernel k-means clustering | no other implementations exist |
| Adaptive Multi-Hyperplane Machine | Non-linear classifier with training time similar to linear algorithms | BudgetedSVM |
| RBF Kernel Merging Approximation | Fast and useful budgeted kernel method | BudgetedSVM |
| Modest AdaBoost | Version of AdaBoost that tends to overfit less | none |
| DCDs & LogisticRegressionDCD | Fast linear SVM and LR solvers | Liblinear, scikit-learn |
| HDBSCAN | Useful clustering algorithm that improves upon DBSCAN | random independent implementations, but not in any libraries |
| LSDBC | Useful clustering algorithm that improves upon DBSCAN | no other implementations |
| KernelRLS | Regression algorithm | dlib |