Releases: DmitryOlshansky/jsm4s
Configurable number of threads
v1.4.1 Configurable number of threads in the generation step
Semi-decent encoding of continuous values
This time an attempt was made to properly encode numeric values. It's a tricky task with a multitude of strategies, and there is no one-size-fits-all solution. Ultimately it should be done in a loop with cross-validation. For now I opted for a simplistic strategy: subdivide each numeric range so that every split holds a roughly uniform number of data points.
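The uniform-split strategy described above is commonly called equal-frequency (quantile) binning. Below is a minimal sketch of that idea, not the code used in jsm4s; the object name `EqualFrequencyBinning` and the helpers `cutPoints` and `binOf` are made up for illustration.

```scala
// Hypothetical helper, not part of jsm4s: equal-frequency binning of one numeric column.
object EqualFrequencyBinning {
  /** Returns up to `bins - 1` cut points so that each bin holds roughly the same number of values. */
  def cutPoints(values: Seq[Double], bins: Int): Vector[Double] = {
    require(bins >= 1, "bins must be positive")
    val sorted = values.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until bins)
      .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / bins)))
      .distinct // heavy ties can collapse neighbouring cut points
      .toVector
  }

  /** Bin index of a value: the number of cut points that are <= the value. */
  def binOf(value: Double, cuts: Seq[Double]): Int = cuts.count(_ <= value)
}
```

With eight distinct values and four bins, each bin receives two values. Columns with many repeated values yield fewer distinct cut points, which is one reason the choice of strategy would eventually need cross-validation.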
Another addition is a small step on the path of removing unnecessary allocations along the critical path of generation/prediction. There is plenty of work left to do there.
The bottom line, though, is that the adult.csv dataset finally takes a reasonable amount of time and still gets 83% accuracy. Not bad, keeping in mind that I haven't explored the nature of the failures or the dataset itself in any sufficient detail.
Parallel model generation
The biggest change is parallel model generation on top of ForkJoinPool.
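As a rough illustration of running work on a ForkJoinPool with a chosen number of threads, here is a minimal sketch. It is not the jsm4s implementation: `ParallelGeneration`, `runChunks` and the idea of splitting the input into independent chunks are assumptions made for the example.

```scala
import java.util.concurrent.{Callable, ForkJoinPool}

// Hypothetical helper, not part of jsm4s.
object ParallelGeneration {
  /** Applies `work` to every chunk on a dedicated pool with `threads` workers, keeping input order. */
  def runChunks[A, B](chunks: Seq[A], threads: Int)(work: A => B): Seq[B] = {
    val pool = new ForkJoinPool(threads)
    try {
      // Submit everything first (toVector forces strict evaluation), then wait for the results.
      val tasks = chunks.toVector.map { chunk =>
        pool.submit(new Callable[B] { def call(): B = work(chunk) })
      }
      tasks.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

A call such as `ParallelGeneration.runChunks(partitions, 4)(generateHypotheses)` would process the partitions on four worker threads, where `partitions` and `generateHypotheses` stand in for whatever the real generation step uses.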
There is also experimental support for sparse data structures instead of bit vectors but it fails to deliver any improvements in performance for now.
Lastly, there is the jsm command, which is a shortcut for generate + predict without storing the model in a file.
v1.1.0 + Parallel prediction
About time to get parallel prediction in. For now a trivial implementation on top of Scala parallel collections suffices to get a considerable speedup.
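A trivial parallel-collections version can be as small as the sketch below. It assumes Scala 2.12, where `.par` is part of the standard library; on 2.13 and later the same call needs the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`. The names `ParallelPrediction`, `predictAll` and `predictOne` are hypothetical.

```scala
// Hypothetical helper, not part of jsm4s. Assumes Scala 2.12 (built-in .par).
object ParallelPrediction {
  /** Runs `predictOne` over all examples in parallel; results keep the input order. */
  def predictAll[E, L](examples: Vector[E])(predictOne: E => L): Vector[L] =
    examples.par.map(predictOne).seq
}
```

This works as long as `predictOne` has no shared mutable state, since each example is classified independently.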
Also, the recognize command is now called predict.
Recognizer performance improvements
This release focuses solely on the speed of the recognize command.
A tricky scheme of weeding out hypotheses that won't fit the example at hand gets us from 336 sec down to 29.5 sec. This order of magnitude improvement prompts a new release.
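The release note does not spell out the scheme, so the sketch below is not it. It only illustrates one generic way to skip hypotheses that cannot fit an example: index hypotheses by their smallest attribute, look up only the buckets whose key is present in the example, then confirm with a subset check. All names (`HypothesisIndex`, `build`, `matching`) and the representation of hypotheses as bit sets of attribute indices are assumptions.

```scala
import scala.collection.immutable.BitSet

// Hypothetical illustration, not the scheme used in jsm4s.
object HypothesisIndex {
  /** Groups hypotheses by their smallest attribute; empty hypotheses go under key -1. */
  def build(hypotheses: Seq[BitSet]): Map[Int, Seq[BitSet]] =
    hypotheses.groupBy(h => if (h.isEmpty) -1 else h.head)

  /** Hypotheses whose attributes are all present in `example`. */
  def matching(index: Map[Int, Seq[BitSet]], example: BitSet): Seq[BitSet] = {
    // A hypothesis can only fit if its smallest attribute occurs in the example,
    // so buckets keyed by absent attributes are never visited.
    val candidates =
      index.getOrElse(-1, Seq.empty[BitSet]) ++
        example.toSeq.flatMap(a => index.getOrElse(a, Seq.empty[BitSet]))
    candidates.filter(_.subsetOf(example))
  }
}
```

The final `subsetOf` check is still required, because sharing the smallest attribute is a necessary condition for a fit, not a sufficient one.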
Next up is parallel generation!
First release
In this release the proof of concept from v0.9.0 was heavily revamped. As a result, v1.0.0 lays a firm foundation that subsequent versions can easily build on.
Finally, properties and attributes are properly separated, leading to better performance and a more flexible design.
Key points:
- support for different types of properties is included (but not exposed on the command line yet)
- about 4x performance improvement in filtering of hypotheses
- new format with a meta-data header; there is no need to pass the correct -p parameter on the command line anymore
Initial public version
First release that features the complete set of command-line options except for the jsm convenience command. Building should be as simple as:
sbt assembly
The file at target/scala-2.12/jsm4s-0.9.0.jar is the Swiss Army knife command-line driver for the JSM ML method. See the README for usage.
What works:
- The flow described in the README works and gives 100% accuracy on the mushroom dataset
Limitations:
- Only binary target properties work; only a single property was really tested
- Almost no knobs to tune yet; it either works or doesn't
- It's easy to mismatch files; no sanity checks are made to see if you accidentally swapped the test dataset with the model, etc.