Releases: DmitryOlshansky/jsm4s

Configurable number of threads

27 Sep 08:06
v1.4.1

The number of threads used in the generation step is now configurable.

Semi-decent encoding of continuous values

07 Aug 16:32

This release makes a first attempt at properly encoding numeric values. It's a tricky task with a multitude of strategies, and no single one fits every dataset. Ultimately the encoding should be chosen in a loop driven by cross-validation. For now I opted for a simplistic strategy: subdivide each numeric range so that every split holds roughly the same number of data points.
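The equal-count splitting described above is commonly known as equal-frequency (quantile) binning. Below is a minimal sketch of that idea in Scala; it is not the project's actual code, and the names `EqualFrequencyBins`, `cutPoints`, and `binOf` are made up for illustration.

```scala
object EqualFrequencyBins {
  /** Returns at most `bins - 1` cut points chosen so that each bin
    * receives roughly the same number of the given values.
    * (Hypothetical helper, not part of jsm4s.)
    */
  def cutPoints(values: Seq[Double], bins: Int): Vector[Double] = {
    require(bins >= 1, "bins must be positive")
    val sorted = values.toArray
    java.util.Arrays.sort(sorted)
    if (sorted.isEmpty) Vector.empty
    else (1 until bins)
      // the i-th cut is the value sitting at the i/bins quantile position
      .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / bins)))
      .distinct // repeated values can collapse neighbouring cuts
      .toVector
  }

  /** Bin index of a value: the number of cut points that are <= value. */
  def binOf(value: Double, cuts: Vector[Double]): Int =
    cuts.count(_ <= value)
}
```

With heavily repeated values some bins merge, so the result can have fewer than `bins` bins. Picking the bin count per attribute is exactly the part that would eventually be tuned by cross-validation.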

Another addition is a small step on the path of removing unnecessary allocations along the critical path of generation/prediction. There is plenty of work left to do there.

The bottom line is that the adult.csv dataset finally takes a reasonable amount of time and still reaches 83% accuracy. Not bad, keeping in mind that I haven't yet explored the nature of the failures or the dataset itself in any detail.

Parallel model generation

01 Aug 12:27

The biggest change is parallel model generation on top of ForkJoinPool.
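For readers unfamiliar with ForkJoinPool, here is a rough sketch of how work can be split recursively and run on a pool with a chosen number of threads. It only illustrates the general pattern: `generateFrom`, `GenerateTask`, the string results, and the threshold of 16 are placeholders, not the project's real generation code.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ParallelGeneration {
  // Hypothetical stand-in for the real per-seed hypothesis generation.
  def generateFrom(seed: Int): List[String] = List(s"hypothesis-$seed")

  /** Processes seeds in [lo, hi), splitting the range until it is small enough. */
  class GenerateTask(lo: Int, hi: Int, threshold: Int)
      extends RecursiveTask[List[String]] {
    override def compute(): List[String] =
      if (hi - lo <= threshold) (lo until hi).toList.flatMap(generateFrom)
      else {
        val mid = lo + (hi - lo) / 2
        val left = new GenerateTask(lo, mid, threshold)
        val right = new GenerateTask(mid, hi, threshold)
        left.fork()                       // run the left half asynchronously
        val rightResult = right.compute() // compute the right half in this thread
        left.join() ++ rightResult        // keep results in seed order
      }
  }

  /** Runs generation on a dedicated pool with the requested parallelism. */
  def generate(seeds: Int, threads: Int): List[String] = {
    val pool = new ForkJoinPool(threads)
    try pool.invoke(new GenerateTask(0, seeds, threshold = 16))
    finally pool.shutdown()
  }
}
```

Passing the thread count to the `ForkJoinPool` constructor is one straightforward way to make parallelism configurable.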

There is also experimental support for sparse data structures instead of bit vectors, but for now it fails to deliver any performance improvement.

Lastly, there is the new jsm command, a shortcut for generate + predict that does not store the model in a file.

v1.1.0 + Parallel prediction

25 Jul 12:19

It's about time to get parallel prediction in. For now, a trivial implementation on top of Scala parallel collections suffices to get a considerable speedup.

Also, the recognize command is now called predict.

Recognizer performance improvements

24 Jul 15:44

This release focuses solely on the speed of the recognize command.

A tricky scheme for weeding out hypotheses that won't fit the example at hand gets us from 336 sec down to 29.5 sec. This order-of-magnitude improvement prompts a new release.
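To make "fits the example" concrete: assuming a hypothesis and an example are both sets of attributes stored as bit vectors, a hypothesis fits when all of its attributes are present in the example. The sketch below shows only that basic containment test, not the pruning scheme from this release; `HypothesisFilter`, `bits`, `fits`, and `matching` are illustrative names.

```scala
import java.util.BitSet

object HypothesisFilter {
  /** Builds a bit vector with the given attribute indices set. */
  def bits(indices: Int*): BitSet = {
    val b = new BitSet()
    indices.foreach(i => b.set(i))
    b
  }

  /** True when every attribute of the hypothesis is present in the example
    * (assumed meaning of "fits"; not taken from the jsm4s sources).
    */
  def fits(hypothesis: BitSet, example: BitSet): Boolean = {
    val missing = hypothesis.clone().asInstanceOf[BitSet]
    missing.andNot(example) // attributes of the hypothesis the example lacks
    missing.isEmpty
  }

  /** Keeps only the hypotheses that fit the example. */
  def matching(hypotheses: Seq[BitSet], example: BitSet): Seq[BitSet] =
    hypotheses.filter(h => fits(h, example))
}
```

The clone in `fits` allocates on every check, which is the kind of cost a faster scheme would try to avoid on the critical path.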

Next up is parallel generation!

First release

14 Jul 14:37

In this release the v0.9.0 proof of concept was heavily revamped. As a result, v1.0.0 lays a firm foundation that subsequent versions can easily build on.

Finally, properties and attributes are properly separated, leading to better performance and a more flexible design.

Key points:

  • support for different types of properties is included (but not yet exposed on the command line)
  • about 4x performance improvement in filtering of hypotheses
  • new format with a metadata header; there is no need to pass the correct -p parameter on the command line anymore

Initial public version

11 Jul 14:57
Pre-release

First release that features the complete set of command-line options, except for the jsm convenience command. Building should be as simple as:

sbt assembly

The file at target/scala-2.12/jsm4s-0.9.0.jar is the Swiss Army knife command-line driver for the JSM ML method. See the README for usage.

What works:

  • The flow described in the README works and gives 100% accuracy on the mushroom dataset

Limitations:

  • Only binary target properties work, and only a single property was really tested
  • Almost no knobs to tune yet; it either works or it doesn't
  • It's easy to mismatch files; no sanity checks are made to see if you accidentally swapped the test dataset with the model, etc.