A gem to test what metric is best for certain kinds of datasets in machine
learning. Besides the Array class, I also want to support NMatrix.
This is a fork of the gem Distance Measure, which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, @reddavis. :)
gem install measurable
This gem is currently being tested on MRI Ruby 1.9.3, 2.0, 2.1.0, 2.1 (HEAD) and on Rubinius 2.x (HEAD). I hope to add JRuby support in the future.
Available distance measures
I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about whether it is a metric, please open an issue.
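To illustrate why the distinction can matter, here is a minimal plain-Ruby sketch that does not use this gem. The helper `squared_euclidean_sketch` is a hypothetical name defined only for this example. It shows the squared Euclidean distance violating the triangle inequality, which is why it is not a metric in the strict sense:

```ruby
# Hypothetical helper for illustration only; not part of the gem's API.
def squared_euclidean_sketch(u, v)
  u.zip(v).inject(0) { |acc, (a, b)| acc + (a - b)**2 }
end

# Three points on a line: 0, 1 and 2.
direct = squared_euclidean_sketch([0], [2])
detour = squared_euclidean_sketch([0], [1]) + squared_euclidean_sketch([1], [2])

# A metric requires d(x, z) <= d(x, y) + d(y, z); here 4 > 2.
puts direct          # prints 4
puts detour          # prints 2
puts direct <= detour # prints false
```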
The following are the distance measures supported at the moment:
- Euclidean distance
- Squared Euclidean distance
- Cosine distance
- Max-min distance (from "K-Means clustering using max-min distance measure")
- Jaccard distance
- Tanimoto distance
- Haversine distance
- Minkowski distance of order 1 (aka Cityblock or Manhattan distance)
- Chebyshev distance
- Hamming distance
- Levenshtein distance
- Kullback-Leibler divergence
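For readers unfamiliar with some of these measures, the sketch below writes out the textbook definitions of four of them in plain Ruby. The module `DistanceSketch` and its method names are hypothetical and exist only for this illustration; the gem's own implementations and method names may differ.

```ruby
# Hypothetical stand-alone helpers; the gem's own implementations may differ.
module DistanceSketch
  module_function

  # Square root of the sum of squared coordinate differences.
  def euclidean(u, v)
    Math.sqrt(u.zip(v).inject(0) { |acc, (a, b)| acc + (a - b)**2 })
  end

  # Sum of absolute coordinate differences (Manhattan / cityblock).
  def cityblock(u, v)
    u.zip(v).inject(0) { |acc, (a, b)| acc + (a - b).abs }
  end

  # Largest absolute coordinate difference.
  def chebyshev(u, v)
    u.zip(v).map { |a, b| (a - b).abs }.max
  end

  # Number of positions at which the two sequences differ.
  def hamming(u, v)
    u.zip(v).count { |a, b| a != b }
  end
end

p DistanceSketch.euclidean([0, 0], [3, 4])     # => 5.0
p DistanceSketch.cityblock([0, 0], [3, 4])     # => 7
p DistanceSketch.chebyshev([0, 0], [3, 4])     # => 4
p DistanceSketch.hamming([1, 0, 1], [1, 1, 0]) # => 2
```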
How to use
The API I intend to support is something like this:
    require 'measurable'

    # Calculate the distance between two points in space.
    Measurable.euclidean([1, 1], [0, 0]) # => 1.4142135623730951

    # Calculate the norm of a vector, i.e. its distance from the origin.
    Measurable.euclidean([1, 1]) # => 1.4142135623730951

    # Get the cosine distance between two vectors.
    Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261

    # Calculate sum of squares directly.
    Measurable.euclidean_squared([3, 4]) # => 25
Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use NMatrix.
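As a rough illustration of how a distance function can work with any enumerable, the sketch below uses only `Enumerable#zip` and `Enumerable#inject`, so it accepts a Range or a custom class as readily as an Array. The names `cityblock_sketch` and `Point3D` are hypothetical, and this is not necessarily how the gem itself is implemented.

```ruby
# Hypothetical helper for illustration; it relies only on Enumerable methods.
def cityblock_sketch(u, v)
  u.zip(v).inject(0) { |acc, (a, b)| acc + (a - b).abs }
end

# A custom class becomes usable once it includes Enumerable and defines #each.
class Point3D
  include Enumerable

  def initialize(x, y, z)
    @coords = [x, y, z]
  end

  def each(&block)
    @coords.each(&block)
  end
end

p cityblock_sketch(1..3, [3, 2, 1])                             # => 4
p cityblock_sketch(Point3D.new(0, 0, 0), Point3D.new(1, 2, 3))  # => 6
```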
The documentation is hosted on rubydoc.
See LICENSE for details.
The distance_measures gem is copyrighted by @reddavis.