A gem to test which metric works best for certain kinds of datasets in machine learning.
Besides the Array class, I also want to support NVector (from NMatrix).
The distance measures will be created in Ruby first. If I see that it's really too slow, I'll write some methods in C (or Java, for JRuby).
gem install measurable
I have only tested it with Ruby 2.0.0 (yes, yes, Travis, I'll do it eventually). I want to support JRuby as well.
I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about whether it is a metric, please open an issue.
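To make the distinction concrete, here is a minimal sketch in plain Ruby that checks one of the metric axioms, the triangle inequality, on a handful of points. The `MetricCheck` module and its methods are hypothetical names used only for illustration; they are not part of this gem. It suggests why squared Euclidean distance is a useful distance measure but not a metric.

```ruby
# Hypothetical helpers for illustration only; not part of the measurable gem.
module MetricCheck
  EPS = 1e-9 # tolerance for floating-point comparisons

  def self.euclidean_squared(u, v)
    u.zip(v).map { |a, b| (a - b)**2 }.reduce(0, :+)
  end

  def self.euclidean(u, v)
    Math.sqrt(euclidean_squared(u, v))
  end

  # True when d(x, z) <= d(x, y) + d(y, z) holds for every ordered
  # triple of points, using the block as the distance function d.
  def self.triangle_inequality?(points)
    points.product(points, points).all? do |x, y, z|
      yield(x, z) <= yield(x, y) + yield(y, z) + EPS
    end
  end
end

points = [[0, 0], [1, 0], [2, 0]]

MetricCheck.triangle_inequality?(points) { |u, v| MetricCheck.euclidean(u, v) }
# => true
MetricCheck.triangle_inequality?(points) { |u, v| MetricCheck.euclidean_squared(u, v) }
# => false, because d([0, 0], [2, 0]) = 4 but 1 + 1 = 2
```

Passing this check on a few points does not prove that a measure is a metric; failing it on a single triple is enough to show that it is not.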
The following are the distance measures supported at the moment:
- Euclidean distance
- Squared Euclidean distance
- Cosine distance
- Max-min distance (from "K-Means clustering using max-min distance measure")
- Jaccard distance
- Tanimoto distance
- Haversine distance
- Minkowski (Cityblock or Manhattan) distance
- Chebyshev distance
- Hamming distance
- Levenshtein distance
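For readers unfamiliar with some of these names, the sketch below shows what three of the simpler measures compute, using the textbook definitions on plain Arrays and Strings. The `DistanceSketch` module is a hypothetical name used for illustration and is not the gem's implementation, which may differ in argument checking and supported types.

```ruby
# Hypothetical module for illustration only; not the gem's implementation.
module DistanceSketch
  # Cityblock (Manhattan): sum of absolute coordinate differences.
  def self.cityblock(u, v)
    u.zip(v).map { |a, b| (a - b).abs }.reduce(0, :+)
  end

  # Chebyshev: largest absolute coordinate difference.
  def self.chebyshev(u, v)
    u.zip(v).map { |a, b| (a - b).abs }.max
  end

  # Hamming: number of positions at which two equal-length strings differ.
  def self.hamming(s, t)
    raise ArgumentError, "strings must have the same length" unless s.length == t.length

    s.chars.zip(t.chars).count { |a, b| a != b }
  end
end

DistanceSketch.cityblock([1, 2, 3], [4, 0, 3]) # => 5
DistanceSketch.chebyshev([1, 2, 3], [4, 0, 3]) # => 3
DistanceSketch.hamming("karolin", "kathrin")   # => 3
```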
These still need to be implemented:
- Correlation distance
- Chi-square distance
- Kullback-Leibler divergence
- Jensen-Shannon divergence
- Mahalanobis distance
- Squared Mahalanobis distance
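As a starting point for contributors, the two divergences in this list can be sketched from their standard definitions. The `DivergenceSketch` module and its method names are assumptions made for illustration, not an API this gem provides. The sketch assumes both inputs are discrete probability distributions of the same length, and it uses the natural logarithm.

```ruby
# Hypothetical module for illustration only; nothing here exists in the gem yet.
module DivergenceSketch
  # Kullback-Leibler divergence: sum of p_i * ln(p_i / q_i).
  # Terms with p_i == 0 contribute nothing, by the usual convention.
  def self.kullback_leibler(ps, qs)
    ps.zip(qs).map do |pi, qi|
      pi.zero? ? 0.0 : pi * Math.log(pi.to_f / qi)
    end.reduce(0.0, :+)
  end

  # Jensen-Shannon divergence: the average KL divergence of each
  # distribution from their midpoint m = (p + q) / 2.
  def self.jensen_shannon(ps, qs)
    ms = ps.zip(qs).map { |pi, qi| (pi + qi) / 2.0 }
    0.5 * kullback_leibler(ps, ms) + 0.5 * kullback_leibler(qs, ms)
  end
end

ps = [0.9, 0.1]
qs = [0.5, 0.5]

DivergenceSketch.kullback_leibler(ps, ps) # => 0.0
# KL is not symmetric, so it is not a metric:
DivergenceSketch.kullback_leibler(ps, qs) == DivergenceSketch.kullback_leibler(qs, ps) # => false
```

Jensen-Shannon divergence is symmetric in its arguments, which is one reason it is often preferred when a symmetric measure is needed.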
I plan to update the specs to reflect that each method is (or isn't) a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)
The API I intend to support is something like this:
    require "measurable"

    u = NMatrix.ones([2, 1])
    v = NMatrix.zeros([2, 1])
    w = [1, 0]
    x = [2, 2]

    # Calculate the distance between two points in space.
    Measurable.euclidean(u, v) # => 1.41421
    Measurable.euclidean(w, v) # => 1.00000
    Measurable.cosine([1, 2], [2, 3]) # => 0.00772

    # Calculate the norm of a vector, i.e. its distance from the origin.
    Measurable.euclidean_squared([3, 4]) # => 25
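The values in the example can be checked by hand with a small plain-Ruby sketch. The `ApiSketch` module below is a hypothetical stand-in, not the gem's code: it handles only Arrays (no NMatrix input), and it assumes that omitting the second argument means "measure from the origin", as the norm example suggests.

```ruby
# Hypothetical stand-in for illustration only; Arrays only, no NMatrix support.
module ApiSketch
  # Squared Euclidean distance; with one argument, the squared norm.
  def self.euclidean_squared(u, v = nil)
    v ||= Array.new(u.size, 0) # assumed default: the origin
    u.zip(v).map { |a, b| (a - b)**2 }.reduce(0, :+)
  end

  def self.euclidean(u, v = nil)
    Math.sqrt(euclidean_squared(u, v))
  end

  # Cosine distance: 1 minus the cosine of the angle between u and v.
  def self.cosine(u, v)
    dot = u.zip(v).map { |a, b| a * b }.reduce(0, :+)
    1 - dot / (euclidean(u) * euclidean(v))
  end
end

ApiSketch.euclidean([1, 1], [0, 0]).round(5) # => 1.41421
ApiSketch.euclidean([1, 0], [0, 0]).round(5) # => 1.0
ApiSketch.cosine([1, 2], [2, 3]).round(5)    # => 0.00772
ApiSketch.euclidean_squared([3, 4])          # => 25
```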
RDoc syntax is used to document the project. To build it locally, you'll need to install the Fivefish generator (gem install rdoc-generator-fivefish) and run the following command:
If there's something wrong with an explanation or if there's information missing, please open an issue or send a pull request.
See LICENSE for details.
The distance_measures gem is copyrighted by @reddavis.