Updated readme
LucaCappelletti94 committed Nov 2, 2019
1 parent 0b35df4 commit 76e6337
Showing 1 changed file with 64 additions and 135 deletions.
199 changes: 64 additions & 135 deletions README.rst
@@ -18,144 +18,73 @@ Since some software handling coverages sometime get slightly different results,

|coveralls| |sonar_coverage| |code_climate_coverage|

Basic example
--------------
Each metric has an example in `the folder examples`_. Here's a basic example for those too lazy to click links (like me).

Available metrics
-----------------------------------------------
A number of distances and divergences are available:

- bhattacharyya
- bhattacharyya_coefficient
- canberra
- chebyshev
- cosine
- euclidean
- hamming
- jensen_shannon
- kullback_leibler
- mae
- manhattan
- cityblock
- minkowsky
- mse
- normal_total_variation
- nth_variation
- pearson
- squared_variation
- total_variation
- intersection_squared_variation
- intersection_squared_hellinger
- intersection_nth_variation
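
To make concrete what a distance between two dictionaries means, here is a minimal pure-Python sketch of a Euclidean distance that treats each dictionary as a sparse vector. It assumes that a key missing from one dictionary counts as zero; the helper name ``euclidean_sketch`` is hypothetical and this is not necessarily how the library implements ``euclidean``.

.. code:: python

    import math

    def euclidean_sketch(a, b):
        """Euclidean distance between two dicts seen as sparse vectors.

        Assumption: a key absent from one dictionary has value 0 there.
        """
        keys = set(a) | set(b)  # union of the keys of both dictionaries
        return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

    first = {"x": 3, "y": 4}
    second = {"y": 4, "z": 0}

    # Only "x" differs (3 vs. an implicit 0), so the distance is 3.0
    distance = euclidean_sketch(first, second)

Because every key of both dictionaries has to be visited, a metric written this way does work proportional to the combined size of the two inputs.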

Usage example
--------------------

.. code:: python

    from dictances import cosine

    cosine(my_first_dictionary, my_second_dictionary)

Handling nested dictionaries
------------------------------------------
If you need to compute the distance between two nested dictionaries you can use `deflate_dict <https://github.com/LucaCappelletti94/deflate_dict>`_ as follows:

.. code:: python

    import random
    from dictances import cosine, euclidean, canberra

    random.seed(42)  # for reproducibility

    # Simple function to generate the example dictionaries
    def generate_example_dict(n=1000):
        return {random.randint(0, 1000): random.uniform(0, 1000) for i in range(n)}

    a, b = generate_example_dict(), generate_example_dict()

    print(cosine(a, b))
    # >>> 0.52336690346601
    print(euclidean(a, b))
    # >>> 15119.400349404095
    print(canberra(a, b))
    # >>> 624.9088876554047

Metrics table
--------------

+--------------------------------+---------------------------+------------------------+------------+
| Metric name                    | Usage example             | Average time on sample | Complexity |
+================================+===========================+========================+============+
| `Euclidean distance`_          | `euclidean`_              | 90.4 µs ± 2.5 µs       | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| Squared variation              | `squared_variation`_      | 90.8 µs ± 1.43 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Total variation`_             | `total_variation`_        | 92.3 µs ± 1.28 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| Nth variation                  | `nth_variation`_          | 91.1 µs ± 1.2 µs       | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Manhattan distance`_          | `manhattan`_              | 92.7 µs ± 1.43 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Mean absolute error`_         | `mae`_                    | 92.3 µs ± 1.28 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Mean squared error`_          | `mse`_                    | 91.1 µs ± 1.2 µs       | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Chebyshev distance`_          | `chebyshev`_              | 101 µs ± 2.14 µs       | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Minkowski distance`_          | `minkowsky`_              | 91.1 µs ± 2.05 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Canberra distance`_           | `canberra`_               | 71.8 µs ± 1.95 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Cosine distance`_             | `cosine`_                 | 61.3 µs ± 835 ns       | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Pearson distance`_            | `pearson`_                | 46.9 µs ± 1.23 µs      | |On+m|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Hamming distance`_            | `hamming`_                | 28.7 µs ± 784 ns       | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
| Normalized total variation     | `normal_total_variation`_ | 34.6 µs ± 543 ns       | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Kullback Leibler divergence`_ | `kullback_leibler`_       | 24 µs ± 587 ns         | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Jensen Shannon divergence`_   | `jensen_shannon`_         | 38.2 µs ± 1.18 µs      | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Bhattacharyya distance`_      | `bhattacharyya`_          | 32.7 µs ± 655 ns       | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
| `Hellinger distance`_          | `hellinger`_              | 42 µs ± 467 ns         | |Omin|     |
+--------------------------------+---------------------------+------------------------+------------+
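
The table lists two complexity classes. Assuming ``|On+m|`` stands for O(n + m) and ``|Omin|`` for O(min(n, m)), where n and m are the sizes of the two dictionaries, the cheaper class is plausibly reached by metrics in which a key missing from either dictionary contributes nothing, so only shared keys matter. The sketch below illustrates that idea with a Bhattacharyya coefficient (the sum of ``sqrt(p[k] * q[k])`` over shared keys); the helper name is hypothetical and the code is not taken from the library.

.. code:: python

    import math

    def bhattacharyya_coefficient_sketch(p, q):
        """Sum of sqrt(p[k] * q[k]) over the keys present in both dicts.

        Only the smaller dictionary is iterated; the larger one is used
        for constant-time lookups, giving O(min(n, m)) iterations.
        """
        small, large = (p, q) if len(p) <= len(q) else (q, p)
        return sum(
            math.sqrt(value * large[key])
            for key, value in small.items()
            if key in large
        )

    p = {"a": 0.25, "b": 0.75}
    q = {"a": 0.25, "c": 0.75}

    # Only "a" is shared: sqrt(0.25 * 0.25) = 0.25
    coefficient = bhattacharyya_coefficient_sketch(p, q)

A metric such as the Euclidean distance cannot take this shortcut, because a key present in only one dictionary still contributes to the result.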

Test computer specifications
----------------------------

The computer on which the metrics were timed had the following specifications:

+---------------------------------------+
| Computer specifications               |
+=======================+===============+
| Model Name            | MacBook Pro   |
+-----------------------+---------------+
| Processor Name        | Intel Core i7 |
+-----------------------+---------------+
| Processor Speed       | 2.3 GHz       |
+-----------------------+---------------+
| Number of Processors  | 1             |
+-----------------------+---------------+
| Total Number of Cores | 4             |
+-----------------------+---------------+
| L2 Cache (per Core)   | 256 KB        |
+-----------------------+---------------+
| L3 Cache              | 6 MB          |
+-----------------------+---------------+
| Memory                | 16 GB         |
+-----------------------+---------------+

.. _Euclidean distance: https://en.wikipedia.org/wiki/Euclidean_distance
.. _Manhattan distance: https://en.wikipedia.org/wiki/Taxicab_geometry
.. _Jensen Shannon divergence: https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
.. _Bhattacharyya distance: https://en.wikipedia.org/wiki/Bhattacharyya_distance
.. _Total variation: https://en.wikipedia.org/wiki/Total_variation
.. _Hellinger distance: https://en.wikipedia.org/wiki/Hellinger_distance
.. _Kullback Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
.. _Mean absolute error: https://en.wikipedia.org/wiki/Mean_absolute_error
.. _Mean squared error: https://en.wikipedia.org/wiki/Mean_squared_error
.. _Chebyshev distance: https://en.wikipedia.org/wiki/Chebyshev_distance
.. _Minkowski distance: https://en.wikipedia.org/wiki/Minkowski_distance
.. _Canberra distance: https://en.wikipedia.org/wiki/Canberra_distance
.. _Cosine distance: https://en.wikipedia.org/wiki/Cosine_similarity
.. _Pearson distance: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
.. _Hamming distance: https://en.wikipedia.org/wiki/Hamming_distance

.. _euclidean: https://github.com/LucaCappelletti94/distances/blob/master/examples/euclidean.py
.. _jensen_shannon: https://github.com/LucaCappelletti94/distances/blob/master/examples/jensen_shannon.py
.. _bhattacharyya: https://github.com/LucaCappelletti94/distances/blob/master/examples/bhattacharyya.py
.. _total_variation: https://github.com/LucaCappelletti94/distances/blob/master/examples/total_variation.py
.. _squared_variation: https://github.com/LucaCappelletti94/distances/blob/master/examples/squared_variation.py
.. _nth_variation: https://github.com/LucaCappelletti94/distances/blob/master/examples/nth_variation.py
.. _hellinger: https://github.com/LucaCappelletti94/distances/blob/master/examples/hellinger.py
.. _kullback_leibler: https://github.com/LucaCappelletti94/distances/blob/master/examples/kullback_leibler.py
.. _manhattan: https://github.com/LucaCappelletti94/distances/blob/master/examples/manhattan.py
.. _mae: https://github.com/LucaCappelletti94/distances/blob/master/examples/mae.py
.. _mse: https://github.com/LucaCappelletti94/distances/blob/master/examples/mse.py
.. _chebyshev: https://github.com/LucaCappelletti94/distances/blob/master/examples/chebyshev.py
.. _minkowsky: https://github.com/LucaCappelletti94/distances/blob/master/examples/minkowski.py
.. _canberra: https://github.com/LucaCappelletti94/distances/blob/master/examples/canberra.py
.. _cosine: https://github.com/LucaCappelletti94/distances/blob/master/examples/cosine.py
.. _pearson: https://github.com/LucaCappelletti94/distances/blob/master/examples/pearson.py
.. _hamming: https://github.com/LucaCappelletti94/distances/blob/master/examples/hamming.py
.. _normal_total_variation: https://github.com/LucaCappelletti94/distances/blob/master/examples/normal_total_variation.py

.. _test utilities here: https://github.com/LucaCappelletti94/distances/blob/master/tests/helpers/utils.py
.. _the folder examples: https://github.com/LucaCappelletti94/distances/tree/master/examples

.. |On+m| image:: https://github.com/LucaCappelletti94/distances/blob/master/images/On+m.gif?raw=true
.. |Omin| image:: https://github.com/LucaCappelletti94/distances/blob/master/images/Omin.gif?raw=true
.. code:: python

    from dictances import cosine
    from deflate_dict import deflate

    my_first_dictionary = {
        "a": 8,
        "b": {
            "c": 3,
            "d": 6
        }
    }

    my_second_dictionary = {
        "b": {
            "c": 8,
            "d": 1
        },
        "y": 3,
    }

    cosine(deflate(my_first_dictionary), deflate(my_second_dictionary))

.. |travis| image:: https://travis-ci.org/LucaCappelletti94/dictances.png
    :target: https://travis-ci.org/LucaCappelletti94/dictances
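
For readers curious about what flattening a nested dictionary involves before a distance can be computed, here is a minimal pure-Python sketch. The function name ``flatten_sketch``, the ``_`` separator and the resulting key names are assumptions made for illustration; ``deflate_dict`` may name the flattened keys differently.

.. code:: python

    def flatten_sketch(nested, prefix="", sep="_"):
        """Recursively flatten a nested dict into a single-level dict.

        Nested keys are joined with ``sep`` (an arbitrary choice made
        for this sketch, not necessarily what deflate_dict does).
        """
        flat = {}
        for key, value in nested.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            if isinstance(value, dict):
                # Descend into the sub-dictionary, carrying the key path
                flat.update(flatten_sketch(value, path, sep))
            else:
                flat[path] = value
        return flat

    flat = flatten_sketch({"a": 8, "b": {"c": 3, "d": 6}})
    # flat == {"a": 8, "b_c": 3, "b_d": 6}

Once both dictionaries are flat, their values can be compared key by key like any other pair of dictionaries.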