Improv perf #29

nucccc · 2023-01-05T21:43:08Z

Good afternoon,
I hope this message finds you well, and I compliment and thank you for the code.

I worked with frechet and dtw, and I found that computational performances of the frechet function were subpar since they didn't use the cdist function from scipy (which i found by far more performing than the minkowski_distance one).

I took the freedom to change the code and propose you a pull request (i also modified tests code in order for it not to import the already installed package).

On my day to day job I also use cython to improve performances of python programs, and I was thinking that maybe it could benefit some of the loops in the code.

Best of wishes for whatever!

Nuc

…dtw calculation

cjekel · 2023-01-07T19:39:29Z

Hi Nuc,

Thanks for this PR! It will definitely increase the performance of discrete frechet distance.

I'm going to recommend you make a couple minor changes, then I'll merge this in and publish a new release out there.

As far as cython goes -- yes I'd expect cython to be even faster. I kind of like the easier to read source code right now... but I can see people needing performance. I think the best path in the future could be an optional cython installation, which could fall back to python.

Thanks,
CJ

cjekel · 2023-01-07T19:40:11Z

similaritymeasures/similaritymeasures.py

@@ -434,7 +434,7 @@ def curve_length_measure(exp_data, num_data):
    return np.sqrt(np.sum(r_sq))


-def frechet_dist(exp_data, num_data, p=2):
+def frechet_dist(exp_data, num_data, metric='euclidean'):


Can you fix the flake8 errors here:

similaritymeasures/similaritymeasures.py:4:1: F401 'scipy.spatial.minkowski_distance' imported but unused similaritymeasures/similaritymeasures.py:499:50: E251 unexpected spaces around keyword / parameter equals similaritymeasures/similaritymeasures.py:499:52: E251 unexpected spaces around keyword / parameter equals similaritymeasures/similaritymeasures.py:502:19: E231 missing whitespace after ',' similaritymeasures/similaritymeasures.py:504:39: E231 missing whitespace after ',' similaritymeasures/similaritymeasures.py:506:39: E231 missing whitespace after ',' similaritymeasures/similaritymeasures.py:510:31: E231 missing whitespace after ','

Sincerely I did not run flake on my machine I don't quickly know how to do it, but I think I fixed everything. I would kindly ask you to rerun it and evaluate again if standards are met.

cjekel · 2023-01-07T19:47:53Z

tests/tests.py

+import sys
+# caution: path[0] is reserved for script path (or '' in REPL)
+sys.path.insert(1, '../similaritymeasures')
+


These lines are not needed when you install the package. Please remove these 4 lines.

When you do this kind of package development, you should install the library from the source code. I normally clone the library, and then install the local files with pip. After making changes to the source files you then re-install them.

A lot of people find this constant installation annoying.

A good alternative is to use pip -e which will make a symbolic link from your site packages folder to the local source code. Essentially then the library's source code becomes editable.

So with all that said, you can avoid setting paths by simply doing

git clone https://github.com/cjekel/similarity_measures.git python -m pip install -e ./similarity_measures

which will then let you make changes to the source code and always import the latest code.

Thank you for the explanation on how to reinstall everything!

I just removed these lines in my latest commit

nucccc · 2023-01-07T20:23:42Z

I made my best to handle the changes! Please let me know if I missed something.

nucccc · 2023-01-07T20:41:35Z

Hi Nuc,

Thanks for this PR! It will definitely increase the performance of discrete frechet distance.

I'm going to recommend you make a couple minor changes, then I'll merge this in and publish a new release out there.

As far as cython goes -- yes I'd expect cython to be even faster. I kind of like the easier to read source code right now... but I can see people needing performance. I think the best path in the future could be an optional cython installation, which could fall back to python.

Thanks, CJ

I'll do my best to answer here regarding cython. I already took the code and cythonized it in a separate folder of mine, did some quick and dirty benchmarking/profiling. The best of the performance gain came just from using distance.cdist function. That gave a gain of like 7x, up to what I recall.

Then, the cythonization I did was quite light: I just used cdef to take the i and j variable (the ones used in the loops) and declared them as c integers. That made gave the code a 30% speed up to what I recall (on the already improved version using cdist...).

That is to provide you with my personal experience. Readability was not that much affected I suppose! It was just two initial declarations. But the cython build process may become messy in the installation procedure I fear.

Maybe a similar improvement can be obtained by just setting i and j as np.int32/np.int16 or something, but I don't know if the range will discard them and replace them with actual python ints (which slow down stuff since they are full fledged pointers to objects instead of just 4-byte integers).

All this stuff up to my limited knowledge.

A horrible thing that I can propose is something that would possibly require modifications to the interface: if you already calculated the distance matrix between each pair of points of the curves, you can think of passing it to frechet and dtw functions, in order to avoid recalculating it twice (in case you need both distances). I suppose sklearn allows something like this in certain clustering algorithms, like "I will calculate the distances matrix to execute the clustering algo, but if you already have it just pass it and that will be used directly". This also allows people to use peculiar distance calculation criteria. But I suggest such an idea just for the sake of it, knowing the importance of keeping the interface stable and standardized!

Thank you for the time and consideration

cjekel · 2023-01-09T17:11:02Z

I made my best to handle the changes! Please let me know if I missed something.

Code looks good to be merged. Thank you for this contribution!

This is going to trigger a 1.0.0 release of this software since I think I'm going to break backwards compatibility.

cjekel · 2023-01-09T17:12:16Z

@nucccc can you make an issue on cython dtw and frechet with your points? I don't have time right now to give it much thought, but I will to help out with that!

nucccc · 2023-01-09T22:48:46Z

In my last commit I found a way to mantain backwards compatibility. It was just necessary to pass the minkowski distance as an argument to distance.cdist and then set p=p in the **kwargs of the scipy function

`def frechet_dist(exp_data, num_data, p=2):

...

c = distance.cdist(exp_data, num_data, metric='minkowski', p=p)`

So I hope backwards compatibility shall not be broken

cjekel · 2023-01-14T18:30:01Z

In my last commit I found a way to mantain backwards compatibility. It was just necessary to pass the minkowski distance as an argument to distance.cdist and then set p=p in the **kwargs of the scipy function

woah! I didn't know that was a thing. That will make the merge and release a bit easier!

LGTM. Thank you for this contribution!

nucccc added 3 commits January 5, 2023 22:15

frechet_dist now using minkowski calculation from scipy cdist as for …

1bb7bd9

…dtw calculation

tests now taking local file as import

4860e75

frechet_dist interface mdofied

3e09c7f

cjekel requested changes Jan 7, 2023

View reviewed changes

nucccc added 2 commits January 7, 2023 20:57

removed path from tests

fdc2651

fixed for adaptation to interface and flake

d205315

cjekel approved these changes Jan 9, 2023

View reviewed changes

nucccc mentioned this pull request Jan 9, 2023

possible cythonization #31

Open

cjekel merged commit aa8227c into cjekel:master Jan 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improv perf #29

Improv perf #29

nucccc commented Jan 5, 2023

cjekel commented Jan 7, 2023

cjekel Jan 7, 2023

nucccc Jan 7, 2023

cjekel Jan 7, 2023 •

edited

nucccc Jan 7, 2023

nucccc commented Jan 7, 2023

nucccc commented Jan 7, 2023

cjekel commented Jan 9, 2023

cjekel commented Jan 9, 2023

nucccc commented Jan 9, 2023

cjekel commented Jan 14, 2023

Improv perf #29

Improv perf #29

Conversation

nucccc commented Jan 5, 2023

cjekel commented Jan 7, 2023

cjekel Jan 7, 2023

Choose a reason for hiding this comment

nucccc Jan 7, 2023

Choose a reason for hiding this comment

cjekel Jan 7, 2023 • edited

Choose a reason for hiding this comment

nucccc Jan 7, 2023

Choose a reason for hiding this comment

nucccc commented Jan 7, 2023

nucccc commented Jan 7, 2023

cjekel commented Jan 9, 2023

cjekel commented Jan 9, 2023

nucccc commented Jan 9, 2023

cjekel commented Jan 14, 2023

cjekel Jan 7, 2023 •

edited