-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] New similarity search module #724
Conversation
…rimental/similarity_search
…an distance (#756) * Adding TopKSimilarity Search class with euclidean distance profiles * Removing old normalization attributes * Removing old normalization attributes in predict --------- Co-authored-by: MatthewMiddlehurst <m.middlehurst@uea.ac.uk>
…eon-toolkit/aeon into experimental/similarity_search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor points, maths in docstring main point. Forgot this was my own PR :)
…eon-toolkit/aeon into experimental/similarity_search
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
…heck, some utils functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM great work !
Future PR suggestion to do:
1- Add test for base class with a dummy similarity search module ?
2- Add example in doc string of topksimsearch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no issue with introducing the module, I have not really looked at the specifics of how it is implemented and am content to let it evolve as usage and implementations increase.
Most of the comments are regarding the documentation, I did not check the examples but may come back to do so (or make a PR later).
aeon/similarity_search/distance_profiles/tests/test_naive_euclidean.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR should now be in a stable state after the modifications from the previous review. If everyone agree, we can move forward and progress on it in future PR.
this PR introduces a new (experimental) module: similarity search by @baraline
Overview of the content:
Changed base shape expected for X to (n_samples, n_channels, n_timestamps), and for Q to (n_channels, q_length) to handle the more general case.
The choice of how a candidate of X is matched with Q is left to the child classes (e.g. TopKSimilaritySearch class will ask for a k argument and return top-k matches). A possibility would be to make a base class for distance profiles to force them having the same API and have attributes indicating if they are normalized or not.
Normalization happens in the predict method every time it is called. It stores the mean and std of each channels of Q and of all possible candidates of X as attributes (e.g _X_means). They are stored as argument to be called in the child classes of the base class when the distance profile is normalized. It may be possible to avoid recomputing it by checking if the length of Q and the stored input _X have not changed.
Added naive Euclidean and normalized naive Euclidean distance profiles with basic tests
Added top-k similarity search class with basic tests. The return of predict is an array of tuple as (with k=1) [(id_sample, id_timestamp)]
TODOs left for future PR :
Make documentation for module and estimator
Decide how to handle/check when there is not input or query length change to avoid computing normalization again
Refine testing scenarios and add them to the CI pipeline (if not made automaticaly ?)
Add non-naïve Euclidean distance profile with Mueen Algorithm.
Add a template for new cases