Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] New similarity search module #724

Merged
merged 53 commits into from
Oct 30, 2023
Merged

Conversation

TonyBagnall
Copy link
Contributor

@TonyBagnall TonyBagnall commented Sep 5, 2023

this PR introduces a new (experimental) module: similarity search by @baraline

Overview of the content:

Changed base shape expected for X to (n_samples, n_channels, n_timestamps), and for Q to (n_channels, q_length) to handle the more general case.
The choice of how a candidate of X is matched with Q is left to the child classes (e.g. TopKSimilaritySearch class will ask for a k argument and return top-k matches). A possibility would be to make a base class for distance profiles to force them having the same API and have attributes indicating if they are normalized or not.
Normalization happens in the predict method every time it is called. It stores the mean and std of each channels of Q and of all possible candidates of X as attributes (e.g _X_means). They are stored as argument to be called in the child classes of the base class when the distance profile is normalized. It may be possible to avoid recomputing it by checking if the length of Q and the stored input _X have not changed.
Added naive Euclidean and normalized naive Euclidean distance profiles with basic tests
Added top-k similarity search class with basic tests. The return of predict is an array of tuple as (with k=1) [(id_sample, id_timestamp)]

TODOs left for future PR :

Make documentation for module and estimator
Decide how to handle/check when there is not input or query length change to avoid computing normalization again
Refine testing scenarios and add them to the CI pipeline (if not made automaticaly ?)
Add non-naïve Euclidean distance profile with Mueen Algorithm.
Add a template for new cases

TonyBagnall and others added 6 commits September 5, 2023 15:15
…an distance (#756)

* Adding TopKSimilarity Search class with euclidean distance profiles

* Removing old normalization attributes

* Removing old normalization attributes in predict

---------

Co-authored-by: MatthewMiddlehurst <m.middlehurst@uea.ac.uk>
@TonyBagnall TonyBagnall changed the title [ENH] Experimental new similarity search [ENH] New similarity search module Sep 26, 2023
Copy link
Contributor Author

@TonyBagnall TonyBagnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor points, maths in docstring main point. Forgot this was my own PR :)

aeon/similarity_search/top_k_similarity.py Outdated Show resolved Hide resolved
aeon/similarity_search/base.py Outdated Show resolved Hide resolved
aeon/similarity_search/base.py Outdated Show resolved Hide resolved
aeon/similarity_search/base.py Outdated Show resolved Hide resolved
aeon/similarity_search/distance_profiles/_commons.py Outdated Show resolved Hide resolved
@TonyBagnall TonyBagnall marked this pull request as ready for review October 7, 2023 17:24
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

hadifawaz1999
hadifawaz1999 previously approved these changes Oct 25, 2023
Copy link
Member

@hadifawaz1999 hadifawaz1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM great work !

Future PR suggestion to do:

1- Add test for base class with a dummy similarity search module ?
2- Add example in doc string of topksimsearch

@MatthewMiddlehurst MatthewMiddlehurst added enhancement New feature, improvement request or other non-bug code enhancement implementing framework Implementing frameworks for new learning tasks labels Oct 25, 2023
Copy link
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with introducing the module, I have not really looked at the specifics of how it is implemented and am content to let it evolve as usage and implementations increase.

Most of the comments are regarding the documentation, I did not check the examples but may come back to do so (or make a PR later).

aeon/registry/tests/test_lookup.py Outdated Show resolved Hide resolved
aeon/similarity_search/base.py Show resolved Hide resolved
aeon/similarity_search/base.py Outdated Show resolved Hide resolved
docs/index.md Show resolved Hide resolved
docs/api_reference/similarity_search.rst Show resolved Hide resolved
docs/getting_started.md Show resolved Hide resolved
Copy link
Member

@baraline baraline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR should now be in a stable state after the modifications from the previous review. If everyone agree, we can move forward and progress on it in future PR.

@TonyBagnall TonyBagnall merged commit de2da96 into main Oct 30, 2023
18 checks passed
@TonyBagnall TonyBagnall deleted the experimental/similarity_search branch October 30, 2023 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature, improvement request or other non-bug code enhancement implementing algorithms Implementing new algorithms/estimators implementing framework Implementing frameworks for new learning tasks similarity search Similarity search package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants