[MNT] Update similarity search with new base classes #1243
Comments
Definitely, let me know if you want any input. BaseSeriesEstimator is still experimental, so we can adapt it to your use case if necessary.
In my mind there are 4 main cases:
All collection-of-series cases wrap around the single-series case. For top-K matches, you either update the top K while iterating over the collection, or compute the matches for each series independently in parallel and then isolate the top K of the collection. I'll try it out and keep you updated, but as the module is aimed more toward data analysis, I think BaseSeriesEstimator looks fine as it is for now.
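To make the first top-K option concrete, here is a minimal sketch in plain Python of updating the top K while iterating over the collection. It is not aeon code: the function names, the squared Euclidean distance, and the `(distance, series_index, offset)` result format are all assumptions made for illustration.

```python
import heapq


def sliding_sq_distances(series, query):
    """Squared Euclidean distance between query and every window of series."""
    m = len(query)
    return [
        sum((series[i + j] - query[j]) ** 2 for j in range(m))
        for i in range(len(series) - m + 1)
    ]


def top_k_in_collection(collection, query, k):
    """Return the k best (distance, series_index, offset) matches, best first."""
    # heapq is a min-heap, so distances are negated to keep the worst
    # of the current top k at heap[0]. The heap never holds more than k items.
    heap = []
    for series_idx, series in enumerate(collection):
        for offset, dist in enumerate(sliding_sq_distances(series, query)):
            if len(heap) < k:
                heapq.heappush(heap, (-dist, series_idx, offset))
            elif dist < -heap[0][0]:  # strictly better than the current worst
                heapq.heapreplace(heap, (-dist, series_idx, offset))
    return sorted((-neg_dist, s, o) for neg_dist, s, o in heap)
```

The second option (independent per-series results merged afterwards) would compute `sliding_sq_distances` for each series in parallel and then select the k smallest entries over the concatenated results.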
Pasting the message I posted on Slack here to keep track of it.

Hi everyone, regarding PR #1310 and the update of the similarity search module, and to prepare for issue #1311 and future additions, I would like your input on the following proposal:

Module structure:
Expected data and internal input conversion: We could accept series or collections as numpy arrays, and series as pd.Series, but we would ideally convert all of it to a numpy collection and use the axis argument we introduced in other modules to avoid the channel problem.

For query search, we would implement the computation-heavy numba functions for the single-series case and loop over them for a collection. This allows, for example, passing best-so-far values between the series of a collection X when doing early abandoning or pruning with lower bounds.
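As a rough illustration of passing a best-so-far value between the series of a collection, here is a sketch in plain Python. The function names and the squared Euclidean distance are assumptions, not the module's API; in practice the inner function is the kind of loop that would be compiled with numba.

```python
import math


def best_match_in_series(series, query, best_so_far=math.inf):
    """Best match of query in one series, abandoning hopeless windows early.

    Returns (distance, offset); offset is None if no window beats best_so_far.
    """
    m = len(query)
    best_offset = None
    for offset in range(len(series) - m + 1):
        acc = 0.0
        for j in range(m):
            diff = series[offset + j] - query[j]
            acc += diff * diff
            if acc >= best_so_far:  # early abandon: cannot beat the incumbent
                break
        else:  # inner loop was not abandoned, so acc < best_so_far
            best_so_far = acc
            best_offset = offset
    return best_so_far, best_offset


def best_match_in_collection(collection, query):
    """Loop the series case over a collection, sharing the best-so-far value."""
    best = (math.inf, None, None)  # (distance, series_index, offset)
    for series_idx, series in enumerate(collection):
        dist, offset = best_match_in_series(series, query, best[0])
        if offset is not None:
            best = (dist, series_idx, offset)
    return best
```

Because the threshold found in earlier series is handed to later ones, windows in later series can be abandoned after only a few terms of the sum.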
Interaction with other modules: We would still use the distance module for the naive search cases, which are the ones without speed-ups (speed-ups can give either exact or approximate results). It would also be nice to offer some visualisation through the visualisation module.

Documentation and notebooks: I would like to continue to have 3 types of notebooks for similarity search:
Additionally, should we write wrappers around stumpy for the Euclidean distance cases and focus on other distances? Or should we try our own implementation to see if we end up with results similar to stumpy's? (I would of course split the creation of each sub-module into different PRs.) Looking forward to your input! If other people want to work on this, I can create smaller issues to pick up.
Describe the issue
As new base classes were introduced by #996, we should update the similarity search module to use these classes.
We are still in the context of a single query matching a series or a collection. The case of a collection of queries will come later.
Suggest a potential alternative/fix
Redesign the API to use BaseSeriesEstimator as the base case and wrap around it for collections.
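One possible shape for this, sketched with stand-in classes: `SeriesQuerySearch` and `CollectionQuerySearch` are hypothetical names, they do not inherit from aeon's actual base classes, and the `fit`/`predict` signatures and squared Euclidean distance are assumptions.

```python
class SeriesQuerySearch:
    """Hypothetical single-series estimator: fit on one series, search a query."""

    def fit(self, series):
        self.series_ = list(series)
        return self

    def predict(self, query):
        """Return (distance, offset) of the best match of query in the series."""
        m = len(query)
        candidates = (
            (sum((self.series_[i + j] - query[j]) ** 2 for j in range(m)), i)
            for i in range(len(self.series_) - m + 1)
        )
        return min(candidates)


class CollectionQuerySearch:
    """Hypothetical wrapper holding one series estimator per series."""

    def fit(self, collection):
        self.estimators_ = [SeriesQuerySearch().fit(s) for s in collection]
        return self

    def predict(self, query):
        """Return (distance, series_index, offset) of the best match overall."""
        results = []
        for series_idx, estimator in enumerate(self.estimators_):
            dist, offset = estimator.predict(query)
            results.append((dist, series_idx, offset))
        return min(results)
```

The collection class contains no search logic of its own; it only delegates to the series case and combines the results, which is the intent of the proposal.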
Additional context
No response