Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/info/<entry-endpoint> missing sortable key under each property #273

Closed
CasperWA opened this issue May 18, 2020 · 4 comments · Fixed by #274
Closed

/info/<entry-endpoint> missing sortable key under each property #273

CasperWA opened this issue May 18, 2020 · 4 comments · Fixed by #274
Assignees
Labels
help wanted Extra attention is needed priority/high Issue or PR with a consensus of high priority

Comments

@CasperWA
Copy link
Member

Since the server supports sorting via the sort query parameter, according to the spec here(https://github.com/Materials-Consortia/OPTIMADE/blob/develop/optimade.rst#entry-listing-url-query-parameters) and here, we MUST add the sortable key with value true for the field properties that can be used for sorting under the relevant /info/<entry-endpoint> endpoint.

We already have this correctly in the models, it is simply a matter of adding this to the example server.

@CasperWA CasperWA added the priority/high Issue or PR with a consensus of high priority label May 18, 2020
@CasperWA CasperWA self-assigned this May 18, 2020
@CasperWA CasperWA added the help wanted Extra attention is needed label May 18, 2020
@CasperWA
Copy link
Member Author

@ml-evs you seem to have a good deal of knowledge and know-how concerning MongoDB.
I was trying to figure out what types of data MongoDB can sort, and it seems mostly everything (well, everything as far as I understand), however, there could be an issue if sorting on too many fields, where one would need to either pre-create a compound document with an index that depends on the multiple sorting parameters OR one can combine sort with limit.
The issue seems to be a self-imposed in-memory limit of 32 MB, which is set aside to deal with a non-indexed sort.

So my question is now: Should we handle this in our example server in some way? Even though with the test data it will most likely never exceed an in-memory usage over 32 MB.
Furthermore, do you know more about if there are some types that cannot be sorted?

@CasperWA
Copy link
Member Author

Testing it locally, I have found that all of structures attributes can be sorted.
Lists will be sorted according to their first entry only (and the first str or number value it can find), no matter the length of the list or anything else.

This way, MongoDB actually allows sorting on all properties, but it may not be what a user expects. However, there is no definition on what to expect in the spec or anywhere else. So maybe it's fine? Or should we try and do something clever and sort on Mongo meta data?

@ml-evs
Copy link
Member

ml-evs commented May 18, 2020

@ml-evs you seem to have a good deal of knowledge and know-how concerning MongoDB.

I can only apologise for giving you that impression!

I was trying to figure out what types of data MongoDB can sort, and it seems mostly everything (well, everything as far as I understand), however, there could be an issue if sorting on too many fields, where one would need to either pre-create a compound document with an index that depends on the multiple sorting parameters OR one can combine sort with limit.
Furthermore, do you know more about if there are some types that cannot be sorted?

My understand is that it can sort anything it can store, i.e. any BSON type which is also implied from the sorting order by type (the only exception seems to be some mysterious "javascript" type). Though I think some of the type comparisons wouldn't be particularly useful in our context (e.g. arrays).

The issue seems to be a self-imposed in-memory limit of 32 MB, which is set aside to deal with a non-indexed sort.
So my question is now: Should we handle this in our example server in some way? Even though with the test data it will most likely never exceed an in-memory usage over 32 MB.

I don't have any experience of this getting triggered either, and have only ever sorted over simple indexes.

I'd prefer a conservative approach which allows database providers to specify sortable fields as configuration options (i.e. sortable=False by default). I'm not sure the effort of supporting multiple sort fields is worth it, at least not with the current OPTIMADE properties (in which case we need to throw an error)?

I guess for our default implementation we could enable sorting over something simple like nelements, just to show how it can be set up, maybe with the app code checking for the existence of the index and making one if not found.

@CasperWA
Copy link
Member Author

CasperWA commented May 18, 2020

I'd prefer a conservative approach which allows database providers to specify sortable fields as configuration options (i.e. sortable=False by default). I'm not sure the effort of supporting multiple sort fields is worth it, at least not with the current OPTIMADE properties (in which case we need to throw an error)?

I don't know that it makes sense to put it in the configuration, since it's very backend-dependent, i.e., if you're using MongoDB, you'd probably most likely use this example server as is (almost), while if you have another backend there are a number of changes you would need to implement anyway - so another one for sorting should not be an issue.

I guess for our default implementation we could enable sorting over something simple like nelements, just to show how it can be set up, maybe with the app code checking for the existence of the index and making one if not found.

I should probably have started by writing that we already have sorting enabled - and at the moment for any property/field.
This issue is merely adressing the discrepency between that fact and that the spec explicitly specifices that the server MUST then also supply the sortable key with value True under /info/<sortable-entry-endpoint>.

However that said, we should probably still do something Mongo-specific in the entry collection for handling this more robustly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed priority/high Issue or PR with a consensus of high priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants