Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mongo length operator functionality with length aliases #222

Merged
merged 20 commits into from Mar 30, 2020

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Mar 10, 2020

This PR is a WIP/proof of concept for the introduction of "length aliases" to attempt to fix the issues in #86.

This PR adds a post-processing function to the MongoTransformer which recursively descends into the transformed filter and tries to change any {"x": {"$size": 3}} queries with the value of a value of corresponding aliased field, e.g. elements->nelements. The benefits are minimal when just performing LENGTH = value queries (as $size can do this rapidly anyway), but this will greatly speed up LENGTH > 3 queries, which mongo does not support directly (see #86 discussion). Length operator queries that do not have a defined length alias will now fallback to using the existence test approach, i.e. LENGTH > 3 becomes "does element 4 exist?".

Extra changes needed to support this:

  • splitting of entry_collections into submodules (entry_collections.entry_collections and entry_collections.mongo).
  • addition of LENGTH_ALIASES attribute to mappers and config.
  • addition of PROVIDER_FIELDS to mapper to mirror other aliases.
  • use of $exists for fallback length operations
  • moved all aliasing code out of collections and into the transformer, as post-processing
  • all of the spec-defined length aliases are now part of the default StructureMapper, but will be ignored unless a Transformer makes use of them.

@ml-evs ml-evs added enhancement New feature or request priority/medium Issue or PR with a consensus of medium priority transformer/MongoDB Related to filter transformer for MongoDB labels Mar 10, 2020
@codecov
Copy link

codecov bot commented Mar 10, 2020

Codecov Report

Merging #222 into master will increase coverage by 0.42%.
The diff coverage is 95.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #222      +/-   ##
==========================================
+ Coverage   87.11%   87.53%   +0.42%     
==========================================
  Files          43       45       +2     
  Lines        1862     1941      +79     
==========================================
+ Hits         1622     1699      +77     
- Misses        240      242       +2     
Flag Coverage Δ
#unittests 87.53% <95.58%> (+0.42%) ⬆️
Impacted Files Coverage Δ
...made/server/entry_collections/entry_collections.py 88.88% <88.88%> (ø)
optimade/filtertransformers/mongo.py 96.36% <97.29%> (+0.71%) ⬆️
optimade/server/config.py 100.00% <100.00%> (ø)
optimade/server/entry_collections/__init__.py 100.00% <100.00%> (ø)
optimade/server/entry_collections/mongo.py 86.73% <100.00%> (ø)
optimade/server/mappers/entries.py 98.11% <100.00%> (+0.33%) ⬆️
optimade/server/mappers/structures.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7768f62...9be0f61. Read the comment docs.

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting implementation. Very nice. I have some minor suggested changes.

optimade/server/mappers/entries.py Outdated Show resolved Hide resolved
optimade/server/mappers/entries.py Outdated Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Mar 13, 2020

Interesting implementation. Very nice. I have some minor suggested changes.

Are you happy with the proposed structure then, or would you prefer it in MongoCollection? I'm on the fence myself...

@ml-evs
Copy link
Member Author

ml-evs commented Mar 13, 2020

I've just realised another feature that could use this approach, filtering on relationships. I've implemented this on my branch ml-evs/relationships_filtering. I can make a PR once we're ironed out this one.

@CasperWA
Copy link
Member

Interesting implementation. Very nice. I have some minor suggested changes.

Are you happy with the proposed structure then, or would you prefer it in MongoCollection? I'm on the fence myself...

Yeah. I see what you mean. I don't know. I would probably prefer it in the collection class, however, it may also be useful for other backends? But I'm not entirely sure.

@ml-evs
Copy link
Member Author

ml-evs commented Mar 17, 2020

Interesting implementation. Very nice. I have some minor suggested changes.

Are you happy with the proposed structure then, or would you prefer it in MongoCollection? I'm on the fence myself...

Yeah. I see what you mean. I don't know. I would probably prefer it in the collection class, however, it may also be useful for other backends? But I'm not entirely sure.

I've completely rewritten this process to make it more general across different post-processing operations, but think the actual descent into the query is going to very mongo specific. As it is, we can definitely lift this out and put it into the collection class.

@CasperWA
Copy link
Member

(...) As it is, we can definitely lift this out and put it into the collection class.

do it

@ml-evs
Copy link
Member Author

ml-evs commented Mar 17, 2020

image

This is perhaps the worst feature GH have ever added, the reddit-ification of the "workplace" continues unbounded

@ml-evs ml-evs force-pushed the ml-evs/length_operator_queries branch 2 times, most recently from bdc1205 to b4f528f Compare March 19, 2020 14:38
@ml-evs ml-evs force-pushed the ml-evs/length_operator_queries branch 3 times, most recently from 2e034a1 to 51cf0cd Compare March 25, 2020 11:21
shyamd
shyamd previously approved these changes Mar 25, 2020
Copy link
Contributor

@shyamd shyamd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ml-evs
Copy link
Member Author

ml-evs commented Mar 26, 2020

I'll give @CasperWA a while (~week) to look before merging this.

Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking awesome, thanks @ml-evs!

Only a few minor comments.
I am truly enjoying that the Collection has been trimmed a bit, and the Transformer can handle all transformation-related aspects of the query, instead of having a bit here and there.

optimade/filtertransformers/mongo.py Outdated Show resolved Hide resolved
optimade/filtertransformers/mongo.py Outdated Show resolved Hide resolved
optimade/filtertransformers/mongo.py Show resolved Hide resolved
optimade/filtertransformers/mongo.py Show resolved Hide resolved
optimade/server/mappers/entries.py Show resolved Hide resolved
optimade/server/mappers/entries.py Show resolved Hide resolved
tests/filtertransformers/test_mongo.py Show resolved Hide resolved

def test_unaliased_length_operator(self):
transformer = MongoTransformer()
parser = LarkParser(version=self.version, variant=self.variant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as the previous comment. And even the transformer doesn't need to be redefined, since it should be t from line 16?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformer in these tests needs to take a mapper as an argument, so the same one can't be reused. This means the parser also can't be reused (its out of scope in the methods, we only expose the transform method which is tied to the transformer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see the def setUp(), which runs just prior of this test method, why wouldn't that suffice for the tests in this method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformer in these tests needs to take a mapper as an argument (...)

But in this exact test method, the transformer doesn't take a mapper as an argument? I am confused :( I see that it makes sense to instantiate a new transformer for the other test methods you added, but not for this one.

And how is the parser out of scope? It doesn't depend on the transformer as far as I know? Or? :|

Copy link
Member

@CasperWA CasperWA Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformer in these tests needs to take a mapper as an argument, so the same one can't be reused. This means the parser also can't be reused (its out of scope in the methods, we only expose the transform method which is tied to the transformer).

AAAHHH I see the whole problem now - sorry!! Yeah, we set both the parser and transformer in the setUp method, but only keep the transform in the instance..... right, right, right.

So would it make sense then to expose the instantiated transformer and parser?
Or maybe create a transform_mapped lambda function?

Also, I think the setUp() should be setUpClass(), right? Is there a need to instantiate this before each test method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to the wrong comment initially, sorry! I've amended my commit with your suggestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would it make sense then to expose the instantiated transformer and parser?

The other two new tests can't use this anyway, so I think it's now fine as is

Copy link
Member

@CasperWA CasperWA Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'll stop poking at this.

(although, just tested locally changing this to setUpClass(), and it works only if one then makes transform an actual separate class method, and consequently "saves" the instantiated parser and transformer as class attributes.)

tests/filtertransformers/test_mongo.py Show resolved Hide resolved
tests/filtertransformers/test_mongo.py Show resolved Hide resolved
@ml-evs ml-evs requested a review from CasperWA March 30, 2020 16:19
Co-authored-by: Casper Welzel Andersen <CasperWA@users.noreply.github.com>
@ml-evs ml-evs force-pushed the ml-evs/length_operator_queries branch from 7884dea to 9be0f61 Compare March 30, 2020 16:22
Copy link
Member

@CasperWA CasperWA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C'est magnifique !

Thanks for the additions and updates. I think this quite an elegant solution in the end 👍
Please merge at your leisure.

@ml-evs ml-evs merged commit 6d9a179 into master Mar 30, 2020
@ml-evs ml-evs deleted the ml-evs/length_operator_queries branch March 30, 2020 16:49
@ml-evs ml-evs mentioned this pull request Mar 31, 2020
@CasperWA CasperWA mentioned this pull request Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority/medium Issue or PR with a consensus of medium priority transformer/MongoDB Related to filter transformer for MongoDB
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants