Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 FIX distinct queries #5654

Merged
merged 8 commits into from
Sep 23, 2022
Merged

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Sep 22, 2022

Fixes #5535

The key change in this PR is changing (approximately):

query = session.query(starting_element.id)
# add projections, joins, ...

to:

query = session.query().select_from(starting_element)
# add projections, joins, ...

In this way, we no longer need to deal with the "dummy" projection,
which was causing issues with the identification of distinct items.
(see also: https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.select_from)

I also took the opportunity to do some refactoring,
principally to make the SqlaQueryBuilder._build code more understandable.

@chrisjsewell chrisjsewell mentioned this pull request Sep 22, 2022
@chrisjsewell chrisjsewell linked an issue Sep 22, 2022 that may be closed by this pull request
@sphuber
Copy link
Contributor

sphuber commented Sep 22, 2022

Cheers @chrisjsewell , seems the new regression test passes, just a minor thing with pylint but otherwise things should be good. Is this ready for review, or do you still need to work on this. Would be good to merge a.s.a.p. so I can include it in the v2.0.4 release.

@chrisjsewell
Copy link
Member Author

Just cycling home from football, then I'll be finishing it 😉

@chrisjsewell chrisjsewell marked this pull request as ready for review September 22, 2022 20:39
@chrisjsewell
Copy link
Member Author

see the updated initial comment @sphuber

Comment on lines -158 to +159
# we discard the first item of the result row,
# which was what the query was initialised with and not one of the requested projection (see self._build)
result = result[1:]

if len(result) != self._requested_projections:
raise AssertionError(
f'length of query result ({len(result)}) does not match '
f'the number of specified projections ({self._requested_projections})'
)
# SQLA will return a Row, if only certain columns are requested,
# or a database model if that is requested
if not isinstance(result, Row):
result = (result,)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only part I was a little unsure of; since there can now be only a single projection (as opposed to >=2 before), the return type can be different, whether you are returning a whole row e.g. Node or a single column e.g Node.id.
Without this, the tests were failing, but with it, they appear fine 🤷

sphuber
sphuber previously approved these changes Sep 23, 2022
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrisjsewell . Looks good, just have some really minor comments. Would also just suggest to describe the fix well in the commit message when merging, because with all the additional refactoring, it is not that easy to see what the actual fix was.

tests/orm/test_querybuilder.py Show resolved Hide resolved
Comment on lines -328 to -334
"""A method that returns an valid SQLAlchemy expression.

:param operator: The operator provided by the user ('==', '>', ...)
:param value: The value to compare with, e.g. (5.0, 'foo', ['a','b'])
:param column: an instance of sqlalchemy.orm.attributes.InstrumentedAttribute or

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to remove the docstring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a copy of the parent docstring, so then I think it is just general practice to have it "inherit" that?

@sphuber sphuber self-requested a review September 23, 2022 14:44
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go, thanks

@chrisjsewell chrisjsewell merged commit 9fa2d88 into aiidateam:main Sep 23, 2022
@chrisjsewell chrisjsewell deleted the fix-query branch September 23, 2022 15:23
sphuber pushed a commit that referenced this pull request Sep 23, 2022
This commit principally fixes issues with distinct queries.
Since refactoring for sqlalchemy 1.4,
the query has been initialised with a starting "dummy" projection from which to join.

```python
query = session.query(starting_table.id)
```

The returned projection was then removed.
This is problematic, though, when requesting distinct result rows,
because the dummy projection is also used to calculate uniqueness.

This has now been changed to use the `select_from` method:
https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.select_from,
such that now we can initialise without any projection.

```python
query = session.query().select_from(starting_table)
```

The backend QueryBuilder code is also refactored,
principally to make the `SqlaQueryBuilder._build` logic more understandable.

Cherry-pick: 9fa2d88
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryBuilder is broken in v2.0 when joining Group onto Node
2 participants