New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peewee should not gen SQL it knows is invalid #964

Closed
keredson opened this Issue Jun 6, 2016 · 12 comments

Comments

Projects
None yet
2 participants
@keredson
Contributor

keredson commented Jun 6, 2016

Post.select().where(Person.name=='derek').get()

generates:

SELECT "t1"."id", "t1"."title", "t1"."author_id" FROM "post" AS t1 
WHERE ("t2"."name" = ?) 
LIMIT 1 OFFSET 0

which is then executed by the database, which throws an error, as "t2"."name" doesn't exist.

(IMO) it should throw some sort of unreferenced table exception, rather than relying on the database to throw the error, since it already knows that it'll never work.

this code example is admittedly contrived (and obvious what the bug is), but for more complicated queries an error thrown earlier makes the API easier to use (as the error is more explicit about the root cause). and the check is cheap.

i have a fix for this locally. it's intertwined with some other stuff, but i'll be happy to pull it into a PR if it's agreed this is the correct behavior.

@coleifer

This comment has been minimized.

Owner

coleifer commented Jun 6, 2016

How would peewee do this in a performant, reliable manner?

Furthermore, if the error gets handled by the database, why is that so much worse than having it raised by Peewee?

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 6, 2016

How would peewee do this in a performant, reliable manner?

so in https://github.com/coleifer/peewee/blob/master/peewee.py#L1485 AliasMap offers a dictionary lookup which has a side effect of adding the object to the dict if it's not already there. this is useful for generating the aliases in the first place in the from/join clauses, but the where clause should not be triggering new table aliases. in my fix i added a second lookup method that didn't have the side effect, only called by _parse_field.

Furthermore, if the error gets handled by the database, why is that so much worse than having it raised by Peewee?

i'm not writing the SQL, peewee is. and peewee knows it's writing SQL that should never work. and errors like no such column: t2.name are opaque. in this simple example it's obvious, but outside of that it's not. Something like UnreferencedTableException: Table 'Person' isn't being selected from, therefore condition 'Person.name==X' isn't a valid constraint. is more useful.

I'm not saying this is some major make or break thing here. It's just a minor tweak to make it a little better.

@coleifer

This comment has been minimized.

Owner

coleifer commented Jun 6, 2016

but the where clause should not be triggering new table aliases

Thats a pretty good idea, my concern is if it actually holds true in all cases. If you had a correlated subquery in the where clause, for instance? Not sure but there may be other edge cases?

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 6, 2016

i think so, but it's possible i don't understand the full scope of what _parse_field does. simple inner queries work at least:

Post.select().where(Post.title << Post.select(Post.title)).get()
('SELECT "t1"."id", "t1"."title", "t1"."author_id" FROM "post" AS t1 WHERE ("t1"."title" IN (SELECT "t2"."title" FROM "post" AS t2)) LIMIT 1 OFFSET 0', [])

as does:

post1 = Post.alias()
post2 = Post.alias()
post1.select().where(post1.title << post2.select(post2.title).where(post1.id==post2.id)).get()
('SELECT "t1"."id", "t1"."title", "t1"."author_id" FROM "post" AS t1 WHERE ("t1"."title" IN (SELECT "t2"."title" FROM "post" AS t2 WHERE ("t1"."id" = "t2"."id"))) LIMIT 1 OFFSET 0', [])

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 6, 2016

i found in from_() it was relying on the side effect from selecting the fields. but that was an easy fix.

also a bunch of unit test failures along the lines of:

AssertionError:
'SELECT 1 FROM (SELECT "t1"."alpha" FROM "alpha" AS t1 WHERE ("t1"."alpha" < ?) UNION SELECT "t2"."beta" FROM "beta" AS t2 WHERE ("t2"."beta" < ?) UNION SELECT "t4"."beta" FROM "beta" AS t4 WHERE ("t4"."beta" > ?)) AS cq'
 != 
'SELECT 1 FROM (SELECT "t1"."alpha" FROM "alpha" AS t1 WHERE ("t1"."alpha" < ?) UNION SELECT "t2"."beta" FROM "beta" AS t2 WHERE ("t2"."beta" < ?) UNION SELECT "t3"."beta" FROM "beta" AS t3 WHERE ("t3"."beta" > ?)) AS cq'

sigh..... :)

anyway, since you're not opposed i'll go through all these and make a dedicated PR.

@coleifer

This comment has been minimized.

Owner

coleifer commented Jun 6, 2016

Hold off on fixing the tests .. maybe just a diff of the actual code changes? I'd like to think about it before you spend too much effort on a clean patch.

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 6, 2016

can do.

derek@derek-hvst:~/projects/peewee2$ git diff
diff --git a/peewee.py b/peewee.py
index 070328d..1fb7144 100644
--- a/peewee.py
+++ b/peewee.py
@@ -1416,6 +1416,11 @@ class AliasMap(object):
             self.add(obj)
         return self._alias_map[obj]

+    def lookup(self, obj):
+        if obj in self._alias_map:
+            return self._alias_map[obj]
+        raise ProgrammingError('%s is not a part of this query' % obj)
+
     def __contains__(self, obj):
         return obj in self._alias_map

@@ -1566,7 +1571,7 @@ class QueryCompiler(object):
     def _parse_field(self, node, alias_map, conv):
         if alias_map:
             sql = '.'.join((
-                self.quote(alias_map[node.model_class]),
+                self.quote(alias_map.lookup(node.model_class)),
                 self.quote(node.db_column)))
         else:
             sql = self.quote(node.db_column)
@@ -1767,6 +1772,7 @@ class QueryCompiler(object):
             if query._from is None:
                 clauses.append(model.as_entity().alias(alias_map[model]))
             else:
+                [alias_map[x] for x in query._from] # seed the alias map
                 clauses.append(CommaClause(*query._from))

         if query._windows is not None:
@coleifer

This comment has been minimized.

Owner

coleifer commented Jun 7, 2016

Ahh, I see what you were talking about -- I wasn't clear on how you were going to tell, in _parse_field(), that you were in a place where new aliases shouldn't be added. And the way you've worked around is to pre-seed the alias map with the sources in _from. If you were wanting to take this approach, you'd probably want to look also at _joins.

I'm dubious of this approach because, to be really sure, you'd need to recursively check _from, _joins, _where, _select, to find any possibly valid sources that could be referenced.

I think I'll pass on this, but if you want it in your implementation, you can subclass Database and override the compiler_class attribute with a QueryCompiler subclass that has the above logic.

@coleifer coleifer closed this Jun 7, 2016

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 7, 2016

I'm dubious of this approach because, to be really sure, you'd need to recursively check _from, _joins, _where, _select, to find any possibly valid sources that could be referenced.

it would seem to be that _from and _join would be the only two places. _where and _select can't define new table aliases. inside the inner query they can obviously, but that would still be inside a from/join.

@coleifer

This comment has been minimized.

Owner

coleifer commented Jun 7, 2016

I was just thinking something like:

Post.select(Post.content, Tag.select(fn.GROUP_CONCAT(' ', Tag.tag)).where(Tag.post == Post.id))
@keredson

This comment has been minimized.

Contributor

keredson commented Jun 7, 2016

hmmm. looks ok:

SELECT "t1"."title", (SELECT GROUP_CONCAT(?, "t2"."tag") FROM "tag" AS t2 WHERE ("t2"."post_id" = "t1"."id")) FROM "post" AS t1'

alias still gets set/added inside the from.

keredson added a commit to keredson/peewee that referenced this issue Jun 7, 2016

@keredson

This comment has been minimized.

Contributor

keredson commented Jun 15, 2016

looks like this is going to be fixed via #983 (comment) 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment