New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't unnecessarily quote entities #950

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@keredson
Contributor

keredson commented May 31, 2016

as it makes the SQL not very human-friendly.

keredson added some commits May 31, 2016

make genned SQL easier on the eyes by removing unnecessary quotes (ex…
…: `select id from posts` instead of `select "id" from "posts"`)
Show outdated Hide outdated peewee.py Outdated
@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

Human-friendly? What the hell does that mean

Owner

coleifer commented Jun 1, 2016

Human-friendly? What the hell does that mean

in sqlite3 the fts content param must be quoted
example:
CREATE VIRTUAL TABLE manageddoc USING FTS4 (message TEXT NOT NULL, content="post"."message", tokenize="porter");
@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

example: select t1.id from posts as t1
instead of: select "t1"."id" from "posts" as "t1"

helps esp. w/ longer queries. (obviously identifiers that need them would still be quoted.)

Contributor

keredson commented Jun 1, 2016

example: select t1.id from posts as t1
instead of: select "t1"."id" from "posts" as "t1"

helps esp. w/ longer queries. (obviously identifiers that need them would still be quoted.)

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

I'd debate the merits of dropping the quotes...seems like they provide safety guarantees that would require some bad workarounds if dropped.

Owner

coleifer commented Jun 1, 2016

I'd debate the merits of dropping the quotes...seems like they provide safety guarantees that would require some bad workarounds if dropped.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

Plus, I'm working on a rewrite for 3.0 that will make all this work moot.

Owner

coleifer commented Jun 1, 2016

Plus, I'm working on a rewrite for 3.0 that will make all this work moot.

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

hmmm. i can't replicate the travis error locally:
ERROR: test_granular_transaction (playhouse.tests.test_sqlite_ext.TestUserDefinedCallbacks)
can you?

Plus, I'm working on a rewrite for 3.0 that will make all this work moot.

it wasn't supposed to be a lot of work - i've mostly been fighting the unit tests. (string comparisons are so frustratingly fragile...)

how extensive is this rewrite? would just about any enhancement be moot then? what's the timeframe?

I'd debate the merits of dropping the quotes...

from experience it is well worth it. the more you can make your sql look like what you'd normally type the easier it is to debug (both the lib and the code using the lib). even stuff like making the table aliases something other than t1...t8 goes a long way towards usability.

Contributor

keredson commented Jun 1, 2016

hmmm. i can't replicate the travis error locally:
ERROR: test_granular_transaction (playhouse.tests.test_sqlite_ext.TestUserDefinedCallbacks)
can you?

Plus, I'm working on a rewrite for 3.0 that will make all this work moot.

it wasn't supposed to be a lot of work - i've mostly been fighting the unit tests. (string comparisons are so frustratingly fragile...)

how extensive is this rewrite? would just about any enhancement be moot then? what's the timeframe?

I'd debate the merits of dropping the quotes...

from experience it is well worth it. the more you can make your sql look like what you'd normally type the easier it is to debug (both the lib and the code using the lib). even stuff like making the table aliases something other than t1...t8 goes a long way towards usability.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

from experience it is well worth it. the more you can make your sql look like what you'd normally type the easier it is to debug (both the lib and the code using the lib). even stuff like making the table aliases something other than t1...t8 goes a long way towards usability.

I think that a change like this, for the purposes of readability, could better be handled by a logging formatter or something similar.

Owner

coleifer commented Jun 1, 2016

from experience it is well worth it. the more you can make your sql look like what you'd normally type the easier it is to debug (both the lib and the code using the lib). even stuff like making the table aliases something other than t1...t8 goes a long way towards usability.

I think that a change like this, for the purposes of readability, could better be handled by a logging formatter or something similar.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

how extensive is this rewrite? would just about any enhancement be moot then? what's the timeframe?

Complete rewrite, more or less. From the point of view of the SQL generation, it's 100% new code.

Owner

coleifer commented Jun 1, 2016

how extensive is this rewrite? would just about any enhancement be moot then? what's the timeframe?

Complete rewrite, more or less. From the point of view of the SQL generation, it's 100% new code.

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

Complete rewrite, more or less. From the point of view of the SQL generation, it's 100% new code.

that's scary. why? (is there something broken w/ the current code?) is the API going to change too? also: where? i don't see an obvious branch for it.

you seem to have a sizable user base here. a 100% rewrite of the core of your ORM is a pretty drastic measure.

Contributor

keredson commented Jun 1, 2016

Complete rewrite, more or less. From the point of view of the SQL generation, it's 100% new code.

that's scary. why? (is there something broken w/ the current code?) is the API going to change too? also: where? i don't see an obvious branch for it.

you seem to have a sizable user base here. a 100% rewrite of the core of your ORM is a pretty drastic measure.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

that's scary. why? ...more questions...

Because I know how to implement it in a much cleaner, more robust way.

I'll try to preserve the APIs, though there are some that may change because the current APIs are confusing. Probably nothing major in the actual query building APIs but I'm not positive yet.

I haven't pushed the code because it's not ready for people to look at yet.

you seem to have a sizable user base here. a 100% rewrite of the core of your ORM is a pretty drastic measure.

I rewrote it for 2.0 and the improvements were massive.

Furthermore, the entire line of your questions comes off as extremely entitled and presumptuous.

Owner

coleifer commented Jun 1, 2016

that's scary. why? ...more questions...

Because I know how to implement it in a much cleaner, more robust way.

I'll try to preserve the APIs, though there are some that may change because the current APIs are confusing. Probably nothing major in the actual query building APIs but I'm not positive yet.

I haven't pushed the code because it's not ready for people to look at yet.

you seem to have a sizable user base here. a 100% rewrite of the core of your ORM is a pretty drastic measure.

I rewrote it for 2.0 and the improvements were massive.

Furthermore, the entire line of your questions comes off as extremely entitled and presumptuous.

@coleifer coleifer closed this Jun 1, 2016

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

Furthermore, the entire line of your questions comes off as extremely entitled and presumptuous.

wasn't intending any offense. just trying to understand the lay of the land and plan future development. just one more question then:

if there is a (currently totally private) pending 3.0 re-write, are you not accepting any external PRs until that is complete? and if so, is there an ETA on when 3.0 will be made public?

Contributor

keredson commented Jun 1, 2016

Furthermore, the entire line of your questions comes off as extremely entitled and presumptuous.

wasn't intending any offense. just trying to understand the lay of the land and plan future development. just one more question then:

if there is a (currently totally private) pending 3.0 re-write, are you not accepting any external PRs until that is complete? and if so, is there an ETA on when 3.0 will be made public?

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

I'm certainly accepting prs, but I don't think this particular pr is one I'd want to merge. No timeline but I'll definitely document the changes and make an announcement when it's closer.

Owner

coleifer commented Jun 1, 2016

I'm certainly accepting prs, but I don't think this particular pr is one I'd want to merge. No timeline but I'll definitely document the changes and make an announcement when it's closer.

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

fair enough. mind if i ask what could make it better? micro-benchmarks showing the quote method is still performant, or other improvements to the quality of the PR? or is it the goal of the PR you've no interest in?

Contributor

keredson commented Jun 1, 2016

fair enough. mind if i ask what could make it better? micro-benchmarks showing the quote method is still performant, or other improvements to the quality of the PR? or is it the goal of the PR you've no interest in?

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

Well, at first I noticed what seemed to me to be a potential performance issue. But when I thought about it, the implementation just doesn't seem elegant and the same could be easily accomplished without modifying existing logic...ie logging handler, custom query compiler etc

Owner

coleifer commented Jun 1, 2016

Well, at first I noticed what seemed to me to be a potential performance issue. But when I thought about it, the implementation just doesn't seem elegant and the same could be easily accomplished without modifying existing logic...ie logging handler, custom query compiler etc

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 1, 2016

Contributor

ignoring the unit test changes it's like 13 lines of code. that seems fairly simple to me. :)

a logging handler is a bad approach, as it's effectively going to have to implement a SQL parser to be accurate. plus the point of logging is to know exactly what was sent to the database, not some munged copy of it.

a custom query compiler is more palatable. but does anyone actually want all the quotes? i can't imagine why they would, and if that's true it makes more sense to just change the default behavior.

if the concern is safety, the quoting/no-quoting rule as implemented covers all the supported databases. See:

Postgres:

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($).

Mysql:

Backticks are to be used for table and column identifiers, but are only necessary when the identifier is a MySQL reserved keyword, or when the identifier contains whitespace characters or characters beyond a limited set (see below)

SQLite:
I can't find an official spec (http://www.sqlite.org/lang_createtable.html is the closest i found) but many stackoverflow comments suggest the same rule (alphanumeric+underscore, starting w/ an alpha).

i also just updated the reserved keyword set to have complete coverage over postgres/mysql/sqlite3/mssql: keredson@0068226

more generally, do you really not see the usability gain here? isn't this:

SELECT c1.id, c1.blog_id, c1.comment 
FROM comment AS c1 
WHERE (c1.blog_id IN (
  SELECT b1.pk 
  FROM blog AS b1 
  WHERE (b1.user_id IN (
    SELECT u1.id 
    FROM users AS u1 
    LEFT OUTER JOIN blog AS b2 ON (u1.id = b2.user_id) 
    ORDER BY u1.id
  ))
))

clearly more readable than this?

SELECT "t1"."id", "t1"."blog_id", "t1"."comment" 
FROM "comment" AS t1 
WHERE ("t1"."blog_id" IN (
  SELECT "t2"."pk" 
  FROM "blog" AS t2 
  WHERE ("t2"."user_id" IN (
    SELECT "t3"."id" 
    FROM "users" AS t3 
    LEFT OUTER JOIN "blog" AS t4 ON ("t3"."id" = "t4"."user_id") 
    ORDER BY "t3"."id"
  ))
))

i understand it's kind of tedious/shit programming and you've probably personally got better things to work on, but when you've got someone volunteering to shovel it for you... :)

Contributor

keredson commented Jun 1, 2016

ignoring the unit test changes it's like 13 lines of code. that seems fairly simple to me. :)

a logging handler is a bad approach, as it's effectively going to have to implement a SQL parser to be accurate. plus the point of logging is to know exactly what was sent to the database, not some munged copy of it.

a custom query compiler is more palatable. but does anyone actually want all the quotes? i can't imagine why they would, and if that's true it makes more sense to just change the default behavior.

if the concern is safety, the quoting/no-quoting rule as implemented covers all the supported databases. See:

Postgres:

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($).

Mysql:

Backticks are to be used for table and column identifiers, but are only necessary when the identifier is a MySQL reserved keyword, or when the identifier contains whitespace characters or characters beyond a limited set (see below)

SQLite:
I can't find an official spec (http://www.sqlite.org/lang_createtable.html is the closest i found) but many stackoverflow comments suggest the same rule (alphanumeric+underscore, starting w/ an alpha).

i also just updated the reserved keyword set to have complete coverage over postgres/mysql/sqlite3/mssql: keredson@0068226

more generally, do you really not see the usability gain here? isn't this:

SELECT c1.id, c1.blog_id, c1.comment 
FROM comment AS c1 
WHERE (c1.blog_id IN (
  SELECT b1.pk 
  FROM blog AS b1 
  WHERE (b1.user_id IN (
    SELECT u1.id 
    FROM users AS u1 
    LEFT OUTER JOIN blog AS b2 ON (u1.id = b2.user_id) 
    ORDER BY u1.id
  ))
))

clearly more readable than this?

SELECT "t1"."id", "t1"."blog_id", "t1"."comment" 
FROM "comment" AS t1 
WHERE ("t1"."blog_id" IN (
  SELECT "t2"."pk" 
  FROM "blog" AS t2 
  WHERE ("t2"."user_id" IN (
    SELECT "t3"."id" 
    FROM "users" AS t3 
    LEFT OUTER JOIN "blog" AS t4 ON ("t3"."id" = "t4"."user_id") 
    ORDER BY "t3"."id"
  ))
))

i understand it's kind of tedious/shit programming and you've probably personally got better things to work on, but when you've got someone volunteering to shovel it for you... :)

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 1, 2016

Owner

It's not so readable as to warrant the code changes, no. Not in my opinion. Having a list of SQL reserved words is awful. Using a regex on every single entity is awful. Dude, seriously. You can strip the quotes using a log handler or a custom query compiler. Jesus.

Owner

coleifer commented Jun 1, 2016

It's not so readable as to warrant the code changes, no. Not in my opinion. Having a list of SQL reserved words is awful. Using a regex on every single entity is awful. Dude, seriously. You can strip the quotes using a log handler or a custom query compiler. Jesus.

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 2, 2016

Contributor

Having a list of SQL reserved words is awful.

this is a database ORM. why is it being aware of SQL reserved words awful?

Using a regex on every single entity is awful.

just stating an "it's bad" opinion without any explanation of why is of little use. you've already conceded your original performance concern is likely unwarranted. you use re elsewhere, so you're clearly not opposed to them in general. regex seems a reasonable solution to a "does this string follow these rules?" conditional to me.

Contributor

keredson commented Jun 2, 2016

Having a list of SQL reserved words is awful.

this is a database ORM. why is it being aware of SQL reserved words awful?

Using a regex on every single entity is awful.

just stating an "it's bad" opinion without any explanation of why is of little use. you've already conceded your original performance concern is likely unwarranted. you use re elsewhere, so you're clearly not opposed to them in general. regex seems a reasonable solution to a "does this string follow these rules?" conditional to me.

@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer

coleifer Jun 2, 2016

Owner

I never concede! It's still an issue. But even if it weren't, these changes still are bad.

Owner

coleifer commented Jun 2, 2016

I never concede! It's still an issue. But even if it weren't, these changes still are bad.

@keredson

This comment has been minimized.

Show comment
Hide comment
@keredson

keredson Jun 2, 2016

Contributor

I never concede! It's still an issue.

joking aside, that's a concern i can work with! :)

here's with the quoting regex (and set check):

$ python test_quote_speed.py 
10000 loops, 11.901063 total time, 0.001190 per iteration
$ python test_quote_speed.py 
10000 loops, 11.715667 total time, 0.001172 per iteration
$ python test_quote_speed.py 
10000 loops, 11.997566 total time, 0.001200 per iteration

here's without it (rev 7670c9e):

$ python test_quote_speed.py 
10000 loops, 11.948870 total time, 0.001195 per iteration
$ python test_quote_speed.py 
10000 loops, 11.784187 total time, 0.001178 per iteration
$ python test_quote_speed.py 
10000 loops, 11.700121 total time, 0.001170 per iteration

you can see it's within the statistical noise.

test code:

import logging, os, timeit
import peewee as pw

db = pw.PostgresqlDatabase('test_quoting_speed')

class Person(pw.Model):
  first_name = pw.CharField()
  class Meta:
    database = db

db.connect()
db.drop_table(Person)
db.create_tables([Person])

for i in range(100):
  Person.create(first_name='Me%i' % i)

def to_time():
  list(Person.select().where(~(Person.first_name << ('Me10', 'Me20'))))

count = 10000
seconds = timeit.timeit(to_time, number=count)
print '%i loops, %f total time, %f per iteration' % (count, seconds, seconds / count)
Contributor

keredson commented Jun 2, 2016

I never concede! It's still an issue.

joking aside, that's a concern i can work with! :)

here's with the quoting regex (and set check):

$ python test_quote_speed.py 
10000 loops, 11.901063 total time, 0.001190 per iteration
$ python test_quote_speed.py 
10000 loops, 11.715667 total time, 0.001172 per iteration
$ python test_quote_speed.py 
10000 loops, 11.997566 total time, 0.001200 per iteration

here's without it (rev 7670c9e):

$ python test_quote_speed.py 
10000 loops, 11.948870 total time, 0.001195 per iteration
$ python test_quote_speed.py 
10000 loops, 11.784187 total time, 0.001178 per iteration
$ python test_quote_speed.py 
10000 loops, 11.700121 total time, 0.001170 per iteration

you can see it's within the statistical noise.

test code:

import logging, os, timeit
import peewee as pw

db = pw.PostgresqlDatabase('test_quoting_speed')

class Person(pw.Model):
  first_name = pw.CharField()
  class Meta:
    database = db

db.connect()
db.drop_table(Person)
db.create_tables([Person])

for i in range(100):
  Person.create(first_name='Me%i' % i)

def to_time():
  list(Person.select().where(~(Person.first_name << ('Me10', 'Me20'))))

count = 10000
seconds = timeit.timeit(to_time, number=count)
print '%i loops, %f total time, %f per iteration' % (count, seconds, seconds / count)
@coleifer

This comment has been minimized.

Show comment
Hide comment
@coleifer
Owner

coleifer commented May 17, 2017

Explanation #949 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment