You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I use sqlparse via django-debug-toolbar. I was noticing some very slow performance when rendering some of our queries, in particular those having the form SELECT ... FROM ... WHERE id IN (<a list of several hundred integers>).
After doing some more in-depth performance analysis, I found that there were at least two sources of quadratic performance when formatting a SQL statement with a filter stack having grouping enabled:
In the TokenList.group_tokens() method, when called with extend=True, the value of the group is recomputed each time it is extended. Computing the value of the group is linear with the number of tokens in the group. For queries of the form referenced above, the group_identifier_list() function ends up calling TokenList.group_tokens() with extend=True once for each of the integers in the IN clause, resulting in quadratic behavior.
In the TokenList._token_matching() method, a slice of the self.tokens list is taken. Taking the slice is roughly linear with the number of tokens, and there are several places where TokenList._token_matching() is also called linearly with respect to the number of tokens, resulting in quadratic behavior.
I use sqlparse via django-debug-toolbar. I was noticing some very slow performance when rendering some of our queries, in particular those having the form
SELECT ... FROM ... WHERE id IN (<a list of several hundred integers>)
.After doing some more in-depth performance analysis, I found that there were at least two sources of quadratic performance when formatting a SQL statement with a filter stack having grouping enabled:
In the
TokenList.group_tokens()
method, when called withextend=True
, the value of the group is recomputed each time it is extended. Computing the value of the group is linear with the number of tokens in the group. For queries of the form referenced above, thegroup_identifier_list()
function ends up callingTokenList.group_tokens()
withextend=True
once for each of the integers in theIN
clause, resulting in quadratic behavior.In the
TokenList._token_matching()
method, a slice of theself.tokens
list is taken. Taking the slice is roughly linear with the number of tokens, and there are several places whereTokenList._token_matching()
is also called linearly with respect to the number of tokens, resulting in quadratic behavior.Here is a link to an IPython notebook demonstrating this quadratic performance: https://gist.github.com/living180/ad9f83b6e1fb494e1305a281d4b552b3
I have separate fixes for each of these issues which I will submit as pull requests.
The text was updated successfully, but these errors were encountered: