Quadratic performance when grouping is enabled. #621

living180 · 2021-05-19T12:07:46Z

I use sqlparse via django-debug-toolbar. I noticed very slow performance when rendering some of our queries, in particular those of the form SELECT ... FROM ... WHERE id IN (<a list of several hundred integers>).

After doing some more in-depth performance analysis, I found that there were at least two sources of quadratic performance when formatting a SQL statement with a filter stack having grouping enabled:

1. In the TokenList.group_tokens() method, when called with extend=True, the value of the group is recomputed each time the group is extended. Computing the value of the group is linear in the number of tokens in the group. For queries of the form referenced above, the group_identifier_list() function ends up calling TokenList.group_tokens() with extend=True once for each of the integers in the IN clause, resulting in quadratic behavior.
2. In the TokenList._token_matching() method, a slice of the self.tokens list is taken. Taking the slice is roughly linear in the number of tokens, and there are several places where TokenList._token_matching() is itself called a number of times proportional to the number of tokens, resulting in quadratic behavior.
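The two patterns described above can be illustrated without sqlparse itself. The following is a minimal sketch in plain Python: the `Group` class, `find_from()` and `work_for()` are hypothetical stand-ins written for this illustration, not sqlparse's actual code. Instead of timing anything, it counts the characters re-joined and the list elements copied, so the growth is easy to check.

```python
class Group:
    """Hypothetical stand-in for a token group; not sqlparse's actual class."""

    def __init__(self):
        self.tokens = []
        self.value = ""
        self.chars_joined = 0  # work counter: characters joined while rebuilding value

    def extend(self, token):
        # First pattern: the whole value is rebuilt on every extension,
        # which costs time proportional to the current size of the group.
        self.tokens.append(token)
        self.value = "".join(self.tokens)
        self.chars_joined += len(self.value)


def find_from(tokens, start, predicate, copied):
    # Second pattern: the tail of the list is sliced (copied) before scanning.
    tail = tokens[start:]
    copied[0] += len(tail)  # work counter: elements copied by the slice
    for offset, tok in enumerate(tail):
        if predicate(tok):
            return start + offset
    return None


def work_for(n):
    """Return (chars_joined, elements_copied) for a list of n digits and n commas."""
    tokens = []
    for i in range(n):
        tokens.append(str(i % 10))
        tokens.append(",")

    group = Group()
    for tok in tokens:
        group.extend(tok)

    copied = [0]
    idx = 0
    # Search for each comma in turn, restarting just past the previous match.
    while idx is not None and idx < len(tokens):
        idx = find_from(tokens, idx, lambda t: t == ",", copied)
        if idx is not None:
            idx += 1
    return group.chars_joined, copied[0]


print(work_for(100))  # (20100, 10100)
print(work_for(200))  # (80200, 40200)
```

Doubling the input roughly quadruples both counters, which is the signature of quadratic growth.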

Here is a link to an IPython notebook demonstrating this quadratic performance: https://gist.github.com/living180/ad9f83b6e1fb494e1305a281d4b552b3

I have separate fixes for each of these issues which I will submit as pull requests.
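As a rough idea of what linear-time alternatives to the two patterns can look like, here is a minimal sketch in plain Python. It is not the code from the pull requests; `LazyGroup` and `find_from()` are hypothetical names invented for this illustration. The value is computed only when it is read, and the search walks the list by index instead of copying a slice.

```python
class LazyGroup:
    """Hypothetical token group whose value is computed on demand."""

    def __init__(self):
        self.tokens = []

    def extend(self, token):
        # Constant-time append; no value is rebuilt here.
        self.tokens.append(token)

    @property
    def value(self):
        # The join happens only when the value is actually requested.
        return "".join(self.tokens)


def find_from(tokens, start, predicate):
    # Scan by index so that no copy of the list tail is made.
    for idx in range(start, len(tokens)):
        if predicate(tokens[idx]):
            return idx
    return None


group = LazyGroup()
for tok in ["1", ",", "2"]:
    group.extend(tok)
print(group.value)  # 1,2
print(find_from(["1", ",", "2", ","], 2, lambda t: t == ","))  # 3
```

The trade-off in this sketch is that every read of `value` pays for a join, so it only helps when the value is read far less often than the group is extended.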


This was referenced May 19, 2021

Avoid tokens copy #622 (Merged)

Compute TokenList.value dynamically #623 (Closed)

dvchristianbors mentioned this issue Apr 6, 2022

fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance apache/superset#19572 (Open)

This was referenced Mar 28, 2023

Compute TokenList.value dynamically (v2) #710 (Open)

Performance improvement on str(tokenlist) #281 (Closed)
