New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow fix-tree queries on RDS Postgres #6522

Open
ecmendenhall opened this Issue Sep 17, 2018 · 0 comments

Comments

Projects
None yet
1 participant
@ecmendenhall

ecmendenhall commented Sep 17, 2018

Summary

Hi DjangoCMS team! We recently upgraded our DjangoCMS site from 3.4.2 to 3.5.2 and ran into similar issues with very slow queries during the fix-tree management command as reported in #5883. I know that issue was closed as an upstream issue in Treebeard, but we ran into an interesting interaction between RDS Postgres and the CMS plugin tree and I wanted to write it up here for the benefit of others.

Here's the whole long story:

We have about ~75k rows in the cms_cmsplugin table. The fix-tree command took about 90 minutes to complete running against an RDS m4.4xlarge, including one particularly gnarly 17-minute query:

SELECT tbn1.path, tbn1.numchild, (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____') FROM "cms_cmsplugin" AS tbn1 WHERE tbn1.numchild != (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____')

We have a statement timeout set to kill long-running queries and this one was killed right away the first time we tried it in our staging environment.

Here's the query plan Postgres returned when I ran an EXPLAIN:

gc2_101=> EXPLAIN SELECT tbn1.path, tbn1.numchild, (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____') FROM "cms_cmsplugin" AS tbn1 WHERE tbn1.numchild != (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____')
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Seq Scan on cms_cmsplugin tbn1  (cost=0.00..279954175.25 rows=67567 width=19)
   Filter: (numchild <> (SubPlan 2))
   SubPlan 1
     ->  Aggregate  (cost=2066.45..2066.47 rows=1 width=8)
           ->  Seq Scan on cms_cmsplugin tbn2  (cost=0.00..2065.61 rows=340 width=0)
                 Filter: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
   SubPlan 2
     ->  Aggregate  (cost=2066.45..2066.47 rows=1 width=8)
           ->  Seq Scan on cms_cmsplugin tbn2_1  (cost=0.00..2065.61 rows=340 width=0)
                 Filter: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
(10 rows)

Note that those LIKE queries aren't using an index, even though I think they should be. (The note in the Treebeard docs here indicates the same thing):

The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

After observing a few other queries, I noticed none of our LIKE queries seemed to be using indexes. After reading up on LIKE queries and indexes in the Postgres docs, I found the following in the locale documentation:

The locale settings influence the following SQL features:

Sort order in queries using ORDER BY or the standard comparison operators on textual data

The upper, lower, and initcap functions

Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressions

The to_char family of functions

The ability to use indexes with LIKE clauses

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

Sure enough, the locale on our RDS instance was set to en_US.UTF-8, meaning none of our LIKE queries use indexes. It doesn't seem to be documented, but it looks like this is a default locale for RDS PostgreSQL.

Changing our locale wasn't an option, but we did manage to speed up this query by adding the pg_trgm trigram index extension. This is conveniently already available in django.contrib.postgres: https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/operations/#trigramextension.

First we enabled the extension:

from __future__ import unicode_literals

from django.db import migrations
from django.contrib.postgres.operations import TrigramExtension


class Migration(migrations.Migration):

    dependencies = [
        ('app', '0126_userdropdownmenumodel'),
    ]

    operations = [
        TrigramExtension()
    ]

...then added an index with a RunSQL operation:

from __future__ import unicode_literals

from django.db import migrations
from django.db.migrations.operations import RunSQL


class Migration(migrations.Migration):

    dependencies = [
        ('app', '0127_add_postgres_trigram_extension'),
    ]

    operations = [
        RunSQL(
            'CREATE INDEX cms_cmsplugin_path_idx ON cms_cmsplugin USING GIN (path gin_trgm_ops);',
            'DROP INDEX cms_cmsplugin_path_idx;'
        )
    ]

This brought the query down to ~10s. Here's the query plan after adding the index:

gc2_101=> EXPLAIN ANALYZE SELECT tbn1.path, tbn1.numchild, (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____') FROM "cms_cmsplugin" AS tbn1 WHERE tbn1.numchild != (SELECT COUNT(1) FROM "cms_cmsplugin" AS tbn2 WHERE tbn2.path LIKE tbn1.path||'____');
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on cms_cmsplugin tbn1  (cost=0.00..99434642.58 rows=67622 width=19) (actual time=10320.180..10320.180 rows=0 loops=1)
   Filter: (numchild <> (SubPlan 2))
   Rows Removed by Filter: 67962
   SubPlan 1
     ->  Aggregate  (cost=733.36..733.37 rows=1 width=8) (never executed)
           ->  Bitmap Heap Scan on cms_cmsplugin tbn2  (cost=18.64..732.51 rows=340 width=0) (never executed)
                 Recheck Cond: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
                 ->  Bitmap Index Scan on cms_cmsplugin_idx_path  (cost=0.00..18.55 rows=340 width=0) (never executed)
                       Index Cond: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
   SubPlan 2
     ->  Aggregate  (cost=733.36..733.37 rows=1 width=8) (actual time=0.150..0.151 rows=1 loops=67962)
           ->  Bitmap Heap Scan on cms_cmsplugin tbn2_1  (cost=18.64..732.51 rows=340 width=0) (actual time=0.142..0.149 rows=1 loops=67962)
                 Recheck Cond: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
                 Rows Removed by Index Recheck: 1
                 Heap Blocks: exact=96681
                 ->  Bitmap Index Scan on cms_cmsplugin_idx_path  (cost=0.00..18.55 rows=340 width=0) (actual time=0.137..0.137 rows=2 loops=67962)
                       Index Cond: ((path)::text ~~ ((tbn1.path)::text || '____'::text))
 Planning time: 0.169 ms
 Execution time: 10320.251 ms
(19 rows)

I'm omitting "expected" and "actual" below—it would be great for this command to be as fast as possible, but I primarily wanted to write this up for other Postgres RDS users who may encounter a similar issue! I think this is an issue at the intersection of RDS, treebeard and DjangoCMS, and the best fix here might just be to document it.

Environment

  • Python version: 3.5.2
  • Django version: 1.11.14
  • django CMS version: 3.5.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment