Add new check for indexing with atomic=True #251

DavidCain · 2023-03-09T22:21:47Z

Briefly, this check helps keep a long-running index creation from
blocking all reads and writes to a table.

This supplements an existing check

There is an existing check, CREATE_INDEX, which warns about creating
indices without the CONCURRENTLY argument. It's a helpful check!

However, there are some valid reasons to prefer non-concurrent index
creation. For example, concurrent index creation may struggle to
complete when applied to write-heavy tables (since Postgres will have to
make repeated passes on the table to accommodate the new writes since the
last round).

The danger of `EXCLUSIVE` locks

If you do want to create indices non-concurrently, there is still a
huge footgun to be aware of -- locks which are held in a transaction
are not released until the transaction closes.

Dangerous Django defaults

By default, any time you add a new Django field with db_index=True or
unique=True, the automatically generated Migration has:

- implicit setting of atomic=True (transactions are on)
- an ALTER TABLE...ADD COLUMN call (exclusively locks the table)
- a CREATE INDEX invocation (hopefully quick, due to nullable columns)

It's even more dangerous if you combine an ALTER TABLE call with
CREATE INDEX of an existing, non-nullable column.

Favor `atomic=False` if you must index

Index-creating migrations should have few operations. Since index
creation does not modify row-level data (and only requires locking
against writes), there's no reason to use transactions.

It's prudent to bypass the default atomic=True for index creation.
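
To make the reasoning above concrete, here is a toy model (not a query against Postgres, and not part of the linter) of which statements run while the lock taken by an earlier ALTER TABLE is still held. The function name `runs_under_exclusive_lock` and the sample statements are made up for illustration; the only assumptions are the ones stated above: with atomic=True all statements share one transaction, and with atomic=False each statement commits on its own.

```python
from typing import List


def runs_under_exclusive_lock(statements: List[str], atomic: bool) -> List[str]:
    """Return the statements that execute while an earlier ALTER TABLE lock is held.

    Toy model: with atomic=True every statement shares one transaction, so the
    lock taken by ALTER TABLE lives until the final commit. With atomic=False
    each statement commits on its own and releases its locks right away.
    """
    if not atomic:
        return []
    blocked: List[str] = []
    lock_held = False
    for sql in statements:
        if lock_held:
            blocked.append(sql)
        if sql.strip().upper().startswith("ALTER TABLE"):
            lock_held = True
    return blocked


# Hypothetical SQL for "add an indexed, nullable column".
migration_sql = [
    'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
    'CREATE INDEX "user_email" ON "users" ("email");',
]

print(runs_under_exclusive_lock(migration_sql, atomic=True))
print(runs_under_exclusive_lock(migration_sql, atomic=False))
```

In a real Django migration, the equivalent switch is the `atomic = False` class attribute on the `Migration` class.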

DavidCain · 2023-06-26T16:34:21Z

@David-Wobrock - is this repository still maintained and/or open to contributions? Do you have thoughts about this proposed check?

django_migration_linter/sql_analyser/postgresql.py

DavidCain · 2023-06-28T12:41:29Z

Thanks for the review, @fevral13 ! I tried to leave some comments explaining the counterintuitive nature of why even a fast ALTER TABLE operation is problematic when it comes before index creation.

David-Wobrock · 2023-06-30T15:20:23Z

Thanks for the PR! I'll try to have a look in the next few days 😇

I need to find some time to give some love to this repo again! Many requests and improvements that can be made 💪

David-Wobrock

Great contribution @DavidCain Thanks a lot! 💯

I have two little comments to address: handle concurrent index creation, and change the check to a warning. With these changes, we should be good to merge it IMO 👍

David-Wobrock · 2023-07-02T06:57:32Z

tests/unit/test_sql_analyser.py

@@ -23,7 +23,7 @@ def assertValidSql(self, sql, allow_warnings=False):
        errors, _, warnings = self.analyse_sql(sql)
        self.assertEqual(0, len(errors), f"Found errors in sql: {errors}")
        if not allow_warnings:
-            self.assertEqual(0, len(warnings), f"Found warnings in sql: {errors}")
+            self.assertEqual(0, len(warnings), f"Found warnings in sql: {warnings}")


Good catch 👍 Thanks!

David-Wobrock · 2023-07-02T07:05:42Z

django_migration_linter/sql_analyser/postgresql.py

+        #     https://www.postgresql.org/docs/current/sql-altertable.html
+        # (Most common example is `ALTER TABLE... ADD COLUMN`, then later `CREATE INDEX`)
+        if sql.startswith("ALTER TABLE"):
+            return has_create_index(sql_statements[i + 1 :])


The issue with re-using this function here is that we do not detect some of the cases you are describing. For instance, if we had this unit test:

diff --git a/tests/unit/test_sql_analyser.py b/tests/unit/test_sql_analyser.py
index 01a2bb3..7553796 100644
--- a/tests/unit/test_sql_analyser.py
+++ b/tests/unit/test_sql_analyser.py
@@ -320,6 +320,15 @@ class PostgresqlAnalyserTestCase(SqlAnalyserTestCase):
         ]
         self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
 
+    def test_create_concurrently_index_exclusive(self):
+        sql = [
+            "BEGIN;",
+            'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
+            'CREATE CONCURRENTLY INDEX "user_email" ON "users" ("email");',
+            "COMMIT;",
+        ]
+        self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
+
     def test_create_index_exclusive_no_lock(self):
         sql = [
             'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',

It does not trigger an error:

Found 19 test(s).
System check identified no issues (0 silenced).
..F................
======================================================================
FAIL: test_create_concurrently_index_exclusive (tests.unit.test_sql_analyser.PostgresqlAnalyserTestCase.test_create_concurrently_index_exclusive)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./django-migration-linter/tests/unit/test_sql_analyser.py", line 330, in test_create_concurrently_index_exclusive
    self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
  File "./django-migration-linter/tests/unit/test_sql_analyser.py", line 30, in assertBackwardIncompatibleSql
    self.assertNotEqual(0, len(errors), "Found no errors in sql")
AssertionError: 0 == 0 : Found no errors in sql

----------------------------------------------------------------------
Ran 19 tests in 0.003s

FAILED (failures=1)

I guess we can modify the check a bit to also take that into account :) Since, as you said, creating an index concurrently can be a long-running operation.

Great callout, and this is just an oversight on my part. I'll make sure that CREATE INDEX CONCURRENTLY is flagged as well.
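
For readers following along, a minimal sketch of a detection that also covers the CONCURRENTLY (and UNIQUE) variants might look like the following. This is not the code merged in this PR: the helper names `is_create_index` and `index_created_while_table_locked` are hypothetical, and the regex is an assumption about which statement shapes should count.

```python
import re
from typing import List

# Matches CREATE INDEX, CREATE UNIQUE INDEX, and their CONCURRENTLY variants.
CREATE_INDEX_RE = re.compile(
    r"^\s*CREATE\s+(UNIQUE\s+)?INDEX(\s+CONCURRENTLY)?\b", re.IGNORECASE
)


def is_create_index(sql: str) -> bool:
    """True for any CREATE [UNIQUE] INDEX [CONCURRENTLY] statement."""
    return CREATE_INDEX_RE.match(sql) is not None


def index_created_while_table_locked(statements: List[str]) -> bool:
    """True if, inside BEGIN...COMMIT, a CREATE INDEX follows an ALTER TABLE."""
    in_transaction = False
    lock_held = False
    for sql in statements:
        head = sql.strip().upper()
        if head.startswith("BEGIN"):
            in_transaction, lock_held = True, False
        elif head.startswith(("COMMIT", "ROLLBACK")):
            in_transaction, lock_held = False, False
        elif in_transaction and head.startswith("ALTER TABLE"):
            lock_held = True
        elif in_transaction and lock_held and is_create_index(sql):
            return True
    return False


# Same shape as the unit test above, with the CONCURRENTLY keyword placed after INDEX.
transactional_sql = [
    "BEGIN;",
    'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
    'CREATE INDEX CONCURRENTLY "user_email" ON "users" ("email");',
    "COMMIT;",
]

print(index_created_while_table_locked(transactional_sql))
```

Matching on a regex rather than a fixed "CREATE INDEX" prefix is one way to avoid missing the UNIQUE and CONCURRENTLY forms; the real analyser may well take a different approach.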

David-Wobrock · 2023-07-02T07:17:44Z

django_migration_linter/sql_analyser/postgresql.py

+            fn=has_create_index_in_transaction,
+            message="CREATE INDEX prolongs transaction, delaying lock release",
+            mode=CheckMode.TRANSACTION,
+            type=CheckType.ERROR,


I think this is the main point I see with this contribution.

The check makes total sense, and thanks for adding it 👍 We definitely want to merge it :)

1/ It's not a silver bullet. I don't think by default we should raise an error during migration linting when doing this. To me, we should warn the user instead, that it could create a long-running locking transaction. Basically, stay in the same vein as the other indexing operations which also lock the table. It's the same risk => we know it will lock the table, but we can only warn that it is potentially a long-running lock, since we don't know about the real size and load of the table.

In short:

Suggested change

-            type=CheckType.ERROR,
+            type=CheckType.WARNING,

😁

2/ It feels like a subset of CREATE_INDEX, since it checks CREATE_INDEX + being in a transaction + having an ALTER TABLE operation. I feel like there is room for something more generic, and not only when linked to index creation.
Basically, any combination of "transaction + exclusive lock operation + potential long-running operation" should trigger such a warning.
I'm completely fine starting with this specific check, we don't have to change the approach, but it's more to keep in mind that we could enhance this check to something more generic 🙂
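
As a rough illustration of what such a generic check could look like, the sketch below flags any "potentially slow" statement that runs in a transaction after a statement that takes an exclusive lock. Everything here is hypothetical: the names `TAKES_EXCLUSIVE_LOCK`, `POTENTIALLY_SLOW`, and `slow_statements_under_lock` are invented, and both prefix lists are deliberately incomplete assumptions rather than anything the linter ships.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[str], bool]


def _starts_with(*prefixes: str) -> Predicate:
    """Build a predicate that matches statements by case-insensitive prefix."""
    upper = tuple(p.upper() for p in prefixes)
    return lambda sql: sql.strip().upper().startswith(upper)


# Illustrative, incomplete lists (assumptions for this sketch only).
TAKES_EXCLUSIVE_LOCK: Sequence[Predicate] = (
    _starts_with("ALTER TABLE", "DROP TABLE", "TRUNCATE"),
)
POTENTIALLY_SLOW: Sequence[Predicate] = (
    _starts_with("CREATE INDEX", "CREATE UNIQUE INDEX", "UPDATE"),
)


def slow_statements_under_lock(statements: List[str]) -> List[str]:
    """Return slow statements that run in a transaction after a locking statement."""
    flagged: List[str] = []
    in_transaction = False
    lock_held = False
    for sql in statements:
        head = sql.strip().upper()
        if head.startswith("BEGIN"):
            in_transaction, lock_held = True, False
        elif head.startswith(("COMMIT", "ROLLBACK")):
            in_transaction, lock_held = False, False
        elif in_transaction:
            if lock_held and any(p(sql) for p in POTENTIALLY_SLOW):
                flagged.append(sql)
            if any(p(sql) for p in TAKES_EXCLUSIVE_LOCK):
                lock_held = True
    return flagged


example_sql = [
    "BEGIN;",
    'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
    'UPDATE "users" SET "email" = \'\';',
    'CREATE INDEX "user_email" ON "users" ("email");',
    "COMMIT;",
]

print(slow_statements_under_lock(example_sql))
```

Keeping the two predicate lists separate means a new lock-taking or slow statement type can be added without touching the scanning loop.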

You're absolutely right to point out that this could have some false positives (i.e. indexing a small table will generally be fast & thus there's little cause for concern). I opted for an ERROR because I can't really think of many situations in which you would ever need to be creating an index within a transaction. I understand your rationale, though -- we should probably only call it an error if it will always be the source of problems.

I'll happily make this a WARNING and reach for --warnings-as-errors to elevate the warning for our use cases.

Also totally agreed that this could be made generic for detecting any long-running queries that prolong lock release. I think the reason I targeted CREATE INDEX is that:

- index creation is a very common migration operation
- you can pretty much always move index creation out of the transaction. Other schema-modifying queries may not share that easy resolution.

I would also be very happy to start with this type of check, and replace it entirely with something more generic at a later time!

Thanks for the review & the thorough explanations! I'll be applying both your suggested changes, I think they'll help a ton.

> I'll happily make this a WARNING and reach for --warnings-as-errors to elevate the warning for our use cases.

That would be great! Thanks a lot 🙇
You can always use --warnings-as-errors CREATE_INDEX_EXCLUSIVE to mark only this one warning as an error. I think that should work 🤞

> I would also be very happy to start with this type of check, and replace it entirely with something more generic at a later time!

Entirely agreed 👍 It was just an opening for the future, I'm completely fine with the current version, handling CREATE INDEX [CONCURRENTLY].

David-Wobrock · 2023-07-03T19:06:46Z

An additional nice touch would be to add a mention of this change in the CHANGELOG.md file, as a new feature. 😇

Once you have made all the changes, we'll be able to merge this PR - and we'll release 5.0.0 of the linter 😄 It's been a long time since the last release.

@David-Wobrock

Creating indices concurrently (provided that index creation *completes*) is generally a much safer way to build an index (since it allows reads *and* writes to the table during index creation). However, due to the inherent tradeoffs of concurrent index creation, the operation will almost always take *longer* than a traditional (blocking) index creation. It's actually much *more* important to lint against concurrent index creation when an exclusive lock is open & held. Test is from a comment by @David-Wobrock (I corrected syntax, though): 3YOURMIND#251 (comment)

DavidCain · 2023-07-03T20:52:34Z

Thanks again for the feedback! I believe the outstanding requests have been addressed (it's a WARNING now, I test CONCURRENTLY, and I added a quick note to the CHANGELOG).

Thanks again for a really pleasant first PR experience. 😄 I'd be delighted to make any other changes you see fit.

We should log the *warnings* found, not errors!

While it's always a dangerous *pattern* to create an index while a transaction is open & holds an exclusive lock, in practice it's not always a big deal (for example, indexing a table with very few rows can be fast). Accordingly, downgrade to a warning.

5.0.0 will be released soon, be sure that we document this new check!

David-Wobrock

Great work @DavidCain 💯
Thanks a lot for the contribution 🙇

codecov-commenter · 2023-07-09T12:33:59Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.12 🎉

Comparison is base (2d15ea8) 93.97% compared to head (1f339d1) 94.10%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #251      +/-   ##
==========================================
+ Coverage   93.97%   94.10%   +0.12%     
==========================================
  Files          78       81       +3     
  Lines        1992     2034      +42     
==========================================
+ Hits         1872     1914      +42     
  Misses        120      120

Impacted Files	Coverage Δ
tests/test_project/settings.py	`100.00% <ø> (ø)`
django_migration_linter/sql_analyser/postgresql.py	`100.00% <100.00%> (ø)`
tests/fixtures.py	`100.00% <100.00%> (ø)`
tests/functional/test_migration_linter.py	`100.00% <100.00%> (ø)`
..._create_index_exclusive/migrations/0001_initial.py	`100.00% <100.00%> (ø)`
...eate_index_exclusive/migrations/0002_user_email.py	`100.00% <100.00%> (ø)`
.../test_project/app_create_index_exclusive/models.py	`100.00% <100.00%> (ø)`
tests/unit/test_sql_analyser.py	`100.00% <100.00%> (ø)`

fevral13 reviewed Jun 28, 2023

DavidCain requested a review from fevral13 June 28, 2023 12:41

David-Wobrock requested changes Jul 2, 2023

David-Wobrock self-assigned this Jul 2, 2023

DavidCain requested a review from David-Wobrock July 3, 2023 20:51

DavidCain and others added 5 commits July 9, 2023 14:27

Fix error output when warnings are found

d590489

We should log the *warnings* found, not errors!

Add a CHANGELOG entry for the new check

86b5c4c

5.0.0 will be released soon, be sure that we document this new check!

David-Wobrock approved these changes Jul 9, 2023

Move CHANGELOG line to new features section.

1f339d1

David-Wobrock force-pushed the dcain-create-index-exclusive branch from 114bcf7 to 1f339d1 Compare July 9, 2023 12:31

Apply updated pre-commit hooks.

a71410f

David-Wobrock merged commit a856f0d into 3YOURMIND:main Jul 9, 2023
7 checks passed
