Upgrading to 2.7.11 breaks older sqlite databases #8417

hateyouinfinity · 2023-02-06T16:21:32Z

First check

I added a descriptive title to this issue.
I used the GitHub search to find a similar issue and didn't find it.
I searched the Prefect documentation for this issue.
I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

After upgrading to 2.7.11 orion started crashlooping, logs showed sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) duplicate column name: has_data.
Rolled back the migrations, downgraded to 2.7.10, tried to investigate.

Alembic upgrade never terminates (see Error).
The culpable revision is f92143d30c25, introduced in PR #8164.
It prepares flow_run_state and task_run_state tables for the subsequent migration of artifact data (in f92143d30c26)
The relevant part is that it calculates and sets the (boolean) value for has_data column based on the value of data column.
data can either be a NULL, a string 'null', or a string containing a json value.
The logic in populate_flow_has_data_in_batches is, I believe, intended to set has_data to 1 if data contains a json value and to 0 otherwise.
The current logic does not achieve that goal.

Problems

Comparing anything with a NULL via an (in)equality operator results in a NULL (see 2.)
Since the two parts of the condition are exclusive, disjunction leads to the condition evaluating to 1 for data='null' (same would happen for data=NULL if not for problem 1)

	Goal	data IS NOT NULL	data != 'null'	data IS NOT NULL or data != 'null'
NULL	0	0	NULL	NULL
'null'	0	1	0	1
JSON	1	1	1	1

Problem 2 results in false positives, migrating data from every row to artifact table. This affects runs created on every version, though I'm not sure what the impact is besides bloat.
Problem 1 results in the upgrade never terminating if >500 rows with data=NULL are present in a table.
Seems like in versions>=2.6.0 data defaults to 'null', so this should only affect databases with runs created on <=2.5.0.

Solution

Use IS NOT instead of != (again, see 2.)
Use and instead of or

	Goal	data IS NOT NULL	data IS NOT 'null'	data IS NOT NULL or data IS NOT 'null'	data IS NOT NULL and data IS NOT 'null'
NULL	0	0	1	1	0
'null'	0	1	0	1	0
JSON	1	1	1	1	1

Workaround

Stamp database with f92143d30c25, downgrade to bb38729c471a, replace two lines in revision f92143d30c25, run migrations.

Reproduction

SQL:

import sqlite3
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

stmt = """
        SELECT (PLACEHOLDER IS NOT NULL),
        (PLACEHOLDER != 'null'), 
        (PLACEHOLDER IS NOT 'null'), 
        (PLACEHOLDER IS NOT NULL or PLACEHOLDER != 'null'), 
        (PLACEHOLDER IS NOT NULL and PLACEHOLDER != 'null'), 
        (PLACEHOLDER IS NOT NULL or PLACEHOLDER IS NOT 'null'), 
        (PLACEHOLDER IS NOT NULL and PLACEHOLDER IS NOT 'null')
"""
cur.execute(stmt.replace("PLACEHOLDER", "NULL"))
print(cur.fetchall())
cur.execute(stmt.replace("PLACEHOLDER", "'null'"))
print(cur.fetchall())
cur.execute(stmt.replace("PLACEHOLDER", "'JSON'"))
print(cur.fetchall())

Complete:

pip install prefect==2.4.2
export PREFECT_ORION_DATABASE_CONNECTION_URL=sqlite+aiosqlite:///test.db
export PREFECT_ORION_DATABASE_ECHO=True
Start orion, start an agent, upload a deployment, run it at least once
pip install prefect==2.7.11
You need to have more rows with data=NULL in flow_run_state than batch_size used here, lower the value manually.
Start orion, migrations will never terminate and the server will never start. Restarting the server will leave database in a broken state since has_data has already been added, but migrations terminated prematurely.

Error

2023-02-06 13:14:13,734 INFO sqlalchemy.engine.Engine BEGIN (implicit; DBAPI should not BEGIN due to autocommit mode)
13:14:13.734 | INFO    | sqlalchemy.engine.Engine - BEGIN (implicit; DBAPI should not BEGIN due to autocommit mode)
2023-02-06 13:14:13,735 INFO sqlalchemy.engine.Engine 
            UPDATE flow_run_state
            SET has_data = (data IS NOT NULL or data != 'null')
            WHERE flow_run_state.id in (SELECT id FROM flow_run_state WHERE (has_data IS NULL) LIMIT 500);
        
13:14:13.735 | INFO    | sqlalchemy.engine.Engine - 
            UPDATE flow_run_state
            SET has_data = (data IS NOT NULL or data != 'null')
            WHERE flow_run_state.id in (SELECT id FROM flow_run_state WHERE (has_data IS NULL) LIMIT 500);
        
2023-02-06 13:14:13,735 INFO sqlalchemy.engine.Engine [generated in 0.00069s] {}

<...>

2023-02-06 13:17:51,928 INFO sqlalchemy.engine.Engine 
            UPDATE flow_run_state
            SET has_data = (data IS NOT NULL or data != 'null')
            WHERE flow_run_state.id in (SELECT id FROM flow_run_state WHERE (has_data IS NULL) LIMIT 500);
        
13:17:51.928 | INFO    | sqlalchemy.engine.Engine - 
            UPDATE flow_run_state
            SET has_data = (data IS NOT NULL or data != 'null')
            WHERE flow_run_state.id in (SELECT id FROM flow_run_state WHERE (has_data IS NULL) LIMIT 500);
        
2023-02-06 13:17:51,928 INFO sqlalchemy.engine.Engine [cached since 218.2s ago] {}
13:17:51.928 | INFO    | sqlalchemy.engine.Engine - [cached since 218.2s ago] {}

Versions

Version:             2.7.11
API version:         0.8.4
Python version:      3.10.7
Git commit:          6b27b476
Built:               Thu, Feb 2, 2023 7:22 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         hosted

Additional context

No response

The text was updated successfully, but these errors were encountered:

zanieb · 2023-02-06T16:37:28Z

Thank you for the very thorough issue!

hateyouinfinity added bug Something isn't working status:triage labels Feb 6, 2023

zanieb added great writeup This is a wonderful example of our standards priority:high and removed status:triage labels Feb 6, 2023

anticorrelator mentioned this issue Feb 6, 2023

Properly limit artifact migration to states that have non-null data #8420

Merged

zanieb closed this as completed Feb 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrading to 2.7.11 breaks older sqlite databases #8417

Upgrading to 2.7.11 breaks older sqlite databases #8417

hateyouinfinity commented Feb 6, 2023 •

edited

Loading

zanieb commented Feb 6, 2023

Upgrading to 2.7.11 breaks older sqlite databases #8417

Upgrading to 2.7.11 breaks older sqlite databases #8417

Comments

hateyouinfinity commented Feb 6, 2023 • edited Loading

First check

Bug summary

Problems

Solution

Workaround

Reproduction

Error

Versions

Additional context

zanieb commented Feb 6, 2023

hateyouinfinity commented Feb 6, 2023 •

edited

Loading