A db migration does not take Query#apply_auto_limit into account #6767

Open
eugene-nikolaev opened this issue Feb 15, 2024 · 4 comments

Comments

@eugene-nikolaev

Issue Summary

I am trying to upgrade a Redash 10.1.0 instance to the current preview version.
The new version is a "preview" Docker image pushed on Feb 10th, 2024.

After migration:

  • my queries with scheduled refreshes stopped refreshing
  • the "Refresh" button triggers a refresh in the UI, but after reloading the page the query still has no new query result

I've discovered that this db migration:
https://github.com/getredash/redash/blob/master/migrations/versions/1038c2174f5d_make_case_insensitive_hash_of_query_text.py
does not take the "apply_auto_limit" setting of Queries into account.

So, if a Query's options contain apply_auto_limit: true, this migration computes the query hash without applying the limit. But the QueryRunner applies it as usual when generating query results, adding LIMIT 1000. As a result, the results can never be associated with their queries again.

I guess this db migration should be rewritten to use something like this:

def apply_auto_limit(self, query_text, should_apply_auto_limit):
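
For illustration, a minimal sketch of the mismatch, using the same helpers my patch below relies on (assuming current master; the exact limit-wrapped text depends on the runner and the sqlparse version):

from redash.utils import gen_query_hash
from redash.query_runner import BaseSQLQueryRunner

query_text = "select current_timestamp;"

# What the migration stores today: a hash of the raw query text.
migrated_hash = gen_query_hash(query_text)

# What the QueryRunner hashes at execution time: the limit-wrapped text,
# something like "select current_timestamp LIMIT 1000".
runner = BaseSQLQueryRunner({})
runtime_hash = gen_query_hash(runner.apply_auto_limit(query_text, True))

# The two differ, so new results are stored under runtime_hash and are
# never matched back to the query row migrated with migrated_hash.
assert migrated_hash != runtime_hash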

Steps to Reproduce

  1. Install 10.1.0. E.g. I launched https://hub.docker.com/layers/redash/redash/10.1.0.b50633/images/sha256-f3e8d95656255d9684051604a586dfd50035fa1e297e68379dc1b557f1885c0f?context=explore. Or it could be pulled from GitHub 🤷

  2. Create a data source (it can be Redash's own database as well).

  3. Create and publish a query, e.g. select current_timestamp;.

  4. Set it to refresh once a minute.

  5. Ensure the query results are refreshed by reloading the page after a few minutes.

  6. Stop the instance and update the version (use a preview image, pull the current master, or whatever you should do), keeping the same internal Redash database.

  7. Run db migrations.

  8. Run Redash.

  9. Check the query after a few minutes - it won't be refreshed, and the "Refresh" button won't lead to the expected behaviour either (see the diagnostic sketch below).
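
To confirm the mismatch on an affected instance, a diagnostic sketch along these lines can be run in a Python shell inside the server container (assuming the Query model attributes query_text, options and query_hash from current master; options gets a None-check for very old rows):

from redash import models
from redash.utils import gen_query_hash
from redash.query_runner import BaseSQLQueryRunner

runner = BaseSQLQueryRunner({})
for q in models.Query.query:
    # Recompute the hash the way the QueryRunner does at execution time.
    expected = gen_query_hash(
        runner.apply_auto_limit(q.query_text, (q.options or {}).get("apply_auto_limit", False)))
    if q.query_hash != expected:
        # New results for this query are stored under `expected` and never found again.
        print(q.id, q.query_hash, "!=", expected)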

Technical details:

@eugene-nikolaev
Author

eugene-nikolaev commented Feb 16, 2024

This is how I patched it to make it work, but it seems far from best practice, and I don't have enough context to provide a proper fix.
Should some helper be extracted?

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None


def query_hash(query_text, options):
    # Recompute the hash the way the QueryRunner does at execution time:
    # wrap the query with the auto limit first (when enabled), then hash it.
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    # Rewrite every stored query_hash using the limit-aware hashing above.
    for record in conn.execute(table.select()):
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))


def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str.lower)

@eugene-nikolaev
Author

NOTE: my migration produces different hashes on an upgrade -> revert round trip. Looking into it.

@eugene-nikolaev
Author

I'm short on time, so I ended up just making the migration irreversible 😬
Turns out there is a hassle with Snowflake queries and I don't want to mess with that at the moment.

def downgrade():
    # This is irreversible mainly because of Snowflake queries, which have a grayed-out "auto-limit"
    # option in 10.1.0. In a "preview" version the option is available, so Snowflake queries suddenly
    # become "auto-limited" and their hashes become different.
    # It would also take a hash correction, because BaseSQLQueryRunner's apply_auto_limit in the
    # "upgrade branch" has its own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than to mess with this.
    raise Exception("Irreversible migration")

@eugene-nikolaev
Author

I've encountered issues while migrating real production data - it was failing on queries that had no statements (e.g. empty, or with everything commented out), so I hacked it this way:

"""Make case insensitive hash of query text

Revision ID: 1038c2174f5d
Revises: fd4fc850d7ea
Create Date: 2023-07-16 23:10:12.885949

"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.sql import table

from redash.utils import gen_query_hash

from redash.query_runner import BaseSQLQueryRunner

import json
import sqlparse

# revision identifiers, used by Alembic.
revision = '1038c2174f5d'
down_revision = 'fd4fc850d7ea'
branch_labels = None
depends_on = None

def is_empty_statement(stmt):
    # Strip comments by re-parsing the statement text.
    # (Copying the statement object with `copy.deepcopy` fails, so just re-parse it.)
    st = sqlparse.engine.FilterStack()
    st.stmtprocess.append(sqlparse.filters.StripCommentsFilter())
    stmt = next(st.run(str(stmt)), None)
    if stmt is None:
        return True
    return str(stmt).strip() == ""


def query_hash(query_text, options):
    # Recompute the hash the way the QueryRunner does at execution time:
    # wrap the query with the auto limit first (when enabled), then hash it.
    options_json = json.loads(options)
    apply_auto_limit = options_json.get('apply_auto_limit', False)
    runner = BaseSQLQueryRunner({})
    limit_aware_query = runner.apply_auto_limit(query_text, apply_auto_limit)
    return gen_query_hash(limit_aware_query)


def change_query_hash(conn, table, query_text_to):
    for record in conn.execute(table.select()):
        # Skip queries with no executable statements (empty, or fully commented out).
        if is_empty_statement(record.query):
            continue
        query_text = query_text_to(record.query)
        options = record.options
        conn.execute(
            table
            .update()
            .where(table.c.id == record.id)
            .values(query_hash=query_hash(query_text, options)))


def upgrade():
    queries = table(
        'queries',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('query', sa.Text),
        sa.Column('query_hash', sa.String(length=10)),
        sa.Column('options', sa.Text))

    conn = op.get_bind()
    change_query_hash(conn, queries, query_text_to=str)


def downgrade():
    # This is irreversible mainly because of Snowflake queries, which have a grayed-out "auto-limit"
    # option in 10.1.0. In a "preview" version the option is available, so Snowflake queries suddenly
    # become "auto-limited" and their hashes become different.
    # It would also take a hash correction, because BaseSQLQueryRunner's apply_auto_limit in the
    # "upgrade branch" has its own opinion on semicolons.
    # To sum up - it is easier to just restore a dump than to mess with this.
    raise Exception("Irreversible migration")

😬
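
For reference, a quick sanity check of the is_empty_statement helper above (a sketch, run with the sqlparse version Redash ships):

print(is_empty_statement(""))                   # True - nothing to hash
print(is_empty_statement("-- just a comment"))  # True - comments get stripped
print(is_empty_statement("select 1"))           # False - a real statement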
