Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Updates some database columns to MediumText #27119

Merged
merged 2 commits into from
Feb 15, 2024

Conversation

michael-s-molina
Copy link
Member

@michael-s-molina michael-s-molina commented Feb 14, 2024

SUMMARY

Similarly to #24911, #24510, #20936, #20779, #5618, this PR updates the type of some database columns to MediumText to accommodate bigger text contents such as JSON, CSS or SQL. We have found some cases where SQL queries shared by users were truncated due to insufficient storage space. You can check the referenced PRs for more context on why the change is important.

TESTING INSTRUCTIONS

1 - Execute superset db upgrade and check the affected tables
2 - Make sure the nullable state is preserved
3 - Execute superset db downgrade and check the affected tables

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the risk:db-migration PRs that require a DB migration label Feb 14, 2024
Copy link

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (152cd70) 67.16% compared to head (472e27f) 69.49%.
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #27119      +/-   ##
==========================================
+ Coverage   67.16%   69.49%   +2.32%     
==========================================
  Files        1900     1900              
  Lines       74447    74449       +2     
  Branches     8297     8297              
==========================================
+ Hits        50005    51737    +1732     
+ Misses      22387    20657    -1730     
  Partials     2055     2055              
Flag Coverage Δ
hive 53.80% <100.00%> (?)
mysql 78.02% <100.00%> (+<0.01%) ⬆️
postgres 78.12% <100.00%> (+<0.01%) ⬆️
presto 53.75% <100.00%> (?)
python 83.09% <100.00%> (+4.82%) ⬆️
sqlite 77.64% <100.00%> (+<0.01%) ⬆️
unit 56.43% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


def upgrade():
for column in TABLE_COLUMNS:
with op.batch_alter_table(column.split(".")[0]) as batch_op:
Copy link
Member

@zhaoyongjie zhaoyongjie Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina, I encountered a database deadlock error when running the database migration script a few days ago. We are using PostgreSQL as Superset metadata. This change only affects MySQL, so could you please add a condition to detect if MySQL is used as metadata? If not, pass the upgrade and downgrade functions.

bind = op.get_bind()

if isinstance(bind.dialect, MySQLDialect):
    // the upgrade ......

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaoyongjie are these migrations a no-op if PostgreSQL doesn't differentiate between TEXT and MEDIUMTEXT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley PG only has TEXT type for string-like column, I'm not sure what should be done under the SQLAlchemy core.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley According to ChatGPT:

No, the alter_column function in SQLAlchemy is not a no-op if the existing type is the same as the new type. When you use the alter_column function in SQLAlchemy to alter a column, it will generate an ALTER TABLE statement that modifies the column definition in the database schema. Even if the existing type is the same as the new type, SQLAlchemy will still execute the ALTER TABLE statement to ensure that the column definition matches the intended type.
This behavior ensures consistency and helps to handle any other modifications or changes associated with the column, such as constraints or defaults.

import sqlalchemy as sa
from alembic import op

from superset.utils.core import MediumText
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to look this up, but MediumText here is:

def MediumText() -> Variant:  # pylint:disable=invalid-name
    return Text().with_variant(MEDIUMTEXT(), "mysql")

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure we only run this DDL for MySQL since MEDIUMTEXT is a MySQL-only construct? Something like if engine == 'mysql': within both upgrade and downgrade (?)

Unless SQLAlchemy is smart enough to understand this is a no-op for Postgres and others(?)

@mistercrunch
Copy link
Member

Ooops just realized after my review that similar comments had been made :)

Side note - if that's helpful - we now have a utils module that can expose methods to be reused across migrations (somehow it was tricky to introduce this) -> https://github.com/apache/superset/blob/master/superset/migrations/migration_utils.py

Meaning you can create more complex or engine-specific logic here that we'll be able to re-use in the future.

@michael-s-molina
Copy link
Member Author

so could you please add a condition to detect if MySQL is used as metadata?

Can we make sure we only run this DDL for MySQL since MEDIUMTEXT is a MySQL-only construct?

@zhaoyongjie @mistercrunch Added the MySQL condition 😉

Side note - if that's helpful - we now have a utils module that can expose methods to be reused across migrations

Nice! Thanks for sharing.

Copy link
Member

@zhaoyongjie zhaoyongjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks for the upgrading.

@michael-s-molina michael-s-molina merged commit 3d645fd into apache:master Feb 15, 2024
41 checks passed
grvoicu added a commit to grvoicu/superset that referenced this pull request Feb 20, 2024
…-to-the-embedded-dashboard

* master: (1182 commits)
  fix(ci): mypy pre-commit issues (apache#27161)
  feat(Alerts and Reports): Modal redesign (apache#26202)
  refactor: Migrate ErrorBoundary to typescript (apache#27143)
  chore(tests): Remove unnecessary explicit Flask-SQLAlchemy session expunges (apache#27136)
  fix(plugins): Apply dashboard filters to comparison query in BigNumber with Time Comparison chart (apache#27138)
  fix: Duplicated toast messages (apache#27135)
  docs: add Geotab to users list (apache#27134)
  fix: Plain error message when visiting a dashboard via permalink without permissions (apache#27132)
  fix: ID param for DELETE ssh_tunnel endpoint (apache#27130)
  chore(hail mary): Update package-lock.json via npm-audit-fix (apache#26693)
  chore: lower cryptography min version to 41.0.2 (apache#27129)
  docs(miscellaneous): Export Datasoruces: export datasources exports to ZIP (apache#27120)
  fix(pivot-table-v2): Added forgotten translation pivot table v2 (apache#22840)
  fix: RLS modal overflow (apache#27128)
  refactor: Updates some database columns to MediumText (apache#27119)
  fix: gevent upgrade to 23.9.1 (apache#27112)
  fix: removes old deprecated sqllab endpoints (apache#27117)
  feat(storybook): Co-habitating/Upgrading Storybooks to v7 (dependency madness ensues) (apache#26907)
  fix: bump grpcio, urllib3 and paramiko (apache#27124)
  chore(internet_port): added new ports and removed unnecessary string class (apache#27078)
  ...
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.0.0 labels Apr 17, 2024
john-bodley added a commit to john-bodley/superset that referenced this pull request May 13, 2024
john-bodley added a commit to john-bodley/superset that referenced this pull request May 13, 2024
john-bodley added a commit that referenced this pull request May 13, 2024
aehanno pushed a commit to kosmos-education/superset that referenced this pull request May 13, 2024
john-bodley added a commit to john-bodley/superset that referenced this pull request May 14, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
john-bodley added a commit that referenced this pull request May 20, 2024
Vitor-Avila pushed a commit to Vitor-Avila/superset that referenced this pull request May 28, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request May 31, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:db-migration PRs that require a DB migration size/L 🚢 4.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants