[migration] Stripping leading and trailing whitespace #8261

john-bodley
Member

@john-bodley john-bodley commented Sep 19, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Similar to #7084, this PR strips leading and trailing whitespace (produced when using the forms) from specific non-NULL string columns. Previously, #7084 dealt only with the cases where the values consisted solely of whitespace.

We ran into this issue when trying to debug a security issue where the table names weren't matching due to trailing whitespace. It seems prudent that the records in Superset's schema be trustworthy, and it's more desirable to clean this up via a migration than to call .strip() every time we deal with records from the database.

Note that, depending on the state of the database, stripping whitespace may lead to duplicate names, which could violate uniqueness constraints and cause the migration to fail. Manual intervention may be required. For example, the following MySQL query identifies SQL metrics which become duplicates after spaces (including tabs) are removed:

SELECT 
    table_id, 
    TRIM(TRIM(BOTH '\t' FROM metric_name)) AS metric_name,
    COUNT(1) AS cnt
FROM 
    sql_metrics 
GROUP BY 
    table_id,
    TRIM(TRIM(BOTH '\t' FROM metric_name))
HAVING 
    cnt > 1
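
For anyone who prefers to run the same check outside the database, here is a minimal Python sketch of the idea. It assumes the (table_id, metric_name) pairs have already been fetched into memory; the helper name find_collisions and the sample rows are hypothetical and not part of Superset.

```python
from collections import defaultdict


def find_collisions(rows):
    """Group (table_id, metric_name) rows that become identical once stripped.

    `rows` is an iterable of (table_id, metric_name) tuples. This helper is a
    hypothetical illustration, not code from the migration.
    """
    groups = defaultdict(list)
    for table_id, metric_name in rows:
        if metric_name is None:
            continue  # NULL values are out of scope for the migration
        # Key on the stripped name so " count" and "count " land together.
        groups[(table_id, metric_name.strip())].append(metric_name)
    # Keep only keys shared by more than one original value.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up sample data: two names in table 1 collide after stripping.
rows = [
    (1, "count"),
    (1, "count "),
    (1, "\tsum"),
    (2, "count"),
]
collisions = find_collisions(rows)
print(collisions)
```

Any key in the returned dictionary points at records that would need to be renamed or merged by hand before running the migration.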

TEST PLAN

Ran superset db upgrade (and superset db downgrade) and verified that leading and trailing whitespace was removed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

to: @etr2460 @graceguo-supercat @michellethomas @mistercrunch

@john-bodley john-bodley added the risk:db-migration (PRs that require a DB migration) label Sep 19, 2019
@john-bodley john-bodley force-pushed the john-bodley--forms-strip-leading-and-trailing-whitespace branch from 5eef600 to d05fef3 Compare September 19, 2019 22:47
@john-bodley john-bodley force-pushed the john-bodley--forms-strip-leading-and-trailing-whitespace branch from d05fef3 to 6c54f8f Compare September 19, 2019 23:17
if not col.primary_key:
    value = getattr(record, col.name)

    if value is not None and re.search(r"^\s+|\s+$", value):
Member

why can't we just call strip on everything?

Member Author

I'm not exactly sure how SQLAlchemy's ORM works and how it tracks modifications, but I wanted to make sure we were only updating those records which need mutating.
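
To illustrate the intent, here is a minimal sketch of the guard using a plain dict as a stand-in for an ORM record. It makes no claim about how SQLAlchemy tracks changes; the function strip_record and the sample record are hypothetical, and the isinstance check is an assumption that replaces the `is not None` test in the diff.

```python
import re

# Same pattern as in the diff: whitespace at the start or at the end.
NEEDS_STRIP = re.compile(r"^\s+|\s+$")


def strip_record(record):
    """Strip string values in place and return the names of changed columns.

    `record` is a plain dict standing in for an ORM record (an assumption
    made for illustration). Values that are already clean are never
    reassigned, so only records that need mutating are touched.
    """
    changed = []
    for name, value in record.items():
        if isinstance(value, str) and NEEDS_STRIP.search(value):
            record[name] = value.strip()
            changed.append(name)
    return changed


# Made-up sample record: only table_name carries stray whitespace.
record = {"table_name": " logs ", "schema": None, "description": "ok"}
changed = strip_record(record)
print(changed, record)
```

An empty return value means the record was left alone entirely, which is the behavior the guard is meant to ensure.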

Member

@etr2460 etr2460 left a comment

makes sense, lgtm

@john-bodley john-bodley merged commit d465107 into apache:master Sep 23, 2019
@mistercrunch mistercrunch added the 🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and 🚢 0.35.0 labels Feb 28, 2024
Labels: 🏷️ bot, risk:db-migration, size/L, 🚢 0.35.0