
perf(csv): avoid regex in CSV value escaping#40195

Open
dhimasardinata wants to merge 2 commits into apache:master from dhimasardinata:dhimas/optimize-csv-escape

dhimasardinata commented May 17, 2026

SUMMARY

Avoid running two regular expressions for every CSV cell passed through escape_value(). The CSV export path can call this for every object-valued cell and header, so replacing the regex checks with direct prefix checks keeps the existing CSV injection behavior while reducing per-cell overhead.

The implementation preserves the currently tested cases for formula prefixes, doubled-quote prefixes, leading whitespace, pipe escaping, and negative numeric values. It also handles whitespace-prefixed doubled-quote formulas such as ""=10+2.
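For reference, the prefix checks described above could look roughly like the following. This is an illustrative sketch under stated assumptions, not the code in this PR: escape_value_sketch, _needs_escaping and _is_negative_number are hypothetical names. The sketch assumes the previous regexes flagged a value when it starts with one of - @ + | = %, optionally after leading whitespace or a doubled quote (""), and exempted plain negative numbers such as -10. Flagged values get their pipes escaped and a leading single quote.

```python
# Illustrative sketch only, not Superset's implementation. Helper names are
# hypothetical; the trigger set and the negative-number rule are assumptions
# modeled on the regex-based escaping this PR replaces.

# Leading characters that spreadsheet software may interpret as a formula.
FORMULA_TRIGGERS = frozenset("-@+|=%")
# Characters allowed after the minus sign in an exempt negative number.
NUMBER_CHARS = frozenset("0123456789.")


def _needs_escaping(value: str) -> bool:
    # Skip leading whitespace. str.lstrip() with no arguments removes
    # characters for which str.isspace() is true, the set \s matches in a
    # str regex.
    rest = value.lstrip()
    # Skip one doubled-quote prefix. Doing this after stripping whitespace
    # also catches whitespace-prefixed doubled-quote formulas.
    if rest.startswith('""'):
        rest = rest[2:]
    return bool(rest) and rest[0] in FORMULA_TRIGGERS


def _is_negative_number(value: str) -> bool:
    # A minus sign followed by one or more digits or dots, covering the whole
    # string. Unlike a regex ending in "$", this rejects a trailing newline.
    return (
        len(value) > 1
        and value[0] == "-"
        and all(ch in NUMBER_CHARS for ch in value[1:])
    )


def escape_value_sketch(value: str) -> str:
    if _needs_escaping(value) and not _is_negative_number(value):
        # Escape pipes, then prefix a single quote so the cell is read as text.
        value = "'" + value.replace("|", "\\|")
    return value
```

Under these assumptions, escape_value_sketch("-10") returns "-10" unchanged, while escape_value_sketch("=SUM(A1:A2)") returns "'=SUM(A1:A2)". Stripping whitespace before looking for the doubled quote is one simple way to cover a value made of a space followed by ""=10+2, which a pattern anchored on either whitespace or "" (but not both in sequence) would miss.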

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Not applicable.

TESTING INSTRUCTIONS

Ran locally:

python3 -m py_compile superset/utils/csv.py tests/unit_tests/utils/csv_tests.py
git diff --check

Also ran a local microbenchmark comparing the previous regex logic with this implementation over mixed dashboard/CSV export values:

old:     per_call_us min/med/max=2.006/2.746/4.318
patched: per_call_us min/med/max=1.617/2.015/2.696
patched_speedup_vs_old=1.36x
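The benchmark script itself is not included in the PR. A comparable measurement could be sketched with the standard library's timeit module. Everything here is an assumption: the sample values, the number/repeat settings, and both function bodies (the regex pair is modeled on the previous escaping logic). Its numbers will therefore not necessarily match the ones reported above.

```python
import re
import statistics
import timeit

# Assumed stand-ins for the two implementations being compared.
_PROBLEMATIC_RE = re.compile(r'^(?:"{2}|\s{1,})(?=[\-@+|=%])|^[\-@+|=%]')
_NEGATIVE_NUMBER_RE = re.compile(r"^-[0-9.]+$")
_TRIGGERS = frozenset("-@+|=%")
_NUMBER_CHARS = frozenset("0123456789.")


def escape_with_regex(value: str) -> str:
    # "Old" path: two regex matches per value.
    needs_escaping = _PROBLEMATIC_RE.match(value) is not None
    is_negative = _NEGATIVE_NUMBER_RE.match(value) is not None
    if needs_escaping and not is_negative:
        value = "'" + value.replace("|", "\\|")
    return value


def escape_with_prefix_checks(value: str) -> str:
    # "Patched" path: plain string operations, no regex.
    rest = value.lstrip()
    if rest.startswith('""'):
        rest = rest[2:]
    needs_escaping = bool(rest) and rest[0] in _TRIGGERS
    is_negative = (
        len(value) > 1
        and value[0] == "-"
        and all(ch in _NUMBER_CHARS for ch in value[1:])
    )
    if needs_escaping and not is_negative:
        value = "'" + value.replace("|", "\\|")
    return value


# Hypothetical mix of export values (headers, text, numbers, formulas).
SAMPLE = [
    "country_name", "plain text", "2026-05-17", "-10", "-3.5",
    "=SUM(A1:A2)", "@user", "+1", "|pipe", "%pct", " =10+2", '""=10+2', "",
]


def per_call_us(fn, values, number=2000, repeat=7):
    """Return per-call times in microseconds, one entry per repeat.

    The list-comprehension overhead is included and spread across calls.
    """
    timer = timeit.Timer(lambda: [fn(v) for v in values])
    runs = timer.repeat(repeat=repeat, number=number)
    calls = number * len(values)
    return [run / calls * 1e6 for run in runs]


def summarize(name, timings):
    return (
        f"{name}: per_call_us min/med/max="
        f"{min(timings):.3f}/{statistics.median(timings):.3f}/{max(timings):.3f}"
    )


if __name__ == "__main__":
    # Both versions must agree on the sample before timing them.
    for v in SAMPLE:
        assert escape_with_regex(v) == escape_with_prefix_checks(v), v
    old = per_call_us(escape_with_regex, SAMPLE)
    new = per_call_us(escape_with_prefix_checks, SAMPLE)
    print(summarize("old", old))
    print(summarize("patched", new))
    speedup = statistics.median(old) / statistics.median(new)
    print(f"patched_speedup_vs_old={speedup:.2f}x")
```

The two functions intentionally disagree on whitespace-prefixed doubled-quote values (a space followed by ""=10+2): only the prefix-check version escapes them. Such values are left out of the sample so the equality check before timing holds.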

python3 -m pytest tests/unit_tests/utils/csv_tests.py -q was not runnable in this local checkout because pytest is not installed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

dosubot Bot added the data:csv (Related to import/export of CSVs) label May 17, 2026
bito-code-review Bot commented May 17, 2026

Code Review Agent Run #c0d5f4

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 32bb734..32bb734
    • superset/utils/csv.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito

Comment thread on superset/utils/csv.py (outdated)
dhimasardinata (author) commented:

/resolve

bito-code-review Bot commented May 17, 2026

Code Review Agent Run #e6a089

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 32bb734..20b8442
    • superset/utils/csv.py
    • tests/unit_tests/utils/csv_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 42.10526% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.15%. Comparing base (8d2b655) to head (20b8442).
⚠️ Report is 51 commits behind head on master.

Files with missing lines    Patch %    Lines
superset/utils/csv.py       42.10%     6 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40195      +/-   ##
==========================================
- Coverage   64.16%   64.15%   -0.01%     
==========================================
  Files        2591     2591              
  Lines      138162   138175      +13     
  Branches    32048    32052       +4     
==========================================
+ Hits        88647    88650       +3     
- Misses      47986    47992       +6     
- Partials     1529     1533       +4     
Flag        Coverage Δ
hive        39.45% <21.05%> (-0.01%) ⬇️
mysql       59.14% <42.10%> (-0.01%) ⬇️
postgres    59.23% <42.10%> (-0.01%) ⬇️
presto      41.14% <21.05%> (-0.01%) ⬇️
python      60.66% <42.10%> (-0.01%) ⬇️
sqlite      58.86% <42.10%> (-0.01%) ⬇️
unit        100.00% <ø> (ø)

Flags with carried forward coverage won't be shown.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

sadpandajoe added and then removed the review:checkpoint label (Last PR reviewed during the daily review standup) May 18, 2026

Labels

data:csv (Related to import/export of CSVs), size/M

3 participants