
fix(exports,email,logs): csv formula escaping, subject CRLF stripping, UTC log pruning#40645

Merged
sha174n merged 3 commits into master from fix/email-crlf-log-utc-csv-formula
Jun 2, 2026

Conversation

@rusackas
Member

@rusackas rusackas commented Jun 2, 2026

SUMMARY

Three small correctness/hardening fixes across the email, logs, and CSV export layers:

  1. Email subject CRLF stripping (superset/utils/core.py, send_email_smtp): the subject was assigned to msg["Subject"] verbatim. CR/LF characters are now stripped (\r removed, \n collapsed to a space, then trimmed) before setting the header, so the subject value cannot fold or split into additional email headers.

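  2. UTC log pruning (superset/commands/logs/prune.py): the retention cutoff used a timezone-naive, local-time datetime.now(). It now computes the cutoff in UTC, so retention matches how Log.dttm is stored (UTC) and stays correct regardless of the server's local timezone. The initial commit used datetime.now(tz=timezone.utc); a follow-up commit switched to datetime.utcnow(), because Log.dttm is a naive UTC column and comparing it with a timezone-aware value fails on PostgreSQL.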

  3. CSV formula-prefix escaping on export (superset/utils/csv.py): formula-prefix escaping already existed (escape_value / df_to_escaped_csv, wired into viz.py, query_context_processor.py, and sql_lab/export.py). The existing matcher covered -, @, +, |, =, %. It is extended to also treat a leading tab or carriage return as dangerous, since some spreadsheet software trims that leading whitespace and then evaluates the remainder. Numeric columns remain untouched (only string/object cells are escaped).
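A minimal sketch of the escaping in item 3, assuming a simplified pattern built from the characters described above: a leading -, @, +, |, =, %, tab, or carriage return, or leading whitespace followed by one of the formula characters. `escape_cell` and `rows_to_escaped_csv` are hypothetical names, not Superset's escape_value / df_to_escaped_csv, and the stdlib csv module stands in for pandas:

```python
import csv
import io
import re
from typing import Any, Iterable, Sequence

# Assumed, simplified pattern; the real one in superset/utils/csv.py may differ.
# Alternative 1: whitespace followed by a formula character.
# Alternative 2: a formula character, tab, or carriage return at position 0.
PROBLEMATIC_CHARS_RE = re.compile(r"^\s{1,}(?=[\-@+|=%])|^[\-@+|=%\t\r]")


def escape_cell(value: str) -> str:
    """Prefix a single quote so spreadsheet software treats the cell as text."""
    if PROBLEMATIC_CHARS_RE.match(value):
        return "'" + value
    return value


def rows_to_escaped_csv(rows: Iterable[Sequence[Any]]) -> str:
    """Hypothetical helper: write rows as CSV, escaping only string cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        # Numbers pass through unchanged; only str cells can carry a formula.
        writer.writerow(
            [escape_cell(cell) if isinstance(cell, str) else cell for cell in row]
        )
    return buffer.getvalue()


output = rows_to_escaped_csv([["=cmd()", 1.5, 42], ["safe", -3, 0]])
```

Because only str cells are passed through escape_cell, numeric values such as 1.5 and -3 are written unchanged, while "=cmd()" becomes "'=cmd()".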

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Unit tests added/updated:

  • tests/unit_tests/utils/test_core.py::test_send_email_smtp_strips_crlf_from_subject — CR/LF stripped from subject.
  • tests/unit_tests/commands/logs/prune_test.py::test_prune_cutoff_is_tz_aware_utc — prune cutoff is computed in UTC (after the follow-up commit, the test asserts a naive UTC cutoff, i.e. tzinfo is None).
  • tests/unit_tests/utils/csv_tests.py: escape_value escapes a leading tab/CR; df_to_escaped_csv escapes =cmd() while leaving numeric columns untouched.

Run:

python -m pytest tests/unit_tests/utils/csv_tests.py \
  tests/unit_tests/commands/logs/prune_test.py \
  tests/unit_tests/utils/test_core.py::test_send_email_smtp_strips_crlf_from_subject -q

All pass. ruff check clean on changed files.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

🤖 Generated with Claude Code

…, UTC log pruning

- Strip CR/LF from email subjects in send_email_smtp so the value cannot
  fold/split into additional email headers.
- Compute the log prune retention cutoff in timezone-aware UTC to match how
  Log.dttm is stored, keeping retention correct regardless of server timezone.
- Extend CSV export formula-prefix escaping to also treat a leading tab or
  carriage return as dangerous, alongside the existing -, @, +, |, =, %
  handling. Numeric columns remain untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
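The subject stripping in the first bullet can be sketched as below. `strip_subject_crlf` is a hypothetical helper that follows the steps listed in the summary (drop \r, turn \n into a space, trim); a MIMEMultipart message is assumed here as a stand-in for the message that send_email_smtp builds:

```python
from email.mime.multipart import MIMEMultipart


def strip_subject_crlf(subject: str) -> str:
    """Hypothetical helper: drop CR, collapse LF to a space, then trim."""
    return subject.replace("\r", "").replace("\n", " ").strip()


# An attacker-controlled subject trying to smuggle in an extra header line.
raw_subject = "Weekly report\r\nBcc: attacker@example.com"
subject = strip_subject_crlf(raw_subject)  # "Weekly report Bcc: attacker@example.com"

msg = MIMEMultipart("alternative")
msg["Subject"] = subject
```

The injected text survives only as ordinary subject characters; with no CR or LF left, it can no longer start a new header line.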
@bito-code-review
Contributor

bito-code-review Bot commented Jun 2, 2026

Code Review Agent Run #e9b572

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • tests/unit_tests/utils/csv_tests.py - 1
Review Details
  • Files reviewed - 7 · Commit Range: 15a8bb1..15a8bb1
    • superset/commands/logs/prune.py
    • superset/utils/core.py
    • superset/utils/csv.py
    • tests/unit_tests/commands/logs/__init__.py
    • tests/unit_tests/commands/logs/prune_test.py
    • tests/unit_tests/utils/csv_tests.py
    • tests/unit_tests/utils/test_core.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito

@netlify

netlify Bot commented Jun 2, 2026

Deploy Preview for superset-docs-preview ready!

🔨 Latest commit: 15a8bb1
🔍 Latest deploy log: https://app.netlify.com/projects/superset-docs-preview/deploys/6a1e203e9802a5000892e7de
😎 Deploy Preview: https://deploy-preview-40645--superset-docs-preview.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.94%. Comparing base (2b8e31b) to head (96f375f).
⚠️ Report is 33 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #40645   +/-   ##
=======================================
  Coverage   63.94%   63.94%           
=======================================
  Files        2658     2658           
  Lines      143011   143012    +1     
  Branches    32866    32866           
=======================================
+ Hits        91454    91455    +1     
  Misses      49994    49994           
  Partials     1563     1563           
Flag Coverage Δ
hive 39.75% <50.00%> (-0.01%) ⬇️
mysql 58.40% <100.00%> (+<0.01%) ⬆️
postgres 58.47% <100.00%> (+<0.01%) ⬆️
presto 41.36% <50.00%> (-0.01%) ⬇️
python 59.96% <100.00%> (+<0.01%) ⬆️
sqlite 58.13% <100.00%> (+<0.01%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Contributor

Copilot AI left a comment


Pull request overview

Three small hardening fixes across email, log pruning, and CSV export paths, plus accompanying unit tests.

Changes:

  • send_email_smtp strips \r/\n from the subject before assigning the Subject header, preventing email header injection.
  • LogPruneCommand computes its retention cutoff with datetime.now(tz=timezone.utc) instead of timezone-naive local time.
  • The CSV problematic_chars_re regex is extended so a leading tab or carriage return is itself treated as a dangerous prefix, in addition to the existing leading-whitespace-plus-formula-char case.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File Description
superset/utils/core.py Strip CR/LF from email subject before setting the header.
superset/commands/logs/prune.py Switch retention cutoff to UTC-aware datetime.now.
superset/utils/csv.py Expand formula-escape regex to also cover leading \t and \r.
tests/unit_tests/utils/test_core.py New test asserting CRLF is stripped from email subject.
tests/unit_tests/commands/logs/prune_test.py New test asserting the prune cutoff is tz-aware UTC.
tests/unit_tests/commands/logs/__init__.py Adds package marker for the new test module.
tests/unit_tests/utils/csv_tests.py New assertions for \t/\r escaping and numeric-column preservation.

Comment thread superset/commands/logs/prune.py Outdated
Comment on lines +70 to +71
Log.dttm
< datetime.now(tz=timezone.utc) - timedelta(days=self.retention_period_days)
Member Author


Fixed — switched from datetime.now(tz=timezone.utc) to datetime.utcnow() in LogPruneCommand. Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow), so comparing with an aware datetime would raise a type error on PostgreSQL. The test was updated to assert cutoff.tzinfo is None.
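The mismatch described in this reply can be reproduced in plain Python, where ordering a naive and an aware datetime raises TypeError (in the PR the failure surfaced on the PostgreSQL side instead). A minimal sketch with fixed, hypothetical timestamps and retention period so it is deterministic:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # hypothetical retention period

# A stored row timestamp: naive, but understood to be UTC (like Log.dttm).
stored_dttm = datetime(2026, 5, 1, 12, 0)

# Fixed "now" (naive UTC) so the example gives the same result every run.
now_utc_naive = datetime(2026, 6, 2, 0, 0)
naive_cutoff = now_utc_naive - timedelta(days=RETENTION_DAYS)
aware_cutoff = naive_cutoff.replace(tzinfo=timezone.utc)

# Naive vs naive: an ordinary comparison.
expired = stored_dttm < naive_cutoff

# Naive vs aware: Python refuses to order the two values.
try:
    _ = stored_dttm < aware_cutoff
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True
```

Keeping the cutoff naive, as the follow-up commit does with datetime.utcnow(), keeps both sides of the comparison in the same representation as the column.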

Comment thread tests/unit_tests/utils/csv_tests.py Outdated
Comment on lines +66 to +71
# A leading tab or carriage return is also treated as a dangerous prefix.
result = csv.escape_value("\t=10+2")
assert result == "'\t=10+2"

result = csv.escape_value("\r=10+2")
assert result == "'\r=10+2"
Member Author


Good catch — those tests covered behavior that was already handled by the pre-existing \s{1,}(?=[...]) alternative. Updated the tests to use "\t10" and "\rfoo" instead, which specifically exercise the new ^[...\t\r] branch (tab/CR with no following dangerous char) added in this PR.

rusackas and others added 2 commits June 1, 2026 18:29
…mn type

Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow). Using
datetime.now(tz=timezone.utc) (timezone-aware) for the cutoff comparison
raises "operator does not exist: timestamp without time zone" on PostgreSQL.
Switched to datetime.utcnow() and updated the test to assert naive (tzinfo is
None) instead of aware UTC.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous tests used \t=10+2 and \r=10+2, which were already handled by
the pre-existing \s{1,}(?=[...]) alternative. The new tests use \t10 and
\rfoo (no following dangerous char) to specifically exercise the new
^[\-@+|=%\t\r] branch that was added in this PR.
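Why the test inputs had to change can be checked directly. The two patterns below are simplified reconstructions assembled from the alternatives named in this commit message (the whitespace-lookahead branch and the leading-character branch), not copies of the pattern in superset/utils/csv.py:

```python
import re

# Assumed reconstructions: before and after this PR.
OLD_RE = re.compile(r"^\s{1,}(?=[\-@+|=%])|^[\-@+|=%]")
NEW_RE = re.compile(r"^\s{1,}(?=[\-@+|=%])|^[\-@+|=%\t\r]")


def matches(value: str) -> tuple[bool, bool]:
    """Return (matched by old pattern, matched by new pattern)."""
    return OLD_RE.match(value) is not None, NEW_RE.match(value) is not None


for sample in ["\t=10+2", "\r=10+2", "\t10", "\rfoo", "plain"]:
    print(repr(sample), matches(sample))
```

"\t=10+2" and "\r=10+2" match under both patterns because the whitespace-lookahead branch already caught them; only "\t10" and "\rfoo" separate the old behavior from the new branch.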

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@bito-code-review bito-code-review Bot left a comment


Code Review Agent Run #528b2e

Actionable Suggestions - 1
  • superset/commands/logs/prune.py - 1
    • Deprecated datetime.utcnow() usage · Line 70-72
Review Details
  • Files reviewed - 3 · Commit Range: 15a8bb1..96f375f
    • superset/commands/logs/prune.py
    • tests/unit_tests/commands/logs/prune_test.py
    • tests/unit_tests/utils/csv_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Comment on lines 70 to 72
 select_stmt = sa.select(Log.id).where(
-    Log.dttm < datetime.now() - timedelta(days=self.retention_period_days)
+    Log.dttm < datetime.utcnow() - timedelta(days=self.retention_period_days)
 )
Contributor


Deprecated datetime.utcnow() usage

Replace deprecated datetime.utcnow() with datetime.now(datetime.UTC) for timezone-aware datetime handling. Also add missing trailing comma after the closing parenthesis.

Code suggestion
Check the AI-generated fix before applying
Suggested change
-select_stmt = sa.select(Log.id).where(
-    Log.dttm < datetime.utcnow() - timedelta(days=self.retention_period_days)
-)
+select_stmt = sa.select(Log.id).where(
+    Log.dttm < datetime.now(datetime.UTC) - timedelta(days=self.retention_period_days),
+)

Code Review Run #528b2e



Member Author


Intentionally leaving this as datetime.utcnow(): Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow), so switching to the aware datetime.now(datetime.UTC) would reintroduce the naive/aware comparison error on PostgreSQL that this PR fixes (see the earlier thread on this same line). The trailing comma is cosmetic and not required for a single-clause where().

Contributor


The suggestion to use datetime.now(datetime.UTC) is intended to improve timezone awareness, but your assessment that it would reintroduce a naive/aware comparison error with the existing PostgreSQL Log.dttm column is correct. Given that the current implementation relies on naive UTC datetimes, maintaining datetime.utcnow() is appropriate to avoid this runtime error. You may safely ignore this suggestion.
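If the datetime.utcnow() deprecation (Python 3.12) ever needs addressing without changing the naive-UTC semantics discussed above, one option is to build an aware UTC value and drop its tzinfo. A minimal sketch; `naive_utc_now` and `prune_cutoff` are hypothetical helpers, and it assumes prune.py imports the class via `from datetime import datetime` (as the unqualified datetime.now() and timedelta suggest), in which case the suggested datetime.UTC would also not resolve, since UTC is a module attribute (Python 3.11+), not a class attribute:

```python
from datetime import datetime, timedelta, timezone


def naive_utc_now() -> datetime:
    """Same wall-clock value as datetime.utcnow(), without the deprecated call."""
    return datetime.now(timezone.utc).replace(tzinfo=None)


def prune_cutoff(retention_period_days: int) -> datetime:
    # Hypothetical helper: naive UTC, so it compares cleanly with naive-UTC columns.
    return naive_utc_now() - timedelta(days=retention_period_days)


cutoff = prune_cutoff(90)

# The datetime *class* has no UTC attribute; datetime.UTC exists only on the module.
has_class_utc = hasattr(datetime, "UTC")
```

The result is still naive, so it avoids both the deprecation warning and the naive/aware comparison error.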

@sha174n sha174n added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Jun 2, 2026
@sha174n sha174n merged commit 093b43c into master Jun 2, 2026
61 checks passed
@sha174n sha174n deleted the fix/email-crlf-log-utc-csv-formula branch June 2, 2026 17:32
@github-project-automation github-project-automation Bot moved this from Needs Review to Approved and/or Merged in Superset Review Help Wanted Jun 2, 2026

Labels

merge-if-green If approved and tests are green, please go ahead and merge it for me size/L

Projects

Status: Approved and/or Merged

4 participants