fix(exports,email,logs): csv formula escaping, subject CRLF stripping, UTC log pruning#40645
Conversation
…, UTC log pruning - Strip CR/LF from email subjects in send_email_smtp so the value cannot fold/split into additional email headers. - Compute the log prune retention cutoff in timezone-aware UTC to match how Log.dttm is stored, keeping retention correct regardless of server timezone. - Extend CSV export formula-prefix escaping to also treat a leading tab or carriage return as dangerous, alongside the existing -, @, +, |, =, % handling. Numeric columns remain untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #e9b572Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #40645 +/- ##
=======================================
Coverage 63.94% 63.94%
=======================================
Files 2658 2658
Lines 143011 143012 +1
Branches 32866 32866
=======================================
+ Hits 91454 91455 +1
Misses 49994 49994
Partials 1563 1563
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Three small hardening fixes across email, log pruning, and CSV export paths, plus accompanying unit tests.
Changes:
send_email_smtpstrips\r/\nfrom the subject before assigning theSubjectheader, preventing email header injection.LogPruneCommandcomputes its retention cutoff withdatetime.now(tz=timezone.utc)instead of timezone-naive local time.- The CSV
problematic_chars_reregex is extended so a leading tab or carriage return is itself treated as a dangerous prefix, in addition to the existing leading-whitespace-plus-formula-char case.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| superset/utils/core.py | Strip CR/LF from email subject before setting the header. |
| superset/commands/logs/prune.py | Switch retention cutoff to UTC-aware datetime.now. |
| superset/utils/csv.py | Expand formula-escape regex to also cover leading \t and \r. |
| tests/unit_tests/utils/test_core.py | New test asserting CRLF is stripped from email subject. |
| tests/unit_tests/commands/logs/prune_test.py | New test asserting the prune cutoff is tz-aware UTC. |
| tests/unit_tests/commands/logs/init.py | Adds package marker for the new test module. |
| tests/unit_tests/utils/csv_tests.py | New assertions for \t/\r escaping and numeric-column preservation. |
| Log.dttm | ||
| < datetime.now(tz=timezone.utc) - timedelta(days=self.retention_period_days) |
There was a problem hiding this comment.
Fixed — switched from datetime.now(tz=timezone.utc) to datetime.utcnow() in LogPruneCommand. Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow), so comparing with an aware datetime would raise a type error on PostgreSQL. The test was updated to assert cutoff.tzinfo is None.
| # A leading tab or carriage return is also treated as a dangerous prefix. | ||
| result = csv.escape_value("\t=10+2") | ||
| assert result == "'\t=10+2" | ||
|
|
||
| result = csv.escape_value("\r=10+2") | ||
| assert result == "'\r=10+2" |
There was a problem hiding this comment.
Good catch — those tests covered behavior that was already handled by the pre-existing \s{1,}(?=[...]) alternative. Updated the tests to use "\t10" and "\rfoo" instead, which specifically exercise the new ^[...\t\r] branch (tab/CR with no following dangerous char) added in this PR.
…mn type Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow). Using datetime.now(tz=timezone.utc) (timezone-aware) for the cutoff comparison raises "operator does not exist: timestamp without time zone" on PostgreSQL. Switched to datetime.utcnow() and updated the test to assert naive (tzinfo is None) instead of aware UTC. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous tests used \t=10+2 and \r=10+2, which were already handled by
the pre-existing \s{1,}(?=[...]) alternative. The new tests use \t10 and
\rfoo (no following dangerous char) to specifically exercise the new
^[\-@+|=%\t\r] branch that was added in this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #528b2e
Actionable Suggestions - 1
-
superset/commands/logs/prune.py - 1
- Deprecated datetime.utcnow() usage · Line 70-72
Review Details
-
Files reviewed - 3 · Commit Range:
15a8bb1..96f375f- superset/commands/logs/prune.py
- tests/unit_tests/commands/logs/prune_test.py
- tests/unit_tests/utils/csv_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| select_stmt = sa.select(Log.id).where( | ||
| Log.dttm < datetime.now() - timedelta(days=self.retention_period_days) | ||
| Log.dttm < datetime.utcnow() - timedelta(days=self.retention_period_days) | ||
| ) |
There was a problem hiding this comment.
Replace deprecated datetime.utcnow() with datetime.now(datetime.UTC) for timezone-aware datetime handling. Also add missing trailing comma after the closing parenthesis.
Code suggestion
Check the AI-generated fix before applying
| select_stmt = sa.select(Log.id).where( | |
| Log.dttm < datetime.now() - timedelta(days=self.retention_period_days) | |
| Log.dttm < datetime.utcnow() - timedelta(days=self.retention_period_days) | |
| ) | |
| select_stmt = sa.select(Log.id).where( | |
| Log.dttm < datetime.now(datetime.UTC) - timedelta(days=self.retention_period_days), | |
| ) |
Code Review Run #528b2e
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Intentionally leaving this as datetime.utcnow(): Log.dttm is stored as a naive UTC datetime (default=datetime.utcnow), so switching to the aware datetime.now(datetime.UTC) would reintroduce the naive/aware comparison error on PostgreSQL that this PR fixes (see the earlier thread on this same line). The trailing comma is cosmetic and not required for a single-clause where().
There was a problem hiding this comment.
The suggestion to use datetime.now(datetime.UTC) is intended to improve timezone awareness, but your assessment that it would reintroduce a naive/aware comparison error with the existing PostgreSQL Log.dttm column is correct. Given that the current implementation relies on naive UTC datetimes, maintaining datetime.utcnow() is appropriate to avoid this runtime error. You may safely ignore this suggestion.
SUMMARY
Three small correctness/hardening fixes across the email, logs, and CSV export layers:
Email subject CRLF stripping (
superset/utils/core.py,send_email_smtp): the subject was assigned tomsg["Subject"]verbatim. CR/LF characters are now stripped (\rremoved,\ncollapsed to a space, then trimmed) before setting the header, so the subject value cannot fold or split into additional email headers.UTC log pruning (
superset/commands/logs/prune.py): the retention cutoff used a timezone-naivedatetime.now(). It now usesdatetime.now(tz=timezone.utc)so retention matches howLog.dttmis stored (UTC) and stays correct regardless of the server's local timezone.CSV formula-prefix escaping on export (
superset/utils/csv.py): formula-prefix escaping already existed (escape_value/df_to_escaped_csv, wired intoviz.py,query_context_processor.py, andsql_lab/export.py). The existing matcher covered-,@,+,|,=,%. It is extended to also treat a leading tab or carriage return as dangerous, since some spreadsheet software trims that leading whitespace and then evaluates the remainder. Numeric columns remain untouched (only string/object cells are escaped).BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
Unit tests added/updated:
tests/unit_tests/utils/test_core.py::test_send_email_smtp_strips_crlf_from_subject— CR/LF stripped from subject.tests/unit_tests/commands/logs/prune_test.py::test_prune_cutoff_is_tz_aware_utc— prune cutoff is timezone-aware UTC.tests/unit_tests/utils/csv_tests.py—escape_valueescapes leading tab/CR;df_to_escaped_csvescapes=cmd()while leaving numeric columns untouched.Run:
All pass.
ruff checkclean on changed files.ADDITIONAL INFORMATION
🤖 Generated with Claude Code