
chore(excel): strip document metadata from Excel exports #40661

Merged: rusackas merged 1 commit into master from chore/strip-excel-export-metadata on Jun 2, 2026



@rusackas (Member) commented Jun 2, 2026

SUMMARY

When generating .xlsx exports via superset/utils/excel.py (df_to_excel), the underlying xlsxwriter engine embeds workbook document properties into docProps/core.xml, notably created/modified timestamps set to the actual generation time of the file. This change resets those properties before the file is written, so exported workbooks carry a clean, neutral set of metadata rather than environment-specific details.

Specifically, df_to_excel now calls workbook.set_properties(...) to:

  • Clear the authoring/descriptive fields: title, subject, author, manager, company, category, keywords, comments, status.
  • Pin created/modified to a fixed, neutral timestamp (2000-01-01) instead of the real generation time.

The sheet data itself is untouched; only the workbook's core document properties are normalized. df_to_excel is the single shared export path used by both the chart/dashboard data export (query_context_processor) and the legacy CSV/Excel view export (views/core.py), so both exports pick up the change.
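A minimal sketch of the reset, assuming pandas' ExcelWriter with the xlsxwriter engine (where writer.book is the xlsxwriter Workbook). The names NEUTRAL_TIMESTAMP, NEUTRAL_DOCUMENT_PROPERTIES, neutralize_workbook_properties and df_to_excel_sketch are hypothetical and not copied from superset/utils/excel.py. xlsxwriter's set_properties has no "modified" key; as far as we know, the "created" value is written to both dcterms:created and dcterms:modified, which would be consistent with the before/after output below.

```python
from datetime import datetime, timezone
from io import BytesIO
from typing import Any

# Hypothetical names; the real constants in superset/utils/excel.py may differ.
NEUTRAL_TIMESTAMP = datetime(2000, 1, 1, tzinfo=timezone.utc)

NEUTRAL_DOCUMENT_PROPERTIES: dict[str, Any] = {
    # Authoring/descriptive fields listed in the summary, cleared.
    "title": "",
    "subject": "",
    "author": "",
    "manager": "",
    "company": "",
    "category": "",
    "keywords": "",
    "comments": "",
    "status": "",
    # xlsxwriter has no "modified" key; "created" is assumed to drive both
    # dcterms:created and dcterms:modified in docProps/core.xml.
    "created": NEUTRAL_TIMESTAMP,
}


def neutralize_workbook_properties(workbook: Any) -> None:
    """Overwrite the core document properties of an xlsxwriter Workbook."""
    # Pass a copy so the workbook never holds a reference to the shared constant.
    workbook.set_properties(dict(NEUTRAL_DOCUMENT_PROPERTIES))


def df_to_excel_sketch(df: Any, **kwargs: Any) -> bytes:
    """Hypothetical stand-in for df_to_excel; requires pandas and xlsxwriter."""
    import pandas as pd  # imported lazily so the helpers above stay stdlib-only

    output = BytesIO()
    with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
        # xlsxwriter serializes properties when the writer closes, so setting
        # them before the sheet is written is fine.
        neutralize_workbook_properties(writer.book)
        df.to_excel(writer, **kwargs)
    return output.getvalue()
```

Keeping the values in a module-level constant lets a unit test assert against the same timestamp the writer uses instead of duplicating it.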

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Not applicable (no UI change).

Before — docProps/core.xml contained live generation timestamps:

<dcterms:created>2026-06-02T01:55:41Z</dcterms:created>
<dcterms:modified>2026-06-02T01:55:41Z</dcterms:modified>

After — neutral, fixed values:

<dcterms:created>2000-01-01T00:00:00Z</dcterms:created>
<dcterms:modified>2000-01-01T00:00:00Z</dcterms:modified>
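To reproduce this before/after comparison without unzipping by hand, a stdlib-only helper can read both timestamps straight from the export bytes. This is a sketch: core_timestamps is a hypothetical helper name, and it assumes the standard OOXML part name docProps/core.xml and the Dublin Core terms namespace.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional

# Namespace of the dcterms:created / dcterms:modified elements.
DCTERMS_NS = "http://purl.org/dc/terms/"


def core_timestamps(xlsx_bytes: bytes) -> dict[str, Optional[str]]:
    """Return the raw created/modified strings from docProps/core.xml."""
    # An .xlsx file is a zip archive; core properties live in one XML part.
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # ElementTree addresses namespaced tags as "{uri}localname".
    return {
        name: root.findtext(f"{{{DCTERMS_NS}}}{name}")
        for name in ("created", "modified")
    }
```

Applied to the bytes of an export produced after this change, both keys should read 2000-01-01T00:00:00Z.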

TESTING INSTRUCTIONS

  • Run the unit test: pytest tests/unit_tests/utils/excel_tests.py
  • A new test test_document_properties_are_neutral generates an xlsx via df_to_excel, loads the bytes back with openpyxl, and asserts the core properties are empty/neutral and timestamps are pinned.
  • Manually: export a chart/dashboard to Excel and inspect docProps/core.xml inside the .xlsx (it is a zip); confirm authoring fields are empty and timestamps are neutral.
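The manual check in the last step can also be scripted. The sketch below (hypothetical helper non_empty_core_fields, stdlib only) lists every element in docProps/core.xml that still carries text, skipping the two timestamps because they are pinned rather than cleared. Manager and company are OOXML extended properties stored in docProps/app.xml, so this check does not cover them.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Timestamps are pinned to a fixed value rather than cleared, so skip them.
PINNED_FIELDS = {"created", "modified"}


def non_empty_core_fields(xlsx_bytes: bytes) -> dict[str, str]:
    """Map local tag name -> text for core.xml fields that are not empty."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    leftovers: dict[str, str] = {}
    for element in root:
        # Drop the "{namespace}" prefix ElementTree puts on qualified tags.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in PINNED_FIELDS:
            continue
        text = (element.text or "").strip()
        if text:
            leftovers[local_name] = text
    return leftovers
```

An empty dict means every core authoring field is clear.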

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

🤖 Generated with Claude Code

Reset workbook document properties (authoring fields and
generation timestamps) when generating .xlsx exports so the
output files do not carry environment-specific details. Pins
created/modified to a fixed neutral timestamp and clears
title/subject/author/company/category/keywords/etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bito-code-review (Contributor) commented Jun 2, 2026

Code Review Agent Run #4e7e5f

Actionable Suggestions - 0
Additional Suggestions - 1
  • superset/utils/excel.py - 1
    • Missing modified property key · Line 31-42
      The `NEUTRAL_DOCUMENT_PROPERTIES` dict does not include a 'modified' key, but `test_document_properties_are_neutral` (line 82) asserts `properties.modified == NEUTRAL_TIMESTAMP`. XlsxWriter's documented API only lists: title, subject, author, manager, company, category, keywords, comments, status. If 'modified' is being set via undocumented xlsxwriter behavior, add a comment to prevent future maintainers from removing what they perceive as unused code.
Review Details
  • Files reviewed - 2 · Commit Range: 1f6361e..1f6361e
    • superset/utils/excel.py
    • tests/unit_tests/utils/excel_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


dosubot Bot added the data:csv (Related to import/export of CSVs) and viz:charts:export (Related to exporting charts) labels Jun 2, 2026
codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.94%. Comparing base (41da35e) to head (1f6361e).
⚠️ Report is 22 commits behind head on master.

Files with missing lines Patch % Lines
superset/utils/excel.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #40661   +/-   ##
=======================================
  Coverage   63.94%   63.94%           
=======================================
  Files        2658     2658           
  Lines      143011   143015    +4     
  Branches    32866    32866           
=======================================
+ Hits        91454    91457    +3     
- Misses      49994    49995    +1     
  Partials     1563     1563           
Flag Coverage Δ
hive 39.76% <75.00%> (+<0.01%) ⬆️
mysql 58.40% <75.00%> (+<0.01%) ⬆️
postgres 58.48% <75.00%> (+<0.01%) ⬆️
presto 41.36% <75.00%> (+<0.01%) ⬆️
python 59.96% <75.00%> (+<0.01%) ⬆️
sqlite 58.13% <75.00%> (+<0.01%) ⬆️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown.

rusackas requested review from betodealmeida and sha174n June 2, 2026 05:21
sadpandajoe added the review:checkpoint label (Last PR reviewed during the daily review standup) Jun 2, 2026
sha174n added the merge-if-green label (If approved and tests are green, please go ahead and merge it for me) Jun 2, 2026
rusackas merged commit 4d2b10d into master Jun 2, 2026
74 of 76 checks passed
github-project-automation Bot moved this from Needs Review to Approved and/or Merged in Superset Review Help Wanted Jun 2, 2026
rusackas deleted the chore/strip-excel-export-metadata branch June 2, 2026 18:48
sadpandajoe removed the review:checkpoint label (Last PR reviewed during the daily review standup) Jun 3, 2026

Labels

  • data:csv: Related to import/export of CSVs
  • merge-if-green: If approved and tests are green, please go ahead and merge it for me
  • size/M
  • viz:charts:export: Related to exporting charts

Projects

Status: Approved and/or Merged


4 participants