
Fix: Remove unused attachment data from ipynb files when attachment links are deleted #251152


Draft
wants to merge 3 commits into
base: main
from


Copilot AI (Contributor) commented Jun 10, 2025

When users pasted images into Jupyter notebook markdown cells and later deleted the attachment references from the cell content, the base64-encoded image data was still saved in the .ipynb file. This bloated the file unnecessarily and could leak image data the user believed they had removed.

Problem

The issue occurred during serialization: createMarkdownCellFromNotebookCell() and createRawCellFromNotebookCell() copied every attachment from the cell metadata into the serialized output, regardless of whether the cell content still referenced it.
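A minimal sketch of the pre-fix pattern described above, assuming simplified, hypothetical shapes (`CellLike`, `SerializedCell`, `naiveMarkdownCell`) rather than VS Code's actual `NotebookCellData` and nbformat types. The attachments map is copied wholesale, so an image whose link was deleted is still written out.

```typescript
// Hypothetical, simplified shapes; not the real VS Code or nbformat types.
// As in nbformat, each attachment maps a filename to a MIME bundle.
type Attachments = Record<string, Record<string, string>>;

interface CellLike {
  value: string; // the cell's source text
  metadata?: { attachments?: Attachments };
}

interface SerializedCell {
  cell_type: 'markdown' | 'raw';
  source: string;
  metadata: Record<string, unknown>;
  attachments?: Attachments;
}

// Pre-fix behaviour (sketch): every attachment in the metadata is copied,
// whether or not the source still references it.
function naiveMarkdownCell(cell: CellLike): SerializedCell {
  const out: SerializedCell = { cell_type: 'markdown', source: cell.value, metadata: {} };
  if (cell.metadata?.attachments) {
    out.attachments = cell.metadata.attachments;
  }
  return out;
}

const cell: CellLike = {
  value: 'Here is a header', // the ![...](attachment:screenshot.png) link was deleted
  metadata: { attachments: { 'screenshot.png': { 'image/png': 'base64data...' } } },
};

console.log(Object.keys(naiveMarkdownCell(cell).attachments ?? {})); // [ 'screenshot.png' ]
```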

Solution

  • Added a getReferencedAttachmentNames() helper that uses a regular expression to find the attachment references actually present in a cell's markdown content
  • Modified both serialization functions to keep only the attachments that are referenced
  • Only attachments referenced with ![...](attachment:filename) syntax are included in the serialized output
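The filtering step can be sketched as follows. The regular expression, the `filterAttachments` helper, and the `Attachments` type are assumptions for illustration, not the code in this PR. The pattern accepts the bare `![alt](attachment:name)` form and the CommonMark angle-bracket form `![alt](<attachment:name with spaces>)`, which is one way a filename containing spaces can appear in a link destination.

```typescript
// Hypothetical shape: filename -> MIME bundle, mirroring nbformat's `attachments` field.
type Attachments = Record<string, Record<string, string>>;

// Illustrative helper; the regex used in the PR may differ.
// Group 1: name inside <attachment:...>, which may contain spaces.
// Group 2: bare attachment:name, ending at whitespace or ')' so an optional title is ignored.
function getReferencedAttachmentNames(source: string): Set<string> {
  const pattern = /!\[[^\]]*\]\(\s*(?:<attachment:([^>]+)>|attachment:([^\s)]+))/g;
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1] ?? match[2];
    if (name) {
      names.add(name);
    }
  }
  return names;
}

// Keep only attachments the source still references; undefined when none remain,
// so a serializer can omit the `attachments` key instead of writing {}.
function filterAttachments(source: string, attachments?: Attachments): Attachments | undefined {
  if (!attachments) {
    return undefined;
  }
  const referenced = getReferencedAttachmentNames(source);
  const kept: Attachments = {};
  for (const [name, bundle] of Object.entries(attachments)) {
    if (referenced.has(name)) {
      kept[name] = bundle;
    }
  }
  return Object.keys(kept).length > 0 ? kept : undefined;
}

// Usage: one reference deleted, two kept (one with a title, one with a space in its name).
const cellSource = 'Intro\n![a](attachment:one.png "Title")\n![b](<attachment:my shot.png>)';
const stored: Attachments = {
  'one.png': { 'image/png': 'base64data...' },
  'my shot.png': { 'image/png': 'base64data...' },
  'unused.png': { 'image/png': 'base64data...' },
};
console.log(Object.keys(filterAttachments(cellSource, stored) ?? {})); // [ 'one.png', 'my shot.png' ]
```

Like the description above, this sketch only recognizes the `![...](attachment:...)` form; a reference written as an HTML `<img>` tag would not be counted.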

Example

Before this fix:

# Cell content: "Here is a header" (attachment link removed)
# Metadata still contains: { attachments: { "screenshot.png": "base64data..." } }
# Serialized .ipynb still includes the unused attachment data

After this fix:

# Cell content: "Here is a header" (attachment link removed)  
# Metadata contains: { attachments: { "screenshot.png": "base64data..." } }
# Serialized .ipynb excludes unused attachment data ✅

Changes

  • serializers.ts: Added attachment filtering to createMarkdownCellFromNotebookCell() and createRawCellFromNotebookCell()
  • serializers.test.ts: Updated existing tests and added new cases for unused and partially used attachments
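A simplified sketch of how the filtering might be shared by both serializers. `toMarkdownCell`, `toRawCell`, `withUsedAttachments`, `CellLike`, and `SerializedCell` are hypothetical stand-ins for createMarkdownCellFromNotebookCell(), createRawCellFromNotebookCell(), and the real VS Code/nbformat types; `source` is kept as a single string, which nbformat permits alongside a list of lines.

```typescript
// Hypothetical, simplified shapes; not VS Code's NotebookCellData or the nbformat typings.
type Attachments = Record<string, Record<string, string>>;

interface CellLike {
  value: string;
  metadata?: { attachments?: Attachments };
}

interface SerializedCell {
  cell_type: 'markdown' | 'raw';
  source: string; // nbformat also accepts a list of lines
  metadata: Record<string, unknown>;
  attachments?: Attachments;
}

// Compact, illustrative reference scan: ![...](attachment:name) or ![...](<attachment:name>).
function referencedNames(source: string): Set<string> {
  const re = /!\[[^\]]*\]\(\s*(?:<attachment:([^>]+)>|attachment:([^\s)]+))/g;
  const names = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const name = m[1] ?? m[2];
    if (name) names.add(name);
  }
  return names;
}

// Shared step: copy only referenced attachments and omit the key when none remain.
function withUsedAttachments(out: SerializedCell, cell: CellLike): SerializedCell {
  const all = cell.metadata?.attachments;
  if (!all) {
    return out;
  }
  const used = referencedNames(cell.value);
  const kept: Attachments = {};
  for (const [name, bundle] of Object.entries(all)) {
    if (used.has(name)) kept[name] = bundle;
  }
  if (Object.keys(kept).length > 0) {
    out.attachments = kept;
  }
  return out;
}

// Hypothetical stand-ins for the two serializer functions named above.
function toMarkdownCell(cell: CellLike): SerializedCell {
  return withUsedAttachments({ cell_type: 'markdown', source: cell.value, metadata: {} }, cell);
}

function toRawCell(cell: CellLike): SerializedCell {
  return withUsedAttachments({ cell_type: 'raw', source: cell.value, metadata: {} }, cell);
}

// The Example scenario: the link was deleted but the data is still in metadata.
const edited: CellLike = {
  value: 'Here is a header',
  metadata: { attachments: { 'screenshot.png': { 'image/png': 'base64data...' } } },
};
console.log('attachments' in toMarkdownCell(edited)); // false
```

Routing both cell kinds through one helper keeps markdown and raw cells from drifting apart in how they treat attachments.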

Testing

  • Verified the scenario from the issue: attachment data is removed once its reference is deleted
  • Tested edge cases: multiple attachments, partial deletion, and filenames containing spaces
  • Ensured backward compatibility with existing attachment behavior
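The edge cases listed above can be expressed as a small table-driven check. This is a hypothetical standalone sketch built on a copy of an illustrative regex; it is not the actual test code in serializers.test.ts, and the case names and inputs are made up for illustration.

```typescript
// Self-contained copy of an illustrative reference scan (an assumption, not the PR's code).
function referencedAttachmentNames(source: string): string[] {
  const re = /!\[[^\]]*\]\(\s*(?:<attachment:([^>]+)>|attachment:([^\s)]+))/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const name = m[1] ?? m[2];
    if (name) names.push(name);
  }
  return names;
}

// Hypothetical table covering the scenarios named in the Testing list.
const cases: Array<{ name: string; source: string; expected: string[] }> = [
  { name: 'reference deleted', source: 'Here is a header', expected: [] },
  { name: 'multiple attachments', source: '![a](attachment:a.png) ![b](attachment:b.png)', expected: ['a.png', 'b.png'] },
  { name: 'partial deletion', source: '![b](attachment:b.png)', expected: ['b.png'] },
  { name: 'filename with spaces', source: '![s](<attachment:my screenshot.png>)', expected: ['my screenshot.png'] },
  { name: 'non-attachment image', source: '![x](https://example.com/x.png)', expected: [] },
];

// Compare each scan against its expected names, in order, and record failures.
const failures: string[] = [];
for (const c of cases) {
  const actual = referencedAttachmentNames(c.source);
  const ok = actual.length === c.expected.length && actual.every((n, i) => n === c.expected[i]);
  if (!ok) failures.push(c.name);
  console.log(`${ok ? 'PASS' : 'FAIL'} ${c.name}`);
}
```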

Fixes #200290.

Copilot AI and others added 2 commits June 10, 2025 21:56
Co-authored-by: Yoyokrazy <12552271+Yoyokrazy@users.noreply.github.com>
Co-authored-by: Yoyokrazy <12552271+Yoyokrazy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Raw file contains image data even if it is not shown in the notebook" to "Fix: Remove unused attachment data from ipynb files when attachment links are deleted" Jun 10, 2025
Copilot AI requested a review from Yoyokrazy June 10, 2025 22:00
Copilot finished work on behalf of Yoyokrazy June 10, 2025 22:00
Development

Successfully merging this pull request may close these issues.

Raw file contains image data even if it is not shown in the notebook
2 participants