Fix: Remove unused attachment data from ipynb files when attachment links are deleted #251152
When users pasted images into Jupyter notebook markdown cells and later deleted the attachment references from the markdown content, the base64-encoded image data was still saved in the `.ipynb` file. This caused unnecessary file bloat and potential data leakage.

Problem
The issue occurred in the serialization phase, where the `createMarkdownCellFromNotebookCell()` and `createRawCellFromNotebookCell()` functions blindly copied all attachments from cell metadata to the serialized output, regardless of whether those attachments were actually referenced in the cell content.

Solution
- Added a `getReferencedAttachmentNames()` helper function that uses a regex to detect actual attachment references in markdown content
- Only attachments that the cell content references through the attachment link syntax are included in the serialized output

Example
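As a rough illustration of the filtering described in the solution, the sketch below shows one way such a helper could work. It is not the PR's actual code: the regex, the `Attachments` type, and the `filterAttachments` function are assumptions made for this example, and only the name `getReferencedAttachmentNames()` comes from the description. It also assumes references use the usual Jupyter markdown form `![alt](attachment:name)` and that an empty result is dropped entirely.

```typescript
// Hypothetical shape: attachment name -> (MIME type -> base64 data).
type Attachments = Record<string, Record<string, string>>;

// Collect attachment names referenced as ![alt](attachment:name).
// The regex is an assumption for this sketch, not the PR's pattern.
function getReferencedAttachmentNames(source: string): Set<string> {
  const names = new Set<string>();
  const pattern = /!\[[^\]]*\]\(attachment:([^)\s]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return names;
}

// Keep only attachments that the cell source still references.
// Returns undefined when nothing remains (an assumed convention).
function filterAttachments(
  source: string,
  attachments: Attachments | undefined
): Attachments | undefined {
  if (!attachments) {
    return undefined;
  }
  const referenced = getReferencedAttachmentNames(source);
  const kept: Attachments = {};
  for (const name of Object.keys(attachments)) {
    if (referenced.has(name)) {
      kept[name] = attachments[name];
    }
  }
  return Object.keys(kept).length > 0 ? kept : undefined;
}

// Usage: "old.png" is no longer referenced, so it is not kept.
const cellSource = "Intro text\n\n![plot](attachment:plot.png)";
const cellAttachments: Attachments = {
  "plot.png": { "image/png": "AAAA" },
  "old.png": { "image/png": "BBBB" },
};
const filtered = filterAttachments(cellSource, cellAttachments);
console.log(filtered ? Object.keys(filtered) : []);
```

Filtering at serialization time, as the description states, leaves the in-memory cell metadata alone and only affects what is written to disk; whether the real implementation omits an empty attachments object is not stated in the source.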
Before this fix:
After this fix:
Changes
- `serializers.ts`: Added attachment filtering logic to `createMarkdownCellFromNotebookCell()` and `createRawCellFromNotebookCell()`
- `serializers.test.ts`: Updated existing tests and added new test cases for unused and partially used attachments

Testing
Fixes #200290.