Use of original filenames #7

gwiedeman opened this issue Jun 29, 2021 · 1 comment

@gwiedeman gwiedeman commented Jun 29, 2021

In drafting the specification, we discovered multiple places where there is the potential for cross-platform filename issues. This is one downside to relying on filesystems for structure, as not all strings are valid file or directory names.

During the packaging of a mailbag, all messages are assigned a new Mailbag-Message-ID that must be filename-safe. These IDs can be UUIDs or simply sequential numbers, and they are used as filenames for new derivative files, such as when PDFs are created from an MBOX file. Unfortunately, we don’t feel that we can use the Message-ID field that’s usually included in emails, as it may not be filesystem-safe.
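
As a rough illustration of the ID assignment described above, here is a minimal Python sketch. The function names, the choice to start sequential IDs at 1, and the `<id>.<extension>` pattern for derivative files are assumptions made for the example, not something the specification defines.

```python
import uuid
from itertools import count


def assign_message_ids(messages, use_uuid=False):
    """Pair each message with a new, filename-safe Mailbag-Message-ID.

    Hypothetical helper: IDs are either UUID strings or sequential
    numbers starting at 1 (the starting value is an assumption).
    """
    counter = count(1)
    assigned = []
    for message in messages:
        message_id = str(uuid.uuid4()) if use_uuid else str(next(counter))
        assigned.append((message_id, message))
    return assigned


def derivative_filename(message_id, extension):
    """Name a derivative file (e.g. a PDF) after the Mailbag-Message-ID."""
    return f"{message_id}.{extension}"
```

Both ID styles contain only digits, hexadecimal letters, and hyphens, so they should be usable as filenames on common filesystems.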

The issue is that users may be packaging mailbags from EML files, or even from legacy use cases with PDF files, and these files already have filenames for individual messages. While they should be coming from the same filesystem and thus be safe to use for derivative files as well, comments from the working meeting suggested that it may be simpler to use Mailbag-Message-ID even in these cases. Interestingly, commenters suggested that even the filenames of EML files were not originally created by the user sending the messages. Still, we think Nicholas Garza and Gary Richardon in particular would be alarmed if these filenames were overridden during the creation of a mailbag, so we plan to keep the original filenames whenever possible.

Perhaps a bigger issue is attachments. We plan to keep the original filenames for attachments, but files embedded within MBOX or EML files may not have been created in the same filesystem being used to package a mailbag. We still think it’s important to keep the original names here, but for cases when an attachment filename is invalid, it is now required to rename the file using the Mailbag-Message-ID and document the change in an original_filenames.txt file.
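
To make the renaming rule above concrete, here is a hedged Python sketch. What counts as an "invalid" filename is not defined in this issue, so the check below assumes a conservative, Windows-oriented rule set. The `<message-id>-<index>` naming pattern (the index avoids collisions when one message has several attachments) and both function names are likewise hypothetical.

```python
import os
import re

# Assumption: reject characters that Windows filesystems refuse,
# plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_filename_safe(name):
    """Return True if the name passes the assumed cross-platform checks."""
    if not name or name in {".", ".."}:
        return False
    if _INVALID_CHARS.search(name):
        return False
    if name.endswith((" ", ".")):
        return False
    return name.split(".")[0].upper() not in _RESERVED


def attachment_name(original, message_id, index):
    """Keep a safe original name; otherwise rename from the message ID.

    Returns (name_to_use, was_renamed). Renamed files would then be
    recorded in original_filenames.txt.
    """
    if is_filename_safe(original):
        return original, False
    ext = os.path.splitext(original)[1]
    # Keep the extension only when it is plainly safe itself.
    if not re.fullmatch(r"\.[A-Za-z0-9]{1,10}", ext):
        ext = ""
    return f"{message_id}-{index}{ext}", True
```

For example, `attachment_name("re: budget?.pdf", "7", 1)` would produce the name `7-1.pdf`, and the packaging tool would note the original name alongside it.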

@nkrabben nkrabben commented Jul 21, 2021

We've found that any type of file system metadata that we would like to retain, including file names, is far easier to preserve when copied to a separate text document like the proposed original_filenames.txt file.

Is original_filenames.txt in the proposed spec yet? I'm curious whether it would be better to have a complete listing of all attachments, with optional original filenames, like an attachments.csv file with columns for mailbag-message-id, original-filename, and mailbag-attachment-id.
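
A minimal Python sketch of what writing such a listing could look like, using the three column names suggested above. The function name and the attachment ID format in the usage note are purely illustrative assumptions.

```python
import csv

COLUMNS = ["mailbag-message-id", "original-filename", "mailbag-attachment-id"]


def write_attachments_csv(rows, fh):
    """Write one row per attachment to an open text file handle.

    Each row is (message_id, original_filename, attachment_id).
    The csv module quotes fields containing commas or quotes, so
    original filenames survive unchanged.
    """
    writer = csv.writer(fh)
    writer.writerow(COLUMNS)
    for message_id, original_filename, attachment_id in rows:
        writer.writerow([message_id, original_filename, attachment_id])
```

Calling it with `open("attachments.csv", "w", newline="", encoding="utf-8")` and a row such as `("1", "budget, final.xlsx", "1-1")` would store the original name even though it contains a comma.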

@gwiedeman gwiedeman added this to In progress in Mailbag tool development Aug 16, 2021
@gwiedeman gwiedeman removed this from In progress in Mailbag tool development Aug 16, 2021