Line endings and encoding for tag files #10

Open
gwiedeman opened this issue Jun 29, 2021 · 1 comment


@gwiedeman gwiedeman commented Jun 29, 2021

BagIt supports multiple encodings in tag files and records the one in use in the Tag-File-Character-Encoding field of bagit.txt. Bagit-python only supports UTF-8, so it seems reasonable to mandate UTF-8 in a mailbag.

BagIt is unhelpfully agnostic about line endings. It supports both LF and CRLF and has no standard place to document which one is used. Trying to detect line endings seems very problematic. Bagit-python only supports LF line endings, so any bag created with Bagit-python will have LF line endings. We discussed using a custom field in bag-info.txt for this, but decided that since Bagit-python mandates LF, we can require it in the specification.
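To make the detection problem concrete, here is a rough sketch of a check a validator could run on the raw bytes of a tag file. The function name `line_ending_styles` is hypothetical and is not part of Bagit-python or the mailbag spec; it only counts byte sequences and makes no attempt to guess intent.

```python
def line_ending_styles(data: bytes) -> set:
    """Return the line-ending styles present in raw tag-file bytes.

    Hypothetical helper for illustration only.
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    styles = set()
    if crlf:
        styles.add("CRLF")
    if lf:
        styles.add("LF")
    if cr:
        styles.add("CR")
    return styles
```

A file is unambiguous only when the result has exactly one member; a mixed result such as `{"CRLF", "LF"}` is the case that makes automatic detection unreliable, which is an argument for mandating one style in the spec.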

The main problem is that CSV files most commonly use CRLF line endings, as required by RFC 4180.

Thus, mailbag currently mandates UTF-8 for all tag files, but requires CRLF line endings for mailbag.csv and LF line endings for all other tag files. ¯\_(ツ)_/¯
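A minimal sketch of what that split could look like for a writer, assuming Python's standard `csv` module is used for mailbag.csv. The function names `render_tag_file` and `render_mailbag_csv` are made up for illustration and do not come from any mailbag implementation. Both return bytes so that the platform's newline translation never gets involved.

```python
import csv
import io


def render_tag_file(lines):
    """Encode plain tag-file lines as UTF-8 with LF line endings."""
    return "".join(line + "\n" for line in lines).encode("utf-8")


def render_mailbag_csv(rows):
    """Encode rows as UTF-8 CSV with CRLF line endings (RFC 4180 style)."""
    buf = io.StringIO(newline="")  # no newline translation on write
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")
```

Writing the returned bytes with `open(path, "wb")` keeps the endings exactly as rendered on every platform.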

@nkrabben nkrabben commented Jul 21, 2021

Would it be useful to require LF only for those tag files defined in the RFC? This would reduce requirements on any additional tag files that might be added beyond the mailbag spec.

Since you mention the UTF-8 requirement, will the specification say whether or not a UTF-8 byte order mark (BOM) is required? The BagIt spec is agnostic outside of bag-info.txt. https://datatracker.ietf.org/doc/html/rfc8493#section-2.3
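Whichever way the spec goes, checking for the mark is cheap. A small sketch, with hypothetical helper names, using the `codecs.BOM_UTF8` constant from the Python standard library:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the raw tag-file bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A validator could use the first function to reject or warn, and a writer could use the second to normalize, depending on what the spec ends up requiring.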
