Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support comments #101

Closed
ElDeveloper opened this issue Mar 24, 2020 · 2 comments
Closed

Support comments #101

ElDeveloper opened this issue Mar 24, 2020 · 2 comments

Comments

@ElDeveloper
Copy link

Technically this is not specified in Illumina's sample sheet specification, but it would be handy if the parser could handle "comments" i.e. lines that start with a # character. If these lines could be stored in the SampleSheet class and then saved when writing to a file that would be very handy.

@clintval
Copy link
Owner

clintval commented Apr 21, 2020

Thanks for the idea @ElDeveloper!

Did you know that you can make additional arbitrary key-value pairs in the [Header] section of the sample sheet and even encode all your metadata in the sample sheet proper? I would recommend doing that so all your information is visible to the reader of the sample sheet. Although the file specification allows lines that start with a #, these would show up in a CSV file as the first data field in a row starting with a # and would not be parsed as a comment.

The Illumina specification on this section:

The Header section is required, and must be located on the first line of the Sample Sheet file. The Header contains informational fields describing the context around which a sequencing run or analysis was performed (eg, date, workflow, library prep kit, chemistry, etc.).

Header records are represented as a series of key-value pairs. As such, each line requires exactly two fields. The first field in each line is the "key," which names the piece of metadata being recorded. Each key in the Header section must be unique.
The second field in each line is the "value," which is the actual piece of metadata being recorded. Values do not necessarily need to be unique.

Example of a legal "Header" section containing records describing "Date" and "Investigator":

[Header]
Date,2007-01-26
Workflow,GenerateFASTQ
Investigator,John Smith

Example of a legal "Header" section (with padded commas):

[Header],,,,,,
Date,2007-01-26,,,,,
Workflow,GenerateFASTQ,,,,,
Investigator,John Smith,,,,,

https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf

@ElDeveloper
Copy link
Author

Fair enough, thanks for the reply. I agree with you, moving "comments" to a dedicated section would likely be the best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants