REPRODUCIBILITY: Call external tools with <md5 checksum>.<ext> to make file headers stable #25

Open
HenrikBengtsson opened this issue Jul 13, 2015 · 1 comment

Comments

@HenrikBengtsson
Owner

Some external HT-Seq tools store the "call" string in the file header of the output file. For instance, when aligning a FASTQ file, the command line is stored as a string in the BAM header (the `CL` field of the `@PG` program record, referred to here as @CL). To maximize the chance that the generated BAM file is identical (same md5 checksum) for the same input, the @CL string must be the same as well. To achieve this, the call should be made with input file names based on the checksum of each input file rather than on the (original) filename. This can be done with symbolic file links.
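A minimal sketch of the symbolic-link idea, in Python for illustration only. The helper names `md5_of_file()` and `checksum_link()` are hypothetical and not part of any existing package; the sketch assumes a POSIX file system where symbolic links are available, and it keeps only the last file extension (so `reads.fastq.gz` would become `<md5 checksum>.gz`).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 checksum of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_link(path, link_dir):
    """Create <link_dir>/<md5 checksum>.<ext> as a symbolic link to `path`.

    Hypothetical helper: returns the link path, which would then be
    passed to the external tool instead of the original filename.
    """
    os.makedirs(link_dir, exist_ok=True)
    ext = os.path.splitext(path)[1]  # last extension only, e.g. ".fq"
    link = os.path.join(link_dir, md5_of_file(path) + ext)
    if not os.path.islink(link):
        os.symlink(os.path.abspath(path), link)
    return link
```

With this, two files that have identical content but different names map to the same link name, so the command string recorded in the header would not depend on the original filename.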

Question: Is this a good idea, or will it make the @CL string too hard to interpret?

@HenrikBengtsson
Owner Author

For the same reason, the binary/executable should be called without an absolute path, e.g. by creating a local link to it so that it can be invoked from the current directory.
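A rough sketch of the local-link approach for the executable, again in Python and under the same POSIX assumption. The helper name `local_tool_link()` is hypothetical; it looks the tool up on the `PATH`, links it into the working directory, and returns a relative path that can be used in the call. Whether a given tool then records the relative or a resolved path in its header is not something this sketch can guarantee.

```python
import os
import shutil


def local_tool_link(tool, workdir):
    """Symlink an executable found on PATH into `workdir`.

    Hypothetical helper: returns "./<name>", to be used as the command
    when the tool is run with `workdir` as the current directory.
    """
    target = shutil.which(tool)
    if target is None:
        raise FileNotFoundError(f"executable not found on PATH: {tool}")
    name = os.path.basename(target)
    link = os.path.join(workdir, name)
    if not os.path.lexists(link):
        os.symlink(target, link)
    return os.path.join(".", name)
```

The call would then be made from within `workdir`, so that the recorded command starts with `./<name>` regardless of where the tool is installed on a particular machine.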
