
-o option only works for sketching databases, but not samples #7

Closed
fplazaonate opened this issue Dec 19, 2023 · 3 comments

Comments

@fplazaonate

Hi @bluenote-1577,

The -o option seems to be ignored when sketching samples.

Could you fix this?

@bluenote-1577
Owner

This is somewhat of a tricky issue...

The way the CLI is designed, -o only works for genomes because all genomes are grouped together, so they can all be renamed at once. There is no ambiguity.

But because sylph can sketch both reads and genomes with the sketch command, it's not clear how -o should work for reads when genomes are also present. This is why -d is reserved for reads and -o for genomes.

In sylph v0.5, I am adding an option called --sample-names so that users can rename read sketch files to a list of sample names. This is probably what one wants for the -o option for reads.

If you have specific ideas on what -o should output for reads, let me know. For now, I will add a warning for when the user only uses -o for sketching reads.

@fplazaonate
Author

IMO, sylph sketch should treat all the reads it is given as coming from a single sample and generate a single sylsp file, no matter how many fastq files are provided.
In this case, multiple fastq files would represent multiple sequencing runs of the same library.

My lab and others generate multiple fastq files per sample to reach a target sequencing depth. Currently, the sylph interface is not very convenient for that purpose.
The workaround is to decompress all the files on the fly:
-r <(zcat *.fastq.gz)

In the end, the output file has the name of the file descriptor (e.g., 63.sylsp) and has to be renamed later.
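
One way to avoid the file-descriptor name might be to merge the runs into a single named file before sketching. Below is a minimal sketch of that idea, assuming sylph derives the sketch name from the input file name and accepts multi-member gzip input (neither is confirmed in this thread). The helper `merge_runs`, the sample name `sampleA`, and the two tiny demo reads are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: merge several gzipped FASTQ runs of one sample
# into a single file named after the sample.
# Concatenated gzip members form a valid gzip stream, so nothing is recompressed.
merge_runs() {
    local sample="$1"
    shift
    cat "$@" > "${sample}.fastq.gz"
}

# Demo data: two tiny runs standing in for real sequencing runs.
workdir="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$workdir/run1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/run2.fastq.gz"

merge_runs "$workdir/sampleA" "$workdir/run1.fastq.gz" "$workdir/run2.fastq.gz"

# Sanity check: the merged file decompresses to the reads of both runs.
gzip -dc "$workdir/sampleA.fastq.gz" | wc -l

# The merged file would then be sketched with the -r option used above, e.g.:
#   sylph sketch -r "$workdir/sampleA.fastq.gz"
# (assumption: the resulting sketch is named after sampleA.fastq.gz)
```

The trade-off compared to `-r <(zcat *.fastq.gz)` is an extra copy of the reads on disk, in exchange for a real file name instead of a file-descriptor number.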

@bluenote-1577
Owner

Hmm very interesting. Thanks for the input.

I think I will keep this format for now because most software I'm aware of only processes one read pair per sample. What you're saying makes sense, perhaps as an optional mode of input.

I will add an option for renaming in sylph v0.5 though.
