258/9 - Keep self-alignments and add duckdb allocation limits #260

Merged
rbdavid merged 12 commits into nextflow-test from 258-keep-self-alignment-lines-in-all-by-all-processing on Feb 13, 2026

@rbdavid
Contributor

@rbdavid rbdavid commented Feb 11, 2026

closes #258 and #259

For #258, removed the lines in the SQL template files that filtered out rows of the .tab.parquet and 1.out.parquet files where qseqid = sseqid. Those self-alignment rows are important to keep, whether or not the initial sequence set was reduced to representative sequences.
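
A minimal sketch of the behavioral change, written in plain Python rather than the actual SQL templates (their contents are not shown in this thread). The column names `qseqid` and `sseqid` come from the description above; the sample rows, the `pident` column, and both function names are hypothetical and only illustrate the idea.

```python
# Hypothetical all-by-all hits; only qseqid/sseqid are named in this PR.
hits = [
    {"qseqid": "A", "sseqid": "A", "pident": 100.0},  # self-alignment
    {"qseqid": "A", "sseqid": "B", "pident": 87.5},
    {"qseqid": "B", "sseqid": "B", "pident": 100.0},  # self-alignment
]


def drop_self_alignments(rows):
    """Assumed old behavior: skip rows where qseqid equals sseqid."""
    return [r for r in rows if r["qseqid"] != r["sseqid"]]


def keep_all_alignments(rows):
    """New behavior: no qseqid/sseqid filter, so self-alignments survive."""
    return list(rows)


old = drop_self_alignments(hits)
new = keep_all_alignments(hits)
print(len(old), len(new))
```

With the filter in place, a sequence whose only hit is itself disappears from the output entirely, which is presumably why the rows need to stay.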

For #259, the prereduce-template.sql file had no template lines for specifying a memory limit or a temp dir where duckdb writes its temp files. When no memory limit is set, duckdb defaults to 80% of the system's physical RAM; when duckdb runs on a shared compute resource, that is very likely to be way over the RAM allocated to the job.
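
For illustration, here is a rough sketch of the kind of statements such template lines would need to produce. `memory_limit` and `temp_directory` are real duckdb configuration options; the helper name, the example values, and the quoting approach are assumptions, and the actual placeholders in prereduce-template.sql may look different.

```python
def duckdb_limit_statements(memory_limit: str, temp_dir: str) -> str:
    """Build SET statements that cap duckdb memory and redirect temp files.

    Hypothetical helper: the real pipeline fills in a SQL template instead.
    """
    # Double any single quotes so the values stay valid SQL string literals.
    mem = memory_limit.replace("'", "''")
    tmp = temp_dir.replace("'", "''")
    return (
        f"SET memory_limit = '{mem}';\n"
        f"SET temp_directory = '{tmp}';\n"
    )


# Example values only; pick a limit below the job's allocated RAM.
print(duckdb_limit_statements("8GB", "/scratch/duckdb_tmp"))
```

Setting the limit somewhat below the scheduler allocation leaves headroom for memory that duckdb does not track through its buffer manager.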

@rbdavid
Contributor Author

rbdavid commented Feb 11, 2026

Here's the blog post that explains duckdb's memory management strategy: https://duckdb.org/2024/07/09/memory-management

Here's the duckdb documentation where the default memory limit is documented: https://duckdb.org/docs/stable/operations_manual/limits

@rbdavid
Contributor Author

rbdavid commented Feb 11, 2026

There will be some minor merge conflicts with PR #255, since both branches change a few of the same lines in all_by_all.nf. Those should be easy to resolve.

@rbdavid rbdavid marked this pull request as ready for review February 11, 2026 15:35
@rbdavid rbdavid requested a review from nilsoberg February 11, 2026 15:35
@rbdavid rbdavid self-assigned this Feb 11, 2026
Collaborator

@nilsoberg nilsoberg left a comment

It was important to add these arguments to duckdb.
There are some issues with escaping, and I suggested changes.
Could you document with concrete results that these changes fixed the potential issue?

Comment thread pipelines/est/subworkflows/all_by_all.nf Outdated
Comment thread pipelines/est/subworkflows/all_by_all.nf Outdated
Comment thread pipelines/est/subworkflows/all_by_all.nf Outdated
Comment thread pipelines/est/subworkflows/all_by_all.nf Outdated
Comment thread pipelines/est/subworkflows/reporting.nf Outdated
Comment thread pipelines/est/subworkflows/reporting.nf Outdated
Comment thread pipelines/est/subworkflows/all_by_all.nf Outdated
Comment thread pipelines/est/templates/prereduce-template.sql
rbdavid and others added 5 commits February 11, 2026 17:21
Use `task.hash` instead of the `params.job_id` value. This avoids potential clashes when multiple subprocesses are running duckdb calls.

Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
Use `task.hash` instead of `params.job_id` to avoid potential clashes in duckdb calls.

Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
Use `task.hash` instead of `params.job_id` to avoid clashes in duckdb calls.

Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
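
A small sketch of why keying on `task.hash` avoids the clash described in these commit messages. It assumes the value is used to name a per-call duckdb temp location; the base path, the `duckdb_` prefix, the job id, and the hash strings are all hypothetical.

```python
import posixpath


def temp_dir_for(base: str, key: str) -> str:
    """Name a duckdb temp dir from a key (hypothetical naming scheme)."""
    return posixpath.join(base, f"duckdb_{key}")


base = "/scratch"
job_id = "12345"                        # shared by every task in one run
task_hashes = ["3f9a1c2e", "b704d5aa"]  # made-up stand-ins for task.hash

# Keyed on the job id, concurrent tasks all resolve to the same directory.
by_job = {temp_dir_for(base, job_id) for _ in task_hashes}
# Keyed on the per-task hash, each task gets its own directory.
by_hash = {temp_dir_for(base, h) for h in task_hashes}

print(len(by_job), len(by_hash))
```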
@rbdavid rbdavid requested a review from nilsoberg February 13, 2026 00:22
@rbdavid rbdavid merged commit bd0aa67 into nextflow-test Feb 13, 2026
@rbdavid rbdavid deleted the 258-keep-self-alignment-lines-in-all-by-all-processing branch February 13, 2026 20:26

Development

Successfully merging this pull request may close these issues.

blastreduce code should not remove self-alignment lines

2 participants