258/9 - Keep self-alignments and add duckdb allocation limits #260
Merged
rbdavid merged 12 commits on Feb 13, 2026
Here's the blog post that explains duckdb's memory management strategy: https://duckdb.org/2024/07/09/memory-management
Here's the duckdb documentation where the default memory limit is set: https://duckdb.org/docs/stable/operations_manual/limits
There will be some slight clashes with the #255 PR due to the few lines in |
nilsoberg
reviewed
Feb 11, 2026
Collaborator
nilsoberg
left a comment
It was important to add these arguments to duckdb.
There are some issues with escaping, and I suggested changes.
Could you document with concrete results that these changes fixed the potential issue?
Use `task.hash` instead of the `params.job_id` value. This avoids potential clashes when multiple subprocesses are running duckdb calls. Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
Use `task.hash` instead of `params.job_id` to avoid potential clashes in duckdb calls. Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
Use `task.hash` instead of `params.job_id` to avoid clashes in duckdb calls. Co-authored-by: Nils Oberg <37158181+nilsoberg@users.noreply.github.com>
…l-by-all-processing
nilsoberg
approved these changes
Feb 13, 2026
closes #258 and #259
For #258, removed the offending lines in the sql template files that skipped rows in the `.tab.parquet` and `1.out.parquet` files where `qseqid = sseqid`. Those self-alignments are important to keep, whether or not the initial sequence set was reduced to representative sequences.

For #259, the `prereduce-template.sql` file did not have template lines for specifying a memory limit and a temp dir where duckdb would write temp files. When a memory limit is not set, duckdb defaults to 80% of the system's physical RAM; when duckdb is running on a shared compute resource, this is very likely to be far over the allocated RAM amount.
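
For reference, here is a minimal sketch of the kind of template lines described for #259. It is not the actual content of the sql template: the placeholder names (`mem_limit`, `temp_dir`), the helper functions, and the example values are all hypothetical. The only parts taken from duckdb itself are the `memory_limit` and `temp_directory` settings. Single quotes are doubled before substitution, since escaping came up in review.

```python
from string import Template

# Hypothetical header lines; the real template may use different placeholders.
# memory_limit and temp_directory are duckdb configuration settings.
DUCKDB_LIMITS = Template(
    "SET memory_limit = '$mem_limit';\n"
    "SET temp_directory = '$temp_dir';\n"
)


def sql_quote(value: str) -> str:
    """Double any single quotes so the value is safe inside a SQL string literal."""
    return value.replace("'", "''")


def render_limits(mem_limit: str, temp_dir: str) -> str:
    """Fill in the memory limit and temp dir, escaping both values."""
    return DUCKDB_LIMITS.substitute(
        mem_limit=sql_quote(mem_limit),
        temp_dir=sql_quote(temp_dir),
    )


if __name__ == "__main__":
    # Example values only; in practice these would come from the job's allocation.
    print(render_limits("4GB", "/scratch/duckdb_tmp"))
```

Setting both values together matters because duckdb spills to the temp dir once the memory limit is reached, so a low limit without a writable temp dir can still fail.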