Improve performance of annotation db creation, querying #1869
[NEW] implemented using slots, so it takes less memory than the dict it replaces. It supports dictionary style indexing, so old code will work. It also aliases certain old keys. [CHANGED] updated return type hints to reflect the new class [CHANGED] updated tests to reflect these changes
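A minimal sketch of the idea in this commit, assuming a hypothetical record class: `__slots__` avoids a per-instance `__dict__`, while `__getitem__`/`__setitem__` keep dictionary-style indexing working for old code. The class name, field names and alias mapping below are illustrative assumptions, not the class added by this PR.

```python
class SlottedRecord:
    """Hypothetical slots-based record that still supports dict-style access."""

    __slots__ = ("seqid", "biotype", "start", "stop")

    # Hypothetical mapping of legacy keys to current attribute names.
    _aliases = {"SeqID": "seqid", "Type": "biotype"}

    def __init__(self, seqid=None, biotype=None, start=None, stop=None):
        self.seqid = seqid
        self.biotype = biotype
        self.start = start
        self.stop = stop

    def _resolve(self, key):
        # Translate a legacy key, then make sure it names a real slot.
        name = self._aliases.get(key, key)
        if name not in self.__slots__:
            raise KeyError(key)
        return name

    def __getitem__(self, key):
        return getattr(self, self._resolve(key))

    def __setitem__(self, key, value):
        setattr(self, self._resolve(key), value)


record = SlottedRecord(seqid="chr1", biotype="gene", start=0, stop=10)
record["stop"] = 20  # old dict-style code keeps working
```

Because no instance `__dict__` is created, each record carries only the four slot references, which is where the memory saving comes from.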
[CHANGED] this frees parser from having to always start at the top of a file to figure out the format version.
[CHANGED] previously, this was reversed if a feature was on the minus strand.
[CHANGED] this was a private method on GffAnnotationDb but has been made a function to facilitate chunked reading of Gff files.
[CHANGED] iter_line_blocks() now supports num_lines=None, which results in all lines being returned.
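The behaviour described here can be sketched as follows. This is an illustration only: `iter_line_blocks_sketch` is a hypothetical stand-in, and the real `iter_line_blocks()` signature and defaults may differ.

```python
from itertools import islice


def iter_line_blocks_sketch(lines, num_lines=1000):
    """Yield lists of at most num_lines lines; None means one block with everything."""
    it = iter(lines)
    if num_lines is None:
        # A single block holding all lines (an empty list if there are none).
        yield list(it)
        return
    while block := list(islice(it, num_lines)):
        yield block


blocks = list(iter_line_blocks_sketch(["a", "b", "c", "d", "e"], num_lines=2))
everything = list(iter_line_blocks_sketch(["a", "b", "c"], num_lines=None))
```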
[CHANGED] just calls bound sqlitedb's close method
[CHANGED] incomplete records in a GFF database can be updated
…tations [CHANGED] we achieve a ~75% reduction in RAM when creating a GffAnnotationDb for the human genome by combining iter_line_blocks() (which uses iter_splitlines()), merged_gff_records() and GffAnnotationDb.update_record_spans(). The load_annotations(lines_per_block=500_000) argument controls how many lines are read before each insert is done. We track all record names that have been inserted and update their existing spans.
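A rough sketch of the block-wise strategy, under stated assumptions: records arrive as `(name, start, stop)` tuples, spans are stored as JSON text in a single SQLite table, and `load_in_blocks` is a hypothetical helper. It does not use the cogent3 functions named above; it only illustrates merging within a block, inserting new names, and updating spans for names already inserted.

```python
import json
import sqlite3
from itertools import islice


def load_in_blocks(records, lines_per_block=2):
    """Insert (name, start, stop) records block by block into an in-memory db."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE gff (name TEXT PRIMARY KEY, spans TEXT)")
    seen = set()  # names already inserted by an earlier block
    it = iter(records)
    while block := list(islice(it, lines_per_block)):
        # Merge records that share a name within this block.
        merged = {}
        for name, start, stop in block:
            merged.setdefault(name, []).append([start, stop])
        for name, spans in merged.items():
            if name in seen:
                # Incomplete record from an earlier block: extend its spans.
                (old,) = db.execute(
                    "SELECT spans FROM gff WHERE name = ?", (name,)
                ).fetchone()
                db.execute(
                    "UPDATE gff SET spans = ? WHERE name = ?",
                    (json.dumps(json.loads(old) + spans), name),
                )
            else:
                db.execute(
                    "INSERT INTO gff (name, spans) VALUES (?, ?)",
                    (name, json.dumps(spans)),
                )
                seen.add(name)
    db.commit()
    return db


db = load_in_blocks([("g1", 0, 5), ("g2", 10, 20), ("g1", 7, 9)], lines_per_block=2)
```

Only one block of parsed records is held in memory at a time, which is the source of the saving; the set of seen names is the extra state needed to decide between insert and update.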
[NEW] builds indexes for standard columns: biotype, seqid, start, etc.
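As an illustration of what building such indexes involves in SQLite, here is a small sketch. The helper name `build_indexes`, the table name and the index naming scheme are assumptions for this example, not the method added by the PR.

```python
import sqlite3


def build_indexes(db, table, columns):
    """Create one index per column; names come from a fixed, trusted list."""
    for col in columns:
        # Identifiers cannot be bound as SQL parameters, so they are quoted
        # and must never come from user input.
        db.execute(
            f'CREATE INDEX IF NOT EXISTS "{col}_idx" ON "{table}" ("{col}")'
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gff (seqid TEXT, biotype TEXT, start INTEGER, stop INTEGER)")
build_indexes(conn, "gff", ["biotype", "seqid", "start"])
index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
```

Using `IF NOT EXISTS` makes the call safe to repeat on a database that already has the indexes.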
[NEW] thanks to comment in code review by khiron, added # codacy:ignore[sql-injection] - limited SQL injection exposure to silence this codacy warning. As this is purely in a test, it doesn't seem to have much risk.
src/cogent3/parse/gff.py
@functools.singledispatch
def is_gff3(f) -> bool:
    """True if gff-version is 3"""
    raise TypeError(f"unsopported type type {type(f)}")
uns_u_pported
and "type" x 2! Jeez
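For context, the `functools.singledispatch` pattern used by the function under review can be sketched like this. The registered handlers are assumptions for illustration (the PR's actual registrations are not shown in this thread); the only fixed fact relied on is that a GFF3 file declares itself with a `##gff-version 3` pragma line.

```python
import functools
import io


@functools.singledispatch
def is_gff3_sketch(f) -> bool:
    """True if gff-version is 3 (fallback for unregistered types)."""
    raise TypeError(f"unsupported type {type(f)}")


@is_gff3_sketch.register
def _(f: list) -> bool:
    # Hypothetical handler: a list of already-read lines.
    return bool(f) and f[0].startswith("##gff-version 3")


@is_gff3_sketch.register
def _(f: io.IOBase) -> bool:
    # Hypothetical handler: an open text file; restore the position afterwards.
    pos = f.tell()
    f.seek(0)
    first = f.readline()
    f.seek(pos)
    return first.startswith("##gff-version 3")


from_lines = is_gff3_sketch(["##gff-version 3\n", "chr1\t.\tgene\t1\t10\n"])
from_file = is_gff3_sketch(io.StringIO("##gff-version 2\n"))
```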
[CHANGED] seems comment was incorrect
One spelling error needs fixing; other than that it's good.
[CHANGED] this is from the bandit tool, which indicates B608 as the error for hardcoded_sql_expressions