Improve performance of annotation db creation, querying #1869

GavinHuttley · 2024-05-15T22:32:19Z

No description provided.

[NEW] implemented using slots, so it takes less memory than the dict it replaces. It supports dictionary style indexing, so old code will work. It also aliases certain old keys.
[CHANGED] updated return type hints to reflect the new class
[CHANGED] updated tests to reflect these changes
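A minimal sketch of the slots-plus-indexing pattern described above. This is not the cogent3 implementation: the class name `Record`, its fields, and the alias mapping are all hypothetical and chosen only for illustration.

```python
class Record:
    """Hypothetical slots-based record that still accepts dict-style access."""

    # __slots__ removes the per-instance __dict__, which is where the memory saving comes from
    __slots__ = ("seqid", "biotype", "start", "stop")

    # hypothetical mapping of legacy dict keys to the new attribute names
    _aliases = {"type": "biotype", "end": "stop"}

    def __init__(self, seqid=None, biotype=None, start=None, stop=None):
        self.seqid = seqid
        self.biotype = biotype
        self.start = start
        self.stop = stop

    def _resolve(self, key):
        # translate a legacy key, then check it names a real slot
        name = self._aliases.get(key, key)
        if name not in self.__slots__:
            raise KeyError(key)
        return name

    def __getitem__(self, key):
        return getattr(self, self._resolve(key))

    def __setitem__(self, key, value):
        setattr(self, self._resolve(key), value)


record = Record(seqid="chr1", biotype="gene", start=10, stop=20)
record["end"] = 30  # the legacy key is routed to the `stop` attribute
```

Code written against the old dict keeps working through `__getitem__` and `__setitem__`, while new code can use plain attribute access.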

[CHANGED] this frees parser from having to always start at the top of a file to figure out the format version.

[CHANGED] previously, this was reversed if a feature was on the minus strand.

[CHANGED] this was a private method on GffAnnotationDb but has been made a function to facilitate chunked reading of Gff files.
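To illustrate the idea of combining coordinates for records that share an ID, here is a rough sketch. It is not the `merged_gff_records()` code from this PR: the function name `merge_spans_by_id` and the record layout (dicts with `id`, `start`, `stop`) are assumptions.

```python
def merge_spans_by_id(records):
    """Group (start, stop) spans of records that share the same ID.

    `records` is assumed to be an iterable of dicts with "id", "start"
    and "stop" keys; this layout is hypothetical.
    """
    merged = {}
    for rec in records:
        entry = merged.setdefault(rec["id"], {"id": rec["id"], "spans": []})
        entry["spans"].append((rec["start"], rec["stop"]))
    # keep spans in coordinate order so downstream code sees a stable layout
    for entry in merged.values():
        entry["spans"].sort()
    return merged


example = [
    {"id": "cds1", "start": 50, "stop": 80},
    {"id": "gene1", "start": 0, "stop": 100},
    {"id": "cds1", "start": 10, "stop": 30},
]
merged = merge_spans_by_id(example)
```

Because it is a plain function over an iterable of records, it can be applied to one block of a file at a time rather than to the whole file.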

[CHANGED] iter_line_blocks() now supports num_lines=None, which results in all lines being returned.
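A sketch of how a `num_lines=None` option can behave in a block iterator. This is an illustration only, not the cogent3 `iter_line_blocks()`: the name `iter_blocks` and the choice to yield everything as a single block are assumptions.

```python
from itertools import islice


def iter_blocks(lines, num_lines=None):
    """Yield lists of at most num_lines items; None means one block with everything."""
    iterator = iter(lines)
    if num_lines is None:
        block = list(iterator)
        if block:
            yield block
        return
    # islice consumes the shared iterator, so each pass picks up where the last stopped
    while block := list(islice(iterator, num_lines)):
        yield block
```

For example, `list(iter_blocks(range(5), 2))` gives three blocks, while `list(iter_blocks(range(5)))` gives one.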

[CHANGED] just calls bound sqlitedb's close method

[CHANGED] incomplete records in a GFF database can be updated

…tations [CHANGED] we achieve a ~75% reduction in RAM when creating a GffAnnotationDb for the human genome by combining iter_line_blocks() (which uses iter_splitlines()), merged_gff_records() and GffAnnotationDb.update_record_spans(). The load_annotations(lines_per_block=500_000) argument controls how many lines are read before each insert is done. We track all record names that have been inserted and update their existing spans.
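The block-wise strategy described above can be sketched roughly as follows. Everything here is hypothetical (the function `load_in_blocks`, the `feature` table, storing spans as JSON text) and is not how GffAnnotationDb is implemented; it only shows the "insert per block, update spans of names already seen" idea.

```python
import json
import sqlite3
from itertools import islice


def load_in_blocks(conn, records, block_size):
    """Insert (name, start, stop) records one block at a time.

    Names already inserted by an earlier block have their stored spans
    extended instead of being inserted again.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature (name TEXT PRIMARY KEY, spans TEXT)"
    )
    seen = set()  # names inserted so far
    iterator = iter(records)
    while block := list(islice(iterator, block_size)):
        # merge spans within the current block first
        merged = {}
        for name, start, stop in block:
            merged.setdefault(name, []).append([start, stop])
        for name, spans in merged.items():
            if name in seen:
                (stored,) = conn.execute(
                    "SELECT spans FROM feature WHERE name = ?", (name,)
                ).fetchone()
                spans = json.loads(stored) + spans
                conn.execute(
                    "UPDATE feature SET spans = ? WHERE name = ?",
                    (json.dumps(sorted(spans)), name),
                )
            else:
                conn.execute(
                    "INSERT INTO feature (name, spans) VALUES (?, ?)",
                    (name, json.dumps(sorted(spans))),
                )
                seen.add(name)
        conn.commit()


conn = sqlite3.connect(":memory:")
rows = [("cds1", 50, 80), ("gene1", 0, 100), ("cds1", 10, 30)]
load_in_blocks(conn, rows, block_size=2)
```

Only one block of records plus the set of seen names is held in memory at a time, which is the general reason this kind of approach reduces peak RAM.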

[NEW] builds indexes for standard columns (biotype, seqid, start, etc.).
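As a rough illustration of index building with the standard library `sqlite3` module, assuming a hypothetical table called `gff`; the helper name `build_indexes` and the index naming scheme are also assumptions and do not come from the PR.

```python
import sqlite3


def build_indexes(conn, table, columns):
    """Create one index per column, skipping any that already exist."""
    for col in columns:
        # identifiers cannot be bound as SQL parameters, so they are quoted
        # and interpolated; only pass trusted table and column names
        conn.execute(
            f'CREATE INDEX IF NOT EXISTS "{table}_{col}_idx" ON "{table}" ("{col}")'
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gff (seqid TEXT, biotype TEXT, start INTEGER)")
build_indexes(conn, "gff", ["seqid", "biotype", "start"])
build_indexes(conn, "gff", ["seqid"])  # safe to call again
```

Indexes speed up queries that filter on these columns at the cost of extra work on insert, so building them after bulk loading is a common choice.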

coveralls · 2024-05-15T23:21:33Z

Pull Request Test Coverage Report for Build 9106127053

Details

139 of 153 (90.85%) changed or added relevant lines in 2 files are covered.
10 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.03%) to 91.905%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/cogent3/parse/gff.py	96	110	87.27%

Files with Coverage Reduction	New Missed Lines	%
src/cogent3/parse/gff.py	10	84.81%

Totals
Change from base Build 9088317130:	-0.03%
Covered Lines:	30247
Relevant Lines:	32911

💛 - Coveralls

tests/test_core/test_annotation_db.py

[NEW] thanks to comment in code review by khiron, added # codacy:ignore[sql-injection] - limited SQL injection exposure to silence this codacy warning. As this is purely in a test, it doesn't seem to have much risk.

khiron · 2024-05-16T03:21:45Z

src/cogent3/parse/gff.py

+@functools.singledispatch
+def is_gff3(f) -> bool:
+    """True if gff-version is 3"""
+    raise TypeError(f"unsopported type type {type(f)}")


uns_u_pported

and "type" x 2! Jeez
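For context on the dispatch pattern in the diff above, a small self-contained sketch follows. The registered types and the helper `_first_line_is_gff3` are assumptions for illustration and are not the overloads added in this PR; the only external fact relied on is that a GFF3 file starts with a `##gff-version 3` directive.

```python
import functools


def _first_line_is_gff3(line: str) -> bool:
    # hypothetical helper: checks for a "##gff-version 3" style directive
    line = line.strip()
    return line.startswith("##gff-version") and line.split()[-1].startswith("3")


@functools.singledispatch
def is_gff3(f) -> bool:
    """True if gff-version is 3"""
    raise TypeError(f"unsupported type {type(f)}")


# passing the class explicitly to register() avoids relying on annotation
# based registration
@is_gff3.register(str)
def _(f) -> bool:
    # assumption: a str holds the text of the first line, not a path
    return _first_line_is_gff3(f)


@is_gff3.register(list)
@is_gff3.register(tuple)
def _(f) -> bool:
    # assumption: a sequence of lines, with the directive on the first one
    return bool(f) and _first_line_is_gff3(f[0])
```

Unregistered types fall through to the base function and raise `TypeError`.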

[CHANGED] seems the comment was incorrect

khiron

One spelling error needs fixing; other than that, it's good.

[CHANGED] this is from the bandit tool, which indicates B608 as the error for hardcoded_sql_expressions

GavinHuttley added 12 commits May 15, 2024 17:43

MAINT: improve type hints in parse/gff.py

e7962a2

API: added gff3 flag argument to gff_parser

e7453b8

[CHANGED] this frees parser from having to always start at the top of a file to figure out the format version.

MAINT: gff parser now always returns feature coordinates start < stop

b0b5f17

[CHANGED] previously, this was reversed if a feature was on the minus strand.

ENH: gff.merged_gff_records function combines coords for the same ID

dd57654

[CHANGED] this was a private method on GffAnnotationDb but has been made a function to facilitate chunked reading of Gff files.

MAINT: update type hints to use cogent3.util.io.PathType

f986e1d

MAINT: modify default values tests for io.iter_splitlines etc...

14ce190

[CHANGED] iter_line_blocks() now supports num_lines=None, which results in all lines being returned.

ENH: AnnotationDb.close() method for GFF and Genbank

fdd70f0

[CHANGED] just calls bound sqlitedb's close method

MAINT: tweak test

524667c

API: added GffAnnotationDb.update_record_spans() method

db49b0d

[CHANGED] incomplete records in a GFF database can be updated

ENH: added AnnotationDb.make_indexes() to improve query speed

624714f

[NEW] builds indexes for standard columns (biotype, seqid, start, etc.).

GavinHuttley requested a review from khiron May 15, 2024 22:32

MAINT: fix singledispatch usage for py 3.9 compatibility

d6873e2

GavinHuttley removed the request for review from khiron May 15, 2024 22:49

GavinHuttley added 2 commits May 16, 2024 08:52

MAINT: fixed accidental recursion... whoops!

23f9e93

MAINT: fixed another py 3.9 test issue

4976b16

GavinHuttley requested a review from khiron May 15, 2024 23:20

khiron reviewed May 16, 2024

tests/test_core/test_annotation_db.py

TST: skip codacy check on test sql query construction

e0296a3

[NEW] thanks to comment in code review by khiron, added # codacy:ignore[sql-injection] - limited SQL injection exposure to silence this codacy warning. As this is purely in a test, it doesn't seem to have much risk.

khiron reviewed May 16, 2024

MAINT: removed comment line to turn off codacy warning

7bd4bd8

[CHANGED] seems the comment was incorrect

khiron approved these changes May 16, 2024

GavinHuttley added 2 commits May 16, 2024 13:39

MAINT: fix typo

b12a1ba

MAINT: another attempt to turn off sql-injection warning

db712ed

[CHANGED] this is from the bandit tool, which indicates B608 as the error for hardcoded_sql_expressions

GavinHuttley merged commit ea44afc into cogent3:develop May 16, 2024
16 checks passed
