Merging changes on develop #1874

GavinHuttley · 2024-05-22T06:42:21Z

No description provided.

[NEW] implemented using slots, so it takes less memory than the dict it replaces. It supports dictionary style indexing, so old code will work. It also aliases certain old keys.
[CHANGED] updated return type hints to reflect the new class
[CHANGED] updated tests to reflect these changes
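The general pattern described here (a slots-based class that still accepts dictionary style indexing and old key names) can be sketched as follows. This is a minimal illustration, not the class added in this PR: the class name `SlotRecord`, its fields, and the alias mapping are all assumptions made for the example.

```python
class SlotRecord:
    """Hypothetical record type (name, fields and aliases are assumptions)."""

    # __slots__ means instances carry no per-instance __dict__
    __slots__ = ("seqid", "biotype", "start", "stop")

    # assumed mapping of old key -> current attribute name
    _aliases = {"type": "biotype", "end": "stop"}

    def __init__(self, seqid=None, biotype=None, start=None, stop=None):
        self.seqid = seqid
        self.biotype = biotype
        self.start = start
        self.stop = stop

    def __getitem__(self, key):
        key = self._aliases.get(key, key)
        if key not in self.__slots__:
            raise KeyError(key)
        return getattr(self, key)

    def __setitem__(self, key, value):
        key = self._aliases.get(key, key)
        if key not in self.__slots__:
            raise KeyError(key)
        setattr(self, key, value)


record = SlotRecord(seqid="chr1", biotype="gene", start=10, stop=50)
# old dict-style code keeps working, including via an aliased key
old_style_value = record["type"]
```

Code written against the old dict keeps working through `__getitem__`, while new code can use plain attribute access.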

…tations [CHANGED] we achieve a ~75% reduction in RAM for creating a GffAnnotationDb for the human genome by combining iter_line_blocks(), which uses iter_splitlines(), merged_gff_records() and GffAnnotationDb.update_record_spans(). The load_annotations(lines_per_block=500_000) argument controls how many lines are read before the insert is done. We track all record names that have been inserted and update their existing spans.
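The block-wise strategy above can be sketched with the standard library alone. This is not the cogent3 implementation: the function name `load_in_blocks`, the table layout, and storing spans as JSON text are assumptions chosen to keep the example short. It reads a fixed number of rows, merges rows with the same name within the block, inserts new names, and extends the spans of names already seen in earlier blocks.

```python
import json
import sqlite3
from itertools import islice


def load_in_blocks(rows, lines_per_block=2):
    """rows yields (name, start, stop); hypothetical helper, not cogent3 API."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE feature (name TEXT PRIMARY KEY, spans TEXT)")
    seen = set()  # names already inserted by an earlier block
    rows = iter(rows)
    while block := list(islice(rows, lines_per_block)):
        # merge rows that share a name within this block
        merged = {}
        for name, start, stop in block:
            merged.setdefault(name, []).append([start, stop])
        for name, spans in merged.items():
            if name in seen:
                # record exists from a previous block, so extend its spans
                (old,) = db.execute(
                    "SELECT spans FROM feature WHERE name = ?", (name,)
                ).fetchone()
                spans = sorted(json.loads(old) + spans)
                db.execute(
                    "UPDATE feature SET spans = ? WHERE name = ?",
                    (json.dumps(spans), name),
                )
            else:
                db.execute(
                    "INSERT INTO feature VALUES (?, ?)",
                    (name, json.dumps(sorted(spans))),
                )
                seen.add(name)
        db.commit()
    return db


rows = [("cds1", 100, 200), ("gene1", 0, 500), ("cds1", 300, 400)]
db = load_in_blocks(rows, lines_per_block=2)
(stored,) = db.execute("SELECT spans FROM feature WHERE name = 'cds1'").fetchone()
cds1_spans = json.loads(stored)
```

Only one block of rows is held in memory at a time, which is where a saving of the kind reported above would come from.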

[NEW] MolType.is_compatible_alphabet() checks that the characters in an alphabet match those in one of the members of the MolType.alphabets. The argument strict=False (the default) means the exact ordering of elements must match.
[NEW] AlphabetGroup.iter_alphabets() yields individual alphabets from the group.

coveralls · 2024-05-22T08:34:44Z

Pull Request Test Coverage Report for Build 9200124195

Details

248 of 265 (93.58%) changed or added relevant lines in 5 files are covered.
10 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.04%) to 91.91%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/cogent3/core/sequence.py	102	103	99.03%
src/cogent3/core/moltype.py	6	8	75.0%
src/cogent3/parse/gff.py	96	110	87.27%

Files with Coverage Reduction	New Missed Lines	%
src/cogent3/parse/gff.py	10	84.81%

Totals
Change from base Build 9090395347:	-0.04%
Covered Lines:	30278
Relevant Lines:	32943

💛 - Coveralls

GavinHuttley and others added 30 commits May 15, 2024 17:43

MAINT: improve type hints in parse/gff.py

e7962a2

API: added gff3 flag argument to gff_parser

e7453b8

[CHANGED] this frees parser from having to always start at the top of a file to figure out the format version.

MAINT: gff parser now always returns feature coordinates with start < stop

b0b5f17

[CHANGED] previously, this was reversed if a feature was on the minus strand.
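As a small illustration of the convention (not the parser's own code), the sketch below keeps the strand as separate information and always orders the coordinates. The function name `ordered_span` is hypothetical.

```python
def ordered_span(start: int, stop: int, strand: str) -> tuple:
    """Return (start, stop, strand) with the coordinates in ascending order.

    Hypothetical helper: orientation is carried by strand alone, so the
    coordinates are never swapped to encode it.
    """
    if start > stop:
        start, stop = stop, start
    return start, stop, strand


minus_feature = ordered_span(500, 100, "-")
```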

ENH: gff.merged_gff_records function combines coords for the same ID

dd57654

[CHANGED] this was a private method on GffAnnotationDb but has been made a function to facilitate chunked reading of Gff files.
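Combining coordinates for records that share an ID can be sketched as below. This is an assumption-laden illustration rather than the body of `merged_gff_records`: the function name `merge_by_id` and the dict layout with `name` and `spans` keys are invented for the example.

```python
def merge_by_id(records):
    """Combine the spans of records sharing a name (illustrative only)."""
    merged = {}
    for rec in records:
        name = rec["name"]
        if name not in merged:
            # copy so the caller's record and span list are not mutated
            merged[name] = {**rec, "spans": list(rec["spans"])}
        else:
            merged[name]["spans"].extend(rec["spans"])
    for rec in merged.values():
        rec["spans"].sort()
    return list(merged.values())


records = [
    {"name": "cds1", "biotype": "CDS", "spans": [(300, 400)]},
    {"name": "gene1", "biotype": "gene", "spans": [(0, 500)]},
    {"name": "cds1", "biotype": "CDS", "spans": [(100, 200)]},
]
result = merge_by_id(records)
```

Because it is a plain function over an iterable of records, it can be applied to one chunk of a file at a time.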

NEW: added SliceRecordABC

1751f20

MAINT: update type hints to use cogent3.util.io.PathType

f986e1d

MAINT: modify default values tests for io.iter_splitlines etc...

14ce190

[CHANGED] iter_line_blocks() now supports num_lines=None, which results in all lines being returned.
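The behaviour described can be sketched as follows. This is not the cogent3 function: `line_blocks` is a hypothetical stand-in, and yielding nothing for empty input is an assumption of the sketch.

```python
from itertools import islice


def line_blocks(lines, num_lines=None):
    """Yield lists of at most num_lines lines; None gives one block with all lines."""
    iterator = iter(lines)
    if num_lines is None:
        block = list(iterator)
        if block:
            yield block
        return
    # islice consumes the shared iterator, so each pass takes the next block
    while block := list(islice(iterator, num_lines)):
        yield block


chunked = list(line_blocks(["a", "b", "c", "d", "e"], num_lines=2))
everything = list(line_blocks(["a", "b", "c", "d", "e"]))
```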

ENH: AnnotationDb.close() method for GFF and Genbank

fdd70f0

[CHANGED] just calls bound sqlitedb's close method

MAINT: tweak test

524667c

API: added GffAnnotationDb.update_record_spans() method

db49b0d

[CHANGED] incomplete records in a GFF database can be updated

ENH: added AnnotationDb.make_indexes() to improve query speed

624714f

[NEW] builds indexes for standard columns, biotype, seqid, start, etc.
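For readers unfamiliar with the idea, a minimal sqlite3 sketch of building per-column indexes is shown below. The table name `gff`, the index naming scheme, and the standalone function are assumptions for the example and are not taken from `AnnotationDb.make_indexes()`.

```python
import sqlite3


def make_indexes(db, table, columns):
    """Create one index per column (hypothetical helper, not the cogent3 method)."""
    for col in columns:
        # identifiers cannot be bound as SQL parameters, so table and column
        # names must come from a fixed, trusted list and never from user input
        db.execute(
            f'CREATE INDEX IF NOT EXISTS "{table}_{col}_idx" ON "{table}" ("{col}")'
        )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gff (seqid TEXT, biotype TEXT, start INTEGER, stop INTEGER)")
make_indexes(db, "gff", ("seqid", "biotype", "start"))
index_names = sorted(
    row[0] for row in db.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```

Indexes speed up queries that filter on these columns, at the cost of extra time and disk space when they are built.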

MAINT: fix singledispatch usage for py 3.9 compatibility

d6873e2

MAINT: fixed accidental recursion... whoops!

23f9e93

MAINT: fixed another py 3.9 test issue

4976b16

TST: skip codacy check on test sql query construction

e0296a3

[NEW] thanks to comment in code review by khiron, added # codacy:ignore[sql-injection] - limited SQL injection exposure to silence this codacy warning. As this is purely in a test, it doesn't seem to have much risk.

MAINT: removed comment line to turn off codacy warning

7bd4bd8

[CHANGED] seems comment was incorrect

MAINT: fix typo

b12a1ba

MAINT: another attempt to turn off sql-injection warning

db712ed

[CHANGED] this is from the bandit tool, which indicates B608 as the error for hardcoded_sql_expressions

Merge pull request #1869 from GavinHuttley/develop

ea44afc

Improve performance of annotation db creation, querying

DOC: make expected construction clear in docstring

427348d

MAINT: removed unused import

177a114

MAINT: remove unexpected kw arg 'sliced' from copy

ee211cf

DEV: SeqView seq is now keyword arg

43368e4

DEV: type hint, abstractions, and tweaks for SliceRecordABC

aefcfa8

DOC: fix docstring

388b3e7

DEV: SliceRecordABC now requires _get_init_kwargs for unique args

1934af4
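One way to read the requirement in the commit above is that the base class owns the slice attributes and asks each subclass for its own constructor arguments. The sketch below illustrates that pattern only: apart from the method name `_get_init_kwargs`, every name here (`SliceRecordSketch`, `TextView`, `copy`) is hypothetical and the real `SliceRecordABC` may differ.

```python
from abc import ABC, abstractmethod


class SliceRecordSketch(ABC):
    """Hypothetical base class that holds only the slice attributes."""

    def __init__(self, *, start: int = 0, stop: int = 0, step: int = 1):
        self.start = start
        self.stop = stop
        self.step = step

    @abstractmethod
    def _get_init_kwargs(self) -> dict:
        """Return the constructor arguments unique to the subclass."""

    def copy(self):
        # the base class can rebuild any subclass without knowing its extra args
        return self.__class__(
            start=self.start,
            stop=self.stop,
            step=self.step,
            **self._get_init_kwargs(),
        )


class TextView(SliceRecordSketch):
    """Hypothetical view over a string; seq is keyword-only."""

    def __init__(self, *, seq: str, start: int = 0, stop=None, step: int = 1):
        stop = len(seq) if stop is None else stop
        super().__init__(start=start, stop=stop, step=step)
        self.seq = seq

    def _get_init_kwargs(self) -> dict:
        return {"seq": self.seq}


view = TextView(seq="ACGT", start=1, stop=3)
duplicate = view.copy()
```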

STY: black

d90ffef

Merge pull request #1870 from KatherineCaley/develop

ac809bf

NEW: abstract base class for views, fixes #1865

fredjaya and others added 2 commits May 22, 2024 16:22

DOC: Add IndelMap param docstring

8cffa9e

GavinHuttley requested a review from KatherineCaley May 22, 2024 06:43

KatherineCaley and others added 3 commits May 22, 2024 17:10

Merge branch 'seq-collections-refactor' into develop

d384bca

Revert "Merge branch 'seq-collections-refactor' into develop"

ba843df

This reverts commit d384bca, reversing changes made to 3a72310.

Merge pull request #1875 from KatherineCaley/develop

a474b08

Revert "Merge branch 'seq-collections-refactor' into develop"

Merge pull request #1873 from fredjaya/develop

2f7c86a

DOC: Add IndelMap param docstring

GavinHuttley merged commit b832dd4 into seq-collections-refactor May 23, 2024
31 of 33 checks passed
