Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genbank parser fixes #389

Closed
wants to merge 16 commits into from
Closed

Genbank parser fixes #389

wants to merge 16 commits into from

Conversation

abondrn
Copy link
Contributor

@abondrn abondrn commented Oct 30, 2023

Changes in this PR

Clearly and concisely summarize the changes you are making. Bullet points are completely okay. Please be specific, saying "improves X" is not enough!

Why are you making these changes?

Explain why these changes are necessary. Link to GitHub issues here with the format fixes: #XXX to indicate this PR resolves the issue.

  • Addresses several minor issues related to genbank, which I believe to encompass a small yet cohesive featureset. Issues have been linked in the respective changes.

Are any changes breaking? (IMPORTANT)

Will merging this PR change poly's API in a non-backwards-compatible manner?

Yes

  • This changes the type of Features.Attributes from map[string]string to MultiMap[string, string], an alias for map[string]string defined in multimap.go. This can be seen via adjustments to unit tests.
  • In addition to expanding the set of Genbank files which can be validly parsed, the parser should be more vocal when it encounters unusual syntax in the "feature" section. This "fail fast" approach is better as the file in the new functional test triggered a codepath which would neither return a valid Genbank object or an error, and should hopefully lead to faster turnaround on parser fixes in the future.

Pre-merge checklist

All of these must be satisfied before this PR is considered
ready for merging. Mergeable PRs will be prioritized for review.

  • New packages/exported functions have docstrings.
  • New/changed functionality is thoroughly tested.
  • New/changed functionality has a function giving an example of its usage in the associated test file. See primers/primers_test.go for what this might look like.
  • Changes are documented in CHANGELOG.md in the [Unreleased] section.
  • All code is properly formatted and linted.
  • The PR template is filled out.

@abondrn abondrn changed the title Parser fixes Genbank parser fixes Oct 30, 2023
Copy link
Contributor

@Koeng101 Koeng101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few changes for more documentation, other than that, looking pretty good!

io/genbank/genbank.go Outdated Show resolved Hide resolved
io/genbank/genbank.go Outdated Show resolved Hide resolved
CONTRIBUTING.md Show resolved Hide resolved
data/NC_001141.2.gb Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
io/genbank/genbank.go Outdated Show resolved Hide resolved
io/genbank/genbank.go Outdated Show resolved Hide resolved
io/genbank/genbank.go Outdated Show resolved Hide resolved
io/genbank/genbank.go Outdated Show resolved Hide resolved
io/genbank/genbank_test.go Show resolved Hide resolved
@abondrn abondrn requested a review from Koeng101 October 31, 2023 01:20
Copy link
Contributor

@Koeng101 Koeng101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments! Also, I think there are some architectural decisions. I think we should remove polyjson, and just have a genbank json export function. lemme try to message tim and willow

Comment on lines 14 to 18
// defined as a simple type alias over a map of slices
// while not ideal (eg computing total number of items takes O(N))
// this has the advantage of being compatible with json.Marshal, cmp.Diff,
// pretty printing, and bracket indexing out of the box.
type MultiMap[K, V comparable] map[K][]V
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the advantages of:

  1. Compatible with json.Marhsal, cmp.Diff, pretty printing
  2. Generic types

Should be highlighted above. This little section was more convincing than O(1) lookup time, mainly because we can accomplish that with just maps. We can't get the guarantees around cmp.Diff, which are actually quite useful, and it also doesn't work on generic types - and we might want more of those in the future.

However, I think we should be more explicit about where this is used (imagine being a user reading the docs and you come across MultiMap). In this case, I think the multimap type should have very explicit docs about where it is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic types can be had with other implementations, but I agree and emphasized these benefits more. Also contextualized where the MultiMap is currently used, which I will update if we expand in the future.

Comment on lines 20 to 23
// create a new empty multimap
func NewMultiMap[K, V comparable]() MultiMap[K, V] {
return make(map[K][]V)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure the docstrings are supposed to start with the name of the function, or am I wrong there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are 100% right

Comment on lines 817 to 823
// testcase for subset of unusual file
// includes implicit genome range with partial
// and origin is replaced with contig
func TestParseS288C_IX(t *testing.T) {
_, err := ReadMulti("../../data/NC_001141.2_redux.gb")
assert.Nil(t, err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically mention that this is a regression test for that particular test case.

io/polyjson/polyjson.go Show resolved Hide resolved
io/polyjson/polyjson.go Show resolved Hide resolved
@abondrn
Copy link
Contributor Author

abondrn commented Nov 5, 2023

Closed in favor of #394

@abondrn abondrn closed this Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants