Hello!
Pull request overview
conftest.py
spacy==3.2.1 and en-core-web-sm==3.2.0

Details
conftest.py changes

First and foremost, you'll see many changes in conftest.py. I've included fixtures for each of the individual documents, so they can be more easily used throughout the test suite from now on. Note that doc and long_doc still exist, and this update is exclusively additions rather than changes.

test_base.py changes

The main changes here are within the test_summary function. Firstly, I've started using the long_doc fixture here, so the function itself doesn't need to worry about filepaths to data files etc. Beyond that, I've updated the expected results for spacy==3.2.1 and en-core-web-sm==3.2.0. Because we rely so heavily on spaCy, we'll always run into situations where tests break, but at least now there is a comment specifying for which versions these tests pass. Lastly, I've stopped converting the sent_dist.phrases set to a list, as I see no need to do that. Set order is (mathematically) undefined, and so I'd rather not rely on Python to ensure ordering.

Then, I've also added my new fixtures to test_multiple_summary.

test_positionrank.py changes

This fix is a little bit of a hack, but works well for now. In essence, the issue here was that we expect 'Shanghai Shenhua' to exist in one of the lists, but not in the other. However, with recent spaCy model changes, only 'Shanghai Shenhua striker Odion Ighalo' appears in the list. This causes a test failure, despite the chunk with 'Shanghai Shenhua' being ranked number 1 as expected.

The hack is simply to convert the list of strings into one long string, and check whether 'Shanghai Shenhua' is or is not a substring of this long string. This means that it will also check whether 'Shanghai Shenhua' is a substring of one of the chunks, which seems to be what we want.

On a more general note, it might be prudent to have a central place where the specific versions of spaCy and its models are listed, so it becomes a bit easier to update the tests in the future. (Apologies if this already exists and I couldn't find it.)
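
For reference, the substring check described for test_positionrank.py can be sketched roughly as below. This is only an illustration, not the code from the PR: the chunk lists are made-up sample data, contains_phrase is a hypothetical helper name, and joining with a newline is my own assumption (it keeps a phrase from matching across the boundary between two chunks).

```python
# Hypothetical sample data; the real tests build these lists from spaCy output.
ranked_chunks = ["Shanghai Shenhua striker Odion Ighalo", "Chinese Super League"]
other_chunks = ["Manchester United", "a loan deal"]


def contains_phrase(chunks, phrase):
    """Return True if `phrase` occurs inside any chunk.

    The chunks are joined into one long string first, so the check
    matches substrings of a chunk, not only whole chunks.
    """
    # Assumption: a newline separator, so a match cannot span two chunks.
    joined = "\n".join(chunks)
    return phrase in joined


# Exact list membership fails, because only the longer chunk is present.
assert "Shanghai Shenhua" not in ranked_chunks

# The substring check passes for one list and fails for the other.
assert contains_phrase(ranked_chunks, "Shanghai Shenhua")
assert not contains_phrase(other_chunks, "Shanghai Shenhua")
```

The trade-off is that the check is looser than list membership: it passes for any chunk that merely contains the phrase, which is the behaviour the description above asks for.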
Let me know if there are issues or comments - I'm always open to feedback.