Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved test suite; fixed tests #205

Merged
merged 1 commit into from
Feb 4, 2022
Merged

Improved test suite; fixed tests #205

merged 1 commit into from
Feb 4, 2022

Conversation

tomaarsen
Copy link
Contributor

Hello!

Pull request overview

  • Improved test suite: created easily accessible fixtures per document in conftest.py.
  • Fixed tests for spacy==3.2.1 and en-core-web-sm==3.2.0.

Details

conftest.py changes

First and foremost you'll see many changes in conftest.py. I've included fixtures for each of the individual documents, so they can be more easily used throughout the test suite from now on. Note that doc and long_doc still exist, and this update is exclusively additions rather than changes.

test_base.py changes

The main changes here are within the test_summary function. Firstly, I've started using the long_doc fixture here, so the function itself doesn't need to worry about filepaths to data files etc. Beyond that, I've updated the expected results for spacy==3.2.1 and en-core-web-sm==3.2.0. Because we rely so heavily on spaCy, we'll always run into situations where tests break, but at least now there is a comment specifying for which versions these tests pass. Lastly, I've stopped converting the sent_dist.phrases set to a list, as I see no need to do that. Set order is (mathematically) undefined, and so I'd rather not rely on Python to ensure ordering.

Then, I've also added my new fixtures to test_multiple_summary.

test_positionrank.py changes

This fix is a little bit of a hack, but works well for now. In essence, the issue here was that we expect 'Shanghai Shenhua' to exist in one of the lists, but not in the other. However, with recent spaCy model changes, only 'Shanghai Shenhua striker Odion Ighalo' appears in the list. This causes a test failure, despite the chunk with 'Shanghai Shenhua' being ranked number 1 like expected.

The hack is simply to convert the list of strings into one long string, and check whether 'Shanghai Shenhua' is or is not a substring of this long list. This means that it will also check whether 'Shanghai Shenhua' is a substring of one of the chunks, which seems to be what we want.


On a more general note, it might be prudent to have a central place where the specific versions of spaCy and its models are listed, so it becomes a bit easier to update the tests in the future. (Apologies if this already exists, and I couldn't find it)

Let me know if there are issues or comments - I'm always open to feedback.

  • Tom Aarsen

@ceteri
Copy link
Collaborator

ceteri commented Feb 4, 2022

Awesome, this looks great @tomaarsen!

So far, we hadn't been testing much at all in terms of different versions of spaCy
although we should be.

@ceteri ceteri merged commit f742f12 into DerwenAI:main Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants