Improved test suite; fixed tests #205

tomaarsen · 2022-02-01T10:15:21Z

Hello!

Pull request overview

Improved test suite: created easily accessible fixtures per document in conftest.py.
Fixed tests for spacy==3.2.1 and en-core-web-sm==3.2.0.

Details

`conftest.py` changes

First and foremost you'll see many changes in conftest.py. I've included fixtures for each of the individual documents, so they can be more easily used throughout the test suite from now on. Note that doc and long_doc still exist, and this update is exclusively additions rather than changes.

`test_base.py` changes

The main changes here are within the test_summary function. Firstly, I've started using the long_doc fixture here, so the function itself doesn't need to worry about filepaths to data files etc. Beyond that, I've updated the expected results for spacy==3.2.1 and en-core-web-sm==3.2.0. Because we rely so heavily on spaCy, we'll always run into situations where tests break, but at least now there is a comment specifying for which versions these tests pass. Lastly, I've stopped converting the sent_dist.phrases set to a list, as I see no need to do that. Set order is (mathematically) undefined, and so I'd rather not rely on Python to ensure ordering.

Then, I've also added my new fixtures to test_multiple_summary.

`test_positionrank.py` changes

This fix is a little bit of a hack, but works well for now. In essence, the issue here was that we expect 'Shanghai Shenhua' to exist in one of the lists, but not in the other. However, with recent spaCy model changes, only 'Shanghai Shenhua striker Odion Ighalo' appears in the list. This causes a test failure, despite the chunk with 'Shanghai Shenhua' being ranked number 1 like expected.

The hack is simply to convert the list of strings into one long string, and check whether 'Shanghai Shenhua' is or is not a substring of this long list. This means that it will also check whether 'Shanghai Shenhua' is a substring of one of the chunks, which seems to be what we want.

On a more general note, it might be prudent to have a central place where the specific versions of spaCy and its models are listed, so it becomes a bit easier to update the tests in the future. (Apologies if this already exists, and I couldn't find it)

Let me know if there are issues or comments - I'm always open to feedback.

Tom Aarsen

ceteri · 2022-02-04T19:59:25Z

Awesome, this looks great @tomaarsen!

So far, we hadn't been testing much at all in terms of different versions of spaCy
although we should be.

Improved test suite; fixed tests for spacy==3.2.1, en-core-web-sm==3.2.0

8ae8290

ceteri merged commit f742f12 into DerwenAI:main Feb 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved test suite; fixed tests #205

Improved test suite; fixed tests #205

tomaarsen commented Feb 1, 2022

ceteri commented Feb 4, 2022

Improved test suite; fixed tests #205

Improved test suite; fixed tests #205

Conversation

tomaarsen commented Feb 1, 2022

Pull request overview

Details

conftest.py changes

test_base.py changes

test_positionrank.py changes

ceteri commented Feb 4, 2022

`conftest.py` changes

`test_base.py` changes

`test_positionrank.py` changes