
Unit-test bypass args #533

Merged
hakunanatasha merged 20 commits into master from test_bypass
May 5, 2022

@hakunanatasha hakunanatasha commented May 1, 2022

Supersedes #398

@galtay @sg-wbi

Adds the bypassing arguments to do the following:

(1) Ignore an entire data split (ex: ignore test)
(2) Ignore a key for all data splits (ex: ignore "entities")
(3) Ignore a SPECIFIC key in a SPECIFIC split (ex: Ignore "events" in the "test" split)

Bypassed splits, keys, and split-key pairs will NOT affect the statistics function; I think it is important to see the full statistics either way.

The defaults for all of these are empty lists, meaning nothing is affected.
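For readers who want to see how the three options fit together, here is a minimal sketch of one way such flags could be parsed and applied. The flag names and the `split,key` pair format come from the commands in this PR; the helper names (`build_parser`, `parse_pairs`, `is_bypassed`) are hypothetical and are not taken from the actual test code.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the flag names come from this PR."""
    parser = argparse.ArgumentParser(description="Sketch of the bypass arguments")
    # Each flag takes zero or more values and defaults to an empty list,
    # so nothing is bypassed unless the user asks for it.
    parser.add_argument("--bypass_splits", nargs="*", default=[])
    parser.add_argument("--bypass_keys", nargs="*", default=[])
    parser.add_argument("--bypass_split_key_pairs", nargs="*", default=[])
    return parser


def parse_pairs(raw_pairs):
    """Turn strings like 'test,events' into (split, key) tuples."""
    pairs = set()
    for item in raw_pairs:
        split, sep, key = item.partition(",")
        if not sep or not split or not key:
            raise ValueError(f"expected 'split,key', got {item!r}")
        pairs.add((split, key))
    return pairs


def is_bypassed(split, key, bypass_splits, bypass_keys, bypass_pairs):
    """Return True if the check for this (split, key) should be skipped."""
    return (
        split in bypass_splits
        or key in bypass_keys
        or (split, key) in bypass_pairs
    )
```

With this sketch, passing `--bypass_split_key_pairs test,events test,coreferences` yields the pairs `("test", "events")` and `("test", "coreferences")`, and only those two checks are skipped.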

You can test them with the bionlp_st_2013_pc dataset, which has the following statistics:

train
==========
id: 260
document_id: 260
passages: 260
entities: 7855
normalized: 0
events: 5992
coreferences: 455
relations: 0

validation
==========
id: 90
document_id: 90
passages: 90
entities: 2734
normalized: 0
events: 2129
coreferences: 128
relations: 0

test
==========
id: 175
document_id: 175
passages: 175
entities: 5312
normalized: 0
events: 0
coreferences: 0
relations: 0

Supported tasks include event extraction and coreference resolution, both of which are missing from the test split.

You can test the script as follows:

# This will fail
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py 

# This passes
# Omit "events" AND "coreferences" in test
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py --bypass_split_key_pairs test,events test,coreferences

# This passes, but does not check test
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py --bypass_splits test

# This passes, but does not check events + coreferences
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py --bypass_keys events coreferences

# This will fail because the issue is with test, not train
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py --bypass_splits train

# This fails because both events and coreferences are missing in test, but only coreferences is bypassed
python -m tests.test_bigbio biodatasets/bionlp_st_2013_pc/bionlp_st_2013_pc.py --bypass_split_key_pairs test,coreferences
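
The pass/fail outcomes of the commands above can be reproduced with a small simulation. This is only a sketch: it assumes the check is "every key implied by a supported task must have a non-zero count in every split", which may not match the real test exactly. The counts are copied from the statistics in this PR (the first, unlabeled block is assumed to be train), and `failing_pairs` is a hypothetical helper.

```python
# Counts copied from the statistics listed in this PR description.
STATS = {
    "train": {"entities": 7855, "events": 5992, "coreferences": 455},
    "validation": {"entities": 2734, "events": 2129, "coreferences": 128},
    "test": {"entities": 5312, "events": 0, "coreferences": 0},
}

# Keys implied by the supported tasks (event extraction, coreference resolution).
REQUIRED_KEYS = ("events", "coreferences")


def failing_pairs(stats, required_keys, bypass_splits=(), bypass_keys=(), bypass_pairs=()):
    """Return the (split, key) pairs that are empty and not bypassed."""
    failures = []
    for split, counts in stats.items():
        if split in bypass_splits:
            continue  # the whole split is ignored
        for key in required_keys:
            if key in bypass_keys or (split, key) in bypass_pairs:
                continue  # this key, or this specific pair, is ignored
            if counts.get(key, 0) == 0:
                failures.append((split, key))
    return failures


# No bypass: both required keys are empty in test, so the check fails.
print(failing_pairs(STATS, REQUIRED_KEYS))
# Bypassing only (test, coreferences) still leaves (test, events) failing.
print(failing_pairs(STATS, REQUIRED_KEYS, bypass_pairs=[("test", "coreferences")]))
```

An empty result corresponds to a passing run; any remaining pair corresponds to a failure.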

@sg-wbi I'd like to add (or if you don't mind, you can add) some 'dummy' data to ensure this behavior is working as intended prior to merging. Otherwise, I have tested this on live data.

@hakunanatasha hakunanatasha changed the title from "Test bypass" to "Unit-test bypass args" May 1, 2022

@galtay galtay left a comment

looks good to me!

@hakunanatasha hakunanatasha merged commit f113917 into master May 5, 2022
@galtay galtay deleted the test_bypass branch September 5, 2022 21:02