
tighten test_topic_word #3280

Merged
3 commits merged into piskvorky:develop on Mar 22, 2022

Conversation

austereantelope
Contributor

Hi,

The test test_topic_word in test_ldaseqmodel.py has an assertion bound (self.assertAlmostEqual(topics[0][0][1], expected_topic_word[0][1], places=2)) that is too loose. This means a potential bug in the code could still pass the original test.
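To make the difference concrete, here is a minimal sketch of the two assertion styles (the expected value 0.669 is a placeholder, not the actual value from gensim's test):

```python
import unittest

tc = unittest.TestCase()
expected = 0.669  # placeholder for expected_topic_word[0][1]

# Original check: places=2 rounds the difference to 2 decimal places,
# so a deviation of up to ~0.005 still passes silently.
tc.assertAlmostEqual(expected + 0.004, expected, places=2)

# Proposed check: an explicit absolute tolerance of 0.0012,
# which the same 0.004 deviation would fail.
tc.assertAlmostEqual(expected + 0.0011, expected, delta=0.0012)
```

With `places=2` the test tolerates any error below half of 0.01, while `delta` makes the tolerance explicit and tunable.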

To quantify this, I conducted some experiments where I generated multiple mutations of the source code under test and ran each mutant and the original code 100 times to build a distribution of their outputs. I used a KS-test to find mutants that produced a different distribution from the original, and used those mutants as a proxy for bugs that could be introduced. In the graph below I show the distribution of both the original code and the mutants with a different distribution.
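The mutant-screening step above can be sketched as follows, assuming each run's outputs are plain lists of floats (the data below is illustrative, not the PR's actual measurements). The two-sample KS statistic is the maximum gap between the two empirical CDFs:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (no p-value)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_xs, x):
        # Fraction of observations <= x.
        return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

    # The maximum CDF gap occurs at one of the observed points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

original = [0.6690 + 0.0001 * (i % 5) for i in range(100)]  # toy outputs
mutant   = [0.6740 + 0.0001 * (i % 5) for i in range(100)]  # shifted outputs

# A large statistic suggests the mutant's output distribution differs
# from the original's, flagging it as a bug proxy.
print(ks_statistic(original, mutant))  # -> 1.0 (distributions fully separated)
```

In practice one would use scipy.stats.ks_2samp to also get a p-value; the hand-rolled version here just shows the idea.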

Here we see that the bound of 0.005 (equivalent to rounding to 2 decimal places) is too loose, since the original distribution (in orange) sits well below 0.005. Furthermore, the graph shows many mutants (proxies for bugs) below the bound as well, which is undesirable since the test should aim to catch potential bugs in the code. I quantify the "bug detection" ability of this assertion by varying the bound in the trade-off graph below.

In this graph, I plot the mutant catch rate (the ratio of mutant outputs that fail the test) against the original pass rate (the ratio of original outputs that pass the test). The original bound of 0.005 (red dotted line) has a catch rate of 0.16.

To improve this test, I propose tightening the bound to 0.0012 (the blue dotted line). The new bound has a catch rate of 0.375 (a +0.215 increase compared to the original) while still having a >99% pass rate (the test is not flaky: I ran the updated test 500 times and observed a >99% pass rate). I think this is a good balance between improving the bug-detection ability of the test and keeping its flakiness low.
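The trade-off can be sketched with a small helper that evaluates a candidate bound against both populations (the deviation values below are toy numbers, not the PR's actual data):

```python
def rates(original_errors, mutant_errors, bound):
    """Compute (original pass rate, mutant catch rate) for a bound.

    original_errors / mutant_errors: absolute deviations of each run's
    output from the expected value; bound: the assertion tolerance.
    """
    pass_rate = sum(e <= bound for e in original_errors) / len(original_errors)
    catch_rate = sum(e > bound for e in mutant_errors) / len(mutant_errors)
    return pass_rate, catch_rate

orig_devs = [0.0002, 0.0005, 0.0009, 0.0010, 0.0011]  # toy original deviations
mut_devs  = [0.0008, 0.0020, 0.0040, 0.0003, 0.0060]  # toy mutant deviations

# Loose bound: the original always passes, but most mutants slip through.
print(rates(orig_devs, mut_devs, 0.005))   # -> (1.0, 0.2)

# Tighter bound: the original still always passes, more mutants are caught.
print(rates(orig_devs, mut_devs, 0.0012))  # -> (1.0, 0.6)
```

Sweeping `bound` over a range and plotting the two rates reproduces the kind of trade-off curve described above.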

Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions or questions.

My Environment:

python=3.7.11
numpy=1.21.4

my Gensim Experiment SHA:
6e362663f23967f3c1931e2cb18d3d25f92aabb5

@codecov

codecov bot commented Dec 23, 2021

Codecov Report

Merging #3280 (4268860) into develop (ac3bbcd) will decrease coverage by 0.02%.
The diff coverage is 82.43%.

@@             Coverage Diff             @@
##           develop    #3280      +/-   ##
===========================================
- Coverage    81.43%   81.41%   -0.03%     
===========================================
  Files          122      122              
  Lines        21052    21107      +55     
===========================================
+ Hits         17144    17184      +40     
- Misses        3908     3923      +15     
Impacted Files Coverage Δ
gensim/utils.py 71.60% <53.84%> (-0.38%) ⬇️
gensim/models/lsimodel.py 63.88% <60.00%> (-0.16%) ⬇️
gensim/models/fasttext.py 89.73% <80.00%> (-0.23%) ⬇️
gensim/models/keyedvectors.py 82.43% <90.24%> (+0.19%) ⬆️
gensim/test/test_keyedvectors.py 99.29% <100.00%> (+0.02%) ⬆️
gensim/test/test_ldaseqmodel.py 93.75% <100.00%> (ø)
gensim/matutils.py 77.49% <0.00%> (-0.64%) ⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@piskvorky
Owner

generated multiple mutations of the source code

I don't really understand what you mean by "mutants". But if the tests are tighter now while remaining stable, then that's great.

@austereantelope
Contributor Author

I don't really understand what you mean by "mutants".

Mutants here refer to small changes in the source code. They are designed to act as a proxy for potential bugs in the source code. I automatically generate them based on test coverage (only generating mutations on the lines covered by the test).

@piskvorky
Owner

Oh jesus. That's some next level investigation!

Could this be applied to other tests in Gensim? Or even other projects? Sounds really interesting.

@austereantelope
Contributor Author

Could this be applied to other tests in Gensim? Or even other projects? Sounds really interesting.

Thank you! Currently we are looking at other tests in Gensim and also other projects. We are doing more local testing to make sure the tightened bounds do not make the tests flaky as well.

@piskvorky
Owner

Nice. Who are "we"?

@austereantelope
Contributor Author

We are a group of researchers working on such non-deterministic tests in ML projects, particularly on tests with loose bounds.

@austereantelope
Contributor Author

@piskvorky @mpenkov Does it look good to merge?

@piskvorky piskvorky added this to the Next release milestone Feb 18, 2022
@mpenkov
Collaborator

mpenkov commented Mar 22, 2022

LGTM, merging. Thank you @austereantelope !

@mpenkov mpenkov merged commit 546de20 into piskvorky:develop Mar 22, 2022