This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Flaky test_operator.test_correlation #10280

Closed
marcoabreu opened this issue Mar 27, 2018 · 16 comments

Comments

@marcoabreu
Contributor

marcoabreu commented Mar 27, 2018

======================================================================
FAIL: test_operator.test_correlation
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 2296, in test_correlation
    unittest_correlation((5,1,15,15), kernel_size = 1,max_displacement = 5,stride1 = 1,stride2 = 1,pad_size = 5,is_multiply = False, dtype = dtype)
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 2266, in unittest_correlation
    assert_almost_equal(exe1.grad_dict['img1'].asnumpy(), grad1, rtol=1e-3, atol=1e-4)
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 24.067389 exceeds tolerance rtol=0.001000, atol=0.000100.  Location of maximum error:(4, 0, 7, 1), a=85.000000, b=83.000000
 a: array([[[[ 97.,  93.,  47., ...,  47., 117.,  59.],
         [ 47.,  43.,  89., ...,  77., 107., 117.],
         [ 45.,  47.,  99., ...,  31.,  31.,  41.],...
 b: array([[[[ 97.,  93.,  47., ...,  47., 117.,  59.],
         [ 47.,  43.,  89., ...,  77., 107., 117.],
         [ 45.,  47.,  99., ...,  31.,  31.,  41.],...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=159153780 to reproduce.
--------------------- >> end captured logging << ---------------------

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/550/pipeline/
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10213/7/pipeline/

https://issues.apache.org/jira/browse/MXNET-239
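For readers puzzling over the "Error 24.067389" figure in the log above: it appears to be the absolute difference divided by the allowed tolerance, `atol + rtol * |b|`, so any value above 1 is a failure. The sketch below is not MXNet's code; `max_violation` is a hypothetical helper name, and the formula is an assumption that happens to reproduce the reported number from the `a`, `b`, `rtol` and `atol` values in the log.

```python
def max_violation(a, b, rtol, atol):
    """Ratio of |a - b| to the allowed tolerance; above 1.0 means a failure.

    Hypothetical helper: the formula is an assumption inferred from the
    numbers in the log, not copied from mxnet.test_utils.
    """
    tolerance = atol + rtol * abs(b)
    return abs(a - b) / tolerance


# Values taken from the failure message at location (4, 0, 7, 1).
a, b = 85.0, 83.0
violation = max_violation(a, b, rtol=1e-3, atol=1e-4)
print(f"{violation:.6f}")  # 24.067389
```

In other words, the gradient at that location is off by 2.0 where only about 0.083 was allowed, which is far too large to be floating-point noise.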

@marcoabreu marcoabreu added this to To Do in Tests Improvement via automation Mar 27, 2018
@ThomasDelteil
Contributor

@zheng-da
Contributor

@zheng-da
Contributor

It seems this one can be reproduced with certain seeds.
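A minimal sketch of pinning the seed from the original report, assuming the test is run with nose from the root of an MXNet checkout. `MXNET_TEST_SEED` and the seed value come from the captured log; the helper name `build_repro_command` and the exact `nosetests` invocation are assumptions for illustration.

```python
import os


def build_repro_command(seed, test_spec="tests/python/unittest/test_operator.py:test_correlation"):
    """Return (env, argv) for rerunning one test with a fixed seed.

    Hypothetical helper: the nosetests invocation is an assumption; only
    the MXNET_TEST_SEED variable itself appears in the CI log.
    """
    env = dict(os.environ)              # copy, so the caller's environment is untouched
    env["MXNET_TEST_SEED"] = str(seed)  # environment values must be strings
    argv = ["nosetests", "--verbose", test_spec]
    return env, argv


env, argv = build_repro_command(159153780)
print("MXNET_TEST_SEED=" + env["MXNET_TEST_SEED"], " ".join(argv))

# To actually run it inside an MXNet checkout:
# import subprocess
# subprocess.run(argv, env=env, check=False)
```

Building the command separately from running it keeps the sketch harmless on a machine without MXNet installed.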

@eric-haibin-lin
Member

@haojin2 is looking into this

@szha szha self-assigned this Jun 23, 2018
@szha
Member

szha commented Jul 2, 2018

Seems already fixed in #10135

@szha szha closed this as completed Jul 2, 2018
Tests Improvement automation moved this from To Do to Done Jul 2, 2018
@marcoabreu marcoabreu reopened this Jul 2, 2018
Tests Improvement automation moved this from Done to In progress Jul 2, 2018
@marcoabreu
Contributor Author

@szha this issue was created after that PR had already been merged.

@haojin2
Contributor

haojin2 commented Jul 2, 2018

@marcoabreu I think #10533 was fixing this flaky test

@szha
Member

szha commented Jul 2, 2018

@marcoabreu my bad for including the wrong PR. Have there been any more recent occurrences of this flaky test?

@haojin2
Contributor

haojin2 commented Jul 2, 2018

@marcoabreu This actually reminds me of the issue that CI sometimes does not automatically use the latest master branch. Did you have a chance to take a look at that problem?

@marcoabreu
Contributor Author

CI fails to use the latest master only if you trigger a rerun in the web interface instead of pushing a new commit. Otherwise, it should be working fine.

@haojin2
Contributor

haojin2 commented Jul 2, 2018

And I guess this issue could be closed now? I'm not seeing any new reports of this issue after the fix was merged.

@szha
Member

szha commented Jul 2, 2018

@haojin2 could you report an issue on the CI problem you faced instead?

@marcoabreu is this issue still relevant? Do you have more errors on this test that we should look into?

@haojin2
Contributor

haojin2 commented Jul 2, 2018

@marcoabreu Well, then I guess we should inform everyone who has permission to re-run builds about this?

@marcoabreu
Contributor Author

Everybody should be aware. There are not many people who have that permission.

I will check for data tomorrow

@haojin2
Contributor

haojin2 commented Jul 2, 2018

@marcoabreu I just asked 2 committers from my team and neither of them was aware of this issue, so I guess we still need to inform the people concerned?
And please close the issue once you can confirm there are no more recent occurrences.

@marcoabreu
Contributor Author

I think I explained it on dev@ a few weeks ago. But yeah, in general, we don't want anybody to use the retrigger function in the web interface, and we consider it a hidden, undocumented feature. I'll send out an email when I have time.

Closing since there have not been any recent failures.

Tests Improvement automation moved this from In progress to Done Jul 3, 2018