[WIP] Check for stable reproduction on bug with FC subgraph by anko-intel · Pull Request #20878 · apache/mxnet

anko-intel · 2022-02-07T09:54:40Z

Description

Set the random seed with which the test fails (see https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-20871/1/pipeline/294/)
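Why pinning a seed gives a stable reproduction can be sketched without MXNet. The traceback passes through a wrapper in `common.py` (`test_new`), which appears to be where the test suite seeds its generators, but the helper below is not that code. It is a hypothetical `with_fixed_seed` decorator that seeds only Python's `random` module for the duration of one test and then restores the previous RNG state. The same seed yields the same random inputs, so an input-dependent failure comes back on every run. A real MXNet test would also have to seed NumPy and MXNet's own generator (`mx.random.seed`), which this stdlib-only sketch does not do.

```python
import functools
import random


# Hypothetical helper (not MXNet's decorator): seeds only Python's `random`
# module for one call and restores the global RNG state afterwards.
def with_fixed_seed(seed):
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            saved_state = random.getstate()
            random.seed(seed)
            try:
                return test_fn(*args, **kwargs)
            finally:
                # Restore the previous state so other tests are unaffected.
                random.setstate(saved_state)
        return wrapper
    return decorator


@with_fixed_seed(1234)  # hypothetical seed value, not the one from the CI run
def make_inputs(n=4):
    """Stand-in for a test body that draws random inputs."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Same seed -> identical inputs on every call, so a failure caused by
# particular inputs reproduces deterministically.
assert make_inputs() == make_inputs()
```

Restoring the state in `finally` keeps the pinned seed local to the one test, even when that test raises an assertion.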

Part of the log:
[2022-02-03T22:17:49.230Z] FAIL: test_subgraph.test_pos_fc_sum
[2022-02-03T22:17:49.230Z] ----------------------------------------------------------------------
[2022-02-03T22:17:49.230Z] Traceback (most recent call last):
[2022-02-03T22:17:49.230Z] File "/usr/local/lib/python3.7/dist-packages/nose/case.py", line 198, in runTest
[2022-02-03T22:17:49.230Z] self.test(*self.arg)
[2022-02-03T22:17:49.230Z] File "/work/mxnet/tests/python/mkl/../unittest/common.py", line 218, in test_new
[2022-02-03T22:17:49.230Z] orig_test(*args, **kwargs)
[2022-02-03T22:17:49.230Z] File "/work/mxnet/tests/python/mkl/test_subgraph.py", line 912, in test_pos_fc_sum
[2022-02-03T22:17:49.230Z] check_quantize_fc_sum(net, inputs, out_type, attrs, quantize_mode = quantize_mode)
[2022-02-03T22:17:49.230Z] File "/work/mxnet/tests/python/mkl/test_subgraph.py", line 902, in check_quantize_fc_sum
[2022-02-03T22:17:49.230Z] assert_almost_equal_with_err(quantized_out[i].asnumpy(), ref_out[i].asnumpy(), rtol=0.1, atol=atol, etol=0.2)
[2022-02-03T22:17:49.230Z] File "/work/mxnet/python/mxnet/test_utils.py", line 818, in assert_almost_equal_with_err
[2022-02-03T22:17:49.230Z] raise AssertionError(msg)
[2022-02-03T22:17:49.230Z] AssertionError:
[2022-02-03T22:17:49.230Z] Items are not equal:
[2022-02-03T22:17:49.230Z] Error 9.747497 exceeds tolerance rtol=1.000000e-01, atol=6.566026e+00 (mismatch 100.000000%).
[2022-02-03T22:17:49.230Z] Location of maximum error: (0, 1), a=-63.16788483, b=33.04635620
[2022-02-03T22:17:49.230Z] ACTUAL: array([[ 0. , -63.167885 , 0. , ..., -33.621616 ,
[2022-02-03T22:17:49.230Z] 63.167885 , -0.5094184]], dtype=float32)
[2022-02-03T22:17:49.230Z] DESIRED: array([[-35.753857, 33.046356, -44.51747 , ..., 48.12758 , 31.732088,
[2022-02-03T22:17:49.230Z] 65.660255]], dtype=float32)
[2022-02-03T22:17:49.230Z] -------------------- >> begin captured stdout << ---------------------
[2022-02-03T22:17:49.230Z]
[2022-02-03T22:17:49.230Z] *** Maximum errors for vector of size 10: rtol=0.1, atol=6.566025543212891
[2022-02-03T22:17:49.230Z]
[2022-02-03T22:17:49.230Z] 1: Error 9.747497 Location of error: (0, 1), a=-63.16788483, b=33.04635620
[2022-02-03T22:17:49.230Z] 2: Error 9.069855 Location of error: (0, 6), a=-56.03602600, b=37.80991745
[2022-02-03T22:17:49.230Z] 3: Error 7.184352 Location of error: (0, 7), a=-33.62161636, b=48.12757874
[2022-02-03T22:17:49.230Z] 4: Error 5.038792 Location of error: (0, 9), a=-0.50941843, b=65.66025543
[2022-02-03T22:17:49.230Z] 5: Error 4.040514 Location of error: (0, 2), a=0.00000000, b=-44.51747131
[2022-02-03T22:17:49.230Z] 6: Error 3.525531 Location of error: (0, 0), a=0.00000000, b=-35.75385666
[2022-02-03T22:17:49.230Z] 7: Error 3.289533 Location of error: (0, 4), a=0.00000000, b=-32.18727493
[2022-02-03T22:17:49.230Z] 8: Error 3.227748 Location of error: (0, 8), a=63.16788483, b=31.73208809
[2022-02-03T22:17:49.230Z] 9: Error 1.703138 Location of error: (0, 3), a=25.98033905, b=12.64403915
[2022-02-03T22:17:49.230Z] 10: Error 1.104254 Location of error: (0, 5), a=0.00000000, b=-8.15059280
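The per-element errors in the captured stdout are consistent with a score of |a − b| / (atol + rtol·|b|), where a value above 1 means the element is out of tolerance. This formula is inferred from the logged numbers, not read from the `assert_almost_equal_with_err` source. Under that assumption, the sketch below recomputes the scores from the ten (a, b) pairs in the log and counts how many exceed 1. It also assumes that `etol=0.2` (passed in `check_quantize_fc_sum`) bounds the allowed fraction of mismatched elements. All ten elements are out of tolerance, which gives the reported 100% mismatch, far above 20%. The names `violation_ratio` and `PAIRS` are hypothetical.

```python
# (a, b) pairs copied from the captured stdout: a = ACTUAL, b = DESIRED,
# in the order the log prints them.
PAIRS = [
    (-63.16788483, 33.04635620),
    (-56.03602600, 37.80991745),
    (-33.62161636, 48.12757874),
    (-0.50941843, 65.66025543),
    (0.00000000, -44.51747131),
    (0.00000000, -35.75385666),
    (0.00000000, -32.18727493),
    (63.16788483, 31.73208809),
    (25.98033905, 12.64403915),
    (0.00000000, -8.15059280),
]
RTOL = 0.1
ATOL = 6.566025543212891
ETOL = 0.2


def violation_ratio(a, b, rtol, atol):
    """Assumed metric: |a - b| / (atol + rtol * |b|); > 1 means out of tolerance."""
    return abs(a - b) / (atol + rtol * abs(b))


ratios = [violation_ratio(a, b, RTOL, ATOL) for a, b in PAIRS]
mismatch = sum(r > 1.0 for r in ratios) / len(ratios)

# In this run atol coincides with 0.1 * max|DESIRED|; whether the test
# derives it that way is not shown in the log.
atol_guess = 0.1 * max(abs(b) for _, b in PAIRS)

for (a, b), r in zip(PAIRS, ratios):
    print(f"a={a:12.8f} b={b:12.8f} error={r:.6f}")
print(f"mismatch fraction = {mismatch:.0%} (etol = {ETOL:.0%})")
```

The smallest score, about 1.10 at location (0, 5), still exceeds 1. The failure therefore cannot be a few outliers: the quantized output disagrees with the reference everywhere.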

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

mxnet-bot · 2022-02-07T09:54:45Z

Hey @anko-intel, thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, website, unix-gpu, clang, miscellaneous, edge, sanity, windows-cpu, unix-cpu, windows-gpu, centos-cpu]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

anko-intel · 2022-02-17T11:09:02Z

@mxnet-bot run ci [unix-cpu]

mxnet-bot · 2022-02-17T11:09:08Z

Jenkins CI successfully triggered: [unix-cpu]

Set random seed to have stable fail reproduction

2367f46

mseth10 added the pr-work-in-progress label (PR is still work in progress) Feb 7, 2022

anko-intel closed this Apr 28, 2023

anko-intel wants to merge 1 commit into apache:v1.x from anko-intel:anko_v1x_fc_add_issue

3 participants
