This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix nightly #15141

Merged
merged 3 commits into from Jun 4, 2019


@roywei roywei commented Jun 3, 2019

Description

Fix nightly test failure: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/333/pipeline/143

  1. The large tensor test requires more shared memory (Docker OOM) -> increase the shared memory size to 10G.
  2. The tutorial test requires more GPU memory (CUDA malloc OOM) -> switch to P3 instances, which have larger GPU memory.
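
For readers unfamiliar with the first fix, here is a minimal sketch of how a container's shared memory can be raised through Docker's `--shm-size` flag. It is not the actual change in this PR: the helper name, the image name, and the test command are all hypothetical, and the real CI scripts may pass the value differently.

```python
import shlex


def docker_run_command(image, command, shm_size="10g"):
    """Build a `docker run` argv list with an enlarged /dev/shm.

    `--shm-size` is a standard `docker run` flag; the image and command
    passed in are placeholders supplied by the caller.
    """
    return ["docker", "run", "--rm", f"--shm-size={shm_size}", image] + shlex.split(command)


if __name__ == "__main__":
    # Hypothetical image and script names, for illustration only.
    argv = docker_run_command("example/nightly-image", "python run_nightly.py")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later handed to `subprocess.run`.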

We should really enforce running the nightly tests on PRs that change nightly tests.
Fixing failures afterwards takes a long time for contributors who are not familiar with CI.

roywei commented Jun 4, 2019

For the AMP tutorial, I have verified on an AWS P3.8 instance that the nightly job passes. The same verification was done in #15135.

@jlcontreras jlcontreras left a comment

Looks good, thanks for helping fix this!

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 134a3e8 into apache:master Jun 4, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* fix nightly

* disable large tensor

* update issue link
@roywei roywei deleted the fix_nightly branch July 2, 2019 23:27