Refactor kvstore test #13140

larroy · 2018-11-06T19:02:40Z

Description

Refactored the kvstore test: it is now skipped when no GPU is available, and includes minor fixes to the environment-variable logic and to duplicated code.
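As an illustration of the environment handling this kind of test needs, here is a minimal sketch of a scoped environment-variable helper. It is not the code from this PR: `scoped_env` and the variable name `EXAMPLE_FLAG` are hypothetical stand-ins. The point is that a test can set a variable for one block and have the previous state restored even if the block raises.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(key, value):
    """Temporarily set os.environ[key]; restore the previous state on exit.

    Hypothetical helper for illustration only, not the PR's implementation.
    """
    previous = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


if __name__ == "__main__":
    # EXAMPLE_FLAG is a made-up variable name used only for this demo.
    with scoped_env("EXAMPLE_FLAG", "1"):
        print("inside:", os.environ["EXAMPLE_FLAG"])
    print("after:", os.environ.get("EXAMPLE_FLAG"))
```

Restoring in a `finally` block keeps one test's settings from leaking into the next test when an assertion fails midway.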

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

anirudhacharya

LGTM. @anirudh2290 for review and merge

anirudhacharya · 2018-11-06T22:31:37Z

@mxnet-label-bot [pr-awaiting-review]

vishaalkapoor · 2018-11-07T21:23:57Z

tests/python/gpu/test_device.py

@@ -19,35 +19,23 @@
 import numpy as np
 import unittest
 import os
+import logging
+
+from mxnet.test_utils import EnvManager

 shapes = [(10), (100), (1000), (10000), (100000), (2,2), (2,3,4,5,6,7,8)]
 keys = [1,2,3,4,5,6,7]
 num_gpus = len(mx.test_utils.list_gpus())


nit: you can use mx.context.num_gpus()

Right, then we should deprecate the other function, which calls nvidia-smi directly.
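A rough sketch of how the suggestion could combine with skipping the test on hosts without a GPU. The helper `count_gpus` and the test class are hypothetical and not taken from this PR. The fallback to 0 when MXNet is missing or the query fails is an assumption made so the snippet runs anywhere.

```python
import unittest


def count_gpus():
    """Return the number of visible GPUs, or 0 if it cannot be determined.

    Hypothetical helper: it wraps mx.context.num_gpus() and treats any
    failure (MXNet not installed, build without CUDA) as "no GPU".
    """
    try:
        import mxnet as mx
        return int(mx.context.num_gpus())
    except Exception:
        return 0


class KVStoreDeviceTest(unittest.TestCase):
    # Skip instead of failing when the host has no GPU.
    @unittest.skipIf(count_gpus() < 1, "requires at least one GPU")
    def test_needs_gpu(self):
        self.assertGreaterEqual(count_gpus(), 1)


if __name__ == "__main__":
    unittest.main()
```

With the skip condition in a decorator, a missing GPU is reported as a skip in the test output rather than as a failure.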

vishaalkapoor · 2018-11-07T21:24:11Z

LGTM

…3207)

This reverts commit d8d2d6e.

* Revert "Refactor kvstore test (#13140)". This reverts commit d8d2d6e.
* Revert "[MXNET-793] Virtualized ARMv7 with Qemu CI integration (#13203)". This reverts commit fd3dedc.
* Revert "Sphinx failure fixes (#13213)". This reverts commit 2e4d6c8.

* Improve Clojure Symbol tutorial * Add namespace docstring * Bring verbiage up to date with https://mxnet.incubator.apache.org/api/clojure/symbol.html * Add newlines for readability and to keep line length <80 * Fix missing end-code-block in Clojure NDArray API docs * Improve Clojure NDArray tutorial * Add namespace docstring * Bring verbiage up to date with https://mxnet.incubator.apache.org/api/clojure/ndarray.html * Add newlines for readability and to keep line length <80 * Improve Clojure KVStore tutorial * Add namespace docstring * Bring verbiage up to date with https://mxnet.incubator.apache.org/api/clojure/kvstore.html * Add newlines for readability and to keep line length <80 * [MXNET-1017] Updating the readme file for cpp-package and adding readme file for example directory. (#12773) * Updating the readme file for cpp-package and adding readme file for example directory. * Updating the readme file for cpp-package and adding readme file for example directory. * Addressed the review comments. 
* Addressed the review comments * Fail the broken link job when broken links are found (#12905) * Fix typo in formula in docstring for GRU cell and layer and add clarification to description (gluon.rnn) (#12896) * Fix typo in GRU cell and layers (gluon.rnn) docstring * empty * fix the paths issue for downloading script (#12913) * removed unused header (#13066) * Moves f16c autodetection to its own cmake module (#12331) * Set correct update on kvstore flag in dist_device_sync mode (#12786) * Set correct update on kvstore flag in dist_device_sync mode * Add warning message for batch-size change in dist mode * Empty commit * Fix lint issues * ONNX export: Cleanup (#12878) * ONNX export: Cleanup input retrieval - Create a common function to get inputs for conversion functions - Do not register functions if onnx is not found * ONNX export: Add helper for creating node * Maven Surefire bug workaround (#13081) * remove legacy installation of Roxygen2 5.0 and add R-specific clean target (#12993) (#12998) * remove installation of legacy Roxygen2 vers. 5.0 * add R-specific clean target (#12993) * fixup! remove installation of legacy Roxygen2 vers. 5.0 * fixup! remove installation of legacy Roxygen2 vers. 
5.0 * Gluon LSTM Projection and Clipping Support (#13056) * support projection in LSTM * add tests * update rnn to use cudnn ex * extend cudnn test to handle different versions * add lstm clip * use CUDNN_VERSION * merge USE_CUDNN_LSTM_CLIP and USE_CUDNN_LSTM_PROJ * assign false value to clip nan explicitly to RNN and GRU * update test * fix readme (#13082) * [MXNET-1180] Scala Image API (#12995) * add image and image suite * apply toImage function and tests * bug fix * apply the commented change * add test to apply border * fix scalastyle * [MXNET-793] Virtual testing with Qemu, refinement and extract test results to root MXNet folder (#13065) * Improve Qemu infrastructure Add documentation about running it interactively * Separate provision * Improve provisioning * Refine provisioning and interactive * Cant provision when the volumes arent mounted * Fix running tests * raise log output to INFO * adjust logging * flush stdout and stderr * Refine by copying test results back to the host * Fix license * remove config file and different way to run QEMU * remove config file and different way to run QEMU, remove ansible * Updated / Deleted some examples (#12968) * Updated / Deleted some examples * remove onnx test * remove onnx test * Fix variable name in tutorial code snippet (#13052) Fixes incorrect variable name in tutorial code as raised in issue https://github.com/apache/incubator-mxnet/issues/13051 * customized take forward for CPU (#12997) * Update module example (#12961) * Update Module example * trigger CI * ONNX export: Scalar, Reshape - Set appropriate tensor type (#13067) np.array sets default dtype to float64 which is not supported by ONNX. Setting these to appropriate type. 
* Fix example for mxnet.nd.contrib.cond and fix typo in src/engine (#12954) * fix typo in src/engine * fix example for mx.nd.contrib.cond * Improve the Clojure Package README to Make it Easier to Get Started (#12881) * Improve the README and make it easier to get started * Implement feedback from @ChaiBapchya and @daveliepmann * combined deps * Add wget Co-Authored-By: gigasquid <cmeier@gigasquidsoftware.com> * WIP: update readme * WIP: readme option 3 * Add section links to Clojure README Link each install option with the corresponding README section containing instructions for that option. An existing link to Maven search is removed because it interferes with the section links and it is replicated in the Option 1 instructions below. Per my PR suggestion: https://github.com/apache/incubator-mxnet/pull/12881/files/22bbe55d8d62be9ff3aebf693f73fa6049afc01d#r226822148 * fix typo Co-Authored-By: gigasquid <cmeier@gigasquidsoftware.com> * fix formatting Co-Authored-By: gigasquid <cmeier@gigasquidsoftware.com> * fix formatting Co-Authored-By: gigasquid <cmeier@gigasquidsoftware.com> * fix link Co-Authored-By: gigasquid <cmeier@gigasquidsoftware.com> * Some more updates for the Clojure README * [MXNET-918] Introduce Random module / Refact code generation (#13038) * refactor code gen * remove xxxAPIMacroBase (overkill) * CI errors / scala-style * PR review comments * Fix a typo in operator guide (#13115) * Fix the operator API documentation * update message * deprecate old command * fix typo in op guide * [Issue #11912] throw mxnet exceptions when decoding invalid images. (#12999) * Raise an excption when passing an empty buffer to imdecode. * src/io/image_io.cc: Check the length of the input buffer. * tests/python/unittest/test_image.py: Update the (already existing) test to expect a mx.base.MXNetError. * Raise an exception when passing an invalid data buffer to imdecode. * src/io/image_io.cc: Raise an exception when the image could not be decoded instead of just logging. 
* tests/python/unittest/test_image.py: Add a new test test_imdecode_invalid_image. * Raise an exception when passing an invalid data buffer to imdecode. * src/io/image_io.cc: Raise an exception when the image could not be decoded instead of just logging. * tests/python/unittest/test_image.py: Add a new test test_imdecode_invalid_image. * Rollback a "empty buffer" check in the image python bindings that's now more generally handled in the core code. * python/mxnet/image/image.py: remove buffer length check. * Update adversary attack generation example (#12918) * Fix adversary example generation * Update README.md * Fix test_utils.list_gpus() * fix unused variable * Disable travis tests (#13137) * Update Gluon example folder (#12951) * Reorganized the Gluon folder in example * trigger CI * update reference * fix out of place accumulation * Document the newly added env variable (#13049) * add env variable to choose deterministic cudnn alg * set default value to false * fix build failure in Windows GPU * revert the previous change * only check determinism in CUDNN 7.x release * Add cudnn version check * fix lint error * document env variable MXNET_ENFORCE_DETERMINISM * use cudnnGet instead of cudnnFind when determinism required * Revert "use cudnnGet instead of cudnnFind when determinism required" This reverts commit d1bdf0f38f50b8c499f22ae1d50770b819f27678. 
* Updated CONTRIBUTORS.md to include mxnet-label-bot (#13048) * Updated CONTRIBUTERS.md to include label-bot * Created section for label bot and included wiki page * Moved Label Bot section in CONTRIBUTORS.md file to a more convenient location * Retriggering * Fix docker cleanup race condition (#13092) * Improved git reset for CI builds (#12784) * Refactor L2_normalization (#13059) * Refactor L2_normalization * Fix windows build * Fix windows build * Move cpu optimization into l2_normalization.cc * Retrigger CI * Retrigger CI * Fix variational autoencoder example (#12880) * Add documentation on GPU performance on Quantization example (#13145) * Add documentation on GPU performance * Update README.md * [MXNET-1194] Reenable nightly tutorials tests for Python2 and Python3 (#13099) * Reenable nightly tests tutorials * small fix to settings * optimize a few more tutorials * Update tests * Update runtime_functions.sh * Update fine_tuning_gluon.md * Update JenkinsfileForBinaries * Update JenkinsfileForBinaries * remove coverage * Update dec example (#12950) * update dec example * trigger CI * update to remove dependency on sklearn data * Update MKL-DNN dependency (#12953) * update mkldnn and fix conv/deconv * fix * fix indent * fix cmake * fix cmake * fix cpp test for mkldnn * fix typo * fix conficts after merge * debug: remove 5d test * debug: remove 4d test * add comments * debug: remove 2d test * update mklml in ci * fix mklml * Revert "fix mklml" This reverts commit 328a22a373c49aacb914badd0db431bfbc8234f3. * Revert "update mklml in ci" This reverts commit 9ff3687892f85f43b8eac72ba935ceda928ae7e8. * Revert "debug: remove 2d test" This reverts commit 32551b3662fc30d5c9758a86c7664b4f2e367128. * Revert "debug: remove 4d test" This reverts commit 5412d643c2b00ce54c05e7387aca6779dee120d5. * Revert "debug: remove 5d test" This reverts commit 1fe9f8806d29c765e05f91c584799a947af2eb1d. 
* debug illegal core dump * debug illegal core dump * Revert "debug illegal core dump" This reverts commit 39321d578ae589465c0d4edcae7f92b88fdf3feb. * Revert "debug illegal core dump" This reverts commit 153b068b6d3a18a33f399076d3420ac42f2bc387. * change cmake * pin mkldnn version to 0.17rc * change format number * remove include directories in cmake * fix cpp test * address cpplint complaint * remove comment code * update mkldnn head * License header (#13178) * Minor fix to license_header documentation * Handle UnicodeError when checking license * Updated capsnet example (#12934) * Updated capsnet * trigger CI * Update README.md * Updates to several examples (#13068) * Minor updates to several examples * fix typo * update following review * Fix Sphinx python docstring formatting error. (#13177) * [Doc] Fix repo paths in Ubuntu build doc (#13101) * [Doc] Fix repo paths in Ubuntu build doc * [Doc] Use relative path in Ubuntu build doc * Update scala intellij tutorial (#12827) * Update scala intellij tutorial Update mxnet version log4j fixes Instructions from source * Remove version numbers and various improvements * Improve cpp-package example project build files. (#13093) 1. Change output to build folder. 2. Remove files that not been deleted by make clean. * Fix Sphinx document parsing error. (#13195) Fixes #12935 * Fix #13090, Add image.imread to python API doc. (#13176) * Fix Sphinx docstring formatting error. (#13004, #13005, #13006) (#13175) * Fix #12944, Fix Sphinx python docstring formatting error. (#13174) * Fix #13013, Fix Sphinx python docstring error. (#13173) * update the README (#13186) * Fixed Sparse astype doc string formatting error (#13171) * Fix problem with some OSX not handling the cast on imDecode (#13207) * Port of scala Image API to clojure (#13107) * Port of scala Image API to clojure * Minor style changes * Add specs and other minor fixes * Fix unit tests (:facepalm:) * Fixed Documentation issues (#13215) 1. 
mxnet.metric.EvalMetric.get_config doc error 2. mxnet.module.SequentialModule.add doc error * update the doc (#13205) * Fix Sphinx doc errors (#13170) * Fix Sphinx python docstring error: initializer.InitDesc (#12939) (#13148) * Fix Sphinx python docstring error: text contrib module (#12949) (#13149) * Sphinx failure fixes (#13213) * [MXNET-793] Virtualized ARMv7 with Qemu CI integration (#13203) * Testing just ndarray, since otherwise we require test refactoring which will be done later * Add QEMU ARMv7 test stage to CI * test_ndarray fails, so change for test_engine until UT are fixed in ARM * Refactor kvstore test (#13140) * Refactor kvstore test * Fix pylint * Fix problem with some OSX not handling the cast on imDecode (#13207) * Fix num_gpus * remove unused variable rotateM_ (#10803) * Revert "Sphinx failure fixes" (#13230) * Revert "Refactor kvstore test (#13140)" This reverts commit d8d2d6ef3d688a465e47f7170c2a11da804c2835. * Revert "[MXNET-793] Virtualized ARMv7 with Qemu CI integration (#13203)" This reverts commit fd3dedc621919b6fee7d8ca7fa2a85749e190907. * Revert "Sphinx failure fixes (#13213)" This reverts commit 2e4d6c8c1064b74d4e1c1b3441c2ecf12b81c6e2. 
* [MXNET-953] Fix oob memory read (#12631) * update log4j version of Scala package (#13131) * Disable Flaky test test_operator.test_clip (#12902) * Update multi-task learning example (#12964) * Update multi task learning example * Updating README.md * Update MKLML dependency (#13181) * update mkml * refine DownloadMKLML.cmake * merge DownloadMKLML.cmake from #11148 * fix mkldnn release version * fix windows compilation * Add --no-cache option to build.py when building containers (#13182) Add functionality to build.py to disable caching * Tool to ease compilation and reproduction of test results (#13202) * Add tool to simplify reproducing tests * add local build * Add cmake_options.yaml * minor * Fix license * Fix licenses * Rename file, address CR comments about gpu build function * Address Marco's comments * support for upper triangular matrices in linalg (#12904) * Fix Sphinx python docstrings (#13160) * Doc fixes * addressing feedback * base_module fix * fixing cross-reference issues * Implemented a regression unit test for #11793 (#12975) When using C++-based iterators, it's important that only a single batch is referenced at a time. Because C++ iterators are exposed to the Python code through a C API, there is no concept of reference counting. Hence, typically C++ iterators will deallocate a batch when next() is called on them. So, we need to make sure the Python code only references a single batch at a time, otherwise the Python code will attempt to access freed memory, resulting in either (a) garbage accuracy or (b) a segmentation fault. The test passes with the latest mxnet build. I verified it failed on previous releases, such as mxnet==1.2.0. 
* Add Java API docs generation (#13071) * add Java API docs generation; split out from Scala API docs * bumping file for ci * make scala docs build compatible for 2.11.x and 2.12.x scala fix typo * fix exit bug * Fix Sphinx error in ONNX file (#13251) * [Example] Fixing Gradcam implementation (#13196) * fixing gradcam * changed loading parameters code * fixing type conversions issue with previous versions of matplotlib * Fix test failure due to hybridize call in test_gluon_rnn.test_layer_fill_shape (#13043) * Restore hybridize call in test_gluon_rnn.test_layer_fill_shape * reset bulk_size when cached op forward hit error to fix the test failure * add try-catch block to reset bulk_size in more places to prevent potential bugs * more cleanup upon exception in Imperative::Backward * Addressed sphinx build issue (#13246) * Add gauss err function operator (#13229) * erf register gpu * add doc * Add Turing and Volta support to arch_name (#13168) * Bugfix in ci/docker_cache.py (#13249) * Fix scaladoc build errors (#13189) * Fix scaladoc errors from missing classpath Remove duplicate scalastyle plugin * Fix scaladoc warnings Also enable and fix all feature and deprecation warnings * Add missing documentations for getnnz (#13128) * Addressed ONNX module documentation warnings and added notes for short-form representation (#13259) * Manually track num_max_thread (#12380) * use cached version of get thread max * reserve core affects omp singleton * omp_thread_max_ updated in one line * remove enabled block * add brackets * re-add excluded reserved * add missing var * refactor macro * adding unit test for MKLDNN FullyConnected operator (#12985) * adding unit test for MKLDNN FullyConnected operator * removing mkldnn filter * removing mkldnn filter * Doc fixes (#13256) * fix train mnist for inception-bn and resnet (#13239) * Fix a bug in index_copy (#13218) * fix. * add test. 
* retrigger * Addressed doc issues (#13165) * Addressed doc issues * Update optimizer.py * Force APT cache update before executing install (#13285) * [Example] Gradcam consolidation in tutorial (#13255) * fixing gradcam * changed loading parameters code * fixing type conversions issue with previous versions of matplotlib * gradcam consolidation * creating directory structures in utils * changing location * empty commit * [MXNET-1203] Tutorial infogan (#13144) * Adding info_gan example * adjust paths of filenames * Update index.md * Update index.md * Update index.md * Update info_gan.md Added an image * Update info_gan.md Applied some fixes * Update info_gan.md Applied some fixes * Update info_gan.md Applied some fixes * Update info_gan.md * Updated index.md file * Updated index.md file * change links * Fixed typo * Delete Untitled.ipynb * Adding Vishaals comments * Adding Anirudh's comments * Fixed some bugs * Adding Anirudh's comments * some minor fixes * Remove obsolete memory cost example (#13235) * stop gap fix to let website builds through; scaladoc fix pending (#13298) * Fix Sphinx errors in box_nms (#13261) * Fix Sphinx errors (#13252) * Sphinx errors in Gluon (#13275) * Fix Sphinx python docstring formatting error. (#13194) * Fix Sphinx python docstring formatting error (#13021). Fixes #13021 * Update src/operator/nn/batch_norm.cc Co-Authored-By: frankfliu <frankfliu2000@gmail.com> * Visualization doc fix. Added notes for shortform (#13291) * Addressed "dumplicate object reference" issues (#13214) * Update basic_layers.py (#13299) * add url and license to clojure package project (#13304) * [Example] Add docstring for test optimizer and test score (#13286) * update the doc for test_optimizer * add docstring for test_score * [Example] Update cpp example README (#13280) * update the README to solve the library cannot find problem * fix the broken format * remove redundancy and broken format * add . 
* [Example]update NER example readme on module prediction (#13184) * update readme on module prediction * fix typo * update url * improve grammar * update link * [MXNET-1198] MXNet Java API (#13162) * [MXNET-984] Add Java NDArray and introduce Java Operator Builder class (#12816) * clean history and add commit * add lint header * bypass the java unittest when make the package * clean up redundant test * clean spacing issue * revert the change * clean up * cleanup the JMacros * adding line escape * revert some changes and fix scala style * fixes regarding to Naveen's comment * Java Inference api and SSD example (#12830) * New Java inference API and SSD example * Adding license to java files and fixing SSD example * Fixing SSD example to point to ObjectDetector instead of ImageClassifier * Make scripts for object detector independent to os and hw cpu/gpu * Added API Docs to Java Inference API. Small fixes for PR * Cosmetic updates for API DOCS requested during PR * Attempt to fix the CI Javafx compiler issue * Migrate from Javafx to apache commons for Pair implementation * Removing javafx from pom file * Fixes to appease the ScalaStyle deity * Minor fix in SSD script and Readme * Added ObjectDetectorOutput which is a POJO for Object Detector to simplify the return type * Removing Apache Commons Immutable Pair * Adding license to new file * Minor style fixes * minor style fix * Updating to be in scala style and not explicitly declare some unnecessary variables * NativeResource Management in Scala (#12647) (#12883) * add Generic MXNetHandle trait and MXNetHandlePhantomRef class that will be used by all MXNetObjects * Generic Handle with AutoCloseable * add NativeResource and NativeResourceManager with Periodic GC calling * use NativeResource trait in NDArray, Symbol and Executor * add run train mnist script * create a Generic ResourceScope that can collect all NativeResources to dispose at the end * modify NativeResource and ResourceScope, extend NativeResource in 
NDArray, Symbol and Executor * remove GCExecutor * deRegister PhantomReferences by when calling dispose() * add Finalizer(temporary) to NativeResource * refactor NativeResource.dispose() method * update NativeResource/add Unit Test for NativeResource * updates to NativeResource/NativeResourceRef and unit tests to NativeResource * remove redundant code added because of the object equality that was needed * add ResourceScope * Fix NativeResource to not remove from Scope, add Unit Tests to ResourceScope * cleanup log/print debug statements * use TreeSet inplace of ArrayBuffer to speedup removal of resources from ResourceScope Fix Executor dispose and make KVStore a NativeResource * fix segfault that was happening because of NDArray creation on the fly in Optimizer * Add comments for dispose(param:Boolean) * Added unit tests for Resource Scope in Java (#12955) * Bumping down minimum java support from 8 to 7 (#12965) * [MXNET-984] Java NDArray Documentation Generation (#12835) * cherry pick javaDoc changes * update NDArray changes * refactoring change and merge all docGen in a single place * clean the scalastyle * take on Piyush nit * drop the comments * First pass at adding JavaDocs for new java api classes (#12963) * First pass at adding JavaDocs for new java api classes * Fix a scalastyle issue * Updating JavaDoc based on feedback * [MXNET-1160] add Java build/run example (#12969) * add example * clean up nit * find the pain point * add java tut into whitelist * Trigger CI * add java demo and split scala demo * address the comments * change the examples * fix the wrong configuration * Maven Surefire bug workaround (#13097) * use ResourceScope in Model/Trainer/FeedForward.scala (#12882) (#13164) * use ResourceScope in Model/Trainer/FeedForward.scala * add moveToOuterScope public method to move resources to a outerScope if it exists * fix memory leak in FeedForward.scala by making it a native resource and disposing argparams, auxParams in dispose() method * 
[MXNET-1187] Added Tutorial for Java under mxnet.io/docs/tutorials (#13183) * Added tutorial for Java installation on IntelliJ for mxnet.io website * Added correct image resources * Removed spurious quotes * Added java tutorial to whitelisting * Added community download edition link to intelliJ section * [MXNET-1202] Change Builder class into a better way (#13159) * applying changes for Builder functions * simplify the code structure * update docgen * follow Naveen's suggestion * apply comments to Param * clean up param build * change on the comments * add one description line * [MXNET-1041] Add Java benchmark (#13095) * add java benchmark * applied changes based on Piyush comments * applies Andrew's change * fix clojure test issue * update the statistic names * follow Naveen's instruction * [MXNET-918] [Introduce Random module / Refact code generation (#13038)][Cherry pick] (#13242) * [MXNET-918] Introduce Random module / Refact code generation (#13038) * refactor code gen * remove xxxAPIMacroBase (overkill) * CI errors / scala-style * PR review comments * clean up the duplicated code * add comments * Fixed missing break statement (#13257) * Java Benchmark failure (#13258) * patch fix * update ignore * rename getContext to bindToDevice * Update JavaBenchmark.java * Addressing PR feedback for merging Java API into master (#13277) * Addressing PR feedback for merging Java API into master * Changed constructors to package private instead of private * clean up the NDArray follow the comments (#13281) * [MXNET-1181] Added command line alternative to IntelliJ in install instructions (#13267) * Added command line alternative to IntelliJ * Removed the duplicate file * Fixed typos * Fixed minor command issue * add defaults and clean up the tests (#13295) * [MXNET-1187] Added Java SSD Inference Tutorial for website (#13201) * Added Java SSD Inference Tutorial for website * Added whitelisting to SSD tutorial * Address PR feedback * Marking intelliJ as optional * [MXNET-1182] 
Predictor example (#13237) * add initial commit * push back predictor * name fix and bug fix * update readme and script to run * minor fix * minor fix * fix on doc * update predictor * Reducing the length of setup tutorial (#13306) * enabling test_dropout after fixing flaky issue (#13276) * enabling test_dropout after fixing flaky issue * adding a check for positive seed * fix the flag (#13293) * Made fixes to sparse.py and sparse.md (#13305) * Fix descriptions in scaladocs for macro ndarray/sybmol APIs (#13210) * [Example] Gradcam- Fixing a link (#13307) * fixing gradcam * changed loading parameters code * fixing type conversions issue with previous versions of matplotlib * gradcam consolidation * creating directory structures in utils * changing location * empty commit * fix file lock issue * fix link * removing other commits * remove commit * Updated the Instructions for use of the label bot (#13192) * Updated Instructions for Label Bot * Updated instructions for mxnet-label-bot * Including myself as a contributor * Clarified usage of label bot * Fixed typos and instructions/examples have been made more clear * Added link for available labels * [MXNET-33] Enhance mkldnn pooling to support full convention (#11047) * fix mkldnn pooling to support full convention * backward with full convention * fix * add pooling test for full convention * add function for computing padding size * fix unit test * only support max-pooling * fix pooling bwd * address review comment * [MXNET-1213] add Cent OS build for Scala (#13279) * add centos build for Scala * migrate the build portion to docker * update build script and chmod +x * address Jenkins change * allow CentOS provide all depdencies * fix file lock issue (#13296) * modify code for working in gpu context. (#13302)

* Refactor kvstore test
* Fix pylint
* Fix problem with some OSX not handling the cast on imDecode (apache#13207)
* Fix num_gpus

* Revert "Refactor kvstore test (apache#13140)"
  This reverts commit d8d2d6e.
* Revert "[MXNET-793] Virtualized ARMv7 with Qemu CI integration (apache#13203)"
  This reverts commit fd3dedc.
* Revert "Sphinx failure fixes (apache#13213)"
  This reverts commit 2e4d6c8.

@larroy

This PR makes it easy to create unittests that require specific settings of environment variables, while avoiding the pitfalls (discussed in comments section). This PR can be considered a recasting and expansion of the great vision of @larroy in creating the EnvManager class in #13140.

In its base form, the facility is a drop-in replacement for EnvManager, and is called 'environment':

    with environment('MXNET_MY_NEW_FEATURE', '1'):
        <test with feature enabled>
    with environment('MXNET_MY_NEW_FEATURE', '0'):
        <test with feature disabled>

Like EnvManager, this facility takes care of the save/restore of the previous environment variable state, including when exceptions are raised. In addition though, this PR introduces the features:

* A similarly-named unittest decorator: @with_environment(key, value)
* The ability to pass in multiple env vars as a dict (as is needed for some tests) in both forms, so for example:

      with environment({'MXNET_FEATURE_A': '1', 'MXNET_FEATURE_B': '1'}):
          <test with both features enabled>

* Works on Windows! This PR includes a wrapping of the backend's setenv() and getenv() functions, and uses this direct access to the backend environment to keep it in sync with the python environment. This works around the problem that the C Runtime on Windows gets a snapshot of the Python environment at startup that is immutable from Python.
* with environment() has a simple implementation using the @contextmanager decorator
* Tests are included that validate the facility works with all combinations of before_val/set_val, namely unset/unset, unset/set, set/unset, set/set.

There were 5 unittests previously using EnvManager, and this PR shifts those uses to with environment():, while converting over 20 other ad-hoc uses of os.environ[] within the unittests. This PR also enables those unittests that were bypassed on Windows (due to the inability to set environment variables) to run on all platforms.
Further Comments

Environment variables are a double-edged sword: they enable useful operating modes for testing, debugging or niche applications, but like all features they must be tested. The correct approach for testing with a particular env var setting is:

    def set_env_var(key, value):
        if value is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = value

    old_env_var_value = os.environ.get(env_var_name)
    try:
        set_env_var(env_var_name, test_env_var_value)
        <perform test>
    finally:
        set_env_var(env_var_name, old_env_var_value)

The above code makes no assumption about whether the before-test and within-test state of the env var is set or unset, and restores the prior environment even if the test raises an exception. This is a lot of boilerplate code that is easy to get wrong. The with environment() context manager makes it simple to handle all of this properly. If an entire unittest needs a forced env var setting, the @with_environment() decorator avoids the extra indentation that wrapping the whole test body in with environment() would require.
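
To make the save/restore pattern above concrete, here is a minimal sketch of how a context manager and decorator with these names could be written on top of contextlib. This is an illustration under assumptions, not the code from the PR: the helper `_set_env_var` and the argument handling are hypothetical, and the sketch only touches `os.environ`. It does not include the backend setenv()/getenv() synchronization that the PR describes for Windows.

```python
import functools
import os
from contextlib import contextmanager


def _set_env_var(key, value):
    """Set key to value, or unset it when value is None (hypothetical helper)."""
    if value is None:
        os.environ.pop(key, None)
    else:
        os.environ[key] = value


@contextmanager
def environment(*args):
    """Temporarily apply env var settings.

    Accepts either environment(key, value) or environment({key: value, ...}).
    A value of None means "unset inside the block".
    """
    if len(args) == 2:
        new_vars = {args[0]: args[1]}
    elif len(args) == 1 and isinstance(args[0], dict):
        new_vars = args[0]
    else:
        raise ValueError("expected environment(key, value) or environment(dict)")

    # Record the prior state; None marks a variable that was unset.
    saved = {key: os.environ.get(key) for key in new_vars}
    try:
        for key, value in new_vars.items():
            _set_env_var(key, value)
        yield
    finally:
        # Runs on normal exit and when the body raises.
        for key, value in saved.items():
            _set_env_var(key, value)


def with_environment(*args):
    """Decorator form: run the whole test function inside environment(*args)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*fn_args, **fn_kwargs):
            with environment(*args):
                return test_fn(*fn_args, **fn_kwargs)
        return wrapper
    return decorator
```

With this sketch, a test body can be wrapped either as `with environment('MXNET_MY_NEW_FEATURE', '1'):` or by decorating the test function with `@with_environment('MXNET_MY_NEW_FEATURE', '1')`. Because the restore step sits in a `finally` clause, the prior state (set or unset) comes back even if the test fails.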

@larroy

commit 0f65ef6 Author: Xi Wang <xidulu@gmail.com> Date: Wed Aug 5 10:48:50 2020 +0800 nb fix (apache#18858) commit 7b7cef5 Author: Serge Panev <spanev@nvidia.com> Date: Tue Aug 4 18:23:48 2020 -0700 MXNet-TRT: Add PrePartition param caching - move init_tensorrt_params logic (apache#18490) * Update to TRT 7 API Signed-off-by: Serge Panev <spanev@nvidia.com> * Add PrePartition param caching - move init_tensorrt_params logic Signed-off-by: Serge Panev <spanev@nvidia.com> * Handle node with no defined input Signed-off-by: Serge Panev <spanev@nvidia.com> * Remove tmp comment Signed-off-by: Serge Panev <spanev@nvidia.com> commit 59e200a Author: Haibin Lin <linhaibin.eric@gmail.com> Date: Tue Aug 4 17:01:23 2020 -0700 fix nn.dense doc (apache#18830) Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com> commit 2e97226 Author: Leonard Lausen <lausen@amazon.com> Date: Tue Aug 4 21:11:32 2020 +0000 Fix edge case when casting gluon Block before export (apache#18853) * Fix edge case when casting gluon Block before export Fixes apache#18843 * Fix gpu test commit b8eccc8 Author: Yang Shi <yangshia@amazon.com> Date: Tue Aug 4 14:08:09 2020 -0700 fix set default website version rewrite rule for cdn (apache#18856) commit 7a40219 Author: Serge Panev <spanev@nvidia.com> Date: Tue Aug 4 10:34:21 2020 -0700 Remove check for subgraph with cycles (apache#18555) * Remove check for subgraph with cycles Signed-off-by: Serge Panev <spanev@nvidia.com> * Add comments Signed-off-by: Serge Panev <spanev@nvidia.com> commit 95fa63f Author: Serge Panev <spanev@nvidia.com> Date: Mon Aug 3 17:15:02 2020 -0700 Update the onnx-tensorrt submodule - CI to TRT7 (apache#18574) commit 7f2e314 Author: Haibin Lin <linhaibin.eric@gmail.com> Date: Mon Aug 3 16:09:48 2020 -0700 update setup.py (apache#18850) * update setup.py * update python version Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com> commit f872b43 Author: Leonard Lausen <lausen@amazon.com> Date: Mon Aug 3 20:11:06 2020 +0000 
Protobuf_USE_STATIC_LIBS must be set on Apple too (apache#18851) Fixes apache#18840 commit 4bb8224 Author: Yang Shi <yangshia@amazon.com> Date: Mon Aug 3 12:30:13 2020 -0700 Fixed python website double scroller and improve UX (apache#18845) * make python site header scroll aware and avoid double scroller * add compiled assets * adjust python site second header height * add new line * set focus to main content on DOM load commit 7a5a488 Author: Iblis Lin <iblis@hs.ntnu.edu.tw> Date: Tue Aug 4 03:28:08 2020 +0800 Fix broken link in docs/README.md (apache#18847) commit 534cdbc Author: Sheng Zha <szha@users.noreply.github.com> Date: Mon Aug 3 11:58:33 2020 -0700 Create greetings.yml (apache#18842) commit 9fd2cce Author: kpuatamazon <56725192+kpuatamazon@users.noreply.github.com> Date: Mon Aug 3 17:40:44 2020 +0100 Update tests/README.md Docker instructions to match ci/README.md (apache#18848) Documentation was missing python3-docker and had an outdated platform. commit 54b9e9c Author: Sheng Zha <szha@users.noreply.github.com> Date: Mon Aug 3 08:59:33 2020 -0700 remove unnecessary usage of pretrained models, and prefer smaller size (apache#18844) commit 51340d8 Author: Haibin Lin <linhaibin.eric@gmail.com> Date: Sat Aug 1 16:23:03 2020 -0700 Add compiled_with_cxx11_abi API (apache#18836) * draft * add impl * add test * set default val Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-138.ec2.internal> commit 5a22193 Author: Sheng Zha <szha@users.noreply.github.com> Date: Fri Jul 31 17:06:17 2020 -0700 [NumPy] allow mixed array types (apache#18562) * allow mixed types in array func protocol * fix apache#18746 * add support for memory share check commit 08a5ee3 Author: Tao Lv <tao.a.lv@intel.com> Date: Sat Aug 1 03:38:20 2020 +0800 fix gelu to use erf based algorithm (apache#18827) commit ac36089 Author: Leonard Lausen <lausen@amazon.com> Date: Fri Jul 31 04:54:10 2020 +0000 Fixup move gluon.metric api docs (apache#18748) * Fix metric API page * Update index.rst commit 7a24006 
Author: Leonard Lausen <lausen@amazon.com> Date: Fri Jul 31 02:58:55 2020 +0000 Enable DIST_KVSTORE by default in staticbuild (apache#18796) * Enable DIST_KVSTORE by default in staticbuild set(USE_DIST_KVSTORE ON CACHE BOOL "Build with DIST_KVSTORE support") * Ensure static linkage of dependencies * Fix for OS X * Fix shell syntax * Alternate approach to force static linkage of libprotobuf commit aa53291 Author: Yang Shi <yangshia@amazon.com> Date: Thu Jul 30 19:53:27 2020 -0700 add adaptive left margin for python site document body (apache#18828) commit 045efb2 Author: Sheng Zha <szha@users.noreply.github.com> Date: Thu Jul 30 19:19:33 2020 -0700 [NumPy] DLPack refactor and npx.from_numpy (apache#18656) * refactor dlpack and add from_numpy to npx * remove reference of DeepNumPy * map platform-dependent types to fixed-size types * update DMLC_LOG_FATAL_THROW * fix flaky * fix flaky * test no error commit 608afef Author: Xi Wang <xidulu@gmail.com> Date: Fri Jul 31 02:30:25 2020 +0800 Fix dirichlet flaky tests (apache#18817) * make parameter smoother * minor changes commit 6bbd531 Author: Leonard Lausen <lausen@amazon.com> Date: Wed Jul 29 20:31:19 2020 +0000 Update clang-tidy integration (apache#18815) Run clang-tidy via cmake only on the code managed by mxnet (and not 3rdparty dependencies), update to clang-tidy-10 and run clang-tidy-10 -fix to fix all the warnings that are enforced on CI. Developers can run clang-tidy by specifying the -DCMAKE_CXX_CLANG_TIDY="clang-tidy-10" to cmake, or using the python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang_tidy script. 
commit b685fad Author: Yang Shi <yangshia@amazon.com> Date: Wed Jul 29 12:22:12 2020 -0700 use regex that is supported by all browsers (apache#18811) commit 9308aca Author: Yang Shi <yangshia@amazon.com> Date: Wed Jul 29 12:21:42 2020 -0700 remove other language bindings section from website api page (apache#18783) * remove other language bindings section from api page * remove language binding docs redirect * add call for contribution banner * modify call for contribution wording Co-authored-by: Aaron Markham <markhama@amazon.com> * more wording modification Co-authored-by: Aaron Markham <markhama@amazon.com> * add hyperlink to 1.x version in banner * add reference to the C api deprecation github issue Co-authored-by: Aaron Markham <markhama@amazon.com> commit 915f6b4 Author: Yang Shi <yangshia@amazon.com> Date: Wed Jul 29 11:28:37 2020 -0700 Remove deepnumpy reference and move Numpy tutorials to top level (apache#18798) * move np tutorials to top level * replace deepnumpy reference to np * add info in card * remove useless entry * replace NDArray API card with np.ndarray * python site refactor * remove duplicated drawer and refactor layout * extend document width to 100% for xl devices commit e9829e7 Author: Joe Evans <github@250hacks.net> Date: Tue Jul 28 18:53:29 2020 -0700 Cherry-pick large tensor support from apache#18752. 
(apache#18804) Co-authored-by: Joe Evans <joeev@amazon.com> commit 126636c Author: Leonard Lausen <lausen@amazon.com> Date: Tue Jul 28 22:11:20 2020 +0000 Fix naming in runtime_functions.sh (apache#18795) commit f83dbac Author: Haibin Lin <linhaibin.eric@gmail.com> Date: Tue Jul 28 11:48:05 2020 -0700 remove executor manager from API doc (apache#18802) Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com> commit 7908d7e Author: Yiyan66 <57363390+Yiyan66@users.noreply.github.com> Date: Tue Jul 28 15:11:19 2020 +0800 [numpy] fix flaky mixed precision binary error (apache#18660) * temp * change test * fix bad func call * test * rectify * doc * change test commit a807f6d Author: Sheng Zha <szha@users.noreply.github.com> Date: Mon Jul 27 22:06:50 2020 -0700 [NumPy] loss for np array (apache#17196) * loss for np/nd array * fix flaky commit 74430a9 Author: phile <phile_999@126.com> Date: Tue Jul 28 06:44:54 2020 +0800 remove NLL in metric (apache#18794) commit 9e77e81 Author: Przemyslaw Tredak <ptredak@nvidia.com> Date: Mon Jul 27 14:27:52 2020 -0700 Update CUB and include it only for CUDA < 11 (apache#18799) commit 98b3f73 Author: Sheng Zha <szha@users.noreply.github.com> Date: Sat Jul 25 16:19:36 2020 -0700 add support for np.ndarray in autograd.function (apache#18790) commit c1db2d5 Author: Leonard Lausen <lausen@amazon.com> Date: Sat Jul 25 16:58:45 2020 +0000 Remove caffe plugin (apache#18787) * Remove caffe plugin * Fix * Remove CXX14 feature flag * Update test commit 2fbd182 Author: Leonard Lausen <lausen@amazon.com> Date: Sat Jul 25 02:48:30 2020 +0000 Split up CI sanity test functions to enable fine-grained trigger (apache#18786) Developers can now trigger fine grained checks: python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_python python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_license etc commit 06b5d22 Author: Serge Panev <spanev@nvidia.com> Date: Fri Jul 24 14:22:42 2020 -0700 ONNX import: use 
Conv pad attribute for symmetrical padding (apache#18675) Signed-off-by: Serge Panev <spanev@nvidia.com> commit e31ad77 Author: Yang Shi <yangshia@amazon.com> Date: Thu Jul 23 11:33:31 2020 -0700 set website default version to current stable (1.6) version (apache#18738) * set website default version - test redirect * enable first time redirect on all master website pages * update test code * remove unnecessary test code * fix typo * delete test code commit 02ae456 Author: Dick Carter <dick.carter@comcast.net> Date: Thu Jul 23 11:17:10 2020 -0700 Improve environment variable handling in unittests (apache#18424)
commit 18af71e Author: Leonard Lausen <lausen@amazon.com> Date: Thu Jul 23 18:09:10 2020 +0000 CI: Migrate remaining Dockerfiles to docker-compose.yml and remove unused code (apache#18771) * Migrate remaining Dockerfiles to docker-compose.yml - Delete unused Dockerfiles - Delete unused install/*.sh scripts - Consolidate ubuntu_gpu_tensorrt and ubuntu_gpu - Remove deprecated logic in ci/build.py (no longer needed with docker-compose) - Remove ci/docker_cache.py (no longer needed with docker-compose) * Fix * Fix * Fix ubuntu_cpu_jekyll commit 1928117 Author: Przemyslaw Tredak <ptredak@nvidia.com> Date: Tue Jul 21 23:35:15 2020 -0700 Fix crash when accessing already destructed static variables (apache#18768) commit a330a02 Author: Leonard Lausen <lausen@amazon.com> Date: Wed Jul 22 06:31:47 2020 +0000 Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error (apache#18686) * Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error Performed shallow copy instead of deep copy * Test * Fix test commit 9548b0c Author: Leonard Lausen <lausen@amazon.com> Date: Tue Jul 21 21:42:01 2020 +0000 Remove duplicate settings in .codecov.yml (apache#18763) New PRs started showing the codecov/project badge again due apparent change in codecov's backend resolving these duplicate options specified in .codecov.yml

@larroy

commit d0e17e5 Author: Ke Han <38852697+hanke580@users.noreply.github.com> Date: Mon Aug 10 12:48:56 2020 +0800 [Numpy] FFI: sort, argsort, vstack etc (apache#17857) * * sort FFI * * argsort FFI * * vstack, row_stack FFI * * greater FFI * * inner FFI * multinomial FFI * rand FFI * randn FFI * * Fix input out of index and rscalar of greater * * Fix ndarray situation * * Fix sanity * fix lint * fix bugs * * Remove duplicate operator (greater) * * Fix Tuple downcast Error (Only Integer) * Fix segmentation fault(pointer) Co-authored-by: Sheng Zha <zhasheng@amazon.com> commit 5c50475 Author: Liu, Hao <haoliuhust@hotmail.com> Date: Mon Aug 10 08:15:22 2020 +0800 fix pooling_convention warning when convert model to onnx (apache#18529) * fix pooling_convention warning * fix pooling_convention warning * fix lint Co-authored-by: JackieWu <wkcn@live.cn> commit d52d9c6 Author: Sheng Zha <szha@users.noreply.github.com> Date: Sun Aug 9 13:33:03 2020 -0700 Revert "Add SOVERSION when build shared libmxnet.so library (apache#17815)" (apache#18882) This reverts commit d101c3c. 
commit 706c369 Author: Ziyue Huang <ziyue@apache.org> Date: Sun Aug 9 07:55:16 2020 +0800 fix trainer when the model involves share_parameters (apache#18880) * fix trainer when using shared_param * add unittest commit cf908fd Author: Xingjian Shi <xshiab@connect.ust.hk> Date: Fri Aug 7 19:55:36 2020 -0700 [Numpy][Bugfix] Add hybridization test to loss layers (apache#18876) * Test for hybridization * fix typo * fix * fix test * update * Update loss.py * fix bug of sum commit d5fdcbf Author: Haibin Lin <linhaibin.eric@gmail.com> Date: Fri Aug 7 18:11:16 2020 -0700 drop list support for gluon trainer (apache#18877) Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-138.ec2.internal> commit dde635f Author: Leonard Lausen <lausen@amazon.com> Date: Fri Aug 7 21:16:24 2020 +0000 Re-enable the linker version scripts for binary distribution (apache#18872) * Symbol visibility * Fix commit 1694d2f Author: Sheng Zha <szha@users.noreply.github.com> Date: Fri Aug 7 11:21:22 2020 -0700 [CI] remove data.mxnet.io usage for CI stability (apache#18871) * remove duplicate mnist functions * remove data.mxnet.io usage in tests * add waitall commit 708a900 Author: Serge Panev <spanev@nvidia.com> Date: Fri Aug 7 10:46:22 2020 -0700 Fix a bug in MXNet-TensorRT (apache#18870) Signed-off-by: Serge Panev <spanev@nvidia.com> commit d101c3c Author: Gustavo Alvarez <462213+sl1pkn07@users.noreply.github.com> Date: Fri Aug 7 04:34:51 2020 +0200 Add SOVERSION when build shared libmxnet.so library (apache#17815) https://en.wikipedia.org/wiki/Soname https://cmake.org/cmake/help/latest/prop_tgt/SOVERSION.html Co-authored-by: Leonard Lausen <lausen@amazon.com> commit a3eabf0 Author: Leonard Lausen <lausen@amazon.com> Date: Thu Aug 6 15:52:52 2020 +0000 Fix MXLibInfoCompiledWithCXX11ABI (apache#18864) * Fix MXLibInfoCompiledWithCXX11ABI * Fix test commit 84f8984 Author: bgawrych <bartlomiej.gawrych@intel.com> Date: Thu Aug 6 04:32:39 2020 +0200 ElementWiseSum fix for oneDNN (apache#18859) * Fix ElementwiseSum 
for DNNL * Add test for oneDNN ElemwiseSum Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com> commit a78f137 Author: Yang Shi <yangshia@amazon.com> Date: Wed Aug 5 14:24:46 2020 -0700 improve python api website ux - make toc sticky (apache#18863)

@larroy

commit 8794a0a
Author: Zhaoqi Zhu <zhaoqizh@usc.edu>
Date: Tue Aug 18 17:36:42 2020 -0700

Numpy Dot Large Tensor Fix (apache#18925)

* fix np dot
* add test
* fix test
* tweak test

Co-authored-by: Zhu <zhaoqzhu@3c22fbbb4e1a.ant.amazon.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-10-124.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-47.us-west-2.compute.internal>

commit 32994bb
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date: Tue Aug 18 15:15:50 2020 -0700

Fix setting cudnn bias stride (apache#18905)

commit c789d02
Author: Leonard Lausen <lausen@amazon.com>
Date: Tue Aug 18 16:57:35 2020 +0000

Fix Python docs (apache#18924)

* Fix Python docs
* Fix
* Fix

commit 0afeb97
Author: nihui <shuizhuyuanluo@126.com>
Date: Tue Aug 18 22:03:57 2020 +0800

Fix instancenorm math equation (apache#18955)

commit e06ee4e
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date: Mon Aug 17 23:07:37 2020 -0700

Faster GPU frozen BatchNorm (apache#17368)

* Better frozen batchnorm
* Continue FreezeBN
* Optimizations
* Reduce number of mod operations
* Cleaning
* Fixing frozen bn with fix_gamma=False
* Fix lint in BN
* Backward frozen batchnorm
* More work on backward of Frozen BN
* Let it compile
* NCHW Frozen BN backward
* Frozen BN backward NHWC
* Cleaning
* Remove the change to Makefile
* Fix from rebase
* Temp space for BN backward
* Fix from review
* Fix lint
* Changes from review

commit 2610c10
Author: Serge Panev <spanev@nvidia.com>
Date: Sat Aug 15 19:30:50 2020 -0700

Change Partition API's options_map to std::unordered_map (apache#18929)

Signed-off-by: Serge Panev <spanev@nvidia.com>

commit be12c8d
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Fri Aug 14 17:35:34 2020 -0700

[Website] adjust website structure (apache#18839)

* adjust website structure
* update per comments
* adjust ecosystem page
* add ray tune
* fix issues
* update notebooks
* fix breakage

commit daf8b43
Author: Sam Skalicky <samskalicky@gmail.com>
Date: Fri Aug 14 14:36:30 2020 -0700

Support extra inputs for subgraph ops (apache#18779)

Support additional inputs to custom subgraph ops that are not direct dependencies to ops in the subgraph. This will enable various use cases: custom control flow ops, custom ops that maintain a state that should be saved/loaded, etc.

Highlights:

* Added test that uses a graph pass (addInputPass) to add a new custom input to the subgraph op
* Added new optional argument (clear) to hybridize & optimize_for APIs in Gluon Block to enable multiple optimizations
* refactored lib_api.h JSON utilities
* added new Graph data structure utilities to simplify custom graph passes
* refactored custom op registration
* enhanced custom subgraph op to support additional inputs to subgraph op that is not an input to ops in the subgraph
* updated subgraph & graph pass READMEs
* Added error messaging from external library

commit 86e96dc
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date: Thu Aug 13 22:27:10 2020 -0700

Fix backward of arctan2 and rarctan2 scalar on GPU (apache#18440)

commit ee80b77
Author: bgawrych <bartlomiej.gawrych@intel.com>
Date: Fri Aug 14 07:19:52 2020 +0200

Fix default CPU allocator memory alignment (apache#18885)

* Replace std::malloc to aligned memory allocation in Pooled StorageManager
* Add test checking CPU memory alignment
* Fix memory allocation success check
* Fix sanity

commit 344587f
Author: MoisesHer <50716238+MoisesHer@users.noreply.github.com>
Date: Thu Aug 13 22:18:26 2020 -0700

Safe accumulation for computing gradient in Embedding & Take (apache#18385)

* Safe accumulation for computing gradient in Embedding & Take
* Fix bug in TakeGrad: initialize temporal storage for safe_accumulation
* fix lint
* make MXNET_SAFE_ACCUMULATION compatible with Windows
* Increase test coverage: small inputs & SAFE_ACCUMULATION

commit a2b400c
Author: Joshua Z. Zhang <cheungchih@gmail.com>
Date: Wed Aug 12 22:47:47 2020 -0700

fix center element not being copied (apache#18917)

commit e2cbf66
Author: Leonard Lausen <lausen@amazon.com>
Date: Wed Aug 12 16:22:17 2020 +0000

Revert "drop list support for gluon trainer (apache#18877)" (apache#18892)

This reverts commit d5fdcbf.

commit 83d2af5
Author: Xi Wang <xidulu@gmail.com>
Date: Wed Aug 12 15:25:17 2020 +0800

Gamma reparameterization gradient (apache#18852)

* gamma grad wip
* gamma grad wip
* test tbd
* fix grad
* change scale to the frontend
* fix bugs
* change distributions.gamma
* fix test and operator tune

commit f2a8b97
Author: Leonard Lausen <lausen@amazon.com>
Date: Wed Aug 12 01:51:11 2020 +0000

Remove manually created symbolic link to ninja-build (apache#18906)

apache@6bcfce9 for master branch

commit 016c166
Author: JackieWu <wkcn@live.cn>
Date: Wed Aug 12 03:54:46 2020 +0800

remove upper bound (apache#18857)

commit e101d68
Author: Xi Wang <xidulu@gmail.com>
Date: Tue Aug 11 12:50:26 2020 +0800

[Gluon] Add VAE demo (apache#18758)

* add VAE demo
* minor changes
* change format to md
* minor changes
* add liscence
* Update VAE.md
* update vae demo
* remove unnecessary files

commit d0e17e5
Author: Ke Han <38852697+hanke580@users.noreply.github.com>
Date: Mon Aug 10 12:48:56 2020 +0800

[Numpy] FFI: sort, argsort, vstack etc (apache#17857)

* * sort FFI
* * argsort FFI
* * vstack, row_stack FFI
* * greater FFI
* * inner FFI
* multinomial FFI
* rand FFI
* randn FFI
* * Fix input out of index and rscalar of greater
* * Fix ndarray situation
* * Fix sanity
* fix lint
* fix bugs
* * Remove duplicate operator (greater)
* * Fix Tuple downcast Error (Only Integer)
* Fix segmentation fault(pointer)

Co-authored-by: Sheng Zha <zhasheng@amazon.com>

commit 5c50475
Author: Liu, Hao <haoliuhust@hotmail.com>
Date: Mon Aug 10 08:15:22 2020 +0800

fix pooling_convention warning when convert model to onnx (apache#18529)

* fix pooling_convention warning
* fix pooling_convention warning
* fix lint

Co-authored-by: JackieWu <wkcn@live.cn>

commit d52d9c6
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Sun Aug 9 13:33:03 2020 -0700

Revert "Add SOVERSION when build shared libmxnet.so library (apache#17815)" (apache#18882)

This reverts commit d101c3c.

commit 706c369
Author: Ziyue Huang <ziyue@apache.org>
Date: Sun Aug 9 07:55:16 2020 +0800

fix trainer when the model involves share_parameters (apache#18880)

* fix trainer when using shared_param
* add unittest

commit cf908fd
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date: Fri Aug 7 19:55:36 2020 -0700

[Numpy][Bugfix] Add hybridization test to loss layers (apache#18876)

* Test for hybridization
* fix typo
* fix
* fix test
* update
* Update loss.py
* fix bug of sum

commit d5fdcbf
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date: Fri Aug 7 18:11:16 2020 -0700

drop list support for gluon trainer (apache#18877)

Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-138.ec2.internal>

commit dde635f
Author: Leonard Lausen <lausen@amazon.com>
Date: Fri Aug 7 21:16:24 2020 +0000

Re-enable the linker version scripts for binary distribution (apache#18872)

* Symbol visibility
* Fix

commit 1694d2f
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Fri Aug 7 11:21:22 2020 -0700

[CI] remove data.mxnet.io usage for CI stability (apache#18871)

* remove duplicate mnist functions
* remove data.mxnet.io usage in tests
* add waitall

commit 708a900
Author: Serge Panev <spanev@nvidia.com>
Date: Fri Aug 7 10:46:22 2020 -0700

Fix a bug in MXNet-TensorRT (apache#18870)

Signed-off-by: Serge Panev <spanev@nvidia.com>

commit d101c3c
Author: Gustavo Alvarez <462213+sl1pkn07@users.noreply.github.com>
Date: Fri Aug 7 04:34:51 2020 +0200

Add SOVERSION when build shared libmxnet.so library (apache#17815)

https://en.wikipedia.org/wiki/Soname
https://cmake.org/cmake/help/latest/prop_tgt/SOVERSION.html

Co-authored-by: Leonard Lausen <lausen@amazon.com>

commit a3eabf0
Author: Leonard Lausen <lausen@amazon.com>
Date: Thu Aug 6 15:52:52 2020 +0000

Fix MXLibInfoCompiledWithCXX11ABI (apache#18864)

* Fix MXLibInfoCompiledWithCXX11ABI
* Fix test

commit 84f8984
Author: bgawrych <bartlomiej.gawrych@intel.com>
Date: Thu Aug 6 04:32:39 2020 +0200

ElementWiseSum fix for oneDNN (apache#18859)

* Fix ElementwiseSum for DNNL
* Add test for oneDNN ElemwiseSum

Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>

commit a78f137
Author: Yang Shi <yangshia@amazon.com>
Date: Wed Aug 5 14:24:46 2020 -0700

improve python api website ux - make toc sticky (apache#18863)

commit 0f65ef6
Author: Xi Wang <xidulu@gmail.com>
Date: Wed Aug 5 10:48:50 2020 +0800

nb fix (apache#18858)

commit 7b7cef5
Author: Serge Panev <spanev@nvidia.com>
Date: Tue Aug 4 18:23:48 2020 -0700

MXNet-TRT: Add PrePartition param caching - move init_tensorrt_params logic (apache#18490)

* Update to TRT 7 API
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* Add PrePartition param caching - move init_tensorrt_params logic
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* Handle node with no defined input
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* Remove tmp comment
  Signed-off-by: Serge Panev <spanev@nvidia.com>

commit 59e200a
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date: Tue Aug 4 17:01:23 2020 -0700

fix nn.dense doc (apache#18830)

Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 2e97226
Author: Leonard Lausen <lausen@amazon.com>
Date: Tue Aug 4 21:11:32 2020 +0000

Fix edge case when casting gluon Block before export (apache#18853)

* Fix edge case when casting gluon Block before export
  Fixes apache#18843
* Fix gpu test

commit b8eccc8
Author: Yang Shi <yangshia@amazon.com>
Date: Tue Aug 4 14:08:09 2020 -0700

fix set default website version rewrite rule for cdn (apache#18856)

commit 7a40219
Author: Serge Panev <spanev@nvidia.com>
Date: Tue Aug 4 10:34:21 2020 -0700

Remove check for subgraph with cycles (apache#18555)

* Remove check for subgraph with cycles
  Signed-off-by: Serge Panev <spanev@nvidia.com>
* Add comments
  Signed-off-by: Serge Panev <spanev@nvidia.com>

commit 95fa63f
Author: Serge Panev <spanev@nvidia.com>
Date: Mon Aug 3 17:15:02 2020 -0700

Update the onnx-tensorrt submodule - CI to TRT7 (apache#18574)

commit 7f2e314
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date: Mon Aug 3 16:09:48 2020 -0700

update setup.py (apache#18850)

* update setup.py
* update python version

Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit f872b43
Author: Leonard Lausen <lausen@amazon.com>
Date: Mon Aug 3 20:11:06 2020 +0000

Protobuf_USE_STATIC_LIBS must be set on Apple too (apache#18851)

Fixes apache#18840

commit 4bb8224
Author: Yang Shi <yangshia@amazon.com>
Date: Mon Aug 3 12:30:13 2020 -0700

Fixed python website double scroller and improve UX (apache#18845)

* make python site header scroll aware and avoid double scroller
* add compiled assets
* adjust python site second header height
* add new line
* set focus to main content on DOM load

commit 7a5a488
Author: Iblis Lin <iblis@hs.ntnu.edu.tw>
Date: Tue Aug 4 03:28:08 2020 +0800

Fix broken link in docs/README.md (apache#18847)

commit 534cdbc
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Mon Aug 3 11:58:33 2020 -0700

Create greetings.yml (apache#18842)

commit 9fd2cce
Author: kpuatamazon <56725192+kpuatamazon@users.noreply.github.com>
Date: Mon Aug 3 17:40:44 2020 +0100

Update tests/README.md Docker instructions to match ci/README.md (apache#18848)

Documentation was missing python3-docker and had an outdated platform.

commit 54b9e9c
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Mon Aug 3 08:59:33 2020 -0700

remove unnecessary usage of pretrained models, and prefer smaller size (apache#18844)

commit 51340d8
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date: Sat Aug 1 16:23:03 2020 -0700

Add compiled_with_cxx11_abi API (apache#18836)

* draft
* add impl
* add test
* set default val

Co-authored-by: Ubuntu <ubuntu@ip-172-31-42-138.ec2.internal>

commit 5a22193
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Fri Jul 31 17:06:17 2020 -0700

[NumPy] allow mixed array types (apache#18562)

* allow mixed types in array func protocol
* fix apache#18746
* add support for memory share check

commit 08a5ee3
Author: Tao Lv <tao.a.lv@intel.com>
Date: Sat Aug 1 03:38:20 2020 +0800

fix gelu to use erf based algorithm (apache#18827)

commit ac36089
Author: Leonard Lausen <lausen@amazon.com>
Date: Fri Jul 31 04:54:10 2020 +0000

Fixup move gluon.metric api docs (apache#18748)

* Fix metric API page
* Update index.rst

commit 7a24006
Author: Leonard Lausen <lausen@amazon.com>
Date: Fri Jul 31 02:58:55 2020 +0000

Enable DIST_KVSTORE by default in staticbuild (apache#18796)

* Enable DIST_KVSTORE by default in staticbuild
  set(USE_DIST_KVSTORE ON CACHE BOOL "Build with DIST_KVSTORE support")
* Ensure static linkage of dependencies
* Fix for OS X
* Fix shell syntax
* Alternate approach to force static linkage of libprotobuf

commit aa53291
Author: Yang Shi <yangshia@amazon.com>
Date: Thu Jul 30 19:53:27 2020 -0700

add adaptive left margin for python site document body (apache#18828)

commit 045efb2
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Thu Jul 30 19:19:33 2020 -0700

[NumPy] DLPack refactor and npx.from_numpy (apache#18656)

* refactor dlpack and add from_numpy to npx
* remove reference of DeepNumPy
* map platform-dependent types to fixed-size types
* update DMLC_LOG_FATAL_THROW
* fix flaky
* fix flaky
* test no error

commit 608afef
Author: Xi Wang <xidulu@gmail.com>
Date: Fri Jul 31 02:30:25 2020 +0800

Fix dirichlet flaky tests (apache#18817)

* make parameter smoother
* minor changes

commit 6bbd531
Author: Leonard Lausen <lausen@amazon.com>
Date: Wed Jul 29 20:31:19 2020 +0000

Update clang-tidy integration (apache#18815)

Run clang-tidy via cmake only on the code managed by mxnet (and not 3rdparty dependencies), update to clang-tidy-10 and run clang-tidy-10 -fix to fix all the warnings that are enforced on CI.

Developers can run clang-tidy by specifying the -DCMAKE_CXX_CLANG_TIDY="clang-tidy-10" to cmake, or using the python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_clang_tidy script.

commit b685fad
Author: Yang Shi <yangshia@amazon.com>
Date: Wed Jul 29 12:22:12 2020 -0700

use regex that is supported by all browsers (apache#18811)

commit 9308aca
Author: Yang Shi <yangshia@amazon.com>
Date: Wed Jul 29 12:21:42 2020 -0700

remove other language bindings section from website api page (apache#18783)

* remove other language bindings section from api page
* remove language binding docs redirect
* add call for contribution banner
* modify call for contribution wording
  Co-authored-by: Aaron Markham <markhama@amazon.com>
* more wording modification
  Co-authored-by: Aaron Markham <markhama@amazon.com>
* add hyperlink to 1.x version in banner
* add reference to the C api deprecation github issue
  Co-authored-by: Aaron Markham <markhama@amazon.com>

commit 915f6b4
Author: Yang Shi <yangshia@amazon.com>
Date: Wed Jul 29 11:28:37 2020 -0700

Remove deepnumpy reference and move Numpy tutorials to top level (apache#18798)

* move np tutorials to top level
* replace deepnumpy reference to np
* add info in card
* remove useless entry
* replace NDArray API card with np.ndarray
* python site refactor
* remove duplicated drawer and refactor layout
* extend document width to 100% for xl devices

commit e9829e7
Author: Joe Evans <github@250hacks.net>
Date: Tue Jul 28 18:53:29 2020 -0700

Cherry-pick large tensor support from apache#18752. (apache#18804)

Co-authored-by: Joe Evans <joeev@amazon.com>

commit 126636c
Author: Leonard Lausen <lausen@amazon.com>
Date: Tue Jul 28 22:11:20 2020 +0000

Fix naming in runtime_functions.sh (apache#18795)

commit f83dbac
Author: Haibin Lin <linhaibin.eric@gmail.com>
Date: Tue Jul 28 11:48:05 2020 -0700

remove executor manager from API doc (apache#18802)

Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>

commit 7908d7e
Author: Yiyan66 <57363390+Yiyan66@users.noreply.github.com>
Date: Tue Jul 28 15:11:19 2020 +0800

[numpy] fix flaky mixed precision binary error (apache#18660)

* temp
* change test
* fix bad func call
* test
* rectify
* doc
* change test

commit a807f6d
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Mon Jul 27 22:06:50 2020 -0700

[NumPy] loss for np array (apache#17196)

* loss for np/nd array
* fix flaky

commit 74430a9
Author: phile <phile_999@126.com>
Date: Tue Jul 28 06:44:54 2020 +0800

remove NLL in metric (apache#18794)

commit 9e77e81
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date: Mon Jul 27 14:27:52 2020 -0700

Update CUB and include it only for CUDA < 11 (apache#18799)

commit 98b3f73
Author: Sheng Zha <szha@users.noreply.github.com>
Date: Sat Jul 25 16:19:36 2020 -0700

add support for np.ndarray in autograd.function (apache#18790)

commit c1db2d5
Author: Leonard Lausen <lausen@amazon.com>
Date: Sat Jul 25 16:58:45 2020 +0000

Remove caffe plugin (apache#18787)

* Remove caffe plugin
* Fix
* Remove CXX14 feature flag
* Update test

commit 2fbd182
Author: Leonard Lausen <lausen@amazon.com>
Date: Sat Jul 25 02:48:30 2020 +0000

Split up CI sanity test functions to enable fine-grained trigger (apache#18786)

Developers can now trigger fine grained checks:

    python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_python
    python ci/build.py -R --platform ubuntu_cpu /work/runtime_functions.sh sanity_license

etc

commit 06b5d22
Author: Serge Panev <spanev@nvidia.com>
Date: Fri Jul 24 14:22:42 2020 -0700

ONNX import: use Conv pad attribute for symmetrical padding (apache#18675)

Signed-off-by: Serge Panev <spanev@nvidia.com>

commit e31ad77
Author: Yang Shi <yangshia@amazon.com>
Date: Thu Jul 23 11:33:31 2020 -0700

set website default version to current stable (1.6) version (apache#18738)

* set website default version - test redirect
* enable first time redirect on all master website pages
* update test code
* remove unnecessary test code
* fix typo
* delete test code

commit 02ae456
Author: Dick Carter <dick.carter@comcast.net>
Date: Thu Jul 23 11:17:10 2020 -0700

Improve environment variable handling in unittests (apache#18424)

This PR makes it easy to create unittests that require specific settings of environment variables, while avoiding the pitfalls (discussed in comments section). This PR can be considered a recasting and expansion of the great vision of @larroy in creating the EnvManager class in apache#13140.

In its base form, the facility is a drop-in replacement for EnvManager, and is called 'environment':

    with environment('MXNET_MY_NEW_FEATURE', '1'):
        <test with feature enabled>
    with environment('MXNET_MY_NEW_FEATURE', '0'):
        <test with feature disabled>

Like EnvManager, this facility takes care of the save/restore of the previous environment variable state, including when exceptions are raised. In addition though, this PR introduces the features:

* A similarly-named unittest decorator: @with_environment(key, value)
* The ability to pass in multiple env vars as a dict (as is needed for some tests) in both forms, so for example:

      with environment({'MXNET_FEATURE_A': '1', 'MXNET_FEATURE_B': '1'}):
          <test with both features enabled>

* Works on Windows! This PR includes a wrapping of the backend's setenv() and getenv() functions, and uses this direct access to the backend environment to keep it in sync with the python environment. This works around the problem that the C Runtime on Windows gets a snapshot of the Python environment at startup that is immutable from Python.
* with environment() has a simple implementation using the @contextmanager decorator
* Tests are included that validate the facility works with all combinations of before_val/set_val, namely unset/unset, unset/set, set/unset, set/set.

There were 5 unittests previously using EnvManager, and this PR shifts those uses to with environment():, while converting over 20 other ad-hoc uses of os.environ[] within the unittests. This PR also enables those unittests that were bypassed on Windows (due to the inability to set environment variables) to run on all platforms.

Further Comments

Environment variables are a two-edged sword- they enable useful operating modes for testing, debugging or niche applications, but like all features they must be tested. The correct approach for testing with a particular env var setting is:

    def set_env_var(key, value):
        if value is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = value

    old_env_var_value = os.environ.get(env_var_name)
    try:
        set_env_var(env_var_name, test_env_var_value)
        <perform test>
    finally:
        set_env_var(env_var_name, old_env_var_value)

The above code makes no assumption about whether the before-test and within-test state of the env var is set or unset, and restores the prior environment even if the test raises an exception. This represents a lot of boiler-plate code that could be potentially mishandled. The with environment() context makes it simple to handle all this properly. If an entire unittest wants a forced env var setting, then using the @with_environment() decorator avoids the code indent of the with environment() approach if used otherwise within the test.

commit 18af71e
Author: Leonard Lausen <lausen@amazon.com>
Date: Thu Jul 23 18:09:10 2020 +0000

CI: Migrate remaining Dockerfiles to docker-compose.yml and remove unused code (apache#18771)

* Migrate remaining Dockerfiles to docker-compose.yml
  - Delete unused Dockerfiles
  - Delete unused install/*.sh scripts
  - Consolidate ubuntu_gpu_tensorrt and ubuntu_gpu
  - Remove deprecated logic in ci/build.py (no longer needed with docker-compose)
  - Remove ci/docker_cache.py (no longer needed with docker-compose)
* Fix
* Fix
* Fix ubuntu_cpu_jekyll

commit 1928117
Author: Przemyslaw Tredak <ptredak@nvidia.com>
Date: Tue Jul 21 23:35:15 2020 -0700

Fix crash when accessing already destructed static variables (apache#18768)

commit a330a02
Author: Leonard Lausen <lausen@amazon.com>
Date: Wed Jul 22 06:31:47 2020 +0000

Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error (apache#18686)

* Fix mx.symbol.numpy._Symbol.__deepcopy__ logic error
  Performed shallow copy instead of deep copy
* Test
* Fix test

commit 9548b0c
Author: Leonard Lausen <lausen@amazon.com>
Date: Tue Jul 21 21:42:01 2020 +0000

Remove duplicate settings in .codecov.yml (apache#18763)

New PRs started showing the codecov/project badge again due apparent change in codecov's backend resolving these duplicate options specified in .codecov.yml
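The save/restore behaviour that commit 02ae456 (apache#18424) describes for `with environment(...)` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MXNet implementation: it only touches `os.environ`, it does not sync the backend environment (the part that makes the real facility work on Windows), and the helper name `_set_env_var` is hypothetical.

```python
import os
from contextlib import contextmanager


def _set_env_var(key, value):
    """Set key to value, or unset it when value is None (hypothetical helper)."""
    if value is None:
        os.environ.pop(key, None)
    else:
        os.environ[key] = value


@contextmanager
def environment(*args):
    """Temporarily apply env var settings and restore the prior state on exit.

    Accepts environment(key, value) or environment({key: value, ...}).
    A value of None means "unset for the duration of the block".
    """
    if len(args) == 1 and isinstance(args[0], dict):
        new_values = dict(args[0])
    elif len(args) == 2:
        new_values = {args[0]: args[1]}
    else:
        raise TypeError("expected (key, value) or a single dict")
    # Record the prior state; None marks a variable that was not set.
    saved = {key: os.environ.get(key) for key in new_values}
    try:
        for key, value in new_values.items():
            _set_env_var(key, value)
        yield
    finally:
        # Runs on normal exit and when the block raises.
        for key, value in saved.items():
            _set_env_var(key, value)


# Usage, with the variable name taken from the commit message above.
with environment('MXNET_MY_NEW_FEATURE', '1'):
    assert os.environ['MXNET_MY_NEW_FEATURE'] == '1'
```

Because the restore step lives in `finally`, the four before/within combinations named in the commit message (unset/unset, unset/set, set/unset, set/set) all go through the same code path.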

larroy requested a review from szha as a code owner November 6, 2018 19:02

anirudhacharya approved these changes Nov 6, 2018

marcoabreu added the pr-awaiting-review PR is waiting for code review label Nov 6, 2018

vishaalkapoor reviewed Nov 7, 2018

larroy force-pushed the kvstore_test branch 2 times, most recently from e1d60bd to 1cf2930 November 8, 2018 13:01

marcoabreu approved these changes Nov 8, 2018

larroy force-pushed the kvstore_test branch from 1cf2930 to 1abfaa3 November 8, 2018 17:03

larroy added 2 commits November 10, 2018 12:57

Refactor kvstore test

1b7529a

Fix pylint

3f3f2aa

larroy force-pushed the kvstore_test branch from 1abfaa3 to 3f3f2aa November 10, 2018 13:10

gigasquid and others added 2 commits November 10, 2018 23:36

Fix problem with some OSX not handling the cast on imDecode (apache#13207)

1d4540b

Fix num_gpus

9af042e

larroy requested review from nswamy and yzhliu as code owners November 10, 2018 23:38

marcoabreu merged commit d8d2d6e into apache:master Nov 12, 2018

vdantu added a commit to vdantu/incubator-mxnet that referenced this pull request Nov 12, 2018

Revert "Refactor kvstore test (apache#13140)"

9587e6c

This reverts commit d8d2d6e.

larroy deleted the kvstore_test branch November 15, 2018 18:44

azai91 pushed a commit to azai91/incubator-mxnet that referenced this pull request Dec 1, 2018

Refactor kvstore test (apache#13140)

7acbee7

* Refactor kvstore test
* Fix pylint
* Fix problem with some OSX not handling the cast on imDecode (apache#13207)
* Fix num_gpus

DickJC123 mentioned this pull request May 28, 2020

environment variable handling in unittests #18424

Merged

3 tasks
