
Merge upstream master #477

Merged
merged 415 commits into from
Sep 21, 2019
Conversation

rohithkrn

No description provided.

ShahriarSS and others added 30 commits September 11, 2019 16:39
Summary:
This PR adds Average Pool module to C++ front-end.
Pull Request resolved: pytorch#25800

Differential Revision: D17318094

Pulled By: yf225

fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db
Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include them in the exception message)
Pull Request resolved: pytorch#25809

Reviewed By: zrphercule

Differential Revision: D17329957

Pulled By: houseroad

fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae
…ytorch#25862)

Summary:
This change adds new prepack and run functions for the FC and Convolution operators in QNNPACK.
The new functions are `PackBMatrix`, `qnnpackLinear`, `PrePackConvWeights`, and `qnnpackConv`.
Pull Request resolved: pytorch#25862

Test Plan:
QNNPACK unit tests
fully-connected-test
convolution-test

Differential Revision: D17299260

Pulled By: supriyar

fbshipit-source-id: fdc4e2d5f1232675acd153f3efb9d17ed8628a54
Summary:
Enable multi-GPU (mGPU) tests that pass on ROCm as of release 2.7.
Pull Request resolved: pytorch#26055

Differential Revision: D17331484

Pulled By: bddppq

fbshipit-source-id: 51f956a84a6c14a1a41473d322950994fa29c25c
Summary:
Pull Request resolved: pytorch#26075

att, remove the verbose argument to reduce noise in the logs

Test Plan:
ci

Imported from OSS

Differential Revision: D17335935

fbshipit-source-id: 2e4289e838bf4489dcad8d5533353eebcff0d481
Summary: Pull Request resolved: pytorch#25877

Test Plan: Imported from OSS

Reviewed By: jianyuh

Differential Revision: D17275746

Pulled By: jamesr66a

fbshipit-source-id: db2f38ddd99f02ccb4fb754fa1c1e6cad4425fa8
Summary:
Pull Request resolved: pytorch#26064

Just changing the names after pytorch#25678.
ghstack-source-id: 89944542

Test Plan: CI

Differential Revision: D17332068

fbshipit-source-id: 5e9febed7a2fcd10d44273e55643b277d33a3ad7
Summary:
Pull Request resolved: pytorch#25976

As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:

> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
ghstack-source-id: 89929947

Test Plan: CI

Differential Revision: D17310441

fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
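
A minimal sketch of the in-memory pattern recommended above, using `io.BytesIO` in place of a temporary file (illustrative only, not the test code from the PR):

```python
import io
import torch

buf = io.BytesIO()
torch.save(torch.arange(4), buf)   # serialize into memory, no temp-file syscalls
buf.seek(0)                        # rewind before reading back
restored = torch.load(buf)
assert torch.equal(restored, torch.arange(4))
```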
…6079)

Summary:
Pull Request resolved: pytorch#26079

This reverts commit e303961.

Test Plan: Imported from OSS

Differential Revision: D17337585

Pulled By: jamesr66a

fbshipit-source-id: 4b93a4c5ca2fe491d609da889a42d22be8e52889
Summary:
Pull Request resolved: pytorch#25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
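
A short sketch of the runtime flag described above. The commit text shows `torch.fbgemm`/`torch.qnnpack`; later releases accept the backend name as a string, so the exact spelling depends on the build:

```python
import torch

# Assumed string form of the backend names; the commit also mentions
# torch.fbgemm / torch.qnnpack as accepted values.
torch.backends.quantized.engine = 'qnnpack'
print(torch.backends.quantized.engine)  # which kernels quantized ops will dispatch to
```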
Summary:
Pull Request resolved: pytorch#25734

[pytorch] Dynamic registration of RPC backends.
Allow non-process-group RPC backends to be plugged in as a backend.
ghstack-source-id: 89938296

Differential Revision: D17183789

fbshipit-source-id: 885fed12d80b82b60f9a125f78302a161e708089
Summary:
Enable one unit test that passes now.
Pull Request resolved: pytorch#25956

Differential Revision: D17298150

Pulled By: bddppq

fbshipit-source-id: 8763e71ad7ef80be915fe93a3471b29f27f3f0a4
)

Summary: Pull Request resolved: pytorch#26030

Test Plan:
- [namedtensor ci]

Pull Request resolved: pytorch#26030

Differential Revision: D17322383

Pulled By: zou3519

fbshipit-source-id: d5b914d646b48a6f4e0104aceb435e694b72bd96
Summary:
Pull Request resolved: pytorch#26050

Throws a warning once when someone attempts to attach names to a tensor.
This is guaranteed to happen at the callsite `set_named_tensor_meta`.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D17331634

Pulled By: zou3519

fbshipit-source-id: 44f5e5c95acd9c7ba543c1210a3b1314aab348f0
Summary:
While this isn't ideal, as it might print out the same source every time a function is run, it's still easier to tweak Python code to reduce loop counts than to insert `std::cout` and recompile C++ code.
Pull Request resolved: pytorch#25868

Differential Revision: D17318386

Pulled By: Krovatkin

fbshipit-source-id: 928ba6543204042924ab41a724635594709630de
Summary:
This test was recently enabled in pytorch#26055; it's flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37575
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37577
```
05:39:35 test_stream_event_nogil (__main__.TestCuda) ... Exception in thread Thread-3:
05:39:40 Traceback (most recent call last):
05:39:40   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
05:39:40     self.run()
05:39:40   File "/usr/lib/python2.7/threading.py", line 754, in run
05:39:40     self.__target(*self.__args, **self.__kwargs)
05:39:40   File "test_cuda.py", line 1894, in _test_stream_event_nogil
05:39:40     c2p.put(sync_func(self, TestCuda.FIFTY_MIL_CYCLES))
05:39:40   File "test_cuda.py", line 1882, in _event_wait
05:39:40     self.assertTrue(s1.query())
05:39:40   File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
05:39:40     raise self.failureException(msg)
05:39:40 AssertionError: False is not true
```
Pull Request resolved: pytorch#26087

Differential Revision: D17340891

Pulled By: bddppq

fbshipit-source-id: b2b70beb1b068db53197a5f9f6a80cb046e66ebd
Summary: Pull Request resolved: pytorch#26084

Test Plan: Imported from OSS

Differential Revision: D17339315

Pulled By: jamesr66a

fbshipit-source-id: 03a2674edcf779becfe3b8ec96f1bae23c74b11c
…db3244 (pytorch#25959)

Summary:
Pull Request resolved: pytorch#25959

Previous import was 28ca699b69b5a31892619defca2391044a9a6052

Included changes:
- **[7988d836](onnx/onnx@7988d836)**: Supporting negative axes for all existing onnx ops (pytorch#2281) <Negin Raoof>
- **[5ca0a09e](onnx/onnx@5ca0a09e)**: Update managingexperimentalops.md (pytorch#1981) <Joseph Spisak>
- **[bc0495c1](onnx/onnx@bc0495c1)**: Fix link to community docs in readme (pytorch#2261) <Prasanth Pulavarthi>
- **[2fdb3ef6](onnx/onnx@2fdb3ef6)**: move map and sequence types to onnx domain, (pytorch#2244) <Ke Zhang>
- **[568b65aa](onnx/onnx@568b65aa)**: Improve compatiblity with proto3 and enable reading attributes (pytorch#2288) <Dmitri Smirnov>
- **[1f350f2c](onnx/onnx@1f350f2c)**: Remove type info for loop variadic input in Loop op used to compose the Range op (pytorch#2287) <Hariharan Seshadri>
- **[eb139446](onnx/onnx@eb139446)**: Add Foundation WG to working-groups.md (pytorch#2276) <Ryan Loney>
- **[4eabc4b3](onnx/onnx@4eabc4b3)**: Fix testdata model for CumSum. Add exclusive attribute. (pytorch#2271) <jignparm>
- **[1a62afdb](onnx/onnx@1a62afdb)**: Support GatherND operator in ONNX (pytorch#2106) <Hariharan Seshadri>
- **[0e330e9d](onnx/onnx@0e330e9d)**: Support ScatterND operator in ONNX (pytorch#2220) <Bowen Bao>
- **[733f7a6a](onnx/onnx@733f7a6a)**: Add Det to ONNX (pytorch#2233) <Bowen Bao>
- **[52187738](onnx/onnx@52187738)**: Update the description of nearest_mode of resize op (pytorch#2257) <daquexian>
- **[64b4b686](onnx/onnx@64b4b686)**: Adding sparse tensor to ONNX (pytorch#2019) <G. Ramalingam>
- **[c8a8b7cc](onnx/onnx@c8a8b7cc)**: Support Range operator in ONNX (pytorch#2242) <Hariharan Seshadri>
- **[44b0d6d5](onnx/onnx@44b0d6d5)**: Update resize op (pytorch#2057) <daquexian>
- **[7d907964](onnx/onnx@7d907964)**: Add function to fuse dynamic quantization graph into 1 node (pytorch#2187) <Ashwini Khade>
- **[36f8e6d9](onnx/onnx@36f8e6d9)**: Update logo_request.md (pytorch#2231) <Prasanth Pulavarthi>
- **[4eb737c8](onnx/onnx@4eb737c8)**: Update Clip in opset 11 to support min/max as inputs instead of attributes (pytorch#2096) <Bowen Bao>
- **[a25e1388](onnx/onnx@a25e1388)**: Fix segfault in tile shape inference (pytorch#2221) <daquexian>
- **[2dc273c7](onnx/onnx@2dc273c7)**: update onehot shape inference to reflect the spec for depth input (pytorch#2224) <Ashwini Khade>
- **[665211c1](onnx/onnx@665211c1)**: Add GatherElements Op and Rename ScatterElements (pytorch#2143) <Lara Haidar>
- **[3ba2e31a](onnx/onnx@3ba2e31a)**: Unique (pytorch#2141) <liqunfu>
- **[5a5588ad](onnx/onnx@5a5588ad)**: Clarify dimension variable scoping (pytorch#2211) <G. Ramalingam>
- **[fabe39d5](onnx/onnx@fabe39d5)**: Liqun/topk sort (pytorch#2126) <liqunfu>
- **[453aa644](onnx/onnx@453aa644)**: Update document for NMS (pytorch#2193) <Hector Li>
- **[34e28ec2](onnx/onnx@34e28ec2)**: Handle negative 'axis' value in Split type and shape inferencing (pytorch#2177) <Scott McKay>
- **[28ec4583](onnx/onnx@28ec4583)**: depth to space shuffle order (pytorch#2163) <Negin Raoof>
- **[98f72629](onnx/onnx@98f72629)**: minor updates to fix links in readme (pytorch#2189) <Prasanth Pulavarthi>
- **[321d1467](onnx/onnx@321d1467)**: Add check to disallow squeezing input axes which are not 1 (pytorch#2204) <Ashwini Khade>
- **[573f0dc9](onnx/onnx@573f0dc9)**: fix a bug in fun shape inference (pytorch#2188) <Tang, Cheng>
- **[36dc7110](onnx/onnx@36dc7110)**: Clarify ambiguity in gather spec regarding indices expectation (pytorch#2202) <Ashwini Khade>
- **[a2449673](onnx/onnx@a2449673)**: Fix some minor issues in IR.md and Versioning.md (pytorch#2108) <edgchen1>
- **[349aff69](onnx/onnx@349aff69)**: Skip install typing package for python >=3.5 (pytorch#2199) <bddppq>

Test Plan: ci

Reviewed By: bddppq, benoitsteiner

Differential Revision: D17296390

fbshipit-source-id: 9f9f5ce85d9694128008d756c2ea393bd4e0cb71
Summary:
cc: gchanan zou3519

I will look into why this is failing spuriously.
Pull Request resolved: pytorch#26108

Differential Revision: D17348399

Pulled By: zou3519

fbshipit-source-id: aed4ccfc3f106692d4e32acc029740309570b0c3
Summary:
Pull Request resolved: pytorch#26080

Will be used in the conversion of the Caffe2 ctr_mbl_feed model to PyTorch

Test Plan: Unit test

Reviewed By: yinghai

Differential Revision: D17337604

fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753
Summary: Pull Request resolved: pytorch#25863

Differential Revision: D17347386

Pulled By: Krovatkin

fbshipit-source-id: a42cf56680a27bc3e50fd945ab372a409225b875
Summary:
This basically works as a simple filter, as you suggested, ZolotukhinM.

`export PYTORCH_JIT_LOG_LEVEL=guard_elimination` will print all `GRAPH_DUMP` and `GRAPH_UPDATE` statements.
`export PYTORCH_JIT_LOG_LEVEL=>guard_elimination:>alias_analysis` will print all `GRAPH_DUMP`, `GRAPH_UPDATE` **and** `GRAPH_DEBUG` statements in `guard_elimination.cpp` **and** in `alias_analysis.cpp`
Pull Request resolved: pytorch#25895

Differential Revision: D17309090

Pulled By: Krovatkin

fbshipit-source-id: 8fa9e67cc9af566b084d66cc15223633fda08444
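
A hedged example of using the filter described above from Python; the variable is read by the JIT logging machinery, so it must be set before the passes of interest run:

```python
import os

# Enable GRAPH_DUMP/GRAPH_UPDATE/GRAPH_DEBUG output for two passes,
# as described in the commit message above.
os.environ["PYTORCH_JIT_LOG_LEVEL"] = ">guard_elimination:>alias_analysis"

import torch  # import (or re-run the compilation) after setting the variable
```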
Summary:
Pull Request resolved: pytorch#25606

This just complicates the codegen for no benefit.

Test Plan: Imported from OSS

Differential Revision: D17172498

Pulled By: gchanan

fbshipit-source-id: d2f50e45400ac0336792422518e03dbae3a1bedc
Summary:
Pull Request resolved: pytorch#25607

Since we don't generate these as end-user bindings, and we no longer reorder based on this property, we can just get rid of the property.

Test Plan: Imported from OSS

Differential Revision: D17172500

Pulled By: gchanan

fbshipit-source-id: f84fd8bb2b13598501897f56871b21339585d844
Summary:
Pull Request resolved: pytorch#25897

It doesn't hurt to set all variables unconditionally.
Also, we can create a link to the lib directory instead of to specific files; this
way it's easier to switch between dynamic and static library names.

Test Plan:
- check android gradle CI;
- use stack diff to check all 4 architectures on PR;

Pull Request resolved: pytorch#25897

Differential Revision: D17307240

Pulled By: ljk53

fbshipit-source-id: c975085ddda852ef7da1c29935c2f6a28d797e5a
Summary:
Pull Request resolved: pytorch#25984

Link static libtorch libraries into pytorch.so (API library for android)
with "-Wl,--gc-sections" flag to remove unused symbols in libtorch.

Test Plan:
- full gradle CI with stacked PR;
- will check final artifacts.tgz size change;

Differential Revision: D17312859

Pulled By: ljk53

fbshipit-source-id: 99584d15922867a7b3c3d661ba238a6f99f43db5
Summary:
Pull Request resolved: pytorch#26113

After pytorch#16914, passing an
argument such as "build_deps" (i.e. `python setup.py build_deps develop`)
fails because it is treated as an unrecognized argument.
ghstack-source-id: 90003508

Test Plan:
Before, this script would execute "python setup.py build_deps
develop", which errored. Now it executes "python setup.py develop" without an
error. Verified by successfully running the script on devgpu. In setup.py,
there is already a `RUN_BUILD_DEPS = True` flag.

Differential Revision: D17350359

fbshipit-source-id: 91278c3e9d9f7c7ed8dea62380f18ba5887ab081
Summary: Pull Request resolved: pytorch#25608

Test Plan: Imported from OSS

Differential Revision: D17172494

Pulled By: gchanan

fbshipit-source-id: 5a46889cc040297231e2473ae5b2879b39f8d60a
Summary:
The base_lr parameter was being overridden by the parent class's `__init__`; see pytorch#21965.
Pull Request resolved: pytorch#26105

Reviewed By: yf225

Differential Revision: D17346724

Pulled By: vincentqb

fbshipit-source-id: 4b146bd64f4f385c0a9c4f4df8eb8991312fb15c
Summary:
Pull Request resolved: pytorch#25504

Skip inserting duplicate observers for values already observed
in the forward method of a child module or in other methods of
the current module.

Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
python test/test_jit.py -- 'TestJit.insert_observers_child_qconfig'
python test/test_jit.py -- 'TestJit.insert_observers_skip_values'

Imported from OSS

Differential Revision: D17208888

fbshipit-source-id: e04f1c22ab1c4f410933a17a3ef31acf5f217323
Elias Ellison and others added 25 commits September 19, 2019 15:46
Summary:
In schema matching we allow a homogeneous tuple to be matched to list arguments. This logic wasn't yet extended to vartype lists, causing calls like `len((1, 2, 3))` to fail.

Fix for pytorch#20500
Pull Request resolved: pytorch#25944

Differential Revision: D17482510

Pulled By: eellison

fbshipit-source-id: aa63318c27a01d965a7a7b68ce8bec638168dc26
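
A minimal repro sketch of the case fixed above (hypothetical test, not taken from the PR):

```python
import torch

@torch.jit.script
def tuple_len() -> int:
    # A homogeneous tuple matched against the list argument of len();
    # this failed to compile before the schema-matching fix.
    return len((1, 2, 3))

assert tuple_len() == 3
```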
Summary:
At the moment it includes the pytorch#26219 changes. That PR is landing right now; afterwards this PR will contain only the Javadocs.

Applied all of dreiss's comments from the previous version.
Pull Request resolved: pytorch#26149

Differential Revision: D17490720

Pulled By: IvanKobzarev

fbshipit-source-id: f340dee660d5ffe40c96b43af9312c09f85a000b
Summary:
This PR adds support for multidimensional inputs to `torch::tensor`, to match the Python `torch.tensor` API.

Closes pytorch#16099.
Pull Request resolved: pytorch#26210

Differential Revision: D17456761

Pulled By: yf225

fbshipit-source-id: a53ce74c535c13c5dcb833f19e9b6b79d12376b5
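
For reference, the Python behavior the C++ `torch::tensor` call is being matched against (roughly `torch::tensor({{1., 2.}, {3., 4.}})` on the C++ side; shown here in Python since that defines the target semantics):

```python
import torch

# Nested lists produce a multidimensional tensor; the PR brings the
# C++ torch::tensor API to parity with this.
t = torch.tensor([[1., 2.], [3., 4.]])
print(t.shape)   # torch.Size([2, 2])
print(t.dtype)   # torch.float32
```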
…torch#26471)

Summary:
Pull Request resolved: pytorch#26471

att

Test Plan:
.

Imported from OSS

Differential Revision: D17491215

fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9
Summary:
Serialization.cpp fails on big-endian machines.
This patch fixes the endianness bugs and also makes PyTorch
model files portable across architectures with different endianness:
an x86-generated model file can be read on the s390 architecture.

The first problem is that serialization.cpp forgets to convert the "size" value
of the storage elements to the native byte order.
torch.load throws an assertion as a result
(see the first stack trace below).

The second problem is that when it reads the model from storage (doRead),
it decodes values to little-endian, which is the wrong order
on a big-endian machine. The decode should use
THP_nativeByteOrder() instead
(see the model dump below).
```
loaded_model = torch.load(opt.model_file, map_location=torch.device("cpu"))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 422, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 616, in _load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 2305843009213693952 got 32
	(the very long number is actually 32 in the wrong endianness)
```

Model file load on x86 (correct output)
```
>>> import torch
>>> torch.load('400f2k_best.model', map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 2.4608e-01, -1.1174e-01, -1.0854e-01,  4.0124e-01, -1.5261e-02,
         -1.2206e-01,  1.3229e-01, -1.2615e-01, -5.2773e-01,  2.6333e-01,
         -3.1462e-03, -1.4902e-01,  9.8545e-02, -1.5789e-01, -2.2625e-01,
         -1.0776e-01, -9.0895e-02, -3.8530e-01,  9.1152e-01, -3.9720e-01,
         -8.5848e-01, -4.7837e-02, -1.5178e-01,  8.5023e-02,  1.5013e-01,
         -9.9294e-02, -2.7422e-01, -4.3986e-01, -4.4297e-01, -3.9570e-01,
```

Model file load on s390x (wrong endianness; notice the exponents)
```
>>> import torch
>>> torch.load( "400f2k_best.model", map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 9.2780e+21, -9.7722e-11,  4.1350e+33,  7.782e+34,  4.2056e-31,
          9.0784e+18,  1.1846e-32,  3.3320e-32, -4.8288e-28, -7.2679e+12,
          1.5379e-16, -5.2604e+12, -4.7240e+17,  4.6092e-21, -1.8360e-20,
         -2.7712e-31,  1.4548e-16, -2.5089e-27,  7.9094e-10,  7.1977e+34,
          1.1930e+26,  8.4536e+15,  2.7757e+23, -5.8455e-10, -1.5611e+09,
         -1.1311e-23,  6.6451e+19, -2.0970e+20,  3.4878e-19, -1.0857e-12,
          7.8098e+22,  5.3998e-35],
```
Pull Request resolved: pytorch#26383

Differential Revision: D17480891

fbshipit-source-id: f40569c7b9c4a1935dceb41f1a2508ce21ea3491
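
A small illustration (not the torch code path) of the byte-order problem described above, assuming NumPy for the byte manipulation; note that the size value 32 read with the wrong endianness gives exactly the 2305843009213693952 from the stack trace:

```python
import numpy as np

le_bytes = np.array(32, dtype='<i8').tobytes()        # 64-bit size field as written on x86
wrong = int(np.frombuffer(le_bytes, dtype='>i8')[0])  # read big-endian: 2305843009213693952
right = int(np.frombuffer(le_bytes, dtype='<i8')[0])  # decode to native order: 32
print(wrong, right)
```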
Summary:
Pull Request resolved: pytorch#26477

- At inference time we need to turn off autograd mode and turn on no-variable
  mode, since we strip out these modules for the inference-only mobile build.
- Both flags are stored in thread-local variables, so we cannot simply
  set them to false globally.
- Add the "autograd/grad_mode.h" header to the all-in-one header 'torch/script.h'
  to reduce friction for iOS engineers who might need to do this manually in their
  project.

P.S. I tried to hide AutoNonVariableTypeMode in codegen but figured it's not
very trivial (e.g. there are manually written parts not covered by codegen).
Might try it again later.

Test Plan: - Integrate with Android demo app to confirm inference runs correctly.

Differential Revision: D17484259

Pulled By: ljk53

fbshipit-source-id: 06887c8b527124aa0cc1530e8e14bb2361acef31
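
The commit concerns the C++ mobile path (the guards now ship via `torch/script.h`); the Python analog of the two flags being flipped is sketched below, with a hypothetical model path and input shape:

```python
import torch

module = torch.jit.load("model.pt")   # hypothetical TorchScript model
module.eval()
with torch.no_grad():                  # Python counterpart of the C++ no-grad guard
    out = module(torch.rand(1, 3, 224, 224))
```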
Summary:
Pull Request resolved: pytorch#25975

We would like to add FP16 weight support for the dynamic quantized LSTM.

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details

```
[jianyuhuang@devvm794.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantization
-- 'test_quantized_rnn \(test_quantization\.PostTrainingDynamicQuantTest\)'  --print-passing-details
Building: finished in 13.4 sec (100%) 8134/8134 jobs, 81 updated
  Total time: 13.9 sec
Trace available for this run at /tmp/testpilot.20190910-210241.2092790.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision c86e65add357582accb6ec0be23b92c8a2c510bd fbpkg ca46e8f5b26c451a8b0b2462c11bb61d at Mon Sep  9
22:16:37 2019 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/696/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
      ✓ caffe2/test:quantization - test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) 0.183 1/1 (passed)
Test output:
> test_quantized_rnn (test_quantization.PostTrainingDynamicQuantTest) ... ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.184s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900050322971
Summary (total time 4.35s):
  PASS: 1
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Differential Revision: D17299116

fbshipit-source-id: 7fe91ece25867f2c0496f1b63fb1041e6b815166
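
A hedged sketch of the feature being exercised, assuming the public dynamic-quantization entry point with an FP16 dtype (the exact API surface at the time of this commit may differ):

```python
import torch

lstm = torch.nn.LSTM(input_size=16, hidden_size=32)
# Dynamically quantize the LSTM with FP16 weights (the dtype this commit adds support for).
qlstm = torch.quantization.quantize_dynamic(lstm, {torch.nn.LSTM}, dtype=torch.float16)
x = torch.randn(5, 1, 16)
out, _ = qlstm(x)
```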
Summary:
Adds `Distance` module parity.
pytorch#25883
Pull Request resolved: pytorch#26424

Differential Revision: D17487314

Pulled By: yf225

fbshipit-source-id: c7d124cb4afb08a4733e7212af0bb276bf32d172
…ytorch#26498)

Summary:
Pull Request resolved: pytorch#26498

We should allocate an empty tensor as a result tensor when performing
binary ops. Currently some ops use `empty_like(self)` as the initial
result tensor before passing it into TensorIterator. This is not very
efficient because TensorIterator may resize the tensor due to
broadcasting, causing more memory allocation. By using an empty tensor
as the result tensor, we only need to allocate/resize memory once as
opposed to twice.

Also fixes pytorch#26495. The bug
there is that the implementation of `pow` is missing a resize in one
case.

Test Plan:
- new test
- run tests

Differential Revision: D17500025

Pulled By: zou3519

fbshipit-source-id: bff4949af5e75541c04669b961bcf2e1ec456faf
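
An illustrative sketch of the broadcasting case discussed above (assumed shapes; not the PR's test):

```python
import torch

a = torch.rand(4, 1)
b = torch.rand(1, 5)

# The broadcast result is 4x5, so pre-allocating a result shaped like `a`
# (empty_like) only forces TensorIterator to resize it; starting from an
# empty tensor allocates the broadcast shape once.
out = torch.empty(0)
torch.pow(a, b, out=out)
assert out.shape == (4, 5)
```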
Summary:
Pull Request resolved: pytorch#26504

[pytorch] [distributed] Make the destructor virtual for a class with virtual functions.
Not having a virtual destructor may lead to a memory leak.
ghstack-source-id: 90454880

Test Plan: Made sure pg based UT works.

Differential Revision: D17488876

fbshipit-source-id: 5fdc55e175fd2b22e931b740c36cb1feed454066
Summary:
test_wrapped_number was calling torch.set_default_tensor_type('torch.FloatTensor'), which was setting the default tensor types for all following tests until a class boundary (with unittest) or until end of file (with pytest). Tests that don't expect the default tensor type to be set this way were then failing if run afterwards.

This fixes the issue by copying the default_tensor_type decorator from test_nn and using that instead with test_wrapped_number. The decorator correctly resets the default tensor type after the test has run.

This fixes the many errors encountered when running pytest test_jit.py.

Note: test_wrapped_number was introduced in pytorch#22273.
Pull Request resolved: pytorch#26523

Differential Revision: D17495283

Pulled By: mruberry

fbshipit-source-id: ab518c78b7706af7cb1c2d1c17823d311178996d
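
A rough sketch of the decorator pattern described above (the actual helper copied from test_nn may differ in detail):

```python
import functools
import torch

def default_tensor_type(type_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            saved = torch.tensor([]).type()            # remember the current default type
            torch.set_default_tensor_type(type_name)
            try:
                return fn(*args, **kwargs)
            finally:
                torch.set_default_tensor_type(saved)   # restore it even if the test fails
        return wrapper
    return decorator
```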
Summary:
These are intentionally not yet used by the encoder to avoid backcompat issues.
Pull Request resolved: pytorch#26454

Differential Revision: D17480844

fbshipit-source-id: e88ae7f5b94e32c7f12341a750aa4b9f7374bfb7
Summary:
Pull Request resolved: pytorch#26501

Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.

XLA companion patch at pytorch/xla#1031

Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core.  There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'.  I think this may be duplicated with some logic somewhere else but I have to double check.

The new generated code looks like this:

```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
    static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
    return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```

The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.

After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.

* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.

Benchmark:

Apply the following patch to the base commit and this commit:

```
 diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
 --- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+  return self;
+}
+
+}} // namespace at::native
 diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
 --- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
   dispatch:
     CPU: im2col_backward_cpu
     CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _const5
```

Comparisons with timeit:

One-argument, representative case:

Before:

```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):

Before:

```
In [1]: import torch

In [2]: x = torch.zeros(1)

In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

After:

```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D17499154

Pulled By: ezyang

fbshipit-source-id: 8ea237c2e935134b0f4f8d6cfd89c6a93037c02c
Summary:
Pull Request resolved: pytorch#26364

Per pytorch#25769, we sometimes get
an infinite loop when `TCPStore` calls `tcputil::connect`, and the server
continually returns `ECONNRESET` or `ECONNREFUSED`. If a proper timeout is passed
in, we guard against this by throwing an exception once the timeout has passed.

Testing: Tested by modifying `TCPStore` to connect to an invalid port, thus getting
`ECONNREFUSED`. If a valid timeout is passed in, the function correctly throws an
exception. Steps below:
1) in TCPStore.cpp's constructor, replace the `connect` call with this line:
 `storeSocket_ = tcputil::connect(tcpStoreAddr_, 1, true, std::chrono::milliseconds(3000));`
2) Build the `TCPStoreTest` binary.
3) Run the binary. Expected output:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Connecting to TCP store timed out.
Aborted (core dumped)
```
ghstack-source-id: 90480086

Test Plan: See above.

Differential Revision: D17430164

fbshipit-source-id: 1482aca72fcc3ddb95ea25649ec057edda5d1934
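
A hedged sketch using the Python binding (the fix itself is in the C++ `tcputil::connect`); the constructor signature is assumed from current releases:

```python
from datetime import timedelta
import torch.distributed as dist

# Client-side TCPStore with a 3-second timeout; after this commit a connect
# loop of ECONNREFUSED/ECONNRESET raises instead of spinning forever.
try:
    store = dist.TCPStore("127.0.0.1", 29500, 1, False, timeout=timedelta(seconds=3))
except RuntimeError as e:
    print(e)   # e.g. "Connecting to TCP store timed out." when nothing is listening
```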
Summary:
Pull Request resolved: pytorch#26515

Fix patterns of `prepack` and `permute` after recent changes
to `quantized::conv2d` and `quantized::conv2d_prepack`

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17502573

fbshipit-source-id: 1a719fd610e8ea9dc16075abaa042556e1edbceb
Summary:
If the `Union` contains a non-class type, `issubclass` would fail; this
adds a check for that case.
Pull Request resolved: pytorch#26312

Pulled By: driazati

Differential Revision: D17486465

fbshipit-source-id: c513cef3bbc038f15c021eb0c1bf36be0df1eb90
Summary:
When used as annotations on Python functions, `NamedTuple`s go through our Python annotation -> type mapping, which previously had no way of looking up `NamedTuple`s (they are created lazily by checking if the type has certain properties, so the lookup is creating the `TupleType` from scratch). This PR threads through the necessary data to make them work.

Fixes pytorch#26437
Pull Request resolved: pytorch#26443

Pulled By: driazati

Differential Revision: D17486441

fbshipit-source-id: a6bbb543ff05a5abe61f1a7f68db9ecdb652b358
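
A minimal repro sketch of the kind of annotation fixed here (hypothetical names):

```python
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: torch.Tensor
    y: torch.Tensor

@torch.jit.script
def shift(p: Point) -> Point:
    # The Python NamedTuple annotation now maps to a TorchScript TupleType.
    return Point(p.x + 1, p.y + 1)

print(shift(Point(torch.zeros(2), torch.ones(2))))
```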
Summary:
With this PR, we establish the following conventions:
1. Options in C++ module / optimizer constructors should always be `const SomeOptions&` type, not `SomeOptions` type.
2. The options constructor arg should always be named `options_`, not `options`, to not be confused with the module / optimizer's internal field `options`.
3. We never use `std::move` to assign `options_` to the module / optimizer's internal field `options` in the constructor definition. Instead, we simply use `options(options_)`.

Here is the reasoning:
We might be tempted to declare the constructor as `SomeModule(SomeOptions options_)` and have `options(std::move(options_))` in the member initialization list. However, this can be a dangerous design because the constructor might use `options_` to set values for other member fields in the member initialization list (e.g. https://github.com/pytorch/pytorch/blob/8317f75b79fb78ceeeb928aa23a901d57274b9e1/torch/csrc/api/include/torch/optim/lbfgs.h#L30-L34), and use-after-move can cause hard-to-debug problems.
Instead, we choose to explicitly use `const SomeOptions&` type for `options_`, and never use `std::move` to assign it to the internal `options` field. This way we have stronger guarantee on the validity of `options_` at any point in the constructor.

Notable exceptions to the above conventions:
1. C++ Embedding module doesn't adhere to the conventions now, which will be fixed after pytorch#26358 is landed.
2. C++ dataloader and dataset classes likely need similar changes. We will do it when we start to work on dataloader/dataset parity.

Thanks ShahriarSS for discovering the options usage inconsistency! 🚀
Pull Request resolved: pytorch#26483

Differential Revision: D17500451

Pulled By: yf225

fbshipit-source-id: 49361a3519e4ede933789db75731d40144f0b617
Summary:
Pull Request resolved: pytorch#26479

This PR doesn't delete the code for them yet because it takes some effort to
determine what to delete. I will send a followup PR fully deleting
tagged names, but this PR disables their creation.

Test Plan: - [namedtensor ci]

Differential Revision: D17484758

Pulled By: zou3519

fbshipit-source-id: 451409e36eac98ffee1b98884d0f675bb5d46c9d
Summary: Pull Request resolved: pytorch#26365

Test Plan: - [namedtensor ci]

Differential Revision: D17484759

Pulled By: zou3519

fbshipit-source-id: 44068c1e9d84adf36c5ab5e7006a153b948914d6
Summary:
Pull Request resolved: pytorch#26366

Changes:
- `NameType::NORMAL` -> `NameType::BASIC`
- `Dimname::is_wildcard` -> `Dimname::isWildcard()`
- `Dimname::is_normal` -> `Dimname::isBasic()`.
- `at::is_valid_identifier` -> `Dimname::isValidName(string)`
- `at::match`, `at::unify` are now methods on `Dimname`.

I am adopting CamelCase for struct members of a named tensor related
struct.

Test Plan: - [namedtensor ci]

Differential Revision: D17484757

Pulled By: zou3519

fbshipit-source-id: 21c128e5025e81513e14d34506a7d7744caefdc2
Summary: Pull Request resolved: pytorch#26217

Test Plan: Imported from OSS

Differential Revision: D17427577

Pulled By: pbelevich

fbshipit-source-id: e9b3e76ca44df883e3038b688dd7b930752d93a2
Test Plan: revert-hammer

Differential Revision:
D17486465

Original commit changeset: c513cef3bbc0

fbshipit-source-id: 567311c001d7dd0b7ab9ffe8bb894954bea583c9
Test Plan: revert-hammer

Differential Revision:
D17427577

Original commit changeset: e9b3e76ca44d

fbshipit-source-id: a5bbae208ba33a31f90ab5c9b199f232de0c6d1b
@iotamudelta iotamudelta left a comment

LG, all the required tests pass.

@rohithkrn rohithkrn merged commit 02476d2 into ROCm:bf16_bringup Sep 21, 2019
rohithkrn added a commit that referenced this pull request Sep 30, 2019
rohithkrn added a commit that referenced this pull request Oct 1, 2019
@rohithkrn rohithkrn deleted the rn/up-master branch October 3, 2019 00:27