forked from pytorch/pytorch
Merge upstream master -> bf16 branch #501
Merged
Conversation
…) (pytorch#27177) Summary: Pull Request resolved: pytorch#27177 Add support for F::one_hot C++ function. Test Plan: Added 3 new tests to verify API is working Imported from OSS Differential Revision: D17697934 fbshipit-source-id: a8127fb87c00daa119bb92a5702bc4bbba48290d
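The new C++ function is meant to mirror the existing Python `F.one_hot` semantics. As a reference for those semantics (the exact C++ signature isn't spelled out in this commit message, so any assumption about it should be checked against the torch::nn::functional headers), a minimal Python sketch:
```
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])
encoded = F.one_hot(labels, num_classes=3)  # int64 tensor of shape (3, 3)
print(encoded)
```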
Summary: Pull Request resolved: pytorch#27189 Conceptually, Module is just a view over ClassType and ivalue::object. The register_ methods are the only exception to this: they provide an API not available on ClassType or object directly. This PR ports this API to ClassType and makes Module truly just a view over those two. Test Plan: Imported from OSS Differential Revision: D17703533 Pulled By: ZolotukhinM fbshipit-source-id: 2cdb9fb486b3fb8527986483c7f34be7bd59fabf
Summary: experimental ops don't provide a BC guarantee. Pull Request resolved: pytorch#27235 Reviewed By: hl475 Differential Revision: D17723292 Pulled By: houseroad fbshipit-source-id: 644ae34d130418a810e0f9d802fa25f6e34c5ccf
Summary: Pull Request resolved: pytorch#27194 Test Plan: Imported from OSS Differential Revision: D17704957 Pulled By: zafartahirov fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
Summary: Pull Request resolved: pytorch#27181 Test Plan: Imported from OSS Differential Revision: D17717482 Pulled By: jamesr66a fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
Summary: Pull Request resolved: pytorch#27184 Test Plan: Imported from OSS Differential Revision: D17717481 Pulled By: jamesr66a fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
Summary: Pull Request resolved: pytorch#27191 skip rpc and distautograd spawns tests for <python 3.6 ghstack-source-id: 91231565 close pytorch#27157 Test Plan: unit tests Differential Revision: D17697368 fbshipit-source-id: bb8cf1f47de41f9d350fd60afe37fece293d8680
…rch#25527) Summary: Pull Request resolved: pytorch#25527 Master GH issue: pytorch#23110. This change builds upon pytorch#24876 and provides all the autograd hooks needed for a forward pass with distributed rpc for builtin operators. This change does not address distributed rpc for python UDFs and that will be addressed in follow up PRs. Summary of changes: 1. Attach send autograd functions when a request is sent from the client and response is sent from the server. 2. Attach receive autograd functions when a request is received on the server and a response is received on the client. 3. Generate a globally unique autograd_message_id for each send/recv autograd function pair to uniquely identify them. ghstack-source-id: 91240466 Test Plan: unit tests. Differential Revision: D17148077 fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
Summary: Pull Request resolved: pytorch#27219 Test Plan: Imported from OSS Differential Revision: D17715306 Pulled By: albanD fbshipit-source-id: d11a7634dbee6a885c7177b240958e5aed2544f3
Summary: Pull Request resolved: pytorch#27220 Test Plan: Imported from OSS Differential Revision: D17715305 Pulled By: albanD fbshipit-source-id: 574704ad23ece6da7aa2780b78867307bef523cc
Summary: Move the resolution of the conflict between `USE_CUDA` and `USE_ROCM` to CMake so that: - Both `USE_CUDA=ON` with CUDA found and `USE_ROCM=ON` with ROCM found --> fatal error - Either `USE_CUDA=ON` with CUDA found or `USE_ROCM=ON` with ROCM found --> the respective GPU feature is ON - Otherwise no GPU support Pull Request resolved: pytorch#26910 Differential Revision: D17738652 Pulled By: ezyang fbshipit-source-id: 8e07cc7e922e0abda24a6518119c28952276064e
…ytorch#27277) Summary: This reverts commit 0cd1880. As reported by jerryzh168 and pritamdamania87, mpark::variant doesn't compile with gcc 7.3.1 on the fb devserver and throws an error similar to mpark/variant#43. (However, it doesn't fail with gcc 7.3.1 in OSS CI, based on https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2995606/output/107/0?file=true) A plausible workaround is to upgrade the devserver to devtoolset-8, but that would in turn cause the CUDA build to complain:
```
/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 7 are not supported!
#error -- unsupported GNU version! gcc versions later than 7 are not supported!
```
(Thanks pritamdamania87 for the report!) The solution for now is to revert the mpark::variant addition, and I will find alternatives that work with gcc 7.3.1 on the fb devserver. Pull Request resolved: pytorch#27277 Differential Revision: D17739804 fbshipit-source-id: ad945b3d86ab7ddbff58f4ecab95e0e1ac725ae9
…ortance (pytorch#26376) Summary: Pull Request resolved: pytorch#26376 * Create the new dense_feature_reg (FCInputLpNorm) for feature importance, to be applied to the fully-connected layer. Test Plan: * Unit test located in: `caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test.py` Reviewed By: un-disclosed Differential Revision: D17360361 fbshipit-source-id: 1a0e119eeb17199a13dfffe58b3036ea4255e301
Summary: Pull Request resolved: pytorch#27293 This doesn't turn on 3.5 signal, but it makes it so that [test all] will include it if you do request it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17738741 Pulled By: ezyang fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148
Summary: Pull Request resolved: pytorch#26909 Differential Revision: D17683632 Pulled By: Krovatkin fbshipit-source-id: 5d36c3c4cf7411c56485ef19fe59262b9f8b45b2
…mprehension Summary: Pull Request resolved: pytorch#27261 Differential Revision: D17740159 Pulled By: Krovatkin fbshipit-source-id: 90439282aea14d8634eb41ffece5b6320d615fa7
Summary: Pull Request resolved: pytorch#27164 Test Plan: Imported from OSS Differential Revision: D17694475 Pulled By: zafartahirov fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
Summary: Pull Request resolved: pytorch#27106 Adds a memory_format option to the `clone` operator. Introduces new `clone` behavior when used as `input_t.clone(memory_format=torch.preserve_format)`: 1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor. 2) If not (1) and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format. 3) The output tensor is contiguous in all other cases. --- A dense tensor is a tensor that stores values in a contiguous block of memory. A non-overlapping tensor is a tensor in which elements occupy individual, non-repetitive memory. Test Plan: Imported from OSS Differential Revision: D17699357 Pulled By: VitalyFedyunin fbshipit-source-id: 5ae1537c2aca1abf0bf1eec4416846129c156f66
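A minimal sketch of the preserve-format behavior described above (tensor shapes are illustrative):
```
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)
# x is dense and non-overlapping, so the clone keeps its channels-last strides
assert y.stride() == x.stride()
```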
Summary: Pull Request resolved: pytorch#27149 Extract version to version.txt and add reading version logic to setup.py and fb/torch_version.py ghstack-source-id: 91271883 Test Plan: N/A Reviewed By: gchanan, ezyang Differential Revision: D17689307 fbshipit-source-id: 21899502027cec71b63d9dc151e09ff5ff3f279d
…pytorch#27274) Summary: Pull Request resolved: pytorch#27274 This is yet another fix to address pytorch#26764. PR pytorch#26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where USE_STATIC_DISPATCH takes place and thus the most logically sound place to do such tweaks. However, we observed a nontrivial perf regression due to this fix. It turns out the numel() tensor method gets called in several for-loops and thus incurs ~7M thread_local updates in a single forward call:
```
7173330 numel
558 size
416 q_scale
302 _empty_affine_quantized
288 contiguous
257 q_zero_point
216 qscheme
173 empty
110 set_
105 as_strided
104 permute
...
```
Since numel() is not called from a single place, a natural workaround is to update function_wrapper.py so that it only adds the guard in the gen_namespace_function() case and ignores the gen_tensor_method() case. But some tensor methods are actually called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the only "band aid" left on the table is to insert the guard on the JIT->ATen path as originally done in pytorch#26868 - this is a simplified version of it, as it doesn't hurt to extend the NonVariableMode scope a little bit to also cover stack drop/pack calls. On Android we only expose the JIT API so we don't need to worry about TensorMethods being called directly. On iOS we don't provide a wrapper yet but we can mention this caveat in the doc. Hopefully by the time it's widely used we can finish the Variable/Tensor unification and remove all these hacks. Test Plan: - Verified it runs quantized/fp32 MobileNetV2 models; - Verified it fixes the perf regression (revert pytorch#26908 separately); Differential Revision: D17732489 Pulled By: ljk53 fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5
Summary: GitHub commits: pytorch/FBGEMM@8786c08 Test Plan: n/a Reviewed By: zpao fbshipit-source-id: 9c04a2ba7cc2166db0203f186ece261ca8b186dd
Summary: Pull Request resolved: pytorch#27298 PR pytorch#26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where USE_STATIC_DISPATCH takes place. This causes an issue with numel() as it gets called through the dispatch mode and probably does not get inlined. The thread-local state is also expensive to read/write so many times, and this kills perf. PR pytorch#27274 is another approach to fix this and has more details. Test Plan: Quantized MobileNetV2 perf before this change: Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696. Perf after this change: Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267. Imported from OSS Differential Revision: D17742565 fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
Summary: Pull Request resolved: pytorch#27307 Test Plan: Imported from OSS Differential Revision: D17746444 Pulled By: xta0 fbshipit-source-id: ed37f91921f1ea7db6c63ba69f04883856341c39
Summary: Update the link for iOS demo app in README.md Pull Request resolved: pytorch#27145 Differential Revision: D17746591 Pulled By: xta0 fbshipit-source-id: 6f49a0daddc8b79804e1b8487ba1db3807a3f481
Summary: Currently we use CPU_tensor_apply1 to loop through the tensor in a single thread and aggregate data:
```
// compute variance per input
accscalar_t var_sum = 0;
CPU_tensor_apply1<scalar_t>(in, [&] (const scalar_t& i) {
  var_sum += (i - mean) * (i - mean);
});
```
and we don't have the ability to use TensorIterator for this:
```
accscalar_t var_sum = 0;
auto iter = TensorIterator::unary_op(self, self);
cpu_serial_kernel(iter, [&](scalar_t i) -> scalar_t {
  var_sum += (i - mean) * (i - mean);
  return a; // Unable to set the value back, because self should be const
});
```
This PR resolves the problem and allows the use of a void lambda:
```
auto iter = at::TensorIterator();
iter.add_input(in);
iter.build();
accscalar_t var_sum = 0;
at::native::cpu_serial_kernel(iter, [&](scalar_t i) -> void {
  var_sum += (i - mean) * (i - mean);
});
```
In the future it makes sense to change the Reduction part to allow reducing to a scalar, not just to a tensor. Pull Request resolved: pytorch#27271 Differential Revision: D17743310 Pulled By: ifedan fbshipit-source-id: a149751f2d671aefd3ed84bd50b2c0543a63b701
Summary: Pull Request resolved: pytorch#26733 Close pytorch#24587 Test Plan: Imported from OSS Differential Revision: D17606981 Pulled By: VitalyFedyunin fbshipit-source-id: 732f07b981287da3ca235b272b7b6f78144f8ebe
Summary: There is a magma package for the newest CUDA version (10.1); mention it here lest someone mistakenly use the version for CUDA 10.0. Pull Request resolved: pytorch#27325 Differential Revision: D17749535 Pulled By: soumith fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
Summary: Pull Request resolved: pytorch#27314 Test Plan: Imported from OSS Differential Revision: D17746371 Pulled By: pbelevich fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
Summary: Pull Request resolved: pytorch#27318 Fix TBB build: USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake Test Plan: Imported from OSS Differential Revision: D17747449 Pulled By: ilia-cher fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
Summary: Pull Request resolved: pytorch#27190 Allow set_num_threads to be called multiple times in case of TBB parallel backend Test Plan: BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake ./build/bin/test_parallel ./build/bin/thread_init_test Reviewed By: kostmo Differential Revision: D17704236 Pulled By: ilia-cher fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
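A tiny sketch of the now-permitted call pattern, assuming PyTorch was built with ATEN_THREADING=TBB (the commit's point is that repeated calls were previously not allowed with this backend):
```
import torch

torch.set_num_threads(4)
torch.set_num_threads(2)  # repeated calls are now allowed with the TBB backend
print(torch.get_num_threads())
```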
…gs (pytorch#26989) Summary: We do support inputs with dim > 2 in _out variants Pull Request resolved: pytorch#26989 Differential Revision: D17785632 Pulled By: soumith fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001
Summary: Pull Request resolved: pytorch#27410 Similar to pytorch#25005, TSAN is not safe to use in a multi-threaded program with fork and can cause deadlocks. As a result, disabling this test for TSAN. ghstack-source-id: 91393545 Test Plan: buildbot Differential Revision: D17775141 fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a
Summary: ONNX export for Unfold in symbolic opset9 + op and ORT tests Pull Request resolved: pytorch#24970 Reviewed By: hl475 Differential Revision: D17495106 Pulled By: houseroad fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50
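A hypothetical export sketch exercising the new opset 9 symbolic for Unfold; the file name and input shape are illustrative assumptions:
```
import torch

model = torch.nn.Unfold(kernel_size=(2, 3))
x = torch.randn(1, 3, 10, 12)
torch.onnx.export(model, x, "unfold.onnx", opset_version=9)
```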
Summary: Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling. Fixes pytorch#26884 Pull Request resolved: pytorch#27109 Pulled By: driazati Differential Revision: D17728129 fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
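A short sketch of the resulting behavior: a scripted module simply carries a `training` bool that defaults to true and is toggled by train()/eval():
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        if self.training:
            return x * 2
        return x

m = torch.jit.script(M())
print(m.training)  # True by default
m.eval()
print(m.training)  # False
```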
…boardX Summary: Pull Request resolved: pytorch#27252 Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj Reviewed By: sanekmelnikov Differential Revision: D17723414 fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065
…eb1d16 (pytorch#27474) Summary: Pull Request resolved: pytorch#27474 Previous import was 034921bd574cc84906b7996c07873454b7dd4135 Included changes:
- **[2891e145](onnx/onnx@2891e145)**: Fix Unique unit test (pytorch#2381) <Scott McKay>
- **[25cf73e5](onnx/onnx@25cf73e5)**: update shapeInference h file link (pytorch#2369) <prcvih>
- **[e3074bc0](onnx/onnx@e3074bc0)**: modify file path (pytorch#2378) <prcvih>
- **[9058d3a4](onnx/onnx@9058d3a4)**: Incrementing version number to 1.6.0 (pytorch#2353) (pytorch#2385) <Kevin Chen>
- **[c963586d](onnx/onnx@c963586d)**: Remove typing packages from test requirements (pytorch#2375) <Aiken Cairncross>
Test Plan: ci Reviewed By: bddppq Differential Revision: D17791527 fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc
…rch#25273) Summary: Exporting torch.select when index = negative one (x[:,-1]) was broken. This PR has the fix in symbolic function for select. Pull Request resolved: pytorch#25273 Reviewed By: hl475 Differential Revision: D17159707 Pulled By: houseroad fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76
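An illustrative repro of the previously broken case (the output file name is an assumption):
```
import torch

class LastColumn(torch.nn.Module):
    def forward(self, x):
        return x[:, -1]  # select with a negative index

torch.onnx.export(LastColumn(), torch.randn(4, 5), "last_column.onnx")
```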
…ytorch#27486) Summary: Pull Request resolved: pytorch#27486 Rename `key` argument of `single_round` method to `in_key` Test Plan: CI Reviewed By: stepancheg, soumith Differential Revision: D17782904 fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603
…ytorch#27453) Summary: All of the test cases are moved into a base class that is extended by the instrumentation test and a new "HostTests" class that can be run in normal Java. (Some changes to the build script and dependencies are required before the host test can actually run.) ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e Pull Request resolved: pytorch#27453 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D17800410 fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
Summary: Pull Request resolved: pytorch#27454 See detailed discussion at pytorch#27350 Test Plan: Imported from OSS Reviewed By: IvanKobzarev Differential Revision: D17800480 Pulled By: dreiss fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
Summary: Pull Request resolved: pytorch#27455 Test Plan: Imported from OSS Differential Revision: D17800658 Pulled By: dreiss fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
Summary: Resolving issue pytorch#26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to. Pull Request resolved: pytorch#27498 Differential Revision: D17801918 Pulled By: ezyang fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06
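A minimal sketch of the profiling path that aggregates events of the same name and relies on `FunctionEventAvg.__iadd__`:
```
import torch

x = torch.randn(64, 64)
with torch.autograd.profiler.profile() as prof:
    for _ in range(10):
        torch.mm(x, x)

# key_averages() sums up events with the same key via FunctionEventAvg
print(prof.key_averages().table(sort_by="cpu_time_total"))
```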
…#27425) Summary: Similar to pytorch#27418 but try to put it under "torch" namespace Pull Request resolved: pytorch#27425 Differential Revision: D17779490 Pulled By: bddppq fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b
…ytorch#27124) Summary: Pull Request resolved: pytorch#27124 ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL 2.4 to detect errors in NCCL communicators. These were used as part of ProcessGroupNCCL, and we also enforced that only NCCL versions 2.4+ were supported. However, there is still legitimate use for older NCCL versions, and hence we should still support those. For that purpose, in this change I've ensured we disable NCCL error checking for versions < 2.4. ghstack-source-id: 91452959 Test Plan: 1) Test with 2.4.8 2) Test with 2.2.13 3) unit tests. Differential Revision: D17178988 fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698
Summary: Fixing pytorch#27266. In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file. Pull Request resolved: pytorch#27478 Differential Revision: D17799522 Pulled By: pbelevich fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0
Summary: Pull Request resolved: pytorch#27511 Test Plan: Imported from OSS Differential Revision: D17801185 Pulled By: jamesr66a fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a
Summary: GitHub commits: facebook/fbthrift@e80ecd1 facebook/proxygen@6c7a36b facebook/mvfst@8750462 facebook/proxygen@442d7de facebook/wangle@c138dc3 facebookincubator/fizz@3833f10 facebookincubator/katran@6fc473d pytorch/FBGEMM@82d259d Test Plan: n/a Reviewed By: 2d2d2d2d2d fbshipit-source-id: 7834a4a8620d0ab9b60060e0abadfba457fb2890
…t slice when index = negative one Test Plan: revert-hammer Differential Revision: D17159707 Original commit changeset: 2c3b27542108 fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01
Summary: Pull Request resolved: pytorch#27374 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17809770 Pulled By: ezyang fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8
Summary: Pull Request resolved: pytorch#27508 Implemented a simple exponential decay of the weight of the LR loss function, with a lower bound. Test Plan: buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308 canary: f140103452 Reviewed By: chenshouyuan Differential Revision: D17524101 fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
…erver.py Summary: Pull Request resolved: pytorch#27415 Reviewed By: zafartahirov Differential Revision: D17783101 Pulled By: gottbrath fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
…rnel. (pytorch#26908)" (pytorch#27283) Summary: Pull Request resolved: pytorch#27283 This reverts commit 9159a60. Test Plan: Imported from OSS Differential Revision: D17738167 Pulled By: ezyang fbshipit-source-id: cc4048d553017409279603590833d1529f59048c
Summary: Fixes pytorch#27443 Pull Request resolved: pytorch#27465 Differential Revision: D17810732 Pulled By: pietern fbshipit-source-id: b8a62dd086a4f4a61c9aa6acfa495cf822995604
…DoubleTensor) (pytorch#27444) Summary: This PR stops common_utils.py from setting the default tensor type when it's imported. See issue pytorch#27355. This is a frequent source of confusion for test writers. Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are: - test_autograd.py - test_distributions.py - test_jit.py - test_nn.py This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting. Notable technical changes in this PR are: - Significant updates to test_torch.py to make it pass without setting the default floating dtype globally. - The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files. - test_torch-specific parts of common_utils were refactored into test_torch. - Tensor creation methods in common_utils were updated to accept an optional dtype and device. Pull Request resolved: pytorch#27444 Differential Revision: D17795235 Pulled By: mruberry fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
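A hedged sketch of what a `default_floating_dtype` test decorator could look like; the actual helper in common_utils may differ in name and details:
```
import functools
import torch

def default_floating_dtype(dtype):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            prev = torch.get_default_dtype()
            torch.set_default_dtype(dtype)  # scope the global default to this test
            try:
                return fn(*args, **kwargs)
            finally:
                torch.set_default_dtype(prev)
        return wrapper
    return decorator

@default_floating_dtype(torch.double)
def test_randn_uses_default_dtype():
    assert torch.randn(3).dtype == torch.float64
```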
@pytorchbot retest this please
ROCm tests passed. Merging.
rohithkrn added a commit that referenced this pull request on Oct 17, 2019
This reverts commit 416732c.
rohithkrn added a commit that referenced this pull request on Oct 17, 2019