
Merge upstream master -> bf16 branch #501

Merged
merged 113 commits into ROCm:bf16_bringup_v2 on Oct 9, 2019

Conversation

rohithkrn

No description provided.

bzinodev and others added 30 commits October 2, 2019 17:28
…) (pytorch#27177)

Summary:
Pull Request resolved: pytorch#27177

Add support for F::one_hot C++ function.

Test Plan:
Added 3 new tests to verify API is working

Imported from OSS

Differential Revision: D17697934

fbshipit-source-id: a8127fb87c00daa119bb92a5702bc4bbba48290d
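The commit above adds F::one_hot to the C++ functional API. A minimal usage sketch, assuming the F::one_hot(tensor, num_classes) signature this PR introduces:

```
#include <torch/torch.h>
#include <iostream>

namespace F = torch::nn::functional;

int main() {
  // Five class indices -> a 5x5 one-hot matrix of int64 values.
  auto indices = torch::arange(0, 5, torch::kLong);
  auto encoded = F::one_hot(indices, /*num_classes=*/5);
  std::cout << encoded << std::endl;
}
```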
Summary:
Pull Request resolved: pytorch#27189

Conceptually, Module is just a view over ClassType and ivalue::object.
register_ methods are the only exception to this:
they provide an API not available on ClassType or object directly. This
PR ports this API to ClassType and makes Module truly just a view over
those two.

Test Plan: Imported from OSS

Differential Revision: D17703533

Pulled By: ZolotukhinM

fbshipit-source-id: 2cdb9fb486b3fb8527986483c7f34be7bd59fabf
Summary:
Experimental ops don't provide a BC guarantee.
Pull Request resolved: pytorch#27235

Reviewed By: hl475

Differential Revision: D17723292

Pulled By: houseroad

fbshipit-source-id: 644ae34d130418a810e0f9d802fa25f6e34c5ccf
Summary: Pull Request resolved: pytorch#27194

Test Plan: Imported from OSS

Differential Revision: D17704957

Pulled By: zafartahirov

fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
Summary: Pull Request resolved: pytorch#27181

Test Plan: Imported from OSS

Differential Revision: D17717482

Pulled By: jamesr66a

fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
Summary: Pull Request resolved: pytorch#27184

Test Plan: Imported from OSS

Differential Revision: D17717481

Pulled By: jamesr66a

fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
Summary:
Pull Request resolved: pytorch#27191

Skip rpc and dist_autograd spawn tests for Python < 3.6.
ghstack-source-id: 91231565

close pytorch#27157

Test Plan: unit tests

Differential Revision: D17697368

fbshipit-source-id: bb8cf1f47de41f9d350fd60afe37fece293d8680
…rch#25527)

Summary:
Pull Request resolved: pytorch#25527

Master GH issue: pytorch#23110.

This change builds upon pytorch#24876 and
provides all the autograd hooks needed for a forward pass with distributed rpc
for builtin operators. This change does not address distributed rpc for Python
UDFs; that will be addressed in follow-up PRs.

Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them (one possible id scheme is sketched
after this commit message).
ghstack-source-id: 91240466

Test Plan: unit tests.

Differential Revision: D17148077

fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
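A hypothetical sketch of item 3 above; this illustrates one way to build collision-free ids (a 16-bit worker id packed with a 48-bit per-worker counter), not necessarily the PR's actual scheme:

```
#include <atomic>
#include <cstdint>

// Hypothetical: high 16 bits identify the worker, low 48 bits are a
// monotonically increasing local counter, so ids are globally unique
// without any cross-worker coordination.
int64_t next_autograd_message_id(uint16_t worker_id) {
  static std::atomic<uint64_t> local_counter{0};
  constexpr int kCounterBits = 48;
  uint64_t count = local_counter++ & ((uint64_t{1} << kCounterBits) - 1);
  return static_cast<int64_t>(
      (static_cast<uint64_t>(worker_id) << kCounterBits) | count);
}
```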
Summary: Pull Request resolved: pytorch#27219

Test Plan: Imported from OSS

Differential Revision: D17715306

Pulled By: albanD

fbshipit-source-id: d11a7634dbee6a885c7177b240958e5aed2544f3
Summary: Pull Request resolved: pytorch#27220

Test Plan: Imported from OSS

Differential Revision: D17715305

Pulled By: albanD

fbshipit-source-id: 574704ad23ece6da7aa2780b78867307bef523cc
Summary:
Move the resolution of the conflict between `USE_CUDA` and `USE_ROCM` to CMake, so that:

- If `USE_CUDA=ON` and CUDA is found, and `USE_ROCM=ON` and ROCm is found --> fatal error
- If either `USE_CUDA=ON` and CUDA is found, or `USE_ROCM=ON` and ROCm is found --> the respective GPU feature is ON
- Otherwise --> no GPU support
Pull Request resolved: pytorch#26910

Differential Revision: D17738652

Pulled By: ezyang

fbshipit-source-id: 8e07cc7e922e0abda24a6518119c28952276064e
…ytorch#27277)

Summary:
This reverts commit 0cd1880.

As reported by jerryzh168 and pritamdamania87, mpark::variant doesn’t compile with gcc 7.3.1 on the fb devserver and throws an error similar to mpark/variant#43. (However, it doesn’t fail with gcc 7.3.1 in OSS CI, based on https://circleci.com/api/v1.1/project/github/pytorch/pytorch/2995606/output/107/0?file=true)
A plausible workaround is to upgrade the devserver to devtoolset-8, but that would in turn cause the CUDA build to complain:
```
/usr/local/cuda/bin/../targets/x86_64-linux/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 7 are not supported!
 #error -- unsupported GNU version! gcc versions later than 7 are not supported!
```
(Thanks pritamdamania87 for the report!)

The solution for now is to revert the mpark::variant addition, and I will find alternatives that will work with gcc 7.3.1 on fb devserver.
Pull Request resolved: pytorch#27277

Differential Revision: D17739804

fbshipit-source-id: ad945b3d86ab7ddbff58f4ecab95e0e1ac725ae9
…ortance (pytorch#26376)

Summary:
Pull Request resolved: pytorch#26376

* Create the new dense_feature_reg (FCInputLpNorm) for feature importance, to be applied to the fully-connected layer.

Test Plan: * Unit test located in: `caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test.py`

Reviewed By: un-disclosed

Differential Revision: D17360361

fbshipit-source-id: 1a0e119eeb17199a13dfffe58b3036ea4255e301
Summary:
Pull Request resolved: pytorch#27293

This doesn't turn on the 3.5 signal, but it makes it so that [test all]
will include it if you request it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17738741

Pulled By: ezyang

fbshipit-source-id: 2b1af4d7bf26fd84a593fde292d6bfa2aabc1148
Summary: Pull Request resolved: pytorch#26909

Differential Revision: D17683632

Pulled By: Krovatkin

fbshipit-source-id: 5d36c3c4cf7411c56485ef19fe59262b9f8b45b2
…mprehension

Summary: Pull Request resolved: pytorch#27261

Differential Revision: D17740159

Pulled By: Krovatkin

fbshipit-source-id: 90439282aea14d8634eb41ffece5b6320d615fa7
Summary: Pull Request resolved: pytorch#27164

Test Plan: Imported from OSS

Differential Revision: D17694475

Pulled By: zafartahirov

fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
Summary:
Pull Request resolved: pytorch#27106

Adds a memory_format option to the `clone` operator.

Introduces new `clone` behavior when used as `input_t.clone(memory_format=torch.preserve_format)`:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-shared memory location.

Test Plan: Imported from OSS

Differential Revision: D17699357

Pulled By: VitalyFedyunin

fbshipit-source-id: 5ae1537c2aca1abf0bf1eec4416846129c156f66
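A minimal sketch of rules (1)/(2) above, assuming the C++ clone(memory_format) overload this PR adds:

```
#include <torch/torch.h>
#include <iostream>

int main() {
  // A channels-last NCHW tensor (non-overlapping and dense).
  auto x = torch::randn({2, 3, 4, 4})
               .contiguous(at::MemoryFormat::ChannelsLast);
  auto y = x.clone(at::MemoryFormat::Preserve);
  // The clone keeps x's channels-last strides instead of being
  // rewritten into the default contiguous layout.
  std::cout << std::boolalpha << (y.strides() == x.strides()) << std::endl;
}
```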
Summary:
Pull Request resolved: pytorch#27149

Extract the version to version.txt and add version-reading logic to setup.py and fb/torch_version.py
ghstack-source-id: 91271883

Test Plan: N/A

Reviewed By: gchanan, ezyang

Differential Revision: D17689307

fbshipit-source-id: 21899502027cec71b63d9dc151e09ff5ff3f279d
…pytorch#27274)

Summary:
Pull Request resolved: pytorch#27274

This is yet another fix to address pytorch#26764.

PR pytorch#26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place and is thus the most logically sound place to do
such tweaks.

However, we observed a nontrivial perf regression due to this fix. It turns out
the numel() tensor method gets called in several for-loops and thus incurs ~7M
thread_local updates in a single forward call:
```
7173330 numel
    558 size
    416 q_scale
    302 _empty_affine_quantized
    288 contiguous
    257 q_zero_point
    216 qscheme
    173 empty
    110 set_
    105 as_strided
    104 permute
...
```

Since numel() is not called from a single place, a natural workaround is to
update function_wrapper.py so that it only adds the guard in the gen_namespace_function()
case and ignores the gen_tensor_method() case. But some tensor methods are actually
being called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the
only "band aid" left on the table is to insert the guard on the JIT->ATen path as
originally done in pytorch#26868 - this is a simplified version of it, as it doesn't hurt
to extend the NonVariableTypeMode scope a little bit to also cover stack drop/pack calls.

On Android we only expose the JIT API, so we don't need to worry about TensorMethods
being called directly. On iOS we don't provide a wrapper yet, but we can mention this
caveat in the doc. Hopefully by the time it's widely used we can finish the
Variable/Tensor unification and remove all these hacks.

Test Plan:
- Verified it runs quantized/fp32 MobileNetV2 models;
- Verified it fixes the perf regression (revert pytorch#26908 separately);

Differential Revision: D17732489

Pulled By: ljk53

fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5
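A sketch of the guard pattern this fix relies on, assuming the at::AutoNonVariableTypeMode RAII guard from this era (an illustration of the mechanism, not the PR's exact code):

```
#include <ATen/ATen.h>

// The guard flips a thread_local flag on construction and restores the
// previous value on destruction, so every ATen call in this scope skips
// the Variable dispatch layer.
at::Tensor eq_without_variable_dispatch(const at::Tensor& self,
                                        const at::Tensor& other) {
  at::AutoNonVariableTypeMode non_var_guard(true);
  return self.eq(other);  // dispatches straight to the base kernel
}
```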
Summary:
GitHub commits:

pytorch/FBGEMM@8786c08

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9c04a2ba7cc2166db0203f186ece261ca8b186dd
Summary:
Pull Request resolved: pytorch#27298

PR pytorch#26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where
USE_STATIC_DISPATCH takes place.
This causes an issue with numel(), as it gets called through the dispatch mode and probably does not get inlined.
The thread-local state is also expensive to read/write so many times, and this kills perf.

PR pytorch#27274 is another approach to fix this and has more details.

Test Plan:
Quantized MobileNetV2 perf before this change:
Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696

Perf after this change:
Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267

Imported from OSS

Differential Revision: D17742565

fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
Summary: Pull Request resolved: pytorch#27307

Test Plan: Imported from OSS

Differential Revision: D17746444

Pulled By: xta0

fbshipit-source-id: ed37f91921f1ea7db6c63ba69f04883856341c39
Summary:
Update the link for iOS demo app in README.md
Pull Request resolved: pytorch#27145

Differential Revision: D17746591

Pulled By: xta0

fbshipit-source-id: 6f49a0daddc8b79804e1b8487ba1db3807a3f481
Summary:
Currently we use CPU_tensor_apply1 to loop through the tensor in a single thread and aggregate data:
```
// compute variance per input
accscalar_t var_sum = 0;
CPU_tensor_apply1<scalar_t>(in, [&] (const scalar_t& i) {
  var_sum += (i - mean) * (i - mean);
});
```
and we don't have the ability to use TensorIterator for this.

```
accscalar_t var_sum = 0;
auto iter = TensorIterator::unary_op(self, self);
cpu_serial_kernel(iter, [&](scalar_t i) -> scalar_t {
  var_sum += (i - mean) * (i - mean);
  return a; // Unable to set a value back, because self should be const
});
```

This PR resolves this problem and allows the use of a void lambda:
```
auto iter = at::TensorIterator();
iter.add_input(in);
iter.build();
accscalar_t var_sum = 0;
at::native::cpu_serial_kernel(iter, [&](scalar_t i) -> void {
  var_sum += (i - mean) * (i - mean);
});
```

In the future it makes sense to change the Reduction code and allow reducing to a scalar, not just to a tensor.
Pull Request resolved: pytorch#27271

Differential Revision: D17743310

Pulled By: ifedan

fbshipit-source-id: a149751f2d671aefd3ed84bd50b2c0543a63b701
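For reference, the scalar the serial kernel accumulates is just the sum of squared deviations from the mean; with the public ATen API the same value can be computed as:

```
#include <ATen/ATen.h>

// Reference for the loop above: sum of squared deviations as one
// expression over the whole tensor.
double var_sum_reference(const at::Tensor& in, double mean) {
  return (in - mean).pow(2).sum().item<double>();
}
```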
Summary:
Pull Request resolved: pytorch#26733

Close pytorch#24587

Test Plan: Imported from OSS

Differential Revision: D17606981

Pulled By: VitalyFedyunin

fbshipit-source-id: 732f07b981287da3ca235b272b7b6f78144f8ebe
Summary:
There is a magma package for the newest CUDA version (10.1); mention it here lest someone try to mistakenly use the version for CUDA 10.0.
Pull Request resolved: pytorch#27325

Differential Revision: D17749535

Pulled By: soumith

fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
Summary: Pull Request resolved: pytorch#27314

Test Plan: Imported from OSS

Differential Revision: D17746371

Pulled By: pbelevich

fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
Summary:
Pull Request resolved: pytorch#27318

Fix TBB build
USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake

Test Plan: Imported from OSS

Differential Revision: D17747449

Pulled By: ilia-cher

fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
Summary:
Pull Request resolved: pytorch#27190

Allow set_num_threads to be called multiple times in the case of the TBB
parallel backend

Test Plan:
BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop
install  --cmake
./build/bin/test_parallel
./build/bin/thread_init_test

Reviewed By: kostmo

Differential Revision: D17704236

Pulled By: ilia-cher

fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
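A minimal sketch of the behavior this commit enables, using the public at::set_num_threads API:

```
#include <ATen/Parallel.h>

int main() {
  at::set_num_threads(8);  // first call sizes the thread pool
  at::set_num_threads(4);  // with this change, a later call is also
                           // honored when ATEN_THREADING=TBB
  return 0;
}
```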
vishwakftw and others added 24 commits October 6, 2019 23:28
…gs (pytorch#26989)

Summary:
We do support inputs with dim > 2 in _out variants
Pull Request resolved: pytorch#26989

Differential Revision: D17785632

Pulled By: soumith

fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001
Summary:
Pull Request resolved: pytorch#27410

Similar to pytorch#25005, TSAN is not
safe to use in a multi-threaded program with fork and can cause deadlocks. As a
result, this test is disabled for TSAN.
ghstack-source-id: 91393545

Test Plan: buildbot

Differential Revision: D17775141

fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a
Summary:
ONNX export for Unfold in symbolic opset9 + op and ORT tests
Pull Request resolved: pytorch#24970

Reviewed By: hl475

Differential Revision: D17495106

Pulled By: houseroad

fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50
Summary:
Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling.

Fixes pytorch#26884
Pull Request resolved: pytorch#27109

Pulled By: driazati

Differential Revision: D17728129

fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
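For illustration, the C++ frontend exposes the same invariant; a minimal sketch showing the `training` attribute's default (this demonstrates the invariant, not the TorchScript change itself):

```
#include <torch/torch.h>
#include <cassert>

int main() {
  torch::nn::Dropout dropout(0.5);
  assert(dropout->is_training());   // modules start in training mode
  dropout->eval();                  // eval() just flips the flag
  assert(!dropout->is_training());
}
```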
…boardX

Summary: Pull Request resolved: pytorch#27252

Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj

Reviewed By: sanekmelnikov

Differential Revision: D17723414

fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065
…eb1d16 (pytorch#27474)

Summary:
Pull Request resolved: pytorch#27474

Previous import was 034921bd574cc84906b7996c07873454b7dd4135

Included changes:
- **[2891e145](onnx/onnx@2891e145)**: Fix Unique unit test (pytorch#2381) <Scott McKay>
- **[25cf73e5](onnx/onnx@25cf73e5)**: update shapeInference h file link (pytorch#2369) <prcvih>
- **[e3074bc0](onnx/onnx@e3074bc0)**: modify file path (pytorch#2378) <prcvih>
- **[9058d3a4](onnx/onnx@9058d3a4)**: Incrementing version number to 1.6.0 (pytorch#2353) (pytorch#2385) <Kevin Chen>
- **[c963586d](onnx/onnx@c963586d)**: Remove typing packages from test requirements (pytorch#2375) <Aiken Cairncross>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17791527

fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc
…rch#25273)

Summary:
Exporting torch.select when the index is negative one (x[:,-1]) was broken. This PR fixes the symbolic function for select.
Pull Request resolved: pytorch#25273

Reviewed By: hl475

Differential Revision: D17159707

Pulled By: houseroad

fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76
…ytorch#27486)

Summary:
Pull Request resolved: pytorch#27486

Rename `key` argument of `single_round` method to `in_key`

Test Plan: CI

Reviewed By: stepancheg, soumith

Differential Revision: D17782904

fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603
…ytorch#27453)

Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java.  (Some changes to the build script and dependencies are
required before the host test can actually run.)

ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: pytorch#27453

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800410

fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
Summary:
Pull Request resolved: pytorch#27454

See detailed discussion at
pytorch#27350

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800480

Pulled By: dreiss

fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
Summary: Pull Request resolved: pytorch#27455

Test Plan: Imported from OSS

Differential Revision: D17800658

Pulled By: dreiss

fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
Summary:
Resolving issue pytorch#26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to.
Pull Request resolved: pytorch#27498

Differential Revision: D17801918

Pulled By: ezyang

fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06
…#27425)

Summary:
Similar to pytorch#27418, but tries to put it under the "torch" namespace
Pull Request resolved: pytorch#27425

Differential Revision: D17779490

Pulled By: bddppq

fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b
…ytorch#27124)

Summary:
Pull Request resolved: pytorch#27124

ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL
2.4 to detect errors in NCCL communicators. These were used as part of
ProcessGroupNCCL, and we also enforced that only NCCL versions 2.4+ were
supported. However, there is still legitimate use for older NCCL versions and
hence we should still support those.

For that purpose, in this change I've ensured we disable NCCL error checking
for versions < 2.4.
ghstack-source-id: 91452959

Test Plan:
1) Test with 2.4.8
2) Test with 2.2.13
3) unit tests.

Differential Revision: D17178988

fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698
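A hypothetical sketch of the version gating described above (macro and function names are illustrative, not the PR's actual code):

```
#include <nccl.h>

// ncclCommAbort()/ncclGetAsyncError() only exist in NCCL >= 2.4, so
// error checking must be compiled out for older releases.
#if defined(NCCL_MAJOR) && \
    (NCCL_MAJOR > 2 || (NCCL_MAJOR == 2 && NCCL_MINOR >= 4))
#define ENABLE_NCCL_ERROR_CHECKING
#endif

ncclResult_t check_async_error(ncclComm_t comm) {
#ifdef ENABLE_NCCL_ERROR_CHECKING
  ncclResult_t async_err = ncclSuccess;
  ncclGetAsyncError(comm, &async_err);
  return async_err;
#else
  (void)comm;          // NCCL < 2.4: no async-error API to call
  return ncclSuccess;  // nothing to report; callers rely on timeouts
#endif
}
```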
Summary:
Fixing pytorch#27266

In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file.
Pull Request resolved: pytorch#27478

Differential Revision: D17799522

Pulled By: pbelevich

fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0
Summary: Pull Request resolved: pytorch#27511

Test Plan: Imported from OSS

Differential Revision: D17801185

Pulled By: jamesr66a

fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a
…t slice when index = negative one

Test Plan: revert-hammer

Differential Revision: D17159707

Original commit changeset: 2c3b27542108

fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01
Summary:
Pull Request resolved: pytorch#27374

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17809770

Pulled By: ezyang

fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8
Summary:
Pull Request resolved: pytorch#27508

Implemented a simple exponential decay of the lr loss function weight, with a lower bound.

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308

canary: f140103452

Reviewed By: chenshouyuan

Differential Revision: D17524101

fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
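A hypothetical sketch of the decay rule described in the summary (names and signature are illustrative; the real implementation lives in the dper layer-model code):

```
#include <algorithm>
#include <cmath>
#include <cstdint>

// Decay the loss weight geometrically per step, clamped from below so
// it never drops under the configured lower bound.
double decayed_weight(double initial_weight, double decay_rate,
                      int64_t step, double lower_bound) {
  double w = initial_weight * std::pow(decay_rate, static_cast<double>(step));
  return std::max(w, lower_bound);
}
```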
…erver.py

Summary: Pull Request resolved: pytorch#27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
…rnel. (pytorch#26908)" (pytorch#27283)

Summary:
Pull Request resolved: pytorch#27283

This reverts commit 9159a60.

Test Plan: Imported from OSS

Differential Revision: D17738167

Pulled By: ezyang

fbshipit-source-id: cc4048d553017409279603590833d1529f59048c
Summary:
Fixes pytorch#27443
Pull Request resolved: pytorch#27465

Differential Revision: D17810732

Pulled By: pietern

fbshipit-source-id: b8a62dd086a4f4a61c9aa6acfa495cf822995604
…DoubleTensor) (pytorch#27444)

Summary:
This PR stops common_utils.py from setting the default tensor type when it's imported. See issue pytorch#27355. This is a frequent source of confusion for test writers.

Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:

- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py

This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved aways from relying on this global setting.

Notable technical changes in this PR are:

- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
Pull Request resolved: pytorch#27444

Differential Revision: D17795235

Pulled By: mruberry

fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
@rohithkrn
Author

@pytorchbot retest this please

@rohithkrn
Author

ROCm tests passed. Merging.

@rohithkrn rohithkrn merged commit 416732c into ROCm:bf16_bringup_v2 Oct 9, 2019
rohithkrn added a commit that referenced this pull request Oct 17, 2019
rohithkrn added a commit that referenced this pull request Oct 17, 2019