
Cuda support #103

Merged
merged 60 commits from cuda_support into master, Jun 28, 2018

Conversation

@xasetl (Member) commented Jun 21, 2018

Should work the way we want it to. Has to be tested, though.

@xasetl xasetl requested review from danthe96 and WGierke June 21, 2018 19:07
@WGierke (Contributor) commented Jun 21, 2018

How about branching from master next time so there aren't 57 commits? :)


@property
def torch_device(self):
    return torch.device(f'cuda:{self.gpu}' if torch.cuda.is_available() else 'cpu')
Contributor


What if there are multiple GPUs we can use?
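One possible answer, sketched here as a hypothetical `pick_device` helper (not part of this PR): round-robin the detectors over however many GPUs `torch.cuda.device_count()` reports, and fall back to CPU when there are none. Plain Python, with the device count passed in so the sketch runs on a CUDA-less machine:

```python
def pick_device(model_index, num_gpus):
    """Round-robin a model onto one of `num_gpus` devices.

    `num_gpus` would be torch.cuda.device_count() in the repo; it is a
    plain argument here so the sketch needs no GPU to run. Returns a
    device string usable with torch.device(...).
    """
    if num_gpus <= 0:
        return 'cpu'  # no GPU available: fall back to CPU
    return f'cuda:{model_index % num_gpus}'
```

With two GPUs, detectors 0, 1, 2 would land on cuda:0, cuda:1, cuda:0.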

@WGierke (Contributor) commented Jun 22, 2018

2018-06-22 08:06:38 [INFO] src.evaluation.evaluator: Training DAGMM_LSTMAutoEncoder_withWindow on Synthetic Combined Outliers 4-dimensional
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: An exception occurred while training DAGMM_LSTMAutoEncoder_withWindow on Synthetic Combined Outliers 4-dimensional: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: Traceback (most recent call last):
  File "/repo/src/evaluation/evaluator.py", line 71, in evaluate
    det.fit(X_train, y_train)
  File "/repo/src/algorithms/dagmm.py", line 188, in fit
    self.to_device(self.dagmm)
  File "/repo/src/algorithms/cuda_utils.py", line 28, in to_device
    model.to(self.torch_device)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 393, in to
    return self._apply(lambda t: t.to(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 176, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 111, in _apply
    ret = super(RNNBase, self)._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 182, in _apply
    param.data = fn(param.data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 393, in <lambda>
    return self._apply(lambda t: t.to(device))
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
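For what it's worth, "illegal memory access" errors like this are usually reported asynchronously, so the traceback rarely points at the faulty kernel. A common debugging step (a suggestion on my part, not something this PR does) is to force synchronous launches before CUDA is initialized:

```python
import os

# CUDA kernel launches are asynchronous by default, so runtime error 77
# surfaces at whatever call happens to synchronize next (here: Tensor.to).
# Forcing synchronous launches makes the traceback point at the real
# culprit. Must be set before the first CUDA call, i.e. before importing
# torch in the entry point.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
```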

@WGierke (Contributor) commented Jun 22, 2018

2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: An exception occurred while training Donut on Synthetic Combined Outliers 4-dimensional: CUDA runtime implicit initialization on GPU:0 failed. Status: an illegal memory access was encountered
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: Traceback (most recent call last):
  File "/repo/src/evaluation/evaluator.py", line 71, in evaluate
    det.fit(X_train, y_train)
  File "/repo/src/algorithms/donut.py", line 154, in fit
    with self.tf_device: 
  File "/repo/src/algorithms/cuda_utils.py", line 13, in tf_device
    local_device_protos = device_lib.list_local_devices()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/device_lib.py", line 41, in list_local_devices
    for s in pywrap_tensorflow.list_devices(session_config=session_config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 1675, in list_devices
    return ListDevices(status)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: an illegal memory access was encountered

@WGierke (Contributor) commented Jun 22, 2018

2018-06-22 08:06:59.731177: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x6339750: CUDA_ERROR_ILLEGAL_ADDRESS
Traceback (most recent call last):
  File "main.py", line 126, in <module>
    main()
  File "main.py", line 17, in main
    run_experiments()
  File "main.py", line 94, in run_experiments
    detectors = [RecurrentEBM(num_epochs=15), LSTMAD(), Donut(), LSTMED(num_epochs=40),
  File "/repo/src/algorithms/lstm_ad.py", line 67, in __init__
    torch.manual_seed(0)
  File "/usr/local/lib/python3.6/dist-packages/torch/random.py", line 33, in manual_seed
    torch.cuda.manual_seed_all(seed)
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/random.py", line 86, in manual_seed_all
    _lazy_call(lambda: _C._cuda_manualSeedAll(seed))
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 121, in _lazy_call
    callable()
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/random.py", line 86, in <lambda>
    _lazy_call(lambda: _C._cuda_manualSeedAll(seed))
RuntimeError: Creating MTGP constants failed. at /pytorch/aten/src/THC/THCTensorRandom.cu:34
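This one looks like fallout from the earlier illegal access: `torch.manual_seed` lazily seeds every CUDA device, so it trips over the already-poisoned context. A defensive sketch (names are made up; `cuda_available` stands in for `torch.cuda.is_available()` so it runs anywhere):

```python
import random

def set_seeds(seed, cuda_available=False):
    """Seed RNGs in one place, touching CUDA only when it is usable.

    In the repo this would call torch.manual_seed(seed) and, guarded,
    torch.cuda.manual_seed_all(seed); stdlib `random` stands in here.
    Returns the list of generators that were seeded, for inspection.
    """
    random.seed(seed)
    seeded = ['cpu']
    if cuda_available:         # skip the CUDA RNG on CPU-only runs
        seeded.append('cuda')  # torch.cuda.manual_seed_all(seed)
    return seeded
```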

@WGierke (Contributor) commented Jun 22, 2018

python3.6 main.py > log 2>&1 yields the attached log.zip

    nn.Dropout(p=0.5),
    nn.Linear(10, n_gmm),
    nn.Softmax(dim=1)
]
Member


👍🏻
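The hunk above ends DAGMM's estimation network with nn.Softmax(dim=1), which turns each sample's n_gmm logits into mixture-membership probabilities. A pure-Python sketch of that row-wise softmax (no torch dependency; the input logits below are made up):

```python
import math

def row_softmax(rows):
    """Row-wise softmax, as nn.Softmax(dim=1) applies to a
    (batch, n_gmm) tensor: each row is exponentiated (shifted by its
    max for numerical stability) and normalized to sum to 1."""
    out = []
    for row in rows:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```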

* Adapt LSTMAD, Donut and DAGMM with NNAutoencoder

* Ignore device placement on ReEBM as well

* Adapt LSTMED

* Separate running detectors

* Use magic GPUWrapper

* Sort by framework
@WGierke WGierke merged commit d5cced5 into master Jun 28, 2018
@xasetl xasetl deleted the cuda_support branch July 4, 2018 10:22

3 participants