Cuda support #103
Conversation
Merge commits (conflicts resolved in):
- src/evaluation/evaluator.py
- main.py, src/evaluation/evaluator.py
- main.py
- src/algorithms/lstm_enc_dec_axl.py
minor fix
- main.py, src/algorithms/autoencoder.py, src/algorithms/dagmm.py, src/algorithms/lstm_enc_dec_axl.py
How about branching from master next time so there aren't 57 commits? :)
@property
def torch_device(self):
    return torch.device(f'cuda:{self.gpu}' if torch.cuda.is_available() else 'cpu')
What if there are multiple GPUs we can use?
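One way to address the multi-GPU question: instead of hard-wiring a single index, map each worker's preferred index onto the GPUs that actually exist. The helper below is a hypothetical sketch, not code from this PR; `select_device` is a made-up name, and the torch calls it would be fed from (`torch.cuda.is_available()`, `torch.cuda.device_count()`) are kept in comments so the sketch runs without CUDA.

```python
# Hypothetical sketch: pick a device string per worker so several GPUs
# can be used at once. `gpu` is the worker's preferred index; `available`
# and `n_gpus` would come from torch.cuda.is_available() and
# torch.cuda.device_count() in the real property.
def select_device(gpu, available, n_gpus):
    """Return a torch-style device string, falling back safely to CPU."""
    if not available or n_gpus == 0:
        return 'cpu'
    # Wrap around so a worker index larger than the GPU count still maps
    # to a valid device instead of raising at torch.device(...) time.
    return f'cuda:{gpu % n_gpus}'

# With torch installed, the property could then read:
#   @property
#   def torch_device(self):
#       return torch.device(select_device(
#           self.gpu, torch.cuda.is_available(), torch.cuda.device_count()))
```

The modulo keeps the fallback deterministic: running four detectors with `gpu=0..3` on a two-GPU box spreads them over `cuda:0`/`cuda:1` rather than crashing.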
2018-06-22 08:06:38 [INFO] src.evaluation.evaluator: Training DAGMM_LSTMAutoEncoder_withWindow on Synthetic Combined Outliers 4-dimensional
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: An exception occurred while training DAGMM_LSTMAutoEncoder_withWindow on Synthetic Combined Outliers 4-dimensional: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: Traceback (most recent call last):
File "/repo/src/evaluation/evaluator.py", line 71, in evaluate
det.fit(X_train, y_train)
File "/repo/src/algorithms/dagmm.py", line 188, in fit
self.to_device(self.dagmm)
File "/repo/src/algorithms/cuda_utils.py", line 28, in to_device
model.to(self.torch_device)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 393, in to
return self._apply(lambda t: t.to(device))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 111, in _apply
ret = super(RNNBase, self)._apply(fn)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 182, in _apply
param.data = fn(param.data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 393, in <lambda>
return self._apply(lambda t: t.to(device))
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:20
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: An exception occurred while training Donut on Synthetic Combined Outliers 4-dimensional: CUDA runtime implicit initialization on GPU:0 failed. Status: an illegal memory access was encountered
2018-06-22 08:06:38 [ERROR] src.evaluation.evaluator: Traceback (most recent call last):
File "/repo/src/evaluation/evaluator.py", line 71, in evaluate
det.fit(X_train, y_train)
File "/repo/src/algorithms/donut.py", line 154, in fit
with self.tf_device:
File "/repo/src/algorithms/cuda_utils.py", line 13, in tf_device
local_device_protos = device_lib.list_local_devices()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/device_lib.py", line 41, in list_local_devices
for s in pywrap_tensorflow.list_devices(session_config=session_config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 1675, in list_devices
return ListDevices(status)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: an illegal memory access was encountered
2018-06-22 08:06:59.731177: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x6339750: CUDA_ERROR_ILLEGAL_ADDRESS
Traceback (most recent call last):
File "main.py", line 126, in <module>
main()
File "main.py", line 17, in main
run_experiments()
File "main.py", line 94, in run_experiments
detectors = [RecurrentEBM(num_epochs=15), LSTMAD(), Donut(), LSTMED(num_epochs=40),
File "/repo/src/algorithms/lstm_ad.py", line 67, in __init__
torch.manual_seed(0)
File "/usr/local/lib/python3.6/dist-packages/torch/random.py", line 33, in manual_seed
torch.cuda.manual_seed_all(seed)
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/random.py", line 86, in manual_seed_all
_lazy_call(lambda: _C._cuda_manualSeedAll(seed))
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 121, in _lazy_call
callable()
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/random.py", line 86, in <lambda>
_lazy_call(lambda: _C._cuda_manualSeedAll(seed))
RuntimeError: Creating MTGP constants failed. at /pytorch/aten/src/THC/THCTensorRandom.cu:34
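Note that because CUDA launches are asynchronous, the tracebacks above (the copy in `THCTensorCopy.c`, the MTGP seeding failure) likely point at the first call that *noticed* the corruption, not the kernel that caused it. A standard first debugging step is to force synchronous launches with the `CUDA_LAUNCH_BLOCKING` environment variable, which must be set before the CUDA context is initialized:

```python
# Sketch: force synchronous CUDA launches so the traceback points at the
# kernel that actually triggered the illegal memory access. This must be
# set before torch (or tensorflow) initializes the CUDA context, i.e.
# before the first CUDA-touching import/call in main.py.
import os

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# import torch  # import *after* setting the variable
# ... re-run the failing detector; the RuntimeError should now surface
# at the offending op instead of at a later, unrelated tensor copy.
```

Equivalently, `CUDA_LAUNCH_BLOCKING=1 python main.py` on the command line avoids any ordering concerns inside the script.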
    nn.Dropout(p=0.5),
    nn.Linear(10, n_gmm),
    nn.Softmax(dim=1)
]
👍🏻
* Adapt LSTMAD, Donut and DAGMM with NNAutoencoder
* Ignore device placement on ReEBM as well
* Adapt LSTMED
* Separate running detectors
* Use magic GPUWrapper
* Sort by framework
Should work the way we want it to. Still has to be tested, though.
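The squash commit mentions a "magic GPUWrapper", and the tracebacks show `src/algorithms/cuda_utils.py` exposing `torch_device`, `tf_device`, and `to_device`. The excerpt doesn't include that file, so the following is purely a guess at its shape: `GPUWrapper` and `device_name` are hypothetical names, and the framework-specific parts are left as comments so the sketch runs without torch or tensorflow installed.

```python
# Hypothetical sketch of a GPUWrapper mixin in the spirit of
# src/algorithms/cuda_utils.py. Only the method names torch_device,
# tf_device, and to_device appear in the logs above; the rest is guesswork.
class GPUWrapper:
    """Mixin giving each detector one place to resolve its device."""

    def __init__(self, gpu=0):
        self.gpu = gpu

    def device_name(self, framework, available):
        """Framework-specific device string with a CPU fallback.

        PyTorch uses 'cuda:<i>' / 'cpu'; TensorFlow uses
        '/device:GPU:<i>' / '/cpu:0'.
        """
        if framework == 'torch':
            return f'cuda:{self.gpu}' if available else 'cpu'
        return f'/device:GPU:{self.gpu}' if available else '/cpu:0'

    # With the frameworks installed, the real members might look like:
    #   @property
    #   def torch_device(self):
    #       return torch.device(
    #           self.device_name('torch', torch.cuda.is_available()))
    #
    #   def to_device(self, model):
    #       model.to(self.torch_device)
```

Keeping the string logic separate from the framework calls also makes it unit-testable on a CPU-only CI box, which matters given that this PR's GPU paths clearly still need testing.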