
Running on CPU #46

Closed · Sieqfried opened this issue Oct 11, 2019 · 7 comments

@Sieqfried

Hi,

I am currently trying to run 'simplest_example.py' on the CPU inside a Docker container.

I have tried modifying the code to run on the CPU by passing:

  • "placement=DeviceType.CPU" to the factory, which produces an error regarding CUDA:

Traceback (most recent call last):
  File "simplest_example.py", line 27, in <module>
    optimizer="sgd")
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
    stop_on_nan_loss=stop_on_nan_loss)
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1022, in train
    'amp_min_loss_scale', 1.0))
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 359, in __initialize_amp
    opt_level=AmpOptimizations[optim_level],
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 170, in _initialize
    check_params_fp32(models)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 92, in check_params_fp32
    name, param.type()))
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found param fc1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

To fix that issue I additionally passed:

  • 'optimization_level=1' to prevent APEX from being called, which returned:

2019-10-11 09:32:10,688 - WARNING - Data Layer does not have any weights to return. This get_weights call returns None.
Starting .....
Starting epoch 0
Traceback (most recent call last):
  File "simplest_example.py", line 27, in <module>
    optimizer="sgd")
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 526, in train
    stop_on_nan_loss=stop_on_nan_loss)
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1184, in train
    final_loss.get_device()))
RuntimeError: Device index must not be negative

How do I run the example on CPU? Thanks.

@okuchaiev (Member)

Please have a look at this PR: #48
In that PR I am able to run the simplest examples on a MacBook Pro (without a proper GPU) by setting:

from nemo.core import DeviceType
nf = nemo.core.NeuralModuleFactory(placement=DeviceType.CPU)
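As a rough sketch of how that factory line might slot into the rest of simplest_example.py: the data layer, TaylorNet, and loss module names below are assumptions based on NeMo 0.8's tutorial modules, not code taken from the PR; only the CPU placement line comes from the comment above.

```python
# Hedged sketch, not the verbatim example from the repo.
import nemo
from nemo.core import DeviceType

# Construct the factory with CPU placement so no CUDA device is required.
nf = nemo.core.NeuralModuleFactory(placement=DeviceType.CPU)

# Assumed tutorial modules (fc1.weight in the traceback above belongs to TaylorNet).
dl = nemo.tutorials.RealFunctionDataLayer(n=10000, batch_size=128)
fx = nemo.tutorials.TaylorNet(dim=4)
mse = nemo.tutorials.MSELoss()

# Describe the data flow.
x, y = dl()
p = fx(x=x)
loss = mse(predictions=p, target=y)

# Train on CPU; "sgd" matches the optimizer from the traceback above.
nf.train([loss], optimizer="sgd",
         optimization_params={"num_epochs": 3, "lr": 0.0003})
```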

okuchaiev self-assigned this on Oct 11, 2019
@Sieqfried (Author)

I tried out the PR but am still getting this error:

Traceback (most recent call last):
  File "simplest_example.py", line 30, in <module>
    optimizer="sgd")
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/core/neural_factory.py", line 536, in train
    stop_on_nan_loss=stop_on_nan_loss)
  File "/opt/conda/lib/python3.6/site-packages/nemo_toolkit-0.8-py3.6.egg/nemo/backends/pytorch/actions.py", line 1209, in train
    final_loss.get_device()))
RuntimeError: Device index must not be negative

@okuchaiev (Member)

Just pushed an update, could you please try again?

@okuchaiev (Member) commented on Oct 17, 2019

Pushed more updates. On my MacBook Pro (without a proper GPU), the following seems to work:
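A minimal sketch of what that setup presumably looked like, assuming it matches the CPU placement from the earlier comment; this is not the exact snippet posted:

```python
# Assumed setup: CPU placement so neither a CUDA device nor apex/AMP is needed.
import nemo
from nemo.core import DeviceType

nf = nemo.core.NeuralModuleFactory(placement=DeviceType.CPU)
```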

@Sieqfried (Author)

Sorry for the delay.

Yes, now I am able to run all the examples in the start_here folder. I am running them in a Docker container, so I have not run the IPython scripts.

@okuchaiev (Member)

Merged the PR to master. Closing this issue.

@JFFerraro5

I am trying to run a training job on the CPU in a Docker container. It seems 'DeviceType' no longer exists in nemo.core. The error is similar to the one in the original question:

  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 299, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

and, when importing DeviceType as suggested earlier in this thread:

    from nemo.core import DeviceType
ImportError: cannot import name 'DeviceType' from 'nemo.core'
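For reference, a hedged sketch rather than an answer from the thread: newer NeMo releases are built on PyTorch Lightning, where device selection typically goes through the Lightning Trainer instead of a NeuralModuleFactory placement. The model object is omitted below and assumed to be a NeMo/Lightning model instance.

```python
# Hedged sketch for Lightning-based NeMo releases; `model` is a placeholder
# for whatever NeMo model class the training job uses.
import pytorch_lightning as pl

# Run entirely on CPU, so no NVIDIA driver or CUDA device is required.
trainer = pl.Trainer(accelerator="cpu", devices=1, max_epochs=1)
# trainer.fit(model)
```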
