My deepspeed version is 0.8.1, my torch version is 1.13.1, and my transformers version is 4.21.2. My machine has 500GB of CPU memory.
I followed the documentation to run my code.
- Below are my scripts:
deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b --batch_size ${BS} --test_performance --dtype int8 --use_meta_tensor
and
deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b --batch_size ${BS} --test_performance --dtype float16 --use_meta_tensor
My error is:
File "inference-test.py", line 111, in <module>
outputs = pipe(inputs,
File "/home/YYYYY/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 71, in __call__
outputs = self.generate_outputs(input_list, num_tokens=num_tokens, do_sample=do_sample)
File "/home/YYYYY/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 115, in generate_outputs
self.model.cuda().to(self.device)
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 749, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
module._apply(fn)
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 664, in _apply
param_applied = fn(param)
File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
[2023-02-19 06:47:26,453] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12532
[2023-02-19 06:47:26,672] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12587
[2023-02-19 06:47:26,891] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12615
- Then I tried another pair of scripts, without --use_meta_tensor:
deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b --batch_size ${BS} --test_performance --dtype int8
and
deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b --batch_size ${BS} --test_performance --dtype float16
and my error is below:
RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 1358954496 bytes. Error code 12 (Cannot allocate memory)
- I also tried running a smaller model (facebook/opt-30b) and hit the same errors as above.
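For reference, here is a rough back-of-envelope host-memory estimate. It rests on an assumption I have not confirmed in the logs: that without --use_meta_tensor, each of the 8 launcher ranks materializes the full fp16 checkpoint in CPU RAM before partitioning.

```python
# Rough CPU-RAM estimate for loading OPT checkpoints in fp16.
# Assumption (not confirmed): without --use_meta_tensor, every one of the
# 8 ranks loads the full checkpoint into host memory before sharding.
GIB = 1024 ** 3

def host_ram_gib(n_params: float, ranks: int = 8, bytes_per_param: int = 2) -> float:
    """Total host RAM (GiB) if every rank holds a full fp16 copy."""
    return n_params * bytes_per_param * ranks / GIB

opt_66b = host_ram_gib(66e9)  # roughly 983 GiB, far beyond a 500 GB box
opt_30b = host_ram_gib(30e9)  # roughly 447 GiB, already near the 500 GB limit
print(f"opt-66b: {opt_66b:.0f} GiB, opt-30b: {opt_30b:.0f} GiB")
```

If that assumption holds, it would explain why even opt-30b fails on a 500 GB machine once loader and buffer overhead are added on top of the weights themselves.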