
[BUG] use 8 32GB V100 and use_meta_tensor to inference big model. Cannot copy out of meta tensor; no data! #2856

@lambda7xx

My DeepSpeed version is 0.8.1, my torch version is 1.13.1, and my transformers version is 4.21.2. The machine has 500GB of CPU memory.

I followed the documentation to run my code.

  • Below are the commands I ran
 deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b  --batch_size ${BS}    --test_performance --dtype int8 --use_meta_tensor

and

 deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b  --batch_size ${BS}    --test_performance --dtype float16 --use_meta_tensor

Both commands fail with this error:

Traceback (most recent call last):
  File "inference-test.py", line 111, in <module>
    outputs = pipe(inputs,
  File "/home/YYYYY/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 71, in __call__
    outputs = self.generate_outputs(input_list, num_tokens=num_tokens, do_sample=do_sample)
  File "/home/YYYYY/DeepSpeedExamples/inference/huggingface/text-generation/utils.py", line 115, in generate_outputs
    self.model.cuda().to(self.device)
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/YYYYY/DeepSpeedExamples/lib/python3.8/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
NotImplementedError: Cannot copy out of meta tensor; no data!
[2023-02-19 06:47:26,453] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12532
[2023-02-19 06:47:26,672] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12587
[2023-02-19 06:47:26,891] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 12615
  • Then I tried the same commands without --use_meta_tensor
    deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b  --batch_size ${BS}    --test_performance --dtype int8 

and

 deepspeed --num_gpus 8 inference-test.py --name facebook/opt-66b  --batch_size ${BS}    --test_performance --dtype float16 

and both fail with this error:

RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 1358954496 bytes. Error code 12 (Cannot allocate memory)

  • I also tried running a smaller model (facebook/opt-30b) and got the same error as above.
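
For context on the CPU allocation failure, here is a rough back-of-the-envelope estimate of host memory use. It is only a sketch under two assumptions that this report does not confirm: that without --use_meta_tensor each of the 8 processes loads a full copy of the weights into CPU memory, and that the parameter counts match the nominal sizes in the model names (66 billion and 30 billion). The function name host_memory_gb is made up for illustration.

```python
# Rough host-memory estimate when every process loads a full model copy.
# ASSUMPTION: each of the 8 ranks materializes all weights on the CPU
# before they are sharded across GPUs. Parameter counts are the nominal
# sizes taken from the model names, not measured values.

def host_memory_gb(num_params: int, bytes_per_param: int, num_procs: int) -> float:
    """Total CPU memory in GB (10**9 bytes) for num_procs full copies."""
    return num_params * bytes_per_param * num_procs / 1e9


HOST_GB = 500   # CPU memory stated in this report
NUM_PROCS = 8   # --num_gpus 8 launches one process per GPU

MODELS = {
    "facebook/opt-66b": 66_000_000_000,
    "facebook/opt-30b": 30_000_000_000,
}

for name, params in MODELS.items():
    for dtype, width in (("float16", 2), ("float32", 4)):
        need = host_memory_gb(params, width, NUM_PROCS)
        verdict = "exceeds" if need > HOST_GB else "fits within"
        print(f"{name} as {dtype}: ~{need:.0f} GB, {verdict} {HOST_GB} GB")
```

Under these assumptions, opt-66b in float16 alone would need about twice the available 500GB. opt-30b in float16 comes close to the limit and would exceed it if the weights were held in float32 at any point during loading, which may be why the smaller model fails the same way.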
