[BUG] DeepSpeed Inference example in the tutorial got killed for no reason.  #3710

@Entropy-xcy

Describe the bug
I am running the GPT-Neo inference example from the tutorial at https://www.deepspeed.ai/tutorials/inference-tutorial/. This is my inference code:

import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-125M',
                     device=local_rank)

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", min_length=50, num_return_sequences=1, max_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
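
The log further down warns that `mp_size` is deprecated in favor of `tensor_parallel.tp_size`. A minimal sketch of the equivalent arguments follows, assuming `deepspeed.init_inference` in this DeepSpeed version accepts a `tensor_parallel` dict as the warning suggests. `build_inference_kwargs` is a hypothetical helper, not part of DeepSpeed, and this change is only expected to silence the warning, not to affect the crash.

```python
import os


def build_inference_kwargs(world_size: int) -> dict:
    """Build init_inference keyword arguments without the deprecated mp_size.

    Hypothetical helper: the key names follow the deprecation warning in the log.
    """
    return {
        "tensor_parallel": {"tp_size": world_size},
        "replace_with_kernel_inject": True,
    }


world_size = int(os.getenv("WORLD_SIZE", "1"))
kwargs = build_inference_kwargs(world_size)

# Assumed usage, mirroring the script above:
# generator.model = deepspeed.init_inference(generator.model, **kwargs)
```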

I get no error message, and the process is killed without any explanation:

Setting ds_accelerator to cuda (auto detect)
[2023-06-08 04:35:52,281] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-08 04:35:52,296] [INFO] [runner.py:555:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None ds_infer.py
Setting ds_accelerator to cuda (auto detect)
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-06-08 04:35:53,337] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-06-08 04:35:53,337] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-06-08 04:35:53,337] [INFO] [launch.py:163:main] dist_world_size=2
[2023-06-08 04:35:53,337] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
[2023-06-08 04:35:57,463] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.3+4559aa9b, git-hash=4559aa9b, git-branch=HEAD
[2023-06-08 04:35:57,463] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-06-08 04:35:57,464] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-06-08 04:35:57,466] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-08 04:35:57,466] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-08 04:35:57,466] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
[2023-06-08 04:35:57,555] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.3+4559aa9b, git-hash=4559aa9b, git-branch=HEAD
[2023-06-08 04:35:57,555] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-06-08 04:35:57,555] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-06-08 04:35:57,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-08 04:35:57,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-08 04:35:57,569] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 768, 'intermediate_size': 3072, 'heads': 12, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False}
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
------------------------------------------------------
Free memory : 20.773438 (GigaBytes)  
Total memory: 23.689514 (GigaBytes)  
Requested memory: 0.087891 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x7f2916000000 
------------------------------------------------------
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[2023-06-08 04:35:59,354] [INFO] [launch.py:314:sigkill_handler] Killing subprocess 2824
[2023-06-08 04:35:59,354] [INFO] [launch.py:314:sigkill_handler] Killing subprocess 2825
[2023-06-08 04:35:59,355] [ERROR] [launch.py:320:sigkill_handler] ['/usr/bin/python', '-u', 'ds_infer.py', '--local_rank=1'] exits with return code = -7
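
The last log line is the only clue: a negative return code from a Python-launched subprocess means the child was terminated by a signal with that number. A minimal sketch for decoding it with the standard `signal` module follows. `describe_returncode` is a hypothetical helper name. On Linux x86-64, signal 7 is SIGBUS; the number-to-name mapping differs on other platforms.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    signum = -returncode  # negative codes encode the terminating signal
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


# The value reported by the DeepSpeed launcher in the log above.
print(describe_returncode(-7))
```

A SIGBUS points to a bad memory access (for example, a failed access to mapped shared memory) rather than to the kernel OOM killer, which sends SIGKILL.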

I have double-checked that the process is not being killed due to OOM.

To Reproduce

deepspeed --num_gpus 2 ds_infer.py

ds_report output

Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
random_ltd ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
spatial_inference ...... [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/usr/local/lib/python3.8/dist-packages/deepspeed']
deepspeed info ................... 0.9.3+4559aa9b, 4559aa9b, HEAD
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

System info (please complete the following information):

  • Ubuntu 20.04 LTS (in Docker)
  • 2 x RTX 3090 24G
  • transformers version: 4.31.0.dev0
  • Python version: 3.8 (torch version: 1.13.1)

Docker context
I am using the Dockerfile from deepspeed's repo https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile with the base image changed to nvidia/cuda:11.7.0-devel-ubuntu20.04.
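
Because the run happens inside Docker, one thing worth ruling out is the size of `/dev/shm`. The return code of -7 corresponds to signal 7 (SIGBUS on Linux), and Docker containers get a 64 MB shared-memory mount by default, which NCCL can exhaust. This is an assumption about the cause, not something the report confirms. The sketch below only reports the mount size; `shm_report` is a hypothetical helper.

```python
import os
import shutil


def shm_report(path: str = "/dev/shm") -> str:
    """Report total and free space of the shared-memory mount in MiB."""
    if not os.path.isdir(path):
        return f"{path}: not present"
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return f"{path}: total={usage.total // mib} MiB, free={usage.free // mib} MiB"


# Run inside the container before launching deepspeed.
print(shm_report())
```

If the total is close to 64 MiB, restarting the container with a larger `--shm-size` or with `--ipc=host` is a common thing to try.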
