[BUG] DeepSpeed Inference example from the tutorial is killed without any error message

**Describe the bug**
I am running the GPT-Neo inference example from the DeepSpeed inference tutorial at [https://www.deepspeed.ai/tutorials/inference-tutorial/](https://www.deepspeed.ai/tutorials/inference-tutorial/). This is my inference script, `ds_infer.py`:
```python
import os

import deepspeed
import torch
from transformers import pipeline

# Set by the deepspeed launcher for each spawned process
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

generator = pipeline('text-generation', model='EleutherAI/gpt-neo-125M',
                     device=local_rank)

# Split the model across `world_size` GPUs and inject DeepSpeed's inference kernels
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", min_length=50, num_return_sequences=1, max_length=50)

# Print the result only once, from rank 0
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
```
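The log below also warns that `mp_size` is deprecated in favor of `tensor_parallel.tp_size`. As a minimal sketch, assuming the installed DeepSpeed (0.9.3) accepts a `tensor_parallel` dict with a `tp_size` key in `init_inference`, the keyword arguments could be built like this. The helper name `inference_kwargs` is hypothetical, and this only silences the warning; it is not expected to change the crash.

```python
import os
from typing import Optional


def inference_kwargs(world_size: Optional[int] = None) -> dict:
    """Build keyword arguments for deepspeed.init_inference (hypothetical helper).

    Assumption: DeepSpeed accepts `tensor_parallel={"tp_size": N}` as the
    replacement for the deprecated `mp_size=N`, as the log warning suggests.
    """
    if world_size is None:
        # WORLD_SIZE is set by the deepspeed launcher; default to a single GPU
        world_size = int(os.getenv("WORLD_SIZE", "1"))
    return {
        "tensor_parallel": {"tp_size": world_size},  # replaces mp_size=world_size
        "replace_with_kernel_inject": True,
    }


if __name__ == "__main__":
    # In ds_infer.py this would be used as (needs GPUs, so not run here):
    #   generator.model = deepspeed.init_inference(generator.model, **inference_kwargs())
    print(inference_kwargs())
```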
There is no error message; the processes are simply killed:
```
Setting ds_accelerator to cuda (auto detect)
[2023-06-08 04:35:52,281] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-06-08 04:35:52,296] [INFO] [runner.py:555:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None ds_infer.py
Setting ds_accelerator to cuda (auto detect)
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-06-08 04:35:53,337] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-06-08 04:35:53,337] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-06-08 04:35:53,337] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-06-08 04:35:53,337] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-06-08 04:35:53,337] [INFO] [launch.py:163:main] dist_world_size=2
[2023-06-08 04:35:53,337] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
Setting ds_accelerator to cuda (auto detect)
Setting ds_accelerator to cuda (auto detect)
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
[2023-06-08 04:35:57,463] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.3+4559aa9b, git-hash=4559aa9b, git-branch=HEAD
[2023-06-08 04:35:57,463] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-06-08 04:35:57,464] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-06-08 04:35:57,466] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-08 04:35:57,466] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-08 04:35:57,466] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
[2023-06-08 04:35:57,555] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.3+4559aa9b, git-hash=4559aa9b, git-branch=HEAD
[2023-06-08 04:35:57,555] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-06-08 04:35:57,555] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-06-08 04:35:57,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-06-08 04:35:57,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-06-08 04:35:57,569] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 768, 'intermediate_size': 3072, 'heads': 12, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False}
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
------------------------------------------------------
Free memory : 20.773438 (GigaBytes)  
Total memory: 23.689514 (GigaBytes)  
Requested memory: 0.087891 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x7f2916000000 
------------------------------------------------------
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[2023-06-08 04:35:59,354] [INFO] [launch.py:314:sigkill_handler] Killing subprocess 2824
[2023-06-08 04:35:59,354] [INFO] [launch.py:314:sigkill_handler] Killing subprocess 2825
[2023-06-08 04:35:59,355] [ERROR] [launch.py:320:sigkill_handler] ['/usr/bin/python', '-u', 'ds_infer.py', '--local_rank=1'] exits with return code = -7
```
I have double-checked that the processes are not being killed due to OOM.
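The last log line reports `return code = -7` for rank 1. Assuming the launcher reports `subprocess.Popen.returncode` for its child processes, Python's convention is that a negative return code -N means the child was terminated by signal N. The kernel OOM killer sends SIGKILL (9), so -7 points to a different signal. A minimal sketch for decoding it; the function name is illustrative:

```python
import signal
from typing import Optional


def decode_return_code(returncode: int) -> Optional[str]:
    """Map a subprocess return code to the name of the terminating signal.

    A negative return code -N means the child was killed by signal N.
    Returns None for a normal exit (return code >= 0).
    """
    if returncode >= 0:
        return None
    try:
        return signal.Signals(-returncode).name
    except ValueError:
        # Signal number not known on this platform
        return f"unknown signal {-returncode}"


if __name__ == "__main__":
    # Return code reported for rank 1 in the log above; on Linux, signal 7 is SIGBUS
    print(decode_return_code(-7))
```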

**To Reproduce**
```
deepspeed --num_gpus 2 ds_infer.py
```

**ds_report output**
```
Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
random_ltd ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
spatial_inference ...... [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/usr/local/lib/python3.8/dist-packages/deepspeed']
deepspeed info ................... 0.9.3+4559aa9b, 4559aa9b, HEAD
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7
```

**System info (please complete the following information):**
 - Ubuntu 20.04 LTS (in Docker)
 - 2 x RTX 3090 24G
 - transformers version: `4.31.0.dev0`
 - Python version: `3.8` (from the `python3.8` install paths in `ds_report`; `1.13.1` is the torch version)
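The Python and torch versions are easy to mix up in this template. A small sketch that prints them directly, using only the standard library (`importlib.metadata` is available from Python 3.8); the default package list is an assumption:

```python
import platform
from importlib import metadata  # standard library since Python 3.8


def collect_versions(packages=("torch", "transformers", "deepspeed")) -> dict:
    """Return the interpreter version plus the installed versions of `packages`."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```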

**Docker context**
I am using the Dockerfile from deepspeed's repo [https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile](https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile) with the base image changed to `nvidia/cuda:11.7.0-devel-ubuntu20.04`. 
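Because the job runs inside Docker, the size of the container's shared-memory mount may also be worth reporting. Docker limits `/dev/shm` to 64 MB by default unless `--shm-size` or `--ipc=host` is passed to `docker run`, and NCCL can use shared memory for communication between GPUs on the same node. This is extra environment info only, not a confirmed cause. A minimal sketch (POSIX only; the function name is illustrative):

```python
import os


def fs_size_bytes(path: str = "/dev/shm") -> int:
    """Total size in bytes of the filesystem mounted at `path` (POSIX only)."""
    st = os.statvfs(path)
    # Fragment size times total number of fragments
    return st.f_frsize * st.f_blocks


if __name__ == "__main__":
    size = fs_size_bytes("/dev/shm")
    print(f"/dev/shm: {size / 2**20:.0f} MiB")
```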



[BUG] DeepSpeed Inference example in the tutorial got killed for no reason. #3710