[Bug]: `CohereForAI/c4ai-command-r-v01` OSError: [Errno 12] Cannot allocate memory #4891

epignatelli · 2024-05-17T17:04:50Z

Your current environment


qdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualization:                     VT-x
L1d cache:                          1.1 MiB (24 instances)
L1i cache:                          768 KiB (24 instances)
L2 cache:                           30 MiB (24 instances)
L3 cache:                           36 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):                  1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.2.1
[pip3] torch-ac==1.4.0
[pip3] triton==2.2.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
[conda] torch                     2.2.1                    pypi_0    pypi
[conda] torch-ac                  1.4.0                    pypi_0    pypi
[conda] triton                    2.2.0                    pypi_0    pypi
[conda] vllm-nccl-cu12            2.18.1.0.4.0             pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      1,3,5,7,9,11    1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

I cannot instantiate a CohereForAI/c4ai-command-r-v01 model.

GPU: NVIDIA A100 80GB
RAM:

               total        used        free      shared  buff/cache   available
Mem:           124Gi       5.1Gi       113Gi        29Mi       7.4Gi       119Gi
Swap:           15Gi          0B        15Gi

Repro:

python -c "from vllm import LLM; LLM('CohereForAI/c4ai-command-r-v01')"


epignatelli · 2024-05-17T17:32:47Z

As a temporary workaround, you can use

@partial(torch.compile, backend="eager")

at line 52:

vllm/vllm/model_executor/models/commandr.py

Lines 51 to 61 in 48d5985

    
    @torch.compile
    def layer_norm_func(hidden_states, weight, variance_epsilon):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        mean = hidden_states.mean(-1, keepdim=True)
        variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
        hidden_states = (hidden_states - mean) * torch.rsqrt(variance +
                                                             variance_epsilon)
        hidden_states = weight.to(torch.float32) * hidden_states
        return hidden_states.to(input_dtype)

See pytorch/pytorch#93495
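
For reference, here is a minimal standalone sketch of the workaround applied to the function quoted above. It assumes a PyTorch 2.x install where `torch.compile` is available. The body is copied from the quoted snippet and only the decorator changes. With `backend="eager"`, TorchDynamo still traces the function but the default Inductor backend is not used. Whether this avoids the allocation error on other setups has not been verified here.

```python
from functools import partial

import torch


# Same function as in commandr.py, but compiled with the "eager" backend
# instead of the default ("inductor"). partial(...) builds a decorator that
# calls torch.compile(fn, backend="eager").
@partial(torch.compile, backend="eager")
def layer_norm_func(hidden_states, weight, variance_epsilon):
    input_dtype = hidden_states.dtype
    # Compute the statistics in float32 for numerical stability.
    hidden_states = hidden_states.to(torch.float32)
    mean = hidden_states.mean(-1, keepdim=True)
    variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
    hidden_states = (hidden_states - mean) * torch.rsqrt(variance +
                                                         variance_epsilon)
    hidden_states = weight.to(torch.float32) * hidden_states
    # Cast back to the caller's dtype.
    return hidden_states.to(input_dtype)


if __name__ == "__main__":
    # Illustrative shapes only; these are not taken from the model config.
    x = torch.randn(2, 4, 8, dtype=torch.float16)
    w = torch.ones(8)
    out = layer_norm_func(x, w, 1e-5)
    print(out.shape, out.dtype)
```

Writing the decorator as `@torch.compile(backend="eager")` should be equivalent, since `torch.compile` returns a decorator when called without a function.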

epignatelli added the bug label on May 17, 2024
