DefaultCPUAllocator: not enough memory #41

Closed
936384885xy opened this issue Mar 16, 2023 · 4 comments

I can run the 7B model successfully, but when I tried to run the 14B model on my 4080 GPU by setting "args.strategy = 'cuda fp16i8 *21 -> cuda fp16 *20'" and "os.environ["RWKV_CUDA_ON"] = '0'", it reported an error (a minimal sketch of these settings follows the log).
During loading the program consumed all of my 32 GB of CPU memory; the log is as follows.

ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV

Chinese - cuda fp16i8 *21 -> cuda fp16 *20 - J:\ChatRWKV\v2/prompt/default/Chinese-2.py
Loading model - J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6

Loading J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth ...
Strategy: (total 40+1=41 layers)

  • cuda [float16, uint8], store 21 layers
  • cuda [float16, float16], store 20 layers
    0-cuda-float16-uint8 1-cuda-float16-uint8 2-cuda-float16-uint8 3-cuda-float16-uint8 4-cuda-float16-uint8 5-cuda-float16-uint8 6-cuda-float16-uint8 7-cuda-float16-uint8 8-cuda-float16-uint8 9-cuda-float16-uint8 10-cuda-float16-uint8 11-cuda-float16-uint8 12-cuda-float16-uint8 13-cuda-float16-uint8 14-cuda-float16-uint8 15-cuda-float16-uint8 16-cuda-float16-uint8 17-cuda-float16-uint8 18-cuda-float16-uint8 19-cuda-float16-uint8 20-cuda-float16-uint8 21-cuda-float16-float16 22-cuda-float16-float16 23-cuda-float16-float16 24-cuda-float16-float16 25-cuda-float16-float16 26-cuda-float16-float16 27-cuda-float16-float16 28-cuda-float16-float16 29-cuda-float16-float16 30-cuda-float16-float16 31-cuda-float16-float16 32-cuda-float16-float16 33-cuda-float16-float16 34-cuda-float16-float16 35-cuda-float16-float16 36-cuda-float16-float16 37-cuda-float16-float16 38-cuda-float16-float16 39-cuda-float16-float16 40-cuda-float16-float16
    emb.weight f16 cpu 50277 5120
    blocks.0.ln1.weight f16 cuda:0 5120
    blocks.0.ln1.bias f16 cuda:0 5120
    blocks.0.ln2.weight f16 cuda:0 5120
    blocks.0.ln2.bias f16 cuda:0 5120
    blocks.0.att.time_decay f32 cuda:0 5120
    blocks.0.att.time_first f32 cuda:0 5120
    blocks.0.att.time_mix_k f16 cuda:0 5120
    blocks.0.att.time_mix_v f16 cuda:0 5120
    blocks.0.att.time_mix_r f16 cuda:0 5120
    blocks.0.att.key.weight i8 cuda:0 5120 5120
    blocks.0.att.value.weight i8 cuda:0 5120 5120
    blocks.0.att.receptance.weight i8 cuda:0 5120 5120
    blocks.0.att.output.weight i8 cuda:0 5120 5120
    blocks.0.ffn.time_mix_k f16 cuda:0 5120
    blocks.0.ffn.time_mix_r f16 cuda:0 5120
    blocks.0.ffn.key.weight i8 cuda:0 5120 20480
    blocks.0.ffn.receptance.weight i8 cuda:0 5120 5120
    blocks.0.ffn.value.weight i8 cuda:0 20480 5120
    ...........................................................................
    Traceback (most recent call last):
      File "J:\ChatRWKV\v2\chat.py", line 110, in <module>
        model = RWKV(model=args.MODEL_NAME, strategy=args.strategy)
      File "J:\ChatRWKV\python3.10.10\lib\site-packages\torch\jit\_script.py", line 293, in init_then_script
        original_init(self, *args, **kwargs)
      File "J:\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 192, in __init__
        w[x] = w[x] / (2 ** int(layer_id // self.RESCALE_LAYER))
    RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 209715200 bytes.
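For anyone reproducing this, here is a minimal sketch of the reported configuration using the rwkv pip package directly; the model path and strategy string are the ones quoted in this report, and the snippet assumes the package is installed and the checkpoint exists at that path:

    # Sketch only: reproduces the configuration described in this report.
    import os
    os.environ["RWKV_CUDA_ON"] = '0'   # custom CUDA kernel disabled, as in the report; set before importing rwkv
    from rwkv.model import RWKV

    # Model path is given without the .pth extension, matching the chat.py convention in the log above.
    model = RWKV(model='J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050',
                 strategy='cuda fp16i8 *21 -> cuda fp16 *20')   # 21 int8 layers + 20 fp16 layers on the GPU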

BlinkDL commented Mar 17, 2023

Please see #40 :)

It will be much better soon.

Currently it requires a huge amount (probably 40 GB+?) of CPU RAM to load the 14B model.
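As a rough sanity check on that figure (assuming the whole fp16 checkpoint is read into CPU RAM before layers are moved to the GPU, which is what the traceback above suggests), a back-of-envelope estimate:

    # Rough estimate only; assumes ~14e9 parameters held in CPU RAM as fp16 at once.
    params = 14e9
    checkpoint_gb = params * 2 / 1e9   # 2 bytes per fp16 value -> ~28 GB for the weights alone
    # The per-layer rescaling (w[x] = w[x] / 2**...) allocates temporary copies on top of this,
    # so peak usage during loading can exceed a 32 GB machine's RAM.
    print(f"fp16 weights alone: ~{checkpoint_gb:.0f} GB")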


BlinkDL commented Mar 19, 2023

Update ChatRWKV v2 and the rwkv pip package (0.7.0):
Use v2/convert_model.py to convert a model for a given strategy, for faster loading and lower CPU RAM usage.

BlinkDL closed this as completed Mar 19, 2023
@nanonomad

I've updated to the latest source and reinstalled the pip package. Unfortunately, convert_model.py terminates at roughly the same point as the v2/chat.py script; the message is "Killed" rather than a segmentation fault this time.

Does it need to be run on a machine with 40+ GB of RAM? I attempted to run it on Google Colab as well, and it behaved the same (but IIRC that is only 25 GB of RAM).

Example launch command:
RWKV_CUDA_ON=1 python v2/convert_model.py --in RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth --out RWKV-4-Pile-14B-20230313-ctx8192-test1050.converted.pth --strategy 'cuda fp16 *8+'


BlinkDL commented Mar 20, 2023

> I've updated to the latest source and reinstalled the pip package. Unfortunately, convert_model.py terminates at roughly the same point as the v2/chat.py script; the message is "Killed" rather than a segmentation fault this time.

Find a machine with lots of RAM to convert it :) Then you can load using much less RAM.
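Once a converted checkpoint exists, loading it could look roughly like the sketch below; the filename is the output of the launch command quoted above, and it is presumably loaded with the same strategy it was converted for:

    # Sketch only: load the converted checkpoint (path given without the .pth extension, as in chat.py).
    import os
    os.environ["RWKV_CUDA_ON"] = '1'   # custom CUDA kernel enabled, as in the conversion command above
    from rwkv.model import RWKV

    model = RWKV(model='RWKV-4-Pile-14B-20230313-ctx8192-test1050.converted',
                 strategy='cuda fp16 *8+')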
