DefaultCPUAllocator: not enough memory #41

Closed
936384885xy opened this issue Mar 16, 2023 · 4 comments

I can run the 7B model successfully, but when I tried to run the 14B model on my 4080 GPU by setting "args.strategy = 'cuda fp16i8 *21 -> cuda fp16 *20'" and "os.environ["RWKV_CUDA_ON"] = '0'", it reported an error (a minimal sketch of these settings follows the log).
During loading the program consumed all of my 32 GB of CPU memory; the log is as follows.

ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV

Chinese - cuda fp16i8 *21 -> cuda fp16 *20 - J:\ChatRWKV\v2/prompt/default/Chinese-2.py
Loading model - J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6

Loading J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth ...
Strategy: (total 40+1=41 layers)

  • cuda [float16, uint8], store 21 layers
  • cuda [float16, float16], store 20 layers
    0-cuda-float16-uint8 1-cuda-float16-uint8 2-cuda-float16-uint8 3-cuda-float16-uint8 4-cuda-float16-uint8 5-cuda-float16-uint8 6-cuda-float16-uint8 7-cuda-float16-uint8 8-cuda-float16-uint8 9-cuda-float16-uint8 10-cuda-float16-uint8 11-cuda-float16-uint8 12-cuda-float16-uint8 13-cuda-float16-uint8 14-cuda-float16-uint8 15-cuda-float16-uint8 16-cuda-float16-uint8 17-cuda-float16-uint8 18-cuda-float16-uint8 19-cuda-float16-uint8 20-cuda-float16-uint8 21-cuda-float16-float16 22-cuda-float16-float16 23-cuda-float16-float16 24-cuda-float16-float16 25-cuda-float16-float16 26-cuda-float16-float16 27-cuda-float16-float16 28-cuda-float16-float16 29-cuda-float16-float16 30-cuda-float16-float16 31-cuda-float16-float16 32-cuda-float16-float16 33-cuda-float16-float16 34-cuda-float16-float16 35-cuda-float16-float16 36-cuda-float16-float16 37-cuda-float16-float16 38-cuda-float16-float16 39-cuda-float16-float16 40-cuda-float16-float16
    emb.weight f16 cpu 50277 5120
    blocks.0.ln1.weight f16 cuda:0 5120
    blocks.0.ln1.bias f16 cuda:0 5120
    blocks.0.ln2.weight f16 cuda:0 5120
    blocks.0.ln2.bias f16 cuda:0 5120
    blocks.0.att.time_decay f32 cuda:0 5120
    blocks.0.att.time_first f32 cuda:0 5120
    blocks.0.att.time_mix_k f16 cuda:0 5120
    blocks.0.att.time_mix_v f16 cuda:0 5120
    blocks.0.att.time_mix_r f16 cuda:0 5120
    blocks.0.att.key.weight i8 cuda:0 5120 5120
    blocks.0.att.value.weight i8 cuda:0 5120 5120
    blocks.0.att.receptance.weight i8 cuda:0 5120 5120
    blocks.0.att.output.weight i8 cuda:0 5120 5120
    blocks.0.ffn.time_mix_k f16 cuda:0 5120
    blocks.0.ffn.time_mix_r f16 cuda:0 5120
    blocks.0.ffn.key.weight i8 cuda:0 5120 20480
    blocks.0.ffn.receptance.weight i8 cuda:0 5120 5120
    blocks.0.ffn.value.weight i8 cuda:0 20480 5120
    ...........................................................................
    Traceback (most recent call last):
      File "J:\ChatRWKV\v2\chat.py", line 110, in <module>
        model = RWKV(model=args.MODEL_NAME, strategy=args.strategy)
      File "J:\ChatRWKV\python3.10.10\lib\site-packages\torch\jit\_script.py", line 293, in init_then_script
        original_init(self, *args, **kwargs)
      File "J:\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 192, in __init__
        w[x] = w[x] / (2 ** int(layer_id // self.RESCALE_LAYER))
    RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 209715200 bytes.
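For anyone reproducing this, here is a minimal sketch of the reported configuration using the rwkv pip package directly; the model path and strategy string are the ones quoted in this report, and the snippet assumes the package is installed and the checkpoint exists at that path:

    # Sketch only: reproduces the configuration described in this report.
    import os
    os.environ["RWKV_CUDA_ON"] = '0'   # custom CUDA kernel disabled, as in the report; set before importing rwkv
    from rwkv.model import RWKV

    # Model path is given without the .pth extension, matching the chat.py convention in the log above.
    model = RWKV(model='J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050',
                 strategy='cuda fp16i8 *21 -> cuda fp16 *20')   # 21 int8 layers + 20 fp16 layers on the GPU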

BlinkDL commented Mar 17, 2023

Please see #40 :)

It will be much better soon.

Currently it requires a huge amount (probably 40 GB+?) of CPU RAM to load the 14B model.
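As a rough sanity check on that figure (assuming the whole fp16 checkpoint is read into CPU RAM before layers are moved to the GPU, which is what the traceback above suggests), a back-of-envelope estimate:

    # Rough estimate only; assumes ~14e9 parameters held in CPU RAM as fp16 at once.
    params = 14e9
    checkpoint_gb = params * 2 / 1e9   # 2 bytes per fp16 value -> ~28 GB for the weights alone
    # The per-layer rescaling (w[x] = w[x] / 2**...) allocates temporary copies on top of this,
    # so peak usage during loading can exceed a 32 GB machine's RAM.
    print(f"fp16 weights alone: ~{checkpoint_gb:.0f} GB")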


BlinkDL commented Mar 19, 2023

Update ChatRWKV v2 and the rwkv pip package (0.7.0):
Use v2/convert_model.py to convert a model for a given strategy, for faster loading and lower CPU RAM usage.

BlinkDL closed this as completed Mar 19, 2023
@nanonomad

I've updated to the latest source and reinstalled the pip package. Unfortunately, convert_model.py terminates at roughly the same point as the v2/chat.py script; the message is "Killed" rather than a segmentation fault this time.

Does it need to be run on a machine with 40+ GB of RAM? I attempted to run it on Google Colab as well, and it behaved the same (but IIRC that is only 25 GB of RAM).

Example launch command:
RWKV_CUDA_ON=1 python v2/convert_model.py --in RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth --out RWKV-4-Pile-14B-20230313-ctx8192-test1050.converted.pth --strategy 'cuda fp16 *8+'


BlinkDL commented Mar 20, 2023

> I've updated to the latest source and reinstalled the pip package. Unfortunately, convert_model.py terminates at roughly the same point as the v2/chat.py script; the message is "Killed" rather than a segmentation fault this time.

Find a machine with lots of RAM to convert it :) Then you can load using much less RAM.
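Once a converted checkpoint exists, loading it could look roughly like the sketch below; the filename is the output of the launch command quoted above, and it is presumably loaded with the same strategy it was converted for:

    # Sketch only: load the converted checkpoint (path given without the .pth extension, as in chat.py).
    import os
    os.environ["RWKV_CUDA_ON"] = '1'   # custom CUDA kernel enabled, as in the conversion command above
    from rwkv.model import RWKV

    model = RWKV(model='RWKV-4-Pile-14B-20230313-ctx8192-test1050.converted',
                 strategy='cuda fp16 *8+')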
