DefaultCPUAllocator: not enough memory #41
Please see #40 :) Will be much better soon. Currently it requires a huge amount (probably 40 GB+?) of CPU RAM to load 14B.
Update ChatRWKV v2 & pip rwkv package (0.7.0):
I've updated to the latest source and reinstalled the pip package. Unfortunately, convert_model.py behaves much like the v2/chat.py script in terms of when it terminates; this time the message is "Killed" rather than a segmentation fault. Does it need to be run on a machine with 40+ GB of RAM? I also tried running it on Google Colab and it behaved the same (though IIRC that instance only has 25 GB of RAM). Example launch command:
Find a machine with lots of RAM to convert it :) Then you can load it using much less RAM.
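For rough sizing (my own back-of-the-envelope numbers, not from the maintainers): a 14B-parameter checkpoint stored as fp16 takes roughly 2 bytes per parameter just for the weights, before any working copies made while applying the strategy, which is why 25-32 GB machines get killed during loading:

```python
# Rough RAM estimate for loading RWKV-4 14B (assumption: ~14e9 parameters;
# fp16 = 2 bytes/param, int8 = 1 byte/param; real peak usage is higher
# because loading and conversion create temporary copies of tensors).
PARAMS = 14_000_000_000

fp16_gib = PARAMS * 2 / 1024**3
int8_gib = PARAMS * 1 / 1024**3
print(f"fp16 weights alone: ~{fp16_gib:.1f} GiB")
print(f"int8 weights alone: ~{int8_gib:.1f} GiB")
```

This is only the resident weight size; converting once on a high-RAM machine and then loading the pre-quantized file is what avoids the large transient peak.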
I succeeded in running the 7B model, but when I tried to run the 14B model on my 4080 GPU with `args.strategy = 'cuda fp16i8 *21 -> cuda fp16 *20'` and `os.environ["RWKV_CUDA_ON"] = '0'`, it reported an error.
During loading, the program consumed all 32 GB of my CPU memory; the log follows.
ChatRWKV v2 https://github.com/BlinkDL/ChatRWKV
Chinese - cuda fp16i8 *21 -> cuda fp16 *20 - J:\ChatRWKV\v2/prompt/default/Chinese-2.py
Loading model - J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050
RWKV_JIT_ON 1 RWKV_CUDA_ON 0 RESCALE_LAYER 6
Loading J:/ChatRWKV/RWKV-4-Pile-14B-20230313-ctx8192-test1050.pth ...
Strategy: (total 40+1=41 layers)
0-cuda-float16-uint8 1-cuda-float16-uint8 2-cuda-float16-uint8 3-cuda-float16-uint8 4-cuda-float16-uint8 5-cuda-float16-uint8 6-cuda-float16-uint8 7-cuda-float16-uint8 8-cuda-float16-uint8 9-cuda-float16-uint8 10-cuda-float16-uint8 11-cuda-float16-uint8 12-cuda-float16-uint8 13-cuda-float16-uint8 14-cuda-float16-uint8 15-cuda-float16-uint8 16-cuda-float16-uint8 17-cuda-float16-uint8 18-cuda-float16-uint8 19-cuda-float16-uint8 20-cuda-float16-uint8 21-cuda-float16-float16 22-cuda-float16-float16 23-cuda-float16-float16 24-cuda-float16-float16 25-cuda-float16-float16 26-cuda-float16-float16 27-cuda-float16-float16 28-cuda-float16-float16 29-cuda-float16-float16 30-cuda-float16-float16 31-cuda-float16-float16 32-cuda-float16-float16 33-cuda-float16-float16 34-cuda-float16-float16 35-cuda-float16-float16 36-cuda-float16-float16 37-cuda-float16-float16 38-cuda-float16-float16 39-cuda-float16-float16 40-cuda-float16-float16
emb.weight f16 cpu 50277 5120
blocks.0.ln1.weight f16 cuda:0 5120
blocks.0.ln1.bias f16 cuda:0 5120
blocks.0.ln2.weight f16 cuda:0 5120
blocks.0.ln2.bias f16 cuda:0 5120
blocks.0.att.time_decay f32 cuda:0 5120
blocks.0.att.time_first f32 cuda:0 5120
blocks.0.att.time_mix_k f16 cuda:0 5120
blocks.0.att.time_mix_v f16 cuda:0 5120
blocks.0.att.time_mix_r f16 cuda:0 5120
blocks.0.att.key.weight i8 cuda:0 5120 5120
blocks.0.att.value.weight i8 cuda:0 5120 5120
blocks.0.att.receptance.weight i8 cuda:0 5120 5120
blocks.0.att.output.weight i8 cuda:0 5120 5120
blocks.0.ffn.time_mix_k f16 cuda:0 5120
blocks.0.ffn.time_mix_r f16 cuda:0 5120
blocks.0.ffn.key.weight i8 cuda:0 5120 20480
blocks.0.ffn.receptance.weight i8 cuda:0 5120 5120
blocks.0.ffn.value.weight i8 cuda:0 20480 5120
...........................................................................................................................................................................................................................................................................................................................................................................................................Traceback (most recent call last):
File "J:\ChatRWKV\v2\chat.py", line 110, in &lt;module&gt;
model = RWKV(model=args.MODEL_NAME, strategy=args.strategy)
File "J:\ChatRWKV\python3.10.10\lib\site-packages\torch\jit\_script.py", line 293, in init_then_script
original_init(self, *args, **kwargs)
File "J:\ChatRWKV\v2/../rwkv_pip_package/src\rwkv\model.py", line 192, in __init__
w[x] = w[x] / (2 ** int(layer_id // self.RESCALE_LAYER))
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 209715200 bytes.
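The size of the failed allocation matches one fp16 copy of a 5120x20480 FFN weight matrix (the ffn.key/ffn.value shapes in the log above), which the per-layer rescale `w[x] / 2**(layer_id // RESCALE_LAYER)` materializes while the rest of the checkpoint is still resident in RAM. Checking the arithmetic:

```python
# The traceback reports a failed allocation of 209715200 bytes.
# That is exactly 5120 * 20480 elements at 2 bytes each (fp16), i.e.
# one temporary copy of an FFN weight created by the rescale division.
rows, cols = 5120, 20480
bytes_per_fp16 = 2
alloc = rows * cols * bytes_per_fp16
print(alloc)  # 209715200, matching the error message
```

So the failure is not the size of any single tensor being unreasonable; it is that even a 200 MiB temporary cannot be allocated once the 32 GB of CPU RAM is already exhausted by the unconverted checkpoint.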