
Fine tuning only runs on CPU #29

Open
diabeticpilot opened this issue Mar 11, 2024 · 4 comments
@diabeticpilot

Hello,

I am running this on a few 2x 4090 cloud instances on Vast to test and benchmark. Most machines work without issues; however, on certain machines I have noticed that the GPUs are never used and the fine-tuning runs on the CPU only. Llama 2 70B gets 15-18s/it on most instances; on the ones where the GPUs are not used, it is 800s/it.

nvidia-smi is showing no active processes and 0% on both GPUs. Any idea on how to troubleshoot or fix this issue?

Here is how I am running it and all the settings:

export CUDA_VISIBLE_DEVICES=1,0
python train.py --model_name meta-llama/Llama-2-70b-hf --batch_size 2 --context_length 2048 \
  --precision bf16 --train_type qlora --use_gradient_checkpointing true --use_cpu_offload true \
  --dataset alpaca --reentrant_checkpointing true

Performance:
[42:45<2887:27:12, 803.50s/it]

nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:41:00.0 Off |                  Off |
| 30%   29C    P8              20W / 450W |  10717MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  | 00000000:61:00.0 Off |                  Off |
| 30%   30C    P8              24W / 450W |  11015MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
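A quick, generic sanity check on an affected instance (not specific to this repo) is to confirm that PyTorch can see the GPUs at all before launching train.py:

nvidia-smi -L                       # list the GPUs this instance actually exposes
echo $CUDA_VISIBLE_DEVICES          # check that any exported indices exist in that list
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count(), torch.version.cuda)"

If is_available() comes back False or device_count() is 0, the problem is in the environment/driver setup rather than in the training settings.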

@johnowhitaker
Contributor

I think on some shared machines export CUDA_VISIBLE_DEVICES=1,0 might reference cards other than the ones you're assigned. (Don't quote me on this, but I think I just hit a similar issue.) Removing that and running just the training script in a new shell where CUDA_VISIBLE_DEVICES isn't defined worked in my case.
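A minimal sketch of that suggestion (untested; it just clears the variable and reuses the flags from the first post):

unset CUDA_VISIBLE_DEVICES    # or open a fresh shell where it was never exported
python train.py --model_name meta-llama/Llama-2-70b-hf ...    # same flags as in the first post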

@js-2024

js-2024 commented Apr 15, 2024

I'm having the same issue on Linux Mint with 7x3090. The behavior is almost identical to what diabeticpilot described above, right down to the GPUs loading a little under 12GB of VRAM each, then going dormant while the CPU maxes out. Around 128GB of CPU RAM was allocated.

@zhksh

zhksh commented Apr 23, 2024


Same here: alternating usage of GPU (4x3090) and CPU (24 cores maxed out) while training llama-3-8b, ~45s/it.
No idea what's going on, but the loss is logged right after CPU usage drops and the GPU takes over; it feels like inference is done on the CPU and backprop on the GPU.

@zhksh

zhksh commented Apr 23, 2024

OK, sorry: --use_cpu_offload false helps. I assumed "false" was the default.
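For reference, the launch command from the first post with offload disabled would look like this (a sketch; all other flags assumed unchanged):

python train.py --model_name meta-llama/Llama-2-70b-hf --batch_size 2 --context_length 2048 \
  --precision bf16 --train_type qlora --use_gradient_checkpointing true --use_cpu_offload false \
  --dataset alpaca --reentrant_checkpointing true

Presumably --use_cpu_offload true is what was pushing work onto the CPU, which would match the maxed-out cores and ~128GB of host RAM reported above.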
