A100 80GB lora training out of memory #12
Comments
Same here. A6000, CUDA OOM at rank 4 and 512 resolution. Something's whack here.
I tried LoRA training with the flux-schnell and dev models using train batch size 1, gradient_accumulation_steps 4, and rank 2 on a 40GB A100, but it still raised a CUDA out of memory exception.
Same here.
You need to run:
Launch using:
With the default settings in the example files, this requires 42,837 MiB of VRAM.
@thavocado How much RAM are you using? I'm getting a SIGKILL with ~50GB of RAM.
Please use DeepSpeed and configure Accelerate accordingly.
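A minimal sketch of what "use DeepSpeed and configure Accelerate" could look like inside a training script, assuming Hugging Face Accelerate's `DeepSpeedPlugin`; the ZeRO stage, offload target, and variable names are illustrative and not taken from this repository.

```python
# Hypothetical sketch only: plugin arguments and variable names are
# illustrative, not this repository's actual training setup.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # shard optimizer state and gradients
    gradient_accumulation_steps=4,
    offload_optimizer_device="cpu",  # spill optimizer state to host RAM
)
accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=deepspeed_plugin,
)

# model, optimizer, dataloader come from the usual setup code:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```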
It really shouldn't need DeepSpeed, though. I think it's because the VAE and T5/CLIP are all kept loaded during training.
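A rough sketch of that idea, assuming generic PyTorch modules; `vae`, `t5`, `clip`, and their call signatures are placeholders rather than this project's actual API: encode once with the frozen models, then move them off the GPU before LoRA training starts.

```python
# Hypothetical sketch: the encoder objects and their call signatures are
# placeholders, not this project's actual modules.
import gc
from typing import Sequence

import torch


def cache_conditioning(vae: torch.nn.Module,
                       t5: torch.nn.Module,
                       clip: torch.nn.Module,
                       images: torch.Tensor,
                       captions: Sequence[str]):
    """Encode once with the frozen models, then get them off the GPU."""
    with torch.no_grad():
        latents = vae.encode(images)   # image latents, reused every epoch
        t5_emb = t5(captions)          # T5 text embeddings
        clip_emb = clip(captions)      # pooled CLIP embeddings

    # The encoders stay frozen, so they are not needed during LoRA training.
    for module in (vae, t5, clip):
        module.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()
    return latents, t5_emb, clip_emb
```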
@arcanite24 40GB of RAM here.
#12 (comment) I deployed 3 x A40 48GB GPUs on RunPod for training and got the following results:
It seems that to train at 1024 resolution, we might need at least about 150GB of VRAM or more(?).
Just 30 hours on an Ada 6000, using the config from #12 (comment). Is that fine, or do I need to adjust something? 42GB used.
Looks like the proper speed.
I'm running on 8x A100 at 1024 image size, and it requires around 61GB on each GPU.
I'm running on 1x A100 at 1024 image size, and it requires around 63GB (65,147 MiB), even though I have preprocessed the data and only load the VAE and DiT model.
LoRA training runs out of video memory on a single A100 80GB graphics card. Any help would be much appreciated.
LoRA rank is 16 and batch size is 1.
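For reference, a hypothetical sketch of the usual VRAM-saving levers for a rank-16, batch-size-1 LoRA run; it assumes a diffusers-style transformer (for `enable_gradient_checkpointing()`) and the optional bitsandbytes package, neither of which is confirmed as part of this project's training script.

```python
# Hypothetical sketch: assumes a diffusers-style transformer exposing
# enable_gradient_checkpointing() and the optional bitsandbytes package.
import bitsandbytes as bnb
import torch


def configure_memory_savers(transformer: torch.nn.Module, lr: float = 1e-4):
    """Apply the usual VRAM-saving settings before the training loop."""
    # Recompute activations in the backward pass instead of storing them.
    transformer.enable_gradient_checkpointing()

    # 8-bit Adam keeps optimizer state in int8 rather than fp32, which is
    # usually the biggest saving once the base weights are frozen.
    lora_params = [p for p in transformer.parameters() if p.requires_grad]
    optimizer = bnb.optim.AdamW8bit(lora_params, lr=lr)
    return optimizer
```

Running the forward and backward passes under `torch.autocast("cuda", dtype=torch.bfloat16)` would roughly halve activation memory on top of this.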