I tried training on 6 NVIDIA L40 GPUs (46,068 MiB of memory each) with the recommended parameters:

```bash
python train.py model=deflow lr=2e-4 epochs=20 batch_size=16 loss_fn=deflowLoss
```

but I hit an out-of-memory error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 2; 44.31 GiB total capacity; 42.31 GiB already allocated; 10.31 MiB free; 43.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
```
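For reference, the `max_split_size_mb` hint in the error message refers to PyTorch's caching-allocator option, set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable. A generic sketch of how it would be applied (not something specific to this repo):

```python
import os

# Cap the allocator's split size to reduce fragmentation, as the OOM
# message suggests. Must be set before torch initializes CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so the allocator picks it up
```

That said, this setting mainly helps when reserved memory is much larger than allocated memory; here 42.31 GiB of the 43.33 GiB reserved is already allocated, so fragmentation does not look like the main problem.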
This presumably means my batch size is too large. I then checked the configuration reported in the original paper and found that it states:
> In Table I for the Argoverse 2 test dataset, our DeFlow implementation is as follows: We use four GRU iterations (as shown in Table II), and the model trains for a total of 50 epochs with a batch size of 80. All local experiments were executed on a desktop powered by an Intel® Core™ i9-12900KF and equipped with a GeForce RTX 3090 GPU.
I am curious how a batch size of 80 can run on a 24 GB RTX 3090, and whether the recommended configuration actually requires a GPU with more memory (e.g., 80 GB)?
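One way I could imagine fitting an effective batch of 80 on 24 GB is gradient accumulation. A minimal generic PyTorch sketch of what I mean (the names `accumulated_step`, `micro_batches`, etc. are hypothetical, and I don't know whether the authors actually train this way):

```python
import torch

def accumulated_step(model, optimizer, loss_fn, micro_batches):
    """Emulate one large batch with several small ones, e.g. 5 micro-batches
    of 16 samples behave (for the gradient) like a single batch of 80."""
    accum_steps = len(micro_batches)
    optimizer.zero_grad()
    for inputs, targets in micro_batches:
        # Scale each loss so the accumulated gradient matches the large-batch mean.
        loss = loss_fn(model(inputs), targets) / accum_steps
        loss.backward()  # gradients add up across micro-batches
    optimizer.step()
```

If something like this is what the paper's batch size of 80 refers to, that would explain how it fits on a single 3090; otherwise I would expect much larger memory requirements.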