Warning: data is not aligned! This can lead to a speed loss #13

Open
WhenMelancholy opened this issue Aug 14, 2023 · 3 comments

Comments

@WhenMelancholy

During training, I encountered the following warnings:

Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
[swscaler @ 0x641c700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x743a880] Warning: data is not aligned! This can lead to a speed loss
Epoch 0:   0%|          | 0/565 [00:00<?, ?it/s]
[swscaler @ 0x59d9700] Warning: data is not aligned! This can lead to a speed loss
[swscaler @ 0x6c7f880] Warning: data is not aligned! This can lead to a speed loss

Although this did not affect training, I don't understand what causes it. My training command is as follows:

CUDA_VISIBLE_DEVICES=2 PL_TORCH_DISTRIBUTED_BACKEND=gloo PYTHONPATH=.:$PYTHONPATH python train/train_vqgan.py dataset=mrnet dataset.root_dir="~/github/medicaldiffusion/data/MRNet-v1.0/" model=vq_gan_3d model.gpus=1 model.default_root_dir="~/github/medicaldiffusion/when/checkpoints/vq_gan" model.default_root_dir_postfix="mrnet" model.precision=16 model.embedding_dim=8 model.n_hiddens=16 model.downsample=[4,4,4] model.num_workers=32 model.gradient_clip_val=1.0 model.lr=3e-4 model.discriminator_iter_start=10000 model.perceptual_weight=4 model.image_gan_weight=1 model.video_gan_weight=1 model.gan_feat_weight=4 model.batch_size=2 model.n_codes=16384 model.accumulate_grad_batches=1 

These arguments are taken from train_vqgan.sh.

Thank you in advance!

@benearnthof

@WhenMelancholy This happened to me as well. As far as I know, this indicates that the number of images in your training data is not evenly divisible by the number of CUDA devices you're training on. This should have only a negligible impact on training as long as you're training on a single server. I believe this is a warning from PyTorch Lightning.

@xiexing0916

@benearnthof This happened to me as well. Could you please tell me how to debug it? Is it because the dataset size is not divisible by 16?

@benearnthof

There is no reason to debug anything, as this warning just indicates a minor inefficiency when scaling images. My prior statement may be incorrect: this most likely stems from one of the image dimensions not being divisible by 16. This should not impact the model, however.
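To make the "not divisible by 16" explanation concrete, here is a minimal sketch. It assumes the warning is tied to rows of packed pixel data whose size in bytes is not a multiple of 16 (the `[swscaler @ ...]` prefix comes from FFmpeg's libswscale, whose actual check may also consider buffer addresses, so this is an approximation, not a reproduction of it). The names `ALIGN`, `row_stride`, `is_aligned` and `padded_width` are hypothetical, for illustration only.

```python
# Hypothetical helpers: check whether a packed frame's row size is a multiple
# of an assumed 16-byte alignment, and find the smallest width that would be.
ALIGN = 16  # assumed alignment in bytes (not taken from this project's code)


def row_stride(width: int, bytes_per_pixel: int) -> int:
    """Bytes occupied by one row of packed pixel data, with no padding."""
    return width * bytes_per_pixel


def is_aligned(width: int, bytes_per_pixel: int, align: int = ALIGN) -> bool:
    """True if each row of the frame is a whole multiple of `align` bytes."""
    return row_stride(width, bytes_per_pixel) % align == 0


def padded_width(width: int, bytes_per_pixel: int, align: int = ALIGN) -> int:
    """Smallest width >= `width` whose row stride is a multiple of `align`."""
    w = width
    # Terminates: any width that is a multiple of `align` always qualifies.
    while not is_aligned(w, bytes_per_pixel, align):
        w += 1
    return w


if __name__ == "__main__":
    # A 256-pixel-wide grayscale row (1 byte/pixel) is 256 bytes: aligned.
    print(is_aligned(256, 1))
    # A 250-pixel-wide RGB row (3 bytes/pixel) is 750 bytes: not aligned.
    print(is_aligned(250, 3), padded_width(250, 3))
```

Under this assumption, a 250-pixel-wide RGB frame would need to be padded to 256 pixels to avoid the slow path, whereas widths such as 256 are already aligned for any pixel size.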
