
python train.py Loss not decreasing #26

Closed
cclamd opened this issue Jan 3, 2024 · 4 comments
cclamd commented Jan 3, 2024

Hi, I added the data according to the README, but when I run python train.py it shows:

class screw
args1.json defaultdict(<class 'str'>, {'img_size': [256, 256], 'Batch_Size': 2, 'EPOCHS': 300, 'T': 1000, 'base_channels': 128, 'beta_schedule': 'linear', 'loss_type': 'l2', 'diffusion_lr': 0.0001, 'seg_lr': 1e-05, 'random_slice': True, 'weight_decay': 0.0, 'save_imgs': True, 'save_vids': False, 'dropout': 0, 'attention_resolutions': '32,16,8', 'num_heads': 4, 'num_head_channels': -1, 'noise_fn': 'gauss', 'channels': 3, 'mvtec_root_path': '/content/drive/MyDrive/DiffusionAD/datasets/mvtec', 'visa_root_path': 'datasets/VisA_1class/1cls', 'dagm_root_path': 'datasets/dagm', 'mpdd_root_path': 'datasets/mpdd', 'anomaly_source_path': '/content/drive/MyDrive/DiffusionAD/datasets/dtd', 'noisier_t_range': 600, 'less_t_range': 300, 'condition_w': 1, 'eval_normal_t': 200, 'eval_noisier_t': 400, 'output_path': 'outputs', 'arg_num': '1'})
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Epoch:0, Train loss: nan: 1% 1/160 [00:04<12:14, 4.62s/it]thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/309.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/309.png
Epoch:0, Train loss: nan: 1% 2/160 [00:06<08:03, 3.06s/it]thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/151.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/151.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/023.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/023.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/180.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/180.png
Epoch:0, Train loss: nan: 2% 3/160 [00:08<06:13, 2.38s/it]thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/015.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/015.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/292.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/292.png
Epoch:0, Train loss: nan: 2% 4/160 [00:09<05:21, 2.06s/it]thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/113.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/113.png
thresh_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/DISthresh/good/152.png
image_path /content/drive/MyDrive/DiffusionAD/datasets/mvtec/screw/train/good/152.png

I printed image_path and thresh_path, and the paths are correct, so why doesn't the loss decrease?

HuiZhang0812 (Owner) commented

The default batch size is 16. With a small batch size such as the 2 you set, an entire batch may consist solely of abnormal samples, which breaks the calculation of the paper's loss formula (Formula 9).
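A minimal sketch of how this failure mode can produce NaN, assuming (as an illustration, not the paper's exact Formula 9) a loss term that normalizes by the number of normal samples in the batch:

```python
import torch

def normal_only_mse(pred, target, is_normal):
    """Hypothetical loss that averages MSE only over normal samples
    (an illustrative stand-in for a term like the paper's Formula 9)."""
    per_sample = ((pred - target) ** 2).flatten(1).mean(dim=1)  # shape (B,)
    mask = is_normal.float()
    # When every sample in the batch is abnormal, mask.sum() == 0
    # and the division yields 0/0 = nan, which then poisons training.
    return (per_sample * mask).sum() / mask.sum()

pred, target = torch.randn(2, 3, 8, 8), torch.randn(2, 3, 8, 8)
loss = normal_only_mse(pred, target, torch.tensor([0, 0]))  # all abnormal
print(torch.isnan(loss))  # tensor(True)
```

With batch size 2 such an all-abnormal batch is common, and a single NaN gradient step leaves the reported running loss at nan for the rest of the epoch.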

cclamd commented Jan 4, 2024

OK, thanks. What is the minimum batch size I should set for the loss to decrease? Do I have to set it to 16?


HuiZhang0812 (Owner) commented

If your GPU RAM is sufficiently large, setting the batch size to 16 is recommended.

@cclamd
Copy link
Author

cclamd commented Jan 8, 2024

Thanks. I tried several values and found that batch size = 6 is the minimum that works.

[Screenshot: Colab training output]
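For intuition on why a larger batch helps: if each training sample is independently given a synthetic anomaly with probability p (p = 0.5 is assumed here purely for illustration; the repo's actual augmentation rate may differ), the chance that a whole batch is abnormal shrinks geometrically with batch size:

```python
def p_all_abnormal(p: float, batch_size: int) -> float:
    """Probability that every sample in a batch is abnormal, assuming
    independent per-sample anomaly synthesis with probability p."""
    return p ** batch_size

# With p = 0.5: batch 2 -> 25% of batches are all-abnormal,
# batch 6 -> ~1.6%, batch 16 -> ~0.0015%.
for b in (2, 6, 16):
    print(b, p_all_abnormal(0.5, b))
```

This is consistent with the observation above: at batch size 2 an all-abnormal batch (and hence a NaN step) is near-certain within the first epoch, while at 6 it becomes rare and at 16 practically negligible.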
