Robust SqueezeNet for ImageNet #89

Open
GlebSBrykin opened this issue Jan 13, 2021 · 10 comments

@GlebSBrykin

At the moment, I have managed to run your library on my computer with a GPU, and I would like to train a robust SqueezeNet 1.1 on the ImageNet dataset. But I ran into a problem: ImageNet is no longer available for download. I managed to get the training and validation parts from academictorrents, but I could not find the devkit archive. Please upload that archive here; it is only 2.5 MB. Without it, I cannot start training...☹️

@dtsip
Contributor

dtsip commented Jan 13, 2021

As far as I know, ImageNet is still available for download (under certain terms and conditions) from here.

That being said, I do not fully understand why the training and validation images are not enough to train a model.

@GlebSBrykin
Author

As I understand it, that file contains the annotations for the validation part of the dataset, so the model cannot be evaluated without it. But the problem seems to be solved: robustness does not use torchvision.datasets.ImageNet. I am now preparing the dataset for robustness.
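For reference, the setup I am aiming for looks roughly like this (a rough sketch, assuming the usual train/ and val/ ImageFolder layout; the paths are just mine):

```python
from robustness.datasets import ImageNet

# Expects /data/imagenet/train and /data/imagenet/val, each containing one
# sub-folder of images per wnid class (standard ImageFolder layout).
ds = ImageNet('/data/imagenet')
train_loader, val_loader = ds.make_loaders(workers=4, batch_size=32)
```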

@GlebSBrykin
Author

Well, I managed to prepare the dataset and start training through the robustness CLI. But there is a problem: the maximum batch size that fits is 4, and that is with SqueezeNet 1.1 on a machine with 8 GB of RAM and 3 GB of video memory. Why is the memory consumption so high? Is there any way to fix it? One epoch on ImageNet takes 24 hours, and it would be nice to increase the batch size.

@dtsip
Contributor

dtsip commented Jan 14, 2021

It is possible to count the model parameters and measure the activations produced during the forward and backward passes to see directly what is consuming memory.
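For example, something along these lines (a rough sketch that only counts parameters and forward activations via hooks, ignoring optimizer state and data loading) should give a first-order picture:

```python
import torch
from torchvision.models import squeezenet1_1

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = squeezenet1_1().to(device)

# Parameter memory (float32 = 4 bytes per element).
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M (~{n_params * 4 / 1e6:.0f} MB)")

# Record the size of every leaf module's output during one forward pass.
act_elems = []
hooks = [m.register_forward_hook(lambda mod, inp, out: act_elems.append(out.numel()))
         for m in model.modules() if len(list(m.children())) == 0]

x = torch.randn(4, 3, 224, 224, device=device)  # batch size 4, as in your run
model(x)
for h in hooks:
    h.remove()

print(f"forward activations: ~{sum(act_elems) * 4 / 1e6:.0f} MB "
      f"(these are kept around for the backward pass)")
```

The activation term scales roughly linearly with batch size, so this also gives a sense of how far the batch size can grow on a 3 GB card.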

Unfortunately, we do not have the capacity to investigate this further, especially since it is not an issue directly related to the library, but rather standard DNN training.

@GlebSBrykin
Author

Okay, then I'll rephrase the question a little: does robust training put an extra load on memory compared to regular training?

@dtsip
Contributor

dtsip commented Jan 14, 2021

Nope. Robust learning requires additional passes through the model, but the memory footprint is essentially the same. (When using use_best for PGD there will be an additional copy of the input stored, but this is tiny compared to the size of saving the model activations, which is where the real memory consumption happens.)
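Schematically, what the attack does per batch looks like this (a simplified sketch of L2 PGD, not the exact robustness attacker; step sizes and bookkeeping differ in the library):

```python
import torch
import torch.nn.functional as F

def l2_pgd(model, x, y, eps, step_size, steps, use_best=True):
    """Simplified L2 PGD for 4-D image batches (sketch only)."""
    delta = torch.zeros_like(x, requires_grad=True)
    best_adv = x.clone()
    best_loss = torch.full((x.size(0),), -float('inf'), device=x.device)

    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y, reduction='none')
        # The graph for this forward pass is freed by backward(), so peak
        # activation memory matches one standard forward/backward.
        loss.sum().backward()
        with torch.no_grad():
            if use_best:
                # Only an input-sized copy is kept -- negligible next to activations.
                improved = loss > best_loss
                best_loss = torch.where(improved, loss, best_loss)
                best_adv[improved] = (x + delta)[improved]
            # Normalized gradient ascent step on the loss.
            g = delta.grad
            g = g / g.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta += step_size * g
            # Project back onto the L2 ball of radius eps.
            norms = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta *= (eps / norms).clamp(max=1)
            delta.grad.zero_()

    return best_adv if use_best else (x + delta).detach()
```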

@GlebSBrykin
Author

Good. Then what settings would you recommend I use? I mean learning_rate and so on. I will train on the standard ImageNet.

@GlebSBrykin
Author

One more question: is it possible to compute different parts of the network on different devices? For example, for VGG19, the convolutional part would run on the GPU and the fully connected part on the CPU.

@dtsip
Contributor

dtsip commented Jan 29, 2021

We typically train robust models with the same parameters as their standard version. So I would start by using the parameters used for a standard SqueezeNet on ImageNet.
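With the CLI that would look something like the following (a sketch: squeezenet1_1 as the --arch, as in your runs; the lr / batch-size / epochs values are placeholders to be replaced with whatever the standard SqueezeNet recipe uses, and the attack settings are just example L2 values):

```
python -m robustness.main \
    --dataset imagenet --data /data/imagenet \
    --arch squeezenet1_1 --out-dir ./logs \
    --adv-train 1 --constraint 2 --eps 3.0 \
    --attack-lr 0.5 --attack-steps 7 \
    --lr 0.04 --batch-size 256 --epochs 90
```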

Yes, it should be possible to use different devices. It might require modifying the training code a bit, though, since this is not a typical use case.
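As a rough illustration of the kind of change involved (a sketch using torchvision's vgg19, not something the library supports out of the box):

```python
import torch
from torchvision.models import vgg19

class SplitVGG(torch.nn.Module):
    """Convolutional features on the GPU, fully connected head on the CPU."""
    def __init__(self):
        super().__init__()
        base = vgg19()
        self.features = base.features.to('cuda')
        self.avgpool = base.avgpool.to('cuda')
        self.classifier = base.classifier  # stays on the CPU

    def forward(self, x):
        x = self.avgpool(self.features(x.to('cuda')))
        x = torch.flatten(x, 1).to('cpu')  # move activations to the CPU for the head
        return self.classifier(x)

model = SplitVGG()
logits = model(torch.randn(2, 3, 224, 224))
```

The training loop would then also need the targets and loss on the CPU side, which is the part of the training code that would need adjusting; expect the CPU head to be the throughput bottleneck.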

@GlebSBrykin
Author

So, I keep trying to train the robust SqueezeNet. I decided to use RestrictedImageNet due to limited resources, but I have a problem: the robust SqueezeNet 1.1 seems to behave incorrectly. When I try to use it for style transfer, the loss is always NaN, even at the first iteration, before the image update and the optimizer step. The same code was tested with a regular SqueezeNet from the PyTorch repository and no problems were observed. I also do not know how normal it is that the training loss only decreases from 1.6000 to 1.5500 per epoch; in my opinion, that is too little. The parameters are: lr = 0.01, attack-lr = 0.05, attack-steps = 7, eps = 3.0, batch-size = 4, constraint = 2.

And one more question: is it possible to extract from ImageNet only the data that is used when training on RestrictedImageNet? I would like to train the model in Google Colab.
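What I have in mind is roughly this (a sketch; the (start, end) class-index ranges below are placeholders, and the real inclusive ranges are those in the RestrictedImageNet definition in the robustness source, which refer to the 1000 wnid folders in sorted order):

```python
import shutil
from pathlib import Path

# Placeholder ranges -- replace with the actual RestrictedImageNet ranges.
RANGES = [(151, 268), (281, 285)]

src = Path('/data/imagenet')                    # full ImageNet: train/ and val/ wnid folders
dst = Path('/data/restricted_imagenet_subset')  # smaller copy to upload to Colab

for split in ('train', 'val'):
    wnids = sorted(p.name for p in (src / split).iterdir() if p.is_dir())
    keep = {wnids[i] for lo, hi in RANGES for i in range(lo, hi + 1)}
    for wnid in wnids:
        # Keep empty folders for skipped classes so the sorted class indices
        # (which the ranges refer to) match the full dataset.
        (dst / split / wnid).mkdir(parents=True, exist_ok=True)
    for wnid in keep:
        shutil.copytree(src / split / wnid, dst / split / wnid, dirs_exist_ok=True)
```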
