Modify image size and training for Inception Models #425
Conversation
Thanks @soumendukrg! Please rebase the PR on top of the 'master' branch, which now supports PyTorch 1.3. Cheers
Commits in this PR:
- pytorch 1.3 and torchvision 0.4: initial adaptations. 1. Change requirements.txt (new version numbers for pytorch and torchvision). 2. Change onnx op.uniqueName to op.debugName. See: lanpa/tensorboardX@9084ab8, lanpa/tensorboardX#483
- Fixes in SummaryGraph and related tests for PyTorch 1.3: naming of trace entries changed in 1.3; one such change is that the "root" input of the model is now named 'input.1' instead of just '0' (fixed the test that checked this). One of the workarounds for scope names after the ONNX pass is no longer needed; removed it and updated the relevant test.
- adjust full_flow_tests.py for pytorch 1.3 results
- Move to PyTorch 1.3.1 and torchvision 0.4.2 + fix full_flow tests
- Unit tests: adjust tolerance in test_sim_bn_fold + filter some warnings
- Updated expected acc for system tests
- fixed image_size and training loss, accuracy for inception models
- Revert "fixed image_size and training loss, accuracy for inception models" (reverts commit fbbd351)
- Revert "Revert "fixed image_size and training loss, accuracy for inception models"" (reverts commit ed895e6)
- new files: tests/layer_quant_params.yaml, tests/quant_stats_after_prepare_model.yaml
- delete full_system_log, generated yaml files
- fix typos
@nzmora I have rebased the PR as you suggested. Please review the changes. Thanks. NOTE: Training using Inception_V3 is only possible on a single GPU as of now. This problem is discussed in [inception_v3 of vision 0.3.0 does not fit in DataParallel of torch 1.1.0 #1048](pytorch/vision#1048); I have checked, and it persists in torch 1.3.0.
Thanks @soumendukrg! We've looked at your PR and we are thinking about it. The fundamental problem that you've uncovered in our API is that our assumption that data loading depends purely on the dataset is... wrong. When we load a dataset, we also pre-process it, and the pre-processing is model-specific in corner cases (as you've shown with Inception v3). So we have to go and change the relevant API functions (including their related helper functions). A rather small functional change can create a large code-change ripple when the API is not right :-(. Thanks!
@nzmora Further, I solved distributed training of the Inception and GoogLeNet networks using the solution posted in the PyTorch issue I referred to earlier.
I discovered another flaw in Inception retraining: Inception_V3 transforms (normalizes) the input inside the network itself:
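(For reference, the snippet below reproduces the relevant logic from torchvision 0.4.x's `Inception3.forward`; it runs when the model is built with `transform_input=True`, which is the default for the pretrained model.)

```python
# Inside Inception3.forward (torchvision 0.4.x), when transform_input=True:
# undo the standard ImageNet mean/std normalization and re-map each channel
# to the [-1, 1] range the original Inception weights were trained with.
if self.transform_input:
    x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
    x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
    x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
    x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
```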
One can find more details in a PyTorch forum post explaining why this is done. This needs to be reviewed; I am thinking of an efficient way to solve it. Any thoughts @nzmora @guyjacob?
Hi @soumendukrg, so how do we handle this issue? I need to sleep on it. Let me know what you think.
@barrh - please take a look and give us your opinion.
First, thanks for bringing this up @soumendukrg. It is very problematic that these issues with using Inception models go unnoticed, without any useful warning. To my understanding, the following are model-dependent:
1. The input image size
2. The input normalization parameters
3. The loss computation (the auxiliary classifiers)
The dataset loading functionality should be kept generic; therefore, I would pass (1) and (2) as orthogonal arguments. We can write a small lookup utility, based on the model name/type, to find the matching values for those, e.g. as sketched below. Regarding the loss computation (3), since it's done in compress_classifier.py, I think the implementation is fine (great documentation!), but moving it into a separate function would be better, to clean up the main loop.
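A minimal sketch of such a lookup (the function name, dict layout, and defaults here are illustrative assumptions, not Distiller's actual API):

```python
# Illustrative sketch: map a model name to its model-specific input size and
# normalization parameters; anything not listed falls back to the common
# ImageNet defaults.
_IMAGENET_DEFAULTS = {'input_size': 224,
                      'mean': (0.485, 0.456, 0.406),
                      'std': (0.229, 0.224, 0.225)}

_MODEL_OVERRIDES = {
    # Inception models require 299x299 inputs
    'inception_v3': {'input_size': 299},
}

def classification_input_config(arch):
    """Return the input pre-processing configuration for a model name."""
    config = dict(_IMAGENET_DEFAULTS)
    config.update(_MODEL_OVERRIDES.get(arch, {}))
    return config
```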
This is based on PR #425 (@soumendukrg). The file data_loader had a fixed classification image size for ImageNet of [1, 3, 224, 224]. However, all Inception models require an input image size of [1, 3, 299, 299]. Thus, we need to change the pre-processing. This commit does not include the other required fixes (loss handling; specialized input normalization). Co-authored-by: Soumendu Kumar Ghosh <soumendu@ece.iitkgp.ernet.in>
Hi @soumendukrg, @barrh, please take a look at the inception_support branch I pushed. It is basically a slightly more generic way to replace the fixed image size assumed by the data loaders. Thanks!
Hi @soumendukrg, Some further thoughts (just doing the math, so we have this on record): From looking at the Inception papers I could not deduce the preprocessing they recommend. However, looking at some Keras and TF code, I understand that the standard way of preprocessing Inception input is: let `y = (x - 0.5) / 0.5`, where `x` is the input image with pixels scaled to [0, 1]; i.e., the pixels are mapped to the range [-1, 1]. Assuming inputs `x'` that were already normalized with the standard ImageNet statistics, `x' = (x - mean) / std`, we can recover the Inception form by computing `y = x' * (std / 0.5) + (mean - 0.5) / 0.5`; if we expand this we get: `y = (x - mean) / 0.5 + (mean - 0.5) / 0.5 = (x - 0.5) / 0.5`. And this formula matches the code you gave above, with `mean = (0.485, 0.456, 0.406)` and `std = (0.229, 0.224, 0.225)` applied per channel.
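As a sanity check, the equivalence can be verified numerically (a quick sketch, not part of the PR):

```python
import torch

# Per-channel ImageNet statistics used by the standard TorchVision transforms.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

x = torch.rand(3, 299, 299)               # pixel values in [0, 1]
x_imagenet = (x - mean) / std             # standard ImageNet normalization
x_recovered = x_imagenet * (std / 0.5) + (mean - 0.5) / 0.5
x_inception = (x - 0.5) / 0.5             # "standard" Inception preprocessing

# Both paths produce the same tensor (up to floating-point error).
assert torch.allclose(x_recovered, x_inception, atol=1e-6)
```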
Despite what I wrote above about the "standard" Inception preprocessing being different from that of other models, I think that this is not true for TorchVision's Inception.
The TorchVision documentation states that all pre-trained models expect input images normalized in the same way, using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`. This is quite explicit.
And if you reverse-engineer the Cadene code you see that 'inceptionv3' from Cadene uses 'inception_v3' from TorchVision! One of them is incorrect... So I ran two evaluations of inception_v3, one with each type of preprocessing, and the run using the standard TorchVision preprocessing came out closer to the published accuracy.
The TorchVision documentation claims we should be expecting Top1: 77.45 and Top5: 93.56. This is the second indication that we should be using only the torchvision preprocessing. So I changed the invocation of the Inception models such that the internal input transformation is disabled and only the standard TorchVision preprocessing is applied (see the sketch below). Cheers
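(Assuming the change is made via torchvision's `transform_input` constructor argument, the invocation would look something like this:)

```python
import torchvision

# Disable the internal re-normalization so that the standard TorchVision
# ImageNet preprocessing is the only normalization applied to the input.
model = torchvision.models.inception_v3(pretrained=True, transform_input=False)
```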
Hi @nzmora, sorry I took so long to reply. I have reviewed the branch you created, and it definitely handles the Inception models in an efficient way. I am rewriting my code using your branch and will make a separate function for the loss computation, per @barrh's suggestion. Regarding the normalization, thanks for taking a deeper look into this issue. I got the exact same results as you did when I used the two different types of preprocessing. In fact, the GoogLeNet results match exactly the ones you reported with the Inception preprocessing. I will update the PR soon.
@nzmora, I have updated the PR with all the Inception-related changes you made in your inception_support branch. Additionally, I have created a separate, thoroughly documented function for the training-loss calculation of Inception networks. I have also merged it with the current master, as you can already see. Please review and let me know if I need to make any changes. Thanks.
* Merge pytorch 1.3 commits

This PR is a fix for issue #422.
1. ImageNet models usually use an input size of [batch, 3, 224, 224], but all Inception models require an input image size of [batch, 3, 299, 299].
2. Inception models have auxiliary branches which contribute to the loss only during training. The reported classification loss considers only the main classification loss.
3. Inception_V3 normalizes the input inside the network itself. More details can be found in @soumendukrg's comments on PR #425.

NOTE: Training using Inception_V3 is only possible on a single GPU as of now. I have checked, and this problem persists in torch 1.3.0: [inception_v3 of vision 0.3.0 does not fit in DataParallel of torch 1.1.0 #1048](pytorch/vision#1048)

Co-authored-by: Neta Zmora <neta.zmora@intel.com>
This PR is a fix for issue #422.
The file data_loader had a fixed classification image size for ImageNet of [1, 3, 224, 224]. However, all Inception models require an input image size of [1, 3, 299, 299].
To fix this issue, I modified the apputils/image_classifier.py file to add a new parameter to the load_data function. This function calls apputils.load_data, so I changed the corresponding function in apputils/data_loader.py.
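A rough sketch of the kind of plumbing this describes (the parameter name `input_size` and the helper name are illustrative; the actual signatures in the PR may differ):

```python
import torchvision.transforms as transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def imagenet_train_transform(input_size=224):
    # The crop size is a parameter instead of a hard-coded 224, so that
    # Inception models can request 299x299 inputs.
    return transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])
```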
Also, image_classifier.py is modified to consider the losses from both the main classifier and the aux_logits classifier of the Inception network during training, while the classification accuracy is calculated only from the main classifier; a sketch follows below.
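In outline, the loss handling looks something like this (a sketch; the 0.4 weight on the auxiliary loss follows common Inception-v3 practice, and all names here are illustrative rather than the exact code in the PR):

```python
def inception_training_loss(output, target, criterion, aux_weight=0.4):
    # In training mode, torchvision's Inception models return both the main
    # logits and the auxiliary logits. Only the main head contributes to the
    # reported classification loss and accuracy; the auxiliary loss is added
    # with a down-weighting factor for the backward pass.
    if isinstance(output, tuple):
        main_logits, aux_logits = output[0], output[1]
        loss = criterion(main_logits, target) + aux_weight * criterion(aux_logits, target)
        return loss, main_logits
    return criterion(output, target), output
```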
@nzmora: Please review the changes.
PS: My fork is based on PyTorch 1.3, so all additional changes for PyTorch 1.3 support are present in this PR.