Re-training on the Cat/Dog Dataset #370

Closed

FingerVonFrings opened this issue Jul 23, 2019 · 16 comments

Comments

@FingerVonFrings commented Jul 23, 2019

Hello

I get an error at the "Processing Images with TensorRT" step. Please see the error below.

306lab:~/jetson-inference/python/training/imagenet$ imagenet-console.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt $DATASET/test/cat/011.jpg wgoutput011.jpg

jetson.inference.__init__.py
jetson.inference -- initializing Python 2.7 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 2.7 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 2.7 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 2.7 binding initialization
[image] loaded '/home/hfut/datasets/cat_dog/test/cat/011.jpg' (700 x 525, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyImageNet_Init()
jetson.inference -- imageNet loading network using argv command line params
jetson.inference -- imageNet.init() argv[0] = '--model=cat_dog/resnet18.onnx'
jetson.inference -- imageNet.init() argv[1] = '--input_blob=input_0'
jetson.inference -- imageNet.init() argv[2] = '--output_blob=output_0'
jetson.inference -- imageNet.init() argv[3] = '--labels=/home/hfut/datasets/cat_dog/labels.txt'

imageNet -- loading classification network model from:
-- prototxt (null)
-- model cat_dog/resnet18.onnx
-- class_labels /home/hfut/datasets/cat_dog/labels.txt
-- input_blob 'input_0'
-- output_blob 'output_0'
-- batch_size 1

[TRT] TensorRT version 5.0.6
[TRT] loading NVIDIA plugins...
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading /usr/bin/ cat_dog/resnet18.onnx

Input filename: cat_dog/resnet18.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.1
Domain:
Model version: 0
Doc string:

WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
While parsing node number 69 [Gather -> "192"]:
ERROR: /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.0/parsers/onnxOpenSource/ModelImporter.cpp:142 In function importNode:
[8] No importer registered for op: Gather
[TRT] failed to parse ONNX model 'cat_dog/resnet18.onnx'
[TRT] device GPU, failed to load cat_dog/resnet18.onnx
[TRT] failed to load cat_dog/resnet18.onnx
[TRT] imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
PyTensorNet_Dealloc()
Traceback (most recent call last):
File "/usr/local/bin/imagenet-console.py", line 53, in
net = jetson.inference.imageNet(opt.network, argv)
Exception: jetson.inference -- imageNet failed to load network
jetson.utils -- freeing CUDA mapped memory

However, if I download the completed model that was trained for a full 100 epochs from here:
then the "Processing Images with TensorRT" step works fine.
I notice it generates a resnet18.onnx.1.1.GPU.FP16.engine file,
but when I use my own model, this file is not generated.
Any help? Thanks

@dusty-nv (Owner) commented Jul 23, 2019

Hi @FingerVonFrings, this issue was fixed by patching the ResNet-18 model definition in my fork of torchvision with this commit: dusty-nv/vision@5c46136

So you may want to uninstall the torchvision package and re-install it from my fork:

$ sudo pip uninstall torchvision
$ python -c "import torchvision"   # should raise an error if successfully uninstalled
$ git clone -b v0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python setup.py install

Then you should be able to train again. At first you can try training for just a couple of epochs, then run the onnx_export.py script and try imagenet-console again to make sure it works before doing more training.
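
For reference, the export step boils down to a call to torch.onnx.export. Below is a minimal sketch of that idea, assuming a 2-class ResNet-18 checkpoint like the one produced by the cat/dog training; the checkpoint path and the 'state_dict' layout are assumptions for illustration, not the exact arguments of onnx_export.py:

import torch
import torchvision.models as models

# rebuild the ResNet-18 architecture with 2 output classes (cat, dog)
model = models.resnet18(pretrained=False)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# load the re-trained weights (hypothetical checkpoint path and layout)
checkpoint = torch.load('cat_dog/model_best.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# export to ONNX using the blob names passed to imagenet-console
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, 'cat_dog/resnet18.onnx',
                  input_names=['input_0'], output_names=['output_0'],
                  verbose=True)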

@FingerVonFrings (Author)

It does work. Thank you so much for your reply and advice!!!

@duttasantanuGH

Hi dusty-nv,
First of all, thanks for your comprehensive and well-curated resource guide. I am facing the following error while following your installation instructions above. Kindly help me resolve it.

dlinano@jetson-nano:~/sd/jetson-inference/build/vision$ sudo python setup.py install
Traceback (most recent call last):
File "setup.py", line 6, in
from setuptools import setup, find_packages
ImportError: No module named setuptools

Setuptools is already installed, but I am still getting this error.

Thanks Santanu

@dusty-nv (Owner) commented Jul 30, 2019 via email

@duttasantanuGH

Yes, it is working properly in the interactive tool. I faced the same issue mentioned in this thread and hence need to install it again.

@duttasantanuGH

Do you want me to install it using the interactive app? Previously I installed the python3 version using the interactive app, but faced the same issue as FingerVonFrings.

@dusty-nv (Owner) commented Jul 30, 2019 via email

@duttasantanuGH

Yes, my python is mapped to python3. As mentioned earlier, in the interactive tool I can see that setuptools can be imported successfully.
dlinano@jetson-nano:~/sd/jetson-inference/build/vision$ python
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import setuptools
>>> print(setuptools.__version__)
41.0.1

But when I try to run the following commands, I get the error mentioned before:
$ sudo pip uninstall torchvision
$ python -c "import torchvision"   # should raise an error if successfully uninstalled
$ git clone -b v0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python setup.py install

If I use your PyTorch installer, PyTorch gets installed properly, and I have done the post-installation checks you suggested to verify it. I installed the Python 3 compatible version of torchvision using the interactive tool. Retraining also runs without any issues, and I am able to convert the ONNX file. But then at the "Processing Images with TensorRT" step I face the same error mentioned in this thread.

To overcome this issue, I tried to uninstall and re-install torchvision as you suggested, but at that stage I am facing this setuptools error. Hope this clarifies.

Please help in resolving the issue.
Thanks and Regards
Santanu

@duttasantanuGH

Hi dusty,
You are absolutely right that the python mapping was not correct. I was setting the mapping for the session, but that was not effective for sudo.
I have corrected it, but now I am getting the following error. Can you please help me resolve it?
Installed /usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg
Processing dependencies for torchvision==0.3.0
Searching for torch>=1.1.0
Reading https://pypi.org/simple/torch/
No local packages or working download links found for torch>=1.1.0
error: Could not find suitable distribution for Requirement.parse('torch>=1.1.0')

Thanks Santanu

@dusty-nv (Owner) commented Jul 31, 2019 via email

@duttasantanuGH

Thank you Dusty for your kind advice. It worked like a charm yesterday. I ran it manually.

@ghost commented Sep 11, 2019

Hello Dusty. I faced the same problem. At first I used python3 to install torchvision and torch; after that failed, I tried using python2 to re-train and run the onnx_export.py script. Everything worked fine up to that point, but when I tried imagenet-console again I still got the same error. Could you help me?

Here is the error:
jetson.inference.__init__.py
jetson.inference -- initializing Python 2.7 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 2.7 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 2.7 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 2.7 binding initialization
[image] loaded '/home/krsbi/datasets/cat_dog/test/dog/01.jpg' (500 x 375, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyImageNet_Init()
jetson.inference -- imageNet loading network using argv command line params
jetson.inference -- imageNet.init() argv[0] = '--model=cat_dog/resnet18.onnx'
jetson.inference -- imageNet.init() argv[1] = '--input_blob=input_0'
jetson.inference -- imageNet.init() argv[2] = '--output_blob=output_0'
jetson.inference -- imageNet.init() argv[3] = '--labels=~/datasets/cat_dog/labels.txt'

imageNet -- loading classification network model from:
-- prototxt (null)
-- model cat_dog/resnet18.onnx
-- class_labels ~/datasets/cat_dog/labels.txt
-- input_blob 'input_0'
-- output_blob 'output_0'
-- batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT] loading network profile from engine cache... cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT] device GPU, cat_dog/resnet18.onnx loaded
[TRT] device GPU, CUDA engine context initialized with 2 bindings
[TRT] binding -- index 0
-- name 'input_0'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 224 (SPATIAL)
-- dim #2 224 (SPATIAL)
[TRT] binding -- index 1
-- name 'output_0'
-- type FP32
-- in/out OUTPUT
-- # dims 1
[TRT] warning -- unknown nvinfer1::DimensionType (127)
-- dim #0 2 (UNKNOWN)
[TRT] binding to input 0 input_0 binding index: 0
[TRT] binding to input 0 input_0 dims (b=1 c=3 h=224 w=224) size=602112
[TRT] binding to output 0 output_0 binding index: 1
[TRT] binding to output 0 output_0 dims (b=1 c=2 h=1 w=1) size=8
device GPU, cat_dog/resnet18.onnx initialized.
[TRT] cat_dog/resnet18.onnx loaded
imageNet -- failed to find ~/datasets/cat_dog/labels.txt
imageNet -- failed to load synset class descriptions (0 / 0 of 2)
[TRT] imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
PyTensorNet_Dealloc()
Traceback (most recent call last):
File "imagenet-console.py", line 53, in
net = jetson.inference.imageNet(opt.network, argv)
Exception: jetson.inference -- imageNet failed to load network
jetson.utils -- freeing CUDA mapped memory

@kirkchu commented Feb 1, 2020

imageNet -- failed to find ~/datasets/cat_dog/labels.txt

replace "~" to "/home/your_name/datasets/cat_dog/labels.txt"

@junjunjansent

@dusty-nv

I have a similar problem where it fails to parse the ONNX model 'cat_dog/resnet18.onnx'. Similarly, if I download the model trained for the full 100 epochs, there is no issue.

I followed your instruction to uninstall the torchvision package (using pip3 in my case) and re-install it from your fork with "git clone -b v0.3.0 https://github.com/dusty-nv/vision" (previously my torchvision version was '0.5.0a0+85b8fbf'), but it still did not work.
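
As a sanity check, it can help to confirm which torchvision build the interpreter is actually importing after the re-install. A minimal snippet; the expected version string is an assumption based on the fork's v0.3.0 branch:

import torchvision

# confirm which installed copy of torchvision this interpreter is importing
print(torchvision.__version__)   # should report 0.3.0 if the fork's v0.3.0 branch is in use
print(torchvision.__file__)      # path of the package actually being imported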

Hope you are able to advise, as I am using this to train a different model as well. I am wondering whether this is due to an updated TensorRT version.

Thank you so much.

I followed the command provided for imagenet exactly (https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md):

~/jetson-inference/python/training/classification$ imagenet.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt $DATASET/test/cat/01.jpg cat.jpg

Output:

jetson.inference -- imageNet loading network using argv command line params

imageNet -- loading classification network model from:
         -- prototxt     (null)
         -- model        cat_dog/resnet18.onnx
         -- class_labels /home/jansen/datasets/cat_dog/labels.txt
         -- input_blob   'input_0'
         -- output_blob  'output_0'
         -- batch_size   1

[TRT]    TensorRT version 6.0.1
[TRT]    loading NVIDIA plugins...
[TRT]    Plugin Creator registration succeeded - GridAnchor_TRT
[TRT]    Plugin Creator registration succeeded - GridAnchorRect_TRT
[TRT]    Plugin Creator registration succeeded - NMS_TRT
[TRT]    Plugin Creator registration succeeded - Reorg_TRT
[TRT]    Plugin Creator registration succeeded - Region_TRT
[TRT]    Plugin Creator registration succeeded - Clip_TRT
[TRT]    Plugin Creator registration succeeded - LReLU_TRT
[TRT]    Plugin Creator registration succeeded - PriorBox_TRT
[TRT]    Plugin Creator registration succeeded - Normalize_TRT
[TRT]    Plugin Creator registration succeeded - RPROI_TRT
[TRT]    Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT]    Could not register plugin creator:  FlattenConcat_TRT in namespace: 
[TRT]    detected model format - ONNX  (extension '.onnx')
[TRT]    desired precision specified for GPU: FASTEST
[TRT]    requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]    native precisions detected for GPU:  FP32, FP16
[TRT]    selecting fastest native precision for GPU:  FP16
[TRT]    attempting to open engine cache file cat_dog/resnet18.onnx.1.1.6001.GPU.FP16.engine
[TRT]    cache file not found, profiling network model on device GPU
[TRT]    device GPU, loading /usr/bin/ cat_dog/resnet18.onnx
----------------------------------------------------------------
Input filename:   cat_dog/resnet18.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
While parsing node number 0 [Conv -> "123"]:
--- Begin node ---
input: "input_0"
input: "0.conv1.weight"
output: "123"
op_type: "Conv"
attribute {
  name: "dilations"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "group"
  i: 1
  type: INT
}
attribute {
  name: "kernel_shape"
  ints: 7
  ints: 7
  type: INTS
}
attribute {
  name: "pads"
  ints: 3
  ints: 3
  ints: 3
  ints: 3
  type: INTS
}
attribute {
  name: "strides"
  ints: 2
  ints: 2
  type: INTS
}

--- End node ---
ERROR: ModelImporter.cpp:296 In function importModel:
[5] Assertion failed: tensors.count(input_name)
[TRT]    failed to parse ONNX model 'cat_dog/resnet18.onnx'
[TRT]    device GPU, failed to load cat_dog/resnet18.onnx
[TRT]    failed to load cat_dog/resnet18.onnx
[TRT]    imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
Traceback (most recent call last):
  File "/usr/local/bin/imagenet.py", line 55, in <module>
    net = jetson.inference.imageNet(opt.network, sys.argv)
Exception: jetson.inference -- imageNet failed to load network

@dusty-nv (Owner) commented Sep 4, 2020

@officialjansent , which version of JetPack and PyTorch do you have installed?

If you upgrade to the latest JetPack, PyTorch 1.5, torchvision 0.7.0 (upstream torchvision, not my fork) you shouldn't have any problems. And on the latest versions you shouldn't need my torchvision fork.
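
As a quick check after upgrading, the installed stack can be confirmed from Python before re-training and re-exporting. A minimal sketch, assuming the TensorRT Python bindings are installed alongside JetPack:

import torch
import torchvision
import tensorrt

# confirm the upgraded versions before re-running the training/export/inference steps
print('PyTorch:    ', torch.__version__)        # expect 1.5.x after the upgrade
print('torchvision:', torchvision.__version__)  # expect 0.7.0 (upstream, not the fork)
print('TensorRT:   ', tensorrt.__version__)
print('CUDA available:', torch.cuda.is_available())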

@junjunjansent commented Sep 5, 2020

@dusty-nv
JetPack 4.3, PyTorch 1.4, torchvision (now 0.3.0)

@officialjansent , which version of JetPack and PyTorch do you have installed?

If you upgrade to the latest JetPack, PyTorch 1.5, torchvision 0.7.0 (upstream torchvision, not my fork) you shouldn't have any problems. And on the latest versions you shouldn't need my torchvision fork.
