Re-training on the Cat/Dog Dataset #370
Hi @FingerVonFrings, this issue was fixed by patching the ResNet-18 model definition in my fork of torchvision with this commit: dusty-nv/vision@5c46136. So you may want to uninstall the torchvision package and re-install it from my fork:
$ sudo pip uninstall torchvision
$ python -c "import torchvision" # should raise an error if successfully uninstalled
$ git clone -bv0.3.0 https://github.com/dusty-nv/vision
$ cd vision
$ sudo python setup.py install
Then you should be able to train again. At first you can try training for just a couple epochs, then run |
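The uninstall check in the steps above relies on the import failing with a traceback. As a rough alternative, here is a minimal Python 3 sketch that reports the same thing as a boolean. The helper name `is_importable` is made up for illustration and is not part of jetson-inference or torchvision.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the current interpreter can locate module_name."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Expect False right after the uninstall, True after re-installing.
    print("torchvision importable:", is_importable("torchvision"))
```

Run it with the same interpreter you train with, since each Python installation has its own set of packages.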
It does work. Thank you so much for your reply and advice!!! |
Hi dusty-nv,
First of all, thanks for your comprehensive and well-curated resource guide. I am facing the following error while trying to install pytorch following your instructions above. Kindly help me in resolving this.
dlinano@jetson-nano:~/sd/jetson-inference/build/vision$ sudo python setup.py install
Traceback (most recent call last):
File "setup.py", line 6, in <module>
from setuptools import setup, find_packages
ImportError: No module named setuptools
Setuptools is already installed, but I am getting this error.
Thanks, Santanu |
Hi Santu, if you run an interactive python interpreter, are you able to import setuptools ok there?
|
Yes, it works properly in the interactive interpreter. I faced the same issue as mentioned in this thread and hence need to install again. |
Do you want me to install using the interactive app? Previously I installed the python3 version using the interactive app, but faced the same issue as FingerVonFrings. |
Hmm is your python mapped to python3? That could be causing the error when it goes to install torchvision. You could try running those steps from the script manually if it helps.
|
Yes, my python is mapped to python3. As mentioned earlier, in the interactive tool I can see that setuptools can be imported successfully.
But when I try to run the following scripts, I get the error mentioned before. If I use your pytorch installer, pytorch gets installed properly. I have done the post-installation checks you suggested to verify it. I installed the python3-compatible version of torchvision using the interactive tool. Retraining also takes place without any issues, and I am able to convert the onnx file. But then at the "Processing Images with TensorRT" step I am facing the same error as mentioned in this thread. To overcome this issue, I tried to uninstall and re-install torchvision as you suggested, but at that stage I am facing this issue. Hope this clarifies. Please help in resolving the issue. |
Hi dusty,
You are absolutely right that the python mapping was not correct. I was mapping it for my session, but that was not effective for sudo...
I have corrected it, but I am now getting the following error. Can you please help resolve it?
Installed /usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg
Processing dependencies for torchvision==0.3.0
Searching for torch>=1.1.0
Reading https://pypi.org/simple/torch/
No local packages or working download links found for torch>=1.1.0
error: Could not find suitable distribution for Requirement.parse('torch>=1.1.0')
Thanks, Santanu |
My script uses 'python' for Python 2 and 'python3' for Python 3. So when you mapped your python to python3, torchvision is getting installed under python3 but torch is getting installed under python2.
Either run the steps manually and correct the usage of pip/pip3 and python/python3 to match, or just select the Python 3.6 version when running the script.
|
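The python/python3 mismatch described above can be checked from inside each interpreter. The following is a small sketch under the assumption that only the standard library is available; the function name `interpreter_report` is hypothetical and not part of the installer script. It prints which executable is running and where (if anywhere) that interpreter finds torch and torchvision.

```python
import importlib
import sys


def interpreter_report(module_names=("torch", "torchvision")):
    """Describe this interpreter and where it finds each module, if at all."""
    report = {
        "executable": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
    }
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # The module is missing for this particular interpreter.
            report[name] = "not importable (%s)" % exc
        else:
            version = getattr(module, "__version__", "unknown version")
            location = getattr(module, "__file__", "unknown location")
            report[name] = "%s at %s" % (version, location)
    return report


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%-15s %s" % (key, value))
```

Running the same file once with python, once with python3, and once under sudo should show whether both packages land in the same installation. A session-only mapping of python may not carry over to sudo, as noted in the comment above.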
Thank you Dusty for your kind advice. It worked like a charm yesterday. I ran it manually. |
Hello Dusty. I faced the same problem. At first I used python3 to install torchvision and torch. After that failed, I tried using python2 to re-train and run the onnx_export.py script. Everything worked fine up to that point, but when I tried imagenet-console again, I still got the same error. Could you help me? Here is the error: imageNet -- loading classification network model from: [TRT] TensorRT version 5.1.6 |
Replace "~" in the labels path with your full home directory, e.g. "/home/your_name/datasets/cat_dog/labels.txt" |
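The tip above is presumably needed because the shell does not expand a "~" that appears in the middle of an argument such as --labels=~/... . A minimal sketch of the same replacement in Python follows; the helper name `expand_labels_path` is made up for illustration.

```python
import os


def expand_labels_path(path):
    """Expand a leading "~" to the current user's home directory."""
    # Paths that do not start with "~" are returned unchanged.
    return os.path.expanduser(path)


if __name__ == "__main__":
    print(expand_labels_path("~/datasets/cat_dog/labels.txt"))
```

The printed absolute path can then be passed to --labels in place of the "~" form.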
I have a similar problem where it shows it has failed to parse ONNX model 'cat_dog/resnet18/onnx'. However, if I download the cat_dog model that was trained for 100 epochs, there is no issue. I followed your instructions to uninstall the torchvision package (using pip3 in my case) and re-install it from your fork with "git clone -bv0.3.0 https://github.com/dusty-nv/vision" (previously my torchvision version was '0.5.0a0+85b8fbf'), but it still did not work. I hope you are able to advise, as I am using this to train a different model as well. I wonder whether this is due to an updated TensorRT version. Thank you so much. I followed the code provided for imagenet exactly: (https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md)
Output:
|
@officialjansent , which version of JetPack and PyTorch do you have installed? If you upgrade to the latest JetPack, PyTorch 1.5, torchvision 0.7.0 (upstream torchvision, not my fork) you shouldn't have any problems. And on the latest versions you shouldn't need my torchvision fork. |
@dusty-nv
|
Hello,
I get an error at the "Processing Images with TensorRT" step. Please see the error below.
306lab:~/jetson-inference/python/training/imagenet$ imagenet-console.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt $DATASET/test/cat/011.jpg wgoutput011.jpg
jetson.inference.__init__.py
jetson.inference -- initializing Python 2.7 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 2.7 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 2.7 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 2.7 binding initialization
[image] loaded '/home/hfut/datasets/cat_dog/test/cat/011.jpg' (700 x 525, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyImageNet_Init()
jetson.inference -- imageNet loading network using argv command line params
jetson.inference -- imageNet.init() argv[0] = '--model=cat_dog/resnet18.onnx'
jetson.inference -- imageNet.init() argv[1] = '--input_blob=input_0'
jetson.inference -- imageNet.init() argv[2] = '--output_blob=output_0'
jetson.inference -- imageNet.init() argv[3] = '--labels=/home/hfut/datasets/cat_dog/labels.txt'
imageNet -- loading classification network model from:
-- prototxt (null)
-- model cat_dog100/resnet18.onnx
-- class_labels /home/hfut/datasets/cat_dog/labels.txt
-- input_blob 'input_0'
-- output_blob 'output_0'
-- batch_size 1
[TRT] TensorRT version 5.0.6
[TRT] loading NVIDIA plugins...
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file cat_dog/resnet18.onnx.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading /usr/bin/ cat_dog/resnet18.onnx
Input filename: cat_dog/resnet18.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.1
Domain:
Model version: 0
Doc string:
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
While parsing node number 69 [Gather -> "192"]:
ERROR: /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/release/5.0/parsers/onnxOpenSource/ModelImporter.cpp:142 In function importNode:
[8] No importer registered for op: Gather
[TRT] failed to parse ONNX model 'cat_dog/resnet18.onnx'
[TRT] device GPU, failed to load cat_dog/resnet18.onnx
[TRT] failed to load cat_dog/resnet18.onnx
[TRT] imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'googlenet'
PyTensorNet_Dealloc()
Traceback (most recent call last):
File "/usr/local/bin/imagenet-console.py", line 53, in <module>
net = jetson.inference.imageNet(opt.network, argv)
Exception: jetson.inference -- imageNet failed to load network
jetson.utils -- freeing CUDA mapped memory
But if I download the completed model that was trained for a full 100 epochs from here:
then "Processing Images with TensorRT" works fine.
I notice that it generates a (resnet18.onnx.1.1.GPU.FP16.engine) file,
but when I use my own model, this file is not generated.
Any help? Thanks
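In a long log like the one above, the line that matters is the "No importer registered for op" message from the ONNX parser. The sketch below pulls the operator names out of such a log. It assumes the message wording shown in this issue, and the function name `unsupported_ops` is hypothetical, not a jetson-inference or TensorRT API.

```python
import re

# Pattern for the parser message as it appears in the log above.
UNSUPPORTED_OP = re.compile(r"No importer registered for op:\s*(\S+)")


def unsupported_ops(log_text):
    """Return the operator names reported as unsupported, in order, without duplicates."""
    seen = []
    for match in UNSUPPORTED_OP.finditer(log_text):
        op = match.group(1)
        if op not in seen:
            seen.append(op)
    return seen


if __name__ == "__main__":
    sample = (
        'While parsing node number 69 [Gather -> "192"]:\n'
        "[8] No importer registered for op: Gather\n"
        "[TRT] failed to parse ONNX model 'cat_dog/resnet18.onnx'\n"
    )
    print(unsupported_ops(sample))
```

For the sample taken from this log, the result names the Gather operator, which is the node the parser stopped on.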