
DUAL NET INFERENCE

Note: I'm in the process of migrating this fork to JetPack 3.1, so it's not going to work at this time.

This is a fork of NVIDIA's deep learning inference library. If you haven't used it yet, I strongly advise you to start with the original, which can be obtained on GitHub. Most everything here was copied from there and mutilated by someone who hacks together some code once every 5 years or so, so best practices are not exactly followed.

The main purpose of this fork is to test out pipelining DetectNet and ImageNet, where DetectNet is used to detect the presence of a type of object (car, boat, plane) and an ImageNet model is used to further classify the detected object (what make/model of car, what type of plane, etc.).
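Conceptually, the pipeline looks something like the sketch below. This is a minimal sketch, not the fork's actual code: imgCUDA, width, height, bbCPU, and confCPU stand in for the camera's CUDA image buffer and pre-allocated detection output buffers, and cropToCUDA() is a hypothetical helper for copying a bounding-box region into its own buffer.

detectNet* detector   = detectNet::Create(detectNet::PEDNET_MULTI);
imageNet*  classifier = imageNet::Create(imageNet::GOOGLENET);

int numBoxes = detector->GetMaxBoundingBoxes();

if (detector->Detect((float*)imgCUDA, width, height, bbCPU, &numBoxes, confCPU))
{
	for (int n = 0; n < numBoxes; n++)
	{
		float* bb = bbCPU + (n * 4);	// box as (x1, y1, x2, y2)

		// cropToCUDA() is hypothetical -- copy the detected region
		// into its own CUDA buffer for the second network
		float* cropCUDA = cropToCUDA(imgCUDA, width, height, bb);

		// classify the cropped region with the ImageNet model
		float conf = 0.0f;
		const int cls = classifier->Classify(cropCUDA, (int)(bb[2]-bb[0]), (int)(bb[3]-bb[1]), &conf);

		if (cls >= 0)
			printf("box %i classified as '%s' (%.2f)\n", n, classifier->GetClassDesc(cls), conf);
	}
}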

This repository is kept as close to jetson-inference as possible, adding only the few routines that were needed. The ImageNet and DetectNet examples should work as they did upstream.

I added two example demos. One is dualnet-camera, which combines image detection and recognition. The other is a very simplified live-camera blackjack game.

Building from Source

Provided along with this repo are TensorRT-enabled examples of running GoogleNet/AlexNet on a live camera feed for image recognition, and pedestrian detection networks with localization capabilities (i.e. that provide bounding boxes).

The latest source can be obtained from GitHub and compiled onboard Jetson TX1/TX2.

note: this branch is verified against JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)

1. Cloning the repo

To obtain the repository, navigate to a folder of your choosing on the Jetson. First, make sure git and cmake are installed locally:

sudo apt-get install git cmake

Then clone the jetson-inference repo:

git clone http://github.com/S4WRXTTCS/jetson-inference

2. Configuring

When cmake is run, a special pre-installation script (CMakePreBuild.sh) is run and will automatically install any dependencies.

cd jetson-inference
mkdir build
cd build
cmake ../

3. Compiling

Make sure you are still in the jetson-inference/build directory, created above in step #2.

cd jetson-inference/build			# omit if pwd is already /build from above
make

Depending on architecture, the package will be built to either armhf or aarch64, with the following directory structure:

|-build
   \aarch64		    (64-bit)
      \bin			where the sample binaries are built to
      \include		where the headers reside
      \lib			where the libraries are built to
   \armhf           (32-bit)
      \bin			where the sample binaries are built to
      \include		where the headers reside
      \lib			where the libraries are built to

On the Jetson TX1/TX2, binaries reside in aarch64/bin, headers in aarch64/include, and libraries in aarch64/lib.

Classifying Images with ImageNet

There are multiple types of deep learning networks available, including recognition, detection/localization, and soon segmentation. The first deep learning capability to highlight is image recognition using an 'imageNet' that's been trained to identify similar objects.

The imageNet object accepts an input image and outputs the probability for each class. Having been trained on the ImageNet database of 1000 objects, the standard AlexNet and GoogleNet networks are downloaded during step 2 above.
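In code, usage looks roughly like the following sketch, where imgCUDA, imgWidth, and imgHeight stand in for your CUDA image buffer and its dimensions (see imagenet-console.cpp for the full program):

#include "imageNet.h"

imageNet* net = imageNet::Create(imageNet::GOOGLENET);

// Classify() returns the index of the highest-probability class
float confidence = 0.0f;
const int imgClass = net->Classify((float*)imgCUDA, imgWidth, imgHeight, &confidence);

if (imgClass >= 0)
	printf("recognized as '%s' (class #%i) with %.2f%% confidence\n",
	       net->GetClassDesc(imgClass), imgClass, confidence * 100.0f);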

After building, first make sure your terminal is located in the aarch64/bin directory:

$ cd jetson-inference/build/aarch64/bin

Then, classify an example image with the imagenet-console program. imagenet-console accepts 2 command-line arguments: the path to the input image and the path to the output image (with the class overlay printed).

$ ./imagenet-console orange_0.jpg output_0.jpg

$ ./imagenet-console granny_smith_1.jpg output_1.jpg

Next, we will use imageNet to classify a live video feed from the Jetson onboard camera.

Running the Live Camera Recognition Demo

Similar to the last example, the realtime image recognition demo is located in /aarch64/bin and is called imagenet-camera. It runs on a live camera stream and, depending on user arguments, loads GoogleNet or AlexNet with TensorRT.

$ ./imagenet-camera googlenet           # to run using googlenet
$ ./imagenet-camera alexnet             # to run using alexnet

The frames per second (FPS), the classified object name, and the confidence of the classification are printed to the OpenGL window title bar. By default the application can recognize up to 1000 different types of objects, since GoogleNet and AlexNet were trained on the ILSVRC12 ImageNet database, which contains 1000 classes of objects. The mapping of names for the 1000 types of objects can be found in the repo under data/networks/ilsvrc12_synset_words.txt

note: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the DEFAULT_CAMERA define at the top of imagenet-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The camera it's tested with is a Logitech C920.
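For example, assuming your webcam enumerates as /dev/video0, the change at the top of imagenet-camera.cpp would look like this:

//#define DEFAULT_CAMERA -1	// -1 selects the onboard CSI camera
#define DEFAULT_CAMERA 0	// 0 selects the V4L2 camera on /dev/video0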

Locating Object Coordinates using DetectNet

The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability to highlight is detecting multiple objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.

The detectNet object accepts a 2D image as input and outputs a list of coordinates of the detected bounding boxes (see the sketch after the list below). Three example detection network models are automatically downloaded during the repo source configuration:

  1. ped-100 (single-class pedestrian detector)
  2. multiped-500 (multi-class pedestrian + baggage detector)
  3. facenet-120 (single-class facial recognition detector)
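In code, detectNet usage looks roughly like the following sketch, based on detectnet-console.cpp; imgCUDA, imgWidth, and imgHeight stand in for your CUDA image buffer and its dimensions:

#include "detectNet.h"
#include "cudaMappedMemory.h"

detectNet* net = detectNet::Create(detectNet::PEDNET_MULTI);

// allocate buffers for the bounding boxes and per-class confidences
float* bbCPU = NULL;    float* bbCUDA = NULL;
float* confCPU = NULL;  float* confCUDA = NULL;

const uint32_t maxBoxes = net->GetMaxBoundingBoxes();

cudaAllocMapped((void**)&bbCPU, (void**)&bbCUDA, maxBoxes * sizeof(float4));
cudaAllocMapped((void**)&confCPU, (void**)&confCUDA, maxBoxes * net->GetNumClasses() * sizeof(float));

// Detect() fills bbCPU with up to numBoxes (x1, y1, x2, y2) rectangles
int numBoxes = maxBoxes;

if (net->Detect((float*)imgCUDA, imgWidth, imgHeight, bbCPU, &numBoxes, confCPU))
{
	for (int n = 0; n < numBoxes; n++)
	{
		float* bb = bbCPU + (n * 4);
		printf("box %i   (%f, %f)  (%f, %f)\n", n, bb[0], bb[1], bb[2], bb[3]);
	}
}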

To process test images with detectNet and TensorRT, use the detectnet-console program. detectnet-console accepts command-line arguments representing the path to the input image and path to the output image (with the bounding box overlays rendered). Some test images are included with the repo:

$ ./detectnet-console peds-007.png output-7.png

To change the network that detectnet-console uses, modify detectnet-console.cpp (beginning line 33):

detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI );	 // uncomment to enable one of these 
//detectNet* net = detectNet::Create( detectNet::PEDNET );
//detectNet* net = detectNet::Create( detectNet::FACENET );

Then to recompile, navigate to the jetson-inference/build directory and run make.

Multi-class Object Detection

When using the multiped-500 model (PEDNET_MULTI), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.

$ ./detectnet-console peds-008.png output-8.png

Running the Live Camera Detection Demo

Similar to the previous example, detectnet-camera runs the object detection networks on a live video feed from the Jetson onboard camera. Launch it from the command line along with the type of desired network:

$ ./detectnet-camera multiped       # run using multi-class pedestrian/luggage detector
$ ./detectnet-camera ped-100        # run using original single-class pedestrian detector
$ ./detectnet-camera facenet        # run using facial recognition network
$ ./detectnet-camera cardnet        # run using Playing Card detection network
$ ./detectnet-camera                # by default, program will run using multiped

note: to achieve maximum performance while running detectnet, increase the Jetson TX1 clock limits by running the script: sudo ~/jetson_clocks.sh


note: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the DEFAULT_CAMERA define at the top of detectnet-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The camera it's tested with is a Logitech C920.

Running the DualNet Demo

The dualnet-camera demo combines detection (DetectNet) and recognition (ImageNet) on the live video feed from the Jetson onboard camera. Launch it from the command line along with the desired networks, where the first argument is the DetectNet network and the second is the ImageNet network.

$ ./dualnet-camera cardnet alexnet_54cards  # run using PlayingCard detection, and PlayingCard recognition
$ ./dualnet-camera                          # by default it runs using PlayingCard detection, and PlayingCard recognition

Running the BlackJack Camera Demo

$ ./blackjack-camera                # by default, program will run using the correct networks

By default, it uses the USB camera at device 1. To change this, you'll need to change the DEFAULT_CAMERA define at the top of blackjack-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The camera it's tested with is a Logitech C920. The internal camera can be used, but isn't advised.

To play the game, have the camera facing down towards the table. Half of the image is the computer's playing area, and half is the human's side. Simply deal a card to the computer side, and then the human side. The computer will tell you when it wants to hit or stand. To tell the computer you want to stand, simply show the Red Joker. As of now the game is pretty limited in that it doesn't know the Ace can take different values. It's only intended as a demonstration of what's possible by combining ImageNet and DetectNet.
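For illustration, the hit/stand decision could be as simple as the sketch below. This is hypothetical code, not taken from blackjack-camera.cpp: the names HandValue and ComputerWantsHit are mine, and the hit-below-17 dealer rule is an assumption; the fixed Ace value matches the limitation noted above.

#include <vector>

// ranks: 1 = Ace, 2-10 = number cards, 11-13 = face cards
int HandValue(const std::vector<int>& ranks)
{
	int total = 0;

	for (size_t i = 0; i < ranks.size(); i++)
	{
		if (ranks[i] == 1)       total += 11;		// Ace, always fixed at 11
		else if (ranks[i] > 10)  total += 10;		// Jack/Queen/King
		else                     total += ranks[i];
	}

	return total;
}

// dealer-style rule (an assumption): hit on anything below 17
bool ComputerWantsHit(const std::vector<int>& ranks)
{
	return HandValue(ranks) < 17;
}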

If you find that it's not recognizing cards correctly, move the camera up or down. It also struggles with cards that are too close together: detectNet detects them as a single card, which throws everything off. You also can't have overlapping cards.

Here is what it should look like. The image shows an 11x17 sheet of paper I used as the playing table, with outlines for the cards, but this isn't needed.

If you need to retrain the DetectNet-based CardNet or the ImageNet-based AlexNet_54cards, you can add the necessary data to the following datasets and then retrain them in DIGITS 5.0.

The DetectNet training data is here https://drive.google.com/file/d/0B8dR1eAmu3fTR3l4WkNtR0dqS0E/view?usp=sharing

The ImageNet training data is here https://drive.google.com/file/d/0B8dR1eAmu3fTcG1mZVN4OHFNTU0/view?usp=sharing
