<a href="https://www.nvidia.com/en-us/deep-learning-ai/education/"> <img src="files/DLI Header.png" alt="Header" style="width: 400px;"/> </a>

# Understanding Embedded Applications

## Set up Jetson TX2

Important - READ FIRST!
This lab requires some setup:

Open a terminal window (Ctrl+Alt+T) on Jetson and run the commands below to initialize a few tools and models. They take about 10 minutes to run, so start them while the instructor introduces the course.

### The password for the first command (below) is 'nvidia'. Run it from the home directory:

<pre>sudo apt-get install -y git cmake </pre>

### The second set of commands (below) can be copied and pasted as one block:

Pasting may appear to insert only the first line; however, once that command has finished, the next one will appear.

```bash
git clone https://github.com/dusty-nv/jetson-inference
cd jetson-inference
mkdir build
cd build
cmake ../
make
cd aarch64/bin
```

You are now in a folder that contains the applications and models we will use in this lab. You can also use them on your own Jetson by repeating these steps there.

## Deployed Models

### Pre-Input and Post-Output

While what happens inside a deep neural network may be complex, at its core a network is simply a function: it takes some input and generates some output.

![](functionmachine.PNG)

For the sake of this conversation, we're going to assume that our network has been architected and trained successfully, and focus specifically on the skill of *deployment*. 

To successfully *deploy* a trained model, we have two initial jobs.

1) Our first job is to provide our model the **input** that it expects.  
2) Our second job is to provide our end user an **output** that is useful.  

The input that our network expects is determined both by its architecture and by *how* it was trained. Deployment requires writing code to convert the input we have to the input the network expects. The output that our network generates is determined by its architecture and what it learned. Deployment requires writing code to convert the output that is generated to the output our end user expects. 

For the beginning of this lab, we will look at a few applications that use trained models. Our job will be to identify four characteristics:

1) The input of the *application*.  
2) The input of the *network*.  
3) The output of the *network.*  
4) The output of the *application.*  

If those 4 characteristics are identified, deployment consists of writing code in any programming language from (1) to (2) and from (3) to (4).
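The four characteristics map onto a three-stage pipeline: code from (1) to (2), the trained network from (2) to (3), and code from (3) to (4). The Python sketch below is a minimal illustration of that composition, not code from jetson-inference; `preprocess`, `network`, and `postprocess` are hypothetical placeholders, and the toy "network" just scores mean brightness.

```python
from typing import Callable, List

def deploy(preprocess: Callable, network: Callable, postprocess: Callable) -> Callable:
    """Compose the three stages into one application-level function."""
    def application(app_input):
        net_input = preprocess(app_input)   # (1) -> (2): written by the deployer
        net_output = network(net_input)     # (2) -> (3): the trained model
        return postprocess(net_output)      # (3) -> (4): written by the deployer
    return application

# Hypothetical stand-ins, for illustration only.
def preprocess(pixels: List[int]) -> List[float]:
    # Scale 8-bit pixel values into the 0..1 range the toy network expects.
    return [p / 255.0 for p in pixels]

def network(x: List[float]) -> List[float]:
    # Toy "model": class 0 scores brightness, class 1 scores darkness.
    mean = sum(x) / len(x)
    return [mean, 1.0 - mean]

def postprocess(scores: List[float]) -> str:
    labels = ["bright", "dark"]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return f"{labels[best]} ({scores[best] * 100:.1f}%)"

app = deploy(preprocess, network, postprocess)
print(app([250, 240, 230]))  # bright (94.1%)
```

Swapping the network leaves `deploy` untouched; only the two conversion functions depend on the network's input and output formats.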

**Challenge: While it runs, identify the inputs (1) and outputs (4) of the application <code>imagenet-camera</code>. Once you have identified them, feel free to experiment with the application to see how (and when) it works.**

Note: To exit the application, click the red (x) in the top left. 

In Jetson's terminal, from the directory where you left off (~/jetson-inference/build/aarch64/bin), run imagenet-camera with the command: 

<code>./imagenet-camera</code>

![](filecabinet.png)

You can see that <code>imagenet-camera</code>:

- takes as input image data from the camera, and
- creates as output the input image with its most likely classification and that classification's likelihood printed on top of it.

To start to understand how this was built, our next job is to identify the expected input (2) and generated output (3) of the *trained neural network.*

### Expected Input

To successfully deploy our trained network, we have to convert the application's input to the input expected by the neural network.

#### Examining our network architecture

The information we care about should be easy to find. We want to know the *dimensions* of the *input* layer. Each network has a file (which you'll need later) that describes the network in a human-readable format. We're just going to look at the first layer that has dimensions listed.

The network used in imagenet-camera, AlexNet, was defined in the Caffe framework. Its first layer is shown below. It tells us that AlexNet expects an input of dimensions 227X227X3. (Caffe lists dimensions in the order batch, channels, height, width, so the leading "10" is the batch size: the network takes 10 images at a time.)

```
name: "AlexNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
}
```

This was found at the very top of the definition file. You can see how easy it is to find by examining the whole file [here](https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/deploy.prototxt).

#### Application input to network input

The input to the application is the raw camera footage from Jetson. According to Jetson's documentation, the camera captures 720p (1280X720X3) images. 

**For successful deployment, code was written (C++ in this case) to preprocess the image from 1280X720X3 to 227X227X3.**

Note: There may be constraints other than size. Check the expected format (most networks expect raw pixel values in a specific channel order, such as RGB) and replicate any preprocessing that was done during training (often the mean image of the training dataset is subtracted).
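The preprocessing step can be sketched as follows. This is a minimal Python illustration, not the C++ that imagenet-camera actually uses: it applies a nearest-neighbor resize from 1280X720X3 to 227X227X3, subtracts a per-channel mean, and reorders the data into Caffe's channels-first layout. The `CHANNEL_MEAN` values are hypothetical placeholders; a real deployment must use whatever values the model was trained with.

```python
from typing import List

Image = List[List[List[float]]]  # indexed [row][col][channel] (H x W x C)

# Hypothetical per-channel means; replace with the values used in training.
CHANNEL_MEAN = (104.0, 117.0, 123.0)

def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbor resize of an H x W x C image."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def preprocess(img: Image, out_h: int = 227, out_w: int = 227) -> List[List[List[float]]]:
    """Resize, subtract the channel mean, and reorder to C x H x W."""
    resized = resize_nearest(img, out_h, out_w)
    return [
        [[resized[r][c][ch] - CHANNEL_MEAN[ch] for c in range(out_w)] for r in range(out_h)]
        for ch in range(len(CHANNEL_MEAN))
    ]

# A synthetic single-color 720 x 1280 x 3 frame stands in for a camera image.
pixel = [200.0, 150.0, 100.0]
frame = [[pixel] * 1280] * 720
tensor = preprocess(frame)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 227 227
```

Resizing a 16:9 frame straight to a square stretches it; cropping to a square first is a common alternative, and which one is appropriate depends on how the training images were prepared.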

### Expected Output

To successfully build a useful application, we have to convert the network's output to an output that is useful to a user.

#### Examining our network architecture

Examining the last layers of our network architecture (the bottom of the same file) tells us that AlexNet generates a 1000-unit vector: the final fully connected layer, "fc8", has <code>num_output: 1000</code>. What those 1000 numbers represent demands a bit more specialized knowledge. Here, the last layer's "type" is "Softmax", which converts a vector of raw scores into values between 0 and 1 that sum to 1, so they can be read as probabilities.

In summary, each of the 1000 elements represents the probability that the image belongs to the corresponding class. 

```
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
```

Again, you can see that this just came from the bottom of the network definition file, [here](https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/deploy.prototxt).
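The "prob" layer applies softmax to the 1000 raw scores from "fc8": each score is exponentiated and divided by the sum of all the exponentials, so every output lies between 0 and 1 and the outputs sum to 1. The Python sketch below shows the computation on a toy 3-class vector; subtracting the maximum first is a standard trick that avoids overflow without changing the result.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores (like the output of fc8) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # toy example; AlexNet's fc8 produces 1000 scores
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```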

#### Network Output to Application Output

The output of the application is what you saw: the input image's most likely classification and its likelihood printed on top of the image. 

To get from the 1000-element softmax output to that display, the C++ program identified the element of the output with the largest value, used a reference file, "labels.txt," to identify which class that element represents, and displayed both the class name and that element's value as a percentage. (It also chose a font and a location.)
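The post-processing steps above (argmax, label lookup, percentage) can be sketched in Python as below. This is an illustration rather than the application's C++ code; it assumes the label file holds one class name per line, in the same order as the network's outputs, and the three labels and probabilities are made-up example values.

```python
from typing import List, Tuple

def load_labels(text: str) -> List[str]:
    """Assumes one class name per line, ordered like the network outputs."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def classify(probs: List[float], labels: List[str]) -> Tuple[str, float]:
    """Return the most likely class name and its probability as a percentage."""
    if len(probs) != len(labels):
        raise ValueError("need exactly one label per output element")
    best = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return labels[best], probs[best] * 100.0

labels = load_labels("tench\ngoldfish\ngreat white shark\n")
name, percent = classify([0.05, 0.80, 0.15], labels)
print(f"{percent:.2f}% {name}")  # 80.00% goldfish
```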

Below is a diagram that represents the neural network as a function with an input and an output and shows in pseudo-code the logical steps that needed to be written before and after the network for a successful deployment.

![](fmachinepsuedocode.PNG)

While the code is not the key takeaway, if you are interested in seeing what this looks like in C++, you can examine the application [here](https://github.com/dusty-nv/jetson-inference/blob/master/imagenet-camera/imagenet-camera.cpp).

Let's try again. Here, we'll look at a second application and work to identify the same four characteristics:  

1) The input of the *application*.  
2) The input of the *network*.  
3) The output of the *network.*  
4) The output of the *application.*  

## Object Detection Application

**Challenge: Identify the input (1) and output (4) of detectnet-camera. Run the application below and stand in front of the camera to see the input and output of the application. When done, exit the video stream.**

Note: This application may take a few minutes to "build CUDA engine."

<code>./detectnet-camera</code>

![](peddetect.png)

You can see that this application takes the same input as imagenet-camera but instead outputs the input image with a blue rectangle drawn over each detected person.

**Double challenge: Determine the input (2) and output (3) of the network that enables the application to detect pedestrians by examining the top and bottom of its architecture file [here](https://github.com/dusty-nv/jetson-inference/blob/master/data/networks/detectnet.prototxt).**

To review the last section, we'll create another function diagram of our deployment. In the next section, you'll have to find this information yourself, so try it with us. 

![](fmachineobject.PNG)

### What constitutes a "learned" function

A trained neural network consists of two components:

1) A description of the network architecture, established prior to training  
2) A set of learned parameters (weights), established during training  

We explored the network architecture in the previous section. In this section, we will see how we can use the same architecture in the same application, but replace the weights, to create an entirely new capability.
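The split between architecture and weights can be shown with a toy example. In the Python sketch below (a deliberately tiny stand-in, not DetectNet), the "architecture" is a fixed function, one linear layer followed by argmax, and two hypothetical weight sets turn it into two different classifiers. Because both share the same input and output format, the calling code does not change.

```python
from typing import Sequence

def linear_classifier(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Fixed 'architecture': one linear layer followed by argmax.
    Only `weights` changes between models."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=lambda i: scores[i])

# Two hypothetical weight sets for the same 2-input, 2-output architecture.
weights_a = [[1.0, 0.0], [0.0, 1.0]]  # "model A": index of the larger input
weights_b = [[0.0, 1.0], [1.0, 0.0]]  # "model B": index of the smaller input

x = [3.0, 1.0]
print(linear_classifier(weights_a, x))  # 0
print(linear_classifier(weights_b, x))  # 1
```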

**Explore: Replace the learned parameters in detectnet-camera with the learned parameters from a different dataset to prove that the application can harness different models as long as the model architectures share input and output layers.**

Try on Jetson:

<code>./detectnet-camera facenet</code>


You can see that DetectNet was architected to learn to detect objects within an image and then trained to apply that skill to faces, pedestrians, etc. In deployment, since the input and output formats are identical, the application doesn't need to care whether it's drawing blue bounding boxes over faces, pedestrians, dogs, or bottles. All the application needs to know is to draw a blue box from (x1,y1) to (x2,y2).
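The class-agnostic drawing step can be sketched as below. This Python illustration renders box outlines onto a small text "image" instead of camera pixels; it assumes each detection arrives as inclusive integer corners (x1, y1, x2, y2) lying inside the image, and `draw_boxes` is a hypothetical helper, not a jetson-inference function. Nothing in it depends on what kind of object each box contains.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive, assumed inside the image

def draw_boxes(width: int, height: int, boxes: List[Box]) -> List[str]:
    """Draw box outlines on a blank text canvas; the class of each box is irrelevant."""
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for x in range(x1, x2 + 1):          # top and bottom edges
            canvas[y1][x] = canvas[y2][x] = "#"
        for y in range(y1, y2 + 1):          # left and right edges
            canvas[y][x1] = canvas[y][x2] = "#"
    return ["".join(row) for row in canvas]

# Detections from any model (faces, pedestrians, dogs...) have the same form.
for line in draw_boxes(10, 5, [(1, 1, 4, 3), (6, 0, 8, 2)]):
    print(line)
```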

Here are other models included in the repo you downloaded. Feel free to experiment:

- <code>./detectnet-camera coco-bottle</code> detects bottles/soda cans in the camera
- <code>./detectnet-camera coco-dog</code> detects dogs in the camera (trust us, it works)  
- <code>./detectnet-camera multiped</code> runs using multi-class pedestrian/luggage detector  
- <code>./detectnet-camera pednet</code> runs using original single-class pedestrian detector  
- <code>./detectnet-camera facenet</code> runs using the face detection network  
- <code>./detectnet-camera</code> with no argument, the program runs using multiped  

### Summary

So far we have learned:

1) Deployment is the task of integrating a trained deep neural network into an application.  
2) To successfully deploy a network, we must understand the network's input and output.  
3) A deployed neural network has two components: the model architecture and the learned weights.  

The applications we've been working with and more are available from [this repo.](https://github.com/dusty-nv/jetson-inference)

Next, we'll deploy a new model that has the same inputs and outputs as the model deployed in <code>imagenet-camera</code> to learn to go from data to deployment.

When the class is ready, go to the [next notebook.](Deploying%20Custom%20Model.ipynb)