## Jetson Nano for Visual Recognition
### Using the ssd_mobilenet_v2_coco vision model on Jetson Nano to achieve object recognition in the environment.

</p>
 <span style="color: green;"> The method for running the program:<br>
   This program runs on a Jetson Nano. To let the JetBot car move freely, the Jetson Nano board must not be tethered by a power cable, mouse, monitor, or other USB cables. Instead, the program is accessed and executed remotely over the network through a web-based JupyterLab session. Once the Jetson Nano is connected to the local network, its LCD screen shows its IP address in the form 192.168.1.XXX (where XXX is a two- or three-digit number).<br>
   Steps to initiate the program:<br><br>
 1. Insert an SD card with the JetPack software image burned onto it.<br>
 2. Connect a monitor to the HDMI port of the Jetson Nano, and plug a keyboard and mouse into the USB ports. Power on the Jetson Nano system.<br>
 3. Boot into Ubuntu 18.04, and configure the local network connection settings for the Jetson Nano system as shown in Figure-1.<br>
 4. Disconnect the monitor, keyboard, mouse, USB devices, and power cable. Restart the Jetson Nano. At this point, the local IP address of the Jetson Nano will be visible on the LCD display of the smart car, as shown in Figure-2.<br>
 5. On an external computer connected to the same local network, enter the local IP address of the Jetson Nano into a web browser to establish a connection to the Jetson Nano system, as illustrated in Figure-3.<br>
 </span>

<div style="display: flex; justify-content: space-between; align-items: center;">
    <img class=" long-press-able-img " src="Ubuntu1.jpg" style="width: 30%; height: auto; margin-right: 1%;">
    <img class=" long-press-able-img " src="液晶1.jpg" style="width: 30%; height: auto; margin-right: 1%;">
    <img class="_double-img_q1nu4_69 long-press-able-img " src="远程1.jpg" style="width: 30%; height: auto;">
</div>



In Topic 2 of the VAIC (Vision-Aided Intelligent Challenge) technical notes, we use the Jetson Nano platform to perform object recognition with vision models. VAIC competitions usually require identifying only a small set of objects (e.g., VAIC_23_24 involves distinguishing three balls of the same shape but different colors, and VAIC_24_25 involves rings of similar shape but varying colors).<br>
Taking an existing, reliable multi-object vision model and fine-tuning it for the target objects of the competition is a straightforward and fast way to give a VAIC robot its vision capabilities. Such general-purpose models, known as pretrained models, have been trained on large amounts of data and can be applied to similar or related problems without training from scratch. Pretrained models significantly reduce the training time and data needed for a new task, and often improve generalization and accuracy. In fields such as Natural Language Processing (NLP) and Computer Vision (CV), using pretrained models has become standard practice.<br>
This topic therefore introduces the algorithms and program implementation for recognizing various objects in the surrounding environment on the Jetson Nano with an existing multi-object vision model. The aim is to give AI beginners a general understanding of how vision models work and how visual inference is implemented on the Jetson Nano. Fine-tuning pretrained models to improve recognition of the specific targets in VAIC competitions is covered in detail in subsequent topics.<br>
Note: For information on the Jetson Nano system, the camera used, and the vehicle that carries the Jetson Nano, please refer to the "Introduction_to_Jetson-Nano_of_system.ipynb" document located within this directory.<br>

In [1]:
from jetbot import ObjectDetector

model = ObjectDetector('ssd_mobilenet_v2_coco.engine')

‘from jetbot import ObjectDetector’
imports the ObjectDetector class from the NVIDIA jetbot library. jetbot is a Python library developed for the NVIDIA Jetson Nano, designed to simplify the development of robotic projects such as autonomous vehicles and surveillance robots. The library provides a range of tools and modules that help developers rapidly build and deploy projects based on the Jetson Nano. Within the jetbot library, ObjectDetector is a particularly important component, as it detects objects within images.<br>

‘model = ObjectDetector('ssd_mobilenet_v2_coco.engine')’<br>
loads the ssd_mobilenet_v2_coco vision model. ssd_mobilenet_v2_coco is a deep learning model commonly used for object detection tasks. It uses the MobileNet V2 network as the feature extractor for the SSD detection framework and is trained on the COCO dataset.<br>
A brief explanation is as follows:<br>
1.SSD (Single Shot MultiBox Detector):<br>
Single Shot: This model completes the object detection task in a single forward pass, making it very fast.<br>
MultiBox: Refers to the generation and processing of multiple candidate bounding boxes in SSD.<br>
Detector: The primary function of SSD is to perform object detection, i.e., identifying objects in images, classifying them into predefined categories, and outputting their locations (bounding boxes).<br>
2.MobileNet V2:<br>
MobileNet V2 is a lightweight and efficient convolutional neural network (CNN) architecture designed specifically for mobile devices and embedded vision applications. It employs depthwise separable convolutions, significantly reducing the number of parameters and computations compared to standard convolutions, making it ideal for use in resource-constrained environments.<br>
3.COCO Dataset:<br>
COCO (Common Objects in Context) is a large-scale image dataset containing hundreds of thousands of images across 80 object categories. It is widely used for tasks such as object detection, semantic segmentation, and keypoint detection. The SSD MobileNet V2 COCO model is typically trained on this dataset, enabling it to detect common objects like people, cars, animals, etc.<br>
For a detailed explanation of the ssd_mobilenet_v2_coco model's network structure and operating principles, please refer to the appendix titled "Technical _Specification_of_SSD_MobileNet_V2_COCO_Model.ipynb".<br>
The .engine suffix indicates that this model has been optimized using TensorRT, enabling high-speed inference on NVIDIA Jetson series GPUs.
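
To make the parameter saving of depthwise separable convolutions concrete, the following sketch counts the weights of a single layer both ways. The layer sizes (a 3 × 3 kernel, 32 input channels, 64 output channels) are hypothetical values chosen for illustration and are not taken from the actual MobileNet V2 architecture; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = standard / separable

print(f"standard: {standard}, separable: {separable}, reduction: {reduction:.2f}x")
```

For this example layer the separable form needs roughly one eighth of the weights, which is the kind of saving that makes the network practical on an embedded board.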

In [2]:
def parse_data(file_path):
    items = []
    with open(file_path, 'r', encoding='utf-8') as file:
        item = {}
        for line in file:
            line = line.strip()
            if not line or line == "item {":  # Skip empty lines and start of a new item
                continue
            if line == "}":  # End of current item
                items.append(item)
                item = {}
            else:
                try:
                    key, value = line.split(": ", 1)  # Only split on the first colon
                    key = key.strip()
                    value = value.strip().strip('"')
                    if key == "name" and key in item:  # Handle duplicate 'name' keys
                        item["zh_name"] = value
                    else:
                        item[key] = value
                except ValueError as e:  # Catch lines that don't have a colon
                    print(f"Error processing line: {line} - {e}")
    return items



file_path = 'recognize_objects.json'
data = parse_data(file_path)
#key='zh_name'
#id=88
#print(data[id]['id'],data[88][key])

88 毛毛熊



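This cell reads the category labels and display names of the 80 COCO object categories that the ssd_mobilenet_v2_coco model can recognize. The data file, recognize_objects.json, is located in this directory. Each entry is returned as a dictionary; when an entry contains a second name field, its value is stored under the key zh_name.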
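
The notebook later looks up a label with data[labelv], which relies on the position of an entry in the list matching its id. As a hedged alternative, the sketch below builds a dictionary keyed by the id field, so the lookup still works if some ids are missing from the file. The sample text and the name values are invented for illustration; only the item { ... } layout is taken from the parser above.

```python
# Invented sample in the "item { key: value }" layout that parse_data expects.
SAMPLE = '''
item {
  name: "placeholder_a"
  id: 1
  display_name: "person"
}
item {
  name: "placeholder_b"
  id: 88
  display_name: "teddy bear"
}
'''


def parse_items(text):
    """Parses 'item { ... }' blocks into a list of dictionaries."""
    items, item = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "item {":
            continue  # skip blank lines and block openers
        if line == "}":
            items.append(item)  # block finished
            item = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines without a "key: value" pair
            item[key.strip()] = value.strip().strip('"')
    return items


# Key the entries by their numeric id instead of their list position.
labels_by_id = {int(item["id"]): item for item in parse_items(SAMPLE)}

print(labels_by_id[88]["display_name"])
```

With this structure, a missing id raises a KeyError (or can be handled with labels_by_id.get(...)) instead of silently returning the wrong entry.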

In [3]:
from jetbot import Camera
w=300
h=300
camera = Camera.instance(width=w, height=h)


Import the Camera class from the jetbot Python library, set the image dimensions, and create a Camera instance. The 300 × 300 size matches the input resolution of the ssd_mobilenet_v2_coco model.

In [4]:
detections = model(camera.value)

#print(detections)

Pass the current camera frame (camera.value) to the model (model) and store the result in a variable named detections. The result is a list with one entry per input image; each entry is a list of dictionaries, one per detected object, in the following format:<br>
key="label": int # Class index of the object  <br>
key="confidence": float # Confidence score of the detection  <br>
key="bbox": list(float) # Bounding box (xmin, ymin, xmax, ymax), normalized to the range 0 to 1  <br>
For example:<br>
[  <br>
  [  <br>
    {"label": 1, "confidence": 0.95, "bbox": [0.1, 0.2, 0.5, 0.6]},  // First object (1 represents person)  <br>
    {"label": 3, "confidence": 0.89, "bbox": [0.4, 0.4, 0.7, 0.8]},  // Second object (3 represents car)  <br>
    {"label": 17, "confidence": 0.78, "bbox": [0.3, 0.3, 0.6, 0.7]}   // Third object (17 represents cat)  <br>
  ]  <br>
]
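
The later cells read the result as detections[0], a list of dictionaries with 'label' and 'bbox' keys. Assuming that structure, and assuming each dictionary also carries a 'confidence' score (an assumption about the detector output that this notebook never reads), a minimal sketch for keeping only the confident detections could look like this. The sample values and the 0.5 threshold are illustrative.

```python
# Hand-written sample in the assumed per-image list-of-dictionaries layout.
sample_detections = [[
    {"label": 1, "confidence": 0.95, "bbox": [0.1, 0.2, 0.5, 0.6]},
    {"label": 3, "confidence": 0.89, "bbox": [0.4, 0.4, 0.7, 0.8]},
    {"label": 17, "confidence": 0.30, "bbox": [0.3, 0.3, 0.6, 0.7]},
]]


def confident_detections(detections, image_number=0, threshold=0.5):
    """Returns the detections of one image whose confidence reaches the threshold."""
    return [d for d in detections[image_number] if d["confidence"] >= threshold]


kept = confident_detections(sample_detections)
for det in kept:
    print(det["label"], det["confidence"], det["bbox"])
```

Filtering this way before drawing can reduce flickering boxes from low-confidence detections.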

In [5]:
from IPython.display import display
import ipywidgets.widgets as widgets

detections_widget = widgets.Textarea()

detections_widget.value = str(detections)

#display(detections_widget)

This cell creates a widget for showing the detection results in Jupyter Notebook or another interactive Python environment:<br>

1.from IPython.display import display<br>
This imports the display function from the IPython.display module. The display function is used to render or show widgets or other output content within Jupyter Notebook.<br>

2.import ipywidgets.widgets as widgets<br>
This imports the widgets module from the ipywidgets library. ipywidgets is a package for creating interactive HTML widgets in Jupyter Notebook.<br>

3.detections_widget = widgets.Textarea()<br>
This creates a Textarea widget and assigns it to the variable detections_widget. A Textarea is a multiline text box that allows users to input or display text. In this context, it's being used to display the contents of the detections variable.<br>

4.detections_widget.value = str(detections)<br>
This sets the content of the Textarea widget to the string representation of the detections variable. By using str(detections), the detections variable is converted into a string, which is then displayed as text within the Textarea.<br>

In [6]:
image_number = 0
object_number = 0

#print(detections[image_number][object_number])

Initialize image_number and object_number, the indices used to select one image and one detected object from detections.

In [8]:
from jetbot import Robot

robot = Robot()


from jetbot import Robot:<br>
Imports the Robot class from the jetbot package. The jetbot package is specifically designed for the JetBot robotics platform, encompassing a range of classes and functions for controlling the robot, processing camera inputs, conducting image recognition, and more.<br>
robot = Robot():<br>
Creates an instance of the Robot class and assigns it to the variable robot. Through this instance, you can access the methods and attributes defined in the Robot class and control the movement of the JetBot robot. The Robot() constructor initializes the hardware interfaces the robot needs for driving (such as the motor driver) and applies some basic configuration.<br>

In [None]:
from jetbot import bgr8_to_jpeg
import cv2
import numpy as np
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi',fourcc,20.0,(300,300))
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 0.5
text_color = (255,255,255)
thickness = 2
#blocked_widget = widgets.FloatSlider(min=0.0, max=1.0, value=0.0, description='blocked')
image_widget = widgets.Image(format='jpeg', width=w, height=h)
label_widget = widgets.IntText(value=1, description='tracked label')
speed_widget = widgets.FloatSlider(value=0.4, min=0.0, max=1.0, description='speed')
turn_gain_widget = widgets.FloatSlider(value=0.8, min=0.0, max=2.0, description='turn gain')
display(widgets.VBox([
    widgets.HBox([image_widget]),#, blocked_widget]),
    label_widget,
    speed_widget,
    turn_gain_widget
]))

width = int(image_widget.width)
height = int(image_widget.height)




The code above prepares the video output and defines the ipywidgets widgets:<br>
fourcc and out create an OpenCV VideoWriter that saves the processed frames to output.avi (XVID codec, 20 frames per second, 300 × 300 pixels)<br>
font, font_scale, text_color, and thickness set the style of the text labels drawn on the image<br>
image_widget for displaying images<br>
label_widget for entering the class index of the object to track<br>
speed_widget for controlling the movement speed of a vehicle<br>
turn_gain_widget for controlling the turning speed or gain of a vehicle<br>
The display(widgets.VBox([...])) call shows how to organize the widget layout with VBox (Vertical Box) and HBox (Horizontal Box).<br>
display function: Renders an object in the output cell of Jupyter Notebook or JupyterLab. Here, it shows a VBox widget that contains the other widgets.<br>
widgets.VBox: A container class in the ipywidgets library for vertical layout. It takes a list of the widgets to be arranged from top to bottom.<br>
widgets.HBox([image_widget]): A container for arranging widgets horizontally. In this example it holds a single widget, the image_widget; the commented-out blocked_widget would have been placed beside it.<br>
Layout: The VBox arranges the widgets in the following order: first the HBox containing image_widget, followed by label_widget, speed_widget, and turn_gain_widget.<br>
Finally, width and height are read from image_widget and converted to integers so that normalized bounding box coordinates can later be scaled to pixel positions.<br>


In [None]:
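import cv2
import numpy as np

def rgb_to_jpeg(rgb_image):
    """Encodes an image as JPEG bytes; cv2.imencode reads the channels as BGR"""
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 90]  # set the JPEG quality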
    result,encoded_img = cv2.imencode('.jpg',rgb_image,encode_param)
    if result:
        return encoded_img.tobytes()
    else:
        raise ValueError("Image encoding failed")
        
        
def detection_center(detection):
    """Computes the center x, y coordinates of the object"""
    bbox = detection['bbox']
    center_x = (bbox[0] + bbox[2]) / 2.0 - 0.5
    center_y = (bbox[1] + bbox[3]) / 2.0 - 0.5
    return (center_x, center_y)
    
def norm(vec):
    """Computes the length of the 2D vector"""
    return np.sqrt(vec[0]**2 + vec[1]**2)

def closest_detection(detections):
    """Finds the detection closest to the image center"""
    closest_detection = None
    for det in detections:
        center = detection_center(det)
        if closest_detection is None:
            closest_detection = det
        elif norm(detection_center(det)) < norm(detection_center(closest_detection)):
            closest_detection = det
    return closest_detection



1.The rgb_to_jpeg function accepts an image (a NumPy array) as input and returns its JPEG-encoded byte stream.<br>
Setting the JPEG quality:<br>
cv2.IMWRITE_JPEG_QUALITY is an OpenCV flag that specifies the quality used when encoding JPEG images. The value 90 lies on a scale from 0 (worst quality, smallest file size) to 100 (best quality, largest file size).<br>
Encoding the image:<br>
cv2.imencode encodes the image into JPEG format. OpenCV interprets the channels as BGR rather than RGB, so despite the function's name, the array passed in should be in BGR order; an RGB array would need to be converted to BGR first.<br>
Handling the result:<br>
cv2.imencode returns a boolean and the encoded image. The boolean indicates whether the encoding succeeded; if it did, the function returns the encoded data as bytes, otherwise it raises a ValueError.<br>
2.detection_center(detection) function<br>
This function takes a detection object (detection) that includes a key 'bbox', whose value is a list or tuple of four elements describing the bounding box. The arithmetic in the function, and the drawing code in execute that uses the same values as two corner points, treat these elements as (x_min, y_min, x_max, y_max), normalized to the range 0 to 1.<br>
The function calculates the center of the box as ((x_min + x_max) / 2, (y_min + y_max) / 2) and then subtracts 0.5 from each coordinate. Because the coordinates are normalized, (0.5, 0.5) is the center of the image, so the subtraction moves the origin to the image center. The result is the offset of the object from the center of the field of view, with each component between -0.5 and 0.5.<br>
3.norm(vec) function<br>
This function takes a 2D vector (vec) as input and returns its length (Euclidean norm), computed as the square root of the sum of the squared components, i.e., np.sqrt(vec[0]**2 + vec[1]**2). Applied to the output of detection_center, it gives the distance from the center of a detected object to the center of the image.<br>
4.closest_detection(detections) function<br>
This function takes a list of detection objects (detections) as input and finds the one whose center is closest to the center of the image.
It iterates through the detection objects, computes the distance from each center to the image center, and keeps the detection with the smallest distance.<br>
Note: If the detections list is empty, the function returns None, which is why the caller checks the result before using it. If several detections are equally close to the center, the first one encountered in the list is returned.<br>
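
As a worked example of these helpers, the sketch below restates detection_center and closest_detection in standard-library Python and applies them to two hand-written detections. It assumes, as the arithmetic in the cell above suggests, that bbox holds (x_min, y_min, x_max, y_max) normalized to the range 0 to 1; the sample boxes are invented for illustration.

```python
import math


def detection_center(detection):
    """Returns the box center as an offset from the image center."""
    x_min, y_min, x_max, y_max = detection["bbox"]
    return ((x_min + x_max) / 2.0 - 0.5, (y_min + y_max) / 2.0 - 0.5)


def closest_detection(detections):
    """Returns the detection nearest the image center, or None if the list is empty."""
    return min(
        detections,
        key=lambda det: math.hypot(*detection_center(det)),
        default=None,
    )


# Invented sample boxes: one in the top-left quadrant, one centered.
samples = [
    {"label": 1, "bbox": [0.0, 0.0, 0.5, 0.5]},
    {"label": 1, "bbox": [0.25, 0.25, 0.75, 0.75]},
]

print(detection_center(samples[0]))   # offset of the top-left box
print(closest_detection(samples))     # the centered box
print(closest_detection([]))          # empty input
```

Using min with default=None keeps the same behavior as the loop in the notebook: an empty list yields None, and the first of several equally close detections wins.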

In [None]:


from PIL import Image,ImageDraw,ImageFont
font_path = "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc"
zh_font = ImageFont.truetype(font_path,13)        
def execute(change):
    labels=[]
    image = change['new']
   
    # compute all detected objects
    detections = model(image)

    # draw all detections on image
    for det in detections[0]:
        bbox = det['bbox']
        labelv = det['label']
        text = data[labelv]['display_name']
        #labels.append(det['id'])
        #print(data[labelv][key])
        cv2.rectangle(image, (int(width * bbox[0]), int(height * bbox[1])), (int(width * bbox[2]), 
                                                                      int(height * bbox[3])), (0, 0, 0), 1)
        cv2.putText(image,text,(int(width * bbox[0]),int(height * bbox[1])),font,font_scale,text_color,thickness)
        '''
        frame_pil = Image.fromarray(cv2.cvtColor(image,cv2.COLOR_BGR2RGB))
        draw = ImageDraw.Draw(frame_pil)
        draw.text((int(width * bbox[0]),int(height * bbox[1])),text,font=zh_font,fill=(255,255,255,1))
        frame_with_text = cv2.cvtColor(np.array(frame_pil),cv2.COLOR_RGB2BGR)
        '''
    # select detections that match selected class label
    matching_detections = [d for d in detections[0] if d['label'] == int(label_widget.value)]
    
    # get detection closest to center of field of view and draw it
    det = closest_detection(matching_detections)
    
   
    if det is not None:
        bbox = det['bbox']
        #cv2.rectangle(image, (int(width * bbox[0]), int(height * bbox[1])), (int(width * bbox[2]), 
        #int(height * bbox[3])), (0,0,0), 1)
        
    rgb_image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
    image_widget.value = rgb_to_jpeg(image)
    out.write(rgb_image)
    
execute({'new': camera.value})


This cell defines execute, the function that processes each camera frame: it runs the detector and uses OpenCV (cv2) to draw a bounding box and a label for every detected object. The PIL imports and zh_font support an alternative way of drawing text with a CJK font, which is left disabled inside the triple-quoted string.<br>
execute(change)<br>
Purpose: Receives a new camera frame, runs object detection on it, and draws the bounding boxes and labels of the detected objects on the image.<br>
Parameters: change is a dictionary whose 'new' key holds the image as a NumPy array.<br>
Process:<br>
Initialization: Creates an empty list labels (not used afterwards) and retrieves the image stored under the 'new' key of the change dictionary.<br>
Object Detection: Calls model, the ObjectDetector created in the first cell from 'ssd_mobilenet_v2_coco.engine', on the image and stores the detection results.<br>
Drawing Detection Results:<br>
Iterates through each object detected in the first (and only) image, detections[0].<br>
Looks up the display name for the object's label in data and reads its bounding box (bbox).<br>
Draws the rectangle and the text with cv2.rectangle and cv2.putText. The bounding box coordinates are normalized, so they are multiplied by width and height, the dimensions of image_widget, to obtain pixel positions.<br>
Filtering Results by Category:<br>
Keeps only the detections whose label equals label_widget.value, the class index entered in the "tracked label" widget.<br>
Finding the Closest Detection to the Center:<br>
Calls closest_detection, defined in the previous cell, to pick the filtered detection whose center is closest to the center of the image; the result is None when nothing matches.<br>
Handling the Selected Detection (if found):<br>
Reads its bounding box; the code that would draw it is commented out.<br>
Converting and Displaying the Image:<br>
cv2.cvtColor creates an RGB copy of the BGR image, named rgb_image.<br>
image_widget.value receives the JPEG-encoded byte stream that rgb_to_jpeg produces from the original BGR image, which updates the picture shown in the notebook.<br>
out, the cv2.VideoWriter created earlier, appends rgb_image to output.avi. cv2.VideoWriter expects BGR frames, so the red and blue channels of the recorded video are swapped relative to the live view; writing image instead would likely keep the colors consistent.<br>
The final line, execute({'new': camera.value}), calls the function once on the current frame to check that it works.
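
The scaling from a normalized bounding box to the pixel corner points passed to cv2.rectangle can be isolated in a small helper. This is a minimal sketch that mirrors the int(width * bbox[0]) arithmetic used in execute; the function name bbox_to_pixels and the sample box are hypothetical.

```python
def bbox_to_pixels(bbox, width, height):
    """Converts a normalized (x_min, y_min, x_max, y_max) box to two pixel corners."""
    x_min, y_min, x_max, y_max = bbox
    top_left = (int(width * x_min), int(height * y_min))
    bottom_right = (int(width * x_max), int(height * y_max))
    return top_left, bottom_right


# Invented sample box on a 300 x 300 image.
top_left, bottom_right = bbox_to_pixels([0.25, 0.5, 0.75, 1.0], 300, 300)
print(top_left, bottom_right)
```

Both corner tuples can be passed directly as the two point arguments of cv2.rectangle, and top_left can serve as the text origin for cv2.putText.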


<video width="600" height="400" controls>
  <source src="/Users/lichengtong/Jetson旋转式物体识别/识别结果.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

In [21]:
robot.left(speed=0.075)
robot.right(speed=-0.075)

Set the rotation speed of the car and start rotating it.

Run the cell below to connect the execute function to each camera frame update.

In [22]:
camera.unobserve_all()
camera.observe(execute, names='value')

<video width="600" height="400" controls>
  <source src="/Users/lichengtong/Jetson旋转式物体识别/识别现场.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>


camera.unobserve_all()<br>
Removes all previously bound callbacks from the camera object. The Camera class is built on the traitlets library, in which the .observe() method binds a callback function (such as execute) to an attribute so that the function runs whenever the attribute changes. Calling .unobserve_all() first removes any existing callbacks, which avoids running execute twice per frame if this cell is executed again.<br>
camera.observe(execute, names='value')<br>
Binds the execute function as an observer of the camera object, so that execute is called automatically whenever the value attribute changes. The value attribute holds the most recently captured image or video frame.<br>
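
The observer pattern behind observe and unobserve_all can be illustrated without any robot hardware. The FrameSource class below is a hypothetical stand-in written only for this sketch; it is not the jetbot Camera class and only imitates the shape of the change dictionary (with a 'new' key) that execute receives.

```python
class FrameSource:
    """Minimal stand-in for an observable camera (not the jetbot Camera class)."""

    def __init__(self):
        self._value = None
        self._callbacks = []

    def observe(self, callback, names="value"):
        # names is accepted for symmetry with the notebook call; this sketch
        # only has one observable attribute, so it is not used further.
        self._callbacks.append(callback)

    def unobserve_all(self):
        self._callbacks.clear()

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_frame):
        old_frame, self._value = self._value, new_frame
        # Notify every registered callback with a change dictionary.
        for callback in list(self._callbacks):
            callback({"name": "value", "old": old_frame, "new": new_frame})


seen = []


def record(change):
    """Plays the role of execute: receives the new frame through change['new']."""
    seen.append(change["new"])


source = FrameSource()
source.observe(record, names="value")
source.value = "frame-1"   # record runs
source.unobserve_all()
source.value = "frame-2"   # no callback is registered any more

print(seen)
```

After unobserve_all, the value still updates but no callback runs, which is why the notebook calls it both before rebinding execute and during shutdown.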


In [23]:
import time
out.release()
camera.unobserve_all()
time.sleep(1.0)
robot.stop()

out.release() releases the video output and closes the video file.<br>
camera.unobserve_all()<br>
Removes the execute callback from the camera object so that new frames are no longer processed.<br>
time.sleep(1.0)<br>
Waits one second so that a callback already in progress can finish.<br>
robot.stop()<br>
Stops both motors, ending the rotation started earlier.<br>