This project aims to build an AI system that recognizes user gestures to control home appliances (e.g., laptop, TV, audio) in real-life scenarios. With this code, a Jetson Nano recognizes the user's hand and controls the operating system through gestures. The recognized gestures map to the following actions: mute, unmute, screen capture, and shutdown.
- Update and upgrade APT and pip
```
sudo apt-get update
sudo apt-get upgrade
pip3 install --upgrade pip
```
- Download JetPack from the NVIDIA JetPack SDK
- We used version 4.6: https://developer.nvidia.com/embedded/jetpack-sdk-46
- Create a boot image from JetPack (we used balenaEtcher)
- Install PyTorch and torchvision (https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048)
- We used PyTorch v1.10.0 and torchvision v0.11.0 (the versions the commands below install)
- If you hit a "PIL deprecated" error, install Pillow < 7 (the `pip3 install 'pillow<7'` step below)
- PyTorch install
```
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev libomp-dev
pip3 install Cython
pip3 install numpy torch-1.10.0-cp36-cp36m-linux_aarch64.whl
```
- torchvision install
```
sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
git clone --branch v0.11.0 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.11.0
python3 setup.py install --user
pip3 install 'pillow<7'
```
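To sanity-check the PyTorch and torchvision install, you can run a quick version/CUDA probe:

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())
```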
- Install Mediapipe prerequisites
```
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev \
    zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
sudo pip3 install -U pip testresources setuptools==49.6.0
sudo pip3 install -U --no-deps numpy==1.19.4 future==0.18.2 mock==3.0.5 \
    keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 \
    protobuf pybind11 cython pkgconfig
sudo env H5PY_SETUP_REQUIRES=0 pip3 install -U h5py==3.1.0
```
- Install OpenCV
```
sudo apt-get install python3-opencv
```
- Test the OpenCV build
```python
import cv2
print(cv2.getBuildInformation())
```
Check for GStreamer support in the `VIDEO I/O` section of the output:
```
GStreamer: YES (1.14.5)
```
- Increase swap space for more usable memory
```
git clone https://github.com/JetsonHacksNano/installSwapfile.git
cd installSwapfile
./installSwapfile.sh
```
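Once the script completes, you can confirm the added swap with:

```
free -h
```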
- Install Mediapipe
```
sudo apt-get install -y libopencv-core-dev libopencv-highgui-dev libopencv-calib3d-dev libopencv-features2d-dev libopencv-imgproc-dev libopencv-video-dev
sudo chmod 744 setup_opencv.sh
./setup_opencv.sh
sudo pip3 install opencv_contrib_python
```
- Download the files: https://drive.google.com/file/d/1lHr9Krznst1ugLF_ElWGCNi_Y4AmEexx/view?usp=sharing
```
unzip mediapipe-bin.zip
cd mediapipe-bin
sudo pip3 install numpy-1.19.4-cp36-none-manylinux2014_aarch64.whl mediapipe-0.8.5_cuda102-cp36-none-linux_aarch64.whl
pip3 install dataclasses
```
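A quick import smoke test verifies the wheel installed correctly:

```python
import mediapipe as mp

# Should import without errors and expose the Hands solution
print(mp.solutions.hands.Hands)
```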
- (Optional) If you want to use a CSI camera as the video source, install OpenCV using this method
- Reference: https://github.com/AastaNV/JEP/tree/master/script
```
wget https://raw.githubusercontent.com/AastaNV/JEP/master/script/install_opencv4.5.0_Jetson.sh
./install_opencv4.5.0_Jetson.sh
```
- The K-Nearest Neighbors (K-NN) algorithm is one of the simplest machine learning classification algorithms: it classifies a sample by assuming that data with similar features tend to fall into similar categories. A sketch of how the classifier could be trained is shown below.
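A minimal training sketch using OpenCV's `ml` module; the `gesture_train.csv` filename and its layout (15 joint angles per row followed by a gesture label) are assumptions, not part of this repo:

```python
import cv2
import numpy as np

# Hypothetical training data: 15 joint angles per row, label in the last column
file = np.genfromtxt('gesture_train.csv', delimiter=',')
angle = file[:, :-1].astype(np.float32)
label = file[:, -1].astype(np.float32)

# Train the OpenCV K-NN classifier queried by knn.findNearest() below
knn = cv2.ml.KNearest_create()
knn.train(angle, cv2.ml.ROW_SAMPLE, label)
```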
- Code for socket communication with the server
```python
from socket import *

# Connect to the server running on the controlled machine
clientSock = socket(AF_INET, SOCK_STREAM)
clientSock.connect(('172.20.10.3', 7777))

# Gesture index -> action label
rps_gesture = {0: 'mute', 5: 'unmute', 9: 'capture', 10: 'shutdown'}
```
- Code for live video capture
```python
cap = cv2.VideoCapture('nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, format=(string)NV12, framerate=(fraction)20/1 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw, format=(string)BGR ! appsink', cv2.CAP_GSTREAMER)
```
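This GStreamer pipeline targets the CSI camera; if you use a USB webcam instead, a plain device index should work as a drop-in (index 0 is an assumption):

```python
# Hypothetical fallback: first V4L2 device instead of the CSI pipeline
cap = cv2.VideoCapture(0)
```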
- Run while the live capture continues
```python
while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("can't open video")
        continue
```
- Compute the angles between joints, get each angle from the arccosine of the dot product of the normalized joint vectors, then send the numeric code for the recognized gesture to the server as a UTF-8 string over the socket. The block references objects initialized earlier in the script; a setup sketch follows.
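A minimal sketch of the setup for `results`, `knn`, `mp_hands`, and `mp_drawing`, assuming the standard MediaPipe Hands API (the parameter values here are assumptions):

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(max_num_hands=1,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5)

# Inside the capture loop: MediaPipe expects RGB input
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = hands.process(image)
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
```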
- Code for gesture recognition and sending commands to the server
```python
if results.multi_hand_landmarks is not None:
    for res in results.multi_hand_landmarks:
        # Collect the 21 hand-landmark coordinates
        joint = np.zeros((21, 3))
        for j, lm in enumerate(res.landmark):
            joint[j] = [lm.x, lm.y, lm.z]

        # Compute angles between joints
        v1 = joint[[0,1,2,3,0,5,6,7,0,9,10,11,0,13,14,15,0,17,18,19], :]  # Parent joint
        v2 = joint[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], :]  # Child joint
        v = v2 - v1  # [20, 3]
        # Normalize v
        v = v / np.linalg.norm(v, axis=1)[:, np.newaxis]

        # Get angle using arccos of dot product
        angle = np.arccos(np.einsum('nt,nt->n',
            v[[0,1,2,4,5,6,8,9,10,12,13,14,16,17,18], :],
            v[[1,2,3,5,6,7,9,10,11,13,14,15,17,18,19], :]))  # [15,]
        angle = np.degrees(angle)  # Convert radians to degrees

        # Infer the gesture with K-NN
        data = np.array([angle], dtype=np.float32)
        ret, result, neighbours, dist = knn.findNearest(data, 4)
        idx = int(result[0][0])

        # Draw the gesture result
        if idx in rps_gesture.keys():
            print("idx : ", idx)
            if idx == 0:
                print("mute")
                # Send the numeric code to the server as UTF-8 over the socket
                clientSock.send(str(1111).encode('utf-8'))
            elif idx == 5:
                print("unmute")
                clientSock.send(str(2222).encode('utf-8'))
            elif idx == 9:
                print("capture")
                clientSock.send(str(3333).encode('utf-8'))
            elif idx == 10:
                print("shutdown")
                clientSock.send(str(4444).encode('utf-8'))

            # Draw the gesture label near the wrist landmark
            cv2.putText(image, text=rps_gesture[idx].upper(),
                        org=(int(res.landmark[0].x * image.shape[1]),
                             int(res.landmark[0].y * image.shape[0] + 20)),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=1,
                        color=(255, 255, 255), thickness=2)
            # Other gestures
            # cv2.putText(img, text=gesture[idx].upper(), org=(int(res.landmark[0].x * img.shape[1]), int(res.landmark[0].y * img.shape[0] + 20)), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=1, color=(255, 255, 255), thickness=2)

        mp_drawing.draw_landmarks(image, res, mp_hands.HAND_CONNECTIONS)
```
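For reference, the numeric codes exchanged over the socket (each is sent as a 4-character UTF-8 string, which is why the server reads with `recv(4)`):
- `1111` → mute (gesture index 0)
- `2222` → unmute (gesture index 5)
- `3333` → screen capture (gesture index 9)
- `4444` → shutdown (gesture index 10)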
- Code for socket communication with the client (this runs on the controlled machine; the `osascript` calls target macOS)
```python
import os
import time
from socket import *

import osascript  # pip3 install osascript

def receive_data(val):
    # Listen for the Jetson Nano client on port 7777
    serverSock = socket(AF_INET, SOCK_STREAM)
    serverSock.bind(('', 7777))
    serverSock.listen(1)
    connectionSock, addr = serverSock.accept()
    print("Client address : ", str(addr))
```
- Decode the numeric codes received from the client and perform the corresponding actions
```python
    # (continues inside receive_data)
    while True:
        print("val : ", val.value)
        try:
            # Each command arrives as a 4-character numeric code
            vol = int(connectionSock.recv(4).decode('utf-8'))
            if vol == 1111:
                print("mute")
                osascript.osascript('set volume output muted TRUE')
                val.value = 0
                # Stay muted until the unmute code arrives
                while True:
                    vol = int(connectionSock.recv(4).decode('utf-8'))
                    if vol == 2222:
                        osascript.osascript('set volume output muted FALSE')
                        break
            if vol == 3333:
                print("screenshot")
                os.system("screencapture screen.png")
                vol = 0
            if vol == 4444:
                print("shutdown")
                osascript.osascript('tell app "System Events" to shut down')
                time.sleep(5)
            if vol < 300:
                # Store small codes in the shared value (e.g., a volume level)
                val.value = vol
        except:
            # Ignore malformed packets and keep listening
            pass
```
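For completeness, a sketch of how `receive_data` might be driven; the use of `multiprocessing.Value` is an assumption inferred from the `val.value` accesses:

```python
import multiprocessing

if __name__ == '__main__':
    # Shared value that receive_data updates with incoming codes
    val = multiprocessing.Value('i', 0)
    p = multiprocessing.Process(target=receive_data, args=(val,))
    p.start()
    p.join()
```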


