Group 9
Instructions
- Run JavaScript in the browser to capture a frame from the webcam.
- Convert the returned base64 string to an OpenCV image: a NumPy ndarray with shape (h, w, c) in RGB order.
- Run YOLOv4 to detect objects in the returned frame; we keep only detections with the person label (class 0).
- Iterate through all detected persons and crop the original image, keeping only the region inside each bounding box.
- Use MediaPipe to infer hand landmarks (key points) in each cropped person image.
- Build a KNN model for hand pose recognition. We built a small dataset with 10 different hand gestures: eight_sign, five_sign, four_sign, ok, one_sign, six_sign, spider, ten_sign, three_sign, two_sign.
- To accelerate inference, an embedding method is introduced: the landmark dimensionality is reduced from 21 points to 5 points.
- When the detected person performs the target hand gesture, we render an alpha image with only the bounding boxes.
This is how we detect the Point of Interest (POI).
Based on milestone 1, we added the DeepSORT algorithm for tracking the point of interest.
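The KNN hand-pose classification over the 5-dim embeddings can be sketched as below. This is a minimal illustration under assumptions: `knn_predict` is not the project's actual API, and the layout of arrays `X` (n samples × 5 dims) and `y` (labels) may differ from what `dataset_embedded.npz` really stores.

```python
import numpy as np


def knn_predict(query: np.ndarray, X: np.ndarray, y: np.ndarray, k: int = 5) -> str:
    """Classify a 5-dim hand embedding by majority vote over its k nearest neighbors."""
    dists = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every sample
    nearest = y[np.argsort(dists)[:k]]          # labels of the k closest samples
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]            # majority label wins
```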
A Note on Coordinates
For an OpenCV image, dimension 0 is the y-axis, dimension 1 is the x-axis, and dimension 2 is the channel.
- xywh -> xc, yc, width, height
- xyxy -> left, top, right, bottom
- xyah -> xc, yc, w/h, height
- xysr -> xc, yc, square, h/w
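The conversions between these box formats can be captured with small helpers (the function names here are illustrative):

```python
def xywh_to_xyxy(box):
    """(xc, yc, width, height) -> (left, top, right, bottom)."""
    xc, yc, w, h = box
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def xyxy_to_xyah(box):
    """(left, top, right, bottom) -> (xc, yc, aspect ratio w/h, height)."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    return ((left + right) / 2, (top + bottom) / 2, w / h, h)
```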
Instructions
First, run the cells one by one. All required files have been uploaded to Google Drive; use gdown
with an id
parameter to download a given file from Google Drive.
For example:
gdown 1dWOhStdDXK_kBefa9t9hDYLZ6kyrBwgP
By showing our app the target hand gesture, you become the POI (point of interest); the app will then keep tracking you whether you are in or out of the camera view. If you stay out of view for too long, the app counts your leaving time, and once the limit is reached it re-initializes and tries to find a new POI.
Here you can choose the gesture and tune the maximum leaving time by giving the variables target_pose
and max_count
new values (see the cell below).
Supported hand poses are eight_sign, five_sign, four_sign, ok, one_sign, six_sign, spider, ten_sign, three_sign, two_sign.
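For example, the configuration cell could look like this (the variable names come from the notebook; the values below are illustrative, not defaults):

```python
# Gesture that selects the POI; must be one of the ten supported poses.
target_pose = "spider"

# How long a lost POI is tolerated before re-initialization
# (interpreted as a frame count here; the actual unit is set by the app loop).
max_count = 60
```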
Our milestone 1 & 2 code is quite lightweight. 🚀 Enjoy 🍻!
In milestones 1 & 2, we implemented Object Detection, Keypoint Detection, KNN Classification, and Multi-Object Tracking (MOT).
Code Structure
milestone3
│───deepsort.py
│───detector.py
│───client.py
│───requirements.txt
│
│───hand_knn
│ │───embedder.py
│ │───hand_detect.py
│ └───dataset_embedded.npz # KNN embedding data set with 5-dim and 10 classes
│
│───deep
│ │───fastreid
│ │───checkpoint # Download the checkpoint first
│ │───feature_extractor.py
│ └───.....
│
└───sort
│───detection.py
│───track.py
│───tracker.py
└───.....
- client.py is the main interface of our application; it receives frames from Loomo.
- detector.py is the core part of our application. It contains a forward() function that processes the frames from the client and returns the tracked points (x, y) and the flags. Note that each of them is a Python list.
- deep/checkpoint is the directory for the fast-reid checkpoint (weights); here is the link: Link to fast-ReID checkpoint.
- hand_knn/dataset_embedded.npz is the embedded dataset with only 5 dims (21 landmarks --> 5 representations).
- requirements.txt lists all the packages used in our project. Please build a new Python environment from this file to avoid environment configuration errors.
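A typical way to set up such an environment is with a virtual environment and pip (a sketch; conda would work just as well):

```shell
# Create and activate an isolated environment, then install the pinned packages.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```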