Group 9
Instructions
- Run JavaScript in the browser to capture a frame from the webcam.
- Convert the returned base64 string to an OpenCV image: a NumPy ndarray with shape (h, w, c) in RGB order.
- Run YOLOv4 to detect objects in the returned frame; we keep only detections with the person label (class 0).
- Iterate through all detected persons and crop the original image, keeping only the region inside each bounding box.
- Use MediaPipe to infer hand landmarks (key points) in each cropped person image.
- Build a KNN model for hand pose recognition. We built a small dataset with 10 different hand gestures: eight_sign, five_sign, four_sign, ok, one_sign, six_sign, spider, ten_sign, three_sign, two_sign.
- To accelerate inference, an embedding method is introduced: the landmark dimensionality is reduced from 21 points to 5 points.
- When the detected person performs the target hand gesture, we render an alpha image with only the bounding boxes.
This is how we detect the Point of Interest (POI).
Based on milestone 1, we added the DeepSORT algorithm for tracking the point of interest.
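The KNN hand-pose classification over the 5-dim embeddings can be sketched as below. This is a minimal illustration under assumptions: `knn_predict` is not the project's actual API, and the layout of arrays `X` (n samples × 5 dims) and `y` (labels) may differ from what `dataset_embedded.npz` really stores.

```python
import numpy as np


def knn_predict(query: np.ndarray, X: np.ndarray, y: np.ndarray, k: int = 5) -> str:
    """Classify a 5-dim hand embedding by majority vote over its k nearest neighbors."""
    dists = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every sample
    nearest = y[np.argsort(dists)[:k]]          # labels of the k closest samples
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]            # majority label wins
```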
A Note on Coordinates
For an OpenCV image, dimension 0 is the y-axis, dimension 1 is the x-axis, and dimension 2 is the channel.
- xywh -> xc, yc, width, height
- xyxy -> left, top, right, bottom
- xyah -> xc, yc, w/h, height
- xysr -> xc, yc, square, h/w
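The conversions between these box formats can be captured with small helpers (the function names here are illustrative):

```python
def xywh_to_xyxy(box):
    """(xc, yc, width, height) -> (left, top, right, bottom)."""
    xc, yc, w, h = box
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def xyxy_to_xyah(box):
    """(left, top, right, bottom) -> (xc, yc, aspect ratio w/h, height)."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    return ((left + right) / 2, (top + bottom) / 2, w / h, h)
```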
Instructions
First, run the cells one by one. All required files have been uploaded to Google Drive; use gdown
with an id
parameter to download a given file from Google Drive.
For example:
gdown 1dWOhStdDXK_kBefa9t9hDYLZ6kyrBwgP
By showing our app the target hand gesture, you become the POI (point of interest); the app will then keep tracking you whether you are in or out of the camera view. If you stay out of view for too long, the app counts your leaving time, and once the limit is reached it re-initializes and tries to find a new POI.
Here you can choose the gesture and tune the maximum leaving time by giving the variables target_pose
and max_count
new values (see the cell below).
Supported hand poses are eight_sign, five_sign, four_sign, ok, one_sign, six_sign, spider, ten_sign, three_sign, two_sign.
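For example, the configuration cell could look like this (the variable names come from the notebook; the values below are illustrative, not defaults):

```python
# Gesture that selects the POI; must be one of the ten supported poses.
target_pose = "spider"

# How long a lost POI is tolerated before re-initialization
# (interpreted as a frame count here; the actual unit is set by the app loop).
max_count = 60
```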
Our milestone 1 & 2 code is quite lightweight. 🚀 Enjoy 🍻!
In milestones 1 & 2, we implemented Object Detection, Keypoint Detection, KNN Classification, and Multi-Object Tracking (MOT).
Code Structure
milestone3
│───deepsort.py
│───detector.py
│───client.py
│───requirements.txt
│
│───hand_knn
│ │───embedder.py
│ │───hand_detect.py
│ └───dataset_embedded.npz # KNN embedding data set with 5-dim and 10 classes
│
│───deep
│ │───fastreid
│ │───checkpoint # Download the checkpoint first
│ │───feature_extractor.py
│ └───.....
│
└───sort
│───detection.py
│───track.py
│───tracker.py
└───.....
- client.py is the main interface of our application; it receives frames from Loomo.
- detector.py is the core part of our application. It contains a forward() function that processes the frames from the client and returns the tracked points (x, y) and the flags. Note that each of them is a Python list.
- deep/checkpoint is the directory for the fast-reid checkpoint (weights); here is the link: Link to fast-ReID checkpoint.
- hand_knn/dataset_embedded.npz is the embedded dataset with only 5 dims (21 landmarks --> 5 representations).
- requirements.txt lists all the packages used in our project. Please build a new Python environment from this file to avoid environment configuration errors.
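A typical way to set up such an environment is with a virtual environment and pip (a sketch; conda would work just as well):

```shell
# Create and activate an isolated environment, then install the pinned packages.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```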