LFX Workspace: A Rust crate for the YOLO family of object detection models #2768
Comments
I have forked.
Excellent. Thanks!
Week 1 update:
- Added support for two new functions to the WasmEdge `opencvmini` plugin.
- Added support for the two functions in the secondstate Rust `opencvmini` SDK.
- Began creating an example of using YOLO with the WASI-NN proposal.

Proposed crate dependencies for the YOLO crate. Design thoughts:

```rust
let yolo = YoloModel::new()
    .network(net_bytes, ModelTypeEnum)
    .class_names(Vec<String>)
    .build()
    .unwrap(); // With a useful error message related to the failure.

let classes = yolo.infer_image(image_bytes);
let classes2 = yolo.infer_image(image2_bytes);
let video_classes = yolo.infer_video(video_bytes);
```

Plan for the following week:
Update Weeks 2 + 3
Plan for the next 3-6 weeks.
|
Week 4 update.
The plan this week is to use FFmpeg to parse video frames, run detection on each frame individually, then reassemble the video.
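The per-frame plan above can be sketched as a simple map over decoded frames. Everything here is a stub: `Frame`, `detect`, and the shape of the pipeline are assumptions for illustration; the real version would decode and re-encode via FFmpeg.

```rust
// Minimal sketch of the decode -> detect -> reassemble plan; all stubs.
struct Frame(Vec<u8>);

// Stand-in for YOLO inference on one decoded frame.
fn detect(frame: &Frame) -> Vec<String> {
    if frame.0.is_empty() { vec![] } else { vec!["person".to_string()] }
}

// Frames would come from FFmpeg decoding; the labelled frames would then
// be re-encoded into the output video.
fn process_video(frames: Vec<Frame>) -> Vec<Vec<String>> {
    frames.iter().map(detect).collect()
}

fn main() {
    let frames = vec![Frame(vec![0u8; 4]), Frame(vec![])];
    let labels = process_video(frames);
    println!("{labels:?}"); // prints [["person"], []]
}
```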
Week 5 + 6 update: Spent most of this time evaluating video-processing options and attempting to build proof-of-concept applications to process video for yolo-rs.

After discussions with the community and @juntao, we decided that, due to the memory limitations of a wasm32 application, the largest video that could be processed would need to fit into instance memory (<4 GB). It may be better to develop a plugin that the YOLO crate can call out to for video processing; this path gives us native performance as well as more options for enabling hardware-specific optimizations.

Second plan: develop a WasmEdge plugin for the YOLO crate purely for video processing using FFmpeg.
This video-processing plugin exists at Charles-Schleich/wasmedge-yolo-rs-video-processing-plugin. I plan on pursuing video processing inside pure Wasm after getting a fully working plugin for the yolo-rs crate at Charles-Schleich/yolo-rs.
Week 7 update: Worked entirely this week on the video-processing plugin for YOLO.

```rust
mod plugin {
    type FramesCount = i32;
    // 0 corresponds to Ok; values > 0 map to the equivalent of an error enum.
    type HostResultType = i32;

    #[link(wasm_import_module = "yolo-video-proc")]
    extern "C" {
        pub fn load_video_to_host_memory(
            str_ptr: i32,
            str_len: i32,
            str_capacity: i32,
            width_ptr: *mut u32,
            height_ptr: *mut u32,
        ) -> FramesCount;

        pub fn get_frame(
            frame_index: i32,
            image_buf_ptr: i32,
            image_buf_len: i32,
            image_buf_capacity: i32,
        ) -> i32;

        pub fn write_frame(frame_index: i32, image_buf_ptr: i32, image_buf_len: i32) -> i32;

        pub fn assemble_output_frames_to_video(
            str_ptr: i32,
            str_len: i32,
            str_capacity: i32,
        ) -> FramesCount;
    }
}
```

Explanation of each function. I am currently working with videos that have just a single video stream. End of week 7 progress report.
Interesting findings: I spent some time creating really low-resolution videos (2×2 px, 5×5 px, and 10×10 px) so I could validate the format of the bytes received from decoding the stream.
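To make the calling convention of the host functions above concrete: the guest passes a string's pointer, length, and capacity so the host can read the path out of guest memory. The sketch below mimics that with a native mock; `mock_host_load` is a stand-in for `load_video_to_host_memory`, and the frame count and error handling are assumptions for illustration.

```rust
// Mock of the host side: a real host would read the path out of guest
// memory, open the video with FFmpeg, and return the frame count.
fn mock_host_load(str_ptr: *const u8, str_len: usize) -> i32 {
    let path = unsafe { std::slice::from_raw_parts(str_ptr, str_len) };
    if path.is_empty() { -1 } else { 42 } // pretend the video has 42 frames
}

// Guest-side safe wrapper: hands the string's raw parts to the host and
// turns the negative error sentinel into a Result.
fn load_video(path: &str) -> Result<i32, String> {
    let frames = mock_host_load(path.as_ptr(), path.len());
    if frames < 0 {
        Err(format!("host failed to load {path}"))
    } else {
        Ok(frames)
    }
}

fn main() {
    let n = load_video("cat.mp4").unwrap();
    println!("frames: {n}"); // prints "frames: 42"
}
```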
Week 8 + 9 update: These two weeks were focused entirely on understanding how video encoders work, how FFmpeg handles re-encoding raw frames into a video, and implementing that in Rust in the form of a plugin. I have a repository, yolo-video-processing-plugin, in active development with a simple example involving a 3-step pipeline. I have been following https://github.com/leandromoreira/ffmpeg-libav-tutorial. My plan between now and the 22nd involves:
Week 10 + 11 update: Week 10: mainly developing the video-processing plugin and battling the FFmpeg Rust bindings to get re-encoding of raw frames working reliably. Week 11: adding the video-processing plugin code to the main repository. Link to the final repo. Plan for week 12: 20 Nov -> 30 Nov.
Hey, are you still working on this?
Hello @ehxdie, yes, I am still working to improve the developer experience and fix a few issues with the project.
Motivation
YOLO (You Only Look Once) is a family of high-performance models for general object detection in images and videos. Many tutorials exist online for using YOLO models for object detection in Python.
There are drawbacks to using Python as the runtime language for inference in a production setting.
A typical Python setup includes adding packages such as OpenCV, TensorFlow, or PyTorch to the environment, and potentially CUDA drivers if the target execution hardware is an NVIDIA GPU.
One might then attempt to Dockerize the production project, additionally requiring the NVIDIA Container Toolkit.
For embedded devices and microcontrollers, this setup is infeasible.
With WASI-NN plugins, WasmEdge is well suited to running AI applications and can offer a Rust + Wasm alternative to Python + Docker setups.
Details
An application doing inference must pre-process input data (images, audio, video) into TFLite/PyTorch formats and post-process the models' outputs for further use.
While many functions exist in opencvmini and media-pipe, some Python-equivalent functions are missing to fully support the pre- and post-processing required for the YOLO family of models.
In addition to adding these functions, creating a high-level crate purely for object detection using YOLO would lower the barrier to entry for using Rust for object detection.
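One typical post-processing step the paragraph above alludes to is non-maximum suppression (NMS), which discards overlapping boxes in favor of the highest-scoring one. The sketch below is a minimal illustration; the `Detection` struct, field names, and thresholds are assumptions, not the crate's API.

```rust
// A minimal non-maximum suppression pass over YOLO-style detections.
#[derive(Clone, Copy, Debug)]
struct Detection { x: f32, y: f32, w: f32, h: f32, score: f32 }

// Intersection-over-union of two axis-aligned boxes.
fn iou(a: &Detection, b: &Detection) -> f32 {
    let x1 = a.x.max(b.x);
    let y1 = a.y.max(b.y);
    let x2 = (a.x + a.w).min(b.x + b.w);
    let y2 = (a.y + a.h).min(b.y + b.h);
    let inter = (x2 - x1).max(0.0) * (y2 - y1).max(0.0);
    let union = a.w * a.h + b.w * b.h - inter;
    if union <= 0.0 { 0.0 } else { inter / union }
}

// Keep the highest-scoring box in each cluster of overlapping boxes.
fn nms(mut dets: Vec<Detection>, iou_thresh: f32) -> Vec<Detection> {
    dets.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    let mut kept: Vec<Detection> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(k, &d) < iou_thresh) {
            kept.push(d);
        }
    }
    kept
}

fn main() {
    let dets = vec![
        Detection { x: 0.0, y: 0.0, w: 10.0, h: 10.0, score: 0.9 },
        Detection { x: 1.0, y: 1.0, w: 10.0, h: 10.0, score: 0.8 }, // overlaps first
        Detection { x: 50.0, y: 50.0, w: 10.0, h: 10.0, score: 0.7 },
    ];
    let kept = nms(dets, 0.5);
    println!("{} boxes kept", kept.len()); // prints "2 boxes kept"
}
```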
The aim of this issue is to:
Rust SDK Design
The design of the YOLO SDK will be similar to the media-pipe crate.
We may choose to extend the functionality of this crate depending on extra time available.
Milestones
Appendix
Python OpenCV example for object detection in YOLO