VisionARM: A Vision Based Hand Gesture Controlled Robotic Arm System
University of Florida | Artificial Intelligence System Project
VisionARM is an intelligent hand-gesture-controlled robotic arm system that uses computer vision and machine learning to interpret hand gestures in real time. The system leverages Google MediaPipe for precise hand landmark detection and a custom-trained Support Vector Machine (SVM) classifier to recognize 8 distinct hand gesture labels with 98.5% accuracy. Built with accessibility and usability in mind, VisionARM features a complete user interface powered by Gradio, secure user authentication, real-time performance monitoring with Prometheus, and seamless Arduino integration for physical robotic arm deployment. From customization in industrial settings to assistive technology, VisionARM demonstrates the potential of vision-based interfaces in creating more natural and accessible human-robot interaction systems. The entire pipeline, from gesture detection to command execution, operates in real time with an average latency of just ~20 ms, making it suitable for practical applications requiring responsive control.

Key Achievement: 98.5% gesture classification accuracy with sub-20 ms latency on a dataset of 2,000 hand gesture samples.
- Features
- Demo
- System Architecture
- Installation
- Project Structure
- Quick Setup and Usage Guide
- Dataset
- Hardware Integration
- Performance Monitoring
- Technologies Used
- Future Enhancements
- Acknowledgments
- Contact
- Real-Time Hand Gesture Recognition: Detects and classifies 8 different hand gestures using MediaPipe and SVM
- High Accuracy: Achieves 98.5% classification accuracy on test data
- Low Latency: Average processing time of ~20ms per frame for real-time responsiveness
- Web-Based Interface: Intuitive Gradio UI accessible from any browser
- Secure user registration and authentication system
- Password hashing with SHA-256
- User feedback collection and storage
- Session management with persistent login state
- Real-time FPS and latency tracking
- System resource monitoring (CPU, Memory, GPU)
- Prometheus metrics integration
- Live performance visualization with Plotly graphs
- Historical performance data logging
- Arduino serial communication for robotic arm control
- Real-time command transmission
- Configurable serial port settings
- Hardware connection status monitoring
- Live webcam feed with hand landmark visualization
- Real-time gesture prediction display
- Confidence score visualization
- System status indicators
- Interactive performance dashboards
- User feedback system
```
┌───────────────────────────────────────────────────────┐
│                   VisionARM System                    │
└───────────────────────────────────────────────────────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
   ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▲─────┐
   │  Camera   │      │  Gradio   │      │  Arduino  │
   │   Input   │      │  Web UI   │      │ Hardware  │
   └─────┬─────┘      └─────┬─────┘      └─────▲─────┘
         │                  │                  │
         │          ┌───────▼──────┐           │
         │          │     User     │           │
         │          │  Management  │           │
         │          └──────────────┘           │
         │                                     │
   ┌─────▼─────────────────────────────────────┴─────┐
   │         Hand Gesture Processing Pipeline        │
   ├─────────────────────────────────────────────────┤
   │ 1. MediaPipe Hand Detection                     │
   │ 2. Feature Extraction (10D Binary Vector)       │
   │ 3. SVM Classification (8 Gesture Classes)       │
   │ 4. Command Mapping & Transmission               │
   └─────────────────────────────────────────────────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
   ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
   │Performance│      │  Gesture  │      │ Database  │
   │Monitoring │      │Prediction │      │  Storage  │
   └───────────┘      └───────────┘      └───────────┘
```
VisionARM implements an end-to-end computer vision and machine learning pipeline for real-time hand gesture recognition and robotic arm control. The system achieves 98.5% accuracy with ~20ms latency, making it suitable for responsive real-time applications.
```
┌───────────────────────────────────────────────────────────────────┐
│                       VisionARM System Stack                      │
├───────────────────────────────────────────────────────────────────┤
│ Layer 1: Input Processing     →  Webcam → OpenCV → MediaPipe      │
│ Layer 2: Feature Extraction   →  Hand Landmarks → Binary Encoding │
│ Layer 3: Classification       →  SVM (RBF Kernel)                 │
│ Layer 4: Command Translation  →  Gesture → Arduino Commands       │
│ Layer 5: Monitoring           →  Prometheus Metrics               │
│ Layer 6: User Interface       →  Gradio Web UI                    │
└───────────────────────────────────────────────────────────────────┘
```
- Python: 3.11 or higher
- Webcam: Built-in or USB camera
- Operating System: Windows, macOS, or Linux
- Arduino: For hardware integration
```bash
git clone https://github.com/Shiva-a1/VisionARM.git
cd visionarm
```

```bash
# Using conda (recommended)
conda create -n visionarm python=3.11
conda activate visionarm

# Or using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

```bash
# Core dependencies
pip install mediapipe opencv-python numpy pandas
pip install scikit-learn joblib

# Web interface
pip install gradio

# Hardware communication
pip install pyserial

# Monitoring and visualization
pip install prometheus-client psutil plotly
```

```
visionarm/
│
├── main.ipynb                           # Main application notebook
├── Classifcation_Model_Training.ipynb   # Model training pipeline
├── user_manager.py                      # User authentication system
├── README.md                            # Project documentation
├── VisionARM.code-workspace             # VS Code workspace
├── metadata.json                        # Dataset metadata
│
├── Data/
│   ├── visionarm_dataset_v01_01.csv     # Training dataset
│   └── users_database.csv               # User credentials
│
├── SVM_Models/
│   └── best_label_classification_model.pkl  # Trained SVM model
│
├── trial.ipynb                          # Trial notebook for rough work
└── arduino_code.txt                     # Text file containing Arduino code
```
- Clone the Repository
  - Clone the VisionARM repository from the link provided above.
- Open the Project Folder
  - Open the project folder in your local IDE (VS Code preferred).
- Install the Arduino Extension
  - Install the PlatformIO extension in VS Code. Once installed, create a new project and navigate to the `main.cpp` file inside the `src` folder.
  - From the `VisionARM` folder, open `arduino_code.txt`, copy the Arduino code, and paste it into the `main.cpp` file of the Arduino project folder. Once done, make sure your Arduino hardware is properly connected to your local system (laptop) and that all the port labels in the Arduino code match your Arduino board.
  - Press the ✓ (Build) button at the bottom left, then press the → (Upload) button next to it. This overwrites the program in the Arduino board's memory.
- Integrate Arduino with the VisionARM Project
  - Open `main.ipynb` inside the `VisionARM` project folder, go to the `8. Arduino Serial Communication` section, and edit the serial port value to the one you are using to connect the Arduino hardware to your local device.
- Launch the AI System
  - Navigate to `user_manager.py` and run it.
  - If it runs successfully, open `main.ipynb` and run all the cells.
  - If every cell runs without error, go directly to the last cell, which displays the system's UI.
- Access the User Interface
  - Register or log in to access the system, then press the `Start Detection` button. Your system's camera will turn on; as you perform hand gestures, the Arduino system performs the corresponding operations and the metrics plots on the UI keep updating.
- Name: `visionarm_dataset_v01_01.csv`
- Version: v01_01
- Creation Date: October 19, 2025
- Total Samples: 2,000
- Features: 10 (binary finger states)
- Classes: 8 gesture types
Each sample contains 10 binary features representing finger states:
| Feature | Description |
|---|---|
| `left_thumb` | Left hand thumb (1 = raised, 0 = closed) |
| `left_index` | Left hand index finger |
| `left_middle` | Left hand middle finger |
| `left_ring` | Left hand ring finger |
| `left_pinky` | Left hand pinky finger |
| `right_thumb` | Right hand thumb |
| `right_index` | Right hand index finger |
| `right_middle` | Right hand middle finger |
| `right_ring` | Right hand ring finger |
| `right_pinky` | Right hand pinky finger |
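How the notebook derives these binary states from MediaPipe's 21 hand landmarks is not shown above. A common heuristic, sketched here as an assumption rather than the project's exact encoding, compares each fingertip to its PIP joint in normalized image coordinates (where y grows downward); MediaPipe's real landmark objects expose `.x`/`.y` attributes, represented below as plain `(x, y)` tuples:

```python
# Landmark indices in MediaPipe's 21-point hand model.
TIP = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}
PIP = {"thumb": 3, "index": 6, "middle": 10, "ring": 14, "pinky": 18}

def finger_states(landmarks):
    """Return a 5-element binary list (thumb..pinky) for one hand.

    `landmarks` is a list of 21 (x, y) points in normalized image
    coordinates. A finger counts as raised (1) when its tip lies above
    its PIP joint; the thumb is compared on the x axis instead, a
    simplification that ignores handedness.
    """
    states = []
    for name in ("thumb", "index", "middle", "ring", "pinky"):
        tip, pip = landmarks[TIP[name]], landmarks[PIP[name]]
        if name == "thumb":
            states.append(1 if tip[0] > pip[0] else 0)
        else:
            states.append(1 if tip[1] < pip[1] else 0)
    return states
```

Concatenating the left-hand and right-hand outputs yields the 10D binary feature vector used by the classifier.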
- `stop`: 200 samples
- `grab`: 200 samples
- `drop`: 200 samples
- `up`: 200 samples
- `down`: 200 samples
- `left`: 200 samples
- `right`: 200 samples
- `invalid`: 600 samples
The dataset combines:
- Synthetic Data: Manually created binary patterns for gesture classes
- Real Data: MediaPipe-encoded hand landmark data from real-time captures
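The full training pipeline lives in `Classifcation_Model_Training.ipynb`; a minimal scikit-learn sketch consistent with the dataset layout above might look like the following. The `label` target-column name and the split parameters are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

FEATURES = [
    "left_thumb", "left_index", "left_middle", "left_ring", "left_pinky",
    "right_thumb", "right_index", "right_middle", "right_ring", "right_pinky",
]

def train_gesture_svm(df: pd.DataFrame):
    """Train an RBF-kernel SVM on the 10 binary finger-state features."""
    X = df[FEATURES].to_numpy()
    y = df["label"].to_numpy()  # assumed target column name
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)
```

The fitted model could then be persisted with `joblib.dump(...)` to produce a file like `SVM_Models/best_label_classification_model.pkl`.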
- Arduino Uno/Mega/Nano
- Servo motors (typically 4-6 for robotic arm)
- Power supply (5V/12V depending on servos)
- USB cable for serial communication
- Copy the Arduino code from `arduino_code.txt` into the `main.cpp` file in the `src` folder of the Arduino project created with the PlatformIO extension in VS Code.
- Install all the required libraries and edit the pin numbers for each servo motor to match your hardware device.
- Press the ✓ (Build) button at the bottom left, then press the → (Upload) button next to it. This overwrites the program in the Arduino board's memory.
```python
import serial
import time

arduino = serial.Serial('/dev/tty.usbserial-10', 9600)  # Change based on your system
time.sleep(2)  # Give the board time to reset after the port opens

def send_label(label):
    message = label + '\n'
    arduino.write(message.encode())
    print(label)
```

Common serial port names:

- Windows: `COM3`, `COM4`, etc.
- macOS: `/dev/tty.usbserial-*`
- Linux: `/dev/ttyUSB0`, `/dev/ttyACM0`
- FPS (Frames Per Second): Real-time frame processing rate
- Latency: Average processing time per frame
- CPU Usage: Percentage of CPU utilization
- Memory Usage: RAM consumption
- Total Frames Processed: Cumulative frame count
- Gesture Distribution: Count of each gesture detected
- Session Duration: Active detection time
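FPS and average latency can be tracked with a small rolling window. This stdlib-only sketch illustrates the bookkeeping; it is not the notebook's exact implementation:

```python
import time
from collections import deque

class FrameStats:
    """Rolling FPS / latency tracker over the last `window` frames."""

    def __init__(self, window: int = 60):
        self.latencies = deque(maxlen=window)
        self.total_frames = 0

    def record(self, started: float) -> None:
        """Call once per frame with the frame's time.perf_counter() start stamp."""
        self.latencies.append(time.perf_counter() - started)
        self.total_frames += 1

    @property
    def avg_latency_ms(self) -> float:
        if not self.latencies:
            return 0.0
        return 1000 * sum(self.latencies) / len(self.latencies)

    @property
    def fps(self) -> float:
        return 1000 / self.avg_latency_ms if self.avg_latency_ms else 0.0
```

At the system's reported ~20 ms per frame, this tracker would read roughly 50 FPS.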
The system exposes metrics on port 8000:

```bash
# Access metrics endpoint
curl http://localhost:8000
```

Available metrics:

- `gesture_predictions_total`: counter for each gesture type
- `prediction_latency_seconds`: histogram of processing times
- `cpu_usage_percent`: current CPU usage
- `memory_usage_percent`: current memory usage
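The metric definitions could look like the following `prometheus_client` sketch. Metric and label names mirror the list above; the `record_prediction` helper is an illustrative assumption, not the notebook's actual code:

```python
from prometheus_client import Counter, Gauge, Histogram, start_http_server

GESTURES = Counter(
    "gesture_predictions_total", "Predictions per gesture type", ["gesture"]
)
LATENCY = Histogram(
    "prediction_latency_seconds", "Per-frame processing time in seconds"
)
CPU = Gauge("cpu_usage_percent", "Current CPU usage")
MEM = Gauge("memory_usage_percent", "Current memory usage")

def record_prediction(label: str, seconds: float) -> None:
    """Update the counters after each classified frame."""
    GESTURES.labels(gesture=label).inc()
    LATENCY.observe(seconds)

# start_http_server(8000)  # call once at startup to serve metrics on port 8000
```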
Real-time plots include:
- FPS Over Time: Line graph of frame rate
- Latency Distribution: Histogram of processing times
- Resource Usage: CPU and Memory usage over time
- Gesture Frequency: Bar chart of detected gestures
- MediaPipe (v0.10.0+): Hand landmark detection and tracking
- OpenCV (v4.8.0+): Image processing and video capture
- scikit-learn (v1.3.0+): SVM classifier and model evaluation
- NumPy (v1.24.0+): Numerical computations
- Pandas (v2.0.0+): Data manipulation and analysis
- Gradio (v4.0.0+): Interactive web UI framework
- Plotly (v5.17.0+): Interactive visualizations
- Prometheus Client: Metrics collection and exposure
- psutil: System resource monitoring
- Threading: Concurrent processing
- PySerial (v3.5+): Arduino serial communication
- hashlib: Password hashing (SHA-256)
- Drift Detection: data and model drift detection using NannyML
- Enhanced Gesture Set: Support for 20+ gestures including custom gestures
- Multi-Hand Tracking: Independent control with two hands
- Mobile App: Native iOS/Android applications
- Gesture Recording: Save and replay gesture sequences
- Cloud Integration: Remote control and monitoring
- Gesture Customization: User-defined gesture mapping
- Multi-User Support: Collaborative control modes
- Adaptive Learning: Online learning from user corrections
- Context-Aware Recognition: Environment and task-specific gestures
- Robustness Improvements: Better performance in varied lighting
- Latency Optimization: Sub-10ms processing pipeline
- Energy Efficiency: Optimizations for embedded systems
- Google MediaPipe Team: For the excellent hand tracking solution
- University of Florida: For project support and resources
This project was inspired by the growing need for intuitive human-robot interaction systems and the potential of computer vision in creating accessible control interfaces.
- Professor Andrea Ramirez Salgado, for providing all the required hardware resources and for her valuable guidance and feedback.
VisionARM Developer
- 📧 Email: shivansh.ade@ufl.edu
- 🌐 Website: visionarm-project.github.io
