Skip to content

Shiva-a1/VisionARM

Repository files navigation

VisionARM

VisionARM: A Vision Based Hand Gesture Controlled Robotic Arm System

University of Florida | Artificial Intelligence System Project

VisionARM is an intelligent hand gesture controlled robotic arm system, that uses computer vision and machine learning to interpret hand gestures in real-time. The system leverages Google MediaPipe for precise hand landmark detection and a custom-trained Support Vector Machine (SVM) classifier to recognize 8 distint hand gesture labels with 98.5% accuracy. Build with accessibility and usability in mind, VisionARM features a complete User interface powered by Gradio, secure user authentication, real-time performance monitoring with Prometheus, and seamleass Arduino integration for physical robotic arm deployment. From introducing customization in industrial setting to being an assistive technology, VisionARM demonstrates the potential of vision-based interfaces in creating more natural and accessible human-robot interaction systems. The entire pipelineโ€”from gesture detection to command executionโ€”operates in real-time with an average latency of just ~20ms, making it suitable for practical applications requiring responsive control. Key Achievement: 98.5% gesture classification accuracy with sub-20ms latency on a dataset of 2,000 hand gesture samples.


๐Ÿ“‹ Table of Contents


โœจ Features

๐ŸŽฏ Core Capabilities

  • Real-Time Hand Gesture Recognition: Detects and classifies 8 different hand gestures using MediaPipe and SVM
  • High Accuracy: Achieves 98.5% classification accuracy on test data
  • Low Latency: Average processing time of ~20ms per frame for real-time responsiveness
  • Web-Based Interface: Intuitive Gradio UI accessible from any browser

๐Ÿ” User Management

  • Secure user registration and authentication system
  • Password hashing with SHA-256 encryption
  • User feedback collection and storage
  • Session management with persistent login state

๐Ÿ“Š Performance Monitoring

  • Real-time FPS and latency tracking
  • System resource monitoring (CPU, Memory, GPU)
  • Prometheus metrics integration
  • Live performance visualization with Plotly graphs
  • Historical performance data logging

๐Ÿค– Hardware Integration

  • Arduino serial communication for robotic arm control
  • Real-time command transmission
  • Configurable serial port settings
  • Hardware connection status monitoring

๐ŸŽจ User Interface Features

  • Live webcam feed with hand landmark visualization
  • Real-time gesture prediction display
  • Confidence score visualization
  • System status indicators
  • Interactive performance dashboards
  • User feedback system

Demo


๐Ÿ—๏ธ System Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                     VisionARM System                        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                           โ”‚
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ”‚                  โ”‚                  โ”‚
   โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”      โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
   โ”‚  Camera  โ”‚      โ”‚  Gradio    โ”‚     โ”‚  Arduino   โ”‚
   โ”‚  Input   โ”‚      โ”‚  Web UI    โ”‚     โ”‚  Hardware  โ”‚
   โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”˜      โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ–ฒโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ”‚                  โ”‚                  โ”‚
        โ”‚           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”           โ”‚
        โ”‚           โ”‚    User     โ”‚           โ”‚
        โ”‚           โ”‚ Management  โ”‚           โ”‚
        โ”‚           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜           โ”‚
        โ”‚                                     โ”‚
   โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
   โ”‚           Hand Gesture Processing Pipeline          โ”‚
   โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
   โ”‚  1. MediaPipe Hand Detection                        โ”‚
   โ”‚  2. Feature Extraction (10D Binary Vector Encoding) โ”‚
   โ”‚  3. SVM Classification (8 Gesture Classes)          โ”‚
   โ”‚  4. Command Mapping & Transmission                  โ”‚
   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                           โ”‚
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ”‚                  โ”‚                  โ”‚
   โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”
   โ”‚Performanceโ”‚     โ”‚  Gesture   โ”‚     โ”‚  Database  โ”‚
   โ”‚Monitoring โ”‚     โ”‚ Prediction โ”‚     โ”‚  Storage   โ”‚
   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

VisionARM implements an end-to-end computer vision and machine learning pipeline for real-time hand gesture recognition and robotic arm control. The system achieves 98.5% accuracy with ~20ms latency, making it suitable for responsive real-time applications.

Model Layers

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    VisionARM System Stack                         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Layer 1: Input Processing     โ”‚  Webcam โ†’ OpenCV โ†’ MediaPipe     โ”‚
โ”‚  Layer 2: Feature Extraction   โ”‚  Hand Landmarks โ†’ Binary Encodingโ”‚
โ”‚  Layer 3: Classification       โ”‚  SVM (RBF Kernel)                โ”‚
โ”‚  Layer 4: Command Translation  โ”‚  Gesture โ†’ Arduino Commands      โ”‚
โ”‚  Layer 5: Monitoring           โ”‚  Prometheus Metrics              โ”‚
โ”‚  Layer 6: User Interface       โ”‚  Gradio Web UI                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿš€ Installation

Prerequisites

  • Python: 3.11 or higher
  • Webcam: Built-in or USB camera
  • Operating System: Windows, macOS, or Linux
  • Arduino: For hardware integration

Step 1: Clone the Repository

git clone https://github.com/Shiva-a1/VisionARM.git
cd visionarm

Step 2: Create Virtual Environment

# Using conda (recommended)
conda create -n visionarm python=3.11
conda activate visionarm

# Or using venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Step 3: Install Dependencies

# Core dependencies
pip install mediapipe opencv-python numpy pandas
pip install scikit-learn joblib

# Web interface
pip install gradio

# Hardware communication
pip install pyserial

# Monitoring and visualization
pip install prometheus-client psutil plotly

๐Ÿ“ Project Structure

visionarm/
โ”‚
โ”œโ”€โ”€ main.ipynb                          # Main application notebook
โ”œโ”€โ”€ Classifcation_Model_Training.ipynb  # Model training pipeline
โ”œโ”€โ”€ user_manager.py                     # User authentication system
โ”œโ”€โ”€ README.md                           # Project documentation
โ”œโ”€โ”€ VisionARM.code-workspace            # VS Code workspace
โ”œโ”€โ”€ metadata.json                       # Dataset metadata
โ”‚
โ”œโ”€โ”€ Data/
โ”‚   โ”œโ”€โ”€ visionarm_dataset_v01_01.csv   # Training dataset
โ”‚   โ””โ”€โ”€users_database.csv             # User credentials
โ”‚
โ”œโ”€โ”€ SVM_Models/
โ”‚   โ””โ”€โ”€ best_label_classification_model.pkl         # Trained SVM model
โ”‚
โ”œโ”€โ”€trial.ipynb                       # trial notebook for rough work
โ””โ”€โ”€ arduino_code.txt                 # Text file containing Arduino code

Quick Set up and Usage Guide

  1. Clone the Repository
  • Clone the VisionARM repository from the link provided above.
  1. Open the project Folder
  • Open the Project Folder in you local IDE (VS Code Prefered).
  1. Install Arduino Extension
  • Install the PlatformIO Extension in VSCode. One installed created a new project and then navigate to main.cpp file present inside src folder.
  • From our VisionARM folder, go to arduino_code.txt and copy the Arduino code from there and paste it in main.cpp file of the Arduino project folder. Once done, do make sure that you Arduino Hardware is properly connected to the local system (laptop), and all the port labels in the Arduino code matches correctly with you Arduino board.
  • Once done press the โœ… button at the bottom left and press the โžก๏ธ next to it. This will over write the Arduino code in the Arduino board's memory.
  1. Integrating Arduino with VisionARM project
  • Navigate to main.ipynb file inside our VisionARM project folder and go to 8. Arduino Serial Communication section and edit the Serial Port value to the one you are using to connect Arduino Hardware to your local device.
  1. Launch the AI System
  • Navigate to user_manager.py file and run it.
  • If it runs successfully, then navigate to main.ipynb file and run all the cells.
  • If every cell runs without any error, directly navigate to the last cell, which displays our system's UI.
  1. Accessing the User Interface
  • Register/Login to access the system and then press `Start Detection' button. Then your system's camera would turn on, where you will perform hand gestures, based on which the Arduino system would perform operations, and metrics plots on the UI would keep on updating.

๐Ÿ“Š Dataset

Dataset Overview

  • Name: visionarm_dataset_v01_01.csv
  • Version: v01_01
  • Creation Date: October 19, 2025
  • Total Samples: 2,000
  • Features: 10 (binary finger states)
  • Classes: 8 gesture types

Feature Description

Each sample contains 10 binary features representing finger states:

Feature Description
left_thumb Left hand thumb (1=raised, 0=closed)
left_index Left hand index finger
left_middle Left hand middle finger
left_ring Left hand ring finger
left_pinky Left hand pinky finger
right_thumb Right hand thumb
right_index Right hand index finger
right_middle Right hand middle finger
right_ring Right hand ring finger
right_pinky Right hand pinky finger

Class Distribution

stop     : 200 samples
grab     : 200 samples
drop     : 200 samples
up       : 200 samples
down     : 200 samples
left     : 200 samples
right    : 200 samples
invalid  : 600 samples

Data Sources

The dataset combines:

  1. Synthetic Data: Manually created binary patterns for gesture classes
  2. Real Data: MediaPipe-encoded hand landmark data from real-time captures

๐Ÿ”Œ Hardware Integration

Arduino Setup

Hardware Requirements

  • Arduino Uno/Mega/Nano
  • Servo motors (typically 4-6 for robotic arm)
  • Power supply (5V/12V depending on servos)
  • USB cable for serial communication

Arduino Code:

  • Copy paste the Arduino code present in arduino.txt to main.cpp file in src folder present in Arduino project we created in PlatformIO extension in VS Code.
  • Download all the required libraries and edit the port serial numbers for each servo-motor based on your hardware device.
  • Once done press the โœ… button at the bottom left and press the โžก๏ธ next to it. This will over write the Arduino code in the Arduino board's memory.

Serial Communication

Python Configuration

import serial
import time

arduino = serial.Serial('/dev/tty.usbserial-10', 9600)  # Change based on your system
time.sleep(2) 

def send_label(label):
    message = label + '\n'
    arduino.write(message.encode())
    print(label)

Port Detection

Windows: COM3, COM4, etc. macOS: /dev/tty.usbserial-* Linux: /dev/ttyUSB0, /dev/ttyACM0


๐Ÿ“ˆ Performance Monitoring

Metrics Tracked

System Metrics

  • FPS (Frames Per Second): Real-time frame processing rate
  • Latency: Average processing time per frame
  • CPU Usage: Percentage of CPU utilization
  • Memory Usage: RAM consumption

Application Metrics

  • Total Frames Processed: Cumulative frame count
  • Gesture Distribution: Count of each gesture detected
  • Session Duration: Active detection time

Prometheus Integration

The system exposes metrics on port 8000:

# Access metrics endpoint
curl http://localhost:8000

Available Metrics:

  • gesture_predictions_total: Counter for each gesture type
  • prediction_latency_seconds: Histogram of processing times
  • cpu_usage_percent: Current CPU usage
  • memory_usage_percent: Current memory usage

Visualization

Real-time plots include:

  • FPS Over Time: Line graph of frame rate
  • Latency Distribution: Histogram of processing times
  • Resource Usage: CPU and Memory usage over time
  • Gesture Frequency: Bar chart of detected gestures

๐Ÿ› ๏ธ Technologies Used

Computer Vision & ML

  • MediaPipe (v0.10.0+): Hand landmark detection and tracking
  • OpenCV (v4.8.0+): Image processing and video capture
  • scikit-learn (v1.3.0+): SVM classifier and model evaluation
  • NumPy (v1.24.0+): Numerical computations
  • Pandas (v2.0.0+): Data manipulation and analysis

Web Interface

  • Gradio (v4.0.0+): Interactive web UI framework
  • Plotly (v5.17.0+): Interactive visualizations

System & Monitoring

  • Prometheus Client: Metrics collection and exposure
  • psutil: System resource monitoring
  • Threading: Concurrent processing

Hardware Communication

  • PySerial (v3.5+): Arduino serial communication

Security

  • hashlib: Password hashing (SHA-256)

๐Ÿ”ฎ Future Enhancements

Planned Features

  • Drift Detection: Drift detection using NannyML.
  • Enhanced Gesture Set: Support for 20+ gestures including custom gestures
  • Multi-Hand Tracking: Independent control with two hands
  • Mobile App: Native iOS/Android applications
  • Gesture Recording: Save and replay gesture sequences
  • Cloud Integration: Remote control and monitoring
  • Gesture Customization: User-defined gesture mapping
  • Multi-User Support: Collaborative control modes

Research Directions

  • Adaptive Learning: Online learning from user corrections
  • Context-Aware Recognition: Environment and task-specific gestures
  • Robustness Improvements: Better performance in varied lighting
  • Latency Optimization: Sub-10ms processing pipeline
  • Energy Efficiency: Optimizations for embedded systems

๐Ÿค Acknowledgments

Frameworks

  • Google MediaPipe Team: For the excellent hand tracking solution

University Support

  • University of Florida: For project support and resources

Inspiration

This project was inspired by the growing need for intuitive human-robot interaction systems and the potential of computer vision in creating accessible control interfaces.

Special Thanks

  • Professor Andrea Ramirez Salgado for her support in providing all the required hardware resources, her valuable guidance and feedback.

๐Ÿ“ž Contact

Project Author

VisionARM Developer


About

VisionARM: A Vision Based Hand Gesture Controlled Robotic Arm System.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors