Skip to content

Human-Computer-Interaction-Lab-IHCLab/vlm_robotics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Symmetry-Informed LLM/VLM + Deterministic Control for Robotic Object Manipulation (Educational DOFBOT)

📘 Read this in Spanish / Leer en español

📚 Citation

If you use or reference this repository, please cite the companion paper:

Gudiño-Lau, J., Durán-Fonseca, M., Anido-Rifón, L. E., & Santana-Mancilla, P. C.
A Symmetry-Informed Multimodal LLM-Driven Approach to Robotic Object Manipulation: Lowering Entry Barriers in Mechatronics Education
(Under review in Symmetry)


This repository provides a reproducible hybrid framework that integrates a multimodal Large Language Model (LLM/VLM) as a semantic reasoning module with deterministic inverse kinematics for robotic object manipulation.
The system is implemented on the Yahboom DOFBOT educational arm and demonstrates how symmetry can be applied as a design principle rather than a learned feature.

Designed for educational and research purposes, this project aims to lower entry barriers in mechatronics education by showing how generative AI can interact with deterministic robotic control.


🧩 Overview

Robot Side (MainRobot.py)

  • Initializes and homes the robotic arm.
  • Captures a camera image and exposes it through two TCP servers:
    • Image Server on port 6103.
    • Command Server on port 6104.

Client Side (RuedaColorEs.py)

  • Connects to the robot’s image server to retrieve the latest frame.
  • Sends both text and image to an OpenAI multimodal model (gpt-4o).
  • Interprets the model’s reply (e.g., “lower right”) and sends the corresponding motion commands to the robot via TCP.

⚙️ Requirements

Hardware

  • Yahboom DOFBOT 6-DoF arm (or compatible)
  • Camera (USB/CSI)
  • Local network connection between robot and client

Software

  • Python 3.8+
  • Packages:
    pip install opencv-python openai
    

⚖️ License

This project is released under the MIT License, allowing free use, modification, and redistribution for educational and research purposes.
See the LICENSE file for details.

About

A Symmetry-Informed multimodal LLM-Driven Approach to Robotic Object Manipulation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages