CinarSamet/Smart-R2-D2-Assistant

🤖 Smart R2-D2 Assistant: Edge-to-Cloud AI Robotics

Software License: MIT | Hardware License: CC BY-NC 4.0 | C++ | Python

Smart R2-D2 Assistant is an end-to-end robotics project that combines a custom 3D-printed model of the iconic R2-D2 character with a real-time, AI-based voice assistant. The edge device (ESP32) handles hardware and sensor management, and offloads the compute-heavy AI workload to a local Python server.

🌟 Highlighted Features

  • Real-Time Communication (Edge-to-Cloud): The ESP32 captures RAW PCM audio from the microphone over I2S and sends it to the local server over HTTP.
  • Speech-to-Text (STT): High-accuracy voice recognition using the OpenAI Whisper (Turbo) model.
  • Artificial Intelligence (LLM): Context-aware conversational response generation fitting the R2-D2 character, integrated with the Google Gemini API.
  • Text-to-Speech (TTS): Fluent and natural voice synthesis using the Edge-TTS infrastructure.
  • Dynamic UI: State-based (Listening, Thinking, Speaking, etc.) dynamic facial animations on an OLED display using the RobotFace library.
  • Wireless Management: Network configuration with WiFiManager and remote over-the-air code update support with ArduinoOTA.
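
The state-based behavior mentioned under Dynamic UI can be modeled as a small transition table. The sketch below is written in Python for brevity (the firmware itself is C++) and is only an illustration: the IDLE state and all event names are assumptions, not taken from the project's source.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # assumed resting state, not named in this README
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


# Hypothetical event names; the firmware may use different triggers.
TRANSITIONS = {
    (State.IDLE, "touch"): State.LISTENING,
    (State.LISTENING, "audio_sent"): State.THINKING,
    (State.THINKING, "reply_received"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to map each state to exactly one face animation on the display.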

🏗️ System Architecture

The system consists of two units that work asynchronously, the Edge device and the Server:

  1. ESP32 (Edge Device): Woken up by a touch sensor. It records audio from the I2S microphone and transmits it to the server in RAW PCM format. It manages the animations and plays the response received from the server through the I2S amplifier.
  2. Flask Server (Python):
    • Converts the audio to a processable WAV format using FFmpeg.
    • Converts speech to text using Whisper.
    • Analyzes the text and generates a response using Gemini LLM.
    • Converts the response to speech using Edge-TTS, converts it back to RAW PCM, and sends it back to the device.
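
The first server step, wrapping RAW PCM in a WAV container, can be illustrated without FFmpeg. The sketch below uses Python's standard `wave` module instead, so it is not the project's actual conversion path. The audio format (16 kHz, 16-bit, mono) is an assumption; the README does not state the recording parameters.

```python
import io
import wave

# Assumed recording format; adjust to match the I2S configuration in the firmware.
SAMPLE_RATE_HZ = 16000
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
CHANNELS = 1             # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Length of the recording implied by its byte count."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

Under these assumptions, one second of audio is 32,000 bytes of PCM, which gives a rough idea of the upload size per utterance.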

🛠️ Hardware and 3D Design

R2-D2's outer case and inner chassis were designed from scratch and optimized for 3D printing.

👉 3D Model and STL Files (Thingiverse)

Main Electronic Components Used:

  • Microcontroller: ESP32 Development Board
  • Microphone: INMP441 (I2S MEMS)
  • Audio Output: MAX98357A (I2S Class D Amplifier) + Speaker
  • Display: I2C OLED Display
  • Sensors: Capacitive Touch Sensor (For wake-up)

🛠️ Hardware Components

The following table highlights the core electronic components used to bring R2-D2 to life. These are organized into a grid for clear visualization.

| Component | Description |
| --- | --- |
| ESP32 Development Board | ESP32 DevKit V1: The brain of the project. Manages WiFi, I2S audio streaming, and OLED animations. |
| INMP441 Microphone | INMP441 Microphone: High-performance I2S MEMS microphone for clear voice capture. |
| MAX98357A Amplifier | MAX98357A Amp: I2S Class D amplifier that converts digital audio data into sound. |
| OLED Display | OLED Display (SSD1306): Displays real-time facial expressions and system status. |
| Touch Sensor | Capacitive Touch Sensor: Acts as the wake-up trigger to start the listening process. |
| Speaker | 3W Speaker: Delivers the character-specific voice responses and system sounds. |
| TP4056 Charge Module | TP4056 Module: Lithium battery charger with protection circuit to safely charge the 18650 cell. |
| MT3608 Boost Converter | MT3608 Boost Converter: Steps up the battery voltage to a stable 5V for the ESP32 and peripherals. |
| 18650 Li-ion Battery | 18650 Battery: High-capacity rechargeable Li-ion cell providing the main power source for the robot. |

📐 System Connection Diagram

The diagram below illustrates the wiring between the ESP32 and its peripherals, as well as the logical flow between the Edge (ESP32) and the Cloud (Flask Server).

[Figure: System Connection and Architecture Diagram]


📁 Project Structure

├── server/
│   ├── app.py                 # Main Flask server application
│   ├── config/
│   │   └── settings.json      # API, TTS, and Server settings
│   └── requirements.txt       # Python dependencies
├── esp32/
│   ├── main.ino               # Main ESP32 source code
│   ├── RobotFace.h            # OLED animation library
│   └── r2d2_ses.h             # System sounds stored in PROGMEM
├── .gitignore
└── README.md

🚀 Setup and Usage

1. Server (Python Backend) Setup

For the project to work, FFmpeg must be installed on your computer and added to your system's PATH.
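
A quick way to confirm the FFmpeg requirement before starting the server is to look the executable up on PATH. The helper below is a hypothetical preflight check, not part of the repository; it relies only on `shutil.which` from the standard library.

```python
import shutil


def tool_on_path(name: str = "ffmpeg") -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None
```

Calling `tool_on_path()` and getting `False` means the server's audio conversion step will not find FFmpeg.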

  1. Clone the repository to your computer and enter the directory:

    git clone https://github.com/CinarSamet/Smart-R2-D2-Assistant.git
    cd Smart-R2-D2-Assistant/server
  2. Install the required Python libraries:

    pip install -r requirements.txt
  3. Add your Gemini API key as an environment variable:

    export GEMINI_API_KEY="your_api_key"
  4. Start the server:

    python app.py
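
If the server fails at startup, a missing API key is a likely cause. As a minimal sketch, the key exported in step 3 could be read and validated like this. The function name is hypothetical and the way `app.py` actually loads the key may differ.

```python
import os


def load_gemini_api_key() -> str:
    """Read GEMINI_API_KEY from the environment and fail early if it is missing."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the server"
        )
    return key
```

Failing early with a clear message is easier to debug than an authentication error raised in the middle of a voice request.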

2. Hardware (ESP32) Setup

  1. Open the Arduino IDE and install the necessary libraries (WiFiManager, ArduinoOTA, etc.).
  2. Update the serverUrl variable in the esp32/main.ino file with the local IP address of the computer running the server (e.g., http://192.168.1.X:5001/upload).
  3. Upload the code to the ESP32.
  4. On its first boot, the device will create a Wi-Fi network named R2D2_Kurulum ("Kurulum" is Turkish for "setup"). Connect to this network and enter the credentials of your local Wi-Fi network so the device can join it.
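
To exercise the server without the hardware, the upload can be imitated from a computer on the same network. This is a rough sketch under assumptions: it treats the `/upload` endpoint from the example URL above as accepting the raw PCM bytes as the POST body, and the `Content-Type` value is a guess, since neither is confirmed by this README. The function names are hypothetical.

```python
import urllib.request

# Placeholder address from the setup step; replace X with the server's actual IP.
SERVER_URL = "http://192.168.1.X:5001/upload"


def build_upload_request(pcm: bytes, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a POST request carrying raw PCM bytes as the body."""
    return urllib.request.Request(
        url,
        data=pcm,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},  # assumed header
    )


def send_pcm(pcm: bytes, url: str = SERVER_URL, timeout: float = 60.0) -> bytes:
    """Send the recording and return the response body (expected: RAW PCM reply)."""
    with urllib.request.urlopen(build_upload_request(pcm, url), timeout=timeout) as resp:
        return resp.read()
```

The generous timeout reflects that the reply only arrives after speech recognition, response generation, and speech synthesis have all finished.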

🗺️ Roadmap

  • Establishing the I2S audio pipeline on the ESP32.
  • Ensuring closed-loop communication with Whisper, Gemini, and TTS integration.
  • Creating the hardware State Machine structure and OLED interface.
  • Dockerization: Containerizing the entire Flask/AI server infrastructure using Docker to make it environment-independent and deployable.
  • Sensor Fusion: Adding autonomous movement capabilities by integrating IMU data and distance sensors.

📄 License

The source code in this repository (ESP32 and Python) is licensed under the MIT License. See the LICENSE file for more details.

The project's 3D hardware designs (STL files) are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Reproduction and sale for commercial purposes are prohibited. You are free to use and develop them in your personal projects.
