This is an end-to-end robotics project that combines a custom 3D-printed design of the iconic R2-D2 character with a real-time, AI-based voice assistant. The edge device (ESP32) handles hardware and sensor management, while the compute-heavy AI workload is offloaded to a local Python server.
- Real-Time Communication (Edge-to-Cloud): Transmission of RAW PCM audio data received from the microphone via I2S protocol on the ESP32 to the local server over HTTP.
- Speech-to-Text (STT): High-accuracy voice recognition using the OpenAI Whisper (Turbo) model.
- Artificial Intelligence (LLM): Context-aware conversational response generation fitting the R2-D2 character, integrated with the Google Gemini API.
- Text-to-Speech (TTS): Fluent and natural voice synthesis using the Edge-TTS infrastructure.
- Dynamic UI: State-based (Listening, Thinking, Speaking, etc.) dynamic facial animations on an OLED display using the RobotFace library.
- Wireless Management: Network configuration with WiFiManager and remote over-the-air code update support with ArduinoOTA.
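The state-based UI can be modeled as a small state machine. The sketch below is written in Python purely for illustration (the firmware itself runs on the ESP32), and it is not taken from the project source: the `IDLE` state, the event names, and the transition table are all assumptions.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # assumed resting state; not named in the feature list
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


# Hypothetical transition table: for each state, which event leads where.
TRANSITIONS = {
    State.IDLE: {"touch": State.LISTENING},
    State.LISTENING: {"recording_done": State.THINKING},
    State.THINKING: {"response_ready": State.SPEAKING, "error": State.IDLE},
    State.SPEAKING: {"playback_done": State.IDLE},
}


def next_state(current: State, event: str) -> State:
    """Return the next state, or stay in the current one if the event is not valid here."""
    return TRANSITIONS[current].get(event, current)
```

Keeping the transitions in a table makes it straightforward to map each state to one face animation and to ignore events that arrive at the wrong time.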
The system consists of two asynchronous units, Edge and Server:
- ESP32 (Edge Device): Woken up by a touch sensor. It records audio from the I2S microphone and transmits it to the server in RAW PCM format. It manages the animations and plays the response received from the server through the I2S amplifier.
- Flask Server (Python):
  - Converts the audio to a processable WAV format using FFmpeg.
  - Converts speech to text using Whisper.
  - Analyzes the text and generates a response using the Gemini LLM.
  - Converts the response to speech using Edge-TTS, converts it back to RAW PCM, and sends it back to the device.
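To illustrate what the RAW PCM to WAV step involves, here is a minimal sketch that uses Python's standard `wave` module instead of FFmpeg, which is what the server actually uses. The audio format (16 kHz, 16-bit, mono) and the function names are assumptions; check the I2S configuration in the firmware for the real values.

```python
import io
import wave

# Assumed audio format; the real values depend on the ESP32 I2S configuration.
SAMPLE_RATE_HZ = 16000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap headerless PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Strip the WAV container and return the raw samples."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes())
```

RAW PCM carries no header, so the sample rate, sample width, and channel count must be agreed on by both sides. A mismatch typically shows up as audio that plays too fast, too slow, or as noise.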
R2-D2's outer case and inner chassis were designed from scratch, fully optimized for 3D printers.
👉 3D Model and STL Files (Thingiverse)
Main Electronic Components Used:
- Microcontroller: ESP32 Development Board
- Microphone: INMP441 (I2S MEMS)
- Audio Output: MAX98357A (I2S Class D Amplifier) + Speaker
- Display: I2C OLED Display
- Sensors: Capacitive Touch Sensor (For wake-up)
The following table highlights the core electronic components used to bring R2-D2 to life. These are organized into a grid for clear visualization.
The diagram below illustrates the wiring between the ESP32 and its peripherals, as well as the logical flow between the Edge (ESP32) and the Cloud (Flask Server).
├── server/
│ ├── app.py # Main Flask server application
│ ├── config/
│ │ └── settings.json # API, TTS, and Server settings
│ └── requirements.txt # Python dependencies
├── esp32/
│ ├── main.ino # Main ESP32 source code
│ ├── RobotFace.h # OLED animation library
│ └── r2d2_ses.h # System sounds on PROGMEM
├── .gitignore
└── README.md
For the project to work, FFmpeg must be installed on your computer and added to your system's PATH.
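A quick way to confirm that FFmpeg is reachable from Python is to look it up on the PATH. This is a small standalone check, not part of the project code; the function name is hypothetical.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; install it before starting the server.")
    else:
        print(f"FFmpeg found at {path}")
```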
- Clone the repository to your computer and enter the directory:

      git clone https://github.com/CinarSamet/Smart-R2-D2-Assistant.git
      cd Smart-R2-D2-Assistant/server

- Install the required Python libraries:

      pip install -r requirements.txt

- Add your Gemini API key as an environment variable:

      export GEMINI_API_KEY="your_api_key"

- Start the server:

      python app.py
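If the server fails on startup, a missing API key is a likely cause. The sketch below shows one way to read `GEMINI_API_KEY` and fail early with a clear message. The helper name is hypothetical, and how `app.py` actually reads the key may differ.

```python
import os
from typing import Mapping


def load_gemini_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the server."
        )
    return key
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.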
- Open the Arduino IDE and install the necessary libraries (WiFiManager, ArduinoOTA, etc.).
- Update the serverUrl variable in the esp32/main.ino file with the local IP address of the computer running the server (e.g., http://192.168.1.X:5001/upload).
- Upload the code to the ESP32.
- On its first boot, the device will create a Wi-Fi network named R2D2_Kurulum. Connect to this network to configure your local internet settings on the device.
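Before flashing the ESP32, it can help to exercise the server from a computer. The sketch below builds the kind of HTTP request the device is described as sending. The POST method, the `application/octet-stream` content type, the 16 kHz 16-bit mono format, and the function names are assumptions, so compare them with `app.py` and `main.ino` before relying on this.

```python
import urllib.request


def build_upload_request(server_url: str, pcm: bytes) -> urllib.request.Request:
    """Build a POST request carrying RAW PCM bytes as the request body."""
    return urllib.request.Request(
        server_url,
        data=pcm,
        headers={"Content-Type": "application/octet-stream"},  # assumed content type
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 60.0) -> bytes:
    """Send the request and return the response body (expected to be RAW PCM)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


if __name__ == "__main__":
    # One second of silence, assuming 16 kHz, 16-bit, mono audio.
    silence = bytes(16000 * 2)
    # Replace the address with the IP of the machine running the server.
    req = build_upload_request("http://127.0.0.1:5001/upload", silence)
    reply = send(req)
    print(f"Received {len(reply)} bytes from the server")
```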
- Establishing the I2S audio pipeline on the ESP32.
- Ensuring closed-loop communication with Whisper, Gemini, and TTS integration.
- Creating the hardware State Machine structure and OLED interface.
- Dockerization: Containerizing the entire Flask/AI server infrastructure using Docker to make it environment-independent and deployable.
- Sensor Fusion: Adding autonomous movement capabilities by integrating IMU data and distance sensors.
The source code in this repository (ESP32 and Python) is licensed under the MIT License. See the LICENSE file for more details.
The 3D hardware designs (STL files) of the project are subject to the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Reproduction and sale for commercial purposes are prohibited. You are free to use and develop them in your personal projects.









