67 Clock Bot is an autonomous computer-vision robot that reacts to people and obstacles in real time using only a single camera, an ESP32, and an ultrasonic sensor.
It combines YOLOv8, MiDaS depth-from-RGB, and a custom finite-state machine to make consistent decisions under changing conditions.
The entire system was designed, built, and tested in 3 days, covering computer vision, embedded systems, real-time control, and robotics behavior.
Setup:

- Install Python 3.10+
- Install the required Python packages:

```bash
pip install ultralytics
pip install opencv-python
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install numpy
```
- YOLOv8 model: yolov8n.pt is downloaded automatically on first run.
- MiDaS depth model: MiDaS_small is downloaded automatically the first time depth.py runs.
- ESP32 Setup: upload esp32.ino to your ESP32 board. You'll need:
  - ESP32 board support installed
  - Motor driver wiring (L298N or similar)
  - Ultrasonic sensor wired to the trigger/echo pins
Run the robot with:

```bash
python main.py
```

Before running, set the following inside main.py or utils.py:

- Your ESP32 IP
- Your UDP command port
- Your PC listening port
- Your camera index or camera stream URL
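The variable names below are hypothetical; check main.py and utils.py for the actual ones. A minimal sketch of what needs to be filled in:

```python
# Hypothetical names and values -- the real variables live in main.py / utils.py.
ESP32_IP = "192.168.1.50"   # IP address the ESP32 reports after joining Wi-Fi
ESP32_CMD_PORT = 4210       # UDP port the ESP32 listens on for drive commands
PC_LISTEN_PORT = 4211       # UDP port the PC binds for range telemetry
CAMERA_SOURCE = 0           # OpenCV camera index, or an HTTP/RTSP stream URL
```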
The robot runs YOLOv8n at ~15 Hz and treats the largest detected bounding box as the closest person. This detection is used to determine (see the sketch after this list):
- whether a person is near
- where the person is located horizontally
- how the robot should turn while retreating
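A minimal sketch of this selection step using the ultralytics API; the function name and confidence threshold are illustrative, and the project's actual version lives in detector.py:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # downloads automatically on first run

def largest_person(frame, conf=0.4):
    """Return (x1, y1, x2, y2) of the biggest detected person, or None."""
    results = model(frame, conf=conf, verbose=False)
    best, best_area = None, 0.0
    for box in results[0].boxes:
        if int(box.cls[0]) != 0:              # COCO class 0 == "person"
            continue
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        area = (x2 - x1) * (y2 - y1)
        if area > best_area:
            best, best_area = (x1, y1, x2, y2), area
    return best
```

The horizontal offset of the winning box's center relative to the frame center then sets the turn direction while retreating.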
MiDaS generates a dense depth map using only an RGB image, without stereo cameras.
Relative depth is used to determine whether a detected person is “close enough” to trigger retreat.
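A minimal sketch of MiDaS_small inference via torch.hub (the project's version lives in depth.py; percentile normalization is shown further below):

```python
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # downloads on first use
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform              # matches MiDaS_small

def estimate_depth(frame_bgr):
    """Return a relative (inverse) depth map resized to the input frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transform(rgb)
    with torch.no_grad():
        pred = midas(batch)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return pred.numpy()
```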
An ESP32 sends { "range_cm": value } to the PC over UDP.
This real-time distance reading allows (see the sketch after this list):
- halting retreat when a wall is too close
- entering SLIDE when backing into a wall
- entering BOUNCE when escaping corners
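A minimal sketch of the PC-side listener; the port number is a placeholder, and telemetry.py's real interface may differ:

```python
import json
import socket
import threading

latest = {"range_cm": None}  # shared state, updated by the background thread

def _listen(port=4211):  # hypothetical port; must match the ESP32 sketch
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, _addr = sock.recvfrom(256)
        try:
            latest["range_cm"] = float(json.loads(data)["range_cm"])
        except (ValueError, KeyError):
            pass  # ignore malformed packets

threading.Thread(target=_listen, daemon=True).start()
```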
Robot behavior is controlled by a predictable FSM:
- IDLE – no person detected
- RETREAT – person detected and close
- SLIDE – wall detected during retreat
- BOUNCE – small forward escape after sliding
This structure ensures stable and explainable reactions.
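A minimal sketch of the transition logic, assuming a hypothetical wall-stop threshold and timer-style completion flags; the real rules live in controller.py:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    RETREAT = auto()
    SLIDE = auto()
    BOUNCE = auto()

def next_state(state, person_close, wall_cm, slide_done, bounce_done,
               wall_stop_cm=25.0):  # hypothetical threshold
    if state is State.IDLE:
        return State.RETREAT if person_close else State.IDLE
    if state is State.RETREAT:
        if wall_cm is not None and wall_cm < wall_stop_cm:
            return State.SLIDE                # backing into a wall
        return State.RETREAT if person_close else State.IDLE
    if state is State.SLIDE:
        return State.BOUNCE if slide_done else State.SLIDE
    if state is State.BOUNCE:
        return State.IDLE if bounce_done else State.BOUNCE
    return State.IDLE
```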
The PC controls behavior and vision; the ESP32 handles motor output and wall sensing.
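For illustration, the PC-to-ESP32 sender might look like the sketch below; the payload keys and port are assumptions, since only the ESP32-to-PC message format is specified above:

```python
import json
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_command(linear, angular, esp32_ip="192.168.1.50", port=4210):
    """Send one drive command to the ESP32 over UDP (hypothetical format)."""
    payload = json.dumps({"linear": linear, "angular": angular})
    sock.sendto(payload.encode(), (esp32_ip, port))
```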
The project is organized into clear, modular files:
```
67-clock-bot/
│
├── main.py        # Main event loop (vision → control → drive commands)
├── detector.py    # YOLO person detector
├── depth.py       # MiDaS depth-from-RGB estimation
├── controller.py  # Finite-state machine + motion logic
├── telemetry.py   # ESP32 UDP listener for wall range data
└── utils.py       # UDP senders + on-screen visualization
```
detector.py runs YOLOv8 and returns only the largest detected person, reducing noise and improving stability.
depth.py runs MiDaS Small and normalizes the depth map with percentile scaling, producing a stable [0, 1] depth range.
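A sketch of that normalization step; the 5th/95th percentile cut-offs are assumptions:

```python
import numpy as np

def normalize_depth(depth, lo_pct=5, hi_pct=95):
    # Percentile scaling ignores outliers that would skew min/max scaling.
    lo, hi = np.percentile(depth, [lo_pct, hi_pct])
    return np.clip((depth - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
```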
controller.py implements the FSM and computes linear/angular velocity commands from the fused CV + telemetry data.
telemetry.py runs a background thread that receives ultrasonic data over UDP from the ESP32.
main.py runs the real-time loop that ties everything together (sketched after the list):
- Capture frame
- Detect person
- Estimate depth
- Read telemetry
- Update FSM
- Send motion commands
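A condensed sketch of that loop, reusing the helpers from the earlier sketches (largest_person, estimate_depth, normalize_depth, next_state, send_command, latest); is_close and motion_for below are hypothetical stand-ins for controller.py's logic:

```python
import cv2
import numpy as np

def is_close(box, depth, thresh=0.6):
    # Hypothetical closeness test: MiDaS outputs inverse depth, so a
    # higher median value inside the person box means a nearer person.
    x1, y1, x2, y2 = (int(v) for v in box)
    return float(np.median(depth[y1:y2, x1:x2])) > thresh

def motion_for(state, box, frame_w):
    # Hypothetical velocity choices; the real logic lives in controller.py.
    if state is State.RETREAT and box is not None:
        cx = (box[0] + box[2]) / 2
        turn = (cx - frame_w / 2) / (frame_w / 2)  # steer while backing away
        return -0.4, 0.3 * turn
    if state is State.SLIDE:
        return 0.0, 0.5
    if state is State.BOUNCE:
        return 0.3, 0.0
    return 0.0, 0.0

def run(camera_source=0):
    cap = cv2.VideoCapture(camera_source)
    state = State.IDLE
    while True:
        ok, frame = cap.read()                          # capture frame
        if not ok:
            break
        person = largest_person(frame)                  # detect person
        depth = normalize_depth(estimate_depth(frame))  # estimate depth
        wall_cm = latest["range_cm"]                    # read telemetry
        close = person is not None and is_close(person, depth)
        state = next_state(state, close, wall_cm,       # update FSM
                           slide_done=False, bounce_done=False)
        send_command(*motion_for(state, person, frame.shape[1]))
```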
Highlights:

- Real-time computer vision pipeline (YOLOv8 + MiDaS)
- Depth estimation from RGB-only input
- Multi-sensor fusion between vision and ultrasonic sensing
- Finite-state machine for predictable behavior
- Modular and extensible architecture
- Reliable UDP communication at ~15 Hz
- Fully functional robot built in 3 days
Future improvements:

- Person-following mode
- Local mapping or SLAM
- Stereo depth or hardware depth camera
- ROS2 ecosystem integration
- On-device inference (Jetson, OAK-D)
- Additional behaviors (path planning, patrolling)