Real-time object detection system using an ESP32-S3-CAM, with the COCO-SSD model running in the browser.

Features:
- Live video streaming from ESP32-S3-CAM
- Real-time object detection using TensorFlow.js COCO-SSD model
- Detects 80 common object classes
- Bounding boxes with confidence scores
- Browser-based AI processing (no server required)
- Configurable detection interval (default: 2 seconds)

Hardware requirements:
- ESP32-S3-CAM board
- USB cable for programming
- WiFi network

Software requirements:
- Arduino IDE (1.8.x or later)
- ESP32 board support for Arduino
- Web browser (Chrome recommended for best performance)

Installing ESP32 board support:
- Open Arduino IDE
- Go to File > Preferences
- Add this URL to "Additional Board Manager URLs":
  https://dl.espressif.com/dl/package_esp32_index.json
- Go to Tools > Board > Boards Manager
- Search for "esp32" and install "esp32 by Espressif Systems"

Uploading the firmware:
- Clone or download this repository
- Open esp32_modular.ino in Arduino IDE
- Configure your WiFi credentials in the code:
      const char* ssid = "YOUR_WIFI_SSID";
      const char* password = "YOUR_WIFI_PASSWORD";
- Select board: Tools > Board > ESP32 Arduino > ESP32S3 Dev Module
- Select the correct COM port
- Click Upload

Usage:
- After uploading, open the Serial Monitor (115200 baud)
- Wait for the ESP32 to connect to WiFi
- Note the IP address displayed in Serial Monitor
- Open your web browser and navigate to:
  http://[ESP32_IP_ADDRESS]
- Wait for the COCO-SSD model to load (~25MB, may take 30-60 seconds)
- Point the camera at objects to detect them

The COCO-SSD model can detect 80 object classes, including:
People & Animals:
- person, dog, cat, bird, horse, cow, sheep, bear, zebra, giraffe
Vehicles:
- bicycle, car, motorcycle, airplane, bus, train, truck, boat
Indoor Objects:
- chair, couch, bed, dining table, toilet, tv, laptop, mouse, keyboard, cell phone, book
Kitchen Items:
- bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange
And many more...
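
Because every detection carries its class name, the page can be narrowed to the classes you care about. The following is a minimal sketch, assuming predictions have the `{ bbox, class, score }` shape that COCO-SSD returns; `filterByClass`, `countByClass`, and the sample data are hypothetical names introduced here for illustration and are not part of this project's code.

```javascript
// Keep only predictions whose class name is in wantedClasses.
// Class names are the strings listed above, e.g. 'person' or 'cell phone'.
function filterByClass(predictions, wantedClasses) {
  const wanted = new Set(wantedClasses);
  return predictions.filter((p) => wanted.has(p.class));
}

// Count how many detections of each class appear in one frame.
function countByClass(predictions) {
  const counts = {};
  for (const p of predictions) {
    counts[p.class] = (counts[p.class] || 0) + 1;
  }
  return counts;
}

// Hypothetical sample data in the COCO-SSD prediction shape:
// bbox is [x, y, width, height] in pixels of the analysed frame.
const sample = [
  { bbox: [10, 20, 100, 200], class: 'person', score: 0.91 },
  { bbox: [300, 40, 80, 60], class: 'cup', score: 0.62 },
  { bbox: [150, 90, 120, 110], class: 'person', score: 0.77 },
];

console.log(filterByClass(sample, ['person']));
console.log(countByClass(sample));
```

A filter like this would run on the result of each detection pass, before the boxes are drawn.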

To change the detection interval, edit the setInterval call in html_page.h:
    setInterval(detectObjects, 2000); // Change 2000 to desired milliseconds
Examples:
- 1 second: 1000
- 5 seconds: 5000
- 500ms: 500
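
With short intervals, a detection pass can take longer than the interval itself, and `setInterval` will then queue overlapping runs. One possible alternative, sketched below, schedules the next run only after the previous one finishes. `startDetectionLoop` and `nextDelay` are hypothetical helpers, not code from html_page.h; `detectFn` stands in for the page's `detectObjects` function.

```javascript
// Delay before the next run so that runs start roughly every intervalMs.
// If the last run took longer than the interval, the next one starts at once.
function nextDelay(intervalMs, elapsedMs) {
  return Math.max(0, intervalMs - elapsedMs);
}

// Hypothetical replacement for setInterval(detectObjects, 2000).
// detectFn may be async; a new run is scheduled only after it settles,
// so two detection passes never overlap.
function startDetectionLoop(detectFn, intervalMs) {
  let stopped = false;
  let timer = null;

  async function tick() {
    const started = Date.now();
    try {
      await detectFn();
    } catch (err) {
      console.error('detection failed:', err);
    }
    if (!stopped) {
      timer = setTimeout(tick, nextDelay(intervalMs, Date.now() - started));
    }
  }

  timer = setTimeout(tick, intervalMs);

  // Call the returned function to stop the loop.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Usage would look like `const stop = startDetectionLoop(detectObjects, 2000);`, with `stop()` called to end detection.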

To change the camera resolution, edit the frame size in esp32_modular.ino:
    config.frame_size = FRAMESIZE_VGA; // Options: QVGA, VGA, SVGA, XGA, UXGA

Project structure:
esp32_modular/
├── esp32_modular.ino   # Main ESP32 code (camera, WiFi, web server)
├── html_page.h         # Web interface (HTML, CSS, JavaScript)
└── README.md           # This file

How it works:
- ESP32-S3-CAM captures video frames and streams them via HTTP
- Web browser receives the MJPEG stream
- TensorFlow.js loads the COCO-SSD model in the browser
- JavaScript periodically captures frames and runs object detection
- Canvas overlay draws bounding boxes and labels on detected objects
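
The last step above can be sketched as follows. This is an illustration under stated assumptions, not the code in html_page.h: predictions are assumed to have the COCO-SSD shape `{ bbox: [x, y, width, height], class, score }`, and the overlay canvas may be displayed at a different size than the analysed frame, so boxes are scaled first. `toOverlayBoxes` and `drawOverlay` are hypothetical names.

```javascript
// Scale COCO-SSD boxes from frame pixels to canvas pixels and build labels.
function toOverlayBoxes(predictions, frameW, frameH, canvasW, canvasH) {
  const sx = canvasW / frameW;
  const sy = canvasH / frameH;
  return predictions.map((p) => {
    const [x, y, w, h] = p.bbox;
    return {
      x: x * sx,
      y: y * sy,
      width: w * sx,
      height: h * sy,
      label: `${p.class} ${Math.round(p.score * 100)}%`,
    };
  });
}

// Draw the scaled boxes on a 2D canvas context (CanvasRenderingContext2D).
function drawOverlay(ctx, boxes) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  ctx.strokeStyle = '#00ff00';
  ctx.fillStyle = '#00ff00';
  ctx.lineWidth = 2;
  ctx.font = '16px sans-serif';
  for (const b of boxes) {
    ctx.strokeRect(b.x, b.y, b.width, b.height);
    // Put the label above the box, or inside it when the box touches the top.
    const labelY = b.y > 16 ? b.y - 4 : b.y + 16;
    ctx.fillText(b.label, b.x, labelY);
  }
}
```

Keeping the scaling in a pure function separates the geometry from the drawing calls, which makes the geometry easy to check without a browser.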

Troubleshooting:

Model not loading:
- Check browser console (F12) for errors
- Ensure stable internet connection (model downloads from CDN)
- Try Chrome browser for best compatibility
- Wait longer (model is ~25MB and can take time to download)

Camera not working:
- Check Serial Monitor for error messages
- Verify camera pin configuration matches your board
- Try lowering frame size or JPEG quality

WiFi connection fails:
- Double-check SSID and password
- Ensure WiFi network is 2.4GHz (ESP32 doesn't support 5GHz)
- Check Serial Monitor for connection status

Slow performance:
- Reduce camera resolution
- Increase detection interval
- Close other browser tabs

Technical details:

ESP32 Side:
- Camera: OV2640/OV5640 sensor
- Resolution: VGA (640x480)
- JPEG compression: Quality 22
- Web server: ESP HTTP Server
- Stream format: MJPEG

Browser Side:
- AI Framework: TensorFlow.js 4.11.0
- Model: COCO-SSD 2.2.3
- Detection: Client-side processing
- No cloud/server required

Performance:
- Model load time: 30-60 seconds (one-time)
- Detection interval: Configurable (default 2 seconds)
- Inference time: ~100-300ms per frame (browser-dependent)
- Objects per frame: Up to 100 (practical limit ~10-20)
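
To stay near the practical limit on objects per frame, the detections can be capped before drawing. The sketch below keeps the highest-scoring results; `topDetections` is a hypothetical helper, and its defaults (20 boxes, minimum score 0.5) are assumptions chosen for illustration rather than values taken from this project.

```javascript
// Return at most maxBoxes predictions with score >= minScore,
// ordered from highest to lowest score. The input array is not modified.
function topDetections(predictions, maxBoxes = 20, minScore = 0.5) {
  return predictions
    .filter((p) => p.score >= minScore) // filter() returns a new array
    .sort((a, b) => b.score - a.score)  // so sorting does not touch the input
    .slice(0, maxBoxes);
}

// Hypothetical sample in the COCO-SSD prediction shape.
const frameDetections = [
  { bbox: [0, 0, 50, 50], class: 'cup', score: 0.55 },
  { bbox: [60, 10, 90, 180], class: 'person', score: 0.93 },
  { bbox: [200, 30, 40, 40], class: 'book', score: 0.31 },
];

console.log(topDetections(frameDetections, 2));
```

Drawing fewer, higher-confidence boxes also keeps the overlay readable on small screens.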

Resources:
- TensorFlow.js: https://www.tensorflow.org/js
- COCO-SSD Model: https://github.com/tensorflow/tfjs-models/tree/master/coco-ssd
- ESP32 Arduino Core: https://github.com/espressif/arduino-esp32

This project is open source and available under the MIT License.

Future improvements:
- Add support for custom trained models
- Implement person tracking
- Add snapshot/recording functionality
- Support multiple camera streams
- Add a mobile-optimized interface
- Host the model offline on an SD card