This project runs a Flask server that accepts images, runs YOLO detection (Ultralytics), queries an LLM (Ollama) for suggestions, and can send movement commands to a Raspberry Pi client. It now includes an optional Text-To-Speech (TTS) module to speak short descriptions of what the model sees.
## Quick setup

- Create and activate a Python virtualenv (recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Enable TTS (optional): the project uses `pyttsx3` for offline TTS. To enable speaking, set the environment variable `ENABLE_TTS=1` before running the server. Example:

  ```bash
  export ENABLE_TTS=1
  ```

- Run the server:

  ```bash
  python app.py
  ```

## Endpoints
- `GET /` - Web UI
- `POST /upload` - Upload an image as multipart form field `image`
- `GET /processed` - Returns the latest annotated image
- `GET /detections` - Returns a JSON list of the latest detections
- `GET /ollama` - Returns the latest Ollama suggestion
- `GET /get_command` - Endpoint polled by the Pi client for movement commands
- `GET /health` - Health and basic status
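For illustration, a Pi-side polling loop against `/get_command` might look like the sketch below. The plain-text command payload, the poll interval, and the server address are assumptions; the real protocol is defined by `pi_client.py` and the server.

```python
# Hypothetical polling loop for the Pi client; the command payload format,
# poll interval, and server address are assumptions, not the real protocol.
import time

import requests

SERVER = "http://localhost:1909"  # assumed server address

while True:
    try:
        resp = requests.get(f"{SERVER}/get_command", timeout=5)
        command = resp.text.strip()  # assumed plain-text command, e.g. "forward"
        if command:
            print(f"Received command: {command}")  # a real client would drive GPIO here
    except requests.RequestException as exc:
        print(f"Polling failed: {exc}")
    time.sleep(0.5)  # assumed poll interval
```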
## TTS behavior
- When enabled, the server calls `tts.speak_detections(...)` after processing each image. The TTS module uses `pyttsx3` by default, which works offline on many platforms. A minimal sketch of such a function follows.
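This sketch assumes each detection is a dict with a `label` key and uses the print fallback described under troubleshooting; the actual `tts` module may be structured differently.

```python
# Minimal sketch of a speak_detections-style helper; assumes each detection
# is a dict with a "label" key. The real tts module may differ.
import os

import pyttsx3

ENABLE_TTS = os.environ.get("ENABLE_TTS") == "1"

def speak_detections(detections):
    """Speak a short sentence summarizing the detected object labels."""
    if not detections or not ENABLE_TTS:
        return
    sentence = "I see " + ", ".join(d["label"] for d in detections)
    try:
        engine = pyttsx3.init()
        engine.say(sentence)
        engine.runAndWait()
    except Exception:
        # No usable audio backend (common on headless servers): print instead
        print(sentence)
```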
## Notes & troubleshooting
- You must have a YOLO model file (the repo includes `yolov8m.pt` and `yolov8n.pt`). The server attempts to load `yolov8m.pt` by default; adjust `init_yolo()` if you want a different model (see the sketch after this list).
- On headless Linux servers, `pyttsx3` may require additional audio backends (e.g., `aplay` / ALSA) to be installed. If audio playback fails, TTS falls back to printing the sentence.
- For Raspberry Pi robot control, see `pi_client.py` and configure the GPIO pins.
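For example, swapping the model could be as simple as the following sketch; the real `init_yolo()` in `app.py` may take no arguments or do additional setup.

```python
# Hypothetical shape of init_yolo(); the actual function in app.py may differ.
from ultralytics import YOLO

def init_yolo(model_path: str = "yolov8m.pt") -> YOLO:
    """Load a YOLO model; pass "yolov8n.pt" for the lighter bundled model."""
    return YOLO(model_path)
```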
## Example POST using curl

```bash
curl -X POST -F "image=@/path/to/photo.jpg" http://localhost:1909/upload
```

## Example test script
See `examples/test_request.py` for a small Python example that uploads an image and prints the response.
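A minimal version of such a script, using the `requests` library, might look like this; the hard-coded path and port mirror the curl example above, and the bundled script may differ.

```python
# Minimal upload script; mirrors the curl example above.
# examples/test_request.py in the repo may differ from this sketch.
import requests

URL = "http://localhost:1909/upload"

with open("/path/to/photo.jpg", "rb") as f:
    # The server expects the image under the multipart form field "image"
    resp = requests.post(URL, files={"image": f})

print(resp.status_code)
print(resp.text)
```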
## License
Add a LICENSE file appropriate to your project before publishing.