This system uses Gemma 3 via OpenRouter to detect objects from camera feed or images and identify if they have cracks. It also provides surface coordinates of detected objects relative to the top-left of the input frame.
- Real-time camera detection or single image analysis
- Object detection with crack identification
- Surface coordinate extraction (x, y position relative to top-left)
- Bounding box visualization
- Confidence levels for detections
- Supports Gemma 3 vision model via OpenRouter
pip install -r requirements.txt- Get your OpenRouter API key from https://openrouter.ai/keys
- Copy
.env.exampleto.env:cp .env.example .env
- Edit
.envand add your API key:OPENROUTER_API_KEY=your_actual_api_key_here
Run the script to start camera detection:
python detect_cracks.pyControls:
- Press
cto capture and analyze the current frame - Press
qto quit
Analyze a specific image file:
python detect_cracks.py --image path/to/image.jpg
# or
python detect_cracks.py -i path/to/image.jpgExample with provided images:
python detect_cracks.py -i cube.jpg
python detect_cracks.py -i gear.jpg
python detect_cracks.py -i knuckle.jpgThe system provides:
-
Visual Output:
- Bounding boxes around detected objects
- Red boxes for objects with cracks
- Green boxes for objects without cracks
- Center point marked with purple circle
- Coordinates displayed below each object
-
Console Output:
============================================================ Object: Gear Has Crack: True Confidence: high Position (x, y): (640, 360) Bounding Box: (540, 260) to (740, 460) Description: Metal gear with visible crack on surface ============================================================ -
Saved Results:
- Camera mode:
detection_YYYYMMDD_HHMMSS.jpg - Image mode:
detected_<original_filename>.jpg
- Camera mode:
- Captures image from camera or loads from file
- Encodes image to base64 and sends to OpenRouter API
- Gemma 3 vision model analyzes the image for:
- Object identification
- Crack detection
- Object location (coordinates relative to top-left)
- Bounding box dimensions
- Results are parsed and visualized with OpenCV
- Coordinates are displayed relative to top-left corner (0,0)
- Origin (0, 0) is at the top-left corner of the image
- X increases from left to right
- Y increases from top to bottom
- Center point (x, y) represents the object's center
- Bounding box shows the full extent of the object
- Camera not opening: Check if camera is available and not used by another application
- API errors: Verify your OPENROUTER_API_KEY in
.envfile - Import errors: Ensure all dependencies are installed with
pip install -r requirements.txt
- The Gemma 3 model used is
google/gemma-2-9b-it:free - Detection accuracy depends on image quality and object visibility
- Camera resolution is set to 1280x720 by default
- API requests may take a few seconds to process