This project came out of a fairly practical problem. I wanted a small rover that could drive to a few fixed checkpoints, stop in a repeatable way, take a picture, and decide whether something looked wrong. In theory, a full ROS navigation stack should have been the obvious solution. In practice, on this hardware, it turned into more complexity and instability than I wanted for a small inspection robot.
Going in the other direction did not work either. Simple encoder replay was easy enough, but not accurate enough on its own. The rover could get close, but not close enough to trust the camera view for repeatable inference. So I ended up building a simpler checkpoint-based system instead: teach the route once, replay it later, verify the checkpoint with LiDAR, align with an AprilTag, capture an image, and run inference locally on the rover.
In the current proof-of-concept mission, the rover visits three checkpoints. Two of them use Edge Impulse Visual Anomaly Detection to check door-related conditions, and one uses a standard classification model to check whether a stove has been left on.
This is not meant to compete with general robot navigation. It is a narrower approach for a narrower job. But for fixed inspection routes, such as home safety rounds, facility inspection, or other checkpoint-based monitoring tasks, that tradeoff can be worth it: less infrastructure, less tuning, more repeatable camera views, and a system that is easier to understand, debug, and reproduce.
A short GIF is shown below; if it is not visible, see this YouTube video. Both were created by stitching together the still pictures the rover itself took every 2 seconds, giving an astonishing frame rate of 0.5 FPS!
- Overview
- Hardware Requirements
- Rover Installation and Setup
- Data Collection and Cameras
4.1. OAK Camera
4.1.1. Capture Images for VAD
4.1.2. OAK to Edge Impulse Program
4.2. Pan and Tilt USB Camera
4.2.1. Capture Images for Classification
4.2.2. Teleop Program
- Route Teach and Replay
5.1. Navigation Strategy
5.2. Teach the Route
5.3. Replay the Route
- Edge Impulse Models
6.1. Visual Anomaly Detection Project
6.1.1. Create an Impulse
6.1.2. Train
6.2. Classification Project
6.2.1. Create an Impulse
6.2.2. Train
6.3. Deploy Models
- Run the Full Mission
7.1. Results
7.1.1. Anyone Left the Doors Open or Blocked?
7.1.2. Did I Forget to Turn Off the Stove?
- Example Applications
- Tradeoffs Compared to ROS Navigation Frameworks
- Conclusion
The rover performs autonomous inspection missions consisting of:
- Replay previously recorded encoder motion
- Verify checkpoint identity using LiDAR signature matching
- Align orientation using AprilTags
- Capture RGB inspection image
- Run Edge Impulse VAD or classification inference locally
- Store inference results and inspection image
This enables repeatable condition monitoring at fixed inspection points without requiring external infrastructure or complex navigation frameworks.
Tested platform:
- Waveshare UGV Rover PI ROS, including:
  - the rover itself
  - Raspberry Pi 4/5
  - LD19 LiDAR
  - OAK camera
  - Pan and Tilt USB Camera
- AprilTags (recommended size: 10 x 10 cm or larger)
  - Print out the tags on white paper and fasten them close to the checkpoints
What are AprilTags?
- AprilTags are conceptually similar to QR Codes, in that they are a type of two-dimensional bar code. However, they are designed to encode far smaller data payloads (between 4 and 12 bits), allowing them to be detected more robustly and from longer ranges. Further, they are designed for high localization accuracy: you can compute the precise 3D position of the AprilTag with respect to the camera (Source). In practice this means that with some calibration you'll know the distance from the camera to the tag, horizontal and vertical angles, and rotation. The final accuracy depends on the camera resolution, tag size, the real distance to the tag, and visibility.
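To make the geometry concrete, here is a minimal sketch of how a tag's 3D position relative to the camera translates into a distance and two angles. It assumes the detector has already produced a translation vector in the common camera frame (x to the right, y down, z forward, in metres). The function name and the example values are made up for illustration and are not taken from the project code.

```python
import math

def tag_offsets(x, y, z):
    """Return (distance_m, horizontal_deg, vertical_deg) for a tag at (x, y, z).

    Assumes the common camera frame: x to the right, y down, z forward.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    horizontal = math.degrees(math.atan2(x, z))   # positive = tag is to the right
    vertical = math.degrees(math.atan2(-y, z))    # positive = tag is above centre
    return distance, horizontal, vertical

# Hypothetical example: tag 1.2 m ahead, 0.3 m to the right, 0.1 m above the optical axis.
distance, horizontal, vertical = tag_offsets(0.3, -0.1, 1.2)
print(f"{distance:.2f} m, {horizontal:.1f} deg right, {vertical:.1f} deg up")
```

A horizontal angle like this is the kind of value an alignment step can drive toward zero by turning the rover.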
Connect to the onboard Raspberry Pi over your normal field workflow, for example via SSH, serial console, or a directly attached keyboard and display. Newcomers are strongly recommended to follow the official rover instructions.
- Clone this repository onto the Raspberry Pi and work from the project root directory.
- Install the required Python packages in the environment you will use on the rover:
pip install edge_impulse_linux opencv-contrib-python pillow numpy pyserial pyyaml depthai
This project uses two camera paths for different parts of the mission: the OAK camera for the two VAD door checkpoints, and the pan and tilt USB camera for the stove on/off checkpoint.
The OAK camera is used for checkpoint imaging, AprilTag alignment, and VAD-oriented capture flow.
Use the OAK camera when you need:
- AprilTag-based alignment
- consistent checkpoint framing
- capture for VAD checkpoints
For VAD collection, the main goal is consistency rather than class balancing.
Use the OAK camera to collect:
- normal-condition images only for the VAD training set
- approximately similar viewpoints at each checkpoint, accepting normal rover drift in position and orientation
- enough variation in lighting, small pose offsets, and minor natural scene changes to make the model robust to real mission conditions
oak_to_ei_v321.py is useful when you want to send images directly into Edge Impulse. oak_save_frames.py can still be used for manual local frame capture, but it is not the primary workflow.
The primary OAK-side ingestion helper is oak_to_ei_v321.py.
Use it when you want to stream OAK images directly into an Edge Impulse training dataset. The script:
- opens the OAK camera through DepthAI
- captures 640 x 360 RGB frames
- prompts for one label
- uploads JPEG frames to the Edge Impulse ingestion API at roughly 1 FPS
- keeps running until you stop it with Ctrl+C
Before running it, set your Edge Impulse API key either in the shell environment or in a nearby .env file. You can create API keys in Edge Impulse from Dashboard/Keys.
EI_API_KEY=your_edge_impulse_key
Typical usage:
python3 Source/oak_to_ei_v321.py
Then:
- enter the label when prompted
- let it upload frames for that label
- slowly change the rover pose and the surrounding lighting; don't strive for perfection (the real world is usually messy anyway)
- stop it with Ctrl+C
- restart it with a different label if you want to collect another class or condition
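The API key lookup described above (shell environment first, then a nearby .env file) can be sketched roughly as follows. This is an illustration only and not the code from oak_to_ei_v321.py; the function name `load_api_key` and its parameters are assumptions.

```python
import os
from pathlib import Path

def load_api_key(env=None, dotenv_path=".env", name="EI_API_KEY"):
    """Return the key from the environment, else from a .env file, else None."""
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    path = Path(dotenv_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip('"').strip("'")
    return None

api_key = load_api_key()
print("API key found" if api_key else "EI_API_KEY is not set")
```

Checking the environment first means a key exported in the shell overrides whatever is stored in the file, which is convenient when switching between Edge Impulse projects.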
The pan and tilt USB camera is used where a movable viewpoint is needed, especially for checkpoint-specific classification flows such as the stove/lamp style inspection case.
Note: The default USB camera on the rover has a 160° ultra-wide-angle lens, which is perfect for live viewing but far from optimal for classifying a small stove lamp 70 cm above the floor. While it's easy to swap out the camera, or in some cases only the lens, I solved this challenge with traditional computer vision. In short, the stove panel is found with a reference-image feature-matching and homography approach, using OpenCV functions. This of course means that, unless you have a stove similar to mine (unlikely), you need to adjust the algorithm to fit your use case.
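As a small illustration of the homography step, the sketch below maps the four corners of a reference image into the live frame. In an OpenCV pipeline the 3x3 matrix would typically come from `cv2.findHomography` applied to matched keypoints; here a hand-written matrix stands in for it so the example stays dependency-free. The function names and the example matrix are hypothetical, not the project's code.

```python
def project_point(H, x, y):
    """Map (x, y) through a 3x3 homography H given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    px = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    py = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return px, py

def project_panel(H, ref_width, ref_height):
    """Return the four reference-image corners mapped into the live frame."""
    corners = [(0, 0), (ref_width, 0), (ref_width, ref_height), (0, ref_height)]
    return [project_point(H, x, y) for x, y in corners]

# Hypothetical homography: scale by 0.5 and shift by (100, 40) pixels.
H = [[0.5, 0.0, 100.0],
     [0.0, 0.5, 40.0],
     [0.0, 0.0, 1.0]]
print(project_panel(H, 400, 200))
```

The projected corners outline where the panel sits in the wide-angle frame, which is the region a later crop would be taken from.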
Use the USB pan/tilt camera when the target requires a carefully aimed or adjustable viewpoint.
Recommended flow:
- start pt_camera_teleop.py
- drive and aim the camera until the target is framed correctly
- use p for single images or b for bursts
- collect examples for each class, such as on and off, and don't forget to collect background images as well
- repeat from slightly different but still realistic viewpoints, and also vary the lighting
For checkpoint-specific classification tasks, this is a more natural capture path than the fixed OAK view.
The main USB-camera collection tool is pt_camera_teleop.py.
Use it when you want live pan/tilt control, rover movement, and image capture from the USB camera in one interactive loop.
Typical usage:
python3 source/pt_camera_teleop.py
By default, teleop updates a lightweight preview/viewer set under:
- output/mission_preview/latest.jpg
- output/mission_preview/latest.json
- output/mission_preview/view_latest.html
Open view_latest.html in a browser if you want a continuously refreshing live view while teleop is running. If, like me, you are using Visual Studio Code for development, you can install a lightweight HTML viewer to get a live video feed (live in this case means 0.5-1 FPS depending on settings).
Important controls from the script:
Pan & Tilt Camera:
- i: tilt up
- k: tilt down
- j: pan left
- l: pan right
Rover Control:
- e: forward
- d: backward
- s: turn left
- f: turn right
Capture images by:
- p: save one timestamped snapshot
- b: save a burst of snapshots
Additional commands:
- x: stop rover now
- c: center the gimbal
- q: quit
While teleop is running, latest.jpg is refreshed automatically and view_latest.html polls latest.json for updates. Burst capture is especially useful for collecting multiple labeled examples quickly.
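One way a capture loop could refresh these preview files without the viewer ever reading a half-written image is to write to a temporary file and rename it into place. The sketch below shows that pattern; the helper names and the JSON fields are hypothetical, and the actual format of latest.json in the project may differ.

```python
import json
import os
import tempfile
import time
from pathlib import Path

def write_atomic(path, data):
    """Write bytes to a temp file in the same folder, then rename it over `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_name, path)  # the rename is atomic on the same filesystem

def publish_preview(preview_dir, jpeg_bytes, note=""):
    """Refresh latest.jpg first, then latest.json, which the viewer polls."""
    preview_dir = Path(preview_dir)
    write_atomic(preview_dir / "latest.jpg", jpeg_bytes)
    meta = {"image": "latest.jpg", "updated": time.time(), "note": note}  # hypothetical fields
    write_atomic(preview_dir / "latest.json", json.dumps(meta).encode("utf-8"))
    return meta

# Demo in a throwaway directory so nothing is written into the project tree.
with tempfile.TemporaryDirectory() as demo_dir:
    print(publish_preview(demo_dir, b"not a real jpeg", note="demo"))
```

Writing the image before the metadata means that by the time a poller sees a new timestamp in the JSON, the matching image is already in place.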
This project intentionally avoids ROS navigation frameworks in order to reduce system complexity and improve reproducibility on embedded hardware platforms.
Instead, navigation is based on:
- encoder teach-and-replay motion
- LiDAR checkpoint signature verification
- AprilTag alignment for final positioning
This produces stable and repeatable checkpoint inspection behaviour with minimal infrastructure requirements.
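To give an idea of what LiDAR signature verification can look like, here is a minimal sketch: a 360° scan is reduced to one representative range per angular sector, and two signatures are compared by their mean absolute difference. This is an assumed, simplified scheme for illustration. The sector count, range limit, tolerance, and function names are all hypothetical, and the project's actual matching method may differ.

```python
import math

def make_signature(scan, sectors=36, max_range=8.0):
    """Reduce (angle_deg, range_m) pairs to the middle range value per sector."""
    buckets = [[] for _ in range(sectors)]
    width = 360.0 / sectors
    for angle, rng in scan:
        if 0.0 < rng <= max_range:  # drop invalid or very distant returns
            buckets[int((angle % 360.0) / width) % sectors].append(rng)
    signature = []
    for bucket in buckets:
        bucket.sort()
        signature.append(bucket[len(bucket) // 2] if bucket else None)
    return signature

def signature_distance(a, b):
    """Mean absolute range difference over sectors that are valid in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

def matches(taught, live, tolerance_m=0.15):
    """True when the live signature is within the tolerance of the taught one."""
    return signature_distance(taught, live) <= tolerance_m

# Synthetic example: a round room taught at 2.0 m, revisited with 5 cm of drift.
taught = make_signature([(a, 2.0) for a in range(360)])
live = make_signature([(a, 2.05) for a in range(360)])
print(matches(taught, live))
```

A comparison like this only confirms that the surroundings look like the taught checkpoint; the final pose correction is still left to the AprilTag step.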
During the teach phase:
- rover is manually driven between checkpoints
- encoder ticks are recorded
- LiDAR checkpoint signatures are captured
- checkpoint orientations are stored
These recordings define the inspection route.
Use keyboard_cp_teach.py to record a taught route:
python3 source/keyboard_cp_teach.py --record-route --route-out /home/ws/EI_VAD/data/route_teach_latest.yaml
Controls:
- e: forward
- d: backward
- s: turn left
- f: turn right
- x or space: stop
- r: run align
- c: capture checkpoint
- t: toggle route logging on or off
- q: quit and save
Recording flow:
- Start the script with --record-route.
- Manually drive the rover with e, d, s, and f.
- At each checkpoint, stop and press c.
- Enter the checkpoint name when prompted, for example cp1.
- Optionally enter orient_right_ticks for that checkpoint, or leave it blank.
- Repeat for all checkpoints, then press q to save the route.
What is saved:
- route YAML at --route-out, default /home/ws/EI_VAD/data/route_teach_latest.yaml
- dense route-signature sidecar JSON
- route motions, events with cp_capture, and lidar_samples
Useful note:
- teach defaults are --speed 0.08 and --turn-speed 0.08; matching replay speed to teach speed helps preserve route geometry
Use the current mission runner:
python3 /home/ws/EI_VAD/cp_mission/run_cp_mission.py --config /home/ws/EI_VAD/cp_mission/mission.yaml
For direct route replay without full mission orchestration:
python3 source/replay_taught_route.py --route /home/ws/EI_VAD/data/route_teach_latest.yaml
During autonomous execution:
- encoder motion replay moves rover toward checkpoint
- LiDAR signature verifies checkpoint identity
- AprilTag detection corrects final orientation
- RGB inspection image captured
- Edge Impulse inference executed
- anomaly score stored
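The per-checkpoint sequence above can be sketched as a small orchestration function. This is not the code in run_cp_mission.py; every name here is hypothetical, each step is passed in as a plain callable, and the behaviour on a failed verification or a missing tag is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CheckpointResult:
    name: str
    verified: bool
    aligned: bool = False
    score: Optional[float] = None
    notes: list = field(default_factory=list)

def run_checkpoint(name, replay, verify, align, capture, infer):
    """Run one checkpoint; each step is a callable supplied by the caller."""
    replay(name)                                  # encoder replay toward the checkpoint
    result = CheckpointResult(name=name, verified=verify(name))
    if not result.verified:                       # assumed policy: skip the inspection
        result.notes.append("LiDAR signature mismatch, inspection skipped")
        return result
    result.aligned = align(name)                  # AprilTag orientation correction
    if not result.aligned:                        # assumed policy: continue anyway
        result.notes.append("AprilTag not found, using replayed pose")
    image = capture(name)
    result.score = infer(name, image)
    return result

# Stub steps standing in for the real rover, LiDAR, camera, and model.
steps = dict(
    replay=lambda name: None,
    verify=lambda name: True,
    align=lambda name: True,
    capture=lambda name: b"image-bytes",
    infer=lambda name, image: 0.12,
)
for checkpoint in ["cp1", "cp2", "cp3"]:
    print(run_checkpoint(checkpoint, **steps))
```

Keeping each step behind a callable makes it straightforward to test the sequence on a desk with stubs before running it on the rover.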
Live view from inside VS Code:
Once you have collected some images, in this case from the OAK camera, it's time to build the model in Edge Impulse. I recommend developing in iterations: start with a hundred images or so, train, and test the model. Most likely you will need to collect more images, and in some cases you might even decide to start over if you made mistakes or the model performs poorly or not at all.
I've found a resolution of 160x160 to work well on the RPi5.
After the impulse is saved, select Image from the left-hand menu, then Save parameters, and finally Generate features.
Set up the model as shown below. I found EfficientNet V2B0 to work best for this project. Then click Save & train.
As mentioned, in this project the USB-camera is used to check if the stove is on or off. So this is a typical classification project with only three classes: on, off, and background (no_lights).
Here too I settled on a resolution of 160 x 160, which I found to be the sweet spot on an RPi 5.
Note: As the stove lights are close to the left side of the panel, they were in many cases cropped out when using the default Fit shortest axis resize mode. By changing to Squash this issue got resolved.
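A quick calculation shows why the resize mode matters here. The sketch below assumes a 640 x 360 source frame (the USB camera's actual frame size is not stated, so this is only an example) and assumes that Fit shortest axis scales the image until the shorter side fits and then centre-crops the longer side, while Squash rescales both axes independently. The function names are made up for the illustration.

```python
def visible_region_fit_shortest(src_w, src_h, dst_w, dst_h):
    """Source-pixel box (left, top, right, bottom) that survives scale + centre crop."""
    scale = max(dst_w / src_w, dst_h / src_h)  # image must cover the target on both axes
    keep_w = dst_w / scale
    keep_h = dst_h / scale
    left = (src_w - keep_w) / 2
    top = (src_h - keep_h) / 2
    return left, top, left + keep_w, top + keep_h

def visible_region_squash(src_w, src_h, dst_w, dst_h):
    """Squash keeps the whole frame and distorts the aspect ratio instead."""
    return 0.0, 0.0, float(src_w), float(src_h)

print(visible_region_fit_shortest(640, 360, 160, 160))
print(visible_region_squash(640, 360, 160, 160))
```

Under these assumptions, the centre crop discards about 140 source pixels on each side of a 640-pixel-wide frame, so anything near the left edge, such as the stove lights, never reaches the model. Squash keeps it at the cost of a distorted aspect ratio.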
After the impulse is saved, select Image from the left-hand menu, then Save parameters, and finally Generate features.
As I did not get sufficient accuracy with a MobileNetV2 model, I again tested an EfficientNet model and got very good results even with the smallest B0 variant, and perfect results with B2. As the RPi 5 is not a particularly constrained device, why not use its full capacity?
To be able to use the models on the rover, you of course need to deploy them.
- Navigate to the Deployment menu
- The two VAD projects were deployed as EIM binary files
- If you want, you can test inferencing on your smartphone by scanning the QR code. However, as the camera optics differ from those of the rover cameras, don't be surprised if the results are not identical.
- The classification model was deployed as a TensorFlow Lite file via the Dashboard, using the full float32 file.
- Regardless of the model used, you'll end up with one file per model. Copy these to the RPi.
- Check the mission.yaml file for how the models are connected to each checkpoint. As an example, the extract below shows how the stove on/off classification model is connected to checkpoint 2.
...
model_path: /home/ws/EI_VAD/models/cp2.lite
tflite_python: /home/ws/ei_tflite/bin/python3
tflite_norm: none
precrop: stove_panel
crop_config: /home/ws/EI_VAD/config/stove_indicator_example.json
crop_mode: panel
expand_left: 0.32
expand_right: 0.12
expand_top: 0.32
expand_bottom: 0.18
...
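The expand_* values in the extract suggest that the detected panel box is enlarged before it is cropped. The sketch below shows one plausible reading, in which each value is a fraction of the box's own width or height. That interpretation, the function name, and the example box are assumptions; check the project source for the actual semantics.

```python
def expand_box(box, frame_w, frame_h, left=0.0, right=0.0, top=0.0, bottom=0.0):
    """Grow (x0, y0, x1, y1) by fractions of its own size, clamped to the frame."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    x0 = max(0, round(x0 - left * w))
    x1 = min(frame_w, round(x1 + right * w))
    y0 = max(0, round(y0 - top * h))
    y1 = min(frame_h, round(y1 + bottom * h))
    return x0, y0, x1, y1

# Hypothetical panel box in a 640 x 360 frame, expanded with the values from the extract.
print(expand_box((200, 100, 400, 200), 640, 360,
                 left=0.32, right=0.12, top=0.32, bottom=0.18))
```

Asymmetric margins like these make sense when the detail of interest, here the indicator lights, sits off to one side of the matched panel.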
Use the mission runner:
python3 /home/ws/EI_VAD/cp_mission/run_cp_mission.py --config /home/ws/EI_VAD/cp_mission/mission.yaml
The mission flow can publish preview updates to the same viewer pattern:
- output/mission_preview/latest.jpg
- output/mission_preview/latest.json
- output/mission_preview/view_latest.html
Open view_latest.html during a run if you want to monitor the latest mission image, metadata, and overlays.
Expected behaviour:
- CP1 reached
- signature verified
- AprilTag aligned
- image captured
- anomaly or classification score reported
As examples from a real mission run, the model detected anomalies when the doors were ajar. The text Anomaly = True is superimposed on the live pictures, and below the picture you'll also see the anomaly score.
As with the VAD checkpoints, the result of the classification model is also clearly shown. In this case the stove was indeed on, as can be seen from the raw image.
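For illustration, an inference result could be turned into a short overlay text along these lines. The dictionary keys follow my understanding of what the Edge Impulse Linux Python SDK returns for visual anomaly and classification models, so treat them as an assumption. The threshold value and the function name are hypothetical, and since the classification model in this project runs as a TensorFlow Lite file, its real output format may look different.

```python
def summarize_result(result, anomaly_threshold=5.0):
    """Turn an inference result dict into a short overlay string."""
    body = result.get("result", {})
    if "visual_anomaly_max" in body:
        score = body["visual_anomaly_max"]
        return f"Anomaly = {score > anomaly_threshold} (score {score:.2f})"
    if "classification" in body:
        # Pick the label with the highest confidence.
        label, confidence = max(body["classification"].items(), key=lambda kv: kv[1])
        return f"{label} ({confidence:.2f})"
    return "no result"

# Made-up example values, not measurements from the mission.
print(summarize_result({"result": {"visual_anomaly_max": 7.25}}))
print(summarize_result({"result": {"classification": {"on": 0.9, "off": 0.06, "background": 0.04}}}))
```

The threshold needs tuning per checkpoint, since what counts as a high anomaly score depends on the scene and the training data.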
To give some food for thought, this system can be adapted for:
- facility inspection robots
- warehouse monitoring checkpoints
- home safety patrol robots
- industrial inspection routines
- energy infrastructure condition monitoring
You might wonder why I didn't use the ROS navigation framework. That was actually the initial plan, and I spent endless hours trying to get it to work consistently. In the end I was fighting both software and hardware, the latter in the sense that I suspect the Sony Murata 18650 Li-Ion batteries do not always deliver enough current, even though they are rated at 30 A. The rover has many subsystems, with servos, motors, LiDAR, cameras, an ESP32, and a Raspberry Pi 5, so there are occasional large current peaks. Random issues occurred too frequently, leading to crashes, reboots, and restarted Docker containers.
Eventually I decided to try developing a much simpler navigation method in order to improve reproducibility and reduce system complexity. However, this approach also introduces some limitations compared to full robotic navigation frameworks.
Advantages of this approach:
- significantly simpler system setup
- fewer dependencies
- deterministic checkpoint behaviour
- well suited for fixed-route inspection tasks
- easier reproducibility for teaching and experimentation
Limitations compared to ROS navigation frameworks:
- requires a teach phase before operation
- does not support dynamic path planning
- limited obstacle avoidance compared to SLAM-based navigation
- route flexibility is lower
- environment changes may require re-teaching checkpoints
- not intended for general-purpose autonomous navigation
When ROS navigation may be preferable:
- large environments
- changing environments
- multi-room navigation
- map-based localization requirements
- dynamic obstacle-rich scenarios
- multi-robot coordination systems
For repeatable checkpoint inspection missions, however, the lightweight teach-and-replay approach provides a practical and robust alternative with much lower infrastructure requirements.
This project shows that a small rover can do useful inspection work without a full navigation stack. By combining teach-and-replay driving, LiDAR checkpoint verification, AprilTag alignment, camera capture, and local Edge Impulse inference, the rover can move between fixed checkpoints, stop in a repeatable pose, and inspect what it sees in a consistent way.
In the proof-of-concept mission, that approach worked well in practice. The door checkpoints could be checked with visual anomaly detection, and the stove checkpoint could be handled with a simple classification model. The result is a system that is easier to understand, easier to debug, and easier to adapt than a much heavier robotics setup, while still being capable enough for real checkpoint-based inspection tasks.
The main benefit is not that this replaces general-purpose robot navigation. It is that for fixed routes and clearly defined inspection points, a simpler system can often be the better tool. It keeps the hardware and software demands reasonable, makes iteration faster, and lowers the barrier to building something that is genuinely useful. For that kind of work, this approach is a solid place to start.


















