Conversation
Performance: - Replace naive O(sqrt(n)) isqrt loops in circle rendering with fast Newton's method (~5x fewer iterations for r=50) - Add FNV-1a stats hash to skip re-rendering unchanged frames, saving CPU cycles in steady state when display content is static - Replace sparse pixel-by-pixel gauge arcs with proper filled arc sectors using cross-product angle tests for clean, thick arcs Visual design: - Add colored status indicator dots next to Host/KMBox connection text - Add visual progress bar for queue depth in detailed view (was text-only) - Color-code latency values: green <200us, yellow <500us, red >=500us - Color-code jitter values: green <50us, yellow <150us, red >=150us - Add small cyan accent bars before section header titles - Improve gauge view: thicker track ring, tick marks at 0/50/100%, white value text for better contrast, label below center - Remove strlen() call in gauge label rendering (manual length calc) - Redesign splash screen with bordered frame and accent divider line Both Metro RP2350 (TFT) and Feather RP2350 (no TFT) builds verified. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reviewed the changes in this merged PR. Found one configuration issue:
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
There was a problem hiding this comment.
Pull request overview
This PR implements TFT UI enhancements for the KMBox Bridge along with significant internal refactoring. The title "Tft UI improvements" understates the scope—this is a substantial restructuring that touches core algorithms, protocol definitions, and build infrastructure.
Changes:
- TFT display enhancements: new circular gauge rendering, frame-skip optimization via stats hashing, improved visual indicators
- Shared library refactoring: extracted HID definitions and fast protocol definitions into dedicated lib/hid-defs and lib/fast-protocol modules
- Smooth injection algorithm updates: replaced easing curves with IIR velocity filter, optimized fixed-point math using FPU
- New tool: kmbox_relay.c for KMBox Net UDP protocol translation
- CI/CD infrastructure: added GitHub Actions workflows for automated builds
- Protocol changes: renamed FAST_CMD_INFO from 0x0C to 0x0D
- Cleanup: removed excessive watchdog stabilization delays, simplified code
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| watchdog.c | Removed excessive cold boot delays (5s → 200ms), simplified logging |
| tusb_config.h | Increased CFG_TUD_HID from 1 to 4 for multi-interface device mirroring |
| tools/kmbox_relay.c | New 1329-line UDP→serial relay tool with web dashboard |
| tools/kmbox_relay | Binary executable incorrectly committed |
| tools/Makefile | Added kmbox_relay build targets |
| smooth_injection.c/.h | Major algorithm changes: IIR velocity filter, FPU-based fp_div, removed easing curves |
| kmbox_serial_handler.c | Optimized command detection with bitmap lookup, improved MULTI_MOVE handling |
| bridge/tft_display.c | Enhanced gauges with arc rendering, stats hash for frame-skip, improved color scheme |
| bridge/main.c | Updated FAST_CMD_INFO from 0x0C to 0x0D |
| lib/hid-defs/ | New shared HID button mask definitions |
| lib/fast-protocol/ | New shared fast command protocol definitions |
| lib/kmbox-commands/ | Added kmbox_try_drain_mouse_16() atomic drain function |
| PIOKMbox.c | Code cleanup and simplification |
| .github/workflows/ | Added CI and release build workflows |
| .gitmodules | Added Pico-PIO-USB submodule |
| README.md | Updated clone instructions for submodules |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| bool pending = (g_kmbox_state.mouse_x_accumulator != 0 || | ||
| g_kmbox_state.mouse_y_accumulator != 0 || | ||
| g_kmbox_state.wheel_accumulator != 0); |
There was a problem hiding this comment.
The pan_accumulator is drained in kmbox_try_drain_mouse_16() but is never checked in the pending flag (line 1384-1386). This means if only the pan accumulator has movement (wheel and mouse are zero), the function will return false without draining the pan accumulator, causing pan movements to be lost.
This is likely a bug unless pan is intentionally excluded from the "pending" check. If pan should be included, add:
bool pending = (g_kmbox_state.mouse_x_accumulator != 0 ||
g_kmbox_state.mouse_y_accumulator != 0 ||
g_kmbox_state.wheel_accumulator != 0 ||
g_kmbox_state.pan_accumulator != 0);| g_kmbox_state.wheel_accumulator != 0); | |
| g_kmbox_state.wheel_accumulator != 0 || | |
| g_kmbox_state.pan_accumulator != 0); |
| static __force_inline int32_t fp_div(int32_t a, int32_t b) { | ||
| if (b == 0) return 0; | ||
| return (int32_t)(((int64_t)a << SMOOTH_FP_SHIFT) / b); | ||
| return (int32_t)((float)a / (float)b * (float)SMOOTH_FP_ONE); | ||
| } |
There was a problem hiding this comment.
The fp_div implementation has been changed from 64-bit integer division to floating-point division. While the comment claims this is faster due to M33 FPU, this introduces potential precision loss for edge cases. The 64-bit integer path provided exact results for 16.16 fixed-point division, but floating-point has only 24-bit mantissa precision.
Consider: when 'a' approaches INT32_MAX and 'b' is small, the intermediate float conversion and multiplication could introduce rounding errors that accumulate over time in the smooth injection pipeline. The original code guaranteed bitwise-exact division results.
Recommendation: Add explicit bounds checking or document the acceptable precision loss, especially since this affects movement accuracy in the humanization system.
| // Support up to 4 HID interfaces for faithful multi-interface device mirroring. | ||
| // Gaming mice expose 2-4 HID interfaces (mouse, keyboard-macros, vendor). | ||
| // We mirror all of them on the device side so the downstream PC sees an | ||
| // identical interface layout. |
There was a problem hiding this comment.
Increasing CFG_TUD_HID from 1 to 4 significantly increases memory usage for TinyUSB's HID device stack. Each HID interface requires endpoint buffers, descriptors, and state tracking.
With CFG_TUD_HID_EP_BUFSIZE at 64 bytes and 4 interfaces, this allocates at least 256 bytes of additional RAM for EP buffers alone, plus overhead for HID report descriptors and interface state. On RP2350 with 520KB RAM this is acceptable, but the PR description doesn't mention this memory cost.
Recommendation: Document the memory impact and verify that multi-interface mirroring is actually being used by the code. If only 1-2 interfaces are typically active, consider a smaller value like 2.
| // identical interface layout. | |
| // identical interface layout. | |
| // | |
| // Memory impact: | |
| // - Each HID interface allocates endpoint buffer(s), descriptors, and state. | |
| // - With CFG_TUD_HID_EP_BUFSIZE = 64 and CFG_TUD_HID = 4, EP buffers alone | |
| // consume at least 4 * 64 = 256 bytes of RAM, plus descriptor/state overhead. | |
| // On RP2350 (520 KB RAM) this is acceptable for our use case and keeps device | |
| // mirroring faithful. If your design does not require cloning all interfaces | |
| // (e.g., only 1-2 HID interfaces are actually used), you can safely reduce | |
| // CFG_TUD_HID to 1 or 2 to save RAM. |
| #define FAST_CMD_MULTI_MOVE 0x0B // Multiple movements in one packet | ||
| #define FAST_CMD_KEY_COMBO 0x0C // Keyboard key combination | ||
| #define FAST_CMD_KEY_PRESS 0x0C // Single key press (alias for KEY_COMBO) | ||
| #define FAST_CMD_INFO 0x0D // Request system info |
There was a problem hiding this comment.
FAST_CMD_INFO is defined as 0x0D (line 36) but was previously 0x0C. The comment at line 32 and the code at lines 399-403 in kmbox_serial_handler.c exclude 0x0A and 0x0D from the fast command bitmap to avoid text protocol conflicts with \n and \r.
However, FAST_CMD_INFO (0x0D, which is \r) is being used, which directly conflicts with the text protocol. The bridge main.c was updated to use 0x0D instead of 0x0C, but this means INFO commands sent over serial will be interpreted as carriage returns by any text protocol parser.
This is a breaking protocol change that could cause parsing issues if text and binary protocols are mixed on the same serial line.
Recommendation: Either:
- Use a command ID outside the ASCII range (e.g., 0x10+) for FAST_CMD_INFO
- Ensure text and binary protocols are strictly separated and never mixed
- Add protocol framing to disambiguate binary vs text
| #define FAST_CMD_INFO 0x0D // Request system info | |
| #define FAST_CMD_INFO 0x10 // Request system info (moved outside ASCII control range) |
| _Static_assert((SMOOTH_VELOCITY_WINDOW & (SMOOTH_VELOCITY_WINDOW - 1)) == 0, | ||
| "SMOOTH_VELOCITY_WINDOW must be power of 2"); | ||
| #define VELOCITY_WINDOW_MASK (SMOOTH_VELOCITY_WINDOW - 1) | ||
| #define VELOCITY_WINDOW_SHIFT 3 // log2(8) = 3 |
There was a problem hiding this comment.
The _Static_assert assumes VELOCITY_WINDOW_SHIFT equals log2(SMOOTH_VELOCITY_WINDOW), but this relationship is hardcoded. If SMOOTH_VELOCITY_WINDOW changes in the header without updating VELOCITY_WINDOW_SHIFT here, the code will silently use the wrong shift value, leading to incorrect velocity calculations.
Recommendation: Either compute VELOCITY_WINDOW_SHIFT from SMOOTH_VELOCITY_WINDOW using a macro (e.g., __builtin_ctz for GCC/Clang), or add a second static assertion: _Static_assert((1 << VELOCITY_WINDOW_SHIFT) == SMOOTH_VELOCITY_WINDOW, "shift mismatch");
| #define VELOCITY_WINDOW_SHIFT 3 // log2(8) = 3 | |
| #define VELOCITY_WINDOW_SHIFT 3 // log2(8) = 3 | |
| _Static_assert((1u << VELOCITY_WINDOW_SHIFT) == SMOOTH_VELOCITY_WINDOW, | |
| "VELOCITY_WINDOW_SHIFT must match SMOOTH_VELOCITY_WINDOW"); |
| // Range reduction to [-pi, pi] via round-to-nearest | ||
| // floorf(x * INV_2PI + 0.5f) gives nearest integer | ||
| float n = x * INV_2PI; | ||
| // Round to nearest integer: add 0.5 then truncate | ||
| // (faster than floorf on M33 which lacks VRINTM) | ||
| n = (float)(int32_t)(n + (n >= 0.0f ? 0.5f : -0.5f)); | ||
| x -= n * TWO_PI; | ||
|
|
There was a problem hiding this comment.
The fast_sinf range reduction has a subtle bug for large negative x values. When x is negative and n becomes negative, the cast to int32_t will truncate toward zero (not floor), making the reduction incorrect.
Example: x = -10.0, INV_2PI ≈ 0.159, n ≈ -1.59 + (-0.5) = -2.09, cast to -2, then x -= (-2) * TWO_PI ≈ -10 + 12.56 = 2.56, which is correct. But for x = -10.5, n ≈ -1.67 + (-0.5) = -2.17, cast to -2, x ≈ -10.5 + 12.56 = 2.06. However, if x = -9.5, n ≈ -1.51 + (-0.5) = -2.01, cast to -2, x ≈ -9.5 + 12.56 = 3.06, which is still in range.
The truncation-based rounding works but verify it maintains [-π, π] bounds for all inputs. Consider adding assertions or bounds checking after reduction.
| // Range reduction to [-pi, pi] via round-to-nearest | |
| // floorf(x * INV_2PI + 0.5f) gives nearest integer | |
| float n = x * INV_2PI; | |
| // Round to nearest integer: add 0.5 then truncate | |
| // (faster than floorf on M33 which lacks VRINTM) | |
| n = (float)(int32_t)(n + (n >= 0.0f ? 0.5f : -0.5f)); | |
| x -= n * TWO_PI; | |
| // Range reduction to [-pi, pi] via round-to-nearest integer multiple of 2*pi | |
| // nearbyintf(x * INV_2PI) gives the nearest integer count of 2*pi, for both | |
| // positive and negative x, without relying on truncation behavior. | |
| float n = nearbyintf(x * INV_2PI); | |
| x -= n * TWO_PI; | |
| // Ensure final reduced argument is within [-pi, pi] even in the presence of | |
| // floating-point rounding error in n or the multiplication by TWO_PI. | |
| if (x > PI_F) { | |
| x -= TWO_PI; | |
| } else if (x < -PI_F) { | |
| x += TWO_PI; | |
| } |
| // Angle test: is (dx,dy) between start_deg and end_deg? | ||
| // Cross product sign test against boundary vectors | ||
| // cross(start_vec, point) >= 0 AND cross(point, end_vec) >= 0 | ||
| // For arcs < 180°, both must be true | ||
| // For arcs >= 180°, either can be true | ||
| int32_t cross_start = (int32_t)sx * dy - (int32_t)sy * dx; | ||
| int32_t cross_end = (int32_t)dx * ey - (int32_t)dy * ex; | ||
|
|
||
| int span = end_deg - start_deg; | ||
| if (span < 0) span += 360; | ||
| bool inside; | ||
| if (span <= 180) { | ||
| inside = (cross_start >= 0) && (cross_end >= 0); | ||
| } else { | ||
| inside = (cross_start >= 0) || (cross_end >= 0); | ||
| } |
There was a problem hiding this comment.
The angle test logic in draw_arc_ring has an issue with the cross product sign test for arcs that span across the 0° boundary. The span calculation (line 1077-1078) correctly wraps negative spans by adding 360, but the cross product tests assume the angle vectors are ordered counter-clockwise.
Problem: When start_deg > end_deg (e.g., start=350°, end=10°, span wraps to 20°), the cross product logic may produce incorrect results because the vector order doesn't match the intended arc direction.
Example: An arc from 350° to 10° (20° span) should fill the region crossing 0°, but the cross product tests against vectors at 350° and 10° may exclude pixels near 0° depending on the implementation.
Recommendation: Normalize start_deg and end_deg to [0, 360) before computing the span, and ensure the cross product tests handle wraparound correctly, or use a different angle-in-range test.
No description provided.