Show ADC recovery instead of stuck zero#47
Conversation
2a86733 to
5a9b5e5
Compare
tadelv
left a comment
There was a problem hiding this comment.
Thanks for the work here. A few changes requested before merge.
Context note: per our internal HDS investigation, the field-reported stuck-zero is root-caused to a cold solder joint on U21 (raw pinned at 0xFFFFFF, DRDY clocking normally). In that case scale.update() keeps returning true and getSignalTimeoutFlag() never trips, and raw_weight is large (not zero), so neither of the two recovery paths added here will engage. The PR is still useful for genuine ADC sample-loss, but the PR body overstates coverage — please reword to match what the code actually catches.
Requested changes
1. t_tareByButton reuse for the quick-zero hold timeout (src/hds.ino:1110)
if (b_weight_quick_zero && millis() - t_tareByButton > QUICK_ZERO_HOLD_TIMEOUT) { ... }t_tareByButton is set only on the physical button path. If b_weight_quick_zero is ever set true by another path (BLE/USB tare, boot, programmatic tare) without also stamping t_tareByButton, the timestamp is stale (or 0) and the timeout fires immediately on the next loop — releasing the zero hold before it ever takes effect.
Please introduce a dedicated t_quickZeroStart and stamp it everywhere b_weight_quick_zero is set true, then compare against that instead.
2. Sentinel value -999.9 leaking onto BLE / ESP-NOW / USB / WebSocket
ADC_ERROR_WEIGHT = -999.9 is now published unconditionally over every wire protocol (ble.h:556, ble.h:672, espnow.h:15, espnow.h:63, usbcomm.h:588, hds.ino:1414). Receivers (Decent app, partner integrations, websocket clients) will render a literal -999.9 g unless they're updated in lockstep, and encodeWeight() may clamp/wrap the value into something even more confusing.
Please use a non-sentinel approach: either a separate "ADC error" status flag/byte in the packet, or suppress weight publishes entirely while b_adc_recovery_active is true and let the existing connection-loss handling on the receiver side take over. Whichever you choose, coordinate with the app team before merge.
3. Globals defined in parameter.h (include/parameter.h:160-161)
bool b_adc_recovery_active = false;
uint8_t i_adc_recovery_count = 0;Non-static definitions in a header. Works only because we currently have a single translation unit; it's an ODR landmine if the header is ever included from a .cpp. Please mark both static to match the surrounding pattern (e.g. b_stable_output_enabled).
4. i_adc_recovery_count only resets on a successful sample
Counter clears in the scale.update() branch only. Once it reaches ADC_ERROR_RECOVERY_COUNT, the OLED latches to ADC ERROR / CHECK SCALE and stays there across sleep/wake, USB re-init, and BLE reconnects until a new sample lands. Please reset the counter (and b_adc_recovery_active) on the same code paths that reinitialize the ADC stack — at minimum the charging-wake recovery path and any explicit scale.begin()/reset entry points.
5. Whitespace-only reformatting
Large regions of pureScale, loop, updateOled, and drawTime are reindented with no behavioral change. Please revert the whitespace-only churn so the diff (and future git blame) reflects only the real changes.
6. (char*) cast in drawAdcRecovery (src/hds.ino)
u8g2.drawStr(AC((char*)line1), AM() - 12, line1);The cast discards const to satisfy the AC() macro signature. Please change AC() to take const char* instead — the cast hides the real fix and the macro almost certainly doesn't mutate its argument.
Optional follow-up
While we're here: the cheapest improvement for the documented field bug is surfacing the raw ADC value in the Doctor report (test #5 in the HDS NWD note). Costs zero new firmware code and lets Doctor distinguish 0xFFFFFF (cold joint / open input) from 0x000000 (shorted) from a true mid-scale hang. Worth a separate PR.
5a9b5e5 to
fea4e16
Compare
…allers pass const char*
Summary
This PR hardens firmware-side ADC sample-loss and display-zero recovery paths. It does not address the separate U21 cold-solder failure mode where DRDY continues clocking and raw readings are pinned at
0xFFFFFF.Changes:
ADC RECOVER, thenADC ERROR/CHECK SCALEafter repeated recovery attemptsWhy
The existing firmware could make genuine ADC sample-loss and unresolved tare/filter states look like a legitimate
0.0greading. This PR makes those firmware-detectable states visible locally and avoids publishing misleading zero-weight updates remotely.Per review feedback, the field-reported stuck-zero case tied to a U21 cold solder joint is a different hardware failure mode. In that case the ADC can continue producing ready samples, so
scale.update()may keep succeeding and this PR's sample-loss recovery path will not trigger.Validation
platformio run