WLED Crashes and Reboots #3609

Closed
1 task done
Doyle4 opened this issue Dec 22, 2023 · 28 comments
Labels
bug · confirmed (The bug is reproducible and confirmed) · fixed in source (This issue is unsolved in the latest release but fixed in master)

Comments

@Doyle4

Doyle4 commented Dec 22, 2023

What happened?

Changing colours can cause WLED to crash and reboot (Lost connection to device warning).

To Reproduce Bug

Crashed when using the Bouncing Balls effect and selecting a new colour.
Crashed when Aura was selected and I tried to increase the brightness with the slider in the effect settings.
Meteor Smooth crashed when I changed the colour.

Expected Behavior

No crash or reboot when changing effect colours.

Install Method

Self-Compiled

What version of WLED?

0.14.1-b1

Which microcontroller/board are you seeing the problem on?

ESP8266

Relevant log/trace output

No response

Anything else?

Downgraded to the 0.14.0 release and everything works as expected.
The form says I self-compiled, which I didn't, unless that also covers downloading the file from here and installing it. The file wasn't edited in any way, nor did I install it via the web installer.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Doyle4 Doyle4 added the bug label Dec 22, 2023
@blazoncek blazoncek added the cannot reproduce (Developers are not able to reproduce. Might be fixed already, or report is missing important details) label Dec 22, 2023
@blazoncek
Collaborator

Sorry, cannot reproduce.
Please use a debug build and post the crash dump.

@softhack007
Collaborator

softhack007 commented Dec 22, 2023

🤔 The common part of this problem description is performing UI actions (changing color, changing global brightness) on the 8266. Depending on how the UI sliders were used ("dragging" or "tapping"), this could create a load of WS messages.

We have a known issue on the 8266 when many UI events are received in a short time and memory is low. Maybe these problems are related: #3443, #3458, #3382, #3492
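
To illustrate the point about dragged sliders: each small movement can produce its own message, yet only the newest value matters. A minimal C++ sketch of coalescing such updates follows. `UpdateCoalescer` and its methods are hypothetical names for illustration only; this is not how WLED or its web server handles messages.

```cpp
#include <cstdint>

// Hypothetical helper (not part of WLED): remembers only the newest value
// received and releases at most one value per `intervalMs`.
class UpdateCoalescer {
public:
  explicit UpdateCoalescer(uint32_t intervalMs) : intervalMs_(intervalMs) {}

  // Record an incoming UI value; an older unprocessed value is overwritten.
  void push(uint8_t value) {
    pending_ = value;
    hasPending_ = true;
  }

  // Call from the main loop with the current time in milliseconds.
  // Returns true and writes `out` when a value is due, false otherwise.
  bool poll(uint32_t nowMs, uint8_t &out) {
    if (!hasPending_) return false;
    // Unsigned subtraction keeps working when the millisecond counter wraps.
    if (sentOnce_ && static_cast<uint32_t>(nowMs - lastMs_) < intervalMs_) return false;
    lastMs_ = nowMs;
    sentOnce_ = true;
    hasPending_ = false;
    out = pending_;
    return true;
  }

private:
  uint32_t intervalMs_;
  uint32_t lastMs_ = 0;
  bool sentOnce_ = false;
  bool hasPending_ = false;
  uint8_t pending_ = 0;
};
```

With this shape, a burst of slider events costs one stored byte instead of one queued buffer per event, which is the kind of saving that matters when memory is low.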

@Doyle4
Author

Doyle4 commented Dec 22, 2023

I just find it strange that downgrading fixes it, while updating to the latest beta release brings the same issue back.
I will try the debug build, see if it happens, and post any logs.

@ihavenonick

I can confirm the problem.
If I want to create a preset for candle multi, for example, and then change the colors or brightness, the D1 mini (ESP8266) loses the connection. If that still works, you can't start the preset with the quick load button.

This problem only occurs in 0.14.1-b1; it is not a problem with 0.14.0.

@blazoncek
Collaborator

Please check the available heap prior to the crash and report it.
It would also be helpful if you posted your configuration (cfg.json) and presets (presets.json).
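
One way to capture the heap figure just before a crash is to sample free heap regularly and remember the lowest value seen. The following is a minimal sketch under assumptions: `HeapWatermark` is a hypothetical helper, and the reading is injected as a function so the code also runs off-device. On the Arduino core the reader could wrap `ESP.getFreeHeap()`.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: tracks the lowest free-heap reading seen so far.
class HeapWatermark {
public:
  // `reader` returns the current free heap in bytes. It is injected here
  // (assumption) so the sketch does not depend on any board support package.
  explicit HeapWatermark(std::function<uint32_t()> reader)
      : reader_(std::move(reader)) {}

  // Take one sample; returns true when a new minimum was recorded,
  // which is a convenient moment to print the value to the serial log.
  bool sample() {
    last_ = reader_();
    if (last_ < lowest_) {
      lowest_ = last_;
      return true;
    }
    return false;
  }

  uint32_t last() const { return last_; }
  uint32_t lowest() const { return lowest_; }

private:
  std::function<uint32_t()> reader_;
  uint32_t last_ = 0;
  uint32_t lowest_ = UINT32_MAX;
};
```

Logging only on a new minimum keeps the serial output short while still leaving the last low-water mark visible above the crash dump.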

@ihavenonick

@blazoncek
Collaborator

I was able to reproduce it, though only sporadically.
It looks like temporary RAM depletion.

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (3):
epc1=0x40102ac5 epc2=0x00000000 epc3=0x40103428 excvaddr=0x4002b9b1 depc=0x00000000

LoadStoreError: Processor internal physical address or data error during load or store
  epc1=0x40102ac5 in umm_malloc_core at umm_malloc.cpp:?
  epc3=0x40103428 in _notifyPWM at core_esp8266_waveform_pwm.cpp:?

>>>stack>>>

ctx: cont
sp: 3ffffd30 end: 3fffffd0 offset: 0150
3ffffe80:  31353101 00003933 3ffffe76 3ffe8758  
3ffffe90:  3fff3000 402552f0 00000020 40102d4c  
3ffffea0:  3fff3000 3fff228c 00000010 40266c93  
3ffffeb0:  3ffe9d39 3fff2e60 3fff353c 402552f0  
3ffffec0:  4026357c 3ffe9d37 3fff353c 3ffe8758  
3ffffed0:  00000763 00000005 3fff353c 40264b74  
3ffffee0:  3fff3000 3fff2e60 3fff353c 4024d215  
3ffffef0:  3fff41ec 00000763 fe46a8c0 3fff26ec  
3fffff00:  3fff045c 3fff2c6c 3fff045c 4023efe1  
3fffff10:  00003b58 00000000 153f7ced 00000763  
3fffff20:  3fff0300 402704d8 8a46a8c0 3fff3750  
3fffff30:  3fff3000 00000001 3fff2ffc 402379c8  
3fffff40:  3fffdad0 00000000 3fff3724 40237a0e  
3fffff50:  3fff0448 00000000 3fff2f64 3fff3750  
3fffff60:  3fffdad0 00000000 3fff3724 40247f74  
3fffff70:  402704d8 8a46a8c0 3fff353c 40264b74  
3fffff80:  00000000 0000000e 0009b538 3fff0300  
3fffff90:  00000000 0011001f 4026e0b0 3fff3750  
3fffffa0:  3fffdad0 00000000 3fff3724 3fff3750  
3fffffb0:  3fffdad0 00000000 3fff3724 402670a0  
3fffffc0:  feefeffe feefeffe 3fffdab0 40101f01  
<<<stack<<<

0x402552f0 in AsyncWebSocket::makeBuffer(unsigned int) at ??:?
0x40102d4c in malloc at ??:?
0x40266c93 in operator new(unsigned int) at ??:?
0x402552f0 in AsyncWebSocket::makeBuffer(unsigned int) at ??:?
0x4026357c in HardwareSerial::write(unsigned char const*, unsigned int) at ??:?
0x40264b74 in Print::println(unsigned int, int) at ??:?
0x4024d215 in sendDataWs(AsyncWebSocketClient*) at ??:?
0x4023efe1 in NetworkClass::isConnected() at ??:?
0x402704d8 in StreamNull::~StreamNull() at ??:?
0x402379c8 in updateInterfaces(unsigned char) at ??:?
0x40237a0e in handleTransitions() at ??:?
0x40247f74 in WLED::loop() at ??:?
0x402704d8 in StreamNull::~StreamNull() at ??:?
0x40264b74 in Print::println(unsigned int, int) at ??:?
0x4026e0b0 in std::_Function_handler<void (ota_error_t), WLED::setup()::{lambda(ota_error_t)#2}>::_M_manager(std::_Any_data&, std::_Function_handler<void (ota_error_t), WLED::setup()::{lambda(ota_error_t)#2}> const&, std::_Manager_operation) at wled.cpp:?
0x402670a0 in loop_wrapper() at core_esp8266_main.cpp:?
0x40101f01 in cont_wrapper at ??:?


--------------- CUT HERE FOR EXCEPTION DECODER ---------------

If you do not need websockets and/or MQTT, please compile the ESP8266 version without them to free some RAM.
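
As a rough illustration of what "compiling without" a feature means: optional features are typically gated on preprocessor macros passed as `-D` build flags. The sketch below assumes the macro names `WLED_DISABLE_WEBSOCKETS` and `WLED_DISABLE_MQTT`; please verify them against the WLED sources before relying on them. `compiledFeatures()` is a hypothetical helper, not a WLED function.

```cpp
// Assumption: the build gates these features on WLED_DISABLE_WEBSOCKETS and
// WLED_DISABLE_MQTT. The names should be checked against the WLED sources.
// They would normally reach the compiler as -D flags in the build config.

#ifdef WLED_DISABLE_WEBSOCKETS
constexpr bool kHasWebSockets = false;
#else
constexpr bool kHasWebSockets = true;
#endif

#ifdef WLED_DISABLE_MQTT
constexpr bool kHasMqtt = false;
#else
constexpr bool kHasMqtt = true;
#endif

// Hypothetical helper: reports which optional features this build includes.
inline const char *compiledFeatures() {
  if (kHasWebSockets && kHasMqtt) return "websockets+mqtt";
  if (kHasWebSockets) return "websockets";
  if (kHasMqtt) return "mqtt";
  return "none";
}
```

Built without either flag, both constants are `true`; defining a flag flips the matching constant at compile time.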

@blazoncek blazoncek removed the cannot reproduce (Developers are not able to reproduce. Might be fixed already, or report is missing important details) label Dec 23, 2023
@Doyle4
Author

Doyle4 commented Dec 24, 2023

Forgot to mention that I'm also using a D1 Mini, like the person above who managed to reproduce it.
Will compile the latest beta without websockets etc.

Thanks to all looking into this.

@jcPOLO

jcPOLO commented Dec 24, 2023

It happened to me with an ESP32 too. I will try to give more information in a few days.

@fribse

fribse commented Dec 25, 2023

I'm seeing this as mentioned in #3613, this is on a brand new d1 mini esp32, mounted in a DigUno with a 5V 6A PSU.
I rebuilt the config, so that's brand new, but my presets were backed up and restored from the previous esp8266.
I've attached my presets here:
wled_presets_Christmas tree.json

@blazoncek
Collaborator

If you want a speedy resolution, get a debug build and post a crash dump similar to the one above.

blazoncek added a commit that referenced this issue Dec 25, 2023
@fribse

fribse commented Dec 25, 2023

Hi @blazoncek
I'm in no hurry, just trying to help the little I can. I don't have time to debug this properly as well, too many projects already. I just tried a factory reset, then added Glitter for the 150 LEDs, and as soon as I slowed it way down it crashed.
I look forward to the b2 of the firmware...

@willmmiles
Contributor

I ran into an issue just like this yesterday and tracked it back to the Segment backup copy in deserializeSegment incorrectly freeing the original Segment's FX data, resulting in a use-after-free that corrupted the heap, i.e. exactly the issue @blazoncek just posted a fix for with 5ebc345. I was going to send a PR today but it looks like it's already been taken care of. I can confirm that the patch fixed it for me.
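
For readers unfamiliar with this class of bug, here is a minimal sketch of the general pattern: when an object owns a heap buffer and a temporary backup copy shares the same pointer, destroying the backup frees memory the original still uses. `OwningBuffer` is a hypothetical stand-in, not WLED's Segment class, and the deep copy shown is one common remedy, not necessarily what commit 5ebc345 does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical stand-in for an object that owns a heap buffer.
// With a memberwise (compiler-generated) copy, original and backup would
// share `data`, and the backup's destructor would free the original's buffer.
struct OwningBuffer {
  uint8_t *data = nullptr;
  std::size_t len = 0;

  explicit OwningBuffer(std::size_t n)
      : data(static_cast<uint8_t *>(std::calloc(n, 1))), len(data ? n : 0) {}

  // Deep copy: the backup gets its own allocation, so each object frees
  // only what it owns.
  OwningBuffer(const OwningBuffer &o) {
    if (o.data && o.len) {
      data = static_cast<uint8_t *>(std::malloc(o.len));
      if (data) {
        std::memcpy(data, o.data, o.len);
        len = o.len;
      }
    }
  }

  // Copy-and-swap assignment; the temporary releases the old buffer.
  OwningBuffer &operator=(const OwningBuffer &o) {
    if (this != &o) {
      OwningBuffer tmp(o);
      std::swap(data, tmp.data);
      std::swap(len, tmp.len);
    }
    return *this;
  }

  ~OwningBuffer() { std::free(data); }
};
```

A use-after-free of this kind often shows up far from its cause, for example as a fault inside the allocator on a later, unrelated `malloc`, which is consistent with the kind of trace posted earlier in this thread.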

@blazoncek
Collaborator

@willmmiles you seem capable. Care to help?

@willmmiles
Contributor

Sure, what do you need? As far as I can tell, the patch you've written does the trick for fixing the heap crashes following UI config updates.

@zigomatichub

On d1mini, 50led ws2801, power on.
Effect is candle multi via a pre-configured preset
Change color to another color
Then crashing.
That may help to reproduce.

@blazoncek
Collaborator

Sure, what do you need?

Nothing in particular but we'd need people that understand the code. There are plenty of TODOs in the code.
Contact me on Discord if you have time to spare for WLED.

@Ucsus

Ucsus commented Dec 26, 2023

Same problem.

  • Nodemcu (esp8266)
  • WLED 14.1-b1
  • Animated Staircase usermod
  • WS2811
  • 154 LEDs
  • PIR sensor
  • Ultrasound PIR sensor

When switching manually to some effects, it reboots.
In addition, there is a problem with segments: I need 11 segments, but if I set up more than 10, the error “connection failed” appears and the segments and presets are reset. I also don’t see the device sensors in Home Assistant via MQTT.

@blazoncek
Collaborator

@zigomatichub @Ucsus please read above. Fix has been committed.

@blazoncek blazoncek added the confirmed (The bug is reproducible and confirmed) and fixed in source (This issue is unsolved in the latest release but fixed in master) labels Dec 26, 2023
zanhecht pushed a commit to zanhecht/WLED that referenced this issue Dec 27, 2023
@orichienal

hi
got same problem with esp8266 d1 mini and Candle Multi with red and orange.
Where can i find a debug bin to check

thx

@softhack007 softhack007 added this to the 0.14.1 candidate milestone Dec 28, 2023
@Doyle4
Author

Doyle4 commented Dec 29, 2023

hi got same problem with esp8266 d1 mini and Candle Multi with red and orange. Where can i find a debug bin to check

thx

This has been fixed in the latest master source, so a debug build is not needed. The debug firmware is the same, it just logs data so the issue can be found more easily. You need to download the latest source and compile the firmware yourself. If you can't build your own, downgrade to the 0.14.0 release until 0.14.1 is released. :)

@fribse

fribse commented Dec 29, 2023

Where can i find a debug bin to check

Get the B2, it looks like it's fixed

softhack007 pushed a commit to MoonModules/WLED that referenced this issue Dec 29, 2023
@orichienal

Got the B2 installed and let it run the whole night with Candle Multi, and it's still "burning", nice.
Great job and thanks for the work.

@orichienal

I think the joy was premature and too great, but unfortunately I still have the problem, now it takes longer until it occurs, but after a certain time it restarts and then lights up in standard orange.
Can anyone else confirm this?

@willmmiles
Contributor

I think the joy was premature and too great, but unfortunately I still have the problem, now it takes longer until it occurs, but after a certain time it restarts and then lights up in standard orange. Can anyone else confirm this?

I'm also observing occasional reboots on my ESP8266 setup. I think it's a different issue, though. I've been trying to pin it down for about a week -- it's definitely not related to the FX or transition logic, nor is it a heap exhaustion issue (I've enabled the allocator instrumentation to be sure) -- though applying heap pressure does seem to make it more likely to occur, which leads me to believe we might be looking for another use-after-free somewhere in the network layers.
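
To make the reasoning concrete: instrumenting the allocator lets you tell exhaustion apart from corruption. If no allocation request ever fails before a crash, running out of heap is an unlikely cause. Below is a minimal sketch of such counters. `TrackedHeap` is a hypothetical wrapper with a simulated capacity; it is not the ESP8266 core's allocator instrumentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper around malloc/free with a simulated capacity.
// It counts requests and failures and tracks peak usage.
class TrackedHeap {
public:
  explicit TrackedHeap(std::size_t capacity) : capacity_(capacity) {}

  // Returns nullptr (and counts a failure) when the request does not fit.
  void *alloc(std::size_t n) {
    ++requests_;
    if (n > capacity_ - inUse_) {
      ++failures_;
      return nullptr;
    }
    void *p = std::malloc(n);
    if (!p) {
      ++failures_;
      return nullptr;
    }
    inUse_ += n;
    if (inUse_ > peak_) peak_ = inUse_;
    return p;
  }

  // The caller passes back the size it originally requested.
  void release(void *p, std::size_t n) {
    if (!p) return;
    assert(n <= inUse_);
    std::free(p);
    inUse_ -= n;
  }

  std::size_t requests() const { return requests_; }
  std::size_t failures() const { return failures_; }
  std::size_t inUse() const { return inUse_; }
  std::size_t peak() const { return peak_; }

private:
  std::size_t capacity_;
  std::size_t inUse_ = 0;
  std::size_t peak_ = 0;
  std::size_t requests_ = 0;
  std::size_t failures_ = 0;
};
```

Lowering the capacity is also a simple way to apply heap pressure on purpose, which can make a latent memory bug surface sooner.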

@blazoncek
Collaborator

Thanks @willmmiles for troubleshooting.

ATM I am afraid all of the bells and whistles we added to 0.14 (paired with the newer ESP core needed for 0.14) may be a bit too much for the poor ESP8266. If possible, switch to ESP32 or use WLED 0.13.3 for the time being.

@willmmiles
Contributor

I'm beginning to suspect it's an internal bug in the newer ESP core. Still investigating; these "hard wdt" crashes don't have much to say on the console, can take a long time to reproduce, and I'm still learning how to elicit more useful debugging information.

@blazoncek
Collaborator

@willmmiles We can see heap corruption (#3641) on some ESP8266 boards, which happens somewhere in the TCP code. I'd be glad if your expertise could help.
