14.1 can't connect to router on 14.1 on ESP8266 Wemos D1 mini #3690

Open
1 task done
jclsn opened this issue Jan 15, 2024 · 80 comments
Labels
connectivity Issue regarding protocols, WiFi connection or availability of interfaces

Comments

@jclsn

jclsn commented Jan 15, 2024

What happened?

After the upgrade to 14.1, WLED wasn't accessible anymore. Erasing the flash and flashing 14.0 solved the issue. Another upgrade to 14.1 broke WiFi again. If you flash 14.1 manually, you can connect to the WLED-AP, but after you configure the router connection, things break.

To Reproduce Bug

Install 14.1 and connect to an existing network

Expected Behavior

WLED should connect

Install Method

Binary from WLED.me

What version of WLED?

14.1

Which microcontroller/board are you seeing the problem on?

ESP8266

Relevant log/trace output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jclsn jclsn added the bug label Jan 15, 2024
@blazoncek
Collaborator

I would suspect WiFi issues (channel overlap, interference, etc).
The new Arduino core used in 0.14+ for ESP8266 is more susceptible to WiFi issues.
Try scanning for least used channel and force your AP to use that channel.
Some settings like BSS transition and/or fast roaming are known to cause issues as well.
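To make "scan for the least used channel" concrete, here is a minimal C++ sketch that scores the three non-overlapping 2.4 GHz channels (1, 6 and 11) against a list of observed networks and picks the quietest one. The `ScanResult` struct, the -70 dBm "strong signal" threshold and the 2x weighting are illustrative assumptions, not anything WLED or a particular router does. On a device the input could come from a scan API such as the ESP Arduino core's `WiFi.scanNetworks()`, but any Wi-Fi analyzer app gives the same information.

```cpp
#include <array>
#include <cstdlib>
#include <vector>

// One observed network from a Wi-Fi scan (hypothetical struct for illustration).
struct ScanResult {
    int channel;  // 2.4 GHz channel number, 1-13
    int rssi;     // signal strength in dBm (e.g. -40 strong, -90 weak)
};

// Rule of thumb for 20/22 MHz-wide 2.4 GHz channels: centres are 5 MHz apart,
// so two channels interfere unless they are at least 5 channel numbers apart.
bool channelsOverlap(int a, int b) {
    return std::abs(a - b) < 5;
}

// Return whichever of the non-overlapping channels 1, 6, 11 is least crowded.
// "Crowding" is a simple count of overlapping networks, with strong networks
// (RSSI above -70 dBm) weighted twice as heavily as weak ones.
// Ties keep the lower channel number.
int leastUsedChannel(const std::vector<ScanResult>& scan) {
    const std::array<int, 3> candidates{1, 6, 11};
    int best = candidates[0];
    int bestScore = -1;
    for (int cand : candidates) {
        int score = 0;
        for (const ScanResult& net : scan) {
            if (channelsOverlap(cand, net.channel)) {
                score += (net.rssi > -70) ? 2 : 1;
            }
        }
        if (bestScore < 0 || score < bestScore) {
            best = cand;
            bestScore = score;
        }
    }
    return best;
}
```

With strong networks on channels 1 and 3, a weak one on 6, and one strong plus one weak on 11, channels 6 and 11 tie and the sketch returns 6. The chosen channel would then be fixed in the AP's settings rather than left on "auto".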

@raufis27

Same issue with NodeMCU. 14.0 works fine. The upgrade breaks WiFi connectivity (using a UniFi network).

@blazoncek
Collaborator

I will describe the issue I had with one of my ESP8266 devices, which was sitting 30 cm from an access point and was unreachable over WiFi. There was another ESP32-S2 with similar issues, though that one was far from any AP (>10 m, outdoors).

With 0.13.x the ESP was reachable but occasionally dropped in and out of WiFi. That was observable in the UniFi controller (I use Ubiquiti UniFi with several APs and switches). The drops were rare and didn't last long.

After I updated it to 0.14 (sometime in July 2023), I immediately noticed that the device was visible and connected to the network, but I was unable to connect to the UI; it would just stall while loading. I went ahead and purchased another AP (UAP-AC-M) and installed it about 30 cm from the device, as it was in an awkward spot where the WiFi signal was poor. It didn't help.
I have several other WLED devices, one of them being the sync master. Whenever the sync master sent a notification packet, the problematic device picked it up immediately, and all timed presets were triggering normally. That was a clear indication that the device itself was receiving the WiFi signal (including NTP responses), and when the UI did load I could see the signal strength at almost 100%.
As it was an outdoor device (which participated in the Christmas display successfully), it wasn't until last week that I finally delved into the problem and solved it, not by changing WLED code but by reconfiguring WiFi.

It turned out that I had to make the APs' channel allocation permanent and spread the channels as far apart as possible (the two APs on the same channel had to be physically furthest from each other). I also disabled BSS transitions and fast roaming, though these two didn't seem to have any real effect in my set-up.

After all my APs were set to fixed channels (1, 6 and 11) and APs using the same channel were separated as far as possible, all connectivity problems went away. Immediately.
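The channel plan described above can be checked mechanically. The sketch below is a hypothetical helper (the `AccessPoint` struct and the x/y coordinate layout are assumptions for illustration): it confirms that every AP sits on channel 1, 6 or 11 and reports the shortest distance between any two APs sharing a channel, which is the number you want to make as large as possible.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical description of one AP in a site plan.
struct AccessPoint {
    std::string name;
    double x, y;   // position in metres
    int channel;   // fixed 2.4 GHz channel
};

// True if every AP uses one of the three non-overlapping channels.
bool usesOnlyNonOverlapping(const std::vector<AccessPoint>& aps) {
    for (const auto& ap : aps) {
        if (ap.channel != 1 && ap.channel != 6 && ap.channel != 11) return false;
    }
    return true;
}

// Smallest distance between two APs on the same channel (larger is better).
// Returns +infinity when no two APs share a channel.
double minCoChannelDistance(const std::vector<AccessPoint>& aps) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < aps.size(); ++i) {
        for (std::size_t j = i + 1; j < aps.size(); ++j) {
            if (aps[i].channel != aps[j].channel) continue;
            double d = std::hypot(aps[i].x - aps[j].x, aps[i].y - aps[j].y);
            if (d < best) best = d;
        }
    }
    return best;
}
```

For example, with three APs on 1, 6 and 11 in a row 10 m apart and a fourth on channel 1 at the far end, `minCoChannelDistance` returns the 30 m gap between the two channel-1 APs.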

@sl1txdvd

I have the same or a similar problem and it is definitely not the same as the 14.0 Wifi issues.

With 14.0 I had connection problems that I thought came from a new device (which I had not used with WLED < 14), but maybe it's the new Arduino core then. Anyway I changed my Wifi settings to less crowded channels and it helped to some extent.

However, immediately after the update to 14.1 the device (now hanging on the wall controlling an LED strip, 50cm to the AP, working flawlessly so far) became unresponsive. The UI would load slowly or not at all. I needed to change something in LED settings and it was impossible to get there.

In the browser I was able to directly open the update section after a reboot and install 14.0. It took pretty long but eventually worked. After reboot everything worked again. So my problem is definitely something in 14.1.

My device is NodeMCU v2 (ESP8266)

@blazoncek
Collaborator

For a list of changes and possible causes, see the changelog and explanations in #3685.

As far as I can see, there were no changes to networking (except an option to force G mode on ESP8266). And the change of Arduino core happened somewhere between 0.14.0-b1 and 0.14.0-b2, not in 0.14.1.

#3526, #3502, #3496, #3484, #3487, #3445, #3466, #3296, #3382, #3312, #3593, #3490, #3573, #3517, #3561, #3555, #3541, #3536, #3515, #3522, #3533, #3508, #3622, #3613, #3609, #3632, #3566, #3665, #3672

@sl1txdvd what you are describing was exactly my case. Solved by WiFi reconfiguration. Unfortunately I do not know if we have any option to change WiFi operation from within WLED as we use standard approach which hasn't changed in a few years.

@sl1txdvd

As I said, that solved the problems introduced by 14.0, which works for me. But it does not solve the problems introduced by 14.1. As you say, the Arduino core change was in 14.0. If the core were the cause, 14.0 would show the same problem under the exact same circumstances, yet it doesn't. It's a different issue.

@sl1txdvd

FWIW I just tried:

  • 14.0: working fine
  • 14.0 to 14.1-b2: fast flash, fast reboot
  • 14.1-b2: generally working, two timeouts
  • 14.1-b2 to 14.1-b3: slow flash; one attempt failed, the other was "successful" according to the UI
  • 14.1-b3: device does not work at all, had to flash via USB

Back to 14.0, everything's fine.

@blazoncek
Collaborator

Unfortunately I have no resolution for you but to stay on 0.14 or earlier.
FYI 0.15 builds on top of 0.14.1.

@willmmiles
Contributor

@sl1txdvd Would you mind sending me a copy of your configuration, from the 'backup configuration' feature? I haven't been able to replicate your issue here on my test setup. I'd like to try with your specific settings.

@DibblesNL

Another one with the same issue. I also tried a reflash, but it still failed. Four other devices running 14.1 are OK, but one doesn't like it.

@sl1txdvd

@willmmiles I've attached the config here (I changed the SSID of my WiFi).
I run a strip with 60 WS2812B LEDs on a 2.4 A driver, if that's relevant.

cfg.json
presets.json

@willmmiles
Contributor

Thanks, I think that's helped pin it down, I've been able to reproduce the OTA issue here.

@blazoncek
Collaborator

@sl1txdvd does anything change if you use DHCP?

@CjonesLAB

0.14.1 is buggy
No Wireless

@Depechie

Depechie commented Jan 20, 2024

As of 0.14.1 the ESP's with WLED are no longer able to connect to my UniFi AP's.
They are configured to broadcast on both 2.4 GHz and 5 GHz.

Reverting to 0.13.3 fixes the issue

@ssirkakriss

ssirkakriss commented Jan 20, 2024

Could this be a problem with UniFi APs in general? Or perhaps with having multiple UniFi APs?

I ask because I'm seeing the same thing with versions >13.1 on all my ESP8266s, and I also have UniFi APs (2 older UAP-AC).
Edit: The APs are roughly 14 meters apart, on channels 6 and 11 with the same SSID.

I did manage one good build with 14.0 for an ESP8266, but I cannot reproduce that build now for some reason.

@LordMike

LordMike commented Jan 20, 2024

Running UniFi too. I don’t recall any issues on 0.14.0, but I’ve just reverted to it from 0.14.1 because of this thread, so I’ll take note from here on.

In my situation the ESP32 just disappears from my network after approximately 10 hours. It does not create its own network even after I changed it to create it on network disconnect and not just on boot w/o network.

There are certainly a lot of UniFi networks in this thread. I have 4 APs in my network. Two of them are definitely in range.

Bonus: I have three WLED devices, which should all be ESP32s. Only one is experiencing issues on 0.14.1 (it is now downgraded to 0.14.0). Scratch that: I just checked my UniFi logs, and they do not indicate that the other two 0.14.1 devices are free of problems. The difference is that they reconnect shortly after, while my problem device doesn't.

UniFi logs: in this view, wled-light-1-6 is my now-downgraded device. wled-light-50-2 and wled-light-5-3 are "fine" on 0.14.1, in that they reconnect. None of the devices seem to have had any issues prior to January 14th, when 0.14.1 was released. IIRC, I updated on the 15th.

@LordMike

LordMike commented Jan 20, 2024

Dumping more info because I have a working/not working situation, so maybe something is useful.

The network

  • 4x Unifi AP's
  • SSID is hidden
  • SSID runs 2.4 GHz only
  • WPA2 protection
  • Logs seem to go back ~14 days, so back to January 6th

wled-light-1-6

wled-light-50-2

  • Device: Quinled Dig Octa, ESP32.
  • Version: 0.14.1
  • Uptime: 1d 8h (did maintenance here at this time)
  • Issue: Seems to disconnect every once in a while but reconnect shortly after
  • UniFi logs: reconnects 1-20 minutes after disconnect
  • Config: wled_cfg_light-50-2.json

wled-light-5-3

  • Device: Athom WLED ESP32 Music Addressable LED Strip Controller, should be ESP32
  • Version: 0.14.1
  • Uptime: 5d 9h
  • Issue: Seems to disconnect every once in a while but reconnect shortly after
  • UniFi logs: this disconnect seems warranted, as AP Terrace is very far from this device.
  • Config: wled_cfg_light-5-3.json - I just discovered that this device is using a different SSID, which is a mistake. I'll rectify this now.

@crestan

crestan commented Jan 20, 2024

This issue is not isolated to UniFi, I am experiencing the same and I am running eero (6 Pro x2).

No issues with ESP32; however, all ESP8266s required rolling back to 0.14.0.

@moskovskiy82

Same here with a MikroTik network.

@damocles-dev

damocles-dev commented Jan 24, 2024

Same bug here. WLED 14.1 is unusable on a hw-622-based relay board with the official binary:

version 14.1:

root@ap-01:~# ping 192.168.1.170
PING 192.168.1.170 (192.168.1.170): 56 data bytes
64 bytes from 192.168.1.170: seq=0 ttl=255 time=191.526 ms
64 bytes from 192.168.1.170: seq=3 ttl=255 time=28.830 ms
64 bytes from 192.168.1.170: seq=4 ttl=255 time=100.870 ms
64 bytes from 192.168.1.170: seq=5 ttl=255 time=155.209 ms
64 bytes from 192.168.1.170: seq=6 ttl=255 time=205.415 ms
64 bytes from 192.168.1.170: seq=7 ttl=255 time=44.909 ms
64 bytes from 192.168.1.170: seq=9 ttl=255 time=456.499 ms
64 bytes from 192.168.1.170: seq=11 ttl=255 time=96.057 ms
64 bytes from 192.168.1.170: seq=12 ttl=255 time=169.681 ms
64 bytes from 192.168.1.170: seq=13 ttl=255 time=222.849 ms
64 bytes from 192.168.1.170: seq=14 ttl=255 time=275.832 ms
64 bytes from 192.168.1.170: seq=17 ttl=255 time=80.973 ms
64 bytes from 192.168.1.170: seq=18 ttl=255 time=133.357 ms
64 bytes from 192.168.1.170: seq=19 ttl=255 time=186.415 ms
^C
--- 192.168.1.170 ping statistics ---
28 packets transmitted, 14 packets received, 50% packet loss
round-trip min/avg/max = 28.830/167.744/456.499 ms

version 14.0:

root@ap-01:~# ping 192.168.1.170
PING 192.168.1.170 (192.168.1.170): 56 data bytes
64 bytes from 192.168.1.170: seq=0 ttl=255 time=5.305 ms
64 bytes from 192.168.1.170: seq=1 ttl=255 time=3.813 ms
64 bytes from 192.168.1.170: seq=2 ttl=255 time=3.317 ms
64 bytes from 192.168.1.170: seq=3 ttl=255 time=4.153 ms
64 bytes from 192.168.1.170: seq=4 ttl=255 time=2.788 ms
64 bytes from 192.168.1.170: seq=5 ttl=255 time=10.441 ms
64 bytes from 192.168.1.170: seq=6 ttl=255 time=4.197 ms
64 bytes from 192.168.1.170: seq=7 ttl=255 time=14.618 ms
64 bytes from 192.168.1.170: seq=8 ttl=255 time=2.988 ms
64 bytes from 192.168.1.170: seq=9 ttl=255 time=22.101 ms
64 bytes from 192.168.1.170: seq=10 ttl=255 time=11.045 ms
64 bytes from 192.168.1.170: seq=11 ttl=255 time=5.834 ms
64 bytes from 192.168.1.170: seq=12 ttl=255 time=5.880 ms
64 bytes from 192.168.1.170: seq=13 ttl=255 time=3.900 ms
64 bytes from 192.168.1.170: seq=14 ttl=255 time=90.145 ms
64 bytes from 192.168.1.170: seq=15 ttl=255 time=4.071 ms
64 bytes from 192.168.1.170: seq=16 ttl=255 time=6.595 ms
64 bytes from 192.168.1.170: seq=17 ttl=255 time=9.376 ms
^C
--- 192.168.1.170 ping statistics ---
18 packets transmitted, 18 packets received, 0% packet loss
round-trip min/avg/max = 2.788/11.698/90.145 ms
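The summary lines printed by `ping` can be reproduced from the individual replies, which makes the two runs easier to compare. This is a small illustrative C++ helper, not part of WLED: loss is derived from transmitted versus received counts, and the RTT statistics cover received replies only. Fed the replies above, it gives the same figures: 50% loss and a 167.744 ms average on 14.1, versus 0% loss and an 11.698 ms average on 14.0.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct PingSummary {
    double lossPercent = 0.0;
    double minMs = 0.0, avgMs = 0.0, maxMs = 0.0;
};

// Loss comes from transmitted vs received counts; min/avg/max are computed
// over the round-trip times of the replies that actually arrived.
PingSummary summarize(int transmitted, const std::vector<double>& rttsMs) {
    PingSummary s;
    const int received = static_cast<int>(rttsMs.size());
    if (transmitted > 0) {
        s.lossPercent = 100.0 * (transmitted - received) / transmitted;
    }
    if (received > 0) {
        s.minMs = *std::min_element(rttsMs.begin(), rttsMs.end());
        s.maxMs = *std::max_element(rttsMs.begin(), rttsMs.end());
        s.avgMs = std::accumulate(rttsMs.begin(), rttsMs.end(), 0.0) / received;
    }
    return s;
}
```

Both loss and latency matter here: on 14.1 every second packet vanished, and even the replies that arrived were more than ten times slower on average.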

@LordMike

Chiming in to say I've lost contact with my device that is now on 0.14.0. So for those reporting 0.14.0 to also be an issue, I'm on board there. I still can't say why my two 0.14.1 devices aren't showing the same inability to stay online. :/

@RenWal

RenWal commented Jan 24, 2024

I've just observed this issue with a D1 Mini NodeMCU based on the ESP8266-12F. Fine on 0.14.0, unusable on 0.14.1. The device connects to my Unifi U6-Pro, but I can get close to no data through. It works intermittently for a few seconds, then nothing at all, this then repeats. It appears that turning off Fast BSS Transition helps a little, but it does not fully resolve the issue.

Interestingly, I have another non-Mini NodeMCU which initially showed the same issue on a Unifi U6-Mesh, but that cleared up after power cycling the device and the access point. I still see frequent reconnects, about every 30 minutes, but it's usable.

@roninniagara

Chiming in. All my WLED's were on 13.3 - no issues.

Updated all of them to 14.1 today

They now ALL fail, 100%. All offline. An unplug/replug will have them connect for a few minutes, but then they all drop off again shortly.

Reflashing these will be a nightmare where some are located. :\

@LordMike

LordMike commented Jan 25, 2024

Reflashing these will be a nightmare where some are located. :\
@roninniagara

If you've enabled the self-hosted AP, you could disable your own WiFi temporarily while you reboot them, and then flash them via a phone/laptop. I dread this situation :O

@blazoncek
Collaborator

Please read the whole thread but pay attention to this post:
esp8266/Arduino#8950 (comment)

@roninniagara

roninniagara commented Jan 26, 2024

Please read the whole thread but pay attention to this post: esp8266/Arduino#8950 (comment)

Good post, but 2 friends of mine also updated using different boards and had to roll back too (we popped 14.1 on them to test).

Same result.

And I'm sure lots of "set it and forget it" boards would have the same issue but won't be updated at all.

The ones I'm using have been stable for a VERY long time, with zero drops. They also have clip-on antennas to improve signal.

Multiple people in here with various boards all having the exact same issue. I don't buy the "it's weak antenna" reasoning when those same boards worked fine for a very long time, then failed after update, to only work again perfectly once rolled back.

edit: taking a peek at other open issues, it seems like MANY others are having the same issues with 14.1, not just this thread.

@doronazl

Please read the whole thread but pay attention to this post: esp8266/Arduino#8950 (comment)

Confused: the entire thread is full of people telling you they are having problems solely with 0.14.1, yet you keep pointing to WiFi issues?

@blazoncek
Collaborator

There were no changes to networking in 0.14.1, but we did switch the ESP8266 platform for 0.14.0, as required by the newer NeoPixelBus (which needs a newer C++ compiler).
As mentioned in the thread above, older cores (platforms) allowed faulty hardware to perform adequately (don't ask me why, as IDK), while it may have issues with the newer core.

@doronazl there were no changes to wifi implementation in 0.14.1 compared to 0.14.0. Not a single one.

@LordMike

LordMike commented Feb 4, 2024

@willmmiles my bisect ended at a commit of yours, I believe: cdc8640. It adds some locking for webserver replies, and at this commit, I can make the ESP32 "stop". Just prior to this commit, it processes commands "just fine".

I've also noted that prior to this commit, the device seemed more responsive. My automations set the colors on 4 segments, and previously all four segments updated at the same time. But lately, and on this commit (judging from Peek, because it currently isn't physically connected to lights), I can see each segment updating in turn.

I'll see if I can revert this commit on main, and see if my issue disappears.

I've pushed my changes here. LordMike@5eb1fe3. Interested parties can try out the builds. My device seems stable now, so I'll put it back where it came from and see if it can stay online for weeks.

@willmmiles
Contributor

Interesting! I introduced that approach because the global JSON buffer was getting reused while it was still serializing a reply, resulting in a use-after-free and crashes when the string data in the global buffer was overwritten. It should (correctly!) have the side effect of preventing additional updates from being processed until each reply is fully serialized, which could definitely slow things down. It would also introduce a risk of OOM events if many requests are sent simultaneously.
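The hazard described here is a classic shared-buffer race: a second request starts writing into the global JSON buffer while the first reply is still being serialized from it. Below is a minimal sketch of the guard pattern, assuming a single shared buffer and a non-blocking "try to acquire" policy. The names `SharedJsonBuffer` and `BufferLease` are hypothetical, and this is not WLED's actual code.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for the large shared JSON document.
struct SharedJsonBuffer {
    std::atomic<bool> inUse{false};
    std::string data;
};

// RAII lease: owns the buffer only if the flag was free when constructed,
// and releases it automatically when the lease goes out of scope.
class BufferLease {
public:
    explicit BufferLease(SharedJsonBuffer& buf)
        : buf_(buf), owned_(!buf.inUse.exchange(true)) {}
    ~BufferLease() { if (owned_) buf_.inUse.store(false); }
    BufferLease(const BufferLease&) = delete;
    BufferLease& operator=(const BufferLease&) = delete;

    bool ok() const { return owned_; }
    // Callers that failed to acquire get nullptr and must not touch the data.
    std::string* get() { return owned_ ? &buf_.data : nullptr; }

private:
    SharedJsonBuffer& buf_;
    bool owned_;
};
```

A handler that finds `ok()` false must not write to the buffer at all; it has to fall back to an error reply or a smaller private buffer. Because only one lease can hold the buffer at a time, replies that need it are effectively serialized, which is the slowdown mentioned above.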

@willmmiles
Contributor

I take it that the queuing approach from the willmmiles-webserver did not help?

@LordMike

LordMike commented Feb 4, 2024 via email

@blazoncek
Collaborator

I was able to reproduce a disconnect, so I compiled WLED with one of the DEBUG levels (4)

Use -D WLED_DEBUG as well to get any meaningful output from WLED too.
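Why the extra flag matters: in firmware like this, application debug logging is usually compiled out entirely unless a preprocessor symbol is defined, so presumably raising only the core's debug level shows core messages but nothing from WLED's own code. The sketch below illustrates that general pattern in plain C++; the macro name `SKETCH_DEBUGF` is hypothetical and this is not WLED's actual debug header. With PlatformIO, a flag such as `-D WLED_DEBUG` would typically be added to the environment's `build_flags`.

```cpp
#include <cstdio>

// Generic compile-time debug switch (illustrative, not WLED's own macros).
// Build with -D WLED_DEBUG to turn the logging on.
#ifdef WLED_DEBUG
  #define SKETCH_DEBUGF(...) std::printf(__VA_ARGS__)
  constexpr bool kDebugBuild = true;
#else
  // Without the flag the macro expands to a no-op, so nothing is printed.
  #define SKETCH_DEBUGF(...) ((void)0)
  constexpr bool kDebugBuild = false;
#endif

// Hypothetical call site: only produces output in a -D WLED_DEBUG build.
inline void logDisconnect(int reason) {
    SKETCH_DEBUGF("WiFi disconnected, reason %d\n", reason);
    (void)reason;  // keeps release builds free of unused-parameter warnings
}
```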

@willmmiles
Contributor

@willmmiles my bisect ended at a commit of yours, I believe: cdc8640. It adds some locking for webserver replies, and at this commit, I can make the ESP32 "stop". Just prior to this commit, it processes commands "just fine".

I've reinspected the code, and spotted a lock violation here - thanks for taking the time to isolate it. It's also identified a weakness in the queuing approach I'd been testing in that other branch. I'm working on the locking fix now.

willmmiles added a commit to willmmiles/WLED that referenced this issue Feb 4, 2024
GlobalBufferAsyncJsonResponse was still trying to init AsyncJsonResponse
with the shared document, even when it failed to acquire the lock.
This was still corrupting memory and causing crashes.

Possibly fixes Aircoookie#3690, Aircoookie#3685, and other 0.14.1 issues.
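The bug described in this commit message is easy to reproduce in miniature: a derived response class tries to take a lock but unconditionally hands the shared document to its base-class constructor, so a failed acquisition still ends up writing into memory another reply owns. Here is a hedged C++ sketch of the corrected shape, with hypothetical class names standing in for the real ones: the lock decision is made first, through a delegating constructor, and the base receives a null document when the lock was not won.

```cpp
#include <atomic>
#include <string>

struct JsonDoc { std::string body; };  // hypothetical document type

// One large document shared by all responses, plus a flag marking it busy.
std::atomic<bool> sharedDocBusy{false};
JsonDoc sharedDoc;

// Hypothetical base class: it serializes whatever document it is handed.
class JsonResponse {
public:
    explicit JsonResponse(JsonDoc* doc) : doc_(doc) {}
    bool valid() const { return doc_ != nullptr; }
protected:
    JsonDoc* doc_;
};

// The buggy shape passed &sharedDoc to the base unconditionally. Here the lock
// is attempted first, and the base only sees the shared document if it was won.
class SharedDocResponse : public JsonResponse {
public:
    SharedDocResponse() : SharedDocResponse(!sharedDocBusy.exchange(true)) {}
    ~SharedDocResponse() { if (owned_) sharedDocBusy.store(false); }
    SharedDocResponse(const SharedDocResponse&) = delete;
    SharedDocResponse& operator=(const SharedDocResponse&) = delete;

private:
    explicit SharedDocResponse(bool owned)
        : JsonResponse(owned ? &sharedDoc : nullptr), owned_(owned) {}
    bool owned_;
};
```

The delegating constructor is the key design choice: base classes are initialized before any derived-class code runs, so the lock result has to be computed in the initializer list rather than in the constructor body.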
@willmmiles
Contributor

PR #3643 fixes the issue I found by inspection. This seems to improve stability in my local tests, albeit the responsiveness is still problematic as the lock is held for long enough to collide with other requests.

@LordMike

LordMike commented Feb 4, 2024 via email

@blazoncek
Collaborator

Let me remind you that we are still talking about MCU and not a fully fledged web server handling 1000s requests/s. 😉
If you hit it hard, you'll bring it down.

@willmmiles
Contributor

Is the lock only for the response content buffer?

Yes, and only in the case that the response wants to use the large shared JSON buffer. Static content and some smaller dynamic data don't use the lock.

Aren’t requests handled really fast? It seems to me that the request processing may be slower than it needs to be. If each request only modifies state, and doesn't also wait for e.g. the lights to change their colors, it should be done in milliseconds? In any case.

Honestly, I'd expect the same, but the web server library code seems to be really inefficient. I'm looking at what can be done to improve it or replace it outright, there's a lot that could be done.

With the lock, the current big issue is that requests that contend are responded with a 'come back in 1s' http reply, which really slows things down. I'm hoping I can change that with the new queuing logic.
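The two strategies contrasted here, rejecting contending requests versus queuing them, can be sketched side by side. This is illustrative C++ only: the request/reply types are hypothetical, and expressing "come back in 1s" as HTTP 503 with a `Retry-After: 1` header is an assumption about the mechanism, not a description of WLED's exact reply.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical request/reply types for illustration only.
struct Request { int id; };
struct Reply { int status; std::string retryAfter; };

// Strategy 1 (reject): if the shared buffer is busy, tell the client to retry.
Reply handleOrReject(bool bufferBusy) {
    if (bufferBusy) return {503, "1"};  // assumed "come back in 1s" encoding
    return {200, ""};
}

// Strategy 2 (queue): park contending requests in a small bounded queue and
// serve them when the buffer frees up; only reject once the queue is full.
class BoundedRequestQueue {
public:
    explicit BoundedRequestQueue(std::size_t capacity) : capacity_(capacity) {}
    bool push(const Request& r) {
        if (pending_.size() >= capacity_) return false;  // caller falls back to 503
        pending_.push_back(r);
        return true;
    }
    bool pop(Request& out) {
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop_front();
        return true;
    }
    std::size_t size() const { return pending_.size(); }

private:
    std::size_t capacity_;
    std::deque<Request> pending_;
};
```

The trade-off is that a queue holds memory for every parked request, so on an ESP8266 its capacity has to stay very small; past that point the device is back to rejecting.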

I experienced a different bug but slightly related. If I made enough concurrent requests, the device stopped responding, even on ”good” versions before this commit. I haven’t reproduced that one, as I imagine it’s likely “always” been there, so I’d just be chasing ghosts instead.

Yup, that selfsame web server code currently has no resource limits, so more than a couple concurrent requests will OOM an 8266 and cause crashes, hangs, or reboots. Working on that too!
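A resource limit of the kind alluded to here usually means admission control: refuse new work before it can exhaust memory. The sketch below is hypothetical (the `RequestGate` class and both thresholds are made up for illustration, not taken from WLED or its web server library); on an ESP8266 the free-heap figure could come from `ESP.getFreeHeap()`.

```cpp
#include <cstddef>

// Illustrative limits only; real values would need measuring on the device.
struct AdmissionLimits {
    int maxInFlight = 2;                 // concurrent requests allowed
    std::size_t minFreeHeap = 8 * 1024;  // bytes that must remain free
};

// Admits a request only if both the concurrency and memory limits allow it.
class RequestGate {
public:
    explicit RequestGate(AdmissionLimits lim) : lim_(lim) {}

    bool tryAdmit(std::size_t freeHeap) {
        if (inFlight_ >= lim_.maxInFlight || freeHeap < lim_.minFreeHeap) {
            return false;  // caller should answer with a cheap error reply
        }
        ++inFlight_;
        return true;
    }
    // Call once per admitted request when its reply has been fully sent.
    void release() { if (inFlight_ > 0) --inFlight_; }
    int inFlight() const { return inFlight_; }

private:
    AdmissionLimits lim_;
    int inFlight_ = 0;
};
```

Rejecting early is cheap compared with an out-of-memory crash: a refused client can retry, whereas a reboot drops every connection.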

@casesolved-co-uk

I found I had to change my access point to use 20 MHz bandwidth only, and there were no issues with WiFi using v14.1 after that. 40 MHz bandwidth is an 802.11n feature, but it is not supported by ESP8266. No idea about ESP32.
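The spectrum arithmetic behind this tip: 2.4 GHz channel centres are 5 MHz apart (channel n sits at 2407 + 5n MHz for n = 1 to 13), so a 20 MHz channel spans its centre ±10 MHz, while an HT40+ pair spans ±20 MHz around a point 10 MHz above the primary. This minimal C++ sketch, with hypothetical helper names, shows that a 40 MHz configuration on channel 6 already overlaps a neighbour on channel 11. It does not model why a given ESP8266 misbehaves; that remains the commenter's observation.

```cpp
// 2.4 GHz channel centre frequency in MHz (channels 1-13; channel 14 is a
// Japan-only special case at 2484 MHz).
int centreMHz(int channel) {
    return channel == 14 ? 2484 : 2407 + 5 * channel;
}

struct Span { int lowMHz, highMHz; };

// Occupied span of a 20 MHz channel.
Span span20(int channel) {
    int c = centreMHz(channel);
    return {c - 10, c + 10};
}

// Occupied span of an HT40+ pair: the primary channel plus a secondary channel
// four numbers (20 MHz) above it, i.e. 40 MHz centred 10 MHz above the primary.
Span span40Plus(int primary) {
    int c = centreMHz(primary) + 10;
    return {c - 20, c + 20};
}

// Two spans overlap if each one starts before the other ends.
bool overlap(Span a, Span b) {
    return a.lowMHz < b.highMHz && b.lowMHz < a.highMHz;
}
```

At 20 MHz, channels 1, 6 and 11 leave small gaps between them; at 40 MHz, channel 6 with its secondary occupies 2427 to 2467 MHz, which runs into a channel-11 network's 2452 to 2472 MHz.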

@LordMike @blazoncek @willmmiles

@blazoncek
Collaborator

@casesolved-co-uk thank you for the insight.
As I mentioned earlier, connectivity issues are related to "improperly" configured WiFi.

I will leave the interpretation of "improperly" to each one.

@blazoncek blazoncek added connectivity Issue regarding protocols, WiFi connection or availability of interfaces and removed bug labels Feb 8, 2024
@JvDrunen

JvDrunen commented Feb 8, 2024

That's not entirely true.
I have my WiFi set up to use 20 MHz, but still, after the upgrade the boards were unreachable. Downgrading solved it.

@raufis27

raufis27 commented Feb 8, 2024

That's not entirely true. I have my WiFi set up to use 20 MHz, but still, after the upgrade the boards were unreachable. Downgrading solved it.

Same. In my case, setting WiFi to 20 MHz or 40 MHz doesn't fix the problem. In fact, I have different generations of NodeMCU ESP8266 boards working with 40 MHz and version 14.1.

@LordMike

LordMike commented Feb 8, 2024

Beware that this may be a symptom of other issues. I made the same mistake in the beginning by assuming it was a wifi thing. On my end, it was a fault/oom/crash thing that manifested as being unavailable.

So for some it may be wifi while for others it won’t.

@willmmiles I have not experienced any noticeable issues after the change I made. As I check the logs to see how long it’s been connected it has multiple stretches of 1-2 days each. It reconnects each time, which matches the behavior of the other devices I have.

@blazoncek
Collaborator

It has been 3 weeks, many, many posts and I have yet to see a single debug output.

@jclsn
Author

jclsn commented Mar 17, 2024

Seems to be magically fixed for me in 14.2. Can anyone else confirm this?

@LordMike

LordMike commented Mar 17, 2024 via email

@jclsn
Author

jclsn commented Mar 17, 2024

Okay, then it is at least not magic. Funny though, that someone with an esp32 complained about the same issue now.

@ASTDrones

Seems to be magically fixed for me in 14.2. Can anyone else confirm this?

I have tried with a number of ESP8266 including D1Mini and still can't connect via WiFi and reverting back to 0.14.0 fixes the issue.

@blazoncek
Collaborator

Try erasing flash.

@jclsn
Author

jclsn commented Mar 20, 2024 via email

@jclsn
Author

jclsn commented Mar 21, 2024

So WLED becomes unreachable over network again after some time. Rebooting it helps though. May have to do with HA automations, which I have configured for this one.

@Depechie

Reverted back to 0.13.3 again: on 0.14.2 my ESP8266 just keeps resetting (to red light at 50%).

@valkoh

valkoh commented Apr 5, 2024

Is this safe to update now? I've been back on 0.14.0 for a while now to counter these issues. It's kind of okay even if I stay here for good, but HA keeps whining about an update... There are also alternatives I could use for the simple animations I have on them, but I'm used to WLED and it was flawless before 0.14.1. All the RGB lighting in my house uses the same controller.

@jclsn
Author

jclsn commented Apr 5, 2024

I don't have issues anymore, but other people still do. Guess you'll have to try yourself.
