Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling syslog.log too early in the boot process prevents esp8266 from booting #7

Closed
pmannk opened this issue May 23, 2022 · 15 comments

Comments

@pmannk
Copy link

pmannk commented May 23, 2022

Hi there!
Firstly thanks for this great component, I've been using it to log time-based events and it's been very useful.

This evening I stumbled on a scenario where I believe esphome_syslog prevents an esp device from booting successfully.
In this scenario the esphome api, web_server and remote logger all stop functioning. Fortunately OTA is still available.
I have a script which is called by "on_boot" and under normal circumstances it never calls syslog.log until well after the esp boots, time syncs, api connects, etc.
Tonight the script tried to call syslog.log immediately after a reboot and that's where things went awry.

I use the following syslog configuration:

syslog:
  enable_logger: false
  ip_address: "192.168.x.x"
  port: 514

The following on_boot reproduces the problem:

esphome:
  on_boot:		  
    priority: 700.0
    then:
      - wait_until:
          wifi.connected:
      - syslog.log:
          level: 5
          tag: "ESPHome"
	  payload: "This will break things"

This on_boot also causes issues:

esphome:
  on_boot:		  
    priority: -100.0
    then:
      - syslog.log:
          level: 5
          tag: "ESPHome"
	  payload: "This will break things"

However this on_boot works fine:

esphome:
  on_boot:		  
    priority: 700.0
    then:
      - wait_until:
          time.has_time:
          #also working is api.connected:
      - syslog.log:
          level: 5
          tag: "ESPHome"
          payload: "This will work fine"

I can see you have return setup_priority::LATE; defined, so I'm not sure where else to go looking for the cause of this issue.
For the moment I have a workaround in place (testing for api.connected anywhere that I call syslog.log).

Thanks!

@TheStaticTurtle
Copy link
Owner

Humm this is interesting, I don't really know what's going on here.
Is there any error or it just stops working ?

@pmannk
Copy link
Author

pmannk commented May 25, 2022

I hooked up a serial connection tonight to see what's going on and it appears calling syslog.log early is creating a boot loop.
Eventually the device boots into safe mode which is why only OTA is working.

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0006dea0
@cp:B0
ld
[I][logger:242]: Log initialized
[C][ota:461]: There have been 0 suspected unsuccessful boot attempts.
[I][app:029]: Running through setup()...
[C][wifi:037]: Setting up WiFi...
[D][wifi:373]: Starting scan...
[D][wifi:388]: Found networks:
[I][wifi:431]: - 'xxxx-IoT' (C0:xxxx) ▂▄▆█
[D][wifi:433]:     Channel: 11
[D][wifi:434]:     RSSI: -31 dB
[I][wifi:248]: WiFi Connecting to 'xxxx-IoT'...
[I][wifi:505]: WiFi Connected!
[C][wifi:353]:   Local MAC: 9C:xxxx
[C][wifi:354]:   SSID: 'xxxx-IoT'
[C][wifi:355]:   IP Address: 192.168.xxxx
[C][wifi:356]:   BSSID: C0:xxxx
[C][wifi:358]:   Hostname: 'xxxx'
[C][wifi:360]:   Signal strength: -34 dB ▂▄▆█
[C][wifi:364]:   Channel: 11
[C][wifi:365]:   Subnet: 255.255.254.0
[C][wifi:366]:   Gateway: 192.168.xxxx
[C][wifi:367]:   DNS1: 192.168.xxxx
[C][wifi:368]:   DNS2: 0.0.0.0
[D][wifi:514]: Disabling AP...

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

Exception (28):
epc1=0x4021f1ba epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: cont
sp: 3ffffb30 end: 3fffffc0 offset: 0190
3ffffcc0:  00000004 3fff0b2c 000001da 40220f10
3ffffcd0:  3ffffda0 3fff0b84 3fff0b2c 40207765
3ffffce0:  00000000 00000000 3ffe8cc5 00000102
3ffffcf0:  00000000 3fff1560 00000202 00000102
3ffffd00:  3fff1544 3ffffd70 00000005 4021f25c
3ffffd10:  3fff1544 3ffffd70 00000005 4020a613
3ffffd20:  00000000 00000000 00000010 402345b0
3ffffd30:  00000000 3ffffd00 40228c80 00000000
3ffffd40:  3fff1560 00000202 3fff157c 3ffffd90
3ffffd50:  3fff0000 3ffffd90 00000010 402145e9
3ffffd60:  4853554d 00000005 3fff2194 40214646
3ffffd70:  3fff06cc 00000016 00000016 00000001
3ffffd80:  4025edb3 4bc6a7f0 3ffffd90 00000007 <
3ffffd90:  48505345 00656d6f 4bc6a7f0 00000102
3ffffda0:  3fff0fbc 0000000e 3fff2194 40212d33
3ffffdb0:  00000007 00000000 3fff142c 40226bbd
3ffffdc0:  3fff0fbc 0000000e 3fff142c 40212e60
3ffffdd0:  3ffffdd8 00000007 656d6974 0074756f
3ffffde0:  3fff0fbc 0000000e 3fff0fbc 4020a151
3ffffdf0:  3fff0fbc 00f95b18 3ffef93c 40212e78
3ffffe00:  3fff0fbc 0000000e 3ffef93c 4022694c
3ffffe10:  3fff05e8 3ffffe30 3ffef93c 40226a18
3ffffe20:  3fff0fbc 0000000e 3ffef93c 4021186d
3ffffe30:  0000000f 0000000f 3fff240c 40100fb2
3ffffe40:  0000003c 00000028 0000000a 4023467c
3ffffe50:  00000000 00000000 00000008 00000000
3ffffe60:  3fff0ee4 3fff142c 3ffef9f8 40214587
3ffffe70:  3fff11c4 00000017 0000001e 00000000
3ffffe80:  00000000 00000000 00000000 00000000
3ffffe90:  3ffffe98 0000000c 39775630 50476e72
3ffffea0:  39306832 00000000 00000000 00000000
3ffffeb0:  00000000 00000000 00000000 00000000
3ffffec0:  00000000 00000000 3ffffed0 0000000c <
3ffffed0:  4853554d 4d4f4f52 546f492d 00000000
3ffffee0:  00000000 00000000 3fff111c 0000001e
3ffffef0:  0000001e 00000000 00000000 00000000
3fffff00:  00000000 00000000 00000000 00000000
3fffff10:  00000000 00000000 00000000 00000000
3fffff20:  3fff142c 3fff2194 00000000 00000000
3fffff30:  00656d6f 73756d2e feefeffe 00000016
3fffff40:  00000000 00000000 00000000 00000000
3fffff50:  00000000 00000000 00000000 6f6f7268
3fffff60:  feef006d 3fffff6c 0000000d 3170746e
3fffff70:  73756d2e 6f6f7268 feef006d feefeffe
3fffff80:  3fff1b20 00000001 3fff2194 00000005
3fffff90:  feefeffe feefeffe feefeffe 3ffefd74
3fffffa0:  3fffdad0 00000000 3ffefd60 40222e84
3fffffb0:  feefeffe feefeffe 3ffe8640 401004a5
<<<stack<<<

--------------- CUT HERE FOR EXCEPTION DECODER ---------------

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0006e570
~ld
[I][logger:242]: Log initialized
............................

And after 10 unsuccesful boot attempts:

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0006e570
~ld
[I][logger:242]: Log initialized
[C][status_led:014]: Setting up Status LED...
[C][ota:461]: There have been 10 suspected unsuccessful boot attempts.
[E][ota:468]: Boot loop detected. Proceeding to safe mode.
[I][app:029]: Running through setup()...
[C][wifi:037]: Setting up WiFi...
......... usual safe boot follows from here

@TheStaticTurtle
Copy link
Owner

Thanks, I think I'll have time to look at it this weekend, hopefully I can find the cause

@pmannk
Copy link
Author

pmannk commented May 25, 2022

I think I've found the cause, but I'm not convinced I have a good solution.
It appears syslog.log is called before the syslog component setup actually completes.

I modified syslog_component.cpp to add some serial logging and commented out the syslog.log call as follows:

	Serial.println ("syslog.log called");
	Serial.print ("Payload: ");
	Serial.println (payload.c_str());
    // syslog.log(esphome_to_syslog_log_levels[level],  payload.c_str());

During the boot event I see the following:
... normal boot to this point, trimmed for brevity.
[21:55:47][D][wifi:514]: Disabling AP...
[21:55:47]syslog.log called
[21:55:47]Payload: on_boot test payload

[21:55:47][C][web_server:096]: Setting up web server...
[21:55:47][C][ota:085]: Over-The-Air Updates:
[21:55:47][C][ota:086]: Address: xxxx
[21:55:47][C][api:025]: Setting up Home Assistant API server...
[21:55:47][C][sntp:025]: Setting up SNTP...
[21:55:47]syslog.log called
[21:55:47]Payload: Syslog started
[21:55:47][I][syslog:032]: Started

[21:55:47][I][app:062]: setup() finished successfully!

@TheStaticTurtle
Copy link
Owner

My guess is that the Wi-Fi client is initialized but not actually connected yet. Causing the crash the moment it tries to send

@pmannk
Copy link
Author

pmannk commented May 25, 2022

I thought that might be the case too, however testing that theory just now (I included ESP8266WiFi.h and tested for WiFi.Status()) and it's returning a state of WL_CONNECTED.

I'm getting out of my depth at this point unfortunately.

@candeo911
Copy link

I believe I am experiencing something similar. After installing the code and using the below config my esp won't boot at all. Hooking up directly with serial, looks like the wifi cant be found anymore?? Stuck in a loop until the fallback AP kicks in then dies again. FYI - In the end I commented out the config to get the board to boot again

new 1.txt

@candeo911
Copy link

I believe I am experiencing something similar. After installing the code and using the below config my esp won't boot at all. Hooking up directly with serial, looks like the wifi cant be found anymore?? Stuck in a loop until the fallback AP kicks in then dies again. FYI - In the end I commented out the config to get the board to boot again

new 1.txt

Odd. I moved the syslog location after the wifi startup and now the board boots fine!! Thanks for an esphome syslog ability :-)

esphome:
  name: "security"

esp8266:
  board: nodemcuv2

# Enable Home Assistant API
api:

ota:
  

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Security Fallback Hotspot"
    password: "VP02iGJmSiRU"



# Enable logging
logger:

syslog:
  ip_address: "192.168.1.101"
  port: 1518
  min_level: DEBUG



captive_portal:


i2c:

@TheStaticTurtle
Copy link
Owner

Odd. I moved the syslog location after the wifi startup and now the board boots fine!! Thanks for an esphome syslog ability :-)

Weird, maybe the upload got corrupted at some point because the order of the YAML should have any impact as you need to specify the order manually in the component (See SyslogComponent::get_setup_priority())

@TheStaticTurtle
Copy link
Owner

@pmannk
Finally, got some time to fix this while waiting, for a 3D print

So fixing the boot loop was dead simple:
Using a priority of setup_priority::LATE and calling the log() function too early meant that the setup() function was not called yet, resulting in the udp_ pointer being rubbish, and you can probably guess what happened next 😅

Now, I changed the priority to setup_priority::AFTER_WIFI and added a check in the log() function to make sure that the pointer is not NULL but it still doesn't send the message, so I need to keep investigating, current guess is no internet

@TheStaticTurtle
Copy link
Owner

Ok, so I was wrong there is internet, and the logs are being sent if the priority is -100 (had a wrong filter on my server).
However setting the on_boot priority to 700 using a wait_util for wifi.connected still failed and here is what I think was going on:

  • Hardware is initialized (priority 800)
  • on_boot automation called (priority 700)
    • Waiting for wifi connected
    • Wifi is now connected (priority at 200)
    • syslog.log called (priority still at 200 meaning the component isn't setup)
  • Esphome is going throught priorities 600 to 220
  • Wifi is connected (priority 200)
    • setup() called in the api
    • setup() called in the component
  • Esphome is going throught the rest of the priorities

Here is what I think is going on when using api.connected :

  • Hardware is initialized (priority 800)
  • on_boot automation called (priority 700)
    • Waiting for api.connected
    • Wifi connected (priority at 220)
    • Api gets connected
    • EspHome goes to the next priority
    • syslog.log called (priority is no longer at 200 meaning the component is setup)
  • Esphome is going throught priorities 600 to 220
  • Wifi is connected (priority 200)
    • setup() called in the api
    • setup() called in the component
  • Esphome is going throught the rest of the priorities

To solve this, I choose to simply init WifiUDP in the constuctor and not in the setup() and add a check for connectivity in the log(). This doesn't really poses an issue because of the check before logging and WifiUDP doesn't generate an exception anyways if there is no internet.

But for some reason, WifiUDP still errors out with [E][WiFiUdp.cpp:183] endPacket(): could not send data: 12 when the node print the config, works fine after tho

Bit more work, commits one way

@TheStaticTurtle
Copy link
Owner

Error 12 is ENOMEM and this is what the esp32 docs have to say: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#limitations
This could be due to the the fact that the underlying lib is using multiple prints instead of one big formated one (Syslog.cpp#L261), but it's clear that it's because something is going to fast for the wifi tx buffer.

The thing is I don't have a esp8266 that I can plug on my computer right now but uploading the new version to an already running one seems to work even with a lot of messages. So I guess the udp implementation is better 🥳.

I'll add a warning for the esp32 because it's not something I can fix easily (would require either that the arcao/Syslog lib to actually realize that there is an issue or that esphome compiles with a bigger buffer and I'm not sure on how it will impact other things)

@TheStaticTurtle
Copy link
Owner

Here we go, this should be fixed, please let me know if there is still an issue. Sorry it took so long 😅

@candeo911
Copy link

works perfectly for me now!! Thanks for the great work :-) FYI, I downloaded the new code and put the config back to "normal" as below.

esphome:
  name: "security"

esp8266:
  board: nodemcuv2


# Enable logging
logger:

syslog:
  ip_address: "192.168.10.101"
  port: 1518
  min_level: DEBUG


# Enable Home Assistant API
api:

ota:
  

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Security Fallback Hotspot"
    password: "VP02iGJmSiRU"


captive_portal:


i2c:
  sda: D2

@pmannk
Copy link
Author

pmannk commented Sep 8, 2022

Nice one, that did the trick. I can't trigger the fault anymore.
Thanks @TheStaticTurtle !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants