Modbus Slave RTU crashes after couple of days of working (v1)(IDFGH-15008) #109
Comments
Hi @bugrayanik, thank you for reporting the issue. Judging by the backtraces in your core dump, this looks like memory (stack or heap) corruption. However, the information you provided is not enough to pinpoint the cause (sdkconfig, log, map file, etc. are missing). You need to perform heap and stack tracing. This does not mean the issue is in the library itself: it concerns your whole application and every component it uses, because the corruption happens elsewhere and earlier in time than the crash. It depends on how your application is configured, and the exact cause is usually hard to find without tracing. I would recommend trying the following:
Thanks for the help @alisitsyn,
For a stack check on Xtensa, you can set a debug watchpoint that acts as a guard region at the end of the stack. See also the documentation on stack overflow handling and heap debugging.
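As a rough illustration of the checks mentioned above, a debug build could enable options like the following. This is only a sketch: the option names are the ones I know from ESP-IDF v5.x menuconfig and should be verified against the IDF version in use. Comprehensive heap poisoning in particular slows down every allocation, so it is meant for debugging builds rather than production.

```
# sdkconfig fragment (sketch, names assumed from ESP-IDF v5.x; verify in menuconfig)

# Hardware watchpoint on the last bytes of each task stack
CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
# FreeRTOS canary-based stack overflow check on context switch
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_CANARY=y
# Compiler-inserted stack smashing protection
CONFIG_COMPILER_STACK_CHECK_MODE_NORM=y
# Fill and verify heap blocks to catch overruns and use-after-free
CONFIG_HEAP_POISONING_COMPREHENSIVE=y
```

With the watchpoint enabled, a stack overflow should trigger a panic at the moment of the offending write instead of a later, unrelated crash.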
I'm finally able to reproduce it, even though I'm not sure it's the same problem.
==================== ESP32 CORE DUMP START ====================
The ROM ELF file won't load automatically since it was not found for the provided chip type.
Crashed task handle: 0x3ffd98b0, name: '', GDB name: 'process 1073584304'
Crashed task is not in the interrupt context
Panic reason: assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)
================== CURRENT THREAD REGISTERS ===================
exccause 0x1d (StoreProhibitedCause)
excvaddr 0x0
epc1 0x40088a73
epc2 0x0
epc3 0x0
epc4 0x0
epc5 0x0
epc6 0x0
eps2 0x0
eps3 0x0
eps4 0x0
eps5 0x0
eps6 0x0
==================== CURRENT THREAD STACK =====================
pc 0x40081c1d 0x40081c1d <panic_abort+21>
lbeg 0x4000c28c 1073791628
lend 0x4000c296 1073791638
lcount 0x0 0
sar 0x10 16
ps 0x60b20 396064
threadptr <unavailable>
br <unavailable>
scompare1 <unavailable>
acclo <unavailable>
acchi <unavailable>
m0 <unavailable>
m1 <unavailable>
m2 <unavailable>
m3 <unavailable>
expstate <unavailable>
f64r_lo <unavailable>
f64r_hi <unavailable>
f64s <unavailable>
fcr <unavailable>
fsr <unavailable>
a0 0x8008af1c -2146914532
a1 0x3ffd9640 1073583680
a2 0x3ffd968b 1073583755
a3 0x2 2
a4 0xa 10
a5 0x60123 393507
a6 0x1 1
a7 0xcdcd 52685
a8 0x0 0
a9 0x1 1
a10 0x28 40
a11 0x3ffd9753 1073583955
a12 0x1 1
a13 0x3ffd7b10 1073576720
a14 0x3 3
a15 0x60023 393251
======================== THREADS INFO =========================
#0 0x40081c1d in panic_abort (details=0x3ffd968b "assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/panic.c:463
#1 0x4008af1c in esp_system_abort (details=0x3ffd968b "assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2 0x400931a4 in __assert_func (file=0x3f41569b "mbrtu.c", line=<optimized out>, func=<optimized out>, expr=0x3f4155cc "eSndState == STATE_TX_IDLE") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/newlib/assert.c:80
#3 0x400fde64 in xMBRTUReceiveFSM () at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/modbus/rtu/mbrtu.c:248
#4 0x400fe3b8 in usMBPortSerialRxPoll (xEventSize=17) at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:102
#5 0x400fe42a in vUartTask (pvParameters=0x0) at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:158
#6 0x4008bae5 in vPortTaskWrapper (pxCode=0x400fe3e0 <vUartTask>, pvParameters=0x0) at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134
Retrying reading threads information...
TCB NAME PRIO C/B STACK USED/FREE
---------- ---------------- -------- ----------------
0x3ffd98b0 1073580200/10 68/512
0x3ffd7d58 1073573200/9 76/480
0x3ffc58b8 1073487832/5 10524/464
0x3ffbffe4 1073478108/0 72/416
0x3ffc0748 1073480000/0 76/416
0x3ffc3de8 1073491936/18 76/560
0x3ffd6840 1073563704/5 68/1776
0x3ffc41a0 1073506976/20 15548/528
0x3ffce698 1073539060/5 16676/816
0x3ffafb18 1073466316/22 57460/432
0x3ffaf9bc 1073411508/24 64/432
0x3ffaf458 1073410128/24 76/432
0x3ffcad28 1073519904/15 76/448
==================== THREAD 1 (TCB: 0x3ffd98b0, name: '') =====================
==================== THREAD 2 (TCB: 0x3ffd7d58, name: '') =====================
==================== THREAD 3 (TCB: 0x3ffc58b8, name: '') =====================
==================== THREAD 4 (TCB: 0x3ffbffe4, name: '') =====================
==================== THREAD 5 (TCB: 0x3ffc0748, name: '') =====================
==================== THREAD 6 (TCB: 0x3ffc3de8, name: '') =====================
==================== THREAD 7 (TCB: 0x3ffd6840, name: '') =====================
==================== THREAD 8 (TCB: 0x3ffc41a0, name: '') =====================
==================== THREAD 9 (TCB: 0x3ffce698, name: '') =====================
==================== THREAD 10 (TCB: 0x3ffafb18, name: '') =====================
==================== THREAD 11 (TCB: 0x3ffaf9bc, name: '') =====================
==================== THREAD 12 (TCB: 0x3ffaf458, name: '') =====================
==================== THREAD 13 (TCB: 0x3ffcad28, name: '') =====================
======================= ALL MEMORY REGIONS ========================
Name Address Size Attrs
.rtc.text 0x400c0000 0x0 RW
.rtc.dummy 0x3ff80000 0x0 RW
.rtc.force_fast 0x3ff80000 0x0 RW
.rtc_noinit 0x50000000 0x0 RW
.rtc.force_slow 0x50000000 0x0 RW
.rtc_fast_reserved 0x3ff82000 0x0 RW
.iram0.vectors 0x40080000 0x403 R XA
.iram0.text 0x40080404 0x18f17 R XA
.dram0.data 0x3ffb0000 0x46dc RW A
.ext_ram_noinit 0x3f800000 0x0 RW
.ext_ram.bss 0x3f800000 0x0 RW
.flash.appdesc 0x3f400020 0x100 R A
.flash.rodata 0x3f400120 0xf4658 RW A
.flash.text 0x400d0020 0xf7236 R XA
.iram0.data 0x4009931c 0x0 RW
.iram0.bss 0x4009931c 0x0 RW
.dram0.heap_start 0x3ffbc818 0x0 RW
.coredump.tasks.data 0x3ffd98b0 0x154 RW
.coredump.tasks.data 0x3ffd9580 0x320 RW
.coredump.tasks.data 0x3ffd7d58 0x154 RW
.coredump.tasks.data 0x3ffd7b60 0x1e0 RW
.coredump.tasks.data 0x3ffc58b8 0x154 RW
.coredump.tasks.data 0x3ffc2e00 0x1d0 RW
.coredump.tasks.data 0x3ffbffe4 0x154 RW
.coredump.tasks.data 0x3ffbfe30 0x1a0 RW
.coredump.tasks.data 0x3ffc0748 0x154 RW
.coredump.tasks.data 0x3ffc0590 0x1a0 RW
.coredump.tasks.data 0x3ffc3de8 0x154 RW
.coredump.tasks.data 0x3ffc3ba0 0x230 RW
.coredump.tasks.data 0x3ffd6840 0x154 RW
.coredump.tasks.data 0x3ffd6140 0x6f0 RW
.coredump.tasks.data 0x3ffc41a0 0x154 RW
.coredump.tasks.data 0x3ffc7c80 0x210 RW
.coredump.tasks.data 0x3ffce698 0x154 RW
.coredump.tasks.data 0x3ffd24c0 0x330 RW
.coredump.tasks.data 0x3ffafb18 0x154 RW
.coredump.tasks.data 0x3ffbda10 0x1b0 RW
.coredump.tasks.data 0x3ffaf9bc 0x154 RW
.coredump.tasks.data 0x3ffaf800 0x1b0 RW
.coredump.tasks.data 0x3ffaf458 0x154 RW
.coredump.tasks.data 0x3ffaf290 0x1b0 RW
.coredump.tasks.data 0x3ffcad28 0x154 RW
.coredump.tasks.data 0x3ffcab50 0x1c0 RW
===================== ESP32 CORE DUMP END =====================
Looking at this, I can tell it happened because the slave tried to receive while a transmission had not finished, which the library guards against with an assert. The master has a timeout; maybe the timeout has to be higher?
OK, I'm fixing my master to handle timeout errors better: I'm adding a flush() before any receive that follows a timeout in the master. It seems I will be able to fix this from the master side. I wonder whether the slave has any timeout implemented, or do slaves have no timeout at all? And is there no Modbus error about a timeout from the slave? I'm trying to make the best slave I can, but I feel there will be Modbus master devices that can easily break this slave. Is this the nature of Modbus? Are masters inherently dangerous for slaves like this?
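The master-side idea described here (discard stale bytes before the request that follows a timeout) could look roughly like the sketch below. It is a host-side model, not the poster's actual master code: `rx_buffer_t`, `master_t`, `rx_push`, `rx_flush`, and `master_prepare_request` are hypothetical names, and the receive buffer is simulated in memory. On an ESP-IDF master the flush step would presumably map to something like `uart_flush_input()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the master's UART receive buffer. */
typedef struct {
    uint8_t data[256];
    size_t  len;
} rx_buffer_t;

/* Simulate bytes arriving from the bus, e.g. a late reply from the slave. */
static void rx_push(rx_buffer_t *rx, const uint8_t *bytes, size_t n)
{
    assert(rx->len + n <= sizeof(rx->data));
    memcpy(rx->data + rx->len, bytes, n);
    rx->len += n;
}

/* Discard everything currently buffered; returns the number of bytes dropped. */
static size_t rx_flush(rx_buffer_t *rx)
{
    size_t dropped = rx->len;
    rx->len = 0;
    return dropped;
}

/* Hypothetical master state: the buffer plus a flag set by the timeout path. */
typedef struct {
    rx_buffer_t rx;
    bool        last_request_timed_out;
} master_t;

/* Call before sending a request. Stale bytes are flushed only when the
 * previous request timed out, so a late reply cannot be mistaken for the
 * answer to the next request. Returns the number of bytes discarded. */
static size_t master_prepare_request(master_t *m)
{
    size_t dropped = 0;
    if (m->last_request_timed_out) {
        dropped = rx_flush(&m->rx);
        m->last_request_timed_out = false;
    }
    return dropped;
}
```

The flush is tied to the timeout flag rather than done unconditionally, so that a normal request/response cycle never throws away valid data.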
Thank you for the update. The Master has the config option
The fix is:
    BOOL
    xMBRTUReceiveFSM( void )
    {
        BOOL            xStatus = FALSE;
        UCHAR           ucByte;
        // assert( eSndState == STATE_TX_IDLE ); // line 248 of the original code, commented out and replaced by the check below:
        if ( eSndState != STATE_TX_IDLE ) {
            return FALSE;
        }
        /* Always read the character. */
        xStatus = xMBPortSerialGetByte( ( CHAR * ) & ucByte );
        switch ( eRcvState )
        // The rest of the code.
This change can be applied to the esp-modbus copy in the components folder. Please let me know if you have further issues. The esp-modbus v2 library handles packets differently and is free of this issue.
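To make the difference between the assert and the early return concrete, here is a small host-side model. It is a sketch only: `eSndState` and `STATE_TX_IDLE` are taken from the thread, while `STATE_TX_XMIT`, `bytes_consumed`, and both function names are assumptions for illustration. The boolean result here means "a byte was processed", which is a simplification and not the return semantics of the real `xMBRTUReceiveFSM`.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the library's send-state enum (assumed layout). */
typedef enum { STATE_TX_IDLE, STATE_TX_XMIT } snd_state_t;

static snd_state_t eSndState = STATE_TX_IDLE;
static int bytes_consumed = 0;   /* hypothetical counter for the demo */

/* Original behaviour: a byte arriving while a transmission is in progress
 * trips the assert and aborts the whole firmware. */
static bool receive_fsm_asserting(void)
{
    assert(eSndState == STATE_TX_IDLE);
    bytes_consumed++;
    return true;
}

/* Patched behaviour: the same situation is ignored and the receive
 * handler simply reports that nothing was processed. */
static bool receive_fsm_guarded(void)
{
    if (eSndState != STATE_TX_IDLE) {
        return false;
    }
    bytes_consumed++;
    return true;
}
```

Calling `receive_fsm_guarded()` while `eSndState` is `STATE_TX_XMIT` returns `false` and leaves `bytes_consumed` unchanged, whereas `receive_fsm_asserting()` in the same state would abort the program.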
The fix was merged to master with commit a294764. Could you send an update on this?
Checklist
How often does this bug occur?
After 2-3 days
Expected behavior
I have an ESP32 device with a serial port, programmed with ESP-IDF. I leave the Modbus RTU slave running and expect it to work for days or months without crashing. The test setup is one Modbus master sending 2 requests per second over the serial port to the ESP device that uses the library. It starts well, and I have watched it work for hours.
Actual behavior (suspected bug)
Two different crashes happened on two separate occasions. I have the core dumps as follows. Both came from the same experiment: I left the Modbus RTU slave running, and it lasted 2-3 days before crashing.
Error logs or terminal output
Steps to reproduce the behavior
Leave the Modbus RTU slave running.
Project release version
1.0.16
System architecture
ARM 64-bit (Apple M1/M2, Raspberry Pi 4/5)
Operating system
MacOS
Operating system version
Sequoia
Shell
ZSH
Additional context
Can you help me understand what's going on here? Should I switch to the newest version of the library? Are these 2 issues related? What can be the workaround here? Thanks!