Modbus Slave RTU crashes after couple of days of working (v1)(IDFGH-15008) #109

Open
2 of 3 tasks
bugrayanik opened this issue Apr 2, 2025 · 7 comments

bugrayanik commented Apr 2, 2025

Checklist

  • Checked the issue tracker for similar issues to ensure this is not a duplicate
  • Read the documentation to confirm the issue is not addressed there and your configuration is set correctly
  • Tested with the latest version to ensure the issue hasn't been fixed

How often does this bug occur?

After 2-3 days

Expected behavior

I have an ESP32 device with a serial port, programmed with ESP-IDF. I leave the Modbus RTU slave running and expect it to work for days or months without crashing. In the test, a single Modbus master sends 2 requests per second over the serial port to the ESP32 device that uses the library. It starts well, and I have watched it work for hours.

Actual behavior (suspected bug)

Two different crashes occurred on two separate occasions; the core dumps are below. Both came from the same experiment: the Modbus RTU slave was left running, lasted 2-3 days, and then crashed.

Error logs or terminal output

1. 

0x40081c1a: panic_abort at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/panic.c:463
0x4008af19: esp_system_abort at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/port/esp_system_chip.c:92
0x400931a1: __assert_func at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/newlib/assert.c:80
0x400fdba1: eMBRTUSend at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/modbus/rtu/mbrtu.c:225
0x400fe0f5: xMBPortEventGet at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portevent.c:104 (discriminator 1)
0x400fe167: usMBPortSerialRxPoll at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:107
0x4008bae2: vPortTaskWrapper at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134



2.

--- 0x40081c1a: panic_abort at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/panic.c:463
0x4008af19: esp_system_abort at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/port/esp_system_chip.c:92
0x400931a1: __assert_func at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/newlib/assert.c:80
0x400fdba1: eMBRTUSend at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/modbus/rtu/mbrtu.c:221
0x400fe0f5: xMBPortEventPost at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portevent.c:96 (discriminator 1)
0x400fe167: usMBPortSerialRxPoll at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:92
0x4008bae2: vPortTaskWrapper at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

I (06:30:39.740) CORE_DUMP: [backtrace]Backtrace Depth: 7
I (06:30:39.746) CORE_DUMP: [backtrace]Backtrace Corrupted: No
I (06:30:39.753) CORE_DUMP: [backtrace]Program Counter: 1074273306
I (06:30:39.760) CORE_DUMP: [backtrace]Coredump Version: 258

I (06:30:39.766) CORE_DUMP: Core Dump Summary:
I (06:30:39.771) CORE_DUMP: Exception Task: uart_queue_task
I (06:30:39.778) CORE_DUMP: Exception PC: 0x40081c1a
--- 0x40081c1a: panic_abort at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/panic.c:463

Steps to reproduce the behavior

Leave the Modbus RTU slave running.

Project release version

1.0.16

System architecture

ARM 64-bit (Apple M1/M2, Raspberry Pi 4/5)

Operating system

MacOS

Operating system version

Sequoia

Shell

ZSH

Additional context

Can you help me understand what's going on here? Should I switch to the newest version of the library? Are these two crashes related? What could be a workaround? Thanks!

@github-actions github-actions bot changed the title Modbus Slave RTU crashes after couple of days of working Modbus Slave RTU crashes after couple of days of working (IDFGH-15008) Apr 2, 2025
Collaborator

alisitsyn commented Apr 3, 2025

Hi @bugrayanik,

Thank you for reporting the issue. Judging by your core dump backtraces, this looks like memory (stack or heap) corruption. However, the information you provided is not enough to pinpoint the cause (the sdkconfig, log, map file, etc. are missing). You need to perform heap and stack tracing. The issue is not necessarily in the library itself: it can involve your whole application and every component it uses, because the corruption happens elsewhere and earlier in time. It depends on how your application is configured, and the exact cause is usually hard to find without tracing.

I would recommend trying the following:

  1. Heap tracing.
  2. Fatal errors and tracing.
  3. Increase the stack size for the Modbus tasks (CONFIG_FMB_PORT_TASK_STACK_SIZE) and for other tasks, and perform heap and stack tracing in your application. Setting CONFIG_FMB_TIMER_PORT_ENABLED=n may also help.
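
As a rough illustration of the recommendations above, an sdkconfig.defaults fragment might look like the following. Only the two CONFIG_FMB_* options come from this thread; the stack size value is an arbitrary example, and the remaining options are ESP-IDF heap and stack debugging settings picked here as assumptions about what is useful, not something the maintainer prescribed.

```
# Options named in this thread (8192 is an illustrative value only)
CONFIG_FMB_PORT_TASK_STACK_SIZE=8192
CONFIG_FMB_TIMER_PORT_ENABLED=n

# Assumed additions: ESP-IDF debugging aids for stack and heap corruption
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_CANARY=y
CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
CONFIG_HEAP_POISONING_COMPREHENSIVE=y
```

Comprehensive heap poisoning adds noticeable runtime overhead, so it is probably best kept to debug builds.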

@alisitsyn alisitsyn self-assigned this Apr 3, 2025
Author

bugrayanik commented Apr 3, 2025

Thanks for the help @alisitsyn,
I will switch to the debug log level and try to reproduce the crash. As for heap tracing, I'm already checking the free heap size periodically to make sure there are no memory leaks, but I don't know how to apply that here to understand or solve the crash. Can you be more specific about what to do with tracing? The crash takes days to happen, so I'm not sure how to trace the heap for that long.

@espressif-bot espressif-bot assigned alisitsyn and unassigned alisitsyn Apr 3, 2025
Collaborator

alisitsyn commented Apr 3, 2025

For stack checking on Xtensa, you can set a debug watchpoint as a guard region at the end of the stack. See also the documentation on stack overflow handling and heap debugging.
Try to monitor the free heap and other useful information and print it to a log file that is saved. The log will give you historical information from before the crash.
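
A minimal sketch of such periodic monitoring, assuming the standard ESP-IDF calls esp_get_free_heap_size(), esp_get_minimum_free_heap_size() and uxTaskGetStackHighWaterMark(). The function name format_health_record and the record layout are hypothetical, and the host-side stand-ins exist only so the snippet compiles off-target.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#else
/* Host-side stand-ins (made-up values) so the sketch builds without ESP-IDF. */
typedef void *TaskHandle_t;
static uint32_t esp_get_free_heap_size(void) { return 150000u; }
static uint32_t esp_get_minimum_free_heap_size(void) { return 120000u; }
static unsigned uxTaskGetStackHighWaterMark(TaskHandle_t task) { (void)task; return 512u; }
#endif

/* Hypothetical helper: format one health record into buf.
 * Returns the snprintf() result (characters needed, excluding the NUL). */
int format_health_record(char *buf, size_t len, uint32_t uptime_s)
{
    return snprintf(buf, len,
                    "t=%lus heap_free=%lu heap_min=%lu stack_min=%lu",
                    (unsigned long)uptime_s,
                    (unsigned long)esp_get_free_heap_size(),
                    (unsigned long)esp_get_minimum_free_heap_size(),
                    /* NULL means "the calling task"; pass a task handle
                     * to inspect another task instead. */
                    (unsigned long)uxTaskGetStackHighWaterMark(NULL));
}
```

Calling this every minute or so from a low-priority task and writing the result to persistent storage or a captured serial log would leave a trail leading up to the crash. A steadily falling heap_min or stack_min would point at a leak or a stack that is too small.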

@bugrayanik
Author

I'm finally able to reproduce it, even though I'm not sure it's the same problem.

==================== ESP32 CORE DUMP START ====================
The ROM ELF file won't load automatically since it was not found for the provided chip type.

Crashed task handle: 0x3ffd98b0, name: '', GDB name: 'process 1073584304'
Crashed task is not in the interrupt context
Panic reason: assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x40088a73
epc2           0x0
epc3           0x0
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x0
eps4           0x0
eps5           0x0
eps6           0x0


==================== CURRENT THREAD STACK =====================
pc             0x40081c1d          0x40081c1d <panic_abort+21>
lbeg           0x4000c28c          1073791628
lend           0x4000c296          1073791638
lcount         0x0                 0
sar            0x10                16
ps             0x60b20             396064
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x8008af1c          -2146914532
a1             0x3ffd9640          1073583680
a2             0x3ffd968b          1073583755
a3             0x2                 2
a4             0xa                 10
a5             0x60123             393507
a6             0x1                 1
a7             0xcdcd              52685
a8             0x0                 0
a9             0x1                 1
a10            0x28                40
a11            0x3ffd9753          1073583955
a12            0x1                 1
a13            0x3ffd7b10          1073576720
a14            0x3                 3
a15            0x60023             393251

======================== THREADS INFO =========================
#0  0x40081c1d in panic_abort (details=0x3ffd968b "assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/panic.c:463
#1  0x4008af1c in esp_system_abort (details=0x3ffd968b "assert failed: xMBRTUReceiveFSM mbrtu.c:248 (eSndState == STATE_TX_IDLE)") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2  0x400931a4 in __assert_func (file=0x3f41569b "mbrtu.c", line=<optimized out>, func=<optimized out>, expr=0x3f4155cc "eSndState == STATE_TX_IDLE") at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/newlib/assert.c:80
#3  0x400fde64 in xMBRTUReceiveFSM () at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/modbus/rtu/mbrtu.c:248
#4  0x400fe3b8 in usMBPortSerialRxPoll (xEventSize=17) at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:102
#5  0x400fe42a in vUartTask (pvParameters=0x0) at D:/Github/KRIO-2S_V5/components/esp-modbus/freemodbus/port/portserial.c:158
#6  0x4008bae5 in vPortTaskWrapper (pxCode=0x400fe3e0 <vUartTask>, pvParameters=0x0) at D:/Espressif_5_3_1/frameworks/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134        
Retrying reading threads information...


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3ffd98b0                 1073580200/10           68/512
0x3ffd7d58                 1073573200/9           76/480
0x3ffc58b8                 1073487832/5        10524/464
0x3ffbffe4                 1073478108/0           72/416
0x3ffc0748                 1073480000/0           76/416
0x3ffc3de8                 1073491936/18           76/560
0x3ffd6840                 1073563704/5          68/1776
0x3ffc41a0                 1073506976/20        15548/528
0x3ffce698                 1073539060/5        16676/816
0x3ffafb18                 1073466316/22        57460/432
0x3ffaf9bc                 1073411508/24           64/432
0x3ffaf458                 1073410128/24           76/432
0x3ffcad28                 1073519904/15           76/448

==================== THREAD 1 (TCB: 0x3ffd98b0, name: '') =====================


==================== THREAD 2 (TCB: 0x3ffd7d58, name: '') =====================


==================== THREAD 3 (TCB: 0x3ffc58b8, name: '') =====================


==================== THREAD 4 (TCB: 0x3ffbffe4, name: '') =====================


==================== THREAD 5 (TCB: 0x3ffc0748, name: '') =====================


==================== THREAD 6 (TCB: 0x3ffc3de8, name: '') =====================


==================== THREAD 7 (TCB: 0x3ffd6840, name: '') =====================


==================== THREAD 8 (TCB: 0x3ffc41a0, name: '') =====================


==================== THREAD 9 (TCB: 0x3ffce698, name: '') =====================


==================== THREAD 10 (TCB: 0x3ffafb18, name: '') =====================


==================== THREAD 11 (TCB: 0x3ffaf9bc, name: '') =====================


==================== THREAD 12 (TCB: 0x3ffaf458, name: '') =====================


==================== THREAD 13 (TCB: 0x3ffcad28, name: '') =====================



======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x400c0000 0x0 RW
.rtc.dummy 0x3ff80000 0x0 RW
.rtc.force_fast 0x3ff80000 0x0 RW
.rtc_noinit 0x50000000 0x0 RW
.rtc.force_slow 0x50000000 0x0 RW
.rtc_fast_reserved 0x3ff82000 0x0 RW
.iram0.vectors 0x40080000 0x403 R XA
.iram0.text 0x40080404 0x18f17 R XA
.dram0.data 0x3ffb0000 0x46dc RW A
.ext_ram_noinit 0x3f800000 0x0 RW
.ext_ram.bss 0x3f800000 0x0 RW
.flash.appdesc 0x3f400020 0x100 R  A
.flash.rodata 0x3f400120 0xf4658 RW A
.flash.text 0x400d0020 0xf7236 R XA
.iram0.data 0x4009931c 0x0 RW
.iram0.bss 0x4009931c 0x0 RW
.dram0.heap_start 0x3ffbc818 0x0 RW
.coredump.tasks.data 0x3ffd98b0 0x154 RW
.coredump.tasks.data 0x3ffd9580 0x320 RW
.coredump.tasks.data 0x3ffd7d58 0x154 RW
.coredump.tasks.data 0x3ffd7b60 0x1e0 RW
.coredump.tasks.data 0x3ffc58b8 0x154 RW
.coredump.tasks.data 0x3ffc2e00 0x1d0 RW
.coredump.tasks.data 0x3ffbffe4 0x154 RW
.coredump.tasks.data 0x3ffbfe30 0x1a0 RW
.coredump.tasks.data 0x3ffc0748 0x154 RW
.coredump.tasks.data 0x3ffc0590 0x1a0 RW
.coredump.tasks.data 0x3ffc3de8 0x154 RW
.coredump.tasks.data 0x3ffc3ba0 0x230 RW
.coredump.tasks.data 0x3ffd6840 0x154 RW
.coredump.tasks.data 0x3ffd6140 0x6f0 RW
.coredump.tasks.data 0x3ffc41a0 0x154 RW
.coredump.tasks.data 0x3ffc7c80 0x210 RW
.coredump.tasks.data 0x3ffce698 0x154 RW
.coredump.tasks.data 0x3ffd24c0 0x330 RW
.coredump.tasks.data 0x3ffafb18 0x154 RW
.coredump.tasks.data 0x3ffbda10 0x1b0 RW
.coredump.tasks.data 0x3ffaf9bc 0x154 RW
.coredump.tasks.data 0x3ffaf800 0x1b0 RW
.coredump.tasks.data 0x3ffaf458 0x154 RW
.coredump.tasks.data 0x3ffaf290 0x1b0 RW
.coredump.tasks.data 0x3ffcad28 0x154 RW
.coredump.tasks.data 0x3ffcab50 0x1c0 RW

===================== ESP32 CORE DUMP END =====================

Looking at this, I can tell it happened because the slave tried to receive before its transmission had finished, which the library guards against with an assert. The master has a timeout; maybe that timeout needs to be higher?

Author

bugrayanik commented Apr 7, 2025

OK, I'm fixing my master to handle timeout errors better: I'm adding a flush() before any receive that follows a timeout in the master. It seems I will be able to fix this from the master side. Does the slave implement any timeout, or do slaves have no timeout at all? And is there no Modbus timeout error from the slave? I'm trying to make the slave as robust as possible, but I feel there will be Modbus master devices that can easily break it. Is that the nature of Modbus? Are masters inherently this dangerous for slaves?

Collaborator

alisitsyn commented Apr 7, 2025

@bugrayanik ,

Thank you for the update.

The master has the config option FMB_MASTER_TIMEOUT_MS_RESPOND, which defines the slave response timeout and should be equal to the maximum possible slave response time. If the slave does not respond within this time, the master sends the next request. In your case, the actual slave response time is exactly equal to the timeout configured in the master. The asserts are legacy code that has not been removed from the slave, and they cause the issue. Despite the collisions, the slave should report an error rather than crash, so this is indeed a bug.
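
To get a feel for how small the margin can be, here is a rough sketch that estimates a lower bound for the master's response timeout. It assumes the default Modbus RTU character format of 11 bits per character and the usual t3.5 inter-frame silence (fixed at 1750 us above 19200 baud). The function names and the way the budget is split are illustrative assumptions, not how esp-modbus computes anything.

```c
#include <stdint.h>

/* Time on the wire for one RTU frame, in microseconds (rounded up).
 * Assumes 11 bits per character: start + 8 data + parity + stop. */
uint32_t rtu_frame_time_us(uint32_t frame_bytes, uint32_t baud)
{
    return (uint32_t)(((uint64_t)frame_bytes * 11u * 1000000u + baud - 1u) / baud);
}

/* Inter-frame silence t3.5: 3.5 character times (38.5 bit times),
 * or a fixed 1750 us for baud rates above 19200. */
uint32_t rtu_t35_us(uint32_t baud)
{
    if (baud > 19200u) {
        return 1750u;
    }
    return (uint32_t)((38500000ull + baud - 1u) / baud);
}

/* Hypothetical budget: silence after the request, slave processing,
 * the response frame itself, and silence after the response. */
uint32_t min_master_timeout_ms(uint32_t resp_bytes, uint32_t baud,
                               uint32_t slave_processing_ms)
{
    uint32_t wire_us = rtu_frame_time_us(resp_bytes, baud) + 2u * rtu_t35_us(baud);
    return slave_processing_ms + (wire_us + 999u) / 1000u;
}
```

A timeout configured right at this bound leaves no slack for task scheduling jitter on the slave, which is the kind of collision described above. Choosing a value comfortably larger than the estimate is the safer option.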

The fix is:
https://github.com/espressif/esp-modbus/blob/master/freemodbus/modbus/rtu/mbrtu.c#L248:

BOOL 
xMBRTUReceiveFSM( void )
{
    BOOL            xStatus = FALSE;
    UCHAR           ucByte;

    // assert( eSndState == STATE_TX_IDLE ); // line 248 in original code commented and changed as below:

    if ( eSndState != STATE_TX_IDLE  ) {
        return FALSE;
    }

    /* Always read the character. */
    xStatus = xMBPortSerialGetByte( ( CHAR * ) & ucByte );

    switch ( eRcvState )
    // The rest of the code.

This change can be applied to the copy of esp-modbus in your components folder. Please let me know if you have further issues.

The esp-modbus v2 library has different handling of packets and is free of this issue.
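
The effect of replacing the assert with an early return can be illustrated with a small stand-alone model. This is only a sketch: the state names mirror mbrtu.c, but rtu_sim_t, rtu_sim_receive and the drop counter are hypothetical and not part of esp-modbus.

```c
#include <stdbool.h>

/* Transmit states, named after the ones used in mbrtu.c. */
typedef enum { STATE_TX_IDLE, STATE_TX_XMIT } tx_state_t;

/* Hypothetical model of the slave's RTU state for illustration only. */
typedef struct {
    tx_state_t snd_state;   /* current transmitter state */
    unsigned   rx_bytes;    /* bytes accepted by the receiver */
    unsigned   rx_dropped;  /* receive events ignored while transmitting */
} rtu_sim_t;

/* Guarded receive step: instead of asserting that the transmitter is idle,
 * ignore the byte and report failure, as in the patched entry check. */
bool rtu_sim_receive(rtu_sim_t *ctx, unsigned char byte)
{
    (void)byte;  /* payload handling is out of scope for this model */
    if (ctx->snd_state != STATE_TX_IDLE) {
        ctx->rx_dropped++;
        return false;
    }
    ctx->rx_bytes++;
    return true;
}
```

With the assert, a byte arriving while snd_state is STATE_TX_XMIT would abort the whole firmware. With the guard, the byte is discarded and the master simply sees a missing or failed response, which it can retry.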

@espressif-bot espressif-bot assigned alisitsyn and unassigned alisitsyn Apr 14, 2025
@alisitsyn alisitsyn changed the title Modbus Slave RTU crashes after couple of days of working (IDFGH-15008) Modbus Slave RTU crashes after couple of days of working (v1)(IDFGH-15008) Apr 25, 2025
Collaborator

alisitsyn commented Apr 25, 2025

The fix has been merged to master with commit a294764 and is included in component v1.0.18.

@bugrayanik ,

Could you provide an update on this?
