Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E46 CAN script hanging on 2.9.0 #571

Closed
brentpicasso opened this issue May 6, 2016 · 12 comments
Closed

E46 CAN script hanging on 2.9.0 #571

brentpicasso opened this issue May 6, 2016 · 12 comments
Assignees
Labels
Milestone

Comments

@brentpicasso
Copy link
Contributor

Base RCP config with E46 CAN script is hanging on 2.9.0.

STR:
Load 2.9.0 on base RCP config
load E46 CAN script from wiki
RCP is connected to CAN bus on E46
Start engine
Observe raw channels view update for a couple of seconds, then freeze.

Fall back to 2.8.7, observe data updating properly.

@ryandoherty please test the 2.9.0 with the stock E46 script on your car if possible.

@brentpicasso brentpicasso added this to the 2.9.1 milestone May 6, 2016
@ryandoherty
Copy link
Contributor

ryandoherty commented May 6, 2016

Config
Ace Robey .rcp.txt

@stieg stieg self-assigned this May 6, 2016
@ryandoherty
Copy link
Contributor

Lots of FS init errors:

fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
sdcard: Disk init fail
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
sdcard: Disk init fail
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 3
fileWriter.c: Remounting FS due to write error.

@stieg
Copy link
Contributor

stieg commented May 6, 2016

Issue seems to be centered around the logging starting/stopping at a high rate. Probably need to patch this so its not possible to do this.

@ryandoherty
Copy link
Contributor

Removing the startLogging()/stopLogging() calls fixes the problem, but requires a full power cycle of RCP. Updating the script and clicking the Run button doesn't fix the problem.

@stieg
Copy link
Contributor

stieg commented May 6, 2016

fileWriter.c: FS init error: 1
fileWriter.c: Remounting FS due to write error.
fileWriter.c: Opening log file
fileWriter.c: FS init error: 3
fileWriter.c: Remounting FS due to write error.

Interesting... this changed randomly...

@stieg
Copy link
Contributor

stieg commented May 6, 2016

FileWriter task seems to somehow get caught in an infinite loop. Possibly blowing its stack:

#0  0x08035988 in SDIO_CommandCompletionCmd (NewState=DISABLE) at libs/STM32F4xx_DSP_StdPeriph_Lib_V1.0.1/Libraries/STM32F4xx_StdPeriph_Driver/src/stm32f4xx_sdio.c:713
#1  0x0803028c in FindSCR (pscr=0x20008fe8, rca=<optimized out>) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:2245
#2  0x08030324 in FindSCR (pscr=0x10002e3c <g_workingLoggerConfig+2712>, rca=<optimized out>) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:2282
#3  0x00000000 in ?? ()

@stieg
Copy link
Contributor

stieg commented May 6, 2016

Ok. Filewriter task was most definitely blowing its stack. Anyway that is fixed and I have come across where the code freezes up. Here is the stack:

#0  SDIO_GetFlagStatus (SDIO_FLAG=SDIO_FLAG@entry=2097152) at libs/STM32F4xx_DSP_StdPeriph_Lib_V1.0.1/Libraries/STM32F4xx_StdPeriph_Driver/src/stm32f4xx_sdio.c:884
#1  0x080302dc in FindSCR (pscr=pscr@entry=0x20009320, rca=<optimized out>) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:2264
#2  0x08030374 in SDEnWideBus (NewState=NewState@entry=ENABLE) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:1993
#3  0x08030a52 in SD_EnableWideBusOperation (WideMode=WideMode@entry=2048) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:941
#4  0x08030b2e in SD_Init () at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:378
#5  0x08030b42 in MMC_disk_initialize () at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:165
#6  0x0802ed0a in disk_initialize (pdrv=pdrv@entry=0 '\000') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/diskio.c:30
#7  0x0802d26e in find_volume (rfs=rfs@entry=0x2000939c, path=path@entry=0x20009398, wmode=wmode@entry=0 '\000') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/ff.c:2261
#8  0x0802dd26 in f_mount (fs=0x20003c70, path=path@entry=0x805d5b5 "0", opt=opt@entry=1 '\001') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/ff.c:2459
#9  0x0802b922 in InitFS () at ../../src/sdcard/sdcard.c:72
#10 0x0802689c in open_log_file (ls=0x200093c8) at ../../src/logger/fileWriter.c:273
#11 logging_sample (msg=0x200093bc, ls=0x200093c8) at ../../src/logger/fileWriter.c:360
#12 fileWriterTask (params=<optimized out>) at ../../src/logger/fileWriter.c:423
#13 0x08036d1c in vPortEnterCritical () at libs/FreeRTOSV7.6.0/FreeRTOS/Source/portable/GCC/ARM_CM4F/port.c:390

In frame 1 we are in a busy while loop waiting for an interrupt to come in. Problem is that the interrupt never comes :(. So as a result we wait for ever. Since this portion of the code lives in a critical section that can cause the whole system to become unstable. So the first thing to do is to see if we can eliminate the critical section and keep the preemptive scheduler running. Then we dive down to the root cause.

@brentpicasso
Copy link
Contributor Author

Is this causing a watchdog issue, or just blocking the filewriter?

On Fri, May 6, 2016 at 12:06 PM, Andrew Stiegmann notifications@github.com
wrote:

Ok. Filewriter task was most definitely blowing its stack. Anyway that is
fixed and I have come across where the code freezes up. Here is the stack:

#0 SDIO_GetFlagStatus (SDIO_FLAG=SDIO_FLAG@entry=2097152) at libs/STM32F4xx_DSP_StdPeriph_Lib_V1.0.1/Libraries/STM32F4xx_StdPeriph_Driver/src/stm32f4xx_sdio.c:884
#1 0x080302dc in FindSCR (pscr=pscr@entry=0x20009320, rca=) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:2264
#2 0x08030374 in SDEnWideBus (NewState=NewState@entry=ENABLE) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:1993
#3 0x08030a52 in SD_EnableWideBusOperation (WideMode=WideMode@entry=2048) at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:941
#4 0x08030b2e in SD_Init () at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:378
#5 0x08030b42 in MMC_disk_initialize () at ../../platform/mk2/hal/fat_sd_stm32/fatfs/lo_level_ub/stm32_ub_sdcard.c:165
#6 0x0802ed0a in disk_initialize (pdrv=pdrv@entry=0 '\000') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/diskio.c:30
#7 0x0802d26e in find_volume (rfs=rfs@entry=0x2000939c, path=path@entry=0x20009398, wmode=wmode@entry=0 '\000') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/ff.c:2261
#8 0x0802dd26 in f_mount (fs=0x20003c70, path=path@entry=0x805d5b5 "0", opt=opt@entry=1 '\001') at ../../platform/mk2/hal/fat_sd_stm32/fatfs/ff.c:2459
#9 0x0802b922 in InitFS () at ../../src/sdcard/sdcard.c:72
#10 0x0802689c in open_log_file (ls=0x200093c8) at ../../src/logger/fileWriter.c:273
#11 logging_sample (msg=0x200093bc, ls=0x200093c8) at ../../src/logger/fileWriter.c:360
#12 fileWriterTask (params=) at ../../src/logger/fileWriter.c:423
#13 0x08036d1c in vPortEnterCritical () at libs/FreeRTOSV7.6.0/FreeRTOS/Source/portable/GCC/ARM_CM4F/port.c:390

In frame 1 we are in a busy while loop waiting for an interrupt to come
in. Problem is that the interrupt never comes :(. So as a result we wait
for ever. Since this portion of the code lives in a critical section that
can cause the whole system to become unstable. So the first thing to do is
to see if we can eliminate the critical section and keep the preemptive
scheduler running. Then we dive down to the root cause.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#571 (comment)

Brent Picasso
Autosport Labs http://www.autosportlabs.com
Technology for Race and Street

@stieg
Copy link
Contributor

stieg commented May 6, 2016

Appears to just be blocking my filewriter... but not sure how the rest of the system continues to function given that preemptive scheduling should be disabled. Looking at the code it issues the arm cli instruction which clears the interrupt flag, effectively disabling interrupts. I'm still digging...

@stieg
Copy link
Contributor

stieg commented May 6, 2016

Well looky here... the FreeRTOS code on the GCC/ARM_CM4F platform actually does disable the interrupts when we do the portENTER_CRITICAL call :

portmacro.h

/* Critical section management. */
extern void vPortEnterCritical( void );
extern void vPortExitCritical( void );
extern unsigned long ulPortSetInterruptMask( void );
extern void vPortClearInterruptMask( unsigned long ulNewMaskValue );
#define portSET_INTERRUPT_MASK_FROM_ISR()               ulPortSetInterruptMask()
#define portCLEAR_INTERRUPT_MASK_FROM_ISR(x)    vPortClearInterruptMask(x)
#define portDISABLE_INTERRUPTS()                                ulPortSetInterruptMask()
#define portENABLE_INTERRUPTS()                                 vPortClearInterruptMask(0)
#define portENTER_CRITICAL()                                    vPortEnterCritical()
#define portEXIT_CRITICAL()                                             vPortExitCritical()

And the port.c file.

void vPortEnterCritical( void )
{
    portDISABLE_INTERRUPTS();
    uxCriticalNesting++;
    __asm volatile( "dsb" );
    __asm volatile( "isb" );
}
/*-----------------------------------------------------------*/

void vPortExitCritical( void )
{
    uxCriticalNesting--;
    if( uxCriticalNesting == 0 ) {
        portENABLE_INTERRUPTS();
    }
}

Well.. lets see if removing this critical section fixes the issue.

@stieg
Copy link
Contributor

stieg commented May 6, 2016

Issues has an improvement patch with PR #572

@stieg
Copy link
Contributor

stieg commented May 9, 2016

So the short answer to this one is that a combination of a slow/bad SD card, abuse of the start/stop logging methods, and some bugs in the code lead to this issue. PR #572 fixed the bugs in the code as best as is reasonably possible, but there are still other issues to solve (See issue #573). That won't get solved in this patch release... this bit should do well enough for now. Hence closing

@stieg stieg closed this as completed May 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants