sched/idle: disable sched when idle call nx_bringup #4026

Merged
merged 1 commit into from Jul 2, 2021

Contributor

@Donny9 Donny9 commented Jul 1, 2021

Summary

Because the idle task calls mm_malloc to create tasks, it takes
the mm semaphore. If SMP is enabled, that semaphore may be held by
another CPU, so the idle task may block on it and crash.

Change-Id: I22f0233ef6c59a1b81607d4389e68f8646c89395
Signed-off-by: Jiuzhu Dong <dongjiuzhu1@xiaomi.com>

Impact

Normal boot on SMP systems.

Testing

Manual test

@xiaoxiang781216 xiaoxiang781216 merged commit 198b85d into apache:master Jul 2, 2021
@masayuki2009
Contributor

@Donny9
Hmm, this PR causes a deadlock when booting spresense:wifi_smp.

@masayuki2009
Contributor

@Donny9 @xiaoxiang781216
spresense:rndis_smp has the same boot problem now.
So, if you have a Spresense board, you can try spresense:rndis_smp.

@Donny9
Contributor Author

Donny9 commented Jul 2, 2021

We don't have this hardware. Can you describe the deadlock in more detail? Which lock causes it? I tested this PR on an A7 SMP system, and it works normally.

@masayuki2009
Contributor

Though this is Spresense-specific, it stops in the clock initialization.

(gdb) where
#0  arm_switchcontext () at armv7-m/gnu/arm_switchcontext.S:79
#1  0x0d004be4 in nxsem_wait (sem=sem@entry=0x2d05aab4 <g_freqlockwait>) at semaphore/sem_wait.c:153
#2  0x0d004bfe in nxsem_wait_uninterruptible (sem=sem@entry=0x2d05aab4 <g_freqlockwait>) at semaphore/sem_wait.c:222
#3  0x0d0016cc in cxd56_pm_semtake (id=id@entry=0x2d05aab4 <g_freqlockwait>) at chip/cxd56_powermgr.c:177
#4  0x0d001712 in cxd56_pm_checkfreqlock () at chip/cxd56_powermgr.c:333
#5  0x0d0019ee in up_pm_acquire_freqlock (lock=0x2d059624 <g_hv_lock>) at chip/cxd56_powermgr.c:599
#6  0x0d0373be in board_clock_initialize () at board/cxd56_clock.c:52
#7  0x0d036c84 in cxd56_bringup () at board/cxd56_bringup.c:252
#8  0x00000000 in ?? ()

@Donny9
Contributor Author

Donny9 commented Jul 2, 2021

Thank you, I will check this issue.

@Donny9
Contributor Author

Donny9 commented Jul 2, 2021

@masayuki2009 The root cause of the deadlock is that the new task can't be scheduled to post the semaphore, because I added sched_lock and sched_unlock before and after nx_bringup. But if we don't lock the scheduler on SMP, the system will crash because of the memory issue. What do you think about this problem?
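
The explanation above can be illustrated with a toy model. This is only a sketch of the reasoning, not NuttX code: `toy_sched`, `toy_sched_lock`, `toy_sched_unlock`, `toy_sem_wait` and `toy_bringup` are hypothetical names invented for the example, and the model assumes a single CPU on which a newly created task can only run while the scheduler is unlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. Every name below is a hypothetical stand-in,
 * not the NuttX implementation.
 */

struct toy_sched
{
  int  lockcount;      /* > 0: scheduler locked, as after sched_lock() */
  bool helper_pending; /* A created task that will post the semaphore */
  int  semcount;       /* Semaphore that bring-up waits on */
};

static void toy_sched_lock(struct toy_sched *s)
{
  s->lockcount++;
}

static void toy_sched_unlock(struct toy_sched *s)
{
  assert(s->lockcount > 0); /* Lock/unlock calls must be balanced */
  s->lockcount--;
}

/* Returns true if the wait completes, false if the caller would wait
 * forever.
 */

static bool toy_sem_wait(struct toy_sched *s)
{
  if (s->semcount == 0 && s->helper_pending && s->lockcount == 0)
    {
      /* Scheduler unlocked: the helper task gets to run and posts. */

      s->helper_pending = false;
      s->semcount++;
    }

  if (s->semcount > 0)
    {
      s->semcount--;
      return true;
    }

  return false; /* Nobody can post: this models the reported deadlock */
}

/* Models a bring-up that starts a helper task and waits for its post. */

bool toy_bringup(bool lock_scheduler)
{
  struct toy_sched s = { 0, false, 0 };
  bool ok;

  if (lock_scheduler)
    {
      toy_sched_lock(&s);
    }

  s.helper_pending = true; /* Bring-up creates the helper task */
  ok = toy_sem_wait(&s);

  if (lock_scheduler)
    {
      toy_sched_unlock(&s);
    }

  return ok;
}
```

In this model `toy_bringup(false)` completes, while `toy_bringup(true)` reports that the wait can never be satisfied, which mirrors the hang in cxd56_pm_semtake shown in the backtrace.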

@masayuki2009
Contributor

@Donny9
Though I've never seen such a problem before, can we make this behavior optional and select it via Kconfig for your environment?

@Donny9
Contributor Author

Donny9 commented Jul 2, 2021

@masayuki2009 cxd56_bringup shouldn't be called in the idle thread, and the idle thread shouldn't wait on a semaphore. Should we fix this issue?

@masayuki2009
Contributor

@Donny9
Do you mean that we should use CONFIG_BOARD_LATE_INITIALIZE=y?

@Donny9
Contributor Author

Donny9 commented Jul 2, 2021

Yes, you can try.
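
As a rough sketch of what that suggestion could look like on the board side: with CONFIG_BOARD_LATE_INITIALIZE enabled, the board supplies a `board_late_initialize()` hook, and the bring-up would be called from there so that it runs in a context that is allowed to block. The `cxd56_bringup` body and the `g_bringup_done` flag below are stand-ins written for this example, the `#define` exists only so the snippet compiles on its own, and the actual Spresense board code may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: defined here only so the sketch compiles on its own. In a
 * real build this symbol comes from the board configuration.
 */

#define CONFIG_BOARD_LATE_INITIALIZE 1

static bool g_bringup_done; /* Hypothetical flag, for illustration only */

/* Stand-in for the board bring-up seen in the backtrace. The real one may
 * wait on semaphores, which is why it needs a context that can block.
 */

static int cxd56_bringup(void)
{
  g_bringup_done = true;
  return 0;
}

#ifdef CONFIG_BOARD_LATE_INITIALIZE
/* Board hook. With this option NuttX is expected to call it from an
 * initialization thread rather than from the idle thread.
 */

void board_late_initialize(void)
{
  int ret = cxd56_bringup();

  assert(ret == 0); /* The sketch treats a bring-up failure as fatal */
  (void)ret;
}
#endif
```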

@xiaoxiang781216
Contributor

xiaoxiang781216 commented Jul 2, 2021

This race condition is hard to hit, but it will bite you eventually. Our SMP system ran for more than half a year without any issue, then suddenly stopped booting. So a Kconfig option isn't a good candidate here.

@masayuki2009
Contributor

@xiaoxiang781216
Thanks for the information.
I'll try CONFIG_BOARD_LATE_INITIALIZE=y
