Potential shoot-through condition #187

Open
tobbeanton opened this issue Mar 12, 2024 · 30 comments

Comments

@tobbeanton

Describe the issue

We are testing Bluejay for our upcoming Crazyflie 2.1 - brushless. As part of this we run autonomous flight tests over and over again; we jokingly call it the infinite flight test. We noticed that it sometimes resets in mid-air, and we immediately suspected the ESC. We built a small test rig where we can measure the MOSFET signals as well as the battery voltage, and we cycle the PWM between 10% and 100% every 300 ms. This way we managed to capture the voltage dip and found an H-bridge shoot-through condition. With this test setup it usually happens within a minute.
image
As can be seen in the image, the Bc and Bp MOSFET signals are both on for a short period of time, causing the shoot-through. This happens in the transition from braking to accelerating, where it looks like Bp is one PWM cycle late (or Bc early). I'm pretty sure this happens for the other phases as well, but I don't have a capture of it.

We tried BLHeli_S 16.7, on which we could not detect the shoot-through condition.

The full capture is attached and can be viewed using Saleae Logic.
Shot-through-mosfet-channels.zip
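
For anyone who wants to scan an exported capture automatically rather than by eye, here is a minimal Python sketch of the overlap check. It assumes the capture has been exported as (time, Bp, Bc) samples and that both gate signals are active-high. The name `find_overlaps` and the sample layout are illustrative only and are not part of any Saleae or Bluejay tooling.

```python
from typing import Iterable, List, Tuple

# Assumed sample layout: (time in seconds, Bp level, Bc level), active-high.
Sample = Tuple[float, int, int]


def find_overlaps(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Return (start, end) intervals during which both gate signals are on."""
    overlaps = []
    start = None
    last_t = None
    for t, bp, bc in samples:
        both_on = bool(bp) and bool(bc)
        if both_on and start is None:
            start = t                      # overlap begins
        elif not both_on and start is not None:
            overlaps.append((start, t))    # overlap ends
            start = None
        last_t = t
    if start is not None:                  # capture ended while both were on
        overlaps.append((start, last_t))
    return overlaps


# Hypothetical samples, not taken from the attached capture.
capture = [
    (0.0e-6, 1, 0),
    (1.0e-6, 0, 0),  # dead time, both off
    (2.0e-6, 0, 1),
    (3.0e-6, 1, 1),  # both on: potential shoot-through
    (4.0e-6, 1, 0),
]
print(find_overlaps(capture))
```

If the high-side gate signal is inverted on a given board, the `bool(bp)` test would need to be flipped accordingly.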

Bluejay version

0.19.2 & 0.20.1-RC2

ESC variant

O_H_10

PWM frequency

48

DShot bitrate

300

Bidirectional DShot

Off

FC firmware

Crazyflie 2024.2

Motor size

08028

Configurator debug log

No response

@stylesuxx
Contributor

Interesting, thank you for the detailed report. Did you by any chance try the 24 kHz PWM setting too?
Also, have you tried increasing dead-time to 15?

cc @damosvil - do you have any input on this, or other things you want to see tested?

@stylesuxx added the bug and investigate labels and removed the bug label on Mar 12, 2024
@tobbeanton
Author

We recently tried 24 kHz PWM and found the same issue, but now on phase C.
image

And here are the settings we used for this capture:
image

@stylesuxx
Contributor

stylesuxx commented Mar 12, 2024

> Also have you tried increasing dead-time to 15?

Just in case you missed it.

@tobbeanton
Author

Given how rarely this appears, I'm guessing it is a timing issue, probably some interrupt that triggers at just the right time, causing the PWM register updates to be slightly separated in time.

@tobbeanton
Author

> Just in case you missed it.

I did not try dead-time 15. It would be strange if this was the cause, but better safe than sorry. Will try tomorrow.

@damosvil
Contributor

damosvil commented Mar 12, 2024

This seems to be a problem related to setting the PCA registers, but checking the BLHeli_S and Bluejay source code, it seems they both set those registers in the same place:

https://github.com/bitdump/BLHeli/blob/ef8c1a0b644c228f07a82f3d25e6d581492eaacf/BLHeli_S%20SiLabs/BLHeli_S.asm#L1492

IF PWM_BITS_H != 0

It seems that on some occasions Xp starts working a PWM cycle before Xc (the updates of Xp and Xc are not synchronized to the PWM cycle), which agrees with the code. What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same thing.

Have you found any pattern to reproduce this issue? How frequent is it on your hardware?
Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized firmware to do this, let us know.
Could you check if you can also reproduce this issue with Bluejay 0.16?
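
The unsynchronized-update hypothesis above can be illustrated with a small Python model. This is only a sketch under stated assumptions: each channel has an auto-reload value that takes effect at the next counter overflow, and the firmware writes the two channels at slightly different times. The names `latched_values` and `is_torn` and all numbers are hypothetical and do not come from the Bluejay source or the EFM8 manual.

```python
def latched_values(old, new, write_times, overflow_time):
    """Values each channel uses in the cycle that starts at overflow_time.

    Assumption: a channel picks up its new value only if its auto-reload
    write completed before the counter overflowed.
    """
    return tuple(
        n if t < overflow_time else o
        for o, n, t in zip(old, new, write_times)
    )


def is_torn(old, new, write_times, overflow_time):
    """True if one channel runs on the new value and the other on the old."""
    latched = latched_values(old, new, write_times, overflow_time)
    return latched != tuple(old) and latched != tuple(new)


OLD = (100, 120)  # hypothetical (power, damp) compare values
NEW = (900, 920)

# Both writes land well before the overflow: the pair stays consistent.
print(is_torn(OLD, NEW, write_times=(400, 415), overflow_time=1000))

# The overflow falls between the two writes: one cycle uses a mixed pair.
print(is_torn(OLD, NEW, write_times=(990, 1005), overflow_time=1000))
```

In this model the mixed pair lasts exactly one cycle, which is consistent with the one-cycle offset described in the captures, although the model says nothing about which channel leads on real hardware.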

@tobbeanton
Author

tobbeanton commented Mar 13, 2024

> What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same thing.

Let us try longer and perhaps we can replicate it in BLHeli_S too.

> Could you check if you can also reproduce this issue with Bluejay 0.16?

Yes, we could reproduce it, and this time it happened in the middle of braking, not in the change from braking to accelerating.
image

> Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized firmware to do this, let us know.

We will give it a try

@damosvil
Contributor

damosvil commented Mar 13, 2024

I have been talking with Alka (the creator of AM32), and he suggests that this might be a problem related to not using a gate driver like the FD6288, which implements shoot-through prevention. He also said that ARM MCUs generate the complementary PWM in hardware, so it seems to be an issue with tinywhoop hardware in general that uses EFM8BBx MCUs.
image

@damosvil
Contributor

damosvil commented Mar 13, 2024

What we can do for the next version is to skip updating the PCA registers when the PCA counter is about to expire. This way Xp and Xc will be updated in sync with the PCA cycle. This would fix the issue, and I think it would not hurt performance noticeably.
Another option, though only a mitigation, would be to update the low parts of the power and damp registers first and then update the high parts together, so the issue would probably happen about 50% less often without hurting performance.
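
Both ideas can be sketched in a few lines of Python. The first function models the guard ("do not start the writes if the counter is about to wrap"). The second estimates how often an overflow would land inside the write window, assuming writes start at a uniformly random point in the PWM cycle. The function names, the up-counting counter, and the tick counts are all assumptions for illustration, not values from the firmware.

```python
def safe_to_write(counter: int, period: int, guard_ticks: int) -> bool:
    """True if more than guard_ticks remain before the counter wraps.

    Assumption: the counter counts up from 0 to period - 1 and the pair of
    register writes takes at most guard_ticks to complete.
    """
    return (period - counter) > guard_ticks


def torn_probability(write_window_ticks: int, period: int) -> float:
    """Chance that an overflow falls inside the write window.

    Assumption: the start of the writes is uniformly distributed over the
    PWM cycle.
    """
    return min(1.0, write_window_ticks / period)


PERIOD = 1024  # hypothetical counter period in ticks

print(safe_to_write(counter=100, period=PERIOD, guard_ticks=30))   # plenty of room
print(safe_to_write(counter=1000, period=PERIOD, guard_ticks=30))  # too close to wrap

# Halving the critical window halves the estimated chance of a mixed cycle.
print(torn_probability(20, PERIOD), torn_probability(10, PERIOD))
```

Under these assumptions the guard removes the mixed cycle entirely, while shrinking the window only scales the probability down, which matches the distinction between a fix and a mitigation made above.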

@tobbeanton
Author

To me, checking whether the PCA counter is about to expire sounds like the right way to do it. Since the auto-reload registers are used, there is already a "performance" hit, because it can take almost a full cycle before the PCA registers are updated. And I don't think there is any other safe way to do it.

It sounds a bit challenging to implement, but we are happy to test it if you know how to do it, @damosvil.
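
To put a rough number on the "almost a full cycle" latency mentioned above, here is a small worked calculation in Python. It assumes the auto-reload values are latched once per PWM period, so the worst-case wait is one period; `pwm_period_us` is just an illustrative helper.

```python
def pwm_period_us(freq_khz: float) -> float:
    """PWM period in microseconds for a frequency given in kHz."""
    return 1000.0 / freq_khz


# Worst-case wait before a new value takes effect, assuming one latch
# point per PWM period.
for freq_khz in (24, 48):
    print(f"{freq_khz} kHz: up to {pwm_period_us(freq_khz):.1f} us")
```

At the two PWM settings tested in this thread, that is on the order of tens of microseconds.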

@tobbeanton
Author

Another thing I was thinking about is why we are not able to replicate it in BLHeli_S 16.7. It could just be a coincidence that we have not managed to catch it, but we have tested for ~20 min, and for Bluejay it usually happens within 1 min. Could this be related to an interrupt rather than the auto-reload registers?
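
A quick way to judge the "coincidence" question is to ask how likely a clean 20-minute run would be if BLHeli_S failed at the same rate as Bluejay. The sketch below assumes failures behave like a Poisson process and reads "usually within 1 min" as roughly one event per minute. Both are rough assumptions made for illustration, not measured values.

```python
import math


def prob_no_events(rate_per_min: float, minutes: float) -> float:
    """Probability of zero events in the interval for a Poisson process."""
    return math.exp(-rate_per_min * minutes)


# Assumed rate of about one event per minute, as read from the Bluejay tests.
print(prob_no_events(1.0, 20.0))

# Even with a much more conservative assumed rate of one event per
# five minutes, a clean 20-minute run would be unlikely.
print(prob_no_events(0.2, 20.0))
```

Under these assumptions a clean 20-minute run is very unlikely to be pure chance, which supports looking for a real difference between the two firmwares.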

@damosvil
Contributor

> To me, checking whether the PCA counter is about to expire sounds like the right way to do it. Since the auto-reload registers are used, there is already a "performance" hit, because it can take almost a full cycle before the PCA registers are updated. And I don't think there is any other safe way to do it.
>
> It sounds a bit challenging to implement, but we are happy to test it if you know how to do it, @damosvil.

OK, I will try a modification and let you know.

@damosvil
Contributor

> Another thing I was thinking about is why we are not able to replicate it in BLHeli_S 16.7. It could just be a coincidence that we have not managed to catch it, but we have tested for ~20 min, and for Bluejay it usually happens within 1 min. Could this be related to an interrupt rather than the auto-reload registers?

I have checked the BLHeli_S code again and I think they do something to avoid the issue in the pca_int ISR:
https://github.com/bitdump/BLHeli/blob/ef8c1a0b644c228f07a82f3d25e6d581492eaacf/BLHeli_S%20SiLabs/BLHeli_S.asm#L1567

But I think ISRs add additional latency, so it would be better not to update the PWM registers if the PCA counter is about to expire, and to reorder the PCA register writes.

@damosvil
Contributor

I have been checking the EFM8BB2 reference manual and it seems it may not be so easy to control when the Xc and Xp registers are loaded:
image
I will check the BLHeli_S solution again.

@damosvil
Contributor

I think a valid solution would be, when a new DShot frame arrives, to store the power and damp values and enable the PCA interrupt (generated when the PCA counter is 0). In the interrupt we would set Xp and then Xc, so that when the rising edges happen both auto-reload values are loaded in the same cycle, and then disable the interrupt again. I will try to code this solution next week.
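
The deferred-update idea described above can be modelled in Python as follows. This is a behavioural sketch, not firmware: it assumes the hardware latches the auto-reload pair at each overflow, that the interrupt runs right after that latch, and that both writes finish long before the next overflow. The class and method names are hypothetical.

```python
class DeferredPwmUpdate:
    """Toy model of updating power/damp values only from the overflow interrupt."""

    def __init__(self, power: int, damp: int):
        self.active = (power, damp)   # values driving the outputs this cycle
        self.reload = (power, damp)   # auto-reload values, latched at overflow
        self.pending = None           # values stored by the frame handler
        self.irq_enabled = False

    def on_dshot_frame(self, power: int, damp: int) -> None:
        # Only store the values and arm the interrupt; no register writes here.
        self.pending = (power, damp)
        self.irq_enabled = True

    def on_overflow(self) -> None:
        # Assumed hardware behaviour: the reload pair becomes active first.
        self.active = self.reload
        # Then the interrupt handler writes both values and disarms itself.
        if self.irq_enabled:
            self.reload = self.pending
            self.pending = None
            self.irq_enabled = False


pwm = DeferredPwmUpdate(power=100, damp=120)
pwm.on_dshot_frame(power=900, damp=920)
pwm.on_overflow()
print(pwm.active)  # still the old pair for this cycle
pwm.on_overflow()
print(pwm.active)  # both new values take effect together
```

In this model the pair is never mixed, at the cost of new values taking effect up to one extra cycle later.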

@tobbeanton
Author

> I think a valid solution would be, when a new DShot frame arrives, to store the power and damp values and enable the PCA interrupt (generated when the PCA counter is 0). In the interrupt we would set Xp and then Xc, so that when the rising edges happen both auto-reload values are loaded in the same cycle, and then disable the interrupt again. I will try to code this solution next week.

Sounds good, I think this is a common way to handle it.

@tobbeanton
Author

Just checking how things are going. Is there anything we can do to help (although doing the actual fix might be above our skill level)?

@stylesuxx
Contributor

Hey, just a heads-up: we have not forgotten you. Unfortunately we are currently a bit swamped with private life and work, so things will take some time.

@tobbeanton
Author

Thanks for letting us know! It might not be the easiest fix either! Meanwhile we might try this:

> Another option, though only a mitigation, would be to update the low parts of the power and damp registers first and then update the high parts together, so the issue would probably happen about 50% less often without hurting performance.

This we could probably manage ourselves.

@stylesuxx
Contributor

@tobbeanton thank you, please let us know how it goes - if it works, we would appreciate a PR.

@hyp0dermik-code
Contributor

> > What I don't understand is why you cannot reproduce the same issue in BLHeli_S, because both codebases do the same thing.
>
> Let us try longer and perhaps we can replicate it in BLHeli_S too.
>
> > Could you check if you can also reproduce this issue with Bluejay 0.16?
>
> Yes, we could reproduce it, and this time it happened in the middle of braking, not in the change from braking to accelerating. image
>
> > Could you toggle one of the LED GPIOs before updating the PCA registers and also scope it? If you need a customized firmware to do this, let us know.
>
> We will give it a try

This capture was from 0.16, correct?

Can you confirm which version(s) the previous two captures were from?
#187 (comment)
#187 (comment)

Is the timing of the bug exactly the same for each occurrence on the same version, or is there some variation? How many samples?

What variation did you see between 0.19.2 and 0.21RC (0.20.1?)? Are you able to provide some captures from the missing version, please?

@tobbeanton
Author

Is there a fix in 0.21RC? Else the bug has more or less been fully identified...?

@hyp0dermik-code
Contributor

hyp0dermik-code commented May 18, 2024

> Is there a fix in 0.21RC? Else the bug has more or less been fully identified...?

No, I was more curious about what the difference in timing was between 0.21 and 0.19.2 (if any) at the same PWM frequency.

@tobbeanton
Author

I think the bug has been there for a long time, since the PCA switching code was changed.

@alinneacsu

alinneacsu commented Nov 1, 2024

Hello! Was the bug fixed in the latest version?

@stylesuxx
Contributor

@alinneacsu No, otherwise we would have closed the issue and mentioned it in the release notes. Are you experiencing the same issue?

@alinneacsu

alinneacsu commented Nov 1, 2024

@stylesuxx

I think the issue may be similar: my setup includes an H743-based FC running ArduCopter (bidirectional DShot enabled) and a 4-in-1 ESC running the latest Bluejay version. Rarely (so far about 1 in 50 flights), one of the motors simply turns off in flight, but based on the logs it does not look like demag/desync. I tried both the 24 kHz and 48 kHz versions, with no difference.

I have not found any way to replicate the issue in a controlled environment.

I have many logs showing this situation. I'm attaching a simple screenshot for now; the constant RPM at the end of the log indicates the moment when the motor stopped:

Screenshot 2024-11-01 at 23 28 44

I'm also logging the following EDT fields:
.SS -> EDT Stress Level (120 constantly)
.SA -> EDT Status (193, rarely goes to 1)

@stylesuxx
Contributor

Please attach full logs, so people can look through them.

@alinneacsu

Check this out: https://drive.google.com/drive/folders/1dsq6q2YpsevknT9BYpLhBDjFDhUS9mEA

@stylesuxx
Contributor

@alinneacsu can you provide some time stamps of interest for those logs please?

Also, what else have you done to troubleshoot this issue? Does it always happen with the same motor? Have you tried to change timings?

The initial issue seems to be reproducible pretty consistently, at least on this one setup. So I am not sure we are looking at the same issue here.
