I2C/Wire library: Make Wire library non-blocking #42

Open
wmarkow opened this issue Sep 17, 2018 · 9 comments

@wmarkow
commented Sep 17, 2018

This is a placeholder issue to cover an arduino/Arduino/issues/1476 improvement.

It looks like arduino/ArduinoCore-avr repository is the correct one to cover that case, doesn't it?

@wmarkow wmarkow changed the title Make Wire library non-blocking I2C/Wire library: Make Wire library non-blocking Sep 17, 2018

wmarkow added a commit to wmarkow/ArduinoCore-avr that referenced this issue Sep 17, 2018
@wmarkow

Author

commented Sep 17, 2018

I have just integrated my proposal from arduino/Arduino/issues/1476.

@matthijskooijman

Collaborator

commented Sep 6, 2019

Possible implementations:

Looking at these, I suspect that the version by @wmarkow and my own might be the best starting points, but neither completely solves the problem AFAICS.

One particular challenge I've found (though I'm not sure it is properly documented yet) is that the AVR TWI hardware is particularly sensitive to noise that makes it look like there is a second master on the bus. Since I²C is a multi-master bus, when the hardware detects an external start condition, or an arbitration error, it will assume there is another master active and hold off any bus activity until it sees a stop condition. However, when there is no other master but just noise on the bus, that stop condition will probably never arrive.

When the hardware ends up in such a state, there is no way to actually detect this state (i.e. no status bit or anything), other than detecting a timeout (i.e. detecting that the hardware hasn't finished in reasonable time, so probably never started in the first place). There is an arbitration-lost interrupt that can detect the start of this state in some cases (but not the end). The only way I've found to recover from this situation is to disable and re-enable the TWI hardware.
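
A minimal sketch of the detect-by-timeout and recover-by-reset idea described above. It is written so that it compiles on a host: `FakeTwi`, `waitForTransfer` and the poll-count timeout are hypothetical names invented for illustration, not part of the Wire library or of any of the forks discussed here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the TWI peripheral, so the logic can run off-target.
struct FakeTwi {
    bool stuck = false;      // true: hardware believes another master owns the bus
    uint32_t busyPolls = 3;  // polls needed before a healthy transfer completes
    uint32_t resets = 0;     // number of disable/enable cycles performed

    // Returns true once the (simulated) transfer has finished.
    bool done() {
        if (stuck) return false;  // a stuck peripheral never reports completion
        if (busyPolls > 0) { --busyPolls; return false; }
        return true;
    }
    void disable() { /* on AVR this would clear TWEN in TWCR */ }
    void enable()  { stuck = false; ++resets; }  // re-enabling clears the stuck state
};

enum class TwiResult { Ok, Timeout };

// Poll for completion. There is no status bit for the stuck state, so the only
// signal is that the transfer did not finish within the allowed number of polls.
TwiResult waitForTransfer(FakeTwi &twi, uint32_t maxPolls) {
    for (uint32_t i = 0; i < maxPolls; ++i) {
        if (twi.done()) return TwiResult::Ok;
    }
    // Timed out: recover by disabling and re-enabling the peripheral.
    twi.disable();
    twi.enable();
    return TwiResult::Timeout;
}
```

On real hardware the poll count would be replaced by a time-based limit, and the right value for that limit is exactly the open question of this issue.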

So adding timeouts is probably the only way to fix this. However, just having hardcoded timeouts as most of these implementations (including my own) have is problematic, because:

  • Slaves might be using clock stretching, which can erroneously trigger a timeout if it is too short.
  • In a true multi-master situation, a transaction might need to wait for another master to finish their transaction, which will likely trigger such a hardcoded timeout. Since transactions might be long, and repeated start can be used to chain multiple together, there is no real upper limit on how long the timeout should be to keep facilitating a multi-master situation.

I suspect the only really correct way to handle this is to let the sketch specify custom timeout values (or perhaps specify the max clock stretching time, whether multi-master should be supported and if so, the max transaction time of other masters). However, this requires API changes that made implementing this a lot more tricky, which is why I didn't submit my fixes for inclusion (and instead also ended up with hardcoded timeouts tailored to my specific case without multi-master and with limited clock stretching).
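
To make the proposal above more concrete, here is one possible shape for sketch-supplied timeout settings. Everything in it is an assumption for illustration: `TwiTimeoutConfig`, its fields and `effectiveTimeoutUs` do not exist in the Wire library, and treating clock stretching as a single budget per transfer is a simplification.

```cpp
#include <cstdint>

// Hypothetical settings a sketch could supply (not an existing Wire API).
struct TwiTimeoutConfig {
    uint32_t maxClockStretchUs;        // worst-case total stretching by slaves
    bool multiMaster;                  // whether other masters may hold the bus
    uint32_t maxForeignTransactionUs;  // longest transaction of any other master
};

// Derive the timeout for one transfer from the sketch's settings.
// ownTransferUs is the time the transfer needs on an idle, non-stretching bus.
uint32_t effectiveTimeoutUs(const TwiTimeoutConfig &cfg, uint32_t ownTransferUs) {
    uint32_t timeout = ownTransferUs + cfg.maxClockStretchUs;
    if (cfg.multiMaster) {
        // We may first have to wait for another master to finish.
        timeout += cfg.maxForeignTransactionUs;
    }
    return timeout;
}
```

With a single master the timeout stays tight, while a multi-master sketch pays for the extra waiting time only because it asked for it.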

@VladimirAkopyan

commented Sep 9, 2019

Custom timeout values are perfectly fine, and I believe an API change is justified. The current situation leads to a permanent lock-up of the microcontroller, which is completely unacceptable.
An inexperienced developer may know nothing about this issue, and it will manifest itself many months after the hardware has been designed, built and installed. Sometimes people do use Arduino for serious projects because that's all they can do to solve a problem.

@greyltc

commented Sep 15, 2019

These lockups are murdering me right now!
I've just tried https://github.com/IanSC/Arduino-Wire.h and I very much do not recommend it; it seems to slow bus comms to a crawl without solving the issue. I'll go on down the list...

@greyltc

commented Sep 15, 2019

I've just tried https://github.com/3devo/ArduinoLibraryWire on my I2C lockups and it doesn't solve them either :-(
Maybe I'm using it wrong though? I didn't see any changes to Wire.h so I didn't do anything different in my interface to the library.

@greyltc

commented Sep 15, 2019

https://github.com/wmarkow/Arduino/tree/issue_%231476 seems to prevent my firmware from locking up! 🎉

  • I do a Wire.setClock(400000); before I get started, but after the unlock procedure, that's forgotten and the bus runs at 100kHz
  • I'm in a loop fetching values from an ADC with great speed. I'd love to be able to set the timeout in microseconds instead of milliseconds. I can be pretty sure I'm in a lock state in my application when 100us goes by without traffic on the bus and I want to do everything I can to recover from a lock up quickly so I can miss as few ADC conversions as possible!

@greyltc

commented Sep 15, 2019

I've been looking at these lockups in my scope very closely for a few days now.
I have an idea what the root cause of them might be in my application (one master = Mega2560 rev3, one slave = TI's ADS122C04).

See 3devo/ArduinoLibraryWire#1 for my details with scope traces if you're interested. I have some ideas for changing the Wire library to prevent them in the first place, but I haven't been able to figure out how to actually implement those fix ideas in the code yet.

@wmarkow

Author

commented Sep 16, 2019

Hello @greyltc,

https://github.com/wmarkow/Arduino/tree/issue_%231476 seems to prevent my firmware from locking up! 🎉

I was just about to suggest that you check out my code. Good to hear that it works for you. However, it does not seem to cover everything. It is good that you bring up a few more cases to look into:

* I do a `Wire.setClock(400000);` before I get started, but after the unlock procedure, that's forgotten and the bus runs at 100kHz

Indeed, when a timeout condition is met, I restart the TWI hardware (twi_disable() and twi_init()). In this review it is suggested that the library should not restart TWI but return a result code instead (or set some flag indicating a timeout failure). The user can check the flag later in the main loop and reinitialize TWI in their own way (for example, by setting the clock to 400000). In my case that would help, but I'm not sure it works for you, since you need to recover from a timeout failure very quickly so that you can perform your ADC conversion.
Imagine this case: you do not know exactly in which part of your code the timeout will happen. The timeout is set to 10 ms (for example). Let's assume there are about 20 other TWI operations somewhere in the code between the timeout and your ADC conversion code. I have the feeling that all of those 20 TWI operations may end in a timeout as well (but I'm not 100% sure), so it will take at least 20 x 10 ms = 200 ms before your ADC code is executed.
In my solution I can wait those 200 ms and reinitialize TWI later in the loop. For you, the Wire library could go into a "timeout failure" state and may/should not execute any TWI operations (all API methods may/should return immediately with an appropriate result code) until you reinitialize it somewhere in the main loop. That's only a proposed solution.
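
The "timeout failure" state proposed above could be sketched roughly as follows. This is only an illustration that runs on a host: `LatchingBus`, `TwiStatus` and the `hardwareHangs` parameter are hypothetical and simulate the behaviour instead of touching real TWI registers.

```cpp
#include <cstdint>

enum class TwiStatus { Ok, Timeout, NeedsReinit };

// Hypothetical wrapper that latches a failure after the first timeout.
class LatchingBus {
public:
    // hardwareHangs simulates a locked-up peripheral for this transfer.
    TwiStatus transfer(bool hardwareHangs, uint32_t timeoutMs) {
        if (failed_) {
            return TwiStatus::NeedsReinit;  // fail fast, without waiting again
        }
        if (hardwareHangs) {
            waitedMs_ += timeoutMs;  // the single timeout that is actually paid
            failed_ = true;          // latch the failure for later calls
            return TwiStatus::Timeout;
        }
        return TwiStatus::Ok;
    }

    // Called by the sketch, e.g. from loop(), once it has noticed the failure.
    void reinit() { failed_ = false; }

    // Total time spent waiting for timeouts so far.
    uint32_t waitedMs() const { return waitedMs_; }

private:
    bool failed_ = false;
    uint32_t waitedMs_ = 0;
};
```

In the 20-operation scenario above, only the first operation waits for the 10 ms timeout. The following ones return `NeedsReinit` immediately, so the delay stays at 10 ms instead of 200 ms.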

* I'm in a loop fetching values from an ADC with great speed. I'd love to be able to set the timeout in microseconds instead of milliseconds. I can be pretty sure I'm in a lock state in my application when 100us goes by without traffic on the bus and I want to do everything I can to recover from a lock up quickly so I can miss as few ADC conversions as possible!

Yes, my code sets the timeout in milliseconds, but it should be no problem to rework it to use microseconds.
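
One detail to watch when moving to microseconds is counter wraparound. Below is a small sketch of a wrap-safe check, assuming a 32-bit microsecond counter such as the value returned by Arduino's `micros()` on AVR. The helper name `timedOut` is made up for illustration.

```cpp
#include <cstdint>

// Returns true once at least timeoutUs microseconds have elapsed since startUs.
// Unsigned subtraction gives the right elapsed time even if the counter
// wrapped around between startUs and nowUs.
bool timedOut(uint32_t startUs, uint32_t nowUs, uint32_t timeoutUs) {
    return static_cast<uint32_t>(nowUs - startUs) >= timeoutUs;
}
```

In a real wait loop, `startUs` would be captured once before polling and `nowUs` re-read on every iteration.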

@greyltc

commented Sep 16, 2019

@wmarkow, I took your https://github.com/wmarkow/Arduino/tree/issue_%231476 branch, changed the timeout argument to microseconds, and made sure that any changes the user made to the slave address or bitrate (the only two register values exposed by the Wire library) are re-applied after the reset. I put it in PR #107.
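
The idea of re-applying user settings after a reset can be illustrated with a small host-side sketch. It is not the code from the PR: `FakeTwiRegs`, `RestoringTwi` and `resetAfterTimeout` are hypothetical names, and the 100 kHz default is taken from the behaviour reported earlier in this thread.

```cpp
#include <cstdint>

// Hypothetical model of the two user-visible settings, with reset defaults.
struct FakeTwiRegs {
    uint32_t clockHz = 100000;  // bus runs at 100 kHz after a fresh init
    uint8_t address = 0;        // no slave address configured
};

class RestoringTwi {
public:
    // Remember each setting so it can be re-applied after a reset.
    void setClock(uint32_t hz)  { savedClockHz_ = hz;  regs_.clockHz = hz; }
    void setAddress(uint8_t a)  { savedAddress_ = a;   regs_.address = a; }

    // Reset the (simulated) hardware, then restore what the user configured.
    void resetAfterTimeout() {
        regs_ = FakeTwiRegs{};           // disable + init: back to defaults
        regs_.clockHz = savedClockHz_;   // re-apply the user's bitrate
        regs_.address = savedAddress_;   // re-apply the user's slave address
    }

    const FakeTwiRegs &regs() const { return regs_; }

private:
    FakeTwiRegs regs_;
    uint32_t savedClockHz_ = 100000;
    uint8_t savedAddress_ = 0;
};
```

With this approach a sketch that called `setClock(400000)` before the lock-up keeps running at 400 kHz after recovery.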
