Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C/Wire library: Make Wire library non-blocking #42

Closed
wmarkow opened this issue Sep 17, 2018 · 92 comments
Closed

I2C/Wire library: Make Wire library non-blocking #42

wmarkow opened this issue Sep 17, 2018 · 92 comments

Comments

@wmarkow
Copy link

wmarkow commented Sep 17, 2018

This is a placeholder issue to cover an arduino/Arduino/issues/1476 improvement.

It looks like arduino/ArduinoCore-avr repository is the correct one to cover that case, doesn't it?

@wmarkow wmarkow changed the title Make Wire library non-blocking I2C/Wire library: Make Wire library non-blocking Sep 17, 2018
wmarkow added a commit to wmarkow/ArduinoCore-avr that referenced this issue Sep 17, 2018
@wmarkow
Copy link
Author

wmarkow commented Sep 17, 2018

I have just integrated my proposal from arduino/Arduino/issues/1476.

@matthijskooijman
Copy link
Collaborator

Possible implementations:

Looking at these, I suspect that the version by @wmarkow and my own might be the best starting points, but neither completely solve the problem AFAICS.

One particular challenge I've found (but I'm not sure if it is really properly documented yet), is that the AVR TWI hardware is particularly sensitive to noise that makes it look like there is a second master on the bus. Since I²C is a multi-master bus, when the hardware detects an external start condition, or an arbitration error, it will assume there is another master active and hold off any bus activity until it sees a stop condition. However, when there is no other master but just noise on the bus, this will probably never happen.

When the hardware ends up in such a state, there is no way to actually detect this state (i.e. no status bit or anything), other than detecting a timeout (i.e. detecting that the hardware hasn't finished in reasonable time, so probably never started in the first place). There is a arbitration lost interrupt that can detect the start of this state in some cases (but not the end). The only way I've found to recover from this situation is to disable and re-enable the TWI hardware.

So adding timeouts is probably the only way to fix this. However, just having hardcoded timeouts as most of these implementations (including my own) have is problematic, because:

  • Slaves might be using clock stretching, which can erronously trigger a timeout if it is too short.
  • In a true multi-master situation, a transaction might need to wait for another master to finish their transaction, which will likely trigger such a hardcoded timeout. Since transactions might be long, and repeated start can be used to chain multiple together, there is no real upper limit on how long the timeout should be to keep facilitating a multi-master situation.

I suspect the only really correct way to handle this is to let the sketch specify custom timeout values (or perhaps specify the max clock stretching time, whether multi-master should be supported and if so, the max transaction time of other masters). However, this requires API changes that made implementing this a lot more tricky, which is why I didn't submit my fixes for inclusion (and instead also ended up with hardcoded timeouts tailored to my specific case without multi-master and with limited clock stretching).

@VladimirAkopyan
Copy link

Custom timeout values are perfectly fine, and I believe API change is justified. Current situation leads to permanent lock-up of the microcontroller, and is completely unacceptable.
An inexperienced developer may know nothing about this issue and it will manifest itself many months after hardware has been designed, built and installed . Sometimes people do use Arduino for serious project because that's all they can do to solve a problem.

@greyltc
Copy link
Contributor

greyltc commented Sep 15, 2019

These lockups are murdering me right now!
I've just tried https://github.com/IanSC/Arduino-Wire.h and I very much do not recommend it, seems to slow bus comms to a crawl without solving the issue. I'll go on down the list...

@greyltc
Copy link
Contributor

greyltc commented Sep 15, 2019

I've just tried https://github.com/3devo/ArduinoLibraryWire on my I2C lockups and it doesn't solve them either :-(
Maybe I'm using it wrong though? I didn't see any changes to Wire.h so I didn't do anything different in my interface to the library.

@greyltc
Copy link
Contributor

greyltc commented Sep 15, 2019

https://github.com/wmarkow/Arduino/tree/issue_%231476 seems to prevent my firmware from locking up! 🎉

  • I do a Wire.setClock(400000); before I get started, but after the unlock procedure, that's forgotten and the bus runs at 100kHz
  • I'm in a loop fetching values from an ADC with great speed. I'd love to be able to set the timeout in microseconds instead of milliseconds. I can be pretty sure I'm in a lock state in my application when 100us goes by without traffic on the bus and I want to do everything I can to recover from a lock up quickly so I can miss as few ADC conversions as possible!

@greyltc
Copy link
Contributor

greyltc commented Sep 15, 2019

I've been looking at these lockups in my scope very closely for a few days now.
I have an idea what the root cause of them might be in my application (one master = Mega2560 rev3, one slave = TI's ADS122C04).

See 3devo/ArduinoLibraryWire#1 for my details with scope traces if you're interested. I have some ideas for changing the Wire library to prevent them in the first place, but I haven't been able to figure out how to actually implement those fix ideas in the code yet.

@wmarkow
Copy link
Author

wmarkow commented Sep 16, 2019

Hello @greyltc,

https://github.com/wmarkow/Arduino/tree/issue_%231476 seems to prevent my firmware from locking up! tada

I just wanted to propose you to check out my code. Good that it works for you. However not everything seems to be covered there. It is nice that you give a few more cases to take a look into:

* I do a `Wire.setClock(400000);` before I get started, but after the unlock procedure, that's forgotten and the bus runs at 100kHz

Indeed, when a timeout condition is met, then I restart the TWI hardware (twi_disable() and twi_init()). In this review is suggested that it should not restart TWI but return a result code instead (or set some flag indicating a timeout failure). The user can check the flag later in the main loop, and reinit TWI in his way (like setting the clock to 400000). In my case that would help but I'm not sure if it works for you, when you need to recover from timeout failure very fast, so you can make your ADC conversion.
Imagine the case: you do not know exactly in which part of your code the timeout will happen. The timeout is set to 10ms (for example). Lets assume there is like 20 another TWI operations somewhere in the code between the timeout and your ADC conversion code. I have the felling that all of those 20 TWI operations may end up with the timeout (but I'm not 100% sure), so it will take at least 20 x 10ms = 200ms before your ADC code will be executed.
In my solution I can wait those 200ms and reinit TWI later in loop. For you - the Wire library may go into a "timeout failure" state and may/should not execute any TWI operations (all API methods may/should return immediatelly with a correct result code) until you reinit it somewhere in main loop. That's only a proposed solution.

* I'm in a loop fetching values from an ADC with great speed. I'd love to be able to set the timeout in microseconds instead of milliseconds. I can be pretty sure I'm in a lock state in my application when 100us goes by without traffic on the bus and I want to do everything I can to recover from a lock up quickly so I can miss as few ADC conversions as possible!

Yes, my code sets the timeout in millis but it seems to be no problems to rework it into microseconds.

@greyltc
Copy link
Contributor

greyltc commented Sep 16, 2019

@wmarkow, I took your https://github.com/wmarkow/Arduino/tree/issue_%231476 branch and changed the timeout argument to microseconds and made any changes the user might have made to slave address or bitrate (the only two register values exposed by the Wire library) re-applied after the reset and put it in PR arduino/Arduino#107

@Hyperdimensionals
Copy link

Hyperdimensionals commented Feb 2, 2020

I'm using a DS3231 external clock and noticed it 'freezes' when I disconnect power to the component. I want the program to continue looping even if the clock is disconnected. I narrowed it down to a Wire.endTransmission() line in the module code, and then found this page.

Forgive my possible ignorant novice question, but which implementation listed by @matthijskooijman works best for this issue? Or is this a situation where there's some line of code I can add that will check whether the clock device is connected/powered before reaching the Wire.endTransmission() line? A lot of this discussion is over my head so I wasn't sure if this has a simpler solution than the issues discussed above?

@Mahdi-Hosseinali
Copy link

I'm using a DS3231 external clock and noticed it 'freezes' when I disconnect power to the component. I want the program to continue looping even if the clock is disconnected. I narrowed it down to a Wire.endTransmission() line in the module code, and then found this page.

Forgive my possible ignorant novice question, but which implementation listed by @matthijskooijman works best for this issue? Or is this a situation where there's some line of code I can add that will check whether the clock device is connected/powered before reaching the Wire.endTransmission() line? A lot of this discussion is over my head so I wasn't sure if this has a simpler solution than the issues discussed above?

Use wire.endTransmission(true), it would end the transmission even if there's no response from the other side. Although this still does not act like a TIME_OUT.

@Hyperdimensionals
Copy link

wire.endTransmission(true) worked for the SDA and SCL lines getting disconnected, but unfortunately when the power line is disconnected it still blocks. I'm using a battery powered clock module so the battery could die and I don't want that to become a failure point.

@Mahdi-Hosseinali
Copy link

It probably depends on your circuitry, mine works when the other device is off. The reference manual says it sends a stop signal and releases the line, but it apparently doesn't. Or it could be that something else is wrong and we are not adept enough to understand it (electronics can be very complicated)

@ermtl
Copy link

ermtl commented Mar 27, 2020

It's been 9 FUCKING YEARS since that bug was first discussed (2011), countless people pulled their hair trying to understand why their Arduino was freezing, why would it work normally, then suddenly stop, a dozen of times, it's been raised in here, dozens of times people had been told to get lost, use something else, etc ...
arduino/Arduino#7328
arduino/Arduino#2418
arduino/Arduino#1842
arduino/Arduino#5249
#42
Google shows pages after pages of people getting started and who get discouraged by this elusive bug they don't understand. This runs totally contrary of what arduino is supposed to stand for.

If the fix was difficult, if it compromised compatibility or added other problem, it would be understandable, but it's not the case, all that shit is caused by those 2 damn lines of code that obviously can create an infinite loop if for some reason the read operation does not complete :

// wait for read operation to complete
while(TWI_MRX == twi_state){
continue;
}

Yes, I know, this state is not supposed to happen according to the I2C protocol, but guess what ? electrical glitches didn't get the memo.

Countless people, after losing hours or days kind of solved the issue either by making a modified version for themselves, or switching to another library, so there are several implementations of a timeout that are simple and would easily solve the issue.

But arduino developers stubbornly refuse to fix it !

You think I'm rude ? After 9 years of giving the finger on this issue to the whole community for absolutely no reason, I couldn't care less.

How long before you close it again ?
Just be honest, and put a "fuck you all, WONTFIX" label on it ...

@VladimirAkopyan
Copy link

Basically, @ermtl is right, this is ridiculous, WTF?

@cmaglie
Copy link
Member

cmaglie commented Mar 27, 2020

You think I'm rude?

Nah I don't think it... you are, period.

Also posting the same message 5 times is not nice, especially for the 99.9% of people subscribed to this repo that are not interested in this discussion. Anyway, now you did it, so let's just skip over this.

How long before you close it again ?

We haven't closed anything... the real problem is that there isn't a one-fit-all solution. If you read the previous discussions all the proposed fix were always barely tested and tailored to the OP specific use case.

I personally tested two PR in the past that, in their promises, will add timeout "without breaking anything" but you know what? at the times when I tested in my setup (an I2C display with a bunch of env sensors) they just broke my previously perfectly working sketch or made it unreasonably slow.

Don't underestimate the complexity of this issue, it's not just "2 damn lines of code".

Said that, I ask you, did you read the previous discussions?
Which one, of the previously proposed solution, you vouch?
Have you already tested them in your setup?

@VladimirAkopyan
Copy link

VladimirAkopyan commented Mar 27, 2020

@cmaglie sure the @ermtl is rude, but his assessment is broadly accurate. I am willing to bet my house that someone is going to use Arduino to build a ventilator to deal with CORVID-19, and it will hang and kill someone.

I tried to follow the discussion, but it's spread over many threads. Let's consider a strategy here:

  1. What is the central place for this discussions, is it this thread? Is it this PR? Make Wire library error safe Arduino#2489
  2. What are the acceptance criteria, can we define a list of libraries and sensors that we are determined not to break? Are breaking changes acceptable?
  3. What is the strategy if breaks are a unavoidable? For libraries that will break, are we going to do a PR or get in touch with the author?
  4. Given that I2C protocol is often abused and misused, including by sensor manufacturers, how do we cater to those cases?
  5. Perhaps the alternative is to create a separate method / API and gradually depricate the old one? Would that be acceptable?

I think the frustration is not just from lack of a solution - that's understandable if the problem is complex. I think it stems from lack of clarity and leadership, there does not appear to be a plan on how to get it fixed. If someone wanted to contribute, is it not clear what they should do, at least to me.

@bxparks
Copy link
Contributor

bxparks commented Mar 27, 2020

@VladimirAkopyan: You would win that bet. I was amazed and mortified when one of the authors of this (https://github.com/ermtl/Open-Source-Ventilator) asked me a question about one of my libraries. Fortunately, I just noticed this in their README.md:
"0.15 Minor version change. This version allows the replacement of the buggy Wire.h arduino library (can hang the controller) with a correct version known as jm_Wire"

@ermtl
Copy link

ermtl commented Mar 28, 2020

bxparks : Well, it's a small world out there and I'm the main author of this Open Source Ventilator controller as part of an international team trying to design and assemble complete open source, easy to replicate emergency ventilators.

A few month ago I had the problem on a commercial product, and I went all the way, from adding capacitors to better filter supply electrical noise, to lowering the value of the pullup resistors to 1Ko, adding little serie resistors just in case, to re designing the wiring, to thinking that I might have had a bad lot of ICs, ordering new ones, waiting for them to arrive, seeing they were no better, adding a watchdog (but it was just an ugly fix), searching for a new similar sensor, ordering the new sensor, trying it, understanding that it could not possibly be 2 different kind of sensors that are both bad the same way, finally suspecting something could be wrong with the library and finally finding this utterly stupid bug !
I then switched to this library that simply includes timeouts:
http://dsscircuits.com/articles/arduino-i2c-master-library
It was nice, but forced me to rewrite the sensor library.
In the meantime, what I billed as a 1 week job had taken me more than 2 month (that's 1 month 3 weeks I could not even imagine billing the customer anything for) and the customer grew impatient. When I finally solved the issue, it was a sight of relief, but in the process, my customer's trust got broken, and he hired someone else to rebuild it, and it's not with an Arduino !

This is just what happened to me, and the bug was caught fast as the device was a noisy environment and it would lockup every few hours / minutes, but some are designing products that are not in an electrically noisy environment, and they might never see the bug, start selling their stuff and people place them close to motors, or old blinking fluo tubes, and suddenly, the products start to fail !

People use Arduinos as the glue between sensors and actuators, and a large proportion of them are I2C circuits. Having such a vicious bug lurking, ready to lockup the entire board undermines the whole platform and confines it to "hobby" status. Even then, when newcomers assemble their first gizmos, seeing it die on them for apparently no reason is really frustrating and might drive them away.

In the Open-Source-Ventilator project, many people must cooperate, we're in a hurry, people are trying several sensors, so telling them they can't use any of the most usual libraries without rewriting them first is totally unrealistic.

After spending a day on the problem (again), I found the jm_wire library (available in the library manager) that's compatible with Wire and the author simply made minimal changes to implement a fucking timeout, and, call that rocket science if you will, but yes, the timeout length can be changed, no less ! Crazy ... just what everyone asked for 9 years.

However, it does not entirely solve the issue as I2C sensor libraries will reference the wire library within their code with
#include <Wire.h>

When also including jm_Wire, all the library's function will be defined twice, and you can bet the compiler will yell and fail.

So the less ugly way I found is to look at the source files of each and every I2C related library searching for #include <Wire.h> and manually replacing it with:

#if __has_include("jm_Wire.h")
#include <jm_Wire.h>
#else
#include <Wire.h>
#endif

It does the trick ... until you update the library !

Since I'm working on a device that needs utmost reliability, not just a gizmo that blinks, it's ok for me to warn people about the stupid problem, and tell them how to overcome it. Here's my explanation:
https://github.com/ermtl/Open-Source-Ventilator/blob/master/OpenSourceVentilator/README.md

There is just a thing I would change in the library, and it's that you have to set jm_Wire.twi_readFrom_wait and jm_Wire.twi_writeTo_wait to true manually. If that was by default, there would be zero impact on the API, the only change would be that in case of electrical error / glitch, the automagic random self hanging of the controller would be gone. And if people want multi master (given the flaws in the current library, I bet not many successfully ever did that, it can barely work with a single sensor) they can add delays by setting twi_readFrom_timeout and twi_writeTo_timeout !


If, after 9 FUCKING YEARS, Arduino developers still refuse to understand the problem and it's impact, I don't know what will wake them up. People tried asking, alerting, complaining, begging, and every time, the pull requests have been ignored, the bug have been closed, replaced with a new one so that newcomers have no idea this is such an old problem, and that countless others before them also complained, here and elsewhere about it.

So yes, I'm rude and if cmaglie and others don't like it, they know where they can shove it up !

@bperrybap
Copy link

Guys,
This seems to have entered into a technical discussion about implementation.
Would this better handled on the mailing list rather than through a git issue?


From a larger picture, all these fast and furious incremental code updates and suggestion for updates are, IMO, hacks. i.e. coding on the fly while sitting at a keyboard, with no overall design and not enough thought behind them.
I am not in favor of using this type of methodology, particularly in a piece of code that is bundled with the IDE and used by lots of users and 3rd party libraries.


@Aleev2007

I2C speed of 0 kbit / s ?

I have no idea what you describing.

In terms of inlining the i/o register updates vs using digitalWrite()
Why the desire for the change?
Particularly when what you are proposing can cause register corruption.

In this case, using digitalWrite() is better than what you have proposed.
You need to understand that digitalWrite() ensures atomic register updates. The code you provided does not.
Without atomic register updates, this code could corrupt a register if the same register were used by and updated by another piece of code (say a library) that is modifying the same register in an ISR.
Even that other code calls digitalWrite() in the ISR, your proposed code could still corrupt the register since it does not update the register atomically.
The digitalWrite() calls in the twi code are only being used in non time critical portions of the code (initialization & error recovery) that are not in the real time path for i/o so it will have absolutely no effect on normal i/o performance so there is no need to optimize that for speed.
Because of all this, it is better to stick with using the portable pin i/o routines that ensure that registers will not be corrupted.

@ermtl
Copy link

ermtl commented Jan 11, 2021

@greyltc :
Further speedup could be obtained by adding a byte loop counter. At first, and while the loop counter has not reached 0, the while statement should execute nearly as fast as the old version without timeout. The value for the byte loop counter (here 48 @ 16MHZ, should be tweaked) must be high enough to catch most of the fast peripherals without slowdown as the counter does not reach 0 and the time consuming calculation is never done. If the delay is longer and the counter elapses, it falls back to the more precise but time consuming current timeout routine. This would create a lower limit of a few tens of microseconds to the minimum timeout value, this should not be a problem since even a short timeout should be much longer than this. If the programmed timeout is shorter than the loop counter time, the routine will exit after the first test once the counter reaches 0.

uint8_t cnt=3*clockCyclesPerMicrosecond();  // Speedup loop counter. Should be long enough for most devices to respond, but not overflow the 8 bits value on fast processors
while(TWI_READY != twi_state){
  if (cnt) {cnt--;} else {
    if((twi_timeout_us > 0ul) && ((micros() - startMicros) > twi_timeout_us)) {
      twi_handleTimeout(twi_do_reset_on_timeout);
      return 0;
    }
  }
}

@Aleev2007
Copy link

Aleev2007 commented Jan 11, 2021

@bperrybap

Even that other code calls digitalWrite() in the ISR, your proposed code could still corrupt the register since it does not update the register atomically.

You're right. It will be correct.

#include < util / atomic.h >

// declare
volatile uint8_t *out;
// activate internal pullups for twi.
ATOMIC_BLOCK (ATOMIC_FORCEON){
  out = portOutputRegister (digitalPinToPort (SDA)); 
  *out |= (digitalPinToBitMask (SDA) | digitalPinToBitMask (SCL));
}

// declare
volatile uint8_t *out;
// deactivate internal pullups for twi.
ATOMIC_BLOCK (ATOMIC_FORCEON){
  out = portOutputRegister (digitalPinToPort (SDA));
  *out &= (~digitalPinToBitMask (SDA) & ~digitalPinToBitMask (SCL));
}

I2C speed of 0 kbit / s ?

I have no idea what you describing.

I2C-bus specification and user manual
4.2.1 I2C/SMBus compliancy

SMBus and I2C protocols are basically the same: A SMBus master is able to control I2C
devices and vice versa at the protocol level. The SMBus clock is defined from 10 kHz to
100 kHz while I2C can be 0 Hz to 100 kHz, 0 Hz to 400 kHz, 0 Hz to 1 MHz and
0 Hz to 3.4 MHz,
depending on the mode. This means that an I2C-bus running at less
than 10 kHz is not SMBus compliant since the SMBus devices may time-out.
NXP Semiconductors UM10204 User manual Rev. 6 — 4 April 2014 page 33 of 64
link

I suggest solving the problem in a similar way.
Specify in the specification for the library the minimum baud rate of 1 Hz.
And hard-code it without the ability to change by the user by means of the library.

@Aleev2007
Copy link

Aleev2007 commented Jan 11, 2021

@greyltc :
Further speedup could be obtained by adding a byte loop counter. At first, and while the loop counter has not reached 0, the while statement should execute nearly as fast as the old version without timeout. The value for the byte loop counter (here 48 @ 16MHZ, should be tweaked) must be high enough to catch most of the fast peripherals without slowdown as the counter does not reach 0 and the time consuming calculation is never done. If the delay is longer and the counter elapses, it falls back to the more precise but time consuming current timeout routine. This would create a lower limit of a few tens of microseconds to the minimum timeout value, this should not be a problem since even a short timeout should be much longer than this. If the programmed timeout is shorter than the loop counter time, the routine will exit after the first test once the counter reaches 0.

uint8_t cnt=3*clockCyclesPerMicrosecond();  // Speedup loop counter. Should be long enough for most devices to respond, but not overflow the 8 bits value on fast processors
while(TWI_READY != twi_state){
  if (cnt) {cnt--;} else {
    if((twi_timeout_us > 0ul) && ((micros() - startMicros) > twi_timeout_us)) {
      twi_handleTimeout(twi_do_reset_on_timeout);
      return 0;
    }
  }
}

Obviously, this (TWI_READY != twi_state) check is done to make sure the second device is ready to receive / transmit.
If the device is not ready, then it holds the line, and the loop body starts executing.
If the loop body takes much longer than the time it took to prepare the second device, it will slow down the transfer rate.

@Aleev2007
Copy link

It seems that not everyone understands what.
To reach 400kHz speed. The driver must have time to process the incoming or prepare the outgoing byte in 40 clock cycles (16 MHz / 400 kHz = 40). This also includes the time to check the timeout.
In the latest driver implementation, with blackjack and whores, this is far from the case. :))

@greyltc
Copy link
Contributor

greyltc commented Jan 11, 2021

I think further convoluting this issue with discussion of the throughput performance of the I2C timeout implementation we've used is not such a great idea. I understand it's only still open because there's still some documentation left to do (which of course is the boring part, so that lingers).

If anyone has a specific I2C performance problem that they'd like to raise, it seems best to file a new issue to cover that. Feel free to tag me there, especially if it's related to the newish timeout stuff and I'll do my best to chime in.

If anyone has a specific idea for making an improvement to the library, you could submit that as a PR and we can discuss that specific change there. Though in making your PR, consider that API changes need to present some pretty compelling evidence to have a chance at being merged, especially backwards incompatible ones (and rightly so). Maybe less formal ideas for improvement could be hashed out on the forums or something before they're ready for a PR?

I definitely think some of the specific performance improvement ideas that we've thrown around here show promise and I'll try to find some time to do some apples to apples oscilloscope informed benchmarking on them myself and submit them as a PR if I think the numbers I see warrant that. I'd tag a few of you on such a PR so we could review that together.

@Aleev2007
Copy link

@bperrybap
You can test the speed of this
https://github.com/Aleev2007/twi
no micros(), no uint32_t, fully compatible with the current version.

@bperrybap
Copy link

@Aleev2007
The approach does appear to offer faster state transition detection but it still would need some updates to be fully compatible.
While the code may be API compatible, it is not functionally compatible.
The timeout period will vary depending on the CPU clock speed and likely depending on how the optimizer optimizes the counter variables for the counter decrements and goto operations given the 8 bit counters may or may not be in a register.
The CPU clock speed affecting the timeout could be resolved with some compile time conditionals and macros to modify the delay & counter values depending the value of F_CPU.
It could be a simple as just modifying the timeout value down in twi_setTimeoutInMicros() to scale the timeout value used based on F_CPU to set the counter values to offer the requested timeout consistently.
I would think it would be simpler and easier to maintain if it were all handled in twi_setTimeoutInMicros()
That way begin() in Wire.cpp would call twi_setTimeoutInMicros() to set the default timeout and then there is no need to pre-calculate 8 bit counter values.
However it is done, I think it is important that the API and the underlying code ensure that the timeout for a given timeout value passed to twi_setTimeoutInMicros() not be affected by the CPU clock speed.
i.e. if 25000 is passed to twi_setTimeoutInMicros() the timeout is the same regardless of CPU clock speed.

The way to work around the potential optimization issues, would be to declare the 8 bit counters volatile, so that consistent code is always generated.

Also there is still a 32 bit comparison in the runtime loop of when timeouts are enabled.
32 bit comparisons are expensive on the AVR.
i.e. this line is doing a 32 bit comparison:
if (twi_timeout_us == 0) goto check;

To speed this up an additional 8 bit flag could be used to indicate that timeouts are enabled (set to true if twi_timeout_us is non zero.
or since twi_timeout_us is not used for anything other than this test after twi_setTimeoutInMicros()
just get rid of it and have simple 8 bit flag to indicate that timeouts are set. It would be set if twi_setTimeoutInMicros() is passed a non zero timeout.
That flag could be used in the loop and tested to see if the 3 counters need to be loaded/initialized.

@Aleev2007
Copy link

@bperrybap
You are absolutely right. I know that you need to add the twi_setTimeoutInMicros () function.
And at the expense of the line
if (twi_timeout_us == 0) goto check;
it is probably not needed at all, even if there is a flag.
I don't see any scenarios where you need to turn off the timeout.
You can just make it even bigger.
And without this line it will work faster.
two "if" operators to disable three "=" operators
I need to look at the generated code to determine which is better.
I'll try to do it tomorrow.

@Aleev2007
Copy link

@bperrybap
I think you can try to measure the speed.
https://github.com/Aleev2007/twi

timeout switch now uses only 10 MSU ticks ;-)

if (!twi_timeout_off_flag){                                                   		  |ticks
 19c:	80 91 02 01 	lds	r24, 0x0102	; 0x800102 <twi_timeout_off_flag>         | 2
 1a0:	81 11       	cpse	r24, r1                                             	  | 1  timeout off | 2 timeout on
 1a2:	09 c0       	rjmp	.+18     	; 0x1b6 <twi_stop+0x2c>                   | 2  goto check; | 0 timeout on     [off +5] 
    counter_1 = set_1;                                                        	          |
 1a4:	80 91 b7 01 	lds	r24, 0x01B7	; 0x8001b7 <set_1>                        | 2                     
 1a8:	89 83       	std	Y+1, r24	; 0x01                                    | 2 
    counter_2 = set_2;                                                                    |
 1aa:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <set_2>                        | 2
 1ae:	8a 83       	std	Y+2, r24	; 0x02                                    | 2
    counter_3 = set_3;                                                                    |
 1b0:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <__data_start>                 | 2
 1b4:	8b 83       	std	Y+3, r24	; 0x03                                    | 2     			      [on +16] 
  }                                                                                       | 
check:                                                                                    |                                        
  if (!(TWCR & _BV(TWSTO))) goto go;                                                      | 
 1b6:	80 91 bc 00 	lds	r24, 0x00BC	; 0x8000bc <__DATA_REGION_ORIGIN__+0x5c>  | 2
 1ba:	84 ff       	sbrs	r24, 4                                                    | 1  goto go;    | 2  wait
 1bc:	1c c0       	rjmp	.+56     	; 0x1f6 <twi_stop+0x6c>                   | 2  goto go;    
  if (twi_timeout_off_flag) goto check;                                                   |
 1be:	80 91 02 01 	lds	r24, 0x0102	; 0x800102 <twi_timeout_off_flag>         | 2
 1c2:	81 11       	cpse	r24, r1                                                   | 1  timeout off | 2 timeout on
 1c4:	f8 cf       	rjmp	.-16     	; 0x1b6 <twi_stop+0x2c>                   | 2  goto check; | 0 timeout on     [off +10]  
  if (!(--counter_1)) goto check;                                                         |
 1c6:	89 81       	ldd	r24, Y+1	; 0x01                                    | 2
 1c8:	81 50       	subi	r24, 0x01	; 1                                       | 1
 1ca:	89 83       	std	Y+1, r24	; 0x01                                    | 2
 1cc:	88 23       	and	r24, r24                                                  | 1
 1ce:	99 f3       	breq	.-26     	; 0x1b6 <twi_stop+0x2c>                   | 2  goto check; | 1 counter = 0    [on +28] [1 loop 16 ticks]
  if (!(--counter_2)) goto check;                                                         | 
 1d0:	8a 81       	ldd	r24, Y+2	; 0x02                                    | 2
 1d2:	81 50       	subi	r24, 0x01	; 1                                       | 1 
 1d4:	8a 83       	std	Y+2, r24	; 0x02                                    | 2
 1d6:	88 23       	and	r24, r24                                                  | 1
 1d8:	71 f3       	breq	.-36     	; 0x1b6 <twi_stop+0x2c>                   | 2  goto check; | 1 counter = 0      
  if (!(--counter_3)) goto check;                                                         |
 1da:	8b 81       	ldd	r24, Y+3	; 0x03                                    | 2
 1dc:	81 50       	subi	r24, 0x01	; 1                                       | 1
 1de:	8b 83       	std	Y+3, r24	; 0x03                                    | 2
 1e0:	88 23       	and	r24, r24                                                  | 1
 1e2:	49 f3       	breq	.-46     	; 0x1b6 <twi_stop+0x2c>                   | 2  goto check; | 1 timeout                 

@bperrybap
Copy link

Coming back to the orignal topic of eliminating the potential of a loop lockup in the Wire library,

Having looked at the 1.8.13 "timeout" code now in a bit more detail,
IMO, I think the current code was put together a bit too hurried and didn't get enough design input/review/thought or testing by technical users that understand things at this level as well as API portability issues before it was released in the 1.8.13 IDE.
As an example, one big issue was that as the code was released in 1.8.13, it extended the WIRE API in a non detectable way, with timeouts disabled, which means libraries/sketches that wanted timeouts had to hard code things for this specific library and would then break when used with the older versions of the Wire library, including on the very same platform.

I think it would be useful for everyone to step back and think about the loop lockup issue and look at the lockup/timeout problem with a fresh set of eyes.

IMO, the current implementation is a bit too heavy so it impacts i/o performance a bit too much.
The other thing I think that is worth REALLY thinking about is how to actually handle the loop timeouts.
IMO, 1 us timeout resolution is not needed and is total overkill.
Can 1 us even accurately ever truly be supported, particularly on slower processors?
I think for a configurable timeout, 1ms would be more than good enough.
This "timeout" is a failsafe to avoid a lockup loop in the code.
It is something that normally should not ever happen.
The length of the timeout in no way affects the i/o performance, so it isn't something that needs to be tuned.
However, how the loop timeout code is implemented does affect i/o performance.
In fact how much configurability of the timeout is really needed?
There is no spec for how long to wait for these events,
but there are certain minimums that are required for some of these, like when multiple masters are used, that are currently not enforced.
How will a typical Arduino user ever know what to pick? They won't.
The vast majority will just run with whatever the default timeout is.
If at some point a timeout is enabled by default, the vast majority of users will never even mess with setting timeout values.

Would a simple hard coded non configurable timeout, on the order of say 1/2 or 1 second, be good enough?
IMO, it likely would be good enough.

Since the WIRE library and WIRE API is somewhat a universal API implemented across nearly all platforms, running on processors at various speeds with various resources, the timeout functionality and any new WIRE API functions should carefully consider their impact on other platforms or slower processors.
For example, what about a slower processor or a software Wire implementation?
Those will likely never be able to support accurate 1us timeout resolution.
Also, other platforms may not have the loop lockup situation like the AVR WIRE library did.

In other words would it be better in the grand scheme of things to have a simpler implementation that could be adopted more widely? vs trying to make something that has such granular configurability?

I bring this up because if it becomes desirable to make some significant changes to how loop timeouts are handled to simplify things or offer better performance, now would be the time to do it, before the existing 1.8.13 "timeout" code and API starts to get documented and gets too many dependencies.


Anybody that is working on this type of code and making these types of changes should be capable of testing their stuff themselves first before they make it available to others.
Like said earlier I absolute HATE trial & error fly by the seat of the pants keyboard hacking and, IMO, that is what I see going on.

@Aleev2007 You have made significant changes to the low level twi.c code in places that are not related to this loop timeout. Also, nearly all the comments have been stripped out.
The code is now broken.

There is no need to be making these types of changes, particularly when 1) they are in places that will not affect the runtime i/o performance, and 2) breaks things.
The latest code I saw at 2020-01-12 13:00 CST commit ( 80cf430 ) on your github repository was not stable enough to run performance numbers.
After calling Wire.setWireTimeout() to set the default timeout, every read and write times out.
If you call Wire.setWireTimeout(0) after that to disable the timeouts, the code locks up on the next attempt to do a write

The twi.c code also has several warnings in it and I believe that one is indicating a calculation error due to a math/expression precedence order.

@ermtl
Copy link

ermtl commented Jan 12, 2021

@bperrybap Even if you have no use for a timeout faster than 500ms or 1 second, it's still useful for others.
Most sensors respond almost immediately, and a few hundred microsecond delay means it will timeout. Knowing about it ASAP can be important (to stop a motor, shut off a pneumatic valve, etc ...) and one second later could be too late. Also, some applications require compatibility with the SMBus specification (25msec). However, as you mention, a 1us granularity is not really useful. If a bigger granularity is used, the stored timeout value could be 16 bits, that's much faster to test and uses 2 RAM bytes instead of 4. With the fast test, the need for a separate boolean value also goes away, freeing another RAM byte.
As an example, if the granularity is 64uS, a 16 bit value would allow a timeout between 64us and 4.19 seconds + unlimited (no timeout) if the value is 0.
The changes proposed by @greyltc and myself practically remove the slowdown for fast responding devices without requiring extensive rewrite of the existing code.
Adding this timeout feature have been so laborious and took so long, doing it all over again and changing the API again instead of fixing the speed issue does not seem being the best way forward.

@bperrybap
Copy link

For me personally, none of this really matters as I don't have anything that requires and demands maximum i/o throughput or the ability to set very short state change timeouts.

I do think of utmost critical importance is that there needs to be some kind of macro that user code can test at compile time so the user code call tell if this new loop abort/timeout capability exists, especially if it is not enabled by default which means the user code do something to enable the timeouts.
The WIRE library code in 1.8.13 failed to do this.

Whenever you guys have something that works, I'll be happy to try it.

In terms of performance, as of right now, from what I've seen, the approach that @Aleev2007 has gone down or something similar to it (i.e. not using micros()), can reduce state change latency and/or state change loop timeouts, especially when loop timeouts are enabled, more than an approach that uses micros()
It also demonstrates than it is still possible to be compatible with the current 1.8.13 setWireTimeout() API without using micros() which is expensive due to all the 32 bit values and runtime math involved inside the micros() function.

@Aleev2007
Copy link

@bperrybap

I probably got tired yesterday, from many attempts, to force the compiler to make code in which I can calculate the delay in the loop.
And that's why it broke the logic of twi .c work. Sorry. :)

I decided to go over from the beginning and have now made minimal changes to the old version 1.8.12 of the library.
I don’t have the test equipment right now, but everything is going without errors.
https://github.com/Aleev2007/twi
If it's not difficult for you, I would like to know how much slower it has become in comparison with the unmodified version 1.8.12.
If it's acceptable. So I have ideas on how to make it compatible with 1.8.13 and fast.

And I haven't looked at ISR yet, I also have ideas about it.
Last time I was mistaken in saying that the TWI module spends 40 cycles to transfer one byte. I forgot that the connection is serial and needs to be multiplied by 9. So it is quite possible to make everything fly even at 400 Hz.

@bperrybap
Copy link

@Aleev2007
The code is MUCH slower. (obviously broken)
1.8.12 is 77us writes, 118us reads
this code is 189,279us writes and 126,041us reads
After doing reads, writes change to 63,028us
Obviously the code is not working as intended.


There is too much keyboard hacking going on for this project by multiple parties.
I believe in a traditional methodical design & development process.
I'm done testing code that isn't handled that way.

matthijskooijman added a commit to matthijskooijman/ArduinoCore-avr that referenced this issue Apr 23, 2021
In commit deea929 (Introduce non compulsory Wire timeout), some
timeout-related functions were added. To allow writing portable
sketches, it is important for those sketch to know whether they are
using a Wire library version that is new enough to offer these functions
without having to rely on version number checking (since other Arduino
cores have their own versioning schemes).

This adds a WIRE_HAS_TIMEOUT macro, similar to the existing WIRE_HAS_END
macro, to facilitate that.

This relates to arduino#42.
@Wuntenn
Copy link

Wuntenn commented May 14, 2021

I had the same error, I'm a newb too and haven't used cpp for a while.

I'm the kinda newb that hits every bug and had a similar error when I used interrupts a few weeks ago. I found out somewhere that if you place a lot of code in the vector code blocks that it can cause the code to hang.

I don't remember where I came across the advice but the recommended approach was to set a flag only in the vector and to use the loop to poll for the flag and carry out the necessary changes.

Reading through the datasheet for the UNO, it says that the TWI is an interrupt based system. I used this approach and it overcame my issue. Thought it might be useful for someone here! 🤷‍♂️

@ghost
Copy link

ghost commented Oct 23, 2021

How about int16 for the timeout, default +25ms (and enabled, use millis()), use < 0 for advanced use i.e. micros() resolution?
The implementation has skewed a bit away from being 'Arduino', more niche. Keep it simple and consistent for the beginners.

@mattborja
Copy link

mattborja commented Apr 13, 2022

I just need a way to skip/abandon a line of code hanging after a short period of time...

I honestly don't have this fully spec-ed out yet, but maybe we could provide the community with an idempotent interrupt implementation pattern to externally manage a hanging Wire.endTransmission(). It would involve the use of a LM555 timer to trigger up to 2 interrupts after a short period of time on startup.

The execution surrounding the interrupt(s) may include:

  • T1

    • Attempting a blocking Wire check (i.e. Wire.beginTransmission(DEV_ADDR); Wire.endTransmission();)
    • Changing a volatile bool WIRE_READY to true (if reached)
  • T2

    • Invoke detachInterrupt() to effectively "kill" interrupt "thread" T1
    • Disabling the LM555 timer to prevent subsequent interrupts (or also use detachInterrupt() here)
  • Main

    • Evaluate IC connection via WIRE_READY state

EDIT: The following also seems to work using https://github.com/rambo/I2C

#define DEV_ADDR 0x50
#define DEV_TIMEOUT_MS 3000

void WireCheck() {
  Serial.print("Checking wire status...");

  I2c.timeOut(DEV_TIMEOUT_MS);

  I2c._start();
  I2c._sendAddress(DEV_ADDR);
  byte wireStatus = I2c._stop();

  switch (wireStatus) {
    case 0:
      Serial.println("WIRE_OK");
      return;
      
    case 1:
      Serial.println("WIRE_TIMEOUT");
      return;

    case 0xff:
      Serial.println("WIRE_ERR");
      return;
  }
}

Might have a bug in here, but it is at least timing something out...

@matthijskooijman
Copy link
Collaborator

@mattborja, this timeout feature has actually been implemented (two years ago) in arduino/Arduino#107, but it is not enabled by default. Did you try turning it on using https://www.arduino.cc/reference/en/language/functions/communication/wire/setwiretimeout/ ?

Since we seem to have forgotten closing this issue, I'll do that now. For future reference, arduino/Arduino#107 implemented the feature, arduino/Arduino#362 is still pending to add feature detection macros like WIRE_HAS_TIMEOUT, and arduino/Arduino#363 is a draft to enable timeouts by default in the future.

@mattborja
Copy link

mattborja commented Apr 13, 2022

@matthijskooijman I hadn't come across that yet, but will give it a shot. I do prefer native libraries whenever possible.

Thanks!

@mattborja
Copy link

@matthijskooijman Yeah, that seems to be working great.

Aside: Extra "s" in heading Parameters here: https://www.arduino.cc/reference/en/language/functions/communication/wire/endtransmission/#_parameterss

Thanks again!

@per1234
Copy link
Contributor

per1234 commented Apr 13, 2022

Extra "s" in heading Parameters here:

Thanks for reporting that @mattborja. I see the "syntaxx" section heading suffers from the same error.

Please submit a correction via a pull request here:
https://github.com/arduino/reference-en/blob/master/Language/Functions/Communication/Wire/endTransmission.adoc

@mattborja
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests