Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C Slave race condition leads to handler being out of sync #17

Open
agdl opened this issue Feb 8, 2017 · 4 comments
Open

I2C Slave race condition leads to handler being out of sync #17

agdl opened this issue Feb 8, 2017 · 4 comments

Comments

@agdl
Copy link
Member

agdl commented Feb 8, 2017

From @shahokun on August 18, 2015 20:42

We have an Arduino Due acting as the slave device on an I2C bus. Rarely, we will see the Arduino get "stuck" and be unresponsive. When we attached a debugger to the running processor, we saw that it was constantly getting bombarded with I2C interrupts (WIRE_ISR_HANDLER). The status flag indicated that the Arduino was in SLAVE_SEND mode and had already transmitted data (TWI TXRDY bit = 0). However, the I2C transmission would never complete, the I2C clock was pulled low, and the processor was stuck in an infinite loop of servicing an interrupt but not doing anything (i.e. none of the if blocks were entered).

The confounding factor was that the SVREAD bit of the TWI status register was 0, indicating a Master Write. If this were the case, then the correct status should be SLAVE_RECV. Working backwords from this, the likely culprit would result from a race condition at the following lines (339-343) in TwoWire::onService()

// Transfer completed
TWI_EnableIt(twi, TWI_SR_SVACC);
TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
    | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IER_TXCOMP);
status = SLAVE_IDLE;

Assume a Master Read operation just completed and the Arduino has status = SLAVE_SEND. The Arduino is wrapping up the I2C transaction and enables the SVACC interrupt (TWI_EnableIt(twi, TWI_SR_SVACC);). After this happens, but before the status = SLAVE_IDLE; line is executed, a new I2C transaction comes in and triggers an interrupt. This time, it is a Master Write operation. It jumps into the handler, but because status has not been set to SLAVE_IDLE, it does not execute the first if block which would properly set status to SLAVE_RECV. Moreover, all of our Master Write transactions are two or more bytes, so the TWI would stretch the clock indefinitely because it never reads the first byte due to status being incorrect.

This is a bit theoretical because we will need to do extended testing to see if the problem goes away. The proposed solution is to reverse the order of the cleanup steps:

// Transfer completed
status = SLAVE_IDLE;
TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
    | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IDR_TXCOMP);
TWI_EnableIt(twi, TWI_IER_SVACC);

This keeps the status flag protected against synchronization issues. It also reverses the order of the DisableIt()/EnableIt() at lines 301-303 so that you never have both groups of interrupts enabled at once. I would change the flags to use TWI_IDR_* for all DisableIt() and TWI_IER_* for all EnableIt() operations, for posterity; however, TWI_IDR_x, TWI_IER_x, and TWI_SR_x are the same value for all x, so this does not lead to any incorrect behavior.

Could anyone provide advice on whether these sorts of nested interrupts could occur in the TWI peripheral? I wasn't able to find explicit information about how the NVIC on the Due interacts with the TWI interrupt registers. I assume that since specific TWI flags are explicitly enabled and disabled that they are treated separately. In the proposed scenario, the Due would be in the handler for a TXCOMP interrupt, then get interrupted by a SVACC interrupt.

Copied from original issue: arduino/Arduino#3699

@agdl
Copy link
Member Author

agdl commented Feb 8, 2017

From @stickbreaker on August 19, 2015 1:47

I am not familiar with the DUE, but Shouldn't the I2C HW be stretching SCL, This should prohibit any new transaction before the software has completed handling the current transaction?

Chuck

@agdl
Copy link
Member Author

agdl commented Feb 8, 2017

From @shahokun on August 19, 2015 3:47

Clock stretching in general is not necessarily an issue. In this case, however, the HW is stretching the clock for a Master Read operation, but the software incorrectly thinks it's in a Slave Send/Master Write operation and does not respond correctly. The clock is stretched indefinitely and the transaction never completes.

@agdl
Copy link
Member Author

agdl commented Feb 8, 2017

From @stickbreaker on August 19, 2015 4:29

Ok, I think I see what you have explained. You getting an slave read interrupt while processing a slave write command.
It sounds reasonable to reconfigure before switching modes.
Chuck.

@agdl
Copy link
Member Author

agdl commented Feb 8, 2017

From @shahokun on August 20, 2015 20:36

Testing showed that the proposed solution above did not fix the issue. After more debugging, it seems that the problem is simpler: the Master continues onto the next I2C transaction before the Slave (Arduino) has a chance to respond to the end of the first I2C transaction. This may be because the Slave is busy in a disable interrupts block, or some other rare sequence of events. The race condition goes something like this:

  1. The Master sends a Write operation. The Slave has status = SLAVE_RECV.
  2. The I2C transaction completes. The Slave gets an interrupt for the I2C completion, but is busy elsewhere and does not service it for a bit.
  3. The Master continues on to the next I2C transaction. This time, it is a Read operation. The Slave still has status = SLAVE_RECV (Master Write), but the Arduino TWI hardware thinks it is handling a Master Read operation, and the TWI status register reflects as much.
  4. The Slave services the I2C interrupt. Now when it reads the TWI status register (uint32_t sr = TWI_GetStatus(twi);), it indicates the transaction is not complete, and is in Master Read mode.
  5. The Arduino TWI hardware stretches the clock until the slave responds, i.e. calls TWI_WriteByte(). However, since the Slave is in status = SLAVE_RECV, this never happens. Meanwhile, the I2C transaction never completes, freeing the Slave to switch modes, because the clock is stretched.

The root cause of this issue is that the Wire library assumes that the status register follows some predictable sequence of events. However, because it is interrupt driven, the hardware status register can change state before the software has a chance to poll the status. Essentially, the Wire driver should not assume that just because it is currently in the SLAVE_RECV state that it will stay that way. I would propose changing the Slave driver in Wire.cpp to something like this:

void TwoWire::onService(void) {
// Retrieve interrupt status
uint32_t sr = TWI_GetStatus(twi);

if (status == SLAVE_IDLE && TWI_STATUS_SVACC(sr)) {
    TWI_DisableIt(twi, TWI_IDR_SVACC);
    TWI_EnableIt(twi, TWI_IER_RXRDY | TWI_IER_GACC | TWI_IER_NACK
            | TWI_IER_EOSACC | TWI_IER_SCL_WS | TWI_IER_TXCOMP);

    srvBufferLength = 0;
    srvBufferIndex = 0;

    // Detect if we should go into RECV or SEND status
    // SVREAD==1 means *master* reading -> SLAVE_SEND
    if (!TWI_STATUS_SVREAD(sr)) {
        status = SLAVE_RECV;
    } else {
        status = SLAVE_SEND;

        // Alert calling program to generate a response ASAP
        if (onRequestCallback)
            onRequestCallback();
        else
            // create a default 1-byte response
            write((uint8_t) 0);
    }
}

if (status != SLAVE_IDLE) {
    if (TWI_STATUS_TXCOMP(sr) && TWI_STATUS_EOSACC(sr)) {
        if (status == SLAVE_RECV && onReceiveCallback) {
            // Copy data into rxBuffer
            // (allows to receive another packet while the
            // user program reads actual data)
            for (uint8_t i = 0; i < srvBufferLength; ++i)
                rxBuffer[i] = srvBuffer[i];
            rxBufferIndex = 0;
            rxBufferLength = srvBufferLength;

            // Alert calling program
            onReceiveCallback( rxBufferLength);
        }


        // Transfer completed
        //TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
        //        | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IER_TXCOMP);
        //TWI_EnableIt(twi, TWI_SR_SVACC);
        //status = SLAVE_IDLE;

        // EDIT: Reversed order of operations to prevent nested
        // interrupt triggering.
        // This did not appear to fix the issue so may be unneeded.
        status = SLAVE_IDLE;
        // Transfer completed
        TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
                | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IER_TXCOMP);
        TWI_EnableIt(twi, TWI_SR_SVACC);

    }
}

if (status == SLAVE_RECV) {
    if (TWI_STATUS_RXRDY(sr)) {
        if (srvBufferLength < BUFFER_LENGTH)
            srvBuffer[srvBufferLength++] = TWI_ReadByte(twi);
    }
    // EDIT: Write a dummy byte if the TWI peripheral ends up in a
    // different mode than you expected, so it doesn't get stuck indefinitely.
    else if (TWI_STATUS_SVREAD(sr) == 1 && TWI_STATUS_TXRDY(sr) && !TWI_STATUS_NACK(sr)) {
        TWI_WriteByte(twi, 0);
    }
}

if (status == SLAVE_SEND) {
    if (TWI_STATUS_TXRDY(sr) && !TWI_STATUS_NACK(sr)) {
        uint8_t c = 'x';
        if (srvBufferIndex < srvBufferLength)
            c = srvBuffer[srvBufferIndex++];
        TWI_WriteByte(twi, c);
    }
    // EDIT: Read a dummy byte if the TWI peripheral ends up in a
    // different mode than you expected, so it doesn't get stuck indefinitely.
    else if (TWI_STATUS_SVREAD(sr) == 0 && TWI_STATUS_RXRDY(sr)) {
        TWI_ReadByte(twi);
    }
}
}

This may not be the best solution depending on how you want to handle I2C error conditions. However, I think that this issue should be at least be warned about in the Wire documentation. Yes, it results because of a combination of the Master sending I2C messages too often, and the Slave not handling I2C interrupts quickly enough, but the software driver should not be able to get stuck in this unsynchronized state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants