Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make w25n01g FLASH driver non-blocking for SPI #13555

Merged
merged 1 commit into from Apr 21, 2024

Conversation

SteveCEvans
Copy link
Member

@SteveCEvans SteveCEvans commented Apr 19, 2024

Previously writes to the W25N01G FLASH device would block for ~160us with an H743 processor due to the code waiting for a sequence of transfers to complete. This is now reduced to under 4us and the FLASH writes then proceed in the background.

Note that QSPI operation is as before and will still be badly non-deterministic.

Impacted FCs are:

ALIENFLIGHTNGF7
AOCODARCH7DUAL
ARESF7
BAYCK_DOLPHIN
BAYCK_DOLPHIN_MINI
CLRACINGF7
FREEAXISF4
GREATMOUNTAINF435
IFLIGHT_BLITZ_F722
IFLIGHT_BLITZ_F7_PRO
IFLIGHT_F405_TWING
IFLIGHT_F722_TWING
IFLIGHT_F745_AIO
IFLIGHT_H7_TWING
JHEF405PRO
JHEH743_HD
KAKUTEH7MINI
KAKUTEH7V2
MAMBAH743
MAMBAH743_2022B
NEUTRONRCF435SE
NEUTRONRCF745
NEUTRONRCF7AIO
NEUTRONRCH743AIO
NEUTRONRCH7BT
SDMODELH7
SPRACINGH7EXTREME
SPRACINGH7NANO
SPRACINGH7ZERO
TAKERF722SE
TMOTORH743
TMOTORVELOXF7SE
VEKTORF722
ZEEZF7

Tested on a MAMBAH743.

Copy link

Do you want to test this code? You can flash it directly from Betaflight Configurator:

  • Simply put #13555 (this pull request number) in the Select commit field of the Configurator firmware flasher tab (you need to Enable expert mode, Show release candidates and Development).

WARNING: It may be unstable. Use only for testing!

@haslinghuis haslinghuis added this to the 4.5 milestone Apr 19, 2024
Copy link
Member

@haslinghuis haslinghuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@ledvinap ledvinap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well ..

this will be trivial with RTOS

Performance benefit is obvious. But this can introduce very subtle bugs that will take ages to fix. It may be worth spending some time to make it cleaner.

One possible improvement is to modify segments to allow inline data storage, It will remove need for most static variables and help in other code too.

typedef struct busSegment_s {
    union {
        struct {
            // Transmit buffer
            union { 
                uint8_t *txData;
                uint8_t txInline[4]; 
            };
            // Receive buffer, or in the case of the final segment to
            union {
                 uint8_t *rxData;
                 uint8_t rxInline[4];  
            };
        } buffers;
        struct {
            // Link to the device associated with the next transfer
            const extDevice_t *dev;
            // Segments to process in the next transfer.
            volatile struct busSegment_s *segments;
        } link;
    } u;
    uint16_t len;
    uint16_t segmentFlags;
    busStatus_e (*callback)(uint32_t arg);
} busSegment_t;

#define SEGMENTF_NRGATECS 0x01
#define SEGMENTF_TXINLINE   0x02
#define SEGMENTF_RXINLINE   0x04

src/main/drivers/flash_w25n01g.c Show resolved Hide resolved
{.u.buffers = {readStatus, readyStatus}, sizeof(readStatus), true, w25n01g_callbackReady},
{.u.buffers = {writeEnable, NULL}, sizeof(writeEnable), true, w25n01g_callbackWriteEnable},
{.u.buffers = {progRandomProgDataLoad, NULL}, sizeof(progRandomProgDataLoad), false, NULL},
{.u.buffers = {NULL, NULL}, 0, true, NULL},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, comment what will end up in given entry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do, but the callback shows how the read data is processed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is that here is only {NULL, NULL}, 0, true, NULL}. Something will be filled in, but in different place (lines with command referenced are fine).
And when reading actual sequence construction code, only indexes are used,
So there is no place to review full sequence that will be executed.

src/main/drivers/flash_w25n01g.c Show resolved Hide resolved
src/main/drivers/flash_w25n01g.c Show resolved Hide resolved
src/main/drivers/flash_w25n01g.c Show resolved Hide resolved
@SteveCEvans
Copy link
Member Author

Well ..

this will be trivial with RTOS

Performance benefit is obvious. But this can introduce very subtle bugs that will take ages to fix. It may be worth spending some time to make it cleaner.

One possible improvement is to modify segments to allow inline data storage, It will remove need for most static variables and help in other code too.

typedef struct busSegment_s {
    union {
        struct {
            // Transmit buffer
            union { 
                uint8_t *txData;
                uint8_t txInline[4]; 
            };
            // Receive buffer, or in the case of the final segment to
            union {
                 uint8_t *rxData;
                 uint8_t rxInline[4];  
            };
        } buffers;
        struct {
            // Link to the device associated with the next transfer
            const extDevice_t *dev;
            // Segments to process in the next transfer.
            volatile struct busSegment_s *segments;
        } link;
    } u;
    uint16_t len;
    uint16_t segmentFlags;
    busStatus_e (*callback)(uint32_t arg);
} busSegment_t;

#define SEGMENTF_NRGATECS 0x01
#define SEGMENTF_TXINLINE   0x02
#define SEGMENTF_RXINLINE   0x04

Well, it’s not an RTOS!

I’d rather have clearly named static variables than bring them as anonymous data inside a static segment structure. This keeps the segments as clean and simple as possible.

@SteveCEvans
Copy link
Member Author

@ledvinap The code should be much more readable. I also considered the code paths, and proved them using a logic analyser and the initial flash is now removed simplifying the code further.

Copy link
Contributor

@ledvinap ledvinap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks. Now it is quite easy to follow.

BTW: flashfs write code is expecting pageProgramContinue to handle multiple buffers (at least in sync flush). But from quick look, it isn't flashfs main problem ...


// Call transfer completion callback
if (fdevice->callback) {
fdevice->callback(fdevice->callbackArg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io/flashfs.c is not interrupt-safe (mainly flashfsAdvanceTailInBuffer). Eventually, it should be fixed (IMO caching bufferTail in local variable before testing will be enough)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only update bufferTail in flashfsAdvanceTailInBuffer (except following an erase) so I think this is safe. Can't think of a failure case, but perhaps I'm not thinking hard enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is with reading it (flashfsGetDirtyDataBuffers is affected too).


bufferHead = 5;
bufferTail = 9;
FLASHFS_WRITE_BUFFER_SIZE  = 10;

static uint32_t flashfsTransmitBufferUsed(void)
{
    if (bufferHead >= bufferTail)
        return bufferHead - bufferTail;
    // ISR update bufferTail, incrementing it by 1 bytes and wrapping around, bufferTail == 0
    return FLASHFS_WRITE_BUFFER_SIZE - bufferTail + bufferHead;
    // 15 is returned 
}

In this case, fix is trivial. Function may return slightly old value, but that is expected:

static uint32_t flashfsTransmitBufferUsed(void)
{
    unsigned tail = bufferTail;
    if (bufferHead >= tail)
        return bufferHead - tail;

    return FLASHFS_WRITE_BUFFER_SIZE - tail + bufferHead;
}

The problem is not that obvious, because flashfsWriteByte is even more buggy and will silently wraparound buffer. And blackbox tries to prevent overflowing buffers, so this is not hit too often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have learned to take these problems very seriously - bug happening once in a century (per device) is much worse than the one happening every day, ale least if you manage thousands of devices)

uint32_t columnAddress;

if (bufferDirty) {
columnAddress = W25N01G_LINEAR_TO_COLUMN(programLoadAddress);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can reduce variable scope here)

Suggested change
columnAddress = W25N01G_LINEAR_TO_COLUMN(programLoadAddress);
uint32_t columnAddress = W25N01G_LINEAR_TO_COLUMN(programLoadAddress);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columnAddress is now used at line 703 and 712 so I expanded the scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

((But that one and can have it's own scope. Sorry, this is too minor problem, don't worry about it))

// Set the address and buffer details for the random data load
progRandomProgDataLoad[1] = (columnAddress >> 8) & 0xff;
progRandomProgDataLoad[2] = columnAddress & 0xff;
segmentsRandomDataLoad[3].u.buffers.txData = (uint8_t *)buffers[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: both u and buffers can be anonymous in segment structure, segmentsRandomDataLoad[3].txData = will mask internal structure layout. Only disadvantage is that overlap with link is not that obvious.
But that may be my laziness when typing

src/main/drivers/flash_w25n01g.c Show resolved Hide resolved
programStartAddress = programLoadAddress;
} else {
// Callback on completion of data load
programSegment[3].callback = w25n01g_callbackWriteComplete;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd play it safe and clear link explicitly. BUS_ERROR (or future refactoring) may prevent link clearing code to work, and consequences can be ugly.

        spiLinkSegments(NULL, programSegment, NULL);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clearing is done automatically at

nextSegment->u.link.dev = NULL;

This is needed because the linking is the mechanism for queuing requests and such linkage between segments from different drivers MUST be broken like this. This can't ever be refactored away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Caller may be required to clear last link of inserted chain. But as cycle prevention, ISR clearing is good idea.
It will get really interesting if one driver ends with BUS_ERROR - linked chains are not terminated and unlinked.)

IMO it is slightly better to break link explicitly here - normal links are created by driver chaining code, that code is responsible to undo it. But here the link was created explicitly, it is safer to undo it.

@@ -744,6 +744,19 @@ uint8_t spiGetExtDeviceCount(const extDevice_t *dev)
return dev->bus->deviceCount;
}

// Link two segment lists
// Note that there is no need to unlink segment lists as this is done automatically as they are processed
void spiLinkSegments(const extDevice_t *dev, busSegment_t *firstSegment, busSegment_t *secondSegment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change argument order to object, params... This function is not (directly) working on device, but is appending something to segment chain (*firstSegment)

Suggested change
void spiLinkSegments(const extDevice_t *dev, busSegment_t *firstSegment, busSegment_t *secondSegment)
void spiLinkSegments(busSegment_t *firstSegment, const extDevice_t *dev, busSegment_t *secondSegment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way as the dev argument is always passed first in all other routines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the difference is that other functions have dev as object they are working on. It becomes obvious when you try passing NULL to terminate segments.

@@ -744,6 +744,19 @@ uint8_t spiGetExtDeviceCount(const extDevice_t *dev)
return dev->bus->deviceCount;
}

// Link two segment lists
// Note that there is no need to unlink segment lists as this is done automatically as they are processed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for BUS_ABORT ... see comment below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch. Luckily nothing ever uses that case. I'll change the code to skip forward to the end of the segment list rather than just return. Thanks for this!

@ledvinap
Copy link
Contributor

(@SteveCEvans : can you avoid force-pushing commits when not necessary, please? Incremental changes in commits are easier to follow)

@nerdCopter

This comment was marked as off-topic.

@nerdCopter nerdCopter merged commit f4d6a2c into betaflight:master Apr 21, 2024
23 checks passed
@nerdCopter
Copy link
Member

nerdCopter commented Apr 21, 2024

OMG i DID NOT MEAN TO PRESS MERGE, SO SORRY!!!!

@SteveCEvans
Copy link
Member Author

Great work, thanks. Now it is quite easy to follow.

BTW: flashfs write code is expecting pageProgramContinue to handle multiple buffers (at least in sync flush). But from quick look, it isn't flashfs main problem ...

The m25p16_pageProgramContinue() version does handle multiple buffers. I could certainly stitch on the extras to w25n01g_pageProgramContinue() too. The original code didn't so I thought it best to constrain the scope of the changes and come back to this.

@SteveCEvans
Copy link
Member Author

(@SteveCEvans : can you avoid force-pushing commits when not necessary, please? Incremental changes in commits are easier to follow)

Sorry, just trying to keep it tidy, but you're right that's best left until the merge into master.

@SteveCEvans
Copy link
Member Author

OMG i DID NOT MEAN TO PRESS MERGE, SO SORRY!!!!

Ooooops. How do we rewind that so we can take this PR to its conclusion properly?

@haslinghuis
Copy link
Member

Either we revert or continue in subsequent PR. Had tested the current code and it works.

@ledvinap
Copy link
Contributor

Just live with it. If necessary, we can fasttrack bugfix PR.

nerdCopter added a commit to nerdCopter/betaflight that referenced this pull request Apr 21, 2024
@nerdCopter
Copy link
Member

i opened a revert PR, if you want. i again, deeply apologize
#13560

@SteveCEvans
Copy link
Member Author

Great work, thanks. Now it is quite easy to follow.
BTW: flashfs write code is expecting pageProgramContinue to handle multiple buffers (at least in sync flush). But from quick look, it isn't flashfs main problem ...

The m25p16_pageProgramContinue() version does handle multiple buffers. I could certainly stitch on the extras to w25n01g_pageProgramContinue() too. The original code didn't so I thought it best to constrain the scope of the changes and come back to this.

I'm currently looking at this.

And I've made other commits since the merge.

@SteveCEvans
Copy link
Member Author

I'll create a follow on PR then.

@SteveCEvans
Copy link
Member Author

I found that writing multiple buffers at once resulted in the FLASH not keeping up and gaps appearing at 8kHz logging rate.

@SteveCEvans
Copy link
Member Author

SteveCEvans commented Apr 21, 2024

i opened a revert PR, if you want. i again, deeply apologize #13560

No worries. #13562 tidies things up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: COMPLETED
Development

Successfully merging this pull request may close these issues.

None yet

5 participants