[feature request] Combining SPI and DMA #861

Open
jpf91 opened this issue Mar 23, 2024 · 7 comments
Labels
enhancement (New feature or request), HW (hardware-related), stale

Comments

@jpf91
Contributor

jpf91 commented Mar 23, 2024

Is your feature request related to a problem? Please describe.

I'm currently trying to find the best-performing SPI SD card implementation for the neorv32 controller. Currently FIFO and IRQ seem the way to go. However, SD card access always transfers complete sectors, so at least 512 bytes. For that, it would be nice to use DMA.

However:

  1. Half-duplex sending via DMA should be possible right now: configure a source, set the destination to the TX FIFO and use SPI_CTRL_TX_NHALF as the trigger.
  2. Half-duplex receiving is not possible: the DMA could be configured with the SPI DATA register as source, memory (with increment) as target and SPI_CTRL_IRQ_RX_AVAIL as trigger. However, read transfers still need to be initiated by writing to the data register, so this still requires writes to that register that do not come from the DMA.
  3. Full-duplex read/write is also not possible and probably won't work with fewer than two DMA channels.

Describe the solution you'd like
Describe alternatives you've considered

I'm not sure what the best solution is. For 2) alone, an SPI modifier that simply triggers another transfer whenever the RX FIFO is read could be used. You would then have to read one element fewer using DMA, disable this mode, and read the final element manually... Or does the Wishbone bus have some way to signal that something is "the last" partial transfer of a larger transfer? That information could then be used to avoid triggering another transfer for the last element.

For 3) I think two DMA channels will be necessary. With two DMA channels, use case 2) is also covered. However, if only reading is required, the additional DMA channel causes more bus transfers. A partial solution would be to implement 1:N transfers from a register in the DMA controller instead of from memory, so at least the read request is not required. Still, the extra receive-only SPI mode above would need fewer bus transfers than any solution with two DMA channels.
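
To put rough numbers on this comparison, here is a back-of-the-envelope count of bus accesses for receiving N bytes. It is only a sketch: it assumes that every element a DMA channel moves costs one bus read plus one bus write, it ignores configuration writes, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: each element moved by a DMA channel = 1 bus read + 1 bus write. */

/* Two channels: TX channel (memory -> SPI) plus RX channel (SPI -> memory). */
uint32_t bus_accesses_two_channels(uint32_t n)
{
    return 2u * n + 2u * n;
}

/* TX channel takes its dummy byte from a register inside the DMA (1:N),
 * so only the write to the SPI remains on the TX side. */
uint32_t bus_accesses_const_source(uint32_t n)
{
    return n + 2u * n;
}

/* Receive-only SPI mode: one kick-off write, then one SPI read and one
 * memory write per received byte. */
uint32_t bus_accesses_rx_only_mode(uint32_t n)
{
    return 1u + 2u * n;
}
```

Under these assumptions a 512-byte sector costs 2048, 1536 and 1025 bus accesses respectively.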

I wonder what other microcontrollers are doing?

Additional context
For the SD card use case, half-duplex support is fine: after sending a command, the host reads 0xFF from the SD card for an unspecified number of bytes. Once the card has prepared the response, it returns 0xFE, followed by all the data. As the total transfer length is variable, this can't be done in one single SPI request anyway.
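
The read sequence described here could look roughly like the following host-side sketch. `spi_xfer_fn`, `sd_read_data_block` and the mock card are hypothetical names, not part of the neorv32 software framework, and the CRC bytes a real card sends after the data are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SD_DATA_TOKEN  0xFEu
#define SD_SECTOR_SIZE 512u

/* Hypothetical byte exchange: shifts out `tx`, returns the byte shifted in. */
typedef uint8_t (*spi_xfer_fn)(uint8_t tx);

/* Polls until the 0xFE token arrives, then reads one sector into `buf`.
 * Returns 0 on success, -1 if no token shows up within `max_wait` bytes. */
int sd_read_data_block(spi_xfer_fn xfer, uint8_t *buf, uint32_t max_wait)
{
    uint32_t i;
    for (i = 0; i < max_wait; i++) {
        if (xfer(0xFF) == SD_DATA_TOKEN) {
            break;              /* card is ready, data follows */
        }
    }
    if (i == max_wait) {
        return -1;              /* variable latency exceeded our budget */
    }
    for (size_t k = 0; k < SD_SECTOR_SIZE; k++) {
        buf[k] = xfer(0xFF);    /* this fixed-length part is the DMA candidate */
    }
    return 0;
}

/* Stand-in for a real SPI driver so the sketch runs on a host: answers
 * `mock_latency` bytes of 0xFF, then the token, then a counting pattern. */
uint32_t mock_latency = 3;
uint32_t mock_pos = 0;

uint8_t mock_xfer(uint8_t tx)
{
    (void)tx;
    uint32_t p = mock_pos++;
    if (p < mock_latency) {
        return 0xFF;
    }
    if (p == mock_latency) {
        return SD_DATA_TOKEN;
    }
    return (uint8_t)(p - mock_latency - 1u);
}
```

Only the second loop has a known length, which is why the polling phase and the data phase end up as separate SPI requests.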

@stnolting added the enhancement (New feature or request) and HW (hardware-related) labels on Mar 26, 2024
@stnolting
Owner

So you want to read a whole sector from the SD card with minimal CPU interaction, right?

With the current set of features this is what I would (quite naively) do:

  • configure a large SPI FIFO that can hold all the required write commands, address data and 512 dummy bytes for reading the actual sector
  • write the command + address + 512 dummy bytes to the FIFO (yes, this is done by the CPU, but as we do not need to check any status here this should be quite fast)
  • the SPI module sends all the configured data and reads 512 bytes from the SD card
  • at the end of the transmission use the SPI interrupt to trigger the DMA
  • the DMA moves all the data from SPI RX FIFO to some RAM location for further processing
  • the DMA informs the main program via an interrupt when the transfer is done

> I'm not sure what the best solution is. For 2) alone, an SPI modifier that simply triggers another transfer whenever the RX FIFO is read could be used. You would then have to read one element fewer using DMA, disable this mode, and read the final element manually

Hmm, interesting idea... But I am not sure how this could be implemented (with some reasonable amount of logic 😅). Any ideas?

> Or does the Wishbone bus have some way to signal that something is "the last" partial transfer of a larger transfer? That information could then be used to avoid triggering another transfer for the last element.

I'm not sure, maybe there are some special burst modes available for this. However, please keep in mind that the internal bus is not Wishbone (even if we have ~~stolen~~ used a lot of it).

> For 3) I think two DMA channels will be necessary.

Having several channels would be nice. Basically, you could just instantiate another DMA controller and use that for the second channel.

Another option I was thinking about when writing the DMA was to use a "descriptor based" DMA. Basically, you could put all the configurations as "structs" into RAM, chain them via some pointer and the DMA will happily execute them one after another until reaching the end of the chain. 🤔

> For the SD card use case, half-duplex support is fine: after sending a command, the host reads 0xFF from the SD card for an unspecified number of bytes. Once the card has prepared the response, it returns 0xFE, followed by all the data. As the total transfer length is variable, this can't be done in one single SPI request anyway.

Oh, I was assuming a fixed latency... Ok, this makes interfacing a bit more complex..

How about a dedicated SD-card controller? 🤔

@jpf91
Contributor Author

jpf91 commented Mar 28, 2024

> With the current set of features this is what I would (quite naively) do:

Good idea, thanks! I'd like to keep the SPI FIFO a bit smaller. Right now I keep all the transfer logic in the interrupt handler (this is a FreeRTOS project, so switching to a task might have some overhead). If this turns out to be a performance issue, I will try your suggestion. (Unfortunately I don't have the hardware here right now, so I can't test this yet.)

> Hmm, interesting idea... But I am not sure how this could be implemented (with some reasonable amount of logic 😅). Any ideas?

I guess a single configuration bit for "RX Poll Mode" could be used. This would indicate that the last value in the TX FIFO is resubmitted again when the RX register is read. However, thinking about this some more, this might be over-engineering.
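
As a thought experiment, the proposed bit can be modelled in software. Everything below (`spi_model_t`, `rx_poll_mode`, the function names) is hypothetical and only mimics the behaviour described above; it is not the real neorv32 SPI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the SPI data path. */
typedef struct {
    const uint8_t *miso;     /* bytes the device will shift out, in order */
    size_t miso_len;
    size_t miso_pos;
    uint8_t last_tx;         /* last byte written to the data register */
    uint8_t rx;              /* received byte waiting to be read */
    bool rx_poll_mode;       /* the proposed configuration bit */
    unsigned transfers;      /* SPI byte transfers started so far */
} spi_model_t;

/* One SPI byte transfer: sends last_tx (unused by the model), latches RX. */
static void spi_start_transfer(spi_model_t *s)
{
    s->rx = (s->miso_pos < s->miso_len) ? s->miso[s->miso_pos++] : 0xFF;
    s->transfers++;
}

void spi_write_data(spi_model_t *s, uint8_t tx)
{
    s->last_tx = tx;
    spi_start_transfer(s);
}

uint8_t spi_read_data(spi_model_t *s)
{
    uint8_t value = s->rx;
    if (s->rx_poll_mode) {
        spi_start_transfer(s);   /* reading resubmits last_tx */
    }
    return value;
}

/* Receives n bytes: n-1 reads in poll mode (the part a DMA could do),
 * then the final read with the mode switched off again. */
void spi_receive(spi_model_t *s, uint8_t *dst, size_t n)
{
    if (n == 0) {
        return;
    }
    s->rx_poll_mode = true;
    spi_write_data(s, 0xFF);             /* kick off the first transfer */
    for (size_t i = 0; i + 1 < n; i++) {
        dst[i] = spi_read_data(s);
    }
    s->rx_poll_mode = false;
    dst[n - 1] = spi_read_data(s);       /* last read must not retrigger */
}
```

In this model exactly n SPI transfers are started for n received bytes, at the cost of the mode switch before the last read.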

> I'm not sure, maybe there are some special burst modes available for this. However, please keep in mind that the internal bus is not Wishbone (even if we have ~~stolen~~ used a lot of it).

I think I should do some research on how other controllers handle this situation 🤔

> Another option I was thinking about when writing the DMA was to use a "descriptor based" DMA. Basically, you could put all the configurations as "structs" into RAM, chain them via some pointer and the DMA will happily execute them one after another until reaching the end of the chain. 🤔

That sounds nice, although I guess independent triggers are sometimes nice, too.

Some context: We're hosting an FPGA development lab course at my university. In this course, students design an I2S peripheral to interface external DACs for audio playback, integrated with the NeoRV32 SoC. So far, our demo software only generates a sine wave, which is quite boring 😉 As I finally have some time for this, I'm currently implementing the real music player firmware. And in that context, I could make use of another (manually triggered) DMA channel to copy data from the audio buffer to the I2S master 🙈

> How about a dedicated SD-card controller? 🤔

That's probably the best solution, but I'll try to make it work with the SPI interface for now. Maybe one of our students next year could implement such a controller 🤔

So to summarize, I guess the most general solution would be to just have more (maybe even a configurable number of) DMA channels. With 2 channels the SPI use cases would also be covered.

@stnolting
Owner

> So to summarize, I guess the most general solution would be to just have more (maybe even a configurable number of) DMA channels. With 2 channels the SPI use cases would also be covered.

I see the problem, but do we need two independent channels (i.e. two individual bus access engines) or just two individual descriptors (i.e. two sets of configuration bits to describe an actual DMA transfer)? 🤔

@andkae

andkae commented Apr 8, 2024

> just two individual descriptors

An approach like Altera's could also be an idea: you define a stack of descriptors in RAM or in a dedicated descriptor area for the DMA.

image

@stnolting
Owner

Right, I really like this concept. You could add another field for a "pointer" and then put as many descriptors into RAM as you like and "chain" them to execute several transfers in a row. 🤔
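
A minimal sketch of such a chained descriptor, with a plain software loop standing in for the DMA engine. The field names, widths and the `dma_run_chain` helper are assumptions made for illustration, not an existing neorv32 interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout as it might sit in RAM. */
typedef struct dma_desc {
    const uint8_t *src;            /* source address */
    uint8_t *dst;                  /* destination address */
    uint32_t num_bytes;            /* number of bytes to move */
    const struct dma_desc *next;   /* next descriptor, NULL ends the chain */
} dma_desc_t;

/* Software stand-in for the DMA: executes descriptors one after another
 * until the end of the chain. Returns how many descriptors were executed. */
unsigned dma_run_chain(const dma_desc_t *d)
{
    unsigned executed = 0;
    while (d != NULL) {
        memcpy(d->dst, d->src, d->num_bytes);
        d = d->next;
        executed++;
    }
    return executed;
}
```

A real engine would additionally need address-increment and trigger settings per descriptor; those are left out to keep the chaining itself visible.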

Obviously, the DMA would require additional hardware to load a descriptor from memory.

Do you think this would be a handy feature?

@andkae

andkae commented Apr 24, 2024

Hi Stephan,

> Obviously, the DMA would require additional hardware to load a descriptor from memory.

Yes, at least enough storage for one complete descriptor, to avoid occupying the bus too often. But when a system designer decides to use a DMA, I think they have to keep in mind that it is, in general, a second processor. Therefore some LEs are necessary.

Once you have such a mechanism, the IRQs could also be forwarded to the DMA directly, that is:
DMA-IRQ0 = SPI-TX --> Execute Descriptor Base Pointer A
DMA-IRQ1 = SPI-RX --> Execute Descriptor Base Pointer B
....

When the IRQ fires, the DMA uses the base pointer and executes the transfer. An SPI transfer would then run without any CPU interaction.
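
Sketched as a software model, the trigger-to-descriptor mapping could look like this. The two-entry table, `dma_model_t` and `dma_on_irq` are hypothetical and only illustrate the idea; the descriptor layout is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_NUM_TRIGGERS 2u   /* assumed: IRQ0 = SPI-TX, IRQ1 = SPI-RX */

/* Hypothetical descriptor layout. */
typedef struct dma_desc {
    const uint8_t *src;
    uint8_t *dst;
    uint32_t num_bytes;
    const struct dma_desc *next;   /* NULL ends the chain */
} dma_desc_t;

/* State kept inside the DMA: one descriptor base pointer per trigger line. */
typedef struct {
    const dma_desc_t *base[DMA_NUM_TRIGGERS];
} dma_model_t;

/* Called when trigger line `irq` fires: executes the chain that starts at
 * the matching base pointer. Returns the number of descriptors executed,
 * or 0 for an unknown trigger line. */
unsigned dma_on_irq(const dma_model_t *dma, unsigned irq)
{
    if (irq >= DMA_NUM_TRIGGERS) {
        return 0;
    }
    unsigned executed = 0;
    for (const dma_desc_t *d = dma->base[irq]; d != NULL; d = d->next) {
        memcpy(d->dst, d->src, d->num_bytes);
        executed++;
    }
    return executed;
}
```

In this model only one base pointer per trigger line lives in the DMA; any further descriptors are reached through `next` in RAM.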

> Do you think this would be a handy feature?

Question to @jpf91: Would it be helpful for your use case?

BR,
Andreas

@stnolting
Owner

> But when a system designer decides to use a DMA, I think they have to keep in mind that it is, in general, a second processor.

True, but let's keep it smaller than an actual second CPU core 😅

> Once you have such a mechanism, the IRQs could also be forwarded to the DMA directly, that is:
> DMA-IRQ0 = SPI-TX --> Execute Descriptor Base Pointer A

That would mean you need to store the base addresses of ALL descriptors somewhere inside the DMA, right?


3 participants