Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers/ipcc: add Inter Processor Communication driver #6537

Merged
merged 2 commits into from
Jul 29, 2022

Conversation

mlyszczek
Copy link
Contributor

This is proposal of new driver for IPCC (inter processor communication controller).

This is still work in progress but I would like to request small review now. Mainly I would like you to review interfaces as this is common code for all architectures.

The only test I did was to properly compile it.

I will be implementing lower half of the driver now, and it would be cool to have at least interfaces reviewed before I do all the stuff and will have to change half of the code :)

I've implemented IPCC as character device, very similar to ordinary UART with separate tx and rx (full duplex). It's interrupt driven - lower half driver will clear/fill buffers (if buffering is enabled) and notify blocked/polling threads. It should be enough to simply call ipcc_register(0); to register single full duplex /dev/ipcc0 device. Then simple open/read/write to get/send data.

It's a lower half driver responsibility to prepare (reserve) memory for IPCC. arch/ part will also have to take IPCC into account when passing heap memory for OS - reserved memory should not be exposed to OS as it's shared between CPUs which can have different OSes on them. Lower layer will also be responsible to send message in proper format, like
struct { uint32_t buflen; unsigned char buf[] };

@xiaoxiang781216
Copy link
Contributor

What's the hardware infrastructure plan to work with the IPCC driver model?

@mlyszczek
Copy link
Contributor Author

I apologize but I am not really sure I understand the question.

I need IPCC to communicate 2 CPUs on stm32wl5. It's a trivial controller, software is responsible to reserve memory, and hardware simply send iterrupts when software says "I copied data from read buffer - you can put new data there" or "I left you something in a memory, go check it out". I did not work with other inter processor communication controllers but I image it will not be that different and I tried to keep this upper half as generic as possible.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jun 28, 2022

I apologize but I am not really sure I understand the question.

I need IPCC to communicate 2 CPUs on stm32wl5. It's a trivial controller, software is responsible to reserve memory, and hardware simply send iterrupts when software says "I copied data from read buffer - you can put new data there" or "I left you something in a memory, go check it out". I did not work with other inter processor communication controllers but I image it will not be that different and I tried to keep this upper half as generic as possible.

Ok, stm32wl5 doesn't create a very special hardware for IPC(just share memory plus software interrupt).
I would suggest you look at OpenAMP(RTOS side) and remoteproc/rpmsg(Linux side):
https://static.linaro.org/connect/hkg18/presentations/hkg18-411.pdf
https://www.kernel.org/doc/Documentation/remoteproc.txt
https://www.kernel.org/doc/Documentation/rpmsg.txt
NuttX fully incorporate OpenAMP stack and provide many essential service on top of them:
https://cwiki.apache.org/confluence/display/NUTTX/NuttX+And+OpenAMP
Actually, ST also promote OpenAMP framework on their hardware:
https://elinux.org/images/6/63/ELC_EU19_rpmsg.pdf
So, with the OpenAMP, you can just provide a simple rptun driver:
https://github.com/apache/incubator-nuttx/blob/master/include/nuttx/rptun/rptun.h
just like:
https://github.com/apache/incubator-nuttx/blob/master/arch/sim/src/sim/up_rptun.c
https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/mpfs/mpfs_ihc.c
and get many services directly.

@mlyszczek
Copy link
Contributor Author

That OpenAMP looks promising and nice but I have some concerns. Mainly about memory. It seems quite complex (and definately way more complex than simple character device stream) so I suspect it requires quite a lot of memory (for an mcu)? Looking around I noticed that it requires at least one extra thread (for dispatching) that by default uses 4kB of ram. That is a lot considering that this will have to work on two of the CPUs. Not to mention extra code size (that I did not confirm in any way).

ST promotes openamp yes, but I've seen it's on their stm32mp* processors which have arm A7 and M4 - so there is plenty of RAM. On stm32wl5 you have 64kB, and I did not see way to enable openamp in their stm32cube.

I want to use this to communicate via LORA and draw stuff on LCD - this will take it's own share of memory, so I don't want to add extra features to the plate (even tough they seem nice) as I already am a little afraid of exhausing RAM.

So do you know how much extra memory would openamp take (ram/flash)?

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jun 29, 2022

That OpenAMP looks promising and nice but I have some concerns. Mainly about memory. It seems quite complex (and definately way more complex than simple character device stream) so I suspect it requires quite a lot of memory (for an mcu)?

The major memory consumption come from tx/rx virtio queue, here is a demo config from sim:
https://github.com/apache/incubator-nuttx/blob/master/arch/sim/src/sim/up_rptun.c#L115-L121
which is quit large, but you can change both the number and size of buffer to reduce the total size less than 1KB.

Looking around I noticed that it requires at least one extra thread (for dispatching) that by default uses 4kB of ram. That is a lot considering that this will have to work on two of the CPUs. Not to mention extra code size (that I did not confirm in any way).

There is an option to reuse the work queue for dispatching:
https://github.com/apache/incubator-nuttx/blob/master/drivers/rptun/Kconfig#L21-L24
so the additional overhead can be shared with other component.

ST promotes openamp yes, but I've seen it's on their stm32mp* processors which have arm A7 and M4 - so there is plenty of RAM. On stm32wl5 you have 64kB, and I did not see way to enable openamp in their stm32cube.

I want to use this to communicate via LORA and draw stuff on LCD - this will take it's own share of memory, so I don't want to add extra features to the plate (even tough they seem nice) as I already am a little afraid of exhausing RAM.

All service is disabled by default, so the major overhead come from:

  1. virtio and remoteproc code
  2. IPC buffer depends on the config
  3. Runtime house keeping data depends on how many endpoint you create

So do you know how much extra memory would openamp take (ram/flash)?

My rough estimation: RAM ~2KB, Flash ~20KB
Anyway, you can provide a dummy rptun driver and get the real result quickly.

@mlyszczek
Copy link
Contributor Author

mlyszczek commented Jun 30, 2022

Yeah, so I've compiled stub code and my flash increased by 18kB. It will probably take similar ammount of memory to compile on second CPU and I don't suppose there is some trickery to share flash code between CPUs ;) And I don't know what else will have to be compiled for another CPU to support OpenAMP there, so it might be another extra tens of kB of flash.

And my project requires substential ammount of flash as it is so adding around 40kB just for IPCC is hard to swallow.

So I've decieded that I will continue with that simple character device stream. The question is, are you going to accept this into upstream or should I not even bother creating pull request for this and just keep it internally?

@onegray
Copy link
Contributor

onegray commented Jun 30, 2022

Just to note, as for STM32WB (which uses same IPCC hardware) it is not possible to run user code on CPU2, only vendor encrypted binaries. For BLE we have bt_driver, but for ZigBee and Thread we still need a solution. I am not sure whether it should be a generic IPCC driver, or IPCC as a part of a more specific lora_driver/zigbee_driver/thread_driver or just an app-side library.

@xiaoxiang781216
Copy link
Contributor

Yeah, so I've compiled stub code and my flash increased by 18kB. It will probably take similar ammount of memory to compile on second CPU and I don't suppose there is some trickery to share flash code between CPUs ;) And I don't know what else will have to be compiled for another CPU to support OpenAMP there, so it might be another extra tens of kB of flash.

Yes, the second cpu need consume the same size of flash

And my project requires substential ammount of flash as it is so adding around 40kB just for IPCC is hard to swallow.

So I've decieded that I will continue with that simple character device stream. The question is, are you going to accept this into upstream or should I not even bother creating pull request for this and just keep it internally?

Yes, OpenAMP just give you a choice to see whether it can save your development effort and reuse other's work.

@xiaoxiang781216
Copy link
Contributor

Linux contain several subsystem for IPC:
hardware spinlock:
https://docs.kernel.org/locking/hwspinlock.html
mailbox:
https://www.kernel.org/doc/html/latest/driver-api/mailbox.html
virtio:
https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html
rpmsg:
https://docs.kernel.org/staging/rpmsg.html
remoteproc:
https://docs.kernel.org/staging/remoteproc.html

Item 1 and 2 directly map to the hardware block, and easy to define the driver interface
Item 3 and 4 define the share memory layout which is purely defined by software not couple with the hardware
Item 5 is used to control the life cycle(download firmware, start, stop) of the remote cpu
You may get some ideal from them.

@xiaoxiang781216
Copy link
Contributor

Just to note, as for STM32WB (which uses same IPCC hardware) it is not possible to run user code on CPU2, only vendor encrypted binaries. For BLE we have bt_driver, but for ZigBee and Thread we still need a solution. I am not sure whether it should be a generic IPCC driver, or IPCC as a part of a more specific lora_driver/zigbee_driver/thread_driver or just an app-side library.

people can define many different and incompatible protocol on top of share memory. It's very hard to reuse the definition or code on the different hardware or even the same hardware but different firmware.
Virtio/rpmsg is only the solution accept widely by many platform(Linux/FreeBSD/RTOS) and maintained by the committee as far as I know.

@acassis
Copy link
Contributor

acassis commented Jun 30, 2022

@xiaoxiang781216 @mlyszczek I think it is good to have a light IPC on NuttX and of course the fact of having it doesn't mean people cannot use rpmsg or openamp (or their both integration). For some people having more options could be confusing, but I think it is a good thing, it means flexibility. Same way people can use NuttX native NX Graphics libs or LVGL or just framebuffer or whatever they want. We just need to let the tools in place to let people to use.

@acassis
Copy link
Contributor

acassis commented Jun 30, 2022

@xiaoxiang781216 did you see that: https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net ?
So I'll include eventfd as another IPC supported on Linux (and NuttX)

@xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 did you see that: https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net ? So I'll include eventfd as another IPC supported on Linux (and NuttX)

OpenAMP is trying add virtio support, once it's ready we plan to add virito driver to NuttX:
OpenAMP/open-amp#390

@mlyszczek
Copy link
Contributor Author

I've added lower half driver for the IPCC. The only test I've done is that I compiled it. I am going to write some tests for second cpu with stm32cube and fix any bugs that I find.

So right now I would like to only request design review - don't look for bugs.

@onegray If you can't flash cm0 with custom software then I don't think you will be able to take advantage of this driver. As Xiao said, with shared memory there is no standard (with the except of OpenAMP) and my implementation is ultra simple (4 bytes of length + data[]) but still custom one, and that most likely won't work with stm complex communication.

For lora driver I plan to write BSD socket interface on top of that IPCC and use Lora software/api provided by stm32cube.

@onegray
Copy link
Contributor

onegray commented Jul 5, 2022

Yes, for stm32wb we have to follow the CPU2 firmware logic which is quite specific. The communication for each IPCC channel is represented by async event queue shared in memory as a linked list. The channels have fixed assignment and just two of them are needed for BLE (for system commands and for BLE commands).
Fortunately we have the bt_driver_s which suits for BLE in HCI mode. So for stm32wb it looks like a way to go, similar to nrf52.

@onegray
Copy link
Contributor

onegray commented Jul 5, 2022

just two of them are needed for BLE

I was wrong on this, more IPCC channels are used for specific purpose. But it makes the communication protocol even more custom.

@acassis
Copy link
Contributor

acassis commented Jul 7, 2022

@mlyszczek I think we can keep your design as it is. "Being simple isn't banal. It's elegant" -- Leonardo Da Vinci

@acassis
Copy link
Contributor

acassis commented Jul 11, 2022

@mlyszczek could you please fix these conflicts? Suggestion: maybe you can keep it depending on CONFIG_EXPERIMENTAL before more people starting using it. @xiaoxiang781216 and @onegray do you agree to let him remove the Draft flag and keep it as Experimental for now?

@mlyszczek
Copy link
Contributor Author

@acassis I wouldn't remove the draft for now - I still want to write some tests for second CPU and confirm that the code does indeed work. When I confirm it - and fix possible bugs, I will rebase and upload new patch.

@mlyszczek mlyszczek force-pushed the ipcc-driver branch 4 times, most recently from e86cac5 to 08ed9b8 Compare July 17, 2022 23:12
@mlyszczek
Copy link
Contributor Author

Ok, this should be ready to review now. As @acassis suggested I marked this driver as EXPERIMENTAL, as I don't feel I've tested it enough. I did perform tens of thousands tests - mainly write/read with different buffer sizes (less than, equal and n-times bigger than IPCC memory can hold) and it all worked fine, but still, I would like to keep this EXPERIMENTAL.

In the next few months I will test this in real live scenarios so I will fix any lurking bugs that - well, are probably there. When I move my code to production and decide I am happy with the quality of the driver I will make a commit to remove EXPERIMENTAL.

EXPERIMENTAL flag also gives me a little room to change API/design if I need to.

@mlyszczek mlyszczek marked this pull request as ready for review July 17, 2022 23:23
@mlyszczek mlyszczek changed the title [API review request] drivers/ipcc: add Inter Processor Communication driver drivers/ipcc: add Inter Processor Communication driver Jul 17, 2022
drivers/ipcc/Make.defs Show resolved Hide resolved
drivers/ipcc/ipcc_register.c Outdated Show resolved Hide resolved
include/nuttx/ipcc.h Outdated Show resolved Hide resolved
include/nuttx/ipcc.h Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_register.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_register.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_unlink.c Show resolved Hide resolved
* more than there is in the buffer.
*/

nwritten = circbuf_read(&priv->ipcc->txbuf, txmem->data, priv->txlen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it too much stuff to do in interrupt handler? Optionally we can move something to work_queue. But I think it's ok for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circbuf is very lightweight, it does couple int comparision then call memcpy(). Is copying couple of hundreds bytes that much for interrupt handler? I guess it might be if you copy a lot of data. I guess it would be best to install DMA handler here, which I might consider if I hit bottleneck here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the high priority work queue is more suitable for this, but consider this just as an option. Also I understand it needs more efforts to implement this, so let's keep it as is.

@onegray
Copy link
Contributor

onegray commented Jul 18, 2022

Great to see the working way how the IPCC communication could be done! This is definitely can be useful in some cases, especially if we need to transfer some meaning amount of data. But also I feel the proposed implementation would not be the most efficient way for the small events oriented protocol, like some wireless are. Small packets can be transferred by simple 'place & get' logic, without streaming data via serial-like interface. I do not detract the proposed solution, but may be we could consider to call it in a more specific way, like ipcc-serial driver (or similar)?

@mlyszczek
Copy link
Contributor Author

@onegray could you please elaborate more about these performance issues?

Is this because of multiple copying? cpu2 wifi buffer -> ipcc memory -> cpu1 circ buffer -> cpu1 user space app? circ buf part can be optimized out by disabling CONFIG_IPCC_BUFFERED.

Too many interrupts? This also could be optimized by increasing IPCC memory and trigger interrupt once specific threshold of memory is occupied. This could be done with some open flags, and ioctl(TRIGGER_TX_TRANSACTION) or similar.

We could also do some black magic by exposing pointer to raw IPCC memory via ioctl() so userapp could work directly on IPCC memory without copying anything. But this is no unix way.

I wrote this driver with ease of use in mind, and character device is the easiest I could think of:) I really don't know what is the performance of the driver. I am yet to see it when testing with passing lora traffic through it. I'm fine with renaming this to ipcc-serial though.

@onegray
Copy link
Contributor

onegray commented Jul 18, 2022

Yes, I am concerning about multiple copying and then parsing back from bytes stream to a packet. But probably it is a reasonable fee for a such universal interface. Telling the truth I do not have exact feeling where is the place for a such driver. So ignore my doubts and let's go with your solution :)

drivers/ipcc/ipcc_close.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_open.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_poll.c Outdated Show resolved Hide resolved
include/nuttx/ipcc.h Outdated Show resolved Hide resolved
include/nuttx/ipcc.h Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_register.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_unlink.c Outdated Show resolved Hide resolved
include/nuttx/ipcc.h Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_register.c Outdated Show resolved Hide resolved
drivers/ipcc/ipcc_write.c Show resolved Hide resolved
@mlyszczek mlyszczek force-pushed the ipcc-driver branch 2 times, most recently from 59523cc to 1453193 Compare July 25, 2022 16:26
@acassis
Copy link
Contributor

acassis commented Jul 27, 2022

@mlyszczek I noticed you removed the Draft tag, is it done for mainline integration? BTW, did you see this project: https://github.com/nickbjohnson4224/fastpub maybe could give more some ideas/inspiration.

@mlyszczek
Copy link
Contributor Author

@acassis yes, as far as I am concerned this can be merged. It's still marked as CONFIG_EXPERIMENTAL as you, rightfully suggested, because I belive there are still bugs lurking there - which I will be fixing while using this driver in my project.

I meant for this ipcc driver to be as flexible as possible and as unixy as possible, performance was not my first goal. I belive zero copy driver can be implemented using ioctl(), but again, that was not my goal here.

If it turns out that this driver is a bottleneck in my application, I will probably investigate other options:)

This patch adds upper half driver for IPCC character driver. IPCC
is a Inter Processor Communication Controller.

Driver is still immature and not thoroughly tested, thus
CONFIG_EXPERIMENTAL is required.

Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
@xiaoxiang781216 xiaoxiang781216 merged commit e887a4a into apache:master Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants