-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers/ipcc: add Inter Processor Communication driver #6537
Conversation
What's the hardware infrastructure plan to work with the IPCC driver model? |
I apologize but I am not really sure I understand the question. I need IPCC to communicate 2 CPUs on stm32wl5. It's a trivial controller, software is responsible to reserve memory, and hardware simply send iterrupts when software says "I copied data from read buffer - you can put new data there" or "I left you something in a memory, go check it out". I did not work with other inter processor communication controllers but I image it will not be that different and I tried to keep this upper half as generic as possible. |
Ok, stm32wl5 doesn't create a very special hardware for IPC(just share memory plus software interrupt). |
That OpenAMP looks promising and nice but I have some concerns. Mainly about memory. It seems quite complex (and definately way more complex than simple character device stream) so I suspect it requires quite a lot of memory (for an mcu)? Looking around I noticed that it requires at least one extra thread (for dispatching) that by default uses 4kB of ram. That is a lot considering that this will have to work on two of the CPUs. Not to mention extra code size (that I did not confirm in any way). ST promotes openamp yes, but I've seen it's on their stm32mp* processors which have arm A7 and M4 - so there is plenty of RAM. On stm32wl5 you have 64kB, and I did not see way to enable openamp in their stm32cube. I want to use this to communicate via LORA and draw stuff on LCD - this will take it's own share of memory, so I don't want to add extra features to the plate (even tough they seem nice) as I already am a little afraid of exhausing RAM. So do you know how much extra memory would openamp take (ram/flash)? |
The major memory consumption come from tx/rx virtio queue, here is a demo config from sim:
There is an option to reuse the work queue for dispatching:
All service is disabled by default, so the major overhead come from:
My rough estimation: RAM ~2KB, Flash ~20KB |
Yeah, so I've compiled stub code and my flash increased by 18kB. It will probably take similar ammount of memory to compile on second CPU and I don't suppose there is some trickery to share flash code between CPUs ;) And I don't know what else will have to be compiled for another CPU to support OpenAMP there, so it might be another extra tens of kB of flash. And my project requires substential ammount of flash as it is so adding around 40kB just for IPCC is hard to swallow. So I've decieded that I will continue with that simple character device stream. The question is, are you going to accept this into upstream or should I not even bother creating pull request for this and just keep it internally? |
Just to note, as for STM32WB (which uses same IPCC hardware) it is not possible to run user code on CPU2, only vendor encrypted binaries. For BLE we have |
Yes, the second cpu need consume the same size of flash
Yes, OpenAMP just give you a choice to see whether it can save your development effort and reuse other's work. |
Linux contain several subsystem for IPC: Item 1 and 2 directly map to the hardware block, and easy to define the driver interface |
people can define many different and incompatible protocol on top of share memory. It's very hard to reuse the definition or code on the different hardware or even the same hardware but different firmware. |
@xiaoxiang781216 @mlyszczek I think it is good to have a light IPC on NuttX and of course the fact of having it doesn't mean people cannot use rpmsg or openamp (or their both integration). For some people having more options could be confusing, but I think it is a good thing, it means flexibility. Same way people can use NuttX native NX Graphics libs or LVGL or just framebuffer or whatever they want. We just need to let the tools in place to let people to use. |
@xiaoxiang781216 did you see that: https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net ? |
OpenAMP is trying add virtio support, once it's ready we plan to add virito driver to NuttX: |
I've added lower half driver for the IPCC. The only test I've done is that I compiled it. I am going to write some tests for second cpu with stm32cube and fix any bugs that I find. So right now I would like to only request design review - don't look for bugs. @onegray If you can't flash cm0 with custom software then I don't think you will be able to take advantage of this driver. As Xiao said, with shared memory there is no standard (with the except of OpenAMP) and my implementation is ultra simple (4 bytes of length + data[]) but still custom one, and that most likely won't work with stm complex communication. For lora driver I plan to write BSD socket interface on top of that IPCC and use Lora software/api provided by stm32cube. |
Yes, for stm32wb we have to follow the CPU2 firmware logic which is quite specific. The communication for each IPCC channel is represented by async event queue shared in memory as a linked list. The channels have fixed assignment and just two of them are needed for BLE (for system commands and for BLE commands). |
I was wrong on this, more IPCC channels are used for specific purpose. But it makes the communication protocol even more custom. |
@mlyszczek I think we can keep your design as it is. "Being simple isn't banal. It's elegant" -- Leonardo Da Vinci |
@mlyszczek could you please fix these conflicts? Suggestion: maybe you can keep it depending on CONFIG_EXPERIMENTAL before more people starting using it. @xiaoxiang781216 and @onegray do you agree to let him remove the Draft flag and keep it as Experimental for now? |
@acassis I wouldn't remove the draft for now - I still want to write some tests for second CPU and confirm that the code does indeed work. When I confirm it - and fix possible bugs, I will rebase and upload new patch. |
e86cac5
to
08ed9b8
Compare
Ok, this should be ready to review now. As @acassis suggested I marked this driver as EXPERIMENTAL, as I don't feel I've tested it enough. I did perform tens of thousands tests - mainly write/read with different buffer sizes (less than, equal and n-times bigger than IPCC memory can hold) and it all worked fine, but still, I would like to keep this EXPERIMENTAL. In the next few months I will test this in real live scenarios so I will fix any lurking bugs that - well, are probably there. When I move my code to production and decide I am happy with the quality of the driver I will make a commit to remove EXPERIMENTAL. EXPERIMENTAL flag also gives me a little room to change API/design if I need to. |
* more than there is in the buffer. | ||
*/ | ||
|
||
nwritten = circbuf_read(&priv->ipcc->txbuf, txmem->data, priv->txlen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it too much stuff to do in interrupt handler? Optionally we can move something to work_queue
. But I think it's ok for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Circbuf is very lightweight, it does couple int comparision then call memcpy(). Is copying couple of hundreds bytes that much for interrupt handler? I guess it might be if you copy a lot of data. I guess it would be best to install DMA handler here, which I might consider if I hit bottleneck here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the high priority work queue is more suitable for this, but consider this just as an option. Also I understand it needs more efforts to implement this, so let's keep it as is.
Great to see the working way how the IPCC communication could be done! This is definitely can be useful in some cases, especially if we need to transfer some meaning amount of data. But also I feel the proposed implementation would not be the most efficient way for the small events oriented protocol, like some wireless are. Small packets can be transferred by simple 'place & get' logic, without streaming data via serial-like interface. I do not detract the proposed solution, but may be we could consider to call it in a more specific way, like ipcc-serial driver (or similar)? |
@onegray could you please elaborate more about these performance issues? Is this because of multiple copying? cpu2 wifi buffer -> ipcc memory -> cpu1 circ buffer -> cpu1 user space app? circ buf part can be optimized out by disabling CONFIG_IPCC_BUFFERED. Too many interrupts? This also could be optimized by increasing IPCC memory and trigger interrupt once specific threshold of memory is occupied. This could be done with some open flags, and ioctl(TRIGGER_TX_TRANSACTION) or similar. We could also do some black magic by exposing pointer to raw IPCC memory via ioctl() so userapp could work directly on IPCC memory without copying anything. But this is no unix way. I wrote this driver with ease of use in mind, and character device is the easiest I could think of:) I really don't know what is the performance of the driver. I am yet to see it when testing with passing lora traffic through it. I'm fine with renaming this to ipcc-serial though. |
Yes, I am concerning about multiple copying and then parsing back from bytes stream to a packet. But probably it is a reasonable fee for a such universal interface. Telling the truth I do not have exact feeling where is the place for a such driver. So ignore my doubts and let's go with your solution :) |
59523cc
to
1453193
Compare
@mlyszczek I noticed you removed the Draft tag, is it done for mainline integration? BTW, did you see this project: https://github.com/nickbjohnson4224/fastpub maybe could give more some ideas/inspiration. |
@acassis yes, as far as I am concerned this can be merged. It's still marked as CONFIG_EXPERIMENTAL as you, rightfully suggested, because I belive there are still bugs lurking there - which I will be fixing while using this driver in my project. I meant for this ipcc driver to be as flexible as possible and as unixy as possible, performance was not my first goal. I belive zero copy driver can be implemented using ioctl(), but again, that was not my goal here. If it turns out that this driver is a bottleneck in my application, I will probably investigate other options:) |
This patch adds upper half driver for IPCC character driver. IPCC is a Inter Processor Communication Controller. Driver is still immature and not thoroughly tested, thus CONFIG_EXPERIMENTAL is required. Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
Signed-off-by: Michał Łyszczek <michal.lyszczek@bofc.pl>
This is proposal of new driver for IPCC (inter processor communication controller).
This is still work in progress but I would like to request small review now. Mainly I would like you to review interfaces as this is common code for all architectures.
The only test I did was to properly compile it.
I will be implementing lower half of the driver now, and it would be cool to have at least interfaces reviewed before I do all the stuff and will have to change half of the code :)
I've implemented IPCC as character device, very similar to ordinary UART with separate tx and rx (full duplex). It's interrupt driven - lower half driver will clear/fill buffers (if buffering is enabled) and notify blocked/polling threads. It should be enough to simply call ipcc_register(0); to register single full duplex /dev/ipcc0 device. Then simple open/read/write to get/send data.
It's a lower half driver responsibility to prepare (reserve) memory for IPCC. arch/ part will also have to take IPCC into account when passing heap memory for OS - reserved memory should not be exposed to OS as it's shared between CPUs which can have different OSes on them. Lower layer will also be responsible to send message in proper format, like
struct { uint32_t buflen; unsigned char buf[] };