RFC asyncio: efficiently polling devices #2664

Open

peterhinch opened this issue Dec 6, 2016 · 21 comments

@peterhinch
Contributor

A common situation when interfacing hardware is a device which is normally not ready and needs to be polled. One solution is along these lines:

async def poll(device):
    while True:
        if device.ready():       # poll the hardware
            device.handle_data()
        yield                    # let other coros run

This works but has a high worst-case latency. If there are N coroutines, each with a maximum latency of T between yield statements, an interval of up to NT will exist between calls to device.ready(). This can be reduced to T if there is a way to delegate the polling to the scheduler; in other words, to provide a facility to schedule on an event (cf. the Poller class in my scheduler).
Any comments on ways to achieve this with uasyncio? Perhaps there's a solution which I've missed.
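
To illustrate the sort of facility I mean, a purely hypothetical sketch (wait_for_event does not exist in uasyncio; the name and semantics are invented here):

# Hypothetical API sketch only: wait_for_event is not part of uasyncio.
# The idea is that the scheduler itself calls device.ready() on every pass,
# so the worst-case latency between polls is T rather than N*T.
async def poll(device):
    while True:
        await wait_for_event(device.ready)  # resume only when ready() is True
        device.handle_data()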

To anyone interested I've ported my classes for debounced switches and pushbuttons to asyncio and plan to extend this to other drivers. Preliminary code and docs here.

@dpgeorge
Member

dpgeorge commented Dec 7, 2016

This can be reduced to T if there is a way to delegate the polling to the scheduler.

This is indeed how the EpollEventLoop works: in the case where there is no work to do (no coros are active) this scheduler calls select.poll() to wait on all the IO objects (i.e. file descriptors) and then executes those that have an event, such as ready for reading.

To apply this to other hardware devices, those devices would need to be pollable by the select module. E.g. UART and USB_VCP on the pyboard are pollable. One would need to make it so that user-defined classes can be polled.
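
For concreteness, a minimal sketch of what the event loop does internally (names simplified; the 100ms timeout is arbitrary):

import uselect as select
from pyb import UART

uart = UART(4, 9600)
poller = select.poll()
poller.register(uart, select.POLLIN)  # possible because UART is pollable

# Block (here for up to 100ms) until a registered object has an event; the
# event loop would then resume the coroutine waiting on each ready object.
for res in poller.poll(100):          # list of (obj, eventmask, ...) tuples
    if res[1] & select.POLLIN:
        data = res[0].read()          # guaranteed not to block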

@peterhinch
Contributor Author

That would be a great enhancement. Is it a major task?

@dpgeorge
Member

dpgeorge commented Dec 7, 2016

No, it's not too difficult. If your object provides an ioctl method then it can be called, similar to the way block devices work at the Python level.
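
For comparison, the block device protocol already works this way: a Python class defines readblocks, writeblocks and ioctl, and the C code in the filesystem layer calls them. A minimal sketch of a RAM-backed block device (the ioctl op codes 4 and 5, sector count and sector size, are assumed from the pyboard storage code):

class RAMBlockDev:
    def __init__(self, block_size, num_blocks):
        self.block_size = block_size
        self.data = bytearray(block_size * num_blocks)

    def readblocks(self, block_num, buf):
        for i in range(len(buf)):
            buf[i] = self.data[block_num * self.block_size + i]

    def writeblocks(self, block_num, buf):
        for i in range(len(buf)):
            self.data[block_num * self.block_size + i] = buf[i]

    def ioctl(self, op, arg):
        if op == 4:  # sector count
            return len(self.data) // self.block_size
        if op == 5:  # sector size
            return self.block_size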

@peterhinch
Contributor Author

peterhinch commented Dec 7, 2016

[edited]
Alas I can't see how select is usable in a cooperative multi-tasking environment. A typical device driver coroutine might trigger a device, then block until the device is ready, with other coroutines being scheduled while it is blocked. This implies multiple coroutines, each waiting on its own select() statement.

Looking at the code it seems to me that select() blocks with no provision for yielding to other coros. Is there a need for an asyncio-friendly variant of select() written as a generator, so coroutines can issue await uselect(args)?

Or is there another established approach? Does the __await__() special method have anything to offer?

@dpgeorge
Member

dpgeorge commented Dec 8, 2016

Looking at the code it seems to me that select() blocks with no provision for yielding to other coros. Is there a need for an asyncio friendly variant of select() written as a generator, so coroutines can issue await uselect(args)?

It works differently from this. The main event loop which schedules the coros is the only part of the code that calls select, and hence the only part that blocks. Coros that want to do IO should never block: they should only do non-blocking IO and should not call select (except with a zero timeout, i.e. a simple poll). If they need to wait, the coros should yield a special IO object (e.g. IORead); the event loop then collects all the IO operations that are waiting, configures select/poll to listen for all of them, and blocks until one or more are ready.
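
A minimal sketch of that pattern from the coro's side, assuming the old uasyncio IORead protocol (handle_data is a hypothetical application callback):

# Yielding IORead parks this coroutine until the event loop's poll reports
# the stream readable; other coros keep running meanwhile.
async def reader(stream):
    while True:
        yield asyncio.IORead(stream)  # wait here without blocking the loop
        data = stream.read()          # stream is now readable: won't block
        handle_data(data)             # hypothetical callback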

@peterhinch
Contributor Author

OK, I've found the code. A clever solution! I'll experiment.

@peterhinch reopened this Dec 9, 2016
@peterhinch
Contributor Author

I'm experiencing a couple of problems. I've written a dummy Device class using IORead whose ioctl method will currently always return 0. The class has a fileno() method, which I gather needs to return the address of the device as an integer.

Firstly, I don't know a way to do this in Python (uctypes.addressof() works only for buffers). I used assembler but a Python solution would be good.

Secondly, on the Pyboard an OSError is thrown. On Unix I tried returning an arbitrary integer from fileno(). No error was thrown, with readloop running continuously (unsurprisingly ioctl was never called). On the Pyboard the same code, returning the same integer, threw the OSError. Here is the code and the outcome on the Pyboard.

try:
    import uasyncio as asyncio
except ImportError:
    import asyncio

MP_STREAM_POLL_RD = 1
MP_STREAM_POLL = 3

# Get address of an object which doesn't support the buffer protocol.
# Implementation-detail hack: an asm_thumb function returns r0, so a body of
# nop() returns its argument unchanged - for an object, its machine address.
@micropython.asm_thumb
def addressof(r0):
    nop()


class Device(object):
    def __init__(self):
        self.ready = False

    def fileno(self):
        print(hex(addressof(self)))
        return addressof(self)

    def ioctl(self, cmd, flags):
        res = 0
        print('Got here')
        if cmd == MP_STREAM_POLL and (flags & MP_STREAM_POLL_RD):
            if self.ready:
                res = MP_STREAM_POLL_RD
        return res

    async def readloop(self):
        while True:
            print('About to yield')
            yield asyncio.IORead(self)
            print('Should never happen')

loop = asyncio.get_event_loop()
device = Device()
loop.call_soon(device.readloop())
loop.run_forever()

Outcome:

>>> import io
About to yield
0x200078b0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "io.py", line 49, in <module>
  File "/sd/lib/uasyncio/core.py", line 115, in run_forever
  File "/sd/lib/uasyncio/core.py", line 90, in run_forever
  File "/sd/lib/uasyncio/__init__.py", line 21, in add_reader
OSError: stream operation not supported
>>> 

Doubtless I'm missing something, but a pointer would be great :)

@BrendanSimon

With respect to your uasyncio examples here, you could also use uasyncio.Queue() to signal a coro that is waiting on the queue. This would be another method of addressing a race condition.

The content of the queue could contain the state change of a switch (pressed or released).
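
Something along these lines, perhaps; a sketch assuming the uasyncio queues add-on module and a hypothetical debounced Switch class with a state() method:

import uasyncio as asyncio
from uasyncio.queues import Queue  # assumes the queues module is installed

async def switch_monitor(switch, q):
    last = switch.state()            # hypothetical debounced state method
    while True:
        state = switch.state()
        if state != last:            # on each press or release...
            await q.put(state)       # ...signal the waiting coro
            last = state
        await asyncio.sleep(0.02)    # poll the switch every 20ms

async def consumer(q):
    while True:
        state = await q.get()        # only this coro waits; others keep running
        print('pressed' if state else 'released')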

The PushButton class could have the attributes pressed_coro and released_coro instead of true_coro and false_coro for improved clarity of meaning.

Any idea if there is any performance difference between using a queue between existing coros, and a coro creating other coros on the fly?

@peterhinch
Contributor Author

peterhinch commented Dec 11, 2016

Those notes are a work in progress. I'd regard the Switch, Pushbutton and Delay_ms classes as reasonably well developed, as they were ported from another scheduler, but I'll bear in mind your naming suggestion.

The notes - which I've now split out into a separate document - are probably best ignored at this stage. I have some code samples using uasyncio.Queue() in development, and I've also written micro versions of the Lock and Event primitives, all of which have application in dealing with race conditions. I'm learning the uasyncio subset as I go, so a decent tutorial with code samples is some way off. Likewise I haven't started benchmarking different approaches; the scheduler looks to have been designed with efficiency in mind so I'd expect spawning coros to have a low overhead. But in practice, as I'm sure you're aware, latency is very much dependent on application design.

I'd very much like to know how to fix this IORead stuff which has me foxed.

@BrendanSimon

BrendanSimon commented Dec 11, 2016

I found the following in py/stream.c. Does that help any?

Do you need read and write methods, as well as ioctl?

I'm not sure how a protocol is set, or how the flags fields get set in Python objects. Inherited from a base class perhaps? There must be examples of other stream objects that would give some insight.

const mp_stream_p_t *mp_get_stream_raise(mp_obj_t self_in, int flags) {
    mp_obj_type_t *type = mp_obj_get_type(self_in);
    const mp_stream_p_t *stream_p = type->protocol;
    if (stream_p == NULL
        || ((flags & MP_STREAM_OP_READ) && stream_p->read == NULL)
        || ((flags & MP_STREAM_OP_WRITE) && stream_p->write == NULL)
        || ((flags & MP_STREAM_OP_IOCTL) && stream_p->ioctl == NULL)) {
        // CPython: io.UnsupportedOperation, OSError subclass
        mp_raise_msg(&mp_type_OSError, "stream operation not supported");
    }
    return stream_p;
}

@peterhinch
Contributor Author

In essence my problem is this. I'm emulating a device driver for a read-only device. For efficiency I want to delegate polling the device to the scheduler.

@dpgeorge advised that the way to do this is to have an ioctl method which returns the state of the device. The coro in the device driver has a continuously running loop which polls the hardware. To do that efficiently it yields an IORead object. From that point the loop blocks, with the scheduler checking ioctl. The loop only unblocks when ioctl returns MP_STREAM_POLL_RD.

In order for this mechanism to work, I need a fileno method which returns the address of the object. This presents two problems, only the first of which I can work around with an assembler hack.

  1. I don't know a way in MicroPython to return the address of an object which doesn't support the buffer protocol.
  2. Registering the object raises an OSError regardless of the integer which fileno returns.

Doubtless I'm missing something simple here, but a pointer would be much appreciated :)

@peterhinch
Contributor Author

@BrendanSimon Following up your identification of the source of the error I modified mp_get_stream_raise() to produce debug info to determine the cause of the error. It's getting a null pointer.

Line 21 of uasyncio's __init__.py

self.poller.register(fd, select.POLLIN)

is getting fd == 0x2000a3b0 so the event loop's add_reader() method is retrieving the self pointer from my class. Yet mp_get_stream_raise() is seeing a null pointer.
@dpgeorge Are you able to shed any light on this?

@peterhinch
Contributor Author

I'm also making no headway with a UART, which also has a fileno() issue. The following loopback test:

import uasyncio as asyncio
from pyb import UART
uart = UART(1, 9600)

async def sender():
    swriter = asyncio.StreamWriter(uart, {})
    while True:
        await swriter.awrite('Hello uart\n')  # awrite is a coro and must be awaited
        await asyncio.sleep(2)

async def receiver():
    sreader = asyncio.StreamReader(uart)
    while True:
        res = await sreader.readline()
        print('Received', res)

loop = asyncio.get_event_loop()
loop.create_task(sender())
loop.create_task(receiver())
loop.run_forever()

fails with

>>> import auart
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "auart.py", line 21, in <module>
  File "/sd/lib/uasyncio/core.py", line 115, in run_forever
  File "/sd/lib/uasyncio/core.py", line 90, in run_forever
AttributeError: 'UART' object has no attribute 'fileno'
>>> 

Similar instantiation of StreamReader and StreamWriter objects is found in open_connection() with a socket, so I'm puzzled why a UART is failing.

@dhylands
Contributor

It looks like fileno for a socket is only implemented in unix/file.c and unix/modsocket.c

So right now (with no changes), it looks like only unix files and sockets will work with uasyncio.

@dpgeorge
Member

@dpgeorge advised that the way to do this is to have an ioctl method which returns the state of the device

Yes, but that's only the Python side of things... on the other side (in C) there needs to be a bit of glue code so that the Python ioctl method that the user defines can be accessed when a C method is expected.

But aside from that, one of the main blockers for this issue is fileno: on baremetal targets there is no such concept as a file descriptor, so either uasyncio needs to be rewritten to support file "objects" rather than descriptors, or baremetal needs to support file descriptors.

@peterhinch
Contributor Author

So I gather that on baremetal efficient polling and asynchronous UART I/O in Python are both dependent on firmware changes. Are these in the pipeline for implementation, or is it a longer term goal?

Aside from the polling issue I'm deeply impressed by the new uasyncio which has done everything I've thrown at it :)

@dpgeorge
Member

So I gather that on baremetal efficient polling and asynchronous UART I/O in Python are both dependent on firmware changes. Are these in the pipeline for implementation, or is it a longer term goal?

As part of getting uasyncio working on esp8266 (and subsequently pyboard) there will need to be changes to make this stuff work (eg for esp8266 to do concurrent socket handling).

@peterhinch
Contributor Author

With micropython/micropython-lib#164 the UART I/O is now working, but so far I've had no joy defining a Python class with an ioctl method.

there needs to be a bit of glue code so that the Python ioctl method that the user
defines can be accessed when a C method is expected.

Before spending time on this, has this been implemented yet?

@pfalcon
Contributor

pfalcon commented Apr 7, 2017

Nope. The idea of streams is that they're efficient and thus should be implemented in C. It's thus a bit of an oxymoron to want to implement them in Python, and it's unclear whether there's a real use case apart from unit testing.

@dpgeorge posted an example of different native subclassing techniques, though, using streams as an illustration: #2824

@peterhinch
Contributor Author

peterhinch commented Jun 15, 2018

Revisiting this after the welcome implementation of #3836.
[EDIT 20th June: make proposed behaviour configurable at runtime.]

An application with multiple coroutines yielding on uasyncio.wait(0) (or similar) will poll I/O once per complete cycle of scheduling: in other words I/O polling is scheduled in round-robin fashion along with the user coroutines. This proposal is for a runtime option enabling pending I/O to be scheduled as soon as possible. With the option selected, when a coroutine yields, I/O is tested and, if pending, scheduled at once.

The use of this option brings two benefits:

  1. It reduces buffering requirements for fast devices such as UARTs.
  2. User defined stream devices are serviced in the most timely feasible fashion.

There is a drawback in that raw scheduling performance is reduced where I/O is not in use. In practice a user would only specify the option if fast I/O were to be used. I have benchmarked the proposed solution in the default situation where fast I/O is not selected and no I/O device is used: the reduction in raw scheduling performance is at most 3%.

Current design

When I/O is pending, PollEventLoop.wait() uses self.call_soon() to schedule the waiting coro. This appends it to runq, causing the I/O handling to be scheduled after all other pending coros.

In EventLoop.run_forever(), self.wait(delay) is called in the outer loop when runq has completed. This is correct given the way I/O is scheduled, and minimises the overhead of scheduling runq tasks.

Proposed solution

  1. The EventLoop constructor has a fast_io=0 argument.
  2. If fast_io > 0 an I/O queue (.ioq) is instantiated of size fast_io.
  3. An EventLoop bound method ._call_now() analogous to .call_soon appends to .ioq.
  4. An EventLoop method pointer ._call_io points to .call_soon if fast_io==0, otherwise it points to ._call_now.
  5. PollEventLoop.wait() uses ._call_io() to schedule pending I/O.
  6. If .fast_io is used, EventLoop.run_forever() calls self.wait(0) in the inner loop and, if .ioq is not empty, pops the next task from .ioq in place of .runq (see the sketch below).
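
A schematic sketch of the intended control flow; this is not actual uasyncio source, and _run, delay and the deque-style queues are placeholders:

# Sketch only: schematic of the proposed scheduling logic.
def run_forever(self):
    while True:
        while self.runq or (self.fast_io and self.ioq):
            if self.fast_io:
                self.wait(0)               # poll I/O between every task
            if self.fast_io and self.ioq:
                task = self.ioq.popleft()  # pending I/O jumps the queue
            else:
                task = self.runq.popleft()
            self._run(task)                # placeholder: resume the coroutine
        self.wait(delay)                   # idle: block on poll until I/O or timer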

Performance

A useful performance metric is the rate at which, in the absence of any I/O scheduling, uasyncio can schedule multiple round-robin coroutines. This benchmark is a way to assess that. In the absence of I/O scheduling, only change (6) above should affect this measurement (and then only if .fast_io is set).

Current code schedules up to 200 coros at a rate of ~6800Hz; in other words the minimum overhead is about 146μs. With .fast_io set this increases to ~186μs, a 27% increase in overhead. These measurements were made on a Pyboard V1.1. In practice a user will only set fast_io if its behaviour is required: the tradeoff of a higher scheduling overhead is arguably inevitable in this situation.

As stated above, if fast_io is not set the effect on raw scheduling performance is a reduction of about 3%.

The benefit is that if fast_io is set, pending I/O is scheduled as soon as the scheduler is able to do so.

Potential issue

The current design's round-robin scheduling guarantees that, with properly written coroutines, an application can never block the scheduler. In the proposed solution this would remain the default.

With .fast_io set, coroutines jump the queue; this carries the risk that blocking may occur. There are two scenarios:

  1. Slow user-written I/O devices, confronted with triggers occurring at a high repetition rate, hog execution time, preventing normal coroutines from executing.
  2. Existing I/O drivers do likewise in a heavily I/O-bound system.

My view on these points is as follows:

  1. The issue can only arise if .fast_io is specified.
  2. Writing I/O drivers is an "advanced" technique whose purpose is to take advantage of the proposed priority scheme. Where performance is not paramount, devices will be polled by conventional coros. Arguably the discipline required to write I/O drivers is akin to that of writing ISRs, which also have the potential to lock up the system.
  3. An I/O-bound system probably implies multiple high baudrate UARTs running concurrently. It's not obvious (to me) whether such a system would be better served by the proposed patch or the existing code. Less frequent scheduling implies a higher coro workload.

Conclusion

This change would release the full potential of #3836 and it would go some way towards ameliorating UART buffering requirements as discussed in that PR (fixing the read/write bug won't, of course, eliminate the need for hardware buffering: this proposed "priority" patch reduces but does not eliminate scheduling latency).

I am happy to attempt to implement this (preferably after dealing with the outstanding read/write issue) if the team agrees it's beneficial. Comments on the proposed solution are, of course, welcome.

@peterhinch
Contributor Author

In the absence of any comments to this I have raised micropython-lib PR 287 implementing the above.
