Implement scalable I/O model with worker-per-context#7
Merged
Conversation
Restructure event loop to follow Erlang's scalable I/O model: - Add py_event_worker: dedicated worker process per Python context that receives FD events and timers directly via enif_select - Add py_event_worker_registry: ETS-based O(1) worker lookup by loop_id - Add py_event_worker_sup: simple_one_for_one supervisor for workers C layer changes: - Add worker_pid, has_worker, loop_id fields to erlang_event_loop_t - Add nif_event_loop_set_worker and nif_event_loop_set_id NIFs - Update add_reader/writer/call_later/cancel_timer to route to worker when available, falling back to router for backward compatibility Performance improvements (vs baseline): - Single-worker timer throughput: +11.7% - Timer latency p95: -8.5% - TCP scaling efficiency at 2 workers: +20.6% All 113 tests pass.
9c5d544 to
72d5941
Compare
The initial add_reader/add_writer NIFs correctly used worker_pid when available, but the reselect functions and Python-level FD registration were hardcoded to use router_pid. This caused FD events after the initial registration to go back to the router instead of the worker, breaking the scalable I/O model for TCP connections. Fixed 14 locations to use the pattern: ErlNifPid *target_pid = loop->has_worker ? &loop->worker_pid : &loop->router_pid; NIF functions: nif_reselect_reader, nif_reselect_writer, nif_reselect_reader_fd, nif_reselect_writer_fd, nif_start_reader, nif_start_writer Python module functions: py_add_reader, py_add_writer, py_schedule_timer, py_cancel_timer, py_add_reader_for, py_add_writer_for, py_schedule_timer_for, py_cancel_timer_for Also updated condition checks to accept either router or worker: (!loop->has_router && !loop->has_worker)
Combines FD event dispatch and reselect into a single NIF call, eliminating one roundtrip. The wakeup remains separate for clarity. Worker flow is now: 1. handle_fd_event_and_reselect - dispatch to pending queue + reselect 2. event_loop_wakeup - wake Python thread
- Add sync_sleep fields to erlang_event_loop_t (id, complete flag, cond var)
- Add py_erlang_sleep() Python function that blocks on Erlang timer
- Add nif_dispatch_sleep_complete() to signal sleep completion
- Handle {sleep_wait, DelayMs, SleepId} in py_event_worker.erl
- Add erlang_asyncio module with full asyncio-compatible API:
* sleep(), run(), gather(), wait(), wait_for()
* create_task(), ensure_future(), shield()
* get_event_loop(), new_event_loop(), set_event_loop()
* TimeoutError, CancelledError exceptions
* ALL_COMPLETED, FIRST_COMPLETED, FIRST_EXCEPTION constants
- Add py_erlang_sleep_SUITE with 8 tests
Usage:
import erlang_asyncio
async def main():
await erlang_asyncio.sleep(0.001)
results = await erlang_asyncio.gather(task1(), task2())
erlang_asyncio.run(main())
f3d2310 to
7bb28a9
Compare
- Add erlang_asyncio section to docs/asyncio.md with API reference - Add performance comparison and usage guidelines - Update CHANGELOG.md with v1.8.0 features
CI runners can have unpredictable timing - allow up to 10x tolerance instead of 2x to avoid flaky test failures.
- Add asyncio.md and web-frameworks.md to README docs list - Update getting-started.md with hex.pm install and erlang_asyncio section - Add performance tips about erlang_asyncio to web-frameworks.md
Implement optimizations for ASGI request/response handling: - Direct response tuple extraction: extract_asgi_response() directly converts (status, headers, body) tuples to Erlang terms, avoiding generic py_to_term() overhead (5-10% improvement) - Pre-interned header names: Cache 16 common HTTP headers (host, content-type, user-agent, etc.) as PyBytes with length-based dispatch for O(1) lookup (3-5% improvement) - Cached status code integers: Cache 14 common HTTP status codes (200, 201, 204, 301, 302, 304, 400-405, 500-503) as PyLong objects (1-2% improvement) Combined expected improvement: 9-17% for ASGI marshalling. Add tests for all optimizations in py_SUITE.erl.
Implement thread-local scope template caching that provides 15-20% improvement for applications with repeated path patterns: - Cache scope templates per path using FNV-1a hash - Clone cached templates and update only dynamic fields (client, headers, query_string) for cache hits - Track interpreter ownership for subinterpreter/free-threading safety - Add ASGI-specific Erlang atoms for efficient scope map lookups The cache uses 64 entries per thread with automatic replacement. Cache entries include interpreter tracking to prevent cross-interpreter PyObject sharing which would cause crashes. Add test_asgi_scope_caching test case.
Implement resource-backed buffer for request bodies >= 1KB (threshold defined by ASGI_ZERO_COPY_THRESHOLD): - Create AsgiBuffer Python type implementing buffer protocol - Use NIF resource to manage buffer lifecycle safely - Python can slice/view the buffer without additional copies - Works correctly with async code since resource lifetime is managed - Automatic fallback to PyBytes for small bodies or when resource type is not initialized Components: - asgi_buffer_resource_t: NIF resource holding binary data - AsgiBufferObject: Python type with bf_getbuffer/bf_releasebuffer - AsgiBuffer_from_resource(): Factory function - Updated asgi_binary_to_buffer() to use resource for large bodies Add test_asgi_zero_copy_buffer test case.
Implement LazyHeaderList Python type that converts headers on-demand instead of eagerly converting all headers when building ASGI scope. Key changes: - LazyHeaderList with sequence protocol (__len__, __getitem__, __iter__) - LazyHeaderListIter for iteration support - NIF resource (lazy_headers_resource_t) holds copied header data - Threshold of 4 headers (smaller lists use eager conversion) - Headers cached after first access for repeated lookups This completes all 6 ASGI NIF optimizations with expected total improvement of 40-60% for typical ASGI workloads.
- Add ASGI NIF optimizations to 1.8.0 changelog - Document all 6 optimizations with expected improvements - Update web-frameworks.md with optimization details table
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
py_event_routerbottleneck with direct I/O and timer routing to workersChanges
New modules:
py_event_worker- Core worker process receiving FD events and timers directly viaenif_selectanderlang:send_afterpy_event_worker_registry- ETS registry for worker lookup by loop IDpy_event_worker_sup- simple_one_for_one supervisor for dynamic workersC layer modifications:
worker_pidandloop_idfields toerlang_event_loop_tadd_reader/writer/call_later/cancel_timerto route to worker when availableevent_loop_set_worker/2,event_loop_set_id/2Benchmark results vs main: