mgmtd, grpc: add Get(CONFIG), Execute and Subscribe support #22158

Draft
lamestllama wants to merge 1 commit into FRRouting:master from lamestllama:grpc-subscribe

lamestllama commented Jun 2, 2026

Summary

Add gRPC Get(CONFIG), Execute and Subscribe support through mgmtd.

Get(CONFIG) reads from mgmtd's running datastore when the gRPC module is
loaded into mgmtd, so the mgmtd gRPC endpoint returns central configuration.

Execute uses mgmtd's backend RPC transaction path, so a request received on
the mgmtd gRPC endpoint can reach the daemon that registered the RPC. One
example is ripd's /frr-ripd:clear-rip-route.

Subscribe adds a server-streaming path for notifications and operational
state:

  • ON_CHANGE registers notification selectors through mgmtd.
  • STREAM sends the current operational state, then a sync_response marker,
    and then stays registered for matching notifications.
  • SAMPLE reads operational state at the requested interval.
  • Heartbeats can be requested for otherwise quiet streams.
  • Slow consumers are closed with OUT_OF_RANGE once their pending response
    queue reaches the configured limit.
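
The slow-consumer rule in the last bullet can be illustrated with a minimal sketch. It is not the PR's code: `PendingQueue`, `StreamStatus` and the choice to close on the first enqueue attempted against a full queue are assumptions made for illustration. Only the OUT_OF_RANGE status and the idea of a configured limit come from the description above.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical status values; only OUT_OF_RANGE is named in the PR text.
enum class StreamStatus { OK, OUT_OF_RANGE };

// Hypothetical per-stream queue of responses waiting to be written.
class PendingQueue {
public:
	explicit PendingQueue(std::size_t limit) : limit_(limit) {}

	// Queue one response. If the queue already holds `limit_` entries,
	// mark the stream closed, drop what is pending and report
	// OUT_OF_RANGE. (Assumed trigger point, not taken from the PR.)
	StreamStatus enqueue(std::string response)
	{
		if (closed_)
			return StreamStatus::OUT_OF_RANGE;
		if (pending_.size() >= limit_) {
			closed_ = true;
			pending_.clear();
			return StreamStatus::OUT_OF_RANGE;
		}
		pending_.push_back(std::move(response));
		return StreamStatus::OK;
	}

	// Called when the transport has finished writing the front entry.
	void write_done()
	{
		if (!pending_.empty())
			pending_.pop_front();
	}

	bool closed() const { return closed_; }
	std::size_t size() const { return pending_.size(); }

private:
	std::size_t limit_;
	bool closed_ = false;
	std::deque<std::string> pending_;
};
```

A consumer that keeps up drains the queue through `write_done()` and never reaches the limit. One that stalls gets a single terminal status instead of unbounded buffering.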

The daemon side stays generic: daemons continue to register RPCs and
notifications with mgmtd, and gRPC becomes another frontend for that existing
machinery.

The subscription path also handles stream cancellation and shutdown explicitly.
Notification delivery re-checks the live subscription under the RPC state lock
before writing to the stream. During shutdown, subscription state is detached
from the stream and CQ tags are left to process teardown rather than being
reclaimed while notification callbacks may still be unwinding.
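
The "re-check under the lock before writing" rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `RpcState`, `Subscription`, `deliver()` and `detach()` are hypothetical names, and a vector stands in for the gRPC stream.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical subscription record.
struct Subscription {
	bool cancelled = false;
	std::vector<std::string> written; // stands in for the gRPC stream
};

// Hypothetical per-RPC state guarded by one lock.
struct RpcState {
	std::mutex lock;
	std::shared_ptr<Subscription> sub; // reset once the stream is detached

	// Deliver one notification. The subscription is looked up again
	// under the lock, so a stream that was cancelled or detached between
	// the notification being raised and this call is never written to.
	bool deliver(const std::string &payload)
	{
		std::lock_guard<std::mutex> guard(lock);
		if (!sub || sub->cancelled)
			return false;
		sub->written.push_back(payload);
		return true;
	}

	// Detach the subscription, e.g. on cancellation or shutdown.
	void detach()
	{
		std::lock_guard<std::mutex> guard(lock);
		if (sub)
			sub->cancelled = true;
		sub.reset();
	}
};
```

The point of the pattern is that the liveness check and the write happen in the same critical section, so teardown cannot slip in between them.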

Why

Daemon-owned RPCs and notifications were already visible to mgmtd, but the
mgmtd gRPC endpoint could not read mgmtd's central running configuration,
dispatch Execute requests to backend daemons or deliver daemon notifications
to gRPC subscribers.

This puts gRPC on the same frontend/backend path as the rest of mgmtd.

Testing

Added topotests for:

  • Get(CONFIG) through mgmtd, including repeated subtree reads and root
    datastore reads
  • Execute dispatch to ripd using /frr-ripd:clear-rip-route
  • RIP authentication notifications delivered through Subscribe ON_CHANGE
  • selector matching and non-overlap
  • invalid selector rejection
  • STREAM initial state and sync_response
  • SAMPLE periodic reads
  • heartbeats on quiet streams
  • bounded pending queues for slow consumers

Tests for the ospfd and ospf6d daemons are also available. They will work
with OSPF once #22058, or its successor, is merged.

Related work

#20883 also implements Get(CONFIG), using a direct mgmtd-specific path from
lib/northbound_grpc.cpp into mgmtd internals.

That path includes mgmtd private headers, declares extern struct
mgmt_master *mm, calls mgmtd helper code from the generic gRPC module,
serialises the datastore subtree as LYB, and parses it back into a libyang
tree for the usual gRPC JSON or XML output path.

This PR keeps the generic gRPC module daemon-neutral. It asks libfrr's
northbound dispatcher for configuration data, and mgmtd provides the registered
implementation when the module is loaded there. The client-visible result is
similar, but the ownership boundary is clearer:

northbound_grpc.cpp
-> libfrr northbound dispatcher
-> mgmtd registered implementation
-> mgmtd running datastore
-> copied libyang tree
-> normal gRPC JSON or XML output
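
The ownership boundary above amounts to a registered-callback indirection. The sketch below shows the shape of that pattern only. `Dispatcher`, `ConfigGetCb` and `register_config_get` are hypothetical names rather than libfrr's API, and the `-EOPNOTSUPP` return for an unregistered callback follows the review summary's description of lib/northbound.c.

```cpp
#include <cerrno>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback type: fill `out` with the configuration found at
// `xpath` and return 0 on success.
using ConfigGetCb =
	std::function<int(const std::string &xpath, std::string &out)>;

// Hypothetical stand-in for the northbound dispatcher. The generic module
// only calls config_get(); whichever daemon loads the module decides
// whether to register an implementation.
class Dispatcher {
public:
	void register_config_get(ConfigGetCb cb)
	{
		config_get_ = std::move(cb);
	}

	// Returns -EOPNOTSUPP when nothing has been registered, otherwise
	// forwards to the registered implementation.
	int config_get(const std::string &xpath, std::string &out) const
	{
		if (!config_get_)
			return -EOPNOTSUPP;
		return config_get_(xpath, out);
	}

private:
	ConfigGetCb config_get_;
};
```

With this shape the generic module never includes the owner's private headers. It sees only the callback signature and the sentinel.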

frrbot (bot) added the mgmt FRR Management Infra label Jun 2, 2026
lamestllama changed the title from "mgmtd, grpc: add Execute and Subscribe support" to "mgmtd, grpc: add Get, Execute and Subscribe support" Jun 2, 2026
lamestllama changed the title from "mgmtd, grpc: add Get, Execute and Subscribe support" to "mgmtd, grpc: add Get(CONFIG), Execute and Subscribe support" Jun 2, 2026
lamestllama marked this pull request as ready for review June 2, 2026 11:46
greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR wires up Get(CONFIG), Execute, and Subscribe gRPC RPCs through mgmtd's existing frontend/backend machinery. Get(CONFIG) delegates to a registered nb_config_get_dispatch_cb; Execute uses the mgmtd RPC transaction path asynchronously; Subscribe adds server-streaming support for ON_CHANGE, STREAM, and SAMPLE modes with slow-consumer back-pressure and heartbeats.

  • New mgmt_grpc.c: Registers mgmtd as the northbound dispatcher, bridging gRPC Execute requests into mgmt_txn_send_rpc_notify and config-get into mgmtd's running datastore.
  • lib/northbound_grpc.cpp: Major rework adds SubscribeRpcState (server-streaming), ExecuteRpcState (async), and get_dnode_config dispatch; fixes the ok=false CQ break-on-first-error behaviour that was killing the whole gRPC service on any single stream cancellation.
  • mgmtd/mgmt_fe_adapter.c: Adds mgmt_fe_notify_sub to let non-socket subscribers (gRPC) receive notifications through the same ns_string selector tree used by native frontend clients.

Confidence Score: 4/5

The core Execute and Get(CONFIG) paths through mgmtd are straightforward and well-tested; normal Subscribe operation is functionally correct but stream teardown paths have edge cases in timer events that remain unaddressed from prior review rounds.

The synchronous and async RPC dispatch in mgmt_grpc.c is clean and refcount-correct. The CQ loop fix no longer kills the whole service on a single stream cancellation. The main risk area is SubscribeRpcState: previously flagged leaks when finish_from_event_thread is called from a timer without setting state=FINISH remain open. The heartbeat timer reschedule after slow-consumer close reported here is minor. The overall design is sound and the topotests cover the main modes well.

lib/northbound_grpc.cpp — specifically the finish_from_event_thread / timer-event paths in SubscribeRpcState where state=FINISH is never set, leaving the tag unreachable by the CQ delete branch.

Important Files Changed

| Filename | Overview |
| --- | --- |
| lib/northbound_grpc.cpp | Major rework: adds SubscribeRpcState (streaming), ExecuteRpcState (async), and per-path config dispatch. Several previously-flagged lifetime/leak issues remain unaddressed; heartbeat timer reschedules unnecessarily after slow-consumer close. |
| mgmtd/mgmt_grpc.c | New file bridging gRPC Execute/config-get into mgmtd's transaction machinery. Synchronous and async dispatch paths are cleanly separated; refcount protocol around mgmt_grpc_rpc_req is correct. |
| mgmtd/mgmt_fe_adapter.c | Adds mgmt_fe_notify_sub for non-socket (gRPC) subscribers. fe_notify_sub_lookup uses a linear list scan per notification; data pointer from assure_notify_msg_cache is not checked for NULL before being decoded. |
| lib/northbound.c | Adds optional global dispatch hooks for RPC, async RPC, config-get, and notification subscribe/unsubscribe. Clean indirection layer with -EOPNOTSUPP sentinels for unregistered callbacks. |
| mgmtd/mgmt_txn.c | Adds mgmt_txn_rpc_done_cb hook to txn_req_rpc and mgmt_txn_send_rpc_notify(); also fixes missing req->error assignment in two error paths before error-string is set. |
| grpc/frr-northbound.proto | Adds Subscribe RPC (server-streaming) with SubscribeRequest/SubscribeResponse, SyncResponse, and Heartbeat messages. SubscribeResponse.update uses DataTree which has no path field. |

Sequence Diagram

sequenceDiagram
    participant C as gRPC Client
    participant CQ as gRPC CQ Thread
    participant MT as Main Thread
    participant MGMT as mgmtd backend

    Note over C,MGMT: Subscribe (ON_CHANGE/STREAM/SAMPLE)
    C->>CQ: SubscribeRequest
    CQ->>MT: c_callback (run_mainthread)
    MT->>MT: validate selectors
    MT->>MT: nb_notification_data_subscribe()
    MT-->>CQ: "state=MORE, do_request() new listener"
    opt STREAM mode
        MT->>MT: "enqueue_state_snapshot(sync=true)"
        MT->>C: "SubscribeResponse{update}..."
        MT->>C: "SubscribeResponse{sync_response}"
    end
    opt SAMPLE mode
        MT->>MT: schedule_sample_timer()
        loop every sample_interval_ms
            MT->>MT: enqueue_state_snapshot()
            MT->>C: "SubscribeResponse{update}"
        end
    end
    Note over MT,MGMT: Notification delivery
    MGMT->>MT: mgmt_fe_adapter_send_notify()
    MT->>MT: grpc_notification_data_dispatch()
    MT->>CQ: async_responder.Write()
    CQ->>C: "SubscribeResponse{update}"
    Note over C,CQ: Client cancel / slow consumer
    C->>CQ: "cancel stream (ok=false)"
    CQ->>MT: subscribe_cq_error_event (deregister sub)
    MT->>MGMT: nb_notification_data_unsubscribe()

    Note over C,MGMT: Execute RPC
    C->>CQ: ExecuteRequest
    CQ->>MT: c_callback (run_mainthread)
    MT->>MT: execute_prepare_input()
    MT->>MT: nb_rpc_dispatch_async()
    MT->>MGMT: mgmt_txn_send_rpc_notify()
    MGMT-->>MT: mgmt_grpc_rpc_done()
    MT->>CQ: responder.Finish()
    CQ->>C: ExecuteResponse
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
lib/northbound_grpc.cpp:1045-1048
Heartbeat timer is rescheduled unconditionally even when `sub->cancelled = true`. This happens when `close_subscription` is called (slow-consumer path) with a write already in flight: the write hasn't completed yet so `deregister_subscription` hasn't run, `sub` is still non-NULL, but `sub->cancelled` is already true. The timer will keep firing at `heartbeat_interval_ms` cadence doing nothing useful until the pending write completes and `run_mainthread` finally calls `deregister_subscription`. Adding a `cancelled` guard matches the equivalent check in `sample_timer_event` and stops the redundant firings.

```suggestion
void SubscribeRpcState::schedule_heartbeat_timer(void)
{
	if (!sub || !sub->heartbeat_interval_ms || sub->cancelled)
		return;
```

### Issue 2 of 3
lib/northbound_grpc.cpp:820-821
**Notification xpath discarded in every SubscribeResponse**

`grpc_notification_data_dispatch` silently ignores the `xpath` parameter on every call, so `SubscribeResponse.update` (a bare `DataTree`) carries no path context. A client subscribing to multiple selectors — e.g., `/frr-ripd:authentication-failure` and `/frr-ospfd:neighbor-state-change` — must fully parse the serialised payload to determine which notification arrived. Adding the notification path to the response (either as a field in `DataTree` or in a wrapper message) would let clients dispatch without deserialising the payload.
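
One way to picture the wrapper-message option is a client that routes on the path alone. The sketch below is hypothetical: `PathTaggedUpdate` and `UpdateRouter` are illustrative names, not messages or classes from the PR or from frr-northbound.proto, and the payload is treated as an opaque string.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical wrapper: the notification path travels next to the
// serialised payload instead of being recoverable only by parsing it.
struct PathTaggedUpdate {
	std::string path;    // notification xpath
	std::string payload; // serialised data tree, left opaque here
};

// Hypothetical client-side router keyed on the notification path.
class UpdateRouter {
public:
	using Handler = std::function<void(const PathTaggedUpdate &)>;

	void on(const std::string &path, Handler handler)
	{
		handlers_[path] = std::move(handler);
	}

	// Returns true if a handler was registered for the update's path.
	// The payload is never inspected to make that decision.
	bool route(const PathTaggedUpdate &update) const
	{
		auto it = handlers_.find(update.path);
		if (it == handlers_.end())
			return false;
		it->second(update);
		return true;
	}

private:
	std::map<std::string, Handler> handlers_;
};
```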

### Issue 3 of 3
mgmtd/mgmt_fe_adapter.c:273-280
**`fe_notify_sub_lookup` is O(n) per notification**

Every call to `mgmt_fe_adapter_send_notify` iterates over `fe_notify_subs` once per matched session ID. With `k` gRPC subscriptions and `m` matched session IDs per notification the loop runs `k * m` times. For deployments with many concurrent Subscribe streams this adds measurable overhead to every notification dispatch. A hash table or `struct hash` keyed on `session_id` (as used elsewhere in this file for `mgmt_fe_sessions`) would give O(1) lookup and match the existing pattern.
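
A sketch of the keyed lookup this issue asks for, using `std::unordered_map` purely for illustration. FRR's C code would use its own hash facilities instead, and `NotifySub` and `NotifySubTable` are hypothetical names.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical subscriber record keyed by frontend session ID.
struct NotifySub {
	uint64_t session_id;
	void *ctx; // opaque per-subscriber context
};

// Hypothetical table replacing a linear list scan with a hash lookup.
class NotifySubTable {
public:
	// Returns false if the session ID is already present.
	bool add(uint64_t session_id, void *ctx)
	{
		return subs_.emplace(session_id, NotifySub{session_id, ctx})
			.second;
	}

	// Returns true if an entry was removed.
	bool remove(uint64_t session_id)
	{
		return subs_.erase(session_id) > 0;
	}

	// Average-case constant-time lookup; nullptr when not subscribed.
	const NotifySub *lookup(uint64_t session_id) const
	{
		auto it = subs_.find(session_id);
		return it == subs_.end() ? nullptr : &it->second;
	}

private:
	std::unordered_map<uint64_t, NotifySub> subs_;
};
```

With `m` matched session IDs per notification, dispatch then costs roughly `m` lookups on average, independent of the number of subscriptions `k`.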

Reviews (7): Last reviewed commit: "mgmtd, grpc: add Get, Execute and Subscr..."

lamestllama force-pushed the grpc-subscribe branch 2 times, most recently from 5f0f078 to 5f1acef June 2, 2026 12:48
lamestllama (Author)

@greptile[bot] review

When the gRPC module is loaded in mgmtd, route gRPC Get(CONFIG) requests through mgmtd's running datastore so the mgmtd gRPC endpoint returns central configuration rather than daemon-local process config.

When the gRPC module is loaded in mgmtd, route gRPC Execute requests through the backend RPC transaction machinery so daemon-owned YANG RPCs can be reached from the mgmtd gRPC endpoint.

Add the Subscribe RPC wire shape and implement ON_CHANGE notification delivery through mgmtd's frontend selector tree. STREAM sends local operational-state snapshots before registering for matching notifications, SAMPLE performs periodic local state reads, and quiet streams can emit optional heartbeats.

Keep Subscribe streams bounded by closing slow consumers with OUT_OF_RANGE, and clean up subscriptions on normal FINISH as well as cancellation. Add focused topotests for mgmtd Get(CONFIG), mgmtd Execute dispatch, RIPD notification delivery, selector matching, validation errors, heartbeats, STREAM snapshots, SAMPLE periodic reads and Subscribe back-pressure.

Signed-off-by: Eric Parsonage <eric@eparsonage.com>

lamestllama marked this pull request as draft June 2, 2026 15:11
Labels

master mgmt FRR Management Infra size/XXL