mgmtd, grpc: add Get(CONFIG), Execute and Subscribe support#22158
mgmtd, grpc: add Get(CONFIG), Execute and Subscribe support#22158lamestllama wants to merge 1 commit into
Conversation
8e18117 to
190dcb2
Compare
190dcb2 to
0bdfa6b
Compare
Greptile SummaryThis PR wires up
Confidence Score: 4/5The core Execute and Get(CONFIG) paths through mgmtd are straightforward and well-tested; normal Subscribe operation is functionally correct but stream teardown paths have edge cases in timer events that remain unaddressed from prior review rounds. The synchronous and async RPC dispatch in mgmt_grpc.c is clean and refcount-correct. The CQ loop fix no longer kills the whole service on a single stream cancellation. The main risk area is SubscribeRpcState: previously flagged leaks when finish_from_event_thread is called from a timer without setting state=FINISH remain open. The heartbeat timer reschedule after slow-consumer close reported here is minor. The overall design is sound and the topotests cover the main modes well. lib/northbound_grpc.cpp — specifically the finish_from_event_thread / timer-event paths in SubscribeRpcState where state=FINISH is never set, leaving the tag unreachable by the CQ delete branch. Important Files Changed
Sequence DiagramsequenceDiagram
participant C as gRPC Client
participant CQ as gRPC CQ Thread
participant MT as Main Thread
participant MGMT as mgmtd backend
Note over C,MGMT: Subscribe (ON_CHANGE/STREAM/SAMPLE)
C->>CQ: SubscribeRequest
CQ->>MT: c_callback (run_mainthread)
MT->>MT: validate selectors
MT->>MT: nb_notification_data_subscribe()
MT-->>CQ: "state=MORE, do_request() new listener"
opt STREAM mode
MT->>MT: "enqueue_state_snapshot(sync=true)"
MT->>C: "SubscribeResponse{update}..."
MT->>C: "SubscribeResponse{sync_response}"
end
opt SAMPLE mode
MT->>MT: schedule_sample_timer()
loop every sample_interval_ms
MT->>MT: enqueue_state_snapshot()
MT->>C: "SubscribeResponse{update}"
end
end
Note over MT,MGMT: Notification delivery
MGMT->>MT: mgmt_fe_adapter_send_notify()
MT->>MT: grpc_notification_data_dispatch()
MT->>CQ: async_responder.Write()
CQ->>C: "SubscribeResponse{update}"
Note over C,CQ: Client cancel / slow consumer
C->>CQ: "cancel stream (ok=false)"
CQ->>MT: subscribe_cq_error_event (deregister sub)
MT->>MGMT: nb_notification_data_unsubscribe()
Note over C,MGMT: Execute RPC
C->>CQ: ExecuteRequest
CQ->>MT: c_callback (run_mainthread)
MT->>MT: execute_prepare_input()
MT->>MT: nb_rpc_dispatch_async()
MT->>MGMT: mgmt_txn_send_rpc_notify()
MGMT-->>MT: mgmt_grpc_rpc_done()
MT->>CQ: responder.Finish()
CQ->>C: ExecuteResponse
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
lib/northbound_grpc.cpp:1045-1048
Heartbeat timer is rescheduled unconditionally even when `sub->cancelled = true`. This happens when `close_subscription` is called (slow-consumer path) with a write already in flight: the write hasn't completed yet so `deregister_subscription` hasn't run, `sub` is still non-NULL, but `sub->cancelled` is already true. The timer will keep firing at `heartbeat_interval_ms` cadence doing nothing useful until the pending write completes and `run_mainthread` finally calls `deregister_subscription`. Adding a `cancelled` guard matches the equivalent check in `sample_timer_event` and stops the redundant firings.
```suggestion
void SubscribeRpcState::schedule_heartbeat_timer(void)
{
if (!sub || !sub->heartbeat_interval_ms || sub->cancelled)
return;
```
### Issue 2 of 3
lib/northbound_grpc.cpp:820-821
**Notification xpath discarded in every SubscribeResponse**
`grpc_notification_data_dispatch` silently ignores the `xpath` parameter on every call, so `SubscribeResponse.update` (a bare `DataTree`) carries no path context. A client subscribing to multiple selectors — e.g., `/frr-ripd:authentication-failure` and `/frr-ospfd:neighbor-state-change` — must fully parse the serialised payload to determine which notification arrived. Adding the notification path to the response (either as a field in `DataTree` or in a wrapper message) would let clients dispatch without deserialising the payload.
### Issue 3 of 3
mgmtd/mgmt_fe_adapter.c:273-280
**`fe_notify_sub_lookup` is O(n) per notification**
Every call to `mgmt_fe_adapter_send_notify` iterates over `fe_notify_subs` once per matched session ID. With `k` gRPC subscriptions and `m` matched session IDs per notification the loop runs `k * m` times. For deployments with many concurrent Subscribe streams this adds measurable overhead to every notification dispatch. A hash table or `struct hash` keyed on `session_id` (as used elsewhere in this file for `mgmt_fe_sessions`) would give O(1) lookup and match the existing pattern.
Reviews (7): Last reviewed commit: "mgmtd, grpc: add Get, Execute and Subscr..." | Re-trigger Greptile |
5f0f078 to
5f1acef
Compare
|
@greptile[bot] review |
5f1acef to
651e969
Compare
Route gRPC Get(CONFIG) requests loaded in mgmtd through mgmtd's running datastore so the mgmtd gRPC endpoint returns central configuration rather than daemon-local process config. Route gRPC Execute requests loaded in mgmtd through the backend RPC transaction machinery so daemon-owned YANG RPCs can be reached from the mgmtd gRPC endpoint. Add the Subscribe RPC wire shape and implement ON_CHANGE notification delivery through mgmtd's frontend selector tree. STREAM sends local operational-state snapshots before registering for matching notifications, SAMPLE performs periodic local state reads, and quiet streams can emit optional heartbeats. Keep Subscribe streams bounded by closing slow consumers with OUT_OF_RANGE, and clean up subscriptions on normal FINISH as well as cancellation. Add focused topotests for mgmtd Get(CONFIG), mgmtd Execute dispatch, RIPD notification delivery, selector matching, validation errors, heartbeats, STREAM snapshots, SAMPLE periodic reads and Subscribe back-pressure. Signed-off-by: Eric Parsonage <eric@eparsonage.com>
651e969 to
2b325bd
Compare
|
@greptile[bot] review |
2 similar comments
|
@greptile[bot] review |
|
@greptile[bot] review |
|
@greptile[bot] review |
1 similar comment
|
@greptile[bot] review |
Summary
Add gRPC
Get(CONFIG),ExecuteandSubscribesupport throughmgmtd.Get(CONFIG)reads from mgmtd's running datastore when the gRPC module isloaded into
mgmtd, so themgmtdgRPC endpoint returns central configuration.Executeuses mgmtd's backend RPC transaction path, so a request received onthe
mgmtdgRPC endpoint can reach the daemon that registered the RPC. Oneexample is
ripd's/frr-ripd:clear-rip-route.Subscribeadds a server-streaming path for notifications and operationalstate:
ON_CHANGEregisters notification selectors through mgmtd.STREAMsends the current operational state, then async_responsemarker,and then stays registered for matching notifications.
SAMPLEreads operational state at the requested interval.OUT_OF_RANGEonce their pending responsequeue reaches the configured limit.
The daemon side stays generic: daemons continue to register RPCs and
notifications with mgmtd, and gRPC becomes another frontend for that existing
machinery.
The subscription path also handles stream cancellation and shutdown explicitly.
Notification delivery re-checks the live subscription under the RPC state lock
before writing to the stream. During shutdown, subscription state is detached
from the stream and CQ tags are left to process teardown rather than being
reclaimed while notification callbacks may still be unwinding.
Why
Daemon-owned RPCs and notifications were already visible to mgmtd, but the
mgmtdgRPC endpoint could not read mgmtd's central running configuration,dispatch
Executerequests to backend daemons or deliver daemon notificationsto gRPC subscribers.
This puts gRPC on the same frontend/backend path as the rest of mgmtd.
Testing
Added topotests for:
Get(CONFIG)throughmgmtd, including repeated subtree reads and rootdatastore reads
Executedispatch toripdusing/frr-ripd:clear-rip-routeSubscribe ON_CHANGESTREAMinitial state andsync_responseSAMPLEperiodic readsThere are tests available for the
ospfdandospf6ddaemons that will workwith OSPF after #22058, or its successor, is merged.
Related work
#20883 also implements
Get(CONFIG), using a direct mgmtd-specific path fromlib/northbound_grpc.cppinto mgmtd internals.That path includes mgmtd private headers, declares
extern struct mgmt_master *mm, calls mgmtd helper code from the generic gRPC module, serialises thedatastore subtree as LYB, and parses it back into a libyang tree for the usual
gRPC JSON or XML output path.
This PR keeps the generic gRPC module daemon-neutral. It asks libfrr's
northbound dispatcher for configuration data, and mgmtd provides the registered
implementation when the module is loaded there. The client-visible result is
similar, but the ownership boundary is clearer:
northbound_grpc.cpp-> libfrr northbound dispatcher
-> mgmtd registered implementation
-> mgmtd running datastore
-> copied libyang tree
-> normal gRPC JSON or XML output