Setup `audit` domain for tracking user/action events and metrics #628
Some important terminology to avoid being confused here.
Now there are 2 kinds of things we want to observe:
For this issue, the In that sense, we would want to:
As for operational logs/metrics: again, logs are not kept around; they go to STDERR. However, metrics can be kept somewhere in a separate area. There is a discussion about this here: MatrixAI/js-logger#15. It makes sense that something else should maintain the state of operational observability, not the application itself. That way a focused system can specialise in operational observability. Usually this means something OpenTelemetry-based. Things like memory usage are a good place to start. One question is whether something is operational or not. Consider tracking node connections. Is this operational, or is it a user/application event? It's hard to provide a clear distinction here. For a network monitoring app, it would be part of auditing. For this it is less so. I think, though, that for the purposes of the testnet and mainnet dashboard, this is something we will need to track in the
I was thinking that one needs to be able to have a streaming query. In some cases you can have a fixed snapshot query, which is the default case when going over a rocksdb snapshot. In other cases you would want an AsyncIterable over all existing records and any new records that have entered. In this case you have an infinite iterator that never ends unless the client decides to stop reading (by destroying it somehow).
To do this you may consider a cursor.
We won't ever expect PK to have graphing libraries - definitely not in PK CLI - maybe in PK Desktop or PK Mobile - it'd have to be extremely lightweight though, don't want to bloat it up. But operational metrics will go to grafana.
My go-to for visualisation is https://d3js.org/. It's pretty lightweight (280kb) and only needs a canvas or svg to render.
@CMCDragonkai should the seeking with the cursor include the element with the id that you seeked? |
Yes it's always inclusive. And if there is an "ending seek", you always do inclusive then exclusive. It's the pythonic way. Actually it's standard math notation for range selection. |
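To make the inclusive/exclusive convention concrete, here is a minimal sketch of a half-open range selection. The `AuditRecord` shape, the numeric ids, and the `seekRange` helper are all hypothetical and only stand in for whatever the audit domain actually stores; the sketch only assumes that ids sort in insertion order.

```typescript
// Hypothetical record shape; ids are assumed to sort in insertion order.
type AuditRecord = { id: number; data: string };

/**
 * Selects records with `seek <= id < seekEnd`.
 * `seek` is inclusive and `seekEnd` is exclusive (a half-open interval).
 * Leaving either bound undefined leaves that side of the range open.
 */
function seekRange(
  records: ReadonlyArray<AuditRecord>,
  seek?: number,
  seekEnd?: number,
): Array<AuditRecord> {
  return records.filter(
    (r) =>
      (seek === undefined || r.id >= seek) &&
      (seekEnd === undefined || r.id < seekEnd),
  );
}

const records: Array<AuditRecord> = [
  { id: 1, data: 'a' },
  { id: 2, data: 'b' },
  { id: 3, data: 'c' },
  { id: 4, data: 'd' },
];

// Selects ids 2 and 3: the start is included, the end is not.
const page = seekRange(records, 2, 4);
console.log(page.map((r) => r.id));
```

One property of half-open ranges is that adjacent ranges compose without overlap: `[1, 3)` followed by `[3, 5)` covers every id from 1 to 4 exactly once.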
getAuditEvents and getAuditEventsLongRunning need not be combined. The reason for this is that the transaction of
If you are streaming the results live, shouldn't you just abstract it all in the handler and only need 1 handler?
Yes, I've abstracted it so that only one handler is used. The one handler will appropriately switch between the long running and normal version of
I've got the js-rpc handlers working, still need to write them into the spec. The last thing left to do is to rework how metrics are captured with rolling averages, etc. MatrixAI/Polykey-CLI#40 (comment) Otherwise, this should be ready to merge after a squash.
Specification
The `audit` domain of Polykey will be responsible for auditing the user behaviour of Polykey nodes.
It should expose `JS-RPC` server-streaming handlers that yield audit events, and can also provide summary metrics of those audit events.
The subdomains of `audit` should be based on the domains of Polykey itself, and the audit domain should also contain Observable properties (see #444) derived from the events dispatched by each subdomain.
The `JS-RPC` API should be available via `JS-WS`, so that it is accessible from services like the mainnet/testnet status page (see #599).
Furthermore, this `JS-RPC` API should replay all accumulated state for each metric upon initial opening of the server-streaming call, so that if any connected services were to restart, they would be able to get all the existing metric data, much like the rxjs shareReplay function.
Audit Events
Event Flow
Domain class instances will be injected as dependencies into the Audit domain. This means that the other domains will be able to expose any data they want to record via Events without any semantics regarding the Audit domain. The Audit domain can listen to these events and record them in the database.
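A minimal sketch of this flow, under stated assumptions: `EventedDomain` is a hand-rolled stand-in for however Polykey domains actually dispatch events, and `NodeConnectionManager`, the `connectionForward` event name, and the in-memory `recorded` array are hypothetical placeholders for a real domain, a real event, and the database.

```typescript
// Hypothetical event payload; the real Polykey event classes are not shown in this issue.
type DomainEvent = { type: string; detail: unknown };
type Listener = (event: DomainEvent) => void;

// Minimal stand-in for an evented domain class.
class EventedDomain {
  protected listeners: Map<string, Set<Listener>> = new Map();

  public addEventListener(type: string, listener: Listener): void {
    const set = this.listeners.get(type) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(type, set);
  }

  public removeEventListener(type: string, listener: Listener): void {
    this.listeners.get(type)?.delete(listener);
  }

  protected dispatchEvent(event: DomainEvent): void {
    this.listeners.get(event.type)?.forEach((listener) => listener(event));
  }
}

// Example domain: it dispatches events and knows nothing about auditing.
class NodeConnectionManager extends EventedDomain {
  public connect(remoteNodeId: string): void {
    this.dispatchEvent({ type: 'connectionForward', detail: { remoteNodeId } });
  }
}

// The audit domain receives the other domain as an injected dependency
// and decides by itself which events to record and under which topic.
class Audit {
  // Stand-in for the database.
  public readonly recorded: Array<{ topicPath: Array<string>; detail: unknown }> = [];

  protected handleConnectionForward: Listener = (event) => {
    this.recorded.push({
      topicPath: ['node', 'connection', 'forward'],
      detail: event.detail,
    });
  };

  constructor(protected nodeConnectionManager: NodeConnectionManager) {}

  public start(): void {
    this.nodeConnectionManager.addEventListener(
      'connectionForward',
      this.handleConnectionForward,
    );
  }

  public stop(): void {
    this.nodeConnectionManager.removeEventListener(
      'connectionForward',
      this.handleConnectionForward,
    );
  }
}
```

The point of the arrangement is the direction of the dependency: the audited domain only dispatches events, so it can be used and tested without an `Audit` instance existing at all.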
Database Schema
Using `js-db`, there will be several levels for the `Audit` domain:
- `audit/` - The base level
- `audit/topic/{topicId}` - Topics level
- `audit/events/{eventId}` - Events level
`eventId`s will be made using `IdSortable`, so that they are completely monotonic. Furthermore, events can be accessed by iterating over a topic level (`audit/topic/{topicId}`), which yields multiple `eventId`s. These are used to reference the events stored in `audit/events/{eventId}`. By doing this, events are able to be a part of multiple topics as well.
Topics can be nested, meaning that querying a topic path of `['node', 'connection']` will return all audit events from its children (`['node', 'connection', 'reverse']` and `['node', 'connection', 'forward']`).
API
The basic API will use an AsyncGenerator that yields events from a specified topic:
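The original signature is not reproduced here, so the following is only a hypothetical sketch of what such a generator could look like. `AuditStore`, its in-memory maps (standing in for the `audit/events/{eventId}` and `audit/topic/{topicId}` levels), the counter (standing in for `IdSortable`), and the choice to index each event under every ancestor topic are all assumptions made for illustration. The option names `seek`, `seekEnd` and `limit` come from the text; their exact types do not.

```typescript
type TopicPath = Array<string>;
type AuditEvent = { id: number; path: TopicPath; data: unknown };

class AuditStore {
  // Stand-in for the `audit/events/{eventId}` level.
  protected events: Map<number, AuditEvent> = new Map();
  // Stand-in for the `audit/topic/{topicId}` level: topic key -> ascending event ids.
  protected topics: Map<string, Array<number>> = new Map();
  // Stand-in for IdSortable: a strictly increasing counter.
  protected nextId = 1;

  public record(path: TopicPath, data: unknown): number {
    const id = this.nextId++;
    this.events.set(id, { id, path, data });
    // Assumption: index the event under its topic and every ancestor topic,
    // so that querying a parent path also returns events from its children.
    // Joining with '/' is a simplification that assumes no '/' in topic names.
    for (let i = 1; i <= path.length; i++) {
      const key = path.slice(0, i).join('/');
      const ids = this.topics.get(key) ?? [];
      ids.push(id);
      this.topics.set(key, ids);
    }
    return id;
  }

  public async *getAuditEvents(
    path: TopicPath,
    options: { seek?: number; seekEnd?: number; limit?: number } = {},
  ): AsyncGenerator<AuditEvent, void, void> {
    const { seek, seekEnd, limit } = options;
    let yielded = 0;
    for (const id of this.topics.get(path.join('/')) ?? []) {
      if (seek !== undefined && id < seek) continue; // `seek` is inclusive
      if (seekEnd !== undefined && id >= seekEnd) break; // `seekEnd` is exclusive
      if (limit !== undefined && yielded >= limit) break;
      // The topic level only holds ids; the event itself lives in the events level.
      yield this.events.get(id)!;
      yielded++;
    }
  }
}
```

In this sketch `seek` is treated as inclusive, following the convention discussed earlier in the thread. A caller that pages by passing the last yielded id back in as `seek` would therefore see that event again as the first element of the next page and would need to skip it.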
The options offer pagination, where the user can limit the number of audit events that the generator will yield and call the generator again with `seek` set to the EventId of the last element that was yielded.
The second API method yields events live as they are being dispatched:
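The original signature is not reproduced here either. As a rough sketch of the behaviour only, the hypothetical `LiveAuditLog` below keeps events in an array instead of the DB and wakes waiting generators whenever a new event is recorded; the method name `getAuditEventsLongRunning` is taken from the discussion above, while everything else (the class, the waiter set, the option types) is an assumption.

```typescript
type LiveEvent = { id: number; data: string };

class LiveAuditLog {
  protected stored: Array<LiveEvent> = [];
  protected waiters: Set<() => void> = new Set();
  protected nextId = 1;
  protected stopped = false;

  public record(data: string): void {
    this.stored.push({ id: this.nextId++, data });
    this.wake();
  }

  // Ends all running generators once they have drained what is stored.
  public stop(): void {
    this.stopped = true;
    this.wake();
  }

  protected wake(): void {
    const waiters = Array.from(this.waiters);
    this.waiters.clear();
    waiters.forEach((resolve) => resolve());
  }

  public async *getAuditEventsLongRunning(
    options: { seek?: number; seekEnd?: number; limit?: number } = {},
  ): AsyncGenerator<LiveEvent, void, void> {
    const { seek, seekEnd, limit } = options;
    let index = 0;
    let yielded = 0;
    while (true) {
      // Drain everything stored so far, always in ascending order.
      while (index < this.stored.length) {
        if (limit !== undefined && yielded >= limit) return;
        const event = this.stored[index++];
        if (seek !== undefined && event.id < seek) continue;
        if (seekEnd !== undefined && event.id >= seekEnd) return;
        yield event;
        yielded++;
      }
      if (this.stopped || (limit !== undefined && yielded >= limit)) return;
      // Nothing left to read: wait until record() or stop() wakes us.
      await new Promise<void>((resolve) => {
        this.waiters.add(resolve);
      });
    }
  }
}
```

With neither `seekEnd` nor `limit` set, a consumer ends the stream by leaving its `for await` loop with `break`, which calls `return()` on the generator.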
The reason the parameters are different is that iteration over new events, beyond what is currently stored within the DB, cannot be in any order other than chronologically ascending. Furthermore, as this method requires multiple DB transaction snapshots, there is no point in the caller passing in a transaction to operate on. Note that `generator.return()` or `generator.throw()` must be called on the returned `AsyncGenerator` when either `seekEnd` or `limit` is not specified, as this call will run indefinitely until either `throw` or `return` is called, `seekEnd` or `limit` is reached, or `audit.stop({ force: true })` is called.
Metrics
Metrics will need to be specc'd out further. Currently, metrics are indexed by a `MetricPath`, similar to `AuditEvents`. However, they are not stored in the DB, but rather derived from data within the DB.
API
The basic API returns a metric based on a `topicPath` and allows for the input of `seek` and `seekEnd` to specify a timeframe for the metric results. Metrics will have to be implemented on a case-by-case basis.
Possibly Relevant Metrics
Some Specific Metrics Include:
Additional context
- `NodeID` has changed. #386
- `nodes status` command should include connection and graph stats Polykey-CLI#36
Tasks
`JS-RPC`.