Setup `audit` domain for tracking user/action events and metrics #628

amydevs · 2023-11-13T07:51:41Z

Specification

The audit domain of Polykey will be responsible for auditing the user behaviour of Polykey nodes.

It should expose JS-RPC server-streaming handlers that yield audit events, and can also provide summary metrics of those audit events.

The subdomains of audit should be based on the domains of Polykey itself, and the audit domain should also contain Observable properties (see #444) derived from the events dispatched by each subdomain.

The JS-RPC API should be available via JS-WS, so that it is accessible from the services like the mainnet/testnet status page (see #599).

Furthermore, this JS-RPC API should replay all accumulated state for each metric when the server-streaming call is first opened, so that if any connected service were to restart, it would be able to get all the existing metric data, much like the rxjs shareReplay function does.
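As a rough illustration of the replay behaviour, here is a minimal sketch that does not use rxjs or JS-RPC. The class name ReplayingMetricFeed and its methods are hypothetical; it only shows that a late subscriber first receives everything accumulated so far and then live values.

```typescript
// Hypothetical sketch: none of these names come from Polykey, JS-RPC or rxjs.
type Listener<T> = (value: T) => void;

class ReplayingMetricFeed<T> {
  protected history: Array<T> = [];
  protected listeners: Set<Listener<T>> = new Set();

  // Record a new value and push it to every current subscriber.
  public publish(value: T): void {
    this.history.push(value);
    this.listeners.forEach((listener) => listener(value));
  }

  // Replay everything accumulated so far, then deliver live values.
  // Returns a function that removes the listener again.
  public subscribe(listener: Listener<T>): () => void {
    this.history.forEach((value) => listener(value));
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const feed = new ReplayingMetricFeed<number>();
feed.publish(1);
feed.publish(2);
// This subscriber connects late but still sees 1 and 2 before 3.
const seen: Array<number> = [];
const unsubscribe = feed.subscribe((value) => seen.push(value));
feed.publish(3);
unsubscribe();
feed.publish(4); // not delivered, the subscriber has gone away
```

The history in this sketch grows without bound; a real implementation would presumably rebuild the replayed state from the DB rather than hold it in memory.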

Audit Events

Event Flow

Domain class instances will be injected as dependencies into the Audit domain. This means that the other domains will be able to expose any data they want to record via Events, without carrying any semantics specific to the Audit domain. The Audit domain can listen to these events and record them in the database.

Database Schema

Using js-db, there will be several levels for the Audit domain:

audit/ - The base Level
audit/topic/{topicId} - Topics Level
audit/events/{eventId} - Events level

eventIds will be made using IdSortable, so that they are completely monotonic. Furthermore, events can be accessed by iterating over a topic level (audit/topic/{topicId}), which yields multiple eventIds. Each eventId is then used to reference the event stored under audit/events/{eventId}. By doing this, events are able to be a part of multiple topics as well.

Topics can be nested, meaning that querying the topic path ['node', 'connection'] will return all audit events from its children (['node', 'connection', 'reverse'] and ['node', 'connection', 'forward']).
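The schema above can be sketched with two in-memory maps standing in for the js-db levels. Everything here is an assumption made for illustration: the numeric EventId (instead of IdSortable), the topicId encoding, and the choice to index an event under every ancestor of its topic path so that a parent query returns its children's events.

```typescript
// Hypothetical in-memory stand-in for the js-db levels; not the real schema code.
type EventId = number; // stand-in for an IdSortable id
type AuditEvent = { id: EventId; data: string };

const eventsLevel = new Map<EventId, AuditEvent>(); // audit/events/{eventId}
const topicLevel = new Map<string, Array<EventId>>(); // audit/topic/{topicId}
let nextId: EventId = 0;

// Assumed encoding of a topic path into a single topicId.
const topicId = (path: Array<string>): string => path.join('/');

// Store the event once, then index its id under each given topic path and
// under every ancestor of that path (one possible way to support nesting).
function putEvent(data: string, topicPaths: Array<Array<string>>): EventId {
  const id = nextId++;
  eventsLevel.set(id, { id, data });
  const keys = new Set<string>();
  for (const path of topicPaths) {
    for (let i = 1; i <= path.length; i++) keys.add(topicId(path.slice(0, i)));
  }
  keys.forEach((key) => {
    const indexed = topicLevel.get(key) ?? [];
    indexed.push(id);
    topicLevel.set(key, indexed);
  });
  return id;
}

// Iterate a topic level to get eventIds, then resolve each id to its event.
function getEvents(path: Array<string>): Array<AuditEvent> {
  const ids = topicLevel.get(topicId(path)) ?? [];
  return ids.map((id) => eventsLevel.get(id)!);
}

putEvent('reverse connection', [['node', 'connection', 'reverse']]);
putEvent('forward connection', [['node', 'connection', 'forward']]);
// One event that belongs to two topics at once.
putEvent('forward connection added to graph', [
  ['node', 'connection', 'forward'],
  ['node', 'graph'],
]);

const connectionIds = getEvents(['node', 'connection']).map((e) => e.id);
const graphIds = getEvents(['node', 'graph']).map((e) => e.id);
```

Because each event body is stored once and only its id is indexed, adding an event to another topic costs one extra index entry rather than a copy of the event.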

API

The basic API will use an AsyncGenerator that yields events from a specified topic:

async function* getAuditEvents(topicPath: Array<string>, options: { seek?: EventId, seekEnd?: EventId, order?: 'asc' | 'desc', limit?: number }, tran?: DBTransaction): AsyncGenerator<AuditEvent>

The options offer pagination, where the user can limit the number of audit events that the generator will yield and call the generator again with seek set to the EventId of the last element that was yielded.
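A minimal sketch of that pagination loop follows. It runs over an in-memory array rather than js-db, omits topicPath, seekEnd, order and tran, and assumes that seek is inclusive, so each later page begins with the element the previous page ended on and the caller drops it. The helper readAllPages is hypothetical.

```typescript
// Hypothetical sketch over an in-memory array; the real method reads from js-db.
type EventId = number; // stand-in for an IdSortable id
type AuditEvent = { id: EventId; data: string };

const stored: Array<AuditEvent> = [0, 1, 2, 3, 4].map((id) => ({
  id,
  data: `event ${id}`,
}));

async function* getAuditEvents(
  options: { seek?: EventId; limit?: number } = {},
): AsyncGenerator<AuditEvent> {
  const { seek = 0, limit = Infinity } = options;
  let count = 0;
  for (const event of stored) {
    if (event.id < seek) continue; // assumption: seek is inclusive
    if (count++ >= limit) return;
    yield event;
  }
}

// Collect every event id by requesting pages of at most `pageSize` events.
async function readAllPages(pageSize: number): Promise<Array<EventId>> {
  // With an inclusive seek, a page of size 1 would never make progress.
  if (pageSize < 2) throw new RangeError('pageSize must be at least 2');
  const ids: Array<EventId> = [];
  let seek: EventId | undefined;
  while (true) {
    let yielded = 0;
    let last: EventId | undefined;
    for await (const event of getAuditEvents({ seek, limit: pageSize })) {
      yielded++;
      last = event.id;
      // Skip the element that repeats the end of the previous page.
      if (event.id === seek) continue;
      ids.push(event.id);
    }
    // A short page means there is nothing left to read.
    if (yielded < pageSize || last === undefined) break;
    seek = last;
  }
  return ids;
}
```

If seek were exclusive instead, the duplicate check and the minimum page size would both be unnecessary.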

The second API method yields events live as they are being dispatched:

async function* getAuditEventsLongRunning(topicPath: Array<string>, options: { seek?: EventId, seekEnd?: EventId, limit?: number }): AsyncGenerator<AuditEvent>

The parameters are different because new events beyond what is currently stored in the DB can only be iterated in chronologically ascending order. Furthermore, as this method requires multiple DB transaction snapshots, there is no point in the caller passing in a transaction to operate on. Note that generator.return() or generator.throw() must be called on the returned AsyncGenerator when either seekEnd or limit is not specified, as this call will run indefinitely until either throw or return is called, seekEnd or limit is reached, or audit.stop({ force: true }) is called.
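To illustrate why the consumer has to end the iteration, here is a minimal sketch of a long-running generator. The in-memory eventLog, the wake list, appendEvent and the cleanedUp flag are all hypothetical stand-ins; the real method works on DB snapshots and Polykey events. Breaking out of a for await loop calls generator.return() implicitly, which is what lets the finally block run.

```typescript
// Hypothetical sketch: an in-memory log and a waiter list stand in for the DB.
type EventId = number; // here the id doubles as the index into eventLog
type AuditEvent = { id: EventId; data: string };

const eventLog: Array<AuditEvent> = [];
let wake: Array<() => void> = [];
let cleanedUp = false;

// Append an event and wake every generator waiting for new data.
function appendEvent(data: string): void {
  eventLog.push({ id: eventLog.length, data });
  const waiting = wake;
  wake = [];
  for (const resolve of waiting) resolve();
}

async function* getAuditEventsLongRunning(
  options: { seek?: EventId; limit?: number } = {},
): AsyncGenerator<AuditEvent> {
  let next = options.seek ?? 0;
  let remaining = options.limit ?? Infinity;
  try {
    while (remaining > 0) {
      if (next < eventLog.length) {
        remaining--;
        yield eventLog[next++];
      } else {
        // Nothing new yet: wait until appendEvent wakes this generator.
        await new Promise<void>((resolve) => wake.push(resolve));
      }
    }
  } finally {
    // Runs when limit is reached or the consumer calls return() or throw().
    cleanedUp = true;
  }
}

async function firstThree(): Promise<Array<string>> {
  const seen: Array<string> = [];
  appendEvent('a');
  // More events arrive while the consumer is already iterating.
  setTimeout(() => {
    appendEvent('b');
    appendEvent('c');
    appendEvent('d');
  }, 0);
  for await (const event of getAuditEventsLongRunning()) {
    seen.push(event.data);
    if (seen.length === 3) break; // break makes for-await call return()
  }
  return seen;
}
```

Without the break (or an explicit return() or throw()), the loop in firstThree would wait forever for a fifth event once the log is exhausted.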

Metrics

Metrics will need to be specc'd out further. Currently, metrics are indexed by a MetricPath, similar to AuditEvents. However, they are not stored in the DB, but are instead derived from data within the DB.

API

The basic API returns a metric based on a topicPath and accepts seek and seekEnd to restrict the metric results to a specific timeframe. Metrics will have to be implemented on a case-by-case basis.

async function getAuditMetric(topicPath: Array<string>, options: { seek?: EventId, seekEnd?: EventId }): Promise<AuditMetric>
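As one example of a metric derived from stored events rather than stored itself, the sketch below counts connection events inside a range. The ConnectionEvent and ConnectionMetric shapes, the event type names and the function getConnectionMetric are hypothetical, since the real AuditEvent and AuditMetric types are not given here. The range is assumed to be half-open: seek inclusive, seekEnd exclusive.

```typescript
// Hypothetical event and metric shapes, invented for this sketch.
type EventId = number; // stand-in for an IdSortable id
type ConnectionEvent = { id: EventId; type: 'connect' | 'disconnect' };
type ConnectionMetric = { opened: number; closed: number; net: number };

// Stand-in for events already recorded under a connection topic.
const connectionEvents: Array<ConnectionEvent> = [
  { id: 0, type: 'connect' },
  { id: 1, type: 'connect' },
  { id: 2, type: 'disconnect' },
  { id: 3, type: 'connect' },
  { id: 4, type: 'disconnect' },
];

// Derive the metric on demand from the events inside [seek, seekEnd).
async function getConnectionMetric(
  options: { seek?: EventId; seekEnd?: EventId } = {},
): Promise<ConnectionMetric> {
  const { seek = 0, seekEnd = Infinity } = options;
  const metric: ConnectionMetric = { opened: 0, closed: 0, net: 0 };
  for (const event of connectionEvents) {
    if (event.id < seek || event.id >= seekEnd) continue;
    if (event.type === 'connect') metric.opened++;
    else metric.closed++;
  }
  // Net change in open connections over the selected range.
  metric.net = metric.opened - metric.closed;
  return metric;
}
```

With no range given, net equals the number of currently active connections; with a range, it is only the change within that window.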

Possibly Relevant Metrics

Some Specific Metrics Include:

Active Connections
Node Graph Size
Gestalt Graph Size
Sigchain Claims
Geographical Data
etc.

Additional context

Tasks

Expose metrics from Polykey domains as events
Use Observables to convert EventTargets to Streams
Convert Observable Streams to WebStreams/AsyncIterables for Usage with JS-RPC.
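The last two tasks can be approximated without Observables by adapting an EventTarget directly into an AsyncGenerator. The helper eventsOf below is a hypothetical sketch, not the planned implementation. It only assumes the standard global EventTarget and Event classes.

```typescript
// Hypothetical helper: adapts events of one type into an async iterable.
async function* eventsOf(
  target: EventTarget,
  type: string,
): AsyncGenerator<Event> {
  const queue: Array<Event> = [];
  let notify: (() => void) | undefined;
  const listener = (event: Event): void => {
    queue.push(event);
    notify?.();
  };
  // The listener is only attached once iteration starts, so events
  // dispatched before the first next() call are not seen.
  target.addEventListener(type, listener);
  try {
    while (true) {
      // Drain everything that has been buffered so far.
      while (queue.length > 0) yield queue.shift()!;
      // Then wait for the listener to signal a new event.
      await new Promise<void>((resolve) => {
        notify = resolve;
      });
      notify = undefined;
    }
  } finally {
    // Runs when the consumer stops iterating.
    target.removeEventListener(type, listener);
  }
}

async function collectTwo(): Promise<Array<string>> {
  const target = new EventTarget();
  const types: Array<string> = [];
  setTimeout(() => {
    target.dispatchEvent(new Event('connection'));
    target.dispatchEvent(new Event('ignored'));
    target.dispatchEvent(new Event('connection'));
  }, 0);
  for await (const event of eventsOf(target, 'connection')) {
    types.push(event.type);
    if (types.length === 2) break; // detaches the listener via finally
  }
  return types;
}
```

The buffer in this sketch is unbounded, so a slow consumer would accumulate events; a WebStream-based version could use its queuing strategy to apply backpressure instead.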


CMCDragonkai · 2023-11-13T19:27:17Z

Some important terminology to avoid being confused here.

Log - a log is just a record of something happening - they could be operational logs, they could be debug level, they could be info level - they are done via js-logger, and we expect to print them to STDERR in real time without buffering, where they are supposed to be collected by an orchestrator for operational analytics. These do not need to represent a state change. Logs are useful to eventually form traces, which can be used for operational observability, which is useful for debugging.
Event - these are structured things that are supposed to be reacted to - we have programmatic events like js-events, which serve as the basis for Integrate Observables to coordinate push-based Dataflow (with reactive properties) #444 and the implementation of observables in the future. Usually these represent a state change, but not always.
Metrics - these are statistical summarisations. The basic metrics are:
- Counter - a number that goes upwards
- Gauge - a number that could go up and down
- Other ones here: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md

Now there are 2 kinds of things we want to observe:

Operational Observability
User/Action Observability

For this issue, the audit domain is focused on User/Action Observability. Not operational observability.

In that sense, we would want to:

Watch for Events Representing State Change
Record them into DB - thus representing the events as a log
The audit domain only reacts to events by recording them - it doesn't do anything logically
The audit domain can update metrics as new events come into play.

As for operational logs/metrics: again, logs are not kept around, they go to STDERR. However, metrics can be kept somewhere in a separate area. There is a discussion about this here: MatrixAI/js-logger#15. It makes sense that something else should be maintaining the state of operational observability, not the application itself. That way a focused system can specialise in operational observability. Usually this means something open-telemetry based. Things like memory usage are a good place to start.

One question is whether something is operational or not. Consider tracking node connections. Is this operational or is it a user/application event? It's hard to provide a clear distinction here. For a network monitoring app, it would be part of auditing. For this it is less so. I think though that for the purposes of the testnet and mainnet dashboard, this is something we will need to track in the audit domain.

CMCDragonkai · 2023-11-13T19:36:25Z

I was thinking that one needs to be able to have a streaming query.

So in some cases you can have a fixed snapshot query which is the default case when going over a rocksdb snapshot.

In other cases you would want an asynciterable over all existing records and any new records that have entered. In this case you have an infinite iterator that never ends, unless the client decides to stop reading (by destroying it somehow).

CMCDragonkai · 2023-11-13T19:36:40Z

> In other cases you would want an asynciterable over all existing records and any new records that have entered. In this case you have an infinite iterator that never ends, unless the client decides to stop reading (by destroying it somehow).

To do this you may consider a cursor.

CMCDragonkai · 2023-11-16T02:02:42Z

CMCDragonkai · 2023-11-16T02:03:08Z

CMCDragonkai · 2023-11-16T20:00:10Z

We won't ever expect PK to have graphing libraries - definitely not in PK CLI - maybe in PK Desktop or PK Mobile - it'd have to be extremely lightweight though, we don't want to bloat it up.

But operational metrics will go to grafana.

tegefaulkes · 2023-11-16T23:42:08Z

My go-to for visualisation is https://d3js.org/. It's pretty lightweight (280kb) and only needs a canvas or svg to render.

amydevs · 2023-11-16T23:54:55Z

> In other cases you would want an asynciterable over all existing records and any new records that have entered. In this case you have an infinite iterator that never ends, unless the client decides to stop reading (by destroying it somehow).
>
> To do this you may consider a cursor.

@CMCDragonkai should the seeking with the cursor include the element with the id that you seeked?

CMCDragonkai · 2023-11-17T00:56:55Z

> > In other cases you would want an asynciterable over all existing records and any new records that have entered. In this case you have an infinite iterator that never ends, unless the client decides to stop reading (by destroying it somehow).
> >
> > To do this you may consider a cursor.
>
> @CMCDragonkai should the seeking with the cursor include the element with the id that you seeked?

Yes it's always inclusive. And if there is an "ending seek", you always do inclusive then exclusive. It's the pythonic way. Actually it's standard math notation for range selection.

CMCDragonkai · 2023-11-17T00:57:43Z

@amydevs I've added related issues #179 cause that was there, you should always do a quick search on the board. You should review that too, close it if you can incorporate its tasks/spec into here.

amydevs · 2023-11-20T06:59:52Z

getAuditEvents and getAuditEventsLongRunning need not be combined. The reason for this is that the transaction of getAuditEvents is parameterized, whilst getAuditEventsLongRunning takes multiple transaction snapshots to be able to support live data. Instead, this functionality will be combined in the associated RPC handler, where the transaction is abstracted away.

CMCDragonkai · 2023-11-20T07:19:07Z

If you are streaming the results live, shouldn't you just abstract it all in the handler and only need 1 handler?

amydevs · 2023-11-20T07:58:13Z

> If you are streaming the results live, shouldn't you just abstract it all in the handler and only need 1 handler?

yes, i've abstracted it so that only one handler is used. The one handler will appropriately switch between the long running and normal version of getAuditEvents.

amydevs · 2023-11-20T08:46:35Z

I've got the js-rpc handlers working, still need to write them into the spec.

The last thing left to do is to rework how metrics are captured with rolling averages, etc.

MatrixAI/Polykey-CLI#40 (comment)

Otherwise, this should be ready to merge after a squash.

amydevs added the development Standard development label Nov 13, 2023

CMCDragonkai changed the title ~~audit Domain~~ Setup audit domain for accumulating events and metrics Nov 13, 2023

CMCDragonkai assigned amydevs Nov 13, 2023

This was referenced Nov 14, 2023

CLI Beta Launch MatrixAI/Polykey-CLI#40

Closed

nodes status should include connection and graph stats MatrixAI/Polykey-CLI#52

Merged

amydevs mentioned this issue Nov 16, 2023

audit Domain #634

Merged

9 tasks

CMCDragonkai mentioned this issue Nov 17, 2023

Setting up diagnostics Domain for keeping track of some operational metrics #635

Open

CMCDragonkai changed the title ~~Setup audit domain for accumulating events and metrics~~ Setup audit domain for tracking user/action events and metrics Nov 17, 2023

amydevs mentioned this issue Nov 22, 2023

audit Domain Metrics should be rolling and calculated per AuditEvent insertion #636

Open

1 task

amydevs closed this as completed in #634 Nov 22, 2023

CMCDragonkai mentioned this issue May 7, 2024

Access Audit Log #179

Closed

CMCDragonkai added r&d:polykey:supporting activity Supporting core activity r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management labels Aug 13, 2024
