
Pagination Deployment to Service Handlers and CLI Commands and Domain Collections #237

Open
CMCDragonkai opened this issue Sep 2, 2021 · 20 comments
Labels
design Requires design development Standard development enhancement New feature or request r&d:polykey:supporting activity Supporting core activity

Comments

@CMCDragonkai
Member

CMCDragonkai commented Sep 2, 2021

Specification

Pagination is the process by which a client program can acquire a subset of data from a larger stream of data.

Polykey maintains potentially large streams of data:

  • Vaults
  • Sigchain
  • Nodes Database
  • Gestalts

At the moment, all of this data is either returned in one unary call, which returns an in-memory deserialised array of data, or it is returned as a stream. This creates a problem when the amount of data is large, or when you want to go to a specific point in the stream without having to stream from the beginning again.

The standard for doing this is "pagination". Pagination uses a "cursor" to index into a larger dataset. The two main forms of pagination are:

  • Cursor - use an ordered key and a limit
  • Offset - start at a numeric offset

Of the two, cursor pagination is the simpler and more flexible form, and it fits our use case quite well.

In addition to this, one can combine cursors with streaming to return a stream of results based on the cursor. The only difference at that point is holding locks open while you are streaming versus accumulating the results in memory and returning them.

In the case of returning a static in-memory result, you free up locks sooner but use more memory. In the case of returning a stream, you may use less memory, but it ends up being more complicated and locks are held for longer. Since our usage of leveldb streams makes use of leveldb snapshots, this may hide some of the complexity.

As a first-stage prototype, let's add pagination to all unary calls and return static result arrays to be used by the CLI/GUI. Later we can explore using streaming.

We've built a pagination protocol before: https://github.com/MatrixAI/js-pagination. That library is intended to be used on the client, but it describes what you might expect the server to take. It would mean GRPC methods will need:

  • direction
  • seek
  • limit
  • seekBefore
  • seekAfter

The last two may not be necessary, as they only introduce additional flexibility.

We've done this before on the Prism project, so there are some fiddly things here to note that require further discussion.

Another point is that streaming may be more useful for "reactivity", or observational APIs. PK isn't configured to push events out anywhere. If we intend to do some CQRS to the GUI to maintain eventual consistency, we may need to figure out whether we designate streams as "live" events, so that downstream UIs can react to changes in state. See: https://stackoverflow.com/questions/39439653/events-vs-streams-vs-observables-vs-async-iterators/47214496

We want to apply the following parameters to any generator method. This should be the standard we use to allow pagination on streams. This can be applied to any GRPC streams as well.

    {
      order = 'asc',
      seek,
      limit
    }: {
      order?: 'asc' | 'desc';
      seek?: ClaimId;
      limit?: number;
    } = {},
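
A minimal sketch of a generator taking these parameters, with `ClaimIterator`, `ClaimId` and `Claim` as hypothetical stand-ins for the real leveldb-backed types (not the actual Polykey implementation):

    type ClaimId = string;
    type Claim = { id: ClaimId; payload: string };

    interface ClaimIterator {
      iterator(options: {
        gte?: ClaimId;
        lte?: ClaimId;
        reverse: boolean;
      }): AsyncIterable<[ClaimId, Claim]>;
    }

    async function* getClaims(
      db: ClaimIterator,
      {
        order = 'asc',
        seek,
        limit,
      }: {
        order?: 'asc' | 'desc';
        seek?: ClaimId;
        limit?: number;
      } = {},
    ): AsyncGenerator<Claim> {
      // Ascending order seeks forward from `seek`; descending seeks backward.
      const options =
        order === 'asc'
          ? { gte: seek, reverse: false }
          : { lte: seek, reverse: true };
      let count = 0;
      for await (const [, claim] of db.iterator(options)) {
        // Stop once the limit is reached; no limit means exhaust the range.
        if (limit != null && count >= limit) return;
        yield claim;
        count++;
      }
    }
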
Service handlers

GRPC service handlers for RPC calls that provide streaming need to support this pagination. The pagination parameters need to be supplied either as part of the request message or the call metadata. These parameters are then provided to the generator we are consuming for the stream.

This will need to be applied to every RPC call that returns a stream.
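
A hedged sketch of such a handler; the call and message shapes here are stand-ins for the generated GRPC types, and `sigchain.getClaims` is assumed to be the paginated generator described under Domain Collections:

    declare const sigchain: {
      getClaims(params: {
        order?: 'asc' | 'desc';
        seek?: string;
        limit?: number;
      }): AsyncGenerator<{ id: string; payload: string }>;
    };

    type ClaimsGetRequest = { order: string; seek: string; limit: number };
    type ClaimMessage = { id: string; payload: string };

    interface ServerWritableCall {
      request: ClaimsGetRequest;
      write(msg: ClaimMessage): void;
      end(): void;
    }

    async function claimsGet(call: ServerWritableCall): Promise<void> {
      const { order, seek, limit } = call.request;
      // Translate wire-level defaults ('' and 0) back into undefined.
      for await (const claim of sigchain.getClaims({
        order: order === 'desc' ? 'desc' : 'asc',
        seek: seek !== '' ? seek : undefined,
        limit: limit > 0 ? limit : undefined,
      })) {
        call.write({ id: claim.id, payload: claim.payload });
      }
      call.end();
    }
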

CLI Commands

Some CLI commands output a list as a result. We need to apply pagination here as well. The CLI command will need parameters for seek, order and limit as specified above. Using these parameters, it should be simple to make the GRPC call.
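
A sketch of how those flags might be declared, assuming a commander-style CLI; the command name and option wiring here are illustrative, not the actual Polykey CLI code:

    import { Command } from 'commander';

    const vaultsList = new Command('list')
      .option('--seek <id>', 'cursor key to start from')
      .option('--order <order>', 'asc or desc', 'asc')
      .option('--limit <n>', 'maximum number of results', (v) => parseInt(v, 10))
      .action(async (options) => {
        // options.seek, options.order and options.limit map directly
        // onto the pagination parameters of the GRPC call.
        console.log(options.seek, options.order, options.limit);
      });
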

Domain Collections

In a few domains we provide a getXs method that returns a generator for a collection. Examples of this are sigchain.getClaims and gestaltGraph.getGestalts. These will need to take the pagination parameters as specified. This will be the basis for the other two sections. Reference the sigchain's implementation for how this is done.

    {
      order = 'asc',
      seek,
      limit
    }: {
      order?: 'asc' | 'desc';
      seek?: ClaimId;
      limit?: number;
    } = {},
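
As a hypothetical usage example, with `lastClaimId` standing in for a previously saved cursor:

    declare const sigchain: {
      getClaims(params?: {
        order?: 'asc' | 'desc';
        seek?: string;
        limit?: number;
      }): AsyncGenerator<{ id: string; payload: string }>;
    };
    declare const lastClaimId: string;

    // Resume from the saved cursor and take the next page of 10 claims.
    for await (const claim of sigchain.getClaims({
      seek: lastClaimId,
      limit: 10,
      order: 'asc',
    })) {
      console.log(claim);
    }
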

Additional context

Tasks

  1. - Identify the GRPC methods requiring pagination; these are the method calls that return lists of results
  2. - Identify which can be streams, and which should just return static arrays
  3. - Identify which streams, if any, should be part of "reactive" APIs
  4. - Change the protobuf message types to incorporate the pagination parameters
  5. - Update GRPC service handlers to make use of the pagination parameters in a common way; create utility functions to be shared between all of them
  6. - Update tests to cover the pagination mechanism; create common utility functions to test all paginated APIs
  7. - CLI code may not need to make use of pagination and may just ask for the entire stream of data; CLI workflows rarely have pagination built in, so we don't need it there
  8. - GUI code does make use of pagination; it is a more common use case in GUIs, so this should be considered
  9. - The pagination parameters described above need to be applied to all relevant iterators across all domains
@CMCDragonkai CMCDragonkai added development Standard development design Requires design enhancement New feature or request labels Sep 2, 2021
@CMCDragonkai
Member Author

When working on pagination in Prism we discovered one UI/UX issue.

Noticed a bug in pagination. Actually, I think I saw this before. The problem is that you're on the "first page", but the direction is still false. So if you change the limit, it changes the limit but doesn't reset the direction. Now I remember that the reset button was intended for this. But this is a bit unintuitive behaviour regarding the cursor. It might be a good idea to visualise the cursor direction so that users know to control it.

With cursor pagination, direction is essential for being able to "go back" in pages. If the pagination only allows going forward, that's fine, but that usually is not the full functionality expected of user interfaces. Once you have the seek key, if you want to paginate backwards, it can only be done by flipping direction to false. But if you do this, it can get complicated when other parameters are being changed, such as the limit.

So there has to be a "redesign" of the pagination UX, at least as expected from the library (maybe after a review of CQRS as well); there is still too much logic being done on the client side. More logic needs to be incorporated into js-pagination, with examples of how it's done with the VueX store and dealing with GC as well.

Possible issues with dealing with the last page/end of the paginated stream should be considered as well.

@CMCDragonkai CMCDragonkai mentioned this issue Oct 18, 2021
@CMCDragonkai
Member Author

This should be reviewed with respect to push-pull dataflow and control flow.

Push vs Pull

Push and pull are the two paradigms of "reactivity" (https://en.wikipedia.org/wiki/Reactive_programming).

These concepts are applicable widely in many scenarios.

  • Configuration Management
    • Push configuration like Ansible, which pushes desired state to a target system
    • Pull configuration like Chef or agentful systems that pull desired state
    • Always push then pull: pulling requires bootstrapping off a pushed configuration first
    • Target system is not aware of origin system (origin system is where the origin of change begins)
  • Pagination vs Streaming
    • Pagination is pulling
    • Streaming is pushing
    • Hybrid systems require a client to initiate a pull, which can establish a push channel for the server to stream results.
  • Amphibious Operations
    • Initial beachhead is a push
    • Subsequent logistics is pull
  • Polykey Integration
    • Pushing capabilities require the pusher to have knowledge of the target system, and how to push - PK integrates into target systems
    • Pulling capabilities require the target system to know how to pull from PK - target systems "integrate into PK"
  • Framework vs Library
    • You call the Library
    • You get called by the Framework
    • The framework can push into your system, and the framework may pull from your system; either way, the framework dictates your API
    • Your system can push into the library, and you may pull from the library; you dictate the library's API
  • Dependency Graphs
    • These are pull based systems
    • Downstream dependencies pull upstream dependencies
    • Dataflow is from upstream to downstream
    • Control flow is from downstream to upstream

You want both depending on the circumstance, and many systems are both push and pull, just in different ways.

Push and pull systems, when composed together, can form a graph. This graph does not need to be acyclic; cycles in the graph can occur. Reactive systems are ultimately something that can be cyclic. But cyclic does not imply unproductive infinite evaluation: productivity can still occur with infinite evaluation, even though complete evaluation of the graph is not possible. Fundamentally these systems are lazy and eventually consistent.

Consider two agents communicating with each other. Each agent is a state machine. Each transition of state may trigger a transition in state on the other agent. The relationship is not one way, but two ways. Even in configuration systems, real state forms feedback into desired state. Thus an iterative system occurs as long as the system is "unstable".

Stability may never be reached... it's possible divergence can occur. Managing divergence is an exercise in complexity. Think about machine learning systems: convergence and divergence. Stability may be a "process", not an end state, just like security. Perturbations occur in complex systems simply due to change and entropy.

The origin of change is an important concept. In a push interaction, the origin of change starts at the system pushing. In a pull interaction, the origin of change is still at the system being pulled from. In a configuration management system, it's the change being applied.

The initiator of the transaction is also important. This dictates which system has knowledge about the other system. This is independent of the origin of change (which indicates the direction of dataflow). The initiator of the transaction implies a "dependency" relationship in terms of integration direction.

The direction of dataflow may be opposite to the direction of dependency (data flow vs control flow).

                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Push────────► Realised State │
│               │                    │                │
└───────┬───────┘                    └────────▲───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow



                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Pull────────► Realised State │
│               │                    │                │
└───────▲───────┘                    └────────┬───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow

In push based systems, the dataflow is the direction of the pusher to the pushed.

In pull based systems, the dataflow is the direction of the pulled to the puller.

In push based systems, the control flow is the direction of the pusher to the pushed. The pusher is aware of the pushed.

In pull based systems, the control flow is the direction of the puller to the pulled. The puller is aware of the pulled.

Primitives in JS in push vs pull:

https://stackoverflow.com/questions/39439653/events-vs-streams-vs-observables-vs-async-iterators

https://github.com/kriskowal/gtor/blob/master/presentation/README.md

So all of computing is built out of reactive systems.

@tegefaulkes tegefaulkes changed the title GRPC Pagination via Cursors and Streams Pagination Deployment to Service Handlers and CLI Commands and Domain Collections Dec 6, 2022
@CMCDragonkai
Member Author

CMCDragonkai commented Dec 6, 2022

This is being pulled from #327 to here:

  1. Refactor src/agent/service/nodesChainDataGet.ts to instead be src/agent/service/sigchainClaimsGet.ts, and this should also receive pagination parameters. These include seek, limit and order. The order should be a protobuf enum: https://developers.google.com/protocol-buffers/docs/proto3#enum

@CMCDragonkai
Member Author

With the transition to JSON RPC, this is still valid. We will still be returning collection data as a stream of individual JSON messages. However, we will need to take input parameters that act as a cursor to control where the stream starts.

@CMCDragonkai
Member Author

For the input JSON request, we can reserve a meta keyword in the params property, and this can be where we put "authentication" details.

Of course, things like direction, limit and seek would just be at the root level of the params property.
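
For example, such a request might look like this (method name, token and values are placeholders, not a finalized schema):

    // An illustrative JSONRPCRequest shape; `meta` is reserved for
    // authentication details, pagination fields sit at the root of params.
    const request = {
      jsonrpc: '2.0',
      method: 'vaultsList',
      id: 1,
      params: {
        meta: { authorization: 'Bearer <token>' },
        direction: 'asc',
        seek: '<vaultId>',
        limit: 10,
      },
    };
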

@CMCDragonkai
Member Author

When we move from GRPC to JSON RPC, we want to have seek, limit and order as parameters on the JSONRPCRequest object, in the params subobject.

These will translate directly into the server streaming handlers, which will just hand them over to an async generator method.

@CMCDragonkai
Member Author

On the caller side, these parameters should be passable from the CLI parameters.

So for example:

    pk vaults list --seek <vaultId> --limit 10 --order asc

So technically our CLI doesn't really do much here. It doesn't become that useful until you get the GUI ready.

Normally...

    pk vaults list

Will just stream the entire collection fully.

On the calling side, if it calls a server streaming method, it should use the output formatter at each iteration; it shouldn't accumulate all the data and then output it. This is what will enable the CLI itself to be streamable.
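
A sketch of that calling pattern; `listVaults` and `formatVault` are hypothetical stand-ins for the client streaming call and the output formatter:

    declare function listVaults(params?: {
      seek?: string;
      limit?: number;
      order?: 'asc' | 'desc';
    }): AsyncIterable<{ id: string; name: string }>;

    function formatVault(vault: { id: string; name: string }): string {
      return `${vault.id}\t${vault.name}`;
    }

    async function main(): Promise<void> {
      // Write each result as it arrives; never accumulate the full list.
      for await (const vault of listVaults({ order: 'asc' })) {
        process.stdout.write(formatVault(vault) + '\n');
      }
    }
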

@CMCDragonkai
Member Author

@tegefaulkes I remember we discussed this, especially in reference to the changes you're doing for deadlines. Did you add the pagination capability to the stream handlers on the server side? And is all that's left to propagate the --seek, --limit and --order parameters from the CLI to the client calls?

@tegefaulkes
Contributor

I don't think I've made changes for this yet.

@CMCDragonkai
Member Author

Neither the server nor client side?

@tegefaulkes
Contributor

I recall looking it over and seeing the generators implementing the seeking behaviour. Right now I don't recall if it's standard across the board. I don't think all the bin commands where seeking applies have all of the seeking options right now.

@CMCDragonkai
Member Author

All bin commands have seeking? But do all client handlers have the relevant seeking parameters?

@tegefaulkes
Contributor

I don't recall at this time. It's something I'll have to check.

@CMCDragonkai
Member Author

Check all CLI commands for --seek, --limit and --direction.

Check all JSON RPC handlers for seek, limit and direction.

Apply them to the generator code.

@tegefaulkes
Contributor

The nodesClaimsGet handler has seeking disabled currently. It needs to be re-enabled as part of this issue.

@CMCDragonkai
Member Author

@addievo, review this too.

@CMCDragonkai
Member Author

CMCDragonkai commented Nov 14, 2023

In the context of the audit domain, this will be important.

You start with a DB transactional snapshot iterator. That becomes an AsyncIterable through AsyncGenerator syntax, and then at the client service it becomes a server streaming call.

#599 - the dashboard backend may use js-rpc and js-ws to make a server streaming call to the seed cluster agents.

When doing so, it will need to provide some parameters to control the result.

Normally the result is finite. You can control the finiteness using pagination parameters as expressed above.

@amydevs - there are no client service handlers that currently behave as per the OP spec.

We can start with the audit domain to do so.

If you do a while loop where you continuously make the server streaming call to get new results, while preserving a cursor, that is equivalent to having an infinite iterator. That is one way to get live updates for #599. This is still a pull-based architecture.
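
A sketch of that pull-based loop, assuming a hypothetical `listAuditEvents` client call whose `seek` is exclusive (the stream resumes after the given id):

    declare function listAuditEvents(params: {
      seek?: string;
      limit?: number;
    }): AsyncIterable<{ id: string }>;

    async function* liveAuditEvents(): AsyncGenerator<{ id: string }> {
      let seek: string | undefined;
      while (true) {
        for await (const event of listAuditEvents({ seek, limit: 100 })) {
          seek = event.id; // Carry the cursor forward across calls.
          yield event;
        }
        // Finite stream exhausted; back off before polling again.
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
    }
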

Alternatively there could be a server streaming call that is always alive, and it is then the handler side's responsibility to push data into the call. The client is then pulling forever. The stream would only close if the client decides to close it.

A dashboard service singleton could do both.

If we do both, there should be a standard way of distinguishing between these kinds of server stream calls.

  • getNodes() - default to finite
  • getNodesInfinite() - default to infinite?

Another way is to provide a parameter that distinguishes the two.

    {
      order = 'asc',
      seek,
      limit
    }: {
      order?: 'asc' | 'desc';
      seek?: ClaimId;
      limit?: number;
    } = {},

Imagine:

  • limit = 10 - this is always finite
  • limit = undefined - ?
  • limit = Infinity - ? - the problem with this is that it doesn't exist in JSON; however, it does serialise to null
  • limit = 0 - this is still finite
  • limit = -1 - ?

Therefore we could do something like:

  • limit = 10 - that is finite
  • limit = undefined - that is whatever the handler decides it is
  • limit = null - this means Infinity; at this point we could choose to have an infinite server stream. You could have the audit handler facilitate this internally; if the handler cannot support infinite streams, it can treat this as equivalent to undefined

Also, by default I prefer undefined to mean a result set of 1, rather than 0.
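
A sketch of that interpretation on the handler side (the mapping is a proposal, not settled behaviour):

    function resolveLimit(limit?: number | null): number {
      if (limit === null) return Infinity; // JSON null: infinite stream, if the handler supports it
      if (limit === undefined) return 1; // handler default; proposed to be 1 rather than 0
      return limit; // explicit finite limit; 0 yields nothing
    }
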

@CMCDragonkai
Member Author

Also, if you want to limit by a seek, you could add one more parameter, seekLimit, which represents the end of the seek range.

It is mutually exclusive with limit, so you can only use one of them at a time.
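
A sketch of range-bounded iteration with the proposed seekLimit, reusing hypothetical stand-in types for the leveldb iterator:

    type ClaimId = string;
    type Claim = { id: ClaimId; payload: string };

    declare const db: {
      iterator(options: {
        gte?: ClaimId;
        reverse: boolean;
      }): AsyncIterable<[ClaimId, Claim]>;
    };

    // Iterate from `seek` up to and including `seekLimit`; mutually
    // exclusive with a numeric `limit`.
    async function* getClaimsRange({
      seek,
      seekLimit,
    }: {
      seek?: ClaimId;
      seekLimit?: ClaimId;
    } = {}): AsyncGenerator<Claim> {
      for await (const [id, claim] of db.iterator({ gte: seek, reverse: false })) {
        if (seekLimit !== undefined && id > seekLimit) return;
        yield claim;
      }
    }
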

@CMCDragonkai CMCDragonkai assigned okneigres and amydevs and unassigned tegefaulkes Nov 14, 2023
@CMCDragonkai
Member Author

We can start this issue in the audit domain first, but closing it will require full adoption in the client service and agent service.

@tegefaulkes
Contributor

I'm moving this to todo since it's not actively worked on.
