Discovery - revisiting Gestalt Vertices and error handling #328

emmacasolin · 2022-02-09T04:49:31Z

Specification

Unattended Discovery was implemented in #320. However, nodes/identities that have been previously discovered are currently stored in a set of visitedVertices and cannot be re-added to the queue for discovery. While we don't want to add these vertices back into the queue immediately, we need to design some sort of policy for revisiting these vertices when the discovery queue is empty (and/or after a certain amount of time has passed), so that new claims are reflected in the Gestalt Graph for existing Gestalts. This could potentially be achieved by adding TTLs or a lastVisited tag to vertices.

Note that, since child vertices are added into the queue during discovery, only one vertex (either a node or identity) per Gestalt needs to be added back into the queue for rediscovery.
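
As a rough illustration of the lastVisited idea, here is a minimal sketch, not the Polykey implementation. The names `VertexId`, `VisitRecord`, `dueForRevisit` and the `minAgeMs` threshold are all hypothetical. It only proposes vertices when the queue is empty, and it picks a single stale vertex per gestalt, since child vertices would be re-queued during discovery anyway.

```typescript
// Hypothetical types for illustration only; not Polykey's actual API.
type VertexId = string;

interface VisitRecord {
  gestaltId: string; // the gestalt this vertex belongs to
  lastVisited: number; // epoch milliseconds of the last visit
}

/**
 * Returns at most one vertex per gestalt whose last visit is at least
 * `minAgeMs` old. Returns nothing while the discovery queue has work.
 */
function dueForRevisit(
  records: Map<VertexId, VisitRecord>,
  queueLength: number,
  now: number,
  minAgeMs: number,
): VertexId[] {
  if (queueLength > 0) return [];
  // Track the stalest eligible vertex seen so far for each gestalt
  const stalest = new Map<string, { vertexId: VertexId; lastVisited: number }>();
  records.forEach((record, vertexId) => {
    if (now - record.lastVisited < minAgeMs) return;
    const current = stalest.get(record.gestaltId);
    if (current === undefined || record.lastVisited < current.lastVisited) {
      stalest.set(record.gestaltId, {
        vertexId,
        lastVisited: record.lastVisited,
      });
    }
  });
  return Array.from(stalest.values()).map((entry) => entry.vertexId);
}
```

A time-based threshold and an empty-queue check are only two of the possible triggers; the policy could equally weigh which vertices are already queued.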

Additional context

Original PR for Unattended Discovery: Growing the Gestalt Graph and Implementing Social Discovery #320
Issue for incorporating a system of priority into the queue (vertices queued for rediscovery should have the lowest priority): Generic Non-Blocking Task Management ("Queue") for discovery and nodes domains #329

Tasks

Add TTLs or a lastVisited tag to gestalt vertices
Design a policy for revisiting vertices - this should consider the current length of the queue, the vertices currently in the queue, and the amount of time since a given vertex was in the queue


CMCDragonkai · 2022-09-15T07:10:41Z

Compare this to the NodeGraph, where buckets have TTLs. The bucket TTLs are eagerly refreshed, whereas most cache TTLs are lazily refreshed.

But the nodes themselves do not have TTLs, they just have a last visited tag.

Therefore gestalt vertexes are similar to nodes in this regard. They will have last visited tags, not TTLs. This is because when the TTL expires, we don't actually want to eagerly "discover" the gestalt vertex.

The discovering protocol is not driven by TTL expiry. It should be driven by kinetic priority determined by proximity and trust and interactivity.

CMCDragonkai · 2022-09-15T07:15:40Z

A bucket, which is a group of nodes, is therefore similar to a gestalt, which is a group of gestalt vertices.

Note that, since child vertices are added into the queue during discovery, only one vertex (either a node or identity) per Gestalt needs to be added back into the queue for rediscovery.

This is an interesting comment, because that would mean each gestalt is considered uniformly.

However, unlike buckets in the node graph, gestalts themselves have a priority, and this is based on proximity.

So imagine your gestalt as the highest priority.

Then directly trusted gestalts as the second priority.

Then transitively trusted gestalts as subsequent priorities.

Therefore the discovery loop would prefer discovering your own gestalt first. This applies not only to the initial discovery order, but also to updates: it would prefer updates from your own gestalt first.

This would ensure that we get updates from our own gestalt quickly, but not really care about updates from gestalts completely unrelated to us.

CMCDragonkai · 2022-09-15T07:44:05Z

Discovery is about discovering cryptolinks between nodes and identities within a gestalt and across gestalts (based on the gestalts that we trust).

At the same time, the user will drive this by triggering a discovery on another identity.

However gestalts themselves would not have a TTL. Instead it would be the gestalt vertices that would have a last updated tag.

We will need to index by the last updated tag. So there will need to be a DB index for the last update.

When we decide to discover a gestalt, we may prefer to visit the vertices that have the oldest update tag.
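
One possible shape for such an index is sketched below, under the assumption that the index lives in an ordered key-value store where keys iterate in lexicographic order. The key layout, `lastUpdatedKey` and `oldestVertices` are hypothetical, and a plain array stands in for the DB. Zero-padding the timestamp to a fixed width makes lexicographic order match chronological order.

```typescript
// Hypothetical key layout: <16-digit zero-padded timestamp>/<vertexId>
const TIMESTAMP_WIDTH = 16; // enough digits for Number.MAX_SAFE_INTEGER

function lastUpdatedKey(timestampMs: number, vertexId: string): string {
  const padded = ('0000000000000000' + String(timestampMs)).slice(
    -TIMESTAMP_WIDTH,
  );
  return padded + '/' + vertexId;
}

/**
 * Returns the vertex IDs with the oldest update tags.
 * The array plus sort() stands in for an ordered DB iterator.
 */
function oldestVertices(indexKeys: string[], limit: number): string[] {
  return indexKeys
    .slice() // copy so the caller's array is not reordered
    .sort() // default string sort is lexicographic
    .slice(0, limit)
    .map((key) => key.slice(TIMESTAMP_WIDTH + 1)); // strip timestamp and '/'
}
```

Without the padding, a key starting with "30000" would sort before one starting with "4000", which is why the fixed width matters.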

Maybe we can use this: #329 (comment)

Where each gestalt can have a "default priority" and some gestalts have higher default priorities.

Let's imagine all the gestalts are structured like a tree.

At the root of the tree, we have our own gestalt.

Then our immediate children are the gestalts that we trust... and so on. Actually this is a DAG, not a tree, because 2 of our child gestalts may trust each other, or trust a common third party.

Now the order should then go from us to our immediate children (breadth-first). Topological sort may have some relevance here: https://en.wikipedia.org/wiki/Topological_sorting

The key point though, is that our gestalts are given a default priority. That default priority is a "weight" that decrements as the gestalt gets farther away from us. So suppose a degree N gestalt is N hops away from us, then it would have weight Max - N. Actually let's invert this number, so that 0 is the highest priority and N is some finite number away from 0. So then its weight would be N.
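
The inverted weight described here can be sketched as a breadth-first search over trust edges. This is only an illustration: `GestaltId`, the `trusts` adjacency map and `gestaltWeights` are hypothetical names, and the real trust relation would come from the ACL and Gestalt Graph rather than an in-memory map.

```typescript
// Hypothetical sketch; gestalt IDs and the trust adjacency map are assumptions.
type GestaltId = string;

/**
 * Breadth-first search from our own gestalt over "trusts" edges.
 * The result maps each reachable gestalt to its hop count N,
 * where 0 (our own gestalt) is the highest priority.
 */
function gestaltWeights(
  trusts: Map<GestaltId, GestaltId[]>,
  own: GestaltId,
): Map<GestaltId, number> {
  const weights = new Map<GestaltId, number>([[own, 0]]);
  const queue: GestaltId[] = [own];
  while (queue.length > 0) {
    const current = queue.shift() as GestaltId;
    const currentWeight = weights.get(current) as number;
    for (const child of trusts.get(current) ?? []) {
      // BFS reaches each gestalt by a shortest path first, so never overwrite
      if (!weights.has(child)) {
        weights.set(child, currentWeight + 1);
        queue.push(child);
      }
    }
  }
  return weights;
}
```

Because already-weighted gestalts are skipped, a gestalt trusted by two of our children still gets a single weight, and mutual trust between gestalts does not cause an infinite loop.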

We should factor in fairness here too... so at the very least, we will eventually get to discovering other gestalts.

But there will be a limit, and that is a computational limit.

Alternatively we do something simpler.

We only care about our own gestalt, and also the gestalts of degree 1, which are the gestalts that we are connected to.

At the same time the user may wish to crawl some random other gestalt. Then we add that into our discovery queue. At that point, we would only crawl that gestalt itself, and not its children unless prompted to crawl those child gestalts.

Therefore we don't actually auto-crawl beyond one gestalt away from us.

This means that we have a "loop" for deciding what gestalts to focus on. And a "loop" deciding what vertexes to focus on within a gestalt.

If that's the case, we can just use the static priority. Highest priority goes to any user-driven gestalt. Then to our own gestalt. Then finally to all gestalts that we trust.
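
A minimal sketch of that static ordering follows. The `GestaltClass` names, the numeric values and `orderGestaltJobs` are hypothetical; only the relative order (user-driven, then own, then trusted) comes from the comment above.

```typescript
// Hypothetical classification; the names and numeric values are assumptions.
type GestaltClass = 'userRequested' | 'own' | 'trusted';

// Lower number means higher priority
const staticPriority: Record<GestaltClass, number> = {
  userRequested: 0,
  own: 1,
  trusted: 2,
};

interface GestaltJob {
  gestaltId: string;
  kind: GestaltClass;
}

/** Orders gestalt-level jobs by static priority without mutating the input. */
function orderGestaltJobs(jobs: GestaltJob[]): GestaltJob[] {
  return jobs
    .slice()
    .sort((a, b) => staticPriority[a.kind] - staticPriority[b.kind]);
}
```

A fixed table like this avoids computing hop distances at all, at the cost of treating every trusted gestalt as equally important.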

CMCDragonkai · 2022-09-15T07:46:41Z

To decide which vertex to visit, we would choose the last updated vertex in a gestalt.

Then if we get a list of child vertices of that vertex, we would then decide based on what was the last updated.

We would prefer vertexes that have never been updated/visited.

So then it is new vertices in the order we find them, then by age of existing vertices that we already know.
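
That ordering can be expressed as a comparator. The sketch below assumes "age" means the oldest visit comes first; `KnownVertex`, `foundOrder` and `compareVertices` are hypothetical names used only for illustration.

```typescript
// Hypothetical vertex record for illustration only.
interface KnownVertex {
  vertexId: string;
  foundOrder: number; // order in which discovery first saw the vertex
  lastVisited?: number; // undefined if never visited, else epoch milliseconds
}

/**
 * Never-visited vertices come first, in the order they were found.
 * Previously visited vertices follow, oldest visit first.
 */
function compareVertices(a: KnownVertex, b: KnownVertex): number {
  const aNew = a.lastVisited === undefined;
  const bNew = b.lastVisited === undefined;
  if (aNew && bNew) return a.foundOrder - b.foundOrder;
  if (aNew) return -1;
  if (bNew) return 1;
  return (a.lastVisited as number) - (b.lastVisited as number);
}
```

Usage would be along the lines of `vertices.slice().sort(compareVertices)` when choosing what to visit next within a gestalt.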

CMCDragonkai · 2022-09-15T07:47:22Z

This means TaskPriority is only used when working between gestalts. For individual vertex jobs, they would just inherit priority.

CMCDragonkai · 2022-09-15T07:47:35Z

Finally, we can eventually add a governor for power control, but we can do that later.

CMCDragonkai · 2022-09-15T07:48:15Z

This would require a separate PR, it's an independent problem.

CMCDragonkai · 2022-11-25T07:21:43Z

@tegefaulkes this issue currently should be used to tackle the fact that Discovery as merged with #446 currently maintains a visited tag but this does not get refreshed, so we only ever crawl the GG once.

There are 2 ways to deal with this:

Add additional state to the GG, such as booleans, TTLs, or last visited tags.
Create derived state in the discovery domain.

Derived state is important here because if the GG deletes state, this would not necessarily translate to deleting state in discovery.

Currently in our KV DB, we don't really have the concept of "foreign keys". Meaning that when we create a new domain to refer to state in another domain, that state is loose. If the upstream state is mutated or deleted, this does not affect the downstream state.

The usual way we deal with this is that the downstream domain ends up encapsulating the mutation operations of the upstream domain. Like we do with GG and ACL.

However this doesn't always make sense, such as with Discovery and GG.

I don't want to implement foreign keys in our DB, that gets us too close to trying to re-implement SQL databases... and we are using rocksdb key-value for a reason.

Since we are doing application-level indexing, we might as well do application-level foreign keys.

To do this, we will need push-based dataflow to allow the discovery state to be reactive to changes in the GG state. This only makes sense to do when the changes don't need to be atomic.

So if discovery state like last visited... etc can be eventually consistent, then we can use push-dataflow to make this work.

This means this design will depend on #444.
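
To make the push-based idea concrete, here is a minimal sketch of an application-level "foreign key" maintained by notification rather than by the DB. `GestaltGraphSketch`, `DiscoverySketch` and `onVertexRemoved` are hypothetical stand-ins and do not reflect the actual GestaltGraph or Discovery classes, nor whatever #444 ends up providing. Delivery here is synchronous for brevity; an eventually consistent version would deliver the notification asynchronously.

```typescript
// Hypothetical stand-ins; not the real GestaltGraph or Discovery classes.
type VertexRemovedListener = (vertexId: string) => void;

class GestaltGraphSketch {
  private vertices = new Set<string>();
  private listeners: VertexRemovedListener[] = [];

  onVertexRemoved(listener: VertexRemovedListener): void {
    this.listeners.push(listener);
  }

  setVertex(vertexId: string): void {
    this.vertices.add(vertexId);
  }

  unsetVertex(vertexId: string): void {
    // Only notify if the vertex actually existed upstream
    if (!this.vertices.delete(vertexId)) return;
    // Push the change downstream once the upstream mutation is done
    for (const listener of this.listeners) listener(vertexId);
  }
}

class DiscoverySketch {
  // Derived state owned by the discovery domain
  readonly lastVisited = new Map<string, number>();

  constructor(graph: GestaltGraphSketch) {
    // React to upstream deletions instead of wrapping the GG's mutations
    graph.onVertexRemoved((vertexId) => {
      this.lastVisited.delete(vertexId);
    });
  }
}
```

The point of the design is that Discovery never needs to encapsulate the GG's mutation operations; it only subscribes to them.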

CMCDragonkai · 2022-11-25T07:27:21Z

This is now an epic encompassing a lot of discovery-related issues in terms of revisiting logic, error handling, and testing.

CMCDragonkai · 2022-11-25T07:28:14Z

Quoting #493 (comment)

I'm starting to see that having separate gestalts subcommands would be useful.
pk nodes add
pk nodes find
pk nodes ping
pk nodes getall
pk nodes connections
pk nodes claim         <- claim only node

pk identities authenticate
pk identities authenticated
pk identities search
pk identities claim    <- claim only identity

pk gestalts discover
pk gestalts get
pk gestalts list
pk gestalts claim      <- claim either identity/node
pk gestalts invite     <- invite node to a gestalt
pk gestalts uninvite   <- uninvite node to a gestalt
pk gestalts permissions
pk gestalts allow
pk gestalts disallow
pk gestalts trust      <- short-hand for allow for trust
pk gestalts untrust    <- short-hand for disallow for trust

There are a few ambiguities here.

The claim subcommands can be referring to node or identity or both.

The invite is like pk vaults share and uninvite is like pk vaults unshare. It's a combination of setting a permission AND sending a notification.

The trust and untrust is just a short-hand of allow and disallow

We can sort this out in the future issue.

I think the pk gestalts trust and pk gestalts untrust should be removed, it's a bit wishy washy and the generic allow is sufficient for all the kinds of permissions.

Also cross posted to MatrixAI/Polykey-CLI#6.

tegefaulkes · 2024-04-08T02:01:01Z

As of recent changes, the visitedVertices set has been replaced with persistent tracking of the last time a vertex was visited. So half of the tasks here have been addressed.

This is also an epic, so it's not technically completed until all the children are addressed as well. In any case, for this issue specifically we still require re-discovery to be implemented which is being worked on in a branch currently. Some small things still need to be solved for that.

emmacasolin added development Standard development design Requires design labels Feb 9, 2022

This was referenced Feb 9, 2022

Generic Non-Blocking Task Management ("Queue") for discovery and nodes domains #329

Closed

Growing the Gestalt Graph and Implementing Social Discovery #320

Merged

This was referenced Feb 18, 2022

Review the Identities CLI and RPC Handlers MatrixAI/Polykey-CLI#6

Open

CLI and Client & Agent Service test splitting #311

Merged

teebirdy added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 24, 2022

CMCDragonkai mentioned this issue Sep 11, 2022

Integrate TaskManager into NodeGraph and Discovery #445

Merged

15 tasks

tegefaulkes mentioned this issue Sep 15, 2022

Reduce the timeout for establishing a Node Connection within the Discovery domain (by adding timer override to NodeConnectionManager) #353

Closed

CMCDragonkai added the epic Big issue with multiple subissues label Nov 25, 2022

CMCDragonkai changed the title ~~Design a policy for revisiting Gestalt Vertices in Discovery Algorithm~~ Discovery - revisiting Gestalt Vertices and error handling Nov 25, 2022

CMCDragonkai assigned tegefaulkes Jul 10, 2023

CMCDragonkai mentioned this issue Jul 24, 2023

js-quic integration and Agent migration #525

Merged

26 tasks

CMCDragonkai assigned amydevs and unassigned tegefaulkes Nov 13, 2023

amydevs mentioned this issue Nov 13, 2023

Vault Sharing With GestaltIDs #626

Closed

This was referenced Mar 26, 2024

CLI Beta Launch MatrixAI/Polykey-CLI#40

Closed

General Discovery fixes and features #692

Closed

CMCDragonkai mentioned this issue May 6, 2024

Gestalt Synchronisation for ACL Configuration, Notifications and Vault Automation #715

Open
