Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discovery background processing feedback #162

Closed
2 tasks done
amydevs opened this issue Mar 25, 2024 · 23 comments
Closed
2 tasks done

Discovery background processing feedback #162

amydevs opened this issue Mar 25, 2024 · 23 comments
Assignees
Labels
development Standard development

Comments

@amydevs
Copy link
Member

amydevs commented Mar 25, 2024

Specification

There should be status indicators for TaskManager tasks related to the Discovery domain.

This command should provide feedback on the progress of discovery tasks of gestalts, and what gestalts have been recently discovered.

Ideally there should also be a way to cancel some of these tasks.

Discovery events

Discovery is already evented since it's async-init decorated. But we need to updated to have it's usual domain specific events along with discovery related events. We need an event for the following. More events may be added if any seem relevant.

  1. Vertex queued
  2. Vertex Processed
  3. Vertex failed to process
  4. Vertex culled from future processing due to failing to process for too long

~audit discovery command~

NOTE: this section is being split out to a new issue/PR that addresses the audit domain generally.

We need to add the audit domain and the audit discovery command. It will function similar to connection auditing where we provide an endless stream of the above events.

Part of the seeking will include a start point and an end point. Both of these are optional and can be set in the past or the future. When outputting events, it will stream the history from the start timestamp to the end timestamp. If the endpoint is in the past then the command will yield the history till that point and then end. If the start point or end point is into the future then the command will wait until the end point or cancellation to end. Future events will be yielded as they happen.

We want to add some degree of filtering to the events that are yielded. We could allow filtering for a specific or collection of nodes/identities. But also possibly filter on a vertex's parent or gestalt. If filtering is added then we'd need to work out how to specify filters as CLI options and how they interact.`

identities discover command

This command needs to be modified to provide feedback. Normally this command is non-blocking where it just adds a vertex to the queue and returns. But we need to add a --blocking flag or some flag that indicates feedback. When enabled this command will output events related to the vertex it queued and it's children vertices. Then command will end when all child vertices are processed. It could also be cancelled the normal way by aborting the command.

When events are printed out, they should be consistent between the two commands.

Additional context

MatrixAI/Polykey#691

Tasks

  • 1. Add relevant discovery events to be emitted by the discovery domain.
    1. ~Implement audit discovery command.(Name TBD)~ - Being addressed in a new PR for the audit domain
  • 3. Implement feedback for identities discover command. Command will default to non-blocking with no feedback unless flagged for blocking.
@amydevs amydevs added the development Standard development label Mar 25, 2024
Copy link

linear bot commented Mar 25, 2024

@tegefaulkes
Copy link
Contributor

I feel like this should be an endless stream command much like the audit domain. Where it in real time outputs steps the discovery tasks are taking. It should be pretty useful for debug too.

I'm also thinking the manual discovery command will do the same but only output the events relevant to it.

@tegefaulkes
Copy link
Contributor

I think this is more of a generic feedback for discovery issue. To fully address it we'd need a command that returns the events of all discovery happening. But also identities discover should be updated to provide feedback as well.

identities discover command

The identities discover command is currently non-blocking. All it does is add a vertex to the discovery queue. So for it to provide the feedback it needs a blocking version of the logic. Where it queues a vertex. but also waits for all discovery events relating to that first vertex. But given that discovery functions as a search algorithm, the discovery doesn't keep good track of what triggered it. So we'll need to maintain a tree of what triggered each event so we can track and do some detailed filtering of the events.

For example, we queue a vertex A, this discovers B and C. both B,C are queued for the next step, Say B discovers D. We'd form a tree of A -> B -> D, A - C. If we tracked the parents for each step we'd be able to simply filter the events for a general group of discovery. This will be very useful for the other command. Also note that we could have a vertex E that links to C as well that was started separately. This would make E and A a parent to C.

But does that make sense? A link is bi-directional, so its really just and edge and not a parent. By that logic the filtering on a per gestalt basis. I suppose that makes sense since gestalts is the only real way we group vertices anyway.

discovery auditing.

Right now this issue is called identities status Command. But in reality the functionality is an auditing feature very similar to the connections auditing we added a while back. This can just be an endless stream of discovery events including the all of the history. We can include the filtering mentioned above as well. We can use the connection auditing command as a template here, since it's identical but for discovery events rather than connection events.

So the auditing command will be the big picture history and version of the feedback. While the identities discover command will provide feedback just for the discovery tasks it initiated.

Any thoughts about this @amydevs ?

@amydevs
Copy link
Member Author

amydevs commented Mar 28, 2024

I originally thought that identities discover should still be non-blocking, but identities status would instead just show indicators for all discovery tasks as to their completion progress. Whilst, audit events would show the raw events that the indicators would derive from.

@tegefaulkes
Copy link
Contributor

For the most case yeah, identities discover would be non-blocking. But we can add a --block flag that waits for discovery to complete. Or a feedback flag that makes it long-running with the feedback.

@amydevs
Copy link
Member Author

amydevs commented Mar 28, 2024

ah alright

@tegefaulkes
Copy link
Contributor

Along with this, we also need to list out queued discovery tasks. So I suppose it will list queued, then move on to emitting events when they are processed? Not sure, needs more spec.

@tegefaulkes tegefaulkes closed this as not planned Won't fix, can't repro, duplicate, stale Apr 11, 2024
@tegefaulkes tegefaulkes self-assigned this Apr 11, 2024
@tegefaulkes tegefaulkes reopened this Apr 11, 2024
Copy link
Contributor

The GestaltIdentityIdEncoded uses " marks in it's formatting. This means it needs escape characters when used in the JSON format or if we wanted to pass it through a command flag. We may want to look into using an alternative format for it.

@CMCDragonkai
Copy link
Member

Can you give examples of this?

@tegefaulkes
Copy link
Contributor

Human readable

    queued         identity-["test-provider","abc"]
    queued         node-v6np16s1ui7m70sm802lne5pc923uk7ln1shoesujjg9h0are4r2g
    processed      identity-["test-provider","abc"]
    queued         node-v6k1uedkaqogphvm4lv5af1i8q8ofclvv45223ou9tm3jtk8ggntg
    processed      node-v6np16s1ui7m70sm802lne5pc923uk7ln1shoesujjg9h0are4r2g
    processed      node-v6k1uedkaqogphvm4lv5af1i8q8ofclvv45223ou9tm3jtk8ggntg

Formatted as JSON

   {"event":"queued","vertex":"identity-[\"test-provider\",\"abc\"]"}
    {"event":"queued","vertex":"node-vm6p5coet30377qql6vpm0oc6l9i5v4q6bn3e80digsv3ii4s9mm0"}
    {"event":"processed","vertex":"identity-[\"test-provider\",\"abc\"]"}
    {"event":"queued","vertex":"node-v4kjv3139hij4kv609e1dc53ruujhhlhmbuirh8uu8vh06j6kcbi0"}
    {"event":"processed","vertex":"node-vm6p5coet30377qql6vpm0oc6l9i5v4q6bn3e80digsv3ii4s9mm0"}
    {"event":"processed","vertex":"node-v4kjv3139hij4kv609e1dc53ruujhhlhmbuirh8uu8vh06j6kcbi0"}

So the format of identity-["test-provider","abc"] is not ideal in JSON Or if we had to provide it as a CLI option. I'm not sure how we should format it though. Since the ProviderId and IdentityId can be pretty permissive, likely URL valid format names, We're pretty limited on how to delimit it. Possibly using |? as in identity|test-provider|abc or identity|abc@test-provider?

Copy link
Member

The encoded version looks fine to me. Why is it a problem? I don't really want to introduce custom delimitation. It should be easily machine parseable.

@tegefaulkes
Copy link
Contributor

I'm just thinking if we had to provide an encodedGestaltId via a command flag, having to include escape characters when writing that out would be clunky.

Copy link
Member

It is fine because CLI commands use the providerId:identityId format.

This is the case with polykey identities claim and polykey identities discover. When it is possible to share to a gestalt, then polykey vaults share providerId:identityId should be possible too. And : is a restricted character for hostnames. https://stackoverflow.com/a/3523068/582917

Copy link
Member

It looks like what you're describing is logging output/auditing output. In that case, it may continue to work… since its more diagnostic. But we reserve the right to change it the more we test it.

Copy link
Member

CMCDragonkai commented Apr 24, 2024

So if it is now:

  • polykey audit discovery - this should be going into a new audit issue - this is whole new polykey CLI command
  • polykey identities discover provideId:identityId --feedback - where a flag makes the command blocking… and shows the status of this. But I'm not sure about --feedback. In a recent discussion we have --force, --follow for streaming the data, I think we can use --follow here even if --seek and --limit doesn't apply here.

Copy link
Member

Actually since --feedback still terminates, --follow doesn't make sense.

Copy link
Member

How about --monitor?

Copy link
Member

CMCDragonkai commented Apr 24, 2024

Link up the new issue for audit commands then here too @brian.botha. Especially polykey audit P1 P2 P3 … where P1 can be topic.topic.topic.

Copy link
Contributor

Some minor details that need to be addressed

  • --future-events should be called --follow
  • --feedback should be called --monitor
  • flags specific to a command should not have a short option.
  • Update the spec of the identities status issue.

Copy link
Member

The title of this issue isn't accurate, since there is not identities status command anymore.

Copy link
Contributor

tegefaulkes commented Apr 24, 2024

An additional Issue and PR needs to be made for the audit domain.

The current audit discovery changes need to be split out into a new PR. A new issue for a generic audit domain command needs to be made and speced out. Paths can be specified as a list of dot paths a.b.c, a.b. Multiple of these paths can be specified. Some optimization needs to be done to reduce work done when iterating over multiple paths. We're opting for a multiple iterator zipper method to achieve this.

Copy link
Member

Yes you would want to filter out all subpaths of polykey audit P1 P2 P3 …to eliminate unnecessary work.

@tegefaulkes tegefaulkes changed the title identities status Command Discovery background processing feedback Apr 26, 2024
@tegefaulkes
Copy link
Contributor

I accidentally linked to the wrong linear issue when creating a new PR. As a result this got automatically re-opened. Closing it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development
Development

No branches or pull requests

3 participants