
feat: add model streaming #8973

Merged
merged 17 commits into main from gt/261-stream-model on Mar 19, 2024
Conversation

@gt2345 (Contributor) commented Mar 7, 2024

Description

Add model streaming; this is not yet exposed to the WebUI.
A separate PR for EE will handle RBAC.

Test Plan

Connect to ws://localhost:8080/stream; you should be able to subscribe and unsubscribe to model updates.
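
As a concrete sketch of that flow (a minimal example only: it assumes JSON subscribe/unsubscribe messages wrapping a ModelSubscriptionSpec-style payload, and the message envelope below is hypothetical; only the workspace_ids/model_ids keys come from this PR's struct):

import asyncio
import json

import websockets  # third-party: pip install websockets

async def main():
    async with websockets.connect("ws://localhost:8080/stream") as ws:
        # Subscribe to model events in workspace 1 (hypothetical envelope;
        # workspace_ids mirrors the ModelSubscriptionSpec json tag).
        await ws.send(json.dumps({"subscribe": {"models": {"workspace_ids": [1]}}}))
        # Print a few events as they arrive.
        for _ in range(3):
            print(json.loads(await ws.recv()))
        # Unsubscribe using the same spec (hypothetical envelope).
        await ws.send(json.dumps({"unsubscribe": {"models": {"workspace_ids": [1]}}}))

asyncio.run(main())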

Commentary (optional)

There is a separate ticket for adding unit tests, so they are not yet included in this PR.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

MD-261

@gt2345 gt2345 requested review from a team as code owners March 7, 2024 17:46
@cla-bot cla-bot bot added the cla-signed label Mar 7, 2024
netlify bot commented Mar 7, 2024

Deploy Preview for determined-ui canceled.

🔨 Latest commit: c33a76d
🔍 Latest deploy log: https://app.netlify.com/sites/determined-ui/deploys/65f99fddcef390000835b543

@gt2345 gt2345 requested review from stoksc and corban-beaird and removed request for szewaiyuen6 and eecsliu March 7, 2024 17:47
DROP TRIGGER IF EXISTS stream_model_trigger_seq ON models;
CREATE TRIGGER stream_model_trigger_seq
BEFORE INSERT OR UPDATE OF
name, description, creation_time, last_updated_time, metadata, labels, user_id, archived, notes, workspace_id
Contributor:

It's likely outside the scope of this ticket; however, due to the limitations of postgres notify/listen queues, we'll want to communicate over these channels using only filterable columns. It doesn't look like the work to transition to that strategy is complete.

That being said, this will need to be adjusted alongside that work so that we don't exceed the 8k-character payload limit with user-defined fields.
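
For illustration, a minimal sketch of the kind of guard this constraint implies (pure Python; the payload shape, field names, and fallback behavior here are assumptions, not this PR's implementation):

import json

PG_NOTIFY_MAX_PAYLOAD = 8000  # bytes; postgres NOTIFY payloads must stay under this

def notify_payload(event: dict) -> str:
    # Try to send the full event inline.
    payload = json.dumps(event)
    if len(payload.encode("utf-8")) < PG_NOTIFY_MAX_PAYLOAD:
        return payload
    # User-defined fields (metadata, notes, ...) can blow past the limit,
    # so fall back to filterable identifiers only; the client would then
    # re-fetch the full row instead of receiving it inline.
    return json.dumps({"id": event["id"], "seq": event["seq"]})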

Contributor (author):

For models, I think it should be safe to exclude creation_time and user_id from the trigger, but do you mean we also want to exclude metadata and notes? I know we have a ticket to address the character limit; I wonder if that's still valid. Maybe @jgongd has more information?

Contributor:

The ticket is related to this issue. I plan to add unit tests next before focusing on this ticket. Let me know if you think this should take priority, and I'll be happy to tackle it first.

@corban-beaird (Contributor) left a comment:

The streaming updates code looks solid; could integration tests be added?


codecov bot commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 67.94258%, with 67 lines in your changes missing coverage. Please review.

Project coverage is 47.56%. Comparing base (68017dd) to head (c33a76d).
Report is 38 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8973      +/-   ##
==========================================
+ Coverage   47.55%   47.56%   +0.01%     
==========================================
  Files        1168     1169       +1     
  Lines      176706   176883     +177     
  Branches     2356     2353       -3     
==========================================
+ Hits        84026    84137     +111     
- Misses      92522    92588      +66     
  Partials      158      158              
Flag      Coverage Δ
backend   42.84% <74.13%> (+0.09%) ⬆️
harness   63.91% <37.14%> (-0.03%) ⬇️
web       42.89% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files Coverage Δ
master/internal/stream/authz_basic_impl.go 83.33% <100.00%> (+8.33%) ⬆️
master/internal/stream/messages.go 0.00% <ø> (ø)
master/internal/stream/projects.go 77.01% <100.00%> (+1.55%) ⬆️
master/internal/stream/subscription.go 94.82% <100.00%> (+2.14%) ⬆️
master/internal/stream/test_util.go 68.57% <100.00%> (+2.15%) ⬆️
master/cmd/stream-gen/main.go 0.00% <0.00%> (ø)
master/internal/stream/publisher.go 65.08% <92.30%> (+1.61%) ⬆️
harness/determined/common/streams/_client.py 80.00% <53.84%> (-1.40%) ⬇️
master/internal/stream/util.go 69.49% <71.42%> (+1.74%) ⬆️
harness/determined/common/streams/wire.py 69.33% <27.27%> (-17.46%) ⬇️
... and 1 more

... and 3 files with indirect coverage changes

@gt2345 (Contributor, author) commented Mar 11, 2024

> The streaming updates code looks solid; could integration tests be added?

Thanks! I refactored the project entity to use common utility functions, and I also added an integration test for streaming multiple entities.

// determined:stream-gen source=client
type ModelSubscriptionSpec struct {
WorkspaceIDs []int `json:"workspace_ids"`
ModelIDs []int `json:"Model_ids"`
Contributor:

Is this supposed to be capitalized?

// ModelSubscriptionSpec is what a user submits to define a Model subscription.
//
// determined:stream-gen source=client
type ModelSubscriptionSpec struct {
Contributor:

It might be helpful to be able to subscribe to model streaming updates by model name(s). I think in most cases users won't easily have access to the model ID, and the model name is unique.

For reference, the Python SDK accepts both model ID and model name as possible fetch parameters (client.get_model(ID|name)), and the CLI only accepts name (in det model describe NAME).

Contributor:

There's hesitancy to allow subscribing based on model name due to limitations of postgres notify/listen queues. To allow subscribing based on model names, we'd have to enforce a maximum length. Right now the maximum size of a model name is restricted only by the postgres index limit (8191 bytes), which would exceed the capacity of the postgres queues.

// ModelSubscriptionSpec is what a user submits to define a Model subscription.
//
// determined:stream-gen source=client
type ModelSubscriptionSpec struct {
Contributor:

Similarly, would it be a good idea to subscribe by user ID and/or labels? I'm imagining a use case where you don't want to have to figure out specific identifiers for the models you're interested in, so a broader subscription scope might be helpful.

Contributor:

I don't see an issue with allowing subscription based on user ID, assuming RBAC is implemented in EE.

I would be hesitant to allow label-based subscriptions for the same reason as above, especially since there doesn't seem to be a limit on the length of a tag.

@azhou-determined (Contributor) commented:

Should model streaming be added to the Python streaming client _client.py as well? Or is that not relevant to this PR?

@azhou-determined (Contributor) commented:

A few general design questions wrt streaming updates, not in the scope of this PR (feel free to redirect me if this isn't the right place to ask; just some thoughts I had while reading through the code as a newcomer to streaming updates 😄):

  1. It seems like streaming can only happen on a single table, and events returned to the client are also scoped to a single table. What's the plan for more complicated streaming use cases in the future? Say, for example, I want to listen for the latest validation metric for a given trial, or the best checkpoint for an experiment (something our SDK/CLI/REST APIs currently support).
     In the current streaming updates world, this means I would have to listen for updates on the experiments, trials, checkpoints, and also metrics tables? Or do we intend to allow joins, either in the SubscriptionSpecs or the returned events? I.e., if I subscribe to the runs table, there's a foreign key best_validation_id to the metrics table, and that FK row automatically gets joined and included in the event.

  2. Is the assumption that streaming is mainly for listening for updates to an object that has already been created? I'm realizing that for a lot of these SubscriptionSpecs we're going to require IDs, but in order to know the IDs, you'd have to have created the object before you start streaming; for some tables this means you basically can't listen for creation/insert events. In this case I wonder if broader filters in the SubscriptionSpec might be useful, like created_ts > now().

@azhou-determined (Contributor) left a comment:

LGTM. Commented a few suggestions/questions, but up to you whether they're relevant.

@gt2345 (Contributor, author) commented Mar 13, 2024

> A few general design questions wrt streaming updates, not in the scope of this PR (feel free to redirect me if this isn't the right place to ask; just some thoughts I had while reading through the code as a newcomer to streaming updates 😄):
>
>   1. It seems like streaming can only happen on a single table, and events returned to the client are also scoped to a single table. What's the plan for more complicated streaming use cases in the future? Say, for example, I want to listen for the latest validation metric for a given trial, or the best checkpoint for an experiment (something our SDK/CLI/REST APIs currently support). In the current streaming updates world, this means I would have to listen for updates on the experiments, trials, checkpoints, and also metrics tables? Or do we intend to allow joins, either in the SubscriptionSpecs or the returned events?
>   2. Is the assumption that streaming is mainly for listening for updates to an object that has already been created? I'm realizing that for a lot of these SubscriptionSpecs we're going to require IDs, but in order to know the IDs, you'd have to have created the object before you start streaming; for some tables this means you basically can't listen for creation/insert events. In this case I wonder if broader filters in the SubscriptionSpec might be useful, like created_ts > now().

Hi @azhou-determined, thanks for your review. To answer your questions:

  1. One subscription should target only one table in the database; the join is supposed to happen on the client side. Just as you said, we can listen for updates on multiple tables, then join the data on the front end (see the sketch after this list).

  2. Streaming updates only serve reads; for creating/updating objects we still use the REST APIs. Could you expand on the kinds of tables for which we can't listen for creation/insert events?
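
To illustrate the client-side join mentioned in (1), here's a minimal, hypothetical sketch (the event shapes, including the "type" key, are assumptions; real events from /stream may differ):

# Cache per-table events as they stream in, then join in application code.
models_by_workspace: dict[int, list] = {}
projects_by_workspace: dict[int, list] = {}

def handle_event(event: dict) -> None:
    # Each event is scoped to a single table.
    if event["type"] == "model":
        models_by_workspace.setdefault(event["workspace_id"], []).append(event)
    elif event["type"] == "project":
        projects_by_workspace.setdefault(event["workspace_id"], []).append(event)

def workspace_view(workspace_id: int) -> dict:
    # The "join": combine cached rows from both tables by workspace_id.
    return {
        "workspace_id": workspace_id,
        "models": models_by_workspace.get(workspace_id, []),
        "projects": projects_by_workspace.get(workspace_id, []),
    }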

type ModelSubscriptionSpec struct {
WorkspaceIDs []int `json:"workspace_ids"`
ModelIDs []int `json:"model_ids"`
ModelNames []string `json:"model_names"`
Contributor:

Are model names being restricted in length to ensure that we don't exceed the postgres payload limit?

Contributor (author):

I believe not. Thanks for your input; I decided to remove subscription by model name.

@gt2345 gt2345 requested a review from a team as a code owner March 18, 2024 22:26
@gt2345 gt2345 requested a review from wes-turner March 18, 2024 22:26
@corban-beaird (Contributor) left a comment:

Looks great! 😄

@gt2345 gt2345 merged commit 137bfcd into main Mar 19, 2024
70 of 83 checks passed
@gt2345 gt2345 deleted the gt/261-stream-model branch March 19, 2024 14:49
maxrussell pushed a commit that referenced this pull request Mar 21, 2024