Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline inputs #4499

Merged
merged 19 commits into from
Jan 5, 2023
Merged

Pipeline inputs #4499

merged 19 commits into from
Jan 5, 2023

Conversation

ukclivecox
Copy link
Contributor

@ukclivecox ukclivecox commented Dec 12, 2022

Allows pipelines to take inputs from other pipelines.

  • Adds new input section to pipeline spec
  • Updates dataflow engine to take into consideration the pipeline to filter and pipeline header to add after inputs
  • Adds an example notebook
  • Adds docs

Does not (maybe for follow on PRs) :

  • check for circular dependencies
  • extend pipeline status to include status of dependent pipelines

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some comments for now, until I go over the remaining bit.

apis/go/mlops/scheduler/scheduler_grpc.pb.go Show resolved Hide resolved
apis/go/mlops/scheduler/storage.pb.go Show resolved Hide resolved
docs/source/contents/pipelines/index.md Outdated Show resolved Hide resolved
docs/source/contents/pipelines/index.md Outdated Show resolved Hide resolved

Further examples can be found in the [pipeline-to-pipeline examples](../examples/pipeline-to-pipeline.md).

Present caveats:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving more comments, mostly questions.

@@ -106,3 +106,33 @@ func (tn *TopicNamer) GetFullyQualifiedTensorMap(pipelineName string, tin map[st
}
return tout
}

func (tn *TopicNamer) GetFullyQualifiedPipelineTensorMap(tin map[string]string) map[string]string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would personally use more representative variable names instead tin and tout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

parts := strings.Split(k, pipeline.StepNameSeperator)
var kout string
switch len(parts) {
case 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add comment in code why 3 and 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

scheduler/pkg/kafka/topics.go Show resolved Hide resolved
scheduler/pkg/kafka/topics.go Show resolved Hide resolved
scheduler/pkg/proxy/chainer.go Show resolved Hide resolved
scheduler/pkg/store/pipeline/validate.go Show resolved Hide resolved
@@ -200,6 +252,29 @@ func updateInputSteps(pipelineName string, inputs []string) []string {
return updatedInputs
}

func updateExternalInputSteps(inputs []string) []string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add more documentation and/or unit test to explain logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add test

scheduler/pkg/kafka/dataflow/server_test.go Show resolved Hide resolved
Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks a great feature that would allow decoupled end-2-end pipeline creations.
However we should make sure that we have the right tooling to support users with their async decoupled pipelines and they are able to inspect and debug dataflow end-2-end.

So perhaps we want to discuss what would be potential follow ups beyond this PR.

scheduler/pkg/kafka/dataflow/server_test.go Show resolved Hide resolved
scheduler/pkg/proxy/chainer.go Show resolved Hide resolved
scheduler/pkg/store/pipeline/pipeline.go Show resolved Hide resolved
Comment on lines +263 to +265
case 1: // Add outputs if just pipeline specified
updatedInputs = append(updatedInputs, fmt.Sprintf("%s.%s", inp, StepOutputSpecifier))
case 3: // Add outputs if step name only specified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as related to previous comments, I am not sure for the reader why it is 1 or 3 parts? perhaps add more description in code.

scheduler/pkg/store/pipeline/validate.go Show resolved Hide resolved
@seldondev
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sakoush
To complete the pull request process, please assign
You can assign the PR to them by writing /assign in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ukclivecox ukclivecox added the v2 label Dec 17, 2022
ukclivecox and others added 14 commits December 17, 2022 17:26
Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>
Co-authored-by: Sherif Akoush <sherif.akoush@gmail.com>
* Capitalise Seldon when used as proper noun

* Formatting, capitalisation, etc.

* Add detail to inference docs page

* Formatting and typo fixes

* Use consistent capitalisation of model & pipeline through inference docs page

* Fix typo in inference docs for Kafka topics

* Specify use of inference v2 protocol high up in inference docs

* Add mention of headers to sync inference introduction

* Formatting + minor rewording for clarity

* Use tabs for Compose vs. k8s methods for finding the seldon-mesh endpoint

* Add note on port-forwarding seldon-mesh svc for inference requests

* Add note on service meshes for sending inference requests

* Add section on inference request routing with headers

* Add section on path-based routing for inference endpoints

* Add subsection header for Seldon routing (vs. ingress routing)

* Add section on routing from ingress -> seldon-mesh for inference calls

* Add links to RFCs for host & authority headers

* Update link to RFC for  HTTP/1 Host header

RFC-7230 obsoletes RFC-2616, the previous link.

* Add line describing virtual hosts vs. physical ones

* Use tabs for alternate ways of making inference requests

* Add inference request example with Seldon CLI

* Use consistent capitalisation of v2 for inference protocol

* Add note on Kafka headers for pipelines

* Use ordinal numbering for bullet points

* Update URI for consistency and to avoid confusion

* Move section on making requests above section on routing

* Use interpolation syntax to clarify usage of path-based routing in Seldon mesh

* Add second form of path-based routing for pipelines in Seldon mesh

* Clarify wording re virtual endpoints in SCv2

* Add section for header-based routing examples

This section builds on the examples from the prior section on making inference requests.

* Update basic examples to exclude routing headers

Routing headers are then given in the examples relevant to that section.

* Formatting

* Use group-tabs for example requests with different clients

* Add emphasis to header lines in examples for header-based routing

* Add notes on support for subdomain-based routing

* Add example snippets for subdomain routing

* Add Open Inference schema for iris model for examples

* Move pipeline inference tip lower for better flow

* Fix datatype for iris model inputs
* add flag for autoscaling in grpc msg

* autogen files

* extract helper function

* adjust comment

* wire up autoscaling flag in server

* wire up autoscaling in agent client

* set thresholds for scaling in local deployment

* add autoscaling flag to scheduler

* add a toggle for autoscaling service

* revert autoscaling envs set in local deployment

* disable scaling for local deployment

* use a disable toggle instead

* do not disable by default scaling service
Not sure if this is necessary but it actually took me some time to figure it out as I was sure that I have `docker compose` already installed. According to the [Docker documentation](https://docs.docker.com/compose/reference/) the spaced version looks like the newer one and maybe the makefile should be updated for that.
* Fix possible SIGSEV after producer close in modelgateway

* Set running after setup

* review comments
* link compose github for easier installation

* Update docs/source/contents/getting-started/docker-installation/index.md

Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>

Co-authored-by: Alex Rakowski <20504869+agrski@users.noreply.github.com>
Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ukclivecox ukclivecox merged commit 5209861 into SeldonIO:v2 Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants