feat: add support for assignment table
When the graph database returns clusters that satisfy a jobspec, they need to be redirected
to the rainbow cluster to be assigned. This is a two-step process: first we add
the jobid to the jobs table (unassigned), and then we add it to an assignment
table with each cluster id that can receive it. At that point we will ask the clusters
whether (and when) they can satisfy the job so it can be assigned, and I do think we need a push model for the assignment.
We will need to allow for connection failures (with retry) and some kind of heartbeat to do that,
but first I need to think about how a cluster can have work pushed to it. Likely we need a
special client running there, which I have not written yet.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Mar 3, 2024
1 parent 1823f59 commit 0832df1
Showing 14 changed files with 451 additions and 346 deletions.
15 changes: 9 additions & 6 deletions api/v1/rainbow.proto
@@ -39,12 +39,15 @@ message RegisterRequest {
// we currently accept nodes, tasks, and the command
message SubmitJobRequest {
string name = 1;
-string cluster = 2;
-string token = 3;
-int32 nodes = 4;
-int32 tasks = 5;
-string command = 6;
-google.protobuf.Timestamp sent = 7;
+repeated Cluster clusters = 2;
+string jobspec = 3;
+google.protobuf.Timestamp sent = 4;
+
+message Cluster {
+string name = 1;
+string token = 2;
+}
+
}

// RequestJobsRequest is used by a cluster (or other entity that can run jobs)
2 changes: 1 addition & 1 deletion backends/memory/cluster.go
@@ -267,7 +267,7 @@ func (g *ClusterGraph) LoadClusterNodes(
if !ok {
return fmt.Errorf("destination %s is defined as an edge, but missing as node in graph", edge.Label)
}
-fmt.Printf("Adding edge from %s -%s-> %s\n", ss.Vertices[src].Type, edge.Relation, ss.Vertices[dest].Type)
+// fmt.Printf("Adding edge from %s -%s-> %s\n", ss.Vertices[src].Type, edge.Relation, ss.Vertices[dest].Type)
err := ss.AddEdge(src, dest, 0, edge.Relation)
if err != nil {
return err
4 changes: 3 additions & 1 deletion backends/memory/subsystem.go
@@ -26,6 +26,8 @@ func NewSubsystem() *Subsystem {
// DFSForMatch WILL is a depth first search for matches
// It starts by looking at total cluster resources on the top level,
// and then traverses into those that match the first check
+// THIS IS EXPERIMENTAL and likely wrong, or missing details,
+// which is OK as we will only be using it for prototyping.
func (s *Subsystem) DFSForMatch(jobspec *js.Jobspec) ([]string, error) {

// Return a list of matching clusters
@@ -252,7 +254,7 @@ func (s *Subsystem) depthFirstSearch(matches []string, jobspec *js.Jobspec) ([]s
fmt.Printf(" ❌️ %s not a match, %s\n", cluster, reason)
return false
} else {
-reason := fmt.Sprintf("%d of needed %s satisfied", foundMatches, resource.Type)
+reason := fmt.Sprintf("%d/%d of needed %s satisfied", foundMatches, resource.Count, resource.Type)
fmt.Printf(" ⏳️ %s still contender, %s\n", cluster, reason)
}
}
4 changes: 2 additions & 2 deletions cmd/rainbow/submit/submit.go
@@ -6,9 +6,9 @@ import (
"log"
"strings"

+js "github.com/compspec/jobspec-go/pkg/jobspec/v1"
"github.com/converged-computing/rainbow/pkg/client"
"github.com/converged-computing/rainbow/pkg/config"
-"github.com/converged-computing/rainbow/pkg/jobspec"
)

// Run will check a manifest list of artifacts against a host machine
@@ -38,7 +38,7 @@ func Run(
}

// Convert the simple command / nodes / etc into a JobSpec
-js, err := jobspec.NewSimpleJobspec(jobName, command, int32(nodes), int32(tasks))
+js, err := js.NewSimpleJobspec(jobName, command, int32(nodes), int32(tasks))
if err != nil {
return nil
}
23 changes: 21 additions & 2 deletions docs/commands.md
@@ -154,6 +154,11 @@ and then working on the next interaction, the client submit command, which is go

## Submit Job

+Submission has two steps that are discussed below.
+
+### 1. Satisfy Request
+
+The satisfy request interacts with the graph database and determines if any clusters can satisfy the jobspec.
To submit a job, we need the client `token` associated with a cluster. We are going to use the following strategy, and allow the following submission types:

- **simple**: for basic users, a command and the most basic of parameters will be provided and converted to a Jobspec.
@@ -253,8 +258,22 @@ cluster keebler does not have sufficient resource type node - actual 3 vs needed
match: 😥️ no clusters could satisfy this request. We are sad
```

-Note that the above is not technically a graph search yet - we are just checking the global resources of each cluster. I need to perform the DFS when I better
-understand / think about a good strategy for that, likely reading Fluxion code.
+Note that the above is a two-step process:
+
+- A quick check of whether the total resources of each cluster in the graph database can satisfy the request.
+- For the clusters that pass, a (Vanessa-written and janky) "DFS" that traverses the graph and likely has bugs.
+
+This will be improved upon with Fluxion and actual graph databases, but this is OK for the prototype.
+
+### 2. Pre-Assignment
+
+When the initial satisfy request (the step above) is done and we have a list of clusters, we can then tell rainbow about them.
+This means the returned list of clusters is passed, within the same client request, to rainbow
+to do assignment. If no clusters can satisfy the request, that response is returned to the client.
+
+
+## Request Jobs
+

## Request Jobs

5 changes: 3 additions & 2 deletions go.mod
@@ -4,9 +4,8 @@ go 1.20

require (
github.com/akamensky/argparse v1.4.0
-github.com/compspec/jobspec-go v0.0.0-20240226213125-007327866207
+github.com/compspec/jobspec-go v0.0.0-20240302201731-e7fb2bf2627f
github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe
-github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510
github.com/google/uuid v1.6.0
github.com/mattn/go-sqlite3 v1.14.22
github.com/pkg/errors v0.9.1
@@ -17,8 +16,10 @@ require (

require (
github.com/golang/protobuf v1.5.3 // indirect
+github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 // indirect
golang.org/x/net v0.21.0 // indirect
golang.org/x/sys v0.17.0 // indirect
golang.org/x/text v0.14.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240123012728-ef4313101c80 // indirect
+sigs.k8s.io/yaml v1.4.0 // indirect
)
11 changes: 7 additions & 4 deletions go.sum
@@ -1,22 +1,23 @@
github.com/akamensky/argparse v1.4.0 h1:YGzvsTqCvbEZhL8zZu2AiA5nq805NZh75JNj4ajn1xc=
github.com/akamensky/argparse v1.4.0/go.mod h1:S5kwC7IuDcEr5VeXtGPRVZ5o/FdhcMlQz4IZQuw64xA=
github.com/compspec/jobspec-go v0.0.0-20240226213125-007327866207 h1:p872BOJceUTU2+FOXKjVz68/VwAkN0zGdSigiWTMao0=
github.com/compspec/jobspec-go v0.0.0-20240226213125-007327866207/go.mod h1:BaJyxaOhESe2DD4lqBdwTEWOw0TaTZVJGPrFh6KyXQM=
github.com/compspec/jobspec-go v0.0.0-20240302201731-e7fb2bf2627f h1:JHOVu3snvprXuO3UDT2FngfmfDGj+2g5inZof2my9IA=
github.com/compspec/jobspec-go v0.0.0-20240302201731-e7fb2bf2627f/go.mod h1:BaJyxaOhESe2DD4lqBdwTEWOw0TaTZVJGPrFh6KyXQM=
github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe h1:Tk//RW3uKn4A7N8gpHRXs+ZGlR7Fxkwh+4/Iml0GBV4=
github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe/go.mod h1:+DhVyLXGVfBsfta4185jd33jqa94inshCcdvsXK2Irk=
github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
github.com/golang/protobuf v1.5.3 h1:KhyjKVUg7Usr/dYsdSqoFveMYd5ko72D+zANwlG1mmg=
github.com/golang/protobuf v1.5.3/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 h1:El6M4kTTCOh6aBiKaUGG7oYTSPP8MxqL4YI3kZKwcP4=
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510/go.mod h1:pupxD2MaaD3pAXIBCelhxNneeOaAeabZDe5s4K6zSpQ=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/mattn/go-sqlite3 v1.14.22 h1:2gZY6PC6kBnID23Tichd1K+Z0oS6nE/XwU+Vz/5o4kU=
github.com/mattn/go-sqlite3 v1.14.22/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 h1:lZUw3E0/J3roVtGQ+SCrUrg3ON6NgVqpn3+iol9aGu4=
github.com/santhosh-tekuri/jsonschema/v5 v5.3.1/go.mod h1:uToXkOrWAZ6/Oc07xWQrPOhJotwFIyu2bBVN41fcDUY=
golang.org/x/net v0.21.0 h1:AQyQV4dYCvJ7vGmJyKki9+PBdyvhkSd8EIx/qb0AYv4=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/sys v0.17.0 h1:25cE3gD+tdBA7lp7QfhuV+rJiE9YXTcS3VG1SqssI/Y=
@@ -36,3 +37,5 @@ gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E=
sigs.k8s.io/yaml v1.4.0/go.mod h1:Ejl7/uTz7PSA4eKMyQCUTnhZYNmLIl+5c2lQPGR2BPY=
