Refactor memory graph (#32)

* feat: basic io match is working

  There is still more work to be done with jobspec-go and parsing from raw values, and also checking the other match types, but this is a start.

* refactor: memory graph database

  Problem: we currently do not have a good model to support traversal of more than one scheduled slot (a group of resources) and checking of requires within and outside of the slot.

  Solution: Jobspec nextgen provides a function to expose schedulable slots. A slot does not necessarily start at the top - it can have some set of resources at the top level (with requirements) and then the slot is below it. This means that the graph database's recursive algorithm needs to first traverse into a vertex to find the slot, but along the way check the subsystem requirements for types. For example, even if we want N nodes, we should not continue the search if a node does not have an attribute we are interested in.

  Once we find a slot, we create what is akin to a traverser, and the traverser carries with it a resource counter. The resource counter holds the count of needed slots vs. found slots, and is able to return as soon as we have found as many as we need. It also holds the current state (status) of the current search, meaning we decrement either a resource or subsystem count when we find it somewhere in the subgraph of the slot.

  This is just the early prototype, and so far it is only working for the simple case of submitting a job with some need for cores and nodes. I am next going to go back through the more specific IO cases and ensure that they still work, with the goal to get back to the spack case. I am going back to sleep for a bit first, kind of tired.

* io example is working

  This example needs to search both compatibility requirements and look for resources within a slot.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
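
The traversal described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: `Vertex`, `Requirement`, `SlotRequest`, `counter`, `FindSlots`, and `exampleCluster` are hypothetical names made up for the illustration, and the graph is reduced to a plain tree with subsystem attributes stored directly on each vertex. It only shows the idea of descending to the slot, then counting down resource and subsystem needs inside the slot's subgraph and returning as soon as enough slots are found.

```go
package main

import "fmt"

// Vertex is a simplified, hypothetical graph node (not rainbow's types.Vertex).
type Vertex struct {
	Type       string
	Subsystems map[string]string // assumed shape, e.g. "io" -> "shm"
	Children   []*Vertex
}

// Requirement says a subsystem (Name) must offer a value (Match).
type Requirement struct {
	Name  string
	Match string
}

// SlotRequest describes one schedulable slot and how many are needed.
type SlotRequest struct {
	Type     string         // vertex type at which the slot starts
	Replicas int            // number of slots needed
	Requires []Requirement  // subsystem needs for each slot
	With     map[string]int // resource type -> count needed inside each slot
}

// counter holds what a single slot still needs during a search.
type counter struct {
	resources map[string]int
	requires  map[Requirement]bool
}

func newCounter(req SlotRequest) *counter {
	c := &counter{resources: map[string]int{}, requires: map[Requirement]bool{}}
	for name, count := range req.With {
		c.resources[name] = count
	}
	for _, r := range req.Requires {
		c.requires[r] = true
	}
	return c
}

// satisfied reports whether every resource and subsystem need reached zero.
func (c *counter) satisfied() bool {
	for _, remaining := range c.resources {
		if remaining > 0 {
			return false
		}
	}
	return len(c.requires) == 0
}

// visit walks the slot's subgraph, decrementing needs as they are found,
// and stops descending once the slot is satisfied.
func (c *counter) visit(v *Vertex) {
	if c.satisfied() {
		return
	}
	if remaining, ok := c.resources[v.Type]; ok && remaining > 0 {
		c.resources[v.Type] = remaining - 1
	}
	for name, value := range v.Subsystems {
		delete(c.requires, Requirement{Name: name, Match: value})
	}
	for _, child := range v.Children {
		c.visit(child)
	}
}

// FindSlots descends from v until it reaches vertices of the slot type,
// evaluates each candidate slot, and returns early once enough are found.
func FindSlots(v *Vertex, req SlotRequest, found int) int {
	if found >= req.Replicas {
		return found
	}
	if v.Type == req.Type {
		c := newCounter(req)
		c.visit(v)
		if c.satisfied() {
			found++
		}
		return found
	}
	for _, child := range v.Children {
		found = FindSlots(child, req, found)
		if found >= req.Replicas {
			break
		}
	}
	return found
}

// exampleCluster builds cluster -> rack -> two nodes with two cores each;
// only the first node has the io subsystem with type shm.
func exampleCluster() *Vertex {
	cores := func() []*Vertex { return []*Vertex{{Type: "core"}, {Type: "core"}} }
	shmNode := &Vertex{Type: "node", Subsystems: map[string]string{"io": "shm"}, Children: cores()}
	plainNode := &Vertex{Type: "node", Children: cores()}
	rack := &Vertex{Type: "rack", Children: []*Vertex{shmNode, plainNode}}
	return &Vertex{Type: "cluster", Children: []*Vertex{rack}}
}

func main() {
	req := SlotRequest{
		Type:     "node",
		Replicas: 1,
		Requires: []Requirement{{Name: "io", Match: "shm"}},
		With:     map[string]int{"core": 2},
	}
	found := FindSlots(exampleCluster(), req, 0)
	fmt.Printf("Slots found %d/%d\n", found, req.Replicas)
}
```

The request in `main` mirrors the shape of the `jobspec-io.yaml` example (one node slot with two cores that requires the `io` subsystem of type `shm`). Requirements on resources above the slot, which the commit message also mentions, are left out of this sketch.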
converged-computing · May 10, 2024 · 502bfbb
1 parent a79a19d
Showing 37 changed files with 851 additions and 895 deletions.
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -25,7 +25,6 @@ ENV GO_VERSION=1.20.14
         unzip && \
     apt-get clean -y && apt -y autoremove
 
-# Install go 19.10
 RUN wget https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz  && tar -xvf go${GO_VERSION}.linux-amd64.tar.gz && \
     mv go /usr/local && rm go${GO_VERSION}.linux-amd64.tar.gz
 

diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@ For more information:
 
 ## TODO
 
+- match/equals can have repeated fields, so we need to honor that list.
 - cypher: when we have another cypher graph, move the memgraph cypher logic into the graph match algorithm, add an endpoint to return cypher. Currently the match algorithms (beyond basic containment) are not implemented
 - subsystems
   - make also a function to delete subsystems

diff --git a/cmd/rainbow/rainbow.go b/cmd/rainbow/rainbow.go
@@ -15,7 +15,6 @@ import (
 
 	// Register database backends and selection algorithms
 	_ "github.com/converged-computing/rainbow/plugins/algorithms/match"
-	_ "github.com/converged-computing/rainbow/plugins/algorithms/range"
 	_ "github.com/converged-computing/rainbow/plugins/backends/memgraph"
 	_ "github.com/converged-computing/rainbow/plugins/backends/memory"
 	_ "github.com/converged-computing/rainbow/plugins/selection/constraint"

diff --git a/cmd/rainbow/submit/submit.go b/cmd/rainbow/submit/submit.go
@@ -5,7 +5,7 @@ import (
 	"fmt"
 	"log"
 
-	js "github.com/compspec/jobspec-go/pkg/jobspec/experimental"
+	js "github.com/compspec/jobspec-go/pkg/nextgen/v1"
 	"github.com/converged-computing/rainbow/pkg/client"
 	"github.com/converged-computing/rainbow/pkg/config"
 	jscli "github.com/converged-computing/rainbow/pkg/jobspec"

diff --git a/cmd/server/server.go b/cmd/server/server.go
@@ -12,7 +12,6 @@ import (
 
 	// Register database backends
 	_ "github.com/converged-computing/rainbow/plugins/algorithms/match"
-	_ "github.com/converged-computing/rainbow/plugins/algorithms/range"
 	_ "github.com/converged-computing/rainbow/plugins/backends/memgraph"
 	_ "github.com/converged-computing/rainbow/plugins/backends/memory"
 	_ "github.com/converged-computing/rainbow/plugins/selection/constraint"

diff --git a/docs/advanced.md b/docs/advanced.md
@@ -23,5 +23,6 @@ go run ./cmd/rainbow/rainbow.go submit --config-path ./docs/examples/scheduler/r
 ```
 
 The above demonstrates using a more advanced selection algorithm.
+Note that this cluster state requires further discussion and thinking about where and how to accommodate it - it currently uses the old design with attributes on the level of the Jobspec, and while this works, we likely want to be using the attributes on the level of schedule-able unit.
 
 [home](/README.md#rainbow-scheduler)
diff --git a/docs/algorithms.md b/docs/algorithms.md
@@ -141,11 +141,7 @@ task:
         max: "0.5.5"
 ```
 
-This is the most realisic use case I think.
-
-### Equals
-
-The "equals" type is going to look exactly at some exact value for a field in the metadata. It will return true (match) if it matches what the subsystem needs. For example, given this task:
+This is the most realisic use case I think. The above demonstrates the match "equals" and range types. Using just match (or equals) is shown below:
 
 ```yaml
 task:
@@ -297,7 +293,7 @@ scheduler:
                     - select: random
 ```
 
-The above is saying for first priority, filter down to clusters that have nodes free. Then calculate an estimate of the cost for the build. Here is the logic. If we have a linaer model (Y = mX + b) to describe memory and runtime, so `runtime = (slope * memory) + intercept` and here our intercept is some value we can derive on the level of the package (and write into the jobspec) and seconds_per_gb is the slope of the line, then we can get an estimated runtime (in seconds) with `seconds_per_gb * memory_per_node`. If we multiply by 60 we get minutes, and again we get hours. So the piece of the equation `memory_per_node * seconds_per_gb)/60/60` is giving us an estimated runtime in hours based on the package being built. If we multiply that by the cost per node hour, then we get an estimate of the cost for the build.
+The above is saying for first priority, filter down to clusters that have nodes free. Then calculate an estimate of the cost for the build. Here is the logic. If we have a linear model (Y = mX + b) to describe memory and runtime, so `runtime = (slope * memory) + intercept` and here our intercept is some value we can derive on the level of the package (and write into the jobspec) and seconds_per_gb is the slope of the line, then we can get an estimated runtime (in seconds) with `seconds_per_gb * memory_per_node`. If we multiply by 60 we get minutes, and again we get hours. So the piece of the equation `memory_per_node * seconds_per_gb)/60/60` is giving us an estimated runtime in hours based on the package being built. If we multiply that by the cost per node hour, then we get an estimate of the cost for the build.
 
 The "select" field is saying how to choose the final cluster from the set that remain. Options here can be first, last, or random.
 

diff --git a/docs/commands.md b/docs/commands.md
@@ -495,23 +495,25 @@ The new portion from the above is seeing that the subsystem "io" is satisfied at
 
 ```console
 ...
-  🔍️ Exploring cluster keebler deeper with depth first search
-
-    👀️ Looking for 'node' in cluster keebler
-      => Checking vertex 'cluster' (count=1) for 'node' (need=2)
-      => Checking vertex 'cluster' (count=1) for 'node' (need=2)
-      => Checking vertex 'rack' (count=1) for 'node' (need=2)
-      => Checking vertex 'node' (count=1) for 'node' (need=2)
-      => Checking vertex 'node' (count=1) for 'node' (need=2)
-     ⏳️ keebler still contender, 3/2 of needed node satisfied
+🍇️ Satisfy request to Graph 🍇️
+ jobspec: {"version":1,"resources":{"ior":{"type":"node","replicas":1,"with":[{"type":"core","count":2,"attributes":{}}],"requires":[{"field":"type","match":"shm","name":"io"}],"attributes":{}}}}
+  🎰️ Resources that that need to be satisfied with matcher match
+     node:  (slot)  1
+       requires
+         field: type
+         match: shm
+         name: io
 
-    👀️ Looking for 'slot' in cluster keebler
-      => Assessing needs for subsystem io
-      => Resource 'node' satisfies subsystem io shm
-    🎯️ dfs: we found 1 clusters to satisfy the request
-2024/03/09 13:49:09 SELECT * from clusters WHERE name LIKE "keebler" LIMIT 1: keebler
-2024/03/09 13:49:09 📝️ received job ior for 1 contender clusters
-2024/03/09 13:49:09 📝️ job ior is assigned to cluster keebler
+  🔍️ Exploring cluster keebler deeper with depth first search
+      => Searching for resource type core from parent contains->rack
+      => Searching for resource type core from parent contains->node
+           Found subsystem edge for io with type shm
+           Minimum slot needs are satisfied at node for io at shm, returning early.
+         slotNeeds are satisfied, returning 1 slots matched
+Slots found 1/1 for vertex cluster
+  match: ✅️ there are 1 matches with sufficient resources
+2024/05/07 19:40:56 📝️ received job app for 1 contender clusters
+2024/05/07 19:40:56 📝️ job app is assigned to cluster [keebler]
 ```
 
 And the work is still assigned to the cluster.
@@ -539,7 +541,7 @@ go run cmd/rainbow/rainbow.go register cluster --cluster-name spack-builder --no
 go run cmd/rainbow/rainbow.go register subsystem  --subsystem spack --nodes-json ./docs/examples/match-algorithms/range/spack-subsystem.json --config-path ./docs/examples/match-algorithms/range/rainbow-config.yaml
 
 # Submit a job that asked for a valid range
-go run ./cmd/rainbow/rainbow.go submit --config-path ./docs/examples/match-algorithms/range/rainbow-config.yaml --jobspec ./docs/examples/match-algorithms/range/jobspec-valid-range.yaml --match-algorithm range
+go run ./cmd/rainbow/rainbow.go submit --config-path ./docs/examples/match-algorithms/range/rainbow-config.yaml --jobspec ./docs/examples/match-algorithms/range/jobspec-valid-range.yaml
 ```
 For the above job, you'll see it's satisfied:
 
@@ -552,8 +554,8 @@ For the above job, you'll see it's satisfied:
 Try submitting a job that can't be satisfied for the range.
 
 ```bash
-# Submit a job that asked for a valid range
-go run ./cmd/rainbow/rainbow.go submit --config-path ./docs/examples/match-algorithms/range/rainbow-config.yaml --jobspec ./docs/examples/match-algorithms/range/jobspec-invalid-range.yaml --match-algorithm range
+# Submit a job that asked for an invalid range
+go run ./cmd/rainbow/rainbow.go submit --config-path ./docs/examples/match-algorithms/range/rainbow-config.yaml --jobspec ./docs/examples/match-algorithms/range/jobspec-invalid-range.yaml
 ```
 ```console
 Slots found 0/1 for vertex cluster
@@ -594,6 +596,4 @@ Awesome! Next we can put that logic in a flux instance (from the Python grpc to
 accept some number of them. The response back to the rainbow scheduler will be those to accept, which will then be removed from the database. For another day.
 
 
-
-
 [home](/README.md#rainbow-scheduler)
diff --git a/docs/examples/match-algorithms/range/jobspec-invalid-range.yaml b/docs/examples/match-algorithms/range/jobspec-invalid-range.yaml
@@ -1,23 +1,18 @@
 version: 1
 resources:
-- count: 2
-  type: node
-  with:
-  - count: 1
-    label: default
-    type: slot
+  spack:
+    replicas: 2
+    type: node
+    requires:
+    - name: spack
+      field: version       
+      min: "0.7.1"
+      max: "0.7.5"
+
     with:
     - count: 2
       type: core
-task:
-  command:
-  - spack
-  slot: default
-  count:
-    per_slot: 1
-  resources:
-    spack:
-      range:
-      - field: version
-        min: "0.7.1"
-        max: "0.7.5"
+
+tasks:
+  - command: [ior]
+    resources: spack
diff --git a/docs/examples/match-algorithms/range/jobspec-valid-range.yaml b/docs/examples/match-algorithms/range/jobspec-valid-range.yaml
@@ -1,23 +1,18 @@
 version: 1
 resources:
-- count: 2
-  type: node
-  with:
-  - count: 1
-    label: default
-    type: slot
+  spack:
+    replicas: 2
+    type: node
+    requires:
+    - name: spack
+      field: version       
+      min: "0.5.1"
+      max: "0.5.5"
+
     with:
     - count: 2
       type: core
-task:
-  command:
-  - ior
-  slot: default
-  count:
-    per_slot: 1
-  resources:
-    spack:
-      range:
-      - field: version
-        min: "0.5.1"
-        max: "0.5.5"
+
+tasks:
+  - command: [ior]
+    resources: spack
diff --git a/docs/examples/match-algorithms/range/rainbow-config.yaml b/docs/examples/match-algorithms/range/rainbow-config.yaml
@@ -8,7 +8,7 @@ scheduler:
             name: match
 cluster:
     name: spack-builder
-    secret: 37e5b798-189f-4c38-bc1c-0a14877acbcf
+    secret: 594c79ea-fc65-4d82-93bb-5e4dc3469276
 graphdatabase:
     name: memory
     host: 127.0.0.1:50051

diff --git a/docs/examples/scheduler/jobspec-constraint.yaml b/docs/examples/scheduler/jobspec-constraint.yaml
@@ -1,20 +1,17 @@
+
 version: 1
 resources:
-- count: 2
-  type: node
-  with:
-  - count: 1
-    label: default
-    type: slot
+  spack:
+    replicas: 2
+    type: node
     with:
     - count: 2
       type: core
-task:
-  command:
-  - ior
-  slot: default
-  count:
-    per_slot: 1
+
+tasks:
+  - command: [ior]
+    resources: spack
+
 attributes:
   parameter:
-    seconds_per_gb: 0.4
+    seconds_per_gb: 0.4    
diff --git a/docs/examples/scheduler/jobspec-io.yaml b/docs/examples/scheduler/jobspec-io.yaml
@@ -1,22 +1,14 @@
 version: 1
 resources:
-- count: 2
-  type: node
-  with:
-  - count: 1
-    label: default
-    type: slot
+  ior:
+    type: node
+    replicas: 1
+    requires:
+    - name: io
+      match: shm
+      field: type
     with:
     - count: 2
       type: core
 task:
-  command:
-  - ior
-  slot: default
-  count:
-    per_slot: 1
-  resources:
-    io:
-      match:
-      - field: type
-        value: shm
+  command: [ior]
diff --git a/docs/examples/scheduler/rainbow-config.yaml b/docs/examples/scheduler/rainbow-config.yaml
@@ -8,7 +8,7 @@ scheduler:
             name: match
 cluster:
     name: keebler
-    secret: df4d1009-a95a-4fe6-8d8d-ae3cf9f016cd
+    secret: c1971ac9-8350-440f-8f5d-b64d97e929a4
 graphdatabase:
     name: memory
     host: 127.0.0.1:50051

diff --git a/go.mod b/go.mod
@@ -6,7 +6,7 @@ require (
 	github.com/Knetic/govaluate v3.0.0+incompatible
 	github.com/Masterminds/semver/v3 v3.2.1
 	github.com/akamensky/argparse v1.4.0
-	github.com/compspec/jobspec-go v0.0.0-20240406210339-886aab99ffbe
+	github.com/compspec/jobspec-go v0.0.0-20240510054255-ee02cdc7d3d4
 	github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe
 	github.com/fatih/color v1.16.0
 	github.com/google/uuid v1.6.0

diff --git a/go.sum b/go.sum
@@ -4,8 +4,8 @@ github.com/Masterminds/semver/v3 v3.2.1 h1:RN9w6+7QoMeJVGyfmbcgs28Br8cvmnucEXnY0
 github.com/Masterminds/semver/v3 v3.2.1/go.mod h1:qvl/7zhW3nngYb5+80sSMF+FG2BjYrf8m9wsX0PNOMQ=
 github.com/akamensky/argparse v1.4.0 h1:YGzvsTqCvbEZhL8zZu2AiA5nq805NZh75JNj4ajn1xc=
 github.com/akamensky/argparse v1.4.0/go.mod h1:S5kwC7IuDcEr5VeXtGPRVZ5o/FdhcMlQz4IZQuw64xA=
-github.com/compspec/jobspec-go v0.0.0-20240406210339-886aab99ffbe h1:AMgW4uL//FX/Rl0lVP0bjvr0s/tjJqUSdxd1enFvMp4=
-github.com/compspec/jobspec-go v0.0.0-20240406210339-886aab99ffbe/go.mod h1:BaJyxaOhESe2DD4lqBdwTEWOw0TaTZVJGPrFh6KyXQM=
+github.com/compspec/jobspec-go v0.0.0-20240510054255-ee02cdc7d3d4 h1:4MaTp3OcUmp6HFEojeI//GthUt7GMYnB8K5OSZdKxZA=
+github.com/compspec/jobspec-go v0.0.0-20240510054255-ee02cdc7d3d4/go.mod h1:BaJyxaOhESe2DD4lqBdwTEWOw0TaTZVJGPrFh6KyXQM=
 github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe h1:Tk//RW3uKn4A7N8gpHRXs+ZGlR7Fxkwh+4/Iml0GBV4=
 github.com/converged-computing/jsongraph-go v0.0.0-20240229082022-c6887a5a00fe/go.mod h1:+DhVyLXGVfBsfta4185jd33jqa94inshCcdvsXK2Irk=
 github.com/fatih/color v1.16.0 h1:zmkK9Ngbjj+K0yRhTVONQh1p/HknKYSlNT+vZCzyokM=

diff --git a/pkg/client/client.go b/pkg/client/client.go
@@ -4,7 +4,7 @@ import (
 	"context"
 	"log"
 
-	js "github.com/compspec/jobspec-go/pkg/jobspec/experimental"
+	js "github.com/compspec/jobspec-go/pkg/nextgen/v1"
 
 	pb "github.com/converged-computing/rainbow/pkg/api/v1"
 	"github.com/converged-computing/rainbow/pkg/config"

diff --git a/pkg/client/endpoint.go b/pkg/client/endpoint.go
@@ -7,7 +7,7 @@ import (
 	"os"
 	"time"
 
-	js "github.com/compspec/jobspec-go/pkg/jobspec/experimental"
+	js "github.com/compspec/jobspec-go/pkg/nextgen/v1"
 	pb "github.com/converged-computing/rainbow/pkg/api/v1"
 	"github.com/converged-computing/rainbow/pkg/config"
 	"github.com/converged-computing/rainbow/pkg/graph"

diff --git a/pkg/graph/algorithm/algorithm.go b/pkg/graph/algorithm/algorithm.go
@@ -6,7 +6,6 @@ import (
 	"fmt"
 	"log"
 
-	v1 "github.com/compspec/jobspec-go/pkg/jobspec/experimental"
 	"github.com/converged-computing/rainbow/pkg/types"
 )
 
@@ -23,8 +22,7 @@ type MatchAlgorithm interface {
 	Init(map[string]string) error
 
 	// A MatchAlgorithm needs to take a slot and determine if it matches
-	GetSlotResourceNeeds(slot *v1.Task) *types.SlotResourceNeeds
-	CheckSubsystemEdge(slotNeeds *types.SlotResourceNeeds, edge *types.Edge, vtx *types.Vertex)
+	CheckSubsystemEdge(slotNeeds *types.MatchAlgorithmNeeds, edge *types.Edge)
 }
 
 // List returns known algorithms

diff --git a/pkg/graph/backend/backend.go b/pkg/graph/backend/backend.go
@@ -4,7 +4,7 @@ import (
 	"fmt"
 	"log"
 
-	js "github.com/compspec/jobspec-go/pkg/jobspec/experimental"
+	js "github.com/compspec/jobspec-go/pkg/nextgen/v1"
 
 	"github.com/converged-computing/jsongraph-go/jsongraph/v2/graph"
 	"github.com/converged-computing/rainbow/pkg/graph/algorithm"