feat: slurm support (#98)
This change adds a dispatcher resource manager, which gives Determined the ability to run on top of Slurm.

Co-authored-by: rcorujo <90728398+rcorujo@users.noreply.github.com>
Co-authored-by: Phillip Gaisford <phillip.gaisford@hpe.com>
Co-authored-by: phillip-gaisford <98362331+phillip-gaisford@users.noreply.github.com>
Co-authored-by: Jerry J. Harrow <84593277+jerryharrow@users.noreply.github.com>
Co-authored-by: Jagadeesh Madagundi <jagadeesh545@gmail.com>
Co-authored-by: Philip Norman <philipnrmn@users.noreply.github.com>
7 people authored and determined-ci committed Feb 2, 2024
1 parent a75ffc1 commit 6a98e08
Showing 25 changed files with 3,977 additions and 2 deletions.
188 changes: 188 additions & 0 deletions agent/go.mod
@@ -0,0 +1,188 @@
module github.com/determined-ai/determined/agent

go 1.21

require (
github.com/determined-ai/determined/master v0.0.0
github.com/docker/distribution v2.8.2+incompatible
github.com/docker/docker v20.10.24+incompatible
github.com/docker/docker-credential-helpers v0.6.4
github.com/docker/go-connections v0.4.0 // indirect
github.com/ghodss/yaml v1.0.1-0.20190212211648-25d852aebe32
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/google/uuid v1.3.0
github.com/gorilla/websocket v1.5.0
github.com/labstack/echo/v4 v4.9.1
github.com/pkg/errors v0.9.1
github.com/shirou/gopsutil v3.21.11+incompatible
github.com/sirupsen/logrus v1.8.1
github.com/spf13/cobra v1.6.1
github.com/spf13/pflag v1.0.5
golang.org/x/sys v0.10.0
gotest.tools v2.2.0+incompatible
)

require (
github.com/davecgh/go-spew v1.1.1
github.com/stretchr/testify v1.8.1
go.opentelemetry.io/contrib/instrumentation/github.com/labstack/echo/otelecho v0.29.0
golang.org/x/exp v0.0.0-20220328175248-053ad81199eb
)

require (
cloud.google.com/go v0.94.0 // indirect
cloud.google.com/go/storage v1.10.0 // indirect
github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
github.com/Azure/go-autorest v14.2.0+incompatible // indirect
github.com/Azure/go-autorest/autorest v0.11.20 // indirect
github.com/Azure/go-autorest/autorest/adal v0.9.15 // indirect
github.com/Azure/go-autorest/autorest/date v0.3.0 // indirect
github.com/Azure/go-autorest/logger v0.2.1 // indirect
github.com/Azure/go-autorest/tracing v0.6.0 // indirect
github.com/Microsoft/go-winio v0.5.0 // indirect
github.com/aead/chacha20 v0.0.0-20180709150244-8b13a72661da // indirect
github.com/aead/chacha20poly1305 v0.0.0-20170617001512-233f39982aeb // indirect
github.com/aead/poly1305 v0.0.0-20180717145839-3fee0db0b635 // indirect
github.com/aws/aws-sdk-go v1.40.34 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cenkalti/backoff/v4 v4.1.3 // indirect
github.com/cespare/xxhash/v2 v2.1.1 // indirect
github.com/coreos/go-systemd v0.0.0-20190719114852-fd7a80b32e1f // indirect
github.com/determined-ai/determined/proto v0.0.0-00010101000000-000000000000 // indirect
github.com/docker/go-metrics v0.0.1 // indirect
github.com/docker/go-units v0.4.0 // indirect
github.com/docker/libtrust v0.0.0-20150114040149-fa567046d9b1 // indirect
github.com/dustinkirkland/golang-petname v0.0.0-20191129215211-8e5a1ed0cff0 // indirect
github.com/elastic/go-elasticsearch/v7 v7.9.0 // indirect
github.com/emirpasic/gods v1.18.1 // indirect
github.com/fatih/color v1.15.0 // indirect
github.com/go-logr/logr v1.2.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-pg/migrations/v8 v8.1.0 // indirect
github.com/go-pg/pg/v10 v10.10.6 // indirect
github.com/go-pg/zerochecker v0.2.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt v3.2.2+incompatible // indirect
github.com/golang-jwt/jwt/v4 v4.4.3 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.2 // indirect
github.com/google/go-cmp v0.5.8 // indirect
github.com/google/gofuzz v1.1.0 // indirect
github.com/googleapis/gax-go/v2 v2.1.0 // indirect
github.com/googleapis/gnostic v0.5.5 // indirect
github.com/gorilla/mux v1.8.0 // indirect
github.com/grpc-ecosystem/go-grpc-middleware v1.3.0 // indirect
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.7.0 // indirect
github.com/hashicorp/errwrap v1.0.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/hashicorp/golang-lru v0.5.1 // indirect
github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
github.com/inconshreveable/mousetrap v1.0.1 // indirect
github.com/jackc/chunkreader/v2 v2.0.1 // indirect
github.com/jackc/pgconn v1.9.0 // indirect
github.com/jackc/pgio v1.0.0 // indirect
github.com/jackc/pgpassfile v1.0.0 // indirect
github.com/jackc/pgproto3/v2 v2.1.1 // indirect
github.com/jackc/pgservicefile v0.0.0-20200714003250-2b9c44734f2b // indirect
github.com/jackc/pgtype v1.8.0 // indirect
github.com/jackc/pgx/v4 v4.12.0 // indirect
github.com/jinzhu/copier v0.3.5 // indirect
github.com/jinzhu/inflection v1.0.0 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/jmoiron/sqlx v1.2.1-0.20190826204134-d7d95172beb5 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/labstack/echo-contrib v0.11.0 // indirect
github.com/labstack/gommon v0.4.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.19 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 // indirect
github.com/moby/term v0.0.0-20210619224110-3f7ff695adc6 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/morikuni/aec v1.0.0 // indirect
github.com/o1egl/paseto v1.0.0 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.0.2 // indirect
github.com/opentracing/opentracing-go v1.2.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_golang v1.11.1 // indirect
github.com/prometheus/client_model v0.2.0 // indirect
github.com/prometheus/common v0.26.0 // indirect
github.com/prometheus/procfs v0.6.0 // indirect
github.com/rogpeppe/go-internal v1.9.0 // indirect
github.com/santhosh-tekuri/jsonschema/v2 v2.2.0 // indirect
github.com/segmentio/backo-go v0.0.0-20200129164019-23eae7c10bd3 // indirect
github.com/shopspring/decimal v1.2.0 // indirect
github.com/soheilhy/cmux v0.1.4 // indirect
github.com/tklauser/go-sysconf v0.3.11 // indirect
github.com/tklauser/numcpus v0.6.0 // indirect
github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc // indirect
github.com/uber/jaeger-client-go v2.25.0+incompatible // indirect
github.com/uber/jaeger-lib v2.4.0+incompatible // indirect
github.com/uptrace/bun v1.1.14 // indirect
github.com/uptrace/bun/dialect/pgdialect v1.1.14 // indirect
github.com/uptrace/bun/extra/bundebug v1.1.14 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasttemplate v1.2.1 // indirect
github.com/vmihailenco/bufpool v0.1.11 // indirect
github.com/vmihailenco/msgpack/v5 v5.3.5 // indirect
github.com/vmihailenco/tagparser v0.1.2 // indirect
github.com/vmihailenco/tagparser/v2 v2.0.0 // indirect
github.com/xtgo/uuid v0.0.0-20140804021211-a0b114877d4c // indirect
github.com/yusufpapurcu/wmi v1.2.2 // indirect
go.opencensus.io v0.23.0 // indirect
go.opentelemetry.io/otel v1.6.1 // indirect
go.opentelemetry.io/otel/exporters/otlp/internal/retry v1.6.1 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.6.1 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.6.1 // indirect
go.opentelemetry.io/otel/sdk v1.6.1 // indirect
go.opentelemetry.io/otel/trace v1.6.1 // indirect
go.opentelemetry.io/proto/otlp v0.12.1 // indirect
go.uber.org/atomic v1.10.0 // indirect
golang.org/x/crypto v0.0.0-20220829220503-c86fa9a7ed90 // indirect
golang.org/x/net v0.7.0 // indirect
golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 // indirect
golang.org/x/sync v0.1.0 // indirect
golang.org/x/term v0.5.0 // indirect
golang.org/x/text v0.7.0 // indirect
golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac // indirect
google.golang.org/api v0.56.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20211223182754-3ac035c7e7cb // indirect
google.golang.org/grpc v1.45.0 // indirect
google.golang.org/protobuf v1.28.0 // indirect
gopkg.in/guregu/null.v3 v3.4.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/segmentio/analytics-go.v3 v3.1.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/api v0.20.14 // indirect
k8s.io/apimachinery v0.20.14 // indirect
k8s.io/client-go v0.20.14 // indirect
k8s.io/klog/v2 v2.30.0 // indirect
k8s.io/kube-openapi v0.0.0-20211115234752-e816edb12b65 // indirect
k8s.io/utils v0.0.0-20210930125809-cb0fa318a74b // indirect
mellium.im/sasl v0.3.1 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.1.2 // indirect
sigs.k8s.io/yaml v1.2.0 // indirect
)

require (
github.com/coreos/go-oidc/v3 v3.1.0 // indirect
github.hpe.com/hpe/hpc-ard-launcher-go/launcher v0.1.2 // indirect
gopkg.in/oauth2.v3 v3.12.0 // indirect
gopkg.in/square/go-jose.v2 v2.5.1 // indirect
)

replace github.com/determined-ai/determined/master => ../master

replace github.com/determined-ai/determined/proto => ../proto

// Determined AI's CircleCI doesn't have access to "github.hpe.com/hpe/hpc-ard-launcher-go",
// so the build will fail in CircleCI. Therefore, we had to do a "git clone" of the
// launcher repo to store a local copy. We make use of the "replace" directive to use the
// local copy and not try to pull it from GitHub.
replace github.hpe.com/hpe/hpc-ard-launcher-go/launcher => ../hpc-ard-launcher-go/launcher
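
The comment above describes redirecting a private module to a local checkout with a `replace` directive. As a rough illustration only (not part of this commit), the Go sketch below scans go.mod text for single-line `replace` directives and flags those whose target is a relative local path. The names `Replacement` and `parseReplaces` are hypothetical, and the parser is deliberately simplified: it ignores block-form `replace ( ... )` and does not detect absolute paths.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Replacement describes one single-line "replace old => new" directive.
// This type is hypothetical and exists only for this sketch.
type Replacement struct {
	Old   string
	New   string
	Local bool // true when New is a relative filesystem path
}

// parseReplaces scans go.mod text for single-line replace directives.
// Simplification: block-form "replace ( ... )" is not handled.
func parseReplaces(goMod string) []Replacement {
	var out []Replacement
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Comment lines start with "//", so they never match this prefix.
		if !strings.HasPrefix(line, "replace ") {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(line, "replace "), "=>", 2)
		if len(parts) != 2 {
			continue
		}
		oldPath := strings.TrimSpace(parts[0])
		newPath := strings.TrimSpace(parts[1])
		// A target beginning with "./" or "../" is a local directory.
		// Absolute paths are also valid local targets but are not checked here.
		local := strings.HasPrefix(newPath, "./") || strings.HasPrefix(newPath, "../")
		out = append(out, Replacement{Old: oldPath, New: newPath, Local: local})
	}
	return out
}

func main() {
	// The directives below are copied from agent/go.mod in this commit.
	goMod := `
replace github.com/determined-ai/determined/master => ../master

replace github.com/determined-ai/determined/proto => ../proto

replace github.hpe.com/hpe/hpc-ard-launcher-go/launcher => ../hpc-ard-launcher-go/launcher
`
	for _, r := range parseReplaces(goMod) {
		fmt.Printf("%s => %s (local: %t)\n", r.Old, r.New, r.Local)
	}
}
```

A check along these lines could run in CI to confirm that the launcher module still resolves to the vendored copy, so the build never tries to reach github.hpe.com. For real tooling, the `golang.org/x/mod/modfile` package is likely a better fit than hand-rolled string parsing.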
29 changes: 29 additions & 0 deletions hpc-ard-launcher-go/README.md
@@ -0,0 +1,29 @@
# hpc-ard-launcher-go

This repo is the home of the Capsules (hpc-ard-capsules-core) dispatch server Go client.

The code found here is generated automatically with OpenAPI tools from the Capsules REST API specification. It can be built with the following command, run from the hpc-ard-capsules-core project:

```
mvn -pl com.cray.analytics.capsules:capsules-dispatch-client clean generate-sources -P go-client
```
To install the package to your Go environment:

If you use ssh to interact with github.hpe.com, add the following to your ~/.gitconfig:
```
[url "ssh://git@github.hpe.com/"]
insteadOf = https://github.hpe.com/
```
Then:
```
% export GOPRIVATE=github.hpe.com/hpe/hpc-ard-launcher-go
% go get github.hpe.com/hpe/hpc-ard-launcher-go/launcher
```
Import the launcher package to your Go program thus:
```
import (
<other imports go here>
"github.hpe.com/hpe/hpc-ard-launcher-go/launcher"
)
```
5 changes: 5 additions & 0 deletions hpc-ard-launcher-go/go.mod
@@ -0,0 +1,5 @@
module github.hpe.com/hpe/hpc-ard-launcher-go

go 1.13

require golang.org/x/oauth2 v0.0.0-20210218202405-ba52d332ba99
