Merged
Commits
64 commits
15951a6
pool initial
Dec 8, 2023
27b169e
pool add
Dec 20, 2023
74c2c22
merge pool logic
Dec 27, 2023
4c946e0
add local instance to the pool
Jan 10, 2024
a314da7
multitask
Jan 12, 2024
7f3316c
update
Jan 16, 2024
7e1fc3a
update
Jan 23, 2024
6229341
update
Jan 31, 2024
e246148
debug
Feb 1, 2024
28dfa37
debug
Feb 1, 2024
718a7d4
fix mypy
Feb 4, 2024
97d1c1b
improve dstack run
Feb 5, 2024
d575d4a
fixup! improve dstack run
Feb 5, 2024
9acd2df
Merge remote-tracking branch 'origin/master' into pool
Feb 5, 2024
89160da
Refactor check_relevance to use gpuhunt
Egor-S Feb 5, 2024
d7a32b1
Fix tests discovery for Python 3.8
Egor-S Feb 5, 2024
2b6e24f
Improve profile.pool_name handling
Feb 5, 2024
24d1d55
Merge remote-tracking branch 'origin/master' into pool
Feb 5, 2024
43e1607
Merge remote-tracking branch 'origin/pool-gpuhunt' into pool
Feb 5, 2024
cae383a
Fix spot policy mapping
Egor-S Feb 5, 2024
58dab52
fixup! Improve profile.pool_name handling
Feb 5, 2024
506e679
Merge remote-tracking branch 'origin/pool-gpuhunt' into pool
Feb 5, 2024
702609b
Merge remote-tracking branch 'origin/master' into pool
Feb 5, 2024
eb574b1
Merge remote-tracking branch 'origin/master' into pool
Feb 6, 2024
e047d4e
small fixies
Feb 6, 2024
c618305
Fix rich formatting, require --name
Egor-S Feb 6, 2024
9393ff6
Filter out deleted instances in pools
Egor-S Feb 6, 2024
2cfc268
Update runner, fix status
Feb 7, 2024
c963523
Merge remote-tracking branch 'origin/pool-cli-improvements' into pool
Feb 7, 2024
48cba16
Merge remote-tracking branch 'origin/master' into pool
Feb 7, 2024
64f3ef8
Improve job statuses
Feb 7, 2024
e3f61a4
Merge remote-tracking branch 'origin/master' into pool
Feb 7, 2024
f36ca26
fix hardcode
Feb 7, 2024
28b0e19
Fix CI
Feb 7, 2024
03a80d0
Fix currency missing in dstack pool show PRICE
r4victor Feb 8, 2024
447ed53
Fix ssh keys. Fix review
Feb 8, 2024
9daa7a3
Merge remote-tracking branch 'origin/issue_790_dstack_pool' into issu…
Feb 8, 2024
586c8bb
Merge remote-tracking branch 'origin/master' into issue_790_dstack_pool
Feb 8, 2024
8895557
Do not require repo in `dstack add`
Egor-S Feb 8, 2024
f9fb2cd
Show provisioned instance in `dstack pool add`
Egor-S Feb 8, 2024
9f01bca
Validate `pool add` resources args
Egor-S Feb 8, 2024
e79bfb6
Print pool name in run plan
Egor-S Feb 8, 2024
b831a60
Add TODOs
Egor-S Feb 8, 2024
c35e63b
Add InstanceAvailability for pool instances
Egor-S Feb 8, 2024
9bc636b
Timout for runner download
Feb 9, 2024
5011a93
Remove duplicate code. Fix termination_idle_time. Small review fixes
Feb 9, 2024
31722f5
Merge remote-tracking branch 'origin/master' into issue_790_dstack_pool
Feb 9, 2024
80a478c
Merge remote-tracking branch 'origin/issue_790_dstack_pool-improve-cl…
Feb 9, 2024
a3a3cf3
Always load job
Feb 9, 2024
60b1167
Merge remote-tracking branch 'origin/master' into issue_790_dstack_pool
Feb 9, 2024
95f34ce
TODO and small fix
Egor-S Feb 9, 2024
b01ec50
Fix job_name
Feb 9, 2024
2076ad5
`dstack show` now works with no pool provided
Egor-S Feb 9, 2024
e0f2240
Improve dstack pool show formatting
Egor-S Feb 9, 2024
80df9f1
Merge remote-tracking branch 'origin/issue_790_dstack_pool-default-po…
Feb 9, 2024
c9ce8da
Ask confirmation on `dstack pool remove`
Egor-S Feb 9, 2024
ddad6ee
Fix done jobs being aborted
r4victor Feb 12, 2024
4c0a45d
Remove mypy and comments
Feb 12, 2024
907e5d6
Merge remote-tracking branch 'origin/issue_790_dstack_pool' into issu…
Feb 12, 2024
16cabf6
Rename url pool/set-default to pool/set_default
Feb 12, 2024
2a10473
Replace instance.instance_id with instance.name
Egor-S Feb 12, 2024
b61ad10
dstack pool remove: name as positional argument
Egor-S Feb 12, 2024
e2208de
fix review
Feb 12, 2024
60f505d
Merge remote-tracking branch 'origin/issue_790_dstack_pool-fix-names-…
Feb 12, 2024
1 change: 1 addition & 0 deletions .github/workflows/build.yml
@@ -112,6 +112,7 @@ jobs:
VERSION=$((${{ github.run_number }} + ${{ env.BUILD_INCREMENT }}))
go build -ldflags "-X '$REPO_NAME/runner/cmd/runner/version.Version=$VERSION' -extldflags '-static'" -o dstack-runner-$GOOS-$GOARCH $REPO_NAME/runner/cmd/runner
go build -ldflags "-X '$REPO_NAME/runner/cmd/shim/version.Version=$VERSION' -extldflags '-static'" -o dstack-shim-$GOOS-$GOARCH $REPO_NAME/runner/cmd/shim
echo $VERSION
- uses: actions/upload-artifact@v3
with:
name: dstack-runner
9 changes: 8 additions & 1 deletion .pre-commit-config.yaml
@@ -1,7 +1,14 @@
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-rev: v0.2.0
+rev: v0.2.1
 hooks:
 - id: ruff
+name: ruff common
 args: ['--fix']
 - id: ruff-format
+- repo: https://github.com/golangci/golangci-lint
+rev: v1.56.1
+hooks:
+- id: golangci-lint-full
+entry: bash -c 'cd runner && golangci-lint run -D depguard --presets import,module,unused "$@"'
+stages: [manual]
42 changes: 42 additions & 0 deletions docs/docs/reference/pool/index.md
@@ -0,0 +1,42 @@
# dstack pool

## What is `dstack pool`

A `dstack pool` is the primary mechanism for controlling precisely how compute instances are used. It helps in the following situations:

- The instance you want for a task might not be available. The `dstack pool` waits for compute instances to become available and, when possible, allocates them before running tasks on them.

- You need reserved compute instances to handle a constant load. `dstack` pre-allocates on-demand instances and lets you run tasks on them when they are available.

- You want tasks to start faster. Searching for instances and provisioning the runner takes time. With a `dstack pool`, tasks are distributed to instances that are already running.

- You have your own compute instances. You can connect them to a `dstack pool` and use them alongside cloud instances.

## How to use

Any task that runs without the `--pool` argument uses the pool named `default`.

When you specify a pool name for a task, for example `dstack run --pool mypool`, the task is executed in one of two ways:

- if `mypool` exists, the task runs on an available instance with a suitable configuration
- if `mypool` does not exist, the pool is created, and the compute instances it requires are created and connected to it.
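
The two cases above can be sketched in Go, the language of the runner in this PR. This is only an illustration of the documented behaviour, not the server's implementation: `Registry`, `ResolvePoolName`, `GetOrCreate`, and `DefaultPoolName` are hypothetical names, and the in-memory map stands in for whatever storage the server actually uses.

```go
package main

import "fmt"

// DefaultPoolName mirrors the documented default pool; the constant itself is hypothetical.
const DefaultPoolName = "default"

// Registry is a hypothetical in-memory stand-in for the server's pool storage.
type Registry struct {
	pools map[string][]string // pool name -> instance names
}

// NewRegistry returns a registry that already contains the default pool.
func NewRegistry() *Registry {
	return &Registry{pools: map[string][]string{DefaultPoolName: nil}}
}

// ResolvePoolName returns the pool a run targets: the requested name,
// or the default pool when no --pool argument was given.
func ResolvePoolName(requested string) string {
	if requested == "" {
		return DefaultPoolName
	}
	return requested
}

// GetOrCreate ensures the pool exists and reports whether it had to be created.
func (r *Registry) GetOrCreate(name string) (created bool) {
	if _, ok := r.pools[name]; ok {
		return false
	}
	r.pools[name] = nil
	return true
}

func main() {
	reg := NewRegistry()
	name := ResolvePoolName("mypool")
	fmt.Println(name, reg.GetOrCreate(name)) // first run: the pool is created
	fmt.Println(name, reg.GetOrCreate(name)) // later runs: the pool is reused
}
```

Instance provisioning is deliberately left out of the sketch; it only covers how the pool name is chosen and when a pool is created.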

### CLI

- `dstack pool list`
- `dstack pool create`
- `dstack pool show <poolname>`
- `dstack pool add`
- `dstack pool delete`

### Instance lifecycle

- idle time
- reservation policy (instance termination)
- task retry policy

### Add your own compute instance

When connecting your own instance, it must have a public IP address so that the dstack server can connect to it.

To connect it, pass the IP address and SSH credentials to the command `dstack pool add --host HOST --port PORT --ssh-private-key-file SSH_PRIVATE_KEY_FILE`.
5 changes: 4 additions & 1 deletion runner/cmd/runner/cmd.go
@@ -56,7 +56,10 @@ func App() {
 },
 },
 Action: func(c *cli.Context) error {
-start(paths.tempDir, paths.homeDir, paths.workingDir, httpPort, logLevel)
+err := start(paths.tempDir, paths.homeDir, paths.workingDir, httpPort, logLevel, Version)
+if err != nil {
+return cli.Exit(err, 1)
+}
 return nil
 },
 },
31 changes: 20 additions & 11 deletions runner/cmd/runner/main.go
@@ -3,37 +3,46 @@ package main
import (
"context"
"fmt"
"github.com/dstackai/dstack/runner/internal/log"
"github.com/dstackai/dstack/runner/internal/runner/api"
"github.com/sirupsen/logrus"
"io"
_ "net/http/pprof"
"os"
"path/filepath"

"github.com/dstackai/dstack/runner/internal/log"
"github.com/dstackai/dstack/runner/internal/runner/api"
"github.com/sirupsen/logrus"
"github.com/ztrue/tracerr"
)

func main() {
App()
}

func start(tempDir string, homeDir string, workingDir string, httpPort int, logLevel int) {
func start(tempDir string, homeDir string, workingDir string, httpPort int, logLevel int, version string) error {
if err := os.MkdirAll(tempDir, 0755); err != nil {
log.Error(context.TODO(), "Failed to create temp directory", "err", err)
os.Exit(1)
return tracerr.Errorf("Failed to create temp directory: %w", err)
}

defaultLogFile, err := log.CreateAppendFile(filepath.Join(tempDir, "default.log"))
if err != nil {
log.Error(context.TODO(), "Failed to create default log file", "err", err)
os.Exit(1)
return tracerr.Errorf("Failed to create default log file: %w", err)
}
defer func() { _ = defaultLogFile.Close() }()
defer func() {
err = defaultLogFile.Close()
if err != nil {
tracerr.Print(err)
}
}()

log.DefaultEntry.Logger.SetOutput(io.MultiWriter(os.Stdout, defaultLogFile))
log.DefaultEntry.Logger.SetLevel(logrus.Level(logLevel))

server := api.NewServer(tempDir, homeDir, workingDir, fmt.Sprintf(":%d", httpPort))
server := api.NewServer(tempDir, homeDir, workingDir, fmt.Sprintf(":%d", httpPort), version)

log.Trace(context.TODO(), "Starting API server", "port", httpPort)
if err := server.Run(); err != nil {
log.Error(context.TODO(), "Server failed", "err", err)
return tracerr.Errorf("Server failed: %w", err)
}

return nil
}
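
The change above replaces `log.Error` plus `os.Exit(1)` inside `start` with returned, wrapped errors, so the caller decides the exit code. A minimal sketch of that pattern follows, assuming the standard library's `fmt.Errorf` with `%w` in place of `tracerr.Errorf`. The simplified `start` signature and the helpers `failingStart` and `exitCode` are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// start is a simplified stand-in for the runner's start function:
// it returns wrapped errors instead of exiting the process itself.
func start(tempDir string) error {
	if err := os.MkdirAll(tempDir, 0o755); err != nil {
		return fmt.Errorf("failed to create temp directory: %w", err)
	}
	f, err := os.OpenFile(filepath.Join(tempDir, "default.log"),
		os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return fmt.Errorf("failed to create default log file: %w", err)
	}
	return f.Close()
}

// failingStart (hypothetical helper) calls start with a directory path nested
// under a regular file, which makes os.MkdirAll fail.
func failingStart() error {
	f, err := os.CreateTemp("", "runner-sketch-*")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	f.Close()
	return start(filepath.Join(f.Name(), "sub"))
}

// exitCode (hypothetical helper) maps an error to a process exit code,
// which is the decision the CLI action now makes via cli.Exit(err, 1).
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

func main() {
	err := failingStart()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		// The %w verb keeps the underlying cause reachable for callers.
		fmt.Println("wrapped cause preserved:", errors.Unwrap(err) != nil)
	}
	fmt.Println("exit code:", exitCode(err))
}
```

Returning the error keeps deferred cleanup in `start` (such as closing the log file) on the normal return path, whereas `os.Exit` terminates the process without running deferred functions.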
2 changes: 1 addition & 1 deletion runner/cmd/runner/version.go
@@ -1,4 +1,4 @@
 package main

 // Version A default build-time variable. The value is overridden via ldflags.
-var Version = "0.0.1.dev1"
+var Version = "0.0.1.dev2"
81 changes: 32 additions & 49 deletions runner/cmd/shim/main.go
@@ -2,35 +2,29 @@ package main

import (
"context"
"errors"
"fmt"
"log"
"net/http"
"os"
"path/filepath"
"time"

"github.com/dstackai/dstack/runner/internal/gerrors"
"github.com/dstackai/dstack/runner/consts"
"github.com/dstackai/dstack/runner/internal/shim"
"github.com/dstackai/dstack/runner/internal/shim/api"
"github.com/dstackai/dstack/runner/internal/shim/backends"
"github.com/urfave/cli/v2"
)

func main() {
var backendName string
var args shim.CLIArgs
args.Docker.SSHPort = 10022

app := &cli.App{
Name: "dstack-shim",
Usage: "Starts dstack-runner or docker container. Kills the VM on exit.",
Usage: "Starts dstack-runner or docker container.",
Version: Version,
Flags: []cli.Flag{
&cli.StringFlag{
Name: "backend",
Usage: "Cloud backend provider",
Required: true,
Destination: &backendName,
EnvVars: []string{"DSTACK_BACKEND"},
},
/* Shim Parameters */
&cli.PathFlag{
Name: "home",
@@ -85,18 +79,6 @@ func main() {
Usage: "Starts docker container and modifies entrypoint",
Flags: []cli.Flag{
/* Docker Parameters */
&cli.BoolFlag{
Name: "with-auth",
Usage: "Waits for registry credentials",
Destination: &args.Docker.RegistryAuthRequired,
},
&cli.StringFlag{
Name: "image",
Usage: "Docker image name",
Required: true,
Destination: &args.Docker.ImageName,
EnvVars: []string{"DSTACK_IMAGE_NAME"},
},
&cli.BoolFlag{
Name: "keep-container",
Usage: "Do not delete container on exit",
@@ -112,48 +94,48 @@ func main() {
},
Action: func(c *cli.Context) error {
if args.Runner.BinaryPath == "" {
if err := args.Download("linux"); err != nil {
return gerrors.Wrap(err)
if err := args.DownloadRunner(); err != nil {
return cli.Exit(err, 1)
}
defer func() { _ = os.Remove(args.Runner.BinaryPath) }()
}

log.Printf("Backend: %s\n", backendName)
args.Runner.TempDir = "/tmp/runner"
args.Runner.HomeDir = "/root"
args.Runner.WorkingDir = "/workflow"

var err error

// set dstack home path
args.Shim.HomeDir, err = getDstackHome(args.Shim.HomeDir)
if err != nil {
return gerrors.Wrap(err)
return cli.Exit(err, 1)
}
log.Printf("Docker: %+v\n", args)

server := api.NewShimServer(fmt.Sprintf(":%d", args.Shim.HTTPPort), args.Docker.RegistryAuthRequired)
return gerrors.Wrap(server.RunDocker(context.TODO(), &args))
},
},
{
Name: "subprocess",
Usage: "Docker-less mode",
Action: func(c *cli.Context) error {
return gerrors.New("not implemented")
dockerRunner, err := shim.NewDockerRunner(args)
if err != nil {
return cli.Exit(err, 1)
}

address := fmt.Sprintf(":%d", args.Shim.HTTPPort)
shimServer := api.NewShimServer(address, dockerRunner, Version)

defer func() {
shutdownCtx, cancelShutdown := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelShutdown()
_ = shimServer.HttpServer.Shutdown(shutdownCtx)
}()

if err := shimServer.HttpServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
return cli.Exit(err, 1)
}

return nil
},
},
},
}

defer func() {
backend, err := backends.NewBackend(context.TODO(), backendName)
if err != nil {
log.Fatal(err)
}
if err = backend.Terminate(context.TODO()); err != nil {
log.Fatal(err)
}
}()

if err := app.Run(os.Args); err != nil {
log.Fatal(err)
}
@@ -163,9 +145,10 @@ func getDstackHome(flag string) (string, error) {
if flag != "" {
return flag, nil
}

home, err := os.UserHomeDir()
if err != nil {
return "", gerrors.Wrap(err)
return "", err
}
return filepath.Join(home, ".dstack"), nil
return filepath.Join(home, consts.DSTACK_DIR_PATH), nil
}
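
The new docker action serves HTTP through `shimServer.HttpServer` and treats `http.ErrServerClosed` as a normal stop. The sketch below shows that check in isolation, using only `net/http` from the standard library. The helper names `serve` and `runAndStop`, the random loopback port, and the `NotFoundHandler` are assumptions made for the example and are not part of the shim.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serve blocks until the server stops. http.ErrServerClosed means Shutdown
// (or Close) was called, so it is not reported as a failure.
func serve(srv *http.Server, ln net.Listener) error {
	if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// runAndStop starts a server on a random loopback port, shuts it down with a
// 5-second timeout, and returns whatever serve reported.
func runAndStop() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}

	done := make(chan error, 1)
	go func() { done <- serve(srv, ln) }()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err
	}
	return <-done
}

func main() {
	if err := runAndStop(); err != nil {
		fmt.Println("server failed:", err)
		return
	}
	fmt.Println("server stopped cleanly")
}
```

In the sketch, `Shutdown` is called from a different goroutine than the one blocked in `Serve`. In the diff above, the deferred `Shutdown` runs only once `ListenAndServe` has already returned.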
2 changes: 1 addition & 1 deletion runner/cmd/shim/version.go
@@ -1,3 +1,3 @@
 package main

-var Version = "0.0.0dev1"
+var Version = "0.0.0dev2"
41 changes: 2 additions & 39 deletions runner/go.mod
@@ -3,12 +3,6 @@ module github.com/dstackai/dstack/runner
go 1.19

require (
cloud.google.com/go/compute v1.23.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.3.1
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4 v4.2.1
github.com/aws/aws-sdk-go-v2/config v1.18.39
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.13.11
github.com/aws/aws-sdk-go-v2/service/ec2 v1.118.0
github.com/bluekeyes/go-gitdiff v0.6.0
github.com/creack/pty v1.1.18
github.com/docker/docker v24.0.6+incompatible
@@ -18,28 +12,15 @@ require (
github.com/sirupsen/logrus v1.9.0
github.com/stretchr/testify v1.8.1
github.com/urfave/cli/v2 v2.25.7
github.com/ztrue/tracerr v0.4.0
golang.org/x/crypto v0.14.0
)

require (
cloud.google.com/go/compute/metadata v0.2.3 // indirect
dario.cat/mergo v1.0.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.7.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.1.1 // indirect
github.com/Microsoft/go-winio v0.6.1 // indirect
github.com/ProtonMail/go-crypto v0.0.0-20230717121422-5aa5874ade95 // indirect
github.com/acomagu/bufpipe v1.0.4 // indirect
github.com/aws/aws-sdk-go-v2 v1.21.0 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.13.37 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.41 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.35 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.3.42 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.35 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.13.6 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.15.6 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.21.5 // indirect
github.com/aws/smithy-go v1.14.2 // indirect
github.com/cloudflare/circl v1.3.3 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
@@ -49,27 +30,18 @@ require (
github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 // indirect
github.com/go-git/go-billy/v5 v5.4.1 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt/v5 v5.0.0 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/s2a-go v0.1.4 // indirect
github.com/google/uuid v1.3.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.2.3 // indirect
github.com/googleapis/gax-go/v2 v2.11.0 // indirect
github.com/h2non/filetype v1.1.3 // indirect
github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/juju/errors v0.0.0-20181118221551-089d3ea4e4d5 // indirect
github.com/juju/loggo v1.0.0 // indirect
github.com/kevinburke/ssh_config v1.2.0 // indirect
github.com/klauspost/compress v1.15.13 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/moby/term v0.5.0 // indirect
github.com/morikuni/aec v1.0.0 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.0.2 // indirect
github.com/pjbgf/sha1cd v0.3.0 // indirect
github.com/pkg/browser v0.0.0-20210911075715-681adbf594b8 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
@@ -78,22 +50,13 @@ require (
github.com/ulikunitz/xz v0.5.11 // indirect
github.com/xanzy/ssh-agent v0.3.3 // indirect
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect
go.opencensus.io v0.24.0 // indirect
golang.org/x/mod v0.13.0 // indirect
golang.org/x/net v0.16.0 // indirect
golang.org/x/oauth2 v0.8.0 // indirect
golang.org/x/sys v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/tools v0.14.0 // indirect
google.golang.org/api v0.126.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20230530153820-e85fd2cbaebc // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20230530153820-e85fd2cbaebc // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20230530153820-e85fd2cbaebc // indirect
google.golang.org/grpc v1.55.0 // indirect
google.golang.org/protobuf v1.30.0 // indirect
gopkg.in/mgo.v2 v2.0.0-20190816093944-a6b53ec6cb22 // indirect
gopkg.in/warnings.v0 v0.1.2 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gotest.tools/v3 v3.5.0 // indirect
)
