exclusive worker mode as default (#513)
* exclusive worker mode as default
* context settings - rename max-parallel-jobs to max-jobs in mist's config to keep consistent naming. Its JSON representation uses maxJobs for this field
dos65 committed Aug 23, 2018
1 parent 7e598b0 commit 186be19
Showing 9 changed files with 121 additions and 29 deletions.
2 changes: 1 addition & 1 deletion docs/src/main/tut/01_about.md
@@ -10,7 +10,7 @@ Hydrosphere Mist is a serverless proxy for Spark.
It provides a better way to write, deploy, run, and manage spark applications.

Features:
-- Spark Function as a Service. Deploy Spark functions rather than nodetebooks or scripts.
+- Spark Function as a Service. Deploy Spark functions rather than notebooks or scripts.
- Typeful library api for scala and java + python dsl.
- Spark context management. Decoupling of the user API from Spark settings, cluster provisioning, resource isolation, sharing and auto-scaling.
- Support for several communication interfaces:
12 changes: 8 additions & 4 deletions docs/src/main/tut/02_quick_start.md
@@ -69,7 +69,6 @@ Demo:
<source src="/mist-docs/img/quick-start-ui.webm" type='video/webm; codecs="vp8, vorbis"'>
</video>


### Build your own function

Mist provides a typeful library for writing functions in scala/java and a special dsl for python.
@@ -316,7 +315,6 @@ def hello_mist(sc, n):
return {'result': pi}
```


### Connect to your existing Apache Spark cluster

**Note** For this section it's recommended to use Mist from the binary distribution. Using Mist from Docker
@@ -345,7 +343,7 @@ Configuring a particular function to be executed on a different Spark Cluster is
```

Yarn, Mesos and Kubernetes cluster settings are managed the same way.
-Please, follow to offical guides:
+Please follow the official guides:
- [Yarn](https://spark.apache.org/docs/latest/running-on-yarn.html)
- [Mesos](https://spark.apache.org/docs/latest/running-on-mesos.html)
- [Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html)
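
For instance, pointing a context at a standalone Spark cluster is typically just a `spark.master` entry in the context's `spark-conf` (a minimal sketch; the master URL is a placeholder):

```
spark-conf {
  spark.master = "spark://spark-master-host:7077"
}
```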
@@ -354,5 +352,11 @@ Please, follow to offical guides:

**Note** It may be necessary to configure `host` values correctly to make Mist visible from outside - see the [configuration page](/mist-docs/configuration.html)

-Mist uses `spark-submit` under the hood, if you need to provide environment variables for it, just set it up before launching Mist Worker.
+Mist uses `spark-submit` under the hood; if you need to provide environment variables for it, set them before launching Mist Master.

### Next

For more information about how Mist works and how to configure contexts, check the following pages:
- [Invocation details](/mist-docs/invocation.html)
- [Context configuration](/mist-docs/contexts.html)

28 changes: 26 additions & 2 deletions docs/src/main/tut/07_invocation.md
@@ -5,15 +5,38 @@ permalink: invocation.html
position: 7
---

-### Invocation
+### Invocation model

-Job could be in one of possible statuses:
Mist treats every function invocation as a job.
To invoke a function, Mist needs a Spark driver application with a Spark context.
This application is called `mist-worker`, and Mist starts and manages these workers automatically.

Before starting job execution, the Mist master queues jobs
and waits until there is a free worker for the context in which the job should be executed.
There are two worker modes for contexts: `exclusive` and `shared`.
In `exclusive` mode, Mist starts a new worker for every job and shuts it down after the job completes.
In `shared` mode, Mist doesn't shut workers down right after job completion; it reuses them for subsequent jobs in the same context (a configuration sketch follows the list below).

This invocation model also provides the following benefits:
- **Parallelization**:
For a particular context it's possible to start more than one `mist-worker` at the same time and balance job requests between them,
so that jobs are executed in parallel. The parallelization level is controlled by `max-jobs` in the context settings - [contexts doc](/mist-docs/contexts.html)
- **Fault tolerance**:
If a worker crashes, Mist tries to restart it to invoke the next job.
If workers crash more than `maxConnFailures` ([contexts doc](/mist-docs/contexts.html)) times in a row, the context is marked as broken
and fails all jobs until its settings are updated.
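
A minimal sketch of these settings in Mist's HOCON context configuration (the key names follow the `context-defaults` block of `master.conf` changed later in this commit; the values are illustrative):

```
worker-mode = "shared"   # reuse a long-lived worker; "exclusive" (the new default) starts one worker per job
max-jobs = 2             # let two jobs of this context run in parallel
downtime = 30 minutes    # stop an idle shared worker after half an hour
```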

### Job statuses

A job may be in one of the following statuses:
- `queued`
- `started`
- `canceled`
- `finished`
- `failed`

### Steps

Mist performs the following steps after receiving a new job request:
- assign an `id` to the job.
For example, if you use http:
@@ -32,3 +55,4 @@ Steps on Mist after receiving a new job request:
- transmit the function artifact (jar/py) to the worker
- invoke the function. The status turns into `started` and then, after completion, `finished` or `failed`
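
As an illustration of the flow above, a hypothetical HTTP exchange (the endpoint path and field name are assumptions for illustration, not taken from this commit):

```
POST /v2/api/functions/hello-mist/jobs   =>   { "id": "6f0a3c2e-..." }
```

The returned `id` can then be used to track the job as it moves through `queued`, `started` and, finally, `finished` or `failed`.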
6 changes: 6 additions & 0 deletions docs/src/main/tut/10_configuration.md
@@ -100,3 +100,9 @@ mist {
}
}
```

### Default context

To override settings for the `default` context, use the `mist.context-defaults` prefix - see [master.conf](https://github.com/Hydrospheredata/mist/blob/master/mist/master/src/main/resources/master.conf#L73).
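
For example, to switch the defaults back to the pre-#513 behavior, override just the relevant keys (a sketch; the key names come from the `context-defaults` block changed below in this commit):

```
mist.context-defaults {
  worker-mode = "shared"
  max-jobs = 4
}
```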


91 changes: 74 additions & 17 deletions docs/src/main/tut/11_contexts.md
@@ -4,23 +4,80 @@ title: "Contexts"
permalink: contexts.html
position: 11
---
-## Contexts
+### Contexts

Mist creates and orchestrates Apache Spark contexts automatically. Every job is run in a context.
In fact, a context describes a named Spark context together with the Mist settings for it.
Mist context settings:
- `spark-conf` - settings for a [spark](https://spark.apache.org/docs/latest/configuration.html)
- `max-parallel-jobs` - number of jobs that can be executed in parallel on the same context
- `run-options` - additional command line arguments that will be used during worker creation via spark-submit.
By default it's empty. [spark docs](https://spark.apache.org/docs/latest/submitting-applications.html)
- `streaming-duration` - spark streaming duration
- `worker-mode`:
There are two modes:
- `shared`:
By default, when you request a job to run for the first time, Mist creates a worker with a new Spark context.
Subsequent requests reuse the created context, which stays alive while Mist is running.
settings:
- `downtime` (duration) - idle timeout for a `shared` worker
- `precreated` (boolean) - start the context at Mist startup time

- `exclusive` - spawns a new driver application for every job invocation.

Contexts may be created using [mist-cli](/mist-docs/mist-cli.html) or the [http-api](/mist-docs/http_api.html).
Also, there is a special `default` context. It may be configured only via the Mist configuration file.
Its goal is to set up default values for all contexts, so creating a new context doesn't require defining values for all of its fields.
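
As an example, a context definition for mist-cli might look like the following sketch (the `model`/`name`/`data` layout and the context name `my-context` are assumptions for illustration):

```
model = Context
name = my-context
data {
  worker-mode = "shared"
  max-jobs = 2
  spark-conf {
    spark.executor.memory = "1g"
  }
}
```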

Settings:
<table>
<thead>
<tr>
<td>Key</td>
<td>Default</td>
<td>Meaning</td>
</tr>
</thead>
<tbody>
<tr>
<td>sparkConf</td>
<td>empty</td>
<td>settings for [Spark](https://spark.apache.org/docs/latest/configuration.html)</td>
</tr>
<tr>
<td>maxJobs</td>
<td>1</td>
<td>number of jobs executed in parallel</td>
</tr>
<tr>
<td>workerMode</td>
<td>exclusive</td>
<td>
<ul>
<li>exclusive - starts new worker for every job invocation</li>
<li>shared - reuses worker between several jobs</li>
</ul>
</td>
</tr>
<tr>
<td>maxConnFailures</td>
<td>5</td>
<td>
allowed number of worker crashes before the context is switched into `broken` state
(it fails all incoming requests until the context settings are updated).
</td>
</tr>
<tr>
<td>runOptions</td>
<td>""</td>
<td>
additional command line arguments for the spark-submit command that starts a worker, e.g. pass `--jars`
</td>
</tr>
<tr>
<td>streamingDuration</td>
<td>1s</td>
<td>
spark streaming duration
</td>
</tr>
<tr>
<td>precreated</td>
<td>false</td>
<td>
if true, the worker starts immediately;
if false, the worker starts on the first job request.
*NOTE*: works only with `shared` workerMode
</td>
</tr>
<tr>
<td>downtime</td>
<td>1 hour</td>
<td>idle timeout for a `shared` worker</td>
</tr>
</tbody>
</table>
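
The camelCase keys in this table form the JSON representation of the same settings (as the commit message notes, `max-jobs` in the file config appears as `maxJobs` in JSON). A hypothetical http-api payload using only keys from the table:

```
{
  "name": "my-context",
  "workerMode": "shared",
  "maxJobs": 2
}
```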
2 changes: 1 addition & 1 deletion docs/src/main/tut/13_mist_cli.md
@@ -13,7 +13,7 @@ position: 13
Mist-cli provides a command line interface to the mist server for managing contexts/functions.
Under the hood it uses the [http api](/mist_docs/http_api.html).

-Instanciation of a new endpoint on mist requres following steps:
+Instantiation of a new endpoint on mist requires the following steps:
- upload an artifact with the function
- create a context for its invocation
- create the function
5 changes: 3 additions & 2 deletions mist/master/src/main/resources/master.conf
@@ -73,9 +73,10 @@ mist {
context-defaults {
downtime = 1 hour
streaming-duration = 1 seconds
-max-parallel-jobs = 4
+max-parallel-jobs = 1
+max-jobs = 1
precreated = false
-worker-mode = "shared" # shared | exclusive
+worker-mode = "exclusive" # shared | exclusive
spark-conf {
#spark.default.parallelism = 128
#spark.driver.memory = "512m"
@@ -74,7 +74,7 @@ object ConfigRepr {
.map(entry => cleanKey(entry.getKey) -> entry.getValue.unwrapped().toString)
.toMap,
downtime = Duration(config.getString("downtime")),
-maxJobs = config.getInt("max-parallel-jobs"),
+maxJobs = config.getOptInt("max-jobs").getOrElse(config.getInt("max-parallel-jobs")),
precreated = config.getBoolean("precreated"),
workerMode = runMode(config.getString("worker-mode")) ,
runOptions = config.getString("run-options"),
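
The `getOptInt` call above lets the new `max-jobs` key take precedence while falling back to the legacy `max-parallel-jobs` key. Typesafe Config has no built-in optional getters, so a helper like this is presumably defined elsewhere in the project; a minimal sketch of such an extension (the actual implementation may differ):

```scala
import com.typesafe.config.Config

object ConfigSyntax {
  implicit class ConfigOps(val config: Config) extends AnyVal {
    // Some(value) when the path is present, None otherwise.
    def getOptInt(path: String): Option[Int] =
      if (config.hasPath(path)) Some(config.getInt(path)) else None
  }
}
```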
@@ -54,7 +54,7 @@ class ContextsStorageSpec extends FunSpec with Matchers with BeforeAndAfter {

it("should fallback to default") {
val contexts = testStorage()
-val expected = TestUtils.contextSettings.default.copy(name = "new")
+val expected = contexts.defaultConfig.copy(name = "new")
contexts.getOrDefault("new").await shouldBe expected
}

