Showing 42 changed files with 420 additions and 260 deletions.
File renamed without changes.
@@ -0,0 +1,57 @@
# Commands

Harness includes an admin command line interface (CLI). The CLI works through the Harness REST interface, so it can be run remotely.

## Conventions

Internal to Harness are ***Engine Instances***, each of which implements an algorithm and contains datasets and configuration parameters. All input data is validated by the engine and must be readable by the algorithm. The simplest workflow is:

1. start the server
2. add an engine
3. input data to the engine
4. train (for Lambda style engines; Kappa style engines train automatically with each new input)
5. query

See the [Workflow](workflow.md) section for more detail.

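The steps above can be sketched as an ordered list of CLI invocations. This is a minimal sketch, not part of Harness: the function name `workflow_commands` and the file names are hypothetical, and only the commands documented on this page (`harness-start`, `harnctl add`, `harnctl import`, `harnctl train`) are used. Step 5 is left as a comment because no query command is listed here.

```python
from typing import List


def workflow_commands(engine_json: str, engine_id: str, events_path: str) -> List[List[str]]:
    """Return the documented CLI invocations for the simple workflow, in order."""
    return [
        ["harness-start"],                              # 1. start the server
        ["harnctl", "add", engine_json],                # 2. add an engine
        ["harnctl", "import", engine_id, events_path],  # 3. input data (import route)
        ["harnctl", "train", engine_id],                # 4. train (Lambda style engines)
        # 5. query: not covered by a CLI command on this page
    ]


# Hypothetical file and engine names, for illustration only.
commands = workflow_commands("some-engine.json", "some-engine-id", "events.json")
for argv in commands:
    print(" ".join(argv))
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` on a host where the Harness scripts are on the `PATH`.
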
Harness uses resource-ids to identify all objects in the system. Every Engine Instance must have a JSON file that contains all parameters for Harness engine management, including the Engine Instance's resource-id, as well as algorithm parameters that are specific to the Engine type. All Harness global configuration is stored in `harness-env`; see [Harness Config](harness_config.md) for details.

- The file `<some-engine.json>` can be named anything and put anywhere.
- The working copy of all engine parameters and input data is kept in a shared database. To change an Engine Instance's configuration, add or update the Engine Instance; editing the file alone will not update it. See the `add` and `update` commands.

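As an illustration of what such a file might hold, the sketch below writes a small engine JSON file from Python. Only `engineFactory` is named on this page; the key names `engineId` and `algorithm`, the factory class name, and the helper `write_engine_file` are assumptions made for the example and should be checked against the Engine's own documentation.

```python
import json
import os
import tempfile

# Illustrative engine definition; every value here is a placeholder.
engine_config = {
    "engineId": "some-engine-id",               # assumed key for the resource-id
    "engineFactory": "com.example.SomeEngine",  # hypothetical factory class
    "algorithm": {},                            # assumed key for algorithm parameters
}


def write_engine_file(config: dict, path: str) -> None:
    """Write the engine definition as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


# The file can be named anything and put anywhere; a temp directory is used here.
engine_path = os.path.join(tempfile.mkdtemp(), "some-engine.json")
write_engine_file(engine_config, engine_path)
```

Writing the file changes nothing in a running system by itself; the definition only takes effect once it is passed to `harnctl add` or `harnctl update`.
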
# Harness Start and Stop

Scripts that start and stop Harness are included with the project in the `sbin/` directory. They are used during container startup and can also be run directly in an OS level installation.

- **`harness-start [-f]`** starts the Harness server based on the configuration in `harness-env`. The `-f` argument forces a restart if Harness is already running. All other commands require the service to be running. It is always started as a daemon/background process. All previously configured engines are started in the state they were in when Harness was last run.

- **`harness-stop`** gracefully stops Harness and all engines. If the pid-file has become out of sync, look for the `HarnessServer` process with `jps -lm` or `ps aux | grep HarnessServer` and execute `kill <pid>` to stop it.

# Harness Administration

- **`harnctl status [engines [<engine-id>], users [<user-id>]]`** prints status information about the requested objects. Asking for user status requires the Harness Auth-server, which is optional.
- **`harnctl add <some-engine.json>`** creates and starts an Engine Instance of the type defined by the `engineFactory` parameter.
- **`harnctl update <some-engine.json>`** updates an existing Engine Instance with values defined in `some-engine.json`. The Engine knows what is safe to update and may warn if some value is not updatable, but this will be rare.
- **`harnctl delete <some-engine-id>`** stops the Engine Instance and deletes its accumulated dataset and model. No artifacts of the Engine Instance will remain except the `some-engine.json` file and any mirrored events.
- **`harnctl import <some-engine-id> [<some-directory> | <some-file>]`** is typically used to replay previously mirrored events or to load bootstrap datasets created from application logs. It is equivalent to sending all imported events to the REST API.
- **`harnctl export <some-engine-id> [<some-directory> | <some-file>]`** If the directory is supplied with the protocol `file:`, the export will go to the Harness server host's file system. This is for use with vertically scaled Harness. For more general storage use HDFS (the Hadoop Distributed File System), flagged by the protocol `hdfs`, for example: `hdfs://some-hdfs-server:9000/users/<some-user>/<some-directory>`. [**to be implemented in 0.5.0**]
- **`harnctl train <some-engine-id>`** creates or updates a model for Lambda style engines like the UR. Training is required for Lambda Engines before queries will return values.

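The protocol rule for `harnctl export` locations can be illustrated with a small classifier. This is a sketch only: the function name `export_target` and its return labels are hypothetical, and because the text does not say how a location without a protocol is treated, the sketch rejects it rather than guessing.

```python
from urllib.parse import urlparse


def export_target(location: str) -> str:
    """Classify an export location by its URI protocol (scheme)."""
    scheme = urlparse(location).scheme
    if scheme == "file":
        # Goes to the file system of the Harness server host.
        return "server-filesystem"
    if scheme == "hdfs":
        # Goes to the Hadoop Distributed File System.
        return "hdfs"
    raise ValueError(f"unsupported or missing protocol in {location!r}")


print(export_target("file:/tmp/some-directory"))
print(export_target("hdfs://some-hdfs-server:9000/users/some-user/some-directory"))
```
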
# Harness Auth-server Administration

Several extended commands manage Users and Roles. These are only needed when using the Harness Auth-server to create secure multi-tenancy. Open multi-tenancy is the default and requires no Auth-server.

- **`harnctl user-add [client <engine-id> | admin]`** returns a new user-id and its secret, and grants the role's permissions. Client Users have access to one or more `engine-id`s; `admin` Users have access to all `engine-id`s as well as admin-only commands and REST endpoints.
- **`harnctl user-delete <user-id>`** removes all access for the `user-id`.
- **`harnctl grant <user-id> [client <engine-id> | admin]`** adds permissions to an existing user.
- **`harnctl revoke <user-id> [client <engine-id> | admin]`** removes permissions from an existing user.

# Bootstrapping With Import

Import can be used to restore backed up data, but also to bootstrap a new Engine Instance with previously logged or collected batches of data. Imagine a recommender that takes in people's purchase history. This history might exist in server logs, and converting those logs to files of JSON events is an easy and reproducible way to "bootstrap" your recommender with previous data before you start to send live events. This, in effect, trains your recommender retroactively, improving the quality of recommendations at its first startup.

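The log conversion described above might look like the following sketch, which turns CSV purchase records into one JSON event per line. The CSV column names and the event field names (`event`, `entityId`, `targetEntityId`, and so on) are assumptions chosen for illustration; the actual event schema depends on the Engine being bootstrapped.

```python
import csv
import io
import json
from typing import List


def purchases_to_events(csv_text: str) -> List[str]:
    """Convert purchase-log rows into JSON event strings, one per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        event = {
            "event": "buy",                   # assumed event name
            "entityType": "user",
            "entityId": row["user_id"],
            "targetEntityType": "item",
            "targetEntityId": row["item_id"],
            "eventTime": row["timestamp"],
        }
        lines.append(json.dumps(event))
    return lines


# A tiny made-up log, standing in for real server logs.
sample_log = (
    "user_id,item_id,timestamp\n"
    "u1,item-a,2018-01-01T10:00:00Z\n"
    "u2,item-b,2018-01-02T11:30:00Z\n"
)
event_lines = purchases_to_events(sample_log)
import_file_text = "\n".join(event_lines) + "\n"
```

Saving `import_file_text` to a file gives something that could be passed to `harnctl import <some-engine-id> <some-file>`, provided the fields match what the Engine validates.
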
# Backup with Export

[**to be implemented in 0.5.0**] Lambda style Engines, which store all Events, usually support `harnctl export ...`. This command creates files with a single JSON Event per line, in the same format as the [Mirror](mirroring.md) function. To back up an Engine Instance, export it and store the files somewhere safe. These files can be re-imported to recreate the input DB and, after training, the model.

Engines that follow the Kappa style do not save input but rather update the model with every new input Event, so use [Mirroring](mirroring.md) to log each new Event. In a sense this is an automatic backup that can also be used to re-instantiate a Kappa style model.

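Because exported and mirrored files hold one JSON Event per line, a backup can be sanity-checked before it is re-imported. The sketch below is one possible reader, assuming nothing about the event fields themselves; the function name `read_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed event per non-empty line, failing loudly on bad JSON."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number} is not valid JSON: {err}") from err


# Made-up file contents; a real check would iterate over an open file instead.
backup_lines = ['{"event": "buy", "entityId": "u1"}', "", '{"event": "buy", "entityId": "u2"}']
events = list(read_events(backup_lines))
```
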
@@ -0,0 +1,15 @@
# Input Mirroring

Harness will mirror (log) all raw events, with no validation, when configured to do so for a specific Engine Instance. For online learning, as in Kappa style Engines, this is the only way to back up data since Events are not stored. For Lambda style learning, as in the Universal Recommender, you may choose to mirror or to export data periodically.

Mirroring is useful if you want to be able to back up and restore all data, or if you are experimenting with changes in engine parameters and wish to recreate the models using past mirrored data.

To accomplish this, you must set up mirroring for the Engine Instance. Once the Engine Instance is launched with a mirrored configuration, all events sent to `POST /engines/<engine-id>/events` will be mirrored to a location set in `some-engine.json`. **Note:** Events will be mirrored until the config setting is changed, so the mirror can grow without limit, like unrotated server logs.

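For reference, a request to the events endpoint mentioned above could be built as in the sketch below. The base URL `http://localhost:9090`, the event fields, and the helper name `event_request` are assumptions for illustration; only the path `/engines/<engine-id>/events` and the `POST` method come from the text. The request is constructed but not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def event_request(base_url: str, engine_id: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one input event."""
    url = f"{base_url.rstrip('/')}/engines/{quote(engine_id, safe='')}/events"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed server address and a made-up event.
req = event_request("http://localhost:9090", "some-engine-id", {"event": "buy", "entityId": "u1"})
# To send it against a running server: urllib.request.urlopen(req)
```
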
To enable mirroring, add the following to the `some-engine.json` for the Engine Instance you want to mirror:

    "mirrorType": "localfs" | "hdfs", // optional, turn on a type of mirroring
    "mirrorContainer": "path/to/mirror", // optional, where to mirror input

Mirroring is similar to logging. Each new Event is logged to a file before any validation. The format is JSON, one event per line. This can be used to back up an Engine Instance or to move data to a new instance.

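The two settings can also be added to an engine definition programmatically. The helper below is a hypothetical sketch that only checks `mirrorType` against the two values shown above and returns a copy of the config; the other keys in the sample config are placeholders.

```python
import json

VALID_MIRROR_TYPES = {"localfs", "hdfs"}  # the two values named in the text


def with_mirroring(engine_config: dict, mirror_type: str, container: str) -> dict:
    """Return a copy of the engine config with mirroring settings added."""
    if mirror_type not in VALID_MIRROR_TYPES:
        raise ValueError(f"mirrorType must be one of {sorted(VALID_MIRROR_TYPES)}")
    updated = dict(engine_config)  # shallow copy, the original stays untouched
    updated["mirrorType"] = mirror_type
    updated["mirrorContainer"] = container
    return updated


# Placeholder engine definition, for illustration only.
base_config = {"engineFactory": "com.example.SomeEngine"}
mirrored_config = with_mirroring(base_config, "localfs", "path/to/mirror")
print(json.dumps(mirrored_config, indent=2))
```

As with any other parameter change, the edited definition would still need to be applied with `harnctl update` (or `harnctl add` for a new Engine Instance) before mirroring starts.
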