FGMachine

FGLab is a machine learning dashboard, designed to make prototyping experiments easier. Experiment details and results are sent to a database, which allows analytics to be performed after their completion. The server is FGLab, and the clients are FGMachines.

Installation

FGMachine tries to follow the SemVer standard whenever possible. Releases can be found here.

Option 1: Local

  1. Install Node.js from the website or your package manager.
  2. Either clone this repository or download and extract a zip/tar.
  3. Move inside the FGMachine folder.
  4. Run npm install.
  5. FGMachine requires a .env file in this directory. For most installations, it should be possible to copy example.env to .env, but it may require customisation for non-standard FGLab or FGMachine ports. An alternative is to set the following environment variables:
  • FGLAB_URL (FGLab URL, including port if necessary)
  • FGMACHINE_URL (FGMachine URL, including port)

Run node machine (or npm start) to start FGMachine. On the first run it will create specs.json and register itself with FGLab. Please read the overview to understand how FGMachine can interface with your machine learning code.

Note: If you use a virtual environment, e.g. virtualenv, activate the environment before running node machine.

Note: If you delete your machine in FGLab, delete specs.json before running FGMachine again to re-register.

To update, run npm run update.

Option 2: Docker

Start an FGLab container, then start the FGMachine container and point it at FGLab:

sudo docker run -d --name fgmachine -h $(hostname) -v /var/run/docker.sock:/var/run/docker.sock -e FGLAB_URL=<FGLab URL> -e FGMACHINE_URL=<FGMachine URL> -p 5081:5081 kaixhin/fgmachine

The FGLab URL will be the address of the host running FGLab, including the protocol ("http://") and port (:5080) - note that localhost will not work but the local network IP/hostname should. The FGMachine URL will be the address of the current host (as accessible by FGLab), including the protocol ("http://") and port (:5081). Docker's socket is passed to allow FGMachine to launch Docker containers itself. Note that as these are sibling containers, volume mounts (-v) are relative to the host, not the FGMachine container.

To launch NVIDIA Docker containers, use the following:

sudo docker run -d --name fgmachine -h $(hostname) -v /var/run/docker.sock:/var/run/docker.sock --net=host `curl -s localhost:3476/docker/cli` -e FGLAB_URL=<FGLab URL> -e FGMACHINE_URL=<FGMachine URL> -p 5081:5081 kaixhin/fgmachine

Note that --net=host is passed to allow access to the NVIDIA Docker API. When launching a sibling container, you will need to run `curl -s localhost:3476/docker/cli` and manually add the arguments to the project implementation in the container, with docker as the command (do not use nvidia-docker).

Overview

Projects

After a project has been created on FGLab, a corresponding project implementation must be specified in projects.json. If this machine is available to run experiments for the project created on FGLab, then add the following field to projects.json (an example is available at example.projects.json). FGLab has an "Add to Machine" button which can automatically set up a template in projects.json for you (creating projects.json if it doesn't exist already). Note that <project_id> links the project created on FGLab to FGMachine's project implementations in projects.json.

"<project_id>": {
  "cwd": "<working directory (e.g. .)>",
  "command": "<program (e.g. caffe)>",
  "args": "<first command line options (e.g. train)>",
  "options": "<command line options style for options (e.g. double-dash)>",
  "boolean": "<optional: only pass flag if true, mandatory: pass flag and true/false argument>",
  "capacity": "<machine capacity needed (as a fraction) (e.g. 0.5)>",
  "results": "<absolute path to results directory (without experiment ID) (e.g. results)>"
}

cwd is the working directory for the machine learning code. cwd can be either an absolute path or a relative path, in which case it is relative to the FGMachine directory. command is the program/executable to be run. args is the first set of command line options to be sent to the program, prior to the experiment options. options selects one of 4 styles for formatting the experiment options on the command line. For the option settings {seed: 123, model: "cnn.v2", L2: true}, the 4 styles produce the following (with boolean as "mandatory"):

| options     | Program | Command Line: [command] [args] [options]                                            |
|-------------|---------|-------------------------------------------------------------------------------------|
| plain       | node    | node [args] seed 123 model cnn.v2 L2 true                                           |
| single-dash | th      | th [args] -seed 123 -model cnn.v2 -L2 true                                          |
| double-dash | caffe   | caffe [args] --seed=123 --model=cnn.v2 --L2=true                                    |
| function    | matlab  | matlab [args w/o final arg] [final arg]('seed',123,'model','cnn.v2','L2',true)      |

boolean can be set to "optional" when boolean flags should be passed only when true, e.g., -L2, or set to "mandatory" if the value should always be passed, e.g., -L2 true and -L2 false. capacity is a number in the range 0-1 (inclusive) that represents (the inverse of) the number of instances of the program the FGMachine host system can run in parallel (as a heuristic); for example, a capacity of 0.5 indicates that the host is only capable of running 2 instances of the program at once. results is the directory into which the experiment results must be written (see below for more details). results can be either an absolute path or a relative path, in which case it is relative to the FGMachine directory.
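As a rough illustration of the four options styles and the two boolean modes, the sketch below rebuilds the command line fragments shown above in Python. It is not FGMachine's own code: the function names are hypothetical, and the handling of "optional" booleans in the function style (values are always passed) is an assumption.

```python
def format_value(value):
    """Render a value as it appears in the command lines above."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def format_options(style, options, boolean="mandatory"):
    """Return the command line tokens for one options style (hypothetical helper)."""
    if style == "function":
        # Assumption: values are always passed and strings are single-quoted.
        parts = []
        for name, value in options.items():
            rendered = f"'{value}'" if isinstance(value, str) else format_value(value)
            parts.append(f"'{name}',{rendered}")
        return ["(" + ",".join(parts) + ")"]

    prefixes = {"plain": "", "single-dash": "-", "double-dash": "--"}
    if style not in prefixes:
        raise ValueError(f"unknown options style: {style}")

    tokens = []
    for name, value in options.items():
        flag = prefixes[style] + name
        if isinstance(value, bool) and boolean == "optional":
            # "optional": pass the bare flag only when the value is true.
            if value:
                tokens.append(flag)
            continue
        if style == "double-dash":
            tokens.append(f"{flag}={format_value(value)}")
        else:
            tokens.extend([flag, format_value(value)])
    return tokens


settings = {"seed": 123, "model": "cnn.v2", "L2": True}
print(" ".join(format_options("double-dash", settings)))
```

Calling format_options("single-dash", settings, boolean="optional") would yield the bare -L2 flag instead of -L2 true.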

If you receive a "No machine capacity available" error message when submitting a new experiment, which can occur erroneously (for example, if experiments crash), then you can reset a machine's capacity on the machine's page in FGLab.

FGMachine automatically reloads the projects.json file when it is changed.

GPU capacity support

To handle projects that require GPUs, add two parameters to each such project in the projects.json file:

{
  "gpu_capacity": "<gpu capacity needed (as a fraction of one GPU capacity, e.g. 0.5)>",
  "gpu_command": "<option to pass to script to identify card number, including command line option style (e.g. -gpu)>"
}

Note that gpu_capacity represents (the inverse of) the number of instances of the program the FGMachine host system can run on one GPU; for example, a machine with 4 GPUs will be able to run 8 instances of the program with capacity 0.1 and gpu_capacity 0.5. However, if the capacity were 0.25 in the previous example, the machine would only be able to run 4 instances of the program.
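The arithmetic in this example can be checked with a short Python sketch. It assumes the limit is simply the smaller of the host-wide slot count and the total GPU slot count; FGMachine's actual scheduling is a heuristic and may differ, and the function name is hypothetical.

```python
import math


def max_parallel_instances(capacity, gpu_capacity=None, num_gpus=0):
    """Estimate how many instances fit on one machine (illustrative only)."""

    def slots(fraction):
        if not 0 < fraction <= 1:
            raise ValueError("capacity must be greater than 0 and at most 1")
        # The small epsilon guards against float rounding in the division.
        return math.floor(1 / fraction + 1e-9)

    host_slots = slots(capacity)
    if gpu_capacity is None:
        return host_slots
    # Each GPU offers its own slots; the host-wide capacity still caps the total.
    return min(host_slots, num_gpus * slots(gpu_capacity))


print(max_parallel_instances(0.1, gpu_capacity=0.5, num_gpus=4))
print(max_parallel_instances(0.25, gpu_capacity=0.5, num_gpus=4))
```

With capacity 0.1 the GPUs are the bottleneck (4 GPUs with 2 slots each), while with capacity 0.25 the host-wide capacity is.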

Setting gpu_capacity makes FGMachine assign a GPU to each experiment automatically, which makes it easier to run batch experiments. Note that, like nvidia-smi, GPU IDs passed via gpu_command are 0-indexed. For manual control, it is recommended to use a GPU flag as part of the experiment hyperparameters in the project schema.

Experiments

Results and custom data must be saved as files into a subfolder in the specified results directory, where the name of the subfolder is the experiment ID, e.g. /data/mnist/55e069f9cf4e1fe075b76b95. For an example that uses the following features, see rand.js.
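A minimal Python sketch of this layout follows, assuming the machine learning code already knows its experiment ID (how the ID reaches your program is not covered here, so it is taken as a plain argument). The helper name save_json_result is hypothetical.

```python
import json
from pathlib import Path


def save_json_result(results_dir, experiment_id, filename, payload):
    """Write payload as JSON to <results_dir>/<experiment_id>/<filename>."""
    experiment_dir = Path(results_dir) / str(experiment_id)
    # Create the per-experiment subfolder if it does not exist yet.
    experiment_dir.mkdir(parents=True, exist_ok=True)
    target = experiment_dir / filename
    with target.open("w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return target


if __name__ == "__main__":
    # Hypothetical values; the results directory should match projects.json.
    path = save_json_result("results", "55e069f9cf4e1fe075b76b95",
                            "notes.json", {"Notes": "Best parameters saved at epoch 55"})
    print(path)
```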

Non-JSON files are uploaded to MongoDB GridFS via FGLab, which allows them to be downloaded later in their native format. Images and videos are automatically displayed on the experiment page, allowing plots to be created by the machine learning code. JSON files are automatically parsed, with fields being added to the experiment object. An example, notes.json, may look like this:

{
  "Framework": {
    "Name": "Theano",
    "Version Number": 0.7
  },
  "Notes": "Best parameters saved at epoch 55"
}

Multiple top-level fields can exist in the same file, but nested fields, e.g. Framework.Name, cannot be updated separately. Note that fields prefixed with _ are reserved for processing by FGLab. Currently supported fields are listed below:

_scores

The _scores field is a map that can be used to store multiple floats that represent the performance of the model. For example:

{
  "_scores": {
    "F1": "float",
    "BLEU": "float",
    "METEOR": "float"
  }
}
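As a sketch of how machine learning code might produce such an object, the Python snippet below coerces metric values to floats and rejects NaN and infinity, which standard JSON cannot represent. The helper name build_scores and the metric values are illustrative assumptions.

```python
import json
import math


def build_scores(**metrics):
    """Return a {"_scores": {...}} object with every metric stored as a float."""
    scores = {}
    for name, value in metrics.items():
        number = float(value)
        if not math.isfinite(number):
            raise ValueError(f"score {name!r} is not a finite number")
        scores[name] = number
    return {"_scores": scores}


# Made-up metric values, serialised as they would be written to a JSON file.
print(json.dumps(build_scores(F1=0.91, BLEU=27), indent=2))
```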

_notes

The _notes field is a free-form text field. Its primary use is via the experiment page on FGLab, where text written in the "Notes" text box is automatically saved (at an interval of 0.5s), displaying on both the experiment page itself and the table of experiment results.

_charts

The _charts field is either an object or an array of objects that can be used to store data that will be charted on FGLab using C3.js, and hence mimics its API. Given that FGLab renders uploaded images, this is to allow the interactivity afforded by C3.js. This means that it is possible to create different chart types and adjust plotting options, with a minor change in the API so that numeric arrays can be directly exported. Rather than prepending arrays in the columns array with the column names, the columnNames array is used to perform this on FGLab.

Charts with lots of values are downsampled for performance reasons, using the Largest-Triangle-Three-Buckets algorithm for visualisation purposes. By default the following options are added to disable points and enable zoom, but these can be overridden:

{
  "point": {"show": false},
  "zoom": {"enabled": true}
}
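For readers unfamiliar with the downsampling step mentioned above, here is a compact Python sketch of the Largest-Triangle-Three-Buckets idea: keep the first and last points, split the rest into buckets, and from each bucket keep the point forming the largest triangle with the previously kept point and the average of the next bucket. This is a generic illustration written for this README, not FGLab's implementation, and the function name lttb is an assumption.

```python
def lttb(points, threshold):
    """Downsample a list of (x, y) points to at most `threshold` points."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)

    bucket_size = (n - 2) / (threshold - 2)
    sampled = [points[0]]  # always keep the first point
    previous = 0           # index of the most recently kept point

    for i in range(threshold - 2):
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        # Average of the next bucket acts as the third triangle vertex.
        next_end = min(int((i + 2) * bucket_size) + 1, n)
        next_bucket = points[end:next_end]
        avg_x = sum(p[0] for p in next_bucket) / len(next_bucket)
        avg_y = sum(p[1] for p in next_bucket) / len(next_bucket)

        ax, ay = points[previous]
        best_area, best_index = -1.0, start
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best_index = area, j
        sampled.append(points[best_index])
        previous = best_index

    sampled.append(points[-1])  # always keep the last point
    return sampled


flat_with_spike = [(x, 100.0 if x == 50 else 0.0) for x in range(100)]
print(len(lttb(flat_with_spike, 10)))
```

Because the selection is driven by triangle area, an isolated spike such as the one in flat_with_spike tends to survive downsampling, which is why the method suits plots.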

An example Multiple XY Line Chart would be structured as such:

{
  "_charts": {
    "columnNames": [
      "train",
      "val",
      "x1",
      "x2"
    ],
    "data": {
      "xs": {
        "train": "x1",
        "val": "x2"
      },
      "columns": [
        [1.0, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.0],
        [1.0, 0.9, 0.6, 0.4, 0.3],
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
        [2, 6, 8, 9, 11]
      ]
    },
    "axis": {
      "x": {
        "label": {
          "text": "Iterations"
        }
      },
      "y": {
        "label": {
          "text": "Losses"
        }
      }
    }
  }
}
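An object of this shape can be assembled programmatically. The Python sketch below is one possible way to do so, assuming each series has its own x values; the helper name build_xy_chart and the generated x column names (x1, x2, ...) are illustrative choices, not part of FGLab's API.

```python
import json


def build_xy_chart(series, x_label, y_label):
    """Build a _charts object from {name: (x_values, y_values)} pairs."""
    y_names = list(series)
    # One generated x column per series: x1, x2, ... (assumed not to clash).
    x_names = [f"x{i}" for i in range(1, len(y_names) + 1)]
    for name, (x_values, y_values) in series.items():
        if len(x_values) != len(y_values):
            raise ValueError(f"series {name!r} has mismatched x and y lengths")
    return {
        "_charts": {
            # Names are listed separately instead of heading each column.
            "columnNames": y_names + x_names,
            "data": {
                "xs": dict(zip(y_names, x_names)),
                "columns": [list(series[name][1]) for name in y_names]
                + [list(series[name][0]) for name in y_names],
            },
            "axis": {
                "x": {"label": {"text": x_label}},
                "y": {"label": {"text": y_label}},
            },
        }
    }


chart = build_xy_chart(
    {"train": ([1, 2, 3], [1.0, 0.8, 0.6]), "val": ([2, 3], [1.0, 0.9])},
    "Iterations",
    "Losses",
)
print(json.dumps(chart, indent=2))
```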

Using _charts involves an inherent tradeoff between storing numerical results in a more intuitive place in the experiment object and easily visualising data. The recommendation is to use _charts for visualising data where desired (which may not be necessary if plots are generated by the machine learning code), and to extract the data from the _charts structure when it is needed. However, it is still possible to duplicate the numerical results in a separate array under a custom field in a JSON file.

Examples

Examples utilising the range of abilities of FGLab/FGMachine can be found in the examples folder.