Merge pull request #138 from HDI-Project/documentation_update

Updated documentation on ModelHub
HDI-Project · May 9, 2019 · a283c19
2 parents 1889701 + 8bae295

Showing 2 changed files with 25 additions and 22 deletions.
diff --git a/docs/database.rst b/docs/database.rst
@@ -17,13 +17,14 @@ A Dataset represents a single set of data which can be used to train and test
 models by ATM. The table stores information about the location of the data as
 well as metadata to help with analysis.
 
-- ``dataset_id`` (Int): Unique identifier for the dataset.
+- ``id`` (Int): Unique identifier for the dataset.
 - ``name`` (String): Identifier string for a classification technique.
-- ``description`` (String): Human-readable description of the dataset.
-    - not described in the paper
+- ``class_column`` (String): Name of the class label column.
 - ``train_path`` (String): Location of the dataset train file.
 - ``test_path`` (String): Location of the dataset test file.
-- ``class_column`` (String): Name of the class label column.
+- ``description`` (String): Human-readable description of the dataset.
+
+  - not described in the paper
 
 The metadata fields below are not described in the paper.
 
@@ -41,15 +42,16 @@ A Datarun is a single logical job for ATM to complete. The Dataruns table
 contains a reference to a dataset, configuration for ATM and BTB, and
 state information.
 
-- ``datarun_id`` (Int): Unique identifier for the datarun.
+- ``id`` (Int): Unique identifier for the datarun.
 - ``dataset_id`` (Int): ID of the dataset associated with this datarun.
 - ``description`` (String): Human-readable description of the datarun.
-    - not in the paper
+  - not described in the paper
 
 BTB configuration:
 
 - ``selector`` (String): Selection technique for hyperpartitions.
     - called "hyperpartition_selection_scheme" in the paper
+
 - ``k_window`` (Int): The number of previous classifiers the selector will
   consider, for selection techniques that set a limit of the number of
   historical runs to use.
@@ -63,7 +65,7 @@ BTB configuration:
   numeric hyperparameter will be chosen from a set of ``gridding`` discrete,
   evenly-spaced values. If set to 0 or NULL, values will be chosen from the
   full, continuous space of possibilities.
-    - not in the paper
+    - not described in the paper
 
 ATM configuration:
 
@@ -79,29 +81,29 @@ ATM configuration:
 - ``deadline`` (DateTime): If provided, and if ``budget_type`` is set to
   "walltime", the datarun will run until this absolute time. This overrides the
   ``budget`` column.
-    - not in the paper
+    - not described in the paper
 - ``metric`` (String): The metric by which to score each classifier for
   comparison purposes. Can be one of ["accuracy", "cohen_kappa", "f1",
   "roc_auc", "ap", "mcc"] for binary problems, or ["accuracy", "rank_accuracy",
   "cohen_kappa", "f1_micro", "f1_macro", "roc_auc_micro", "roc_auc_macro"] for
   multiclass problems
-    - not in the paper
+    - not described in the paper
 - ``score_target`` (Enum): One of ["cv", "test", "mu_sigma"]. Determines how the
   final comparative metric (the *judgment metric*) is calculated.
     - "cv" (cross-validation): the judgment metric is the average of a 5-fold
       cross-validation test.
     - "test": the judgment metric is computed on the test data.
     - "mu_sigma": the judgment metric is the lower error bound on the mean CV
       score.
-  - not in the paper
+  - not described in the paper
 
 State information:
 
 - ``start_time`` (DateTime): Time the DataRun began.
 - ``end_time`` (DateTime): Time the DataRun was completed.
 - ``status`` (Enum): Indicates whether the run is pending, in progress, or has
   been finished. One of ["pending", "running", "complete"].
-    - not in the paper
+    - not described in the paper
 
 
 Hyperpartitions
@@ -113,38 +115,38 @@ instance must be associated with a single datarun; the performance of a
 hyperpartition in a previous datarun is assumed to have no bearing on its
 performance in the future.
 
-- ``hyperparition_id`` (Int): Unique identifier for the hyperparition.
+- ``id`` (Int): Unique identifier for the hyperparition.
 - ``datarun_id`` (Int): ID of the datarun associated with this hyperpartition.
 - ``method`` (String): Code for, or path to a JSON file describing, this
   hyperpartition's classification method (e.g. "svm", "knn").
-- ``categoricals`` (Base64-encoded object): List of categorical hyperparameters
+- ``categoricals_hyperparameters_64`` (Base64-encoded object): List of categorical hyperparameters
   whose values are fixed to define this hyperpartition.
     - called "partition_hyperparameter_values" in the paper
-- ``tunables`` (Base64-encoded object): List of continuous hyperparameters which
+- ``tunables_hyperparameters_64`` (Base64-encoded object): List of continuous hyperparameters which
   are free; their values must be selected by a Tuner.
     - called "conditional_hyperparameters" in the paper
-- ``constants`` (Base64-encoded object): List of categorical or continuous
+- ``constants_hyperparameters_64`` (Base64-encoded object): List of categorical or continuous
   parameters whose values are always fixed. These do not define the
   hyperpartition, but their values must be passed to the classification method
   to fully parameterize it.
-    - not in the paper
+    - not described in the paper
 - ``status`` (Enum): Indicates whether the hyperpartition has caused too many
   classifiers to error, or whether the grid for this partition has been fully
   explored. One of ["incomplete", "gridding_done", "errored"].
-    - not in the paper
+    - not described in the paper
 
 
 Classifiers
 -----------
 A Classifier represents a single train/test run using a method and a set of hyperparameters with a particular dataset.
 
-- ``classifier_id`` (Int): Unique identifier for the classifier.
+- ``id`` (Int): Unique identifier for the classifier.
 - ``datarun_id`` (Int): ID of the datarun associated with this classifier.
 - ``hyperpartition_id`` (Int): ID of the hyperpartition associated with this
   classifier.
 - ``host`` (String): IP address or name of the host machine where the classifier
   was tested.
-    - not in the paper
+    - not described in the paper
 - ``model_location`` (String): Path to the serialized model object for this
   classifier.
 - ``metrics_location`` (String): Path to the full set of metrics computed during
@@ -153,8 +155,9 @@ A Classifier represents a single train/test run using a method and a set of hype
   cross-validated training data.
 - ``cv_judgment_metric_stdev`` (Number): Standard deviation of the
   cross-validation test.
+  - not described in the paper
 - ``test_judgment_metric`` (Number): Judgment metric computed on the test data.
-- ``hyperparameters_values`` (Base64-encoded object): The full set of
+- ``hyperparameters_values_64`` (Base64-encoded object): The full set of
   hyperparameter values used to create this classifier.
 - ``start_time`` (DateTime): Time that a worker started working on the
   classifier.

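Note on the ``*_64`` columns renamed in ``docs/database.rst`` above: the docs describe them only as "Base64-encoded object" and do not say how the object is serialized before encoding. The sketch below assumes the object is pickled and then Base64-encoded. That is an assumption, not something this diff confirms. The helper names ``object_to_base64`` and ``base64_to_object`` and the sample values are hypothetical and used only for illustration.

```python
import base64
import pickle


def object_to_base64(obj):
    """Pickle a Python object and return it as Base64 text.

    Assumption: pickle is the serialization step; the docs only say
    "Base64-encoded object".
    """
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def base64_to_object(text):
    """Inverse of object_to_base64. Only unpickle data you trust."""
    return pickle.loads(base64.b64decode(text))


# Hypothetical hyperparameter values, for illustration only.
params = {"C": 0.5, "kernel": "rbf"}

encoded = object_to_base64(params)   # text suitable for a String/Text column
decoded = base64_to_object(encoded)  # round-trips back to the original dict
```

If the round trip holds, ``decoded`` equals ``params``, so a value read from a column such as ``hyperparameters_values_64`` could be inspected the same way under this assumption.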
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -31,7 +31,7 @@ Create a datarun
 ----------------
 
 Before we can train any classifiers, we need to create a datarun. In ATM, a
-datarun is a single logical machine learning task. The ``enter_data.py`` script
+datarun is a single logical machine learning task. The ``enter_data`` command
 will set up everything you need.::
 
 (atm-env) $ atm enter_data
@@ -75,7 +75,7 @@ An ATM *worker* is a process that connects to a ModelHub, asks it what dataruns
 need to be worked on, and trains and tests classifiers until all the work is
 done. To run one, use the following command::
 
-(atm-env) $ atm worker.py
+(atm-env) $ atm worker
 
 This will start a process that builds classifiers, tests them, and saves them to
 the ./models/ directory. As it runs, it should print output indicating which