Update docs

daizutabi · Jun 1, 2020 · b49e39d
1 parent 95061b7

Showing 22 changed files with 312 additions and 204 deletions.
diff --git a/docs/api/ivory.callbacks.md b/docs/api/ivory.callbacks.md
@@ -0,0 +1,11 @@
+# Callbacks
+
+## ivory.callbacks.results
+
+![mkapi](ivory.callbacks.results.Results)
+
+![mkapi](ivory.callbacks.results.concatenate)
+
+## ivory.callbacks.early_stopping
+
+![mkapi](ivory.callbacks.early_stopping.EarlyStopping)
diff --git a/docs/index.md b/docs/index.md
@@ -11,3 +11,45 @@ Ivory is library-agnostic. You can use it with any machine learning library.
 Get started using the Quickstart.
 
 - [Quickstart](quickstart)
+
+{{ ## cache:clear }}
+
+Or take a look at the code below.
+
+```python
+import numpy as np
+
+from ivory.callbacks.results import Results
+from ivory.core.data import Data, Dataset, Datasets
+from ivory.core.run import Run
+from ivory.sklearn.estimator import Estimator
+from ivory.sklearn.metrics import Metrics
+
+data = Data()
+data.index = np.arange(30)
+data.input = np.arange(60).reshape(30, -1)
+data.target = np.sum(data.input, axis=1)
+data.fold = data.index % 4
+datasets = Datasets(data, Dataset, fold=0)
+
+estimator = Estimator(
+    model='sklearn.ensemble.RandomForestRegressor',
+    n_estimators=10,
+    max_depth=5,
+)
+
+run = Run(
+    name='first example',
+    datasets=datasets,
+    estimator=estimator,
+    results=Results(),
+    metrics=Metrics()
+)
+run.start()
+```
+
+```python
+import matplotlib.pyplot as plt
+
+plt.scatter(run.results.val.target, run.results.val.output)
+```
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -10,9 +10,9 @@ Install Ivory using `pip`.
 $ pip install ivory
 ~~~
 
-## Using an Ivory Client
+## Ivory Client
 
-Ivory has the `Client` class that manages the workflow of machine learning. Let's create your first `Client` instance. In this quickstart, we are working with examples under the `examples` directory.
+Ivory has the `Client` class that manages the workflow of machine learning. Let's create your first `Client` instance. In this quickstart, we are working with examples under the `examples` directory. Pass `examples` as the first argument of `ivory.create_client()`:
 
 ```python hide
 import os
@@ -29,42 +29,42 @@ client = ivory.create_client("examples")
 client
 ```
 
-The representation of the `client` shows that it has two objects. These objects can be accessed by *index notation* or *dot notation*.
+The representation of the `client` shows that it has two instances. These instances can be accessed by *index notation* or *dot notation*.
 
 ```python
 client[0]  # or client['tracker'], or client.tracker
 ```
 
-The first object is a `Tracker` instance which connects Ivory to [MLFlow Tracking](https://mlflow.org/docs/latest/tracking.html).
+The first instance is a `Tracker` instance that connects Ivory to [MLFlow Tracking](https://mlflow.org/docs/latest/tracking.html).
 
-Because a `Client` instance is an iterable, you can get all of the objects by applying `list()` to it.
+Because a `Client` instance is an iterable, you can get all of the instances by applying `list()` to it.
 
 ```python
 list(client)
 ```
 
-The second objects is named `tuner`.
+The second instance is named `tuner`.
 
 ```python
 client.tuner
 ```
 
 A `Tuner` instance connects Ivory to [Optuna: A hyperparameter optimization framework](https://preferred.jp/en/projects/optuna/).
 
-We can customize these objects with a YAML file named `client.yml` under the woking directory.  In our case, the file just contains the minimum settings.
+We can customize these objects with a YAML file named `client.yml` under the working directory.  In our case, the file just contains the minimum settings.
 
 #File client.yml {%=/examples/client.yml%}
 
 !!! note
-    A YAML file for client is not required. If there is no file for client, Ivory creates a default client with a tracker and without a tuner.
+    If you don't need any customization, the YAML file for the client is not required. If there is no such file, Ivory creates a default client with a tracker and a tuner. (So, the above file is unnecessary.)
 
-    If you don't need a tracker, for example in debugging, use `ivory.create_client(tracker=False)`.
+    If you don't need a tracker and/or tuner, for example in debugging, use `ivory.create_client(tracker=False, tuner=False)`.
 
 ## Create NumPy data
 
-In this quickstart, we try to predict rectangles area from thier width and height using [PyTorch](https://pytorch.org/). First, prepare the data as [NumPy](https://numpy.org/) arrays. In `rectangle/data.py` under the working directory, a `create_data()` function is defined. The `ivory.create_client()` function automatically inserts the working directory to `sys.path`, so that we can import the module regardless of the current directory.
+In this quickstart, we try to predict the area of rectangles from their width and height using [PyTorch](https://pytorch.org/). First, prepare the data as [NumPy](https://numpy.org/) arrays. In `rectangle/data.py` under the working directory, `create_data()` is defined. `ivory.create_client()` automatically inserts the working directory into `sys.path`, so that we can import the module regardless of the current directory.
 
-Let's check the `create_data()` function defined in `rectangle/data.py` and an example output:
+Let's check `create_data()` defined in `rectangle/data.py` and an example output:
 
 ```python hide
 import rectangle.data
@@ -83,11 +83,7 @@ xy
 z
 ```
 
-## Set of Data classes
-
-Ivory defines a set of Data classes (`Data`, `Dataset`, `Datasets`). But now, we use the `Data` class only.
-
-In the above file, the `kfold_split()` function creates a fold array.
+`ivory.utils.fold.kfold_split()` creates a fold array.
 
 ```python
 import numpy as np
@@ -96,7 +92,11 @@ from ivory.utils.fold import kfold_split
 kfold_split(np.arange(10), n_splits=3)
 ```
 
-Now, we can get a `Data` instance.
+## Set of Data Classes
+
+Ivory defines a set of base classes for data (`Data`, `Dataset`, `Datasets`, and `DataLoaders`) that users' custom classes can inherit. For now, we use only `Data`.
+
+Now, we can get a `rectangle.data.Data` instance.
 
 ```python
 data = rectangle.data.Data()
@@ -107,11 +107,11 @@ data
 data.get(0)  # get data of index = 0.
 ```
 
-This returned value is a tuple of (index, input, target). Ivory always keeps data index so that we can know where a sample comes from.
+The returned value is a tuple of (index, input, target). Ivory always keeps data index so that we can know where a sample comes from.
 
 ## Define a model
 
-We use a simple MLP model here.
+We use a simple MLP model. Note that the number of hidden layers and the size of each hidden layer are customizable.
 
 ```python hide
 import rectangle.torch
@@ -125,15 +125,15 @@ Ivory configures a run using a YAML file. Here is a full example.
 
 #File torch.yaml {%=/examples/torch.yml%}
 
-Let's create a run by `Client.create_run()`
+Let's create a run by calling `Client.create_run()`.
 
 ```python
 run = client.create_run('torch')
 run
 ```
 
 !!! note
-    `Client.create_run(<name>)` creates an experiment named `<name>` if it hasn't existed yet. By cliking an icon (<i class="far fa-eye-slash" style="font-size:0.8rem; color: #ff8888;"></i>) in the above cell, you can see the log.
+    `Client.create_run(<name>)` creates an experiment named `<name>` if it doesn't exist yet. By clicking the icon (<i class="far fa-eye-slash" style="font-size:0.8rem; color: #ff8888;"></i>) in the above cell, you can see the log.
 
     Or you can directly create an experiment then make the experiment create a run:
 
@@ -142,21 +142,22 @@ run
     run = experiment.create_run()
     ~~~
 
-A `Run` instance have a `params` attribute that holds the parameters for the run.
+A `Run` instance has an attribute `params` that holds the parameters for the run.
 
 ```python
 import yaml
 
 print(yaml.dump(run.params, sort_keys=False))
 ```
 
-This is similar to the YAML file we read before, but is slightly changed by the Ivory Client.
+This is similar to the YAML file we read before, but has been slightly changed.
 
-* Run and experiment sections are inserted.
-* ExperimentID and RunID are assigned by MLFlow Tracking.
-* Default classes are specified, for example `ivory.torch.trainer.Trainer` for a `trainer` instance.
+* Run and experiment keys are inserted.
+* Run name is assigned by Ivory Client.
+* Experiment ID and Run ID are assigned by MLFlow Tracking.
+* Default classes are specified, for example the `ivory.torch.trainer.Trainer` class for a `trainer` instance.
 
-The `Client.create_run()` method can take keyword arguments to modify these parameters:
+The `Client.create_run()` can take keyword arguments to modify these parameters:
 
 ```python
 run = client.create_run(
@@ -204,7 +205,7 @@ run.results.val.target[:5]
 
 ## Test a model
 
-Testing a model is as simple as training. Just call `run.start('test')` instead of a (default) `'train'` argument.
+Testing a model is as simple as training. Just call `Run.start('test')`, passing `'test'` instead of the default `'train'` argument.
 
 ```python
 run.start('test')
@@ -225,23 +226,23 @@ run.results.test.target[:5]
 
 ## Task for multiple runs
 
-Ivory implements a special run type called **Task** which controls multiple nested runs. A task is useful for parameter search or cross validation.
+Ivory implements a special run type called **Task** that controls multiple nested runs. A task is useful for parameter search or cross validation.
 
 ```python
 task = client.create_task('torch')
 task
 ```
 
-The `Task` class has two methods to generate multiple runs: `Task.prodcut()` and `Task.chain()`. These two methods have the same functionality as [`itertools`](https://docs.python.org/3/library/itertools.html) of Python starndard library. Let's try to perform cross validation.
+The `Task` class has two functions to generate multiple runs: `Task.product()` and `Task.chain()`. These two functions have the same functionality as those in [`itertools`](https://docs.python.org/3/library/itertools.html) from the Python standard library. Let's try to perform cross validation.
 
 ```python
 runs = task.product(fold=range(4), verbose=0, epochs=3)
 runs
 ```
 
-Like `itertools`'s functions, `Task.prodcut()` and `Task.chain()` return a generator, which yields runs that are configured by different parameters you specify. In this case, this generator will yield 4 runs with a fold number ranging from 0 to 3 for each. A `task` instance doesn't start any training by itself. In addtion, you can pass fixed parameters to update the original parameters in the YAML file.
+Like `itertools`'s functions, `Task.product()` and `Task.chain()` return a generator, which yields runs that are configured by the different parameters you specify. In this case, the generator will yield 4 runs, one for each fold number from 0 to 3. A `task` instance doesn't start any training by itself. In addition, you can pass fixed parameters to update the original parameters in the YAML file.
 
-Then start 4 runs by a `for` loop including `run.start('both')`. Here `'both'` means execution of test after training.
+Then start the 4 runs in a `for` loop that calls `run.start('both')`. Here `'both'` means running a test right after training.
 
 ```python
 for run in runs:
@@ -250,7 +251,7 @@ for run in runs:
 
 ## Collect runs
 
-Our client has a `Tracker` instance. It stores the state of runs in background using MLFlow Tracking. The `Client` class provides several methods to access the stored runs. For example, `Client.search_run_ids()` returns a generator which yields RunID created by MLFlow Tracking.
+Our client has a `Tracker` instance. It stores the state of runs in the background using MLFlow Tracking. The `Client` provides several functions to access the stored runs. For example, `Client.search_run_ids()` returns a generator that yields the Run IDs assigned by MLFlow Tracking.
 
 ```python
 # A helper function.
@@ -273,12 +274,12 @@ print_run_info(run_ids)
 ```
 
 ```python
-# If `parent_run_id` is specified, nested runs having the parent are returned.
+# If `parent_run_id` is specified, nested runs with the parent are returned.
 run_ids = client.search_run_ids('torch', parent_run_id=task.id)
 print_run_info(run_ids)
 ```
 
-`Client.get_run_id()` and `Client.get_run_ids()` fetch RunID from run name, more strictly, (run class name in lower case) plus (run number).
+`Client.get_run_id()` and `Client.get_run_ids()` fetch a Run ID from a run name or, more strictly, from a key-value pair of (run class name in lower case, run number).
 
 ```python
 run_ids = [client.get_run_id('torch', run=0),
@@ -293,10 +294,10 @@ print_run_info(run_ids)
 
 ## Load runs and results
 
-The `Client` instance can load runs. First select RunID(s) to load. We want to perform cross validation here, so that we need a run collection created by the `task#0`. In this case, we can use `Client.get_nested_run_ids()`. Why don't we use `Client.search_run_ids()` as we did above? Because we don't have an easy way to get a very long RunID after we restart a Python session and lose the `Task` instance. On the ohter hand, a run name is easy to manage and write.
+A `Client` instance can load runs. First select the Run ID(s) to load. We want to perform cross validation here, so we need the run collection created by `task#0`. In this case, we can use `Client.get_nested_run_ids()`. Why don't we use `Client.search_run_ids()` as we did above? Because we don't have an easy way to get a very long Run ID after we restart a Python session and lose the `Task` instance. On the other hand, a run name is easy to manage and write.
 
 ```python
-# Assume that we restart a session so we have no run instances now.
+# Assume that we restarted a session so we have no run instances now.
 run_ids = list(client.get_nested_run_ids('torch', task=0))
 print_run_info(run_ids)
 ```
@@ -308,7 +309,7 @@ run = client.load_run(run_ids[0])
 run
 ```
 
-Note that the `Client.load_run()` function doesn't require an experiment name because RunID is [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier).
+Note that `Client.load_run()` doesn't require an experiment name because a Run ID is a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier).
 
 As you expected, the fold number is 3.
 
@@ -334,50 +335,50 @@ print('[target]')
 print(target)
 ```
 
-If you don't need a whole run instance, the `Client.load_instance()` function is a better choice to save time and memory.
+If you don't need a whole run instance, `Client.load_instance()` is a better choice to save time and memory.
 
 ```python
 results = client.load_instance(run_ids[0], 'results')
 results
 ```
 
 ```python
-for mode in results:  # Yield a mode.
-    print(mode, results[mode].output.shape)
+for mode, result in results.items():
+    print(mode, result.output.shape)
 ```
 
-For cross validation, we need 4 runs. In order to load multiple run's results at the same time, the Ivory `Client` provides a convenient method.
+For cross validation, we need 4 runs. In order to load multiple runs' results at the same time, the Ivory `Client` provides a convenient function.
 
 ```python
 results = client.load_results(run_ids, verbose=False)  # No progress bar.
 results
 ```
 
 ```python
-for mode, result in results.items():  # Yield a (mode, result).
+for mode, result in results.items():
     print(mode, result.output.shape)
 ```
 
 !!! note
     `Client.load_results()` drops train data for saving memory.
 
-The lengths of validation data and test data are both 800 (200 times 4). But be careful about the test data. The length of unique samples is 200 (one fold size).
+The lengths of the validation and test data are both 800 (200 times 4). But be careful about the test data. The number of unique samples should be 200 (one fold size).
 
 ```python
 import numpy as np
 
 len(np.unique(results.val.index)), len(np.unique(results.test.index))
 ```
 
-Usually, duplicated samples in test data are averaged for ensembling. The `Results.mean()` function performs this *mean reduction* and returns a newly created `Rusults` instance.
+Usually, duplicated samples in test data are averaged for ensembling. `Results.mean()` performs this *mean reduction* and returns a newly created `Results` instance.
 
 ```python
 reduced_results = results.mean()
 for mode, result in reduced_results.items():
     print(mode, result.output.shape)
 ```
 
-Compare these results.
+Compare these two results.
 
 ```python
 index = results.test.index
@@ -393,7 +394,7 @@ print('[reduced_results]')
 print(x)
 ```
 
-For convenience, The `Client.load_results()` function has a `reduction` keyword argument.
+For convenience, `Client.load_results()` has a `reduction` keyword argument.
 
 ```python
 results = client.load_results(run_ids, reduction='mean', verbose=False)
@@ -405,15 +406,15 @@ for mode, result in results.items():
     print(mode, result.output.shape)
 ```
 
-A cross validation (CV) score can be calculated as follows:
+The cross validation (CV) score can be calculated as follows:
 
 ```python
 true = results.val.target
 pred = results.val.output
 np.mean(np.sqrt((true - pred) ** 2))  # Use any function for your metric.
 ```
 
-And we got a prediction for the test data using 4 MLP models.
+And we got predictions for the test data using 4 MLP models.
 
 ```python
 results.test.output[:5]