<p align="center">
  <img src="https://github.com/vpj/lab/raw/832758308905ee20ba9841fa80c47c77d7e58fda/images/logo.png?raw=true" width="100" title="Logo">
</p>

# [Lab 3.0](https://github.com/vpj/lab)

This library helps you organize and track
 machine learning experiments.
 
[🎛 Dashboard](https://github.com/vpj/lab_dashboard) is the web 
 interface for Lab.


## Features

Main features of Lab are:
* Organizing experiments
* [🎛 Dashboard](https://github.com/vpj/lab_dashboard) to browse experiments
* Logger
* Managing configurations and hyper-parameters

### Organizing Experiments

Lab keeps track of all the model training statistics.
It keeps them in a SQLite database and also pushes them to Tensorboard.
It also organizes the checkpoints and any other artifacts you wish to save.
All of these can be accessed with the Python API and are also stored
 in a human-friendly folder structure.
Lab maintains logs, summaries and checkpoints of all the experiment runs
 in this folder structure.

### [🎛 Dashboard](https://github.com/vpj/lab_dashboard) to browse experiments
<p align="center">
  <img style="max-width:100%;"
   src="https://raw.githubusercontent.com/vpj/lab/master/images/dashboard.png"
   width="1024" title="Dashboard Screenshot">
</p>

The web dashboard helps navigate experiments and multiple runs.
You can check out the configs and a summary of performance.
You can launch TensorBoard directly from there.

*Eventually, we want to let you edit configs and run new experiments and analyse
outputs on the dashboard.*

### Logger

Logger has a simple API to produce pretty console outputs.

It also comes with a bunch of helper functions that manage
 iterators and loops.
 
<p align="center">
 <img style="max-width:100%" 
   src="https://raw.githubusercontent.com/vpj/lab/master/images/loop.gif"
  />
</p>

### Manage configurations and hyper-parameters

You can set up configs/hyper-parameters with functions.
[🧪lab](https://github.com/vpj/lab) identifies the dependencies and runs
them in topological order.

```python
@Configs.calc()
def model(c: Configs):
    return Net().to(c.device)
```
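
To illustrate what running config functions in topological order means, here is a minimal standalone sketch. It does not use Lab's internals: the `deps` mapping and the `calculators` are hypothetical, and the ordering comes from the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each config name lists the configs it reads.
deps = {
    "device": set(),
    "learning_rate": set(),
    "model": {"device"},
    "optimizer": {"model", "learning_rate"},
}

# Hypothetical calculators; each receives the values computed so far.
calculators = {
    "device": lambda c: "cpu",
    "learning_rate": lambda c: 0.01,
    "model": lambda c: f"Net on {c['device']}",
    "optimizer": lambda c: f"SGD({c['model']}, lr={c['learning_rate']})",
}


def calculate_all(deps, calculators):
    """Evaluate every config only after all of its dependencies."""
    values = {}
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        values[name] = calculators[name](values)
    return order, values


order, values = calculate_all(deps, calculators)
print(values["optimizer"])
```

Because `optimizer` depends on `model`, which depends on `device`, the sorter always yields them in that relative order, whatever order the functions were declared in.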

You can set up multiple options for configuration functions,
so you don't have to write a bunch of `if` statements to handle configs.

```python
@Configs.calc(Configs.optimizer)
def sgd(c: Configs):
    return optim.SGD(c.model.parameters(), lr=c.learning_rate, momentum=c.momentum)

@Configs.calc(Configs.optimizer)
def adam(c: Configs):
    return optim.Adam(c.model.parameters())
```

## Getting Started

### Clone and install

```bash
git clone git@github.com:vpj/lab.git
cd lab
pip install -e .
```

To update, pull the latest changes:

```bash
cd lab
git pull
```

<!-- ### Install it via `pip` directly from github.

```bash
pip install -e git+git@github.com:vpj/lab.git#egg=lab
``` -->

### Create a `.lab.yaml` file
An empty file at the root of the project should
be enough. You can set the project-level configs
 `check_repo_dirty` and `path`
in the config file.

Lab will store all experiment data in the `logs/` folder
relative to the `.lab.yaml` file.
If `path` is set in `.lab.yaml`, the data will be stored in `[path]logs/`
 relative to the `.lab.yaml` file.
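
As a rough illustration of the rule above, the sketch below computes where the logs folder would land. It is an interpretation of the description, not Lab's own code: `resolve_logs_dir` is a hypothetical helper, and it assumes `path` is a relative folder prefix.

```python
from pathlib import Path


def resolve_logs_dir(lab_yaml: Path, path: str = "") -> Path:
    """Return the assumed logs folder for the project owning `lab_yaml`.

    `path` mirrors the optional `path` setting; an empty value means
    `logs/` sits directly next to `.lab.yaml`.
    """
    return lab_yaml.parent / path / "logs"


print(resolve_logs_dir(Path("/projects/demo/.lab.yaml")))
print(resolve_logs_dir(Path("/projects/demo/.lab.yaml"), "output/"))
```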

You don't need the `.lab.yaml` file if you only plan on using the logger.

### [Samples](https://github.com/vpj/lab/tree/master/samples)

[Samples folder](https://github.com/vpj/lab/tree/master/samples) contains a
 bunch of examples of using 🧪 lab.

Here are some [annotated samples](http://blog.varunajayasiri.com/ml/lab3/#samples%2Fmnist_loop.py).

## Tutorial

*The outputs lose color when viewed on GitHub. Run [readme.ipynb](https://github.com/vpj/lab/blob/master/readme.ipynb) locally to try it out.*

This short tutorial covers most of the usage patterns. We don't have proper documentation yet, but the source code of the project is quite clean, so you can dive into it if you need more details.

### Logger

```python
from lab import logger
from lab.logger.colors import Text, Color
```

#### Logging with colors

```python
logger.log("Colors are missing when viewed on GitHub", Text.highlight)
```

You can use predefined styles:

```python
logger.log([
    ('Styles ', Text.heading),
    ('Danger ', Text.danger),
    ('Warning ', Text.warning),
    ('Meta ', Text.meta),
    ('Key ', Text.key),
    ('Meta2 ', Text.meta2),
    ('Title ', Text.title),
    ('Heading ', Text.heading),
    ('Value ', Text.value),
    ('Highlight ', Text.highlight),
    ('Subtle', Text.subtle)
])
```

Or, specify colors

```python
logger.log([
    ('Colors ', Text.heading),
    ('Red ', Color.red),
    ('Black ', Color.black),
    ('Blue ', Color.blue),
    ('Cyan ', Color.cyan),
    ('Green ', Color.green),
    ('Orange ', Color.orange),
    ('Purple Heading ', [Color.purple, Text.heading]),
    ('White', Color.white),
])
```

#### Logging debug info

You can pretty-print Python objects:

```python
logger.info(a=2, b=1)
```

```python
logger.info(dict(name='Name', price=22))
```

### Sections

Sections let you monitor the time taken for
different tasks and also help *keep the code clean*
by separating different blocks of code.

```python
import time
```

```python
with logger.section("Load data"):
    # code to load data
    time.sleep(2)
```

```python
with logger.section("Load saved model"):
    time.sleep(1)
    logger.set_successful(False)
```

You can also show progress while a section is running:

```python
with logger.section("Train", total_steps=100):
    for i in range(100):
        time.sleep(0.1)
        # Multiple training steps in the inner loop
        logger.progress(i)
```
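
For readers curious how a timed section can work in principle, here is a minimal standalone sketch built on `contextlib`. It is not Lab's implementation: `timed_section` and the `results` list are hypothetical names, and the output format is only an approximation.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_section(name, results):
    """Time the enclosed block and record (name, seconds, succeeded)."""
    start = time.perf_counter()
    succeeded = True
    try:
        yield
    except Exception:
        # Mark the section as failed, then let the error propagate.
        succeeded = False
        raise
    finally:
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, succeeded))
        status = "done" if succeeded else "failed"
        print(f"{name}...[{status}] {elapsed * 1000:,.2f}ms")


results = []
with timed_section("Load data", results):
    time.sleep(0.05)  # stand-in for real work
```

Wrapping each stage in its own `with` block gives both the timing and the visual separation of code that sections are meant to provide.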

### Iterator and Enumerator

You can use `logger.iterate` and `logger.enum` with any iterable object.
In this example we use a PyTorch `DataLoader`.

```python
# Create a data loader for illustration
import torch
from torchvision import datasets, transforms

test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data',
                       train=False,
                       download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=32, shuffle=True)
```

```python
for data, target in logger.iterate("Test", test_loader):
    time.sleep(0.01)
```

```python
for i, (data, target) in logger.enum("Test", test_loader):
    time.sleep(0.01)
```

### Loop

This can be used for the training loop.
The `loop` keeps track of the time taken and time remaining for the loop.
You can use *sections*, *iterators* and *enumerators* within loop.

`logger.write` outputs the current status along with global step.

```python
for step in logger.loop(range(0, 400)):
    logger.write()
```
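
One simple way to estimate the time remaining in a loop is to assume that future steps cost the same as the average step so far. The sketch below shows that arithmetic; `estimate_remaining` is a hypothetical helper, and this is an assumption about how such an estimate can be made, not a description of Lab's internals.

```python
def estimate_remaining(elapsed_seconds, steps_done, total_steps):
    """Estimate remaining seconds from the average time per finished step."""
    if steps_done == 0:
        return None  # nothing measured yet
    return elapsed_seconds * (total_steps - steps_done) / steps_done


# Example: 100 of 400 steps finished after 10 seconds.
remaining = estimate_remaining(10.0, 100, 400)
print(f"about {remaining:.0f}s remaining")
```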

The global step is used for logging to the screen, TensorBoard and when logging analytics to SQLite. You can manually set the global step. Here we will reset it.

```python
logger.set_global_step(0)
```

You can manually increment global step too.

```python
for step in logger.loop(range(0, 400)):
    logger.add_global_step(5)
    logger.write()
```

### Log indicators

Here you specify indicators; the logger stores them temporarily and writes them in batches.
It can aggregate and write them as means or histograms.

```python
import numpy as np

# dummy train function
def train():
    return np.random.randint(100)

# Reset global step because we incremented in previous loop
logger.set_global_step(0)
```

This stores all the loss values and writes their mean to the logs on every tenth iteration.
The console output line is replaced until `new_line` is called.

```python
for i in range(1, 401):
    logger.add_global_step()
    loss = train()
    logger.store(loss=loss)
    if i % 10 == 0:
        logger.write()
    if i % 100 == 0:
        logger.new_line()
    time.sleep(0.02)
```
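
The store-then-write pattern above can be pictured with a small standalone sketch. `MeanBuffer` is a hypothetical class written for illustration only; it mimics the idea of collecting values and emitting their means in batches, and makes no claim about how Lab stores indicators internally.

```python
from collections import defaultdict
from statistics import mean


class MeanBuffer:
    """Collect values per indicator name and emit their means on write."""

    def __init__(self):
        self._values = defaultdict(list)

    def store(self, **kwargs):
        # Keep every value until the next write.
        for name, value in kwargs.items():
            self._values[name].append(value)

    def write(self):
        # Aggregate each indicator to its mean, then start a fresh batch.
        means = {name: mean(vals) for name, vals in self._values.items()}
        self._values.clear()
        return means


buffer = MeanBuffer()
for loss in [4, 6, 8]:
    buffer.store(loss=loss)
print(buffer.write())
```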

#### Indicator settings

```python
from lab.logger.indicators import Queue, Scalar, Histogram

# dummy train function
def train2(idx):
    return idx, 10, np.random.randint(100)

# Reset global step because we incremented in previous loop
logger.set_global_step(0)
```

`Histogram` indicators log a histogram of the data.
`Queue` stores data in a `deque` of size `queue_size` and logs histograms.
Both of these also log the means, and if `is_print` is `True` the mean is printed.

The following indicator has a queue size of `10`, and its values are printed to the console:

```python
logger.add_indicator(Queue('reward', 10, True))
```

By default values are not printed to console; i.e. `is_print` defaults to `False`.

```python
logger.add_indicator(Scalar('policy'))
```

Setting `is_print` to `True` prints the mean value of the histogram to the console:

```python
logger.add_indicator(Histogram('value', True))
```

```python
for i in range(1, 400):
    logger.add_global_step()
    reward, policy, value = train2(i)
    logger.store(reward=reward, policy=policy, value=value, loss=1.)
    if i % 10 == 0:
        logger.write()
    if i % 100 == 0:
        logger.new_line()
```
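
The `Queue` behaviour described above, a `deque` of size `queue_size`, amounts to a rolling window over the most recent values. The standalone sketch below shows that idea with the standard library; `RollingQueue` is a hypothetical class for illustration, not Lab's `Queue` indicator.

```python
from collections import deque
from statistics import mean


class RollingQueue:
    """Keep only the latest `queue_size` values and report their mean."""

    def __init__(self, queue_size):
        # A bounded deque silently drops the oldest value when full.
        self._values = deque(maxlen=queue_size)

    def add(self, value):
        self._values.append(value)

    def mean(self):
        return mean(self._values)


rewards = RollingQueue(queue_size=3)
for r in [1, 2, 3, 4, 5]:
    rewards.add(r)
print(rewards.mean())  # mean of the last three values: 3, 4, 5
```

A rolling window like this smooths noisy quantities such as episode rewards without letting very old values dominate the average.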

### Experiment

Lab will keep track of experiments if you declare an Experiment. It will keep track of logs, code diffs, git commits, etc.

```python
from lab.experiment.pytorch import Experiment
```

The `name` of the experiment defaults to the calling Python filename. However, when invoking from a Jupyter Notebook it must be provided, because the library cannot find the calling file name. `comment` can be changed later from the [🎛 Dashboard](https://github.com/vpj/lab_dashboard).

```python
exp = Experiment(name="mnist_pytorch",
                 comment="Test")
```

Starting an experiment creates folders and stores the experiment metadata, git commits, and source diffs.

```python
exp.start()
```

You can also start from a previously saved checkpoint. An empty `run_uuid` (`''`) means that it will load from the last run.

```python
exp.start(run_uuid='')
```

### Configs

```python
from lab import configs
```

The configs will be stored and, in the future, can be adjusted from the [🎛 Dashboard](https://github.com/vpj/lab_dashboard).

```python
class DeviceConfigs(configs.Configs):
    use_cuda: bool = True
    cuda_device: int = 0

    device: any
```

Some configs can be calculated

```python
import torch

@DeviceConfigs.calc(DeviceConfigs.device)
def cuda(c: DeviceConfigs):
    is_cuda = c.use_cuda and torch.cuda.is_available()
    if not is_cuda:
        return torch.device("cpu")
    else:
        if c.cuda_device < torch.cuda.device_count():
            return torch.device(f"cuda:{c.cuda_device}")
        else:
            logger.log(f"Cuda device index {c.cuda_device} higher than "
                       f"device count {torch.cuda.device_count()}", Text.warning)
            return torch.device(f"cuda:{torch.cuda.device_count() - 1}")
```

Configs classes can be inherited. This can be used to separate configs into modules, and it is quite neat when you want to inherit entire experiment setups and make a few modifications.

```python
class Configs(DeviceConfigs):
    model_size: int = 10

    model: any = 'cnn_model'
```

You can specify multiple config calculator functions. The function named by the string assigned to the respective attribute will be picked.

```python
@Configs.calc(Configs.model)
def cnn_model(c: Configs):
    return c.model_size * 10

@Configs.calc(Configs.model)
def lstm_model(c: Configs):
    return c.model_size * 2
```
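
Picking a calculator by its name can be pictured as a lookup in a registry keyed by function name. The sketch below is a standalone illustration of that idea under stated assumptions: the `options` registry, the `option` decorator and `calculate` are hypothetical and are not part of Lab's API.

```python
# Hypothetical registry mapping a config attribute to its named options.
options = {}


def option(attribute):
    """Register the decorated function as an option for `attribute`."""
    def decorator(func):
        options.setdefault(attribute, {})[func.__name__] = func
        return func
    return decorator


@option("model")
def cnn_model(model_size):
    return model_size * 10


@option("model")
def lstm_model(model_size):
    return model_size * 2


def calculate(attribute, choice, **kwargs):
    """Run the option whose function name matches `choice`."""
    return options[attribute][choice](**kwargs)


print(calculate("model", "lstm_model", model_size=10))
```

Selecting by name keeps the choice in data (a string that can be stored or changed) rather than in branching code.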

The experiment will calculate the configs.

```python
conf = Configs()
conf.model = 'lstm_model'
experiment = Experiment(name='test_configs')
experiment.calc_configs(conf)
logger.info(model=conf.model)
```