# Advanced Usage of Domain

[`Domain`](https://analysiscenter.github.io/batchflow/api/batchflow.research.html#batchflow.research.Domain) and auxiliary classes ([`Alias`](https://analysiscenter.github.io/batchflow/api/batchflow.research.html#batchflow.research.Alias), [`Option`](https://analysiscenter.github.io/batchflow/api/batchflow.research.html#batchflow.research.Option), [`ConfigAlias`](https://analysiscenter.github.io/batchflow/api/batchflow.research.html#batchflow.research.ConfigAlias)) are used to define combinations of parameters to try in `Research`.

We start with some useful imports.

In [1]:
import sys
import os
import shutil

import matplotlib
%matplotlib inline

In [2]:
sys.path.append('../../../..')

from batchflow import NumpySampler as NS
from batchflow.research import Alias, Option, Domain

## Basic usage

`Domain` defines the domain of parameters to explore. In the simplest case, it holds a parameter name and the values that will be used in a `Research`. Values for the parameter can be given as an array or a [`Sampler`](https://analysiscenter.github.io/batchflow/api/batchflow.sampler.html).

In [3]:
domain = Domain(p=['v1', 'v2'])

Each instance of the `Domain` class has an `iterator` attribute that produces configs from the domain.

In [4]:
list(domain.iterator)

[{'p': 'v1', 'repetition': '0', 'updates': '0'},
 {'p': 'v2', 'repetition': '0', 'updates': '0'}]

All configs from a domain have two additional keys: `'repetition'` and `'updates'`. The first is the index of the repetition for the config (see `n_reps` below). The `'updates'` key is the number of domain updates performed before the current config was produced (see the `set_update` [documentation](https://analysiscenter.github.io/batchflow/api/batchflow.research.html#batchflow.research.Domain.set_update) and [tutorial 6](https://github.com/analysiscenter/batchflow/blob/research/examples/tutorials/research/06_update_domain.ipynb)).

Each item generated by `Domain` is a `ConfigAlias` instance: a wrapper for `Config` with the methods `config` and `alias`. These methods return, respectively, the wrapped `Config` and the corresponding config with `str` representations of the values.
To set or reset `iterator`, use the `set_iter_params` method. It also accepts some parameters that are described below.
If you access the `iterator` attribute without calling `set_iter_params` first, it will be called with default parameters.

In [5]:
domain.set_iter_params()
config = next(domain.iterator)

config.config(), config.alias()

(Config(
     {'p': 'v1', 'repetition': 0, 'updates': 0}
 ),
 Config(
     {'p': 'v1', 'repetition': '0', 'updates': '0'}
 ))

`Alias` is used to create a `str` representation of each value of the domain: aliases are used as folder names and give a more readable representation of configs with non-string values.
By default, the alias is the `__name__` attribute of the value or its `str` representation. You can define a custom alias with the `Alias` class.

In [6]:
domain = Domain(p=[Alias('v1', 'alias'), NS])

config = next(domain.iterator)
print('alias: {:14} value: {}'.format(config.alias()['p'], config.config()['p']))

config = next(domain.iterator)
print('alias: {:14} value: {}'.format(config.alias()['p'], config.config()['p']))

alias: alias          value: v1
alias: NumpySampler   value: <class 'batchflow.batchflow.sampler.NumpySampler'>
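
The default aliasing rule described above can be sketched in plain Python. This is only an illustration under the assumption that `__name__` is preferred when it exists; `default_alias` and `MySampler` are hypothetical names, not part of batchflow.

```python
def default_alias(value):
    """Hypothetical helper: `__name__` if the value has one, else `str(value)`."""
    return getattr(value, '__name__', str(value))


class MySampler:
    """Stand-in for a class used as a parameter value."""


# Strings and numbers fall back to `str`, classes and functions use `__name__`.
aliases = [default_alias(v) for v in ['v1', 3, MySampler, len]]
print(aliases)
```

Readable aliases matter because, as noted above, they are used as folder names.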


You can define how many times each item of the domain is produced with the `n_reps` parameter of `set_iter_params`. Each produced `ConfigAlias` will have a `'repetition'` key.

In [7]:
domain.set_iter_params(n_reps=2)

list(domain.iterator)

[{'p': 'alias', 'repetition': '0', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '0', 'updates': '0'},
 {'p': 'alias', 'repetition': '1', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '1', 'updates': '0'}]

You can also set the `n_items` parameter to define the number of configs to get from `Domain`. By default, it is equal to the actual number of unique elements.

In [8]:
domain.set_iter_params(n_items=3, n_reps=2)

list(domain.iterator)

[{'p': 'alias', 'repetition': '0', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '0', 'updates': '0'},
 {'p': 'alias', 'repetition': '0', 'updates': '0'},
 {'p': 'alias', 'repetition': '1', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '1', 'updates': '0'},
 {'p': 'alias', 'repetition': '1', 'updates': '0'}]

The period of the repetitions can be defined by the `repeat_each` parameter. By default, for array-like parameter values, all configs are generated once and then repeated. For samplers, the default is `repeat_each=100`.

In [9]:
domain.set_iter_params(n_items=3, n_reps=2, repeat_each=1)

list(domain.iterator)

[{'p': 'alias', 'repetition': '0', 'updates': '0'},
 {'p': 'alias', 'repetition': '1', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '0', 'updates': '0'},
 {'p': 'NumpySampler', 'repetition': '1', 'updates': '0'},
 {'p': 'alias', 'repetition': '0', 'updates': '0'},
 {'p': 'alias', 'repetition': '1', 'updates': '0'}]
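
The interplay of `n_items`, `n_reps` and `repeat_each` shown in the three cells above can be reproduced with a small plain-Python sketch. It is not batchflow's implementation: `iterate` is a hypothetical function that handles only finite lists and assumes that `repeat_each` defaults to `n_items`.

```python
from itertools import cycle, islice


def iterate(values, n_items=None, n_reps=1, repeat_each=None):
    """Return (value, repetition) pairs in the order seen in the outputs above."""
    n_items = len(values) if n_items is None else n_items
    repeat_each = n_items if repeat_each is None else repeat_each
    # Cycle through the values until `n_items` items are collected.
    items = list(islice(cycle(values), n_items))
    result = []
    # Split items into blocks of `repeat_each` and emit each block `n_reps` times.
    for start in range(0, n_items, repeat_each):
        block = items[start:start + repeat_each]
        for repetition in range(n_reps):
            result.extend((item, repetition) for item in block)
    return result


values = ['alias', 'NumpySampler']
for pair in iterate(values, n_items=3, n_reps=2, repeat_each=1):
    print(pair)
```

With `repeat_each=1` every item is immediately followed by its repetitions, while the default emits the whole sequence once before repeating it.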

## Operations

#### Multiplication
The resulting `Domain` produces configs from the Cartesian product of values, that is, all possible combinations of parameter values. Here and below we drop the `'repetition'` and `'updates'` keys from configs (by setting `additional=False`) to simplify cell output, except in cases where `n_reps != 1`.

In [10]:
domain = Option('p1', ['v1', 'v2']) * Option('p2', ['v3', 'v4'])

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': 'v1', 'p2': 'v3'},
 {'p1': 'v1', 'p2': 'v4'},
 {'p1': 'v2', 'p2': 'v3'},
 {'p1': 'v2', 'p2': 'v4'}]

#### Sum
The sum concatenates the lists of configs produced by its operands.

In [11]:
domain = Option('p1', ['v1', 'v2']) + Option('p2', ['v3', 'v4'])

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': 'v1'}, {'p1': 'v2'}, {'p2': 'v3'}, {'p2': 'v4'}]

#### `@` multiplication

The result pairs the values of the options element-wise (like `zip`): the i-th config combines the i-th value of each option.

In [12]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', ['v5', 'v6'])
domain = op1 @ op2 @ op3

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': 'v1', 'p2': 'v3', 'p3': 'v5'}, {'p1': 'v2', 'p2': 'v4', 'p3': 'v6'}]

You can also combine all of these operations, because each of them can be applied to the resulting domains.

In [13]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', list(range(2)))
op4 = Option('p4', list(range(3, 5)))

domain = (op1 @ op2 + op3) * op4

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': 'v1', 'p2': 'v3', 'p4': '3'},
 {'p1': 'v1', 'p2': 'v3', 'p4': '4'},
 {'p1': 'v2', 'p2': 'v4', 'p4': '3'},
 {'p1': 'v2', 'p2': 'v4', 'p4': '4'},
 {'p3': '0', 'p4': '3'},
 {'p3': '0', 'p4': '4'},
 {'p3': '1', 'p4': '3'},
 {'p3': '1', 'p4': '4'}]
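
The semantics of the three operations can be mirrored with lists of dicts and `itertools`. This is a plain-Python sketch for intuition only; the helpers `option`, `mul`, `add` and `matmul` are hypothetical and keep raw values instead of `str` aliases.

```python
from itertools import product


def option(name, values):
    """A list of single-parameter configs."""
    return [{name: value} for value in values]


def mul(left, right):
    """Analogue of `*`: Cartesian product of configs."""
    return [{**a, **b} for a, b in product(left, right)]


def add(left, right):
    """Analogue of `+`: concatenation of configs."""
    return left + right


def matmul(left, right):
    """Analogue of `@`: element-wise pairing of configs."""
    return [{**a, **b} for a, b in zip(left, right)]


op1 = option('p1', ['v1', 'v2'])
op2 = option('p2', ['v3', 'v4'])
op3 = option('p3', range(2))
op4 = option('p4', range(3, 5))

# Same structure as (op1 @ op2 + op3) * op4 in the cell above.
configs = mul(add(matmul(op1, op2), op3), op4)
for config in configs:
    print(config)
```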

The `size` attribute returns the size of the resulting domain.

In [14]:
print(domain.size)

8


Note that `size` is the total number of produced configs. For example, if you have one `Option` with three values and set `n_items=5` and `n_reps=2` in `set_iter_params`, then the size will be 10.

In [15]:
domain = Domain(p1=list(range(3)))

domain.set_iter_params(n_items=5, n_reps=2)
domain.size

10

## Options with Samplers

Instead of array-like options, you can use `Sampler` instances as `Option` values. The iterator will produce independent samples from the domain.

In [16]:
domain = Domain(p1=NS('n'))
domain.set_iter_params(n_items=3, additional=False)

list(domain.iterator)

[{'p1': '0.9420136507539727'},
 {'p1': '-2.0150968708611265'},
 {'p1': '0.11545755161718185'}]

If `n_reps > 1`, the samples will be repeated.

In [17]:
domain.set_iter_params(n_items=3, n_reps=2, additional=False)

list(domain.iterator)

[{'p1': '-1.521832240625408'},
 {'p1': '-0.016483529047014795'},
 {'p1': '0.330069017626998'},
 {'p1': '-1.521832240625408'},
 {'p1': '-0.016483529047014795'},
 {'p1': '0.330069017626998'}]

If `set_iter_params` is called with `n_items=None`, the resulting iterator will be infinite.

In [18]:
domain.set_iter_params(n_items=None, additional=False)

print('size: ', domain.size)

for _ in range(5):
    print(next(domain.iterator))

size:  None
{'p1': '-0.2899207280882731'}
{'p1': '0.2600709832908798'}
{'p1': '-2.661256482222706'}
{'p1': '-1.991252358812105'}
{'p1': '0.427651230662582'}


The `repeat_each` parameter defines how often elements from an infinite generator are repeated (by default, `repeat_each=100`).

In [19]:
domain.set_iter_params(n_items=None, n_reps=2, repeat_each=2)

print('Domain size: {} \n'.format(domain.size))

for _ in range(8):
    print(next(domain.iterator))

Domain size: None 

{'p1': '2.5545455103045875', 'repetition': '0', 'updates': '0'}
{'p1': '-2.256729995007879', 'repetition': '0', 'updates': '0'}
{'p1': '2.5545455103045875', 'repetition': '1', 'updates': '0'}
{'p1': '-2.256729995007879', 'repetition': '1', 'updates': '0'}
{'p1': '-0.05577228813667809', 'repetition': '0', 'updates': '0'}
{'p1': '0.17618486668664568', 'repetition': '0', 'updates': '0'}
{'p1': '-0.05577228813667809', 'repetition': '1', 'updates': '0'}
{'p1': '0.17618486668664568', 'repetition': '1', 'updates': '0'}
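
The block-wise repetition of an infinite sampler can be sketched with a plain generator. This is an illustration rather than batchflow code: `sampled_configs` is a hypothetical function, and `random.gauss` stands in for the sampler.

```python
import random
from itertools import islice


def sampled_configs(sample, n_reps=1, repeat_each=100):
    """Infinite generator: draw `repeat_each` samples, emit the block `n_reps` times."""
    while True:
        block = [sample() for _ in range(repeat_each)]
        for repetition in range(n_reps):
            for value in block:
                yield {'p1': value, 'repetition': repetition}


rng = random.Random(0)  # seeded only to make the sketch reproducible
generator = sampled_configs(lambda: rng.gauss(0, 1), n_reps=2, repeat_each=2)

# The generator never stops, so take a finite slice of it.
first_eight = list(islice(generator, 8))
for config in first_eight:
    print(config)
```

As in the output above, the first two samples appear again with `'repetition'` equal to 1 before any new samples are drawn.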


If you multiply array-like options by sampler options, the resulting iterator will produce combinations of the array-like options, each paired with independent samples from the sampler options.

In [20]:
domain = Option('p1', NS('n')) * Option('p2', NS('u')) * Option('p3', [1, 2, 3])

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': '1.4235885092121556', 'p2': '0.5303242605477534', 'p3': '1'},
 {'p1': '-0.46753826985343144', 'p2': '0.8199500187216122', 'p3': '2'},
 {'p1': '-0.43048181752886727', 'p2': '0.6299870465691961', 'p3': '3'}]
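
A plain-Python sketch of the same idea: every array-like value gets its own fresh draw from each sampler. The function `product_with_samplers` is hypothetical, and the stand-ins assume that `NS('n')` is a normal and `NS('u')` a uniform distribution.

```python
import random

rng = random.Random(42)  # seeded only to make the sketch reproducible


def product_with_samplers(array_values):
    """For each array-like value, draw new independent samples for p1 and p2."""
    return [
        {
            'p1': rng.gauss(0, 1),  # assumed stand-in for NS('n')
            'p2': rng.random(),     # assumed stand-in for NS('u')
            'p3': value,
        }
        for value in array_values
    ]


configs = product_with_samplers([1, 2, 3])
for config in configs:
    print(config)
```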

## Domains with Weights

By default, configs are produced consecutively from the options in a sum, from left to right.

In [21]:
op1 = Option('p1', ['v1', 'v2'])
op2 = Option('p2', ['v3', 'v4'])
op3 = Option('p3', ['v5', 'v6'])
domain = op1 + op2 + op3

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p1': 'v1'},
 {'p1': 'v2'},
 {'p2': 'v3'},
 {'p2': 'v4'},
 {'p3': 'v5'},
 {'p3': 'v6'}]

To sample options from a sum independently with some probabilities, multiply the corresponding options by a float.

In [22]:
domain = 0.3 * op1 + 0.2 * op2 + 0.5 * op3

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p3': 'v5'},
 {'p3': 'v6'},
 {'p1': 'v1'},
 {'p1': 'v2'},
 {'p2': 'v3'},
 {'p2': 'v4'}]
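
One way to picture weighted sampling from a sum is the sketch below. It assumes that at each step an option is chosen with probability proportional to its weight and that exhausted options are dropped. This is an assumption about the behavior, not batchflow's actual code, and `weighted_sum` is a hypothetical function.

```python
import random


def weighted_sum(options, weights, seed=None):
    """Yield configs by repeatedly picking an option proportionally to its weight."""
    rng = random.Random(seed)
    iterators = [iter(option) for option in options]
    weights = list(weights)
    while any(weight > 0 for weight in weights):
        index = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[index])
        except StopIteration:
            # The option is exhausted: exclude it from further draws.
            weights[index] = 0


op1 = [{'p1': 'v1'}, {'p1': 'v2'}]
op2 = [{'p2': 'v3'}, {'p2': 'v4'}]
op3 = [{'p3': 'v5'}, {'p3': 'v6'}]

sampled = list(weighted_sum([op1, op2, op3], [0.3, 0.2, 0.5], seed=0))
for config in sampled:
    print(config)
```

Every config is still produced exactly once, and only the interleaving of the options is random.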

If you sum options with and without weights,
* they are grouped into consecutive groups in which either all options have weights or none do,
* the groups are processed one after another: configs are generated consecutively (for groups without weights) or sampled as described above (for groups with weights).

In [23]:
domain = op1 + 1.0 * op2 + 1.0 * op3

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p3': 'v5'},
 {'p1': 'v1'},
 {'p2': 'v3'},
 {'p3': 'v6'},
 {'p2': 'v4'},
 {'p1': 'v2'}]

Thus, we first get all configs from `op1`, then configs sampled uniformly from `op2` and `op3`. If one weight is much larger than the others, we first get all samples from the corresponding option.

In [24]:
domain = op1 + 1.0 * op2 + 100.0 * op3

domain.set_iter_params(additional=False)
list(domain.iterator)

[{'p3': 'v5'},
 {'p3': 'v6'},
 {'p1': 'v1'},
 {'p1': 'v2'},
 {'p2': 'v3'},
 {'p2': 'v4'}]

Consider a more complicated case. We will get
* all configs from `options[0]`,
* configs sampled from `1.2 * options[1] + 2.3 * options[2]`,
* all configs from `options[3]`,
* configs sampled from `1.7 * options[4] + 3.4 * options[5]`.

In [25]:
options = [Option('p'+str(i), ['v'+str(i)]) for i in range(6)]
domain = options[0] + 1.2 * options[1] + 2.3 * options[2] + options[3] + 1.7 * options[4] + 3.4 * options[5]

domain.set_iter_params(12, additional=False)
list(domain.iterator)

[{'p0': 'v0'},
 {'p2': 'v2'},
 {'p1': 'v1'},
 {'p3': 'v3'},
 {'p5': 'v5'},
 {'p4': 'v4'},
 {'p0': 'v0'},
 {'p2': 'v2'},
 {'p1': 'v1'},
 {'p3': 'v3'},
 {'p4': 'v4'},
 {'p5': 'v5'}]
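
The grouping rule for mixed sums can be sketched with `itertools.groupby`. This is only an illustration of how consecutive terms split into weighted and unweighted groups. The `terms` list is a hypothetical encoding of the sum above, with `None` marking an option without a weight.

```python
from itertools import groupby

# (option name, weight) for each term of the sum, in order.
terms = [('p0', None), ('p1', 1.2), ('p2', 2.3), ('p3', None), ('p4', 1.7), ('p5', 3.4)]

# Consecutive terms that share the same "has a weight" flag form one group.
groups = [
    (has_weight, [name for name, _ in group])
    for has_weight, group in groupby(terms, key=lambda term: term[1] is not None)
]

for has_weight, names in groups:
    mode = 'sampled by weight' if has_weight else 'produced consecutively'
    print(names, mode)
```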