## Flow Optimization

Flow Optimization helps you get the most out of your data.
It performs hyperparameter optimization on a complete search Flow, including indexing and querying.
For example, embeddings taken from a middle layer of a model are often semantically richer than those from the first or last layer.
Let's test every layer of a model to see which one works best.

### Setup

Before we start, install the required dependencies.

In [6]:
%%bash
pip install "jina[optimizer]"



### Imports

First, import everything we need.

In [7]:
import numpy as np
from jina import Document
from jina.executors.encoders import BaseEncoder
from jina.optimizers import FlowOptimizer, MeanEvaluationCallback
from jina.optimizers.flow_runner import SingleFlowRunner


### Flow definition

For simplicity, the Flow consists of two parts: an Encoder and an Evaluator.
The `SimpleEncoder` attaches an embedding to each given Document.
The `EuclideanEvaluator` scores the embedding against a given groundtruth.

The `JINA_ENCODER_LAYER` variable allows the optimizer to change the Encoder configuration with each iteration.
Note that the Pod is defined using Jina's inline YAML syntax.

In [2]:
flow = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''
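
To make the placeholder mechanism concrete, here is a minimal sketch of how a value could be written into the YAML text before a Flow is built. It uses plain string replacement for illustration only. Jina resolves `${{...}}` variables itself, and the helper `fill_placeholder` is a hypothetical name that is not part of Jina.

```python
# Illustration only: mimics variable substitution with plain string replacement.
# `fill_placeholder` is a hypothetical helper, not a Jina function.
flow_template = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''


def fill_placeholder(template: str, name: str, value) -> str:
    """Replace every ${{name}} occurrence in the template with str(value)."""
    return template.replace('${{' + name + '}}', str(value))


for layer in range(3):
    resolved = fill_placeholder(flow_template, 'JINA_ENCODER_LAYER', layer)
    # Print only the line that differs between iterations.
    print([line.strip() for line in resolved.splitlines() if 'layer:' in line][0])
```

Each iteration yields the same Flow description with a different `layer` value, which is the only thing the optimizer varies in this example.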

### Encoder Definition

Now we will fake a model with three layers.
For simplicity, each layer consists of a single integer, which is used as the embedding.


In [3]:
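class SimpleEncoder(BaseEncoder):
    """Fake encoder that looks up a fixed one-dimensional embedding per layer."""

    # Content -> one integer "embedding" for each layer (index 0, 1, 2).
    ENCODE_LOOKUP = {
        '🐲': [1, 3, 5],
        '🐦': [2, 4, 7],
        '🐢': [0, 2, 5],
    }

    def __init__(self, layer=0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Index of the layer to read; set through the Flow YAML.
        self._layer = layer

    def encode(self, data, *args, **kwargs) -> 'np.ndarray':
        # One row per input item, one column holding the selected layer's value.
        return np.array([[self.ENCODE_LOOKUP[item][self._layer]] for item in data])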


### Parameter definition

The parameter is loaded from the `parameter.yml` file, which contains the following:

```yaml
- !IntegerParameter
  jaml_variable: JINA_ENCODER_LAYER
  high: 2
  low: 0
  step_size: 1
```
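
As a quick sanity check, the search space described by this file can be listed by hand. The sketch below assumes that `high` is inclusive, which matches the three layers (indices 0 to 2) of the fake model. The variable names are illustrative.

```python
# Illustration only: the values an integer parameter with these bounds can take.
# Assumption: `high` is inclusive, so the fake model's three layers are covered.
low, high, step_size = 0, 2, 1

candidate_layers = list(range(low, high + 1, step_size))
print(candidate_layers)  # [0, 1, 2]
```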

### De

For optimization, nearly identical Flows have to be run repeatedly on the same data.
A `SingleFlowRunner` takes care of this.

In [4]:
# Each pair is (input Document, groundtruth Document).
documents = [
    (Document(content='🐲'), Document(embedding=np.array([2]))),
    (Document(content='🐦'), Document(embedding=np.array([3]))),
    (Document(content='🐢'), Document(embedding=np.array([3])))
]

runner = SingleFlowRunner(
    flow, documents, 1, 'search', overwrite_workspace=True
)
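
To see what the evaluation measures, the score for each layer can be worked out by hand from the toy data. The sketch below is a standalone NumPy calculation, not a call into Jina. It assumes the final score is the mean Euclidean distance between each embedding and its groundtruth, and `mean_euclidean` is a hypothetical helper name.

```python
import numpy as np

# Same toy data as above: per-layer "embeddings" and one groundtruth value per Document.
ENCODE_LOOKUP = {'🐲': [1, 3, 5], '🐦': [2, 4, 7], '🐢': [0, 2, 5]}
GROUNDTRUTH = {'🐲': 2, '🐦': 3, '🐢': 3}


def mean_euclidean(layer: int) -> float:
    """Mean Euclidean distance between a layer's embeddings and the groundtruth."""
    distances = [
        np.linalg.norm(
            np.array([ENCODE_LOOKUP[key][layer]]) - np.array([GROUNDTRUTH[key]])
        )
        for key in ENCODE_LOOKUP
    ]
    return float(np.mean(distances))


for layer in range(3):
    print(layer, round(mean_euclidean(layer), 4))
```

Under these assumptions, layer 0 scores about 1.6667, layer 1 scores 1.0, and layer 2 scores 3.0, so the middle layer gives the smallest distance.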


In [5]:
optimizer = FlowOptimizer(
    flow_runner=runner,
    parameter_yaml='parameter.yml',
    evaluation_callback=MeanEvaluationCallback(),
    n_trials=3,
    direction='minimize',
    seed=1
)

optimizer.optimize_flow()


[I 2021-04-13 13:47:58,678] A new study created in memory with name: no-name-9637e0dc-ea43-48cd-92b4-4ae16b9840f7


           pod0@14412[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod0@14412[I]:input tcp://0.0.0.0:33603 (PULL_BIND) output tcp://0.0.0.0:48297 (PUSH_CONNECT) control over tcp://0.0.0.0:49753 (PAIR_BIND)
           pod1@14419[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod1@14419[I]:input tcp://0.0.0.0:48297 (PULL_BIND) output tcp://0.0.0.0:32853 (PUSH_BIND) control over tcp://0.0.0.0:49589 (PAIR_BIND)
        gateway@14426[I]:starting jina.peapods.runtimes.asyncio.grpc.GRPCRuntime...
  SimpleEncoder@14412[I]:post_init may take some time...
  SimpleEncoder@14412[I]:post_init may take some time takes 0 seconds (0.00s)
  SimpleEncoder@14412[S]:successfully built SimpleEncoder from a yaml config
        gateway@14426[I]:input tcp://0.0.0.0:32853 (PULL_CONNECT) output tcp://0.0.0.0:33603 (PUSH_CONNECT) control over ipc:///tmp/tmp2uqru5a2 (PAIR_BIND)


[I 2021-04-13 13:48:00,943] Trial 0 finished with value: 1.0 and parameters: {'JINA_ENCODER_LAYER': 1}. Best is trial 0 with value: 1.0.


           pod0@14508[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod0@14508[I]:input tcp://0.0.0.0:59353 (PULL_BIND) output tcp://0.0.0.0:41895 (PUSH_CONNECT) control over tcp://0.0.0.0:53857 (PAIR_BIND)
           pod1@14515[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod1@14515[I]:input tcp://0.0.0.0:41895 (PULL_BIND) output tcp://0.0.0.0:34765 (PUSH_BIND) control over tcp://0.0.0.0:57551 (PAIR_BIND)
  SimpleEncoder@14508[I]:post_init may take some time...
  SimpleEncoder@14508[I]:post_init may take some time takes 0 seconds (0.00s)
        gateway@14526[I]:starting jina.peapods.runtimes.asyncio.grpc.GRPCRuntime...
  SimpleEncoder@14508[S]:successfully built SimpleEncoder from a yaml config
        gateway@14526[I]:input tcp://0.0.0.0:34765 (PULL_CONNECT) output tcp://0.0.0.0:59353 (PUSH_CONNECT) control over ipc:///tmp/tmphbkaah1v (PAIR_BIND)


[I 2021-04-13 13:48:02,963] Trial 1 finished with value: 1.6666666666666667 and parameters: {'JINA_ENCODER_LAYER': 0}. Best is trial 0 with value: 1.0.


           JINA@12297[W]:Existing workspace deleted
           JINA@12297[W]:WORKSPACE: ./JINA_WORKSPACE_0
           JINA@12297[W]:change overwrite_workspace to change this
           pod0@14605[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod0@14605[I]:input tcp://0.0.0.0:39955 (PULL_BIND) output tcp://0.0.0.0:55407 (PUSH_CONNECT) control over tcp://0.0.0.0:47689 (PAIR_BIND)
           pod1@14612[I]:starting jina.peapods.runtimes.zmq.zed.ZEDRuntime...
           pod1@14612[I]:input tcp://0.0.0.0:55407 (PULL_BIND) output tcp://0.0.0.0:36863 (PUSH_BIND) control over tcp://0.0.0.0:53185 (PAIR_BIND)
  SimpleEncoder@14605[I]:post_init may take some time...
  SimpleEncoder@14605[I]:post_init may take some time takes 0 seconds (0.00s)
  SimpleEncoder@14605[S]:successfully built SimpleEncoder from a yaml config
EuclideanEvaluator@14612[I]:p

[I 2021-04-13 13:48:05,674] Trial 2 finished with value: 1.6666666666666667 and parameters: {'JINA_ENCODER_LAYER': 0}. Best is trial 0 with value: 1.0.


           JINA@12297[I]:Number of finished trials: 3
           JINA@12297[I]:Best trial: {'JINA_ENCODER_LAYER': 1}
           JINA@12297[I]:Time to finish: 0:00:02.263426


<jina.optimizers.ResultProcessor at 0x7fc9e180ed50>
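
In the run above, trials 1 and 2 both sampled layer 0, so layer 2 was never evaluated. The sketch below shows why this can happen with a small trial budget. It is a plain random search written from scratch, not the sampler Jina uses, and the names `objective` and `random_search` are hypothetical.

```python
import random

# Toy data copied from the notebook: per-layer values and groundtruth per Document.
ENCODE_LOOKUP = {'🐲': [1, 3, 5], '🐦': [2, 4, 7], '🐢': [0, 2, 5]}
GROUNDTRUTH = {'🐲': 2, '🐦': 3, '🐢': 3}


def objective(layer: int) -> float:
    """Mean absolute difference, which equals Euclidean distance in one dimension."""
    total = sum(abs(ENCODE_LOOKUP[k][layer] - GROUNDTRUTH[k]) for k in ENCODE_LOOKUP)
    return total / len(ENCODE_LOOKUP)


def random_search(n_trials: int, seed: int):
    """Sample layers at random and keep the one with the smallest objective value."""
    rng = random.Random(seed)
    best_layer, best_value, tried = None, float('inf'), []
    for _ in range(n_trials):
        layer = rng.randint(0, 2)  # both bounds are inclusive
        value = objective(layer)
        tried.append(layer)
        if value < best_value:
            best_layer, best_value = layer, value
    return best_layer, best_value, tried


best_layer, best_value, tried = random_search(n_trials=3, seed=1)
print('tried:', tried, 'best:', best_layer, best_value)
# Sampling with replacement may repeat a value and leave others untested.
print('untested layers:', sorted(set(range(3)) - set(tried)))
```

With only three candidate values, raising `n_trials` or evaluating every layer once is a cheap way to make sure the whole search space is covered.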