To try this example, go to Runtime -> Run all.

Report problems with this example on [GitHub Issues](https://github.com/jina-ai/jina/issues/new/choose)

Run this command first to install the Jina 2.0 pre-release used by this notebook:

In [7]:
!pip install --pre jina


## Minimum Working Example for Jina 2.0

This notebook explains the code behind the [38-Line Get Started](https://github.com/jina-ai/jina#get-started).

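The original demo indexes every line of its *own source code*, then searches for the line most similar to `"request(on=something)"`. It needs no library beyond `numpy` and `jina`, and no external dataset: the dataset is the codebase. Because a notebook has no source file of its own, this notebook indexes the source of Python's `os` module (`os.__file__`) instead.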

### Import

For this demo, we only need `os` (to locate the Python file we index), `numpy` and `jina`:

In [9]:
import os
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

### Character embedding

To embed each line of code, we represent it as a vector using a simple character embedding followed by mean-pooling.

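The character embedding is a simple identity matrix: each printable ASCII character gets its own one-hot row, and every other character shares a final `UNK` row.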
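In formulas, assuming a line $s = c_1 c_2 \dots c_n$ and writing $\mathrm{idx}(c)$ for the row assigned to character $c$ (its ASCII code minus 32, or the reserved `UNK` row 95 for anything outside the range), the embedding is:

$$
\mathbf{e}(c) = I_{96}\,[\mathrm{idx}(c), :], \qquad
\mathbf{v}(s) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{e}(c_i)
$$

so component $k$ of $\mathbf{v}(s)$ is the fraction of characters in $s$ that map to row $k$.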

To do that, we write a new `Executor`:

In [10]:
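class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII 32 (space) is the first printable character
    dim = 127 - offset + 1  # 96 rows: one per char 32..126, last row reserved for `UNK`
    char_embd = np.eye(dim)  # one-hot embedding: row i is the vector of chr(i + offset)

    @requests  # no `on=`: this method handles requests on every endpoint
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            # map each char to its row; anything outside 32..127 goes to the `UNK` row
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # mean-pooling over all chars of the line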
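To see what `CharEmbed` computes without starting a Flow, here is a plain-Python sketch that mirrors its two numpy steps: selecting one identity row per character (`char_embd[r_emb, :]`) and averaging them (`.mean(axis=0)`). The names `EYE`, `char_row` and `char_embed` are hypothetical; the constants are copied from the class above.

```python
from typing import List

OFFSET = 32              # ASCII 32 (space): first character with its own row
DIM = 127 - OFFSET + 1   # 96 rows; row 95 is the `UNK` bucket

# identity matrix as nested lists: row i is the one-hot vector of chr(i + OFFSET)
EYE = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]


def char_row(c: str) -> int:
    """Row index used for character `c`, with the same rule as CharEmbed.foo."""
    return ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1


def char_embed(text: str) -> List[float]:
    """Select one identity row per character, then average the rows column-wise.

    `text` must be non-empty (the notebook skips blank lines).
    """
    rows = [EYE[char_row(c)] for c in text]              # like char_embd[r_emb, :]
    return [sum(col) / len(rows) for col in zip(*rows)]  # like .mean(axis=0)


v = char_embed('aab')
print(round(v[char_row('a')], 3), round(v[char_row('b')], 3))  # 0.667 0.333
print(char_row('é') == DIM - 1)  # True: non-ASCII characters share the UNK row
```

Because every selected row is one-hot, the average is just a normalized character histogram of the line. Character order is lost, which is acceptable for a toy demo like this one.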

### Indexing

To store and retrieve the encoded results, we need an indexer. At index time, it keeps the incoming Documents in an in-memory `DocumentArray`. At query time, it computes the Euclidean distance between the embedding of each query Document and the embeddings of all stored Documents.
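Written out, assuming $\mathbf{q}_i$ is the embedding of the $i$-th query Document and $\mathbf{d}_j$ that of the $j$-th stored Document (both 96-dimensional here), the indexer computes for every pair:

$$
D_{ij} = \lVert \mathbf{q}_i - \mathbf{d}_j \rVert_2 = \sqrt{\sum_{k=1}^{96} \left(q_{ik} - d_{jk}\right)^2}
$$

and the matches of query $i$ are the stored Documents sorted by increasing $D_{ij}$, so a smaller score means a closer line.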

Indexing and searching are bound to the `/index` and `/search` endpoints by `@requests(on='/index')` and `@requests(on='/search')`, respectively.
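Conceptually, `@requests(on=...)` tags a method with an endpoint name so that a request sent to `/index` or `/search` reaches the right method, while a bare `@requests` (as in `CharEmbed`) serves as the handler for every endpoint. The sketch below only illustrates that routing idea in plain Python; `route`, `MiniExecutor`, `MiniIndexer` and `handle` are hypothetical names, and this is not how Jina implements it.

```python
from typing import Callable, Dict, List, Optional


def route(on: Optional[str] = None) -> Callable:
    """Hypothetical decorator: tag a method with the endpoint it serves (None = every endpoint)."""
    def decorator(fn: Callable) -> Callable:
        fn._endpoint = on  # remember the endpoint on the function object itself
        return fn
    return decorator


class MiniExecutor:
    """Hypothetical base class that collects tagged methods and dispatches by endpoint."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Callable] = {}
        for name in dir(type(self)):
            member = getattr(type(self), name)
            if callable(member) and hasattr(member, '_endpoint'):
                self._handlers[member._endpoint] = getattr(self, name)  # store the bound method

    def handle(self, endpoint: str, docs: List[str]):
        # exact endpoint match first, then the catch-all handler registered with on=None
        fn = self._handlers.get(endpoint) or self._handlers.get(None)
        if fn is None:
            raise KeyError(f'no handler for {endpoint!r}')
        return fn(docs)


class MiniIndexer(MiniExecutor):
    def __init__(self) -> None:
        super().__init__()
        self.stored: List[str] = []

    @route(on='/index')
    def index(self, docs: List[str]) -> int:
        self.stored.extend(docs)
        return len(self.stored)

    @route(on='/search')
    def search(self, docs: List[str]) -> List[str]:
        # toy "search": stored lines that contain any of the query strings
        return [line for line in self.stored if any(q in line for q in docs)]


ex = MiniIndexer()
ex.handle('/index', ['import os', 'def foo(): pass'])
print(ex.handle('/search', ['foo']))  # ['def foo(): pass']
```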

In [11]:
class Indexer(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._docs = DocumentArray()  # per-instance in-memory store for all indexed Documents

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # append the incoming `docs` to the store

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # (n_queries, dim) embeddings of the query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # (n_stored, dim) embeddings of the stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # (n_queries, n_stored) pairwise distances
        for dist, query in zip(euclidean_dist, docs):  # attach & sort matches for each query
            query.matches = [Document(self._docs[int(idx)], copy=True, score=s) for idx, s in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # ascending distance: closest line first
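The broadcasting expression `q[:, None, :] - d[None, :, :]` produces an array of shape `(n_queries, n_stored, dim)`, and taking the norm over the last axis leaves an `(n_queries, n_stored)` distance matrix. The same computation with explicit loops makes the shapes and the sort-by-distance step visible; `pairwise_euclidean` and `top_k_matches` are hypothetical helper names for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pairwise_euclidean(queries: List[Vector], stored: List[Vector]) -> List[List[float]]:
    """dist[i][j] = ||queries[i] - stored[j]||_2, one row per query, one column per stored doc."""
    return [
        [math.sqrt(sum((qk - dk) ** 2 for qk, dk in zip(q, d))) for d in stored]
        for q in queries
    ]


def top_k_matches(dist_row: List[float], k: int) -> List[Tuple[int, float]]:
    """(stored index, distance) pairs sorted by increasing distance, truncated to k."""
    return sorted(enumerate(dist_row), key=lambda pair: pair[1])[:k]


queries = [[0.0, 0.0]]
stored = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
dist = pairwise_euclidean(queries, stored)
print(dist)                       # [[5.0, 1.0, 2.0]]
print(top_k_matches(dist[0], 2))  # [(1, 1.0), (2, 2.0)]
```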

### Callback function

The callback function is invoked once the search request is done; it prints the top-3 matches of the first query Document.

In [12]:
def print_matches(req):  # the callback function invoked when the request is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:.2f}: "{d.text}"')  # distance rounded to two decimals
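The callback only reads the first query's `matches`, so its formatting logic can be tried outside Jina with stand-in objects. In this sketch `FakeMatch` and `print_top3` are hypothetical. It also shows why the format spec matters: `.2f` rounds to two decimals, whereas `2f` only sets a minimum field width of 2 and keeps the default six decimals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeMatch:
    """Hypothetical stand-in for a matched Document: just a text and a distance."""
    text: str
    score: float


def print_top3(matches: List[FakeMatch]) -> List[str]:
    """Format and print the three closest matches like the notebook callback does."""
    lines = [f'[{idx}]{m.score:.2f}: "{m.text}"' for idx, m in enumerate(matches[:3])]
    for line in lines:
        print(line)
    return lines


out = print_top3([
    FakeMatch('import os', 0.1234),
    FakeMatch('def f():', 0.5),
    FakeMatch('x = 1', 0.75),
    FakeMatch('y', 0.9),  # fourth match: not printed
])
# [0]0.12: "import os"
# [1]0.50: "def f():"
# [2]0.75: "x = 1"
print(f'{0.1234:2f}')  # 0.123400 (width 2, six decimals)
```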

### Flow
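The Flow chains `CharEmbed`, run as two parallel replicas (listed in the startup log below as `pod0/pea-0` and `pod0/pea-1`, between a `head` and a `tail`), with `Indexer` (`pod1`). As a rough mental model only, the sketch below assumes the Documents are cut into request batches and that each batch is embedded by one of the replicas before reaching the indexer. The stage function `embed_stage` and the driver `run_flow` are hypothetical, and threads stand in for Jina's separate processes and network hops.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def embed_stage(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for CharEmbed: 'embed' each line as its length."""
    return [len(line) for line in batch]


def run_flow(lines: List[str], batch_size: int, replicas: int,
             index: Callable[[List[int]], None]) -> None:
    """Split lines into request batches, embed them on `replicas` workers, then index."""
    batches = [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        # map() yields results in submission order even if workers finish out of order
        for embedded in pool.map(embed_stage, batches):
            index(embedded)


store: List[int] = []
run_flow(['import os', 'x = 1', 'def f():', 'pass', 'y'], batch_size=2, replicas=2, index=store.extend)
print(store)  # [9, 5, 8, 4, 1]
```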

In [None]:
f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed replicas (not strictly necessary here)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(os.__file__) if t.strip()))  # index every non-empty line of the `os` module's source file
    f.block()  # block and keep listening for incoming requests

      pod0/head@5104[L]:ready and listening
     pod0/pea-0@5104[L]:ready and listening
     pod0/pea-1@5104[L]:ready and listening
      pod0/tail@5104[L]:ready and listening
           pod1@5104[L]:ready and listening
        gateway@5104[L]:ready and listening
           Flow@5104[I]:🎉 Flow is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:12345
	🔒 Private network:	127.0.0.1:12345


Keep the cell above running and start a simple client from a separate Python session, since `f.block()` keeps this notebook's kernel busy:

In [None]:
from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when the request is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:.2f}: "{d.text}"')  # read the score the same way Indexer sets it


c = Client(host='localhost', port_expose=12345)  # connect to the Flow at localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

It finds the three lines of the indexed file most similar to `"request(on=something)"` and prints each one with its Euclidean distance.
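Without the Flow, networking and parallel replicas, the same index-then-search idea fits in a few lines of plain Python. In this sketch `embed` and `search` are hypothetical names, the corpus is a list of strings instead of `os.__file__`, and `math.dist` (Python 3.8+) replaces the numpy norm.

```python
import math
from typing import List, Tuple

OFFSET, DIM = 32, 96  # same constants as CharEmbed; row 95 is the `UNK` row


def embed(text: str) -> List[float]:
    """Mean-pooled one-hot character embedding of a non-empty line."""
    vec = [0.0] * DIM
    for c in text:
        code = ord(c)
        vec[code - OFFSET if OFFSET <= code <= 127 else DIM - 1] += 1.0  # add the char's one-hot row
    return [v / len(text) for v in vec]  # mean-pooling


def search(corpus: List[str], query: str, k: int = 3) -> List[Tuple[float, str]]:
    """Index `corpus` in memory and return the k lines closest to `query`."""
    lines = [t.strip() for t in corpus if t.strip()]  # strip and drop blank lines, like the Flow
    index = [(embed(line), line) for line in lines]
    q = embed(query)
    scored = [(math.dist(q, vec), line) for vec, line in index]
    return sorted(scored)[:k]  # ascending Euclidean distance


corpus = ['import numpy as np', '@requests(on="/index")', 'def foo(self, docs, **kwargs):', '']
for rank, (dist, line) in enumerate(search(corpus, 'request(on=something)')):
    print(f'[{rank}]{dist:.2f}: "{line}"')
```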

Need help understanding Jina? Ask the friendly Jina community on [Slack](https://slack.jina.ai/) (usual response time: 1 hr).