# Fuzzy String Matching in 30 Lines

Now that you understand the fundamental concepts, let's put them into practice and build a simple end-to-end demo.

You will use Jina to implement a fuzzy search solution on source code: given a snippet of source code and a query, find all lines that are
similar to the query. It is like `grep`, but in fuzzy mode.

<div class="alert alert-block alert-info">
    <b>Preliminaries:</b><br>
    <ul>
      <li><a href="https://en.wikipedia.org/wiki/Word_embedding"> Character embedding</a></li>
      <li><a href="https://computersciencewiki.org/index.php/Max-pooling_/_Pooling"> Pooling</a></li>
      <li><a href="https://en.wikipedia.org/wiki/Euclidean_distance"> Euclidean distance</a></li>
    </ul>
</div>

## Client-Server architecture

![Client server architecture diagram](https://github.com/jina-ai/tutorial-notebooks/blob/main/fuzzy-grep/simple-arch.svg?raw=1)


## Server

### Character embedding

You first need to build a simple Executor for character embedding:

In [None]:
import numpy as np
from docarray import DocumentArray
from jina import Executor, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ord(' '), the first printable ASCII character
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

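To see what the Executor computes without starting Jina, here is a minimal standalone sketch of the same one-hot, mean-pooled embedding in plain NumPy. The names `OFFSET`, `DIM`, `CHAR_EMBD` and `embed` are made up for this illustration; only the arithmetic mirrors `CharEmbed` above.

```python
import numpy as np

OFFSET = 32                 # ord(' '), first printable ASCII character
DIM = 127 - OFFSET + 1      # 96 slots; the last one also serves as `UNK`
CHAR_EMBD = np.eye(DIM)     # one one-hot row per character slot


def embed(text: str) -> np.ndarray:
    """Mean-pool the one-hot vectors of all characters in `text`."""
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return CHAR_EMBD[idx, :].mean(axis=0)


a = embed('abca')
print(a.shape)               # (96,)
print(a[ord('a') - OFFSET])  # 0.5, because 'a' is 2 of the 4 characters
print(np.allclose(embed('listen'), embed('silent')))  # True: character order is ignored
```

Each embedding is a normalized character histogram, so two lines that share most of their characters end up close together even when the characters are reordered or slightly misspelled. That is what makes the search "fuzzy".
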
### Indexer with Euclidean distance

And you also need a straightforward Indexer to collect and match Documents:

In [None]:
from docarray import DocumentArray
from jina import Executor, requests


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        docs.match(self._docs, metric='euclidean', limit=20)

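The matching step can be pictured as a brute-force nearest-neighbour search. The sketch below is an assumption about what a Euclidean match with a limit amounts to, written in plain NumPy; it is not DocArray's actual implementation, and `top_k_euclidean` is a hypothetical helper name.

```python
import numpy as np


def top_k_euclidean(query: np.ndarray, index: np.ndarray, k: int = 20):
    """Return (row, distance) pairs for the k rows of `index` closest to `query`."""
    dists = np.linalg.norm(index - query, axis=1)  # one distance per stored row
    order = np.argsort(dists)[:k]                  # smallest distance first
    return [(int(i), float(dists[i])) for i in order]


stored = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
print(top_k_euclidean(np.array([0.0, 0.0]), stored, k=2))  # [(0, 0.0), (1, 1.0)]
```

A smaller distance means a better match, which is why the scores printed later in this tutorial are sorted in ascending order.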

### Put it together in a Flow

In [None]:
from jina import Flow

f = (Flow(port_expose=12345, protocol='http', cors=True)
        .add(uses=CharEmbed, replicas=2)  # two replicas of CharEmbed, for scalability (not crucial in this example)
        .add(uses=Indexer))


### Start the Flow and index data

Now you start the Flow and call the `/index` endpoint, indexing a Document for each line in the source file.
The Flow will block, remaining available for outside clients.

In [None]:
from docarray import Document

with f:
    f.post('/index', (Document(text=t.strip()) for t in open('fuzzy-grep.ipynb') if t.strip()))  # index all lines of _this_ file
    f.block()  # block and keep listening for requests

## Query from Python

You can now access the Flow via the Jina Python Client, and search for lines similar to "request(on=something)".


In [None]:
from docarray import Document
from jina import Client


def print_matches(resp):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclidean"].value:2f}: "{d.text}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

[0]0.132651: ""c.post('/search', Document(text='request(on=something)'), on_done=print_matches)""
[1]0.205947: ""That means, **we want to find lines from the above code snippet that are most similar to `@request(on=something)`.**\n","
[2]0.210064: ""from jina import Executor, requests\n","

## Query via SwaggerUI

Alternatively, you can send your query through your web browser, using SwaggerUI.

First, open `http://localhost:12345/docs` (an extended Swagger UI) in your browser.
Here you can see a number of tabs which correspond to the different endpoints exposed by the Flow.
Since you want to search for text, click the <kbd>/search</kbd> tab, then click <kbd>Try it out</kbd>, and enter your
query in the following form:

```json
{
  "data": [
    {
      "text": "requests(on=something)"
    }
  ]
}
```

That means **you want to find the lines in the code snippet above that are most similar to `requests(on=something)`.**
Now, click the <kbd>Execute</kbd> button and you will see a response that includes *matches* and their text entries.
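
The same JSON body can also be posted from a script. The following is a minimal sketch using only the standard library; it assumes the Flow's HTTP gateway accepts a `POST` to `http://localhost:12345/search` with the payload shown above, and `build_search_request` is a hypothetical helper, not part of Jina.

```python
import json
import urllib.request


def build_search_request(query: str, host: str = 'localhost', port: int = 12345):
    """Build (but do not send) a POST request for the Flow's /search endpoint."""
    body = json.dumps({'data': [{'text': query}]}).encode('utf-8')
    return urllib.request.Request(
        f'http://{host}:{port}/search',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_search_request('requests(on=something)')
print(req.full_url)  # http://localhost:12345/search

# With the Flow running, send the request and parse the JSON response:
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
```

Building the request separately from sending it lets you inspect the URL and payload before the server is up.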