# Embedder API example: term embedding

To run this notebook, set up and start the embedding service as described [here](https://github.com/BlueBrain/BlueGraph/blob/master/services/embedder/README.rst).

You may want to modify the following configs in `services/embedder/configs/app_config.py`:

- `DOWNLOAD_DIR = "downloads/"`: directory into which embedding pipelines are downloaded and from which they are served
- `LOCAL = True`: flag indicating whether to serve embedding pipelines stored in the local `DOWNLOAD_DIR` (`True`) or hosted in Nexus (`False`)

By default, the `services/embedder/downloads` folder is used and `LOCAL` is set to `True`. This folder contains two example models (`Cord-19-NCIT-linking` and `Attri2vec_test_model`) distributed along with the source code.

In [None]:
import requests

In [None]:
ENDPOINT = "http://127.0.0.1:5000"

## Get all the models in the catalogue

In [None]:
r = requests.get(
    f'{ENDPOINT}/models/')
print(r)
r.json()

## Get a model by name

In [None]:
MODEL_NAME = "Cord-19-NCIT-linking"

In [None]:
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}')
print(r)
r.json()

## Get details on different model components

In [None]:
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/preprocessor/')
print(r)
r.json()

In [None]:
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/embedder/')
print(r)
r.json()

In [None]:
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/similarity-processor/')
print(r)
r.json()
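
The three component requests above differ only in the last path segment, so they could be collapsed into a loop. The sketch below only builds the URLs and runs without the service; `component_url` is a hypothetical helper introduced for illustration, not part of the embedder API, and `ENDPOINT` and `MODEL_NAME` repeat the values set earlier in the notebook.

```python
ENDPOINT = "http://127.0.0.1:5000"
MODEL_NAME = "Cord-19-NCIT-linking"

# Path segments of the three components queried above.
COMPONENTS = ("preprocessor", "embedder", "similarity-processor")


def component_url(endpoint, model_name, component):
    """Build the URL of one model component (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/models/{model_name}/{component}/"


urls = {name: component_url(ENDPOINT, MODEL_NAME, name) for name in COMPONENTS}
for name, url in urls.items():
    print(name, "->", url)
    # With the service running, each URL could be fetched as in the cells above:
    # r = requests.get(url); r.json()
```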

## Get resource embeddings

In [None]:
%%time
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/embedding/',
    params={
        "resource_ids": ["dna replication", "glucose", "covid-19 infection", "lalala not in the index"]
    })
print(r)
r.json()
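
In the GET request above, `requests` serializes the list passed under `resource_ids` as a repeated query parameter, so the URL grows with every ID. The standard-library sketch below reproduces that encoding offline, without contacting the service. It only illustrates the query string; how the service parses it is not shown here.

```python
from urllib.parse import urlencode

params = {"resource_ids": ["dna replication", "glucose"]}

# doseq=True emits one key=value pair per list element,
# which mirrors how `requests` encodes list-valued params.
query = urlencode(params, doseq=True)
print(query)
```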

Alternatively, to retrieve embedding vectors for a large number of resources, a POST request can be sent to the same endpoint with the resource IDs in the request body.

In [None]:
%%time
r = requests.post(
    f'{ENDPOINT}/models/{MODEL_NAME}/embedding/',
    json={
        "resource_ids": ["dna replication", "glucose", "covid-19 infection", "lalala not in the index"]
    })
print(r)
r.json()
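
If a list of IDs is too long to send comfortably in one request, one option is to split it on the client side and issue one POST per batch. This is a minimal sketch under assumptions: `chunked` is a hypothetical helper, the batch size of 2 is arbitrary, and whether the service imposes any limit on the request body is not stated here.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


resource_ids = ["dna replication", "glucose", "covid-19 infection"]
batches = list(chunked(resource_ids, 2))

# Each batch becomes one request body of the same shape as in the cell above, e.g.
# requests.post(f"{ENDPOINT}/models/{MODEL_NAME}/embedding/", json=payload)
payloads = [{"resource_ids": batch} for batch in batches]
print(payloads)
```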

## Get nearest neighbors

In [None]:
%%time
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/neighbors/',
    params={
        "resource_ids": ["glucose", "covid-19 infection", "dna replication", "lalala not in the index"],
        "k": 20
    })
print(r)
r.json()

In [None]:
%%time
r = requests.get(
    f'{ENDPOINT}/models/{MODEL_NAME}/neighbors/',
    params={
        "resource_ids": ["glucose", "covid-19 infection", "dna replication", "lalala not in the index"],
        "k": 20,
        "values": True
    })
print(r)
r.json()

Alternatively, to get nearest neighbors for a large number of resources, a POST request can be sent to the same endpoint with the resource IDs in the request body.

In [None]:
%%time
r = requests.post(
    f'{ENDPOINT}/models/{MODEL_NAME}/neighbors/',
    params={"k": 20, "values": True},
    json={
        "resource_ids": ["glucose", "covid-19 infection", "dna replication", "lalala not in the index"],
    })
print(r)
r.json()

## Predict embeddings for unseen points

In [None]:
%%time
r = requests.post(
    f'{ENDPOINT}/models/{MODEL_NAME}/embedding/',
    json={
        "data": ["hello world", "protein", "coronavirus"],
    })
print(r)
vectors = r.json()["vectors"]

In [None]:
%%time
r = requests.post(
    f'{ENDPOINT}/models/{MODEL_NAME}/neighbors/',
    params={
        "k": 20,
        "values": True
    },
    json={
        "vectors": vectors
    })
print(r)
r.json()
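
The predicted vectors can also be compared directly on the client. The sketch below assumes each entry of `vectors` is a flat list of floats (not confirmed in this notebook) and uses toy vectors so that it runs without the service. Cosine similarity is chosen only as a common measure; the model's similarity processor may use a different one.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy stand-ins for the `vectors` list returned by the service.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim_orthogonal = cosine_similarity(toy_vectors[0], toy_vectors[1])
sim_diagonal = cosine_similarity(toy_vectors[0], toy_vectors[2])
print(sim_orthogonal, sim_diagonal)
```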