[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface-endpoint.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface-endpoint.ipynb)

# Using a Hugging Face endpoint

Hugging Face hosts a large ecosystem of open-source models and offers a very large library of encoders. Models can be run locally or, as in this notebook, served from an endpoint.

Currently, Semantic Router's `HFEndpointEncoder` class is set up to use only TEI (Text Embeddings Inference) models. See:

https://huggingface.co/docs/text-embeddings-inference/quick_tour

For example, try using `"https://api-inference.huggingface.co/models/BAAI/bge-large-en-v1.5"` as the `HF_API_URL`.
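
To make the endpoint setup more concrete, here is a minimal sketch of the kind of HTTP request such an endpoint accepts: a JSON body with an `inputs` field and a bearer token in the `Authorization` header. The helper name `build_embedding_request` and the placeholder key are hypothetical, and this is not necessarily how `HFEndpointEncoder` builds its requests internally. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import List


def build_embedding_request(
    url: str, api_key: str, docs: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an embeddings endpoint.

    Hypothetical helper for illustration only. The payload shape
    ({"inputs": [...]}) follows the Hugging Face Inference API convention.
    """
    payload = json.dumps({"inputs": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(
    "https://api-inference.huggingface.co/models/BAAI/bge-large-en-v1.5",
    "hf_xxx",  # placeholder, not a real key
    ["how's the weather today?"],
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) requires a valid API key and a reachable endpoint, which is why that step is left out of the sketch.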


## Getting Started

We start by installing semantic-router.

In [1]:
# !pip install -qU semantic-router==0.0.20

Next, we define a `Route` object that maps a route name to example utterances that should trigger that route.

In [2]:
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)



Let's define another for good measure:

In [3]:
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat]

Now we initialize our embedding model. We will use the Hugging Face endpoint:

In [4]:
import os
from getpass import getpass
from semantic_router.encoders.huggingface import HFEndpointEncoder

huggingface_url = os.getenv("HF_API_URL") or getpass("Enter HuggingFace API URL: ")
huggingface_api_key = os.getenv("HF_API_KEY") or getpass("Enter HuggingFace API Key: ")

encoder = HFEndpointEncoder(
    huggingface_url=huggingface_url,
    huggingface_api_key=huggingface_api_key,
)

In [5]:
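# The encoder expects a list of documents. A bare string still returns an
# embedding here, but the "3 documents" in the log below most likely counts
# the characters of "Hey"; prefer encoder(["Hey"]) for a single document.
encoder("Hey")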

Processing batch 1 with 3 documents


[[0.06257740408182144,
  0.023564014583826065,
  0.0017444374971091747,
  0.04231269657611847,
  -0.01953040435910225,
  -0.03389021381735802,
  0.04474523290991783,
  0.021480467170476913,
  0.03779444471001625,
  0.015299754217267036,
  0.007672062609344721,
  0.005935966037213802,
  -0.04037568345665932,
  0.009820322506129742,
  -0.055159423500299454,
  -0.027077648788690567,
  0.010105383582413197,
  -0.020441915839910507,
  -0.026698201894760132,
  0.0012803795980289578,
  -0.02498244121670723,
  0.04908446967601776,
  -0.07169245928525925,
  -0.010073100216686726,
  -0.025036698207259178,
  0.011281315237283707,
  0.0017082496779039502,
  -0.005548158194869757,
  0.03506730869412422,
  0.03669964149594307,
  -0.025721052661538124,
  0.020028667524456978,
  -0.001102163689211011,
  -0.038798652589321136,
  -0.02513568289577961,
  -0.020972445607185364,
  0.020897019654512405,
  -0.019093383103609085,
  0.011529088020324707,
  -0.01873159594833851,
  0.00985939335078001,
  0.01243

Now we define the `RouteLayer`. When called, the route layer consumes text (a query) and outputs the category (`Route`) it belongs to. To initialize a `RouteLayer`, we need our `encoder` model and a list of `routes`.

In [6]:
from semantic_router.layer import RouteLayer

rl = RouteLayer(encoder=encoder, routes=routes)

Processing batch 1 with 11 documents
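
The count of 11 in the log matches the total number of utterances across both routes (6 for `politics`, 5 for `chitchat`), which suggests each utterance is embedded once when the layer is built. The sketch below reproduces that count with a plain dictionary standing in for the `Route` objects; the name `route_utterances` is only illustrative.

```python
# Plain-dict stand-in for the two Route objects defined earlier.
route_utterances = {
    "politics": [
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
    "chitchat": [
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
}

# Flatten into parallel lists: one route label per utterance.
labels = [name for name, utts in route_utterances.items() for _ in utts]
documents = [u for utts in route_utterances.values() for u in utts]

print(len(documents))  # 11
print(labels.count("politics"), labels.count("chitchat"))  # 6 5
```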


We can check the dimensionality of our vectors by looking at the `index` attribute of the `RouteLayer`.

In [7]:
rl.index

LocalIndex(index=array([[ 0.02499899,  0.02385715,  0.00468016, ...,  0.01588508,
        -0.00209475, -0.01171774],
       [ 0.01642705, -0.02640625, -0.00659476, ..., -0.0194328 ,
        -0.02548762, -0.04860513],
       [ 0.06932217,  0.02093412,  0.01063143, ...,  0.00100745,
        -0.00822618,  0.03042692],
       ...,
       [-0.0244599 ,  0.02428463, -0.01713768, ..., -0.0349757 ,
        -0.01013033,  0.01069217],
       [-0.00814284,  0.02359665, -0.02238297, ..., -0.03862419,
         0.00937595,  0.00824297],
       [-0.00102218, -0.00938131, -0.02210761, ..., -0.05303425,
        -0.00088396, -0.03256031]]), routes=array(['politics', 'politics', 'politics', 'politics', 'politics',
       'politics', 'chitchat', 'chitchat', 'chitchat', 'chitchat',
       'chitchat'], dtype='<U8'), utterances=array(["isn't politics the best thing ever",
       "why don't you tell me about your political opinions",
       "don't you just love the president",
       "don't you just hate the 

We do have 1024-dimensional vectors. Now let's test them:

In [8]:
rl("tell me about your political opinions?")

Processing batch 1 with 1 documents


RouteChoice(name=None, function_call=None, similarity_score=None)

In [9]:
rl("how's the weather today?")

Processing batch 1 with 1 documents


RouteChoice(name='chitchat', function_call=None, similarity_score=None)

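In the run recorded above, the weather query is routed to `chitchat`, while the politics query returns `name=None`, most likely because its similarity score fell below the route's threshold. What if we send a query that is unrelated to our existing `Route` objects?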

In [10]:
rl("I'm interested in learning about llama 2")

Processing batch 1 with 1 documents


RouteChoice(name=None, function_call=None, similarity_score=None)

In this case, we return `None` because no matches were identified. We recommend tuning your `RouteLayer` thresholds for best performance; you can see how in [this notebook](https://github.com/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization.ipynb).
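
To illustrate why a query can come back as `None`, here is a simplified sketch of threshold-based routing: score the query against each route's utterance vectors with cosine similarity and return the best route only if its score clears a threshold. The function names, the toy 2-D vectors, and the single global `threshold` are assumptions for illustration; the real `RouteLayer` may aggregate scores differently and use per-route thresholds.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def choose_route(
    query_vec: List[float],
    route_vecs: Dict[str, List[List[float]]],
    threshold: float,
) -> Optional[str]:
    """Return the best-matching route name, or None if below the threshold."""
    best_name, best_score = None, -1.0
    for name, vectors in route_vecs.items():
        # Score a route by its single most similar utterance vector.
        score = max(cosine_similarity(query_vec, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy 2-D vectors standing in for the 1024-dimensional embeddings.
toy_routes = {
    "politics": [[1.0, 0.0], [0.9, 0.1]],
    "chitchat": [[0.0, 1.0], [0.1, 0.9]],
}

print(choose_route([0.05, 1.0], toy_routes, threshold=0.9))  # chitchat
print(choose_route([0.7, 0.7], toy_routes, threshold=0.9))   # None
```

The second query sits halfway between both routes, so its best score (about 0.78) stays under the threshold and no route is returned. Lowering the threshold would make the layer more permissive, at the cost of more false matches.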

---

## Testing Batches

Create a larger list of documents.

In [11]:
test_docs = [
    "This is a test document about politics.",
    "The weather is nice today.",
    "I love discussing political issues.",
    "Let's talk about the latest news.",
    "What's your opinion on the current economic situation?",
    "How's the weather in your area?",
    "The political landscape is changing rapidly.",
    "I prefer to avoid political discussions.",
    "Do you think it will rain tomorrow?",
    "The government announced new policies today.",
] * 10  # Repeat the list 10 times to get 100 documents

In [12]:
print(f"Number of input documents: {len(test_docs)}")

Number of input documents: 100


Now, let's test the encoder with our larger list of documents:

In [13]:
# Initialize the encoder
encoder = HFEndpointEncoder(
    huggingface_url=huggingface_url,
    huggingface_api_key=huggingface_api_key,
)

# Call the encoder with the test documents
embeddings = encoder(test_docs)

# Print some information about the results
print(f"Number of input documents: {len(test_docs)}")
print(f"Number of embeddings generated: {len(embeddings)}")
print(f"Dimension of each embedding: {len(embeddings[0])}")

Processing batch 1 with 50 documents
Processing batch 2 with 50 documents
Number of input documents: 100
Number of embeddings generated: 100
Dimension of each embedding: 1024
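
The log above shows the 100 documents being split into two batches of 50. A minimal sketch of that kind of chunking follows; the helper name `iter_batches` and the default `batch_size=50` are inferred from the log output rather than taken from the library's source.

```python
from typing import Iterator, List


def iter_batches(docs: List[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of docs with at most batch_size items each."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


docs = [f"document {i}" for i in range(100)]
for number, batch in enumerate(iter_batches(docs), start=1):
    print(f"Batch {number}: {len(batch)} documents")
```

With this scheme, a list shorter than the batch size (such as the 11 route utterances) is sent as a single batch, and the last batch is simply smaller when the total is not a multiple of the batch size.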
