[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface.ipynb)

# Using HuggingFaceEncoder

Hugging Face hosts a huge ecosystem of open-source models. Its encoders can be run locally, and it offers a very large library of encoders to choose from.

## Getting Started

We start by installing `semantic-router` with the `[local]` extra, which pulls in the dependencies that `HuggingFaceEncoder` needs:

In [None]:
!pip install -qU "semantic-router[local]==0.0.20"

Next, we define a `Route`: a named route together with example utterances that should trigger it.

In [2]:
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)


_**⚠️ If you see an ImportError, you must install local dependencies. You can do so by installing Semantic Router using `pip install -qU "semantic-router[local]"`.**_
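Before reinstalling, it can help to check which modules are actually missing. The following is a minimal sketch using only the standard library. The module names `torch` and `transformers` are an assumption about what the `[local]` extra provides, and `missing_modules` is a hypothetical helper, not part of Semantic Router.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependency names; consult the package metadata for the real list.
required = ["torch", "transformers"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All assumed local dependencies are importable.")
```

If anything is reported missing, installing with the `[local]` extra as described above is the intended fix.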

Let's define another route for good measure:

In [3]:
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat]

Now we initialize our embedding model.

In [4]:
from semantic_router.encoders import HuggingFaceEncoder

encoder = HuggingFaceEncoder()


In [11]:
encoder(["hey"])

[[-0.11423865705728531,
  0.013737470842897892,
  0.05483824759721756,
  0.02612205408513546,
  0.03366684541106224,
  ...]]
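The encoder returns one embedding (a list of floats) per input string. Routing decisions rest on comparing such vectors, and cosine similarity is a common way to do that. The following is a minimal sketch using made-up three-dimensional vectors rather than real encoder output. `cosine_similarity` is a hypothetical helper written for illustration, not part of Semantic Router.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for encoder output (illustrative values only).
hey = [0.9, 0.1, 0.0]
hello = [0.8, 0.2, 0.1]
taxes = [0.0, 0.1, 0.9]

print(cosine_similarity(hey, hello))  # close to 1: similar direction
print(cosine_similarity(hey, taxes))  # close to 0: nearly orthogonal
```

With real embeddings the same comparison applies, only over many more dimensions.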
 

Now we define the `RouteLayer`. When called, the route layer consumes text (a query) and outputs the category (`Route`) it belongs to. To initialize a `RouteLayer`, we need our `encoder` model and a list of `routes`.

In [5]:
from semantic_router.layer import RouteLayer

rl = RouteLayer(encoder=encoder, routes=routes)

2024-04-14 18:15:59 INFO semantic_router.utils.logger local


In [7]:
rl.index

LocalIndex(index=array([[-0.00864941,  0.02088905,  0.04548961, ..., -0.00787578,
         0.0252752 ,  0.01958269],
       [ 0.10101052, -0.05990515,  0.01437028, ..., -0.00809868,
         0.03701495, -0.01487793],
       [ 0.01996296, -0.03627442,  0.15291646, ...,  0.06719883,
         0.08079942, -0.03931363],
       ...,
       [-0.01322144,  0.11396162,  0.14592913, ..., -0.01772162,
        -0.09720093,  0.05921701],
       [ 0.02863792,  0.09272329,  0.10989423, ..., -0.00030185,
        -0.10717052,  0.04849005],
       [-0.03355407, -0.04666358, -0.05054352, ...,  0.04337099,
         0.10585055, -0.06144635]]), routes=array(['politics', 'politics', 'politics', 'politics', 'politics',
       'politics', 'chitchat', 'chitchat', 'chitchat', 'chitchat',
       'chitchat'], dtype='<U8'), utterances=array(["isn't politics the best thing ever",
       "why don't you tell me about your political opinions",
       "don't you just love the president",
       "don't you just hate the 
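The output above suggests the index keeps parallel arrays: one embedding, one route label, and one utterance per row, giving eleven rows for our two routes. The sketch below reproduces only the label and utterance arrays in plain Python. `ToyRoute` and `flatten_routes` are hypothetical names for illustration, and the real `LocalIndex` may be built differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRoute:
    """Hypothetical stand-in for a route: a name plus example utterances."""
    name: str
    utterances: list = field(default_factory=list)


def flatten_routes(routes):
    """Return parallel lists with one route label per utterance."""
    labels, texts = [], []
    for route in routes:
        for utterance in route.utterances:
            labels.append(route.name)
            texts.append(utterance)
    return labels, texts


politics = ToyRoute("politics", [
    "isn't politics the best thing ever",
    "why don't you tell me about your political opinions",
    "don't you just love the president",
    "don't you just hate the president",
    "they're going to destroy this country!",
    "they will save the country!",
])
chitchat = ToyRoute("chitchat", [
    "how's the weather today?",
    "how are things going?",
    "lovely weather today",
    "the weather is horrendous",
    "let's go to the chippy",
])

labels, texts = flatten_routes([politics, chitchat])
print(len(texts))
print(labels.count("politics"), labels.count("chitchat"))
```

Keeping the arrays aligned by position means the row with the highest similarity to a query can be mapped straight back to its route name.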

Now we can test it:

In [6]:
rl("don't you love politics?")

RouteChoice(name='politics', function_call=None, similarity_score=None)

In [6]:
rl("how's the weather today?")

RouteChoice(name='chitchat', function_call=None)

Both queries are classified correctly. What happens if we send a query that is unrelated to our existing `Route` objects?

In [7]:
rl("I'm interested in learning about llama 2")

RouteChoice(name=None, function_call=None)

In this case, the route layer returns a `RouteChoice` whose `name` is `None` because no route matched the query.
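One way to picture why an unrelated query yields no route is a similarity threshold: the best match is accepted only if its score is high enough. The following is a simplified sketch under that assumption, with toy vectors and an arbitrary threshold of 0.5. `choose_route` is a hypothetical function, and the actual `RouteLayer` scoring may aggregate scores or set thresholds differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def choose_route(query_vec, index, threshold=0.5):
    """Return the best-matching route name, or None if nothing clears the threshold.

    `index` is a list of (route_name, embedding) pairs. `threshold` is an
    arbitrary illustrative value, not a library default.
    """
    best_name, best_score = None, float("-inf")
    for name, vec in index:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy index: one made-up embedding per route.
toy_index = [
    ("politics", [1.0, 0.0, 0.0]),
    ("chitchat", [0.0, 1.0, 0.0]),
]

print(choose_route([0.9, 0.1, 0.0], toy_index))  # politics
print(choose_route([0.1, 0.9, 0.0], toy_index))  # chitchat
print(choose_route([0.0, 0.1, 1.0], toy_index))  # None
```

The third query points in a direction that neither toy route covers, so its best score falls below the threshold and no route is chosen.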