[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/10-sparse-threshold-optimization-guardrail-graphai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/10-sparse-threshold-optimization-guardrail-graphai.ipynb)

## Sparse Encoder

### Install Prerequisites

In [222]:
!pip install -qU \
   "semantic-router>=0.1.4" \
   graphai-lib==0.0.2

### Creating Hybrid Router for Sparse Encoder Detection

To begin, we import the `Route` class from the `semantic_router` package.

We then define the routes for our router. In this example we use one route each for BYD, Tesla, Polestar, and Rivian, giving each route a name and a list of utterances that represent it.


In [223]:
from semantic_router import Route

# Route for BYD-related queries (allowed)
byd = Route(
    name="byd",
    utterances=[
        "Tell me about the BYD Seal.",
        "What is the battery capacity of the BYD Dolphin?",
        "How does BYD's Blade Battery work?",
        "Is the BYD Atto 3 a good EV?",
        "Can I sell my BYD?",
        "How much is my BYD worth?",
        "What is the resale value of my BYD?",
        "How much can I get for my BYD?",
        "How much can I sell my BYD for?",
    ],
)

# Route for Tesla-related queries (blocked or redirected)
tesla = Route(
    name="tesla",
    utterances=[
        "Is Tesla better than BYD?",
        "Tell me about the Tesla Model 3.",
        "How does Tesla’s autopilot compare to other EVs?",
        "What’s new in the Tesla Cybertruck?",
        "Can I sell my Tesla?",
        "How much is my Tesla worth?",
        "What is the resale value of my Tesla?",
        "How much can I get for my Tesla?",
        "How much can I sell my Tesla for?",
    ],
)

# Route for Polestar-related queries (blocked or redirected)
polestar = Route(
    name="polestar",
    utterances=[
        "What’s the range of the Polestar 2?",
        "Is Polestar a good alternative to other EVs?",
        "How does Polestar compare to other EVs?",
        "Can I sell my Polestar?",
        "How much is my Polestar worth?",
        "What is the resale value of my Polestar?",
        "How much can I get for my Polestar?",
        "How much can I sell my Polestar for?",
    ],
)

# Route for Rivian-related queries (blocked or redirected)
rivian = Route(
    name="rivian",
    utterances=[
        "Tell me about the Rivian R1T.",
        "How does Rivian's off-road capability compare to other EVs?",
        "Is Rivian's charging network better than other EVs?",
        "Can I sell my Rivian?",
        "How much is my Rivian worth?",
        "What is the resale value of my Rivian?",
        "How much can I get for my Rivian?",
        "How much can I sell my Rivian for?",
    ],
)

# Combine all routes
routes = [byd, tesla, polestar, rivian]

Next we define the sparse encoder. To do so, we import the `AurelioSparseEncoder` class from the `semantic_router.encoders.aurelio` module.

This requires an Aurelio API key, which can be obtained from the [Aurelio Platform website](https://platform.aurelio.ai/settings/api-keys).

We then create the sparse encoder using the `bm25` model.

In [224]:
import os
from getpass import getpass
from semantic_router.encoders.aurelio import AurelioSparseEncoder

# use .get so a missing variable falls through to the prompt instead of raising KeyError
os.environ["AURELIO_API_KEY"] = os.environ.get("AURELIO_API_KEY") or getpass(
    "Enter your Aurelio API key: "
)
# sparse encoder for term matching
sparse_encoder = AurelioSparseEncoder(name="bm25")
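
To give a feel for what a BM25-style sparse encoder contributes, here is a minimal, self-contained sketch of classic Okapi BM25 scoring. It is not the Aurelio implementation: the tokenizer, the `k1` and `b` values, and the `bm25_scores` helper are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a real encoder uses its own tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # document frequency: in how many documents each term appears
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Tell me about the BYD Seal.",
    "Tell me about the Tesla Model 3.",
    "Tell me about the Rivian R1T.",
]
scores = bm25_scores("How fast is the Tesla Model 3?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Rare terms such as brand names carry a high weight, while common words like "the" contribute very little. That exact-term sensitivity is what makes a sparse encoder a useful complement to dense embeddings for brand detection.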

Next we define the dense encoder. As before, we first import the relevant class, in this case `OpenAIEncoder` from the `semantic_router.encoders` package.

This requires an OpenAI API key, which can be obtained from the [OpenAI Platform website](https://platform.openai.com/api-keys).

We then create the dense encoder using the `text-embedding-3-small` model with a score threshold of 0.3.

In [225]:
from semantic_router.encoders import OpenAIEncoder

# use .get so a missing variable falls through to the prompt instead of raising KeyError
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY") or getpass(
    "Enter your OpenAI API key: "
)
# dense encoder for semantic meaning
encoder = OpenAIEncoder(name="text-embedding-3-small", score_threshold=0.3)

Now we have all the components needed (the routes, the sparse encoder, and the dense encoder) to create our hybrid router. **Unlike the standard semantic router, which uses only dense embeddings, the hybrid router uses both dense and sparse encoders.**

We pass the dense encoder, sparse encoder, routes, and the `auto_sync` parameter to the `HybridRouter` class.

In [226]:
from semantic_router.routers import HybridRouter

first_router = HybridRouter(
    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync="local"
)

2025-03-23 13:11:20 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 13:11:21 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
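
As a rough illustration of how a dense and a sparse score can be blended into one hybrid score, the sketch below uses a simple convex combination. The `hybrid_score` helper, the `alpha` weight, and the example scores are assumptions for illustration only; the library's internal scoring may differ.

```python
def hybrid_score(dense_score, sparse_score, alpha=0.3):
    """Blend two scores: alpha weights the dense score, (1 - alpha) the sparse one."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1.0 - alpha) * sparse_score


# hypothetical, already-normalised scores for one query against two routes
candidates = {
    "tesla": {"dense": 0.62, "sparse": 0.80},
    "byd": {"dense": 0.58, "sparse": 0.10},
}
combined = {
    name: hybrid_score(s["dense"], s["sparse"]) for name, s in candidates.items()
}
print(max(combined, key=combined.get))
```

Here the dense scores are close, so the sparse score (driven by the exact brand term) decides the outcome. A higher `alpha` would shift the balance toward semantic similarity.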


To check if the router is synced we can use the `is_synced` method.

In [227]:
first_router.is_synced()



False

To check the current route thresholds, we can use the `get_thresholds` method, which returns a dictionary mapping each route name to its threshold as a float.

In [228]:
route_thresholds = first_router.get_thresholds()
print("Default route thresholds:", route_thresholds)

Default route thresholds: {'byd': 0.09, 'tesla': 0.09, 'polestar': 0.09, 'rivian': 0.09}
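
The role of a per-route threshold can be sketched in a few lines: the best-scoring route is returned only if its score clears that route's threshold, otherwise no route is matched. The `pick_route` helper and the example scores are hypothetical, and whether the library compares with `>=` or `>` is an assumption here.

```python
def pick_route(scores, thresholds):
    """Return the best-scoring route name, or None if it misses its threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= thresholds[best] else None


thresholds = {"byd": 0.09, "tesla": 0.09, "polestar": 0.09, "rivian": 0.09}

# hypothetical similarity scores for two queries
print(pick_route({"byd": 0.41, "tesla": 0.12}, thresholds))
print(pick_route({"byd": 0.05, "tesla": 0.03}, thresholds))
```

A low threshold makes a route easy to trigger (more false positives), while a high threshold makes the router fall back to `None` more often.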


We can also use the `get_utterance_diff` method to see the difference in utterances between the local and remote routes.

In [229]:
first_router.get_utterance_diff()

['  byd: Can I sell my BYD?',
 "  byd: How does BYD's Blade Battery work?",
 '  byd: How much can I get for my BYD?',
 '  byd: How much can I sell my BYD for?',
 '  byd: How much is my BYD worth?',
 '  byd: Is the BYD Atto 3 a good EV?',
 '  byd: Tell me about the BYD Seal.',
 '  byd: What is the battery capacity of the BYD Dolphin?',
 '  byd: What is the resale value of my BYD?',
 '  polestar: Can I sell my Polestar?',
 '  polestar: How does Polestar compare to other EVs?',
 '  polestar: How much can I get for my Polestar?',
 '  polestar: How much can I sell my Polestar for?',
 '  polestar: How much is my Polestar worth?',
 '  polestar: Is Polestar a good alternative to other EVs?',
 '  polestar: What is the resale value of my Polestar?',
 '  polestar: What’s the range of the Polestar 2?',
 '  rivian: Can I sell my Rivian?',
 "  rivian: How does Rivian's off-road capability compare to other EVs?",
 '  rivian: How much can I get for my Rivian?',
 '  rivian: How much can I sell my Rivia

Next we can use the `get_utterances` method to get the utterances from the `index` attribute attached to the router.

In [230]:
first_router.index.get_utterances()

[Utterance(route='byd', utterance='Can I sell my BYD?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance="How does BYD's Blade Battery work?", function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='How much can I get for my BYD?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='How much can I sell my BYD for?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='How much is my BYD worth?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='Is the BYD Atto 3 a good EV?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='Tell me about the BYD Seal.', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='What is the battery capacity of the BYD Dolphin?', function_schemas=None, metadata={}, diff_tag=' '),
 Utterance(route='byd', utterance='What is the 

We can already test our router by passing in a few utterances and checking which route each one is matched to.

In [231]:
for utterance in [
    "Tell me about BYD's Blade Battery.",
    "Does the Tesla Model 3 have better range?",
    "What are the key features of the Polestar 2?",
    "Is Rivian's R1T better for off-roading?",
]:
    print(f"{utterance} -> {first_router(utterance).name}")

2025-03-23 13:11:22 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Tell me about BYD's Blade Battery. -> byd


2025-03-23 13:11:23 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Does the Tesla Model 3 have better range? -> tesla


2025-03-23 13:11:24 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


What are the key features of the Polestar 2? -> polestar


2025-03-23 13:11:25 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Is Rivian's R1T better for off-roading? -> rivian


We can also use the `evaluate` method to evaluate the router by passing in a list of test data and evaluating the accuracy of the router.

In [232]:
test_data = [
    ("Tell me about BYD's Blade Battery.", "byd"),
    ("Does the Tesla Model 3 have better range?", "tesla"),
    ("What are the key features of the Polestar 2?", "polestar"),
    ("Is Rivian's R1T better for off-roading?", "rivian"),
]

# unpack the test data
X, y = zip(*test_data)

# evaluate using the default thresholds
accuracy = first_router.evaluate(X=X, y=y)
print(f"Accuracy: {accuracy*100:.2f}%")

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-23 13:11:26 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.06s/it]

Accuracy: 100.00%
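
Conceptually, the accuracy reported here is the fraction of queries whose predicted route equals the expected label, where `None` means "no route". A minimal sketch follows, using a hypothetical keyword-based `toy_predict` function as a stand-in for the router:

```python
def accuracy(predict, X, y):
    """Fraction of inputs whose predicted label matches the expected label."""
    correct = sum(1 for text, label in zip(X, y) if predict(text) == label)
    return correct / len(X)


def toy_predict(text):
    # stand-in for a router: naive keyword match, returns None when nothing matches
    for brand in ("byd", "tesla"):
        if brand in text.lower():
            return brand
    return None


X_demo = [
    "Tell me about BYD",
    "Is Tesla good?",
    "What is the capital of France?",
    "Polestar 2 range?",
]
y_demo = ["byd", "tesla", None, "polestar"]
acc = accuracy(toy_predict, X_demo, y_demo)
print(f"Accuracy: {acc * 100:.2f}%")
```

Note that a correct `None` prediction counts as a hit, which is why out-of-scope queries belong in the test data.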





Unfortunately, a test set of only four utterances (one per route) is not enough to get a good understanding of the router's performance.

So we will use a larger dataset of BYD, Tesla, Polestar, and Rivian related queries, plus unrelated queries labelled `None`, to evaluate the router.


In [233]:
test_data = [
    # BYD-related queries
    ("Tell me about the BYD Seal.", "byd"),
    ("What is the battery capacity of the BYD Dolphin?", "byd"),
    ("How does BYD's Blade Battery work?", "byd"),
    ("Is the BYD Atto 3 a good EV?", "byd"),
    ("What’s the range of the BYD Tang?", "byd"),
    ("Does BYD offer fast-charging stations?", "byd"),
    ("How is the BYD Han different from the Seal?", "byd"),
    ("Is BYD the largest EV manufacturer in China?", "byd"),
    ("What is the top speed of the BYD Seal?", "byd"),
    ("Compare the BYD Dolphin and the BYD Atto 3.", "byd"),
    ("How does BYD’s battery technology compare to Tesla’s?", "byd"),
    ("What makes the BYD Blade Battery safer?", "byd"),
    ("Does BYD have plans to expand to Europe?", "byd"),
    ("How efficient is the BYD Tang in terms of range?", "byd"),
    ("What are the latest BYD electric vehicle models?", "byd"),
    ("How does the BYD Han compare to the Tesla Model S?", "byd"),
    ("What is the warranty on BYD EV batteries?", "byd"),
    ("Which BYD model is the best for long-distance driving?", "byd"),
    ("Does BYD manufacture its own battery cells?", "byd"),
    # Tesla-related queries
    ("Is Tesla better than BYD?", "tesla"),
    ("Tell me about the Tesla Model 3.", "tesla"),
    ("How does Tesla’s autopilot compare to other EVs?", "tesla"),
    ("What’s new in the Tesla Cybertruck?", "tesla"),
    ("What is Tesla’s Full Self-Driving feature?", "tesla"),
    ("How long does it take to charge a Tesla?", "tesla"),
    ("Tell me about the Tesla Roadster.", "tesla"),
    ("How much does a Tesla Model S cost?", "tesla"),
    ("Which Tesla model has the longest range?", "tesla"),
    ("What are the main differences between the Tesla Model S and Model 3?", "tesla"),
    ("How safe is Tesla’s Autopilot?", "tesla"),
    ("Does Tesla use LFP batteries?", "tesla"),
    ("What is the Tesla Supercharger network?", "tesla"),
    ("How does Tesla’s Plaid mode work?", "tesla"),
    ("Which Tesla is best for off-roading?", "tesla"),
    # Polestar-related queries
    ("What’s the range of the Polestar 2?", "polestar"),
    ("Is Polestar a good alternative?", "polestar"),
    ("How does Polestar compare to Tesla?", "polestar"),
    ("Tell me about the Polestar 3.", "polestar"),
    ("Is the Polestar 2 fully electric?", "polestar"),
    ("What is Polestar’s performance like?", "polestar"),
    ("Does Polestar offer any performance upgrades?", "polestar"),
    ("How is Polestar's autonomous driving technology?", "polestar"),
    ("What is the battery capacity of the Polestar 2?", "polestar"),
    ("How does Polestar differ from Volvo?", "polestar"),
    ("Is Polestar planning a fully electric SUV?", "polestar"),
    ("How does the Polestar 4 compare to other EVs?", "polestar"),
    ("What are Polestar’s sustainability goals?", "polestar"),
    ("How much does a Polestar 3 cost?", "polestar"),
    ("Does Polestar have its own fast-charging network?", "polestar"),
    # Rivian-related queries
    ("Tell me about the Rivian R1T.", "rivian"),
    ("How does Rivian's off-road capability compare to other EVs?", "rivian"),
    ("Is Rivian's charging network better than other EVs?", "rivian"),
    ("What is the range of the Rivian R1S?", "rivian"),
    ("How much does a Rivian R1T cost?", "rivian"),
    ("Tell me about Rivian’s plans for new EVs.", "rivian"),
    ("How does Rivian’s technology compare to other EVs?", "rivian"),
    ("What are the best off-road features of the Rivian R1T?", "rivian"),
    ("What’s the towing capacity of the Rivian R1T?", "rivian"),
    ("How does the Rivian R1S differ from the R1T?", "rivian"),
    ("What’s special about Rivian’s adventure network?", "rivian"),
    ("How much does it cost to charge a Rivian?", "rivian"),
    ("Does Rivian have a lease program?", "rivian"),
    ("What are Rivian’s future expansion plans?", "rivian"),
    ("How long does it take to charge a Rivian at home?", "rivian"),
    # None category (general knowledge)
    ("What is the capital of France?", None),
    ("How many people live in the US?", None),
    ("When is the best time to visit Bali?", None),
    ("How do I learn a language?", None),
    ("Tell me an interesting fact.", None),
    ("What is the best programming language?", None),
    ("I'm interested in learning about llama 2.", None),
    ("What is the capital of the moon?", None),
    ("Who was the first person to walk on the moon?", None),
    ("What’s the best way to cook a steak?", None),
    ("How do I start a vegetable garden?", None),
    ("What’s the most popular dog breed?", None),
    ("Tell me about the history of the Roman Empire.", None),
    ("How do I improve my photography skills?", None),
    ("What are some good book recommendations?", None),
    ("How does the stock market work?", None),
    ("What’s the best way to stay fit?", None),
    ("What’s the weather like in London today?", None),
    ("Who won the last FIFA World Cup?", None),
    ("What’s the difference between a crocodile and an alligator?", None),
    ("Tell me about the origins of jazz music.", None),
    ("What’s the fastest animal on land?", None),
    ("How does Bitcoin mining work?", None),
    ("What are the symptoms of the flu?", None),
    ("How do I start a YouTube channel?", None),
    ("What’s the best travel destination for solo travelers?", None),
    ("Who invented the light bulb?", None),
    ("What are the rules of chess?", None),
    ("Tell me about ancient Egyptian mythology.", None),
    ("How do I train my dog to sit?", None),
    ("What’s the difference between espresso and regular coffee?", None),
    ("What’s a good beginner-friendly programming language?", None),
    ("What are some good stretching exercises?", None),
    ("How do I bake a chocolate cake?", None),
    ("What’s the best way to save money?", None),
    ("How do airplanes stay in the air?", None),
    ("What are the benefits of meditation?", None),
    ("How do I learn basic Spanish?", None),
    ("What’s the best way to pack for a trip?", None),
    ("What’s the most common phobia?", None),
    ("How do I take care of a bonsai tree?", None),
    ("What’s the best way to clean a laptop keyboard?", None),
    ("Tell me about the Great Wall of China.", None),
    ("What’s the best way to learn to swim?", None),
    ("How does WiFi work?", None),
    ("What’s the healthiest type of bread?", None),
    ("What’s the origin of the word ‘quarantine’?", None),
    ("How do I find a good apartment?", None),
    ("What are some good mindfulness techniques?", None),
    ("How do I set up a home theater system?", None),
]

With the larger test set, the measured accuracy is a more reliable estimate of the router's performance.

In [234]:
# unpack the test data
X, y = zip(*test_data)

X = list(X)
y = list(y)

print(X)
print(y)

['Tell me about the BYD Seal.', 'What is the battery capacity of the BYD Dolphin?', "How does BYD's Blade Battery work?", 'Is the BYD Atto 3 a good EV?', 'What’s the range of the BYD Tang?', 'Does BYD offer fast-charging stations?', 'How is the BYD Han different from the Seal?', 'Is BYD the largest EV manufacturer in China?', 'What is the top speed of the BYD Seal?', 'Compare the BYD Dolphin and the BYD Atto 3.', 'How does BYD’s battery technology compare to Tesla’s?', 'What makes the BYD Blade Battery safer?', 'Does BYD have plans to expand to Europe?', 'How efficient is the BYD Tang in terms of range?', 'What are the latest BYD electric vehicle models?', 'How does the BYD Han compare to the Tesla Model S?', 'What is the warranty on BYD EV batteries?', 'Which BYD model is the best for long-distance driving?', 'Does BYD manufacture its own battery cells?', 'Is Tesla better than BYD?', 'Tell me about the Tesla Model 3.', 'How does Tesla’s autopilot compare to other EVs?', 'What’s new in

To see how thresholds affect accuracy, we can set each route's threshold manually with the `set_threshold` method.

In [235]:
first_router.set_threshold(route_name="byd", threshold=0.42424242424242425)
first_router.set_threshold(route_name="tesla", threshold=0.31313131313131315)
first_router.set_threshold(route_name="polestar", threshold=0.84640342822161)
first_router.set_threshold(route_name="rivian", threshold=0.12121212121212122)

We can then confirm the new thresholds and evaluate the router again to see the change in accuracy.

In [236]:
route_thresholds = first_router.get_thresholds()
print("Manually set route thresholds:", route_thresholds)

Manually set route thresholds: {'byd': 0.42424242424242425, 'tesla': 0.31313131313131315, 'polestar': 0.84640342822161, 'rivian': 0.12121212121212122}


In [237]:
# evaluate using the manually set thresholds
accuracy = first_router.evaluate(X=X, y=y)
print(f"Accuracy: {accuracy*100:.2f}%")

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:27 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.24s/it]

Accuracy: 68.42%





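Alternatively, we can use the `fit` method, which searches for per-route thresholds that maximize accuracy on the data we pass in. Because we then evaluate on the same data used for fitting, the resulting accuracy is likely to be optimistic.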

In [238]:
# Call the fit method
first_router.fit(X=X, y=y)

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:30 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.29s/it]
Training: 100%|██████████| 500/500 [00:16<00:00, 31.17it/s, acc=0.95]


In [239]:
route_thresholds = first_router.get_thresholds()
print("Updated route thresholds:", route_thresholds)

Updated route thresholds: {'byd': 0.4141414141414142, 'tesla': 0.4746046321803898, 'polestar': 0.7575757575757577, 'rivian': 0.7373737373737375}
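
To illustrate the idea behind threshold fitting, the sketch below runs a simple one-route-at-a-time grid search over precomputed scores. This is not the algorithm used by `fit`; the `fit_thresholds` helper, the grid size, and the sample scores are all assumptions chosen to keep the example small.

```python
def predict(scores, thresholds):
    """Best-scoring route if it clears its threshold, else None."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= thresholds[best] else None


def accuracy(samples, thresholds):
    hits = sum(predict(scores, thresholds) == label for scores, label in samples)
    return hits / len(samples)


def fit_thresholds(samples, routes, steps=100):
    """Greedy grid search: tune one route's threshold at a time."""
    grid = [i / (steps - 1) for i in range(steps)]
    thresholds = {route: 0.0 for route in routes}
    for route in routes:
        best_t, best_acc = thresholds[route], accuracy(samples, thresholds)
        for t in grid:
            acc = accuracy(samples, {**thresholds, route: t})
            if acc > best_acc:  # keep the first threshold that improves accuracy
                best_t, best_acc = t, acc
        thresholds[route] = best_t
    return thresholds


# hypothetical (scores, expected label) pairs; None means "no route"
samples = [
    ({"byd": 0.80, "tesla": 0.20}, "byd"),
    ({"byd": 0.70, "tesla": 0.30}, "byd"),
    ({"byd": 0.20, "tesla": 0.90}, "tesla"),
    ({"byd": 0.30, "tesla": 0.25}, None),
    ({"byd": 0.10, "tesla": 0.35}, None),
]
baseline = accuracy(samples, {"byd": 0.0, "tesla": 0.0})
fitted = fit_thresholds(samples, ["byd", "tesla"])
print(baseline, accuracy(samples, fitted))
```

The `None`-labelled samples are what push the thresholds up: without them, a threshold of zero would already classify every in-scope query correctly.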


In [240]:
accuracy = first_router.evaluate(X=X, y=y)
print(f"Accuracy: {accuracy*100:.2f}%")

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:49 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.55s/it]

Accuracy: 94.74%





### Creating Second Hybrid Router for Scam Detection

In this section we repeat the same process to build a second router.

This router acts as a scam detector: it looks for common scam-like utterances and routes them separately from ordinary queries.

In [241]:
# Route for scam-like queries (blocked or redirected)
scam = Route(
    name="scam",
    utterances=[
        "Can you give me a discount?",
        "I need to pay you in bitcoin",
        "I need to pay you in cash",
        "I need to pay you in gift card",
        "I want you to pay me in bitcoin",
        "I want you to pay me in cash",
        "I want you to pay me in gift card",
        "Could you lower the price?",
    ],
)

# Route for ordinary, non-scam queries (allowed)
other = Route(
    name="other",
    utterances=[
        "What is the price of the product?",
        "What is the delivery time?",
        "What is the return policy?",
        "What is the warranty?",
        "What is the refund policy?",
        "What is the shipping cost?",
        "What is the shipping time?",
        "What is the shipping policy?",
        "How much can I sell my EV for?",
        "How much can I sell my Tesla for?",
        "How much can I sell my Polestar for?",
        "How much can I sell my Rivian for?",
        "How much can I sell my BYD for?",
        "How much can I sell my other EV for?",
    ],
)

# Combine all routes
routes = [scam, other]

In [242]:
second_router = HybridRouter(
    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync="local"
)

2025-03-23 13:11:51 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-03-23 13:11:52 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [243]:
test_data = [
    # scam routes
    ("Can you give me a discount?", "scam"),
    ("I need to pay you in bitcoin", "scam"),
    ("I need to pay you in cash", "scam"),
    ("I need to pay you in gift card", "scam"),
    ("I want you to pay me in bitcoin", "scam"),
    ("I want you to pay me in cash", "scam"),
    ("I want you to pay me in gift card", "scam"),
    ("Could you lower the price?", "scam"),
    ("Can I pay with cryptocurrency?", "scam"),
    ("Do you accept payments through Venmo only?", "scam"),
    ("Can I send you money via Western Union?", "scam"),
    ("Can I overpay and get a refund?", "scam"),
    ("Can we complete this deal outside the platform?", "scam"),
    ("I'll pay you later, just ship it now.", "scam"),
    ("Can I get a refund without returning the item?", "scam"),
    ("I’ll send extra money if you process this quickly.", "scam"),
    ("Can you mark this transaction as a gift?", "scam"),
    ("Can I use multiple gift cards to pay?", "scam"),
    ("Can you split the payment across different methods?", "scam"),
    ("Can you wire me money first as a guarantee?", "scam"),
    ("Can you send the product before I pay?", "scam"),
    ("Can you help me transfer money?", "scam"),
    ("Can you provide fake receipts?", "scam"),
    ("Can you process my payment through an unusual method?", "scam"),
    ("Can I pay you in prepaid debit cards?", "scam"),
    # other routes
    ("What is the price of the product?", "other"),
    ("What is the delivery time?", "other"),
    ("What is the return policy?", "other"),
    ("Do you offer international shipping?", "other"),
    ("How long does it take for delivery?", "other"),
    ("Is there a warranty for this product?", "other"),
    ("Do you provide customer support?", "other"),
    ("Can I track my order?", "other"),
    ("Is express shipping available?", "other"),
    ("What payment methods do you accept?", "other"),
    ("Do you offer bulk discounts?", "other"),
    ("What are the shipping costs?", "other"),
    ("Can I cancel my order?", "other"),
    ("Do you have a physical store?", "other"),
    ("Can I change my shipping address?", "other"),
    ("Is there a restocking fee for returns?", "other"),
    ("Do you have customer reviews?", "other"),
    ("Is this product available in other colors?", "other"),
    ("Do you provide installation services?", "other"),
    ("How can I contact customer service?", "other"),
    ("Are there any current promotions or sales?", "other"),
    ("Can I pick up my order instead of delivery?", "other"),
    # add some None routes to prevent excessively small thresholds
    ("What is the capital of France?", None),
    ("How many people live in the US?", None),
    ("When is the best time to visit Bali?", None),
    ("How do I learn a language?", None),
    ("Tell me an interesting fact.", None),
    ("What is the best programming language?", None),
    ("I'm interested in learning about llama 2.", None),
    ("What is the capital of the moon?", None),
    ("Who discovered gravity?", None),
    ("What are some healthy breakfast options?", None),
    ("How do I start a vegetable garden?", None),
    ("What are the symptoms of the flu?", None),
    ("What’s the most spoken language in the world?", None),
    ("How does WiFi work?", None),
    ("What are the benefits of meditation?", None),
    ("How do I improve my memory?", None),
    ("What is the speed of light?", None),
    ("Who wrote 'To Kill a Mockingbird'?", None),
    ("How does an electric car work?", None),
    ("What’s the best way to save money?", None),
    ("How do I bake a chocolate cake?", None),
    ("What’s the healthiest type of bread?", None),
    ("Who invented the internet?", None),
    ("How do airplanes stay in the air?", None),
    ("What are some famous landmarks in Italy?", None),
    ("What’s the difference between a virus and bacteria?", None),
    ("How do I learn to play the guitar?", None),
    ("What’s the best way to learn to swim?", None),
    ("What’s the tallest mountain in the world?", None),
    ("How does the stock market work?", None),
]

In [244]:
# unpack the test data
X, y = zip(*test_data)

X = list(X)
y = list(y)

print(X)
print(y)

['Can you give me a discount?', 'I need to pay you in bitcoin', 'I need to pay you in cash', 'I need to pay you in gift card', 'I want you to pay me in bitcoin', 'I want you to pay me in cash', 'I want you to pay me in gift card', 'Could you lower the price?', 'Can I pay with cryptocurrency?', 'Do you accept payments through Venmo only?', 'Can I send you money via Western Union?', 'Can I overpay and get a refund?', 'Can we complete this deal outside the platform?', "I'll pay you later, just ship it now.", 'Can I get a refund without returning the item?', 'I’ll send extra money if you process this quickly.', 'Can you mark this transaction as a gift?', 'Can I use multiple gift cards to pay?', 'Can you split the payment across different methods?', 'Can you wire me money first as a guarantee?', 'Can you send the product before I pay?', 'Can you help me transfer money?', 'Can you provide fake receipts?', 'Can you process my payment through an unusual method?', 'Can I pay you in prepaid debi

In [245]:
# Call the fit method
second_router.fit(X=X, y=y)

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:54 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.36s/it]
Training: 100%|██████████| 500/500 [00:07<00:00, 63.91it/s, acc=0.84]


In [246]:
accuracy = second_router.evaluate(X=X, y=y)
print(f"Accuracy: {accuracy*100:.2f}%")

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:12:04 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.13s/it]

Accuracy: 84.42%





In [247]:
route_thresholds = second_router.get_thresholds()
print("Updated route thresholds:", route_thresholds)

Updated route thresholds: {'scam': 0.36363636363636365, 'other': 0.33333333333333337}
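
A single accuracy figure does not show which class the router struggles with. As a hedged sketch, the hypothetical `per_label_recall` helper below breaks results down by expected label; the example labels and predictions are made up for illustration.

```python
from collections import Counter


def per_label_recall(y_true, y_pred):
    """For each expected label, the fraction of samples predicted correctly."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / total[label] for label in total}


# hypothetical expected labels and predictions
y_true = ["scam", "scam", "other", "other", None, None]
y_pred = ["scam", "other", "other", "other", None, "scam"]
recall = per_label_recall(y_true, y_pred)
print(recall)
```

With the real router, the predictions could be collected with something like `[second_router(x).name for x in X]`, following the calling pattern used earlier in this notebook. For a guardrail, a low recall on the `scam` label matters more than the overall accuracy.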


### Creating GraphAI workflow with Sparse Router Detection

Now we move on to using these routers in GraphAI.


First we import the `OpenAILLM` class from the `semantic_router.llms` package.

With this we can define the LLM, using the `gpt-4o-2024-08-06` model.

In [248]:
from semantic_router.llms import OpenAILLM

llm = OpenAILLM(name="gpt-4o-2024-08-06")

Now we define the nodes that will be used in the graph.

Each node is a function decorated with the `@node` decorator from the `graphai` package, and each branching step is a function decorated with `@router`.

The query is passed through to each node that declares it as a parameter.

For this example we create a `Respond` node that uses the `llm` to respond to the user's query.

We also create `Check` and `CheckScam` nodes, each followed by a router node (`Check_Router` and `CheckScam_Router`) that uses the corresponding hybrid router to decide where the query goes next.

If the query is about BYD and is not flagged as a scam, the LLM responds to it. If the query is about Tesla, Polestar, or Rivian, or looks like a scam, we return a predefined message instead.

In [266]:
from graphai import router, node
from semantic_router.schema import Message


@node(start=True)  # start of the graph requires start=True
async def Check():  # acts as a starting node which will lead into the `Check_Router` router node
    print("Check")
    return {"result": "Checking for BYD specific queries"}


@router()  # any node that splits the flow of the graph requires the @router() decorator
async def Check_Router(
    query: str,
):  # acts as a router node which will check the query for any completion criteria
    print("Check_Router")
    result = first_router(text=query)
    if result.name == "byd":
        return {"result": "Checking Scam", "choice": "CheckScam"}
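    else:
        # every router branch needs a "choice"; off-topic queries go straight to Node_End
        return {
            "result": f"We don't talk about {result.name} here",
            "choice": "Node_End",
        }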


@node()
async def CheckScam():  # acts as a node which will lead into the `CheckScam_Router` router node
    print("CheckScam")
    return {"result": "Checking for Scam specific queries"}


@router()
async def CheckScam_Router(
    query: str,
):  # acts as a router node which will check the query for any scam specific criteria
    print("CheckScam_Router")
    result = second_router(text=query)
    if result.name == "other":
        return {"result": "Responding to query", "choice": "Respond"}
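    else:
        # every router branch needs a "choice"; flagged queries go straight to Node_End
        return {
            "result": f"We don't talk about {result.name} here",
            "choice": "Node_End",
        }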


@node()
async def Respond(query: str):  # node that will respond to the user's query
    print("Respond")
    messages = [
        Message(
            role="system", content="""You are a helpful assistant, be wary of scams."""
        ),
        Message(
            role="user",
            content=(f"Response to the following query from the user: {query}\n"),
        ),
    ]
    response = llm(messages=messages)
    return {"result": response}


@node(end=True)  # as this is the final node it requires end=True
async def Node_End():  # final node that will end the graph
    print("Node_End")
    return {"output": "Completed"}

Next we need to define the `graph` object.

In [275]:
from graphai import Graph

graph = Graph()

For each node we need to add it to the graph.

Then we will need to declare the routers and define the sources, router function, and destinations.

Then we can build the graph by adding edges between the nodes.

In [None]:
for node_fn in [
    Check,
    CheckScam,
    Respond,
    Node_End,
]:  # list of nodes to add to the graph
    graph.add_node(node_fn)

# adding the first router
graph.add_router(
    sources=[Check],  # where the router will lead from
    router=Check_Router,  # the router function
    destinations=[CheckScam, Node_End],  # where the router will lead to
)

# adding the second router
graph.add_router(
    sources=[CheckScam], router=CheckScam_Router, destinations=[Respond, Node_End]
)

graph.add_edge(
    source=Respond, destination=Node_End
)  # adding the edge between the Respond and Node_End nodes
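
The control flow built above can be pictured as a small state machine: each step returns a `choice` naming the next step, and the shared state accumulates results. The sketch below reproduces that flow in plain `asyncio` without `graphai`. The keyword checks are hypothetical stand-ins for `first_router` and `second_router`, and the node names are illustrative.

```python
import asyncio


async def check_router(state):
    # stand-in for first_router: only BYD queries continue
    if "byd" in state["query"].lower():
        return {"choice": "check_scam_router"}
    return {"choice": "end", "result": "off-topic"}


async def check_scam_router(state):
    # stand-in for second_router: flag an obvious scam phrase
    if "gift card" in state["query"].lower():
        return {"choice": "end", "result": "possible scam"}
    return {"choice": "respond"}


async def respond(state):
    # stand-in for the LLM call
    return {"choice": "end", "result": f"answering: {state['query']}"}


NODES = {
    "check_router": check_router,
    "check_scam_router": check_scam_router,
    "respond": respond,
}


async def run_graph(query):
    """Follow each node's 'choice' until the end, merging outputs into the state."""
    state = {"query": query}
    current = "check_router"
    while current != "end":
        output = await NODES[current](state)
        current = output.pop("choice")
        state.update(output)
    return state


if __name__ == "__main__":
    print(asyncio.run(run_graph("How much can I sell my BYD for?")))
```

Outside a notebook, where top-level `await` is not available, a coroutine such as this is typically run with `asyncio.run`.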

In [None]:
# graph.visualize() ~ currently broken

To check the start and end nodes we can use the `start_node` and `end_nodes` attributes.

In [271]:
graph.start_node, graph.end_nodes

(graphai.nodes.base._Node._node.<locals>.NodeClass,
 [graphai.nodes.base._Node._node.<locals>.NodeClass])

Then we can use the `execute` method to pass in a query and get the response.

In [272]:
response = await graph.execute(input={"query": "how much can i sell my byd for?"})

Check
Check_Router
how much can i sell my byd for?


2025-03-23 14:17:08 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


byd
CheckScam
CheckScam_Router


2025-03-23 14:17:10 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Respond


2025-03-23 14:17:14 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Node_End


The `response` returned by `graph.execute` holds the final state of the graph, so we can print both the result and the original query from it.

In [263]:
print("Response: ", response["result"])
print("Query: ", response["query"])

Response:  To determine how much you can sell your BYD vehicle for, you'll need to consider several factors:

1. **Model and Year**: The specific model and year of your BYD vehicle will significantly impact its value. Newer models or those with desirable features typically sell for more.

2. **Condition**: The overall condition of the car, including the exterior, interior, and mechanical components, will affect its price. Cars in excellent condition with no major issues will fetch higher prices.

3. **Mileage**: Lower mileage usually increases a car's value, as it suggests less wear and tear.

4. **Market Demand**: The demand for BYD vehicles in your area can influence the selling price. If there is high demand and low supply, you might be able to sell for a higher price.

5. **Location**: Prices can vary based on your geographic location due to differences in demand and local market conditions.

6. **Modifications and Features**: Any additional features or
Query:  how much can i sell 