# Building a Semantic Chatbot with SBERT and Together.ai API

This notebook implements a hybrid chatbot that combines semantic search using SBERT (Sentence-BERT) and response generation using a large language model (LLM) via the Together.ai API. The system retrieves the most relevant example interactions and uses them as context for producing natural, human-like responses.


In [23]:
import os
import json
import random
import torch
import requests
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer, util

## Load Together.ai API Key

We use the `python-dotenv` package to load the Together.ai API key from a `.env` file, which keeps the key out of the notebook itself.
The key is required to authenticate requests to the LLM generation endpoint.

If the key is not found, the script raises an error immediately instead of failing later with an authentication error from the API.


In [24]:
load_dotenv()
TOGETHER_API_KEY = os.getenv("TOGETHER_API_KEY")

if TOGETHER_API_KEY is None:
    raise ValueError("❌ API key not loaded. Check your .env file name or path.")

## SemanticChatbot Class

This class defines the main logic of our semantic chatbot system. It combines two components:

- **SBERT (Sentence-BERT)** for encoding user queries and pre-defined patterns into dense semantic vectors.
- **Together.ai** (via API) for generating natural, context-aware responses based on the most relevant examples.

### Main Functionalities:
- **Data Loading**: Parses a JSON file containing user patterns and response sets.
- **Embedding Generation**: Encodes all patterns into SBERT embeddings and caches them for faster reuse.
- **Semantic Retrieval**: Retrieves the top-k semantically similar patterns to the user input.
- **Prompt Construction**: Formats up to three of the retrieved examples as a few-shot prompt.
- **Response Generation**: Sends the prompt to Together.ai's LLM to synthesize a coherent, fluent response.

This hybrid approach pairs the retrieval quality of semantic search with the natural language fluency of a large language model.

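For reference, the data-loading step implies a particular file layout. The sketch below is inferred from the keys that `_load_data` reads (`intents`, `pattern`, `responses`); the sample dialogue text is invented for illustration and is not taken from the actual dataset.

```python
import json

# Hypothetical sample in the layout the class reads: a top-level "intents" list
# whose items each hold one "pattern" string and a list of "responses".
sample_json = """
{
  "intents": [
    {"pattern": "Hello, how are you?", "responses": ["I'm fine, thanks.", "Doing well!"]},
    {"pattern": "What time is it?", "responses": ["It's almost noon."]}
  ]
}
"""

data = json.loads(sample_json)

# Same extraction as in the class: two parallel lists indexed by intent position.
patterns = [item["pattern"] for item in data["intents"]]
responses = [item["responses"] for item in data["intents"]]

print(patterns)
print(responses[0])
```

Because the two lists are parallel, the `corpus_id` returned by semantic search can be used to index both the matched pattern and its candidate responses.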

In [25]:
class SemanticChatbot:
    """
    A semantic chatbot using SBERT for retrieval and Together.ai for generation.

    Attributes:
        intents_path (str): Path to the JSON file containing intent data.
        model_name (str): SentenceTransformer model name for embeddings.
        embeddings_path (str): Path to store or load precomputed embeddings.
        together_model_name (str): Name of the Together.ai model to use for generation.
    """

    def __init__(
        self,
        intents_path,
        model_name='all-MiniLM-L6-v2',
        embeddings_path="embeddings.pt",
        together_model_name="mistralai/Mixtral-8x7B-Instruct-v0.1"
    ):
        self.intents_path = intents_path
        self.embeddings_path = embeddings_path
        self.together_model_name = together_model_name
        self.model = SentenceTransformer(model_name)

        self.patterns = []
        self.responses = []
        self.embeddings = None

        self._load_data()
        self._load_or_build_embeddings()

    def _load_data(self):
        """
        Loads the intent data from the specified JSON file and extracts
        the patterns and associated responses.
        """
        with open(self.intents_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        self.patterns = [item["pattern"] for item in data["intents"]]
        self.responses = [item["responses"] for item in data["intents"]]

    def _load_or_build_embeddings(self):
        """
        Loads precomputed embeddings from disk or builds them from patterns
        using a SentenceTransformer model if not found.
        """
        if os.path.exists(self.embeddings_path):
            print(f"Loading embeddings from {self.embeddings_path}...")
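            # map_location keeps the cached tensor on the model's device, so a file
            # saved on a GPU machine still loads on a CPU-only one.
            self.embeddings = torch.load(self.embeddings_path, map_location=self.model.device)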
        else:
            print("Generating embeddings using SBERT...")
            self.embeddings = self.model.encode(self.patterns, convert_to_tensor=True)
            torch.save(self.embeddings, self.embeddings_path)
            print(f"Embeddings saved to {self.embeddings_path}")

    def build_prompt(self, user_input, top_k_hits):
        """
        Constructs a prompt to send to the language model using the top-k most
        relevant pattern-response examples.

        Args:
            user_input (str): The message from the user.
            top_k_hits (List[Dict]): List of hits from semantic search.

        Returns:
            str: Prompt formatted for generation.
        """
        examples = ""
        # Only the three best hits are used as few-shot examples,
        # even when more were retrieved.
        for hit in top_k_hits[:3]:
            idx = hit["corpus_id"]
            question = self.patterns[idx]
            answer = random.choice(self.responses[idx])
            examples += f"User: {question}\nBot: {answer}\n\n"

        prompt = (
            "You are a helpful assistant. Below are a few example conversations.\n"
            "Use them to respond to the final user input in a concise and natural way.\n\n"
            f"{examples}"
            f"User: {user_input}\nBot:"
        )
        return prompt

    def generate_with_together(self, prompt, max_tokens=150):
        """
        Generates a response using Together.ai API given a prompt.

        Args:
            prompt (str): Prompt text including context and user question.
            max_tokens (int): Maximum number of tokens to generate.

        Returns:
            str: Generated response.
        """
        headers = {
            "Authorization": f"Bearer {TOGETHER_API_KEY}",
            "Content-Type": "application/json"
        }

        data = {
            "model": self.together_model_name,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": 0.7,
            "top_p": 0.9,
            "stop": ["User:", "Bot:"]
        }

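        try:
            response = requests.post(
                "https://api.together.xyz/v1/completions",
                headers=headers,
                json=data,
                timeout=30,  # fail instead of hanging if the API never responds
            )
        except requests.RequestException as exc:
            print("Error contacting Together API:", exc)
            return "Sorry, I had a problem generating a response."

        if response.status_code != 200:
            print("Error from Together API:", response.text)
            return "Sorry, I had a problem generating a response."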

        return response.json()["choices"][0]["text"].strip()

    def get_response(self, user_input, top_k=5):
        """
        Generates a semantic chatbot response based on the top-k closest
        examples from the training patterns.

        Args:
            user_input (str): Message from the user.
            top_k (int): Number of similar examples to retrieve.

        Returns:
            str: Response from the chatbot.
        """
        query_embedding = self.model.encode(user_input, convert_to_tensor=True)
        hits = util.semantic_search(query_embedding, self.embeddings, top_k=top_k)[0]

        if not hits:
            return "I'm not sure how to respond."

        prompt = self.build_prompt(user_input, hits)
        return self.generate_with_together(prompt)

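To make the retrieval and prompt steps concrete without downloading a model, here is a minimal pure-Python sketch. It assumes toy three-dimensional vectors in place of real SBERT embeddings, and the helpers `cosine` and `top_k_hits` are hypothetical stand-ins for `util.semantic_search`, which returns hits as dictionaries with `corpus_id` and `score` keys. The first response is picked deterministically here, whereas the class uses `random.choice`.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_hits(query: List[float], corpus: List[List[float]], k: int) -> List[Dict]:
    """Rank corpus vectors by similarity to the query and keep the best k."""
    scored = [{"corpus_id": i, "score": cosine(query, vec)} for i, vec in enumerate(corpus)]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:k]

# Toy data (invented): one vector, one pattern, one response list per intent.
toy_corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_patterns = ["Hello", "What is the weather like?", "Hi, how are you?"]
toy_responses = [["Hi!"], ["It's sunny."], ["I'm fine, thanks."]]

toy_query = [1.0, 0.1, 0.0]  # stands in for the embedding of "Hey there"
hits = top_k_hits(toy_query, toy_corpus, k=2)

# Assemble the few-shot prompt in the same shape as build_prompt.
examples = ""
for hit in hits:
    idx = hit["corpus_id"]
    examples += f"User: {toy_patterns[idx]}\nBot: {toy_responses[idx][0]}\n\n"

prompt = (
    "You are a helpful assistant. Below are a few example conversations.\n"
    "Use them to respond to the final user input in a concise and natural way.\n\n"
    f"{examples}"
    "User: Hey there\nBot:"
)

print([hit["corpus_id"] for hit in hits])
print(prompt)
```

The prompt ends with an open `Bot:` turn, which is why the API request lists `User:` and `Bot:` as stop sequences: generation halts before the model starts writing a further turn.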

In [26]:
if __name__ == '__main__':
    # Initialize the semantic chatbot with the dataset and precomputed embeddings
    print("🤖 Loading the semantic chatbot using Together.ai...")

    bot = SemanticChatbot(
        intents_path="C:/Users/jihad/Desktop/ChatBot-from-Scratch/data_dailydialog/dialogues_final_format.json",
        embeddings_path="embeddings_final.pt"  # This path should match where embeddings are saved
    )

    print("\n💬 Chatbot is ready! Type a message below (or /quit to exit)\n")

    conversation_log = []

    while True:
        message = input("🗣️ You: ")
        if message.strip().lower() == "/quit":
            print("👋 Goodbye!")
            break

        # Generate response
        response = bot.get_response(message)

        # Save and display the full exchange
        conversation_log.append((message, response))
        print(f"\n📨 You said: {message}")
        print(f"🤖 Bot replied: {response}\n")

🤖 Loading the semantic chatbot using Together.ai...
Loading embeddings from embeddings_final.pt...

💬 Chatbot is ready! Type a message below (or /quit to exit)


📨 You said: Hello, how are you ? 
🤖 Bot replied: I'm doing well, thank you. I'm just here to help answer your questions. How can I assist you today?


📨 You said: How is the weather today ? 
🤖 Bot replied: It's sunny and a bit windy .


📨 You said: 
🤖 Bot replied: Thank you for your understanding .


📨 You said: What do you think of the political and economical state of the world right now ? 
🤖 Bot replied: I believe that the current political climate is causing a lot of uncertainty and unpredictability in the global economy . It is very important for world leaders to work together to address these challenges and find solutions .


📨 You said: Thank you, that's all for me.
🤖 Bot replied: Okay, have a great day!

👋 Goodbye!

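In the transcript above, an empty message was still forwarded to the model, which replied with an unrelated sentence. One possible guard, sketched here with a hypothetical helper `clean_message` that is not part of the notebook, is to skip blank input before calling `get_response`.

```python
from typing import Optional

def clean_message(raw: str) -> Optional[str]:
    """Hypothetical helper: return the stripped message, or None if it is blank."""
    text = raw.strip()
    return text if text else None

# Simulated inputs instead of input(), so the sketch runs non-interactively.
for raw in ["Hello, how are you ? ", "   ", ""]:
    message = clean_message(raw)
    if message is None:
        print("(skipped empty message)")
    else:
        print(f"would send: {message!r}")
```

In the chat loop, this check would sit just after the `/quit` test, with a `continue` when the helper returns `None`.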