In [None]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document
import os

# LLM Layer (RAG Chatbot)

This notebook sets up the final **LLM layer** for our Retrieval-Augmented Generation (RAG) chatbot. 

✅ It connects to a local Chroma DB containing vector embeddings of the documents.  
✅ Uses an open-source model (`deepseek-r1-distill-llama-70b`) served through the **Groq** API for natural language generation.  
✅ Combines both for a **conversational chatbot** that:
- Retrieves relevant document chunks from Chroma DB.
- Generates contextual answers with the LLM.
- Maintains conversational memory for a natural chat experience.

This is the last step of our RAG pipeline, enabling us to answer user questions interactively and conversationally.
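
The helper functions in this notebook send each question to the model on its own, so conversational memory needs a small amount of extra state. One possible approach, assuming the chat-completions message format (a list of `role`/`content` dictionaries), is to keep a rolling window of recent turns and prepend it to every request. `ChatMemory` and `max_turns` are hypothetical names introduced for illustration; they are not part of the notebook.

```python
from collections import deque


class ChatMemory:
    """Keeps the most recent user/assistant turns for a chat-completions payload."""

    def __init__(self, system_prompt: str, max_turns: int = 5):
        self.system_prompt = system_prompt
        # Each turn is two messages (user + assistant), so cap at 2 * max_turns.
        self.history = deque(maxlen=2 * max_turns)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        """Record one completed exchange; the oldest turns fall off automatically."""
        self.history.append({"role": "user", "content": user_msg})
        self.history.append({"role": "assistant", "content": assistant_msg})

    def build_messages(self, new_user_msg: str) -> list:
        """Return system prompt + remembered turns + the new user message."""
        return (
            [{"role": "system", "content": self.system_prompt}]
            + list(self.history)
            + [{"role": "user", "content": new_user_msg}]
        )


memory = ChatMemory("You are a helpful assistant.", max_turns=2)
memory.add_turn("First question", "First answer")
messages = memory.build_messages("Follow-up question")
print(len(messages))  # system + 2 remembered messages + new question
```

The list returned by `build_messages` could be passed as the `messages` field of the request body in place of a single user message. A fixed window is the simplest policy; summarizing older turns is a common alternative when conversations run long.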


In [None]:
# Initialize Chroma retriever
persist_directory = "chroma_db"
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

chroma_db = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding_model
)
retriever = chroma_db.as_retriever()
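
To make the retrieval step concrete, here is a toy illustration of top-k similarity search using hand-written 3-dimensional vectors. It is only a sketch of the idea: the real retriever embeds text with `all-MiniLM-L6-v2` and delegates the search to Chroma, whose distance metric depends on how the collection was configured. The vectors, the chunk names, and the `top_k_chunks` helper are all made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embeddings for three document chunks.
toy_chunks = {
    "pricing": [0.9, 0.1, 0.0],
    "api_setup": [0.1, 0.9, 0.2],
    "research": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # pretend embedding of an API-related question
print(top_k_chunks(query, toy_chunks, k=2))
```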

In [None]:
# Groq API function
import requests

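GROQ_API_KEY = os.getenv("GROQ_API_KEY")
if not GROQ_API_KEY:
    # Fail early with a clear message instead of sending "Bearer None" to the API.
    raise RuntimeError("Set the GROQ_API_KEY environment variable before running this cell.")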

def get_groq_completion(prompt: str) -> str:
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {GROQ_API_KEY}"
    }
    data = {
        "model": "deepseek-r1-distill-llama-70b",
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant. Provide clear, concise, and professional answers. "
                    "Potential buyers may interact with you, so promote the product subtly without being pushy."
                )
            },
            {
                "role": "user",
                "content": prompt
            }
        ]
    }
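    response = requests.post(
        "https://api.groq.com/openai/v1/chat/completions",
        headers=headers,
        json=data,
        timeout=60,  # avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status()
    result = response.json()["choices"][0]["message"]["content"]

    # This model may wrap its reasoning in <think>...</think>;
    # keep only the text after the closing tag.
    if "<think>" in result and "</think>" in result:
        result = result.split("</think>")[-1].strip()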

    return result.strip()
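
The `split("</think>")` approach above keeps only the text after the last closing tag. A slightly more defensive variant, sketched here with a regular expression, removes every complete `<think>` block and also handles a block that was opened but never closed (for example when a reply is cut off). `strip_reasoning` is a hypothetical helper name, and its behavior differs from the original in that text before a reasoning block is preserved.

```python
import re

# Non-greedy match so that several separate <think> blocks are removed individually.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> sections from a model reply."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If a block was opened but never closed, drop everything from the tag onward.
    cleaned = cleaned.split("<think>")[0]
    return cleaned.strip()


print(strip_reasoning("<think>plan the answer</think>Hello there"))
```

Note that a reply consisting solely of an unfinished reasoning block comes back as an empty string, which the caller may want to treat as an error.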

In [None]:
# Final context-aware QA function
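def answer_with_context(question: str, top_k: int = 10) -> str:
    # Query the vector store directly so that top_k controls how many chunks come back;
    # top_k is not a documented argument of retriever.get_relevant_documents and may be ignored there.
    top_chunks: list[Document] = chroma_db.similarity_search(question, k=top_k)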
    combined_context = "\n\n".join(chunk.page_content for chunk in top_chunks)

    prompt = (
        f"Context:\n{combined_context}\n\n"
        f"Question: {question}\n\n"
        f"Answer concisely and professionally."
    )
    answer = get_groq_completion(prompt)
    return answer
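
With `top_k=10` the combined context can grow large, and the model has no way to refer back to individual chunks. The following sketch shows one way the prompt assembly could be factored out, with a rough character budget and numbered chunks. `build_prompt`, `max_context_chars`, and the citation instruction are assumptions added for illustration, not part of the original function; the function takes plain strings, so it would be called with `[c.page_content for c in top_chunks]`.

```python
def build_prompt(chunks, question, max_context_chars=6000):
    """Assemble a RAG prompt from text chunks, stopping once the budget is used up."""
    parts = []
    used = 0
    for i, text in enumerate(chunks, start=1):
        block = f"[{i}] {text.strip()}"
        # Approximate budget: separators between blocks are not counted.
        if used + len(block) > max_context_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer concisely and professionally, citing chunk numbers like [1]."
    )


demo = build_prompt(["alpha", "beta", "gamma"], "Which chunk comes first?", max_context_chars=20)
print(demo)
```

In the demo call the budget of 20 characters admits the first two chunks and drops the third, which is the behavior to expect when retrieved chunks exceed the limit.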


In [11]:
question = "How do I integrate the Rockfish API into my existing stack? Give me clear and concise code examples. Assume that I have the API key."
answer = answer_with_context(question)
print("\nAnswer:\n", answer)


Answer:
 Certainly! Integrating the Rockfish API into your existing stack is straightforward. Here’s a concise guide to get you started:

### Step 1: Install the Rockfish SDK
Install the Rockfish SDK using your preferred package manager:

```bash
# For Python (using pip)
pip install rockfish-sdk

# For JavaScript (using npm)
npm install @rockfish/sdk

# For Java (using Maven)
<dependency>
    <groupId>com.rockfish</groupId>
    <artifactId>rockfish-sdk</artifactId>
    <version>1.0.0</version>
</dependency>
```

### Step 2: Initialize the Rockfish Client
Use your API key to initialize the client:

```python
from rockfish import RockfishClient

# Initialize the client with your API key
client = RockfishClient(api_key="your_api_key_here")
```

### Step 3: Create a Configuration File
Create a Rockfish config file (e.g., `rockfish_config.json`) with your API credentials and settings:

```json
{
    "profiles": {
        "default": {
            "api_key": "your_api_key_here",
            

In [None]:
question = "What types of data can it create? Can it create data for a specific industry?"
answer = answer_with_context(question)
print("\nAnswer:\n", answer)

In [12]:
question = "What is the research behind the product? What is the technology behind the product? Who are the main researchers behind the product?"
answer = answer_with_context(question)
print("\nAnswer:\n", answer)


Answer:
 **Research Behind the Product:**  
The research focuses on synthetic data generation, data adaptability, recommendation engines, and data privacy, addressing challenges in sparse data environments and fraud detection across industries like e-commerce, healthtech, and life sciences.

**Technology Behind the Product:**  
Built on proprietary technology from Carnegie Mellon University, the platform utilizes advanced machine learning algorithms, neural architectures, and specialized data encoding techniques to adapt to various data schemas and types.

**Main Researchers:**  
While specific names aren't provided, the team includes leading researchers and engineers from Carnegie Mellon University, known for their expertise in AI and data science, with multiple patents filed reflecting their innovative contributions.


In [None]:
question = "Can you cite some technical details of the product from a research standpoint? What are the main technical features of the product?"
answer = answer_with_context(question)
print("\nAnswer:\n", answer)

In [14]:
question = "What do your customers say about the product? Give concrete examples and cite the sources. How do I know whether the product creates useful data?"
answer = answer_with_context(question)
print("\nAnswer:\n", answer)


Answer:
 Our product has received positive feedback from customers across various industries, who appreciate its ability to handle real-time operational data. Here’s a structured overview of customer sentiments and how to assess the product's effectiveness:

1. **E-commerce Industry**: Customers value the product for its real-time alerts when a SKU is out of stock or when a new product launches, enabling quick restocking and maintaining customer satisfaction.

2. **Healthtech and Life Sciences**: In these fields, where data is often scarce, customers commend the product's ability to manage sparse data, facilitating informed decision-making despite limited information.

3. **Fraud Detection**: Businesses in finance and e-commerce appreciate the product's enhanced fraud detection capabilities, which help reduce fraudulent transactions and build customer trust.

To determine if the product creates useful data, consider the following outcomes:
- **Reduced Stockouts**: The product's alerts

In [13]:
question = "I am on the fence about buying the product. Can you convince me to buy it? Why Rockfish?"
answer = answer_with_context(question)
print("\nAnswer:\n", answer)


Answer:
 Thank you for considering Rockfish Solution. We understand that choosing the right product is a significant decision. Rockfish stands out as a unique and differentiated solution in the synthetic data space, backed by a team of experts dedicated to innovation and support. Our strong partner ecosystem and collaborative approach ensure that we can address your specific needs effectively. If you're ready to explore how Rockfish can benefit your business, we're here to guide you through the next steps. Let's discuss how we can support your goals together.
