<div style="display: flex; align-items: center; gap: 40px;">

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSkez75fZoo82SccEXRMVRlj9sZsQifRUhURQ&s" width="200">



<div>
  <h2>Cerebras Inference</h2>
  <p>Cerebras Systems builds the world's largest computer chip, the Wafer Scale Engine (WSE), designed specifically for AI workloads. This cookbook provides examples, tutorials, and best practices for developing and deploying AI models on Cerebras infrastructure, covering both training on WSE clusters and fast inference via Cerebras Cloud.</p>

<div>
  <h2>RAG Fusion Using Cerebras</h2>
  <p>RAG-Fusion extends the standard Retrieval-Augmented Generation (RAG) pipeline. Given a user query, a large language model first generates several related sub-queries, and each sub-query is run against the retriever so that a wider set of relevant documents is found. Instead of passing the retrieved documents straight to the model, RAG-Fusion applies Reciprocal Rank Fusion (RRF): each document is scored by its rank in every result list, and the lists are merged into a single ranking. The top-ranked documents are then used as context for generating the final response.</p>



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wRGD4teTh3LyJzZwzeGCgojjGPplLbKL?usp=sharing)


## Get Your API Keys

Before you begin, make sure you have:

1. A Cerebras API key (get yours at the [Cerebras API page](https://cloud.cerebras.ai/))
2. An OpenAI API key, used in this notebook only for the embeddings
3. Basic familiarity with Python and Jupyter notebooks

This notebook is designed to run in Google Colab, so no local Python installation is required.

### 🔧 Step 1: Install Required Libraries

In [None]:
!pip install -qU langchain langchain_openai langchain_community langchain_text_splitters chromadb langsmith datasets

### 🔑 Step 2: Set Environment Variables (API Keys)

In [None]:
import os
from google.colab import userdata

# Read the API keys from Colab secrets
os.environ["CEREBRAS_API_KEY"] = userdata.get("CEREBRAS_API_KEY")
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")  # Used for embeddings
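
Outside Colab, `google.colab.userdata` is not available. A minimal sketch of a fallback, assuming the keys are either already exported in the shell or typed in interactively; `require_key` is a hypothetical helper, not part of any library:

```python
import os
from getpass import getpass


def require_key(name: str, env=os.environ, prompt=getpass) -> str:
    """Return the secret stored under `name`, asking for it only if it is missing."""
    value = env.get(name, "").strip()
    if not value:
        # Hypothetical fallback: prompt without echoing the secret to the screen.
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} is required but was not provided")
        env[name] = value
    return value
```

Calling `require_key("CEREBRAS_API_KEY")` and `require_key("OPENAI_API_KEY")` at the top of a local notebook would then leave both variables set for the later cells.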

### 🔹 Step 3: Load Documents and Split

In [None]:
from langchain_community.document_loaders import CSVLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = CSVLoader("/content/sample_data/california_housing_test.csv")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
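
To see what ends up in each chunk: `CSVLoader` turns every CSV row into one document whose text is a list of `column: value` lines. The sketch below approximates that behaviour with the standard library only. It is an illustration, not the loader's actual code, and the two rows and their column names are made up:

```python
import csv
import io

# Two made-up rows; the column names are assumed for illustration.
SAMPLE_CSV = """housing_median_age,total_rooms,median_income
27,3885,6.6085
43,1510,3.5990
"""


def rows_to_texts(csv_text: str) -> list[str]:
    """Render each CSV row as newline-separated 'column: value' pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


texts = rows_to_texts(SAMPLE_CSV)
print(texts[0])
# Check whether every row fits inside the 500-character chunk size.
print(all(len(text) <= 500 for text in texts))
```

Rows this short fit well inside `chunk_size=500`, so the splitter should leave them whole and each chunk corresponds to one CSV row.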

### 🔹 Step 4: Setup Embeddings and Vector Store (ChromaDB)

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()

vectorstore = Chroma.from_documents(documents, embeddings)

### 🔹 Step 5: Create Retriever

In [None]:
retriever = vectorstore.as_retriever()
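
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors lie closest to it. A toy sketch with hand-written 3-dimensional vectors and cosine similarity follows. The assumptions are that real embeddings have far more dimensions and that Chroma's actual index and distance metric may differ; `top_k` and `toy_store` are hypothetical names:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical store of (text, vector) pairs.
toy_store = [
    ("row about room counts", [0.9, 0.1, 0.0]),
    ("row about income", [0.1, 0.9, 0.0]),
    ("row about location", [0.0, 0.2, 0.9]),
]

nearest = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(nearest)
```

In the notebook itself, the number of returned documents could be set explicitly with `vectorstore.as_retriever(search_kwargs={"k": 4})`.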

### 🔹 Step 6: Initialize Cerebras LLM and LangSmith Prompt Client

In [None]:
from langchain_openai import ChatOpenAI
from langsmith import Client
import os

client = Client()

# Cerebras exposes an OpenAI-compatible endpoint, so ChatOpenAI only needs a different base_url
llm = ChatOpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    model="gpt-oss-120b",
)


### 🔹 Step 7: Load RAG-Fusion Query Generation Prompt from LangSmith

In [None]:
from langchain_core.output_parsers import StrOutputParser

prompt = client.pull_prompt("langchain-ai/rag-fusion-query-generation")

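generate_queries = (
    prompt
    | ChatOpenAI(
        temperature=0,
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1",
        model="gpt-oss-120b",
    )
    | StrOutputParser()
    # One query per line; blank lines are dropped so no empty query reaches the retriever
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)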
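
Splitting on newlines alone can still leave list numbering or duplicates in the generated queries, depending on how the model formats its answer. A minimal sketch of a stricter parser that could replace the final lambda; `clean_queries` is a hypothetical helper, the `limit` of 4 is an assumption, and the sample output is made up:

```python
import re


def clean_queries(raw: str, limit: int = 4) -> list[str]:
    """Split model output into queries, dropping blanks, list markers and duplicates."""
    queries = []
    for line in raw.splitlines():
        # Remove leading bullets or numbering such as "1. ", "2) " or "- ".
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if text and text not in queries:
            queries.append(text)
    return queries[:limit]


sample = "1. total rooms by housing age\n\n2. rooms in 27 year old homes\n- total rooms by housing age\n"
print(clean_queries(sample))
# ['total rooms by housing age', 'rooms in 27 year old homes']
```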


### 🔹 Step 8: Reciprocal Rank Fusion (RRF) Function

In [None]:
from langchain_core.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    fused_scores = {}
    for docs in results:
        # Each result list is assumed to be sorted best first; rank is zero-based
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            fused_scores[doc_str] += 1 / (rank + k)
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results
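
To make the scoring concrete, here is the same fusion rule applied to plain strings instead of serialized documents. Each list contributes `1 / (rank + k)` with a zero-based rank, as in the function above; the document IDs and rankings are made up:

```python
from collections import defaultdict


def rrf_scores(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs into one list sorted by RRF score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank starts at 0
            scores[doc_id] += 1 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rankings = [
    ["a", "b", "c"],  # results for sub-query 1
    ["b", "c", "d"],  # results for sub-query 2
    ["b", "a"],       # results for sub-query 3
]
fused = rrf_scores(rankings)
print([doc_id for doc_id, _ in fused])
# ['b', 'a', 'c', 'd']
```

Here `b` comes first because it appears near the top of all three lists, while `d` appears only once and ranks last.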

### 🔹 Step 9: Build RAG-Fusion Chain

In [None]:
chain = generate_queries | retriever.map() | reciprocal_rank_fusion
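
`retriever.map()` runs the retriever once per generated query, so the fusion step receives a list of ranked lists rather than a single list. In plain Python, with a hypothetical stand-in retriever and made-up rows, the data flow looks roughly like this:

```python
def map_retriever(retrieve, queries: list[str]) -> list[list[str]]:
    """Call `retrieve` for every query and keep one ranked list per query."""
    return [retrieve(query) for query in queries]


# Hypothetical retriever: looks up a fixed ranking for each query.
toy_index = {
    "rooms by housing age": ["row 12", "row 7"],
    "homes aged 27 years": ["row 7", "row 3"],
}

ranked_lists = map_retriever(lambda q: toy_index.get(q, []), list(toy_index))
print(ranked_lists)
# [['row 12', 'row 7'], ['row 7', 'row 3']]
```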

### 🔹 Step 10: Test the Fusion Chain

In [None]:
query = "what are points on a mortgage"
results = chain.invoke(query)
print("RRF Results Top 3 Documents:")
for doc, score in results[:3]:
    print(f"Score: {score:.3f} - Doc excerpt: {doc.page_content[:200]}...\n")

### 🔹 Step 11: Final RAG Answer Chain


In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context.
If you don't find the answer in the context, just say that you don't know.

Context: {context}

Question: {question}
"""

prompt_rag = ChatPromptTemplate.from_template(template)

rag_fusion_chain = (
    {
        "context": chain,
        "question": RunnablePassthrough()
    }
    | prompt_rag
    | llm
    | StrOutputParser()
)

final_answer = rag_fusion_chain.invoke(query)
print("Final RAG-Fusion Answer:\n", final_answer)
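
Because `chain` returns `(document, score)` tuples, the `{context}` slot receives their raw string representation. A minimal sketch of a formatting step that keeps only the text of the top-ranked documents; `format_context` and the `Doc` stand-in are hypothetical, and the sample values are made up:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def format_context(results, top_n: int = 3) -> str:
    """Join the text of the top_n fused results, dropping the scores."""
    return "\n\n".join(doc.page_content for doc, _score in results[:top_n])


fused = [(Doc("total_rooms: 3885"), 0.049), (Doc("total_rooms: 1510"), 0.033)]
print(format_context(fused, top_n=1))
# total_rooms: 3885
```

In the notebook, one way to use such a helper would be `"context": chain | format_context`, which also caps how many documents reach the prompt.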

### 🔹 Step 12: Prepare Data for Evaluation

In [None]:
questions = ["How many total rooms are for housing median age of 27"]
responses = []
contexts = []
ground_truths = ["housing median age sometimes also called median house age"]

for q in questions:
    responses.append(rag_fusion_chain.invoke(q))
    contexts.append([doc.page_content for doc in retriever.invoke(q)])

data = {
    "query": questions,
    "response": responses,
    "context": contexts,
    "ground_truth": ground_truths
}
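
Note that the loop above collects contexts from the plain retriever, while the answer is generated from the fused ranking. If the evaluation should see the fused context instead, one option is to record the fused results. A sketch with stand-in callables, where `fused_search` and `answer` are hypothetical placeholders for the fusion chain and the answer chain:

```python
def build_eval_data(questions, ground_truths, fused_search, answer, top_n=3):
    """Collect answers together with the top fused contexts for each question."""
    if len(questions) != len(ground_truths):
        raise ValueError("questions and ground_truths must have the same length")
    data = {"query": [], "response": [], "context": [], "ground_truth": []}
    for question, truth in zip(questions, ground_truths):
        # fused_search is assumed to return (text, score) pairs sorted by score.
        texts = [text for text, _score in fused_search(question)[:top_n]]
        data["query"].append(question)
        data["response"].append(answer(question))
        data["context"].append(texts)
        data["ground_truth"].append(truth)
    return data


demo = build_eval_data(
    ["q1"],
    ["truth1"],
    fused_search=lambda q: [("ctx a", 0.05), ("ctx b", 0.03)],
    answer=lambda q: f"answer to {q}",
    top_n=1,
)
print(demo)
```

The length check guards against a mismatch between questions and ground truths, which would otherwise surface only when the dataset is built.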


### 🔹 Step 13: Create Dataset and DataFrame

In [None]:
from datasets import Dataset
import pandas as pd

dataset = Dataset.from_dict(data)
df = pd.DataFrame(dataset)
df

In [None]:
from IPython.display import Markdown, display

display(Markdown(df["response"][0]))  # Render the first response as Markdown