
# Assembling a RAG Application

In this lab, we assemble a Retrieval-Augmented Generation (RAG) application from the components we created previously. The goal is a pipeline in which a user asks a question, the system retrieves relevant documents from a Vector Search index, and a foundation model uses them to generate the response.

In [0]:
%pip install -U -qq databricks-vectorsearch langchain==0.3.7 flashrank langchain-databricks PyPDF2
dbutils.library.restartPython()

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
petastorm 0.12.1 requires pyspark>=2.1.0, which is not installed.
jupyter-server 1.23.4 requires anyio<4,>=3.1.0, but you have anyio 4.9.0 which is incompatible.
langchain-community 0.0.38 requires langchain-core<0.2.0,>=0.1.52, but you have langchain-core 0.3.63 which is incompatible.
numba 0.57.1 requires numpy<1.25,>=1.21, but you have numpy 1.26.4 which is incompatible.
ydata-profiling 4.5.1 requires numpy<1.24,>=1.16.0, but you have numpy 1.26.4 which is incompatible.
ydata-profiling 4.5.1 requires pydantic<2,>=1.8.1, but you have pydantic 2.11.7 which is incompatible.
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
%run ../Includes/Classroom-Setup-04




The examples and models presented in this course are intended solely for demonstration and educational purposes. Please note that the models and prompt examples may sometimes contain offensive, inaccurate, biased, or harmful content.


In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location:  {DA.paths.datasets}")

Username:          labuser10914379_1753166678@vocareum.com
Catalog Name:      dbacademy
Schema Name:       labuser10914379_1753166678
Working Directory: /Volumes/dbacademy/ops/labuser10914379_1753166678@vocareum_com
Dataset Location:  NestedNamespace (arxiv='/Volumes/dbacademy_arxiv/v01', dais='/Volumes/dbacademy_dais/v01', news='/Volumes/dbacademy_news/v01', docs='/Volumes/dbacademy_docs/v01')


## Step 1: Set Up the Retriever Component
**Steps:**
1. Define the embedding model.
1. Get the vector search index that was created in the previous lab.
1. Generate a **retriever** from the vector store. The retriever should return **three results.**
1. Write a test prompt and show the returned search results.


In [0]:
## Components we created before

vs_endpoint_name="vs_endpoint_4"
vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_managed_vs_index"

In [0]:
from databricks.vector_search.client import VectorSearchClient
from langchain_databricks import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain_core.documents import Document
from flashrank import Ranker, RerankRequest

def get_retriever(cache_dir=f"{DA.paths.working_dir}/opt"):

    def retrieve(query, k: int=10):
        if isinstance(query, dict):
            query = next(iter(query.values()))

        ## get the vector search index
        vsc = VectorSearchClient(disable_notice=True)
        vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)
        
        ## get similar k documents
        return query, vs_index.similarity_search(
            query_text=query,
            columns=["pdf_name", "content"],
            num_results=k)

    def rerank(query, retrieved, cache_dir, k: int=3):
        ## format result to align with reranker lib format 
        passages = []
        for doc in retrieved.get("result", {}).get("data_array", []):
            new_doc = {"file": doc[0], "text": doc[1]}
            passages.append(new_doc)       
        ## Load the flashrank ranker
        ranker = Ranker(model_name="rank-T5-flan", cache_dir=cache_dir)

        ## rerank the retrieved documents
        rerankrequest = RerankRequest(query=query, passages=passages)
        results = ranker.rerank(rerankrequest)[:k]

        ## format the results of rerank to be ready for prompt
        return [Document(page_content=r.get("text"), metadata={"source": r.get("file")}) for r in results]

    ## the retriever is a runnable sequence of retrieving and reranking.
    return RunnableLambda(retrieve) | RunnableLambda(lambda x: rerank(x[0], x[1], cache_dir))

## test our retriever
question = {"input": "How does Generative AI impact humans?"}
retriever = get_retriever()
similar_documents = retriever.invoke(question)
print(f"Relevant documents: {similar_documents}")

Relevant documents: [Document(metadata={'source': 'dbfs:/Volumes/dbacademy_arxiv/v01/arxiv-articles/2303.10130.pdf'}, page_content='ical growth, permeating tasks, greatly impacting professions. This study probes GPTs’ potential trajectories,\npresenting a groundbreaking rubric to gauge tasks’ GPT exposure, particularly in the U.S. labor market.\n7.2 LLM Conclusion (Author-Augmented Version)\nGenerative Pre-trained Transformers (GPTs) generate profound transformations, garnering potential techno-\nlogical growth, permeating tasks, gutting professional management. Gauging possible trajectories? Generate'), Document(metadata={'source': 'dbfs:/Volumes/dbacademy_arxiv/v01/arxiv-articles/2203.02155.pdf'}, page_content='that meaningfully represents the people impacted by the technology, and that synthesizes peoples’\nvalues in a way that achieves broad consensus amongst many groups. We discuss some related\nconsiderations in Section 5.2.\n5.5 Broader impacts\nThis work is motivated by our aim
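
The retriever above works in two stages: a broad similarity search, then a reranker that keeps only the top `k` passages. The following is a minimal sketch of that pattern that runs without a Vector Search endpoint. The response dict, the file names, and `toy_rerank` are hypothetical stand-ins: the dict only mimics the `result` / `data_array` layout that `rerank` reads, and the word-overlap score is a placeholder for the flashrank model, not how flashrank scores passages.

```python
# Hypothetical response, shaped like the dict that rerank() reads:
# rows live under result -> data_array, one [pdf_name, content] list per row.
fake_response = {
    "result": {
        "data_array": [
            ["a.pdf", "Transformers changed natural language processing."],
            ["b.pdf", "Generative AI may affect many human jobs."],
            ["c.pdf", "Humans and generative AI can work together."],
        ]
    }
}


def to_passages(response):
    """Convert result rows into the {"file", "text"} dicts the reranker expects."""
    rows = response.get("result", {}).get("data_array", [])
    return [{"file": row[0], "text": row[1]} for row in rows]


def toy_rerank(query, passages, k=2):
    """Stand-in reranker: order passages by the number of words shared with the query."""
    query_words = set(query.lower().split())

    def score(passage):
        words = set(passage["text"].lower().rstrip(".").split())
        return len(query_words & words)

    # sorted() is stable, so ties keep their original retrieval order
    return sorted(passages, key=score, reverse=True)[:k]


query = "How does generative AI impact humans"
top = toy_rerank(query, to_passages(fake_response))
print([p["file"] for p in top])
```

Retrieving more candidates than are finally kept gives the reranker room to promote passages that the vector similarity alone ranked lower.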


## Step 2: Set Up the Foundation Model
**Steps:**
1. Define the foundation model for generating responses. Use `llama-3.3` as the foundation model.
2. Test the foundation model to ensure it provides accurate responses.

In [0]:
## import necessary libraries
from langchain_databricks import ChatDatabricks

## define foundation model for generating responses
chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct", max_tokens = 300)

## test foundation model
print(f"Test chat model: {chat_model.invoke('What is Generative AI?')}")



Test chat model: content='Generative AI refers to a type of artificial intelligence that is capable of generating new, original content, such as images, videos, music, text, or even entire datasets. This is in contrast to traditional AI, which is typically focused on analyzing and processing existing data.\n\nGenerative AI uses complex algorithms and machine learning techniques, such as deep learning and neural networks, to learn patterns and structures in data and then generate new data that is similar in style, structure, and content. The goal of generative AI is to create new, synthetic data that is indistinguishable from real data.\n\nSome common applications of generative AI include:\n\n1. **Image and video generation**: Generative AI can be used to generate realistic images and videos, such as faces, objects, and scenes.\n2. **Text generation**: Generative AI can be used to generate text, such as articles, stories, and dialogue.\n3. **Music generation**: Generative AI can be used


## Step 3: Assemble the Complete RAG Solution
**Steps:**
1. Merge the retriever and foundation model into a single LangChain chain.
2. Configure the LangChain chain with a prompt template that injects the retrieved context into the request.
3. Test the complete RAG solution with sample queries.
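
Conceptually, the "stuff documents" step concatenates the text of the retrieved documents and substitutes it into the `{context}` slot of the prompt. The sketch below illustrates that idea in plain Python. It is a simplification under stated assumptions, not the LangChain implementation: `stuff_documents` and `build_prompt` are hypothetical helper names, the template is a shortened stand-in, and the blank-line separator is an assumption.

```python
# Shortened, hypothetical template with the same two slots as a RAG prompt
TEMPLATE = """Use the following pieces of context to answer the question at the end:
{context}
Question: {input}
Answer:
"""


def stuff_documents(page_contents, separator="\n\n"):
    """Join the text of all retrieved passages into a single context string."""
    return separator.join(page_contents)


def build_prompt(question, page_contents):
    """Fill the template with the stuffed context and the user question."""
    return TEMPLATE.format(context=stuff_documents(page_contents), input=question)


prompt_text = build_prompt("What is RAG?", ["First passage.", "Second passage."])
print(prompt_text)
```

Because every retrieved passage lands in a single prompt, the number of passages the retriever returns directly affects prompt length and cost.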

In [0]:
## import necessary libraries
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

## define template for prompt
TEMPLATE = """You are an assistant for GENAI teaching class. You are answering questions related to Generative AI and how it impacts humans life. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know, don't try to make up an answer. Keep the answer as concise as possible.
Use the following pieces of context to answer the question at the end:
{context}
Question: {input}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "input"])

## unwrap the LangChain documents in the context into plain dicts so we can register the signature in MLflow
def unwrap_document(answer):
    return answer | {"context": [{"metadata": r.metadata, "page_content": r.page_content} for r in answer['context']]}

## merge retriever and foundation model into Langchain chain
question_answer_chain = create_stuff_documents_chain(chat_model, prompt)
chain = create_retrieval_chain(get_retriever(), question_answer_chain) | RunnableLambda(unwrap_document)


## test the complete RAG solution with sample query
question = {"input": "How Generative AI impacts humans?"}
answer = chain.invoke(question)
print(answer)

{'input': 'How Generative AI impacts humans?', 'context': [{'metadata': {'source': 'dbfs:/Volumes/dbacademy_arxiv/v01/arxiv-articles/2303.10130.pdf'}, 'page_content': 'Goldfarb,A.,Taska,B.,andTeodoridis,F.(2023). Couldmachinelearningbeageneralpurposetechnology? a\ncomparison of emerging technologies using data from online job postings. Research Policy, 52(1):104653.\nGoldstein,J.A.,Sastry,G.,Musser,M.,DiResta,R.,Gentzel,M.,andSedova,K.(2023). Generativelanguage\nmodels and automated influence operations: Emerging threats and potential mitigations.\nGrace,K.,Salvatier,J.,Dafoe,A.,Zhang,B.,andEvans,O.(2018). Whenwillaiexceedhumanperformance?'}, {'metadata': {'source': 'dbfs:/Volumes/dbacademy_arxiv/v01/arxiv-articles/2303.10130.pdf'}, 'page_content': 'Dwivedi-Yu, J., Celikyilmaz, A., et al. (2023). Augmented language models: a survey. arXivpreprint\narXiv:2302.07842.\nMoll,B.,Rachel,L.,andRestrepo,P.(2021). Unevengrowth: Automation’simpactonincomeandwealth\ninequality. SSRNElectronic Jou
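
The context in the output above consists of plain dicts rather than document objects, which is the effect of `unwrap_document`. The sketch below reproduces that step in isolation. `FakeDocument` is a hypothetical stand-in for the LangChain `Document` class so that the example runs without LangChain installed, and the sample values are invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the two attributes unwrap_document reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unwrap_document(answer):
    # The dict union operator (Python 3.9+) returns a new dict in which the
    # "context" key is overridden; the original answer dict is left unchanged.
    return answer | {
        "context": [
            {"metadata": d.metadata, "page_content": d.page_content}
            for d in answer["context"]
        ]
    }


raw = {
    "input": "q",
    "context": [FakeDocument("some text", {"source": "a.pdf"})],
    "answer": "a",
}
flat = unwrap_document(raw)

# After unwrapping, the whole result is JSON-serializable
print(json.dumps(flat))
```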


## Step 4: Save the Model to Model Registry in Unity Catalog
**Steps:**
1. Register the assembled RAG model in the Model Registry with Unity Catalog.
2. Ensure that all necessary dependencies and requirements are included.
3. Provide an input example and infer the signature for the model.
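
One way to keep the logged requirements aligned with the notebook environment is to pin each package to the version that is currently installed. The helper below is a minimal sketch using only the standard library. `pin_requirements` is a hypothetical name, not part of MLflow, and the package list in the usage line is an assumption about what the chain needs at inference time.

```python
from importlib import metadata


def pin_requirements(packages, lookup=metadata.version):
    """Return "name==version" for installed packages and the bare name otherwise."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Package not installed in this environment: leave it unpinned
            pinned.append(name)
    return pinned


# Assumed package list, for illustration only
print(pin_requirements(["mlflow", "langchain", "databricks-vectorsearch", "flashrank"]))
```

The resulting list could be passed as `pip_requirements` when logging the model; the `lookup` parameter exists only so the helper can be exercised without the packages installed.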

In [0]:
## import necessary libraries
from mlflow.models import infer_signature
import mlflow
import langchain

## set Model Registry URI to Unity Catalog
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.abhi_rag_app"

## register the assembled RAG model in Model Registry with Unity Catalog
with mlflow.start_run(run_name="abhi_rag_app") as run:
    signature = infer_signature(question, answer)
    model_info = mlflow.langchain.log_model(
        chain,
        loader_fn=get_retriever,
        artifact_path="chain",
        registered_model_name=model_name,
        pip_requirements=[
            "mlflow==" + mlflow.__version__,
            "langchain==" + langchain.__version__,
            "databricks-vectorsearch",
            ## the chain also imports these packages at inference time
            "langchain-databricks",
            "flashrank",
        ],
        input_example=question,
        signature=signature
    )


2025/07/22 09:43:14 INFO mlflow: Attempting to auto-detect Databricks resource dependencies for the current langchain model. Dependency auto-detection is best-effort and may not capture all dependencies of your langchain model, resulting in authorization errors when serving or querying your model. We recommend that you explicitly pass `resources` to mlflow.langchain.log_model() to ensure authorization to dependent resources succeeds when the model is deployed.



Successfully registered model 'dbacademy.labuser10914379_1753166678.abhi_rag_app'.



Created version '1' of model 'dbacademy.labuser10914379_1753166678.abhi_rag_app'.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
