# Conversational RAG: Talk to Leonardo


## Conversational-RAG Introduction

Conversational Retrieval-Augmented Generation (RAG) is an architecture for multi-turn dialog systems. Unlike basic RAG, which retrieves relevant documents and generates a response for a single, isolated query, Conversational RAG adds memory, so the system can maintain context across multiple exchanges. This is particularly useful for applications like virtual assistants, tutoring bots, or expert agents, where user queries often reference previous interactions. By combining conversation memory with retrieval, Conversational RAG produces responses that are grounded in external knowledge and coherent with the ongoing conversation.

The architecture typically consists of a vector database to store embedded documents, a memory module to track conversational history, and a large language model (LLM) to synthesize responses. During each interaction, the model uses both retrieved documents and the dialogue memory to generate answers that are contextually aware and knowledge-grounded. Frameworks like LlamaIndex and LangChain provide ready-to-use components to build such systems, enabling construction of robust and intelligent agents that feel more human-like and contextually fluent.
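To make the three parts concrete, here is a minimal, library-free sketch of one conversational turn. It is only an illustration under stated assumptions: the bag-of-words "embedding", the `TOY_CHUNKS` data, and the helper names (`bag_of_words`, `cosine`, `retrieve`, `build_prompt`) are hypothetical stand-ins for a real embedding model, vector database, and memory module, and the history is simply concatenated into the search query rather than rephrased by an LLM.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Toy stand-in for a vector-store lookup: return the k most similar chunks."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, history, chunks):
    """Combine memory (history) and retrieved context into one prompt string."""
    # Fold earlier user turns into the search query so follow-ups keep their topic.
    search_query = " ".join([q for q, _ in history] + [question])
    context = "\n".join(retrieve(search_query, chunks))
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return f"History:\n{past}\n\nContext:\n{context}\n\nQuestion:\n{question}"


# Hypothetical placeholder chunks, not text from the book.
TOY_CHUNKS = [
    "notes on painting light and shadow",
    "notes on the flight of birds",
    "notes on water and rivers",
]
history = [("Tell me about the flight of birds", "(previous answer)")]
prompt_text = build_prompt("What else did you observe?", history, TOY_CHUNKS)
print(prompt_text)
```

The follow-up question mentions no topic by itself, yet the chunk about birds is retrieved because the earlier turn is part of the search query. That is the role memory plays in the architecture described above.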

## Project Goal

The goal of this project is to create a "conversation" with a historical "person" by using Conversational RAG to pull real quotes and references from their speeches, letters, and books. The memory enables multi-turn interactions that simulate a natural conversation.

In this example, the "conversation" is with Leonardo da Vinci, using Conversational RAG to extract quotes and references from his work ["Thoughts on Art and Life"](https://www.gutenberg.org/ebooks/29904).

In [None]:
# Mounting to Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
cd "your-path-here"

In [49]:
%%capture
# Install required packages (%%capture must be the first line of the cell)
!pip install -q langchain langchain_community openai chromadb tiktoken
!pip install -q -U langchain-openai
!pip install -q ipywidgets

In [37]:
%%capture
# Imports (%%capture must be the first line of the cell)
import os

from langchain.prompts import ChatPromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory  # needed for the memory setup
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from IPython.display import display
import ipywidgets as widgets


In [None]:
# API Key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key-here"

In [30]:
# Load and split document
loader = TextLoader("29904.txt")  # The text file must exist in the current working directory
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = splitter.split_documents(raw_docs)
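The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sliding window. This is a sketch, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line breaks, while the hypothetical `sliding_chunks` helper below cuts at fixed character positions only to show how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.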

In [31]:
# Create vectorstore from documents
embedding = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embedding=embedding)
retriever = vectorstore.as_retriever()

In [32]:
# Set up conversation memory (the prompt template is defined in the next cell)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)


In [43]:
# Prompt template: {context} receives the retrieved chunks, {question} the user's question
prompt = ChatPromptTemplate.from_template(
    """
You are Leonardo da Vinci, speaking as a wise and reflective artist based only on your writings in *Thoughts on Art and Life*.
Your tone is insightful, poetic, and gently philosophical — but not overly repetitive or formal.
Please avoid starting every reply the same way. Vary your introductions naturally and do not use the word "apprentice."

Use only the context provided and stay grounded in the historical style.

Context:
{context}

Question:
{question}

Answer as Leonardo:"""
)
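To preview roughly what the model receives, the placeholders can be filled by hand with plain string formatting. This is a sketch under assumptions: `TEMPLATE` is a shortened copy of the template, `fill_prompt` is a hypothetical helper, and joining chunks with a blank line is assumed to mirror how the chain "stuffs" retrieved documents into `{context}`. The chain may also rephrase a follow-up question into a standalone one before it reaches `{question}`.

```python
# Shortened copy of the template, for illustration only.
TEMPLATE = "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer as Leonardo:"


def fill_prompt(template, chunks, question):
    """Join retrieved chunks and substitute both placeholders."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(context=context, question=question)


filled = fill_prompt(TEMPLATE, ["chunk one", "chunk two"], "What is painting?")
print(filled)
```

Printing the filled prompt like this is a quick way to check that the instructions, context, and question appear in the intended order.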

In [44]:
# Define chain

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.2)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt}
)


In [45]:
# Format output blocks (uses the `widgets` alias imported earlier)

def format_block(role, text):
    # Assign emoji: 🎓 for Apprentice, 🧑‍🎨 for Leonardo
    emoji = "🧑‍🎨" if role == "Leonardo" else "🎓"

    # Format text into paragraphs
    content_html = "<br><br>".join(p.strip() for p in text.split("\n") if p.strip())

    # HTML block with styling
    html = f"""
    <div style="max-width:900px; padding:12px 24px; margin:12px 0;
                background-color:#fff; border-radius:10px;
                box-shadow: 0 4px 12px rgba(0,0,0,0.05); font-family:Georgia,serif;
                font-size:15px; line-height:1.6; text-align:left">
        <b>{emoji} {role}:</b><br><br>
        {content_html}
    </div>
    """
    return widgets.HTML(value=html)

In [46]:
# UI Setup
chat_display = widgets.VBox(layout=widgets.Layout(align_items="flex-start"))

input_box = widgets.Textarea(
    placeholder="Ask Leonardo something...",
    layout=widgets.Layout(width="1000px", height="60px")
)

submit_button = widgets.Button(
    description="Ask Leonardo",
    button_style="primary",
    layout=widgets.Layout(width="130px", height="40px")
)

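def on_submit(_=None):
    question = input_box.value.strip()
    if question:
        # Show the user's question, then clear the input box
        chat_display.children += (format_block("Apprentice", f"<i>{question}</i>"),)
        input_box.value = ""
        # The chain's memory supplies chat_history, so only the question is passed
        answer = qa_chain.invoke({"question": question})["answer"]
        chat_display.children += (format_block("Leonardo", answer),)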

submit_button.on_click(on_submit)


In [None]:
# Layout & Display
header = widgets.HTML(
    value="""
    <h2 style="font-family:Georgia,serif; text-align:center; padding:10px">
        Ask Leonardo da Vinci anything from <em>Thoughts on Art and Life</em>
    </h2>
    """
)

input_area = widgets.HBox([input_box, submit_button])
app_layout = widgets.VBox(
    [header, chat_display, input_area],
    layout=widgets.Layout(width="100%", align_items="flex-start")  # Align left
)

display(app_layout)