# RAGStack and Astra vector database

This notebook demonstrates a RAG pattern using RAGStack and the AstraDB vector database.

The pattern is:

1. Construct information base
2. Basic retrieval
3. Generation with augmented context
4. Advanced retrieval and generation
5. Evaluate quality
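
Steps 1 and 2 of the pattern above can be sketched without any external service. The following is a minimal, library-free illustration, not what RAGStack or Astra does internally: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `index` list stands in for the vector database, and the second and third sample texts are made up for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Construct the information base: keep each text next to its vector.
texts = [
    "Love well, be loved and do something of value.",
    "Bread is baked in an oven.",      # made-up filler text
    "Trains run on steel rails.",      # made-up filler text
]
index = [(text, embed(text)) for text in texts]

# 2. Basic retrieval: rank stored texts by similarity to the question.
def retrieve(question, k=2):
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve("do something of value", k=1))
# ['Love well, be loved and do something of value.']
```

The rest of this notebook replaces each stand-in with the real component: OpenAI embeddings for `embed`, and the Astra vector store for `index` and `retrieve`.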


## Setup
RAGStack includes all the libraries you need for the RAG pattern, including the vector database, embeddings pipeline, and retrieval.

In [64]:
!pip3 install ragstack-ai datasets



Import the necessary dependencies:

In [65]:
import getpass
from datasets import load_dataset
from langchain.vectorstores.astradb import AstraDB 
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document

Enter your environment variables:

In [66]:
astra_token = getpass.getpass("Astra token:")
astra_endpoint = getpass.getpass("Astra db endpoint:")
openai_key = getpass.getpass("OpenAI Key:")
collection = input("Collection name:")

## RAG workflow

With your environment set up, you're ready to create a RAG workflow.

### Construct information base

Declare the embeddings model and define its required parameters.
Then create your Astra vector store and configure its parameters.

In [67]:
embedding = OpenAIEmbeddings(openai_api_key=openai_key)
vstore = AstraDB(
        collection_name=collection,
        embedding=embedding,
        token=astra_token,
        api_endpoint=astra_endpoint
    )
print("Astra configured")

Astra configured


Load a small dataset of philosopher quotes with the Hugging Face `datasets` library.

In [68]:
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])

An example entry:
{'author': 'aristotle', 'quote': 'Love well, be loved and do something of value.', 'tags': 'love;ethics'}


Process the metadata and convert each dataset entry to a LangChain document:

In [69]:
docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add metadata tags to the metadata dictionary
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Add a LangChain document with the quote and metadata tags
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
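
To see what the metadata looks like for a single row, the tag handling from the loop above can be isolated in a small function. This is a sketch that runs without LangChain or Astra; `build_metadata` is a hypothetical helper name introduced here, and the sample entry is the one printed earlier.

```python
def build_metadata(entry):
    """Flatten one dataset row into a metadata dict: the author plus one 'y' flag per tag."""
    metadata = {"author": entry["author"]}
    # An empty or missing tags string produces no tag keys.
    for tag in (entry.get("tags") or "").split(";"):
        if tag:
            metadata[tag] = "y"
    return metadata

entry = {
    "author": "aristotle",
    "quote": "Love well, be loved and do something of value.",
    "tags": "love;ethics",
}
print(build_metadata(entry))
# {'author': 'aristotle', 'love': 'y', 'ethics': 'y'}
```

Storing each tag as its own key with a constant value, rather than as one `"love;ethics"` string, presumably makes it possible to match a single tag with a plain equality condition on that key.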

Compute the embeddings and insert the documents into the vector store:

In [70]:
inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")


Inserted 450 documents.


Confirm your vector store is populated:

In [71]:
print(vstore.astra_db.collection(collection).find())

{'data': {'documents': [{'_id': '7eba67f9f63c4e88bd00dba7cb3f548a', 'content': 'Those who are not angry at the things they should be angry at are thought to be fools, and so are those who are not angry in the right way, at the right time, or with the right persons.', '$vector': [-0.011329636200437894, -0.015616183476134384, 0.019839506986287815, -0.030777158820591042, -0.03204162854351666, -0.020370584381675286, -0.028399957902159295, -0.026756149311265694, 0.023152415164408385, -0.0033002622142925893, 0.012480302027798896, 0.006249634671863165, -0.011158933151058746, 0.009553058260697427, -0.01031806153968451, 0.011759555198427434, 0.03659371358558429, 0.027565409039868403, 0.018435947959399766, -0.004925103954920049, -0.009451900794622088, 0.02738838324140591, -0.005307605594413591, -0.012334888112107934, -0.022191418771031375, 0.014806923747531677, 0.03403948686869006, -0.020585544811992647, -0.003505738288154289, -0.0425114230226851, 0.013441298421175906, -0.026225071915878223, -0.

### Basic retrieval



Retrieve context from your vector database, and pass it to OpenAI with a prompt question.

In [72]:
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough

retriever = vstore.as_retriever(search_kwargs={'k': 3})

prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)
model = ChatOpenAI(openai_api_key=openai_key)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke("In the given context, what subject are philosophers most concerned with?")


'In the given context, philosophers are most concerned with truth and the unchanging reality that is the object of knowledge.'
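
The data flow through the chain can be easier to follow when written as ordinary function calls. The sketch below is a conceptual stand-in, not LangChain's implementation: `fake_retriever` and `fake_model` are hypothetical placeholders for the Astra retriever and `ChatOpenAI`, so the code runs without any API key.

```python
PROMPT_TEMPLATE = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""

def fake_retriever(question):
    # Stand-in for vstore.as_retriever(...): always returns one fixed context string.
    return ["Love well, be loved and do something of value."]

def fake_model(prompt_text):
    # Stand-in for ChatOpenAI: reports the prompt size instead of calling an API.
    return {"content": f"(model reply to a {len(prompt_text)}-character prompt)"}

def run_chain(question):
    # The leading dict sends the same input to every branch:
    # the retriever receives the question, the passthrough returns it unchanged.
    inputs = {"context": fake_retriever(question), "question": question}
    # The prompt step fills the template placeholders with those values.
    prompt_text = PROMPT_TEMPLATE.format(**inputs)
    # The model step produces a message, and the parser step extracts its text.
    message = fake_model(prompt_text)
    return message["content"]

print(run_chain("What should one do?"))
```

Each `|` in the real chain plays the role of one of these hand-offs: the output of the step on the left becomes the input of the step on the right.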

Unsure about the chain's response? Modify the chain to return the metadata of the source documents it used to produce the answer.

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap, RunnableLambda, RunnablePassthrough
from operator import itemgetter

retriever = vstore.as_retriever(search_kwargs={'k': 3})

prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)
model = ChatOpenAI(openai_api_key=openai_key)

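def format_docs(documents):
    # Join the retrieved quotes into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in documents)

chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | StrOutputParser()
)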

chain_with_source = RunnableMap(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    "documents": lambda input: [doc.metadata for doc in input["documents"]],
    "answer": chain_from_docs,
}

chain_with_source.invoke("In the given context, what subject are philosophers most concerned with?")
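
The cell above has no recorded output, so the following sketch illustrates the shape of the result you might expect: a dictionary with the metadata of each retrieved document and the generated answer. It is an assumption-laden stand-in that runs without LangChain: `Doc` mimics only the two fields of `Document` used here, and `answer_fn` is a hypothetical placeholder for the prompt, model, and parser steps.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def join_contents(documents):
    # Join the retrieved texts into one context string.
    return "\n\n".join(doc.page_content for doc in documents)

def answer_with_sources(question, documents, answer_fn):
    # Return the sources' metadata alongside the answer built from their contents.
    return {
        "documents": [doc.metadata for doc in documents],
        "answer": answer_fn(join_contents(documents), question),
    }

retrieved = [
    Doc(
        "Love well, be loved and do something of value.",
        {"author": "aristotle", "love": "y", "ethics": "y"},
    )
]
result = answer_with_sources(
    "What should one do?",
    retrieved,
    answer_fn=lambda context, question: f"(answer based on {len(context)} characters of context)",
)
print(result["documents"])
# [{'author': 'aristotle', 'love': 'y', 'ethics': 'y'}]
```

Returning the metadata next to the answer lets you check which authors and tags the answer was drawn from.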
