# Quickstart with RAGStack

This notebook demonstrates how to set up a simple RAG pipeline with RAGStack. At the end of this notebook, you will have a fully functioning Question/Answer model that can answer questions using your supplied documents.

A RAG pipeline requires, at minimum, a vector store, an embedding model, and an LLM. In this tutorial, you will use an Astra DB vector store, an OpenAI embedding model, and an OpenAI LLM, with LangChain orchestrating them.
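To make the division of labor concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is not how RAGStack or LangChain implement it: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for the vector store, and the sample documents and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical sample documents, for illustration only.
documents = [
    "Happiness comes from gaining insight.",
    "The vector store holds embedded documents.",
]
question = "Where are embedded documents kept?"
top = retrieve(question, documents)
prompt = build_prompt(question, top)
```

In the real pipeline, the embedding model replaces `embed`, the vector store performs the similarity ranking, and the LLM receives the assembled prompt.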

## Prerequisites

You will need a vector-enabled Astra database and an OpenAI account.

* Create an [Astra vector database](https://docs.datastax.com/en/astra-serverless/docs/getting-started/create-db-choices.html).
* Create an [OpenAI account](https://openai.com/).
* Within your database, create an [Astra DB Access Token](https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html) with Database Administrator permissions.
* Get your Astra DB Endpoint:
  * `https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com`
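The endpoint pattern above can be filled in with a small helper. This is only a sketch: the function name `astra_endpoint` and the example ID and region are hypothetical, and you can just as well copy the endpoint from the Astra console.

```python
def astra_endpoint(db_id: str, region: str) -> str:
    """Build the Astra DB API endpoint from a database ID and region."""
    if not db_id or not region:
        raise ValueError("Both the database ID and the region are required.")
    return f"https://{db_id}-{region}.apps.astra.datastax.com"


# Placeholder values, not a real database.
example = astra_endpoint("0123abcd", "us-east1")
```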


## Setup
`ragstack-ai` includes all the packages you need to build a RAG pipeline.

`datasets` is used to import a sample dataset.

In [2]:
%pip install -q ragstack-ai datasets

Note: you may need to restart the kernel to use updated packages.





In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [4]:
import os
from getpass import getpass  # prompts for input without echoing it

# Use the settings loaded from .env if present; otherwise prompt for them.
# (Assigning os.getenv(...) directly would raise a TypeError for a missing variable.)
for name in ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN", "OPENAI_API_KEY"):
    if not os.getenv(name):
        os.environ[name] = getpass(f"{name}: ")
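If you prefer to fail fast rather than discover a missing setting later, a check like the following can help. It is a sketch only: `REQUIRED` and `missing_settings` are hypothetical names, and the optional `env` argument exists purely so the function can be tried without touching the real environment.

```python
import os

# The three settings this notebook relies on.
REQUIRED = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN", "OPENAI_API_KEY")


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Example with a hypothetical, partially filled environment.
partial = {"ASTRA_DB_API_ENDPOINT": "https://example.invalid"}
still_missing = missing_settings(partial)
```

Calling `missing_settings()` with no argument checks the current process environment and reports names only, never values.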


In [5]:

# Optional: confirm the key is loaded without printing the secret itself.
print("OPENAI_API_KEY loaded:", bool(os.getenv("OPENAI_API_KEY")))


## Create RAG Pipeline

**Embedding Model and Vector Store**

In [6]:
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings

import os

# Configure your embedding model and vector store.
embedding = OpenAIEmbeddings()
vstore = AstraDBVectorStore(
    collection_name="test",
    embedding=embedding,
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT")
)
print("Astra vector store configured")

Astra vector store configured


In [None]:
from datasets import load_dataset

# Load a sample DataStax dataset from Hugging Face.

philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry : ")
print(philo_dataset[0])



An example entry : 
{'author': 'aristotle', 'quote': "True happiness comes from gaining insight and growing into your best possible self. Otherwise all you're having is immediate gratification pleasure, which is fleeting and doesn't grow you as a person.", 'tags': 'knowledge'}


The dataset is available at https://huggingface.co/datasets/datastax/philosopher-quotes

In [15]:
from langchain.schema import Document

# Construct a list of documents from your data. Documents are the inputs to your vector store.
docs = []

# Each entry is expected to be a dictionary with keys for author, tags, and quote.
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add each tag to the metadata dictionary as a flag.
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Create a LangChain document with the quote and metadata tags.
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)

In [19]:
doc

Document(metadata={'author': 'kant', 'knowledge': 'y'}, page_content='The schematicism by which our understanding deals with the phenomenal world ... is a skill so deeply hidden in the human soul that we shall hardly guess the secret trick that Nature here employs.')
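The metadata shown in this output comes from splitting the `tags` string on `;` and storing each tag as a `"y"` flag. As a sketch, the same logic can be isolated in a standalone function, which makes it easy to check on hand-written entries before touching the vector store. The name `build_metadata` and the sample entries are hypothetical, and unlike the loop above this version also strips whitespace and skips empty tags.

```python
def build_metadata(entry: dict) -> dict:
    """Turn a dataset entry into a flat metadata dictionary."""
    metadata = {"author": entry["author"]}
    tags = entry.get("tags") or ""  # treat None or a missing key as "no tags"
    for tag in tags.split(";"):
        tag = tag.strip()
        if tag:
            metadata[tag] = "y"
    return metadata


# Hypothetical entries, for illustration only.
tagged = build_metadata({"author": "kant", "quote": "...", "tags": "knowledge;ethics"})
untagged = build_metadata({"author": "plato", "quote": "...", "tags": None})
```

Storing each tag as its own key keeps the metadata flat, so a single tag can later be matched with a simple equality filter such as `{"knowledge": "y"}`.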

In [None]:
# Create embeddings by inserting your documents into the vector store.
inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")
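Re-running the insert cell may add the same quotes again under new IDs. One possible way to avoid that, sketched here under assumptions, is to derive a stable ID from each document's content. The helper `stable_id` is hypothetical, and passing `ids=` to `add_documents` assumes your installed `langchain-astradb` version accepts that keyword, so check its API reference first.

```python
import hashlib


def stable_id(author: str, quote: str) -> str:
    """Derive a repeatable ID from the author and quote text."""
    digest = hashlib.sha256(f"{author}\n{quote}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still deterministic


# Hypothetical inputs, for illustration only.
first = stable_id("kant", "An example quote.")
second = stable_id("kant", "An example quote.")
other = stable_id("plato", "An example quote.")

# Possible usage with the objects defined earlier (assumes `ids` is supported):
# ids = [stable_id(d.metadata["author"], d.page_content) for d in docs]
# inserted_ids = vstore.add_documents(docs, ids=ids)
```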