## AI for Education - with RAG and Guardrails

### Purpose of this Notebook
The purpose of this notebook is to build a retrieval-augmented generation (RAG) application with guardrails. RAG grounds the model's answers in retrieved documents, which can provide up-to-date information to users and help reduce hallucinations. Guardrails give developers more control over the model's behavior, for example by preventing it from generating harmful or inappropriate content. Together, RAG and guardrails make an application more reliable and safer.

### Inspiration
This application was inspired by my time as an Undergraduate Instructor at UC Berkeley. Courses are often understaffed, and teaching staff don't have time to answer every question students have. Many students today use AI to help them learn the material. While general-purpose AI can be helpful, many students also use it to cheat on assignments and to generate code for their projects. Additionally, every semester a course like CS10 may introduce new policies, materials, or other changes that general-purpose AI does not know about.

### Solution
The combination of RAG and guardrails used in this notebook addresses both of these issues. RAG provides up-to-date information from the course syllabus and lecture notes, and guardrails prevent the model from explicitly generating code snippets. For classes that want students to use LLMs while limiting how much an LLM can help them cheat, this application can be a useful tool.

### Sections
1. Package Installation
2. Data Ingestion to Build Vector DB
3. RAG Query Engine
4. Guardrails
5. Observability
6. Future Steps

 



## Package Installation

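Please run the following code to install the necessary packages. If you followed the instructions in the README, you should already have them installed, but you can run these cells to reinstall them or to confirm that they are installed. Note that installing `llama-index` and `llama-index-vector-stores-pinecone` in separate cells makes each install replace `llama-index-core` with a version the other package does not support, as the dependency-conflict errors in the outputs below show; installing both in a single `%pip install` command may let pip's resolver choose compatible versions.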

In [79]:
%pip install ipykernel

Note: you may need to restart the kernel to use updated packages.


In [80]:
%pip install pinecone

Note: you may need to restart the kernel to use updated packages.


In [81]:
%pip install llama-index


Collecting llama-index-core<0.12.0,>=0.11.17 (from llama-index)
  Using cached llama_index_core-0.11.23-py3-none-any.whl.metadata (2.5 kB)
Downloading llama_index_core-0.11.23-py3-none-any.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 18.0 MB/s eta 0:00:00
Installing collected packages: llama-index-core
  Attempting uninstall: llama-index-core
    Found existing installation: llama-index-core 0.12.25
    Uninstalling llama-index-core-0.12.25:
      Successfully uninstalled llama-index-core-0.12.25
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-vector-stores-pinecone 0.4.5 requires llama-index-core<0.13.0,>=0.12.0, but you have llama-index-core 0.11.23 which is incompatible.
Successfully installed llama-index-core-0.11.23

In [82]:
%pip install openai

Note: you may need to restart the kernel to use updated packages.


In [83]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


In [84]:
%pip install llama-index-vector-stores-pinecone

Collecting llama-index-core<0.13.0,>=0.12.0 (from llama-index-vector-stores-pinecone)
  Using cached llama_index_core-0.12.25-py3-none-any.whl.metadata (2.5 kB)
Using cached llama_index_core-0.12.25-py3-none-any.whl (1.6 MB)
Installing collected packages: llama-index-core
  Attempting uninstall: llama-index-core
    Found existing installation: llama-index-core 0.11.23
    Uninstalling llama-index-core-0.11.23:
      Successfully uninstalled llama-index-core-0.11.23
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-multi-modal-llms-openai 0.2.2 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.25 which is incompatible.
llama-index-tools-tavily-research 0.2.0 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.25 which is incompatible.
llama-index-question-gen-openai 0.2.0 requires llama-index-

In [88]:
%pip install nemoguardrails

Note: you may need to restart the kernel to use updated packages.


## Data Ingestion to Build Vector DB

In [57]:
# importing necessary libraries

from pinecone import Pinecone, ServerlessSpec
import os

from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from nemoguardrails import LLMRails, RailsConfig

from pathlib import Path
from llama_index.readers.file import PDFReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SemanticSplitterNodeParser
import json


## SET YOUR ENVIRONMENT VARIABLES

In [89]:
# Set environment variables only if they are not already set.
# os.environ[...] raises KeyError for a missing key (it never returns None),
# so setdefault is used instead.
os.environ.setdefault("OPENAI_API_KEY", "")    # replace "" with your OpenAI key
os.environ.setdefault("PINECONE_API_KEY", "")  # replace "" with your Pinecone key
os.environ.setdefault("PINECONE_CLOUD", "aws")
os.environ.setdefault("PINECONE_REGION", "us-east-1")


### Pinecone Setup
You will need to enter your own API key and cloud region for Pinecone. Please let me know if this is an issue or if there are any questions.

Pinecone is used as the vector database because it is a fully managed service, so I didn't have to spend time setting up a local database. Given my limited time and previous experience with Pinecone, it was the natural choice.

In [None]:
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "cs10-index"

In [16]:
# Create the Pinecone index only if it does not already exist,
# instead of treating every exception as "index already exists".
# dimension=1536 must match the size of the embedding vectors stored in it.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="euclidean",
        spec=ServerlessSpec(cloud=os.environ["PINECONE_CLOUD"], region=os.environ["PINECONE_REGION"]),
    )
else:
    print(f"Index '{index_name}' already exists")



### Embedding Model

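OpenAI is used as the embedding model because its performance is generally good and it is easy to use. LlamaIndex's default OpenAI embedding model produces 1536-dimensional vectors, which matches the dimension of the Pinecone index created above.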
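The index created above uses `metric="euclidean"`. OpenAI documents that its embeddings are normalized to length 1, and for unit vectors the squared Euclidean distance equals `2 - 2 * cosine_similarity`, so ranking by Euclidean distance gives the same order as ranking by cosine similarity. The small check below illustrates this with made-up 3-D unit vectors; the document names and values are hypothetical, not real embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # For unit vectors, the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy "embeddings" (not real OpenAI vectors).
query = normalize([1.0, 0.2, 0.0])
docs = {
    "recursion": normalize([0.9, 0.3, 0.1]),
    "syllabus": normalize([0.1, 1.0, 0.2]),
    "snap": normalize([0.0, 0.1, 1.0]),
}

for name, vec in docs.items():
    cos = cosine_similarity(query, vec)
    d2 = squared_euclidean(query, vec)
    # Identity for unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    assert math.isclose(d2, 2 - 2 * cos)

by_cosine = sorted(docs, key=lambda n: -cosine_similarity(query, docs[n]))
by_euclid = sorted(docs, key=lambda n: squared_euclidean(query, docs[n]))
print(by_cosine == by_euclid)  # True
```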

In [17]:
embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])

pinecone_index = pc.Index(index_name)
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

### Ingestion Pipeline

LlamaIndex was used as the framework for this RAG project specifically because of tools like IngestionPipeline that make vector embedding easy. Below, an IngestionPipeline object is created to split the documents into chunks, embed them, and store them in the Pinecone vector store.

In [18]:


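# Split documents into semantically coherent chunks, embed each chunk,
# and write the resulting nodes to the Pinecone vector store.
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,  # number of sentences grouped together when evaluating semantic similarity
            breakpoint_percentile_threshold=95,  # start a new chunk where dissimilarity exceeds the 95th percentile
            embed_model=embed_model,
        ),
        embed_model,  # embed the final chunks
    ],
    vector_store=vector_store,  # store the embedded nodes in Pinecone
)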
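To make `breakpoint_percentile_threshold=95` more concrete: the semantic splitter compares embeddings of neighboring sentences and starts a new chunk only where the dissimilarity between them is above the 95th percentile of all such gaps. The sketch below is a simplified, hypothetical re-implementation of that idea, not LlamaIndex's code: it uses hand-made 2-D vectors instead of OpenAI embeddings, cosine distance, a linear-interpolation percentile, and it ignores `buffer_size` (the number of sentences grouped together when evaluating similarity).

```python
import math

def cosine_distance(a, b):
    """One minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def percentile(values, pct):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def semantic_split(sentences, embeddings, threshold_pct=95):
    """Group consecutive sentences, starting a new chunk after unusually large jumps."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each sentence and the next one
    distances = [cosine_distance(embeddings[i], embeddings[i + 1])
                 for i in range(len(embeddings) - 1)]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > cutoff:  # large semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Recursion is when a function calls itself.",
    "Every recursive function needs a base case.",
    "The base case stops the recursion.",
    "Projects are submitted online.",
    "Late submissions follow the course policy.",
]
# Hypothetical 2-D "embeddings": three sentences on one topic, two on another.
embeddings = [[1.0, 0.1], [0.95, 0.15], [0.9, 0.2], [0.1, 1.0], [0.15, 0.95]]

for chunk in semantic_split(sentences, embeddings):
    print(chunk)
```

With these toy vectors only the jump between the third and fourth sentences exceeds the cutoff, so the sketch produces two chunks.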

### Data

The data directory contains the course materials for CS10. It includes the syllabus, lecture notes, and other usage guides, all in PDF format. These documents were found on the course website at https://cs10.org/sp25/.

In [65]:
data_directory = Path("./data")

# A JSON manifest records which files have already been ingested,
# so re-running the notebook does not embed the same PDFs twice.
loaded_files_path = Path("./loaded_data/loaded_files.json")

# Create the manifest (and its directory) with an empty list if it does not exist yet
if not loaded_files_path.exists():
    loaded_files_path.parent.mkdir(parents=True, exist_ok=True)
    with open(loaded_files_path, "w") as f:
        json.dump({"loaded_files": []}, f)

with open(loaded_files_path) as f:
    loaded_data = json.load(f)

# Paths of files that have already been ingested
loaded_files_set = {file["file_path"] for file in loaded_data.get("loaded_files", [])}



In [66]:
# Ingest each PDF that is not yet listed in the manifest
for file in data_directory.iterdir():
    if file.is_file() and file.suffix == ".pdf" and str(file) not in loaded_files_set:
        documents = PDFReader().load_data(file=file)
        pipeline.run(documents=documents)  # split, embed, and upsert into Pinecone
        print(f"Ingested file: {file}")
        loaded_data["loaded_files"].append({
            "file_path": str(file),
            "index_name": index_name,
        })
    else:
        print(f"Skipping file: {file}")

# Save the updated manifest
with open(loaded_files_path, "w") as f:
    json.dump(loaded_data, f)

print("Ingestion complete")

Skipping file: data/2025Sp CS10 Intro Abstraction.pptx.pdf
Skipping file: data/2025Sp CS10 Python I.pptx.pdf
Skipping file: data/2025Sp CS10 Recursion II.pptx.pdf
Skipping file: data/.DS_Store
Skipping file: data/2025Sp CS10 Recursion III.pptx.pdf
Skipping file: data/2025Sp CS10 Python Tree Recursion Game Theory.pptx.pdf
Skipping file: data/How To_ Snap! Features and Tricks.pdf
Skipping file: data/2025Sp CS10 Testing + Project 3 + Mutable vs Immutable.pptx.pdf
Skipping file: data/2025Sp CS10 Ethics in AI.pptx.pdf
Skipping file: data/2025Sp CS10 Generative AI.pptx.pdf
Skipping file: data/2025Sp CS10 Computers in Education.pptx.pdf
Skipping file: data/2025Sp CS10 HOFs.pptx.pdf
Skipping file: data/2025Sp CS10 Variables Scope Lists HOFs.pptx.pdf
Skipping file: data/2025Sp CS10 Abstraction II.pptx.pdf
Skipping file: data/2025Sp CS10 Python II.pptx.pdf
Skipping file: data/[CS10 _ Sp25] Lecture 4. Iteration.pdf
Skipping file: data/2025Sp CS10 Concurrency.pptx.pdf
Skipping file: data/2025Sp CS

## RAG Query Engine

The RAG query engine connects the Pinecone vector database to the LlamaIndex framework. For each question, it retrieves the most relevant document chunks from the vector database and passes them to the LLM to generate an answer.
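Conceptually, the retrieval step embeds the question with the same model used during ingestion and returns the stored chunks whose vectors are most similar to it. The pure-Python sketch below shows that nearest-neighbor step with hypothetical chunk labels and toy 3-D vectors; in this notebook Pinecone performs the actual search, and `top_k=2` here is an arbitrary choice.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, store, top_k=2):
    """Return the top_k (text, score) pairs most similar to query_vec."""
    scored = ((text, cosine_similarity(query_vec, vec)) for text, vec in store)
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

# Hypothetical chunks with toy vectors (real OpenAI embeddings have 1536 dimensions).
store = [
    ("Chunk about the late policy", [0.1, 0.9, 0.2]),
    ("Chunk about recursion", [0.9, 0.1, 0.0]),
    ("Chunk about project deadlines", [0.2, 0.8, 0.3]),
]
query_vec = [0.15, 0.85, 0.25]  # stands in for an embedded question about late work

for text, score in retrieve(query_vec, store):
    print(f"{score:.3f}  {text}")
```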


In [58]:
# Set up OpenAI embeddings
embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])

# Connect Pinecone with LlamaIndex
vector_store = PineconeVectorStore(pinecone_index=pc.Index(index_name))

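# Use the same embedding model for queries as for ingestion (global LlamaIndex setting)
Settings.embed_model = embed_model

# Build an index on top of the vectors already stored in Pinecone
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)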

# Create query engine
query_engine = RetrieverQueryEngine.from_args(index.as_retriever())

## Guardrails

NVIDIA's NeMo Guardrails is used because it is open source and allows conversation flows to be customized. When I first created this project, I did not realize that the guardrails library was built on top of LangChain. Because the RAG portion of this project is not especially complex, LangChain might have been a better framework choice than LlamaIndex.

However, NeMo Guardrails can still be used with LlamaIndex or other frameworks; it just requires a bit more setup.

In [68]:
%pip install langchain-openai

Note: you may need to restart the kernel to use updated packages.


### Config files

Config files are the heart of NeMo Guardrails: they define how the guardrails behave. The configuration used in this project is located in the config/ directory. Colang, the modeling language of NeMo Guardrails, is used to define the flow of the conversation and the actions that can be taken.

In short, these config files look for input prompts that ask the LLM to generate code or to behave in a harmful way. When such a prompt is detected, the guardrails steer the LLM to respond accordingly (for example, with a refusal message) instead of complying. Guardrails are also used to detect when RAG is needed. This gives the LLM simple routing, something that would usually require additional OpenAI configuration or LlamaIndex framework code.
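The Colang flows above screen the user's input. As a complementary idea, a hypothetical output check could also scan the model's reply for code before it reaches the student, such as the fenced code block in the unguarded response shown later in this notebook. The sketch below is plain Python, not part of the NeMo Guardrails API; the regex patterns and the `contains_code` / `filter_reply` names are assumptions, and a heuristic like this will miss some code and occasionally flag ordinary prose.

```python
import re

# Hypothetical heuristics for spotting code in an LLM reply.
CODE_PATTERNS = [
    re.compile(r"```"),                                    # markdown code fence
    re.compile(r"^\s*def \w+\(.*\):", re.MULTILINE),       # Python function definition
    re.compile(r"^\s*(for|while) .+:\s*$", re.MULTILINE),  # Python loop header
]

# Refusal text borrowed from the guardrails response shown in this notebook.
REFUSAL = ("I'm not a coding assistant, I can help you understand the concepts "
           "but cannot write code for you.")

def contains_code(text: str) -> bool:
    """Return True if any code-like pattern appears in the text."""
    return any(p.search(text) for p in CODE_PATTERNS)

def filter_reply(reply: str) -> str:
    """Replace replies that contain code with the refusal message."""
    return REFUSAL if contains_code(reply) else reply

print(filter_reply("A loop repeats a block of instructions."))
print(filter_reply("Here you go:\n```python\nfor i in my_list:\n    total += i\n```"))
```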

In [48]:
# Load the guardrails configuration from the config directory
config_path = "./config/"
config = RailsConfig.from_path(config_path)


# Action: answer a question using the RAG query engine
def rag_query(query=None):
    if query is None:
        return "No query provided"
    response = query_engine.query(query)
    # Return the answer text rather than the LlamaIndex Response object
    return str(response)

# Action: decide whether a query is related to CS10 (simple keyword match)
def check_if_cs10_related(query=None) -> bool:
    if query is None:
        return False
    # List of CS10 related keywords
    cs10_keywords = [
        "algorithm", "data structure", "recursion", "big o", "binary search", 
        "object-oriented", "sorting", "computational complexity", "linked list", 
        "dynamic programming", "cs10", "computer science", "programming concept",
        "binary tree", "hash table", "graph theory", "queue", "stack", "heap",
        "time complexity", "space complexity", "syllabus", "course content",
        "lecture notes", "assignments", "labs", "projects", "exams", "textbook",
        "video lectures", "online resources", "tutorials", "practice problems",
        "quizzes", "grading", "peer review", "team projects",
        "project-based learning", "hands-on activities", "interactive learning",
        "collaborative learning", "self-paced learning", "online course", "snap", "class"
        
    ]
    
    query_lower = query.lower()
    return any(keyword in query_lower for keyword in cs10_keywords)

# Creating the guardrails object
rails = LLMRails(config)

# Register the actions
rails.register_action(rag_query, name="rag_query")
rails.register_action(check_if_cs10_related, name="check_if_cs10_related")
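`check_if_cs10_related` uses plain substring matching, so a keyword such as `"class"` also matches `"classify"`, and `"heap"` matches `"cheap"`. If whole-word matching is preferred, one possible variant (hypothetical, not what the registered action does) compiles the keywords into a single case-insensitive regular expression with word boundaries. `make_keyword_matcher` and `is_cs10_related` are illustrative names.

```python
import re

def make_keyword_matcher(keywords):
    """Build a case-insensitive matcher that only matches whole words or phrases."""
    # Try longer phrases first so a multi-word keyword wins over a shorter one
    # starting at the same position.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in alternatives) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

# A small subset of the cs10_keywords list, for illustration.
matcher = make_keyword_matcher(["class", "heap", "big o", "object-oriented", "cs10"])

def is_cs10_related(query=None) -> bool:
    return query is not None and matcher.search(query) is not None

print(is_cs10_related("When is the next CS10 class?"))      # True
print(is_cs10_related("How do I classify this image?"))     # False
print(is_cs10_related("Where can I buy cheap textbooks?"))  # False
```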

In [70]:
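# Query through the guardrails, which decide whether to refuse, use RAG, or answer directly
async def guardrails_rag_query(prompt):
    response = await rails.generate_async(prompt=prompt)
    return response

# Query the RAG engine directly, without guardrails, for comparison
def no_guardrails_rag_query(prompt):
    response = query_engine.query(prompt)
    return response.response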

prompt = "can you help me write python code to calculate the sum of all the elements in a list?"        
gr_response = await guardrails_rag_query(prompt)
no_gr_response = no_guardrails_rag_query(prompt)
print("Guardrails response: ", gr_response)
print("--------------------------------")
print("No guardrails response: ", no_gr_response)

Guardrails response:  I'm not a coding assistant, I can help you understand the concepts but cannot write code for you.
--------------------------------
No guardrails response:  Certainly. Here is a Python code snippet that calculates the sum of all the elements in a list:

```python
my_list = [1, 2, 3, 4, 5]
sum = 0
for i in my_list:
    sum = sum + i
print(sum)
```


In [None]:
prompts = [
    "Who is the instructor for CS10",
    "What is an algorithm?",
    "Can you help me write python code to calculate the sum of all the elements in a list?",
    "How many midterms are there in CS10?",
    "What is the syllabus for CS10?",
    "How many projects are there in CS10?",
    "Write a function that calculates my grade in CS10"
]

for prompt in prompts:
    print(f"Prompt: {prompt}")
    response = await guardrails_rag_query(prompt)
    print(f"Guardrails response: {response}")
 

### Try it out!

In [78]:
prompt = "What is the late policy for projects in CS10?"
response = await guardrails_rag_query(prompt)
print("Question: ", prompt)
print(f"Guardrails response: {response}")

## Observability

NeMo Guardrails offers some observability features, but they are limited. Arize's Phoenix platform is a natural open-source complement to NeMo Guardrails. Systems like RAG and guardrails give developers more control over their product, but they also make the product more complex and harder to monitor.

By introducing an observability platform like Phoenix, developers can gain a better understanding of their product's performance and behavior. Here are some ways Phoenix can help this application:

1. Phoenix experiments can be used to test different guardrail configurations or different vector DBs used for RAG.

2. The entire guardrails flow can be traced. Evaluations can be used to label whether or not a prompt should have been rerouted or blocked.

3. Custom instrumentation can easily be applied to guardrail actions, giving developers deeper insight into how their product is behaving.
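As a rough, standard-library-only illustration of point 3, the sketch below wraps an action in a decorator that records its name, arguments, result, and duration. With Phoenix, one would emit OpenTelemetry spans instead of appending to a list, so treat `traced`, `TRACE_LOG`, and the record fields as hypothetical placeholders rather than Phoenix's API. The decorated function is a shortened stand-in for this notebook's `check_if_cs10_related` action.

```python
import functools
import time

TRACE_LOG = []  # hypothetical in-memory sink; a real setup would export spans

def traced(func):
    """Record inputs, output, and wall-clock duration of each successful call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE_LOG.append({
            "action": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def check_if_cs10_related(query=None) -> bool:
    # Stand-in for the notebook's action, with a one-word keyword list.
    return query is not None and "cs10" in query.lower()

check_if_cs10_related(query="Who teaches CS10?")
print(TRACE_LOG[0]["action"], TRACE_LOG[0]["result"])  # check_if_cs10_related True
```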




## Future Steps

If I were to continue working on this project, I would like to add the following features:

1. Add more document data to the vector DB. This would allow for more accurate responses and a better user experience.

2. Switch from LlamaIndex to LangChain. This would integrate better with the guardrails framework and allow easier auto-instrumentation for tracing.

3. Add evaluations to the pipeline to test the performance of the guardrails and RAG.

4. Use Phoenix experiments to test different guardrail configurations and vector DBs.
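For step 3, one possible starting point is a small labeled set of prompts with the expected guardrail decision, scored against the decision the pipeline actually made. The sketch below is hypothetical: the decision labels (`"rag"`, `"block"`, `"answer"`), the prompts, and the "pipeline" decisions are all made up for illustration and are not results from this notebook. In practice the observed decisions would be derived from the pipeline's responses or from Phoenix evaluations.

```python
from collections import Counter

# Each case: (prompt, expected decision, decision the pipeline made).
# All values are made up for illustration.
cases = [
    ("What is recursion?", "rag", "rag"),
    ("Write a function that reverses a list", "block", "block"),
    ("Explain what a base case is", "rag", "answer"),
    ("Give me the code for a sorting algorithm", "block", "block"),
]

def evaluate(cases):
    """Return accuracy and a Counter of (expected, predicted) pairs for the mistakes."""
    mistakes = Counter(
        (expected, predicted)
        for _prompt, expected, predicted in cases
        if expected != predicted
    )
    accuracy = 1 - sum(mistakes.values()) / len(cases)
    return accuracy, mistakes

accuracy, mistakes = evaluate(cases)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.75
print(dict(mistakes))                # {('rag', 'answer'): 1}
```

Grouping mistakes by (expected, predicted) pair makes it easy to see whether errors come from over-blocking, under-blocking, or missed RAG routing.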

