## RAG on WFA Specifications using OpenAI Embeddings and Chroma Vector Store

This project uses RAG (Retrieval-Augmented Generation) to build a question-answering assistant. I used the WFA (Wi-Fi Alliance) specifications as the source documents, but the approach can be extended to any other set of documents.

This project demonstrates how to build a conversational AI system using OpenAI's language model with memory and a vector-based retrieval mechanism for Retrieval-Augmented Generation (RAG). It enables context-aware, dynamic conversations with enhanced retrieval capabilities for domain-specific knowledge.

-----------

Features -

Conversational Context Memory: Tracks the history of exchanges for continuity in multi-turn conversations.

Retrieval-Augmented Generation (RAG): Integrates a vector database for efficient retrieval of relevant documents or context.

OpenAI GPT Integration: Uses OpenAI's chat models (this notebook is configured with `gpt-4o-mini`) to generate responses grounded in the retrieved context.
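
To make the three features above concrete, here is a minimal, dependency-free sketch of a single RAG turn. It is not the code this project runs: the names `DOCS`, `score`, `retrieve`, and `build_prompt` are hypothetical, the documents are made-up one-liners, and a simple word-overlap count stands in for embedding similarity.

```python
# Toy sketch of one RAG turn: retrieve context, add chat history, build the prompt.
# Every name below is a hypothetical illustration, not part of this project.

DOCS = [
    "GAS Initial Response frames carry a Dialog Token copied from the request.",
    "A P2P Group Owner acts like an access point for its clients.",
    "Miracast sessions stream video from a source to a sink.",
]

def score(question: str, doc: str) -> int:
    """Count shared lowercase words; a stand-in for embedding similarity."""
    return len(set(question.lower().split()) & set(doc.lower().split()))

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the question."""
    return sorted(DOCS, key=lambda d: score(question, d), reverse=True)[:k]

def build_prompt(question: str, history: list[tuple[str, str]]) -> str:
    """Combine retrieved context, earlier turns, and the new question."""
    context = "\n".join(retrieve(question))
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return f"Context:\n{context}\n\nHistory:\n{turns}\n\nQuestion: {question}"

history = [("What is Miracast?", "A Wi-Fi display standard.")]
prompt = build_prompt("what is the GAS initial response frame", history)
print(prompt)
```

In the real pipeline the prompt would be sent to the language model, and the new question and answer would be appended to `history` for the next turn.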

-----------

Example -

Input Prompt: what is GAS initial response frame

Output: The GAS Initial Response Frame is a type of action frame used in the Generic Advertisement Service (GAS) as defined in the Wi-Fi Direct® Specification. It is part of the communication process between devices, particularly in peer-to-peer (P2P) networking scenarios.

The frame includes several fields such as the Category field (which indicates it is a Public Action Frame), Action field (set to a specific value for a GAS Initial Response), Dialog Token (copied from the corresponding GAS Initial Request), and various other fields that provide details about the response, such as status codes, query response length, and any advertisement protocol information.

In essence, the GAS Initial Response Frame is sent by a device in response to a GAS Initial Request from another device, facilitating the exchange of information related to services and capabilities available within the network.

-----------

This was a better response than the one from ChatGPT. When ChatGPT was asked "what is GAS initial response frame", it gave an irrelevant response:

The GAS Initial Response Frame typically refers to the structured approach used in emergency response or incident management systems to establish an organized and effective initial reaction to emergencies or critical incidents. GAS might stand for a specific organization or methodology depending on the context (e.g., Gasoline Emergency Response or a General Assessment Strategy), but without additional context, I’ll outline a general framework often applied in emergency management scenarios:

https://chatgpt.com/share/679543e5-9464-8000-975c-2142132bf737

In [4]:
import os
import glob
from dotenv import load_dotenv
import gradio as gr
from PyPDF2 import PdfReader

In [5]:
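# imports for langchain
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain_community.document_loaders import PyPDFLoader  # used below to load the PDF specifications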
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [6]:
MODEL = "gpt-4o-mini"

In [9]:
# Load environment variables from a file called .env

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
#openai = OpenAI()

In [None]:
#loader = PyPDFLoader("WFA Specifications/Miracast_Specification_v2.3.pdf")
#pdf_docs = loader.load()

In [None]:
#print(pdf_docs)

In [None]:
pdf_files = glob.glob("WFA Specifications/*.pdf")
documents = []

for pdf_file in pdf_files:
    doc_type = os.path.basename(pdf_file)
    loader = PyPDFLoader(pdf_file)
    pdf_docs = loader.load()
    for doc in pdf_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)


In [None]:
len(documents)

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks: {len(chunks)}")
print(f"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}")
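
As a rough illustration of what `chunk_size` and `chunk_overlap` mean, the sketch below cuts a string into fixed-width windows that share characters with their neighbours. This is a simplification and not how `CharacterTextSplitter` works internally: the library first splits on a separator and then merges the pieces up to the size limit. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

The overlap means text that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some characters twice.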

In [None]:
#start embedding

embeddings = OpenAIEmbeddings()

if os.path.exists("wfa_database"):
    Chroma(persist_directory="wfa_database", embedding_function=embeddings).delete_collection()

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory="wfa_database")
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

In [15]:
#load embeddings

embeddings = OpenAIEmbeddings()

if os.path.exists("wfa_database"):
    vectorstore = Chroma(persist_directory="wfa_database", embedding_function=embeddings)
    print(f"Loaded existing vectorstore with {vectorstore._collection.count()} documents")
else:
    vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory="wfa_database")
    print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Loaded existing vectorstore with 1985 documents


In [16]:
# investigate the vectors

collection = vectorstore._collection
count = collection.count()

sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"There are {count:,} vectors with {dimensions:,} dimensions in the vector store")

There are 1,985 vectors with 1,536 dimensions in the vector store
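
Retrieval works by comparing the query's embedding with these stored vectors and returning the closest chunks. The sketch below ranks three made-up 3-dimensional vectors by cosine similarity, one common measure for this. It is an illustration only: the vectors and chunk names are invented, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones here have 1,536 dimensions.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [1.0, 0.1, 0.9],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.5, 0.5, 0.5],
}

# Rank chunk names from most to least similar to the query.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked)
```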


In [23]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=1, model_name=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [24]:
query = "what are the types of WFD sessions"
result = conversation_chain.invoke({"question": query})
print(result["answer"])

The provided context does not specify or categorize different types of WFD sessions. Therefore, I don't know the answer to your question.


In [25]:
# set up a new conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [26]:
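def chat(question, history):
    # Gradio passes its own `history`, but the chain already tracks past turns in `memory`, so it is unused here
    result = conversation_chain.invoke({"question": question})
    return result["answer"]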

In [27]:
view = gr.ChatInterface(chat, type="messages", title="WFA Assistant").launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.
