# RAG with Galileo, LangChain and GPT
Retrieval-Augmented Generation (RAG) is an architectural approach that can make large language model (LLM) applications more effective by grounding their answers in custom data. In this example, we use LangChain, an orchestrator for language pipelines, to build an assistant that loads information from a PDF document (or, optionally, a web page) and uses it to answer user questions.

## Step 0: Configuring the environment

This step installs the libraries needed to connect to Galileo and to the models.

In [None]:
!pip install langchain-community
!pip install langchain
!pip install langchain_openai
!pip install promptquality #Galileo
!pip install chromadb
!pip install sentence-transformers
!pip install openai
!pip install PyPDF


## Step 1: Data Loading

In this step, we use the LangChain framework to extract the content of a local PDF file containing the product documentation. We have also left commented-out examples showing how to use web loaders to load data from pages on the web.

In [3]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyPDFLoader

In [4]:
file_path = (
    "docs/AIStudioDoc.pdf"
)
pdf_loader = PyPDFLoader(file_path)
pdf_data = pdf_loader.load()

#loader1 = WebBaseLoader("https://www.hp.com/us-en/workstations/ai-studio.html") # If you want to change the knowledge base, just modify this link.
#data1 = loader1.load()

#loader2 = WebBaseLoader("https://zdocs.datascience.hp.com/docs/aistudio")
#data2 = loader2.load()

## Step 2: Creation of Chunks
Here, we split the loaded documents into chunks, so that we have smaller and more specific texts to add to our vector database.

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(pdf_data)
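
To make the roles of `chunk_size` and `chunk_overlap` more concrete, here is a minimal sketch of fixed-size chunking in plain Python. It is a simplified illustration only: the helper `split_with_overlap` is hypothetical and is not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to cut on separators such as paragraph breaks and newlines before falling back to smaller units.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper,
    written only to illustrate the two parameters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=0))
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With an overlap of 0, as in the cell above, each character lands in exactly one chunk. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when a relevant sentence straddles a chunk boundary.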



## Step 3: Retrieval

We transform the texts into embeddings and store them in a vector database. This allows us to run a similarity search and retrieve the documents most relevant to a query.


In [7]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma


In [8]:
embedding = HuggingFaceEmbeddings()


In [9]:
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
retriever = vectordb.as_retriever()
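
The idea behind similarity search can be sketched with a few toy vectors. The example below is an assumption-laden illustration: the three-dimensional vectors, the `toy_index` dictionary, and the `top_k` helper are all made up for this sketch, and cosine similarity is used only because it is easy to follow. The distance metric and index structure that Chroma actually uses may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda text: cosine_similarity(query_vec, index[text]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings: real ones come from the embedding model and have
# hundreds of dimensions.
toy_index = {
    "creating projects": [0.9, 0.1, 0.0],
    "monitoring experiments": [0.1, 0.9, 0.1],
    "sharing with a team": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.3, 0.0]

print(top_k(query_vector, toy_index, k=2))
```

The retriever created above plays the role of `top_k`: it embeds the question with the same embedding model and returns the stored chunks closest to it.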


## Step 4: Model

In this example, we use the OpenAI API to connect to a GPT-3.5 model (`gpt-3.5-turbo-instruct`). Other models could be used instead.

In [10]:
import os
from langchain_openai import OpenAI

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # placeholder: paste your own key and never commit a real one
llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
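
One way to keep credentials out of the notebook is to read them from the environment and fail early with a clear message when they are missing. The sketch below assumes the key has already been exported in the shell that launched the notebook; the helper name `require_env` is hypothetical and not part of any library used here.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value


# Example usage (left commented out so the sketch runs without a real key):
# openai_api_key = require_env("OPENAI_API_KEY")
```

The same helper could be reused for the Galileo key later in the notebook.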


In [None]:
### Code to connect to Hugging Face models

#from langchain_community.llms import HuggingFaceEndpoint
#import yaml
#with open('config.yaml') as file:
    #config = yaml.safe_load(file)
#huggingfacehub_api_token = config["hf_key"]
#repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
#llm = HuggingFaceEndpoint(
   #huggingfacehub_api_token=huggingfacehub_api_token,
   #repo_id=repo_id,
#)


## Step 5: Chain
In this part, we define a pipeline that receives a question, retrieves and formats the relevant context documents, and uses the LLM configured in Step 4 (GPT-3.5 by default, or a Hugging Face model such as Mistral if you enable the alternative cell) to answer the question based on that context. The output is then parsed into a string for easy reading.

In [11]:
from typing import List

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """You are a virtual assistant for a Data Science platform called AI Studio. Answer the question based on the following context:

    {context}

    Question: {query}
    """
prompt = ChatPromptTemplate.from_template(template)

chain = {"context": retriever | format_docs, "query": RunnablePassthrough()} | prompt | llm | StrOutputParser()
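
To show what reaches the model, the sketch below reproduces the first half of the chain (formatting the retrieved documents and filling in the template) in plain Python, without calling a retriever or an LLM. The `Doc` class is a hypothetical stand-in for LangChain's `Document`, the shortened `TEMPLATE` and the `build_prompt` helper are assumptions made for this illustration, and the chunk texts are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document; only page_content is modeled."""
    page_content: str


def format_docs(docs: List[Doc]) -> str:
    """Join the retrieved chunks with blank lines, as the chain above does."""
    return "\n\n".join(d.page_content for d in docs)


# Shortened version of the notebook's template, for illustration only.
TEMPLATE = (
    "Answer the question based on the following context:\n\n"
    "{context}\n\n"
    "Question: {query}\n"
)


def build_prompt(query: str, retrieved: List[Doc]) -> str:
    """Fill the template with the formatted context and the user's question."""
    return TEMPLATE.format(context=format_docs(retrieved), query=query)


retrieved_docs = [Doc("first retrieved chunk"), Doc("second retrieved chunk")]
print(build_prompt("What, exactly, is a workspace?", retrieved_docs))
```

In the actual chain, the dictionary step runs the retriever on the question to obtain the documents, `RunnablePassthrough()` forwards the question unchanged, and the resulting prompt is passed to the LLM and then to `StrOutputParser()`.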

## Step 6: Connect to Galileo
Using Galileo's Prompt Quality library (`promptquality`), we log in with an API key generated in the Galileo console. To get your API key, use this link: https://console.hp.galileocloud.io/api-keys

In [12]:
import promptquality as pq

os.environ['GALILEO_API_KEY'] = "<your-galileo-api-key>"  # placeholder: paste your own key and never commit a real one
galileo_url = "https://console.hp.galileocloud.io/"
pq.login(galileo_url)

👋 You have logged into 🔭 Galileo (https://console.hp.galileocloud.io/) as rafael.borges@hp.com.


Config(console_url=Url('https://console.hp.galileocloud.io/'), username=None, password=None, api_key=SecretStr('**********'), token=SecretStr('**********'), current_user='rafael.borges@hp.com', current_project_id=None, current_project_name=None, current_run_id=None, current_run_name=None, current_run_url=None, current_run_task_type=None, current_template_id=None, current_template_name=None, current_template_version_id=None, current_template_version=None, current_template=None, current_dataset_id=None, current_job_id=None, current_prompt_optimization_job_id=None, api_url=Url('https://api.hp.galileocloud.io/'))

Through callbacks, we choose the metrics we want to monitor in the Galileo console. We then pass a list of queries through the chain we created and log the results to Galileo.


In [13]:
# Create callback handler
prompt_handler = pq.GalileoPromptCallback(
    project_name="AIStudio_RAG",
    scorers=[
        pq.Scorers.context_adherence_luna,
        pq.Scorers.correctness,
        pq.Scorers.toxicity,
        pq.Scorers.sexist,
        pq.Scorers.chunk_attribution_utilization_plus,
    ],
)

# Run your chain experiments across multiple inputs with the galileo callback
inputs = [
    "What is AI Studio?",
    "How to create projects in AI Studio?",
    "How to monitor experiments?",
    "What are the different workspaces available?",
    "What, exactly, is a workspace?",
    "How to share my experiments with my team?",
    "Can I access my Git repository?",
    "Do I have access to files on my local computer?",
    "How do I access files on the cloud?",
    "Can I invite more people to my team?"
]
chain.batch(inputs, config=dict(callbacks=[prompt_handler]))

# publish the results of your run
prompt_handler.finish()

Processing chain run...:   0%|          | 0/5 [00:00<?, ?it/s]

Initial job complete, executing scorers asynchronously. Current status:
rag_nli: Computing 🚧
cost: Computing 🚧
toxicity: Done ✅
sexist: Done ✅
pii: Computing 🚧
protect_status: Done ✅
latency: Done ✅
factuality: Computing 🚧
chunk_attribution_utilization_gpt: Computing 🚧
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/db593b3a-3723-44ed-b53b-f03f0966560d/519ddda7-b754-43e3-9b7f-7df44e0e0710?taskType=12


In [25]:
dir(pq.Scorers)

['__class__',
 '__doc__',
 '__members__',
 '__module__',
 'chunk_attribution_utilization_luna',
 'chunk_attribution_utilization_plus',
 'completeness_luna',
 'completeness_plus',
 'context_adherence_luna',
 'context_adherence_plus',
 'context_relevance',
 'correctness',
 'pii',
 'prompt_injection',
 'prompt_perplexity',
 'sexist',
 'tone',
 'toxicity']