# Web Search with LangChain

This example shows how to use the Python [LangChain](https://python.langchain.com/docs/get_started/introduction) library to send a text-generation request to open-source LLMs and embedding models using the OpenAI SDK, then augment that request with results from a Google web search.

### <u>Requirements</u>
1. Because you will be accessing the LLMs and embedding models through Vector AI Engineering's Kaleidoscope Service (Vector Inference + Autoscaling), you will need to request a KScope API Key:

      Run the following command (replace ```<user_id>``` and ```<password>```) from **within the cluster** to obtain the API Key. The ```access_token``` in the output is your KScope API Key.
  ```bash
  curl -X POST -d "grant_type=password" -d "username=<user_id>" -d "password=<password>" https://kscope.vectorinstitute.ai/token
  ```
2. After obtaining the `.env` configurations, make sure to create the ```.kscope.env``` file in your home directory (```/h/<user_id>```) and set the following env variables:
- For local models through Kaleidoscope (KScope):
    ```bash
    export OPENAI_BASE_URL="https://kscope.vectorinstitute.ai/v1"
    export OPENAI_API_KEY=<kscope_api_key>
    ```
- For OpenAI models:
   ```bash
   export OPENAI_BASE_URL="https://api.openai.com/v1"
   export OPENAI_API_KEY=<openai_api_key>
   ```
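
For readers curious how a file of `export KEY=value` lines can end up in the notebook's environment, here is a minimal sketch. It is not the repo's `load_env_file` implementation, which is not shown in this notebook; `parse_env_lines` is a hypothetical helper and the key value is a placeholder.

```python
import shlex


def parse_env_lines(lines):
    """Parse `export KEY=value` or `KEY=value` lines into a dict (hypothetical helper)."""
    parsed = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Drop an optional leading `export`
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value assignment
        # shlex removes shell-style quotes around the value
        tokens = shlex.split(value)
        parsed[key.strip()] = tokens[0] if tokens else ""
    return parsed


example_lines = [
    "# KScope settings",
    'export OPENAI_BASE_URL="https://kscope.vectorinstitute.ai/v1"',
    "export OPENAI_API_KEY=my-placeholder-key",
]
env = parse_env_lines(example_lines)
print(env["OPENAI_BASE_URL"])
```

Passing the resulting dict to `os.environ.update(...)` would make the values visible to `os.environ["OPENAI_BASE_URL"]` lookups later in the notebook.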

## Set up the RAG workflow environment

#### Import libraries

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
import requests
import sys

from bs4 import BeautifulSoup
from googlesearch import search
from pathlib import Path

from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain.docstore.document import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

#### Load config files

In [3]:
# Add root folder of the rag_bootcamp repo to PYTHONPATH
current_dir = Path().resolve()
parent_dir = current_dir.parent
sys.path.insert(0, str(parent_dir))

from utils.load_secrets import load_env_file
load_env_file()

In [15]:
GENERATOR_BASE_URL = os.environ["OPENAI_BASE_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

#### Set up some helper functions

In [5]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

#### Choose LLM and embedding model

In [None]:
GENERATOR_MODEL_NAME = "Meta-Llama-3.1-8B-Instruct"

## Select one of the two options: 
## - "all-MiniLM-L6-v2" (22M parameters)
## - "bge-base-en-v1.5" (110M parameters)

# EMBEDDING_MODEL_NAME = "bge-base-en-v1.5"
EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"

## Start with a basic generation request without RAG augmentation

Let's start by asking Llama-3.1 about a recent event that it cannot know about because it happened after the model finished training. At the time I'm writing this notebook, in November 2024, Llama-3.1 doesn't know who won the most recent World Series of baseball.

*Who won the 2024 World Series of baseball?*

**The correct answer is the Los Angeles Dodgers, who won in October 2024.**

In [7]:
query = "Who won the 2024 World Series of baseball?"

## Now send the query to the open-source model using KScope

In [8]:
llm = ChatOpenAI(
    model=GENERATOR_MODEL_NAME,
    temperature=0,
    max_tokens=None,
    base_url=GENERATOR_BASE_URL,
    api_key=OPENAI_API_KEY
)
message = [
    ("human", query),
]
try:
    result = llm.invoke(message)
    print(f"Result: \n\n{result.content}")
except Exception as err:
    # Generic exceptions have no `.message` attribute, so check the string form
    if "Error code: 503" in str(err):
        print(f"The model {GENERATOR_MODEL_NAME} is not ready yet.")
    else:
        raise

Result: 

I don't have the ability to predict the future or know the outcome of future events, including the 2024 World Series. The 2024 World Series has not yet occurred, and I don't have any information about it. I can provide information about past World Series winners, though. Would you like to know more about that?


Llama-3.1 admits that it doesn't know the answer, since according to the model it's a future event.

Let's see how we can use RAG to augment our question with a Google web search and get the correct answer.

## Ingestion: Do a Google web search for the query and obtain the necessary information

Parse the web pages returned by a Google search, break their text into smaller chunks, then encode the chunks as vector embeddings.
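
To illustrate what "parsing" a page into plain text involves, here is a small standard-library sketch. It is not how BeautifulSoup works internally and is not used elsewhere in this notebook; the class name `VisibleTextExtractor` and the sample HTML are made up for illustration. It keeps text nodes and skips the contents of `<script>` and `<style>` tags, which would otherwise add noise to the chunks.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


sample_html = (
    "<html><head><title>Recap</title><style>p {color: red}</style></head>"
    "<body><h1>World Series</h1><script>var x = 1;</script>"
    "<p>Game 5 was played on October 30.</p></body></html>"
)
extractor = VisibleTextExtractor()
extractor.feed(sample_html)
extractor.close()
print(extractor.text())
```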

In [9]:
# Do a Google web search and collect the text of each result page
web_documents = []
for result_url in search(query):
    response = requests.get(result_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    web_documents.append(soup.get_text())

# Wrap text as Document object
docs = [Document(page_content=web_txt, metadata={"source": "web"}) for web_txt in web_documents]
print(f"Number of source documents: {len(docs)}")

# Split the result text into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=32)
chunks = text_splitter.split_documents(docs)
print(f"Number of text chunks: {len(chunks)}\n")

Number of source documents: 10
Number of text chunks: 869
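
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its chunk boundaries will differ. The function name `split_with_overlap` is hypothetical. As in the splitter's default configuration, sizes are measured in characters rather than tokens.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring windows share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a boundary still appears mostly intact in at least one chunk, at the cost of some duplicated text.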



#### Define the embeddings model

In [17]:
model_kwargs = {'device': 'cuda', 'trust_remote_code': True}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

print(f"Setting up the embeddings model {EMBEDDING_MODEL_NAME} at {GENERATOR_BASE_URL}")
embeddings = OpenAIEmbeddings(
    model=EMBEDDING_MODEL_NAME,
    # Leverage the RoBERTa tokenizer to make sure that 
    # the chunks stay within the 512-token context window.
    tiktoken_model_name="roberta-base",
    tiktoken_enabled=False
)

Setting up the embeddings model all-MiniLM-L6-v2 at https://kscope.vectorinstitute.ai/v1


## Retrieval: Make the document chunks available via a retriever

The retriever will identify the document chunks that most closely match our original query. 
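
As a rough mental model of "most closely match", the sketch below ranks a few toy vectors by cosine similarity to a query vector. This is an assumption-laden illustration, not the FAISS implementation: the real index may use a different distance metric and far more efficient search, and the 2-D vectors here are made-up numbers rather than real embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunk vectors most similar to the query, best first."""
    q = normalize(query_vec)
    scores = []
    for idx, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        scores.append((sum(a * b for a, b in zip(q, v)), idx))
    # Highest similarity first
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scores[:k]]


# Toy 2-D "embeddings" (made-up numbers, for illustration only)
toy_chunks = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]
toy_query = [0.9, 0.1]
best = top_k(toy_query, toy_chunks, k=2)
print(best)
# [0, 2]
```

The `k` argument plays the same role as `search_kwargs={"k": 5}` in the retriever: it caps how many chunks are handed to the LLM.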

Depending on the number of chunks provided, generating the embeddings might require:
- about 5 minutes, when using "bge-base-en-v1.5" (110M parameters);
- about 1 minute, when using "all-MiniLM-L6-v2" (22M parameters).

In [18]:
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Retrieve the most relevant context from the vector store based on the query
retrieved_docs = retriever.invoke(query)

Let's see what the retriever found. Note that the results are listed in the order the retriever ranked them, best match first.

In [19]:
pretty_print_docs(retrieved_docs)

Document 1:

The 2024 Major League Baseball season (MLB) began on March 20–21 with a two-game series between the Los Angeles Dodgers and the San Diego Padres held in Seoul, South Korea, before the regular season proper ran from March 28 to September 30.[1][2] The 94th All-Star Game was played on July 16 at Globe Life Field in Arlington, Texas,[3] with the American League winning, 5–3.[4] The postseason then began on October 1 and concluded with Game 5 of the World Series on October 30.[5] Going into the season, the
----------------------------------------------------------------------------------------------------
Document 2:

The 2024 Major League Baseball season (MLB) began on March 20–21 with a two-game series between the Los Angeles Dodgers and the San Diego Padres held in Seoul, South Korea, before the regular season proper ran from March 28 to September 30.[1][2] The 94th All-Star Game was played on July 16 at Globe Life Field in Arlington, Texas,[3] with the American League winn

## Now send the query to the RAG pipeline

In [20]:
rag_pipeline = RetrievalQA.from_llm(llm=llm, retriever=retriever)
result = rag_pipeline.invoke(input=query)
print(f"Result: \n\n{result['result']}")

Result: 

The Los Angeles Dodgers won the 2024 World Series.


The model provides the correct answer based on the information from the web.
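
Conceptually, the pipeline places the retrieved chunks into the prompt ahead of the question, so the model can answer from that context rather than from its training data. The sketch below shows the idea with a hypothetical `build_rag_prompt` helper; the template wording is an assumption and not the exact prompt that `RetrievalQA` uses.

```python
def build_rag_prompt(question, context_chunks):
    """Join retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Use the following context to answer the question. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_rag_prompt(
    "Who won the 2024 World Series of baseball?",
    ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"],
)
print(prompt)
```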