# LangChain Samples

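LangChain is an open-source framework for building applications on top of large language models (LLMs). This notebook shows how to use LangChain on Google Cloud (GCP): it loads Google's machine learning guides, indexes them in a vector store, and answers questions about them with Vertex AI.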

In [1]:
# Install dependencies
%pip install --upgrade -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [3]:
# Global Variables (CHANGE THESE)
PROJECT_ID = "PROJECT-ID"

RAW_BUCKET = "GCS-BUCKET"

In [4]:
# Load websites from URLs into LangChain documents

from langchain.document_loaders import RecursiveUrlLoader
from bs4 import BeautifulSoup as Soup

url = "https://developers.google.com/machine-learning/guides/"
loader = RecursiveUrlLoader(
    url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text
)
documents = loader.load()
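
The `max_depth=2` argument above limits how far the loader follows links from the starting URL. As a rough illustration of depth-limited crawling in general, here is a minimal sketch over a hypothetical in-memory site (`TOY_SITE` and `crawl` are made up for this example). It is not how `RecursiveUrlLoader` is implemented, and the loader may count depth differently.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the links it contains.
TOY_SITE = {
    "/guides/": ["/guides/rules", "/guides/text"],
    "/guides/rules": ["/guides/rules/phase1"],
    "/guides/text": [],
    "/guides/rules/phase1": ["/guides/rules/phase1/details"],
    "/guides/rules/phase1/details": [],
}


def crawl(start, max_depth):
    """Breadth-first traversal that stops following links at max_depth."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are kept, but their links are ignored
        for link in TOY_SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


pages = crawl("/guides/", max_depth=2)
print(pages)
```

In this sketch the start page counts as depth 0, so a limit of 2 reaches pages two links away and nothing deeper.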

In [5]:
# Split the documents into chunks so we can embed them

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(documents)
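
To make `chunk_size`, `chunk_overlap`, and `add_start_index` more concrete, here is a simplified sketch of fixed-width chunking with overlap. `RecursiveCharacterTextSplitter` itself splits on separators such as paragraph and line breaks before merging pieces, so this hypothetical `split_with_overlap` helper only illustrates the size and overlap idea, not the library's algorithm.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-width sliding window; returns (start_index, chunk) pairs."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

Each chunk repeats the last four characters of the previous one, which is the role the 200-character overlap plays in the cell above: text that straddles a chunk boundary still appears whole in at least one chunk.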

In [6]:
# Let's see how many chunks we have

len(all_splits)

159

In [7]:
# Embed each chunk with Vertex AI and store the vectors in a Chroma vectorstore so we can retrieve relevant chunks later

from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import Chroma

vectorstore = Chroma.from_documents(documents=all_splits, embedding=VertexAIEmbeddings())

In [8]:
# Now we create a retriever which we can use to actually search the vectorstore

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
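
Conceptually, a similarity retriever embeds the query, scores every stored vector against it, and returns the `k` best matches. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, `TOY_STORE`, and `top_k` are hypothetical, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the texts of the k stored items most similar to the query."""
    scored = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Hypothetical (text, embedding) pairs; real embeddings have hundreds of dimensions.
TOY_STORE = [
    ("chunk about launching without ML", [0.9, 0.1, 0.0]),
    ("chunk about metrics", [0.6, 0.4, 0.1]),
    ("chunk about text classification", [0.0, 0.2, 0.9]),
]

query_vector = [1.0, 0.0, 0.0]
results = top_k(query_vector, TOY_STORE, k=2)
print(results)
```

A production vector store avoids scoring every vector by using an index, but the ranking idea is the same.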

In [9]:
# Let's ask a question to our retriever

retrieved_docs = retriever.get_relevant_documents(
    "What is the first rule of machine learning?"
)

In [10]:
# Let's see what we retrieved!

print(retrieved_docs[0].page_content)

Before Machine Learning
Rule #1: Don’t be afraid to launch a product without machine learning.
Machine learning is cool, but it requires data. Theoretically, you can take data
from a different problem and then tweak the model for a new product, but this
will likely underperform basic
heuristics. If you think that
machine learning will give you a 100% boost, then a heuristic will get you 50%
of the way there.
For instance, if you are ranking apps in an app marketplace, you could use the
install rate or number of installs as heuristics. If you are detecting spam,
filter out publishers that have sent spam before. Don’t be afraid to use human
editing either. If you need to rank contacts, rank the most recently used
highest (or even rank alphabetically). If machine learning is not absolutely
required for your product, don't use it until you have data.
Rule #2: First, design and implement metrics.
Before formalizing what your machine learning system will do, track as much as


In [11]:
# Now we'll create a Vertex AI LLM
from langchain.llms import VertexAI

llm = VertexAI(temperature=0)

In [12]:
# We will use an existing prompt template from LangChain Hub

from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

In [13]:
# Let's see what this prompt looks like then

print(
    prompt.invoke(
        {"context": "filler context", "question": "filler question"}
    ).to_string()
)

Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:


In [14]:
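# Define the same prompt text locally as a PromptTemplate; this is the template the chain below uses

from langchain.prompts import PromptTemplate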

prompt_template = PromptTemplate.from_template(
    "Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise. \nQuestion: {question} \nContext: {context} \nAnswer:"
)

In [15]:
# Now we create a chain that answers a question using the chunks our retriever returns for it

from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)
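
The chain above retrieves documents for the question, formats them into one context string, fills the prompt, and sends the result to the model. The following sketch spells out those steps as plain function calls, using hypothetical stand-ins (`ToyDoc`, `toy_retriever`, `toy_llm`, and a shortened `TEMPLATE`) so it runs without Vertex AI. It mirrors the data flow only and is not how LangChain composes runnables internally.

```python
from dataclasses import dataclass


@dataclass
class ToyDoc:
    # Stand-in for a LangChain document; only page_content is needed here.
    page_content: str


# Shortened, hypothetical version of the prompt used in the notebook.
TEMPLATE = "Question: {question} \nContext: {context} \nAnswer:"


def toy_retriever(question):
    # Stand-in for `retriever`: always returns the same two chunks.
    return [ToyDoc("Rule #1: launch without ML."), ToyDoc("Rule #2: design metrics.")]


def format_docs(docs):
    # Same helper as in the notebook: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def toy_llm(prompt_text):
    # Stand-in for the model: reports the prompt length instead of generating text.
    return f"(model would answer a {len(prompt_text)}-character prompt)"


def toy_rag_chain(question):
    # Step 1: build the input dict, as the first element of the chain does.
    inputs = {"context": format_docs(toy_retriever(question)), "question": question}
    # Step 2: fill the prompt; step 3: call the model.
    prompt_text = TEMPLATE.format(**inputs)
    return prompt_text, toy_llm(prompt_text)


prompt_text, answer = toy_rag_chain("What is the first rule?")
print(prompt_text)
print(answer)
```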

In [14]:
# And now we can actually ask a question to our chain!

for chunk in rag_chain.stream("What is the first rule of machine learning?"):
    print(chunk, end="", flush=True)

 The first rule of machine learning is: Don't be afraid to launch a product without machine learning. 