# LangChain Quickstart
This quickstart guides you through the basics of building and interacting with an LLM application using LangChain.
It was updated in March 2024 and is based on the LangChain docs found here: https://python.langchain.com/docs/get_started/quickstart

In [5]:
%pip install langchain langchain-openai

Note: you may need to restart the kernel to use updated packages.


In [8]:
# # The commented-out lines below are for Google Colab. Uncomment them if you are running this notebook in Colab.
# import getpass
# import os
# import uuid
# from google.colab import userdata

# os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
# os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')

# def _set_if_undefined(var: str):
#     if not os.environ.get(var):
#         os.environ[var] = getpass.getpass(f"Please provide your {var}")


# _set_if_undefined("OPENAI_API_KEY")
# _set_if_undefined("LANGCHAIN_API_KEY")

# Optional, add tracing in LangSmith.
# This will help you visualize and debug the control flow
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = "quickstart"

# Uncomment below (together with `import getpass` above) if you want to enter the API key manually
# os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Load the API key from a .env file (requires the python-dotenv package)
import os
import uuid
import dotenv

# Load the .env file from the default location
dotenv.load_dotenv()

# Retrieve the OpenAI key
openai_api_key = os.getenv("OPENAI_API_KEY")

# If OPENAI_API_KEY is set, export it; otherwise warn the user
if openai_api_key:
    os.environ["OPENAI_API_KEY"] = openai_api_key
else:
    # OPENAI_API_KEY is not set, so report the problem
    print("OPENAI_API_KEY not found in the .env file")
# Optional: enable tracing in LangSmith (requires LANGCHAIN_API_KEY to be set).
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "quickstart"
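The cell above only prints a warning when the key is missing. As a minimal sketch of the "set if undefined" idea from the commented-out Colab code, the helper below prompts for a value only when the variable is unset. The name `ensure_env` and its `prompt` parameter are our own illustration, not part of LangChain or python-dotenv.

```python
import getpass
import os


def ensure_env(var, prompt=getpass.getpass):
    """Return the value of `var`, prompting for it only when it is unset or empty."""
    value = os.environ.get(var)
    if not value:
        # Hypothetical fallback: ask interactively instead of only printing a warning.
        value = prompt(f"Please provide your {var}: ")
        os.environ[var] = value
    return value
```

Calling `ensure_env("OPENAI_API_KEY")` in a notebook would then ask for the key once and reuse it afterwards. The `prompt` argument exists so the helper can be exercised without an interactive terminal.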

In [13]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [38]:
llm.invoke("What great bands are playing music in Austin tonight?")

AIMessage(content="I'm not sure, as I do not have real-time information on music events. I recommend checking local music venues, event websites, or social media pages for information on bands playing in Austin tonight. Some popular venues in Austin known for hosting great live music include Mohawk, Stubbs BBQ, Antone's, and The Continental Club.", response_metadata={'token_usage': {'completion_tokens': 68, 'prompt_tokens': 17, 'total_tokens': 85}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None})

In [37]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world-class event planner with amazing documentation skills."),
    ("user", "{input}")
])

In [24]:
chain = prompt | llm

In [39]:
chain.invoke({"input": "What great bands are playing music in Austin tonight?"})

"As an AI, I do not have real-time information on specific events happening in Austin tonight. However, Austin is known for its vibrant music scene, so there are likely many great bands playing in various venues around the city. I recommend checking local event listings, music venues' websites, or social media platforms to find out which bands are performing in Austin tonight. Enjoy the live music scene in the Live Music Capital of the World!"

The output of a chat model is a message. It's often more convenient to work with strings, so we'll use StrOutputParser to convert the chat message to a string.

In [31]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()

We can now add this parser to the previous chain.

In [32]:
chain = prompt | llm | output_parser
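To build some intuition for what the `|` operator is doing here, the toy sketch below chains three plain functions the same way. It is not how LangChain implements its runnables. The `Step` class and the `toy_*` names are hypothetical stand-ins for the prompt, the model, and the parser, and no API call is made.

```python
class Step:
    """A toy stand-in for a chain component: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical components standing in for the prompt, the model, and the parser.
toy_prompt = Step(lambda inputs: f"You are an event planner.\nUser: {inputs['input']}")
toy_llm = Step(lambda text: {"content": f"(model reply to {len(text)} characters of prompt)"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_llm | toy_parser
result = toy_chain.invoke({"input": "Who is playing tonight?"})
print(type(result).__name__)  # prints "str"
```

The last step is what turns the message-like object into a plain string, which is the role StrOutputParser plays in the real chain.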

We can now invoke it and ask the same question.

In [40]:
chain.invoke({"input": "What great bands are playing music in Austin tonight?"})

"I can't provide real-time information on specific events happening in Austin tonight. However, Austin is known for its vibrant music scene with many great bands and artists frequently performing in various venues across the city. You can check local event listings, music venues' websites, or event apps to find out which bands are playing in Austin tonight. Some popular music venues in Austin include Mohawk, Stubbs, Antone's, and The Continental Club. Enjoy the live music scene in Austin!"

## Retrieval Chain

To properly answer our question, we need to retrieve current information from the web. We can do this with WebBaseLoader, which fetches a web page and loads its contents as documents that we can pass to our model.

In [45]:
# Install beautifulsoup4, which WebBaseLoader uses to parse HTML
%pip install beautifulsoup4

Collecting beautifulsoup4
  Downloading beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Using cached soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Downloading beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.3 soupsieve-2.5
Note: you may need to restart the kernel to use updated packages.


Let's pull the events down from the Austin Chronicle using WebBaseLoader from langchain_community.

In [46]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.austinchronicle.com/events/")

docs = loader.load()
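Roughly speaking, the loader fetches the page and reduces its HTML to plain text before wrapping it in a document. The sketch below shows that reduction step using only the standard library, so it runs without network access. It is a simplified illustration, not WebBaseLoader's actual code, and `TextExtractor`, `html_to_text`, and `sample_html` are names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up HTML used only for this illustration.
sample_html = (
    "<html><body><h1>Events</h1>"
    "<script>var x = 1;</script>"
    "<p>Live music at 9pm</p></body></html>"
)
print(html_to_text(sample_html))
```

On a real events page the extracted text is much longer, which is why it gets split into chunks before being embedded.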

## Vector Store
Now we need to take the data we pulled from the web and convert it into embeddings: numeric vectors that capture the meaning of the text so that we can search it by similarity.


In [48]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
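To see why vectors are useful for search, here is a toy sketch that "embeds" text as word counts and compares vectors with cosine similarity. Real embedding models produce dense vectors learned from data, so `toy_embed` is only a hypothetical stand-in, and the example sentences are made up.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = toy_embed("live bands tonight")
event = toy_embed("three live bands play tonight at the club")
unrelated = toy_embed("city council budget meeting")

# The event listing shares words with the query, so it scores higher.
print(cosine_similarity(query, event) > cosine_similarity(query, unrelated))  # prints True
```

A vector store performs the same kind of comparison at scale: it keeps one vector per chunk and returns the chunks whose vectors are closest to the query vector.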

Let's use FAISS, a simple in-memory vector store, to store the vectors of the events we pulled from the web.

In [49]:
%pip install faiss-cpu

Note: you may need to restart the kernel to use updated packages.


In [52]:
# Now we can build our index

from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
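The splitter breaks long pages into smaller chunks so that each one can be embedded and retrieved on its own. The sketch below is a deliberately simplified version of that idea, using fixed-size character windows with overlap. RecursiveCharacterTextSplitter is more careful about where it breaks the text, so treat `chunk_text` and its parameters as a hypothetical illustration only.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into character windows of at most `chunk_size`, overlapping by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "abcdefghij" * 5  # 50 characters of made-up text
chunks = chunk_text(sample, chunk_size=20, overlap=5)
print(len(chunks))  # prints 3
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, which reduces the chance that a relevant sentence is cut in half.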

Now that we have taken our data, split it up, and stored it in a vector store, we can build our retrieval chain.

In [57]:
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>
                                        
Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

If we wanted to, we could run this chain ourselves by passing documents in directly.


In [59]:
from langchain_core.documents import Document

document_chain.invoke({
    "input": "What great bands are playing music in Austin tonight?",
    "context": [Document(page_content="A great honkey tonk band is always playing at the White Horse")]
})

'A great honkey tonk band is always playing at the White Horse.'
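The word "stuff" in the chain's name refers to placing the text of every document into the `{context}` slot of the prompt. The sketch below imitates that step with plain string formatting. `ToyDocument` and `stuff_prompt` are hypothetical names, the second document is made up, and the joining separator is our own choice rather than a confirmed library default.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document object with a `page_content` field."""
    page_content: str


TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(documents, question):
    # "Stuffing" means every document's text is concatenated into one context block.
    context = "\n\n".join(doc.page_content for doc in documents)
    return TEMPLATE.format(context=context, input=question)


toy_docs = [
    ToyDocument("A great honky-tonk band is always playing at the White Horse"),
    ToyDocument("Doors open at 8pm"),  # made-up second document
]
stuffed = stuff_prompt(toy_docs, "What great bands are playing music in Austin tonight?")
print(stuffed)
```

Because everything goes into a single prompt, this approach only works while the combined documents fit within the model's context window.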

However, we want to pull the documents from our vector store, which was populated by the WebBaseLoader.

In [61]:
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

We can now invoke our chain. This returns a dictionary, with the response from the LLM in the "answer" key.

In [63]:
response = retrieval_chain.invoke({"input": "What fun bands are playing in Austin tonight?"})
print(response["answer"])

Some fun bands playing in Austin tonight are Chris Gage, Church on Monday with Elias Haslanger & Dr. James Polk, and Lonelyland with the Pat Byrne Band.
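Besides the answer, the returned dictionary should also carry the original input and the retrieved documents under the "context" key, which is handy for checking where an answer came from. The helper below is a small sketch for listing those sources. `summarize_context` is our own name, and `fake_response` is a made-up stand-in so the example runs without calling the API.

```python
from types import SimpleNamespace


def summarize_context(response, preview_chars=60):
    """Return one line per retrieved document: its source and a short text preview."""
    lines = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown source")
        # Collapse runs of whitespace so the preview fits on one line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source}: {preview}")
    return lines


# Hypothetical response used only to exercise the helper.
fake_response = {
    "input": "What fun bands are playing in Austin tonight?",
    "context": [
        SimpleNamespace(
            page_content="Example   event listing text",
            metadata={"source": "https://www.austinchronicle.com/events/"},
        )
    ],
    "answer": "Example answer",
}
for line in summarize_context(fake_response):
    print(line)
```

With the real chain, `summarize_context(response)` should work the same way, assuming each retrieved document exposes `page_content` and `metadata` as in the stand-in above.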
