Content sourced from https://www.langchain.com/

In [None]:
!pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai chromadb bs4

In [70]:
import getpass
import os
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings
from langchain_core.prompts import PromptTemplate

# LangChain Basics

## Chains
https://python.langchain.com/docs/modules/chains

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to do this is with LangChain Expression Language (LCEL). The components of the chain are executed sequentially, with the output of each component acting as the input to the next component. The following contains a simple chain where we define a question/prompt, give the prompt to the model, and parse the model's output:

In [55]:
# Insert your OpenAI API key below (requires an OpenAI account): https://platform.openai.com/docs/models
os.environ["OPENAI_API_KEY"] = '' 

In [58]:
# chatprompttemplate: https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html
prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")
# using OpenAI's pre-trained LLM
model = OpenAI()
# StrOutputParser returns the model's output as a plain Python string
# StrOutputParser: https://api.python.langchain.com/en/latest/output_parsers/langchain_core.output_parsers.string.StrOutputParser.html
output_parser = StrOutputParser()

# chains are constructed with the pipe operator: '|'
chain = prompt | model | output_parser

# invoke executes the chain
chain.invoke({"topic": "ice cream"})

'\n\nRobot: Why did the ice cream go to therapy? Because it was feeling a little rocky road.'
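To make the pipe syntax less magical, here is a toy, plain-Python sketch of left-to-right composition. The `Step` class and the three stand-in steps are hypothetical names made up for this illustration; this is not how LangChain implements LCEL, only the same idea in miniature.

```python
class Step:
    """Wraps a one-argument function so steps can be joined with '|'."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: run self first, then feed its output to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt, the model, and the output parser.
fill_prompt = Step(lambda inputs: f"tell me a short joke about {inputs['topic']}")
fake_model = Step(lambda text: {"content": f"(model reply to: {text})"})
parse_output = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_model | parse_output
result = toy_chain.invoke({"topic": "ice cream"})
print(result)
```

Each step only needs to accept the previous step's output, which is why components can be swapped as long as their input and output types line up.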

## Retrieval Augmented Generation 
https://python.langchain.com/docs/use_cases/question_answering/
https://python.langchain.com/docs/use_cases/question_answering/quickstart

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

The code below shows two ways to incorporate RAG with OpenAI's LLM, using the following web article as the external data source: 'LLM Powered Autonomous Agents' by Lilian Weng. The model is not retrained on the article; instead, relevant passages are retrieved and inserted into the prompt. Before a model can retrieve data or generate responses, the new data must be indexed.

### 1. Indexing
![Indexing](useful_figures/rag_indexing.png) 

#### 1A. Indexing: Load

We need to first load the blog post contents. We can use DocumentLoaders for this, which are objects that load in data from a source and return a list of Documents. A Document is an object with some page_content (str) and metadata (dict).

In this case we’ll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing in parameters to the BeautifulSoup parser via bs_kwargs (see BeautifulSoup docs). In this case only HTML tags with class “post-content”, “post-title”, or “post-header” are relevant, so we’ll remove all others.

For more info on types of document loaders that LangChain offers: https://python.langchain.com/docs/modules/data_connection/document_loaders/

In [72]:
# beautiful soup is a package for pulling data out of HTML & XML files: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

# only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)

docs = loader.load()

In [73]:
# We can view the contents of the document object 
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


#### 1B. Indexing: Split
Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. Even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we’ll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

We set add_start_index=True so that the character index at which each split Document starts within the initial Document is preserved as metadata attribute “start_index”.

In [19]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)

# here we are splitting our document object into smaller chunks
all_splits = text_splitter.split_documents(docs)

In [74]:
print(len(all_splits), ',', len(all_splits[0].page_content))

66 , 969
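The effect of `chunk_size`, `chunk_overlap`, and `start_index` can be seen with a much simpler splitter. The sketch below is a fixed-size sliding window, not the recursive separator-based algorithm that RecursiveCharacterTextSplitter uses; `split_with_overlap` is a hypothetical helper written only for this illustration.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Return (start_index, chunk) pairs using a fixed-size sliding window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# A 2500-character dummy document made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)
for start_index, chunk in pieces:
    print(start_index, len(chunk))
```

Consecutive chunks share their last and first 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk.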


#### 1C. Indexing: Store
Now we need to index our 66 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of “similarity” search to identify the stored splits with the most similar embeddings to our query embedding. The simplest similarity measure is cosine similarity — we measure the cosine of the angle between each pair of embeddings (which are high dimensional vectors).

We can embed and store all of our document splits in a single command using the Chroma vector store and OpenAIEmbeddings model.

In [76]:
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
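As a small aside, the cosine similarity mentioned above can be written out by hand. This is a minimal sketch on short hand-made vectors (real embeddings have hundreds or thousands of dimensions), and it assumes neither vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # a scaled copy of the query vector
orthogonal = [0.0, 3.0, 0.0]      # shares no direction with the query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because only the angle matters, the scaled copy scores approximately 1 even though its length differs, while the orthogonal vector scores 0.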

### 2. Retrieval and Generation
![Retrieval](useful_figures/rag_retrieval_generation.png) 

#### 2A: Retrieval
Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a Retriever interface which wraps an index that can return relevant Documents given a string query.

The most common type of Retriever is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with VectorStore.as_retriever():

In [78]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

In [79]:
# the retriever searches for the chunks most similar to the query
# note: retriever.invoke works because a retriever is itself a Runnable, just like a chain
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

In [28]:
# the retriever returns the k=6 most similar chunks, so the length below is 6
len(retrieved_docs)

6

In [80]:
print(retrieved_docs[0].page_content)

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
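The top-k behaviour of a retriever can be sketched without a vector store. In the example below, `overlap_score` and `retrieve` are hypothetical helpers, and word overlap stands in for embedding similarity; a real retriever compares embedding vectors instead.

```python
import heapq


def overlap_score(query, chunk):
    """Stand-in similarity: shared words divided by total distinct words."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    union = query_words | chunk_words
    return len(query_words & chunk_words) / len(union) if union else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest score, best first."""
    return heapq.nlargest(k, chunks, key=lambda chunk: overlap_score(query, chunk))


chunks = [
    "task decomposition breaks a big task into smaller steps",
    "the agent stores memories in a vector database",
    "chain of thought prompts the model to think step by step",
]
top = retrieve("what is task decomposition", chunks, k=2)
print(top)
```

Note that `retrieve` always returns `k` chunks, even when some of them score zero. The number of results reflects `k`, not how many chunks are actually relevant.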


#### 2B: Generate
Let’s put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.

We’ll use the gpt-3.5-turbo OpenAI chat model, but any LangChain LLM or ChatModel could be substituted in.

In [81]:
# defining our model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [82]:
# rag-prompt: https://smith.langchain.com/hub/rlm/rag-prompt?organizationId=bf831fe5-56d8-572d-a9ca-fea6f3d0f30e
# here we are using a pre-made prompt from langchain
prompt = hub.pull("rlm/rag-prompt")

In [87]:
# the prompt we pulled contains {context} and {question} for us to customize
# .to_messages() converts the LangChain prompt value object into a list of messages
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
example_messages

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]

We’ll use the LCEL Runnable protocol to define the chain, allowing us to:

- pipe together components and functions in a transparent way
- automatically trace our chain in LangSmith
- get streaming, async, and batched calling out of the box

RunnablePassthrough: RunnablePassthrough passes inputs through unchanged or with extra keys added. It is typically used in conjunction with RunnableParallel to assign data to a new key in the map. Called on its own, RunnablePassthrough() simply takes the input and passes it through. (https://python.langchain.com/docs/expression_language/how_to/passthrough)

In [89]:
# this function will be useful to join together the relevant chunks we retrieved earlier
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# now we can construct our chain using all of the components we defined
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [90]:
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This process involves transforming big tasks into multiple manageable tasks to enhance model performance. It can be done through simple prompting, task-specific instructions, or with human inputs.
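The dict at the start of `rag_chain` sends the same question to every key. The plain-Python analogy below illustrates that fan-out; `run_parallel`, `passthrough`, and `fake_retriever` are hypothetical names for this sketch and are not LangChain functions.

```python
def passthrough(value):
    """Return the input unchanged."""
    return value


def run_parallel(steps, value):
    """Give the same input to every step and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}


def fake_retriever(question):
    # Hypothetical stand-in for `retriever | format_docs`.
    return f"(chunks relevant to: {question})"


prompt_inputs = run_parallel(
    {"context": fake_retriever, "question": passthrough},
    "What is Task Decomposition?",
)
print(prompt_inputs)
```

The resulting dict has exactly the `context` and `question` keys that the prompt template expects, which is why the question has to be passed through rather than dropped after retrieval.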

### Customizing the prompt
As shown above, we can load prompts (e.g., this RAG prompt) from the prompt hub. The prompt can also be easily customized:

In [41]:
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

custom_rag_prompt.pretty_print()

Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:


In [42]:
rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is the process of breaking down complex tasks into smaller and simpler steps to make them more manageable. This can be done using techniques like Chain of Thought or Tree of Thoughts to guide the model in decomposing hard tasks effectively. Task decomposition can be achieved through simple prompting, task-specific instructions, or human inputs. Thanks for asking!'
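To see roughly what text reaches the model, the template can be filled in by hand. This sketch uses an abbreviated copy of the template and two made-up chunks, and it assumes the substitution behaves like Python's `str.format`, which matches the default f-string style template format.

```python
# Abbreviated version of the custom template, for illustration only.
template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}

Helpful Answer:"""

# Made-up chunks standing in for the retrieved documents.
chunks = ["First retrieved chunk.", "Second retrieved chunk."]

# Same joining rule as format_docs: blank line between chunks.
filled = template.format(
    context="\n\n".join(chunks),
    question="What is Task Decomposition?",
)
print(filled)
```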

### Using Chains to query databases
https://python.langchain.com/docs/use_cases/sql/quickstart

In [53]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
from langchain_community.utilities import SQLDatabase
import os

To run the following code, you must have the file Chinook.db in the same folder as this Jupyter notebook. To create it, run the following commands one at a time (the first two in a shell, the last at the sqlite3 prompt):

curl -O https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql

sqlite3 Chinook.db

.read Chinook_Sqlite.sql

In [46]:
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM Artist LIMIT 10;")

sqlite
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


"[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]"

Let’s create a simple chain that takes a question, turns it into a SQL query, executes the query, and uses the result to answer the original question.

#### Convert question to SQL query
The first step in a SQL chain or agent is to take the user input and convert it to a SQL query. LangChain comes with a built-in chain for this: create_sql_query_chain

In [47]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

In [48]:
chain = create_sql_query_chain(llm, db)
response = chain.invoke({"question": "How many employees are there"})
response

'SELECT COUNT("EmployeeId") AS "TotalEmployees" FROM "Employee"'

In [49]:
db.run(response)

'[(8,)]'

#### Execute SQL query
Now that we’ve generated a SQL query, we’ll want to execute it. This is the most dangerous part of creating a SQL chain. Consider carefully if it is OK to run automated queries over your data. Minimize the database connection permissions as much as possible. Consider adding a human approval step to your chains before query execution.

We can use the QuerySQLDataBaseTool to easily add query execution to our chain:

In [51]:
execute_query = QuerySQLDataBaseTool(db=db)
write_query = create_sql_query_chain(llm, db)
chain = write_query | execute_query
chain.invoke({"question": "How many employees are there"})

'[(8,)]'

#### Answer the question
Now that we’ve got a way to automatically generate and execute queries, we just need to combine the original question and SQL query result to generate a final answer. We can do this by passing question and result to the LLM once more:

In [52]:
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

answer = answer_prompt | llm | StrOutputParser()
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer
)

chain.invoke({"question": "How many employees are there"})

'There are a total of 8 employees.'
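The way the chained `assign` calls build up the inputs for the answer prompt can be mimicked with plain dictionaries. In this sketch, `assign`, `write_query`, and `execute_query` are hypothetical stand-ins (the query writer returns fixed SQL instead of calling an LLM), and the in-memory table holds made-up rows.

```python
import sqlite3


def assign(state, **steps):
    """Return a copy of state with one new key per step, each computed from state."""
    new_state = dict(state)
    for key, step in steps.items():
        new_state[key] = step(state)
    return new_state


# In-memory demo database with a hypothetical two-row Employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Employee (Name) VALUES (?)", [("Ada",), ("Grace",)])


def write_query(state):
    # Stand-in for the LLM: always returns the same SQL, ignoring the question.
    return 'SELECT COUNT("EmployeeId") FROM "Employee"'


def execute_query(state):
    # Runs the SQL stored under "query" and returns the rows as a string.
    return str(conn.execute(state["query"]).fetchall())


state = {"question": "How many employees are there"}
state = assign(state, query=write_query)     # adds "query"
state = assign(state, result=execute_query)  # adds "result", using "query"
print(state)
```

After both steps the dict carries `question`, `query`, and `result`, which are the three variables the answer prompt needs.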

## Related Topics
- Retain conversations with a chat history: https://python.langchain.com/docs/use_cases/question_answering/chat_history

- Using a locally hosted LLM for RAG: https://python.langchain.com/docs/use_cases/question_answering/local_retrieval_qa

- Streaming intermediate steps: https://python.langchain.com/docs/use_cases/question_answering/streaming