# LangChain and OpenAI Integration Notebook

This notebook demonstrates how to use LangChain with OpenAI for several NLP tasks. We'll walk through installation, setup, and usage of the main LangChain components step by step.

## Installation

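First, install the required packages. Judging by the imports used in this notebook, `requirements.txt` needs to cover at least `langchain`, `langchain-openai`, `langchain-community`, `langchain-text-splitters`, `python-dotenv`, and a FAISS build such as `faiss-cpu`.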

In [None]:
pip install -r requirements.txt

## Environment Setup

Next, set the environment variables needed for LangSmith tracing and for OpenAI and Tavily API access.

### Example .env

LANGCHAIN_TRACING_V2=true

LANGCHAIN_API_KEY={{LANGCHAIN_API_KEY}}

OPENAI_API_KEY={{OPENAI_API_KEY}}

TAVILY_API_KEY={{TAVILY_API_KEY}}

In [1]:
from dotenv import load_dotenv

# Load the .env file
load_dotenv()


True
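`load_dotenv()` returns `True` when it finds and loads a `.env` file, but that does not confirm every key is present. A minimal sketch of a follow-up check, assuming the three key names from the example `.env` above (the helper `missing_keys` is hypothetical, not part of `python-dotenv`):

```python
import os

# Key names taken from the example .env; adjust to your own setup.
REQUIRED_KEYS = ("LANGCHAIN_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Report any gaps before making API calls.
missing = missing_keys(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a missing key early instead of as an authentication error later in the notebook.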

## LangChain and OpenAI Setup

Import and initialize the ChatOpenAI model from LangChain.


In [2]:
from langchain_openai import ChatOpenAI

# ChatOpenAI reads OPENAI_API_KEY from the environment loaded above
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

### Testing LangSmith

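Invoke the model directly and ask how LangSmith can help with testing. The model has no retrieved context at this point, so expect a generic answer like the one below.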


In [4]:
llm.invoke("how can langsmith help with testing?")



AIMessage(content='Langsmith can help with testing in the following ways:\n\n1. Language Support: Langsmith can provide language-specific libraries and tools for automated testing, making it easier to write and run tests in different programming languages.\n\n2. Test Automation: Langsmith can automate the process of running tests, making it easier to test code changes and updates quickly and efficiently.\n\n3. Test Reporting: Langsmith can generate detailed reports on test results, helping developers identify and fix bugs more effectively.\n\n4. Integration Testing: Langsmith can help with integration testing by providing tools and frameworks for testing how different components of a system work together.\n\n5. Performance Testing: Langsmith can help with performance testing by providing tools and libraries for measuring and analyzing the performance of code and systems.\n\nOverall, Langsmith can streamline the testing process and provide developers with the tools they need to ensure t

## Creating a Chat Prompt Template

Use `ChatPromptTemplate` to define a prompt for our LLM.


In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer. Keep responses short and concise"),
    ("user", "{input}")
])

### Combining Prompt and Model

Create a chain that combines the prompt and the model.


In [None]:
chain = prompt | llm 

### Invoking the Chain

Invoke the chain to see how LangSmith can assist with testing.


In [None]:
chain.invoke({"input": "how can langsmith help with testing?"})

## Parsing Output

Use `StrOutputParser` to convert the model's `AIMessage` output into a plain string.


In [None]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

chain = prompt | llm | output_parser
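The `|` operator feeds the output of each step into the next one. The sketch below imitates that idea in plain Python so it runs without an API key. The `Step` class and the three fake steps are hypothetical stand-ins, not LangChain's implementation.

```python
class Step:
    """Wrap a function so steps can be chained with `|` (hypothetical helper)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Run this step first, then hand its output to the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the prompt, the model, and the output parser.
fake_prompt = Step(lambda inputs: f"system: be concise\nuser: {inputs['input']}")
fake_llm = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_llm | fake_parser
result = toy_chain.invoke({"input": "hello"})
print(result)
```

The last step plays the role of the output parser: it unwraps the message object so the caller receives a plain string.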

### Invoking the Chain with Output Parser

Invoke the chain with the output parser. The result is now a plain string rather than an `AIMessage` object.


In [None]:
chain.invoke({"input": "how can langsmith help with testing?"})

## Document Loading

Load documents from a webpage using `WebBaseLoader`.


In [None]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()

## Embeddings

Initialize the OpenAI embedding model. The embeddings themselves are computed later, when the vector store is built from the split documents.


In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

## Text Splitting

### Recursive Character Text Splitter

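The `RecursiveCharacterTextSplitter` splits documents into chunks of bounded character length. It prefers to break at natural boundaries such as paragraph breaks, then line breaks, then spaces, so that chunks stay coherent. This is useful for processing large texts in manageable pieces.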

### How `split_documents` Works

1. **Initialization**:
   - The `RecursiveCharacterTextSplitter` is initialized with parameters such as `chunk_size` and `chunk_overlap`. These parameters determine the maximum size of each chunk and the amount of overlap between consecutive chunks, respectively.

2. **Splitting Process**:
   - **Character Limit**: The text is split into chunks based on a specified character limit (`chunk_size`). This ensures that each chunk does not exceed a certain length, which can be beneficial for processing in NLP models that have input size limitations.
   - **Recursive Splitting**: If a segment of text still exceeds the `chunk_size`, the splitter recursively breaks it down further until each chunk is within the desired size.
   - **Overlap**: To maintain context between chunks, a certain number of characters from the end of one chunk are included at the beginning of the next chunk. This overlap (`chunk_overlap`) helps in preserving the continuity of the text across chunks.

3. **Final Output**:
   - The method returns a list of smaller documents or text chunks. Each chunk is a substring of the original document, and these chunks are designed to be contextually coherent and manageable in size.
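The splitting process above can be sketched in plain Python. This is a simplified illustration and not the library's code: it assumes a fixed separator order (paragraph break, line break, space), omits `chunk_overlap`, and the function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split `text` into pieces of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb ccc\n\nddd eee fff ggg"
print(recursive_split(sample, chunk_size=8))
```

Coarse separators are tried first, so a paragraph that already fits stays intact. Finer separators are used only on the pieces that are still too long.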

### Example of How It Works

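Here's a simplified example that ignores separators and treats the splitter as a fixed-size sliding window. Actual chunk boundaries will shift to the nearest separator.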

Suppose you have a document that is a long paragraph of 1000 characters, and you set `chunk_size` to 200 and `chunk_overlap` to 50.

- **First Chunk**: Characters 1 to 200.
- **Second Chunk**: Characters 151 to 350 (including 50 characters overlap with the first chunk).
- **Third Chunk**: Characters 301 to 500, and so on.
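The arithmetic behind this example is a window that advances by `chunk_size - chunk_overlap` characters each step. A minimal sketch, assuming the same idealized sliding window (the helper `window_ranges` is hypothetical and returns 1-indexed, inclusive ranges to match the list above):

```python
def window_ranges(total, chunk_size, chunk_overlap):
    """Return (first_char, last_char) pairs for an idealized sliding window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if total <= 0:
        return []
    stride = chunk_size - chunk_overlap  # how far each new chunk advances
    ranges, start = [], 0
    while True:
        end = min(start + chunk_size, total)
        ranges.append((start + 1, end))  # convert to 1-indexed, inclusive
        if end >= total:
            break
        start += stride
    return ranges


for first, last in window_ranges(total=1000, chunk_size=200, chunk_overlap=50):
    print(f"Characters {first} to {last}")
```

With these settings the window advances 150 characters per step, so the 1000-character paragraph yields seven chunks, the last one shorter than the rest.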

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

## Retrieval-Augmented Generation

### Process Overview

1. **Query the Retriever**: The input question is passed to the retriever to find relevant documents.
2. **Combine and Process Documents**: The retrieved documents are combined into a context for the LLM.
3. **Generate Response with LLM**: The LLM generates a response based on the context and the question.
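The first two steps can be sketched without any external service. In the toy version below, word counts stand in for real embeddings and cosine similarity stands in for the vector-store lookup. All function names and sample documents are hypothetical; the final prompt would be sent to the LLM in step 3.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=2):
    """Step 1: return the k documents most similar to the question."""
    query = embed(question)
    return sorted(docs, key=lambda doc: cosine(query, embed(doc)), reverse=True)[:k]


def build_prompt(question, context_docs):
    """Step 2: combine the retrieved documents into one context block."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the following question based only on the provided context:\n\n"
        f"<context>\n{context}\n</context>\n\nQuestion: {question}"
    )


docs = [
    "LangSmith supports testing of LLM applications",
    "FAISS is a vector store",
    "The weather in SF is foggy",
]
question = "how can langsmith help with testing?"
print(build_prompt(question, retrieve(question, docs, k=1)))
```

Only the best-matching document reaches the prompt, which is the property the retrieval chain relies on.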

### Benefits of This Approach

- **Relevance**: The retriever passes the LLM only the chunks whose embeddings are closest to the query, which focuses the response on pertinent material.
- **Efficiency**: Because only the top-matching chunks are sent, the prompt stays small even when the document collection is large.
- **Contextual Accuracy**: The prompt tells the LLM to answer only from the retrieved context, so the response is grounded in the source documents rather than in the model's training data alone.

In short, the retriever finds the relevant text and the LLM turns it into an answer.

In [None]:
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

### Invoking the Retrieval Chain

Invoke the retrieval chain to answer the question based on the context from the retrieved documents.


In [None]:
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])

# LangSmith offers several features that can help with testing:...

## History-Aware Retriever

Use history-aware retrieval to incorporate conversation history into the context for more accurate and relevant responses.


In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query

prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
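As the LangChain documentation describes it, the history-aware retriever sends the input straight to the retriever when there is no chat history, and otherwise asks the LLM to rewrite the conversation into a standalone search query. A minimal sketch of that branching, assuming a fake rewrite function in place of the LLM call (both function names are hypothetical):

```python
def history_aware_query(chat_history, user_input, rewrite):
    """Pick the search query: the raw input, or an LLM-style rewrite of the conversation."""
    if not chat_history:
        return user_input
    return rewrite(chat_history, user_input)


def fake_rewrite(chat_history, user_input):
    # Stand-in for the LLM: prepend the most recent human message as the topic.
    last_human = [text for role, text in chat_history if role == "human"][-1]
    return f"{last_human} {user_input}"


history = [
    ("human", "Can LangSmith help test my LLM applications?"),
    ("ai", "Yes!"),
]
print(history_aware_query([], "Tell me how", fake_rewrite))
print(history_aware_query(history, "Tell me how", fake_rewrite))
```

A follow-up such as "Tell me how" carries no searchable topic on its own, which is why the rewrite step matters once a conversation is under way.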

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

### Combining History and Context

Combine the chat history and document context to generate more relevant responses.


In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

In [None]:
chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

## Creating a Retriever Tool

Create a tool for retrieving information about LangSmith.


In [None]:
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "langsmith_search",
    "Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)

## Additional Tools

Integrate additional tools such as Tavily Search for broader information retrieval.


In [None]:
from langchain_community.tools.tavily_search import TavilySearchResults

search = TavilySearchResults()
tools = [retriever_tool, search]

## Creating an OpenAI Functions Agent

Create an agent that uses OpenAI functions and integrates the defined tools.


In [None]:
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-functions-agent")

# You need to set OPENAI_API_KEY environment variable or pass it as argument `api_key`.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

### Invoking the Agent

Invoke the agent to see how LangSmith can help with testing.


In [None]:
agent_executor.invoke({"input": "how can langsmith help with testing?"})

### Additional Queries

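Ask a question that the LangSmith documentation cannot answer. The agent should reach for the Tavily search tool instead of the retriever tool.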


In [None]:
agent_executor.invoke({"input": "what is the weather in SF?"})

### History-Aware Agent

Pass `chat_history` to the same agent executor so it can resolve follow-up questions from the conversation so far.


In [None]:
chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
agent_executor.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})