In [1]:
!pip install langchain==0.0.208 deeplake==3.9.27 openai==0.27.8 tiktoken

Collecting langchain==0.0.208
  Downloading langchain-0.0.208-py3-none-any.whl.metadata (13 kB)
Collecting deeplake==3.9.27
  Downloading deeplake-3.9.27.tar.gz (618 kB)
     618.7/618.7 kB 10.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting openai==0.27.8
  Downloading openai-0.27.8-py3-none-any.whl.metadata (13 kB)
Collecting tiktoken
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain==0.0.208)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl.metadata (22 kB)
Collecting langchainplus-sdk>=0.0.13 (from langchain==0.0.208)
  Downloading langchainplus_sdk-0.0.20-py3-none-any.whl.metadata (8.7 kB)
Collecting openapi-schema-pydantic<2.0,

In [7]:
import os
from langchain.llms import OpenAI
from google.colab import userdata

# Get the API key from Colab's userdata
openai_api_key = userdata.get('OPENAI_API_KEY')

# Set it as an environment variable
os.environ["OPENAI_API_KEY"] = openai_api_key

# Now initialize LangChain
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.9)

In [9]:
text = "Suggest a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities."
print(llm(text))



Monday:
Warm up: 5-10 minutes of brisk walking or light jog

Circuit training:
- 10 push-ups
- 10 squats
- 10 lunges (each leg)
- 30 seconds of high knees
- 10 mountain climbers
Repeat circuit 3 times

Outdoor activity:
- 30 minute hike or trail run

Tuesday:
Warm up: 5-10 minutes of dynamic stretches (leg swings, arm circles, etc.)

Cardio:
- 25 minutes of cycling
- 10 minutes of jump rope
- 15 minutes of stair climbing
- 5 minutes of sprints (30 seconds on, 30 seconds off)
Cool down: 5-10 minutes of walking

Wednesday:
Active rest day:
- Take a leisurely bike ride or walk in nature for 30-45 minutes

Thursday:
Warm up: 5-10 minutes of light jog and dynamic stretches

HIIT (High Intensity Interval Training):
- 1 minute of burpees
- 1 minute of jumping jacks
- 1 minute of high knees
- 1 minute of mountain climbers
- 1 minute of rest
Repeat circuit 3 times

Outdoor activity:
- 30 minutes of swimming


## Chains

In LangChain, a chain is an end-to-end wrapper around multiple individual components, providing a way to accomplish a common use case by combining these components in a specific sequence. The most commonly used type of chain is the `LLMChain`, which consists of a `PromptTemplate`, a model (either an LLM or a ChatModel), and an optional output parser.

The `LLMChain` works as follows:

1. Takes one or more input variables.
2. Uses the `PromptTemplate` to format the input variables into a prompt.
3. Passes the formatted prompt to the model (LLM or ChatModel).
4. If an output parser is provided, it uses the `OutputParser` to parse the output of the LLM into a final format.
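
The four steps above can be sketched in plain Python, without LangChain. This only illustrates the flow and is not LangChain's implementation: `fake_llm`, `parse_numbered_list`, and `run_chain` are hypothetical names, and the "model" is a stub that returns a canned string.

```python
from typing import Callable, List, Optional


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): returns a canned completion."""
    return "1. GreenWave Bottles\n2. EarthEco Water"


def parse_numbered_list(text: str) -> List[str]:
    """Toy output parser: strip the '1. ' style prefix from each non-empty line."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Keep what follows the first ". "; fall back to the whole line.
        _, _, rest = line.partition(". ")
        items.append(rest or line)
    return items


def run_chain(template: str, llm: Callable[[str], str],
              parser: Optional[Callable[[str], List[str]]] = None, **variables):
    prompt = template.format(**variables)   # step 2: format the prompt
    raw = llm(prompt)                        # step 3: call the model
    return parser(raw) if parser else raw    # step 4: optional parsing


names = run_chain(
    "What is a good name for a company that makes {product}?",
    fake_llm,
    parse_numbered_list,
    product="eco-friendly water bottles",   # step 1: input variables
)
print(names)
```

Swapping `fake_llm` for a real model call leaves the rest of the flow unchanged, which is the point of wrapping the steps in a chain.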

In the next example, we demonstrate how to create a chain that generates a possible name for a company that produces eco-friendly water bottles. By using LangChain's `LLMChain`, `PromptTemplate`, and `OpenAI` classes, we can easily define our prompt, set the input variables, and generate creative outputs.

In [11]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.9)

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("eco-friendly water bottles"))



1. GreenWave Bottles
2. EarthEco Water
3. PureCycle Bottles
4. Sustainable Sips
5. EcoHydrate Co.
6. ReNewBottle Co.
7. PlanetPak Bottles
8. EcoAqua Solutions
9. EarthSwell Bottles
10. CleanCanteen Co.


## The Memory

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherence throughout the interaction, enabling the AI to generate more relevant and accurate responses. A memory class such as `ConversationBufferMemory` acts as a wrapper around `ChatMessageHistory`: it extracts the stored messages and passes them to the chain so that each new response is generated with the earlier turns in context.
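
To make the idea concrete, here is a minimal plain-Python sketch of a buffer-style memory. It is an assumption-laden illustration, not LangChain's code: `SimpleBufferMemory` and `build_prompt` are hypothetical names, and the prompt layout only loosely imitates the one a conversation chain prints in verbose mode.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Toy conversation buffer: keeps every turn and renders it as text."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def save_turn(self, human: str, ai: str) -> None:
        # Store both sides of one exchange, in order.
        self.messages.append(("Human", human))
        self.messages.append(("AI", ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


def build_prompt(memory: SimpleBufferMemory, new_input: str) -> str:
    """Prepend the stored history to the new user input."""
    return f"Current conversation:\n{memory.buffer}\nHuman: {new_input}\nAI:"


memory = SimpleBufferMemory()
memory.save_turn("Tell me about yourself.", "I am an AI assistant.")
prompt = build_prompt(memory, "What can you do?")
print(prompt)
```

Because the whole history is re-sent on every turn, a buffer like this grows with the conversation, which is worth keeping in mind for long sessions.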

In [13]:
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

# Start the conversation
conversation.predict(input="Tell me about yourself.")

# Continue the conversation
conversation.predict(input="What can you do?")
conversation.predict(input="How can you help me with data analysis?")

# Display the conversation
print(conversation.memory.buffer)



> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Tell me about yourself.
AI:

> Finished chain.


> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Tell me about yourself.
AI:  Well, I am an artificial intelligence program designed and created by a team of programmers. I am constantly learning and improving my abilities through algorithms and data analysis. My main purpose is to assist and provide information to user

## Deep Lake Vector Store

In [15]:
# Get the activeloop token from Colab's userdata
activeloop_token = userdata.get('ACTIVELOOP_TOKEN')

# Set it as an environment variable
os.environ["ACTIVELOOP_TOKEN"] = activeloop_token

activeloop_org_id = userdata.get('ACTIVELOOP_ORG_ID')

os.environ["ACTIVELOOP_ORG_ID"] = activeloop_org_id

In [16]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

texts = [
    "Glorry Sibomana was born on 21 January 2003",
    "Louis XIV was born on 5 September 1638"
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

my_activeloop_org_id = activeloop_org_id
my_activeloop_dataset_name = "langchain_course_from_zero_to_hero"

dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Your Deep Lake dataset has been successfully created!


Creating 2 embeddings in 1 batches of size 2:: 100%|██████████| 1/1 [00:09<00:00,  9.49s/it]

Dataset(path='hub://glorrysibomana758/langchain_course_from_zero_to_hero', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (2, 1)      str     None   
 metadata     json      (2, 1)      str     None   
 embedding  embedding  (2, 1536)  float32   None   
    id        text      (2, 1)      str     None   





['8ed9e910-c6b8-11ef-b8f8-0242ac1c000c',
 '8ed9ea50-c6b8-11ef-b8f8-0242ac1c000c']

Now, let's create a `RetrievalQA` chain:

In [17]:
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever()
)
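
With `chain_type="stuff"`, the retrieved documents are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates that idea in plain Python under loud assumptions: the toy `retrieve` function ranks documents by shared words rather than by embedding similarity, and `tokenize`, `retrieve`, `stuff_prompt`, and the prompt wording are all hypothetical, not LangChain's or Deep Lake's.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query.
    A real vector store compares embedding vectors instead (assumption labeled)."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def stuff_prompt(question: str, context_docs: List[str]) -> str:
    """Concatenate all retrieved documents into one prompt with the question."""
    context = "\n\n".join(context_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


docs_sketch = [
    "Glorry Sibomana was born on 21 January 2003",
    "Louis XIV was born on 5 September 1638",
]
question = "When was Glorry born?"
top_docs = retrieve(question, docs_sketch, k=1)
print(stuff_prompt(question, top_docs))
```

Stuffing is the simplest strategy and works while the retrieved text fits in the model's context window.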

Next, let's create an agent that uses the `RetrievalQA` chain as a tool:

In [18]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Finally, we can use the agent to ask a question:

In [19]:
response = agent.run("When was Glorry born?")
print(response)



> Entering new  chain...
I should use the Retrieval QA System to find the answer
Action: Retrieval QA System
Action Input: "When was Glorry born?"
Observation: Glorry was born on 21 January 2003.
Thought: I now know the final answer
Final Answer: Glorry was born on 21 January 2003.

> Finished chain.
Glorry was born on 21 January 2003.


Let’s add an example of reloading an existing vector store and adding more data.

We first reload an existing vector store from Deep Lake that's located at a specified dataset path. Then, we load new textual data and split it into manageable chunks. Finally, we add these chunks to the existing dataset, creating and storing corresponding embeddings for each added text segment:



In [20]:
# load the existing Deep Lake dataset and specify the embedding function
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# create new documents
texts = [
    "Kato steven was born on 19th Feb 1997",
    "Kimberly Ann Nambugu was born on 21st March 2000"
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Deep Lake Dataset in hub://glorrysibomana758/langchain_course_from_zero_to_hero already exists, loading from the storage


Creating 2 embeddings in 1 batches of size 2:: 100%|██████████| 1/1 [00:09<00:00,  9.38s/it]

Dataset(path='hub://glorrysibomana758/langchain_course_from_zero_to_hero', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
 embedding  embedding  (4, 1536)  float32   None   
    id        text      (4, 1)      str     None   
 metadata     json      (4, 1)      str     None   
   text       text      (4, 1)      str     None   





['cc447f2a-c6ba-11ef-b8f8-0242ac1c000c',
 'cc448092-c6ba-11ef-b8f8-0242ac1c000c']

We then recreate our previous agent and ask a question that can be answered only by the last documents added.

In [21]:
# instantiate the wrapper class for the OpenAI completion model
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)

# create a RetrievalQA chain on top of the db's retriever
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=db.as_retriever()
)

# instantiate a tool that uses the RetrievalQA chain
tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

# create an agent that uses the tool
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Let’s now test our agent with a new question.

In [22]:
response = agent.run("When was Kato steven born?")
print(response)



> Entering new  chain...
I should use the Retrieval QA System to find the answer
Action: Retrieval QA System
Action Input: "When was Kato Steven born?"
Observation: Kato Steven was born on 19th Feb 1997.
Thought: I now know the final answer
Final Answer: Kato Steven was born on 19th Feb 1997.

> Finished chain.
Kato Steven was born on 19th Feb 1997.


## Agents in LangChain

In LangChain, agents are high-level components that use large language models (LLMs) to determine which actions to take and in what order. An action is either using a tool and observing its output, or returning a response to the user. Tools are functions that perform specific tasks, such as a Google search, a database lookup, or a Python REPL.

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done.
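
The decide, act, observe loop described above can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's agent executor: `scripted_llm`, `lookup`, and `run_agent` are hypothetical, and the "LLM" is a hard-coded function that calls one tool and then finishes.

```python
from typing import Callable, Dict, List, Tuple


def scripted_llm(scratchpad: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM's decision: returns (action, action_input).
    It calls the lookup tool once, then finishes with what it observed."""
    if not scratchpad:
        return ("lookup", "Glorry")
    return ("finish", scratchpad[-1])


def lookup(name: str) -> str:
    """Toy tool: answer from a tiny in-memory table."""
    facts = {"Glorry": "Glorry was born on 21 January 2003."}
    return facts.get(name, "No record found.")


def run_agent(llm: Callable[[List[str]], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]]) -> str:
    scratchpad: List[str] = []
    while True:
        action, action_input = llm(scratchpad)      # the LLM decides
        if action == "finish":
            return action_input                      # return to the user
        observation = tools[action](action_input)   # take the action
        scratchpad.append(observation)               # observe, then repeat


answer = run_agent(scripted_llm, {"lookup": lookup})
print(answer)
```

In a real agent the decision step is a model call whose text output is parsed into an action, so the loop can take a variable number of turns.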

Several types of agents are available in LangChain:

1. The `zero-shot-react-description` agent uses the ReAct framework to decide which tool to employ based purely on the tool's description. It requires a description for each tool.
2. The `react-docstore` agent engages with a docstore through the ReAct framework. It needs two tools: a Search tool and a Lookup tool. The Search tool finds a document, and the Lookup tool searches for a term in the most recently discovered document.
3. The `self-ask-with-search` agent employs a single tool named Intermediate Answer, which is capable of looking up factual responses to queries. It mirrors the setup of the original self-ask with search paper, where a Google search API was provided as the tool.
4. The `conversational-react-description` agent is designed for conversational situations. It uses the ReAct framework to select a tool and uses memory to remember past conversation interactions.

In our example, the agent uses the Google Search tool to look up recent news and generates a response based on what it finds.

First, set the environment variables `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` so that Google Search can be used via the API.
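
A quick check for missing variables can save a confusing error later. The helper below is a small sketch using only the standard library; `missing_env_vars` is a hypothetical name, not part of LangChain.

```python
import os

REQUIRED_VARS = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before creating the search wrapper:", ", ".join(missing))
```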

In [26]:
from langchain.llms import OpenAI

from langchain.agents import AgentType
from langchain.agents import load_tools
from langchain.agents import initialize_agent

from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)

# Remember to set the environment variables GOOGLE_API_KEY and
# GOOGLE_CSE_ID to be able to use Google Search via the API.
# Get the API key from Colab's userdata
google_api_key = userdata.get('GOOGLE_API_KEY')

# Set it as an environment variable
os.environ["GOOGLE_API_KEY"] = google_api_key

google_cse_id = userdata.get('GOOGLE_CSE_ID')

os.environ["GOOGLE_CSE_ID"] = google_cse_id

search = GoogleSearchAPIWrapper()

The `Tool` object represents a specific capability or function the system can use. In this case, it's a tool for performing Google searches.

It is initialized with three parameters:

- `name` parameter: A string that serves as a unique identifier for the tool. In this case, the name of the tool is "google-search".
- `func` parameter: The function that the tool executes when called. In this case, it's the `run` method of the `search` object, which performs a Google search and returns the results as text.
- `description` parameter: A string that briefly explains what the tool does. Here it states that the tool is helpful when you need to use Google to answer questions about current events.
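
The description matters because the agent's prompt lists each tool by name and description, and the LLM chooses from that list. A rough plain-Python sketch of the idea follows; `ToolSpec` and `render_tools` are hypothetical stand-ins for illustration, the search function is a stub, and the exact prompt wording LangChain uses may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolSpec:
    """Hypothetical stand-in holding the same three fields as a tool."""
    name: str
    func: Callable[[str], str]
    description: str


def render_tools(tools: List[ToolSpec]) -> str:
    """Render one 'name: description' line per tool for the agent prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


tools_sketch = [
    ToolSpec(
        name="google-search",
        func=lambda query: f"(search results for {query!r})",  # stub, no network
        description="useful for when you need to search google to answer questions about current events",
    )
]
print(render_tools(tools_sketch))
```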

In [27]:
tools = [
    Tool(
        name = "google-search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    )
]

Next, we create an agent that uses our Google Search tool:

- `initialize_agent()`: creates and initializes an agent. An agent is a component that determines which actions to take based on user input, such as using a tool or returning a response to the user.
- `tools`: the list of `Tool` objects that the agent can use.
- `llm`: the language model the agent uses to decide what to do next.
- `agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION`: selects the `zero-shot-react-description` agent type, which uses the ReAct framework to decide which tool to use based only on each tool's description.
- `verbose=True`: makes the agent print detailed information about what it is doing. This is useful for debugging and for understanding what happens under the hood.
- `max_iterations=6`: limits the number of iterations the agent can perform before stopping. This prevents the agent from looping indefinitely, which could otherwise run up unwanted API costs.
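
The effect of an iteration cap can be shown with a tiny stand-alone loop. This is a sketch of the concept only: `run_with_cap`, its `step` callable, and the stop message are hypothetical and do not reproduce LangChain's executor.

```python
def run_with_cap(step, max_iterations=6):
    """Call step() until it returns a final answer or the cap is reached.
    step is a hypothetical callable returning an answer string or None."""
    for iteration in range(1, max_iterations + 1):
        result = step()
        if result is not None:
            return result, iteration
    # The cap was hit without a final answer.
    return "Agent stopped due to iteration limit.", max_iterations


# A step that never finishes, to show the cap taking effect.
outcome, used = run_with_cap(lambda: None, max_iterations=6)
print(outcome, used)
```

Each iteration of a real agent usually costs at least one model call, so the cap is also an upper bound on spend per question.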

In [28]:
agent = initialize_agent(tools,
                         llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True,
                         max_iterations=6)

And now, we can check out the response:

In [31]:
response = agent("What's the latest news about the plane crash in South Korea?")
print(response['output'])



> Entering new  chain...
I should search for news articles about the plane crash
Action: google-search
Action Input: "South Korea plane crash"
Observation: 22 hours ago ... A video captured the moments before a plane carrying 181 passengers crashed while landing in an airport in South Korea. 2 days ago ... More than 170 killed after South Korean jet crash-lands at airport. Here's what we know · Efforts are made to lift the wreckage of an aircraft ... 10 hours ago ... A total of 179 of the 181 people travelling on the Boeing 737-800 were killed, with just two survivors - both cabin staff - pulled from the ... 5 hours ago ... Boeing Shares Drop After South Korean Crash ... A widely used Boeing aircraft, the 737-800, was involved in Sunday's crash-landing of a Jeju Air ... 12 hours ago ... The Jeju Air crash in South Korea on Sunday is the deadliest aviation disaster to hit the country since 1997, when a Korean Airlines flight ... 12 hours ago ... 




Observation: 18 hours ago ... The passenger plane crash that killed 179 in South Korea comes at a time of ... confirmed two Thai passengers were among the fatalities. In ... 24 hours ago ... Reporting from Seoul, South Korea. The National Fire Agency has confirmed a death toll of 167 so far. Officials are raising the death toll as ... 1 day ago ... Death toll from plane fire at South Korean airport rises to 120South Korea's national fire agency says 120 people have been confirmed dead after ... 9 hours ago ... SEOUL, South Korea — Two people survived and 179 were confirmed ... As the confirmed death toll ticked up, anxious families gathered at ... 12 hours ago ... It is the deadliest aviation disaster to hit South Korea since 1997, when a Korean Airlines Boeing 747 crashed in the Guam jungle, leaving 228 ... 23 hours ago ... South Korea plane crash live: 151 of 181 pax confirmed dead, toll likely to rise. Confirmed death toll of the 'Jeju Air Flight 2216 Accident ... 2 da

In summary, Agents in LangChain help decide which actions to take based on user input. The example demonstrates initializing and using a "zero-shot-react-description" agent with a Google search tool.

## Tools in LangChain

LangChain provides a variety of tools for agents to interact with the outside world. These tools can be used to create custom agents that perform various tasks, such as searching the web, answering questions, or running Python code. In this section, we will discuss the different tool types available in LangChain and provide examples of creating and using them.

In our example, we define two tools for a LangChain agent: a Google Search tool and a language model tool that acts as a text summarizer. The Google Search tool, built on `GoogleSearchAPIWrapper`, handles queries about recent events. The summarizer tool uses a language model to summarize text. The agent chooses between them depending on the nature of the user's query.

Let’s import the necessary libraries.

In [33]:
from langchain.llms import OpenAI
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import initialize_agent, AgentType

We then instantiate an `LLMChain` specifically for text summarization.

In [34]:
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)

prompt = PromptTemplate(
    input_variables=["query"],
    template="Write a summary of the following text: {query}"
)

summarize_chain = LLMChain(llm=llm, prompt=prompt)

Next, we create the tools that our agent will use.

In [36]:
# Remember to set the environment variables GOOGLE_API_KEY and
# GOOGLE_CSE_ID to be able to use Google Search via the API
# (see the Agents section above).
search = GoogleSearchAPIWrapper()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for finding information about recent events"
    ),
    Tool(
        name="Summarizer",
        func=summarize_chain.run,
        description="useful for summarizing texts"
    )
]

We are now ready to create our agent that leverages two tools.

In [37]:
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Let’s run the agent with a question that asks for the latest news about Jimmy Carter's death, followed by a summary of the results.

In [43]:
response = agent("What's the latest news about Jimmy carter death? Then please summarize the results.")
print(response['output'])



> Entering new  chain...
I should use the search tool to find information about recent events.
Action: Search
Action Input: "Jimmy Carter death"
Observation: 19 hours ago ... 29, 2024) — Jimmy Carter, 39th president of the United States and winner of the 2002 Nobel Peace Prize, died peacefully Sunday, Dec. 29, at ... 13 hours ago ... It is my solemn duty to announce officially the death of James Earl Carter, Jr., the thirty-ninth President of the United States, on December 29 ... James Earl Carter Jr. (October 1, 1924 – December 29, 2024) was an American politician and humanitarian who served as the 39th president of the United ... 19 hours ago ... Carter, the 39th president of the United States, died in Plains on Sunday at age 100. Carter was president from 1977 to 1981, but he was ... 8 hours ago ... Former US President Jimmy Carter has died aged 100, the centre he founded has confirmed. The former peanut farmer lived longer than any ... 18 ho

Notice how the agent first used the `Search` tool to look for recent information about Jimmy Carter's death and then used the `Summarizer` tool to write a summary.

LangChain provides an expansive toolkit that integrates various functions to improve the functionality of conversational agents. Here are some examples:

- `SerpAPI`: This tool is an interface for the SerpAPI search engine, allowing the agent to perform robust online searches to pull in relevant data for a conversation or task.
- `PythonREPLTool`: This tool lets an agent write and execute Python code, which opens up a wide range of possibilities for computation and interaction within the conversation.

If you wish to add more specialized capabilities to your LangChain conversational agent, the platform offers the flexibility to create custom tools. By following the general tool creation guidelines provided in the LangChain documentation, you can develop tools tailored to the specific needs of your application.
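
As a minimal sketch of a custom tool, the function below takes a string and returns a string, which is the shape the `func` parameter of the tools above expects. The `word_count` function and the `WordCounter` name are hypothetical examples, and the wrapping step is shown only as a comment so the snippet runs without LangChain installed.

```python
def word_count(text: str) -> str:
    """Count whitespace-separated words; the result is returned as a string
    because the agent feeds tool output back into a text prompt."""
    return str(len(text.split()))


print(word_count("LangChain agents call tools with a string input"))

# With LangChain installed, the function could be wrapped like the tools above:
# Tool(name="WordCounter", func=word_count,
#      description="useful for counting the words in a text")
```

As with the built-in tools, a clear description is what lets the agent decide when this tool is the right one to call.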