# LangChain: Introduction

This seminar will introduce various concepts and tools that can help to build AI projects with LangChain. It is based on materials from activeloop.ai 

We will cover:
- LLM invoking in LangChain
- LLM Chains
- Memory
- Retrieval-augmented generation
- Agents & Tools

## LLM

The fundamental component of LangChain involves invoking an LLM with a specific input. To illustrate this, we'll explore a simple example. Let's imagine we are building a service that suggests personalized workout routines based on an individual's fitness goals and preferences.

To accomplish this, we will first need to import the LLM wrapper.

In [38]:
from langchain_openai.chat_models import ChatOpenAI  # Chat wrapper
from dotenv import load_dotenv

# load ENVs
load_dotenv('.env') # returns True if 

True

In [21]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

The temperature parameter in OpenAI models manages the randomness of the output. When set to 0, the output is mostly predetermined and suitable for tasks requiring stability and the most probable result. Possible values: 0-2

In [22]:
text = "Suggest a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities."

# the easiest way is to call is to call chat model as LLM
print(llm.call_as_llm(text))

Here is a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities:

1. Warm-up: Start with a 5-10 minute brisk walk or jog to get your heart rate up and prepare your muscles for the workout.

2. Interval Running: Find a nearby park or trail with varying terrain and do interval running. Alternate between jogging and sprinting for 1-2 minutes each, for a total of 20-30 minutes.

3. Hill Sprints: Find a steep hill and do hill sprints. Sprint up the hill as fast as you can, then walk or jog back down to recover. Repeat for 10-15 minutes.

4. Cycling: Go for a long bike ride on a scenic route with hills and varying terrain. Aim for at least 30-60 minutes of continuous cycling.

5. Stair Climbing: Find a set of stairs or a hill with stairs and do stair climbing intervals. Run up the stairs as fast as you can, then walk back down to recover. Repeat for 15-20 minutes.

6. Hiking: Go for a challenging hike in a nearby nature reserve or

Chat models (e.g. GPT-4, GPT-3.5-turbo) takes messages as inputs. There are 3 types of messages:
* SystemMessage: message that instructs/profiles LLM
* HumanMessage: request from human
* AIMessage: usually response from LLM

In [23]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

# Create list of messages
messages = [HumanMessage(text)]

# __call__ is invoked
llm(messages)

AIMessage(content="Here is a personalized workout routine for someone looking to improve cardiovascular endurance and prefers outdoor activities:\n\n1. Warm-up: Start with a 5-10 minute brisk walk or jog to get your heart rate up and prepare your muscles for the workout.\n\n2. Running intervals: Find a flat or hilly trail or path and alternate between running at a moderate pace for 2-3 minutes and sprinting for 1 minute. Repeat this interval for 20-30 minutes.\n\n3. Cycling: Go for a 30-45 minute bike ride on a scenic route with varying terrain to challenge your cardiovascular system.\n\n4. Hiking: Find a challenging hiking trail with inclines and uneven terrain to improve your endurance and strength. Aim for a 60-90 minute hike at a steady pace.\n\n5. Stair climbing: Find a set of stairs or a hill to climb repeatedly for 15-20 minutes. This will help improve your cardiovascular endurance and leg strength.\n\n6. Cool down: Finish your workout with a 5-10 minute walk or jog to gradually

## The Chains

In LangChain, a chain is an end-to-end wrapper around multiple individual components, providing a way to accomplish a common use case by combining these components in a specific sequence. The most commonly used type of chain is the LLMChain, which consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser.

The LLMChain works as follows:

1. Takes (multiple) input variables.
2. Uses the PromptTemplate to format the input variables into a prompt.
3. Passes the formatted prompt to the model (LLM or ChatModel).
4. If an output parser is provided, it uses the OutputParser to parse the output of the LLM into a final format.

In [24]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain("eco-friendly water bottles"))

{'product': 'eco-friendly water bottles', 'text': 'EcoHydrate'}


In [25]:
# __call__ is deprecated, better to use invoke
print(chain.invoke("eco-friendly water bottles"))

{'product': 'eco-friendly water bottles', 'text': 'EcoHydrate'}


## The Memory

In LangChain, Memory refers to the mechanism that stores and manages the conversation history between a user and the AI. It helps maintain context and coherency throughout the interaction, enabling the AI to generate more relevant and accurate responses. Memory, such as ConversationBufferMemory, acts as a wrapper around ChatMessageHistory, extracting the messages and providing them to the chain for better context-aware generation.

In [26]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

# Start the conversation
conversation.predict(input="Tell me about yourself.")

# Continue the conversation
conversation.predict(input="What can you do?")
conversation.predict(input="How can you help me with data analysis?")

# Display the conversation
print(conversation)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Tell me about yourself.
AI:[0m

[1m> Finished chain.[0m


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Tell me about yourself.
AI: I am an artificial intelligence designed to assist with answering questions and providing information. I have access to a vast amount of data and can provide detailed responses on a wide range of topics. I am c

In this output, you can see the memory being used by observing the "Current conversation" section. After each input from the user, the conversation is updated with both the user's input and the AI's response. When the AI generates its next response, it will use this conversation history as context, making its responses more coherent and relevant.

Deep Lake provides storage for embeddings and their corresponding metadata in the context of LLM apps. It enables hybrid searches on these embeddings and their attributes for efficient data retrieval. It also integrates with LangChain, facilitating the development and deployment of applications.

In order to use Deep Lake, you first have to register on the Activeloop website and redeem your API token. Here are the steps for doing it:

1. Sign up for an account on Activeloop's platform. You can sign up at Activeloop's website. After specifying your username, click on the “Sign up” button. You should now see your homepage.
2. You should now see a “Create API token” button at the top of your homepage. Click on it, and you’ll get redirected to the “API tokens” page. This is where you can generate, manage, and revoke your API keys for accessing Deep Lake.
3. Click on the "Create API token" button. Then, you should see a popup asking for a token name and an expiration date. By default, the token expiration date is set so that the token expires after one day from its creation, but you can set it further in the future if you want to keep using the same token for the whole duration of the course. Once you’ve set the token name and its expiration date, click on the “Create API token” button.
4. You should now see a green banner saying that the token has been successfully generated, along with your new API token, on the “API tokens” page. To copy your token to your clipboard, click on the square icon on its right.


Now that you have your API token, you can conveniently store under the `ACTIVELOOP_TOKEN` key in the environment variable to retrieve it automatically by the Deep Lake libraries whenever needed. You can also save the token to its environment variable with Python, like in the following code snippet.

In [117]:
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# instantiate embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create the documents
texts = ["""Drawing points don't snap to OHLC values correctly
            If you want drawing points to snap to OHLC values of bars automatically, then you need to enable Magnet Mode on the left-hand side of your chart (the icon should turn blue when enabled). 
            
            Magnet sensitivity can be adjusted through the magnet mode menu.
            
            Strong Magnet pulls the drawing points to chart values (bars), regardless of the distance between the points and chart values.
            
            Weak Magnet pulls the drawing points to chart values (bars) when you are drawing near them.
            """,
         """How can I remove all drawings & indicators from the chart? Click the left panel button to open the drop-down menu, then click Remove Drawing Tools & Indicators.""",
         """I want to quickly find a tool or feature
            There is a quick search dialog to search for tools and features across the platform. You can use it to find and choose any drawing tool, change chart, or scale settings, and also use numerous features for charting and data analysis.
            To open the dialog you can use the “Quick search” button on the top toolbar, but it’s much easier to use hotkeys ⌘ + K (macOS), or Ctrl + K (Windows)."""
]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents(texts)

# create Deep Lake dataset
dataset_path = f"database/rag_website"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Using embedding function is deprecated and will be removed in the future. Please use embedding instead.
Creating 3 embeddings in 1 batches of size 3:: 100%|██████████| 1/1 [00:01<00:00,  1.05s/it]

Dataset(path='database/rag_website', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (3, 1)      str     None   
 metadata     json      (3, 1)      str     None   
 embedding  embedding  (3, 1536)  float32   None   
    id        text      (3, 1)      str     None   





['4f96ffca-ea88-11ee-a342-4ecb56d42115',
 '4f9700ba-ea88-11ee-a342-4ecb56d42115',
 '4f9700e2-ea88-11ee-a342-4ecb56d42115']

Now, let's create a RetrievalQA chain

In [118]:
retrieval_qa = RetrievalQA.from_chain_type(
	llm=llm,
	chain_type="stuff",
	retriever=db.as_retriever(),
    verbose=True
)

In [119]:
retrieval_qa.run('How to find a tool?')



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'To quickly find a tool or feature, you can use the quick search dialog on the platform. You can open the dialog by using the "Quick search" button on the top toolbar or by using the hotkeys ⌘ + K (macOS) or Ctrl + K (Windows). This search function allows you to search for and choose any drawing tool, change chart, or scale settings, and access various features for charting and data analysis.'

In [None]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

tools = [
    Tool(
        name="Retrieval QA System",
        func=retrieval_qa.run,
        description="Useful for answering questions."
    ),
]

agent = initialize_agent(
	tools,
	llm,
	agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
	verbose=True
)

## Agents in LangChain

In LangChain, agents are high-level components that use language models (LLMs) to determine which actions to take and in what order. An action can either be using a tool and observing its output or returning it to the user. Tools are functions that perform specific duties, such as Google Search, database lookups, or Python REPL.

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done.

#### Several types of agents are available in LangChain:

- The `zero-shot-react-description` agent uses the ReAct framework to decide which tool to employ based purely on the tool's description. It necessitates a description of each tool.
- The `react-docstore agent` engages with a docstore through the ReAct framework. It needs two tools: a Search tool and a Lookup tool. The Search tool finds a document, and the Lookup tool searches for a term in the most recently discovered document.
- The `self-ask-with-search` agent employs a single tool named Intermediate Answer, which is capable of looking up factual responses to queries. It is identical to the original self-ask with the search paper, where a Google search API was provided as the tool.
- The `conversational-react-description` agent is designed for conversational situations. It uses the ReAct framework to select a tool and uses memory to remember past conversation interactions.

#### Tools in LangChain

LangChain provides a variety of tools for agents to interact with the outside world. These tools can be used to create custom agents that perform various tasks, such as searching the web, answering questions, or running Python code. In this section, we will discuss the different tool types available in LangChain and provide examples of creating and using them.

In our example, two tools are being defined for use within a LangChain agent: a Google Search tool and a Language Model tool acting specifically as a text summarizer. The Google Search tool, using the GoogleSearchAPIWrapper, will handle queries that involve finding recent event information. The Language Model tool leverages the capabilities of a language model to summarize texts. These tools are designed to be used interchangeably by the agent, depending on the nature of the user's query.

In our example we will write a simple agent that will operate on 4 tools:
- User statistics tool
- Behavior analysis tool
- Get users' activity tool

In [59]:
from langchain.agents import AgentType
from langchain.agents import initialize_agent
from langchain.agents import Tool

In [54]:
import pandas as pd
df = pd.read_excel('data/Online Retail 10k.xlsx')
df = df[~df.CustomerID.isnull()]
df['CustomerID'] = df['CustomerID'].apply(int)
df.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
1,1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
2,2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom
3,3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
4,4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom


In [57]:
df['CustomerID'].value_counts()

CustomerID
17850    297
12748    133
15574    121
18118    118
17059    117
        ... 
17949      1
15769      1
16353      1
15100      1
15823      1
Name: count, Length: 323, dtype: int64

Let's define functions for our tools

In [113]:
def get_top_customer_ids(topn: str) -> str:
    topn = int(topn)
    customer_id2counts = df.CustomerID.value_counts()
    res = []
    for customer_id, count in dict(customer_id2counts[:topn]).items():
        res.append(f'{customer_id}: {count}')
    return '\n'.join(res)


def get_customer_id(n: str) -> str:
    n = int(n)
    return df.iloc[n].CustomerID


def get_user_activity(customer_id: str) -> str: 
    customer_id = customer_id.split(' ')[-1]
    customer_id = int(customer_id)
    cur_df = df[df.CustomerID == customer_id]
    res = [f'{desc}: {quantity}' for desc, quantity in zip(cur_df.Description, cur_df.Quantity)]
    return '\n'.join(res)


prompt_behavior_analysis = PromptTemplate(
    input_variables=["activity"],
    template="Given the history of user activity and purchase on the platform: {activity}.\n\n Summarize the main interests of the user, just include main interests based on his/her activity",
)

bahavior_analysis_chain = LLMChain(llm=llm, prompt=prompt)

In [114]:
tools = [
    Tool(
        name = "get-customer-id",
        func=get_customer_id,
        description="useful to get customer id by its rank among all users",
    ),
    Tool(
        name = "behavior-analysis-by-given-activity",
        func=bahavior_analysis_chain.run,
        description="useful to analyse the user behavior by given activity (activity analysis)"
    ),
    Tool(
        name = "top-customer-ids",
        func=get_top_customer_ids,
        description="useful to get sorted customer list of top-n users"
    ),
    Tool(
        name = "get-user_activity",
        func=get_user_activity,
        description="useful to get activity of given user by customer_id. It takes only number (customer_id) as input",
        
    ),
]

Next, we create an agent that uses our tools:

1. `initialize_agent()`: This function call creates and initializes an agent. An agent is a component that determines which actions to take based on user input. These actions can be using a tool, returning a response to the user, or something else.
2. `tools`:  represents the list of Tool objects that the agent can use.
agent="zero-shot-react-description": The "zero-shot-react-description" type of an Agent uses the ReAct framework to decide which tool to use based only on the tool's description.
3. `verbose`=True: when set to True, it will cause the Agent to print more detailed information about what it's doing. This is useful for debugging and understanding what's happening under the hood.
4. `max_iterations`=6: sets a limit on the number of iterations the Agent can perform before stopping. It's a way of preventing the agent from running indefinitely in some cases, which may have unwanted monetary costs.

In [115]:
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True,
                         max_iterations=6)

In [102]:
agent.run('Provide the list of the most popular-20 users?')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI need to find the top 20 users based on their activity.
Action: top-customer-ids
Action Input: 20[0m
Observation: [38;5;200m[1;3m17850: 297
12748: 133
15574: 121
18118: 118
17059: 117
15880: 106
17841: 98
14911: 93
13174: 89
15426: 86
17968: 85
12472: 84
14606: 84
18041: 81
17920: 81
14573: 76
13081: 74
15061: 73
12433: 73
14729: 71[0m
Thought:[32;1m[1;3mI have the list of the top 20 users based on their activity.
Final Answer: 17850, 12748, 15574, 18118, 17059, 15880, 17841, 14911, 13174, 15426, 17968, 12472, 14606, 18041, 17920, 14573, 13081, 15061, 12433, 14729[0m

[1m> Finished chain.[0m


'17850, 12748, 15574, 18118, 17059, 15880, 17841, 14911, 13174, 15426, 17968, 12472, 14606, 18041, 17920, 14573, 13081, 15061, 12433, 14729'

In [103]:
agent.run('Provide the list of actions of user with id 14729?')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI need to get the activity of user with id 14729.
Action: get-user_activity
Action Input: 14729[0m
Observation: [36;1m[1;3mCLASSIC FRENCH STYLE BASKET NATURAL: 1
CRAZY DAISY HEART DECORATION: 1
GIN + TONIC DIET METAL SIGN: 1
IVORY GIANT GARDEN THERMOMETER: 1
CARD HOLDER GINGHAM HEART: 2
SMALL STRIPES CHOCOLATE GIFT BAG : 16
S/6 WOODEN SKITTLES IN COTTON BAG: 1
HAND WARMER RED RETROSPOT: 4
HAND WARMER UNION JACK: 3
FRENCH WC SIGN BLUE METAL: 1
PINK UNION JACK  LUGGAGE TAG: 1
TRAY, BREAKFAST IN BED: 1
PIZZA PLATE IN BOX: 2
HEART OF WICKER LARGE: 1
PINK HEART DOTS HOT WATER BOTTLE: 1
PLACE SETTING WHITE HEART: 14
SET OF 4 NAPKIN CHARMS LEAVES   : 4
SET OF 4 NAPKIN CHARMS STARS   : 4
SET OF 4 NAPKIN CHARMS CUTLERY: 3
CARAVAN SQUARE TISSUE BOX: 2
PLASTERS IN TIN SKULLS: 1
PLASTERS IN TIN SPACEBOY: 1
BUTTON BOX : 1
PLASTERS IN TIN VINTAGE PAISLEY : 2
12 MESSAGE CARDS WITH ENVELOPES: 6
SMALL RED BABUSHKA NOTEBOOK : 1
CUPCAKE LACE

'The list of actions of user with id 14729 are as follows:\n- CLASSIC FRENCH STYLE BASKET NATURAL: 1\n- CRAZY DAISY HEART DECORATION: 1\n- GIN + TONIC DIET METAL SIGN: 1\n- IVORY GIANT GARDEN THERMOMETER: 1\n- CARD HOLDER GINGHAM HEART: 2\n- SMALL STRIPES CHOCOLATE GIFT BAG : 16\n- S/6 WOODEN SKITTLES IN COTTON BAG: 1\n- HAND WARMER RED RETROSPOT: 4\n- HAND WARMER UNION JACK: 3\n- FRENCH WC SIGN BLUE METAL: 1\n- PINK UNION JACK  LUGGAGE TAG: 1\n- TRAY, BREAKFAST IN BED: 1\n- PIZZA PLATE IN BOX: 2\n- HEART OF WICKER LARGE: 1\n- PINK HEART DOTS HOT WATER BOTTLE: 1\n- PLACE SETTING WHITE HEART: 14\n- SET OF 4 NAPKIN CHARMS LEAVES   : 4\n- SET OF 4 NAPKIN CHARMS STARS   : 4\n- SET OF 4 NAPKIN CHARMS CUTLERY: 3\n- CARAVAN SQUARE TISSUE BOX: 2\n- PLASTERS IN TIN SKULLS: 1\n- PLASTERS IN TIN SPACEBOY: 1\n- BUTTON BOX : 1\n- PLASTERS IN TIN VINTAGE PAISLEY : 2\n- 12 MESSAGE CARDS WITH ENVELOPES: 6\n- SMALL RED BABUSHKA NOTEBOOK : 1\n- CUPCAKE LACE PAPER SET 6: 5\n- PACK 3 BOXES CHRISTMAS PANNE

In [116]:
agent.run('Analyze activity of a user on 30th place')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI need to first get the customer id of the user on 30th place.
Action: get-customer-id
Action Input: 30[0m
Observation: [36;1m[1;3m12583[0m
Thought:[32;1m[1;3mNow that I have the customer id, I can analyze the activity of that user.
Action: get-user_activity
Action Input: 12583[0m
Observation: [36;1m[1;3mALARM CLOCK BAKELIKE PINK: 24
ALARM CLOCK BAKELIKE RED : 24
ALARM CLOCK BAKELIKE GREEN: 12
PANDA AND BUNNIES STICKER SHEET: 12
STARS GIFT TAPE : 24
INFLATABLE POLITICAL GLOBE : 48
VINTAGE HEADS AND TAILS CARD GAME : 24
SET/2 RED RETROSPOT TEA TOWELS : 18
ROUND SNACK BOXES SET OF4 WOODLAND : 24
SPACEBOY LUNCH BOX : 24
LUNCH BOX I LOVE LONDON: 24
CIRCUS PARADE LUNCH BOX : 24
CHARLOTTE BAG DOLLY GIRL DESIGN: 20
RED TOADSTOOL LED NIGHT LIGHT: 24
 SET 2 TEA TOWELS I LOVE LONDON : 24
VINTAGE SEASIDE JIGSAW PUZZLES: 12
MINI JIGSAW CIRCUS PARADE : 24
MINI JIGSAW SPACEBOY: 24
MINI PAINT SET VINTAGE : 36
POSTAGE: 3[0m
Thought

'The activity of the user on 30th place includes various items such as alarm clocks, sticker sheets, gift tape, inflatable globe, vintage card game, tea towels, snack boxes, lunch boxes, bags, night light, jigsaw puzzles, mini paint set, and postage.'