# 🦜️🔗 1 - LangChain Overview

LangChain is a framework designed to empower developers to build applications powered by language models. While many applications may simply call language models via an API, LangChain goes beyond this by enabling applications to be data-aware and agentic. This means that LangChain enables language models to connect with other sources of data and interact with their environment, resulting in even more powerful and differentiated applications.
<img src="./star-history-langchain.png" alt="LangChain star history" width="600"/>


### LangChain helps us chain LLM services with different data sources or environments
<img src="./LangChain_Overview.png" alt="LangChain Overview" width="800"/>

# 🦜️🔗 2 - LangChain Components


### LangChain provides 7 components (as of now)

- #### Schema
Schema covers the basic data types used throughout LangChain, such as Text, ChatMessages, Examples, and Document.

- #### Models 
This includes a generic interface for all LLMs, such as OpenAI, HuggingFace, and AzureOpenAI, along with common utilities for working with LLMs.

- #### Prompts
This includes prompt management, prompt optimization, and so on.

- #### Memory
Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.

- #### Agents
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.

- #### Indexes
Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and examples of using those indexes in chains. Its building blocks include Document Loaders, Text Splitters, VectorStores, and Retrievers.

- #### Chains
Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
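
The Memory component in the list above (persisting state between calls) can be illustrated with a minimal sketch in plain Python. This is not LangChain's own implementation: `fake_llm`, `BufferMemory`, and `chat` are hypothetical names, and the stub model only reports how many earlier turns it was shown.

```python
from dataclasses import dataclass, field


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: counts prior turns in the prompt."""
    seen = prompt.count("Human:") - 1  # exclude the current question
    return f"I have seen {seen} earlier message(s)."


@dataclass
class BufferMemory:
    """Keeps the running transcript so each call can see earlier turns."""
    turns: list = field(default_factory=list)

    def load(self) -> str:
        return "\n".join(self.turns)

    def save(self, user: str, ai: str) -> None:
        self.turns += [f"Human: {user}", f"AI: {ai}"]


def chat(memory: BufferMemory, user_input: str) -> str:
    # Prepend the stored history to the new question, call the model, then persist the turn.
    prompt = f"{memory.load()}\nHuman: {user_input}\nAI:"
    answer = fake_llm(prompt)
    memory.save(user_input, answer)
    return answer


memory = BufferMemory()
first = chat(memory, "Hi, I'm Alice.")
second = chat(memory, "What did I just say?")
print(first)
print(second)
```

Without the memory object, every call would start from an empty prompt and the second reply could not refer to the first turn.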


# 🦜️🔗 3 - LangChain Use Cases

### Use case 1 - LLM service and LangChain agent for Q&A

In [1]:
import os
from langchain.llms import AzureOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ["OPENAI_API_BASE"] = "https://qucy-openai-test.openai.azure.com/"
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
os.environ["SERPAPI_API_KEY"] = os.getenv('SERPAPI_API_KEY')

# create LLM
llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003")

In [2]:
llm("What is Machine Learning ?")

'\n\nMachine learning is a field of Artificial Intelligence (AI) that enables computers to learn from data without the need for explicit programming. It is a process of using algorithms to parse data and learn from it to make decisions or predictions about new data. Machine learning algorithms are used in a variety of applications, such as data mining, natural language processing, image recognition, and recommendation systems.'

In [3]:
llm("What date is it today ?")

'\n\nToday is November 5th, 2020.'

In [15]:
# only Google search tools
tools = load_tools(["serpapi"])

# init agent and set verbose to true to print out the intermediate steps
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# run agent
agent.run("What's the date today ? What great events have taken place today in history?")



> Entering new AgentExecutor chain...
I should try to find out the current date and any events that have happened on that day in history.
Action: Search 
Action Input: "What's the date today"
Observation: The date today is Wednesday, April 26, 2023. What is the date today in numbers? Depending where you reside and who's asking the date in numbers can be shown ...
Thought: I should check for any historical events that occured on this day.
Action: Search
Action Input: "Events on April 26"
Observation: Rossini Leaves Italy · 1859 Dan Sickles is acquitted of murder on grounds of temporary insanity - 1st time this defense used successfully in the US · 1865 ...
Thought: I now know the final answer
Final Answer: Today is Wednesday, April 26, 2023, and on this day in history, Rossini Left Italy in 1859 and Dan Sickles was acquitted of murder on grounds of temporary insanity in 1865.

> Finished 

'Today is Wednesday, April 26, 2023, and on this day in history, Rossini Left Italy in 1859 and Dan Sickles was acquitted of murder on grounds of temporary insanity in 1865.'

## Available agent types
### zero-shot-react-description
This agent uses the ReAct framework to determine which tool to use based solely on the tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool.
<img src="./ReAct.png" alt="ReAct framework" width="800"/>
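
The Thought / Action / Observation cycle described above can be sketched as a small loop in plain Python. This is an illustration under stated assumptions, not LangChain's agent code: `TOOLS`, `SCRIPT`, `fake_llm`, and `react` are hypothetical, the `WordCount` tool is invented for the example, and the model replies are scripted rather than generated.

```python
import re

# Hypothetical tools: each has a description the model reads and a function the loop calls.
TOOLS = {
    "Search": ("Look up current facts.", lambda q: "The date today is Wednesday, April 26, 2023."),
    "WordCount": ("Count the words in a text.", lambda text: str(len(text.split()))),
}

# Scripted replies standing in for a real LLM, in the order the loop will request them.
SCRIPT = iter([
    "Thought: I need the current date.\nAction: Search\nAction Input: What's the date today",
    "Thought: I now know the final answer\nFinal Answer: Today is Wednesday, April 26, 2023.",
])


def fake_llm(prompt: str) -> str:
    return next(SCRIPT)


def react(question: str, max_steps: int = 5) -> str:
    # The tool descriptions are the only information the model gets about each tool.
    tool_list = "\n".join(f"{name}: {desc}" for name, (desc, _) in TOOLS.items())
    scratchpad = f"Tools:\n{tool_list}\nQuestion: {question}\n"
    for _ in range(max_steps):
        reply = fake_llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", reply, re.S)
        if final:
            return final.group(1).strip()
        # Parse the chosen tool and its input, run the tool, and feed the result back.
        action = re.search(r"Action:\s*(.*)", reply).group(1).strip()
        action_input = re.search(r"Action Input:\s*(.*)", reply).group(1).strip()
        observation = TOOLS[action][1](action_input)
        scratchpad += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("agent did not finish")


answer = react("What's the date today?")
print(answer)
```

The growing `scratchpad` plays the role of the verbose trace printed by the agent: each observation is appended so the next model call can reason over it.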

### react-docstore
This agent uses the ReAct framework to interact with a docstore. Two tools must be provided: a Search tool and a Lookup tool (they must be named exactly that). The Search tool should search for a document, while the Lookup tool should look up a term in the most recently found document. This agent is equivalent to the original ReAct paper, specifically the Wikipedia example.

### self-ask-with-search
This agent utilizes a single tool that should be named Intermediate Answer. This tool should be able to look up factual answers to questions. This agent is equivalent to the original self-ask-with-search paper, where a Google search API was provided as the tool.

### conversational-react-description
This agent is designed to be used in conversational settings. The prompt is designed to make the agent helpful and conversational. It uses the ReAct framework to decide which tool to use, and uses memory to remember the previous conversation interactions.

In [18]:
# search, math, and Python REPL tools
tools = load_tools(['serpapi', 'llm-math', 'python_repl'], llm=llm)

# init agent and set verbose to true to print out the intermediate steps
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# run agent
agent.run("What is Tesla's profit in 2022 Q1 and 2023 Q1 ? And how many percentage increase or drop in 2023 Q1 ?")



> Entering new AgentExecutor chain...
I need to find the profit for each quarter and calculate the percentage increase or drop
Action: Search
Action Input: Tesla profit 2022 Q1 and 2023 Q1
Observation: Tesla said net income fell 24% to $2.51 billion, or 73 cents a share, from $3.32 billion, or 95 cents a share, a year ago. related investing ...
Thought: I need to look for more specific numbers
Action: Search
Action Input: Tesla profit 2022 Q1 and 2023 Q1
Observation: Tesla said net income fell 24% to $2.51 billion, or 73 cents a share, from $3.32 billion, or 95 cents a share, a year ago. related investing ...
Thought: I need to compare the two quarters
Action: Calculator
Action Input: (3.32-2.51)/2.51
Observation: Answer: 0.3227091633466136
Thought: I need to convert that number to a percentage
Action: Calculator
Action Input: 0.3227091633466136*100
Observat

"Tesla's profit in 2022 Q1 was $3.32 billion and in 2023 Q1 was $2.51 billion. This is a drop of 32.27% in 2023 Q1."

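The search observation in the trace above reports that net income fell 24%, while the agent's calculator step divided by the newer figure and arrived at 32.27%. A short check of the arithmetic, assuming percentage change is measured relative to the earlier quarter (`percent_change` is a hypothetical helper):

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to the old (base) value."""
    return (new - old) / old * 100


q1_2022 = 3.32  # net income in billions of USD, from the search observation
q1_2023 = 2.51

change = percent_change(q1_2022, q1_2023)
print(f"{change:.1f}%")  # a negative value means a drop
```

Dividing by the base value gives a drop of about 24.4%, which matches the 24% quoted in the search result. Agent output involving arithmetic is worth verifying this way.
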
In [20]:
# run agent
agent.run("Write quick sort function and sort list [3,2,3,6,1,9,7,6,0]")



> Entering new AgentExecutor chain...
I need to write a sorting algorithm
Action: Python REPL
Action Input: def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
Observation: 
Thought: I now need to call the function to sort the list
Action: Python REPL
Action Input: print(quicksort([3,2,3,6,1,9,7,6,0]))
Observation: [0, 1, 2, 3, 3, 6, 6, 7, 9]

Thought: I now know the final answer
Final Answer: [0, 1, 2, 3, 3, 6, 6, 7, 9]

> Finished chain.


'[0, 1, 2, 3, 3, 6, 6, 7, 9]'

In [4]:
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:////mnt/disk1/python-projects/Chater-GPT-Backend/db/gpt-demo.db")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# create agent
agent_executor = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

# run agent
agent_executor.run("What columns do prompt_template table have and how many records in the table ?")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input:
Observation: sqlite_sequence, users, prompt_template
Thought: I should check the schema of the most relevant table, prompt_template.
Action: schema_sql_db
Action Input: prompt_template
Observation: 
CREATE TABLE prompt_template (
	id INTEGER, 
	name TEXT NOT NULL, 
	description TEXT NOT NULL, 
	system_prompt TEXT NOT NULL, 
	user_prompt TEXT NOT NULL, 
	few_shots BOOLEAN NOT NULL, 
	created_at DATETIME NOT NULL, 
	updated_at DATETIME NOT NULL, 
	PRIMARY KEY (id)
)

/*
3 rows from prompt_template table:
id	name	description	system_prompt	user_prompt	few_shots	created_at	updated_at
1	Summarization	Prompt template for summarization	{"system": "I want you to be a summerization bot"}	{Summarize <{content}> in {target_language} in {content_length} words.}	True	2023-04-20 13:52:06	2023-04-20 13:52:06
2	Translation	Prompt template for trans

'There are 3 records in the prompt_template table and the columns are id, name, description, system_prompt, user_prompt, few_shots, created_at, and updated_at.'

In [5]:
# create large language model service
llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003")

# create DB tool
db = SQLDatabase.from_uri("sqlite:////mnt/disk1/python-projects/LangChainDemo/demo.db")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# create agent with DB tool
agent_executor = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

In [6]:
# ask question using natural language
agent_executor.run("Which stock has the highest PB ?")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: sqlite_sequence, stocks, credit_card_transactions
Thought: I should query the schema of the stocks table.
Action: schema_sql_db
Action Input: "stocks"
Observation: 
CREATE TABLE stocks (
	id INTEGER, 
	date DATE NOT NULL, 
	stock_symbol VARCHAR(10) NOT NULL, 
	stock_name VARCHAR(255) NOT NULL, 
	pb FLOAT, 
	pe FLOAT, 
	dividend_yield FLOAT, 
	market_value FLOAT, 
	turnover_ratio FLOAT, 
	historical_high FLOAT, 
	historical_low FLOAT, 
	PRIMARY KEY (id)
)

/*
3 rows from stocks table:
id	date	stock_symbol	stock_name	pb	pe	dividend_yield	market_value	turnover_ratio	historical_high	historical_low
1	2022-01-01	AAPL	Apple Inc.	31.4500000000	27.1100000000	0.6300000000	2500000000000.0000000000	10.5000000000	325.4700000000	15.1200000000
2	2022-01-01	GOOGL	Alphabet Inc.	5.3200000000	29.2200000000	0.3700000000	1800000000000.0

'The stock with the highest PB is AAPL (Apple Inc.) with a PB of 31.45.'
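
The steps the SQL agent walks through in the trace (list the tables, read the schema, then query) can be reproduced by hand with the standard `sqlite3` module. This is a minimal sketch under assumptions: it uses an in-memory database holding only the two sample rows fully visible in the log, with a reduced set of columns, and the final SQL is one plausible query for the question rather than the one the agent actually ran (the log is truncated before that step).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (stock_symbol TEXT, stock_name TEXT, pb REAL, pe REAL)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?, ?)",
    [
        ("AAPL", "Apple Inc.", 31.45, 27.11),     # values copied from the sample rows in the log
        ("GOOGL", "Alphabet Inc.", 5.32, 29.22),
    ],
)

# Step 1: list the tables (the information list_tables_sql_db reports).
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]

# Step 2: read the schema (the information schema_sql_db reports).
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name='stocks'").fetchone()[0]

# Step 3: a query that answers "Which stock has the highest PB?".
top = conn.execute(
    "SELECT stock_symbol, stock_name, pb FROM stocks ORDER BY pb DESC LIMIT 1"
).fetchone()

print(tables)
print(top)
```

Running the query directly is a quick way to cross-check an answer the agent produced from generated SQL.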

In [7]:
# ask question using natural language
agent_executor.run("Which stock is recommended to hold when considering PE ?")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: sqlite_sequence, stocks, credit_card_transactions
Thought: I should check the schema of the stocks table to get more information
Action: schema_sql_db
Action Input: "stocks"
Observation: 
CREATE TABLE stocks (
	id INTEGER, 
	date DATE NOT NULL, 
	stock_symbol VARCHAR(10) NOT NULL, 
	stock_name VARCHAR(255) NOT NULL, 
	pb FLOAT, 
	pe FLOAT, 
	dividend_yield FLOAT, 
	market_value FLOAT, 
	turnover_ratio FLOAT, 
	historical_high FLOAT, 
	historical_low FLOAT, 
	PRIMARY KEY (id)
)

/*
3 rows from stocks table:
id	date	stock_symbol	stock_name	pb	pe	dividend_yield	market_value	turnover_ratio	historical_high	historical_low
1	2022-01-01	AAPL	Apple Inc.	31.4500000000	27.1100000000	0.6300000000	2500000000000.0000000000	10.5000000000	325.4700000000	15.1200000000
2	2022-01-01	GOOGL	Alphabet Inc.	5.3200000000	29.2200000000	0.370

'There are no stocks with a PE value below 20.'

In [8]:
# ask question using natural language
agent_executor.run("What is the average PE for Apple Inc.?")



> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: 
Observation: sqlite_sequence, stocks, credit_card_transactions
Thought: The table 'stocks' looks the most relevant. 
Action: schema_sql_db
Action Input: stocks
Observation: 
CREATE TABLE stocks (
	id INTEGER, 
	date DATE NOT NULL, 
	stock_symbol VARCHAR(10) NOT NULL, 
	stock_name VARCHAR(255) NOT NULL, 
	pb FLOAT, 
	pe FLOAT, 
	dividend_yield FLOAT, 
	market_value FLOAT, 
	turnover_ratio FLOAT, 
	historical_high FLOAT, 
	historical_low FLOAT, 
	PRIMARY KEY (id)
)

/*
3 rows from stocks table:
id	date	stock_symbol	stock_name	pb	pe	dividend_yield	market_value	turnover_ratio	historical_high	historical_low
1	2022-01-01	AAPL	Apple Inc.	31.4500000000	27.1100000000	0.6300000000	2500000000000.0000000000	10.5000000000	325.4700000000	15.1200000000
2	2022-01-01	GOOGL	Alphabet Inc.	5.3200000000	29.2200000000	0.3700000000	1800000000000.0000000

'The average PE for Apple Inc. is 26.11.'

### Use case 2 - Document summarization

<img src="./langChain_map_reduce.png" alt="LangChain map reduce" width="800"/>
<img src="./langChain_refine.png" alt="LangChain map refine" width="800"/>
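
The two strategies pictured above differ in how per-chunk calls are combined. The sketch below is a plain-Python illustration, not LangChain's implementation: `fake_summarize` is a hypothetical stub that wraps its input in `S(...)` so the call structure is visible in the result.

```python
from typing import Callable, List


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM summary call; wraps the input to expose nesting."""
    return f"S({text})"


def map_reduce(chunks: List[str], summarize: Callable[[str], str]) -> str:
    # Map: summarize every chunk independently (these calls could run in parallel).
    partials = [summarize(chunk) for chunk in chunks]
    # Reduce: summarize the concatenated partial summaries in one final call.
    return summarize(" ".join(partials))


def refine(chunks: List[str], summarize: Callable[[str], str]) -> str:
    # Start from a summary of the first chunk, then fold in each later chunk in order.
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        summary = summarize(f"{summary} {chunk}")
    return summary


chunks = ["a", "b", "c"]
print(map_reduce(chunks, fake_summarize))
print(refine(chunks, fake_summarize))
```

Map-reduce keeps the per-chunk calls independent, while refine is strictly sequential because each call needs the running summary from the previous one.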

In [4]:
from langchain.document_loaders import UnstructuredFileLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

# import document
loader = UnstructuredFileLoader("./ReAct.txt")
# load document
document = loader.load()
print(f'documents:{len(document)}')

# split document
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 0
)

split_documents = text_splitter.split_documents(document)
print(f'documents:{len(split_documents)}')


# create summary chain
chain = load_summarize_chain(llm, chain_type="refine", verbose=True)

# summarize only the first 5 chunks
chain.run(split_documents[0:5])

documents:1
documents:271


> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Published as a conference paper at ICLR 2023

R E A C T : S YNERGIZING R EASONING L ANGUAGE M ODELS

AND

ACTING

IN

Shunyu Yao∗*,1 , Jeffrey Zhao2 , Dian Yu2 , Nan Du2 , Izhak Shafran2 , Karthik Narasimhan1 , Yuan Cao2 1

arXiv:2210.03629v3 [cs.CL] 10 Mar 2023

Department of Computer Science, Princeton University 2 Google Research, Brain team 1 {shunyuy,karthikn}@princeton.edu 2 {jeffreyzhao,dianyu,dunan,izhak,yuancao}@google.com"


CONCISE SUMMARY:

> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point:  
This paper presents REACT, a model that synergizes reasoning language models and acting capabilities to bring about better d

'\n\nThis paper presents REACT, a model that synergizes reasoning language models and acting capabilities to bring about better decision-making. It was presented at ICLR 2023 and authored by Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao from Princeton University and Google Research Brain team. While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, REACT explores the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two to bring about better decision-making. The traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with and gat
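
The splitter in this use case is configured with `chunk_size` and `chunk_overlap`. What those two parameters control can be shown with a simplified fixed-window splitter. This sketch is an assumption-laden stand-in: it cuts purely by character count and does not reproduce the separator-aware logic of `RecursiveCharacterTextSplitter`; `split_text` is a hypothetical helper.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


text = "abcdefghij"
no_overlap = split_text(text, chunk_size=4)
with_overlap = split_text(text, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

A non-zero overlap repeats the tail of one chunk at the head of the next, which costs extra tokens but reduces the chance that a sentence is cut off from its context.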

### Use case 3 - Multiple Chains

In [45]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

# dish Chain
template = """Your job is to come up with a classic dish from the area that the users suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)
dish_chain = LLMChain(llm=llm, prompt=prompt_template)

# recipe Chain
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)
recipe_chain = LLMChain(llm=llm, prompt=prompt_template)

# Use SimpleSequentialChain to chain the two steps: the dish name becomes the input of the recipe prompt
overall_chain = SimpleSequentialChain(chains=[dish_chain, recipe_chain], verbose=True)
review = overall_chain.run("Rome")



> Entering new SimpleSequentialChain chain...

One classic dish from Rome is Pasta alla Carbonara, a rich and creamy pasta dish made with eggs, guanciale (Italian cured pork cheek), pecorino cheese, black pepper, and olive oil.

Recipe for Pasta alla Carbonara:

Ingredients:
- 4 ounces guanciale, diced
- 2 eggs
- 1 cup grated Pecorino cheese
- Olive oil
- Black pepper
- 1 pound of pasta

Instructions:
1. Bring a pot of salted water to a boil and cook the pasta according to the package instructions. Drain, reserving 1/2 cup of the cooking water. 
2. In a large skillet, heat 2 tablespoons of olive oil over medium-high heat. Add the guanciale and cook until golden, about 5 minutes. 
3. In a large bowl, whisk together the eggs, Pecorino cheese, and a generous amount of freshly ground black pepper. 
4. Add the cooked pasta to the skillet with the guanciale and stir to combine. 
5. Pour the egg mixture over the pasta and stir to combine. Add a few tabl
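
The hand-off between the two chains above, where the first output becomes the second input, can be sketched in plain Python. This is an illustration rather than LangChain's code: `make_step`, `run_sequential`, and `fake_llm` are hypothetical names, and the model replies are canned.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical canned replies standing in for real LLM calls."""
    if prompt.startswith("Classic dish from"):
        return "Pasta alla Carbonara"
    return f"[recipe text for: {prompt}]"


def make_step(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Build one step: fill the single {input} slot, then call the model."""
    def step(value: str) -> str:
        return llm(template.format(input=value))
    return step


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    # Each step's output becomes the next step's only input.
    value = first_input
    for step in steps:
        value = step(value)
    return value


dish_step = make_step("Classic dish from {input}", fake_llm)
recipe_step = make_step("Recipe for {input}", fake_llm)
result = run_sequential([dish_step, recipe_step], "Rome")
print(result)
```

Because each step takes exactly one string and returns one string, steps can be reordered or extended without changing the runner.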

### Use case 4 - Structured output

In [47]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate

# define response schema
response_schemas = [
    ResponseSchema(name="bad_string", description="This is a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# init output parser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# The format instructions ask the model to reply with a JSON snippet shaped like:
# {
#	"bad_string": string  // This is a poorly formatted user input string
#	"good_string": string  // This is your response, a reformatted response
# }
format_instructions = output_parser.get_format_instructions()

template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

# define prompt template
prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

prompt_value = prompt.format(user_input="welcom to califonya!")
llm_output = llm(prompt_value)

print(llm_output)

# parse the output
output_parser.parse(llm_output)

```json
{
	"bad_string": "welcom to califonya!",
	"good_string": "Welcome to California!"
}
```


{'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}
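
The step from the raw model reply (a fenced `json` snippet) to the Python dictionary shown above can be approximated with the standard library. This is a minimal sketch, not the actual `StructuredOutputParser` implementation: `parse_json_block` is a hypothetical helper, and the regular expression only handles a single flat JSON object.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here so the example stays a single code block


def parse_json_block(llm_output: str, expected_keys: list) -> dict:
    """Pull the JSON object out of a fenced json block and check its keys."""
    match = re.search(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, llm_output, re.S)
    if match is None:
        raise ValueError("no fenced json block found")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Same reply text as printed by the cell above.
llm_output = (
    FENCE + "json\n"
    '{\n\t"bad_string": "welcom to califonya!",\n\t"good_string": "Welcome to California!"\n}\n'
    + FENCE
)
parsed = parse_json_block(llm_output, ["bad_string", "good_string"])
print(parsed)
```

Checking for the expected keys turns a malformed or incomplete model reply into an explicit error instead of a silent `KeyError` later in the application.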