## LangChain Use Cases

### Summarization

In [2]:
from dotenv import find_dotenv, load_dotenv
import os

load_dotenv(find_dotenv())
openai_api_key = os.getenv("OPENAI_API_KEY")

In [3]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [5]:
with open('data/PaulGrahamEssays/good.txt', 'r') as file:
    text = file.read()
    
print(text[:285])

April 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the
phrase that became our motto: Make something people want.  We've
learned a lot since then, but if I were choosing now that's still
the one I'd pick.


In [6]:
num_tokens = llm.get_num_tokens(text)
print(f"There are {num_tokens} tokens in your file")

There are 3917 tokens in your file


In [8]:
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
docs = text_splitter.create_documents([text])

print(f"no of documents = {len(docs)}")

no of documents = 4
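The splitter above cuts the essay into chunks that share some text at their boundaries. A minimal sketch of that overlap idea, using a fixed-width character window. This is an illustration only: `split_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on the given separators, so its chunk boundaries will differ.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk begins with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.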


In [9]:
chain = load_summarize_chain(llm=llm, chain_type='map_reduce')

In [10]:
output = chain.run(docs)
print(output)



The essay discusses the motto of Y Combinator, "Make something people want," and how it relates to not worrying about the business model in the early stages. It suggests that this approach can lead to unexpected results, such as a successful business that operates like a charity. The article also discusses the importance of benevolence in the success of startups, citing examples of successful companies and how being good can attract support and lead to decisive decision-making. The author also reflects on the concept of being "good" and its value in business, while acknowledging the undervaluing of starting a company with benevolent aims.
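The `map_reduce` chain type summarizes each chunk separately and then summarizes the combined partial summaries. A rough sketch of that control flow, with a trivial stand-in for the LLM call. Both `map_reduce_summarize` and `first_sentence` are hypothetical names made up for this illustration, and the real chain uses prompts rather than string slicing.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize each chunk on its own (map), then summarize the joined results (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial_summaries))


def first_sentence(text):
    # Stand-in for an LLM call: keep only the first sentence of the input.
    return text.split(". ")[0].rstrip(".") + "."


chunks = [
    "Make something people want. That is the motto.",
    "Being good helps startups. It attracts support.",
]
summary = map_reduce_summarize(chunks, first_sentence)
print(summary)
```

Because every chunk is summarized independently, the map step can handle documents that exceed the model's context window, at the cost of one extra call per chunk.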


### QA over Documents Using Embeddings

In [11]:
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings

In [14]:
loader = TextLoader('data/PaulGrahamEssays/worked.txt')
doc = loader.load()
print(f"Length = {len(doc)}")
print(f"Characters = {len(doc[0].page_content)}")

Length = 1
Characters = 74663


In [15]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

In [16]:
num_total_characters = sum(len(x.page_content) for x in docs)
print(f"Num of docs = {len(docs)} with an average of {num_total_characters/len(docs):,.0f} characters")

Num of docs = 29 with an average of 2,930 characters


In [20]:
embeddings = OpenAIEmbeddings(openai_api_key = openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

In [23]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

In [24]:
query = "what does the author describe as good work?"
qa.run(query)

' The author describes working on things that are not prestigious as good work, as it allows for the discovery of something real and shows the right kind of motives.'
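Conceptually, the retriever embeds the query, picks the chunks whose vectors are closest to it, and the `stuff` chain type places those chunks into a single prompt. A small sketch of that idea with hand-written vectors. The three-dimensional "embeddings" and chunk labels are invented for illustration, and cosine similarity is only one common choice; the FAISS index above may use a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, vector) pairs standing in for embedded document chunks.
indexed_chunks = [
    ("chunk about prestige", [0.9, 0.1, 0.0]),
    ("chunk about painting", [0.0, 1.0, 0.2]),
    ("chunk about Lisp", [0.1, 0.0, 1.0]),
]
query_vec = [1.0, 0.2, 0.0]

context = top_k(query_vec, indexed_chunks, k=1)
# "Stuffing": all retrieved chunks go into one prompt alongside the question.
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)
```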

### Extraction using LangChain's Response Schema

In [25]:
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

In [26]:
chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

In [28]:
response_schemas = [
    ResponseSchema(name="artist", description="name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [29]:
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```


In [30]:
prompt = ChatPromptTemplate(
    messages = [
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n {format_instructions}\n {user_prompt}")
    ],
    input_variables = ["user_prompt"],
    partial_variables = {"format_instructions": format_instructions}
)

In [31]:
query = prompt.format_prompt(user_prompt="I really like Anyone by Seventeen")

In [33]:
q_output = chat_model(query.to_messages())
output = output_parser.parse(q_output.content)

print(output)
print(type(output))

{'artist': 'Seventeen', 'song': 'Anyone'}
<class 'dict'>
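The format instructions ask the model to wrap its answer in a fenced `json` block, which is then turned into a dictionary. A hedged sketch of what such parsing could look like with only the standard library. `parse_json_block` is a hypothetical helper, not the implementation of `StructuredOutputParser`, and the reply string is hand-written rather than real model output.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to avoid nesting fences here


def parse_json_block(reply, expected_keys):
    """Extract the first fenced json block from reply and check its keys."""
    pattern = FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no json code block found in reply")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


reply = FENCE + 'json\n{"artist": "Seventeen", "song": "Anyone"}\n' + FENCE
parsed = parse_json_block(reply, ["artist", "song"])
print(parsed)
```

Validating the expected keys up front makes a malformed reply fail at the parsing step instead of surfacing later as a `KeyError`.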


### Evaluation of QA

In [34]:
from langchain.evaluation.qa import QAEvalChain

In [35]:
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

In [36]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")

In [37]:
question_answers = [
    {'question' : "Which company sold the microcomputer kit that his friend built himself?", 'answer' : 'Healthkit'},
    {'question' : "What was the small city he talked about in the city that is the financial capital of USA?", 'answer' : 'Yorkville, NY'}
]

In [38]:
predictions = chain.apply(question_answers)
predictions

[{'question': 'Which company sold the microcomputer kit that his friend built himself?',
  'answer': 'Healthkit',
  'result': ' Heathkit'},
 {'question': 'What was the small city he talked about in the city that is the financial capital of USA?',
  'answer': 'Yorkville, NY',
  'result': ' The small city mentioned in the context is Yorkville, and the financial capital of the USA is New York City.'}]

In [39]:
eval_chain = QAEvalChain.from_llm(llm)

graded_outputs = eval_chain.evaluate(question_answers, predictions, question_key="question", prediction_key="result", answer_key='answer')
graded_outputs

[{'results': ' CORRECT'}, {'results': ' CORRECT'}]

### Querying Tabular Data

In [50]:
from langchain import OpenAI, SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [54]:
sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

In [55]:
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)



In [56]:
db_chain.run("How many Species of trees are there in San Francisco?")



> Entering new SQLDatabaseChain chain...
How many Species of trees are there in San Francisco?
SQLQuery:SELECT COUNT(DISTINCT qSpecies) FROM SFTrees
SQLResult: [(578,)]
Answer:There are 578 species of trees in San Francisco.
> Finished chain.


'There are 578 species of trees in San Francisco.'
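The query the chain generated can also be run directly, which is a handy way to sanity-check the model's SQL. A minimal sketch using an in-memory SQLite database. The table and column names follow the generated query above, but the `TreeID` column and the three rows are made up for illustration and do not come from the real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only qSpecies is known from the generated query.
conn.execute("CREATE TABLE SFTrees (TreeID INTEGER PRIMARY KEY, qSpecies TEXT)")
conn.executemany(
    "INSERT INTO SFTrees (qSpecies) VALUES (?)",
    [("Acer rubrum",), ("Acer rubrum",), ("Ginkgo biloba",)],
)

# Same statement the chain produced: count each species once.
species_count = conn.execute(
    "SELECT COUNT(DISTINCT qSpecies) FROM SFTrees"
).fetchone()[0]
print(species_count)
conn.close()
```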

### Understanding Code

In [61]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

In [62]:
root_dir = 'data/thefuzz'
docs = []

for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        try:
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception:
            # Skip files that cannot be loaded as UTF-8 text (e.g. binaries).
            pass

In [63]:
print (f"You have {len(docs)} documents\n")
print ("------ Start Document ------")
print (docs[0].page_content[:300])

You have 175 documents

------ Start Document ------
import unittest
import re
import pycodestyle

from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils
from thefuzz.string_processing import StringProcessor


class StringProcessingTest(unittest.TestCase):
    def test_replace_non_letters_non_numbers_with_whitespace(self):
    


In [64]:
docsearch = FAISS.from_documents(docs, embeddings)

In [65]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

In [66]:
query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)
print (output)

You can use the `process.extractOne()` function from the `thefuzz` library if you want to find the most similar item in a list of items. This function returns a tuple with the best match and its corresponding similarity score.


In [67]:
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)
print (output)

```python
best = process.extractOne(query, choices)
```


### Interacting with APIs

In [68]:
from langchain.chains import APIChain
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

In [70]:
api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} is used to find information about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france
    
The API endpoint /v3.1/currency/{currency} is used to find information about countries by currency. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP
    
Woo! This is my documentation
"""
# Recent LangChain versions require an explicit allowlist of domains;
# omitting `limit_to_domains` raises a ValidationError for APIChain.
chain_new = APIChain.from_llm_and_api_docs(
    llm,
    api_docs,
    verbose=True,
    limit_to_domains=["https://restcountries.com/"],
)

In [None]:
chain_new.run('Can you tell me information about france?')
chain_new.run('Can you tell me about the currency COP?')
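The `limit_to_domains` requirement exists so that a model-generated URL can only reach hosts you have approved. A small sketch of that kind of allowlist check using the standard library. `is_allowed` is a hypothetical helper written for illustration and is not the check `APIChain` performs internally.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host matches the host of any allowed domain."""
    host = urlparse(url).netloc
    return any(host == urlparse(domain).netloc for domain in allowed_domains)


allowed = ["https://restcountries.com/"]
ok = is_allowed("https://restcountries.com/v3.1/name/france", allowed)
blocked = is_allowed("https://example.com/v3.1/name/france", allowed)
print(ok, blocked)
```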

### Chatbots

In [74]:
from langchain.prompts.prompt import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain import LLMChain

In [77]:
template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"],
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [78]:
llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key), 
    prompt=prompt, 
    verbose=True, 
    memory=memory
)

In [79]:
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it


H: Is an pear a fruit or vegetable?
Chatbot:

> Finished chain.


" I'm not sure, but I do know that an apple a day keeps the doctor away, so maybe a pear could keep the farmer away?"

In [80]:
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")



> Entering new LLMChain chain...
Prompt after formatting:

You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

H: Is an pear a fruit or vegetable?
AI:  I'm not sure, but I do know that an apple a day keeps the doctor away, so maybe a pear could keep the farmer away?
Human: What was one of the fruits I first asked you about?
Chatbot:

> Finished chain.


' I have a terrible memory, but I do know that apples and oranges have a peel of a time trying to keep up with the latest fruit trends.'
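The two formatted prompts above show the memory at work: the first has an empty history, while the second includes the earlier exchange. A simplified sketch of that mechanism. `BufferMemory` is a hypothetical stand-in, not the actual `ConversationBufferMemory` class, and the AI reply is invented.

```python
class BufferMemory:
    """Hypothetical minimal conversation buffer that stores every turn verbatim."""

    def __init__(self):
        self.turns = []

    def save(self, human, ai):
        self.turns.append((human, ai))

    def as_text(self):
        # Render the stored turns in the same "Human:/AI:" layout seen above.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


TEMPLATE = "{chat_history}\nHuman: {human_input}\nChatbot:"

memory = BufferMemory()
first_prompt = TEMPLATE.format(
    chat_history=memory.as_text(), human_input="Is a pear a fruit?"
)
memory.save("Is a pear a fruit?", "Only on weekends.")
second_prompt = TEMPLATE.format(
    chat_history=memory.as_text(), human_input="What did I ask?"
)
print(second_prompt)
```

Since the whole history is replayed on every call, the prompt grows with each turn; long conversations eventually need trimming or summarizing.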

### Self Ask with Search Agent

In [81]:
from langchain.tools import DuckDuckGoSearchRun
from langchain.agents import Tool, initialize_agent
search = DuckDuckGoSearchRun()

tools = [
    Tool(
        func = search.run,
        name="Intermediate Answer",
        description="Useful for when you need to search the internet for information"
    )
]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="self-ask-with-search",
    verbose=True,
    handle_parsing_errors=True
)

In [82]:
print(agent.invoke("Question: Who was the Prime Minister of India when the first Moon landing took place?"))



> Entering new AgentExecutor chain...
 Yes.
Follow up: When did the first Moon landing take place?
Intermediate answer: Three phrases that recall humanity's first landing on and exploration of the lunar surface. In July 1969, Apollo 11 astronauts Neil A. Armstrong, Michael Collins, and Edwin E. "Buzz" Aldrin completed humanity's first landing on the Moon. Apollo 11 Mission Highlights Watch highlights of the Apollo 11 mission including the launch on July 16, 1969, the landing of the lunar module, Neil Armstrong's first steps on the Moon, splashdown, and more. Apollo 11, U.S. spaceflight in which astronauts Neil Armstrong and Buzz Aldrin became the first people to walk on the Moon. Apollo 11 was the culmination of the Apollo program and a massive national commitment by the United States to beat the Soviet Union in putting people on the Moon. Apollo program. The Apollo program, also known as Project Apollo, was the United States human spaceflight pr