<a href="https://colab.research.google.com/github/connecteev/langchain-crash-course-python-google-colab-notebook/blob/main/langchain_crash_course_python_google_colab_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction to LangChain

LangChain is a framework for developing applications powered by language models. According to the LangChain team, the most powerful and differentiated applications will not only call out to a language model via an API, but will also:

- Be data-aware: connect a language model to other sources of data

- Be agentic: allow a language model to interact with its environment

**Links:**

LangChain Docs: https://python.langchain.com/en/latest/index.html

GitHub: https://github.com/hwchase17/langchain

Based on LangChain Crash Course by PromptEngineering:
https://www.youtube.com/watch?v=5-fc4Tlgmro

### Topics to be covered:
- Installation
- Available LLMs
- Prompt Templates
- Chains
- Agents & Tools
- Memory
- Document Loaders
- Indexes

### Installation

LangChain is available on PyPI, so it can be installed with pip:

(ref: https://tinyurl.com/3fsppvxn)

In [1]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.0.229-py3-none-any.whl (1.3 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.9-py3-none-any.whl (26 kB)
Collecting langchainplus-sdk<0.0.21,>=0.0.20 (from langchain)
  Downloading langchainplus_sdk-0.0.20-py3-none-any.whl (25 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain)
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)

### Available LLMs

LangChain integrates with many different LLM providers; the full list is here: https://python.langchain.com/en/latest/modules/models/llms/integrations.html

***OpenAI Integration***

In [2]:
!pip install openai

Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Installing collected packages: openai
Successfully installed openai-0.27.8


In [3]:
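# Set your OpenAI API key. Never hard-code a real key in a shared notebook;
# getpass prompts for it without echoing it into the cell output.
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")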

In [18]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9)  # higher temperature = more varied output; default model_name is "text-davinci-003"
prompt = "Why does the planet Saturn have rings around it?"
print(llm(prompt))



The rings of Saturn are made up of particles of ice and rock ranging in size from micrometers to meters. Scientists believe that the rings of Saturn are mostly made up of particles that were once part of a larger object such as a comet or an asteroid. It is possible that these objects were pulled apart by Saturn’s gravity before eventually forming the rings. Some scientists also suggest that the rings of Saturn may have been formed by icy particles from the planet's moons.


***Hugging Face Hub integration***

In [11]:
!pip install huggingface_hub



In [12]:
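# Prompt for the Hugging Face Hub token instead of hard-coding it in the notebook.
from getpass import getpass
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face Hub API token: ")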

In [13]:
from langchain import HuggingFaceHub

In [None]:
# https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_hub.html

#llm=HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature":0.9, "max_length":64})
#prompt = "Why is gravity on the moon lower than that on earth?"
#print(llm(prompt))

### Prompt Templates

A prompt template is a reproducible way to generate a prompt. It contains a text string (“the template”) that takes a set of parameters from the end user and generates a prompt.

The prompt template may contain:

- instructions to the language model,

- a set of few-shot examples to help the language model generate a better response,

- a question to the language model.
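Conceptually, a prompt template is a string with named placeholders plus the list of variables it expects. The plain-Python sketch below illustrates that idea; `SimplePromptTemplate` is a hypothetical class, not LangChain's `PromptTemplate`, which supports more template formats and options.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical minimal prompt template: a format string plus the variables it expects."""

    def __init__(self, template: str, input_variables: list[str]) -> None:
        # Collect the {placeholder} names that actually appear in the template string.
        found = {name for _, name, _, _ in Formatter().parse(template) if name}
        if found != set(input_variables):
            raise ValueError(
                f"template uses {sorted(found)}, but input_variables is {sorted(input_variables)}"
            )
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs: str) -> str:
        # Refuse to build a prompt when any expected variable is missing.
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**kwargs)


poem_template = SimplePromptTemplate(
    template="Write a {adjective} poem about {subject}",
    input_variables=["adjective", "subject"],
)
print(poem_template.format(adjective="funny", subject="The Simpsons"))
# prints: Write a funny poem about The Simpsons
```

Validating the placeholders up front catches a typo in a variable name when the template is created, rather than when the first prompt is sent to the model.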


In [19]:
from langchain import PromptTemplate

template = "Write a {adjective} poem about {subject}"

prompt = PromptTemplate(
    input_variables=["adjective", "subject"],
    template=template,
)

myPrompt = prompt.format(adjective='funny', subject='The Simpsons')

myLLMOutput = llm(myPrompt)
print(myLLMOutput)



Oh the Simpson family
What a funny foursome they be 
With Homer always stuffing
Frying up donuts so sweet

Marge and her blue hair
Battling her daily cares 
Lisa and her saxophone
Keeping us in deep pensé 

Bart with his skateboard
Always getting into trouble so hard 
And Maggie the baby
Sucking her dummy so loud

This family is something special
The fans love them through it all 
But no matter what they do
The Simpsons are here to stay for all.



The next template combines all three: instructions, few-shot examples, and a question to the language model.

In [27]:
template = """
I want you to act as a naming consultant for new companies.

Here are some examples of good company names:

- search engine, Google
- social media, Facebook
- video sharing, YouTube

The name should be short, catchy and easy to remember.

What is a good name for a company that makes {product}?
"""

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
# Keep the template object and the formatted string in separate variables.
formatted_prompt = prompt.format(product='news articles')
print(llm(formatted_prompt))


NewsCircle.


### Chains

Using an LLM in isolation is fine for simple applications, but more complex ones require chaining LLMs, either with each other or with other components.
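At its core, a chain is function composition: fill a template, call the model, and optionally pass the result to the next step. The sketch below uses a hypothetical `fake_llm` stand-in so it runs without an API key; `llm_step` and `sequential` are illustrative helpers that mirror the idea behind `LLMChain` and LangChain's sequential chains, not their actual implementation.

```python
from typing import Callable

LLMFn = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: wraps the prompt so the data flow is visible.
    return f"<{prompt}>"


def llm_step(template: str, llm: LLMFn) -> Callable[[str], str]:
    # One chain step: put the input into the template's {input} slot, then call the LLM.
    return lambda text: llm(template.format(input=text))


def sequential(*steps: Callable[[str], str]) -> Callable[[str], str]:
    # Run the steps in order, feeding each step's output into the next one.
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


name_step = llm_step("What is a good name for a company that makes {input}?", fake_llm)
slogan_step = llm_step("Write a slogan for {input}", fake_llm)
pipeline = sequential(name_step, slogan_step)
print(pipeline("colorful socks"))
# prints: <Write a slogan for <What is a good name for a company that makes colorful socks?>>
```

Swapping `fake_llm` for a real model call changes nothing else in the pipeline, which is the main appeal of treating each step as a plain string-to-string function.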

In [30]:
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
# LLMChain formats the prompt with the given input and sends it to the LLM.
chain = LLMChain(llm=llm, prompt=prompt)

chainCommandOutput = chain.run("colorful socks")
print(chainCommandOutput)



Vibrant Socksies.


### Agents & Tools

Agents use an LLM to decide which actions to take and in what order. An action is either using a tool and observing its output, or returning a response to the user.

- Tool: A function that performs a specific task, such as a Google search, a database lookup, a Python REPL, or another chain.

- LLM: The language model powering the agent.

- Agent: An LLM deciding which Action to take, taking that Action, seeing an Observation, and repeating until done.
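The Action / Observation loop described above can be sketched in plain Python. Everything here is an assumption for illustration: the LLM is a hypothetical scripted stand-in, `Multiply` is a made-up tool, and the `Action:` / `Action Input:` / `Final Answer:` text format is modeled on the ReAct-style traces printed later in this notebook. LangChain's real agents use their own prompts and output parsers.

```python
import re
from typing import Callable, Iterator


def multiply(arg: str) -> str:
    # Hypothetical tool: "3, 4" -> "12.0"
    a, b = (float(x) for x in arg.split(","))
    return str(a * b)


TOOLS: dict[str, Callable[[str], str]] = {"Multiply": multiply}


def scripted_llm() -> Callable[[str], str]:
    # Hypothetical stand-in for the model: returns pre-written ReAct steps in order.
    replies: Iterator[str] = iter([
        "Thought: I should multiply.\nAction: Multiply\nAction Input: 3, 4",
        "Thought: I know the answer.\nFinal Answer: 3 times 4 is 12.0",
    ])
    return lambda prompt: next(replies)


ACTION_RE = re.compile(r"Action: (.+)\nAction Input: (.+)")


def run_agent(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)  # the LLM picks the next action from everything so far
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError(f"could not parse LLM reply: {reply!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = TOOLS[tool_name](tool_input)  # run the chosen tool
        transcript += f"Observation: {observation}\n"  # feed the result back to the LLM
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("What is 3 times 4?", scripted_llm()))
# prints: 3 times 4 is 12.0
```

The `max_steps` cap matters with a real model: nothing guarantees the LLM ever emits a final answer, so the loop needs an explicit stopping point.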

##### ***Potential use cases:***
- Personal Assistant
- Question Answering
- Chatbots
- Code Understanding etc.


Tools: https://python.langchain.com/en/latest/modules/agents/tools.html

Agents: https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html




In [32]:
# The "wikipedia" tool loaded below depends on the `wikipedia` package.
!pip install wikipedia
from langchain.agents import load_tools, initialize_agent



In [33]:
from langchain.llms import OpenAI
llm = OpenAI(temperature=0.7)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("What year did Lionel Messi Join Barcelona? What is his current age raised to the 0.43 power?")



> Entering new  chain...
I need to find out when Messi joined Barcelona and compute his current age raised to the 0.43 power. 
Action: Wikipedia
Action Input: Lionel Messi
Observation: Page: Lionel Messi
Summary: Lionel Andrés Messi (Spanish pronunciation: [ljoˈnel anˈdɾes ˈmesi] (listen); born 24 June 1987), also known as Leo Messi, is an Argentine professional footballer who plays as a forward and captains the Argentina national team. Widely regarded as one of the greatest players of all time, Messi has won a record seven Ballon d'Or awards and a record six European Golden Shoes, and in 2020 he was named to the Ballon d'Or Dream Team. Until leaving the club in 2021, he had spent his entire professional career with Barcelona, where he won a club-record 34 trophies, including ten La Liga titles, seven Copa del Rey titles and the UEFA Champions League four times. With his country, he won the 2021 Copa América and the 2022 FIFA World Cup. A prolif

'Lionel Messi joined Barcelona in 2004 and his current age raised to the 0.43 power is 3.9218486893172186.'

In [34]:
import os

# Optional: turn on LangChain tracing (runs are sent to a separately configured tracing backend).
os.environ["LANGCHAIN_TRACING"] = "true"

from langchain import OpenAI
from langchain.agents import initialize_agent, AgentType
from langchain.tools import StructuredTool

llm = OpenAI(temperature=0)


def multiplier(a: float, b: float) -> float:
    """Multiply the provided floats."""
    return a * b


# Build the tool from the function's name, docstring and typed parameters.
tool = StructuredTool.from_function(multiplier)

# Structured tools are compatible with the STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION agent type.
agent_executor = initialize_agent(
    [tool],
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)


agent_executor.run("What is 3 times 4")





> Entering new  chain...
Action:
```
{
  "action": "multiplier",
  "action_input": {"a": 3, "b": 4}
}
```

Observation: 12.0
Thought: I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "3 times 4 is 12.0"
}
```

> Finished chain.


'3 times 4 is 12.0'
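The `12.0` (rather than `12`) in the trace above most likely comes from how `StructuredTool.from_function` works: it derives the tool's name, description, and argument schema from the function's name, docstring, and type annotations, and validates the agent's JSON `action_input` against that schema, which coerces the integers `3` and `4` to floats. The stdlib-only sketch below imitates that idea; `describe_tool` and `call_tool` are hypothetical helpers, not LangChain functions, and they assume every parameter is annotated with a plain type such as `float` or `str`.

```python
import inspect
from typing import Any, Callable, get_type_hints


def multiplier(a: float, b: float) -> float:
    """Multiply the provided floats."""
    return a * b


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    # Hypothetical helper: build a tool description from the function's own metadata.
    # Assumes every parameter has a plain type annotation (float, int, str, ...).
    hints = get_type_hints(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "args": {name: hints[name].__name__ for name in inspect.signature(func).parameters},
    }


def call_tool(func: Callable[..., Any], action_input: dict[str, Any]) -> Any:
    # Coerce each argument to its annotated type before calling, e.g. 3 -> 3.0 for float.
    hints = get_type_hints(func)
    return func(**{name: hints[name](value) for name, value in action_input.items()})


print(describe_tool(multiplier))
# prints: {'name': 'multiplier', 'description': 'Multiply the provided floats.', 'args': {'a': 'float', 'b': 'float'}}
print(call_tool(multiplier, {"a": 3, "b": 4}))
# prints: 12.0
```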

In [39]:
# Test Wikipedia tool integration, per https://python.langchain.com/docs/modules/agents/tools/integrations/wikipedia
!pip install wikipedia



In [41]:
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
wikipedia.run("Man vs Wild")



'Page: Man vs. Wild\nSummary: Man vs. Wild, also called Born Survivor: Bear Grylls, Ultimate Survival, Survival Game, or colloquially as simply Bear Grylls in the United Kingdom, is a survival television series hosted by Bear Grylls on the Discovery Channel. In the United Kingdom, the series was originally shown on Channel 4, but the show\'s later seasons were broadcast on Discovery Channel U.K. The series was produced by British television production company Diverse Bristol. The show was premiered on November 10, 2006, after airing a pilot episode titled "The Rockies" on March 10, 2006.\nGrylls also said he has been approached about doing a Man vs. Wild urban disaster 3D feature film, which he said he would "really like to do". He signed on to showcase urban survival techniques in a Discovery show called Worst-Case Scenario, which premiered on May 5, 2010, on the network.The Discovery Channel terminated its legal relationship with Grylls in 2012 due to contract disputes, effectively c

### Memory


Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
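`ConversationChain` keeps state by storing the previous exchanges and pasting them back into the prompt, which is what the `Current conversation:` section in the traces below shows growing from call to call. Here is a minimal buffer-memory sketch in plain Python; `BufferMemory`, `predict`, and the `counting_llm` stand-in are hypothetical and only imitate the default "keep every turn" behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BufferMemory:
    """Hypothetical buffer memory: stores every (human, ai) exchange verbatim."""

    turns: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        # Format the stored history in the Human:/AI: style seen in the traces.
        return "\n".join(f"Human: {human}\nAI: {ai}" for human, ai in self.turns)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))


def predict(memory: BufferMemory, llm: Callable[[str], str], user_input: str) -> str:
    # Rebuild the whole prompt on every call: stored history plus the new message.
    prompt = f"Current conversation:\n{memory.render()}\nHuman: {user_input}\nAI:"
    reply = llm(prompt)
    memory.save(user_input, reply)  # persist the exchange so the next call sees it
    return reply


def counting_llm(prompt: str) -> str:
    # Hypothetical stand-in model: reports how many human turns are in its prompt.
    return f"I can see {prompt.count('Human:')} human message(s)."


chat_memory = BufferMemory()
print(predict(chat_memory, counting_llm, "Hi there!"))
# prints: I can see 1 human message(s).
print(predict(chat_memory, counting_llm, "Let's talk about the Moon"))
# prints: I can see 2 human message(s).
```

Because the whole history is resent every time, the prompt grows with each turn; that trade-off is why other memory types summarize or truncate older turns.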

In [49]:
from langchain import OpenAI, ConversationChain

llm = OpenAI(temperature=0)

conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi there!")





> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi there!
AI:

> Finished chain.


" Hi there! It's nice to meet you. How can I help you today?"

In [50]:
conversation.predict(input='Lets talk about how physics works on the Moon')





> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: Lets talk about how physics works on the Moon
AI:

> Finished chain.


" Sure! Physics on the Moon is a fascinating topic. The Moon's gravity is only about one-sixth of Earth's, so objects on the Moon experience a much weaker gravitational force. This means that objects on the Moon can move faster and farther than they can on Earth. Additionally, the Moon has no atmosphere, so there is no air resistance to slow down objects. This means that objects on the Moon can travel in a straight line for much longer distances than they can on Earth."

In [51]:
conversation.predict(input='Why is the gravitational field lower?')





> Entering new  chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi there!
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: Lets talk about how physics works on the Moon
AI:  Sure! Physics on the Moon is a fascinating topic. The Moon's gravity is only about one-sixth of Earth's, so objects on the Moon experience a much weaker gravitational force. This means that objects on the Moon can move faster and farther than they can on Earth. Additionally, the Moon has no atmosphere, so there is no air resistance to slow down objects. This means that objects on the Moon can travel in a straight line for much longer distances than they can on Earth.
Human: Why is the gravitational field lower?
AI:

> Finished chain.


" The Moon's gravitational field is lower because it is much smaller than Earth. The Moon has a mass of only 7.34767309 × 10^22 kg, compared to Earth's mass of 5.97219 × 10^24 kg. This means that the Moon's gravitational pull is much weaker than Earth's."

### Document Loaders

Combining language models with your own text data is a powerful way to differentiate your application. The first step is to load the data into “documents”, which is simply a fancy name for pieces of text.


https://python.langchain.com/en/latest/modules/indexes/document_loaders.html
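Each loader returns a list of `Document` objects, each holding `page_content` (the text) and a `metadata` dict such as the source path; `PyPDFLoader.load()`, for example, yields one document per PDF page. The sketch below mimics that shape in plain Python with a hypothetical `SimpleDocument` dataclass and `load_text` function; it is not LangChain's `TextLoader`.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata about its origin."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_text(path: str) -> list[SimpleDocument]:
    # Read the whole file into one document and remember where it came from.
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Usage: write a small temporary file, load it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Learning word vectors for sentiment analysis.")
loaded = load_text(tmp.name)
os.remove(tmp.name)
print(loaded[0].page_content)
# prints: Learning word vectors for sentiment analysis.
```

Keeping the source in `metadata` is what later lets a question-answering chain cite where a retrieved passage came from.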

In [58]:
!pip install pypdf



In [63]:
from langchain.document_loaders import PyPDFLoader

# A local file works too, e.g. PyPDFLoader("example_data/layout-parser-paper.pdf")
loader = PyPDFLoader("https://ai.stanford.edu/~ang/papers/acl11-WordVectorsSentimentAnalysis.pdf")
# load_and_split() loads the PDF page by page, then splits the pages into smaller chunks.
pages = loader.load_and_split()

# print(pages[0])
for page in pages:
    print(page)

page_content='Learning Word Vectors for Sentiment Analysis\nAndrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang,\nAndrew Y. Ng, and Christopher Potts\nStanford University\nStanford, CA 94305\n[amaas, rdaly, ptpham, yuze, ang, cgpotts]@stanford.edu\nAbstract\nUnsupervised vector-based approaches to se-\nmantics can model rich lexical meanings, but\nthey largely fail to capture sentiment informa-\ntion that is central to many word meanings and\nimportant for a wide range of NLP tasks. We\npresent a model that uses a mix of unsuper-\nvised and supervised techniques to learn word\nvectors capturing semantic term–document in-\nformation as well as rich sentiment content.\nThe proposed model can leverage both con-\ntinuous and multi-dimensional sentiment in-\nformation as well as non-sentiment annota-\ntions. We instantiate the model to utilize the\ndocument-level sentiment polarity annotations\npresent in many online documents (e.g. star\nratings). We evaluate the model using small,\n

### Indexes

Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.

- Embeddings: An embedding is a numerical representation (a vector) of a piece of information, such as text, documents, images, or audio.
- Text Splitters: Long pieces of text need to be split into chunks, for example so that each chunk fits within a model's context window.
- Vectorstores: Vector databases store and index embedding vectors so that strings of text, sentences, and whole documents can be searched by meaning rather than by exact keywords, giving more relevant results.
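To make the embeddings and vector-store ideas concrete, here is a deliberately toy version: word counts stand in for embeddings (a real setup would call an embedding model), and cosine similarity ranks stored texts against a query. `embed`, `cosine`, and `TinyVectorStore` are hypothetical; the method names `add_texts` and `similarity_search` echo LangChain's vector-store interface, but this is not its implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    # Toy "embedding": lowercase word counts. Real embeddings are dense vectors from a model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u: Counter[str], v: Counter[str]) -> float:
    # Cosine similarity between two sparse word-count vectors (0.0 if either is empty).
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory vector store: keeps (vector, text) pairs, ranks by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter[str], str]] = []

    def add_texts(self, texts: list[str]) -> None:
        self.items.extend((embed(t), t) for t in texts)

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


toy_store = TinyVectorStore()
toy_store.add_texts([
    "The Moon has weak gravity.",
    "Saturn has rings made of ice.",
    "Socks come in many colors.",
])
print(toy_store.similarity_search("Why does Saturn have rings?"))
# prints: ['Saturn has rings made of ice.']
```

Word counts only match exact words ("have" does not match "has"); learned embeddings are used precisely because they also place related words and phrasings close together.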


In [76]:
import requests

url = "https://raw.githubusercontent.com/hwchase17/langchain/master/docs/extras/modules/state_of_the_union.txt"
res = requests.get(url)
res.raise_for_status()  # fail loudly instead of silently saving an error page
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

In [77]:
# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

In [78]:
# documents

In [80]:
# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
len(docs)

42
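`CharacterTextSplitter` first cuts the text on a separator (a blank line, `"\n\n"`, by default) and then merges neighboring pieces into chunks of at most `chunk_size` characters where possible. The function below is a simplified, hypothetical `split_text` sketch of that greedy packing for the `chunk_overlap=0` case used above; it ignores overlap and other details of LangChain's implementation.

```python
def split_text(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    # Hypothetical greedy splitter (no overlap): cut on the separator, then pack
    # consecutive pieces into chunks of at most chunk_size characters.
    # A single piece longer than chunk_size is kept whole as its own chunk.
    chunks: list[str] = []
    current = ""
    for piece in (p for p in text.split(separator) if p):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = piece  # start a new chunk with the piece that did not fit
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaa\n\nbbb\n\ncccccc\n\nd", chunk_size=8))
# prints: ['aaa\n\nbbb', 'cccccc', 'd']
```

Splitting on paragraph breaks first keeps related sentences together; a non-zero overlap would additionally repeat the tail of one chunk at the start of the next so that context spanning a boundary is not lost.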