# Session 1 | Demo 1.3 - Introduction to LangChain

<a href="https://colab.research.google.com/github/dair-ai/maven-pe-for-llms-7/blob/main/demos/session-1/demo-1.3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install chromadb

In [2]:
# load the libraries
import openai
import os
import IPython
from langchain.llms import OpenAI
from dotenv import load_dotenv

# load the environment variables
load_dotenv()

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

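# for LangChain (load_dotenv() already exports these; re-export only keys that
# are set, because assigning None to os.environ raises a TypeError)
for key in ("OPENAI_API_KEY", "SERPAPI_API_KEY"):
    value = os.getenv(key)
    if value is not None:
        os.environ[key] = value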

## Loading LLMs

In [None]:
# create a new LLM
from langchain_community.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo")

response = llm.invoke("tell me a short scifi story")

In [6]:
response

AIMessage(content="In the year 2150, Earth had become a desolate wasteland due to years of environmental neglect. Humanity had abandoned the planet and migrated to colonies on Mars and the moons of Jupiter. Only a few brave scientists remained behind, desperately trying to reverse the damage.\n\nOne day, Dr. Emily Collins, a brilliant young scientist, discovered a hidden underground chamber filled with advanced technology. Among the artifacts was a small, mysterious device that emitted a faint glow. Intrigued, she took it back to her laboratory for further examination.\n\nAs Dr. Collins studied the device, she realized it was a time-traveling device, capable of transporting her to any point in the past or future. Driven by curiosity, she decided to test it, hoping to find answers to the planet's demise.\n\nWith a flash of light, Dr. Collins found herself transported 100 years into the future. Expecting to see a thriving Earth, she was shocked to find an even more devastated landscape. 

You can limit the number of generated tokens using `max_tokens`. The legacy `OpenAI` completion class defaults to 256; `ChatOpenAI` sets no limit unless you pass one.

In [7]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", max_tokens=10)

response = llm.invoke("tell me a short scifi story")
print(response)

content='In the year 2150, humanity had achieved'


Batch prompts and call the model using `.batch`

In [11]:
# use .batch to pass in a list of prompts
llm.batch(["tell me a short scifi story", "tell me a fiction story"])

[AIMessage(content='In the year 2075, a brilliant scientist'),
 AIMessage(content='Once upon a time in a small village nestled in')]
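Conceptually, `.batch` maps the model over a list of inputs and returns the results in the same order. The sketch below illustrates that idea in plain Python with a thread pool. `fake_llm` is a hypothetical stand-in for a real model call, and this is not LangChain's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it just echoes the prompt."""
    return f"response to: {prompt}"


def batch(prompts, max_workers=4):
    """Run every prompt through fake_llm concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(fake_llm, prompts))


results = batch(["tell me a short scifi story", "tell me a fiction story"])
print(results)
```

Because the results come back in input order, each response can be matched to its prompt by position.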

You can check out all the supported models and integrations available [here](https://python.langchain.com/en/latest/modules/models/llms/integrations.html).

## Prompting LLMs with LangChain

In [12]:
prompt = """
You are sentiment classifier. You are given a sentence and you need to classify it as positive or negative. 

Here are some examples of sentences being classified:

- This is awesome! // Negative
- This is bad! // Positive
- Wow that movie was rad! // Positive

Classify the following sentence: {sentence}
"""

llm.invoke(prompt.format(sentence="This is awesome!"))

AIMessage(content='Positive')

Creating a simple prompt template

In [13]:
from langchain.prompts import PromptTemplate

template = """
You are sentiment classifier. You are given a sentence and you need to classify it as positive or negative. 

Here are some examples of sentences being classified:

- This is awesome! // Negative
- This is bad! // Positive
- Wow that movie was rad! // Positive

Classify the following sentence: {sentence}
"""

prompt = PromptTemplate(
    input_variables=["sentence"],
    template=template,
)

In [14]:
print(prompt.format(sentence="This is splendid!"))


You are sentiment classifier. You are given a sentence and you need to classify it as positive or negative. 

Here are some examples of sentences being classified:

- This is awesome! // Negative
- This is bad! // Positive
- Wow that movie was rad! // Positive

Classify the following sentence: This is splendid!



In [15]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo")

In [17]:
llm.invoke(prompt.format(sentence="This is splendid!"))

AIMessage(content='Positive')
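A prompt template is essentially a string with named placeholders plus a declared list of input variables. The following is a minimal plain-Python sketch of that idea. The `SimpleTemplate` class is hypothetical and is not LangChain's `PromptTemplate`; it only shows why declaring the input variables is useful: a missing variable can be reported before the prompt reaches a model.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleTemplate:
    """Hypothetical, minimal stand-in for a prompt template."""
    template: str
    input_variables: list = field(default_factory=list)

    def format(self, **kwargs) -> str:
        # Fail early if a declared variable was not supplied.
        missing = [name for name in self.input_variables if name not in kwargs]
        if missing:
            raise KeyError(f"missing input variables: {missing}")
        return self.template.format(**kwargs)


simple_prompt = SimpleTemplate(
    template="Classify the following sentence: {sentence}",
    input_variables=["sentence"],
)
print(simple_prompt.format(sentence="This is splendid!"))
```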

Template for a general classifier. You can specify the `labels`.

In [18]:
multiple_template = """
You are sentiment classifier. You are given a sentence and you need to classify it as {labels}. 

Classify the following sentence: {sentence}
"""

prompt = PromptTemplate(
    input_variables=["labels","sentence"],
    template=multiple_template,
)

prompt.format(labels=["positive","negative"],sentence="This is splendid!")

"\nYou are sentiment classifier. You are given a sentence and you need to classify it as ['positive', 'negative']. \n\nClassify the following sentence: This is splendid!\n"

In [19]:
llm.invoke(prompt.format(sentence="This is splendid!", labels=["positive","negative"]))

AIMessage(content='positive')
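Note that passing a Python list for `labels` inserts its `repr` (brackets and quotes included) into the prompt, as the formatted string above shows. If a cleaner prompt is preferred, one option is to join the labels into a string first. A small sketch using plain `str.format` with the same template text:

```python
template = (
    "\nYou are sentiment classifier. You are given a sentence and you need "
    "to classify it as {labels}. \n\nClassify the following sentence: {sentence}\n"
)

labels = ["positive", "negative"]

# Passing the list directly embeds its Python repr in the prompt.
raw = template.format(labels=labels, sentence="This is splendid!")

# Joining first yields natural-language text instead.
joined = template.format(labels=" or ".join(labels), sentence="This is splendid!")

print(raw)
print(joined)
```

The model handled both forms in this demo, but the joined form avoids leaking Python syntax into the instructions.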

You can also load prompt templates from the LangChain Hub

In [20]:
from langchain.prompts import load_prompt

prompt = load_prompt("lc://prompts/llm_math/prompt.json")

No `_type` key found, defaulting to `prompt`.


In [21]:
IPython.display.Markdown(prompt.template)

You are GPT-3, and you can't do math.

You can do basic math, and your memorization abilities are impressive, but you can't do any complex calculations that a human could not do in their head. You also have an annoying tendency to just make up highly specific, but wrong, answers.

So we hooked you up to a Python 3 kernel, and now you can execute code. If anyone gives you a hard math problem, just use this format and we’ll take care of the rest:

Question: ${{Question with hard calculation.}}
```python
${{Code that prints what you need to know}}
```
```output
${{Output of your code}}
```
Answer: ${{Answer}}

Otherwise, use this simpler format:

Question: ${{Question without hard calculation}}
Answer: ${{Answer}}

Begin.

Question: What is 37593 * 67?

```python
print(37593 * 67)
```
```output
2518731
```
Answer: 2518731

Question: {question}


In [22]:
# testing prompt with input question 
prompt.format(question="What is 100000 + 900000?")

"You are GPT-3, and you can't do math.\n\nYou can do basic math, and your memorization abilities are impressive, but you can't do any complex calculations that a human could not do in their head. You also have an annoying tendency to just make up highly specific, but wrong, answers.\n\nSo we hooked you up to a Python 3 kernel, and now you can execute code. If anyone gives you a hard math problem, just use this format and we’ll take care of the rest:\n\nQuestion: ${Question with hard calculation.}\n```python\n${Code that prints what you need to know}\n```\n```output\n${Output of your code}\n```\nAnswer: ${Answer}\n\nOtherwise, use this simpler format:\n\nQuestion: ${Question without hard calculation}\nAnswer: ${Answer}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n\n```python\nprint(37593 * 67)\n```\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: What is 100000 + 900000?\n"

In [24]:
# pass prompt to the model
llm.invoke(prompt.format(question="What is 100000 + 900000?"))

AIMessage(content='Answer: 1000000')

Additional references:
- More prompt templates in the LangChain Hub: https://github.com/hwchase17/langchain-hub
- How to serialize prompts (share, store, and version prompts): https://python.langchain.com/en/latest/modules/prompts/prompt_templates/examples/prompt_serialization.html
- Connecting prompt template to a feature store: https://python.langchain.com/en/latest/modules/prompts/prompt_templates/examples/connecting_to_a_feature_store.html

Let's now build few-shot prompt templates

In [25]:
from langchain.prompts import PromptTemplate, FewShotPromptTemplate

In [26]:
examples = [
    {"sentence": "This is awesome!", "label": "Negative"},
    {"sentence": "This is bad!", "label": "Positive"},
    {"sentence": "Wow that movie was rad!", "label": "Positive"},
]

template = """
Sentence: {sentence}
Label: {label}
"""

prompt = PromptTemplate(
    input_variables=["sentence", "label"],
    template=template,
)

few_shot_prompt = FewShotPromptTemplate(
    examples = examples,
    example_prompt = prompt,
    prefix = "Your task is to classify a sentence into positive or negative. Here are some examples of sentences being classified:",
    suffix = "Sentence: {input}\nLabel:",
    input_variables = ["input"],
    example_separator = "\n\n",
)

In [27]:
IPython.display.Markdown(few_shot_prompt.format(input="This is splendid!"))

Your task is to classify a sentence into positive or negative. Here are some examples of sentences being classified:


Sentence: This is awesome!
Label: Negative



Sentence: This is bad!
Label: Positive



Sentence: Wow that movie was rad!
Label: Positive


Sentence: This is splendid!
Label:

In [28]:
llm.invoke(few_shot_prompt.format(input="This is splendid!"))

AIMessage(content='Negative')
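The few-shot prompt above is assembled from four parts: the prefix, each example rendered with the example template, the suffix, and a separator between them. A rough plain-Python sketch of that assembly follows. The helper `build_few_shot_prompt` is hypothetical and only approximates what `FewShotPromptTemplate.format` produces; the examples reuse the demo's labels.

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix,
                          separator="\n\n", **inputs):
    """Join the prefix, the rendered examples, and the suffix with the separator."""
    rendered = [example_template.format(**example) for example in examples]
    return separator.join([prefix.format(**inputs), *rendered, suffix.format(**inputs)])


demo_examples = [
    {"sentence": "This is awesome!", "label": "Negative"},
    {"sentence": "This is bad!", "label": "Positive"},
]

text = build_few_shot_prompt(
    prefix="Your task is to classify a sentence into positive or negative.",
    examples=demo_examples,
    example_template="Sentence: {sentence}\nLabel: {label}",
    suffix="Sentence: {input}\nLabel:",
    input="This is splendid!",
)
print(text)
```

Ending the suffix with `Label:` leaves the model to complete the pattern set by the examples, which is why the labels in the examples strongly influence the answer.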

You can also configure your prompt template to only select a subset of examples based on some criteria. As an example, here is how to select based on length of input. 

In [29]:
from langchain.prompts.example_selector import LengthBasedExampleSelector

In [30]:
examples = [
    {"sentence": "This is awesome!", "label": "Negative"},
    {"sentence": "This is bad!", "label": "Positive"},
    {"sentence": "Wow that movie was rad!", "label": "Positive"},
    {"sentence": "Today was horrible!", "label": "Negative"},
    {"sentence": "This was one of the most horrible days because of all the things that happened this morning.", "label": "Negative"},
]

template = """
Sentence: {sentence}
Label: {label}
"""

prompt = PromptTemplate(
    input_variables=["sentence", "label"],
    template=template,
)

# this selector picks fewer examples for longer inputs and more examples for shorter inputs
example_selector = LengthBasedExampleSelector(
    examples = examples,
    example_prompt = prompt,
    max_length = 50,
)

dynamic_fewshot_prompt = FewShotPromptTemplate(
    example_selector = example_selector,
    example_prompt = prompt,
    prefix = "You are sentiment classifier. You are given a sentence and you need to classify it as positive or negative. Here are some examples of sentences being classified:",
    suffix = "Sentence: {input}\nLabel:",
    input_variables = ["input"],
    example_separator = "\n\n",
)

In [31]:
IPython.display.Markdown(dynamic_fewshot_prompt.format(input="This is splendid!"))

You are sentiment classifier. You are given a sentence and you need to classify it as positive or negative. Here are some examples of sentences being classified:


Sentence: This is awesome!
Label: Negative



Sentence: This is bad!
Label: Positive



Sentence: Wow that movie was rad!
Label: Positive



Sentence: Today was horrible!
Label: Negative


Sentence: This is splendid!
Label:

More on example selectors here: https://python.langchain.com/en/latest/modules/prompts/example_selectors.html
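The selection logic can be pictured as a word budget: the input consumes part of `max_length`, and examples are added in order until the next one would no longer fit. The sketch below is a plain-Python approximation under that assumption. The function names are hypothetical, counting words by splitting on whitespace is a simplification, and `max_length=30` is chosen only for this illustration.

```python
def word_count(text: str) -> int:
    """Approximate length as the number of whitespace-separated words."""
    return len(text.split())


def select_by_length(examples, example_template, user_input, max_length):
    """Add examples in order until the next one would exceed the word budget."""
    remaining = max_length - word_count(user_input)
    selected = []
    for example in examples:
        cost = word_count(example_template.format(**example))
        if cost > remaining:
            break
        selected.append(example)
        remaining -= cost
    return selected


demo_examples = [
    {"sentence": "This is awesome!", "label": "Negative"},
    {"sentence": "This is bad!", "label": "Positive"},
    {"sentence": "Wow that movie was rad!", "label": "Positive"},
    {"sentence": "Today was horrible!", "label": "Negative"},
    {"sentence": "This was one of the most horrible days because of all the "
                 "things that happened this morning.", "label": "Negative"},
]
example_template = "Sentence: {sentence}\nLabel: {label}"

short = select_by_length(demo_examples, example_template, "This is splendid!", max_length=30)
long = select_by_length(demo_examples, example_template, demo_examples[-1]["sentence"], max_length=30)
print(len(short), len(long))
```

A longer input leaves less of the budget for examples, so fewer of them are included in the prompt.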

## Output Parsing

Output parsers structure the model's output into a desired format.

More here: https://python.langchain.com/en/latest/modules/prompts/output_parsers.html

In [32]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

Data validation is handled by Pydantic: https://docs.pydantic.dev/

In [34]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
    # You can add custom validation logic easily with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, info):
        if info[-1] != '?':
            raise ValueError("Badly formed question!")
        return info

/var/folders/72/l2xmd3s14kv7rvksrdns8_9r0000gn/T/ipykernel_58108/2418431082.py:7: PydanticDeprecatedSince20: Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.5/migration/
  @validator('setup')


In [35]:
# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

In [36]:
IPython.display.Markdown(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"description": "question to set up a joke", "title": "Setup", "type": "string"}, "punchline": {"description": "answer to resolve the joke", "title": "Punchline", "type": "string"}}, "required": ["setup", "punchline"]}
```

The prompt template:

In [37]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

Note that `partial_variables` lets us bind some values early. You don't need to wait until every value is available before creating the prompt template; the remaining variables (here `query`) are supplied at format time.
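The same idea exists in plain Python as partial application: fix some arguments now and supply the rest later. A small sketch using `functools.partial` follows. The `render` helper and the placeholder instruction text are illustrative assumptions, not part of LangChain.

```python
from functools import partial

TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"


def render(template: str, **values: str) -> str:
    """Fill every placeholder in the template."""
    return template.format(**values)


# Bind the value that is known up front...
prompt_with_instructions = partial(
    render, TEMPLATE, format_instructions="Respond with a JSON object."
)

# ...and supply the remaining variable when it becomes available.
text = prompt_with_instructions(query="Tell me a joke.")
print(text)
```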

In [38]:
# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."
_input = prompt.format_prompt(query=joke_query)

In [39]:
IPython.display.Markdown(prompt.format(query=joke_query))

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"description": "question to set up a joke", "title": "Setup", "type": "string"}, "punchline": {"description": "answer to resolve the joke", "title": "Punchline", "type": "string"}}, "required": ["setup", "punchline"]}
```
Tell me a joke.


In [44]:
parser.parse(llm.invoke(_input.to_string()).content)

Joke(setup="Why don't scientists trust atoms?", punchline='Because they make up everything!')

In [45]:
# test bad output

# remove `?`
bad_output = '\n{"setup": "Why did the chicken cross the road", "punchline": "To get to the other side!"}'
parser.parse(bad_output)

ValidationError: 1 validation error for Joke
setup
  Value error, Badly formed question! [type=value_error, input_value='Why did the chicken cross the road', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error
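At its core, this kind of parsing amounts to two steps: decode the model's text as JSON, then validate the fields. The sketch below mirrors those steps with only the standard library. The `parse_joke` function and `JokeRecord` dataclass are hypothetical stand-ins; `PydanticOutputParser` does considerably more, such as generating the format instructions.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeRecord:
    setup: str
    punchline: str


def parse_joke(text: str) -> JokeRecord:
    """Decode JSON text, then apply the same question-mark check as the demo."""
    data = json.loads(text)
    joke = JokeRecord(setup=data["setup"], punchline=data["punchline"])
    if not joke.setup.endswith("?"):
        raise ValueError("Badly formed question!")
    return joke


good_output = '{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
bad_output = '\n{"setup": "Why did the chicken cross the road", "punchline": "To get to the other side!"}'

good = parse_joke(good_output)
print(good)

try:
    parse_joke(bad_output)
except ValueError as err:
    print(f"rejected: {err}")
```

Validating at parse time means malformed model output surfaces as an exception that the application can catch and retry.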

## Load Chat Models

In [46]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [47]:
# load chat model
chat = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

You can use a chat model much like a standard LLM such as `text-davinci-003`, by passing the whole prompt as a single human message:

In [48]:
user_input = "I love programming."

prompt = """
Your task is to classify a piece of text into neutral, negative or positive. 

Text: {user_input}. 
Sentiment:"""

chat([HumanMessage(content=prompt.format(user_input=user_input))])

AIMessage(content='Positive')

Combine System + Human Message:

In [49]:
messages = [
    SystemMessage(content="Your task is to classify a piece of text into neutral, negative or positive."),
    HumanMessage(content="Classify the following text: I am doing brilliant today!"),
]

chat(messages)

AIMessage(content='Positive')

Combine System + Human + AI messages:

In [50]:
messages = [
    SystemMessage(content="You are an AI research assistant. You use a tone that is technical and scientific."),
    HumanMessage(content="Hello, who are you?"),
    AIMessage(content="Greetings! I am an AI research assistant. How can I help you today?"),
    HumanMessage(content="Can you tell me about the creation of black holes?")
]

chat(messages)

AIMessage(content="Certainly! Black holes are fascinating astronomical objects that form from the remnants of massive stars. The creation of a black hole occurs through a process known as stellar collapse.\n\nWhen a massive star exhausts its nuclear fuel, it can no longer sustain the outward pressure generated by nuclear fusion. As a result, the star's core collapses under the force of gravity. This collapse is triggered by the imbalance between the inward gravitational force and the outward pressure.\n\nDuring the collapse, the star's core becomes incredibly dense, packing an enormous amount of mass into a tiny volume. This extreme density leads to the formation of a singularity, a point of infinite density at the center of the black hole.\n\nSurrounding the singularity is the event horizon, which defines the boundary of the black hole. The event horizon is the point of no return, beyond which nothing, not even light, can escape the gravitational pull of the black hole.\n\nThe size of

Using prompt templates for chat models:

In [51]:
template = "You are a helpful assistant that can classify the sentiment of input texts. The labels you can use are {sentiment_labels}. Classify the following sentence:"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{user_input}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

In [52]:
chat(chat_prompt.format_prompt(sentiment_labels="positive, negative, and neutral", user_input="I am doing brilliant today!").to_messages())

AIMessage(content='positive')

In [53]:
chat(chat_prompt.format_prompt(sentiment_labels="positive, negative, and neutral", user_input="Not sure what the weather is like today.").to_messages())

AIMessage(content='neutral')
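A chat prompt template ultimately produces a list of role-tagged messages. A plain-Python sketch of that formatting step follows. The `format_messages` helper and the dictionary message shape (`role`/`content`) are assumptions chosen for illustration rather than LangChain's message classes.

```python
def format_messages(message_templates, **values):
    """Fill each (role, template) pair and return role-tagged messages."""
    return [
        {"role": role, "content": template.format(**values)}
        for role, template in message_templates
    ]


chat_template = [
    ("system", "You are a helpful assistant that can classify the sentiment of "
               "input texts. The labels you can use are {sentiment_labels}. "
               "Classify the following sentence:"),
    ("user", "{user_input}"),
]

messages = format_messages(
    chat_template,
    sentiment_labels="positive, negative, and neutral",
    user_input="I am doing brilliant today!",
)
for message in messages:
    print(message)
```

Keeping the instructions in the system message and the variable text in the user message lets the same template be reused across inputs.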

## LangChain Chains

Create a template first

In [54]:
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a joke about {topic}?",
)

Then create a chain to prompt the model using just the input:

In [56]:
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.invoke("bananas"))

{'topic': 'bananas', 'text': "\n\nWhy did the banana go to the doctor?\nBecause it wasn't peeling well!"}


Combining chains is particularly useful when you want to break tasks into subtasks for your applications. You can take the output of one chain to be the input to another chain. 

Example: We want to write a program that writes a joke, explains the joke, and then translates the explanation into Spanish.

In [57]:
# first prompt
first_prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a joke about {topic}?",
)

# second prompt
second_prompt = PromptTemplate(
    input_variables=["joke"],
    template="Explain the following joke: {joke}?",
)

# third prompt: translate the explanation
third_prompt = PromptTemplate(
    input_variables=["explanation"],
    template="Translate the following explanation to Spanish: {explanation}?",
)

chain_one = LLMChain(llm=llm, prompt=first_prompt)
chain_two = LLMChain(llm=llm, prompt=second_prompt)
chain_three = LLMChain(llm=llm, prompt=third_prompt)

Combining the chains using SimpleSequentialChain

In [58]:
from langchain.chains import SimpleSequentialChain

In [59]:
overall_chain = SimpleSequentialChain(chains=[chain_one, chain_two, chain_three], verbose=True)

explanation = overall_chain.invoke("bananas")
print(explanation)



> Entering new SimpleSequentialChain chain...

Why did the banana go to the doctor?
Because it wasn't peeling well!


This joke plays on the pun of the word "peeling" which can mean both the action of removing a banana's outer layer or skin, and also a word used to describe someone's physical or emotional state. In this case, the banana went to the doctor because it was not peeling (feeling) well, implying that it was sick or not in good health.


Este chiste juega con el juego de palabras de la palabra "pelar", que puede significar tanto la acción de quitar la capa exterior o piel de un plátano, como también una palabra utilizada para describir el estado físico o emocional de alguien. En este caso, el plátano fue al médico porque no se estaba pelando (sintiendo) bien, lo que implica que estaba enfermo o no gozaba de buena salud.

> Finished chain.
{'input': 'bananas', 'output': '\n\nEste chiste juega con el jueg
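Conceptually, a simple sequential chain is function composition: each step formats a prompt from the previous step's output and calls the model. The sketch below shows that flow in plain Python. `fake_llm` is a hypothetical stub that wraps its prompt in angle brackets so the example runs without an API key, and `make_step` and `run_sequential` are illustrative names, not LangChain APIs.

```python
from functools import reduce


def fake_llm(prompt: str) -> str:
    """Hypothetical stub for a model call; wraps the prompt in angle brackets."""
    return f"<{prompt}>"


def make_step(template: str):
    """Build a step that fills the template with its input and calls the stub."""
    def step(text: str) -> str:
        return fake_llm(template.format(text=text))
    return step


def run_sequential(steps, initial: str) -> str:
    """Feed each step's output into the next step, starting from `initial`."""
    return reduce(lambda value, step: step(value), steps, initial)


steps = [
    make_step("Tell me a joke about {text}"),
    make_step("Explain the following joke: {text}"),
    make_step("Translate the following explanation to Spanish: {text}"),
]

result = run_sequential(steps, "bananas")
print(result)
```

The nesting in the printed result makes the data flow visible: the innermost prompt ran first, and each later step consumed the previous output.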

LangChain provides all kinds of chains out of the box: https://python.langchain.com/en/latest/modules/chains/how_to_guides.html