# LangChain: A library used to build language model applications

In [1]:
from dotenv import load_dotenv
from enum import Enum
from pydantic import BaseModel, Field
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.llms.openai import OpenAI
from langchain.llms.ollama import Ollama
from langchain.llms.huggingface_hub import HuggingFaceHub
from langchain.schema import HumanMessage, StrOutputParser, BaseOutputParser
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate, FewShotChatMessagePromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.output_parsers import (
  PydanticOutputParser, 
  CommaSeparatedListOutputParser,
  DatetimeOutputParser,
  EnumOutputParser,
)

In [None]:
load_dotenv()

At its core, a LangChain application is a `chain`, a sequence that contains:
1. LLM
2. Prompt
3. Parser

The most basic case passes a plain string prompt directly to the LLM and returns its raw string output:

In [None]:
llm = OpenAI()
llm("What do you think of the color green?")

However, asking a single question is simple. To build more advanced LLM solutions, you will need LangChain chains. There are two ways to create chains:

1. `Chain` interface (considered legacy)
2. `LCEL` pipelines

Here is a basic legacy chain, which introduces a prompt template that is chained with the LLM:

In [14]:
prompt = PromptTemplate.from_template("{question}")

In [None]:
chain = LLMChain(llm=llm, prompt=prompt)
chain.run(question="What is the meaning of life?")

Here is the same chain but using LCEL:

In [None]:
chain = prompt | llm
chain.invoke({"question": "What is the meaning of life?"})
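To make the `|` syntax less magical, here is a minimal sketch of how a pipe operator can compose steps in plain Python. The `Step` class and the toy stages are hypothetical stand-ins, not LangChain's actual classes; the only idea borrowed is that `a | b` produces a new object whose `invoke` runs `a` and then `b`.

```python
class Step:
    """Hypothetical stand-in for a runnable pipeline step (not LangChain's class)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Toy stages: a prompt formatter, a fake "model", and a parser
prompt_step = Step(lambda inputs: "Question: {question}".format(**inputs))
fake_llm_step = Step(lambda text: text.upper())  # placeholder for a real model call
parser_step = Step(lambda text: text.strip())

toy_chain = prompt_step | fake_llm_step | parser_step
result = toy_chain.invoke({"question": "What is the meaning of life?"})
print(result)
```

Each stage only needs an `invoke` method, which is why prompts, models, and parsers can be mixed freely in one pipeline.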

## LLM

There are two types of language models:
* `LLM`: a model that takes a string as input and returns a string
* `ChatModel`: a model that takes a list of messages as input and returns a message

The basic `LLM` is often referred to as an `Instruct` model, whereas the other is referred to as a `Chat` model. Ultimately, both are foundation models that have been fine-tuned, one for following instructions and the other for conversation.
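The difference between the two model types is easiest to see in their signatures. The sketch below uses a hypothetical `Message` dataclass and two fake models (no real model is called) purely to contrast "string in, string out" with "messages in, message out".

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical minimal message: who said it and what was said."""
    role: str
    content: str


def fake_llm(prompt: str) -> str:
    # String in, string out
    return f"(completion for: {prompt})"


def fake_chat_model(messages: list[Message]) -> Message:
    # List of messages in, one message out
    last_human = [m for m in messages if m.role == "human"][-1]
    return Message(role="ai", content=f"(reply to: {last_human.content})")


completion = fake_llm("how many days are in a year?")
reply = fake_chat_model([Message("human", "how many days are in a year?")])
print(completion)
print(reply)
```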

We already saw the basic usage of an LLM. Here is a simple example of a ChatModel that uses messages, where a `HumanMessage` is passed in and an `AIMessage` is returned. All messages are derived from `BaseMessage`, which has a `role` and `content`:

In [None]:
llm = ChatOpenAI()
input_message = HumanMessage(content="how many days are in a year?")
llm([input_message])

We are also not limited to online LLMs. Here is an example using `Ollama` with the `Mistral` LLM running locally:

In [None]:
llm = Ollama(model="mistral")
print(llm("The first man on the moon was ..."))

We can also stream the LLM response instead of waiting for the entire text to be generated:

In [None]:
llm = Ollama(
  model="mistral",
  callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)
llm("Who is Elon Musk?")
print()
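Streaming through callbacks can be pictured as the model calling a handler once per generated token. The following is a rough sketch with a fake model and a hypothetical `PrintTokenHandler`; the method name mirrors LangChain's `on_llm_new_token` callback hook, but everything else here is an assumption made for illustration.

```python
import sys


class PrintTokenHandler:
    """Hypothetical handler that writes each token to stdout as it arrives."""

    def on_llm_new_token(self, token: str) -> None:
        sys.stdout.write(token)
        sys.stdout.flush()


def fake_streaming_llm(prompt: str, handlers: list) -> str:
    # Placeholder for a model that produces its answer piece by piece
    tokens = ["Streaming ", "delivers ", "tokens ", "one ", "at ", "a ", "time."]
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)
    # The full text is still returned once generation finishes
    return "".join(tokens)


full_text = fake_streaming_llm("Who is Elon Musk?", [PrintTokenHandler()])
print()
```

This also explains the trailing `print()` in the cell above: the handler writes tokens without a final newline.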

It is also easy to use models hosted on HuggingFace through their `huggingface_hub` library; just make sure `HUGGINGFACEHUB_API_TOKEN` is set up in the environment. It can be slow, since you are running on shared infrastructure:

In [None]:
llm = HuggingFaceHub(
  repo_id="google/flan-t5-xl", 
  model_kwargs={"temperature": 1}
)
llm("translate English to German: Hello, my name is John.", raw_response=True)

## Prompt

Prompts are the instructions to the LLM. There are two tools provided by LangChain for prompts:
1. `Prompt Templates`: parameterized prompts
2. `Example Selectors`: dynamically select examples to include in the prompts

Up to this point, the prompts have been simple strings. However, prompts are usually more complicated and take parameters:

In [None]:
prompt = PromptTemplate.from_template("What is a good company that makes {product}?")
print(prompt.format(product="cars"))
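A string prompt template behaves much like Python's own `str.format`. As a sketch using only the standard library, the helper below (a hypothetical `template_variables` function, not a LangChain API) discovers the placeholder names in a template and then fills them in:

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    return [field for _, field, _, _ in Formatter().parse(template) if field]


template = "What is a good company that makes {product}?"
variables = template_variables(template)
rendered = template.format(product="cars")
print(variables)
print(rendered)
```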

The `PromptTemplate` works with basic strings but you can also use the more powerful `ChatPromptTemplate` which works with messages and `Chat` models. The types of possible messages are:

1. System
2. Human
3. AI

In [None]:
prompt = ChatPromptTemplate.from_messages([
  ("system", "You are able to translate from {in_language} to {out_language}."),
  ("human", "{text}")
])
print(prompt.format(in_language="English", out_language="German", text="Hello, my name is John."))

The `(role, template)` tuples are shorthand for message objects, which can be either fixed (non-variable) messages or message prompt templates:

In [None]:
prompt = ChatPromptTemplate.from_messages([
  SystemMessagePromptTemplate(prompt=PromptTemplate.from_template("You are able to translate from {in_language} to {out_language}.")),
  HumanMessage(content="USER:"),
  HumanMessagePromptTemplate(prompt=PromptTemplate.from_template("{text}")),
])
print(prompt.format(in_language="English", out_language="German", text="Hello, my name is John."))
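Conceptually, both forms of the chat prompt reduce to a list of role-tagged templates that are filled with the same set of values. Here is a simplified sketch in plain Python; `format_chat_prompt` is a hypothetical helper, and real chat templates return message objects rather than tuples.

```python
def format_chat_prompt(message_templates, **values):
    # Fill every (role, template) pair with the same set of values
    return [(role, template.format(**values)) for role, template in message_templates]


message_templates = [
    ("system", "You are able to translate from {in_language} to {out_language}."),
    ("human", "{text}"),
]
messages = format_chat_prompt(
    message_templates,
    in_language="English",
    out_language="German",
    text="Hello, my name is John.",
)
for role, content in messages:
    print(f"{role}: {content}")
```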

Prompt templates also implement the `Runnable` interface, which is how they can be used with LCEL:

In [None]:
prompt = PromptTemplate.from_template("My name is {name}?")
prompt.invoke({"name": "John"})

In [None]:
prompt = ChatPromptTemplate.from_messages([('human', 'My name is {name}?')])
prompt.invoke({"name": "John"})

It is also very common to include a few examples within a prompt, referred to as `one-shot` or `few-shot` examples. The most basic way of doing that is to pass every example to a `FewShotPromptTemplate`:

In [None]:
examples = [
  {
    "question": "is the name Brian a cool name?",
    "answer": 
"""
The length of the name Brian is 5 characters.
Because the name has an odd length, it is NOT a cool name.
"""
  },  
  {
    "question": "is the name Tami a cool name?",
    "answer": 
"""
The length of the name Tami is 4 characters.
Because the name has an even length, it is a cool name.
"""
  },  
  {
    "question": "is the name Jason a cool name?",
    "answer": 
"""
The length of the name Jason is 5 characters.
Because the name has an odd length, it is NOT a cool name.
"""
  },  
  {
    "question": "is the name Nick a cool name?",
    "answer": 
"""
The length of the name Nick is 4 characters.
Because the name has an even length, it is a cool name.
"""
  },
]

In [None]:
example_prompt = PromptTemplate.from_template("Question: {question}\n{answer}")
print(example_prompt.format(**examples[0]))

In [None]:
prompt = FewShotPromptTemplate(
  examples=examples,
  example_prompt=example_prompt,
  suffix="Question: {input}",
  input_variables=["input"]
)
print(prompt.format(input="is the name Jack a cool name?"))
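What the few-shot template produces is essentially string assembly: render each example, then append the suffix containing the new question. The sketch below shows this with a hypothetical `build_few_shot_prompt` helper; the blank-line separator and the abbreviated answers are assumptions made for illustration.

```python
def build_few_shot_prompt(examples, example_template, suffix, separator="\n\n", **values):
    # Render each example, then append the suffix that carries the new input
    rendered = [example_template.format(**example) for example in examples]
    return separator.join(rendered + [suffix.format(**values)])


# Abbreviated versions of the examples, for brevity
toy_examples = [
    {"question": "is the name Tami a cool name?", "answer": "It has 4 characters, so it is a cool name."},
    {"question": "is the name Brian a cool name?", "answer": "It has 5 characters, so it is NOT a cool name."},
]
few_shot_text = build_few_shot_prompt(
    toy_examples,
    example_template="Question: {question}\n{answer}",
    suffix="Question: {input}",
    input="is the name Jack a cool name?",
)
print(few_shot_text)
```

Because the final line is an unanswered `Question:`, the model is nudged to continue in the same format as the examples.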

Passing `examples` directly works fine if you want to include all the examples in every prompt. However, if you only want to include some of them, you need an `ExampleSelector`. In this case, we will use the `SemanticSimilarityExampleSelector`, which decides which examples to include based on the similarity between the input and the examples:

In [None]:
example_selector = SemanticSimilarityExampleSelector.from_examples(
  examples, # examples to select from
  OpenAIEmbeddings(), # used to create the embeddings
  Chroma, # VectorStore class
  k=1 # number of examples to produce; one-shot in this case
)

Now we can select the examples based on a new question:

In [None]:
question = "is the name Jack a cool name?"
example_selector.select_examples({"question": question})
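Under the hood, similarity-based selection amounts to embedding the query, embedding each example, and keeping the `k` nearest. The following is a toy sketch, assuming a bag-of-words `toy_embed` function in place of a real embedding model and comparing only the `question` field; a real selector uses dense vectors from the embedding model and a vector store.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Hypothetical "embedding": lowercase word counts (a real model returns dense vectors)
    return Counter(text.lower().replace("?", "").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query: str, examples: list[dict], k: int = 1) -> list[dict]:
    query_vec = toy_embed(query)
    # Rank examples by similarity of their question to the query, keep the top k
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, toy_embed(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


toy_examples = [
    {"question": "what is the capital of France?", "answer": "Paris."},
    {"question": "is the name Brian a cool name?", "answer": "No, it has an odd length."},
]
selected = select_examples("is the name Jack a cool name?", toy_examples, k=1)
print(selected)
```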

Using the example selector we can now define a `FewShotPromptTemplate` without passing in all the examples:

In [None]:
prompt = FewShotPromptTemplate(
  example_selector=example_selector,
  example_prompt=example_prompt,
  suffix="Question: {input}",
  input_variables=["input"]
)

print(prompt.format(input="Is Brian a cool name?"))

Now that we have the prompt template defined, let's use it with an LLM:

In [None]:
llm = Ollama(model="mistral")
chain = prompt | llm
chain.invoke({"input": "is the name Nick a cool name?"})

Using examples with a `Chat` model is only slightly different.

Here is the simplest example, where every example in the `FewShotChatMessagePromptTemplate` is included in every prompt. It also demonstrates that the few-shot prompt template does not need its own `input` suffix; it can instead be embedded within another prompt template:

In [None]:
examples = [
  {"input": "2+2", "output": "4"},
  {"input": "2+3", "output": "5"}
]

In [None]:
example_prompt = ChatPromptTemplate.from_messages([
  ("human", "{input}"),
  ("ai", "{output}")
])
few_shot_prompt = FewShotChatMessagePromptTemplate(
  example_prompt=example_prompt,
  examples=examples
)
print(few_shot_prompt.format())

In [None]:
chat_prompt = ChatPromptTemplate.from_messages([
  ("system", "You are a wizard of math."),
  few_shot_prompt,
  ("human", "{input}")
])
print(chat_prompt.format(input="5+2"))

Now we will look at dynamic example selection, which we have already seen for string prompts. This time we will also build the `VectorStore` ourselves and create the example selector from it, using mixed examples like those found in a chat history:

In [None]:
examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
    {"input": "2+4", "output": "6"},
    {"input": "Who are you?", "output": "My name is Mistral."},
    {"input": "Hello", "output": "Hello, my name is Mistral."}
]
to_vectorize = [" ".join(e.values()) for e in examples]
print(to_vectorize)

In [None]:
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=examples)

In [None]:
example_selector = SemanticSimilarityExampleSelector(
  vectorstore=vectorstore,
  k=2
)
example_selector.select_examples({"input": "2+2"})

We will create the few shot prompt template using the example selector and example prompt:

In [None]:
few_shot_prompt = FewShotChatMessagePromptTemplate(
  input_variables=["input"],
  example_selector=example_selector,
  example_prompt=ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}")
  ])
)

print(few_shot_prompt.format(input="2+7"))

In [None]:
final_prompt = ChatPromptTemplate.from_messages([
  ("system", "You are a wizard of math."),
  few_shot_prompt,
  ("human", "{input}")
])
print(final_prompt.format(input="5+2"))

Finally, we will use the prompt with a `Chat` model:

In [None]:
llm = ChatOpenAI()
chain = final_prompt | llm
chain.invoke({"input": "5+2"})

## Output Parsers

Output parsers convert the raw output from the language model into a format that you want to use. Most models will return a `string` and the default and most basic parser is the `StrOutputParser`. All parsers are based on the `BaseOutputParser` interface and have a `parse()` function. Here is a simple example:

In [None]:
StrOutputParser().parse("my output")

If you want structured output, such as JSON, then you need to define the data structure with `pydantic`:

In [None]:
class Joke(BaseModel):
  setup: str = Field(description="set up for the joke")
  punchline: str = Field(description="punchline for the joke")

In [None]:
parser = PydanticOutputParser(pydantic_object=Joke)
parser.parse('{"setup": "What do you call a bear with no teeth?", "punchline": "A gummy bear!"}')
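To see what a structured parser has to do, here is a standard-library sketch of the same idea: decode the JSON, check that the expected fields are present, and build a typed object. `JokeRecord` and `parse_joke` are hypothetical stand-ins for the pydantic model and parser; real pydantic validation is considerably more thorough.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeRecord:
    """Stand-in for the pydantic `Joke` model, using only the standard library."""
    setup: str
    punchline: str


def parse_joke(text: str) -> JokeRecord:
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    missing = {"setup", "punchline"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return JokeRecord(setup=str(data["setup"]), punchline=str(data["punchline"]))


joke = parse_joke('{"setup": "What do you call a bear with no teeth?", "punchline": "A gummy bear!"}')
print(joke)
```

Failing loudly on missing fields matters in practice, because a model is not guaranteed to return every property it was asked for.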

Let's see it all together in a chain:

In [None]:
llm = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages([
  ("system", "You create jokes with a <setup> and a <punchline> about the topic provided by the user. Return the joke as JSON with a <setup> and <punchline> property."),
  ("human", "{input}")
])
chain = prompt | llm | parser

In [None]:
chain.invoke({"input": "Tell me a joke about dentists."})

There are a number of built-in parsers like the two we have already seen. Here are a few more:

In [None]:
CommaSeparatedListOutputParser().parse("1, 2, 3, 4, 5") # the space between is required

In [None]:
DatetimeOutputParser().parse("2008-01-03T18:15:05.000000Z") # ISO 8601 format
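A datetime parser of this kind boils down to `datetime.strptime` with a fixed format string. The sketch below assumes the format `%Y-%m-%dT%H:%M:%S.%fZ`, which matches the input string used above; the constant name is made up for illustration.

```python
from datetime import datetime

# Assumed format: date, "T", time with microseconds, then a literal "Z"
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

parsed = datetime.strptime("2008-01-03T18:15:05.000000Z", ASSUMED_FORMAT)
print(parsed)
```

Note that the `Z` is matched as a literal character here, so the result is a naive `datetime` with no timezone attached.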

In [None]:
class Colors(Enum):
  RED = "red"
  BLUE = "blue"
  GREEN = "green"

EnumOutputParser(enum=Colors).parse("red")
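Parsing into an enum can be sketched with nothing more than a lookup by value, plus a helpful error when the model returns something outside the allowed set. `parse_color` is a hypothetical helper shown for illustration, not the library's implementation.

```python
from enum import Enum


class Colors(Enum):
    RED = "red"
    BLUE = "blue"
    GREEN = "green"


def parse_color(text: str) -> Colors:
    try:
        # Enum lookup by value: Colors("red") returns Colors.RED
        return Colors(text.strip())
    except ValueError:
        valid = ", ".join(color.value for color in Colors)
        raise ValueError(f"'{text}' is not one of: {valid}") from None


color = parse_color(" red ")
print(color)
```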

Now let's look at how easy it is to create your own output parser:

In [None]:
class BetterCommaSeparatedListOutputParser(BaseOutputParser):
  def parse(self, text: str) -> list:
    # split on commas and trim whitespace, so "1,2" and "1, 2" both work
    return [item.strip() for item in text.strip().split(",")]

BetterCommaSeparatedListOutputParser().parse("1,2,3,4,5")