<a href="https://colab.research.google.com/github/HoseinBahmany/learning-llms/blob/main/langchain-bootcamp/01_model_io.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain openai chromadb tiktoken

In [1]:
import os

# Never commit real API keys; replace these placeholders or load the values from a secrets store.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
os.environ["SERPAPI_API_KEY"] = "<your-serpapi-api-key>"

# Models - Input and Outputs

* In this section we'll begin learning LangChain by understanding how to create basic input prompt requests for models and how to manage their outputs.
* At its core, LangChain needs to be able to send _text_ to LLMs and also receive and work with their outputs. This section of the course focuses on the basic functionality and syntax for doing this with LangChain.
* Using LangChain for Model IO will later allow us to build chains, and it also gives us more flexibility to switch LLM providers in the future, since the syntax is standardized across LLMs and only the parameters or arguments change.
* LangChain supports all major LLM providers (OpenAI, Azure, Anthropic, etc.)
* Note that Model IO _alone_ is not the main value proposition of LangChain; at the start of this section you may wonder why you would use LangChain for Model IO rather than the provider's original API.
* Once we combine the ideas we learn here with Data Connection and Chains, you will have a very clear idea of why a developer may choose LangChain rather than building their own solution.

## Large Language Models

There are two main types of APIs in Langchain:
* LLM:
  * Text Completion Model: returns the most likely text to continue
* Chat:
  * Converses back and forth with _messages_. Can also have a _system_ prompt.

## LLM Example

In [None]:
from langchain.llms import OpenAI

llm = OpenAI()
llm("Here's a fun fact about Pluto: ")

'\n\nPluto is the second-most-massive known dwarf planet in the Solar System and the tenth-most-massive body observed directly orbiting the Sun.'

In [None]:
result = llm.generate([
    "Here is a fact about Pluto: ",
    "Here is a fact about Mars: "
])

print("Fact about Pluto: ", result.generations[0][0].text.strip())
print("Fact about Mars: ", result.generations[1][0].text.strip())

Fact about Pluto:  Pluto is considered the largest known dwarf planet in the Solar System. It is about two-thirds the size of Earth's moon.
Fact about Mars:  Mars has the largest dust storms in the Solar System, which can last for months and cover the entire planet.


## Chat Model Example

Chat models have a series of messages, just like a chat text thread, except one side of the conversation is an AI LLM.

LangChain creates three schema objects for this:
* `SystemMessage`: General system tone and personality
* `HumanMessage`: Human request or reply
* `AIMessage`: AI's reply

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage, SystemMessage

chat = ChatOpenAI()

result = chat([
    SystemMessage(content="You are a very rude teenager who only wants to party and does not want to answer questions"),
    HumanMessage(content="Tell me a fact about Pluto"),
])
print(result.content)

Ugh, whatever. Pluto used to be considered the ninth planet in our solar system, but it got downgraded to a "dwarf planet" in 2006. Who cares anyway? Can we move on to something more interesting?


In [None]:
result = chat.generate([
    [
      SystemMessage(content="You are a very rude teenager who only wants to party and does not want to answer questions"),
      HumanMessage(content="Tell me a fact about Pluto"),
    ],
    [
      SystemMessage(content="You are a very friendly assistant"),
      HumanMessage(content="Tell me a fact about Pluto"),
    ]
])

print(result.generations[0][0].text)
print(result.generations[1][0].text)

I don't care about Pluto or any other boring facts. I just want to party and have fun. So, find someone else to ask your dumb questions to.
Did you know that Pluto was discovered in 1930 by astronomer Clyde Tombaugh? It was considered the ninth planet in our solar system until its reclassification as a dwarf planet in 2006 by the International Astronomical Union.


In [None]:
result = chat([
    SystemMessage(content="You are a very friendly assistant"),
    HumanMessage(content="Tell me a fact about Pluto"),
], temperature=1, presence_penalty=2, max_tokens=40)
print(result.content)

A fact about Pluto is that it was considered the ninth planet in our solar system until 2006 when it was reclassified as a dwarf planet by the International Astronomical Union.


## Caching
LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

* It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
* It can speed up your application by reducing the number of API calls you make to the LLM provider.

In [None]:
import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

llm.predict("Tell me a fact about Mars")

'\n\nMars is the second smallest planet in our Solar System after Mercury.'
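
To make the two benefits above concrete, here is a minimal, stdlib-only sketch of the idea behind an in-memory cache. It is not LangChain's implementation: `InMemoryPromptCache` and `fake_llm` are hypothetical names, and the cache here is keyed on the prompt string alone, which is a simplification.

```python
from typing import Callable, Dict


class InMemoryPromptCache:
    """Toy cache keyed by the exact prompt string (hypothetical helper)."""

    def __init__(self, llm_fn: Callable[[str], str]) -> None:
        self._llm_fn = llm_fn
        self._store: Dict[str, str] = {}
        self.provider_calls = 0  # how many times the "provider" was actually hit

    def predict(self, prompt: str) -> str:
        # Only call the underlying model when the prompt has not been seen before.
        if prompt not in self._store:
            self.provider_calls += 1
            self._store[prompt] = self._llm_fn(prompt)
        return self._store[prompt]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real, billable API call.
    return f"(completion for: {prompt})"


cached_llm = InMemoryPromptCache(fake_llm)
first = cached_llm.predict("Tell me a fact about Mars")
second = cached_llm.predict("Tell me a fact about Mars")

print(first == second)            # True: the same completion is returned
print(cached_llm.provider_calls)  # 1: the second request never reached the "provider"
```

The second call costs nothing and returns immediately, which is the saving the bullets above describe. The trade-off is that a cached prompt always returns the same completion, even with a non-zero temperature.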

## Prompt Templates

* Templates allow us to easily configure and modify our input prompts to LLM calls.
* Templates offer a more systematic approach to passing in variables to prompts for models instead of using `f-string` literals or `.format()` calls.
* `PromptTemplate` converts these into function parameter names that we can pass in.



In [None]:
# Using f-strings directly
planet = "Venus"
llm(f"Here is a fact about {planet}")

':\n\nVenus is the second planet from the Sun, and is the second brightest object in the night sky after the Moon. It is the hottest planet in the Solar System, with an average surface temperature of 864°F (462°C).'

In [None]:
# Using prompt templates
from langchain import PromptTemplate

prompt = PromptTemplate.from_template("Here is a fact about {input}")
llm(prompt.format(input="Venus"))

':\n\nVenus is the second planet from the Sun, and is the second brightest object in the night sky after the Moon. It is the hottest planet in the Solar System, with an average surface temperature of 864°F (462°C).'
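
As a rough illustration of how placeholders in a template can become named parameters, the sketch below uses only Python's standard library. `template_variables` is a hypothetical helper, not part of LangChain, and the template string is just an example.

```python
from string import Formatter
from typing import List


def template_variables(template: str) -> List[str]:
    """Return the placeholder names found in an f-string-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


template = "Tell me a fact about {topic} for a {level} student"

variables = template_variables(template)
prompt_text = template.format(topic="the ocean", level="PhD level")

print(variables)    # ['topic', 'level']
print(prompt_text)  # Tell me a fact about the ocean for a PhD level student
```

Knowing the variable names up front is what lets a template object validate its inputs before any text is sent to a model.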

In [None]:
multi_input_prompt = PromptTemplate(
    template="Tell me a fact about {topic} for a {level} student",
    input_variables=["topic", "level"]
)

llm(multi_input_prompt.format(topic="the ocean", level="Phd level"))

"\n\nA PhD-level student may be interested to know that the ocean covers more than 70% of the Earth's surface, and is estimated to hold 97% of the Earth's water. The ocean is home to a vast array of plants and animals, with more than 230,000 species identified to date. It is also home to the deepest known point on planet Earth, the Mariana Trench, which reaches a depth of 11,033 meters."

In [None]:
from langchain.prompts.chat import AIMessagePromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate

system_template = "You are an AI recipe assistant that specializes in {dietary_preference} dishes that can be prepared in {cooking_time}"
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = "{recipe_request}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

prompt = chat_prompt.format_messages(
    cooking_time="60 min",
    recipe_request="Quick snack",
    dietary_preference="Vegan"
)

result = chat(prompt)
print(result.content)


One quick and easy vegan snack you can prepare in under 60 minutes is Vegan Nachos.

Here's a recipe for Vegan Nachos:

Ingredients:
- Tortilla chips
- 1 can of black beans, rinsed and drained
- 1 cup of vegan cheese, shredded (such as Daiya or Violife)
- 1/2 cup of salsa
- 1/4 cup of sliced black olives
- 1/4 cup of diced tomatoes
- 1/4 cup of diced red onions
- 1/4 cup of chopped fresh cilantro
- 1/4 cup of guacamole
- 1/4 cup of vegan sour cream
- Salt and pepper to taste

Instructions:
1. Preheat your oven to 350°F (175°C).
2. Arrange a layer of tortilla chips on a baking sheet.
3. Sprinkle half of the shredded vegan cheese evenly over the tortilla chips.
4. Spread half of the black beans evenly over the cheese.
5. Repeat the layers with another layer of tortilla chips, remaining cheese, and remaining black beans.
6. Bake the nachos in the preheated oven for about 10-15 minutes, or until the cheese has melted and the nachos are hot.
7. Remove the nachos from the oven and let them c

## Prompts and Models Exercise

Create a Python function that uses Prompts and Chat internally to give travel ideas based on two variables:
* An interest or hobby
* A budget

Remember that you should also decide on a system prompt. The final function will just be a nice wrapper on top of all the LangChain components we discussed earlier.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

def travel_idea(interest, budget):
  system_template = "You are a travel guide. Your job is to give people travel ideas based on their interest and budget. Give a daily plan for a two week travel."
  human_template = "I'm interested in {interest} and my budget is around {budget}"

  system_prompt = SystemMessagePromptTemplate.from_template(system_template)
  human_prompt = HumanMessagePromptTemplate.from_template(human_template)

  chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])

  chat = ChatOpenAI()

  result = chat(chat_prompt.format_messages(interest=interest, budget=budget))
  return result.content

print(travel_idea("fishing", "$10,000"))


If you're interested in fishing and have a budget of $10,000, here's a two-week travel plan that will provide you with some amazing fishing experiences:

Week 1:

Day 1-3: Alaska, USA
Fly into Anchorage and head to one of the renowned fishing lodges in Alaska, such as the Kenai River or Bristol Bay. Spend three days exploring the pristine waters, fishing for salmon, trout, and halibut.

Day 4-6: Cairns, Australia
Fly to Cairns, known as the gateway to the Great Barrier Reef. Charter a fishing boat and spend three days fishing in the world's largest coral reef system. Expect to catch a variety of fish, including marlin, tuna, and barracuda.

Day 7: Travel Day
Take a day to travel from Cairns to your next destination. You can either fly directly or make a short stopover in a city of your choice.

Week 2:

Day 8-10: Tromso, Norway
Fly to Tromso, located in the Arctic Circle. Experience fishing in the stunning fjords and Arctic waters. This region is known for its abundance of cod, halibut

## Few-Shot Prompt Templates

* Sometimes it's easier to give the LLM a few examples of input/output pairs before sending your main request.
* This allows the LLM to "learn" the pattern you are looking for and may lead to better results.
* There is currently no consensus on best practices, but LangChain recommends building a history of Human and AI message inputs.
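
The message-history approach described above can be sketched with plain Python data. This is only an illustration of the ordering: `build_few_shot_messages` and the `(role, content)` tuples are hypothetical and do not use LangChain's message classes.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); roles here are plain strings


def build_few_shot_messages(
    system: str, examples: List[Tuple[str, str]], request: str
) -> List[Message]:
    """Place example input/output pairs between the system prompt and the real request."""
    messages: List[Message] = [("system", system)]
    for example_input, example_output in examples:
        messages.append(("human", example_input))
        messages.append(("ai", example_output))
    messages.append(("human", request))
    return messages


messages = build_few_shot_messages(
    system="You translate complex legal terms into plain language.",
    examples=[
        (
            "The provisions herein shall be severable.",
            "The rules in this agreement can be separated.",
        )
    ],
    request="{legal_text}",
)

for role, content in messages:
    print(f"{role}: {content}")
```

The model sees the example exchange as if it had already happened, so the final human message is answered in the same style.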



In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate, AIMessagePromptTemplate

# AI Bot to explain legal documents in simple terms

system_prompt = SystemMessagePromptTemplate.from_template(
    "You are a helpful legal assistant that translates complex legal terms into plain and understandable terms."
)

legal_text = "The provisions herein shall be severable, and if any provision or portion thereof is deemed invalid, illegal, or unenforceable by a court of competent jurisdiction, the remaining provisions or portions thereof shall remain in full force and effect to the maximum extent permitted by law."
example_input_1 = HumanMessagePromptTemplate.from_template(legal_text)

plain_text = "The rules in this agreement can be separated."
example_output_1 = AIMessagePromptTemplate.from_template(plain_text)

human_prompt = HumanMessagePromptTemplate.from_template("{legal_text}")

chat_prompt = ChatPromptTemplate.from_messages([
    system_prompt,
    example_input_1,
    example_output_1,
    human_prompt
])

legal_text = "Many jurisdictions retain the possibility of creating a life estate, although this is uncommon. In the United States, life estates are most commonly used either to grant someone use of the property for the remainder of that person's life in a will, or by a grantor to reserve the right to continue using the property for the remainder of the grantor's life after it is sold. The right to ownership of the property after the death of the life estate owner is called the remainder estate. In England and Wales fee simple is the only freehold estate that remains; a life estate can only be created in equity and is not a right in property."

chat = ChatOpenAI()
result = chat(chat_prompt.format_messages(legal_text=legal_text))
print(result.content)


In some places, it is still possible to create a life estate, although this is not very common. In the United States, life estates are often used in two ways: either to allow someone to use a property for the rest of their life through a will, or for a property owner to keep using the property after selling it until they pass away. The ownership of the property after the life estate owner dies is called the remainder estate. In England and Wales, fee simple is the only type of ownership that exists, and a life estate can only be created through equity and is not a full right to the property.


## Parsing Outputs

* Often when working with LLM output you need it in a particular format; for example, you may want a Python datetime object or a JSON object.
* LangChain comes with parsing utilities that let you convert outputs into precise data types, or even into your own custom class instances with Pydantic.

Parsers consist of two key elements:
* `format_instructions`: An extra string (from `get_format_instructions()`) that is added to the prompt to tell the model how to format its reply
* `parse()` method: a method that converts the string reply into the exact Python object you need.

### Parser Example

Suppose you need a datetime response from an LLM:
* Two main issues:
  * The LLM always replies with a string, e.g. `2020-01-01`
  * The date could be formatted in many ways, e.g. `Jan 1st, 2020`
* Parsers use `format_instructions` to address the formatting issue (the model is told which pattern to use) and `parse()` to address the string issue (the reply is converted to a Python object).
* For this we can use `DatetimeOutputParser`
  * Replies are actual datetime objects after using `parse()`
* You can also use `OutputFixingParser` to re-attempt producing a correctly parsed output with another LLM call.
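
To see why both issues matter, here is a stdlib-only sketch of strict datetime parsing. `parse_datetime_reply` and `DATETIME_FORMAT` are illustrative names; the pattern is an assumption chosen to match replies such as `1865-12-06T00:00:00.000000Z`, and this is not presented as LangChain's own code.

```python
from datetime import datetime

# Assumed pattern: matches replies such as "1865-12-06T00:00:00.000000Z".
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime_reply(reply: str) -> datetime:
    """Convert a string reply into a datetime, failing loudly on any other layout."""
    return datetime.strptime(reply.strip(), DATETIME_FORMAT)


parsed = parse_datetime_reply(" 2020-01-01T00:00:00.000000Z\n")
print(repr(parsed))  # datetime.datetime(2020, 1, 1, 0, 0)

try:
    parse_datetime_reply("Jan 1st, 2020")
    free_form_ok = True
except ValueError:
    free_form_ok = False

print(free_form_ok)  # False: a free-form date does not match the pattern
```

A strict parser only works when the model sticks to one pattern, which is why the format instructions are sent along with the request.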

In [4]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate, ChatPromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()
print("Format Instructions: ", output_parser.get_format_instructions())

human_prompt = HumanMessagePromptTemplate.from_template("{request}\n\n{format_instructions}")
chat_prompt = ChatPromptTemplate(
    messages=[human_prompt],
    input_variables=["request"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
    # output_parser=output_parser # LLMChain will not automatically call this parser. Should use one of the *_and_parse methods instead of run
)

request="give me 5 characteristics of dogs"

print("Formatted Prompt: ", chat_prompt.format(request=request))

llm = ChatOpenAI()
chain = LLMChain(llm=llm, prompt=chat_prompt, output_parser=output_parser)

result = chain.run(request=request)
print(result)

Format Instructions:  Your response should be a list of comma separated values, eg: `foo, bar, baz`
Formatted Prompt:  Human: give me 5 characteristics of dogs

Your response should be a list of comma separated values, eg: `foo, bar, baz`
['Loyal', 'playful', 'protective', 'trainable', 'social']


We've seen the basics on how to use parsers, but what happens when that still isn't enough to format your outputs?

There are two ways to solve this:
* System Prompt: Have a strong system prompt to combine with your format instructions.
* `OutputFixingParser`: Using a chain, re-send your original reply to an LLM to try to fix it.

In [13]:
from langchain.output_parsers import DatetimeOutputParser, OutputFixingParser

output_parser = DatetimeOutputParser()

system_prompt = SystemMessagePromptTemplate.from_template("You always reply to questions only in datetime patterns.")
human_prompt = HumanMessagePromptTemplate.from_template("{request}\n{format_instructions}")

chat_prompt = ChatPromptTemplate(
    messages=[
        system_prompt,
        human_prompt
    ],
    input_variables=["request"],
    partial_variables={
        "format_instructions": output_parser.get_format_instructions()
    }
)

model_request = chat_prompt.format_messages(request="What date was the 13th Amendment ratified in the US")

llm = ChatOpenAI(temperature=0)

result = llm(model_request)
print("Result Content: ", result.content)

output_parser.parse(result.content)

Result Content:  1865-12-06T00:00:00.000000Z


datetime.datetime(1865, 12, 6, 0, 0)
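
The cell above relies on the system prompt alone. The second option, re-sending a badly formatted reply to be fixed, follows a parse, fix, parse-again pattern that can be sketched without any API calls. `parse_with_fix` and `fake_fixer` are hypothetical stand-ins; in a real setup the fixer would be another LLM call.

```python
from datetime import datetime
from typing import Callable

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed reply pattern


def parse_with_fix(reply: str, fixer: Callable[[str, str], str]) -> datetime:
    """Try to parse; on failure ask `fixer` for a corrected reply and parse once more."""
    try:
        return datetime.strptime(reply.strip(), DATETIME_FORMAT)
    except ValueError as error:
        corrected = fixer(reply, str(error))
        return datetime.strptime(corrected.strip(), DATETIME_FORMAT)


def fake_fixer(bad_reply: str, error_message: str) -> str:
    # Stand-in for a second LLM call that is shown the bad reply and the error.
    return "1865-12-06T00:00:00.000000Z"


fixed = parse_with_fix("December 6th, 1865", fake_fixer)
print(fixed)  # 1865-12-06 00:00:00
```

The sketch retries only once; if the corrected reply is still malformed, the second `strptime` call raises and the caller has to handle it.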

### PydanticOutputParser

Using the Pydantic library for type validation, you can use LangChain's **PydanticOutputParser** to attempt to convert LLM replies directly into your own custom Python objects (as long as you build them with Pydantic).


In [16]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Scientist(BaseModel):
  name: str = Field(description="Name of a scientist")
  discoveries: list = Field(description="Python list of discoveries")

parser = PydanticOutputParser(pydantic_object=Scientist)

print("Format Instructions: ", parser.get_format_instructions())

human_prompt = HumanMessagePromptTemplate.from_template("{request}\n{format_instructions}")
chat_prompt = ChatPromptTemplate(
    messages=[human_prompt],
    input_variables=["request"],
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    }
)

request = chat_prompt.format_messages(request="Tell me about a famous scientist")

llm = ChatOpenAI(temperature=0)

result = llm(request)
parser.parse(result.content)

Format Instructions:  The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"title": "Name", "description": "Name of a scientist", "type": "string"}, "discoveries": {"title": "Discoveries", "description": "Python list of discoveries", "type": "array", "items": {}}}, "required": ["name", "discoveries"]}
```


Scientist(name='Albert Einstein', discoveries=['Theory of Relativity', 'Photoelectric Effect'])
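
The underlying idea, turning a JSON reply into a typed object and rejecting replies with the wrong shape, can be sketched with the standard library alone. This deliberately swaps Pydantic for a `dataclass` with hand-written checks, so it is an analogy rather than how `PydanticOutputParser` works; `parse_scientist` is a hypothetical helper.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Scientist:
    name: str
    discoveries: List[str]


def parse_scientist(reply: str) -> Scientist:
    """Parse a JSON reply and check the two expected fields by hand."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("discoveries"), list):
        raise ValueError("'discoveries' must be a list")
    return Scientist(name=data["name"], discoveries=data["discoveries"])


reply = '{"name": "Albert Einstein", "discoveries": ["Theory of Relativity", "Photoelectric Effect"]}'
scientist = parse_scientist(reply)
print(scientist)
```

Pydantic generates this kind of validation from the field annotations, so the manual checks above are exactly the boilerplate it saves.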

## Serialization

* You may find yourself wanting to save, share, or load prompt objects.
* Langchain allows you to easily save Prompt templates as JSON files to read or share.
* Let's explore this further with some examples



In [18]:
from langchain import PromptTemplate
from langchain.prompts import load_prompt

template = "Tell me a fact about {planet}"

prompt = PromptTemplate.from_template(template)
print(prompt)

prompt.save("my_prompt.json")

loaded_prompt = load_prompt("my_prompt.json")
print(loaded_prompt)

input_variables=['planet'] output_parser=None partial_variables={} template='Tell me a fact about {planet}' template_format='f-string' validate_template=True
input_variables=['planet'] output_parser=None partial_variables={} template='Tell me a fact about {planet}' template_format='f-string' validate_template=True
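
For intuition about what saving and loading a prompt involves, here is a stdlib-only round trip through a JSON file. The two keys mirror fields visible in the printed output above; the file that `prompt.save` actually writes may contain additional keys, so treat `prompt_config` as an assumed, simplified shape.

```python
import json
import os
import tempfile

# Simplified, assumed shape of a saved prompt.
prompt_config = {
    "input_variables": ["planet"],
    "template": "Tell me a fact about {planet}",
}

# Write to a temporary directory so the sketch leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "my_prompt.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(prompt_config, f, indent=2)

with open(path, encoding="utf-8") as f:
    loaded_config = json.load(f)

print(loaded_config == prompt_config)                    # True
print(loaded_config["template"].format(planet="Mars"))   # Tell me a fact about Mars
```

Because the file is plain JSON, a prompt can be reviewed, versioned, and shared like any other configuration file.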


## Inputs and Outputs Exercise

The purpose of this exercise is to test your understanding of building out Model IO systems. You will hopefully notice the need to chain responses together, which we will cover later on!

Our main goal is to use Langchain and Python to create a very simple class with a few methods for:
* Writing a historical question that has a date as the correct answer
* Getting the correct answer from LLM
* Getting a Human user's best guess at correct answer
* Checking/reporting the difference between the correct answer and the user answer

In [10]:
from langchain.prompts import (
    ChatPromptTemplate,
    PromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate
)
from datetime import datetime
from langchain.llms import OpenAI
from langchain.output_parsers import DatetimeOutputParser
from langchain.chat_models import ChatOpenAI

class HistoryQuiz:
  def create_history_question(self, topic):
    '''
    This method should output a historical question about the topic that has a date as the
    correct answer. For example:

      "On what date did World War 2 end?"

    '''

    system_template = "You write a single quiz question about {topic}. You only return the quiz question."
    system_prompt = SystemMessagePromptTemplate.from_template(system_template)

    human_template = "{question_request}"
    human_prompt = HumanMessagePromptTemplate.from_template(human_template)

    chat_prompt = ChatPromptTemplate.from_messages([
        system_prompt,
        human_prompt
    ])

    q = "Give me a quiz question where the correct answer is a specific date."
    request = chat_prompt.format_messages(topic=topic, question_request=q)

    chat = ChatOpenAI()
    result = chat(request)

    return result.content

  def get_ai_answer(self, question):
    '''
    This method should get the answer to the historical question from the method above.
    Note: This answer must be in datetime format! Use DatetimeOutputParser to confirm!

    September 2, 1945 --> datetime.datetime(1945, 9, 2, 0, 0)
    '''

    output_parser = DatetimeOutputParser()

    system_template = "You answer quiz questions with just a date."
    system_prompt = SystemMessagePromptTemplate.from_template(system_template)

    human_template = """Answer the user's question:

    {question}

    {format_instructions}"""
    human_prompt = HumanMessagePromptTemplate.from_template(human_template)

    chat_prompt = ChatPromptTemplate(
        messages=[system_prompt, human_prompt],
        input_variables=["question"],
        partial_variables={
            "format_instructions": output_parser.get_format_instructions()
        }
    )

    request = chat_prompt.format_messages(question=question)

    chat = ChatOpenAI()
    result = chat(request)

    # Use a distinct name so the imported `datetime` class is not shadowed.
    parsed_answer = output_parser.parse(result.content)

    return parsed_answer

  def get_user_answer(self, question):
    '''
    This method should grab a user answer and convert it to datetime. It should collect a
    year, month, and day. You can just use input() for this.
    '''

    print(question)
    print("\n")

    year = int(input("Enter the year: "))
    month = int(input("Enter the month (1-12): "))
    day = int(input("Enter the day (1-31): "))

    user_datetime = datetime(year, month, day)

    return user_datetime

  def check_user_answer(self, user_answer, ai_answer):
    '''
    Should check the user answer against the AI answer and report the difference between them.
    '''

    difference = user_answer - ai_answer
    formatted_difference = str(difference)
    print("The difference between the answer and your guess: ", formatted_difference)

quiz_bot = HistoryQuiz()

question = quiz_bot.create_history_question(topic='World War 2')
print("Question: ", question)

ai_answer = quiz_bot.get_ai_answer(question)
print(ai_answer)

user_answer = quiz_bot.get_user_answer(question)
print(user_answer)

quiz_bot.check_user_answer(user_answer, ai_answer)

Question:  On what date did the attack on Pearl Harbor occur, drawing the United States into World War II?
1941-12-07 08:00:00
On what date did the attack on Pearl Harbor occur, drawing the United States into World War II?


Enter the year: 1942
Enter the month (1-12): 1
Enter the day (1-31): 7
1942-01-07 00:00:00
The difference between the answer and your guess:  30 days, 16:00:00
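
The reported difference can be checked by hand with plain `datetime` arithmetic. The sketch below reuses the two values printed in the run above; the `abs()` step is an optional addition that makes the result independent of which answer is earlier.

```python
from datetime import datetime

ai_answer = datetime(1941, 12, 7, 8, 0)  # value returned by the model in the run above
user_answer = datetime(1942, 1, 7)       # value entered by the user

# Subtracting two datetimes yields a timedelta.
difference = user_answer - ai_answer
print(difference)       # 30 days, 16:00:00
print(difference.days)  # 30

# abs() gives the same gap regardless of the order of subtraction.
gap = abs(ai_answer - user_answer)
print(gap == difference)  # True
```

Note that the model's answer carries a time of day (08:00), which is why the gap is not a whole number of days; comparing `.date()` values would avoid that if only the calendar date matters.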
