### Chat Messages

- **System**: Helpful background context that tells the AI what to do

- **Human**: Messages that are intended to represent the user

- **AI**: Messages that show what the AI responded with

Chat is like text, but each piece of text is tagged with a message type (System, Human, or AI)

In [1]:
# Keys
import os

# Read keys from the environment instead of hardcoding secrets in the notebook
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
os.environ.setdefault("SERPAPI_API_KEY", "YOUR_SERPAPI_API_KEY")


In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0.7, openai_api_key=OPENAI_API_KEY)

In [3]:
chat([
    SystemMessage(content="You are a nice AI bot that helps a user figure out what to eat in a short sentence."),
    HumanMessage(content="I like tomatoes, what should I eat?")
])

AIMessage(content='You can try a Caprese salad with fresh tomatoes, mozzarella cheese, basil, and balsamic glaze.', additional_kwargs={}, example=False)

In [4]:
chat([
    SystemMessage(content="You are a nice AI bot that helps a user figure out where to travel to in a short sentence."),
    HumanMessage(content="I like beaches, what should I travel to?"),
    AIMessage(content="You should go to Nice, France."),
    HumanMessage(content="What else should I do while I'm there?")
])

AIMessage(content="While you're in Nice, you should also visit the Promenade des Anglais, the Old Town, and the Musée Matisse.", additional_kwargs={}, example=False)
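One way to picture the exchange above: every message is just text tagged with a role. The sketch below uses plain tuples and dicts instead of the LangChain classes. The `to_role_dicts` helper and the role names `system`, `user`, and `assistant` are assumptions for illustration (they follow the common chat-API convention), not a description of LangChain's internals.

```python
# Hypothetical mapping from the message types above to chat-API style roles
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_role_dicts(messages):
    """Convert (message_type, content) pairs into role-tagged dicts."""
    return [{"role": ROLE_BY_TYPE[kind], "content": text} for kind, text in messages]


conversation = [
    ("system", "You are a nice AI bot that helps a user figure out where to travel to in a short sentence."),
    ("human", "I like beaches, what should I travel to?"),
    ("ai", "You should go to Nice, France."),
    ("human", "What else should I do while I'm there?"),
]

payload = to_role_dicts(conversation)
for message in payload:
    print(f"{message['role']:>9}: {message['content']}")
```

Passing the earlier AI reply back in is what lets the model resolve "there" in the last question.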

### Documents

An object that holds a piece of text and its metadata

In [5]:
from langchain.schema import Document

In [6]:
Document(page_content="This is my document. It is full of text that I've gathered from other places.",
         metadata={
             "my_document_id": 234234,
             "my_document_source": "The LangChain Papers",
             "my_document_create_time": 1680013019
         })

Document(page_content="This is my document. It is full of text that I've gathered from other places.", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})
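Metadata is what makes it possible to filter or trace documents later. Here is a minimal sketch of that idea, assuming a hypothetical `SimpleDocument` dataclass as a stand-in for `langchain.schema.Document` and a made-up `from_source` helper; neither is part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain.schema.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


docs = [
    SimpleDocument("This is my document.", {"my_document_source": "The LangChain Papers"}),
    SimpleDocument("Another document.", {"my_document_source": "Somewhere else"}),
]


def from_source(documents, source):
    """Keep only the documents whose metadata names the given source."""
    return [d for d in documents if d.metadata.get("my_document_source") == source]


matches = from_source(docs, "The LangChain Papers")
print(len(matches), matches[0].page_content)
```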

# Models - The interface to the AI brains

### Language Model

Text In => Text Out

In [7]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001", openai_api_key=OPENAI_API_KEY)

In [8]:
llm("What day comes after friday?")

'\n\nSaturday'

### Chat Model

Message In => Message Out

In [9]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=1, openai_api_key=OPENAI_API_KEY)

In [10]:
chat([
    SystemMessage(content="You are an unhelpful AI bot that makes jokes at whatever the user says."),
    HumanMessage(content="I would like to go to New York, how should I do this?"),
])

AIMessage(content="Well, have you considered walking there? It might be a bit of a long journey, but think of all the nice scenery you'll see along the way. Plus, it's great exercise!", additional_kwargs={}, example=False)

### Text Embedding Model

Changes text into a vector (a list of numbers that represents the text). Mainly used when comparing two pieces of text

In [11]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [12]:
text = "Hi! It's time for the beach"

In [13]:
text_embedding = embeddings.embed_query(text)
print(f"Your embedding is length {len(text_embedding)}")
print(f"Here's a sample: {text_embedding[:10]}")

Your embedding is length 1536
Here's a sample: [-0.00020583387231454253, -0.003205398330464959, -0.0008301587076857686, -0.01946892775595188, -0.015162716619670391, 0.03127158433198929, -0.016048219054937363, -0.011687422171235085, 0.0093159731477499, -0.013513012789189816]
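Comparing two pieces of text usually means comparing their vectors, and cosine similarity is a common choice for that. The sketch below uses tiny made-up 3-dimensional vectors in place of real 1536-dimensional embeddings; the `cosine_similarity` helper and the example vectors are assumptions for illustration, not LangChain calls.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings of three short texts
beach = [0.9, 0.1, 0.0]
ocean = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

beach_vs_ocean = cosine_similarity(beach, ocean)
beach_vs_invoice = cosine_similarity(beach, invoice)
print(f"beach vs ocean:   {beach_vs_ocean:.3f}")
print(f"beach vs invoice: {beach_vs_invoice:.3f}")
```

A score near 1 means the vectors point in nearly the same direction; with real embeddings you would pass in two `embed_query` results instead.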


# Prompts - Text generally used as instructions to the model

### Prompt

What you'll pass to the underlying model

In [14]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", openai_api_key=OPENAI_API_KEY)

prompt = """
Today is monday, tomorrow is wednesday.

What is wrong with that statement?
"""

llm(prompt)

'\nIt is missing Tuesday.'

### Prompt Template

An object that helps create prompts based on a combination of user input, other non-static information and a fixed template string.

*It's an f-string, just for prompts*

In [15]:
from langchain.llms import OpenAI
from langchain import PromptTemplate


llm = OpenAI(model_name="text-davinci-003", openai_api_key=OPENAI_API_KEY)

template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence.
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template
)

final_prompt = prompt.format(location="Rome")

print(f"Final Prompt: {final_prompt}")
print("----------------------")
print(f"LLM Output: {llm(final_prompt)}")

Final Prompt: 
I really want to travel to Rome. What should I do there?

Respond in one short sentence.

----------------------
LLM Output: Explore the Colosseum, the Vatican, and the Trevi Fountain.
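To make the "f-string for prompts" comparison concrete, here is roughly the same substitution done with plain `str.format`. This is only a sketch of the idea; the `render` helper is hypothetical, and `PromptTemplate` adds more on top (for example, declared input variables).

```python
template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence.
"""


def render(template_text, **values):
    """Fill the {placeholders} in a template string with keyword values."""
    return template_text.format(**values)


final_prompt = render(template, location="Rome")
print(final_prompt)
```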


### Example Selectors

An easy way to select from a series of examples, which lets you dynamically place in-context information into your prompt.
Often used when your task is nuanced or you have a large list of examples.

In [16]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", openai_api_key=OPENAI_API_KEY)

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}\n"
)

# Examples of locations where nouns are found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "forest"},
    {"input": "bird", "output": "nest"}
]

In [17]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from
    examples,

    # This is the embedding class used to produce embeddings which are used to measure 
    # similarity between the input and the examples
    OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),

    # This is the VectorStore class that is used to store the embeddings and do a 
    # similarity search
    FAISS,

    # This is the number of examples to produce
    k=2
)

In [18]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,

    # Your prompt
    example_prompt=example_prompt,

    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",

    # What inputs your prompt will receive
    input_variables=["noun"]
)

In [19]:
my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: driver
Example Output: car


Example Input: tree
Example Output: forest


Input: student
Output:


In [21]:
llm(similar_prompt.format(noun=my_noun))

' classroom'
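Under the hood, the selection step boils down to "embed everything, then keep the k examples closest to the input". Below is a minimal sketch of that ranking with made-up 2-D vectors instead of real embeddings and a vector store. The `toy_vectors` values and the `select_examples` helper are assumptions for illustration and do not reproduce the FAISS-backed selector above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up 2-D "embeddings" for the example inputs (not real model output)
toy_vectors = {
    "pirate": [0.9, 0.1],
    "pilot": [0.8, 0.3],
    "driver": [0.6, 0.6],
    "tree": [0.1, 0.9],
    "bird": [0.2, 0.8],
}


def select_examples(query_vector, k=2):
    """Return the k example words whose vectors are most similar to the query."""
    ranked = sorted(
        toy_vectors,
        key=lambda word: cosine_similarity(toy_vectors[word], query_vector),
        reverse=True,
    )
    return ranked[:k]


selected = select_examples([0.0, 1.0], k=2)
print(selected)
```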

### Output Parsers

A helpful way to format the output of a model. Usually used for structured outputs.

Two main concepts:

**1. Format Instructions** - An autogenerated prompt that tells the LLM how to format its response based on your desired result.

**2. Parser** - A method which will extract your model's text output into a desired structure (usually JSON)

In [22]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI

In [23]:
llm = OpenAI(model_name="text-davinci-003", openai_api_key=OPENAI_API_KEY)

In [24]:
# How you'd like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This is a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response,a reformatted response")
]

# How you'd like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)


In [25]:
# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted user input string
	"good_string": string  // This is your response,a reformatted response
}
```


In [27]:
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly.

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to californya!")
print(promptValue)


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly.

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted user input string
	"good_string": string  // This is your response,a reformatted response
}
```

% USER INPUT:
welcom to californya!

YOUR RESPONSE:



In [28]:
llm_output = llm(promptValue)
llm_output

'```json\n{\n\t"bad_string": "welcom to californya!",\n\t"good_string": "Welcome to California!"\n}\n```'

In [29]:
type(output_parser.parse(llm_output))

dict
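To show what the parsing step amounts to, here is a rough sketch that pulls the JSON out of the fenced model output with the standard library. It is not the actual `StructuredOutputParser` implementation; the `parse_fenced_json` helper is hypothetical, and the `llm_output` string is rebuilt from the output shown above.

```python
import json
import re

FENCE = "`" * 3  # the triple-backtick marker the format instructions ask for

# Same text as the llm_output shown above
llm_output = (
    FENCE
    + 'json\n{\n\t"bad_string": "welcom to californya!",\n\t"good_string": "Welcome to California!"\n}\n'
    + FENCE
)


def parse_fenced_json(text):
    """Extract the first fenced json block from text and load it as a dict."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    return json.loads(match.group(1))


parsed = parse_fenced_json(llm_output)
print(parsed["good_string"])
```

Raising on a missing block matters in practice, since a model can ignore the format instructions.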

# Indexes - Structuring documents so that LLMs can work with them

### Document Loaders

Easy ways to import data from other sources. They share functionality with OpenAI Plugins, specifically retrieval plugins

In [30]:
from langchain.document_loaders import HNLoader

In [32]:
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

In [33]:
data = loader.load()

In [34]:
print(f"Found {len(data)} comments")
print(f"Here's a sample: \n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample: 

Ozzie_osman 4 months ago  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 4 months ago  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)


In [36]:
from langchain.document_loaders.image import UnstructuredImageLoader

loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg", mode="elements")

In [38]:
data = loader.load()

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/gab/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


ValueError: unstructured_inference is not installed, pytesseract is not installed and the text of the PDF is not extractable. To process this file, install unstructured_inference, install pytesseract, or remove copy protection from the PDF.

In [None]:
data[0]
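The `ValueError` above names two optional packages that were missing. A small check like the one below can surface that before calling `loader.load()`. This is a sketch using only the standard library; the `missing_modules` helper is hypothetical, and the module names are taken from the error message.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Module names taken from the ValueError message above
missing = missing_modules(["unstructured_inference", "pytesseract"])
if missing:
    print("Install before loading images:", ", ".join(missing))
else:
    print("Image-loading dependencies found")
```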