# Langchain Course Chapter 2

[Lesson 2](https://learn.deeplearning.ai/courses/langchain/lesson/2/models%2C-prompts-and-parsers) of the DeepLearning.AI course on Langchain teaches how to use prompt templates to feed variables into prompts and how to extract content from the model response in a structured way using an output parser. The example used in this lesson is a translation and summarization of customer reviews.

To rework this lesson, I am going to implement my own little example which builds on the Wittmann-Tours dataset to translate and summarize blog posts, additionally extracting the country of the activities described in the blog post and the publication date.


## Setup of Wittmann-Tours dataset

The [Wittmann-Tours.de](https://wittmann-tours.de) blog is available for download as a dataset in [the Wittmann-Tours repo](https://github.com/chrwittm/wittmann-tours).

To download the dataset, we can use the `wget` command to fetch the `blogposts-md.zip` file and then `unzip` it.

In [1]:
#!wget -P ./wt-blogposts https://github.com/chrwittm/wittmann-tours/raw/main/zip/blogposts-md.zip
#!unzip -o ./wt-blogposts/blogposts-md.zip -d ./../wt-blogposts/

As a result we have all the blog posts in a folder called `wt-blogposts`. For each blog post there is a directory with the name of the blog post and a file `index.md` containing the blog post. For example:   

```bash
wt-blogposts
├── 3-tage-in-melbourne
│   └── index.md
```

Here are two helper functions: one to list the blog post files and one to read a blog post. They are enhanced versions of the functions from my blog post on [Remembering the Wittmann Tours World Trip with RAG](https://chrwittm.github.io/posts/2024-03-22-rag1-remembering-world-trip/).

In [2]:
import os
import glob

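def get_blog_post_files(path_to_blog):
    """Return the sorted paths of all markdown files below path_to_blog."""
    pattern = os.path.join(path_to_blog, "**/*.md")
    return sorted(glob.glob(pattern, recursive=True))

def get_blogpost(path_to_blogpost):
    """Read a blog post file and return its content as a string."""
    # The posts are German text with umlauts; the files are assumed to be UTF-8 encoded
    with open(path_to_blogpost, 'r', encoding='utf-8') as file:
        content = file.read()

    return content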

Here are the first 5 blog post files:

In [3]:
blogpost_files = get_blog_post_files("./../wt-blogposts")
blogpost_files[0:5]


['./../wt-blogposts/3-tage-in-melbourne/index.md',
 './../wt-blogposts/addis-abeba-die-hauptstadt-athiopiens/index.md',
 './../wt-blogposts/aksum-aufbewahrungsort-der-bundeslade/index.md',
 './../wt-blogposts/am-fusse-des-cotopaxi/index.md',
 './../wt-blogposts/an-der-grenze-von-mexiko-nach-belize/index.md']
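
Because every post lives in its own directory, the directory name can serve as a short identifier (a "slug") for the post. The following is a minimal sketch of that idea using `pathlib`. The helper name `index_by_slug` is hypothetical and not part of the original notebook, and unlike `get_blog_post_files` it only looks for files named `index.md`.

```python
from pathlib import Path


def index_by_slug(path_to_blog):
    """Map each post's directory name (its slug) to the path of its index.md."""
    root = Path(path_to_blog)
    # rglob searches all subdirectories, similar to the recursive glob pattern above
    return {path.parent.name: path for path in sorted(root.rglob("index.md"))}
```

With the dataset in place, `index_by_slug("./../wt-blogposts")["3-tage-in-melbourne"]` would then point to the Melbourne post.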

Let's read the first blog post and print the heading and the first sentence (omitting the header information):

In [4]:
path_to_blogpost = "./../wt-blogposts/3-tage-in-melbourne/index.md"
blogpost = get_blogpost(path_to_blogpost)

print(f"The blogpost has {len(blogpost)} characters. \n")
print(f"Heading and first sentence: \n\n{blogpost[317:835]}")

The blogpost has 7666 characters. 

Heading and first sentence: 

# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-liveable-city-by-the-economist-for-seventh-year-20170816-gxx1kg.html)“, zu der sie der Economist wiederholt gekürt hat. 
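
The slice `blogpost[317:835]` only works for this particular post, because the length of the header differs from file to file. As a minimal sketch, assuming every header is delimited by two `---` lines as in the Melbourne post and that `---` does not occur inside the header values, the header could be removed like this. The function name `strip_front_matter` and the sample text are made up for illustration.

```python
def strip_front_matter(markdown_text):
    """Return the post body without a leading header block delimited by '---' lines."""
    if not markdown_text.startswith("---"):
        return markdown_text  # no header block, nothing to strip
    # Split into: the (empty) text before the first '---', the header, and the body
    parts = markdown_text.split("---", 2)
    if len(parts) < 3:
        return markdown_text  # unterminated header, leave the text untouched
    return parts[2].lstrip("\n")


# Shortened, hypothetical sample in the same layout as the blog post files
sample = "---\ntitle: '3 Tage in Melbourne'\npublished: 2018-04-13\n---\n# 3 Tage in Melbourne\n\nBody text."
print(strip_front_matter(sample).splitlines()[0])  # prints "# 3 Tage in Melbourne"
```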


## Installing Langchain

Since this is the first time I am using Langchain, it needs to be installed:

In [5]:
#!pip install langchain langchain-openai python-dotenv

## Initializing the LLM

Unlike in the lesson, we will use OpenAI's `gpt-4o-mini` model. [OpenAI now recommends](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) using `gpt-4o-mini` for new applications because it is significantly smarter and cheaper than GPT-3.5 Turbo, and just as fast.

Here is my usual quick example to check whether the LLM connection works: prompting the model to list all the planets in the solar system.

In [6]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()  # loads the .env file, which contains the OPENAI_API_KEY

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Prompt model
prompt = "List all the planets in the solar system."
response = llm.invoke(prompt)
print(response.content)

The planets in the solar system, in order from the Sun, are:

1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn
7. Uranus
8. Neptune

Additionally, there are dwarf planets, such as Pluto, Eris, Haumea, and Makemake, but they are not classified as the main planets of the solar system.
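
If the call fails instead of returning a list like this, one possible cause is that the API key was not loaded. The following is a small sketch for checking this up front. The helper `api_key_available` is hypothetical, and it only checks that the variable is present and non-empty, not that the key is valid.

```python
import os


def api_key_available(env=os.environ, name="OPENAI_API_KEY"):
    """Return True if the named key is present and non-empty in the given mapping."""
    return bool(env.get(name, "").strip())


if not api_key_available():
    print("OPENAI_API_KEY is not set; check the .env file before calling the model.")
```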


## Three Approaches to Prompt Construction: From Basic Python to Langchain Templates

Let's switch to the Wittmann-Tours blog posts.
In this section, we will see three approaches to prompt construction:

1. f-string
2. Langchain Prompt Template
3. Langchain Chat Prompt Template



### Translating and summarizing a blog post with f-string

This is the most basic approach to prompt construction. We put the variables we want to include into the prompt in curly braces, and Python replaces them with their current values at the moment the string is created.

In [7]:
summary_language = "English"
summary_words = 100

prompt = f"""
You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into {summary_language}.

Then summarize the translated blog post into {summary_words} words.

Respond only with the summary in {summary_language}.

Blog post:
```{blogpost}```
"""

print(prompt[:1100])


You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into English.

Then summarize the translated blog post into 100 words.

Respond only with the summary in English.

Blog post:
```---
title: '3 Tage in Melbourne'
description: ""
published: 2018-04-13
redirect_from: 
            - https://wittmann-tours.de/3-tage-in-melbourne/
categories: "Aardman, Australien, Australien, Claymation, Melbourne, Ned Kelly, Stadt"
hero: ./img/wp-content-uploads-2018-04-CW-20180124-095738-4825-1-1024x683.jpg
---
# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-liveable-city-by-th

Using the `invoke` method, we can now send the prompt to the LLM and print the response.

In [8]:
response = llm.invoke(prompt)
print(response.content)

The blog post describes a three-day visit to Melbourne, Australia, highlighting its status as a vibrant cultural hub despite being overshadowed by Sydney. The author appreciates Melbourne's colonial architecture, particularly the Flinders Street Station, and its charming arcades filled with unique shops. The visit coincided with the Australian Open, creating a lively atmosphere throughout the city. The post also explores Melbourne's street art scene, the ACMI film museum showcasing Aardman animations, and the legend of Ned Kelly at the Victoria State Library. Overall, the author enjoyed their time in Melbourne, noting its rich offerings and friendly locals.


### Translating and summarizing a blog post with prompt template

Instead of using an `f-string` for the prompt, we can use a prompt template. While the result may seem the same at first, a prompt template offers greater flexibility when scaling up a project. The key advantages of using prompt templates are:

- **Reusability and Modularity**: An f-string is evaluated at the point where it is defined, so all variables must already exist there and the resulting prompt is fixed. A prompt template, in contrast, only fills in its variables when `format` is called, which makes it reusable with different values across different parts of the application or runtime environments.
  
- **Separation of Logic and Content**: In the current example notebook, prompts are embedded directly within the code. However, in larger projects, prompts are often stored separately from the logic, such as in a configuration file or a database. This clear separation between logic and content simplifies the management of both, making the code more maintainable and scalable.

- **Error Handling and Validation**: Prompt templates provide better error handling by ensuring that all required variables are supplied before formatting the template. This helps catch potential issues earlier and makes the code more robust.
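
To illustrate the first two points without any Langchain-specific code, here is a plain-Python sketch. It assumes the template text is stored in a JSON configuration (the key `summarize` and the wording of the template are made up for this example) and uses the built-in `str.format` as a stand-in for a prompt template.

```python
import json

# Hypothetical configuration; in a larger project this could be a JSON file or a database row
config_text = json.dumps({
    "summarize": "Summarize the following blog post in {summary_language} "
                 "using about {summary_words} words:\n{blogpost}"
})

# The application code only loads the template; no values need to exist yet
template = json.loads(config_text)["summarize"]

# The same template is filled in later and reused with different values
prompts = [
    template.format(summary_language=language, summary_words=50, blogpost="<post text>")
    for language in ("English", "French")
]
for prompt in prompts:
    print(prompt.splitlines()[0])
```

The prompt text can now be changed in the configuration without touching the code that fills it in.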

When defining the prompt template, notice that the text is the same as above, but it is not written as an f-string. Instead, it is a "regular" string. The variables are still included in curly braces. To fill in the variables, we use the `format` method.

In [9]:
from langchain_core.prompts import PromptTemplate

template_prompt = """
You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into {summary_language}.

Then summarize the translated blog post into {summary_words} words.

Respond only with the summary in {summary_language}.

Blog post:
```{blogpost}```
"""

prompt_template = PromptTemplate.from_template(template_prompt)

formatted_prompt = prompt_template.format(blogpost=blogpost, summary_language=summary_language, summary_words=summary_words)

print(formatted_prompt[:1100])


You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into English.

Then summarize the translated blog post into 100 words.

Respond only with the summary in English.

Blog post:
```---
title: '3 Tage in Melbourne'
description: ""
published: 2018-04-13
redirect_from: 
            - https://wittmann-tours.de/3-tage-in-melbourne/
categories: "Aardman, Australien, Australien, Claymation, Melbourne, Ned Kelly, Stadt"
hero: ./img/wp-content-uploads-2018-04-CW-20180124-095738-4825-1-1024x683.jpg
---
# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-liveable-city-by-th

We can now use the formatted prompt to translate and summarize the blog post.

In [10]:
response = llm.invoke(formatted_prompt)
print(response.content)

The blog post describes a three-day visit to Melbourne, Australia, highlighting its status as a vibrant cultural hub despite being overshadowed by Sydney. The author appreciates Melbourne's colonial architecture, particularly the Flinders Street Station, and its charming arcades filled with unique shops. The visit coincided with the Australian Open, creating a lively atmosphere throughout the city. The post also emphasizes Melbourne's street art scene, notable museums like the ACMI, and the legend of Ned Kelly at the State Library. Overall, the author enjoyed their time in Melbourne, noting its rich offerings and friendly locals.


### How to dynamically format a prompt template

In the example, we have hardcoded the values for the prompt variables. This is not always practical, so let's explore how we can do this dynamically.

We create the dictionary `provided_variables` with the variables we want to include in the prompt. Notice that the first attempt fails with a `KeyError`, because we are missing the `summary_words` variable.

In [11]:
prompt_template = PromptTemplate.from_template(template_prompt)
required_variables = prompt_template.input_variables
print(required_variables)

provided_variables = {"blogpost": blogpost, "summary_language": summary_language}

try:
    prompt = prompt_template.format(**provided_variables)
    print("Formatted Prompt:", prompt[:1100])
except KeyError as e:
    print(f"Missing variable: {e}")

['blogpost', 'summary_language', 'summary_words']
Missing variable: 'summary_words'
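
Instead of waiting for the `KeyError`, the missing names can also be determined before formatting by comparing the required variables with the provided ones. The sketch below shows the idea in plain Python: it derives the required names with `string.Formatter` as a stand-in for `prompt_template.input_variables`, and the helper `missing_variables` as well as the shortened template are hypothetical.

```python
from string import Formatter


def missing_variables(template, provided):
    """Return the placeholder names in `template` that `provided` does not supply."""
    required = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(required - set(provided))


template = "Translate into {summary_language} and summarize in {summary_words} words:\n{blogpost}"
provided = {"blogpost": "<post text>", "summary_language": "English"}

missing = missing_variables(template, provided)
if missing:
    print(f"Missing variables: {missing}")
else:
    print(template.format(**provided))
```

With the Langchain template, the same set difference could be computed directly from `prompt_template.input_variables`.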


In [12]:
provided_variables = {"blogpost": blogpost, "summary_language": summary_language, "summary_words": summary_words}

try:
    prompt = prompt_template.format(**provided_variables)
    print("Formatted Prompt:", prompt[:1100])
except KeyError as e:
    print(f"Missing variable: {e}")

Formatted Prompt: 
You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into English.

Then summarize the translated blog post into 100 words.

Respond only with the summary in English.

Blog post:
```---
title: '3 Tage in Melbourne'
description: ""
published: 2018-04-13
redirect_from: 
            - https://wittmann-tours.de/3-tage-in-melbourne/
categories: "Aardman, Australien, Australien, Claymation, Melbourne, Ned Kelly, Stadt"
hero: ./img/wp-content-uploads-2018-04-CW-20180124-095738-4825-1-1024x683.jpg
---
# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-l

### Translating and summarizing a blog post with chat prompt template

When working in chat applications, i.e. multi-turn conversations, it is often easier to work with the `ChatPromptTemplate`. For our example, it is technically not necessary, but since it is used in the lesson, let's see how it works.

The main difference is that we do not format a single prompt, but we are working with a list of messages.

In [13]:
from langchain_core.prompts import ChatPromptTemplate

chat_prompt_template = ChatPromptTemplate.from_template(template_prompt)

formatted_messages = chat_prompt_template.format_messages(blogpost=blogpost, summary_language=summary_language, summary_words=summary_words)

print(formatted_messages[0].content[:1100])


You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into English.

Then summarize the translated blog post into 100 words.

Respond only with the summary in English.

Blog post:
```---
title: '3 Tage in Melbourne'
description: ""
published: 2018-04-13
redirect_from: 
            - https://wittmann-tours.de/3-tage-in-melbourne/
categories: "Aardman, Australien, Australien, Claymation, Melbourne, Ned Kelly, Stadt"
hero: ./img/wp-content-uploads-2018-04-CW-20180124-095738-4825-1-1024x683.jpg
---
# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-liveable-city-by-th

Using the formatted messages, the call to the LLM looks the same as the one we've seen before for a single prompt.

In [14]:
response = llm.invoke(formatted_messages)
print(response.content)

The blog post describes a three-day visit to Melbourne, Australia, highlighting its status as a vibrant cultural hub despite being overshadowed by Sydney. The author appreciates Melbourne's colonial architecture, particularly the Flinders Street Station, and its charming arcades filled with unique shops. The visit coincided with the Australian Open, creating a lively atmosphere throughout the city. The post also explores Melbourne's street art scene, the ACMI Film Museum showcasing Aardman animations, and the Victoria State Library's exhibit on the legendary Ned Kelly. Overall, the author enjoyed their time in Melbourne, noting its rich offerings and friendly locals.
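
In the example above, `from_template` puts the whole prompt into a single message. What "a list of messages" means becomes clearer when the instructions and the blog post are split across roles. The following plain-Python sketch only illustrates that structure: it does not use Langchain, `format_chat_messages` is a hypothetical helper, and the message texts are shortened. In Langchain, `ChatPromptTemplate.from_messages` accepts comparable (role, template) pairs.

```python
# Each entry is a (role, template) pair; the role names follow the common chat convention
message_templates = [
    ("system", "You are a helpful assistant that translates and summarizes blog posts."),
    ("human", "Summarize the following blog post in {summary_language} "
              "using {summary_words} words:\n{blogpost}"),
]


def format_chat_messages(templates, **values):
    """Fill the placeholders of every message template and keep the roles."""
    return [{"role": role, "content": text.format(**values)} for role, text in templates]


messages = format_chat_messages(
    message_templates,
    summary_language="English",
    summary_words=100,
    blogpost="<post text>",
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```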


## Output Parsers

Now, let's focus on the output. To make the example more interesting, let's expand the request to the LLM to also extract the country and the publication date from the blog post.

Here are the steps we need to follow:

1. We define the response schemas. Each response schema describes one field we want the output to contain.
2. We collect all response schemas in a list.
3. We create an output parser from that list.
4. We generate the format instructions for the LLM and, after the call, parse the output.

In [15]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser


country_schema = ResponseSchema(name="country", description="The country of the activities described in the blog post")
date_published_schema = ResponseSchema(name="date_published", description="The date the blog post was published")
summary_schema = ResponseSchema(name="summary", description="The summary of the blog post in {summary_language}")

response_schema = [country_schema, date_published_schema, summary_schema]

output_parser = StructuredOutputParser.from_response_schemas(response_schema)

instructions = output_parser.get_format_instructions()

print(instructions)



The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"country": string  // The country of the activities described in the blog post
	"date_published": string  // The date the blog post was published
	"summary": string  // The summary of the blog post in {summary_language}
}
```


Now we can update the prompt template to include the instructions for the output parser.

In [16]:
template_prompt = """
You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into {summary_language}.

Then summarize the translated blog post into {summary_words} words.

Blog post:
```{blogpost}```

{format_instructions}
"""

prompt_template = PromptTemplate.from_template(template_prompt)

formatted_prompt = prompt_template.format(blogpost=blogpost, summary_language=summary_language, summary_words=summary_words, format_instructions=instructions)

print(formatted_prompt[:1057] + "\n\n")



You are a helpful assistant that translates and summarizes blog posts.

Translate the following blog post delimited by triple backticks into English.

Then summarize the translated blog post into 100 words.

Blog post:
```---
title: '3 Tage in Melbourne'
description: ""
published: 2018-04-13
redirect_from: 
            - https://wittmann-tours.de/3-tage-in-melbourne/
categories: "Aardman, Australien, Australien, Claymation, Melbourne, Ned Kelly, Stadt"
hero: ./img/wp-content-uploads-2018-04-CW-20180124-095738-4825-1-1024x683.jpg
---
# 3 Tage in Melbourne

Auch wenn Canberra die offizielle Hauptstadt Australiens ist, so liefern sich Melbourne und Sydney als die beiden größten Städte des Kontinents ein Wettrennen um die Wahrnehmung als geistige Kapitale des Landes. Nach relativ viel Naturprogramm besuchten wir Melbourne, „[the world's most liveable city](https://www.smh.com.au/business/the-economy/melbourne-named-worlds-most-liveable-city-by-the-economist-for-seventh-year-20170816-gxx1k

As defined in the prompt template, the instructions on how to format the output appear at the end of the prompt:

In [17]:
#print(formatted_prompt[len(formatted_prompt)-374:len(formatted_prompt)])
print(formatted_prompt[-374:])

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"country": string  // The country of the activities described in the blog post
	"date_published": string  // The date the blog post was published
	"summary": string  // The summary of the blog post in {summary_language}
}
```
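
The printed instructions still contain the literal text `{summary_language}`. The schema description is a regular string, and the format instructions are inserted into the prompt as a value, so the placeholder inside them is not filled in. The following plain-Python sketch reproduces this with `str.format`, assuming that the Langchain template behaves the same way in this respect (which the output above suggests), and shows one possible fix: resolving the description with an f-string before it is used. The shortened template and the variable names are made up for this illustration.

```python
summary_language = "English"

# A regular string: the placeholder stays as literal text
literal_description = "The summary of the blog post in {summary_language}"

# An f-string: the value is inserted when the description is created
resolved_description = f"The summary of the blog post in {summary_language}"

template = "Translate into {summary_language}.\n{format_instructions}"

# str.format substitutes each placeholder once and does not re-scan inserted values
with_literal = template.format(summary_language=summary_language,
                               format_instructions=literal_description)
with_resolved = template.format(summary_language=summary_language,
                                format_instructions=resolved_description)

print(with_literal)
print(with_resolved)
```

In this run the model still answered in English, since the main prompt states the target language explicitly.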



In [18]:
response = llm.invoke(formatted_prompt)
print(response.content)

```json
{
	"country": "Australia",
	"date_published": "2018-04-13",
	"summary": "The blog post describes a 3.5-day visit to Melbourne, Australia, highlighting its status as a vibrant cultural hub. The author explores the city's colonial architecture, including the iconic Flinders Street Station, and its charming arcades filled with unique shops. The visit coincides with the Australian Open, creating a lively atmosphere. Melbourne's street art scene is celebrated, particularly in Hosier Lane, and the ACMI Film Museum showcases Aardman's work. The post also touches on the legend of Ned Kelly at the Victoria State Library. Overall, the author enjoyed their time in this dynamic city."
}
```


In [19]:
type(response.content)

str

The response from the model is still a string, so we need to parse it using the output parser to get a structured output as a dictionary.

In [20]:
output_dict = output_parser.parse(response.content)
print(output_dict.keys(), "\n")
for key in output_dict.keys():
    print(f"{key}: {output_dict[key]} \n")

dict_keys(['country', 'date_published', 'summary']) 

country: Australia 

date_published: 2018-04-13 

summary: The blog post describes a 3.5-day visit to Melbourne, Australia, highlighting its status as a vibrant cultural hub. The author explores the city's colonial architecture, including the iconic Flinders Street Station, and its charming arcades filled with unique shops. The visit coincides with the Australian Open, creating a lively atmosphere. Melbourne's street art scene is celebrated, particularly in Hosier Lane, and the ACMI Film Museum showcases Aardman's work. The post also touches on the legend of Ned Kelly at the Victoria State Library. Overall, the author enjoyed their time in this dynamic city. 
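
To get a feeling for what happens in the parsing step, here is a rough plain-Python sketch: it extracts the JSON object from the markdown code fence, loads it, and checks that the expected keys are present. This is not Langchain's actual implementation, and `parse_fenced_json` is a hypothetical helper. Since all values arrive as strings, the sketch also converts the date, assuming it is in ISO format as in the output above. The summary text in the sample is a placeholder.

```python
import json
import re
from datetime import date


def parse_fenced_json(text, expected_keys):
    """Extract the JSON object from a markdown code fence and check its keys."""
    # Accept an optional "json" label after the three opening backticks
    match = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Response is missing keys: {missing}")
    return data


fence = "`" * 3  # three backticks, built this way to keep the example readable
response_text = (
    fence + "json\n"
    + '{"country": "Australia", "date_published": "2018-04-13", "summary": "A short summary."}\n'
    + fence
)

result = parse_fenced_json(response_text, ["country", "date_published", "summary"])
published = date.fromisoformat(result["date_published"])
print(result["country"], published.year)  # prints "Australia 2018"
```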



## Summary

Wrapping up, we have seen how we can structure both the input of our prompt and the output returned by the model.

On the input side, we have seen how we can use prompt templates, and we discussed the benefits of doing so.

On the output side, we have seen how we can use output parsers to structure the output of the model.

