### Improving Our News Articles Summarizer

Here, we will improve the previously developed “News Article Summarizer” script. The goal is to make it more accurate at extracting key information from long news articles and to present that information as a bulleted list.

To achieve this, we will adapt our current summarizer to prompt the underlying language model to produce summaries as bulleted lists using output parsers. This requires specific adjustments to the framing of our prompts.

Here’s a recap of what we did and what we are going to do in this project:

![image](./pipeline-news-articles-summarizer.jpg)

*Pipeline for our news articles summarizer with scraping, parsing, prompting, and generation.*

To improve the article summarizer, we employ few-shot learning to show the model how the output should be structured in advance, helping it adapt to the desired format. We also pass the output generated by the model through output parsers to ensure that the structure of the output adheres to the desired format.

This entire process, leveraging **few-shot learning** for accuracy and output parsers for format control, ensures a high-quality, structured summarization of the news articles.

The initial phases of this process are technically identical to [part 1](./build-a-news-articles-summarizer.ipynb).


In [None]:
import os
from langchain_custom_utils.helper import get_openai_api_key, print_response
OPENAI_API_KEY = get_openai_api_key()

## Extract & parse news articles

In [None]:
import requests
from newspaper import Article

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://www.artificialintelligence-news.com/2022/01/25/meta-claims-new-ai-supercomputer-will-set-records/"

session = requests.Session()


try:
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        article = Article(article_url)
        article.download(input_html=response.text)  # reuse the HTML fetched above instead of downloading again
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url}")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")

Now, we will incorporate examples into the prompt using a few-shot approach, the same idea that LangChain's `FewShotPromptTemplate` formalizes. The examples guide the model toward producing a bulleted list that briefly summarizes the content of the provided article.

In [None]:
from langchain.schema import (
    HumanMessage
)

# we get the article data from the scraping part
article_title = article.title
article_text = article.text

# prepare template for prompt
template = """
As an advanced AI, you've been tasked to summarize online articles into bulleted points. Here are a few examples of how you've done this in the past:

Example 1:
Original Article: 'The Effects of Climate Change'
Summary:
- Climate change is causing a rise in global temperatures.
- This leads to melting ice caps and rising sea levels.
- Resulting in more frequent and severe weather conditions.

Example 2:
Original Article: 'The Evolution of Artificial Intelligence'
Summary:
- Artificial Intelligence (AI) has developed significantly over the past decade.
- AI is now used in multiple fields such as healthcare, finance, and transportation.
- The future of AI is promising but requires careful regulation.

Now, here's the article you need to summarize:

==================
Title: {article_title}

{article_text}
==================

Please provide a summarized version of the article in a bulleted list format.
"""

# format prompt
prompt = template.format(article_title=article_title, article_text=article_text)

messages = [HumanMessage(content=prompt)]

These examples give the model a better understanding of the expected response. Here, we need a couple of essential components:

-   **Article**: Collecting the title and text of the article. These elements serve as the primary inputs for the model.
-   **Template**: Crafting a detailed template for the prompt. This template adopts a few-shot learning approach, providing the model with examples of articles summarized into bullet lists. Additionally, it contains placeholders for the actual article title and text, which will be summarized. Subsequently, these placeholders (`{article_title}` and `{article_text}`) are replaced with the real title and text of the article using the `.format()` method.
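The template above hardcodes its two examples in one string. As a minimal sketch, assuming plain Python rather than LangChain's prompt classes, the same prompt could be assembled from a list of examples so that examples can be added or swapped without editing the template text. The names `EXAMPLES`, `format_example`, and `build_few_shot_prompt` are hypothetical and do not appear in the original notebook.

```python
# Few-shot examples kept as data (hypothetical structure, mirroring the template above).
EXAMPLES = [
    {
        "title": "The Effects of Climate Change",
        "bullets": [
            "Climate change is causing a rise in global temperatures.",
            "This leads to melting ice caps and rising sea levels.",
            "Resulting in more frequent and severe weather conditions.",
        ],
    },
    {
        "title": "The Evolution of Artificial Intelligence",
        "bullets": [
            "Artificial Intelligence (AI) has developed significantly over the past decade.",
            "AI is now used in multiple fields such as healthcare, finance, and transportation.",
            "The future of AI is promising but requires careful regulation.",
        ],
    },
]


def format_example(index, example):
    """Render one example as 'Example N', the article title, and its bulleted summary."""
    bullet_lines = "\n".join(f"- {bullet}" for bullet in example["bullets"])
    return (
        f"Example {index}:\n"
        f"Original Article: '{example['title']}'\n"
        f"Summary:\n{bullet_lines}"
    )


def build_few_shot_prompt(examples, article_title, article_text):
    """Join the instruction, the rendered examples, and the article to summarize."""
    intro = (
        "As an advanced AI, you've been tasked to summarize online articles "
        "into bulleted points. Here are a few examples of how you've done "
        "this in the past:"
    )
    shots = "\n\n".join(
        format_example(i, example) for i, example in enumerate(examples, start=1)
    )
    return (
        f"{intro}\n\n{shots}\n\n"
        "Now, here's the article you need to summarize:\n\n"
        "==================\n"
        f"Title: {article_title}\n\n{article_text}\n"
        "==================\n\n"
        "Please provide a summarized version of the article in a bulleted list format."
    )
```

With the scraped article, this would be called as `build_few_shot_prompt(EXAMPLES, article_title, article_text)`, and the result wrapped in a `HumanMessage` as before.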

The next step uses the `ChatOpenAI` class to load a GPT-4 model (`gpt-4-turbo` in the code here), which generates the summary. The `ChatOpenAI` instance is called with a list of messages, in this case a single `HumanMessage` containing the formatted prompt, and returns the model's reply.

The following cells load the model and generate the summary:


In [None]:
from langchain.chat_models import ChatOpenAI

# load the model
chat = ChatOpenAI(model_name="gpt-4-turbo", temperature=0)

In [None]:
# generate summary
summary = chat(messages)
print(summary.content)

The key objective of this strategy is to incorporate a few-shot learning style within the prompt. This technique provides the model with examples that demonstrate the ideal task execution. By changing the prompt and examples, you can adapt the model’s output to different objectives and to specific format, tone, and style criteria.
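Because the target format is a bulleted list, the model's reply can be checked and split into individual items before it is used. The following is a minimal sketch in plain Python that stands in for an output parser. The helper `parse_bullets` and the pattern `BULLET_PATTERN` are hypothetical names, not part of LangChain or the original notebook, and the set of accepted bullet markers is an assumption.

```python
import re

# Assumed bullet markers: "-", "*", "•", or a number followed by "." or ")".
BULLET_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")


def parse_bullets(text):
    """Return the bullet items found in `text`, ignoring non-bullet lines.

    Raises ValueError when no bullet line is present, so a reply that
    ignored the requested format is caught rather than passed along.
    """
    items = []
    for line in text.splitlines():
        match = BULLET_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    if not items:
        raise ValueError("no bullet points found in model output")
    return items
```

Applied to the generated summary, this would be `parse_bullets(summary.content)`. Raising an error on a reply with no bullets leaves the caller free to retry the request or fall back to the raw text.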