## Prerequisites

To run this notebook, you will need access to:
- An Azure OpenAI resource with access to the Phi-4 model.
- A local machine capable of running transformer models (with sufficient CPU/GPU and memory).

In your working directory, create a `.env` file (if it doesn't already exist) and add the following environment variables:
```bash
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
OPENAI_API_VERSION=2024-02-15
```

In [None]:
import pathlib

pathlib.Path(".env").touch()

# Install web scraping dependencies
%pip install beautifulsoup4 lxml markdownify

%pip install seaborn matplotlib

# Install AI dependencies
%pip install "transformers[torch]" "pydantic-ai-slim[openai]" "python-dotenv" "openai" "chonkie"

# Install Evaluation dependencies
%pip install "torchmetrics"

In [None]:
import os

import dotenv

dotenv.load_dotenv(dotenv_path=".env")
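
A variable missing from `.env` would otherwise only surface later as a failed request. The following is a minimal sketch of an early check, assuming the three variable names from the `.env` template above; `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env template in this notebook.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "OPENAI_API_VERSION")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```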

## Setup Models

In this PoC, we will be using two models for summarisation:
1. Microsoft Phi-4 as a smaller LLM via Azure AI Foundry
2. Facebook BART-large-CNN as a local transformer model

In [None]:
from dataclasses import dataclass

import openai
import pydantic_ai
import pydantic_ai.direct
import pydantic_ai.messages
import pydantic_ai.models.openai
import pydantic_ai.providers.azure

openai_client = openai.AsyncOpenAI(
    base_url=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

phi_4 = pydantic_ai.models.openai.OpenAIChatModel(
    model_name="Phi-4",
    provider=pydantic_ai.providers.azure.AzureProvider(
        openai_client=openai_client
    ),
)

system_prompt = """You are a professional summarizer. Create a concise and comprehensive summary of the provided text.

Guidelines:
- Create a summary that is detailed, thorough, in-depth, and complex, while maintaining clarity and conciseness
- Cover all key points and main ideas presented in the original text
- Condense the information into an easy-to-understand format
- Include relevant details and examples that support the main ideas
- Only use information present in the original text
- Rely strictly on the provided text, without including external information
- Ensure the summary length is appropriate for the complexity of the original text
- Organize the summary clearly with well-structured paragraphs
- Write in a direct, factual style without conversational language
- Do not use meta-references like "the article", "the text", "this document", or "the author"
- Begin directly with the subject matter itself"""

@dataclass
class TokenUsage:
    input_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class SummaryResult:
    text: str
    usage: TokenUsage | None

async def summarise_with_phi_4(text: str) -> SummaryResult:
    response = await pydantic_ai.direct.model_request(
        model=phi_4,
        messages=[
            pydantic_ai.messages.ModelRequest(parts=[
                pydantic_ai.messages.SystemPromptPart(system_prompt),
                pydantic_ai.messages.UserPromptPart(text)
            ])
        ]
    )

    usage = None
    if response.usage:
        usage = TokenUsage(
            input_tokens=response.usage.input_tokens,
            completion_tokens=response.usage.output_tokens,
            total_tokens=response.usage.total_tokens
        )

    return SummaryResult(text=response.text, usage=usage)

The BART model and tokeniser are loaded with the HuggingFace `transformers` library.

> [!IMPORTANT]
> If you cannot access the HuggingFace model hub, you will need to download the model manually and load it from a local path. `notebooks/models/` is a good place to store it.
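
One way to handle the offline case is to prefer a local copy when it exists and fall back to the hub identifier otherwise. The following is a minimal sketch: the `resolve_model_source` helper and the assumed directory layout (a subdirectory named after the last part of the hub identifier) are illustrative assumptions. The returned string is what would be passed as `model=` to `pipeline`.

```python
from pathlib import Path


def resolve_model_source(hub_id: str, local_root: str = "notebooks/models") -> str:
    """Return a local model directory if it exists, else the hub identifier."""
    # Assumed layout: <local_root>/<last path component of the hub id>
    local_dir = Path(local_root) / hub_id.split("/")[-1]
    return str(local_dir) if local_dir.is_dir() else hub_id


model_source = resolve_model_source("facebook/bart-large-cnn")
print(model_source)
```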

In [None]:
from transformers import pipeline

bart_pipeline = pipeline(
    task="summarization",
    model="facebook/bart-large-cnn",
)

def summarise_with_bart(text: str) -> str:
    summary = bart_pipeline(
        text,
        do_sample=False,
        truncation=True,  # clip inputs longer than BART's 1024-token limit instead of failing
    )
    return summary[0]['summary_text']

In [None]:
basic_article = """Nuclear fusion is a process in which two or more atomic nuclei combine to form a single, larger nucleus, accompanied by the release or absorption of energy. This reaction occurs due to the difference in nuclear binding energy between the reactants and the products. The mass difference between the nuclei before and after the reaction is converted into energy, as described by Einstein's equation, (E=mc^2). Fusion is the primary energy source for stars, including the Sun, where hydrogen nuclei fuse to form helium through a series of reactions.

For nuclear fusion to occur, extremely high temperatures, pressures, and confinement times are required to overcome the electrostatic repulsion between positively charged nuclei. These conditions are naturally found in stellar cores and are replicated in advanced nuclear weapons and experimental fusion reactors. The "triple product" of temperature, density, and confinement time is a critical parameter for achieving sustained fusion.

Fusion reactions involving light nuclei, such as deuterium and tritium, are generally exothermic, meaning they release energy. This is because lighter nuclei have a steep positive gradient in the nuclear binding energy curve up to iron and nickel. In contrast, nuclear fission, which involves splitting heavy nuclei, is most energetic for elements like uranium.

Applications of nuclear fusion include the development of fusion power as a clean energy source, thermonuclear weapons, neutron sources, and the production of superheavy elements. While fusion offers immense potential as a sustainable energy solution, challenges such as achieving and maintaining the required conditions for ignition and energy gain remain significant."""

In [None]:
bart_summary = summarise_with_bart(basic_article)

In [None]:
print(bart_summary)

In [None]:
phi_4_result = await summarise_with_phi_4(basic_article)

In [None]:
print(phi_4_result.text)
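
Top-level `await` works in a notebook because Jupyter already runs an event loop. In a plain Python script, the coroutine has to be driven explicitly. The following is a minimal sketch using `asyncio.run`; `fake_summarise` is a hypothetical stand-in for `summarise_with_phi_4` so that the example runs without Azure credentials.

```python
import asyncio


async def fake_summarise(text: str) -> str:
    """Stand-in for an async model call; returns the first sentence only."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return text.split(". ")[0].rstrip(".") + "."


async def main() -> str:
    return await fake_summarise("Fusion powers stars. It needs extreme heat.")


# In a script (not a notebook cell), start the event loop explicitly.
summary = asyncio.run(main())
print(summary)
```

Inside a notebook cell, `asyncio.run` raises a `RuntimeError` because a loop is already running, which is why the cells here use `await` directly.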

We use the `torchmetrics` library for evaluation, scoring the summaries generated by both models with ROUGE and BERTScore.

> [!IMPORTANT]
> `torchmetrics` also depends on the HuggingFace `transformers` library to load pre-trained models for the BERTScore calculation. As before, if you cannot access the HuggingFace model hub, you will need to download the required models manually and load them from a local path. `notebooks/models/` is a good place to store them.
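
ROUGE-L is based on the longest common subsequence (LCS) between the candidate and the reference: precision is the LCS length over the candidate length, recall is the LCS length over the reference length, and the F-measure is their harmonic mean. The following is a minimal pure-Python sketch on lowercased whitespace tokens. `torchmetrics` applies its own tokenisation and normalisation, so its scores can differ from this illustration.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(pred: str, target: str) -> dict[str, float]:
    """ROUGE-L precision, recall and F-measure on whitespace tokens."""
    p, t = pred.lower().split(), target.lower().split()
    lcs = lcs_length(p, t)
    precision = lcs / len(p) if p else 0.0
    recall = lcs / len(t) if t else 0.0
    f = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f}


scores = rouge_l("the cat sat", "the cat sat on the mat")
print(scores)
```

Because a summary is much shorter than its source, recall measured against the full article tends to be low even when the summary is faithful.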

In [None]:
from torchmetrics.text.rouge import ROUGEScore
from torchmetrics.text.bert import BERTScore

# Initialize ROUGE scorer
rouge_scorer = ROUGEScore()

# Initialize BERTScore with the DeBERTa model (fetched from the HuggingFace hub unless a local path is given)
bert_scorer = BERTScore(model_name_or_path="microsoft/deberta-xlarge-mnli")

In [None]:
import pandas as pd

bart_rouge = rouge_scorer(preds=bart_summary, target=basic_article)
bart_bert = bert_scorer(preds=bart_summary, target=basic_article)

phi_4_rouge = rouge_scorer(preds=phi_4_result.text, target=basic_article)
phi_4_bert = bert_scorer(preds=phi_4_result.text, target=basic_article)

simple_article_metrics = pd.DataFrame([
    {
        'article': 'simple_article',
        'model': 'BART-large-CNN',
        'rougeL_fmeasure': bart_rouge['rougeL_fmeasure'].item(),
        'rougeL_precision': bart_rouge['rougeL_precision'].item(),
        'rougeL_recall': bart_rouge['rougeL_recall'].item(),
        'bert_f1': bart_bert['f1'].item(),
        'bert_precision': bart_bert['precision'].item(),
        'bert_recall': bart_bert['recall'].item(),
    },
    {
        'article': 'simple_article',
        'model': 'Phi-4',
        'rougeL_fmeasure': phi_4_rouge['rougeL_fmeasure'].item(),
        'rougeL_precision': phi_4_rouge['rougeL_precision'].item(),
        'rougeL_recall': phi_4_rouge['rougeL_recall'].item(),
        'bert_f1': phi_4_bert['f1'].item(),
        'bert_precision': phi_4_bert['precision'].item(),
        'bert_recall': phi_4_bert['recall'].item(),
    }
])[['article', 'model', 'rougeL_fmeasure', 'rougeL_precision', 'rougeL_recall', 'bert_f1', 'bert_precision', 'bert_recall']]

simple_article_metrics

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
sns.set_theme(style="whitegrid")

# Prepare data for visualization
metrics_melted = simple_article_metrics.melt(
    id_vars=['article', 'model'],
    var_name='metric',
    value_name='score'
)

# Separate ROUGE and BERT metrics
rouge_metrics = metrics_melted[metrics_melted['metric'].str.startswith('rougeL')]
bert_metrics = metrics_melted[metrics_melted['metric'].str.startswith('bert')]

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROUGE-L metrics
sns.barplot(data=rouge_metrics, x='metric', y='score', hue='model', ax=axes[0], palette='viridis')
axes[0].set_title('ROUGE-L Metrics - Simple Article', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Metric', fontsize=12)
axes[0].set_ylabel('Score', fontsize=12)
axes[0].set_ylim(0, 1)
axes[0].tick_params(axis='x', rotation=45)
axes[0].legend(title='Model')

# BERTScore metrics
sns.barplot(data=bert_metrics, x='metric', y='score', hue='model', ax=axes[1], palette='magma')
axes[1].set_title('BERTScore Metrics - Simple Article', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Metric', fontsize=12)
axes[1].set_ylabel('Score', fontsize=12)
axes[1].set_ylim(0, 1)
axes[1].tick_params(axis='x', rotation=45)
axes[1].legend(title='Model')

plt.tight_layout()
plt.show()

In [None]:
import re
import requests

import bs4
import markdownify

ai_oap_url = "https://www.gov.uk/government/publications/ai-opportunities-action-plan/ai-opportunities-action-plan"

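article = requests.get(ai_oap_url, timeout=30)
article.raise_for_status()  # fail early on HTTP errors rather than parsing an error page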

html_content = bs4.BeautifulSoup(article.content, 'html.parser')

playbook_content = html_content.find("div", id="contents")

article_to_summarise = markdownify.markdownify(str(playbook_content), heading_style="ATX")

sections = re.split(r'^#{1,6} ', article_to_summarise, flags=re.MULTILINE)
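
To illustrate what the heading split produces, here is a small sketch on a hypothetical Markdown snippet. With `re.MULTILINE`, `^` matches at the start of every line, so each ATX heading marker becomes a split point and is itself removed from the output.

```python
import re

# Hypothetical sample text, not taken from the scraped document.
sample = "# Title\nintro\n## Section A\nbody a\n### Sub\nbody b\n"

parts = re.split(r"^#{1,6} ", sample, flags=re.MULTILINE)
# The first element is empty because the text starts with a heading.
print(parts)
```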

In [None]:
from chonkie import RecursiveChunker

# Initialize the recursive chunker with BART-friendly token limits
# BART-large-CNN typically handles up to 1024 tokens
chunker = RecursiveChunker(
    tokenizer="gpt2",
    chunk_size=1024,
    min_characters_per_chunk=24,
)

# Chunk the entire document recursively
chunks = chunker.chunk(article_to_summarise)

print(f"Total chunks: {len(chunks)}")
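
The idea of chunking to a size budget can be sketched without any library. The following is a simplified illustration and not chonkie's recursive algorithm: it splits on blank lines and greedily packs paragraphs into chunks, counting whitespace-separated words as a rough proxy for tokens. `pack_paragraphs` and its parameters are hypothetical names.

```python
def pack_paragraphs(text: str, max_words: int) -> list[str]:
    """Greedily group paragraphs so each chunk stays within `max_words` words.

    A single paragraph longer than the budget is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


demo = "one two three\n\nfour five\n\nsix seven eight nine"
print(pack_paragraphs(demo, max_words=5))
```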

In [None]:
# Summarize each chunk with BART
bart_chunk_summaries = []

for chunk_idx, chunk in enumerate(chunks):
    summary = summarise_with_bart(chunk.text)
    bart_chunk_summaries.append({
        'chunk_index': chunk_idx,
        'summary': summary
    })

print(f"Total chunks processed: {len(bart_chunk_summaries)}")

# Map-Reduce: Iteratively reduce summaries until we have a final summary
current_summaries = [item['summary'] for item in bart_chunk_summaries]
iteration = 1

while len(current_summaries) > 1:
    print(f"\nIteration {iteration}: Reducing {len(current_summaries)} summaries")
    
    # Combine all summaries into one text
    combined_text = "\n\n".join(current_summaries)
    
    # Chunk the combined summaries
    reduction_chunks = chunker.chunk(combined_text)
    print(f"Created {len(reduction_chunks)} chunks from combined summaries")
    
    # Summarize each chunk
    reduced_summaries = []
    for chunk in reduction_chunks:
        summary = summarise_with_bart(chunk.text)
        reduced_summaries.append(summary)
    
    current_summaries = reduced_summaries
    iteration += 1

# Final summary
bart_final_summary = current_summaries[0]
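
The reduction loop above can be sketched as a reusable function with an iteration cap, which guards against a round that fails to shrink the text. This is a sketch under assumptions: `summarise` and `split` are injected callables (in this notebook they would wrap `summarise_with_bart` and `chunker.chunk`), and `max_rounds` is a hypothetical safety parameter.

```python
from collections.abc import Callable


def map_reduce_summarise(
    text: str,
    summarise: Callable[[str], str],
    split: Callable[[str], list[str]],
    max_rounds: int = 10,
) -> str:
    """Summarise each chunk, then re-chunk and re-summarise until one summary remains."""
    summaries = [summarise(chunk) for chunk in split(text)]
    for _ in range(max_rounds):
        if len(summaries) <= 1:
            break
        combined = "\n\n".join(summaries)
        summaries = [summarise(chunk) for chunk in split(combined)]
    # Join whatever is left, so the result is defined even if the cap was hit
    # or the input produced no chunks.
    return "\n\n".join(summaries)


# Toy stand-ins: "summarise" keeps the first word, "split" cuts every 3 words.
def first_word(chunk: str) -> str:
    return chunk.split()[0]


def every_three_words(text: str) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + 3]) for i in range(0, len(words), 3)]


print(map_reduce_summarise("a b c d e f g h i j", first_word, every_three_words))
```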

In [None]:
print(bart_final_summary)

In [None]:
# Summarize the full AI Action Plan document with Phi-4
phi_4_ai_oap_result = await summarise_with_phi_4(article_to_summarise)

print(phi_4_ai_oap_result.text)


In [None]:
import statistics

# Compare summaries using ROUGE-L and BERTScore metrics
# 1. Phi-4 output vs original article
phi_vs_original_rouge = rouge_scorer(preds=phi_4_ai_oap_result.text, target=article_to_summarise)

# Calculate BERTScore by chunking (to avoid memory issues)
phi_bert_scores = {'precision': [], 'recall': [], 'f1': []}
for chunk in chunks:
    chunk_bert = bert_scorer(preds=phi_4_ai_oap_result.text, target=chunk.text)
    phi_bert_scores['precision'].append(chunk_bert['precision'].item())
    phi_bert_scores['recall'].append(chunk_bert['recall'].item())
    phi_bert_scores['f1'].append(chunk_bert['f1'].item())

phi_vs_original_bert = {
    'precision': statistics.mean(phi_bert_scores['precision']),
    'recall': statistics.mean(phi_bert_scores['recall']),
    'f1': statistics.mean(phi_bert_scores['f1'])
}

# 2. BART output vs original article
bart_vs_original_rouge = rouge_scorer(preds=bart_final_summary, target=article_to_summarise)

# Calculate BERTScore by chunking (to avoid memory issues)
bart_bert_scores = {'precision': [], 'recall': [], 'f1': []}
for chunk in chunks:
    chunk_bert = bert_scorer(preds=bart_final_summary, target=chunk.text)
    bart_bert_scores['precision'].append(chunk_bert['precision'].item())
    bart_bert_scores['recall'].append(chunk_bert['recall'].item())
    bart_bert_scores['f1'].append(chunk_bert['f1'].item())

bart_vs_original_bert = {
    'precision': statistics.mean(bart_bert_scores['precision']),
    'recall': statistics.mean(bart_bert_scores['recall']),
    'f1': statistics.mean(bart_bert_scores['f1'])
}

# 3. BART output vs Phi-4 output
bart_vs_phi_rouge = rouge_scorer(preds=bart_final_summary, target=phi_4_ai_oap_result.text)
bart_vs_phi_bert = bert_scorer(preds=bart_final_summary, target=phi_4_ai_oap_result.text)

ai_oap_metrics = pd.DataFrame([
    {
        'comparison': 'Phi-4 vs Original',
        'rougeL_fmeasure': phi_vs_original_rouge['rougeL_fmeasure'].item(),
        'rougeL_precision': phi_vs_original_rouge['rougeL_precision'].item(),
        'rougeL_recall': phi_vs_original_rouge['rougeL_recall'].item(),
        'bert_f1': phi_vs_original_bert['f1'],
        'bert_precision': phi_vs_original_bert['precision'],
        'bert_recall': phi_vs_original_bert['recall'],
    },
    {
        'comparison': 'BART vs Original',
        'rougeL_fmeasure': bart_vs_original_rouge['rougeL_fmeasure'].item(),
        'rougeL_precision': bart_vs_original_rouge['rougeL_precision'].item(),
        'rougeL_recall': bart_vs_original_rouge['rougeL_recall'].item(),
        'bert_f1': bart_vs_original_bert['f1'],
        'bert_precision': bart_vs_original_bert['precision'],
        'bert_recall': bart_vs_original_bert['recall'],
    },
    {
        'comparison': 'BART vs Phi-4',
        'rougeL_fmeasure': bart_vs_phi_rouge['rougeL_fmeasure'].item(),
        'rougeL_precision': bart_vs_phi_rouge['rougeL_precision'].item(),
        'rougeL_recall': bart_vs_phi_rouge['rougeL_recall'].item(),
        'bert_f1': bart_vs_phi_bert['f1'].item(),
        'bert_precision': bart_vs_phi_bert['precision'].item(),
        'bert_recall': bart_vs_phi_bert['recall'].item(),
    }
])[['comparison', 'rougeL_fmeasure', 'rougeL_precision', 'rougeL_recall', 'bert_f1', 'bert_precision', 'bert_recall']]

ai_oap_metrics
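
The two chunked BERTScore loops above follow the same pattern: score the summary against each chunk, then average each metric. The following is a minimal sketch of that pattern as a helper. `mean_chunk_scores` is a hypothetical name, and `score_fn` stands in for a call such as `bert_scorer(preds=..., target=...)` with the tensor values already converted to floats.

```python
import statistics
from collections.abc import Callable


def mean_chunk_scores(
    pred: str,
    targets: list[str],
    score_fn: Callable[[str, str], dict[str, float]],
) -> dict[str, float]:
    """Average each metric returned by `score_fn(pred, target)` over all targets.

    Assumes `targets` contains at least one item.
    """
    per_chunk = [score_fn(pred, target) for target in targets]
    keys = per_chunk[0].keys()
    return {key: statistics.mean(scores[key] for scores in per_chunk) for key in keys}


# Toy scorer: word-overlap precision and recall, only to make the sketch runnable.
def overlap(pred: str, target: str) -> dict[str, float]:
    p, t = set(pred.split()), set(target.split())
    common = len(p & t)
    return {"precision": common / len(p), "recall": common / len(t)}


print(mean_chunk_scores("a b", ["a b c d", "a x"], overlap))
```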

In [None]:
# Prepare data for visualization
ai_metrics_melted = ai_oap_metrics.melt(
    id_vars=['comparison'],
    var_name='metric',
    value_name='score'
)

# Separate ROUGE and BERT metrics
ai_rouge_metrics = ai_metrics_melted[ai_metrics_melted['metric'].str.startswith('rougeL')]
ai_bert_metrics = ai_metrics_melted[ai_metrics_melted['metric'].str.startswith('bert')]

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# ROUGE-L metrics
sns.barplot(data=ai_rouge_metrics, x='comparison', y='score', hue='metric', ax=axes[0], palette='viridis')
axes[0].set_title('ROUGE-L Metrics - AI Action Plan Comparisons', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Comparison', fontsize=12)
axes[0].set_ylabel('Score', fontsize=12)
axes[0].set_ylim(0, 1)
axes[0].tick_params(axis='x', rotation=15)
axes[0].legend(title='Metric', bbox_to_anchor=(1.05, 1), loc='upper left')

# BERTScore metrics
sns.barplot(data=ai_bert_metrics, x='comparison', y='score', hue='metric', ax=axes[1], palette='magma')
axes[1].set_title('BERTScore Metrics - AI Action Plan Comparisons', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Comparison', fontsize=12)
axes[1].set_ylabel('Score', fontsize=12)
axes[1].set_ylim(0, 1)
axes[1].tick_params(axis='x', rotation=15)
axes[1].legend(title='Metric', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()