# Azure Language - Key Phrase Extraction

This notebook demonstrates how to use the Azure Language Analyze Text API to
extract key phrases from an essay about exercise and nutrition.

## Prerequisites

Install the `requests` and `rich` packages, which this notebook imports, and set the following environment variables:
- `FOUNDRY_ENDPOINT` - Azure Language service endpoint
- `FOUNDRY_API_KEY` - API key for authentication
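
A quick check before running the cells can catch an unset variable early. The sketch below is one possible way to do that; `missing_env_vars` is a hypothetical helper and not part of the notebook.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Hypothetical usage: report which required variables still need to be set.
required = ["FOUNDRY_ENDPOINT", "FOUNDRY_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```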

In [None]:
import json
import os

import requests
from rich.columns import Columns
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

## Load Sample Data

Load the essay on exercise and nutrition from the sample data file.
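
The service limits the size of each document, so a long essay may need to be split before it is sent. The sketch below packs paragraphs into chunks; the 5,120-character default is an assumption to verify against the current service limits, and `split_into_chunks` is a hypothetical helper that the notebook does not define. Python's `len` counts code points, which may differ from how the service measures length.

```python
def split_into_chunks(text, max_chars=5120):
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is hard-split at the limit.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that cannot fit in one chunk on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The paragraph does not fit; close the current chunk and start anew.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as its own document, with the resulting key phrase lists merged afterwards.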

In [None]:
with open(
    "04-KeyPhraseExtraction-data/exercise_nutrition_essay.txt", encoding="utf-8"
) as f:
    essay_text = f.read()

# Count words
word_count = len(essay_text.split())
print(f"Loaded essay: {len(essay_text)} characters, {word_count} words")
print(f"\nFirst 300 characters:\n{essay_text[:300]}...")

## API Setup

Configure the Azure Language API endpoint and authentication.
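
Endpoints copied from a portal often end with a slash, which would produce a double slash when the path is appended. A minimal sketch of a tolerant URL builder follows; `build_analyze_url` is a hypothetical helper, and the endpoint shown is a placeholder.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, api_version="2024-11-01"):
    """Join the endpoint and the Analyze Text path, tolerating a trailing slash."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/language/:analyze-text?{query}"


# The endpoint below is a placeholder, not a real resource.
print(build_analyze_url("https://example.cognitiveservices.azure.com/"))
```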

In [None]:
endpoint = os.environ["FOUNDRY_ENDPOINT"]
api_key = os.environ["FOUNDRY_API_KEY"]

# Construct the Analyze Text API URL (strip any trailing slash to avoid "//")
api_url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version=2024-11-01"

headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/json",
}

print(f"API Endpoint: {endpoint}")
print("API Key: [CONFIGURED]")

## Make API Call

Send the essay to the Key Phrase Extraction API.
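
The request body wraps its documents in a list, so the same shape can carry more than one text. The sketch below builds such a body; `build_key_phrase_payload` is a hypothetical helper, and any per-request document limit should be checked against the service documentation.

```python
def build_key_phrase_payload(texts, language="en"):
    """Build a KeyPhraseExtraction request body with one document per text."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, 1)
    ]
    return {
        "kind": "KeyPhraseExtraction",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }


# Illustrative input only; the notebook itself sends a single essay.
payload_example = build_key_phrase_payload(["First text.", "Second text."])
print([doc["id"] for doc in payload_example["analysisInput"]["documents"]])
```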

In [None]:
# Prepare the document for the API
documents = [{"id": "1", "language": "en", "text": essay_text}]

# Build the request payload
payload = {
    "kind": "KeyPhraseExtraction",
    "parameters": {"modelVersion": "latest"},
    "analysisInput": {"documents": documents},
}

# Make the API request
response = requests.post(api_url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
result = response.json()

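# A 200 response can still carry per-document errors, so report both counts
doc_count = len(result["results"]["documents"])
error_count = len(result["results"].get("errors", []))
print(f"API call completed: {doc_count} document(s) analyzed, {error_count} error(s)")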

## Raw API Results

Display the complete JSON response from the API.
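
When reading the raw JSON, the parts of interest are the per-document `keyPhrases` lists and any per-document errors. The sketch below pulls both out of a response; `collect_key_phrases` is a hypothetical helper, the sample dictionary is hand-written for illustration, and the field names inside `errors` are an assumption about the response shape.

```python
def collect_key_phrases(result):
    """Return ({doc id: key phrases}, {doc id: error message}) from a response."""
    results = result.get("results", {})
    phrases = {
        doc["id"]: doc.get("keyPhrases", []) for doc in results.get("documents", [])
    }
    errors = {
        err["id"]: err.get("error", {}).get("message", "")
        for err in results.get("errors", [])
    }
    return phrases, errors


# Hand-written sample for illustration only; not real service output.
sample = {
    "kind": "KeyPhraseExtractionResults",
    "results": {
        "documents": [
            {"id": "1", "keyPhrases": ["regular exercise", "balanced diet"], "warnings": []}
        ],
        "errors": [
            {"id": "2", "error": {"code": "InvalidDocument", "message": "Document too long."}}
        ],
        "modelVersion": "sample",
    },
}
phrases, errors = collect_key_phrases(sample)
print(phrases)
print(errors)
```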

In [None]:
print(json.dumps(result, indent=2))

## Human-Friendly Results

Present the extracted key phrases as a numbered table, group them by theme (exercise, nutrition, health), and summarize the key phrase density.
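
The cell below assigns phrases to themes with plain substring checks, which can also match a keyword buried inside a longer word. As an alternative sketch, not the notebook's own method, the hypothetical `categorize` helper below only matches keywords at the start of a word.

```python
import re


def categorize(phrases, themes):
    """Group phrases under each theme whose keywords start a word in the phrase."""
    grouped = {}
    for theme, keywords in themes.items():
        # \b before the keyword keeps "carb" from matching inside "bicarbonate"
        # while still matching "carbohydrates".
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, keywords)) + ")", re.IGNORECASE
        )
        grouped[theme] = [p for p in phrases if pattern.search(p)]
    return grouped


# Illustrative phrases only; real input would be the API's keyPhrases list.
example = categorize(
    ["complex carbohydrates", "sodium bicarbonate", "Cardio training"],
    {"nutrition": ["carb", "diet"], "exercise": ["cardio"]},
)
print(example)
```

A phrase can still land in more than one theme, which matches the behavior of the substring version.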

In [None]:
console = Console()

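# Extract key phrases from the first (and only) document in the result.
# A rejected document appears under "errors" instead of "documents".
analyzed_docs = result["results"]["documents"]
if not analyzed_docs:
    raise RuntimeError(f"No document analyzed: {result['results'].get('errors')}")
key_phrases = analyzed_docs[0]["keyPhrases"]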

console.print(
    Panel(
        f"[bold green]{len(key_phrases)}[/bold green] key phrases extracted",
        title="Key Phrase Extraction Results",
    )
)

# Create a table with key phrases
table = Table(title="Extracted Key Phrases")
table.add_column("#", style="cyan", justify="right", width=4)
table.add_column("Key Phrase", style="green")
table.add_column("Word Count", justify="center", style="dim")

for i, phrase in enumerate(key_phrases, 1):
    word_ct = len(phrase.split())
    table.add_row(str(i), phrase, str(word_ct))

console.print(table)

# Categorize key phrases by theme
exercise_terms = [
    p
    for p in key_phrases
    if any(
        w in p.lower()
        for w in [
            "exercise",
            "physical",
            "cardio",
            "training",
            "workout",
            "muscle",
            "fitness",
        ]
    )
]
nutrition_terms = [
    p
    for p in key_phrases
    if any(
        w in p.lower()
        for w in [
            "nutrition",
            "diet",
            "food",
            "vitamin",
            "protein",
            "carb",
            "mineral",
            "nutrient",
        ]
    )
]
health_terms = [
    p
    for p in key_phrases
    if any(
        w in p.lower()
        for w in [
            "health",
            "mental",
            "cognitive",
            "disease",
            "wellness",
            "sleep",
            "body",
        ]
    )
]

# Display categorized phrases
categories = [
    ("Exercise & Fitness", exercise_terms, "cyan"),
    ("Nutrition & Diet", nutrition_terms, "green"),
    ("Health & Wellness", health_terms, "magenta"),
]

console.print("\n")
panels = []
for title, terms, color in categories:
    if terms:
        content = "\n".join(f"• {t}" for t in terms[:8])
        if len(terms) > 8:
            content += f"\n... and {len(terms) - 8} more"
        panels.append(
            Panel(content, title=f"[{color}]{title}[/{color}]", border_style=color)
        )

if panels:
    console.print(Columns(panels, equal=True))

# Summary (guard against division by zero when the essay is empty)
density = len(key_phrases) / word_count * 100 if word_count else 0.0
summary = f"""
[bold]Document Analysis Summary:[/bold]

• Total Key Phrases: [cyan]{len(key_phrases)}[/cyan]
• Document Length: [cyan]{word_count}[/cyan] words
• Key Phrase Density: [cyan]{density:.1f}%[/cyan]

[dim]Key phrases represent the main topics and concepts discussed in the text.[/dim]
"""

console.print(Panel(summary, title="Summary"))