# Summarization - Document Summarization

https://learn.microsoft.com/en-us/azure/ai-services/language-service/summarization/overview?tabs=document-summarization&wt.mc_id=MVP_322781

The service provides summarization solutions for three types of input: plain text, conversations, and native documents.

* **Text summarization** only accepts plain text blocks. 
* **Conversation summarization** accepts conversational input, including various speech audio signals. 
* **Native document summarization** accepts documents in their native formats, such as Word, PDF, or plain text.

**Key features for text summarization**

Text summarization uses natural language processing techniques to generate a summary for plain text, which can come from a document, a conversation, or any other text. The API provides two approaches to summarization:

* **Extractive summarization**: Produces a summary by extracting salient sentences from the source text, together with the positioning information of these sentences.

    - *Multiple extracted sentences*: These sentences collectively convey the main idea of the input text. They're original sentences extracted from the input text content.
    - *Rank score*: The rank score indicates how relevant a sentence is to the main topic. Text summarization ranks extracted sentences, and you can determine whether they're returned in the order they appear, or according to their rank. For example, if you request a three-sentence summary, extractive summarization returns the three highest-scored sentences.
    - *Positional information*: The start position and length of extracted sentences.

* **Abstractive summarization**: Generates a summary with concise, coherent sentences or words that aren't verbatim extracts from the original source.

    - *Summary texts*: Abstractive summarization returns a summary for each contextual input range. A long input can be segmented so multiple groups of summary texts can be returned with their contextual input range.
    - *Contextual input range*: The range within the input that was used to generate the summary text.
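
The rank score and positional information described above can be illustrated with a short sketch. The sample values are copied from the extractive output shown in the results section of this notebook; the helper names `order_sentences` and `extract_span` are hypothetical and not part of any SDK. The slicing helper assumes offsets count Unicode code points, which may not match the service's string indexing for non-ASCII text.

```python
# Sample sentences copied from the extractive output in this notebook.
sentences = [
    {"text": "- Lay out all the tent components on the ground.",
     "rankScore": 0.94, "offset": 2147, "length": 48},
    {"text": "- Familiarize yourself with each part, including the tent body, poles, rainfly, stakes, and guy lines.",
     "rankScore": 0.95, "offset": 2196, "length": 102},
    {"text": "- Secure the tent body to the ground using stakes and guy lines as needed.",
     "rankScore": 0.96, "offset": 2679, "length": 74},
]


def order_sentences(items, by="offset"):
    """Hypothetical helper: order by document position or by descending rank."""
    if by == "rank":
        return sorted(items, key=lambda s: s["rankScore"], reverse=True)
    return sorted(items, key=lambda s: s["offset"])


def extract_span(source_text, sentence):
    """Hypothetical helper: recover a sentence from the source text.

    Assumes offset and length are measured in Unicode code points.
    """
    start = sentence["offset"]
    return source_text[start:start + sentence["length"]]


by_rank = order_sentences(sentences, by="rank")
by_position = order_sentences(sentences, by="offset")
```

Ordering by offset reads like the original document, while ordering by rank puts the most relevant sentence first.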

**Supported Documents**

| File type | File extension | Description |
|---|---|---|
| Text | .txt | An unformatted text document. |
| Adobe PDF | .pdf | A portable document file formatted document. |
| Microsoft Word | .docx | A Microsoft Word document file. |

## Load Azure Configurations

In [9]:
import os

# Load Azure configurations from environment variables
# Ensure that AZURE_AI_LANGUAGE_KEY and AZURE_AI_LANGUAGE_ENDPOINT are set in your environment
language_key = os.environ.get('AZURE_AI_LANGUAGE_KEY')
language_endpoint = os.environ.get('AZURE_AI_LANGUAGE_ENDPOINT')
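
Because `os.environ.get` returns `None` for a missing variable, a misconfigured environment only surfaces later as a confusing request failure. The following is a minimal sketch of an early check; `require_env` is a hypothetical helper introduced here for illustration, not part of the original notebook.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in place of the plain os.environ.get calls:
# language_key = require_env("AZURE_AI_LANGUAGE_KEY")
# language_endpoint = require_env("AZURE_AI_LANGUAGE_ENDPOINT")
```
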

## Run the Post Request

In [22]:
source_location_sas = "https://ziggystorage01.blob.core.windows.net/products/product_info_1.pdf?sp=r&st=2025-04-21T03:05:01Z&se=2025-04-28T11:05:01Z&spr=https&sv=2024-11-04&sr=b&sig=PFo3oL%2B6Tod%2FGcNkASIdnVT%2Fb8Y7U%2B6M5ocExkJd%2FRg%3D"
target_location_sas = "https://ziggystorage01.blob.core.windows.net/summary?sp=w&st=2025-04-21T03:05:47Z&se=2025-04-28T11:05:47Z&spr=https&sv=2024-11-04&sr=c&sig=1UEryQVoT0LfnErLARv6QR3mfBld3HJn5Cj5mt5cYiU%3D"

abstractive_data_payload = {
  "displayName": "Abstractive Summarization Example",
  "analysisInput": {
    "documents": [
      {
        "language": "en",
        "id": "Output-1",
        "source": {
          "location": source_location_sas
        },
        "target": {
          "location": target_location_sas
        }
      }
    ]
  },
  "tasks": [
    {
      "kind": "AbstractiveSummarization",
      "taskName": "Summarize Document Task 1"
    }
  ]
}

extractive_data_payload = {
  "displayName": "Extractive Summarization Example",
  "analysisInput": {
    "documents": [
      {
        "language": "en",
        "id": "Output-1",
        "source": {
          "location": source_location_sas
        },
        "target": {
          "location": target_location_sas
        }
      }
    ]
  },
  "tasks": [
    {
      "kind": "ExtractiveSummarization",
      "taskName": "Summarize Document Task 1",
      "parameters": {
        "sentenceCount": 6,
        "modelVersion": "latest"
      }
    }
  ]
}
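
The two payloads above differ only in the display name, the task kind, and the optional task parameters. As a sketch of how that shared structure could be factored out, the hypothetical helper `build_summarization_payload` below produces the same dictionary shape; its name and default arguments are assumptions made for illustration.

```python
def build_summarization_payload(kind, source_location, target_location,
                                display_name,
                                task_name="Summarize Document Task 1",
                                parameters=None,
                                language="en",
                                document_id="Output-1"):
    """Hypothetical helper: build a job payload shaped like the ones above."""
    task = {"kind": kind, "taskName": task_name}
    if parameters:
        # Only extractive summarization sets parameters in this notebook
        task["parameters"] = dict(parameters)
    return {
        "displayName": display_name,
        "analysisInput": {
            "documents": [
                {
                    "language": language,
                    "id": document_id,
                    "source": {"location": source_location},
                    "target": {"location": target_location},
                }
            ]
        },
        "tasks": [task],
    }
```

For example, `build_summarization_payload("ExtractiveSummarization", source_location_sas, target_location_sas, "Extractive Summarization Example", parameters={"sentenceCount": 6, "modelVersion": "latest"})` would reproduce the extractive payload.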

In [23]:
import requests
import json

def summarize_document(data_payload):
    """
    Submit a document summarization job to the Azure AI Language service.

    Returns the operation-location URL that is later used to fetch the job result.
    """
    # Endpoint for asynchronous document analysis jobs
    url = f"{language_endpoint}/language/analyze-documents/jobs?api-version=2024-11-15-preview"

    # Set the headers
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": language_key
    }

    # Make the POST request; json= serializes the payload to JSON
    response = requests.post(url, headers=headers, json=data_payload, timeout=30)

    # Raise on an HTTP error instead of silently returning None
    response.raise_for_status()

    # Extract the operation-location from the response headers
    return response.headers.get('operation-location')

In [24]:
abstractive_operation_location = summarize_document(abstractive_data_payload)
print(f"Abstractive operation location: {abstractive_operation_location}")

extractive_operation_location = summarize_document(extractive_data_payload)
print(f"Extractive operation location: {extractive_operation_location}")

Abstractive operation location: https://ziggylanguagedemocomplete.cognitiveservices.azure.com/language/analyze-documents/jobs/9e34c4a1-5aa2-41b4-afbc-504371c65331?api-version=2024-11-15-preview
Extractive operation location: https://ziggylanguagedemocomplete.cognitiveservices.azure.com/language/analyze-documents/jobs/9532e2d8-59e3-47e3-ba73-fe70ef05302d?api-version=2024-11-15-preview


## Get Results

In [30]:
def fetch_operation_result(operation_location):
    """
    Fetch the operation result from the operation location URL.
    """
    headers = {
        "Ocp-Apim-Subscription-Key": language_key
    }
    response = requests.get(operation_location, headers=headers)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Failed to fetch operation result. Status code: {response.status_code}, Response: {response.text}")
        return None
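
The job is submitted asynchronously, so a single fetch may return before the summary is ready. Below is a minimal polling sketch. It assumes the job response carries a top-level `status` field with terminal values such as `succeeded` or `failed`; the original notebook does not show that field, so check an actual response before relying on it. The helper `wait_for_job` and its parameters are hypothetical, and the fetch function is passed in so the loop can be exercised without network access.

```python
import time

# Assumed terminal status values; verify against a real job response.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "partiallyCompleted"}


def wait_for_job(fetch, operation_location, poll_seconds=5.0,
                 timeout_seconds=300.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(operation_location) until the job finishes.

    fetch is any callable that returns the parsed job response (or None on error).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch(operation_location)
        status = (result or {}).get("status")
        if status in TERMINAL_STATES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job did not finish in time; last status: {status}")
        sleep(poll_seconds)
```

With the function defined above, a call such as `wait_for_job(fetch_operation_result, abstractive_operation_location)` would block until the job reports a terminal status or the timeout expires.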

In [33]:
import requests
import json

def print_summary_document(json_response):
    """
    Function to print the summarized document content from the provided JSON response.
    """
    try:
        # Extract the document URL from the JSON response
        document_url = json_response['tasks']['items'][0]['results']['documents'][0]['targets'][0]['location']
        print(f"Fetching summarized document from: {document_url}")
        
        # Make a GET request to fetch the document
        response = requests.get(document_url)
        
        if response.status_code == 200:
            # Parse and pretty-print the JSON content
            document_content = response.json()
            print("Summarized Document Content:")
            print(json.dumps(document_content, indent=4))
        else:
            print(f"Failed to fetch document. Status code: {response.status_code}, Response: {response.text}")
    except (KeyError, IndexError, TypeError) as e:
        # Also covers a None response or a job that has not produced results yet
        print(f"Error extracting document URL: {e}")
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON content: {e}")
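
The function above reads only the first task, document, and target. As a sketch of a more tolerant alternative, the hypothetical helper `collect_target_locations` below walks the same `tasks` / `items` / `results` / `documents` / `targets` path and gathers every `location` it finds, returning an empty list when a level is missing. The sample URLs in the usage example are placeholders, not real service output.

```python
def collect_target_locations(job_response):
    """Hypothetical helper: gather every target location in a job response."""
    locations = []
    tasks = (job_response or {}).get("tasks", {})
    for item in tasks.get("items", []):
        documents = item.get("results", {}).get("documents", [])
        for document in documents:
            for target in document.get("targets", []):
                location = target.get("location")
                if location:
                    locations.append(location)
    return locations


# Usage with a placeholder response shaped like the path used above
sample_response = {
    "tasks": {"items": [{"results": {"documents": [
        {"targets": [{"location": "https://example.invalid/a.json"},
                     {"location": "https://example.invalid/b.json"}]}
    ]}}]}
}
found = collect_target_locations(sample_response)
```
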

In [34]:
operation_result = fetch_operation_result(abstractive_operation_location)
print_summary_document(operation_result)


Fetching summarized document from: https://ziggystorage01.blob.core.windows.net/summary/9e34c4a1-5aa2-41b4-afbc-504371c65331/AbstractiveSummarization/0001/product_info_1.json
Summarized Document Content:
{
    "summaries": [
        {
            "text": "The TrailMaster X4 Tent is a 3-season camping tent featuring polyester material, designed to accommodate 4 people with a floor area of 80 square feet. It boasts a freestanding design, waterproof construction, mesh ventilation, and reflective guy lines for improved visibility at night. The tent includes a rainfly rated for 2000mm of water protection, aluminum tent poles, and a carry bag for convenient transport. Users are advised to select suitable locations for pitching, avoid sharp objects near the tent, and adhere to regular maintenance guidels to ensure longevity, with a 2-year limited warranty covering manufacturing defects. There is a return policy in place for unused items within specified time frames, and customer support is av

In [35]:
operation_result = fetch_operation_result(extractive_operation_location)
print_summary_document(operation_result)

Fetching summarized document from: https://ziggystorage01.blob.core.windows.net/summary/9532e2d8-59e3-47e3-ba73-fe70ef05302d/ExtractiveSummarization/0001/product_info_1.json
Summarized Document Content:
{
    "id": "Output-1",
    "statistics": {
        "charactersCount": 10994,
        "transactionsCount": 11
    },
    "sentences": [
        {
            "text": "- Lay out all the tent components on the ground.",
            "rankScore": 0.94,
            "offset": 2147,
            "length": 48
        },
        {
            "text": "- Familiarize yourself with each part, including the tent body, poles, rainfly, stakes, and guy lines.",
            "rankScore": 0.95,
            "offset": 2196,
            "length": 102
        },
        {
            "text": "- Secure the tent body to the ground using stakes and guy lines as needed.",
            "rankScore": 0.96,
            "offset": 2679,
            "length": 74
        },
        {
            "text": "- If your tent inc