# Summarization - Document Summarization

https://learn.microsoft.com/en-us/azure/ai-services/language-service/summarization/overview?tabs=document-summarization&wt.mc_id=MVP_322781

The service provides summarization solutions for three types of input: plain text, conversations, and native documents.

* **Text summarization** only accepts plain text blocks.
* **Conversation summarization** accepts conversational input, including various speech audio signals.
* **Native document summarization** accepts documents in their native formats, such as Word, PDF, or plain text.

**Key features for text summarization**

Text summarization uses natural language processing techniques to generate a summary of plain text, which can come from a document, a conversation, or any other text. The API provides two summarization approaches:

* **Extractive summarization**: Produces a summary by extracting salient sentences from the source text, together with the positional information of those sentences.

    - *Multiple extracted sentences*: These sentences collectively convey the main idea of the input text. They're original sentences extracted from the input text content.
    - *Rank score*: The rank score indicates how relevant a sentence is to the main topic. Text summarization ranks extracted sentences, and you can choose whether they're returned in the order they appear or according to their rank. For example, if you request a three-sentence summary, extractive summarization returns the three highest-scored sentences.
    - *Positional information*: The start position and length of extracted sentences.

* **Abstractive summarization**: Generates a summary with concise, coherent sentences or words that aren't extracted verbatim from the original source.

    - *Summary texts*: Abstractive summarization returns a summary for each contextual input range. A long input can be segmented, so multiple groups of summary texts can be returned, each with its contextual input range.
    - *Contextual input range*: The range within the input that was used to generate the summary text.
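The rank score and positional fields described above can be used to reorder extracted sentences on the client side. The following is a minimal sketch, not part of the service API: `order_sentences` is a hypothetical helper, and the `sentences` list is made-up sample data that only mirrors the field names (`text`, `rankScore`, `offset`, `length`) seen in the extractive output.

```python
def order_sentences(sentences, by="offset"):
    """Return sentences ordered by document position or by descending rank score."""
    if by == "offset":
        # Reading order: the sentence that starts earliest comes first
        return sorted(sentences, key=lambda s: s["offset"])
    if by == "rank":
        # Relevance order: the highest rank score comes first
        return sorted(sentences, key=lambda s: s["rankScore"], reverse=True)
    raise ValueError(f"Unknown ordering: {by!r}")


# Hypothetical sample data shaped like the extractive summarization output
sentences = [
    {"text": "Second sentence.", "rankScore": 0.92, "offset": 40, "length": 16},
    {"text": "First sentence.", "rankScore": 0.87, "offset": 0, "length": 15},
    {"text": "Third sentence.", "rankScore": 1.0, "offset": 80, "length": 15},
]

for sentence in order_sentences(sentences, by="rank"):
    print(f'{sentence["rankScore"]:.2f}  {sentence["text"]}')
```

Sorting by `offset` gives a summary that reads in document order, while sorting by `rankScore` puts the most relevant sentence first.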

**Supported Documents**

| File type | File extension | Description |
|---|---|---|
| Text | `.txt` | An unformatted text document. |
| Adobe PDF | `.pdf` | A document in Portable Document Format. |
| Microsoft Word | `.docx` | A Microsoft Word document file. |

## Load Azure Configurations

In [1]:
import os

# Load Azure configurations from environment variables
# Ensure that AZURE_AI_LANGUAGE_KEY and AZURE_AI_LANGUAGE_ENDPOINT are set in your environment
language_key = os.environ.get('AZURE_AI_LANGUAGE_KEY')
language_endpoint = os.environ.get('AZURE_AI_LANGUAGE_ENDPOINT')
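Because `os.environ.get` returns `None` when a variable is missing, a misconfigured environment only surfaces later as a confusing HTTP error. One possible guard is sketched below; `require_env` is a hypothetical helper name, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For instance, `language_key = require_env("AZURE_AI_LANGUAGE_KEY")` could stand in for the plain `os.environ.get` call, and `require_env("AZURE_AI_LANGUAGE_ENDPOINT").rstrip("/")` would also avoid a double slash when the endpoint is later joined with a path.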

## Run the Post Request

In [5]:
source_location_sas_1 = "https://ziggystorage01.blob.core.windows.net/nasabooks/page-11.pdf?sp=r&st=2025-04-23T11:59:45Z&se=2025-04-30T19:59:45Z&spr=https&sv=2024-11-04&sr=b&sig=tPZ3S0vATW0EKnI4zK5WWcidfAxItYHQmXDAz9Knqbc%3D"
source_location_sas_2 = "https://ziggystorage01.blob.core.windows.net/nasabooks/page-13.pdf?sp=r&st=2025-04-23T12:01:34Z&se=2025-05-01T20:01:34Z&spr=https&sv=2024-11-04&sr=b&sig=9zCWzYqnS10HQDKW%2FkX8MnawVPIv46IOylLlNqYh14I%3D"

target_location_sas = "https://ziggystorage01.blob.core.windows.net/summary?sp=w&st=2025-04-21T03:05:47Z&se=2025-04-28T11:05:47Z&spr=https&sv=2024-11-04&sr=c&sig=1UEryQVoT0LfnErLARv6QR3mfBld3HJn5Cj5mt5cYiU%3D"

abstractive_data_payload = {
  "displayName": "Abstractive Summarization Example",
  "analysisInput": {
    "documents": [
      {
        "language": "en",
        "id": "Output-1",
        "source": {
          "location": source_location_sas_1
        },
        "target": {
          "location": target_location_sas
        }
      },
      {
        "language": "en",
        "id": "Output-2",
        "source": {
          "location": source_location_sas_2
        },
        "target": {
          "location": target_location_sas
        }
      }
    ]
  },
  "tasks": [
    {
      "kind": "AbstractiveSummarization",
      "taskName": "Summarize Document Task 1"
    }
  ]
}

extractive_data_payload = {
  "displayName": "Extractive Summarization Example",
  "analysisInput": {
    "documents": [
      {
        "language": "en",
        "id": "Output-1",
        "source": {
          "location": source_location_sas_1
        },
        "target": {
          "location": target_location_sas
        }
      },
      {
        "language": "en",
        "id": "Output-2",
        "source": {
          "location": source_location_sas_2
        },
        "target": {
          "location": target_location_sas
        }
      }
    ]
  },
  "tasks": [
    {
      "kind": "ExtractiveSummarization",
      "taskName": "Summarize Document Task 1",
      "parameters": {
        "sentenceCount": 3,
        "modelVersion": "latest"
      }
    }
  ]
}
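The two payloads above differ only in the task `kind`, the display name, and the optional `parameters`, so they could be generated from one function. The sketch below is one way to do that; `build_summarization_payload` and its arguments are hypothetical names, and the field layout simply copies the dictionaries shown above.

```python
def build_summarization_payload(kind, source_urls, target_url, parameters=None,
                                language="en", display_name=None):
    """Build a job payload with one document entry per source URL."""
    documents = [
        {
            "language": language,
            "id": f"Output-{index}",  # same id scheme as the payloads above
            "source": {"location": source_url},
            "target": {"location": target_url},
        }
        for index, source_url in enumerate(source_urls, start=1)
    ]

    task = {"kind": kind, "taskName": "Summarize Document Task 1"}
    if parameters:
        # Only the extractive payload above carries a "parameters" block
        task["parameters"] = parameters

    return {
        "displayName": display_name or f"{kind} Example",
        "analysisInput": {"documents": documents},
        "tasks": [task],
    }
```

With this helper, the extractive payload would be `build_summarization_payload("ExtractiveSummarization", [source_location_sas_1, source_location_sas_2], target_location_sas, parameters={"sentenceCount": 3, "modelVersion": "latest"})`.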

In [6]:
import requests
import json

def summarize_document(data_payload):
    """
    Submit a summarization job to the Azure AI Language service.

    Returns the operation-location URL that is polled for the job result.
    """
    # Job submission URL
    url = f"{language_endpoint}/language/analyze-documents/jobs?api-version=2024-11-15-preview"

    # Set the headers
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": language_key
    }

    # Make the POST request; json= serializes the payload to JSON
    response = requests.post(url, headers=headers, json=data_payload)

    # Fail early with the HTTP error instead of silently returning None
    response.raise_for_status()

    # Extract the operation-location from the response headers
    return response.headers.get('operation-location')

In [7]:
abstractive_operation_location = summarize_document(abstractive_data_payload)
print(f"Abstractive operation location: {abstractive_operation_location}")

extractive_operation_location = summarize_document(extractive_data_payload)
print(f"Extractive operation location: {extractive_operation_location}")

Abstractive operation location: https://ziggylanguagedemocomplete.cognitiveservices.azure.com/language/analyze-documents/jobs/fde3a726-b255-420a-8385-d57a7b9e3893?api-version=2024-11-15-preview
Extractive operation location: https://ziggylanguagedemocomplete.cognitiveservices.azure.com/language/analyze-documents/jobs/e5724eae-9707-4bdb-bcd1-9894c83aa3a6?api-version=2024-11-15-preview


## Get Results

In [8]:
def fetch_operation_result(operation_location):
    """
    Fetch the operation result from the operation location URL.
    """
    headers = {
        "Ocp-Apim-Subscription-Key": language_key
    }
    response = requests.get(operation_location, headers=headers)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Failed to fetch operation result. Status code: {response.status_code}, Response: {response.text}")
        return None
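The job runs asynchronously, so a single call to `fetch_operation_result` may return before the summaries have been written. A minimal polling sketch follows. It assumes the job response carries a top-level `status` field and that `notStarted` and `running` are the in-progress values; those names are assumptions to verify against the service documentation. `wait_for_job` is a hypothetical helper that takes the fetch function as an argument so it can be tried without network access.

```python
import time


def wait_for_job(fetch, operation_location, interval_seconds=5.0, max_attempts=60,
                 in_progress=("notStarted", "running")):
    """Call fetch(operation_location) until the job leaves the in-progress states."""
    for _ in range(max_attempts):
        result = fetch(operation_location)
        # A None result (failed request) is treated as "try again"
        if result is not None and result.get("status") not in in_progress:
            return result
        time.sleep(interval_seconds)
    raise TimeoutError(f"Job did not finish after {max_attempts} attempts")
```

In this notebook it would be called as `wait_for_job(fetch_operation_result, abstractive_operation_location)`, and the returned dictionary can then be passed to the printing step.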

In [13]:
def print_summary_document(json_response):
    """
    Function to print the summarized document content from the provided JSON response.
    """
    try:
        # Iterate over all documents in the results
        documents = json_response['tasks']['items'][0]['results']['documents']
        for document in documents:
            # Extract the document URL from the JSON response
            document_url = document['targets'][0]['location']
            print(f"Fetching summarized document from: {document_url}")
            
            # Make a GET request to fetch the document
            response = requests.get(document_url)
            
            if response.status_code == 200:
                # Parse and pretty-print the JSON content
                document_content = response.json()
                print("Summarized Document Content:")
                print(json.dumps(document_content, indent=4))
            else:
                print(f"Failed to fetch document. Status code: {response.status_code}, Response: {response.text}")
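    except (KeyError, IndexError, TypeError) as e:
        # TypeError covers a None response from a failed fetch; IndexError covers empty lists
        print(f"Error extracting document URL: {e}")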
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON content: {e}")

In [14]:
operation_result = fetch_operation_result(abstractive_operation_location)
print_summary_document(operation_result)


Fetching summarized document from: https://ziggystorage01.blob.core.windows.net/summary/fde3a726-b255-420a-8385-d57a7b9e3893/AbstractiveSummarization/0001/page-11.json
Summarized Document Content:
{
    "summaries": [
        {
            "text": "The document presents an image captured by the Aqua satellite over the Amazon rainforest in Brazil and Bolivia, illustrating how wind patterns can be inferred from cloud formations. As the sun heats the forest, water vapor is lifted, forming cumulus clouds when humid air encounters cooler air above. These formations, known as cloud streets, align with wind direction, generally appearing as straight lines but can also follow the patterns of high-pressure systems, as seen in the image. The visualization of wind through cloud patterns provides a unique insight into atmospheric movements that are otherwise invisible to the naked eye. This image serves as a proxy to understand the dynamics of wind in the region, highlighting the interaction betwe

In [15]:
operation_result = fetch_operation_result(extractive_operation_location)
print_summary_document(operation_result)

Fetching summarized document from: https://ziggystorage01.blob.core.windows.net/summary/e5724eae-9707-4bdb-bcd1-9894c83aa3a6/ExtractiveSummarization/0001/page-11.json
Summarized Document Content:
{
    "id": "Output-1",
    "statistics": {
        "charactersCount": 888,
        "transactionsCount": 1
    },
    "sentences": [
        {
            "text": "Curving Cloud Streets Brazil and Bolivia",
            "rankScore": 0.87,
            "offset": 0,
            "length": 40
        },
        {
            "text": "Acquired in June 2014 by the Aqua satellite, this image shows a broad swath of the Amazon rainforest in Brazil and Bolivia as it appeared in the early afternoon.",
            "rankScore": 0.92,
            "offset": 239,
            "length": 161
        },
        {
            "text": "Cumulus cloud streets often trace the direction, and sometimes the intensity, of winds\u2014lining up parallel to the direction of the wind.",
            "rankScore": 1.0,
           