# Summarization - Document Summarization

https://learn.microsoft.com/en-us/azure/ai-services/language-service/summarization/overview?tabs=document-summarization&wt.mc_id=MVP_322781

The service provides summarization for three kinds of input: plain text, conversations, and native documents.

* **Text summarization** only accepts plain text blocks. 
* **Conversation summarization** accepts conversational input, including various speech audio signals. 
* **Native document summarization** accepts documents in their native formats, such as Word, PDF, or plain text.

**Key features for text summarization**

Text summarization uses natural language processing techniques to generate a summary of plain text, which can come from a document, a conversation, or any other text. The API provides two approaches to summarization:

* **Extractive summarization**: Produces a summary by extracting salient sentences from the source text, together with the positioning information of these sentences.

    - *Multiple extracted sentences*: These sentences collectively convey the main idea of the input text. They're original sentences extracted from the input text content.
    - *Rank score*: The rank score indicates how relevant a sentence is to the main topic. Text summarization ranks extracted sentences, and you can choose whether they're returned in the order they appear or according to their rank. For example, if you request a three-sentence summary, extractive summarization returns the three highest-scored sentences.
    - *Positional information*: The start position and length of extracted sentences.

* **Abstractive summarization**: Generates a summary with concise, coherent sentences or words that aren't verbatim extracts from the original source.

    - *Summary texts*: Abstractive summarization returns a summary for each contextual input range. A long input can be segmented, so multiple groups of summary texts can be returned, each with its contextual input range.
    - *Contextual input range*: The range within the input that was used to generate the summary text.
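
To make the rank score and positional information concrete, here is a minimal sketch. It uses a hypothetical `Sentence` record (not the SDK's type) and made-up scores. It keeps the highest-scored sentences, shows the two orderings described above, and uses the offset and length to recover a sentence from the source text. Plain Python character indices are assumed here; the service's own offset units may differ.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical stand-in for an extracted sentence (not the SDK type)."""
    text: str
    rank_score: float  # relevance to the main topic; higher is more relevant
    offset: int        # start position in the source text
    length: int        # number of characters


def top_sentences(sentences, count, order_by="offset"):
    """Keep the `count` highest-scored sentences, then order them.

    order_by="offset" returns them in the order they appear in the source;
    order_by="rank" returns them from most to least relevant.
    """
    best = sorted(sentences, key=lambda s: s.rank_score, reverse=True)[:count]
    if order_by == "rank":
        return best
    if order_by == "offset":
        return sorted(best, key=lambda s: s.offset)
    raise ValueError(f"unknown order_by: {order_by!r}")


source = "Alpha is first. Beta is second. Gamma is third. Delta is last."
parts = ["Alpha is first.", "Beta is second.", "Gamma is third.", "Delta is last."]
scores = [0.4, 0.9, 0.1, 0.7]  # made-up scores for illustration only
sentences = [
    Sentence(text, score, source.index(text), len(text))
    for text, score in zip(parts, scores)
]

by_position = top_sentences(sentences, 3, order_by="offset")
by_rank = top_sentences(sentences, 3, order_by="rank")
print([s.text for s in by_position])
print([s.text for s in by_rank])

# Offset and length locate the sentence inside the original text.
first = by_rank[0]
print(source[first.offset:first.offset + first.length])
```

Both orderings contain the same three sentences; only their order differs.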

**Supported Documents**

| File type | File extension | Description |
|---|---|---|
| Text | `.txt` | An unformatted text document. |
| Adobe PDF | `.pdf` | A portable document file formatted document. |
| Microsoft Word | `.docx` | A Microsoft Word document file. |

## Load Azure Configurations

In [1]:
import os

# Load Azure configurations from environment variables
# Ensure that AZURE_AI_LANGUAGE_KEY and AZURE_AI_LANGUAGE_ENDPOINT are set in your environment
language_key = os.environ.get('AZURE_AI_LANGUAGE_KEY')
language_endpoint = os.environ.get('AZURE_AI_LANGUAGE_ENDPOINT')
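
`os.environ.get` returns `None` when a variable is missing, so a misconfigured environment only shows up later as a confusing request error. The sketch below is one way to fail early instead. The helper names `require_env` and `load_language_config` are hypothetical, and stripping a trailing slash from the endpoint is an assumption made so that later URL building does not produce `//`.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value


def load_language_config(environ=os.environ):
    """Read the key and endpoint; strip a trailing slash from the endpoint."""
    key = require_env("AZURE_AI_LANGUAGE_KEY", environ)
    endpoint = require_env("AZURE_AI_LANGUAGE_ENDPOINT", environ).rstrip("/")
    return key, endpoint
```

Calling `language_key, language_endpoint = load_language_config()` would then replace the two `os.environ.get` lines.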

## Run the Post Request

In [35]:
data_payload = {
    "kind": "ExtractiveSummarization",
    "parameters": {
        "sentenceCount": 6
    },
    "analysisInput": {
        "documents": [
            {
                "source": {
                    "location": "https://ziggystorage01.blob.core.windows.net/products?sp=racwdli&st=2025-04-14T09:00:15Z&se=2025-04-20T17:00:15Z&spr=https&sv=2024-11-04&sr=c&sig=WmDcKvCGHvBpR91G5fjIiPm1NdmScveJnV70GFFkpE8%3D"
                },
                "targets": {
                    "location": "https://ziggystorage01.blob.core.windows.net/summary?sp=racwdli&st=2025-04-14T08:59:43Z&se=2025-04-21T16:59:43Z&spr=https&sv=2024-11-04&sr=c&sig=YgAtzPDoyvcuwBk70GcegV9CdTDRXklzlCFjNgPvXKs%3D"
                }
            }
        ]
    }
}
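
The payload above embeds full SAS URLs in the notebook. As a sketch of an alternative, the hypothetical helper below builds a body with the same shape as `data_payload` from URLs passed in as arguments, so they can come from environment variables or a secrets store. The function name and the placeholder URLs are assumptions for illustration, and this sketch does not verify whether the service accepts this body shape.

```python
def build_summarization_payload(source_url, target_url, sentence_count=6):
    """Build a request body with the same shape as `data_payload`."""
    if sentence_count < 1:
        raise ValueError("sentence_count must be at least 1")
    return {
        "kind": "ExtractiveSummarization",
        "parameters": {"sentenceCount": sentence_count},
        "analysisInput": {
            "documents": [
                {
                    "source": {"location": source_url},
                    "targets": {"location": target_url},
                }
            ]
        },
    }


# Placeholder URLs; real values would be container SAS URLs.
example_payload = build_summarization_payload(
    "https://example.invalid/source-container",
    "https://example.invalid/target-container",
    sentence_count=3,
)
print(example_payload["parameters"])
```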

In [36]:
import requests
import json


url = f"{language_endpoint}/language/analyze-documents/jobs?api-version=2023-11-15-preview"

# Set the headers
headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": language_key
}

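# Make the POST request to the full jobs URL built above.
# Posting to the bare `language_endpoint` instead of `url` is the likely
# cause of the 404 "Resource not found" response recorded below.
response = requests.post(url, headers=headers, data=json.dumps(data_payload))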

# Print the response
print(f"Status Code: {response.status_code}")
print(f"Response: {response.text}")

Status Code: 404
Response: {"error":{"code":"404","message": "Resource not found"}}
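
Because the request path determines which operation is called, it can help to build the jobs URL in one place and pass that value to `requests.post`. The helper below is a hypothetical convenience, not part of any SDK. It strips a trailing slash from the endpoint so the path does not start with `//`, and it encodes the `api-version` query parameter. The default version string is the one used in this notebook.

```python
from urllib.parse import urlencode


def build_jobs_url(endpoint, api_version="2023-11-15-preview"):
    """Return the analyze-documents jobs URL for a Language endpoint."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/language/analyze-documents/jobs?{query}"


# Placeholder endpoint for illustration.
print(build_jobs_url("https://example.invalid/"))
```

With this helper, the request would be sent as `requests.post(build_jobs_url(language_endpoint), headers=headers, data=json.dumps(data_payload))`.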


## Begin Extract Summary function

In [54]:
def extractive_summarization(client, documents):
    """
    Performs extractive summarization on the provided documents.

    Args:
        client (TextAnalyticsClient): The authenticated Azure Text Analytics client.
        documents (list): A list of documents (strings) to summarize.

    Prints:
        Extracted sentences for each document or error messages if any.
    """
    # Start the extractive summarization process
    poller = client.begin_extract_summary(
        documents,
        max_sentence_count=4  # Limit the summary to 4 sentences
    )
    extract_summary_results = poller.result()

    # Iterate through the results and print summaries or errors
    for doc_index, document in enumerate(extract_summary_results, start=1):
        if document.kind == "ExtractiveSummarization":
            print(f"Document {doc_index} Summary:")
            for i, sentence in enumerate(document.sentences, start=1):
                print(f"  Sentence {i}: {sentence.text}")
        elif document.is_error is True:
            print(f"Document {doc_index} has an error with code '{document.error.code}' and message '{document.error.message}'")

## Begin Abstract Summary function

In [55]:
import textwrap

def abstractive_summarization(client, documents):
    """
    Performs abstractive summarization on the provided documents.

    Args:
        client (TextAnalyticsClient): The authenticated Azure Text Analytics client.
        documents (list): A list of documents (strings) to summarize.

    Prints:
        Abstractive summaries for each document or error messages if any.
    """
    # Start the abstractive summarization process
    poller = client.begin_abstract_summary(documents)
    abstract_summary_results = poller.result()

    # Iterate through the results and print summaries or errors
    for doc_index, result in enumerate(abstract_summary_results, start=1):
        if result.kind == "AbstractiveSummarization":
            print(f"Document {doc_index} Summary:")
            for summary in result.summaries:
                # Wrap text to 120 characters for better readability
                wrapped_text = textwrap.fill(summary.text, width=120)
                print(wrapped_text)
                print()  # Add a blank line for better readability
        elif result.is_error is True:
            print(f"Document {doc_index} has an error with code '{result.error.code}' and message '{result.error.message}'")

In [56]:
documents = [
    """The extractive summarization feature uses natural language processing techniques to locate key sentences in an unstructured text document. 
    These sentences collectively convey the main idea of the document. This feature is provided as an API for developers. 
    They can use it to build intelligent solutions based on the relevant information extracted to support various use cases. 
    Extractive summarization supports several languages. 
    It is based on pretrained multilingual transformer models, part of our quest for holistic representations. 
    It draws its strength from transfer learning across monolingual and harness the shared nature of languages to produce models of improved quality and efficiency.
    """,
    
    """At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. 
    As Chief Technology Officer of Azure AI Cognitive Services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. 
    In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). 
    At the intersection of all three, there's magic-what we call XYZ-code as illustrated in Figure 1-a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. 
    We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. 
    The goal is to have pretrained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. 
    Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. 
    These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multisensory and multilingual learning that is closer in line with how humans learn and understand. 
    I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."""
]
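
The calls that follow pass a `client` object that this notebook never constructs. A minimal sketch of one way to create it is shown here, assuming an installed version of the `azure-ai-textanalytics` package that provides `begin_extract_summary` and `begin_abstract_summary`. The function name `create_language_client` is hypothetical, and the environment variable names are the ones used earlier in this notebook.

```python
import os


def create_language_client(endpoint=None, key=None):
    """Create a TextAnalyticsClient from explicit values or environment variables."""
    # Imported lazily so this cell loads even if the SDK is not installed yet.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    endpoint = endpoint or os.environ["AZURE_AI_LANGUAGE_ENDPOINT"]
    key = key or os.environ["AZURE_AI_LANGUAGE_KEY"]
    return TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))


# Uncomment once the package is installed and the variables are set:
# client = create_language_client()
```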

In [57]:
extractive_summarization(client, documents)

Document 1 Summary:
  Sentence 1: The extractive summarization feature uses natural language processing techniques to locate key sentences in an unstructured text document.
  Sentence 2: This feature is provided as an API for developers.
  Sentence 3: Extractive summarization supports several languages.
  Sentence 4: It is based on pretrained multilingual transformer models, part of our quest for holistic representations.
Document 2 Summary:
  Sentence 1: At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding.
  Sentence 2: At the intersection of all three, there's magic-what we call XYZ-code as illustrated in Figure 1-a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.
  Sentence 3: We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages.
  Sentence 4

In [58]:
abstractive_summarization(client, documents)

Document 1 Summary:
The source document describes an extractive summarization API that leverages natural language processing to identify
pivotal sentences encapsulating the core message of an unstructured text. Developers can integrate this API into their
applications to access essential information across multiple languages, thanks to its foundation on multilingual
transformer models. These models, which are part of a broader initiative for comprehensive language understanding,
utilize transfer learning to enhance model performance by applying knowledge across languages. The summarization
technique not only supports various languages but also benefits from shared linguistic properties to improve efficiency
and quality of the extracted summaries. This makes the API a versatile tool for developers aiming to create intelligent
solutions that rely on distilled information from large text documents. The document highlights the API'ged capability
to process and summarize content, making it 