## Summarization

Summarization condenses a text document or dataset into a shorter version that retains the most important information. It is useful for quickly grasping the content of long documents, or for making large datasets easier to consume. There are two main approaches to summarization: extractive and abstractive.

In [1]:
# Initial setup for Azure OpenAI API
import os
import openai
from dotenv import load_dotenv

# Load environment variables from .env file
dotenv_path = os.path.join(os.path.dirname(os.getcwd()), '.env')  # Assumes .env is in the parent directory of your notebook
load_dotenv(dotenv_path)

# Access environment variables
AZURE_OPENAI_API_KEY = os.environ.get('AZURE_OPENAI_KEY')
AZURE_OPENAI_ENDPOINT = os.environ.get('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_VERSION = os.environ.get('AZURE_OPENAI_VERSION')

openai.api_type = "azure"  # API type must be "azure" for Azure OpenAI, not the API version
openai.api_key = AZURE_OPENAI_API_KEY
openai.api_base = AZURE_OPENAI_ENDPOINT
openai.api_version = AZURE_OPENAI_VERSION # this may change in the future

# Name of the deployment in the Azure resource; here it matches the model name, text-davinci-003
deployment_name = "text-davinci-003"

### 1. Extractive Summarization
Extractive summarization involves selecting whole sentences or phrases directly from the source document to form the summary. This method essentially "extracts" the most relevant parts of the original content without altering the wording. The challenge here is to identify which sentences or parts are the most informative and relevant to the main idea of the document.
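
To make the idea of "selecting whole sentences" concrete, here is a minimal sketch of a classic frequency-based extractive summarizer. It is not the method used in the cells below, which prompt a language model instead. The function names (`split_sentences`, `extractive_summary`), the tiny stop-word list, and the scoring rule (average word frequency per sentence) are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "was", "were", "it", "that", "this", "for", "on", "by", "with", "as",
}


def _content_words(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, verbatim and in their original order."""
    sentences = split_sentences(text)
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]


sample = (
    "Revenue grew strongly. "
    "Cloud revenue grew faster than revenue overall. "
    "The weather was nice."
)
print(extractive_summary(sample, num_sentences=2))
```

Every sentence in the result is copied unchanged from the input, which is the defining property of an extractive summary.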

In [2]:
# Select the prompt to use
with open('promptlibrary/extractive_summary.txt', 'r') as file:
    prompt_extractive_summary = file.read()


# Open the file with the text to summarize
with open('./data/financial.txt', 'r') as file:
    text_to_summarize = file.read()

# Build the full prompt
final_extr_prompt = f"{prompt_extractive_summary} \n\n# Start of Report \n{text_to_summarize}\n# End of Report \n\nSummary: \n"
print(final_extr_prompt)

Below is an extract from the annual financial report of a company. Extract key financial number (if present), key internal risk factors, and key external risk factors. 

# Start of Report 
Revenue increased $7.5 billion or 16%. Commercial products and cloud services revenue increased $4.0 billion or 13%. O365 Commercial revenue grew 22% driven by seat growth of 17% and higher revenue per user. Office Consumer products and cloud services revenue increased $474 million or 10% driven by Consumer subscription revenue, on a strong prior year comparable that benefited from transactional strength in Japan. Gross margin increased $6.5 billion or 18% driven by the change in estimated useful lives of our server and network equipment. 
Our competitors range in size from diversified global companies with significant research and development resources to small, specialized firms whose narrower product lines may let them be more effective in deploying technical, marketing, and financial resources. B

In [3]:
response = openai.Completion.create(
        engine = deployment_name,
        prompt = final_extr_prompt,
        temperature = 0.3,
        max_tokens = 250
    )

print(response.choices[0].text)

Key Financial Number: $7.5 billion revenue increase, $4.0 billion increase in commercial products and cloud services revenue, $474 million increase in Office Consumer products and cloud services revenue, $6.5 billion increase in gross margin. 
Key Internal Risk Factors: Competitors range in size from diversified global companies to small, specialized firms, low barriers to entry in many of our businesses, need to remain competitive by making innovative products, devices, and services. 
Key External Risk Factors: Evolving technologies, shifting user needs, frequent introductions of new products and services.


### 2. Abstractive Summarization
Abstractive summarization, on the other hand, involves generating new sentences that convey the main information from the original document. Instead of just copying parts of the source content, abstractive methods aim to understand the content and then "rephrase" or "rewrite" it in a more condensed form. This approach can lead to more natural-sounding summaries but requires more advanced natural language processing capabilities.
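
One rough way to see the difference between the two approaches is to measure how much of a summary's wording is absent from the source. The sketch below computes the share of summary bigrams that never occur in the source text: an extractive summary scores 0.0, while a rephrased one scores higher. The function names (`ngrams`, `novel_ngram_ratio`) and the choice of bigrams are illustrative assumptions, and the sample texts are made up for the example.

```python
import re


def ngrams(text, n):
    """Return the set of word n-grams in the text (lowercased)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of the summary's n-grams that do not appear in the source."""
    summary_ngrams = ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = ngrams(source, n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


source = "Revenue increased 16 percent. Gross margin increased 18 percent."
extractive = "Revenue increased 16 percent."   # copied verbatim from the source
abstractive = "Sales and margins both rose."   # rephrased in new words

print(novel_ngram_ratio(source, extractive))
print(novel_ngram_ratio(source, abstractive))
```

A high ratio only shows that the wording is new; it says nothing about whether the summary is faithful to the source.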

In [5]:
# Select the prompt to use
with open('promptlibrary/abstractive_summary.txt', 'r') as file:
    prompt_abstractive_summary = file.read()

# Open the file with the text to summarize
with open('./data/microsoft.txt', 'r') as file:
    text_to_summarize = file.read()

# Build the full prompt
final_abs_prompt = f"{prompt_abstractive_summary} \n\n# Start of Report\n{text_to_summarize}\n# End of Report \n\nSummary: \n"
print(final_abs_prompt)

Provide a summary of the text below that captures its main idea. 

# Start of Report

At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive Services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can join

In [6]:
response = openai.Completion.create(
        engine = deployment_name,
        prompt = final_abs_prompt,
        temperature = 0.3,
        max_tokens = 250
    )

print(response.choices[0].text)

Microsoft is working to advance AI by taking a more holistic, human-centric approach to learning and understanding. This approach, called XYZ-code, is a joint representation of monolingual text, audio or visual sensory signals, and multilingual signals. Over the past five years, Microsoft has achieved human performance on various AI tasks, and they believe XYZ-code will enable them to achieve a leap in AI capabilities that is closer to how humans learn and understand.
