# Text Summarization Demonstration

This notebook demonstrates text summarization in the mood-map project in two ways: by calling the model directly, and by calling the backend's summarization API.

## 1. Direct Model Usage

First, let's demonstrate how to use the text summarization model directly.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import time

In [None]:
# Check for GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device} for summarization model")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

In [None]:
# Load pre-trained summarization model
model_name = "facebook/bart-large-cnn"  # BART-large fine-tuned on CNN/DailyMail news summaries
print(f"Loading model: {model_name}...")

start_time = time.time()
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
print(f"Model loaded in {time.time() - start_time:.2f} seconds")

# Create a summarization pipeline
summarizer = pipeline(
    "summarization", 
    model=model, 
    tokenizer=tokenizer,
    device=0 if device == "cuda" else -1
)
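
The start/stop timing pattern used above recurs in several cells below. As a minimal sketch, it could be wrapped in a context manager. The name `timed` and the optional `results` dictionary are illustrative assumptions, not part of the mood-map project; `time.perf_counter()` is used here because it is intended for measuring short durations.

```python
import time
from contextlib import contextmanager
from typing import Dict, Optional


@contextmanager
def timed(label: str, results: Optional[Dict[str, float]] = None):
    """Print how long the enclosed block took; optionally store the seconds in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} took {elapsed:.2f} seconds")


# Illustration with a short sleep standing in for model loading
timings = {}
with timed("sleep", timings):
    time.sleep(0.01)
```

With the real model, the same helper could wrap a call such as `AutoModelForSeq2SeqLM.from_pretrained(model_name)`.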

In [None]:
# Sample text to summarize
article = """
Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. 
It is a field of study in computer science that develops and studies intelligent machines. Such machines may be called AIs.
AI technology is widely used throughout industry, government and science. Some high-profile applications include advanced web search engines (e.g., Google Search), 
recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Waymo), 
generative and creative tools (ChatGPT and AI art), and superhuman play and analysis in strategy games (such as chess and Go).

Artificial intelligence was founded as an academic discipline in 1956. The field went through multiple cycles of optimism followed by 
disappointment and loss of funding, but after 2012, advances in machine learning led to a resurgence, massive growth, 
and substantial industrial investment. Deep learning's growing capabilities and successes since the early 2010s laid the groundwork 
for advanced, widely available AI chatbots like Replika, ChatGPT, Google Bard, Claude, and Anthropic, 
along with image generators like DALL-E, Midjourney, and Stable Diffusion. This wave of generative AI, able to create novel text, images, 
audio, video, and code, has fueled a surge in public interest and conversation starting in 2022.

Machine learning, which investigates algorithms that can learn from data, has been central to AI since its founding. Many modern AI 
systems are based on machine learning, Large Language Models, and other statistical approaches that analyze large amounts of data.
These methods have shown successful recent application across many tasks, but they remain brittle in ways that humans are not: 
they can fail in unexpected ways, be biased, hallucinate non-factual statements, and generate toxic content.

The long-term goal of artificial intelligence research is Artificial General Intelligence (AGI), a hypothetical form of AI that 
can learn to accomplish any intellectual task that human beings or other animals can. Risks of advanced AI have been debated 
since its founding and may pose an existential risk to humanity.
"""
print(f"Original text length: {len(article.split())} words")

In [None]:
# Generate summary (max_length and min_length are measured in tokens, not words)
start_time = time.time()
summary = summarizer(article, max_length=150, min_length=40, do_sample=False)
print(f"Summarization completed in {time.time() - start_time:.2f} seconds")

print("\nSummary:")
print(summary[0]['summary_text'])
print(f"\nSummary length: {len(summary[0]['summary_text'].split())} words")
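
Raw word counts are hard to compare across texts of different sizes, so a compression ratio can be a useful complement. The sketch below is illustrative only: `compression_ratio` is a hypothetical helper, not part of the mood-map project, and it counts whitespace-separated words in the same way as the cells above.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Return the summary length as a fraction of the original length, in words."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return summary_words / original_words


# Placeholder strings for illustration (not real model output)
ratio = compression_ratio("one two three four five six seven eight", "one two")
print(f"Summary is {ratio:.0%} of the original length")
```

In this notebook it could be called as `compression_ratio(article, summary[0]['summary_text'])`.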

## 2. Using the Backend API

Now let's demonstrate how to use the summarization API endpoint.

In [None]:
import requests
import json

# Make sure your Flask backend is running
API_URL = "http://localhost:5000/summarize"

# Sample text (shorter than the previous one for variety)
sample_text = """
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to 
imitate the way that humans learn, gradually improving its accuracy. IBM has a rich history with machine learning. One of its own, 
Arthur Samuel, is credited for coining the term, "machine learning" with his research (PDF, 481 KB) (link resides outside IBM) 
around the game of checkers. Robert Nealey, the self-proclaimed checkers master, played the game on an IBM 7094 computer in 1962, 
and he lost to the computer. Compared to what can be done today, this feat seems trivial, but it's considered a major milestone in 
the field of artificial intelligence.

In the modern era, machine learning has been one of the most exciting technologies to emerge for decades and will continue to be 
a topic of discussion among industry and academic circles for a long time to come. Especially with the tremendous amount of data 
that is being produced today, from text to images and videos, across many sectors, intelligent systems that can learn from this data 
to improve their functioning and deliver better performance are bound to be incredibly valuable.
"""

# Function to make API request
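def get_summary_from_api(text, max_length=120, min_length=30, timeout=60):
    try:
        payload = {
            "text": text,
            "max_length": max_length,
            "min_length": min_length
        }

        # A timeout keeps the cell from hanging if the backend is unreachable or slow.
        # 60 seconds is an arbitrary default; summarization on CPU can take a while.
        response = requests.post(API_URL, json=payload, timeout=timeout)

        if response.status_code == 200:
            return response.json()
        else:
            return {"error": f"API request failed with status code {response.status_code}: {response.text}"}
    except (requests.RequestException, ValueError) as e:
        # Connection errors, timeouts, or a response body that is not valid JSON
        return {"error": f"Exception occurred: {e}"}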

# Note: You need to have your Flask backend running to execute this cell successfully
print("To use this cell, make sure your backend Flask server is running first!")
print("Run the backend with: python backend/app.py")

In [None]:
# Call the API (uncomment to run)
# result = get_summary_from_api(sample_text)
# print(json.dumps(result, indent=2))
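
Checking the request body before sending it can catch obvious mistakes without a round trip to the server. The following is a minimal sketch: `build_summary_payload` is a hypothetical helper, and the validation rules are assumptions made for illustration. The backend may enforce different limits. The field names (`text`, `max_length`, `min_length`) match the payload used above.

```python
def build_summary_payload(text: str, max_length: int = 120, min_length: int = 30) -> dict:
    """Validate the inputs locally and return the JSON payload for the summarize endpoint."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if min_length < 1 or max_length < 1:
        raise ValueError("lengths must be positive")
    if min_length >= max_length:
        raise ValueError("min_length must be smaller than max_length")
    return {"text": cleaned, "max_length": max_length, "min_length": min_length}


payload = build_summary_payload("  Some text to summarize.  ", max_length=50, min_length=10)
print(payload)
```

The resulting dictionary can be passed to `requests.post(API_URL, json=payload, timeout=...)` as in the function above.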

## 3. Customizing the Summary Length

Let's experiment with different summary lengths.

In [None]:
# Using the model directly for different summary lengths
def compare_summary_lengths(text):
    # (label, max_length, min_length); lengths are measured in tokens, not words
    settings = [
        ("Short", 60, 30),
        ("Medium", 120, 60),
        ("Long", 180, 90),
    ]

    print(f"Original text: {len(text.split())} words")

    for label, max_length, min_length in settings:
        result = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)
        summary_text = result[0]['summary_text']

        print(f"\n{label} summary:")
        print(summary_text)
        print(f"Length: {len(summary_text.split())} words")

# Run on the article
compare_summary_lengths(article)
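
The fixed lengths above work for a long article, but `max_length` and `min_length` count generated tokens, so a fixed `max_length` can exceed the length of a short input. One option is to derive both values from the input's token count. The sketch below assumes that approach; `scaled_lengths` and its default fractions are hypothetical choices for illustration, not recommended settings.

```python
from typing import Tuple


def scaled_lengths(input_tokens: int, max_fraction: float = 0.5, min_fraction: float = 0.2) -> Tuple[int, int]:
    """Derive (max_length, min_length) in tokens from the input's token count."""
    if input_tokens < 2:
        raise ValueError("input is too short to summarize")
    if not 0 < min_fraction < max_fraction <= 1:
        raise ValueError("need 0 < min_fraction < max_fraction <= 1")
    max_length = max(2, int(input_tokens * max_fraction))
    # Keep min_length strictly below max_length, and at least 1
    min_length = max(1, min(int(input_tokens * min_fraction), max_length - 1))
    return max_length, min_length


max_len, min_len = scaled_lengths(400)
print(f"max_length={max_len}, min_length={min_len}")
```

In this notebook the token count could be obtained with `len(tokenizer(text)["input_ids"])`, and the two values passed to `summarizer(text, max_length=max_len, min_length=min_len, do_sample=False)`.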

## 4. Combining Sentiment Analysis and Summarization

Now let's combine the two: summarize a text, then compare the sentiment of the original with the sentiment of its summary.

In [None]:
# Function to get sentiment analysis from the API
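def get_sentiment(text, timeout=30):
    try:
        sentiment_url = "http://localhost:5000/analyze"
        payload = {"text": text}
        # A timeout keeps the cell from hanging if the backend is unreachable
        response = requests.post(sentiment_url, json=payload, timeout=timeout)

        if response.status_code == 200:
            return response.json()["prediction"]
        else:
            return f"API request failed with status code {response.status_code}: {response.text}"
    except (requests.RequestException, ValueError, KeyError) as e:
        # Connection errors, timeouts, invalid JSON, or a missing "prediction" field
        return f"Exception occurred: {e}"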

# Note: You need to have your Flask backend running to execute this cell successfully
print("To use this cell, make sure your backend Flask server is running first!")

In [None]:
# Sample text with strong sentiment
text_with_sentiment = """
The new restaurant that opened downtown last week is absolutely fantastic! I had the most amazing dining experience there. 
The ambiance was elegant yet comfortable, with soft lighting and tasteful decor that created a welcoming atmosphere. 
The service was impeccable - our server was attentive, knowledgeable about the menu, and made excellent recommendations. 
The food was the real star though. Every dish was beautifully presented and burst with flavor. The chef clearly uses only 
the freshest ingredients and has mastered the art of balancing flavors and textures. I started with the truffle risotto, 
which was creamy and rich with just the right amount of truffle. For the main course, I had the pan-seared sea bass with 
a lemon butter sauce that was light yet decadent. The dessert - a chocolate soufflé with homemade vanilla ice cream - 
was pure perfection. I can't remember the last time I had such an outstanding meal from start to finish. The prices were 
reasonable considering the quality of food and overall experience. I've already made reservations to go back next week and 
can't wait to try more dishes. This place is destined to become the top dining destination in the city!
"""

# Generate summary
summary_text = summarizer(text_with_sentiment, max_length=100, min_length=30, do_sample=False)[0]['summary_text']

print("Summary:")
print(summary_text)
print(f"\nSummary length: {len(summary_text.split())} words")

# Get sentiment for both original and summary (uncomment when API is available)
# original_sentiment = get_sentiment(text_with_sentiment)
# summary_sentiment = get_sentiment(summary_text)

# print(f"\nOriginal text sentiment: {original_sentiment}")
# print(f"Summary sentiment: {summary_sentiment}")

## 5. Batch Processing Multiple Texts

Finally, let's summarize several texts in a loop, timing each one and reporting the total and average processing time.

In [None]:
# Sample texts to summarize
sample_texts = [
    """Climate change is the long-term alteration of temperature and typical weather patterns in a place. Climate change could refer to a particular location or the planet as a whole. Climate change may cause weather patterns to be less predictable. These unexpected weather patterns can make it difficult to maintain and grow crops in regions that rely on farming because expected temperature and rainfall levels can no longer be relied on. Climate change has also been connected with other damaging weather events such as more frequent and more intense hurricanes, floods, downpours, and winter storms.""",
    
    """The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The novel virus was first identified from an outbreak in Wuhan, China, in December 2019. Attempts to contain it there failed, allowing the virus to spread worldwide. The World Health Organization (WHO) declared a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March 2020. As of 27 April 2025, the pandemic had caused more than 774 million confirmed cases and 6.98 million confirmed deaths, making it one of the deadliest in history.""",
    
    """Renewable energy is useful energy that is collected from renewable resources, which are naturally replenished on a human timescale, including carbon-neutral sources like sunlight, wind, rain, tides, waves, and geothermal heat. This type of energy source stands in contrast to fossil fuels, which are being used far more quickly than they are being replenished. Renewable energy often provides energy in four important areas: electricity generation, air and water heating/cooling, transportation, and rural (off-grid) energy services."""
]

# Process each text and time it
total_start_time = time.time()

for i, text in enumerate(sample_texts):
    print(f"\nText {i+1} - Original length: {len(text.split())} words")
    
    start_time = time.time()
    summary = summarizer(text, max_length=60, min_length=20, do_sample=False)
    process_time = time.time() - start_time
    
    print(f"Summary (generated in {process_time:.2f} seconds):")
    print(summary[0]['summary_text'])
    print(f"Summary length: {len(summary[0]['summary_text'].split())} words")

total_time = time.time() - total_start_time
print(f"\nTotal processing time for {len(sample_texts)} texts: {total_time:.2f} seconds")
print(f"Average time per text: {total_time/len(sample_texts):.2f} seconds")
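
The loop above calls the model once per text. Hugging Face pipelines also accept a list of inputs and a `batch_size` argument, so several texts can be handed to the model per call. The sketch below shows the general chunking pattern independently of any model. `chunked`, `summarize_in_batches`, and the stand-in `first_sentence` summarizer are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_in_batches(
    texts: Sequence[str],
    summarize_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Apply `summarize_batch` to fixed-size chunks and return results in input order."""
    summaries: List[str] = []
    for batch in chunked(texts, batch_size):
        summaries.extend(summarize_batch(batch))
    return summaries


# Stand-in summarizer for illustration: keeps the first sentence of each text.
def first_sentence(batch: Sequence[str]) -> List[str]:
    return [text.split(". ")[0].rstrip(".") + "." for text in batch]


demo = summarize_in_batches(["A b. C d.", "E f. G h.", "I j."], first_sentence, batch_size=2)
print(demo)
```

With the real model, the callable could wrap the pipeline, for example `lambda batch: [r["summary_text"] for r in summarizer(list(batch), max_length=60, min_length=20, do_sample=False)]`, assuming the pipeline returns one dictionary per input text. Whether batching actually speeds things up depends on the hardware and the input lengths, so it is worth timing both variants.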