# AI-Powered News Classifier and Headline Detector - Demo

This notebook walks through our AI-powered news processing system: classifying articles, generating headlines, and optionally processing articles from URLs.

## Features
- **Text Classification**: Categorize news articles into Sports, Business, Sci/Tech, or World news
- **Headline Generation**: Create catchy headlines from article content
- **URL Processing**: Extract and process articles directly from web URLs

## Models Used
- **Classification**: DistilBERT fine-tuned on the AG News dataset
- **Headline Generation**: Google PEGASUS-XSum model
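
The examples below print a `category` and a `confidence` for each article. As a minimal sketch of how such values could be derived from a four-class AG News classifier, the snippet below applies a softmax to raw logits and picks the top class. It assumes that `confidence` is the softmax probability of the predicted class, which this notebook does not confirm; `logits_to_prediction` and the example logits are hypothetical.

```python
import math

# AG News label order as published with the dataset:
# 0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def logits_to_prediction(logits):
    """Convert raw classifier logits into a category and a softmax confidence.

    Hypothetical helper: the real NewsProcessor may compute this differently.
    """
    if len(logits) != len(AG_NEWS_LABELS):
        raise ValueError("expected one logit per AG News class")
    # Subtract the largest logit before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"category": AG_NEWS_LABELS[best], "confidence": probs[best]}


# Made-up logits for illustration; a real model would produce these
example = logits_to_prediction([0.2, -1.1, 0.4, 3.0])
print(f"{example['category']} ({example['confidence']:.2%})")
```

Under this assumption, a confidence near 25% would mean the model is close to guessing among the four classes.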

In [None]:
# Import required libraries
import warnings

# Silence library warnings so the demo output stays readable
warnings.filterwarnings('ignore')

# Import our news processor
from news_processor import NewsProcessor

print("Loading AI models...")
processor = NewsProcessor()
print("Models loaded successfully!")

## Example 1: Technology News Classification and Headline Generation

In [None]:
# Technology news example
tech_article = """
Apple Inc. announced today that they are releasing a new iPhone model with advanced artificial 
intelligence capabilities. The device features an improved camera system with computational 
photography, faster processing speeds with the new A17 chip, and enhanced battery life. 
The company expects this to be their best-selling product this year, with pre-orders starting 
next week. CEO Tim Cook highlighted the revolutionary AI features that will change how users 
interact with their phones. The starting price will be $999 for the base model.
"""

result = processor.process_text(tech_article, "Apple Announces New iPhone")

print("=== TECHNOLOGY NEWS EXAMPLE ===")
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Original Headline: {result['original_headline']}")
print(f"Generated Headline: {result['generated_headline']}")
print(f"Word Count: {result['word_count']}")

## Example 2: Sports News

In [None]:
# Sports news example
sports_article = """
The Los Angeles Lakers secured a thrilling victory against the Golden State Warriors last night, 
winning 118-112 in overtime at Crypto.com Arena. LeBron James led the scoring with 35 points, 
while Anthony Davis contributed 28 points and 15 rebounds. The game went into overtime after 
Stephen Curry hit a three-pointer with 15 seconds remaining in regulation. This victory puts 
the Lakers in third place in the Western Conference standings. Coach Darvin Ham praised his 
team's resilience and determination in the post-game interview.
"""

result = processor.process_text(sports_article, "Lakers Beat Warriors in Overtime")

print("=== SPORTS NEWS EXAMPLE ===")
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Original Headline: {result['original_headline']}")
print(f"Generated Headline: {result['generated_headline']}")
print(f"Word Count: {result['word_count']}")

## Example 3: Business News

In [None]:
# Business news example
business_article = """
Global stock markets experienced significant volatility today as investors reacted to new 
economic data and Federal Reserve announcements. The Dow Jones Industrial Average fell 2.3%, 
while the S&P 500 dropped 1.8%. Technology stocks were particularly affected, with major 
companies like Apple, Microsoft, and Google seeing declines of 3-4%. Analysts attribute the 
sell-off to concerns about rising interest rates and inflation fears. The Federal Reserve 
indicated they may raise rates sooner than previously expected to combat inflation.
"""

result = processor.process_text(business_article, "Stock Markets Fall on Fed News")

print("=== BUSINESS NEWS EXAMPLE ===")
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Original Headline: {result['original_headline']}")
print(f"Generated Headline: {result['generated_headline']}")
print(f"Word Count: {result['word_count']}")

## Example 4: World News

In [None]:
# World news example
world_article = """
The United Nations Security Council convened an emergency session today to address the ongoing 
humanitarian crisis in the region. Representatives from multiple countries called for immediate 
action to provide aid to affected populations. Secretary-General António Guterres emphasized 
the urgent need for international cooperation and condemned the violence against civilians. 
Several nations have already pledged financial support and resources for relief efforts. 
The situation remains fluid with diplomatic efforts continuing around the clock.
"""

result = processor.process_text(world_article, "UN Security Council Addresses Crisis")

print("=== WORLD NEWS EXAMPLE ===")
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Original Headline: {result['original_headline']}")
print(f"Generated Headline: {result['generated_headline']}")
print(f"Word Count: {result['word_count']}")
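
Examples 1 through 4 print the same five fields after each call to `process_text`. One way to avoid repeating that block is a small formatting helper, sketched below. `format_result` is a hypothetical name, and the values in `sample_result` are placeholders standing in for real model output.

```python
def format_result(label, result):
    """Build the report printed after each example as a single string.

    Expects the keys this notebook reads from process_text results.
    """
    lines = [
        f"=== {label.upper()} NEWS EXAMPLE ===",
        f"Category: {result['category']}",
        f"Confidence: {result['confidence']:.2%}",
        f"Original Headline: {result['original_headline']}",
        f"Generated Headline: {result['generated_headline']}",
        f"Word Count: {result['word_count']}",
    ]
    return "\n".join(lines)


# Placeholder values, not actual output from the models
sample_result = {
    "category": "Sports",
    "confidence": 0.91,
    "original_headline": "Lakers Beat Warriors in Overtime",
    "generated_headline": "Placeholder generated headline",
    "word_count": 98,
}

report = format_result("sports", sample_result)
print(report)
```

With a helper like this, each example cell would reduce to one `process_text` call followed by `print(format_result("sports", result))`.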

## Example 5: Multiple Headline Generation

In [None]:
# Generate multiple headline options for the same article
from models.headline_generator import HeadlineGenerator

headline_gen = HeadlineGenerator()

sample_article = """
Scientists at MIT have developed a new quantum computing breakthrough that could revolutionize 
data processing and encryption. The research team, led by Dr. Sarah Johnson, created a quantum 
processor that maintains coherence for significantly longer periods than previous designs. 
This advancement brings practical quantum computing applications closer to reality.
"""

headlines = headline_gen.generate_multiple(sample_article, num_headlines=5)

print("=== MULTIPLE HEADLINE OPTIONS ===")
for i, headline in enumerate(headlines, 1):
    print(f"{i}. {headline}")
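
Requesting several headlines from a generative model can produce candidates that differ only in case or punctuation. The sketch below shows one possible post-processing step that removes such near-duplicates while keeping the original order. `dedupe_headlines` is hypothetical and is not part of `HeadlineGenerator`; the candidate strings are made up for illustration.

```python
import re


def dedupe_headlines(candidates):
    """Drop blank and repeated headlines, ignoring case and punctuation."""
    seen = set()
    unique = []
    for headline in candidates:
        # Collapse runs of whitespace and trim the ends
        cleaned = " ".join(headline.split())
        # Comparison key: lowercase letters, digits, and spaces only
        key = re.sub(r"[^a-z0-9 ]", "", cleaned.lower()).strip()
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(cleaned)
    return unique


# Made-up candidates, not actual model output
raw_candidates = [
    "MIT team extends quantum coherence",
    "MIT Team Extends Quantum Coherence.",
    "   ",
    "Quantum processor holds coherence longer",
]

unique_headlines = dedupe_headlines(raw_candidates)
for i, headline in enumerate(unique_headlines, 1):
    print(f"{i}. {headline}")
```

The first occurrence of each headline wins, so any ranking the generator applied is preserved.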

## Example 6: Classification Only

In [None]:
# Just classify without generating headlines
test_texts = [
    "The Federal Reserve announced new interest rate policies affecting the economy.",
    "Manchester United signed a new striker for the upcoming football season.",
    "Researchers developed a new AI algorithm for medical diagnosis.",
    "The European Union announced new environmental regulations."
]

print("=== CLASSIFICATION EXAMPLES ===")
for i, text in enumerate(test_texts, 1):
    result = processor.classify_text(text)
    print(f"{i}. Text: {text[:60]}...")
    print(f"   Category: {result['category']} (Confidence: {result['confidence']:.2%})")
    print()
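
Short snippets such as the ones above can sit between categories, so it may help to separate confident predictions from those that deserve a second look. The following sketch splits results on a confidence threshold. The function name, the 0.6 threshold, and the sample scores are all assumptions chosen for illustration.

```python
def split_by_confidence(results, threshold=0.6):
    """Partition classification results into (confident, needs_review) lists.

    Each result is expected to be a dict with a 'confidence' value in [0, 1].
    """
    confident, needs_review = [], []
    for result in results:
        if result["confidence"] >= threshold:
            confident.append(result)
        else:
            needs_review.append(result)
    return confident, needs_review


# Made-up scores standing in for processor.classify_text(...) output
sample_results = [
    {"category": "Business", "confidence": 0.93},
    {"category": "Sports", "confidence": 0.88},
    {"category": "World", "confidence": 0.47},
]

confident, needs_review = split_by_confidence(sample_results)
print(f"Confident: {len(confident)}, needs review: {len(needs_review)}")
```

A suitable threshold depends on the model and the cost of a wrong label, so it is best tuned on labeled examples rather than fixed in advance.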

## Example 7: Batch Processing Visualization

In [None]:
# Process multiple articles and visualize results
import matplotlib.pyplot as plt
import pandas as pd

# Sample articles for batch processing
articles = [
    "Apple released a new MacBook with M3 chip and improved performance.",
    "The Lakers won their game against the Celtics 115-109 last night.",
    "Stock market indices fell due to concerns about inflation rates.",
    "Scientists discovered a new exoplanet in the habitable zone.",
    "The FIFA World Cup final attracted millions of viewers worldwide.",
    "Tesla announced plans to build new manufacturing facilities.",
    "The United Nations called for peace talks in the region.",
    "Google unveiled new AI capabilities for their search engine."
]

# Classify every article and collect the result dicts
results = [processor.classify_text(article) for article in articles]

# Create DataFrame for analysis
df = pd.DataFrame(results)

# Plot category distribution
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
category_counts = df['category'].value_counts()
plt.pie(category_counts.values, labels=category_counts.index, autopct='%1.1f%%')
plt.title('Distribution of Article Categories')

plt.subplot(1, 2, 2)
plt.hist(df['confidence'], bins=10, alpha=0.7, color='skyblue', edgecolor='black')
plt.xlabel('Confidence Score')
plt.ylabel('Frequency')
plt.title('Distribution of Classification Confidence')

plt.tight_layout()
plt.show()

print("\n=== BATCH PROCESSING SUMMARY ===")
print(f"Total articles processed: {len(articles)}")
print(f"Average confidence: {df['confidence'].mean():.2%}")
print(f"Most common category: {df['category'].mode()[0]}")
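
The summary above reports a single average confidence across all articles. A per-category breakdown can show whether one class is consistently harder for the classifier. The sketch below computes it with the standard library only, so it runs without pandas; `mean_confidence_by_category` and the sample scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_confidence_by_category(results):
    """Return a dict mapping each category to its average confidence."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["category"]].append(result["confidence"])
    return {category: mean(scores) for category, scores in grouped.items()}


# Made-up scores standing in for the batch results collected above
sample_batch = [
    {"category": "Sci/Tech", "confidence": 0.9},
    {"category": "Sports", "confidence": 0.8},
    {"category": "Sci/Tech", "confidence": 0.7},
]

category_means = mean_confidence_by_category(sample_batch)
for category, score in sorted(category_means.items()):
    print(f"{category}: {score:.2%}")
```

With the DataFrame from this example, `df.groupby('category')['confidence'].mean()` would give the equivalent result.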

## Example 8: URL Processing (Optional - requires an internet connection)

In [None]:
# Example of processing a real URL (uncomment to test with actual URLs)

# IMPORTANT: This requires an internet connection and a valid news article URL.
# Replace the placeholder below with an actual news article URL to test.

# sample_url = "https://www.example-news-site.com/article"
# try:
#     result = processor.process_url(sample_url)
#     print("=== URL PROCESSING EXAMPLE ===")
#     print(f"URL: {sample_url}")
#     print(f"Title: {result['original_headline']}")
#     print(f"Category: {result['category']}")
#     print(f"Generated Headline: {result['generated_headline']}")
#     print(f"Content Preview: {result['content_preview'][:200]}...")
# except Exception as e:
#     print(f"Error processing URL: {e}")

print("URL processing example is commented out.")
print("To test with real URLs, uncomment the code above and provide a valid news article URL.")
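
Before handing a URL to `process_url`, a cheap structural check can catch typos and unsupported schemes without making a network request. The sketch below uses `urllib.parse` from the standard library. `is_probable_article_url` is a hypothetical helper; it only inspects the URL's shape and does not confirm that the page exists or contains an article.

```python
from urllib.parse import urlparse


def is_probable_article_url(url):
    """Return True when the URL has an http(s) scheme and a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


candidate_urls = [
    "https://www.example-news-site.com/article",
    "ftp://example.com/file",
    "not a url",
]

for url in candidate_urls:
    status = "ok" if is_probable_article_url(url) else "skipped"
    print(f"{status}: {url}")
```

URLs that pass this check can still fail at fetch time, so the `try`/`except` around `process_url` remains necessary.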

## Summary

This notebook demonstrated the key capabilities of our AI-Powered News Classifier and Headline Detector:

1. **Classification**: Assigns each article to one of four categories (World, Sports, Business, Sci/Tech) and reports a confidence score
2. **Headline Generation**: Produces one or more candidate headlines from the article text
3. **Versatile Input**: Supports both raw text and URL processing
4. **Batch Processing**: Classifies a list of articles and summarizes the category and confidence distributions
5. **Detailed Analysis**: Returns confidence scores and metadata such as word count

### Next Steps
- Try the web interface: `streamlit run app.py`
- Use the command line: `python cli.py --help`
- Experiment with your own news articles!