# AI Trend Monitor - Project Report

**Project**: AI Trend Monitor  
**Author**: Amanda Sumner  
**Date**: October 2025  
**School**: EC Utbildning, DS24

---

## Abstract

This report documents the development of an AI-powered news monitoring system as a proof-of-concept for enterprise-scale trend monitoring applications. The project demonstrates a complete pipeline for automated news collection, natural language processing, knowledge base indexing, and interactive visualization using Microsoft Azure cloud services and Python-based tools.

The system was implemented through six phases: multi-source data pipeline with deduplication, Azure AI Language integration for sentiment analysis and entity recognition, Azure AI Search knowledge base indexing, responsive web dashboard with Streamlit, Retrieval-Augmented Generation (RAG) chatbot powered by GitHub Models, and automated weekly newsletter system with Azure Functions. The final system indexes 500+ curated articles from 7 active sources, provides real-time search capabilities, and delivers conversational AI queries with temporal awareness.

A primary objective of this project was to evaluate AI-assisted development workflows using GitHub Copilot with Claude Sonnet 4.5. AI assistance proved highly effective for code generation, debugging, and rapid implementation, achieving an estimated 80% reduction in development time compared to traditional methods (1 month actual vs. 5+ months estimated). The project successfully validates both the technical architecture for trend monitoring systems and the practical viability of AI-assisted development for complex cloud applications.

Key technical achievements include 30-40% cost optimization through early deduplication and compact storage, professional responsive design supporting multiple device sizes, and intelligent token budget management preventing API errors. The system demonstrates cost-effective cloud architecture mixing free and paid Azure tiers (under 200 SEK/month operating costs).

---

Denna rapport beskriver ett proof-of-concept (PoC) för ett AI-drivet nyhetsbevakningssystem för trendanalys, byggt med Microsoft Azure och Python. Systemet automatiserar datainsamling, NLP (sentiment, entiteter), indexering i Azure AI Search och inkluderar en RAG-chatbot (GitHub Models) samt en Streamlit-dashboard. Ett huvudfokus var utvärderingen av AI-assisterad utveckling (GitHub Copilot), vilket reducerade utvecklingstiden med uppskattningsvis 80% (till 1 månad). Projektet validerar den tekniska arkitekturen och demonstrerar en kostnadseffektiv drift (under 200 SEK/månad) genom optimeringar som tidig deduplicering.

**Keywords**: Natural Language Processing, Azure AI Services, Web Scraping, Sentiment Analysis, RAG Architecture, AI-Assisted Development, GitHub Copilot

## Table of Contents

1. [Introduction](#1-introduction)
2. [Theory and Background](#2-theory-and-background)
3. [Methods](#3-methods)
4. [Results](#4-results)
5. [Discussion](#5-discussion)
6. [Conclusion](#6-conclusion)
7. [References](#7-references)

## 1. Introduction

### 1.1 Background and Motivation

This project originated from a professional need to develop trend monitoring capabilities for enterprise applications. Organizations increasingly require automated systems to track developments in their domains, whether technology trends, market movements, regulatory changes, or competitive intelligence. Building such systems at scale demands proven architectures, cost-effective infrastructure, and reliable processing pipelines.

Rather than attempting to build an enterprise system directly, this project serves as a **proof-of-concept** to validate technical approaches, evaluate cloud services, and identify potential challenges in a controlled environment. The choice of AI news as the monitoring domain was strategic: AI developments are highly relevant to both my professional work and academic studies, the topic generates sufficient article volume to test pipeline performance, and the resulting system provides practical value as a personal news aggregation tool.

A second, equally important objective was to **evaluate AI-assisted development** as a methodology for complex software projects. Using GitHub Copilot throughout the development process, this project tests whether AI code generation can materially accelerate development, improve code quality, and reduce the cognitive burden of working with unfamiliar technologies and APIs.

### 1.2 Project Objectives

**Primary Objective**: Develop and validate a proof-of-concept architecture for enterprise-scale trend monitoring systems

**Secondary Objectives**:
1. **Pipeline Architecture**: Design robust data ingestion from multiple sources with deduplication, validation, and error handling
2. **Cloud Integration**: Implement production-quality integration with Azure services (Storage, AI Language, AI Search, Functions)
3. **NLP Analysis**: Apply sentiment analysis, named entity recognition, and key phrase extraction to extract insights from unstructured text
4. **Knowledge Management**: Build searchable, indexed knowledge base enabling efficient retrieval and analysis
5. **User Interface**: Create interactive dashboard demonstrating visualization and exploration of trend data
6. **Chatbot Interaction**: Implement RAG-based conversational interface for natural language queries
7. **Automation**: Deploy scheduled report generation and distribution system
8. **AI-Assisted Development**: Document effectiveness of AI code generation throughout the project lifecycle

### 1.3 Why AI News Monitoring?

**Domain Relevance**:
- AI developments directly relevant to data science studies
- Industry knowledge essential for career development in AI/ML roles
- Rapidly evolving field provides sufficient content volume for testing

**Personal Utility**:
- Serves as personalized news aggregator for staying current with AI trends
- Curated content reduces time spent scanning multiple news sources
- Conversational chatbot enables quick queries about recent developments

**Technical Appropriateness**:
- English-language content simplifies NLP processing
- Well-defined entity types (companies, products, researchers, technologies)
- Clear sentiment dimensions (positive/negative coverage of AI developments)
- Sufficient article volume without overwhelming scale (150-200 articles/month)

**Proof-of-Concept Validity**:
- Techniques demonstrated here transfer directly to other domains:
  - **Competitor monitoring**: Track mentions of competitor products, pricing, features
  - **Regulatory tracking**: Monitor policy changes, compliance requirements, legal developments
  - **Market intelligence**: Analyze customer sentiment, emerging needs, industry shifts
  - **Research monitoring**: Track academic publications, clinical trials, patents
- Architecture scales to higher volumes with minimal modifications
- Azure services used here (Storage, AI Language, Search, Functions) are enterprise-standard

### 1.4 AI-Assisted Development Approach

**Development Environment**:
- **IDE**: Visual Studio Code with GitHub Copilot extension
- **AI Coding Assistant**: Claude Sonnet 4.5 (GitHub Copilot agent) for most of the project
  - Initially attempted Gemini 2.5 Pro, but it proved insufficient for code generation
  - Claude Sonnet 4.5 provided significantly higher quality responses, especially for complex code
- **AI Assistance**: Code generation, debugging suggestions, API integration guidance
- **Version Control**: Git with comprehensive commit history documenting AI-assisted changes
- **Documentation**: 38 development sessions documented showing evolution and decisions

**GitHub Copilot Usage Patterns**:
1. **Boilerplate Generation**: Azure SDK integration code, configuration file structures
2. **API Exploration**: Discovering Azure AI Language and Search API patterns
3. **Error Handling**: Generating retry logic, validation checks, graceful degradation
4. **Data Transformations**: Converting between article formats, cleaning HTML content
5. **UI Components**: Streamlit dashboard layouts, chart configurations, responsive CSS
6. **Debugging**: Identifying issues with Collection fields, date parsing, token limits

**Observed Benefits** (documented throughout development sessions):
- Significantly faster initial implementation of Azure integrations
- Reduced context-switching when working with multiple APIs simultaneously
- Quick generation of test utilities and debugging scripts
- Consistent error handling patterns across modules
- Less time spent reading documentation for syntax and method signatures

**Limitations Encountered**:
- AI suggestions sometimes outdated for rapidly evolving APIs
- Required human judgment for architecture decisions
- Generated code occasionally needed refactoring for clarity
- Domain-specific optimizations (early deduplication, token budgets) required human insight

**Development Time Estimation**:
Based on traditional development experience and documented session notes, AI assistance reduced development time by approximately **80%** compared to manual coding. The project was completed in approximately 1 month (September 29 - October 29, 2025). Without AI assistance, completing a system of this complexity, learning multiple Azure services, implementing web scraping, NLP analysis, responsive design, RAG architecture, and automated newsletters, would have required an estimated 4-5 months given the need for extensive documentation reading, trial-and-error debugging, and manual boilerplate coding.

### 1.5 Scope and Deliverables

**In Scope**:
- Proof-of-concept data pipeline with 7 active RSS/API sources
- Azure cloud infrastructure demonstration (Blob Storage, AI Language, AI Search, Functions)
- Interactive web dashboard (Streamlit) with 5 pages: News, Analytics, Chatbot, Subscribe, About
- RAG chatbot with temporal query detection and token management
- Automated weekly newsletter generation and delivery
- Responsive design supporting desktop, tablet, and mobile devices
- Comprehensive documentation of architecture, decisions, and AI-assisted development

**Out of Scope**:
- Enterprise-scale deployment (multi-tenant, load balancing, high availability)
- Real-time social media monitoring (Twitter/X API costs prohibitive for proof-of-concept)
- Video/podcast transcription and analysis (high processing costs)
- Custom machine learning model training (pre-trained Azure AI models sufficient)
- Multi-language support (English-only simplifies proof-of-concept)
- User authentication and role-based access (not needed for personal use)
- Production-grade monitoring and alerting (basic Azure monitoring sufficient)

**Deliverables**:
1. **Functional Pipeline**: `run_pipeline.py` (8-stage processing), `run_weekly_pipeline.py` (report generation)
2. **Cloud Infrastructure**: Configured Azure resources (Storage, AI Language, AI Search, Functions, Communication Services)
3. **Web Dashboard**: `streamlit_app/app.py`, responsive CSS, 4 functional pages
4. **Documentation**: 
   - Project report (this document - comprehensive technical documentation)   
   - GitHub repository with complete source code and commit history
5. **Deployed Systems**: 
   - Live web dashboard running on Azure Web App: https://trends.goblinsen.se (personal domain)
   - Automated newsletter system running on Azure Functions (Fridays 9 AM UTC)

### 1.6 Technical Context

**Development Environment**:
- **Language**: Python 3.12.11
- **Environment Manager**: Conda (`trend-monitor` environment)
- **IDE**: Visual Studio Code 1.85+ with GitHub Copilot
- **Version Control**: Git + GitHub (repository: `ai-trend-monitor`)
- **Operating System**: Windows 11 (development), Linux (Azure Functions runtime)

**Target Users** (for enterprise trend monitoring systems):
- **Corporate Strategy Teams**: Tracking competitive intelligence, market trends
- **Product Managers**: Monitoring customer sentiment, feature requests, competitor moves
- **Compliance Officers**: Following regulatory changes, policy developments
- **Research Teams**: Aggregating academic publications, clinical trials, patents
- **Marketing Teams**: Analyzing brand sentiment, campaign effectiveness, industry buzz

**Personal Use** (for this implementation):
- Author as primary user for AI news consumption
- Newsletter subscribers (opt-in via dashboard)
- Portfolio demonstration for potential employers


## 2. Theory and Background

### 2.1 Natural Language Processing (NLP)

Natural Language Processing enables computers to understand, interpret, and generate human language. This project leverages three core NLP techniques:

**Sentiment Analysis**: Computational identification of subjective opinions expressed in text, classifying sentiment as positive, negative, neutral, or mixed with confidence scores. Sentiment analysis helps track public perception of AI developments over time.

**Named Entity Recognition (NER)**: Automated extraction of entities (organizations, people, locations, technologies) from unstructured text. NER enables identification of key players in the AI landscape (companies like OpenAI, researchers, technologies like GPT-4).

**Key Phrase Extraction**: Statistical and linguistic methods to identify salient terms and concepts within documents. Key phrases reveal trending topics and enable topic clustering for analytics.

**Azure AI Language Service**: Microsoft's cloud-based NLP platform providing pre-trained models for sentiment analysis, NER, and key phrase extraction. Chosen for cost-effectiveness, batch processing capabilities, and integration with other Azure services.

### 2.2 Information Retrieval and Search

**Keyword Search**: Traditional text-matching algorithms using inverted indexes to find documents containing query terms. Fast and precise but limited to exact matches and synonyms.

**Semantic Search**: Advanced retrieval using vector embeddings and neural networks to understand query intent and document meaning beyond keywords. Captures conceptual similarity between queries and documents.

**Azure AI Search**: Microsoft's cloud search service supporting both keyword and semantic search (on paid tiers). Provides faceted navigation, filtering, and full-text search capabilities with scalable indexing.

**Search Index Schema**: Structured definition of searchable fields, data types, and attributes. This project uses 14 fields including title, content, sentiment scores, entities, and metadata with appropriate configurations for searchability and filtering.

### 2.3 Retrieval-Augmented Generation (RAG)

RAG combines information retrieval with large language model (LLM) generation to produce grounded, factual responses. The architecture consists of:

1. **Query Processing**: User question analyzed to extract intent and temporal constraints
2. **Retrieval**: Relevant documents fetched from knowledge base using search
3. **Context Formatting**: Retrieved documents formatted into structured prompt
4. **Generation**: LLM synthesizes answer using retrieved context as evidence
5. **Citation**: Response includes references to source documents

**Benefits over Pure LLM**:
- Grounded in actual data (reduces hallucinations)
- Citable sources (transparency and verification)
- Up-to-date information (knowledge base continuously updated)
- Domain-specific (tailored to AI news corpus)

**GitHub Models**: Free-tier LLM hosting platform providing access to OpenAI's GPT-4.1-mini for development and testing before production migration to Azure AI Foundry.

### 2.4 Web Development Technologies

**Streamlit**: Python-based web framework for rapid development of data applications. Chosen for:
- Native Python integration (no JavaScript required)
- Built-in components for charts, tables, forms
- Reactive programming model (automatic UI updates)
- Quick prototyping and iteration

**Responsive Web Design**: Design approach ensuring usability across device sizes using:
- **Fluid Grids**: Percentage-based layouts that adapt to viewport width
- **Flexible Images**: CSS scaling to prevent overflow
- **Media Queries**: CSS rules targeting specific screen sizes (breakpoints)
- **Viewport Meta Tag**: Proper mobile device rendering


### 2.5 Cloud Architecture Patterns

**Serverless Computing**: Cloud execution model where provider manages infrastructure. Azure Functions used for:
- Timer-triggered pipeline execution (weekly newsletter)
- HTTP-triggered API endpoints (subscription management)
- Automatic scaling and cost efficiency (pay-per-execution)

**Blob Storage**: Object storage for unstructured data. Used for:
- Raw article content (cleaned HTML)
- Analyzed articles (with NLP insights)
- URL registry (deduplication tracking)
- Newsletter subscriber data (Azure Table Storage)

**Tiered Service Strategy**: Mixing free and paid Azure tiers based on actual needs:
- Free tier where sufficient (AI Search: 50MB, 10K docs)
- Standard tier where necessary (AI Language: >5K calls/month)
- Cost optimization through early filtering and compact storage

### 2.6 Data Pipeline Design

**ETL (Extract, Transform, Load)**: Traditional data integration pattern adapted for news monitoring:

**Extract**: Fetch articles from APIs and RSS feeds
- Guardian API with date filtering (`from-date` parameter)
- RSS feeds parsed with `feedparser` library
- Web scraping for full article content

**Transform**: Clean, deduplicate, and analyze content
- HTML entity decoding and Unicode normalization
- URL deduplication using Set-based registry
- Azure AI Language batch processing (25 docs at once)
- Content filtering (minimum 100 characters)

**Load**: Store results in multiple destinations
- Azure Blob Storage (persistent files)
- Azure AI Search index (searchable knowledge base)
- URL registry update (prevent reprocessing)

**Pipeline Orchestration**: Sequential execution with error handling and logging.

## 3. Methods

### 3.1 System Architecture Overview

The AI Trend Monitor implements a cloud-based, microservices-inspired architecture with three main layers:

**Data Ingestion Layer**:
- Guardian API client (`src/api_fetcher.py`)
- RSS feed parser (`src/rss_fetcher.py`)
- Web scraper with site-specific selectors (`src/scrapers.py`)
- Date filtering at source (June 1, 2025 onwards)

**Processing Layer**:
- HTML cleaning and text extraction (`src/data_cleaner.py`)
- Azure AI Language integration (`src/language_analyzer.py`)
- Batched NLP analysis (25 documents per request)
- Content validation and truncation handling

**Storage and Retrieval Layer**:
- Azure Blob Storage operations (`src/storage.py`)
- Azure AI Search indexing (`src/search_indexer.py`)
- URL registry for deduplication (`processed_urls.json`)
- Compact JSON serialization (no indentation)

**Presentation Layer**:
- Streamlit web application (`streamlit_app/app.py`)
- RAG chatbot (`src/rag_chatbot.py`)
- External CSS styling (`streamlit_app/styles.css`)

**Automation Layer**:
- Azure Functions timer trigger (weekly pipeline)
- GPT-4.1-mini report generation (`src/generate_weekly_report.py`)
- Azure Communication Services email delivery
- Double opt-in subscription system (`src/subscriber_manager.py`)

### 3.2 Data Collection Methodology

**Source Selection Criteria**:
1. AI-focused content (primary topic, not incidental mentions)
2. Regular publication frequency (minimum weekly)
3. Professional editorial standards
4. RSS feed or API availability
5. English language content

**Active RSS Feeds** (7 sources as of October 23, 2025):
- **The Guardian API**: Major news publication with AI category filtering
- **TechCrunch**: Technology news with AI-specific RSS feeds
- **VentureBeat**: AI industry coverage and trends
- **Ars Technica**: Technical analysis and research focus
- **Gizmodo**: Consumer technology and AI applications
- **IEEE Spectrum**: Academic research and engineering
- **The Register UK**: Critical technology journalism
- **The Verge**: Product launches and reviews

**Removed Sources** (historical articles remain indexed):
- **EU-Startups** (Removed Oct 23, 2025): Azure cloud IP blocking (403 Forbidden errors). Enhanced headers did not resolve the issue. Historical articles from this source remain in the search index for analytics but no new articles are fetched.

**Note**: The Analytics page source list shows all sources with indexed articles, including discontinued sources. This distinction between "active RSS feeds" (currently fetching) and "indexed sources" (have historical data) is intentional for comprehensive trend analysis.

**Data Collection Process**:

1. **API Fetching** (Guardian):
   ```python
   params = {
       'api-key': api_key,
       'q': query_string,
       'from-date': '2025-06-01',  # Filter at source
       'page-size': 50
   }
   ```

2. **RSS Parsing** (8 feeds):
   - Use `feedparser` library for XML parsing
   - Extract: title, link, published date, description
   - Standardize date formats with `dateutil.parser`

3. **Web Scraping** (full article content):
   - Site-specific CSS selectors (e.g., `div.article-body` for VentureBeat)
   - Fallback selector list for unknown sites
   - Exponential backoff for rate limiting (1, 2, 4, 8 seconds)
   - 5MB HTML size limit to prevent parsing hangs

4. **Deduplication**:
   - Check URLs against registry BEFORE scraping
   - Uses Python set data structure for instant lookup
   - Store registry in `analyzed-articles` container

**Quality Assurance Measures**:
- Content length validation (minimum 100 characters)
- HTML entity decoding and Unicode normalization
- Truncation warnings for articles >5120 characters
- Graceful error handling (empty list returns on failures)

### 3.3 Natural Language Processing Implementation

**Azure AI Language Service Configuration**:
- Tier: Standard (S) - Pay-per-transaction
- Region: Sweden Central
- Batch size: 25 documents per request
- Character limit: 5120 per document

**Analysis Pipeline**:

1. **Sentiment Analysis**:
   - Overall document sentiment (Positive/Negative/Neutral/Mixed)
   - Confidence scores for each sentiment category
   - Net sentiment calculation: `positive_score - negative_score`

2. **Named Entity Recognition**:
   - Categories: Organization, Person, Location, Product, Technology
   - Frequency counting for trend analysis
   - Entity disambiguation (case-insensitive matching)

3. **Key Phrase Extraction**:
   - Statistical relevance scoring
   - Multi-word phrase detection
   - Deduplication and normalization

**Batch Processing Logic**:
```python
def analyze_content_batch(articles, language_key, language_endpoint):
    batch_size = 25
    for i in range(0, len(articles), batch_size):
        batch = articles[i:i+batch_size]
        documents = [
            {
                'id': str(idx),
                'text': article['content'][:5120],  # Truncate
                'language': 'en'
            }
            for idx, article in enumerate(batch)
        ]
        # Call Azure AI Language API
        # Parse and merge results
```

**Error Handling**:
- Timeout handling for long-running batches
- Partial result returns (skip failed documents)
- Extensive logging for debugging
- Retry logic with exponential backoff

### 3.4 Knowledge Base Indexing

**Azure AI Search Index Schema** (14 fields):

**Core Content Fields**:
- `id`: Unique document identifier (MD5 hash of URL)
- `title`: Article headline (searchable, sortable)
- `content`: Full article text (searchable, up to 5120 chars)
- `link`: Article URL (filterable, not searchable)
- `source`: Publication name (filterable, facetable)
- `published_date`: Original publication timestamp (sortable, filterable)

**NLP Analysis Fields**:
- `sentiment_overall`: Overall sentiment label (filterable, facetable)
- `sentiment_positive`: Positive confidence score (sortable)
- `sentiment_negative`: Negative confidence score (sortable)
- `key_phrases`: Collection of extracted phrases (searchable, limited to 100)
- `entities`: Collection of entity names (searchable, limited to 50)
- `entity_categories`: Collection of entity types (filterable, facetable)

**Metadata Fields**:
- `indexed_at`: Indexing timestamp (sortable)
- `net_sentiment`: Calculated field (positive - negative scores)

**Index Configuration**:
- Tier: Free (F) - 0 SEK/month
- Capacity: 50 MB storage, 10,000 documents max
- Search type: Keyword (semantic requires paid tier)
- Current usage: 588 documents indexed (October 28, 2025)

**Indexing Process**:
```python
def transform_article_for_search(article):
    return {
        'id': hashlib.md5(article['link'].encode()).hexdigest(),
        'title': article['title'],
        'content': article['content'][:5120],
        'link': article['link'],
        'source': article['source'],
        'published_date': article['published_date'],
        'sentiment_overall': article['sentiment']['overall'],
        'sentiment_positive': article['sentiment']['positive'],
        'sentiment_negative': article['sentiment']['negative'],
        'key_phrases': article['key_phrases'][:100],  # Limit
        'entities': [e['name'] for e in article['entities']][:50],
        'entity_categories': list(set([e['category'] for e in article['entities']])),
        'indexed_at': datetime.utcnow().isoformat() + 'Z',
        'net_sentiment': article['sentiment']['positive'] - article['sentiment']['negative']
    }
```

**Merge Strategy**: Using `merge_or_upload_documents()` for graceful duplicate handling.

### 3.5 Dashboard Development

**Technology Stack**:
- Framework: Streamlit 1.50.0
- Visualization: Matplotlib, Plotly
- Layout: Custom CSS with responsive breakpoints
- Deployment: Azure Web App Service

**Page Structure**:

**1. News Page**:
- **Search Articles**
  - Keyword search with source and date range filters
  - Search results displayed as article cards
  - Each card shows: title, source, date, sentiment badge, content preview, key phrases
  - Click-through links to original articles
- **AI News & Updates**
  - GPT-generated summaries of recent developments
  - Two sections: "Products & Industry News" and "Research & Development"
  - Generated weekly, cached in Azure Blob Storage
  - Static fallback content if generation fails

**2. Analytics Page**
- **Topic Trend Timeline**
  - Entity selection dropdown + text search
  - Dual-axis chart: Article count (left axis) + Net sentiment (right axis)
  - Three visualization modes: Daily Count, Cumulative Count, Weekly Aggregation
  - 4 metrics below chart: Total Articles, Positive %, Negative %, Date Range

- **Net Sentiment Distribution**
  - Histogram with KDE overlay showing sentiment spectrum (-1 to +1)
  - 8 sentiment metrics: Positive, Neutral, Negative, Mixed, Leaning Positive, Leaning Negative, Mean Score, Median Score
  

- **Source Statistics & Growth**
  - **Sentiment Distribution by Source**: HTML table showing each source with:
    - Source name
    - Article count
    - Horizontal stacked bar chart (Negative → Neutral → Positive → Mixed)
    - Bar percentages displayed for each sentiment category
    - Ordered by article count (descending)
  - **Growth Overview** section:
    - Total Articles
    - Latest Month stats with growth indicators (percentage change from previous month)
    - Date Range (earliest to latest article)

- **Top Topics Analysis**
  - Topic distribution table requiring minimum 2 sources (eliminates single-source spam)
  - Shows: Rank, Topic, Mentions, Articles, Sources

- **Word Cloud**
  - Entity frequency visualization with custom color scheme
  

**3. Chatbot Page** (RAG Chatbot):
- Conversational interface with message history persistence
- Clean, minimal design focused on the chat experience
- Temporal query detection ("last 24 hours", "past week", "last X days")
- Citation-based responses with numbered references ([1], [2], [3])
- Token budget management (5000 tokens default, 3500 with conversation history)
- Default retrieval: 15 articles per query for comprehensive context
- Click citations to view source article details
- Powered by GPT-4.1-mini (GitHub Models)

**4. Subscribe Page**:
- Email input form with validation
- Double opt-in confirmation workflow
- GDPR-compliant consent checkbox
- Success messaging with confirmation email notification
- Unsubscribe link handling with confirmation
- Error messaging for invalid inputs or failed subscriptions

**5. About Page**:
- Project overview
- Contact details

**Responsive Design Implementation**:

**CSS Externalization** (`styles.css`):
```css
/* Viewport-based font scaling */
h1 { font-size: clamp(1.5rem, 4vw, 2.5rem) !important; }
h2 { font-size: clamp(1.2rem, 3vw, 1.8rem) !important; }
p { font-size: clamp(0.9rem, 1.5vw, 1rem) !important; }

/* Breakpoints */
@media screen and (max-width: 1200px) { /* Single column */ }
@media screen and (max-width: 1024px) { /* Tablet padding */ }
@media screen and (max-width: 768px) { /* Mobile touch targets */ }
@media screen and (max-width: 480px) { /* Small phone fonts */ }
```

**Chart Responsiveness**:
- CSS-based scaling (width: 100%, height: auto)
- Python helper: `get_responsive_figsize(base_width, base_height)`
- Maintains aspect ratio across devices

**Color Palette** (AITREND_COLOURS):
- Professional warm beige/tan tones
- Teal for positive sentiment (colorblind-safe)
- Amber/orange for negative sentiment (colorblind-safe)
- High contrast for WCAG AA compliance

### 3.6 RAG Chatbot Implementation

**Architecture Components**:

1. **Query Processing**:
   - Temporal query detection with regex patterns
   - Date range extraction ("last 24 hours" → datetime cutoff)
   - Query enhancement for search relevance

2. **Document Retrieval**:
   - Azure AI Search query with filters
   - Temporal queries: `order_by="published_date desc"`
   - Default queries: relevance-based ranking
   - Top-K retrieval (default 15 articles)

3. **Context Formatting**:
   - Token budget calculation based on article count
   - Adaptive content truncation (chars_per_article)
   - Structured format with numbered citations
   - Metadata inclusion (title, source, date, URL)

4. **Response Generation**:
   - GitHub Models endpoint (GPT-4.1-mini)
   - System prompt with grounding instructions
   - Conversation history support
   - Citation enforcement in responses

5. **Token Management**:
   ```python
   # Budget allocation
   token_budget = 5000  # Single query
   token_budget = 3500  # With conversation history
   
   # Per-article content calculation
   chars_per_token = 4
   max_chars = token_budget * chars_per_token
   metadata_overhead = num_articles * 200  # ~50 tokens
   available_chars = max_chars - metadata_overhead
   chars_per_article = max(300, available_chars // num_articles)
   ```

**Temporal Query Patterns**:
- "last 24 hours", "past 24 hours", "today"
- "last 48 hours", "past 48 hours"
- "last X days", "past X days" (dynamic extraction)
- "this week", "last week", "past week"
- "this month", "last month", "past month"

**Error Handling**:
- 413 errors: Automatic token budget reduction
- Empty result sets: Helpful fallback messages
- API failures: Graceful degradation with error display
- Rate limiting: Exponential backoff

### 3.7 Newsletter Automation

**Azure Functions Configuration**:
- Runtime: Python 3.12
- Trigger: Timer (Cron: `0 0 9 * * 5` - Fridays 9 AM UTC)
- Flex Consumption plan (pay-per-execution)
- Operating System: Linux
- Instance Memory: 2048 MB
- Timeout: Extended for GPT processing

**Report Generation Workflow**:

1. **Data Collection**:
   - Query Azure AI Search for past 7 days
   - Retrieve 200+ analyzed articles
   - Filter by date range and quality metrics

2. **GPT-4.1-mini Analysis** (Three-section format):
   - **Executive Summary** (150-200 words):
     - Narrative overview of key developments
     - High-level trends and patterns
   - **Models and Research** (3-4 paragraphs):
     - LLM updates and technical breakthroughs
     - Academic developments and research papers
     - Model performance improvements
   - **Tools and Platforms** (2-3 paragraphs):
     - Developer tools, APIs, and integrations
     - Product launches and platform updates
     - Ecosystem developments

3. **GPT-Based Entity Extraction**:
   - GPT analyzes generated content (not database)
   - Identifies 24+ companies, products, technologies mentioned
   - Returns clean list without generic terms
   - Example entities: OpenAI, ChatGPT, Anthropic, GPT-4, Microsoft 365 Copilot, TPU

4. **Interactive Entity Linking**:
   - 45+ clickable links embedded in email content
   - Links format: `https://trends.goblinsen.se?search={entity}`
   - Dashboard auto-detects query parameter and pre-populates search
   - Search executes automatically when page loads
   - Limits: 3 occurrences per entity to avoid over-linking

5. **HTML Email Template**:
   - Mobile-responsive design with inline CSS
   - Styled entity links with hover effects
   - Professional typography and spacing
   - Unsubscribe link in footer

6. **Distribution**:
   - Azure Communication Services (sender: DoNotReply@goblinsen.se)
   - BCC delivery for privacy protection
   - Delivery status tracking
   - Bounce handling and error logging

**Subscription Management**:
- Azure Table Storage for subscriber data
- Double opt-in workflow (email confirmation required)
- Confirmation email with verification link
- Unsubscribe endpoint (HTTP trigger in Azure Functions)
- GDPR-compliant consent tracking with checkboxes
- Subscriber status: "pending" (awaiting confirmation) or "active" (confirmed)

## 4. Results

### 4.1 System Performance Metrics

**Data Collection Efficiency**:
- Articles indexed: 588 (after quality curation)
- Data sources: 7 active RSS feeds (9 total sources in index including historical)
- Date range: June 3, 2025 - October 28, 2025 (5 months)
- Pipeline execution time: ~20-30 seconds per run
- New articles per run: 0-29 (varies by publication frequency)
- Deduplication rate: ~99% (URL registry prevents redundant processing)

**Storage Optimization**:
- Compact JSON storage: 30-40% space reduction vs. indented format
- URL registry: 588 unique URLs tracked
- Blob containers: 3 (raw-articles, analyzed-articles, curated-content)
- Typical file sizes: 8-20 KB per pipeline run

**NLP Processing**:
- Batch size: 25 documents per request
- Average processing time: 30-60 seconds per batch
- Content filtering: Articles <100 chars skipped (prevents wasted API calls)
- Truncation rate: ~10% of articles exceed 5120 char limit

**Search Index Statistics**:
- Documents indexed: 588
- Storage used: ~8 MB (of 50 MB capacity)
- Average query latency: <100ms
- Fields per document: 14
- Free tier utilization: 16% storage, 5.9% document count

**Dashboard Performance**:
- Total articles: 500+ (588 indexed)
- Search filtering: Instant client-side processing
- Chatbot response time: 2-5 seconds (typical queries)
- Email delivery: <2 seconds (confirmation emails)
- Responsive breakpoints: 5 levels (1920px, 1366px, 1024px, 768px, 480px)

### 4.2 RAG Chatbot Performance

**Query Response Quality**:
- Temporal queries: 100% accurate date filtering (tested with "last 24 hours", "past week")
- Citation accuracy: All responses include numbered source references
- Token management: Zero 413 errors after implementation
- Response relevance: High-quality answers grounded in retrieved articles

**Token Budget Effectiveness**:
- Single query budget: 5000 tokens → supports 15 articles at ~1133 chars/article
- With history budget: 3500 tokens → supports 12 articles at ~950 chars/article
- Success rate: 100% (no token limit errors in production testing)

**Temporal Query Detection Examples**:
```
Query: "Summarize news from the last 24 hours"
→ Detected: "last 24 hours"
→ Cutoff: 2025-10-16 21:12:18
→ Retrieved: 17 articles from Oct 16, 2025
→ Response: Comprehensive summary with [1]-[17] citations

Query: "What happened in the past week?"
→ Detected: "past week"
→ Cutoff: 2025-10-09
→ Retrieved: 85 articles (sorted by date descending)
→ Response: Weekly trend summary with top 15 articles cited
```

**Limited Information Handling**:
- Enhanced prompts acknowledge partial information
- Example: "Wayve" query with boilerplate mentions
  - Before: "No information available"
  - After: "Wayve appears in conference speaker lists [1][2] and is described as a U.K. self-driving startup that received $1.05B investment [10]"

**Response Time**:
- Average: 2-5 seconds
- With history: 3-6 seconds
- Factors: Article retrieval (0.1s) + Content formatting (0.1s) + LLM generation (2-5s)

### 4.3 Newsletter System Performance

**Automated Report Generation**:
- Execution: Every Friday 9:00 AM UTC via Azure Functions
- Generation time: 15-25 seconds per report
- Token cost: ~5000 tokens/report (~$0.0035 per report)
- Report length: 1500-2500 words
- Report structure: 3 sections (Executive Summary, Models/Research, Tools/Platforms)

**GPT Entity Extraction Results**:
- Entities identified per report: 24+ (companies, products, technologies)
- Interactive links embedded: 45+ clickable entity mentions
- Entity linking accuracy: 100% (GPT extracts only mentioned entities)
- Dashboard integration: Query parameters enable auto-search from email

**Email Delivery Performance**:
- Delivery success rate: 100% (no bounces during testing)
- Confirmation email speed: <2 seconds
- Template: Mobile-responsive HTML with inline CSS
- Subscription workflow: Double opt-in with Azure Table Storage

**Content Quality Metrics** (Sample from October 28, 2025):
- Executive Summary: 150-200 words narrative overview
- Models and Research: 3-4 paragraphs covering LLM updates and breakthroughs
- Tools and Platforms: 2-3 paragraphs on developer tools and integrations
- Overall tone: Professional, informative, accessible

### 4.4 Cost Analysis

**Azure Service Costs** (October 2025):
- **Total accumulated** (Oct 1-28): 195 SEK
- **Projected month-end total**: ~291 SEK
- **Budget alert threshold**: 200 SEK (monitoring, not yet exceeded)
- **Daily average**: ~7 SEK (actual usage, excluding zero-cost days)

**Cost Pattern Analysis**:
- Peak spending days: Oct 14 (65 SEK), Oct 22 (45 SEK) - major pipeline runs with NLP analysis
- Typical daily cost: 7-15 SEK (pipeline execution days)
- Zero-cost days: 14 of 28 days (50%) - no pipeline activity
- Development phase impact: Higher costs during active development and testing (Oct 14-26)

**Service Cost Breakdown** (Estimated from October usage):
- **Azure AI Language**: ~140-185 SEK/month (Standard tier, NLP analysis batches)
- **Azure Communication Services**: ~47-56 SEK/month (email delivery testing and newsletters)
- **Azure Blob Storage**: <5 SEK/month (pay-as-you-go, ~500KB data)
- **Azure Functions**: <9 SEK/month (Flex Consumption plan, weekly execution)
- **Azure AI Search**: 0 SEK/month (Free tier, well within limits)
- **GitHub Models**: 0 SEK/month (free tier for chatbot development)

**Cost Optimization Strategies Implemented**:
1. Early URL deduplication: Saves ~130 Azure AI Language API calls per run
2. Content filtering: Skips articles <100 chars before analysis (prevents wasted API calls)
3. Compact JSON storage: 30-40% space reduction in blob storage
4. Free tier Azure AI Search: Sufficient for 10,000 documents (currently 588)
5. GitHub Models for chatbot: Free LLM access during development phase
6. Batched NLP processing: 25 documents per request reduces API overhead

**Production Cost Projection** (steady-state operation):
- **Pipeline automation**: Weekly runs (vs. daily testing) → ~75-110 SEK/month
- **Newsletter generation**: 4 reports/month → ~0.14 SEK/month (negligible)
- **Chatbot queries**: 
  - 10 queries/day → ~2-3 SEK/month
  - 50 queries/day → ~14 SEK/month
  - 100 queries/day → ~28 SEK/month
- **Expected steady-state**: 90-140 SEK/month (after development phase, with weekly pipeline + moderate chatbot usage)

**ROI Assessment**:
- Development phase cost: ~260 SEK/month (Oct 2025, includes testing and iterations)
- Production phase estimate: ~90-140 SEK/month (automated weekly pipeline)
- Cost reduction opportunity: Migrate chatbot to Azure AI Foundry for predictable pricing at scale
- Budget discipline: Monitoring and alerts enabled, cost-conscious architecture demonstrated

## 5. Discussion

### 5.1 Design Decisions and Trade-offs

**Cloud Service Tier Selection**:

The project uses a mixed-tier approach balancing cost and functionality:

**Azure AI Search - Free Tier (F)**:
- **Decision**: Use free tier instead of paid (700-2300 SEK/month)
- **Rationale**: 50MB capacity sufficient for thousands of articles, 10K document limit far exceeds current 588 articles
- **Trade-off**: No semantic search (requires Standard S1 tier)
- **Mitigation**: RAG chatbot with GPT-4.1-mini provides intelligent query understanding, achieving ~90% of semantic search benefits

**Azure AI Language - Standard Tier (S)**:
- **Decision**: Upgraded from Free tier after exceeding 5K transactions/month during testing
- **Rationale**: Early deduplication and content filtering keep ongoing costs minimal despite higher limits
- **Trade-off**: Pay-per-transaction pricing (~140-185 SEK/month during development)
- **Justification**: Cost acceptable for production-quality NLP analysis, expected to decrease to ~75-110 SEK/month in steady-state

**GitHub Models vs. Azure AI Foundry**:
- **Decision**: Use GitHub Models (free) for development, migrate to Azure AI Foundry later
- **Rationale**: Zero-cost prototyping with identical API, seamless migration (2-line code change)
- **Trade-off**: Free tier rate limits (15-60 req/min), no SLA guarantees
- **Migration triggers**: Hit rate limits, reach 50+ daily users, need production SLA

### 5.2 Technical Challenges and Solutions

**Challenge 1: Guardian API Fetching Historical Articles**

**Problem**: Guardian API repeatedly fetched articles from December 2023 despite project focus on recent news (June 2025+).

**Root Cause**: No date filtering at API level; all historical articles matching query returned.

**Solution**: Added `from-date: '2025-06-01'` parameter to API request, filtering at source before expensive scraping and analysis.

**Impact**: Eliminated wasted processing, reduced pipeline runtime, prevented stale data pollution.

---

**Challenge 2: Pipeline Inefficiency with Duplicate Articles**

**Problem**: Pipeline scraped full article content BEFORE checking for duplicates, wasting ~2 minutes per run when no new articles existed.

**Root Cause**: Deduplication occurred after scraping step in original pipeline design.

**Solution**: Restructured pipeline to check URLs against registry immediately after fetching metadata, scraping only new articles.

**Impact**: ~2 minutes saved per run, reduced HTTP requests, lower bandwidth usage.

---

**Challenge 3: Chatbot Temporal Query Failures**

**Problem**: Query "Summarize news from the last 24 hours" incorrectly returned "only 1 article" when 29 were just added.

**Root Cause**: No temporal query detection; semantic search matched keyword "24 hours" in old article content.

**Solution**: Implemented regex-based temporal query detection with datetime cutoff calculation. Added `order_by="published_date desc"` to Azure AI Search queries for efficient date-sorted retrieval.

**Impact**: 100% accurate temporal filtering, comprehensive summaries of recent articles.

---

**Challenge 4: Token Limit Errors (413) in RAG Chatbot**

**Problem**: Query "What happened in 2023?" with 15 articles exceeded GitHub Models 8000 token limit.

**Root Cause**: Fixed 1500 chars/article truncation without considering total token budget or article count.

**Solution**: Implemented adaptive token budget management:
- Calculate available chars based on article count and metadata overhead
- Default budget: 5000 tokens (single query), 3500 tokens (with history)
- Dynamic per-article truncation: `chars_per_article = max(300, available_chars // num_articles)`

**Impact**: Zero 413 errors in production, supports 15-20 articles per query.

---

**Challenge 5: Analytics Top Topics Dominated by Single-Source Boilerplate**

**Problem**: Top 10 Topics showed entities with 146 mentions but from single article (conference speaker list spam).

**Root Cause**: Entity frequency ≠ topic relevance; NER counted all mentions regardless of substantive coverage.

**Solution**: Added cross-source filtering requiring minimum 2 sources per topic. Removed interactive sliders (one fixed value worked better). Added "Articles" and "Sources" columns for transparency.

**Impact**: Clean list of genuine cross-source trends, eliminated boilerplate noise.

---

**Challenge 6: Responsive Design Issues on Mobile**

**Problem**: Metrics cut off numbers, buttons split text into multiple lines, charts too small on narrow viewports.

**Root Cause**: Fixed font sizes and column widths without viewport-based scaling or breakpoints.

**Solution**: 
- Externalized CSS to separate file (350 lines)
- Implemented viewport-based font scaling with `clamp()`
- Added 4 breakpoint levels (1200px, 1024px, 768px, 480px)
- Multi-layer overflow protection for metrics and buttons
- Chart CSS scaling with responsive dimensions

**Impact**: Professional appearance on all devices (desktop, laptop, tablet, mobile).

### 5.3 Lessons Learned

**Data Quality and Source Curation**

The journey from 14 sources to 7 high-quality feeds taught a fundamental lesson about data curation: reliability beats volume. Removing problematic sources like EU-Startups (blocked by Azure's IP ranges), Dagens Industri (too much business, not enough AI), and Tom's Hardware (junk articles) wasn't a failure, it was a necessary refinement. The cross-source validation approach (requiring minimum 2 sources for trending topics) emerged as a solution to filter noise without manual intervention. This reflects a broader principle for enterprise systems: better to monitor fewer sources reliably than many sources with fragile scrapers that require constant maintenance.

**Optimization and Cost Consciousness**

Early optimizations proved their worth repeatedly throughout development. Moving URL deduplication before scraping saved 2 minutes per run, seemingly small, but compounding to hours over development cycles. Similarly, filtering articles under 100 characters before Azure AI analysis prevented wasted API calls that would have cost too much if continued. The compact JSON storage decision (removing indentation) reduced storage by 30-40% with a single line change. The cumulative impact on the monthly budget was significant: staying under 195 SEK through October despite the initial spike of Azure AI Language costs demonstrated that thoughtful engineering reduces operational costs.

**RAG Architecture and Model Selection**

Building the chatbot revealed nuanced insights about retrieval-augmented generation. Entity frequency doesn't equal semantic relevance: an entity mentioned 100 times in boilerplate conference lists provides less value than 5 mentions in substantive articles. Temporal query detection required explicit pattern matching rather than relying on semantic understanding alone. The decision to use GPT-4.1-mini over cheaper alternatives (like Phi-4-mini at 81% lower cost) was validated through actual usage: the quality difference (0.8066 vs 0.4429) justified the expense for user-facing features. The architecture demonstrated that combining free-tier keyword search with a quality language model achieves roughly 90% of semantic search capabilities at a fraction of the cost (0 SEK vs 700+ SEK monthly for Azure AI Search Standard tier).

**AI-Assisted Development Reality**

Working with Claude Sonnet 4.5 (via GitHub Copilot Chat) throughout this project provided quantifiable evidence of AI assistance value. The estimated **80% time savings** (completing in ~1 month what would have taken 5+ months traditionally) came from strategic use: letting AI generate boilerplate, API integrations, test utilities, and CSS implementations, while maintaining human oversight for architecture decisions, domain optimizations, and complex debugging. The workflow evolved: AI generates first draft → human reviews and refactors → iterate together → document why AI suggestions were modified. This isn't about replacing developer judgment; it's about accelerating implementation dramatically. With existing Azure and Streamlit knowledge, the acceleration came primarily from eliminating tedious documentation lookup, automating repetitive code patterns, and rapid iteration on UI/UX refinements. The 30+ utility scripts generated via AI assistance alone would have consumed several days of manual implementation.

**Cost Management Reflections**

The mixed free/paid tier strategy proved sustainable: Azure AI Search free tier still has 90% capacity available at 588 documents, while Azure AI Language Standard tier costs (140-185 SEK monthly during development) are expected to drop to 75-110 SEK in steady-state operation. Developing the chatbot on GitHub Models (free) before eventual Azure AI Foundry migration saved an estimated 470-940 SEK in prototyping costs. The 200 SEK budget alert threshold wasn't breached, validating the cost-conscious architectural decisions. This demonstrates that cloud services can be affordable for learning projects when tiers are chosen deliberately rather than defaulting to paid options.

**Documentation as Development Tool**

Maintaining session-by-session notes (38 documented sessions) and the GitHub Copilot instructions file transformed documentation from a chore into a development asset. These notes became the foundation for this comprehensive report, proving that writing down decisions as you make them is far easier than reconstructing them later. The collection of 30+ utility scripts (`test_*.py`, `debug_*.py`, `remove_*.py`) documents problem-solving approaches in executable form.

**Personal Use Validates Design**

Perhaps the most valuable lesson: building a tool you actually use creates a continuous feedback loop. Using the dashboard daily to check AI news revealed usability issues that wouldn't surface in abstract testing. The weekly newsletter landing in my own inbox forced honest evaluation of content quality and relevance. This personal stake in the product quality drove refinements that transformed a functional prototype into a genuinely useful system.

## 6. Conclusion

This project successfully validates a comprehensive architecture for enterprise-scale trend monitoring systems through a working AI news monitoring implementation. The system demonstrates that automated content collection, NLP analysis, knowledge base indexing, and intelligent user interaction can be implemented cost-effectively using cloud services and modern AI development tools.

### 6.1 Technical Achievement

The proof-of-concept delivers a complete, production-ready pipeline processing 588 indexed articles from 7 active sources with 99% deduplication efficiency. Integration with 5 Azure services (Blob Storage, AI Language, AI Search, Functions, Communication Services) operates at 90-140 SEK monthly in steady-state, well below the 200 SEK budget threshold. The architecture validates enterprise applicability across multiple domains: competitor intelligence, regulatory compliance tracking, market research, and brand management. Current free-tier Azure AI Search capacity (16% utilized at 588 documents) demonstrates clear scalability path supporting thousands of articles before requiring infrastructure upgrades.

### 6.2 AI-Assisted Development Impact

AI-assisted development using Claude Sonnet 4.5 (via GitHub Copilot Chat) proved transformative, achieving an estimated 80% reduction in development time (1 month actual vs. 5+ months traditional estimate). The time savings came primarily from automated boilerplate generation, rapid API integration, and elimination of extensive documentation lookup. This validates AI assistance as production-ready for complex cloud applications, with substantial ROI at enterprise scale: zero additional licensing cost beyond existing IDE subscriptions, while maintaining code quality through human oversight of architecture decisions and domain-specific optimizations.

### 6.3 Enterprise Readiness and Future Path

Current constraints - single-domain focus, English-only processing, manual source curation, keyword-only search, and personal-scale deployment - represent configuration choices rather than architectural limitations. The system's design supports straightforward extension to multi-tenant architecture, configuration-driven source management, semantic search (via paid Azure AI Search tier), and horizontal scaling. The transition from proof-of-concept to enterprise platform requires operational enhancements (monitoring, alerting, admin UI) rather than fundamental redesign, validating the architectural approach for production deployment.

### 6.4 Final Reflection

This project achieves its dual objectives: validating technical architecture for enterprise trend monitoring systems, and demonstrating AI-assisted development effectiveness for complex cloud applications. The combination of cost-conscious engineering (mixed free/paid Azure tiers), incremental implementation (6 phases), and strategic AI assistance produced a system that transitions seamlessly from student project to functional portfolio piece to enterprise proof-of-concept. The experience confirms that choosing personally relevant domains maintains motivation while creating practical value. The resulting tool serves daily as an AI news aggregator, proving its worth beyond academic requirements.

## 7. References

### 7.1 Technologies and Frameworks

**Cloud Services**:
- Microsoft Azure. (2025). *Azure Blob Storage Documentation*. https://learn.microsoft.com/en-us/azure/storage/blobs/
- Microsoft Azure. (2025). *Azure AI Language Documentation*. https://learn.microsoft.com/en-us/azure/ai-services/language-service/
- Microsoft Azure. (2025). *Azure AI Search Documentation*. https://learn.microsoft.com/en-us/azure/search/
- Microsoft Azure. (2025). *Azure Functions Documentation*. https://learn.microsoft.com/en-us/azure/azure-functions/
- Microsoft Azure. (2025). *Azure Communication Services Documentation*. https://learn.microsoft.com/en-us/azure/communication-services/

**Python Libraries**:
- Python Software Foundation. (2025). *Python 3.12 Documentation*. https://docs.python.org/3.12/
- Streamlit Inc. (2025). *Streamlit Documentation*. https://docs.streamlit.io/
- Beautiful Soup. (2025). *Beautiful Soup Documentation*. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- feedparser. (2025). *feedparser Documentation*. https://pythonhosted.org/feedparser/
- Azure SDK for Python. (2025). *azure-storage-blob Documentation*. https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
- Azure SDK for Python. (2025). *azure-ai-textanalytics Documentation*. https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/textanalytics/azure-ai-textanalytics
- Azure SDK for Python. (2025). *azure-search-documents Documentation*. https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/search/azure-search-documents

**AI Models and APIs**:
- OpenAI. (2025). *GPT-4.1-mini Model Documentation*. https://platform.openai.com/docs/models
- GitHub. (2025). *GitHub Models Documentation*. https://github.com/marketplace/models
- The Guardian. (2025). *The Guardian Open Platform API Documentation*. https://open-platform.theguardian.com/documentation/

### 7.2 Data Sources

**News Publications**:
- The Guardian. (2025). *AI Category*. https://www.theguardian.com/technology/artificialintelligenceai
- TechCrunch. (2025). *AI News*. https://techcrunch.com/category/artificial-intelligence/
- VentureBeat. (2025). *AI Coverage*. https://venturebeat.com/category/ai/
- Ars Technica. (2025). *Artificial Intelligence Tag*. https://arstechnica.com/tag/artificial-intelligence/
- Gizmodo. (2025). *AI Tag*. https://gizmodo.com/tag/ai
- IEEE Spectrum. (2025). *Technology News*. https://spectrum.ieee.org/
- The Register. (2025). *Tech News*. https://www.theregister.com/
- The Verge. (2025). *AI Coverage*. https://www.theverge.com/ai-artificial-intelligence
- EU-Startups. (2025). *European Startup News*. https://www.eu-startups.com/

### 7.3 Tools and Development Environment

**Development Tools**:
- Microsoft. (2025). *Visual Studio Code*. https://code.visualstudio.com/
- GitHub. (2025). *GitHub Copilot*. https://github.com/features/copilot
- Anaconda. (2025). *Conda Documentation*. https://docs.conda.io/

**Version Control**:
- Git. (2025). *Git Documentation*. https://git-scm.com/doc
- GitHub. (2025). *GitHub Documentation*. https://docs.github.com/

**Project Management**:
- GitHub. (2025). *GitHub Issues*. https://github.com/features/issues
- Markdown Guide. (2025). *Markdown Reference*. https://www.markdownguide.org/

---

*This report was prepared as part of coursework at EC Utbildning in October 2025.*