# AI News Analyzer

Keeping up with AI news is overwhelming: new models, funding rounds, research breakthroughs, and industry shifts arrive every day. This notebook automates the catch-up by scraping a configurable list of AI news websites, asking a local LLM to pick out the most important stories, and rendering a clean structured digest, so you can stay informed in minutes instead of hours.

## What this tool does
- Scrapes multiple AI news sources automatically
- Extracts the top 10 most important AI stories
- Generates a clean structured digest with headlines, summaries and sources
- Runs completely locally using Ollama — no API costs, no data leaving your machine

## How to Use
1. Make sure Ollama is running locally on your machine
2. Pull any Ollama-supported model, for example `ollama pull mistral` or `ollama pull llama3.2`. Any other model from [Ollama's model library](https://ollama.com/library) works too.
3. Update the `model` parameter in the last cell to match your chosen model
4. Add or remove news sources in the `news_sources` list
5. Run all cells
6. Read your AI news digest!

## Requirements
- Ollama installed and running locally
- Any Ollama-supported model pulled and ready
- Install dependencies: `uv pip install beautifulsoup4 requests openai`

---
*Built as a Day 2 homework extension for the Udemy course: AI Engineer Core Track — LLM Engineering, RAG, QLoRA, Agents by Ed Donner.*

In [None]:
# Install required libraries
!uv pip install beautifulsoup4 requests openai

In [None]:
# Verify Ollama is running locally (the root endpoint should answer with "Ollama is running")
import requests
requests.get("http://localhost:11434", timeout=5).text
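
The bare request above raises an exception if nothing is listening on the port, which can be confusing on a first run. A minimal sketch of a friendlier check follows. The helper name `ollama_is_running` is hypothetical, and the sketch uses only the standard library instead of `requests` (an assumption made to keep it dependency-free).

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=5):
    """Return True if an HTTP server answers at base_url with status 200.

    Hypothetical helper: it only checks that something responds on the port,
    not that the responder is actually Ollama.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, URLError and HTTPError are OSError subclasses
        return False


# Usage in a notebook cell:
#     print("Ollama reachable:", ollama_is_running())
```

Returning a boolean instead of raising makes it easy to print a hint such as "start Ollama first" before running the remaining cells.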

In [None]:
# Pull your preferred model — you can use any Ollama supported model
!ollama pull mistral
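
After pulling, it can help to confirm that the name passed as `model` later in the notebook is actually installed. The sketch below assumes Ollama's `/api/tags` endpoint, which lists local models as JSON entries with a `name` field such as `mistral:latest`, and assumes an untagged name resolves to the `latest` tag. Both helper names are hypothetical.

```python
import json
import urllib.request


def model_is_pulled(tags_payload, model):
    """Check a parsed /api/tags payload for a model name.

    Assumption: an untagged name such as "mistral" means "mistral:latest".
    """
    wanted = model if ":" in model else f"{model}:latest"
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return wanted in names


def fetch_local_models(base_url="http://localhost:11434", timeout=5):
    """Fetch and parse the JSON payload from /api/tags (needs Ollama running)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline illustration with a hand-written payload (not real Ollama output)
sample = {"models": [{"name": "mistral:latest"}, {"name": "llama3.2:1b"}]}
print(model_is_pulled(sample, "mistral"))
print(model_is_pulled(sample, "llama3.2"))
```

With Ollama running, `model_is_pulled(fetch_local_models(), "mistral")` performs the same check against the real model list.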

In [None]:
from bs4 import BeautifulSoup          # parses HTML content from scraped web pages
from openai import OpenAI              # connects to Ollama using OpenAI compatible endpoint
from IPython.display import Markdown, display  # renders the news digest as formatted Markdown
from datetime import datetime          # gets today's date for the digest header

In [None]:
# Connect to local Ollama using OpenAI compatible endpoint
OLLAMA_BASE_URL = "http://localhost:11434/v1"

ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

In [None]:
# Scrapes headlines and paragraph text from a given news website URL

def scrape_news(url):
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail on 4xx/5xx so error pages are not scraped as news
    soup = BeautifulSoup(response.content, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    
    # Extract article headlines from h1, h2, h3 tags
    titles = []
    for tag in soup.find_all(['h1', 'h2', 'h3']):
        title = tag.get_text().strip()
        if len(title) > 20:  # filter out short irrelevant headings
            titles.append(title)
    
    # Also get paragraph text as backup
    paragraphs = soup.find_all("p")
    text = " ".join([p.get_text() for p in paragraphs])
    
    # Combine headlines and text
    combined = "ARTICLE HEADLINES:\n" + "\n".join(titles[:30]) + "\n\nARTICLE TEXT:\n" + text[:20000]
    return combined
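
Listing pages may repeat the same headline in several places (for example a hero card and a sidebar), and `get_text()` can leave line breaks inside a title. A minimal sketch of a clean-up step that could be applied to `titles` before the `[:30]` slice is shown below. The helper name `dedupe_headlines` is hypothetical and the sample headlines are made up.

```python
def dedupe_headlines(titles):
    """Collapse whitespace and drop repeated headlines, keeping first-seen order."""
    seen = set()
    unique = []
    for title in titles:
        cleaned = " ".join(title.split())  # collapse newlines and runs of spaces
        key = cleaned.casefold()           # case-insensitive comparison key
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Made-up headlines for illustration only
raw = [
    "Lab releases  new open model",
    "lab releases new open model",
    "Chip startup\nraises funding round",
]
print(dedupe_headlines(raw))
# → ['Lab releases new open model', 'Chip startup raises funding round']
```

Removing duplicates before the slice means the 30 headline slots go to distinct stories.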

In [None]:
# Add or remove news sources as needed
news_sources = [
    "https://news.ycombinator.com",
    "https://techcrunch.com/category/artificial-intelligence",
    "https://venturebeat.com/ai"
]

all_news = ""
for url in news_sources:
    print(f"Scraping: {url}")
    try:
        content = scrape_news(url)
        all_news += f"\nSource: {url}\n{content}\n"
        print("Done!")
    except Exception as e:
        print(f"Error scraping {url}: {e}")

print("\nAll scraping complete!")
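
Each source can contribute up to 20,000 characters of paragraph text plus headlines, so three sources may produce a prompt of roughly 60,000 characters. Depending on the model and the context length configured in Ollama, that may be more than the model can read, and the overflow may be dropped. The sketch below caps the total size by giving each source an equal share. The helper name `fit_to_budget` and the default of 12,000 characters are illustrative assumptions, not values taken from any model's documentation.

```python
def fit_to_budget(sections, max_chars=12000):
    """Join per-source text, giving each source an equal share of max_chars.

    sections maps a source URL to its scraped text.
    """
    if not sections:
        return ""
    share = max_chars // len(sections)
    parts = []
    for url, text in sections.items():
        header = f"\nSource: {url}\n"
        # Reserve room for the header and the trailing newline
        body_budget = max(share - len(header) - 1, 0)
        parts.append(header + text[:body_budget] + "\n")
    return "".join(parts)


# Placeholder inputs for illustration only
demo = {"https://a.example": "1" * 10000, "https://b.example": "2" * 50}
trimmed = fit_to_budget(demo, max_chars=200)
print(len(trimmed))
```

To use it, the scraping loop would collect results into a dict keyed by URL instead of appending to one string, then call `fit_to_budget` once at the end.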

In [None]:
# Get today's date for the digest header
today = datetime.today().strftime("%B %d, %Y")

# System prompt instructs Ollama how to format the news digest
system_prompt = f"""
You are an AI news analyst. Extract the TOP 10 most important AI stories from the content the user provides.
Do not add any introduction or preamble. Start directly with the digest.
Every story must be complete: do not include a story unless it has a Summary, a Why it matters line and a Source.
Source should contain only the website name, with no URLs.
Each field (Headline, Summary, Why it matters, Source) must be on its own separate line.
Format your response EXACTLY as shown below:


TOP AI NEWS TODAY — {today}


1. **[Headline]**\n\n**Summary:** 2-3 sentences explaining what happened\n\n**Why it matters:** one sentence on significance\n\n**Source:** [website name]\n\n---

2. **[Headline]**\n\n**Summary:** ...\n\n**Why it matters:** ...\n\n**Source:** ...\n\n---

Focus only on AI related stories. Ignore unrelated content.
Keep it concise and factual. No emojis.
"""

# Send scraped news to Ollama and display the digest

response = ollama.chat.completions.create(
    model="mistral",       # CHANGE THIS to your preferred Ollama model e.g. llama3.2, mistral, gemma
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Here is today's AI news:\n\n{all_news}"}
    ]
)

display(Markdown(response.choices[0].message.content))
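
Local models may not always follow the formatting instructions, so a quick check of the returned text can flag stories that are missing a field. The sketch below is a minimal validator under the assumption that stories start with a numbered bold headline, as the system prompt requests. The helper name `incomplete_stories` is hypothetical and the sample digest is hand-written.

```python
import re


def incomplete_stories(digest):
    """Return the numbers of stories missing any of the required fields."""
    required = ("**Summary:**", "**Why it matters:**", "**Source:**")
    # Split at the start of each line that opens a numbered story, e.g. "1. **Headline**"
    blocks = re.split(r"(?m)^(?=\d+\.\s+\*\*)", digest)
    missing = []
    for block in blocks:
        match = re.match(r"(\d+)\.", block)
        if match and not all(field in block for field in required):
            missing.append(int(match.group(1)))
    return missing


# Hand-written sample, not real model output
sample_digest = (
    "TOP AI NEWS TODAY\n\n"
    "1. **First headline**\n\n**Summary:** text\n\n"
    "**Why it matters:** text\n\n**Source:** Site A\n\n---\n\n"
    "2. **Second headline**\n\n**Summary:** text\n\n---\n"
)
print(incomplete_stories(sample_digest))
# → [2]
```

In the notebook this could be called as `incomplete_stories(response.choices[0].message.content)` before displaying the digest.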