## Kaggle Notebook Structure for RAG-based News Sentiment Analysis

This notebook scrapes news headlines, URLs, and timestamps from KLSEScreener,
fetches the content of each article published within a defined time window,
analyzes the text with the Gemini API to produce a short summary and a sentiment label,
extracts the related stock table when one is present, and sends results and errors to a Discord webhook.

### Install and import libraries

In [55]:
# Install google-genai SDK
!pip install -U -q google-genai

# Install web scraping packages
!pip install -U -q beautifulsoup4

# Install misc. packages
!pip install -U -q requests pandas python-dateutil

In [56]:
from google import genai
from google.genai import types

import json
import traceback
import requests
import pandas as pd

# For scraping news articles
from bs4 import BeautifulSoup

# To handle relative URLs
from urllib.parse import urlparse, urljoin

import time
import pytz
import datetime
from datetime import timedelta

import re # For potentially parsing Gemini output
import os # Might still be useful for environment variables

genai.__version__

'1.14.0'

### Set up your API key

In [57]:
# Before running, set up your Gemini API key in Kaggle Secrets
# Example: Accessing a secret named 'GEMINI_API_KEY'
try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    GEMINI_API_KEY = user_secrets.get_secret("GEMINI_API_KEY")    
    DISCORD_WEBHOOK = user_secrets.get_secret("DISCORD_WEBHOOK")
    print("Successfully retrieved Gemini API Key and Discord Webhooks from Kaggle Secrets.")

except Exception as e:
    print(f"Kaggle Secrets not found or API key/Webhooks not set: {e}")
    print("Please ensure the secrets 'GEMINI_API_KEY', 'DISCORD_WEBHOOK' are correctly named and added.")
    
    # Fallback values for local testing
    GEMINI_API_KEY = "YOUR_API_KEY"
    DISCORD_WEBHOOK = "YOUR_DISCORD_WEBHOOK_URL"
    if GEMINI_API_KEY == "YOUR_API_KEY" or DISCORD_WEBHOOK == "YOUR_DISCORD_WEBHOOK_URL":
        print("WARNING: Using placeholder API Key or Webhook URLs. Replace or set up Kaggle Secrets.")

DISCORD_BOT_USERNAME = "Stock News Bot"
DISCORD_MESSAGE_CHAR_LIMIT = 1950 # Slightly less than 2000 to be safe

Successfully retrieved Gemini API Key and Discord Webhooks from Kaggle Secrets.
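
For runs outside Kaggle, the placeholder fallback above leaves both values unusable. One possible alternative is to read the same names from environment variables. This is a minimal sketch, assuming the variables are exported in the shell; `load_secret` is a hypothetical helper and is not part of the notebook.

```python
import os

def load_secret(name, placeholder):
    """Return the environment variable `name`, or `placeholder` if it is unset or blank.

    Hypothetical helper for local runs; on Kaggle the notebook uses UserSecretsClient.
    """
    value = os.environ.get(name, "").strip()
    return value if value else placeholder

# Example: falls back to the placeholder when the variable is not exported.
local_api_key = load_secret("GEMINI_API_KEY", "YOUR_API_KEY")
local_webhook = load_secret("DISCORD_WEBHOOK", "YOUR_DISCORD_WEBHOOK_URL")
```

Because the later cells already treat values starting with `YOUR_` as "not configured", returning the placeholder keeps those checks working unchanged.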


### Automated retry

In [58]:
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
    genai.models.Models.generate_content = retry.Retry(predicate=is_retriable)(genai.models.Models.generate_content)
    print("Retry logic applied to generate_content.")
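
The cell above wraps `generate_content` so that rate-limit (429) and service-unavailable (503) errors are retried rather than raised immediately. The idea can be illustrated without any SDK. The sketch below is a simplified stand-in, not the behavior of `google.api_core.retry.Retry` itself: `FakeAPIError`, `is_retriable_sketch`, and `call_with_retry` are hypothetical names, and the delays and attempt count are arbitrary illustration values.

```python
import time

class FakeAPIError(Exception):
    """Stand-in for an API error; only the `code` attribute matters here."""
    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code

def is_retriable_sketch(exc):
    # Same shape as the predicate above: retry only on 429 or 503.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}

def call_with_retry(func, predicate, max_attempts=4, initial_delay=0.01, multiplier=2.0):
    """Call func(); on a retriable exception, sleep and retry with a growing delay."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors the predicate rejects.
            if attempt == max_attempts or not predicate(exc):
                raise
            time.sleep(delay)
            delay *= multiplier

# Usage: a function that fails twice with 429 and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return "ok"

result = call_with_retry(flaky, is_retriable_sketch)
```

Errors that the predicate rejects (for example a 400) propagate on the first attempt, which is why the predicate is restricted to transient status codes.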

### Gemini API Configuration

In [59]:
# Initialize client variable
client = None 
selected_model_name = 'gemini-1.5-flash-001'

# Configure the Gemini client library
try:
    if GEMINI_API_KEY and GEMINI_API_KEY != "YOUR_API_KEY":
        client = genai.Client(api_key=GEMINI_API_KEY)
        print(f"Successfully configured Gemini client to use model '{selected_model_name}'.")
    else:
        print("Gemini API key is missing or seems to be the placeholder.")
        print("Please ensure the 'GEMINI_API_KEY' secret is set correctly in Kaggle.")

except Exception as e:
    print(f"Error configuring Gemini API: {e}")
    client = None

Successfully configured Gemini client to use model 'gemini-1.5-flash-001'.


### Web Scraper Configuration

In [60]:
# --- KLSEScreener Source Configuration ---
KLSESCREENER_NEWS_URL = "https://www.klsescreener.com/v2/news"  # Base URL for fetching news

# --- CSS Selectors for News List Page ---
NEWS_ITEM_SELECTOR = "div.item"          # Element containing one news item
HEADLINE_SELECTOR = "h2.figcaption > a"  # Element for the headline text within a news item
URL_SELECTOR = "h2.figcaption > a"       # Element for the link (<a> tag) within a news item
URL_ATTRIBUTE = "href"                   # Attribute of the link tag that holds the URL

# --- Timestamp Configuration ---
TIMESTAMP_SELECTOR = "span.moment-date"  # Element containing the timestamp
TIMESTAMP_ATTRIBUTE = "data-date"        # Attribute with ISO timestamp (e.g., 2025-05-03T15:41:30+08:00)

# --- Article Content Configuration ---
ARTICLE_CONTENT_SELECTOR = "div.content.text-justify"  # Main article content container
STOCK_TABLE_SELECTOR = "div.stock-list > table"  # Selector for the stock table

# --- Processing Parameters ---
PROCESSING_TIME_WINDOW_HOURS = 1  # Only process articles published in the last hour
REQUESTS_TIMEOUT = 15             # Maximum time (seconds) to wait for HTTP responses

# --- HTTP Request Headers ---
# Custom User-Agent to mimic a browser and avoid being blocked by websites
REQUESTS_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# --- Gemini Prompt Configuration ---
# Multi-task prompt template for Gemini API
# Parameters:
# - {url}: The article URL
# - {article_text}: The extracted article content
GEMINI_PROMPT_TEMPLATE = """
Analyze the following news article text obtained from the URL {url}. 
Perform these tasks:
1.  **Summary:** Provide a concise 1-2 sentence summary of the main points.
2.  **Sentiment:** Classify the overall sentiment of the article as 'positive', 'negative', or 'neutral'.

Format your response *exactly* like this, with each field on a new line:
Summary: [Your summary here]
Sentiment: [Positive/Negative/Neutral]

Article Text:
{article_text}
"""

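The `data-date` attribute configured above is expected to hold an ISO 8601 timestamp with a UTC offset, and `PROCESSING_TIME_WINDOW_HOURS` decides which articles are recent enough to process. A minimal sketch of that check using only the standard library follows. `is_within_window` and the fixed `+08:00` offset are assumptions made for illustration; the notebook itself uses `pytz` for the timezone.

```python
from datetime import datetime, timedelta, timezone

# Fixed +08:00 offset, assumed here to match the offset in the site's timestamps.
LOCAL_TZ = timezone(timedelta(hours=8))

def is_within_window(timestamp_str, now, window_hours=1):
    """Return True if the ISO 8601 timestamp is at or after `now - window_hours`."""
    published = datetime.fromisoformat(timestamp_str)
    if published.tzinfo is None:
        # Naive timestamps are assumed to be in local time.
        published = published.replace(tzinfo=LOCAL_TZ)
    return published >= now - timedelta(hours=window_hours)

now = datetime(2025, 5, 3, 16, 0, 0, tzinfo=LOCAL_TZ)
recent = is_within_window("2025-05-03T15:41:30+08:00", now)  # 18.5 minutes old
stale = is_within_window("2025-05-03T14:30:00+08:00", now)   # 90 minutes old
```

Keeping both sides of the comparison timezone-aware avoids the `TypeError` that Python raises when a naive and an aware `datetime` are compared.
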
### Helper Functions

In [61]:
def send_to_discord(webhook_url, message_content=None, embed=None):
    """
    Sends a message or an embed to a specified Discord webhook URL.

    Args:
        webhook_url (str): The Discord webhook URL.
        message_content (str, optional): The plain text message content. Defaults to None.
        embed (dict, optional): A Discord embed object dictionary. Defaults to None.

    Returns:
        bool: True if the message was sent successfully (status 2xx), False otherwise.

    Raises:
        No exceptions - all errors are caught and handled internally
    """
    # Validate inputs
    if not webhook_url or webhook_url.startswith("YOUR_"):
        print("  Skipping Discord notification: Webhook URL is not configured.")
        return False

    if not message_content and not embed:
        print("  Skipping Discord notification: No content or embed provided.")
        return False

    # Prepare payload
    payload = {}
    if message_content:
        # Discord rejects content over 2000 characters; truncate to the configured safety limit
        payload['content'] = message_content[:DISCORD_MESSAGE_CHAR_LIMIT]

    if embed:
        payload['embeds'] = [embed]

    try:
        # Send the request
        response = requests.post(
            webhook_url,
            json=payload,
            headers={'Content-Type': 'application/json'},
            timeout=REQUESTS_TIMEOUT
        )
        response.raise_for_status() # Raises HTTPError for bad responses (4xx or 5xx)
        print(f"  Successfully sent notification to Discord (Status: {response.status_code}).")
        return True

    except requests.exceptions.RequestException as e:
        print(f"  Error sending notification to Discord: {type(e).__name__}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"  Response status code: {e.response.status_code}")
            print(f"  Response text: {e.response.text}")
        return False

    except Exception as e:
        print(f"  An unexpected error occurred while sending to Discord: {type(e).__name__} - {str(e)}")
        return False

def scrape_klsescreener_news(base_url):
    """
    Scrapes the KLSEScreener news list page for headlines, absolute URLs,
    and timestamps.

    Args:
        base_url (str): The URL of the KLSEScreener news list page.

    Returns:
        list: A list of dictionaries, each containing 'headline', 'url' (absolute),
              and 'timestamp' (datetime object), or an empty list if scraping fails.
              Returns None for timestamp if parsing fails for an item.
    """
    news_list = []
    print(f"Scraping news list from: {base_url}")
    
    try:
        # Fetch the page
        response = requests.get(base_url, headers=REQUESTS_HEADERS, timeout=REQUESTS_TIMEOUT)
        response.raise_for_status()

        # Parse the HTML
        soup = BeautifulSoup(response.text, 'html.parser')
        news_items = soup.select(NEWS_ITEM_SELECTOR)
        print(f"Found {len(news_items)} potential news items.")

        # Extract data from each news item
        for item in news_items:
            # Get elements containing our data
            headline_tag = item.select_one(HEADLINE_SELECTOR)
            url_tag = item.select_one(URL_SELECTOR)
            timestamp_tag = item.select_one(TIMESTAMP_SELECTOR)

            # Initialize variables
            headline = None
            absolute_url = None
            timestamp_dt = None

            # Extract headline
            if headline_tag:
                headline = headline_tag.get_text(strip=True)
            else:
                print("  Missing headline tag in news item. Skipping...")
                continue

            # Extract and normalize URL
            if url_tag and URL_ATTRIBUTE in url_tag.attrs:
                relative_url = url_tag[URL_ATTRIBUTE]
                absolute_url = urljoin(base_url, relative_url)
            else:
                print("  Missing URL tag in news item. Skipping...")
                continue

            # Extract and parse timestamp
            if timestamp_tag and TIMESTAMP_ATTRIBUTE in timestamp_tag.attrs:
                timestamp_str = timestamp_tag[TIMESTAMP_ATTRIBUTE]
                try:
                    # Parse ISO 8601 format (e.g., 'YYYY-MM-DDTHH:MM:SS+HH:MM'), keeping the UTC offset
                    timestamp_dt = datetime.datetime.fromisoformat(timestamp_str)
                except ValueError:
                    # Fallback: Standard format ('YYYY-MM-DD HH:MM:SS')
                    try:
                        timestamp_dt = datetime.datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
                        print(f"  Info: Parsed timestamp '{timestamp_str}' using fallback format '%Y-%m-%d %H:%M:%S'.")
                    except ValueError:
                        # If both formats fail, log a warning
                        print(f"  Warning: Could not parse timestamp '{timestamp_str}' using ISO or fallback format. Skipping timestamp for this news item.")
                        timestamp_dt = None
            else:
                # Log missing timestamp
                print(f"  Missing timestamp for news item: {headline}")
                timestamp_dt = None

            # Add the item to our results list
            if headline and absolute_url:
                news_list.append({
                    'headline': headline,
                    'url': absolute_url,
                    'timestamp': timestamp_dt
                })

    except requests.exceptions.RequestException as e:
        error_message = f"Failed to fetch KLSEScreener news page {base_url}. Error: {str(e)}"
        print(error_message)
        # send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return []
        
    except Exception as e:
        error_message = f"Failed to parse KLSEScreener news page {base_url}. Error: {str(e)}"
        print(error_message)
        # send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return []

    print(f"Successfully scraped {len(news_list)} news items.")

    # Remove duplicates (based on URL)
    unique_news = []
    seen_urls = set()
    for item in news_list:
        if item['url'] not in seen_urls:
            unique_news.append(item)
            seen_urls.add(item['url'])
    
    # Report duplicates if any were found
    duplicates = len(news_list) - len(unique_news)
    if duplicates > 0:
        print(f"  Removed {duplicates} duplicate URLs from results")
        
    return unique_news


def fetch_and_parse_article(url):
    """
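    Fetches an article page and parses its main content and, if present,
    its related stock table using BeautifulSoup.

    Args:
        url (str): The URL of the article to fetch
        
    Returns:
        tuple: (article_text, article_domain, stock_table_data).
               article_text is None if fetching or parsing failed.
               stock_table_data is a list of {'Symbol', 'Price'} dicts,
               or None if the page has no stock table or the fetch failed.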
    """
    print(f"  Fetching article: {url}")
    article_domain = urlparse(url).netloc
    stock_table_data = None
    
    try:
        # Fetch the article page
        response = requests.get(url, headers=REQUESTS_HEADERS, timeout=REQUESTS_TIMEOUT)
        response.raise_for_status()

        # Parse the HTML
        soup = BeautifulSoup(response.text, 'html.parser')        
        content_container = soup.select_one(ARTICLE_CONTENT_SELECTOR)

        # Extract and validate content
        if content_container:
            article_text = content_container.get_text(separator=' ', strip=True)
            if article_text:
                 print(f"  Successfully extracted text using selector '{ARTICLE_CONTENT_SELECTOR}'.")
            else:
                 print(f"  Warning: Found container '{ARTICLE_CONTENT_SELECTOR}' but it contained no text.")
                 return None, article_domain, None
        else:
            print(f"  Error: Could not find the article content container using selector '{ARTICLE_CONTENT_SELECTOR}'.")
            print(f"  Please inspect the HTML of {url} and update ARTICLE_CONTENT_SELECTOR.")
            return None, article_domain, None

        # Extract stock table data (if it exists)
        table = soup.select_one(STOCK_TABLE_SELECTOR)
        if table:
            stock_table_data = []
            for row in table.find_all('tr'): 
                cols = row.find_all('td')
                if len(cols) == 2:
                    symbol = cols[0].find('a').text.strip() if cols[0].find('a') else cols[0].text.strip()
                    price_text = cols[1].text.strip().replace(',', '')
                    try:
                        price = float(price_text)
                        stock_table_data.append({'Symbol': symbol, 'Price': f"{price:.3f}"})
                    except ValueError:
                        print(f"   Warning: Could not parse price '{price_text}' for symbol '{symbol}'. Skipping row.")
        else:
            print(f"  Info: No stock table found using selector '{STOCK_TABLE_SELECTOR}'.")

        return article_text, article_domain, stock_table_data
    
    except requests.exceptions.RequestException as e:
        error_message = f"  Network error fetching article {url}. Error: {str(e)}"
        print(error_message)
        # send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return None, article_domain, None
        
    except Exception as e:
        error_message = f"  Unexpected error parsing article page {url}. Error: {str(e)}"
        print(error_message)
        # send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return None, article_domain, None


def analyze_text_with_gemini(client, selected_model_name, text, url):
    """
    Sends text to the Gemini API for analysis (summary and sentiment).
    
    Args:
        client: Initialized Gemini API client
        selected_model_name (str): Name of the Gemini model to use
        text (str): Article text to analyze
        url (str): URL of the article being analyzed
        
    Returns:
        dict or None: Analysis results with keys 'Summary' and 'Sentiment'.
              'Sentiment' is 'Error' if the API call failed. Returns None
              if the inputs are invalid and no request was sent.
    """
    # Define default error result
    error_result = {
        'Summary': 'Parsing Failed', 'Sentiment': 'Parsing Failed'
    }

    # Validate inputs
    if client is None:
        print("  Skipping Gemini analysis: Client not configured (check API key setup).")
        return None
    if not selected_model_name:
        print("  Skipping Gemini analysis: Model name not specified.")
        return None
    if not text:
        print("  Skipping Gemini analysis: No text provided.")
        return None
    if 'GEMINI_PROMPT_TEMPLATE' not in globals():
        print("  Error: GEMINI_PROMPT_TEMPLATE is not defined.")
        return None

    # Create the prompt and send it to Gemini
    prompt = GEMINI_PROMPT_TEMPLATE.format(article_text=text, url=url)
    print(f"  Sending text to Gemini model '{selected_model_name}' for analysis.")

    try:
        # Call Gemini API
        response = client.models.generate_content(
            model=selected_model_name,
            contents=prompt
            # Optional: Add safety_settings and generation_config if needed
        )

        # Extract raw response text (response.text can be None, e.g. for blocked or empty responses)
        raw_response_text = ""
        candidates = getattr(response, 'candidates', None) or []
        parts = getattr(response, 'parts', None)
        if getattr(response, 'text', None):
            raw_response_text = response.text
        elif candidates and candidates[0].finish_reason != 'STOP':
            reason = candidates[0].finish_reason
            print(f"  Warning: Gemini generation finished due to {reason}, not STOP.")
            raw_response_text = f"Generation stopped: {reason}"
        elif not parts:
            print(f"  Warning: Gemini response has no 'text' or 'parts'. Response: {response}")
            raw_response_text = "Empty or unexpected response structure."
        else:
            # Fallback for alternative response structure
            try:
                raw_response_text = "".join(part.text for part in parts)
            except Exception:
                print(f"  Warning: Could not extract text from response parts. Response: {response}")
                raw_response_text = "Could not extract text from response."

        failure_markers = ("Parsing Failed", "Generation stopped", "Empty or unexpected", "Could not extract text")
        if not raw_response_text or any(marker in raw_response_text for marker in failure_markers):
            print("  Skipping parsing: Gemini response indicates failure or is empty.")
            error_result['Summary'] = raw_response_text[:1000]  # Include partial response as summary
            return error_result

        # Parse the response
        analysis = {
            'Summary': 'Parsing Failed', 'Sentiment': 'Parsing Failed'
        }

        # Process response line by line
        lines = raw_response_text.strip().split('\n')
        current_key = None
        
        for line in lines:
            line = line.strip()
            if not line: 
                continue

            # Check for key:value lines
            parts = line.split(':', 1)
            if len(parts) == 2:
                key = parts[0].strip().lower()
                value = parts[1].strip()

                # Map keys to our result dictionary
                if 'summary' in key: 
                    analysis['Summary'] = value
                    current_key = 'Summary'
                elif 'sentiment' in key: 
                    analysis['Sentiment'] = value.lower()
                    current_key = 'Sentiment'
                else:
                    if current_key and current_key in analysis: 
                        analysis[current_key] += " " + line
                    else: 
                        print(f"  Warning: Unrecognized line in Gemini response: {line}")
            elif current_key and current_key in analysis: 
                # Continuation of previous key's value
                analysis[current_key] += " " + line
            else: 
                print(f"  Warning: Unrecognized line format in Gemini response: {line}")

        # Validate sentiment value (case-insensitive, so the 'Parsing Failed' default is preserved)
        valid_sentiments = ['positive', 'negative', 'neutral', 'parsing failed']
        if analysis['Sentiment'].lower() not in valid_sentiments:
            found_sentiment = None

            # Check for sentiment words within the text
            for s in ['positive', 'negative', 'neutral']:
                if s in analysis['Sentiment']: 
                    found_sentiment = s
                    break
                    
            if found_sentiment:
                 print(f"  Warning: Extracted partial sentiment '{found_sentiment}' from '{analysis['Sentiment']}'.")
                 analysis['Sentiment'] = found_sentiment
            else:
                 print(f"  Warning: Unexpected sentiment value '{analysis['Sentiment']}'. Setting to 'unknown'.")
                 analysis['Sentiment'] = 'unknown'

        # Check for parsing failures
        if 'Parsing Failed' in analysis.values():
            print(f"  Warning: Some fields failed parsing from Gemini response for {url}.")
        else: 
            print(f"  Successfully parsed all fields from Gemini response")

        return analysis

    except Exception as e:
        error_message = f"  Error calling Gemini API or processing response. Error: {str(e)}"
        print(error_message)
        # send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return {'Summary': str(e), 'Sentiment': 'Error'}
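
The line-by-line parsing inside `analyze_text_with_gemini` can be easier to follow in isolation. The sketch below is a simplified, standalone variant and not the notebook's exact logic: `parse_analysis` is a hypothetical name, it matches field labels exactly rather than by substring, and the sample response is invented for illustration.

```python
def parse_analysis(raw_text):
    """Parse 'Summary:' and 'Sentiment:' fields; other lines continue the previous field."""
    result = {}
    current = None
    for line in raw_text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        # Strip spaces and markdown bold markers so '**Summary:**' is also recognized.
        label = key.strip(" *").lower()
        if sep and label in ("summary", "sentiment"):
            current = label.capitalize()
            result[current] = value.strip(" *")
        elif current:
            # A line without a known label (even one containing ':') continues the last field.
            result[current] += " " + line
    if "Sentiment" in result:
        result["Sentiment"] = result["Sentiment"].lower()
    return result

# Invented sample response, for illustration only.
sample = """Summary: Company A secured a new contract.
Its shares rose: up 5%.
Sentiment: Positive"""
parsed = parse_analysis(sample)
```

Splitting on the first colon only, and treating unknown labels as continuation text, keeps colons inside the summary from being mistaken for new fields.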
        

### Main Processing Logic

In [62]:
# --- Main Processing Logic ---
def main(client, selected_model_name):
    """
    Main function to orchestrate the news scraping, filtering, analysis, and notification workflow.
    
    Args:
        client (genai.Client): The configured Gemini API client 
        selected_model_name (str): The Gemini model to use for analysis
        
    Returns:
        None: Results are sent to Discord and logged to console
    """
    # Check for Gemini Client
    if not client:
        error_message = "Gemini Client is not configured."
        print(error_message)
        send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return
        
    print("\n--- Starting News Analysis Run ---")
    
    # Setup timezone and timing
    local_timezone = pytz.timezone('Asia/Singapore')
    start_time = datetime.datetime.now(local_timezone)
    print(f"Run started at: {start_time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Processing articles from the last {PROCESSING_TIME_WINDOW_HOURS} hour(s).")

    # Calculate the cutoff time for filtering articles
    cutoff_time = start_time - timedelta(hours=PROCESSING_TIME_WINDOW_HOURS)
    print(f"Processing news published after: {cutoff_time.strftime('%Y-%m-%d %H:%M:%S')}")

    # -------------------------------------------------------------------
    # STEP 1: SCRAPE KLSESCREENER NEWS FEED
    # -------------------------------------------------------------------
    latest_news = scrape_klsescreener_news(KLSESCREENER_NEWS_URL)

    if not latest_news:
        error_message = "News scraping run completed: No articles found or scraping failed."
        print(error_message)
        send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return

    # -------------------------------------------------------------------
    # STEP 2: FILTER ARTICLES BY TIME WINDOW
    # -------------------------------------------------------------------
    articles_to_process = []
    for item in latest_news:
        if item.get('timestamp'):
            # Ensure the timestamp is timezone-aware before comparison
            # If the timestamp is naive, attach the local timezone
            item_timestamp = item['timestamp']
            if not item_timestamp.tzinfo:
                try:
                    item_timestamp = local_timezone.localize(item_timestamp)
                except (ValueError, AttributeError) as e:
                    print(f"  Error localizing timestamp for article '{item.get('headline', 'N/A')}': {str(e)}")
                    continue
                
            # Now compare the timezone-aware timestamps
            if item_timestamp >= cutoff_time:
                articles_to_process.append(item)
                
        else:
            print(f"  Skipping article (no timestamp): {item.get('headline', 'N/A')}")

    print(f"Found {len(articles_to_process)} articles within the time window out of {len(latest_news)} scraped.")

    if not articles_to_process:
        error_message = f"News filtering run completed: No new articles found after {cutoff_time.strftime('%Y-%m-%d %H:%M:%S')}."
        print(error_message)
        send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
        return

    # -------------------------------------------------------------------
    # STEP 3: PROCESS EACH ARTICLE
    # -------------------------------------------------------------------
    processed_count = 0
    error_count = 0
    
    for item in articles_to_process:
        headline = item.get('headline', 'N/A')
        url = str(item.get('url', ''))
        timestamp = item.get('timestamp')
        timestamp_str = timestamp.strftime("%Y-%m-%d %H:%M:%S") if timestamp else "N/A"
        
        if not url:
            error_message = f"Skipped item with missing URL. Headline: {headline}"
            print(error_message)
            send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
            error_count += 1
            continue # Move to next article

        print(f"\n  Processing: '{headline}'")
        print(f"  Published: {timestamp_str}")

        # Fetch and parse article content
        article_text, article_domain, stock_table_data = fetch_and_parse_article(url)

        if article_text is None:
            error_message = f"Failed to fetch or parse content.\nURL: {url}\nHeadline: {headline}"
            print(error_message)
            send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
            error_count += 1
            continue # Move to next article

        # Analyze with Gemini      
        analysis_result = analyze_text_with_gemini(client, selected_model_name, article_text, url)

        if analysis_result is None or analysis_result.get('Sentiment') == 'Error':
             # Handle case where analysis failed (either None or explicit error)
             error_detail = analysis_result.get('Summary', 'Analysis function returned None') if analysis_result else 'Analysis function returned None'
             error_message = f"Gemini analysis failed.\nURL: {url}\nHeadline: {headline}\nError: {error_detail}"
             print(error_message)
             send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
             error_count += 1
             continue # Move to next article
            
        if 'Parsing Failed' in analysis_result.values():
             # Handle case where analysis ran but parsing failed for some fields
             error_message = f"Gemini analysis ran but parsing failed for some fields.\nURL: {url}\nHeadline: {headline}"
             print(error_message)
             send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
             error_count += 1
             continue # Move to next article

        # Send successful result to Discord
        print("  Successfully processed. Sending to Discord results channel.")

        # Format message for Discord (using embeds for better formatting)
        sentiment_color = {
            "positive": 0x00FF00, # Green
            "negative": 0xFF0000, # Red
            "neutral": 0x808080,  # Grey
            "unknown": 0xFFA500,  # Orange
            "parsing failed": 0xFFA500 # Orange
        }.get(analysis_result.get('Sentiment', 'unknown').lower(), 0x808080) # Default to grey

        # Change embed color to blue for general news 
        embed_color = sentiment_color
        if not stock_table_data:
            embed_color = 0x0000FF  # Blue if no stock table 
        
        embed = {
            "title": headline[:250], # Embed title limit
            "url": url,
            "color": embed_color,
            "fields": [
                {"name": "Sentiment", "value": analysis_result.get('Sentiment', 'N/A').capitalize(), "inline": True},
                {"name": "Published", "value": timestamp_str, "inline": True},
                {"name": "Summary", "value": analysis_result.get('Summary', 'N/A')[:1000], "inline": False}
            ],
            "footer": {"text": f"Source: {article_domain or urlparse(url).netloc}"}
        }

        # Add stock table data to the embed if it exists
        if stock_table_data:
            stock_table_str = "\n".join(f"{stock['Symbol']}: {stock['Price']}" for stock in stock_table_data)
            embed["fields"].append({"name": "Related Stocks", "value": stock_table_str[:1000], "inline": False})

        if send_to_discord(DISCORD_WEBHOOK, embed=embed):
            processed_count += 1
        else:
            # If sending the result failed, log it as an error
            error_message = f"Failed to send successful analysis result to Discord.\nURL: {url}\nHeadline: {headline}"
            print(error_message)
            send_to_discord(DISCORD_WEBHOOK, message_content=error_message)
            error_count += 1        

    # -------------------------------------------------------------------
    # STEP 4: GENERATE AND SEND SUMMARY
    # -------------------------------------------------------------------
    end_time = datetime.datetime.now(local_timezone)
    duration = end_time - start_time

    # Print summary
    summary_message = (
        f"\n--- News Analysis Run Complete ---\n"
        f"Run finished at: {end_time.strftime('%Y-%m-%d %H:%M:%S')}\n"
        f"Duration: {duration}\n"
        f"Successfully processed {processed_count} articles.\n"
        f"Encountered {error_count} errors.\n"
    )

    # Send to results channel or a dedicated status channel
    print(summary_message)
    # send_to_discord(DISCORD_WEBHOOK, message_content=summary_message)
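
Plain-text messages such as the run summary or long error reports can exceed Discord's 2000-character content limit, in which case truncation loses the tail. One option is to split the text into several messages instead. This is a minimal sketch, assuming the same 1950-character safety margin as `DISCORD_MESSAGE_CHAR_LIMIT`; `split_for_discord` is a hypothetical helper, not part of the notebook.

```python
def split_for_discord(text, limit=1950):
    """Split text into chunks of at most `limit` characters, preferring line boundaries."""
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

# Usage: a 3000-character line followed by a short line yields two chunks.
long_text = "a" * 3000 + "\n" + "b" * 100
pieces = split_for_discord(long_text)
```

Each chunk could then be passed to `send_to_discord(DISCORD_WEBHOOK, message_content=chunk)` in order.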


### Script Entry Point

In [63]:
if __name__ == "__main__":
    main(client, selected_model_name)


--- Starting News Analysis Run ---
Run started at: 2025-05-09 10:01:05
Processing articles from the last 1 hour(s).
Processing news published after: 2025-05-09 09:01:05
Scraping news list from: https://www.klsescreener.com/v2/news
Found 20 potential news items.
Successfully scraped 20 news items.
Found 8 articles within the time window out of 20 scraped.

  Processing: '获7.4亿乙烯合约　乐天化学挤入上升榜前三'
  Published: 2025-05-09 09:46:00
  Fetching article: https://www.klsescreener.com/v2/news/view/1519589/%E8%8E%B77-4%E4%BA%BF%E4%B9%99%E7%83%AF%E5%90%88%E7%BA%A6-%E4%B9%90%E5%A4%A9%E5%8C%96%E5%AD%A6%E6%8C%A4%E5%85%A5%E4%B8%8A%E5%8D%87%E6%A6%9C%E5%89%8D%E4%B8%89
  Successfully extracted text using selector 'div.content.text-justify'.
  Info: No stock table found using selector 'div.stock-list > table'.
  Sending text to Gemini model 'gemini-1.5-flash-001' for analysis.
  Successfully parsed all fields from Gemini response
  Successfully processed. Sending to Discord results channel.
  Successfully 