## Reddit GenAI Trend Analysis with ReAct Agent Framework

Author: Amanda Milberg, Principal Solutions Engineer @ TitanML

üéØ **Main Purpose**:
- Analyzes r/technology subreddit posts to identify and summarize GenAI-related content
- Generates professional summaries of AI trends and developments to send to downstream users who want to stay up to date on the latest trends

üîë **Key Components**:
1. Reddit API Integration to scrape relevant posts in a given subreddit (e.g. r/technology)
2. LLM-powered analysis to:
   - Determine GenAI relevance based on the thread title
   - Summarize key themes and content for each article
   - Generate trend analysis summary reports for all the GenAI related articles 

üìä **Process Flow**:
1. Fetches hot posts from r/technology 
2. Filters for GenAI-related content
3. Extracts and summarizes article content
4. Creates comprehensive trend analysis
5. Generates formatted report with sources ready to email to downstream users 

üõ†Ô∏è **Technologies Used**:
- PRAW (Reddit API)
- OpenAI API/Self-hosted LLM
- BeautifulSoup for web scraping
- Markdown for report formatting
- ReAct agent framework

_Note: Requires Reddit API credentials and access to a LLM to function._


## Why Use an Agent Framework?

- Implements the ReAct (Reasoning + Acting) paradigm for more transparent and controlled AI behavior
- Provides explicit thinking and action steps for complex tasks
- Enables better debugging and monitoring of the AI's decision process

üß† **ReAct Framework Benefits**:
1. **Reasoning Transparency**
   - Agent explicitly shows its thinking process before actions
   - Helps track decision-making logic
   - Makes debugging easier

2. **Structured Actions**
   - Clear separation between thinking and execution
   - Each action has defined inputs and outputs
   - Better error handling and recovery

3. **Process Monitoring**
   - Logs each step of the analysis pipeline
   - Tracks success/failure of individual components
   - Maintains history of decisions and actions

_The agent framework transforms what could be a simple script into a more robust, observable, and maintainable system for AI analysis. The agent approach provides better structure, transparency, and reliability for complex AI tasks compared to a simple main function._


# Why Self-Host?

üåü **Key Benefits of Self-Hosting** 

1. **Cost-Effective Performance**
   - Reduced operational costs for high-volume processing
   - No ongoing API fees or usage limits

2. **Privacy & Data Control** 
   - Complete control over data processing and storage
   - No data sharing with external providers
   - Compliance with internal security policies
   - Ability to air-gap for sensitive applications & sensitive data 

3. **Deployment Flexibility**
   - Run locally on your own infrastructure
   - Scale resources based on actual needs


# Why Deep Seek?

1. **Specialized Reasoning Capabilities**
   - Optimized for logical reasoning and analysis tasks
   - Efficient chain-of-thought processing
   - Ideal for structured analytical workflows
2. **Open Source Technology + Self-Hosting Stack = üòç**  
   - Deepseek broke the internet 
   - Firm believer in owning your AI stack 
   - Smaller / specalized models for a given application  

_Note: In this demo we are running a self-hosted [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) deployed on 4xL4 GPUs using the [TitanML's Takeoff Stack](https://docs.titanml.co/). If you want to try this on your own you can pull this repository and swap in an OpenAI model. The code uses OpenAI compatiable endpoints so any model should be able to be swapped in. If you have any questions please reach out to: amanda.milberg@titanml.co_

# Utility Functions in AI Agent Architecture (or the "Doing")

üîß **Service Functions**
Functions that handle specific, specialized tasks like:
- API interactions (init_reddit, init_llm)
- Web scraping (extract_article_content)
- Data parsing & formatting (parse_llm_response)
- LLM analysis (analyze_genai_relevance, summarize_content, create_email_summary)

In [24]:
import praw
import os
from datetime import datetime
from typing import List, Dict, Optional
from openai import OpenAI
from bs4 import BeautifulSoup
import json
import re
import requests
import functools
from IPython.display import display, Markdown, HTML

# --- Helper Functions ---

def retry(func, max_retries=3, delay=5):
    """Retry decorator with exponential backoff."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                if attempt == max_retries - 1:
                    raise
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay} seconds...")
                time.sleep(delay)
                delay *= 2  # Exponential backoff
    return wrapper

def extract_article_content(url: str) -> str:
    """Extract main content from article URL with proper headers and error handling"""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Connection': 'keep-alive',
        }

        response = requests.get(url, headers=headers, timeout=15)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)

        soup = BeautifulSoup(response.text, 'html.parser')
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text(separator=' ', strip=True)
        return ' '.join(text.split())

    except requests.exceptions.RequestException as e:
        print(f"Error extracting content: {e}")  # More specific error
        return ""
    except Exception as e:
        print(f"Unexpected error extracting content: {e}")
        return ""

def analyze_genai_relevance(llm: OpenAI, title: str) -> dict:
    """Analyze if title is GenAI-related using LLM, returning JSON directly."""

    system_prompt = """You are a helpful AI assistant. Determine if the given article title relates to Generative AI.

    Return a JSON object in the following format:
    {
        "is_genai_related": true/false,
        "relevance_type": "direct/indirect/none",
        "reasoning": "Your reasoning here..."
    }
    """
    try:
        response = llm.chat.completions.create(
            model="meta/llama-3.1-405b-instruct",  # Corrected model name (EXAMPLE!)
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": title}
            ],
            temperature=0.2,
            top_p=0.7,
            max_tokens=200
        )
        response_text = response.choices[0].message.content

        # Extract JSON by finding braces
        start = response_text.find('{')
        end = response_text.rfind('}') + 1  # +1 to include the closing brace
        if start == -1 or end == -1:
            raise json.JSONDecodeError("No valid JSON object found", response_text, 0)
        json_str = response_text[start:end]
        json_data = json.loads(json_str)

    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON returned from LLM: {response_text}\nError Detail: {e}")
        json_data = {"is_genai_related": False, "relevance_type": "none", "reasoning": ""}
    except Exception as e: # Catch API errors here
        print(f"Error in GenAI relevance: {e}")
        json_data = {"is_genai_related": False, "relevance_type": "none", "reasoning": ""}

    return json_data # Return the dictionary directly

def summarize_trend(llm: OpenAI, title: str, content: str) -> str:
    """Summarize a single trend using LLM with <think> tags."""

    system_prompt = """You are a helpful AI assistant tasked with summarizing 
    technology trends.  Provide a concise summary of the given article content.
    Structure your response with a clear separation between your reasoning 
    process and the final summary.
    
    Use this format:
    
    <think>
    Your step-by-step reasoning process.
    </think>
    
    The final summary of the trend.
    """
    
    user_prompt = f"""Title: {title}\n\nContent:\n{content}"""
    
    try:
        response = llm.chat.completions.create(
            model = "meta/llama-3.1-405b-instruct", ##switch to OpenAI model (e.g. gpt-4) for OpenAI implementation 
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.2,
            top_p=0.7,
            max_tokens=2000
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        print(f"Error in trend summarization: {str(e)}")
        return ""

def create_email_summary(trends_list: List[Dict], llm: OpenAI) -> str:
    """Create a high-level email summary of the identified GenAI trends."""

    if not trends_list:
        return "No GenAI trends were identified in the current batch."

    system_prompt = """You are a helpful AI assistant tasked with creating a high-level 
    email summary of Generative AI trends. Analyze the provided trends and generate 
    a concise summary suitable for an email. Focus on key themes, technologies, 
    and public sentiment.
    """

    # Prepare a summary of each trend for the LLM
    trend_summaries = ""
    for trend in trends_list:
        trend_summaries += f"- **{trend['title']}**: {trend['summary']}\n"

    user_prompt = f"""Analyze the following GenAI trends and provide a high-level email summary:

    {trend_summaries}
    """

    try:
        response = llm.chat.completions.create(
            model="meta/llama-3.1-405b-instruct",  # Or your preferred model
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.2,
            top_p=0.7,
            max_tokens=2000
        )

        llm_response = response.choices[0].message.content

        # Extract thinking and summary (similar to before, but could be adjusted)
        end_think_pos = llm_response.find('</think>')
        thinking_response = llm_response[:end_think_pos]
        summary = llm_response[end_think_pos+9:]
        f_thinking_response = "### Deepseek Reasoning\n\n" + thinking_response + "\n\n---\n\n"

        # Add Further Reading section
        further_reading = "\n\n---\n\n### Further Reading\n\n"
        for trend in trends_list:
            further_reading += f"**{trend['title']}**\n"
            further_reading += f"- Source: {trend['url']}\n\n"

        # Combine AI analysis with Further Reading
        complete_email = f_thinking_response + summary + further_reading

        return display(Markdown(complete_email))

    except Exception as e:
        print(f"Error in trends summarization: {str(e)}")
        return ""

In [25]:
def init_reddit(client_id: str, client_secret: str, user_agent: str) -> praw.Reddit:
    """Initialize and return a Reddit client instance."""
    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent
    )

def init_llm(api_key: str) -> OpenAI:
    """Initialize and return an OpenAI client instance."""
    return OpenAI(api_key=api_key, base_url="https://api.llm.ngc.nvidia.com/v1")

## Main Agent Logic

In [26]:
def get_reddit_trends(reddit: praw.Reddit, llm: OpenAI) -> List[Dict]:
    """Get and analyze Reddit trends, returning only GenAI-related ones."""
    trends = []
    print("üìä Fetching posts from r/technology...")
    print("üéØ ACTION: Fetching 20 most popular threads:")
    print("=" * 50)
    try:
        for submission in reddit.subreddit("technology").hot(limit=20):
            print(submission.title)

            relevance = analyze_genai_relevance(llm, submission.title)
            print(f"GenAI Relevance: {relevance['is_genai_related']}")

            if relevance['is_genai_related']:
                print("üéØ ACTION: üìñ Reading Article Details at", submission.url)
                print("=" * 50)
                content = extract_article_content(submission.url)

                if content:  # Only proceed if content was extracted
                    summary = summarize_trend(llm, submission.title, content)
                    trends.append({
                        "title": submission.title,
                        "url": submission.url,
                        "summary": summary,
                        "relevance_reasoning": relevance["reasoning"],  # Include reasoning
                    })
                else:
                    print(f"Skipping summarization due to empty content for: {submission.title}")
            else:
                print("=" * 50)

    except Exception as e:
        print(f"Error during trend gathering: {e}")
        return []

    return trends

class RedditAIAnalysisAgent:
    def __init__(self, reddit_creds: dict, openai_api_key: str):
        self.reddit_creds = reddit_creds
        self.openai_api_key = openai_api_key
        self.reddit = None
        self.llm = None
        self.thought_history = []
        print("\nü§ñ Initializing Reddit AI Analysis Agent...\n")

    def think(self, thought: str):
        """Record agent's thinking process"""
        self.thought_history.append({"thought": thought, "timestamp": datetime.now().isoformat()})
        print(f"\nü§î THINKING: {thought}")

    def act(self, action: str, result: any):
        """Record agent's actions and results"""
        self.thought_history.append({
            "action": action,
            "result": result,
            "timestamp": datetime.now().isoformat()
        })
        print(f"üéØ ACTION: {action}")
        print(f"üìù RESULT: {result}\n")
        print("=" * 50)

    def initialize_clients(self) -> bool:
        """Initialize Reddit and LLM clients"""
        try:
            print("\nüì° INITIALIZING CLIENTS...")
            self.think("Need to initialize Reddit and LLM client")

            self.reddit = init_reddit(
                self.reddit_creds['client_id'],
                self.reddit_creds['client_secret'],
                self.reddit_creds['user_agent']
            )
            self.act("Initialize Reddit client", "‚úÖ Reddit client initialized successfully")

            self.llm = init_llm(self.openai_api_key)
            self.act("Initialize LLM client", "‚úÖ LLM client initialized successfully.")

            return True

        except Exception as e:
            self.act("Initialize clients", f"‚ùå Failed: {str(e)}")
            return False

    def analyze_trends(self) -> Optional[Dict]:
        """Get and analyze Reddit trends"""
        try:
            print("\nüîç ANALYZING REDDIT TRENDS...")
            self.think("Fetching Reddit trends for analysis")

            # Get trends (only GenAI-related ones)
            trends = get_reddit_trends(self.reddit, self.llm)

            if not trends:
                self.think("No GenAI trends found in current batch")
                self.act("Analyze trends", "‚ö†Ô∏è No relevant trends found")
                return {
                    "success": True,
                    "timestamp": datetime.now().isoformat(),
                    "analysis": "No GenAI trends found.",
                    "trends": [],
                    "count": 0
                }

            # Log initial processing
            print(f"‚úÖ Summarization complete for {len(trends)} trends")

            self.think(f"Creating high level email summary for overall GenAI trends found")
            analysis = create_email_summary(trends, self.llm)

            # Log completion
            self.act("Create analysis", f"‚úÖ Analysis complete for {len(trends)} trends")

            return {
                "success": True,
                "timestamp": datetime.now().isoformat(),
                "analysis": analysis,
                "trends": trends,
                "count": len(trends),
                "thought_process": self.thought_history
            }

        except Exception as e:
            self.act("Analyze trends", f"‚ùå Failed: {str(e)}")
            return {
                "success": False,
                "error": str(e),
                "timestamp": datetime.now().isoformat(),
                "thought_process": self.thought_history
            }

    def run(self) -> Dict:
        """Main execution flow with ReAct framework"""
        print("\nüöÄ STARTING REDDIT AI TREND ANALYSIS\n")
        print("=" * 50)

        self.think("Starting Reddit AI trend analysis")

        # Initialize clients
        if not self.initialize_clients():
            print("\n‚ùå Failed to initialize clients. Aborting...")
            return {
                "success": False,
                "error": "Failed to initialize clients",
                "timestamp": datetime.now().isoformat(),
                "thought_process": self.thought_history
            }

        # Analyze trends
        result = self.analyze_trends()

        if result["success"]:
            self.think("Analysis complete, final report generated")
            print("\n‚úÖ ANALYSIS COMPLETE")
            print("=" * 50)
            print("\nFinal report has been generated in the response.")
        else:
            print("\n‚ùå Analysis failed. Check error details.")

        return result
    
def main(reddit_creds: dict, openai_api_key: str) -> dict:
    """Main function using ReAct agent"""
    agent = RedditAIAnalysisAgent(reddit_creds, openai_api_key)
    return agent.run()

## Live Demo Example

In [27]:
from dotenv import load_dotenv
import certifi
import os

# Load environment variables from .env file in current directory
load_dotenv()

reddit_creds = {
    "client_id": os.getenv("REDDIT_CLIENT_ID"),
    "client_secret": os.getenv("REDDIT_CLIENT_SECRET"),
    "user_agent": os.getenv("REDDIT_USER_AGENT")
}

nvidia_api_key = os.getenv("NVIDIA_API_KEY")  # Or your OpenAI key

# For debugging, you can print the certifi path:
# print(certifi.where())

result = main(reddit_creds, nvidia_api_key)


ü§ñ Initializing Reddit AI Analysis Agent...


üöÄ STARTING REDDIT AI TREND ANALYSIS


ü§î THINKING: Starting Reddit AI trend analysis

üì° INITIALIZING CLIENTS...

ü§î THINKING: Need to initialize Reddit and LLM client
üéØ ACTION: Initialize Reddit client
üìù RESULT: ‚úÖ Reddit client initialized successfully

üéØ ACTION: Initialize LLM client
üìù RESULT: ‚úÖ LLM client initialized successfully.


üîç ANALYZING REDDIT TRENDS...

ü§î THINKING: Fetching Reddit trends for analysis
üìä Fetching posts from r/technology...
üéØ ACTION: Fetching 20 most popular threads:
Laid-off Meta employees blast Zuckerberg in forums for running the ‚Äòcruelest tech company out there‚Äô
Error in GenAI relevance: 404 page not found
GenAI Relevance: False
Mexico‚Äôs Sheinbaum Threatens to Sue Google Over ‚ÄòGulf of America‚Äô Maps Change
Error in GenAI relevance: 404 page not found
GenAI Relevance: False
Anyone Can Push Updates to the DOGE.gov Website
Error in GenAI relevance: 404 page not foun