## Reddit GenAI Trend Analysis with ReAct Agent Framework

Author: Amanda Milberg, Principal Solutions Engineer @ TitanML

🎯 **Main Purpose**:
- Analyzes r/technology subreddit posts to identify and summarize GenAI-related content
- Generates professional summaries of AI trends and developments for downstream users who want to stay up to date

🔑 **Key Components**:
1. Reddit API Integration to scrape relevant posts in a given subreddit (e.g. r/technology)
2. LLM-powered analysis to:
   - Determine GenAI relevance based on the thread title
   - Summarize key themes and content for each article
   - Generate trend analysis summary reports for all the GenAI-related articles

📊 **Process Flow**:
1. Fetches hot posts from r/technology 
2. Filters for GenAI-related content
3. Extracts and summarizes article content
4. Creates comprehensive trend analysis
5. Generates a formatted report with sources, ready to email to downstream users
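
The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the notebook's implementation: the `Post` dataclass, the sample posts, the keyword check in `is_genai_related`, and the truncation in `summarize` are all hypothetical stand-ins for PRAW submissions and the LLM calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """Hypothetical stand-in for a Reddit submission."""
    title: str
    url: str
    body: str


def fetch_hot_posts() -> List[Post]:
    # Step 1: the notebook pulls these from r/technology via PRAW; sample data here.
    return [
        Post("New chatbot bill targets generative AI", "https://example.com/a",
             "The bill would require disclosure to minors."),
        Post("Phone battery life compared", "https://example.com/b",
             "Ten phones were tested over a week."),
    ]


def is_genai_related(title: str) -> bool:
    # Step 2: a keyword check used only as a placeholder for the LLM relevance call.
    keywords = ("genai", "generative ai", "chatbot", "llm")
    return any(k in title.lower() for k in keywords)


def summarize(body: str, max_chars: int = 40) -> str:
    # Step 3: truncation stands in for LLM summarization.
    return body[:max_chars]


def build_report(posts: List[Post]) -> str:
    # Steps 4-5: combine the summaries and list the sources as Markdown.
    relevant = [p for p in posts if is_genai_related(p.title)]
    lines = ["### GenAI Trends"]
    lines += [f"- **{p.title}**: {summarize(p.body)}" for p in relevant]
    lines.append("### Further Reading")
    lines += [f"- {p.url}" for p in relevant]
    return "\n".join(lines)


report = build_report(fetch_hot_posts())
print(report)
```

Keeping each step as its own function mirrors the separation between utility functions and the orchestrating agent used later in the notebook.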

🛠️ **Technologies Used**:
- PRAW (Reddit API)
- OpenAI API/Self-hosted LLM
- BeautifulSoup for web scraping
- Markdown for report formatting
- ReAct agent framework

_Note: Requires Reddit API credentials and access to an LLM to function._
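
A quick pre-flight check can catch missing credentials before any API call is made. The sketch below assumes the three environment variable names that the demo cell of this notebook reads; the helper `missing_reddit_vars` is hypothetical and not part of the notebook.

```python
import os
from typing import List, Mapping, Optional

# Variable names as read by the demo cell in this notebook.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_reddit_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_reddit_vars({"REDDIT_CLIENT_ID": "abc", "REDDIT_USER_AGENT": ""})
print(missing)  # ['REDDIT_CLIENT_SECRET', 'REDDIT_USER_AGENT']
```

Calling `missing_reddit_vars()` with no argument checks the current process environment.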


## Why Use an Agent Framework?

- Implements the ReAct (Reasoning + Acting) paradigm for more transparent and controlled AI behavior
- Provides explicit thinking and action steps for complex tasks
- Enables better debugging and monitoring of the AI's decision process

🧠 **ReAct Framework Benefits**:
1. **Reasoning Transparency**
   - Agent explicitly shows its thinking process before actions
   - Helps track decision-making logic
   - Makes debugging easier

2. **Structured Actions**
   - Clear separation between thinking and execution
   - Each action has defined inputs and outputs
   - Better error handling and recovery

3. **Process Monitoring**
   - Logs each step of the analysis pipeline
   - Tracks success/failure of individual components
   - Maintains history of decisions and actions

_The agent framework transforms what could be a simple script into a more robust, observable, and maintainable system for AI analysis. The agent approach provides better structure, transparency, and reliability for complex AI tasks compared to a simple main function._
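
To make the think/act idea concrete, here is a minimal sketch of a trace recorder. It is a simplified illustration rather than the agent class defined later in this notebook: the `TraceLog` name, its fields, and the choice to wrap each action in a callable are assumptions made for the example.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List


class TraceLog:
    """Hypothetical minimal recorder for thoughts and actions."""

    def __init__(self) -> None:
        self.history: List[Dict[str, Any]] = []

    def think(self, thought: str) -> None:
        # Record the reasoning step before anything is executed.
        self.history.append({
            "type": "thought",
            "text": thought,
            "timestamp": datetime.now().isoformat(),
        })

    def act(self, name: str, fn: Callable[[], Any]) -> Any:
        """Run fn, record success or failure, and return its result (None on failure)."""
        entry: Dict[str, Any] = {
            "type": "action",
            "name": name,
            "timestamp": datetime.now().isoformat(),
        }
        try:
            entry["result"] = fn()
            entry["ok"] = True
        except Exception as exc:
            entry["result"] = None
            entry["ok"] = False
            entry["error"] = str(exc)
        self.history.append(entry)
        return entry["result"]


log = TraceLog()
log.think("Need the post count before summarizing")
count = log.act("count posts", lambda: len(["a", "b"]))
log.act("parse score", lambda: int("not a number"))  # fails and is recorded

failed = [e["name"] for e in log.history if e["type"] == "action" and not e["ok"]]
print(count, failed)  # 2 ['parse score']
```

Because every step lands in `history`, a failed run can be inspected after the fact instead of being reconstructed from print statements.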


## Why Self-Host?

🌟 **Key Benefits of Self-Hosting** 

1. **Cost-Effective Performance**
   - Reduced operational costs for high-volume processing
   - No ongoing API fees or usage limits

2. **Privacy & Data Control** 
   - Complete control over data processing and storage
   - No data sharing with external providers
   - Compliance with internal security policies
   - Ability to air-gap sensitive applications and data

3. **Deployment Flexibility**
   - Run locally on your own infrastructure
   - Scale resources based on actual needs


## Why DeepSeek?

1. **Specialized Reasoning Capabilities**
   - Optimized for logical reasoning and analysis tasks
   - Efficient chain-of-thought processing
   - Ideal for structured analytical workflows
2. **Open Source Technology + Self-Hosting Stack = 😍**  
   - DeepSeek broke the internet
   - Firm believer in owning your AI stack
   - Smaller / specialized models for a given application

_Note: In this demo we run a self-hosted [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) deployed on 4xL4 GPUs using [TitanML's Takeoff Stack](https://docs.titanml.co/). To try this on your own, pull this repository and swap in an OpenAI model. The code uses OpenAI-compatible endpoints, so any model served behind one should be swappable. If you have any questions, please reach out to: amanda.milberg@titanml.co_
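
One way to make that swap configurable is to pick the client arguments from the environment. The sketch below is an assumption-laden illustration: the variable names `LLM_BASE_URL` and `LLM_API_KEY` and the helper `llm_client_kwargs` are hypothetical, while `OPENAI_API_KEY` is the variable the OpenAI client conventionally reads. The base URL in the example is the one used in this notebook's demo.

```python
import os
from typing import Dict, Mapping, Optional


def llm_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for an OpenAI-compatible client."""
    env = os.environ if env is None else env
    base_url = env.get("LLM_BASE_URL")  # hypothetical variable name
    if base_url:
        # Self-hosted endpoint: the notebook passes a placeholder key here.
        return {"base_url": base_url, "api_key": env.get("LLM_API_KEY", "not needed")}
    # No base URL configured: fall back to the hosted OpenAI API.
    return {"api_key": env["OPENAI_API_KEY"]}


self_hosted = llm_client_kwargs({"LLM_BASE_URL": "http://rag-demo:3003/v1"})
hosted = llm_client_kwargs({"OPENAI_API_KEY": "sk-example"})
print(self_hosted)
print(hosted)
```

With the `openai` package installed, `OpenAI(**llm_client_kwargs())` would build the client. The `model` argument passed to each `chat.completions.create` call also needs to name a model that the target server actually serves.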

## Utility Functions in AI Agent Architecture (or the "Doing")

🔧 **Service Functions**
Functions that handle specific, specialized tasks like:
- API interactions (init_reddit, init_llm)
- Web scraping (extract_article_content)
- Data parsing & formatting (parse_llm_response)
- LLM analysis (analyze_genai_relevance, summarize_content, create_email_summary)

In [2]:
import praw
import os
from datetime import datetime
from typing import List, Dict, Optional
from openai import OpenAI
from bs4 import BeautifulSoup
import json 
import re
import requests
from IPython.display import display, Markdown, HTML


def init_reddit(client_id: str, client_secret: str, user_agent: str) -> praw.Reddit:
    """Initialize Reddit API client"""
    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent
    )

def init_llm(api_key: str) -> OpenAI:
    """Initialize the LLM client against a self-hosted, OpenAI-compatible endpoint.

    For practice at home, you can swap the self-hosted LLM for an OpenAI model
    (requires an OpenAI API key) by creating the client like this instead:
        client = OpenAI(api_key=api_key)
    """
    ## In our demo we use a self-hosted LLM, so no API key is needed
    client = OpenAI(
        base_url="http://rag-demo:3003/v1",
        api_key="not needed"
    )

    return client


def extract_article_content(url: str) -> str:
    """Extract main content from article URL with proper headers"""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Connection': 'keep-alive',
        }
        
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Raise exception for bad status codes
        
        soup = BeautifulSoup(response.text, 'html.parser')
        for script in soup(["script", "style"]):
            script.decompose()
        text = soup.get_text(separator=' ', strip=True)
        return ' '.join(text.split())
    except Exception as e:
        print(f"Error extracting content: {str(e)}")
        return ""

def analyze_genai_relevance(llm: OpenAI, title: str) -> dict:
    """Analyze if title is GenAI-related using LLM"""

    system_prompt = """You are a helpful AI assistant. Based on the title 
    of the article provide a suggestion if this content relates to Generative AI:
    
    Return JSON:
        {
            "is_genai_related": true/false,
            "relevance_type": "direct/indirect/none"
        }"""
    try:
        response = llm.chat.completions.create(
            model = "internvl", ##switch to OpenAI model (e.g. gpt-4) for OpenAI implementation 
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": title}
            ],
            max_tokens = 2000
        )
        
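        # Extract the response content. Standard chat completions expose it on
        # message.content; .text is kept for servers that return that field.
        choice = response.choices[0]
        response_dict = parse_llm_response(getattr(choice, "text", None) or choice.message.content)
        return response_dict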
        
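    except Exception as e:
        print(f"Error in GenAI relevance: {str(e)}")
        # Fall back to "not related" so callers can always index the result
        return {"thinking": "", "response": {"is_genai_related": False, "relevance_type": "none"}}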
    
def parse_llm_response(response_text: str) -> dict:
    """
    Parse LLM response to separate thinking process and JSON response from 
    analyze_genai_relevance()
    """
    # Pattern for think tags
    think_pattern = r'<think>(.*?)</think>'
    
    # Pattern for JSON (anything between triple backticks and json)
    json_pattern = r'```json\n(.*?)```'
    
    # Extract thinking process
    thinking = re.search(think_pattern, response_text, re.DOTALL)
    thinking = thinking.group(1).strip() if thinking else ""
    
    # Extract JSON response
    json_match = re.search(json_pattern, response_text, re.DOTALL)
    json_str = json_match.group(1).strip() if json_match else "{}"
    json_data = json.loads(json_str)
    
    return {
        "thinking": thinking,
        "response": json_data
    }


def summarize_content(llm: OpenAI, content: str) -> dict:
    """
    Summarize input text using the chat completions model directly
    """
    system_prompt = """You are a helpful AI assistant. Given a piece of text, analyze its content and provide a concise summary.
    Focus on extracting key information and main ideas.
    If the text contains technical terms, explain them in simple language.
    Format your response in a clear, organized manner."""
    
    try:
        response = llm.chat.completions.create(
            model = "internvl", ##switch to OpenAI model (e.g. gpt-4) for OpenAI implementation
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": content}
            ],
            max_tokens = 2000
        )
        
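        # Parse the response content. Standard chat completions expose it on
        # message.content; .text is kept for servers that return that field.
        choice = response.choices[0]
        response_summary_dict = parse_llm_summary(getattr(choice, "text", None) or choice.message.content)

        return response_summary_dict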
        
    except Exception as e:
        print(f"Error in summarization: {str(e)}")
        # Return the same shape as the success path so callers can index it
        return {"thinking": "", "summary": ""}

def parse_llm_summary(response_text: str) -> dict:
    """
    Parse LLM response to separate thinking process and summary after 
    summarize_content()
    """
    # Pattern for think tags
    think_pattern = r'<think>(.*?)</think>'
    
    # Extract thinking process (everything between think tags)
    thinking = re.search(think_pattern, response_text, re.DOTALL)
    thinking = thinking.group(1).strip() if thinking else ""
    
    # Get summary (everything after </think>)
    summary = re.split(r'</think>\s*', response_text)[-1].strip()
    
    return {
        "thinking": thinking,
        "summary": summary
    }


def get_reddit_trends(reddit: praw.Reddit, llm: OpenAI, limit: int = 10) -> List[Dict]:
    """Fetch and analyze Reddit trends"""
    trends = []
    print(f"🎯 ACTION: Fetching {limit} most popular threads:")
    print("=" * 50)
    for submission in reddit.subreddit('technology').hot(limit=limit):
        print(submission.title)
        relevance = analyze_genai_relevance(llm, submission.title)
        # Guard against a failed call or a reply with no parsable JSON
        response = relevance.get('response', {}) if isinstance(relevance, dict) else {}
        is_related = bool(response.get('is_genai_related', False))
        print(f"GenAI Relevance: {is_related}")
        if is_related:
            print(f"🎯 ACTION: 📖 Reading Article Details at {submission.url}")
            # Fetch the article only for relevant posts; fall back to the post body
            content = extract_article_content(submission.url) or submission.selftext
            llm_summary = summarize_content(llm, content) if content else None
            trends.append({
                'title': submission.title,
                'subreddit': submission.subreddit.display_name,
                'score': submission.score,
                'comments': submission.num_comments,
                'url': submission.url,
                'relevance': relevance,
                # An empty summary is used when no content could be retrieved
                'summary': llm_summary['summary'] if llm_summary else ""
            })
        print("=" * 50)
    return trends


def create_email_summary(trends_list: list, llm: OpenAI) -> str:
    """
    Create an email-style summary from a structured trends dictionary
    """
    # First, let's format the trends data into a more digestible format for the model
    formatted_input = "Recent AI Trends Analysis:\n\n"
    for trend in trends_list:
        formatted_input += f"Title: {trend['title']}\n"
        formatted_input += f"Engagement: {trend['score']} points, {trend['comments']} comments\n"
        formatted_input += f"Summary: {trend['summary']}\n\n"

    system_prompt = """You are an AI analyst creating clear, professional  summaries of AI news and trends. 
    Analyze the provided structured data about AI trends and create a well-organized summary that covers:

    1. Main Technologies Discussed
    - Extract and categorize key AI technologies mentioned across all trends
    - Focus on technical implementations and capabilities

    2. Key Trends
    - Synthesize patterns across all articles
    - Identify emerging themes and industry movements
    - Include relevant metrics and engagement data

    3. Public Sentiment
    - Analyze reactions based on comments and scoring
    - Note any controversial or highly-engaged topics
    - Identify areas of public concern or interest

    4. Notable Developments
    - Highlight significant announcements or findings
    - Include specific numbers, statistics, or metrics
    - Note any regulatory or policy changes

    Format your response as a professional summary with clear headers and bullet points.
    Use engagement metrics (score and comments) to help gauge importance of different topics."""
    try:
        response = llm.chat.completions.create(
            model = "internvl", ##switch to OpenAI model (e.g. gpt-4) for OpenAI implementation
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": formatted_input}
            ],
            max_tokens = 2000
        )
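        # message.content is the standard field; .text is kept as a fallback
        choice = response.choices[0]
        llm_response = getattr(choice, "text", None) or choice.message.content

        # Split the reasoning from the summary at the closing think tag
        thinking_response, sep, summary = llm_response.partition('</think>')
        if not sep:
            # No think tag in the reply: treat the whole response as the summary
            thinking_response, summary = "", llm_response
        summary = summary.lstrip()
        f_thinking_response = "### Deepseek Reasoning\n\n" + thinking_response + "\n\n---\n\n"

        # Add Further Reading section
        further_reading = "\n\n---\n\n### Further Reading\n\n"
        for trend in trends_list:
            further_reading += f"**{trend['title']}**\n"
            further_reading += f"- Source: {trend['url']}\n\n"

        # Combine AI analysis with Further Reading
        complete_email = f_thinking_response + summary + further_reading

        # Render in the notebook, then return the Markdown text
        # (display() itself returns None)
        display(Markdown(complete_email))
        return complete_email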
    
    except Exception as e:
        print(f"Error in trends summarization: {str(e)}")
        return ""

## AI Agent (the Orchestrator)

In [3]:
class RedditAIAnalysisAgent:
    def __init__(self, reddit_creds: dict, openai_api_key: str):
        self.reddit_creds = reddit_creds
        self.openai_api_key = openai_api_key
        self.reddit = None
        self.llm = None
        self.thought_history = []
        print("\n🤖 Initializing Reddit AI Analysis Agent...\n")
        
    def think(self, thought: str):
        """Record agent's thinking process"""
        self.thought_history.append({"thought": thought, "timestamp": datetime.now().isoformat()})
        print(f"\n🤔 THINKING: {thought}")
        
    def act(self, action: str, result: object):
        """Record agent's actions and results"""
        self.thought_history.append({
            "action": action,
            "result": result,
            "timestamp": datetime.now().isoformat()
        })
        print(f"🎯 ACTION: {action}")
        print(f"📝 RESULT: {result}\n")
        print("=" * 50)

    def initialize_clients(self) -> bool:
        """Initialize Reddit and LLM clients"""
        try:
            print("\n📡 INITIALIZING CLIENTS...")
            self.think("Need to initialize Reddit and LLM client")
            
            self.reddit = init_reddit(
                self.reddit_creds['client_id'],
                self.reddit_creds['client_secret'],
                self.reddit_creds['user_agent']
            )
            self.act("Initialize Reddit client", "✅ Reddit client initialized successfully")
            
            self.llm = init_llm(self.openai_api_key)
            self.act("Initialize LLM client", "✅ LLM client initialized successfully. DeepSeek-R1-Distill-Llama-8B running on 4xL4 Machine")
            
            return True
            
        except Exception as e:
            self.act("Initialize clients", f"❌ Failed: {str(e)}")
            return False

    def analyze_trends(self) -> Optional[Dict]:
        """Get and analyze Reddit trends"""
        try:
            print("\n🔍 ANALYZING REDDIT TRENDS...")
            self.think("Fetching Reddit trends for analysis")
            
            # Get trends
            print("\n📊 Fetching posts from r/technology...")
            trends = get_reddit_trends(self.reddit, self.llm)
            
            if not trends:
                self.think("No GenAI trends found in current batch")
                self.act("Analyze trends", "⚠️ No relevant trends found")
                return {
                    "success": True,
                    "timestamp": datetime.now().isoformat(),
                    "analysis": "No GenAI trends found.",
                    "trends": [],
                    "count": 0
                }
            
            # Log initial processing
            print(f"✅ Summarization complete for {len(trends)} trends")
            
            self.think("Creating high level email summary for overall GenAI trends found")
            analysis = create_email_summary(trends, self.llm)
            
            # Log completion without printing details
            self.act("Create analysis", f"✅ Analysis complete for {len(trends)} trends")
            
            return {
                "success": True,
                "timestamp": datetime.now().isoformat(),
                "analysis": analysis,
                "trends": trends,
                "count": len(trends),
                "thought_process": self.thought_history
            }
            
        except Exception as e:
            self.act("Analyze trends", f"❌ Failed: {str(e)}")
            return {
                "success": False,
                "error": str(e),
                "timestamp": datetime.now().isoformat(),
                "thought_process": self.thought_history
            }

    def run(self) -> Dict:
        """Main execution flow with ReAct framework"""
        print("\n🚀 STARTING REDDIT AI TREND ANALYSIS\n")
        print("=" * 50)
        
        self.think("Starting Reddit AI trend analysis")
        
        # Initialize clients
        if not self.initialize_clients():
            print("\n❌ Failed to initialize clients. Aborting...")
            return {
                "success": False,
                "error": "Failed to initialize clients",
                "timestamp": datetime.now().isoformat(),
                "thought_process": self.thought_history
            }
        
        # Analyze trends
        result = self.analyze_trends()
        
        if result["success"]:
            self.think("Analysis complete, final report generated")
            print("\n✅ ANALYSIS COMPLETE")
            print("=" * 50)
            print("\nFinal report has been generated in the response.")
        else:
            print("\n❌ Analysis failed. Check error details.")
        
        return result

def main(reddit_creds: dict, openai_api_key: str) -> dict:
    """Main function using ReAct agent"""
    agent = RedditAIAnalysisAgent(reddit_creds, openai_api_key)
    return agent.run()

## Live Demo Example

In [5]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file in current directory
load_dotenv()

reddit_creds = {
    "client_id": os.getenv("REDDIT_CLIENT_ID"),
    "client_secret": os.getenv("REDDIT_CLIENT_SECRET"), 
    "user_agent": os.getenv("REDDIT_USER_AGENT")
}

openai_api_key = "no api needed" ##switch to an OpenAI API key for the OpenAI implementation

result = main(reddit_creds, openai_api_key)


🤖 Initializing Reddit AI Analysis Agent...


🚀 STARTING REDDIT AI TREND ANALYSIS


🤔 THINKING: Starting Reddit AI trend analysis

📡 INITIALIZING CLIENTS...

🤔 THINKING: Need to initialize Reddit and LLM client
🎯 ACTION: Initialize Reddit client
📝 RESULT: ✅ Reddit client initialized successfully

🎯 ACTION: Initialize LLM client
📝 RESULT: ✅ LLM client initialized successfully. DeepSeek-R1-Distill-Llama-8B running on 4xL4 Machine


🔍 ANALYZING REDDIT TRENDS...

🤔 THINKING: Fetching Reddit trends for analysis

📊 Fetching posts from r/technology...
🎯 ACTION: Fetching 10 most popular threads:
As the Trump admin deletes online data, scientists and digital librarians rush to save it
GenAI Relevance: False
Workers at NASA Told to ‘Drop Everything’ to Scrub Mentions of Indigenous People, Women from Its Websites | "This is a drop everything and reprioritize your day request," a directive "per NASA HQ direction" stated.
GenAI Relevance: False
Federal Workers Sue to Disconnect DOGE Server
GenAI Re

### Deepseek Reasoning

<think>
Okay, so I need to create a summary of the provided AI news and trends. Let me start by reading through each article carefully.

First, the California bill that makes AI companies remind kids that chatbots aren't people. It has 1295 points and 50 comments. That's a decent engagement, but the relevance type is indirect. I should note that as a key point under public sentiment, maybe highlighting the concern around AI's role in misleading users, especially children.

Next, Google removing their pledge against using AI for weapons. This has 311 points and 38 comments. It's a significant development. I need to categorize this under main technologies as it involves AI ethics and military applications. The summary mentions the removal of a specific pledge and the updated principles, so I should extract that. Also, the public sentiment here is negative, with concerns about ethics and military use.

I should structure the summary with clear headers: Main Technologies, Key Trends, Public Sentiment, and Notable Developments. Under each, I'll list bullet points. For main technologies, I'll include NLP for chatbots and AI ethics. For trends, the main themes are ethical concerns and regulatory issues. Public sentiment will cover both the bill and Google's decision, noting the negative reactions. Notable developments will focus on the bill and Google's change, including the engagement metrics to show their importance.

I need to make sure each section is concise and uses bullet points for clarity. Also, I should ensure that the engagement metrics are included where relevant to highlight the importance of each topic. I should avoid any markdown and keep the language professional.


---


### Summary of AI Trends Analysis

#### 1. **Main Technologies Discussed**
- **Natural Language Processing (NLP):** Used in chatbots, highlighting the need for clear distinctions between AI and human interaction, especially for children.
- **AI Ethics and Military Applications:** Google's use of AI in military contexts and the ethical implications of such applications.

#### 2. **Key Trends**
- **Ethical Concerns and Regulatory Issues:** The push for clearer ethical guidelines and regulations in AI development and use.
- **Public Awareness and Misleading Interactions:** Concerns about AI's role in misleading users, particularly children, as seen in the California bill.

#### 3. **Public Sentiment**
- **Negative Sentiment on Google's Decision:** Public concern over Google's involvement in military AI applications, with 311 points and 38 comments.
- **Support for Ethical AI Use:** The California bill received 1295 points and 50 comments, indicating strong public support for ethical AI use and protecting users, especially children.

#### 4. **Notable Developments**
- **California Bill:** A new law requiring AI companies to inform users that chatbots are not people, with significant engagement of 1295 points and 50 comments.
- **Google's AI Principles Update:** Removal of the pledge against AI for weapons, leading to 311 points and 38 comments, highlighting ethical and military application concerns.

This analysis captures the key technologies, trends, public reactions, and significant developments in the AI landscape, providing a comprehensive overview of current AI-related news.

---

### Further Reading

**California bill would make AI companies remind kids that chatbots aren’t people**
- Source: https://www.theverge.com/news/605728/california-chatbot-bill-child-safety

**Google removes pledge to not use AI for weapons from website**
- Source: https://techcrunch.com/2025/02/04/google-removes-pledge-to-not-use-ai-for-weapons-from-website/



🎯 ACTION: Create analysis
📝 RESULT: ✅ Analysis complete for 2 trends


🤔 THINKING: Analysis complete, final report generated

✅ ANALYSIS COMPLETE

Final report has been generated in the response.
