Based on the following video: [How-to Use The Reddit API in Python by James Briggs](https://www.youtube.com/watch?v=FdjVoOf9HN4)
1. Create a reddit application [here](https://www.reddit.com/prefs/apps) to get a client_id and client_secret


# Example workflow

## Step 1: Find a Trending Post
Use various sorting techniques such as hot, trending, controversial to select a post with insightful information. [Example post](https://www.reddit.com/r/mcp/comments/1nculrw/why_are_mcps_needed_for_basic_tools_like/)

## Step 2: Convert the Post Into a YAML Summary
Apply a rule-based system to generate a snapshot of the post. Ex:
- score each comment: score = upvotes + replys * 10
- output the top 10 comments + any parents
- generate the yaml to use as input for the llm

```yaml
post_title: "Why are MCPs needed for basic tools like filesystem access, git etc?"
post_body: "Im puzzled why an MCP (that too written in TS) is required for something as basic as reading, listing files etc. In my experience, the LLM has no problem in doing these tasks without any reliability concerns without any MCP. Likewise for git and even gh."
children:
    - comment: "MCP servers aren't NEEDED for that, but by using it, then you can re-use that functionality across clients vs implementing it each time in the client. That being said, the embedded functionality would likely work faster/better because of the tighter integration. So basically, you pick your sensitivity to effort cost and take the appropriate route."
    - comment: "LLMs can generate commands but can't actually execute them locally. They have no access to your local resources (e.g. files, apps). Simply put, MCP bridges that gap so the LLM can actually interact with your real environment instead of you having to copy paste commands back and forth. Additionally, they serve as the building blocks for agentic workflows."
      children:
      - comment: "this is the right answer, truly. u/rm-rf-rm your fundamental assumption is incorrect - \"LLM ha sno problem in doing these\" is incorrect. LLMs can say \"list directories in ...\" but it then has to be a tool call by the MCP client that performs that. the question maybe then becomes - should capabilities like file system be implemented by MCP clients by default or be provided via MCPs."
      - comment: "u/rm-rf-rm might have eval {llm_response} where llm_respose is something like git log. In that case s/he might be correct that llm + sh love can do that"
    - comment: "Sure but I think the bigger question is why a separate MCP for Git, LS etc. and not just give the ability to run Bash commands"
      children:
        - comment: "One reason is that the MCP server is self-contained, therefore decoupled from the user's environment and OS specifics. As an example, a git command execution via shell would fail if git is not installed or is not in the path. Another is that the MCP layer allows the LLM to operate at the git/version abstraction level not only the shell invocation one, allowing for more precision in terms of selecting the correct tool and its proper usage."
    - comment: "Maybe you are referring to using an AI powered coding tool? Those have AI tools built for reading files, accessing the shell, etc. the power of MCP is you don't need to build a tool that is restricted to a specific AI coding tool, you can just expose the MCP server remotely and any LLM that supports MCP can access it."
    - comment: "They're not needed; most clients have built-in tools for filesystem access. For git, you can just tell the agent to use the \"gh\" command. Same for anything where a command exists."
      children:
        - comment: "For git, you can just tell the agent to use the \"gh\" command gh does not perform most git actions (add, commit, push, etc.) - instead, it is for github-specific actions which aren't normally handled by the 'git' CLI app. Same for anything where a command exists. Agents may be able to use CLI commands such as git, yes, but they work better with the git MCP etc. most clients have built-in tools for filesystem access. You may be correct about this though."
```

## Step 3: Query the LLM:
```yaml
prompt: "generate a short linkedin post from this conversation. Extract the debate or lightbulb moment from this thread and present it."
context: {article_and_comments}
```

## Step 4: Render the Output

**ðŸ”§ The Great MCP Debate: Building Blocks vs. Built-ins**

Had a fascinating discussion about why we need Model Context Protocol (MCP) servers for "basic" tasks like file operations and git commands. The lightbulb moment? It's not about what LLMs *can* generateâ€”it's about execution and reusability.

**The core insight:** LLMs can write perfect bash commands, but they can't actually *run* them. That's where MCPs bridge the gap between AI suggestions and real system interaction.

**But here's the real debate:** Should we have specialized MCPs for git, filesystem, etc., or just give LLMs direct shell access?

**The case for specialized MCPs:**
â€¢ Abstraction over raw commands (git operations vs. shell invocations)  
â€¢ Environment-agnostic (works regardless of OS or installed tools)
â€¢ Reusable across different AI clients
â€¢ Better error handling and context

**The case for shell access:**
â€¢ Simpler implementation
â€¢ Leverages existing tooling
â€¢ More flexible for edge cases

The thread revealed a fundamental tension in AI tooling: granular, safe abstractions vs. powerful, direct access. Both approaches have merit depending on your use case and risk tolerance.

What's your take? Specialized tools or universal shell access for AI agents?

#AI #MCP #DeveloperTools #LLM #Automation


In [1]:
import praw
import dotenv
import os
import pandas as pd

dotenv.load_dotenv()
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
REDDIT_USERNAME = os.getenv("REDDIT_USERNAME")
REDDIT_PASSWORD = os.getenv("REDDIT_PASSWORD")

# Authenticate with reddit api
reddit = praw.Reddit(
    client_id = REDDIT_CLIENT_ID,
    client_secret = REDDIT_CLIENT_SECRET,
    user_agent = REDDIT_USERNAME
)

print(reddit)

<praw.reddit.Reddit object at 0xff6e9747b410>


## Retrieve a Random Submission

In [18]:
from random import randrange


def select_random_submission(subreddits ,limit, min_comments):
    # Are there any exceptions we should be catching here?
    sub = subreddit_list[randrange(0,len(subreddits))]
    submissions_list = []
    
    
    for submission in reddit.subreddit(sub).hot(limit=limit):
        if submission.num_comments > min_comments:
            submissions_list.append({
                "subreddit": sub,
                "title": submission.title,
                "score": submission.score,
                "id": submission.id,
                "comments": submission.num_comments
            })
        
    if len(submissions_list) > 0:
        return submissions_list[randrange(0, len(submissions_list))]
    else:
        raise Exception(f"No submissions found in {sub} matching filter")

# Select a random sub and retrieve a list of hot posts.
subreddit_list = ['mcp', 'vibecoding', 'buildinpublic', 'aws', 'LlamaFarm', 'AgentsOfAI', 'ClaudeAI', 'Buildathon']
random_submission = select_random_submission(subreddit_list, 10, 10)
print(random_submission["title"])

Google just dropped an ace 64-page guide on building AI Agents


In [None]:
def extract_comment_recursively(comment):
    """Recursively extract comment data including nested replies"""
    if not hasattr(comment, 'body'):  # Skip non-comment objects
        return None
    
    # Get children IDs recursively
    children_ids = []
    if hasattr(comment, 'replies') and comment.replies:
        for reply in comment.replies:
            if hasattr(reply, 'body'):  # Make sure it's a comment
                children_ids.append(reply.id)
    
    return {
        "id": comment.id,
        "author": comment.author.name if comment.author else '[deleted]',
        "body": comment.body,
        "score": comment.score,
        "children": children_ids
    }

def extract_all_comments_recursively(comment, all_comments):
    """Recursively extract all comments and their nested replies"""
    if not hasattr(comment, 'body'):  # Skip non-comment objects
        return
    
    # Extract current comment
    comment_data = extract_comment_recursively(comment)
    if comment_data:
        all_comments.append(comment_data)
    
    # Recursively process replies
    if hasattr(comment, 'replies') and comment.replies:
        for reply in comment.replies:
            extract_all_comments_recursively(reply, all_comments)


# get the example post
submission = reddit.submission(id=random_submission['id'])
# Extract all comments recursively
comments_list = []
submission.comments.replace_more(limit=0)  # Remove "more comments" objects
for comment in submission.comments:
    extract_all_comments_recursively(comment, comments_list)

print(comments_list)

[{'id': 'nfm6ay2', 'author': 'Hans_lilly_Gruber', 'body': "Careful, clicking the link automatically launches a download of a file and there's no way of seeing what.", 'score': 3, 'children': ['nfmdkmj', 'nfmuwv6', 'nfqgdzz']}, {'id': 'nfmdkmj', 'author': 'SlyThought', 'body': 'If you open it on Brave, it opens as a PDF without needing to download it(as it does for pdf files in general).', 'score': 2, 'children': ['nfmuvel']}, {'id': 'nfmuvel', 'author': 'Valuable_Simple3860', 'body': "Yes. Pls open in browser. It's a pdf. No need to download.", 'score': 2, 'children': ['nfqgj0j', 'nfqun3s']}, {'id': 'nfqgj0j', 'author': 'littlemetal', 'body': 'How does the browser display it without downloading it?', 'score': 1, 'children': []}, {'id': 'nfqun3s', 'author': 'exiadf19', 'body': 'Bruh, when you open pdf in browser, it already download to your pc / mobile phone. Unless you have an embedded system that using a viewer', 'score': 1, 'children': []}, {'id': 'nfmuwv6', 'author': 'Valuable_Simpl

In [28]:
import yaml
import os
from langchain_google_genai import ChatGoogleGenerativeAI

def generate_llm_content(submission, comments_list, top_n_comments: int = 10) -> str:
    """
    Generate LLM-ready content by extracting top N comments and building a hierarchical structure
    """
    
    # Sort comments by score and get top N
    sorted_comments = sorted(comments_list, key=lambda x: x['score'], reverse=True)[:top_n_comments]
    
    # Create a lookup dictionary for all comments
    comment_lookup = {comment['id']: comment for comment in comments_list}
    
    def build_comment_structure(comment_data: dict) -> dict:
        """Build a comment structure with its children"""
        structure = {
            "comment": comment_data['body']
        }
        
        # Get children comments if they exist
        if comment_data['children']:
            # Build children structures recursively
            children_structures = []
            for child_id in comment_data['children']:
                if child_id in comment_lookup:
                    child_structure = build_comment_structure(comment_lookup[child_id])
                    children_structures.append(child_structure)
            
        
        return structure
    
    # Build the main structure
    llm_structure = {
        "post_title": submission.title,
        "post_body": submission.selftext,
        "children": []
    }
    
    # Find top-level comments (those that are in our top N and don't appear as children of others)
    all_child_ids = set()
    for comment in comments_list:
        all_child_ids.update(comment['children'])
    
    top_level_comments = [c for c in sorted_comments if c['id'] not in all_child_ids]
    
    # Build comment structures for top-level comments
    for comment in top_level_comments:
        comment_structure = build_comment_structure(comment)
        llm_structure["children"].append(comment_structure)
    
    # Convert to YAML and return
    return yaml.dump(llm_structure, default_flow_style=False, sort_keys=False, allow_unicode=True)

def query_gemini_with_content(yaml_content: str, prompt: str = None) -> str:
    """
    Query Google Gemini API with the YAML content using LangChain
    """
    # Get API key from environment
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("GEMINI_API_KEY environment variable not found")
    
    # Initialize the Gemini model
    llm = ChatGoogleGenerativeAI(
        model="gemini-2.5-pro",
        google_api_key=api_key,
        temperature=0.7
    )
    
    # Default prompt if none provided
    if not prompt:
        prompt = "Generate a short LinkedIn post from this conversation. Extract the debate or lightbulb moment from this thread and present it."
    
    # Combine prompt with YAML content
    full_prompt = f"{prompt}\n\nContext:\n{yaml_content}"
    
    # Query the model
    response = llm.invoke(full_prompt)
    
    return response.content

# Generate the LLM content using the new structure
llm_content = generate_llm_content(submission, comments_list, top_n_comments=10)
print("Generated YAML content:")
print("=" * 50)
print(llm_content)

# Query Gemini with the content
print("\nQuerying Gemini API...")
print("=" * 50)
try:
    gemini_response = query_gemini_with_content(
        llm_content, 
        "Generate a short LinkedIn post from this conversation. Extract the debate or lightbulb moment from this thread and present it."
    )
    print("Gemini Response:")
    print(gemini_response)
except Exception as e:
    print(f"Error querying Gemini: {e}")
    print("Make sure GEMINI_API_KEY is set in your environment and langchain-google-genai is installed")

Generated YAML content:
post_title: Google just dropped an ace 64-page guide on building AI Agents
post_body: '[Source](https://services.google.com/fh/files/misc/startup_technical_guide_ai_agents_final.pdf)'
children:
- comment: Careful, clicking the link automatically launches a download of a file
    and there's no way of seeing what.
- comment: '[Direct Link](https://services.google.com/fh/files/misc/startup_technical_guide_ai_agents_final.pdf)


    Checksums don''t match between the official direct link and the "Source" above,
    but the "Source" is substantially smaller than the official link for some reason.'


Querying Gemini API...


E0000 00:00:1758907841.134340  282749 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.


Gemini Response:
Here are a few options for a short LinkedIn post, each with a slightly different focus.

### Option 1 (Focus on Community & Security)

**Headline:** A great resource, and an even better lesson in digital trust.

Google just dropped a fantastic 64-page guide on building AI Agents, but the real lightbulb moment came from the community's response to a shared link.

The initial link being passed around wasn't the official oneâ€”it auto-downloaded a smaller, unverified file. Community members quickly flagged it, compared checksums, and provided the direct, official source.

Itâ€™s a powerful reminder: Always verify your sources, even when the content seems legitimate. A quick check is your first line of defense.

Kudos to the vigilant tech community for keeping everyone safe.

Here's the verified guide: [https://services.google.com/fh/files/misc/startup_technical_guide_ai_agents_final.pdf](https://services.google.com/fh/files/misc/startup_technical_guide_ai_agents_final.pdf