Based on the following video: [How-to Use The Reddit API in Python by James Briggs](https://www.youtube.com/watch?v=FdjVoOf9HN4)
1. Create a Reddit application [here](https://www.reddit.com/prefs/apps) to get a `client_id` and `client_secret`.


# Example workflow

## Step 1: Find a Trending Post
Use Reddit's sort orders (such as hot, trending, or controversial) to find a post whose discussion is insightful. [Example post](https://www.reddit.com/r/mcp/comments/1nculrw/why_are_mcps_needed_for_basic_tools_like/)
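
As a rough sketch of this step: PRAW exposes one listing method per sort order on a subreddit (`hot`, `top`, `rising`, `controversial`, `new`). There is no listing literally called "trending", so `rising` is assumed here as the nearest equivalent. The helper name `list_candidate_posts` is hypothetical, and it assumes an already-authenticated `praw.Reddit` client is passed in.

```python
from typing import Any, Dict, List

# Sort orders that PRAW exposes as listing methods on a subreddit.
SORTS = ("hot", "top", "rising", "controversial", "new")


def list_candidate_posts(
    reddit: Any, subreddit: str, sort: str = "hot", limit: int = 10
) -> List[Dict[str, Any]]:
    """Return id, title, score and comment count for posts in one listing.

    `reddit` is assumed to be a praw.Reddit instance (hypothetical helper).
    """
    if sort not in SORTS:
        raise ValueError(f"sort must be one of {SORTS}, got {sort!r}")
    # Look up the listing method by name, e.g. reddit.subreddit("mcp").hot
    listing = getattr(reddit.subreddit(subreddit), sort)
    return [
        {"id": s.id, "title": s.title, "score": s.score, "num_comments": s.num_comments}
        for s in listing(limit=limit)
    ]
```

For example, `list_candidate_posts(reddit, "mcp", sort="top", limit=5)` returns five candidates; the `id` of the chosen one can then be passed to `reddit.submission(id=...)`.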

## Step 2: Convert the Post Into a YAML Summary
Apply a rule-based system to generate a snapshot of the post. For example:
- Score each comment: `score = upvotes + replies * 10`
- Output the top 10 comments plus any parents
- Generate the YAML to use as input for the LLM

```yaml
post_title: "Why are MCPs needed for basic tools like filesystem access, git etc?"
post_body: "Im puzzled why an MCP (that too written in TS) is required for something as basic as reading, listing files etc. In my experience, the LLM has no problem in doing these tasks without any reliability concerns without any MCP. Likewise for git and even gh."
children:
    - comment: "MCP servers aren't NEEDED for that, but by using it, then you can re-use that functionality across clients vs implementing it each time in the client. That being said, the embedded functionality would likely work faster/better because of the tighter integration. So basically, you pick your sensitivity to effort cost and take the appropriate route."
    - comment: "LLMs can generate commands but can't actually execute them locally. They have no access to your local resources (e.g. files, apps). Simply put, MCP bridges that gap so the LLM can actually interact with your real environment instead of you having to copy paste commands back and forth. Additionally, they serve as the building blocks for agentic workflows."
      children:
        - comment: "this is the right answer, truly. u/rm-rf-rm your fundamental assumption is incorrect - \"LLM ha sno problem in doing these\" is incorrect. LLMs can say \"list directories in ...\" but it then has to be a tool call by the MCP client that performs that. the question maybe then becomes - should capabilities like file system be implemented by MCP clients by default or be provided via MCPs."
        - comment: "u/rm-rf-rm might have eval {llm_response} where llm_respose is something like git log. In that case s/he might be correct that llm + sh love can do that"
    - comment: "Sure but I think the bigger question is why a separate MCP for Git, LS etc. and not just give the ability to run Bash commands"
      children:
        - comment: "One reason is that the MCP server is self-contained, therefore decoupled from the user's environment and OS specifics. As an example, a git command execution via shell would fail if git is not installed or is not in the path. Another is that the MCP layer allows the LLM to operate at the git/version abstraction level not only the shell invocation one, allowing for more precision in terms of selecting the correct tool and its proper usage."
    - comment: "Maybe you are referring to using an AI powered coding tool? Those have AI tools built for reading files, accessing the shell, etc. the power of MCP is you don't need to build a tool that is restricted to a specific AI coding tool, you can just expose the MCP server remotely and any LLM that supports MCP can access it."
    - comment: "They're not needed; most clients have built-in tools for filesystem access. For git, you can just tell the agent to use the \"gh\" command. Same for anything where a command exists."
      children:
        - comment: "For git, you can just tell the agent to use the \"gh\" command gh does not perform most git actions (add, commit, push, etc.) - instead, it is for github-specific actions which aren't normally handled by the 'git' CLI app. Same for anything where a command exists. Agents may be able to use CLI commands such as git, yes, but they work better with the git MCP etc. most clients have built-in tools for filesystem access. You may be correct about this though."
```
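
The scoring and selection rules in this step can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the `Node` type and `rank_comments` name are hypothetical, and "replies" is assumed to mean direct replies only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical minimal comment record used only for this sketch."""
    id: str
    upvotes: int
    parent_id: Optional[str] = None


def rank_comments(
    comments: Dict[str, Node], top_n: int = 10, reply_weight: int = 10
) -> List[str]:
    """Return ids of the top_n comments by score, plus all of their ancestors."""
    # Count direct replies per comment (assumption: nested replies are not counted).
    reply_counts = {cid: 0 for cid in comments}
    for c in comments.values():
        if c.parent_id in reply_counts:
            reply_counts[c.parent_id] += 1

    # score = upvotes + replies * 10
    def score(cid: str) -> int:
        return comments[cid].upvotes + reply_counts[cid] * reply_weight

    top = sorted(comments, key=score, reverse=True)[:top_n]

    # Walk up from each selected comment so its parents are kept for context.
    keep = set(top)
    for cid in top:
        parent = comments[cid].parent_id
        while parent is not None and parent in comments and parent not in keep:
            keep.add(parent)
            parent = comments[parent].parent_id

    # Preserve the original insertion order of the input dictionary.
    return [cid for cid in comments if cid in keep]
```

Weighting replies by 10 favors comments that started a discussion over comments that only collected upvotes, which fits the goal of surfacing a debate.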

## Step 3: Query the LLM
```yaml
prompt: "generate a short linkedin post from this conversation. Extract the debate or lightbulb moment from this thread and present it."
context: {article_and_comments}
```
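
One way to wire this up is sketched below. It only renders the message; the actual model call is left to a caller-supplied function, because the source does not say which LLM or client library is used. The names `build_llm_input`, `query_llm`, and `call_llm` are hypothetical.

```python
from typing import Callable

# The prompt from this step, copied verbatim.
PROMPT = (
    "generate a short linkedin post from this conversation. "
    "Extract the debate or lightbulb moment from this thread and present it."
)


def build_llm_input(article_and_comments: str, prompt: str = PROMPT) -> str:
    """Combine the instruction and the YAML snapshot into one message."""
    return f"{prompt}\n\nContext:\n{article_and_comments}"


def query_llm(article_and_comments: str, call_llm: Callable[[str], str]) -> str:
    """Send the rendered message through a caller-supplied LLM function.

    `call_llm` is an assumption: any function that takes a prompt string
    and returns the model's text response.
    """
    return call_llm(build_llm_input(article_and_comments))
```

Keeping the model call behind a plain callable lets the same snapshot be tried against different providers without touching the prompt-building code.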

## Step 4: Render the Output

**🔧 The Great MCP Debate: Building Blocks vs. Built-ins**

Had a fascinating discussion about why we need Model Context Protocol (MCP) servers for "basic" tasks like file operations and git commands. The lightbulb moment? It's not about what LLMs *can* generate—it's about execution and reusability.

**The core insight:** LLMs can write perfect bash commands, but they can't actually *run* them. That's where MCPs bridge the gap between AI suggestions and real system interaction.

**But here's the real debate:** Should we have specialized MCPs for git, filesystem, etc., or just give LLMs direct shell access?

**The case for specialized MCPs:**
• Abstraction over raw commands (git operations vs. shell invocations)  
• Environment-agnostic (works regardless of OS or installed tools)
• Reusable across different AI clients
• Better error handling and context

**The case for shell access:**
• Simpler implementation
• Leverages existing tooling
• More flexible for edge cases

The thread revealed a fundamental tension in AI tooling: granular, safe abstractions vs. powerful, direct access. Both approaches have merit depending on your use case and risk tolerance.

What's your take? Specialized tools or universal shell access for AI agents?

#AI #MCP #DeveloperTools #LLM #Automation


In [39]:
import praw
import dotenv
import os
import pandas as pd

dotenv.load_dotenv()
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
REDDIT_USERNAME = os.getenv("REDDIT_USERNAME")
REDDIT_PASSWORD = os.getenv("REDDIT_PASSWORD")

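# Authenticate with the Reddit API.
# No username/password is passed, so this client is read-only.
reddit = praw.Reddit(
    client_id = REDDIT_CLIENT_ID,
    client_secret = REDDIT_CLIENT_SECRET,
    # Reddit asks for a unique, descriptive user agent; the app name here is a placeholder.
    user_agent = f"python:reddit-post-summary:v0.1 (by u/{REDDIT_USERNAME})"
)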

In [40]:
# Fetch the example post by its ID (taken from the post URL)
submission = reddit.submission(id="1nculrw")

In [None]:
# Convert the extracted comment tree into a flat dictionary keyed by comment ID, so it can be sorted

from pydantic import BaseModel
from typing import Optional, Dict

class Comment(BaseModel):
    id: str
    author: str
    body: str
    score: int
    created_utc: float
    parent_id: Optional[str] = None
    depth: int = 0


class Post(BaseModel):
    title: str
    selftext: str
    score: int
    upvote_ratio: float
    num_comments: int
    created_utc: float
    url: str
    id: str
    comments: Dict[str, Comment] = {}
    
    def get_top_comments(self, n: int = 10) -> list[Comment]:
        """Get top N comments sorted by score descending"""
        return sorted(self.comments.values(), key=lambda x: x.score, reverse=True)[:n]
    
    def get_top_level_comments(self, n: int = 10) -> list[Comment]:
        """Get top N top-level comments (no parent) sorted by score"""
        top_level = [c for c in self.comments.values() if c.parent_id is None]
        return sorted(top_level, key=lambda x: x.score, reverse=True)[:n]
    
    def get_replies(self, parent_id: str) -> list[Comment]:
        """Get all direct replies to a specific comment"""
        return [c for c in self.comments.values() if c.parent_id == parent_id]
    
    def get_comment(self, comment_id: str) -> Optional[Comment]:
        """Get a specific comment by ID"""
        return self.comments.get(comment_id)


def extract_comments_flat(comment, parent_id=None, depth=0, comment_dict=None, max_depth=3):
    """Recursively extract comments into a flattened dictionary with parent references"""
    if depth >= max_depth or comment_dict is None:
        return
    
    # Add current comment to the flat dictionary
    comment_data = Comment(
        id = comment.id,
        author = str(comment.author) if comment.author else '[deleted]',
        body = comment.body,
        score = comment.score,
        created_utc = comment.created_utc,
        parent_id = parent_id,
        depth = depth
    )
    comment_dict[comment.id] = comment_data
    
    # Process replies if they exist and we haven't reached max depth
    if hasattr(comment, 'replies') and comment.replies and depth < max_depth - 1:
        for reply in comment.replies:
            if hasattr(reply, 'body'):  # Make sure it's actually a comment
                extract_comments_flat(reply, comment.id, depth + 1, comment_dict, max_depth)

post_data = Post(
    title = submission.title,
    selftext = submission.selftext,
    score = submission.score,
    upvote_ratio = submission.upvote_ratio,
    num_comments = submission.num_comments,
    created_utc = submission.created_utc,
    url = submission.url,
    id = submission.id,
    comments = {}
)

# Process top-level comments and flatten them
submission.comments.replace_more(limit=0)  # Remove "more comments" objects
for comment in submission.comments:
    if hasattr(comment, 'body'):  # Make sure it's actually a comment
        extract_comments_flat(comment, parent_id=None, depth=0, comment_dict=post_data.comments, max_depth=3)


In [None]:
# Find the top comments, then compile them into a YAML string for the LLM

import yaml

def generate_llm_content(post: Post, top_n_comments: int = 10) -> str:
    """
    Generate LLM-ready YAML: the post plus every top-level comment that ranks
    in the top N by score, with all of its extracted replies nested under it.
    """
    
    # Get top N comments sorted by score
    top_comments = post.get_top_comments(top_n_comments)
    
    # Build the hierarchical structure
    def build_comment_structure(comment: Comment) -> dict:
        """Build a comment structure with its children"""
        structure = {
            "comment": comment.body,
            "score": comment.score,
            "author": comment.author
        }
        
        # Get direct replies to this comment
        replies = post.get_replies(comment.id)
        if replies:
            # Sort replies by score and add them as children
            sorted_replies = sorted(replies, key=lambda x: x.score, reverse=True)
            structure["children"] = [build_comment_structure(reply) for reply in sorted_replies]
        
        return structure
    
    # Build the main structure
    llm_structure = {
        "post_title": post.title,
        "post_body": post.selftext,
        "post_score": post.score,
        "children": []
    }
    
    # Add top-level comments only (those without parents) from our top N list
    top_level_from_top = [c for c in top_comments if c.parent_id is None]
    
    for comment in top_level_from_top:
        comment_structure = build_comment_structure(comment)
        llm_structure["children"].append(comment_structure)
    
    # Convert to YAML and return
    return yaml.dump(llm_structure, default_flow_style=False, sort_keys=False, allow_unicode=True)

# Generate the LLM content
llm_content = generate_llm_content(post_data, top_n_comments=10)
print(llm_content)

post_title: Why are MCPs needed for basic tools like filesystem access, git etc?
post_body: 'Im puzzled why an MCP (that too written in TS) is required for something
  as basic as reading, listing files etc. In my experience, the LLM has no problem
  in doing these tasks without any reliability concerns without any MCP. Likewise
  for git and even gh. '
post_score: 3
children:
- comment: 'MCP servers aren''t NEEDED for that, but by using it, then you can re-use
    that functionality across clients vs implementing it each time in the client.    That
    being said, the embedded functionality would likely work faster/better because
    of the tighter integration.


    So basically, you pick your sensitivity to effort cost and take the appropriate
    route.'
  score: 9
  author: PhilWheat
- comment: LLMs can generate commands but can't actually execute them locally. They
    have no access to your local resources (e.g. files, apps). Simply put, MCP bridges
    that gap so the LLM can a