Reddit Scraper is an open-source Reddit scraping and intelligence tool that collects subreddit posts and comments from any topic or search query, then generates AI-powered reports with sentiment analysis, key themes, notable quotes, and top post rankings.
Built for developers, marketers, founders, and researchers, it combines the Decodo Web Scraping API with LLM-based analysis to turn raw Reddit discussions into structured, actionable insights without manual browsing.
- Scrape subreddits, posts, and comments. Collect Reddit discussions from any topic, keyword, or search query.
- AI-powered executive summaries. Generate structured overviews of what Reddit users are saying.
- Sentiment analysis. Understand whether discussions are positive, negative, or mixed.
- Key theme extraction. Identify recurring talking points and dominant discussion patterns.
- Notable quotes and insights. Surface representative or impactful Reddit comments automatically.
- Top post ranking. Highlight the most relevant and upvoted posts across scraped results.
- Markdown and JSON export. Download structured reports for further analysis or sharing.
- Enter a topic. Provide a prompt or research question such as "What do developers think about AI coding tools?"
- Adjust analysis options. Optionally specify subreddits, choose a time range, and control how many Reddit posts should be analyzed.
- Review the scraping plan. The LLM suggests relevant subreddits and search queries before scraping begins. You can review and edit the generated plan before running it.
- Generate an intelligence report. Reddit posts and comments are scraped through the Decodo Web Scraping API, then analyzed by the LLM to produce a structured report with summaries, themes, sentiment analysis, notable quotes, and top discussions.
- Export the results. Download the generated report as Markdown or JSON for further analysis, reporting, or sharing.
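
The review-and-edit step can be pictured with a small sketch. The field names mirror the `plan` object in the example JSON report that follows; `cleanList` and `applyPlanEdits` are hypothetical helpers written for illustration and are not taken from the project's source.

```typescript
// Field names mirror the "plan" object in the example JSON report.
interface ScrapePlan {
  prompt: string;
  subreddits: string[];
  queries: string[];
  timeRange: string; // the example uses "month"; other values are not documented here
  maxPosts: number;
}

// Trim entries, drop blanks, and remove duplicates while keeping first-seen order.
function cleanList(
  items: string[],
  normalize: (s: string) => string = (s) => s
): string[] {
  return items
    .map((s) => normalize(s.trim()))
    .filter((s, i, all) => s.length > 0 && all.indexOf(s) === i);
}

// Hypothetical helper: merge user edits into the LLM-suggested plan.
function applyPlanEdits(plan: ScrapePlan, edits: Partial<ScrapePlan>): ScrapePlan {
  const merged: ScrapePlan = { ...plan, ...edits };
  return {
    ...merged,
    // Accept both "r/name" and "name" spellings for subreddits.
    subreddits: cleanList(merged.subreddits, (s) => s.replace(/^r\//i, "")),
    queries: cleanList(merged.queries),
  };
}

const suggested: ScrapePlan = {
  prompt: "What do developers think about AI coding tools?",
  subreddits: ["webscraping", "llms", "automation", "MachineLearning"],
  queries: ['"AI coding tools"', "developers opinion AI"],
  timeRange: "month",
  maxPosts: 50,
};

const edited = applyPlanEdits(suggested, {
  subreddits: ["r/webscraping", "webscraping", " programming "],
  maxPosts: 25,
});
console.log(edited.subreddits, edited.maxPosts);
```

Normalizing the lists before scraping keeps duplicate subreddits or empty queries from turning into redundant requests.
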
Example JSON report snippet:
{
"id": "6a10077369749f74ff595437",
"plan": {
"prompt": "What do developers think about AI coding tools?",
"subreddits": [
"webscraping",
"llms",
"automation",
"MachineLearning"
],
"queries": [
"What do developers think about AI coding tools?",
"\"AI coding tools\"",
"developers opinion AI"
],
"timeRange": "month",
"maxPosts": 50
},
"posts": [
{
"id": "1px1agd",
"title": "[D] r/MachineLearning - a year in review",
"subreddit": "MachineLearning",
"author": "Everlier",
"upvotes": 202,
"commentCount": 16,
"url": "https://www.reddit.com/r/MachineLearning/comments/1px1agd/d_rmachinelearning_a_year_in_review/",
"selftext": "This is a review of most upvoted posts on this sub in 2025, loosely grouped into high-level themes...",
"createdAt": 1766851467
},
{
"id": "1tibgp5",
"title": "I built klura, a toolkit for an AI agent to reverse-engineer websites",
"subreddit": "webscraping",
"author": "rundfunk",
"upvotes": 50,
"commentCount": 19,
"url": "https://github.com/klura-ai/klura",
"selftext": "I've been working on klura — a free toolkit that gives a coding agent the ability to reverse-engineer a website...",
"createdAt": 1779252926
},
{
"id": "1qhbbll",
"title": "Software development ISN'T CHEAP! You might have unrealistic expectations for AI systems.",
"subreddit": "automation",
"author": "kshoneesh_chaudhary",
"upvotes": 10,
"commentCount": 14,
"url": "https://www.reddit.com/r/automation/comments/1qhbbll/software_development_isnt_cheap_you_might_have/",
"selftext": "AI has made it easier to build software, but it’s not as if you can magically do months worth of coding in days...",
"createdAt": 1768845240
},
{
"id": "1t9q58x",
"title": "🚨 Scrapling v0.4.8 is out with a new insane update🕷️",
"subreddit": "webscraping",
"author": "0xReaper",
"upvotes": 187,
"commentCount": 16,
"url": "https://github.com/D4Vinci/Scrapling/releases/tag/v0.4.8",
"selftext": "Introducing the new spiders templates feature to make generic spiders easier...",
"createdAt": 1778466134
}
]
}

- Bun version 1.2.5 or newer
- Docker for running local MongoDB and Redis instances
- A Decodo Web Scraping API token – create an account at dashboard.decodo.com
- At least one supported LLM provider API key:
  - Anthropic
  - OpenAI
  - Google Gemini

If you've just installed Bun, open a new terminal window or reload your shell configuration before running `bun install`. For example:
- zsh: `source ~/.zshrc`
- bash: `source ~/.bashrc`
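
Clone the repository, install dependencies, and create your `.env` file:

    git clone https://github.com/Decodo/reddit-scraper
    cd reddit-scraper
    bun install
    cp .env.example .env

Then add the following to `.env`:
- Decodo API token
- LLM provider selection
- Anthropic, OpenAI, or Gemini API key

Start the local databases:

    bun db:up

This starts MongoDB and Redis through Docker Compose.

Start the development servers:

    bun dev

- Frontend: http://localhost:5274
- Backend API: http://localhost:5002
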
After creating your .env file during installation, add your API credentials and provider settings:

    # Decodo Scraping API
    DECODO_BASIC_AUTH_TOKEN=your_decodo_token
    # LLM provider (claude | openai | gemini)
    LLM_PROVIDER=claude
    LLM_MODEL=
    # LLM API keys
    ANTHROPIC_API_KEY=sk-ant-...
    OPENAI_API_KEY=sk-...
    GEMINI_API_KEY=AIza...

| Variable | Description |
|---|---|
| `DECODO_BASIC_AUTH_TOKEN` | Decodo Scraping API authentication token |
| `LLM_PROVIDER` | LLM provider to use (`claude`, `openai`, or `gemini`) |
| `LLM_MODEL` | Optional custom model override |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `GEMINI_API_KEY` | Google Gemini API key |

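
As a rough illustration of how these variables fit together, the sketch below picks the API key that matches `LLM_PROVIDER` and fails early when something is missing. It is not the project's actual configuration code: `resolveLlmConfig` is a hypothetical helper, and the rule that only the selected provider's key is required is an assumption.

```typescript
type LlmProvider = "claude" | "openai" | "gemini";

// Maps each LLM_PROVIDER value to the key variable from the table above.
const KEY_VARIABLE: Record<LlmProvider, string> = {
  claude: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
};

interface LlmConfig {
  provider: LlmProvider;
  apiKey: string;
  model: string | undefined; // undefined means "use the provider default"
}

type Env = Record<string, string | undefined>;

function isProvider(value: string | undefined): value is LlmProvider {
  return value === "claude" || value === "openai" || value === "gemini";
}

// Hypothetical helper; ASSUMPTION: only the selected provider's key is required.
function resolveLlmConfig(env: Env): LlmConfig {
  if (!env["DECODO_BASIC_AUTH_TOKEN"]) {
    throw new Error("DECODO_BASIC_AUTH_TOKEN is not set");
  }
  const provider = env["LLM_PROVIDER"];
  if (!isProvider(provider)) {
    throw new Error("LLM_PROVIDER must be claude, openai, or gemini");
  }
  const keyVariable = KEY_VARIABLE[provider];
  const apiKey = env[keyVariable];
  if (!apiKey) {
    throw new Error(keyVariable + " is required when LLM_PROVIDER=" + provider);
  }
  // An empty LLM_MODEL is treated as "no override".
  const model = env["LLM_MODEL"] ? env["LLM_MODEL"] : undefined;
  return { provider, apiKey, model };
}

// Example with placeholder values; a real app would pass its process environment.
const sampleConfig = resolveLlmConfig({
  DECODO_BASIC_AUTH_TOKEN: "your_decodo_token",
  LLM_PROVIDER: "claude",
  LLM_MODEL: "",
  ANTHROPIC_API_KEY: "sk-ant-example",
});
console.log(sampleConfig.provider, sampleConfig.model);
```

Validating at startup surfaces a missing key as a clear configuration error instead of a failed LLM request in the middle of an analysis.
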

| Command | Description |
|---|---|
| `bun dev` | Start frontend and backend development servers |
| `bun build` | Build all application packages |
| `bun lint` | Run linting across all packages |
| `bun db:up` | Start MongoDB and Redis via Docker Compose |
| `bun db:down` | Stop local database containers |

| Layer | Technology |
|---|---|
| Frontend | React 19, TanStack Router, TanStack Query, Tailwind CSS v4, Radix UI |
| Backend | NestJS 11, MongoDB, Mongoose |
| Scraping | Decodo Web Scraping API |
| LLMs | Anthropic Claude, OpenAI GPT, Google Gemini |

    .github/
      images/              # README screenshots and repository assets
    apps/
      frontend/
        src/
          features/
            tracker/       # Prompt flow, report generation, API hooks
            queries/       # Query history features
            settings/      # Runtime settings and API key management
          routes/
            _layout/
              tracker.tsx
              history.tsx
              history.$id.tsx
              settings.tsx
      backend/
        src/
          features/
            tracker/       # Scraping and report generation endpoints
            decodo/        # Decodo Web Scraping API integration
            llm/           # Claude, OpenAI, and Gemini abstraction layer
            queries/       # Query history persistence
            settings/      # Runtime provider and API configuration
      shared/              # Shared TypeScript types
    docs/

For additional information about scraping targets, parameters, and API behavior, see the Decodo Web Scraping API documentation.

| Method | Path | Description |
|---|---|---|
| POST | `/tracker/plan` | Generate a scraping plan from a user prompt |
| POST | `/tracker/analyze` | Scrape Reddit and generate an AI report |
| GET | `/queries` | List saved query history |
| GET | `/queries/:id` | Retrieve a full query result |
| DELETE | `/queries/:id` | Delete a saved query |
| GET | `/settings` | Retrieve current provider and API key status |
| PATCH | `/settings` | Update API keys or LLM provider |

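
A minimal sketch of how a client might address these routes is shown below. The methods and paths come from the table and the base URL from the local setup, but the request body for `/tracker/plan` (`{ prompt }`) is an assumption, since the payload schema is not documented here. `buildRequest` and the small wrappers around it are hypothetical names.

```typescript
// Backend address from the local setup; adjust if your deployment differs.
const BACKEND_URL = "http://localhost:5002";

interface ApiRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Build a JSON request description for one of the documented routes.
function buildRequest(method: string, path: string, payload?: unknown): ApiRequest {
  const init: ApiRequest["init"] = {
    method,
    headers: { Accept: "application/json" },
  };
  if (payload !== undefined) {
    init.headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(payload);
  }
  return { url: BACKEND_URL + path, init };
}

// ASSUMPTION: the plan endpoint accepts { prompt }; the real body schema may differ.
const planRequest = (prompt: string): ApiRequest =>
  buildRequest("POST", "/tracker/plan", { prompt });

const listQueries = (): ApiRequest => buildRequest("GET", "/queries");

// IDs are URL-encoded before being placed in the path.
const getQuery = (id: string): ApiRequest =>
  buildRequest("GET", "/queries/" + encodeURIComponent(id));

const deleteQuery = (id: string): ApiRequest =>
  buildRequest("DELETE", "/queries/" + encodeURIComponent(id));

console.log(planRequest("What do developers think about AI coding tools?"));
```

In Bun or Node 18+, each result can be passed to the built-in client as `fetch(request.url, request.init)`. Keeping request construction separate from sending makes the routes easy to check without a running backend.
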
Each analysis combines three Decodo scraping targets to collect Reddit data at different levels:

| Step | Target | Purpose |
|---|---|---|
| 1 | `universal` | Search Reddit globally across multiple subreddits |
| 2 | `reddit_subreddit` | Collect trending and relevant subreddit posts |
| 3 | `reddit_post` | Extract full comment threads for deeper analysis |

The LLM generates relevant subreddits and search queries based on the user's prompt. Up to 30 posts are collected, deduplicated, and ranked by engagement before the top results are scraped with full comment threads for AI-powered analysis and summarization.
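
The deduplicate-and-rank step described above could be sketched roughly as follows. The field names follow the example JSON report, and the default limit of 30 comes from the text. The engagement score (upvotes plus comment count) is an assumption, because the actual ranking formula is not stated, and `dedupeAndRank` is a hypothetical name.

```typescript
// Subset of the post fields shown in the example JSON report.
interface RedditPost {
  id: string;
  title: string;
  subreddit: string;
  upvotes: number;
  commentCount: number;
}

// ASSUMPTION: engagement is upvotes plus comments; the real formula may differ.
function engagement(post: RedditPost): number {
  return post.upvotes + post.commentCount;
}

// Drop posts with a repeated id, sort by engagement (highest first), keep the top `limit`.
function dedupeAndRank(posts: RedditPost[], limit: number = 30): RedditPost[] {
  const seen: Record<string, boolean> = {};
  const unique: RedditPost[] = [];
  for (const post of posts) {
    if (Object.prototype.hasOwnProperty.call(seen, post.id)) continue;
    seen[post.id] = true;
    unique.push(post);
  }
  // `unique` is a fresh array, so sorting it does not mutate the caller's input.
  return unique.sort((a, b) => engagement(b) - engagement(a)).slice(0, limit);
}

// Posts taken from the example report; the same post can arrive from two searches.
const collected: RedditPost[] = [
  { id: "1tibgp5", title: "I built klura", subreddit: "webscraping", upvotes: 50, commentCount: 19 },
  { id: "1px1agd", title: "A year in review", subreddit: "MachineLearning", upvotes: 202, commentCount: 16 },
  { id: "1tibgp5", title: "I built klura", subreddit: "webscraping", upvotes: 50, commentCount: 19 },
  { id: "1t9q58x", title: "Scrapling v0.4.8", subreddit: "webscraping", upvotes: 187, commentCount: 16 },
];

const ranked = dedupeAndRank(collected);
console.log(ranked.map((p) => p.id)); // ids in order: 1px1agd, 1t9q58x, 1tibgp5
```

Deduplicating by post `id` matters because the global search and the per-subreddit listings can return the same thread, and each duplicate would otherwise trigger a redundant comment scrape.
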
- Decodo Web Scraping API
- Decodo Web Scraping API documentation
- Decodo SDK for TypeScript
- Decodo MCP Server
- Decodo OpenClaw Skill
MIT — see LICENSE

