Reddit Scraper

Reddit Scraper is an open-source Reddit scraping and intelligence tool that collects subreddit posts and comments for any topic or search query, then generates AI-powered reports with sentiment analysis, key themes, notable quotes, and top post rankings.

Built for developers, marketers, founders, and researchers, it combines the Decodo Web Scraping API with LLM-based analysis to turn raw Reddit discussions into structured, actionable insights without manual browsing.

Features

Scrape subreddits, posts, and comments. Collect Reddit discussions from any topic, keyword, or search query.
AI-powered executive summaries. Generate structured overviews of what Reddit users are saying.
Sentiment analysis. Understand whether discussions are positive, negative, or mixed.
Key theme extraction. Identify recurring talking points and dominant discussion patterns.
Notable quotes and insights. Surface representative or impactful Reddit comments automatically.
Top post ranking. Highlight the most relevant and upvoted posts across scraped results.
Markdown and JSON export. Download structured reports for further analysis or sharing.

How it works

Enter a topic. Provide a prompt or research question, such as "What do developers think about AI coding tools?"
Adjust analysis options. Optionally specify subreddits, choose a time range, and control how many Reddit posts should be analyzed.
Review the scraping plan. The LLM suggests relevant subreddits and search queries before scraping begins. You can review and edit the generated plan before running it.
Generate an intelligence report. Reddit posts and comments are scraped through the Decodo Web Scraping API, then analyzed by the LLM to produce a structured report with summaries, themes, sentiment analysis, notable quotes, and top discussions.
Export the results. Download the generated report as Markdown or JSON for further analysis, reporting, or sharing.
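
The export step can be pictured with a short sketch. The function below renders a prompt and a list of posts as a small Markdown document. The `renderMarkdown` name, the `ReportPost` type, and the output layout are illustrative assumptions rather than the project's actual exporter; only the field names are taken from the example JSON report in this README.

```typescript
// Hypothetical sketch: not the project's actual exporter.
// Field names follow the example JSON report in this README.
interface ReportPost {
  title: string;
  subreddit: string;
  upvotes: number;
  commentCount: number;
  url: string;
}

// Render a prompt and its posts as a small Markdown document.
function renderMarkdown(prompt: string, posts: ReportPost[]): string {
  const lines: string[] = [`# Reddit report: ${prompt}`, "", "## Top posts", ""];
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...posts].sort((a, b) => b.upvotes - a.upvotes);
  ranked.forEach((post, index) => {
    lines.push(
      `${index + 1}. [${post.title}](${post.url}) (r/${post.subreddit}, ` +
        `${post.upvotes} upvotes, ${post.commentCount} comments)`,
    );
  });
  return lines.join("\n");
}
```

A JSON export needs no rendering step at all: `JSON.stringify(report, null, 2)` on the report object would produce output shaped like the example snippet.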

Example report

Example JSON report snippet:

{
  "id": "6a10077369749f74ff595437",
  "plan": {
    "prompt": "What do developers think about AI coding tools?",
    "subreddits": [
      "webscraping",
      "llms",
      "automation",
      "MachineLearning"
    ],
    "queries": [
      "What do developers think about AI coding tools?",
      "\"AI coding tools\"",
      "developers opinion AI"
    ],
    "timeRange": "month",
    "maxPosts": 50
  },
  "posts": [
    {
      "id": "1px1agd",
      "title": "[D] r/MachineLearning - a year in review",
      "subreddit": "MachineLearning",
      "author": "Everlier",
      "upvotes": 202,
      "commentCount": 16,
      "url": "https://www.reddit.com/r/MachineLearning/comments/1px1agd/d_rmachinelearning_a_year_in_review/",
      "selftext": "This is a review of most upvoted posts on this sub in 2025, loosely grouped into high-level themes...",
      "createdAt": 1766851467
    },
    {
      "id": "1tibgp5",
      "title": "I built klura, a toolkit for an AI agent to reverse-engineer websites",
      "subreddit": "webscraping",
      "author": "rundfunk",
      "upvotes": 50,
      "commentCount": 19,
      "url": "https://github.com/klura-ai/klura",
      "selftext": "I've been working on klura — a free toolkit that gives a coding agent the ability to reverse-engineer a website...",
      "createdAt": 1779252926
    },
    {
      "id": "1qhbbll",
      "title": "Software development ISN'T CHEAP! You might have unrealistic expectations for AI systems.",
      "subreddit": "automation",
      "author": "kshoneesh_chaudhary",
      "upvotes": 10,
      "commentCount": 14,
      "url": "https://www.reddit.com/r/automation/comments/1qhbbll/software_development_isnt_cheap_you_might_have/",
      "selftext": "AI has made it easier to build software, but it’s not as if you can magically do months worth of coding in days...",
      "createdAt": 1768845240
    },
    {
      "id": "1t9q58x",
      "title": "🚨 Scrapling v0.4.8 is out with a new insane update🕷️",
      "subreddit": "webscraping",
      "author": "0xReaper",
      "upvotes": 187,
      "commentCount": 16,
      "url": "https://github.com/D4Vinci/Scrapling/releases/tag/v0.4.8",
      "selftext": "Introducing the new spiders templates feature to make generic spiders easier...",
      "createdAt": 1778466134
    }
  ]
}
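
For anyone consuming the JSON export in code, the snippet above suggests the following TypeScript shapes. These types are inferred from the example only, so the real schema may carry additional fields (the AI-generated summary, themes, and sentiment are not shown in the snippet). The `summarizeBySubreddit` helper is a hypothetical illustration of working with the data.

```typescript
// Types inferred from the example snippet; the real schema may include more fields.
interface RedditPost {
  id: string;
  title: string;
  subreddit: string;
  author: string;
  upvotes: number;
  commentCount: number;
  url: string;
  selftext: string;
  createdAt: number; // looks like Unix time in seconds (assumption)
}

interface ScrapePlan {
  prompt: string;
  subreddits: string[];
  queries: string[];
  timeRange: string;
  maxPosts: number;
}

interface Report {
  id: string;
  plan: ScrapePlan;
  posts: RedditPost[];
}

type SubredditSummary = { posts: number; upvotes: number };

// Hypothetical helper: count posts and sum upvotes per subreddit.
function summarizeBySubreddit(report: Report): Map<string, SubredditSummary> {
  const summary = new Map<string, SubredditSummary>();
  for (const post of report.posts) {
    const entry = summary.get(post.subreddit) ?? { posts: 0, upvotes: 0 };
    entry.posts += 1;
    entry.upvotes += post.upvotes;
    summary.set(post.subreddit, entry);
  }
  return summary;
}
```

Applied to the example report, this would give `webscraping` two posts and 237 upvotes (50 + 187).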

Prerequisites

Bun version 1.2.5 or newer
Docker for running local MongoDB and Redis instances
A Decodo Web Scraping API token – create an account at dashboard.decodo.com
At least one supported LLM provider API key:
- Anthropic
- OpenAI
- Google Gemini

If you've just installed Bun, open a new terminal window or reload your shell configuration before running bun install.

Example:

zsh: source ~/.zshrc

bash: source ~/.bashrc

Installation

1. Clone the repository

git clone https://github.com/Decodo/reddit-scraper
cd reddit-scraper

2. Install dependencies

bun install

3. Configure environment variables

cp .env.example .env

Add your:

Decodo API token
LLM provider selection
Anthropic, OpenAI, or Gemini API key

4. Start local databases

bun db:up

This starts MongoDB and Redis through Docker Compose.

5. Start the Reddit Scraper

bun dev

Frontend:

http://localhost:5274

Backend API:

http://localhost:5002

Configuration

After creating your .env file during installation, add your API credentials and provider settings:

# Decodo Scraping API
DECODO_BASIC_AUTH_TOKEN=your_decodo_token

# LLM provider (claude | openai | gemini)
LLM_PROVIDER=claude
LLM_MODEL=

# LLM API keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...

Environment variables

Variable	Description
`DECODO_BASIC_AUTH_TOKEN`	Decodo Scraping API authentication token
`LLM_PROVIDER`	LLM provider to use (`claude`, `openai`, or `gemini`)
`LLM_MODEL`	Optional custom model override
`ANTHROPIC_API_KEY`	Anthropic API key
`OPENAI_API_KEY`	OpenAI API key
`GEMINI_API_KEY`	Google Gemini API key
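
The relationship between `LLM_PROVIDER` and the three API key variables can be made explicit in code. The sketch below is an assumption about how such validation might look, not the backend's actual configuration loader; `resolveConfig`, `LlmConfig`, and the error messages are hypothetical names. It assumes only the key matching the selected provider is required.

```typescript
// Hypothetical sketch: the project's actual config loading may differ.
type Provider = "claude" | "openai" | "gemini";

// Which API key variable each LLM_PROVIDER value needs, per the table above.
const KEY_FOR_PROVIDER: Record<Provider, string> = {
  claude: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
};

interface LlmConfig {
  provider: Provider;
  apiKey: string;
  model?: string; // undefined when LLM_MODEL is empty or unset
  decodoToken: string;
}

function isProvider(value: string): value is Provider {
  return Object.keys(KEY_FOR_PROVIDER).includes(value);
}

// Validate an environment map and pick the key for the selected provider.
function resolveConfig(env: Record<string, string | undefined>): LlmConfig {
  const provider = env["LLM_PROVIDER"] ?? "";
  if (!isProvider(provider)) {
    throw new Error(`LLM_PROVIDER must be claude, openai, or gemini (got "${provider}")`);
  }
  const keyName = KEY_FOR_PROVIDER[provider];
  const apiKey = env[keyName];
  if (!apiKey) {
    throw new Error(`${keyName} is required when LLM_PROVIDER=${provider}`);
  }
  const decodoToken = env["DECODO_BASIC_AUTH_TOKEN"];
  if (!decodoToken) {
    throw new Error("DECODO_BASIC_AUTH_TOKEN is required");
  }
  return { provider, apiKey, model: env["LLM_MODEL"] || undefined, decodoToken };
}
```

In a Bun or Node process this could be called as `resolveConfig(process.env)` at startup, so a missing key fails fast instead of surfacing during the first analysis.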

Scripts

Command	Description
`bun dev`	Start frontend and backend development servers
`bun build`	Build all application packages
`bun lint`	Run linting across all packages
`bun db:up`	Start MongoDB and Redis via Docker Compose
`bun db:down`	Stop local database containers

Tech stack

Layer	Technology
Frontend	React 19, TanStack Router, TanStack Query, Tailwind CSS v4, Radix UI
Backend	NestJS 11, MongoDB, Mongoose
Scraping	Decodo Web Scraping API
LLMs	Anthropic Claude, OpenAI GPT, Google Gemini

Project structure

.github/
  images/         # README screenshots and repository assets

apps/
  frontend/
    src/
      features/
        tracker/      # Prompt flow, report generation, API hooks
        queries/      # Query history features
        settings/     # Runtime settings and API key management

      routes/
        _layout/
          tracker.tsx
          history.tsx
          history.$id.tsx
          settings.tsx

  backend/
    src/
      features/
        tracker/      # Scraping and report generation endpoints
        decodo/       # Decodo Web Scraping API integration
        llm/          # Claude, OpenAI, and Gemini abstraction layer
        queries/      # Query history persistence
        settings/     # Runtime provider and API configuration

  shared/
    # Shared TypeScript types

docs/

Documentation

For additional information about scraping targets, parameters, and API behavior, see the Decodo Web Scraping API documentation.

API endpoints

Method	Path	Description
POST	`/tracker/plan`	Generate a scraping plan from a user prompt
POST	`/tracker/analyze`	Scrape Reddit and generate an AI report
GET	`/queries`	List saved query history
GET	`/queries/:id`	Retrieve a full query result
DELETE	`/queries/:id`	Delete a saved query
GET	`/settings`	Retrieve current provider and API key status
PATCH	`/settings`	Update API keys or LLM provider
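
As a rough illustration of how the two tracker endpoints fit together, the sketch below requests a plan and then submits it for analysis. The paths and the backend URL come from this README; the request and response body shapes (`{ prompt }` for the plan, and the plan sent back unchanged for analysis) are assumptions, as are the `postJson` and `runAnalysis` names. The fetch implementation is passed in so the sketch has no dependencies.

```typescript
// Hypothetical client sketch. Paths come from the table above; the request and
// response body shapes are assumptions, so check the backend before relying on them.
const BASE_URL = "http://localhost:5002"; // backend URL from the Installation section

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

// POST a JSON body and return the parsed JSON response.
async function postJson(fetchImpl: FetchLike, path: string, body: unknown): Promise<unknown> {
  const response = await fetchImpl(`${BASE_URL}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`POST ${path} failed with status ${response.status}`);
  }
  return response.json();
}

// Plan first, then analyze, following the review-then-run flow in "How it works".
async function runAnalysis(fetchImpl: FetchLike, prompt: string): Promise<unknown> {
  const plan = await postJson(fetchImpl, "/tracker/plan", { prompt }); // assumed body shape
  // A real client would let the user review and edit the plan at this point.
  return postJson(fetchImpl, "/tracker/analyze", plan); // assumed: plan is sent back as-is
}
```

With the development servers running, `runAnalysis(fetch, "What do developers think about AI coding tools?")` would exercise both endpoints using the runtime's built-in `fetch`.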

Scraping strategy

Each analysis combines three Decodo scraping targets to collect Reddit data at different levels:

Step	Target	Purpose
1	`universal`	Search Reddit globally across multiple subreddits
2	`reddit_subreddit`	Collect trending and relevant subreddit posts
3	`reddit_post`	Extract full comment threads for deeper analysis

The LLM generates relevant subreddits and search queries based on the user's prompt. Up to 30 posts are collected, deduplicated, and ranked by engagement before the top results are scraped with full comment threads for AI-powered analysis and summarization.
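
The deduplicate-and-rank step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's implementation: the source does not define "engagement", so the score here (upvotes plus comment count) is a guess, and applying the 30-post cap after ranking is also an assumption. `dedupeAndRank` and `CollectedPost` are hypothetical names.

```typescript
// Hypothetical sketch of the dedupe-and-rank step. The engagement score
// (upvotes plus comment count) is an assumption; the README does not define it.
interface CollectedPost {
  id: string;
  upvotes: number;
  commentCount: number;
}

const MAX_POSTS = 30; // "Up to 30 posts" per the paragraph above

function dedupeAndRank<T extends CollectedPost>(posts: T[], limit: number = MAX_POSTS): T[] {
  // Keep the first occurrence of each post ID; the same post can surface
  // from both the global search and a subreddit listing.
  const byId = new Map<string, T>();
  for (const post of posts) {
    if (!byId.has(post.id)) {
      byId.set(post.id, post);
    }
  }
  const engagement = (post: T): number => post.upvotes + post.commentCount;
  // Highest engagement first, then cap the list.
  return Array.from(byId.values())
    .sort((a, b) => engagement(b) - engagement(a))
    .slice(0, limit);
}
```

The posts that survive this step would then be the ones fetched through the `reddit_post` target for their full comment threads.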

Related repositories

License

MIT — see LICENSE
