A professional, enterprise-grade platform for analyzing Reddit discussions with AI-powered insights and multi-subreddit support.
- Python 3.8+
- Node.js 14+
- Reddit API credentials
- Azure OpenAI credentials
Create a `.env` file in this directory:

```bash
# Reddit API Credentials (from https://www.reddit.com/prefs/apps)
CLIENT_ID=your_reddit_client_id
CLIENT_SECRET=your_reddit_client_secret
USER_AGENT=python:RedditPipeline:v4.0

# Azure OpenAI Credentials (from https://portal.azure.com)
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
```

Backend:
```bash
cd backend
pip install fastapi uvicorn pydantic
cd ..
```

Pipeline Dependencies:
```bash
pip install praw tqdm python-dotenv openai
```

Frontend:
```bash
cd frontend
npm install
```

Terminal 1 - Backend:
```bash
./start-backend.sh
# Or manually: cd backend && uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

Terminal 2 - Frontend:
```bash
./start-frontend.sh
# Or manually: cd frontend && npm run dev
```

Open your browser: http://localhost:5173
- Run Reddit analysis pipelines directly from the UI
- No command-line required
- Real-time job status tracking
- Automatic result loading
- Analyze multiple subreddits with tab navigation
- Each subreddit maintains independent data
- Switch between dashboards instantly
- All data persists across restarts
- Update existing dashboards with one click
- System automatically calculates optimal date range
- Only need to set max posts - that's it!
- Fetches posts since last run with 7-day buffer
- Creates fresh independent clusters each time
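The date math the system does for you boils down to a single calculation: start fetching from the previous run's timestamp, minus a 7-day safety buffer so posts near the boundary are not missed. A sketch of the idea (illustrative only; the pipeline's actual code may differ):

```python
from datetime import datetime, timedelta

BUFFER_DAYS = 7  # overlap with the previous run to avoid missing boundary posts

def update_window_start(last_run: datetime, buffer_days: int = BUFFER_DAYS) -> datetime:
    """Start of the fetch window for an update run."""
    return last_run - timedelta(days=buffer_days)
```

For example, if the last run was on November 13, an update fetches posts from November 6 onward.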
- Modern gradient design
- Real-time status updates
- Interactive charts and visualizations
- Sortable tables
- Responsive layout
- Click "+ New Pipeline"
- Enter subreddit name (e.g., "MakeupAddiction")
- Configure parameters:
  - Lookback Days: How far back to analyze (default: 365)
  - Max Posts: Maximum posts to scrape (default: 700)
  - Min Cluster Size: Minimum posts per cluster (default: 3)
- Click "Run Pipeline"
- Wait 5-15 minutes for completion
- Results appear automatically in a new tab
- Select a subreddit tab
- Click green "Update" button
- Set max posts (default: 700)
- System automatically calculates date range
- Fetches posts since your last run
- No date math required!
- Click "Update Dashboard"
- Wait for completion
- Updated insights appear automatically
- Home View: Browse all insights grouped by category
- Sort: By posts, engagement, or default order
- Expand Categories: Click to see insights in each category
- Detail View: Click any insight card for detailed analysis with charts
- Posts Table: View all source posts with clickable Reddit links
Each pipeline run (new or update) analyzes posts independently:
- Fresh LLM-based semantic clustering
- New themes automatically form new clusters
- No merging with previous runs
- Clean per-period analysis
Think of it as: "What are people discussing this period?" not "What changed from last period?"
Each run creates a timestamped directory:

```
pipeline_output/
├── MakeupAddiction_20251113_213310/    ← Run 1
├── MakeupAddiction_20251114_102030/    ← Run 2 (Update)
└── SkincareAddiction_20251114_120000/  ← Different subreddit
```
- Dashboard always shows latest version
- Old versions preserved for history
- Each run is self-contained
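Because the timestamps in the directory names are zero-padded `YYYYMMDD_HHMMSS`, the latest run for a subreddit can be picked with a plain string comparison. A sketch (not the dashboard's actual implementation):

```python
def latest_run(dir_names, subreddit):
    """Return the most recent run directory name for a subreddit, or None.

    Zero-padded timestamps sort lexicographically, so max() on the
    matching names is enough.
    """
    prefix = f"{subreddit}_"
    runs = [d for d in dir_names if d.startswith(prefix)]
    return max(runs, default=None)
```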
- Dashboard auto-refreshes every 10 seconds
- New tabs appear automatically after pipeline completion
- No manual refresh needed
- Multi-browser sync support
The backend provides these REST endpoints:
Pipeline Management:
- `POST /pipeline/run` - Trigger pipeline execution
- `GET /pipeline/status/{job_id}` - Get job status
- `GET /pipeline/jobs` - List all jobs
Data Access:
- `GET /subreddits` - List all available subreddits
- `GET /data/{subreddit}` - Get full dashboard data
- `GET /data/{subreddit}/metadata` - Get metadata
- `GET /data/{subreddit}/categories` - Get categories
- `GET /data/{subreddit}/insights` - Get all insights
- `GET /data/{subreddit}/insights/{id}` - Get specific insight
- `GET /data/{subreddit}/history` - Get all historical runs
Health:
- `GET /health` - Health check
- `GET /` - API info
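A minimal Python client sketch for these endpoints, using only the standard library. The request and response field names (`subreddit`, `lookback_days`, `max_posts`, `min_cluster_size`, `job_id`, `status`) are assumptions based on the UI parameters above, not a documented schema:

```python
import json
import time
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"

def build_run_payload(subreddit, lookback_days=365, max_posts=700, min_cluster_size=3):
    """Pipeline parameters; defaults mirror the UI, field names are assumed."""
    return {
        "subreddit": subreddit,
        "lookback_days": lookback_days,
        "max_posts": max_posts,
        "min_cluster_size": min_cluster_size,
    }

def post_json(url, payload):
    """POST a JSON body and decode the JSON response."""
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def run_and_wait(subreddit, poll_seconds=10):
    """Trigger a run, then poll the job status until it finishes."""
    job = post_json(f"{BASE_URL}/pipeline/run", build_run_payload(subreddit))
    job_id = job["job_id"]  # assumed response field
    while True:
        with urlopen(f"{BASE_URL}/pipeline/status/{job_id}") as resp:
            status = json.load(resp)
        if status.get("status") in ("completed", "failed"):  # assumed values
            return status
        time.sleep(poll_seconds)
```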
Check:
- All Python dependencies installed:
  ```bash
  pip install fastapi uvicorn pydantic praw tqdm python-dotenv openai
  ```
- `.env` file exists in the parent directory
- Port 8000 is available
Test:
```bash
curl http://127.0.0.1:8000/health
```

Check:
- Node dependencies installed:
  ```bash
  cd frontend && npm install
  ```
- Backend is running on port 8000
- Port 5173 is available
Check:
- `.env` file has valid credentials
- Reddit API credentials are correct
- Azure OpenAI credentials are correct
Verify credentials:
```bash
python -c "import praw; import os; from dotenv import load_dotenv; load_dotenv(); reddit = praw.Reddit(client_id=os.getenv('CLIENT_ID'), client_secret=os.getenv('CLIENT_SECRET'), user_agent=os.getenv('USER_AGENT')); print('✓ Reddit auth OK' if reddit.read_only else '✗ Auth failed')"
```

Check:
- Backend logs for errors
- Look for `pipeline_output/{subreddit}_{timestamp}/` directory
- Check if `dashboard_data.json` exists in the directory
- Wait 10 seconds for auto-refresh or restart frontend
Check:
- Make sure a subreddit tab is selected
- The button only shows when `activeSubreddit` is set
```
Frontend (React + Vite)
   ↓ HTTP
Backend (FastAPI)
   ↓ Subprocess
Pipeline Script (Python)
   ↓ API Calls
Reddit API + Azure OpenAI
   ↓ Results
File System (pipeline_output/)
```
- Keep the `.env` file secure and never commit it
- The backend allows CORS from all origins (development only)
- For production: restrict CORS, add authentication
- GitHub tokens should never be shared publicly
Typical Pipeline Execution:
- 100 posts: ~2-5 minutes
- 500 posts: ~5-10 minutes
- 1000 posts: ~10-20 minutes
Factors:
- Number of posts to process
- Azure OpenAI API rate limits
- Reddit API rate limits
- Batch processing (10 posts per call)
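Batching 10 posts per LLM call means a default 700-post run needs roughly 70 calls. The chunking itself is simple; a sketch (illustrative, not the pipeline's actual batching code):

```python
BATCH_SIZE = 10  # posts sent per LLM call, per the notes above

def batches(items, size=BATCH_SIZE):
    """Yield consecutive chunks so each call processes at most `size` posts."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```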
- Brand Monitoring: Track sentiment about products
- Trend Analysis: Identify emerging topics
- Competitive Intelligence: Compare multiple communities
- Customer Research: Understand pain points and needs
- Content Strategy: Find high-engagement themes
- Product Development: Discover feature requests
Each pipeline run generates:
- `posts.jsonl` - All scraped posts
- `posts_summary.csv` - Posts summary table
- `categories.json` - List of categories
- `insights.json` - Generated insights
- `dashboard_data.json` - Complete dashboard data
- `REPORT.md` - Markdown report
For issues or questions:
- Check the troubleshooting section above
- Review backend terminal logs for errors
- Check browser console for frontend errors
- Verify `.env` file configuration
See LICENSE file in the repository.
Version: 2.3.1
Status: Production Ready
Last Updated: November 2024