Soluva crawls public platforms (Reddit, Quora, blogs), extracts real problem statements from posts using AI, clusters similar problems together, and surfaces them on a public feed as structured insights.
┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│ source-reddit│────▶│ RabbitMQ │────▶│ pipeline │
│ source-quora │ │ (queues) │ │ (AI stages) │
└──────────────┘ └─────────────┘ └──────┬───────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌─────────┐ ┌───────────┐ ┌─────────┐
│ MongoDB │ │ PostgreSQL │ │ RabbitMQ│
│ (posts) │ │ + pgvector │ │ (events)│
└─────────┘ └─────┬─────┘ └─────────┘
│
┌─────┴─────┐
│ Fastify │
│ API │
└─────┬─────┘
│
┌─────┴─────┐
│ Next.js │
│ Frontend │
└───────────┘
apps/
web/ → Next.js 14 frontend (App Router, Tailwind CSS)
api/ → Fastify REST API (public, no auth)
pipeline/ → AI processing pipeline (5 stages)
source-reddit/ → Reddit source microservice
source-quora/ → Quora source scaffold
packages/
types/ → Shared TypeScript interfaces
db/ → Mongoose + Prisma clients
queue/ → RabbitMQ connection helpers
ai/ → Shared AI utilities (OpenAI/Anthropic)
config/ → Environment configuration
- Problem Extraction — LLM analyzes post text, extracts discrete problem statements
- Embedding Generation — OpenAI text-embedding-3-small (1536 dims)
- Cluster Matching — pgvector cosine similarity against existing clusters
- Cluster Regeneration — LLM regenerates cluster names at threshold intervals
- Mark Processed — Updates MongoDB post as processed
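The cluster-matching stage above decides whether an extracted problem joins an existing cluster or seeds a new one. A minimal sketch of that decision in TypeScript, assuming a hypothetical `Cluster` shape with a centroid embedding (in the real pipeline the nearest-neighbour search runs inside PostgreSQL via pgvector rather than in application code):

```typescript
// Mirrors the CLUSTER_SIMILARITY_THRESHOLD default documented below.
const SIMILARITY_THRESHOLD = 0.82;

interface Cluster {
  id: string;
  centroid: number[]; // e.g. a 1536-dim text-embedding-3-small vector
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the id of the best-matching cluster, or null if no cluster clears
// the threshold — in which case the pipeline would create a new cluster.
function matchCluster(embedding: number[], clusters: Cluster[]): string | null {
  let best: { id: string; score: number } | null = null;
  for (const c of clusters) {
    const score = cosineSimilarity(embedding, c.centroid);
    if (score >= SIMILARITY_THRESHOLD && (!best || score > best.score)) {
      best = { id: c.id, score };
    }
  }
  return best ? best.id : null;
}
```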
- Node.js 20+
- Docker & Docker Compose
- An OpenAI API key
```bash
git clone <repo-url> && cd soluva
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
npm install
docker compose up -d postgres mongodb rabbitmq
cd packages/db
npx prisma db push
cd ../..
npm run build
```

Then start each service in its own terminal:

```bash
# Terminal 1: API
cd apps/api && npm run dev

# Terminal 2: Pipeline
cd apps/pipeline && npm run dev

# Terminal 3: Reddit source
cd apps/source-reddit && npm run dev

# Terminal 4: Frontend
cd apps/web && npm run dev
```

Alternatively, run the whole stack with Docker Compose:

```bash
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
docker compose up --build
```

Services will be available at:
- Frontend: http://localhost:3000
- API: http://localhost:4000
- RabbitMQ Management: http://localhost:15672 (guest/guest)
| Method | Path | Description |
|---|---|---|
| GET | `/feed` | Paginated cluster feed (sort: trending/recent, industry filter) |
| GET | `/feed/search?q=...` | Full-text + semantic search across clusters |
| GET | `/clusters/:id` | Cluster detail with top 20 problems |
| GET | `/industries` | List of all discovered industries |
| GET | `/health` | Health check |
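As a sketch of how a client might call the feed endpoint, a small query-string builder is shown below. The sort values (`trending`/`recent`) come from the table above; the parameter names `sort`, `industry`, and `page` are assumptions, not confirmed API contract:

```typescript
// Hypothetical query builder for GET /feed. The parameter names
// (sort, industry, page) are assumptions beyond what the endpoint
// table specifies.
type FeedSort = "trending" | "recent";

interface FeedQuery {
  sort?: FeedSort;
  industry?: string;
  page?: number;
}

function buildFeedUrl(base: string, q: FeedQuery = {}): string {
  const params = new URLSearchParams();
  if (q.sort) params.set("sort", q.sort);
  if (q.industry) params.set("industry", q.industry);
  if (q.page !== undefined) params.set("page", String(q.page));
  const qs = params.toString();
  return qs ? `${base}/feed?${qs}` : `${base}/feed`;
}

// Usage against the local API (assumes the API is running on port 4000):
// const res = await fetch(buildFeedUrl("http://localhost:4000", { sort: "trending", page: 2 }));
```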
To add a new data source (e.g., Hacker News, Twitter):

- Create `apps/source-<name>/` following the pattern in `apps/source-reddit/`
- Implement a fetcher that returns `SoluvaPost[]`
- The service should:
  - Fetch posts from the platform on a cron schedule
  - Normalize into `SoluvaPost` with the appropriate `type` and `source`
  - Deduplicate via `url` against MongoDB
  - Store in MongoDB with `processed: false`
  - Publish to the `soluva.raw_posts` queue
- Add the service to `docker-compose.yml`
- The pipeline will automatically process posts from any source
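The normalize/deduplicate steps above can be sketched as pure functions. This is a minimal illustration, not the project's actual code: the `SoluvaPost` fields beyond `type`, `source`, `url`, and `processed` are assumptions, as is the `hackernews` source name:

```typescript
// Hypothetical skeleton for a new source service (e.g. apps/source-hackernews).
// Fields other than type/source/url/processed are illustrative assumptions.
interface SoluvaPost {
  type: string;       // e.g. "post"
  source: string;     // e.g. "hackernews"
  url: string;        // used for deduplication against MongoDB
  title: string;
  body: string;
  processed: boolean; // the pipeline flips this to true after processing
}

// Normalize a raw platform item into the shared SoluvaPost shape.
function normalize(raw: { title: string; text: string; link: string }): SoluvaPost {
  return {
    type: "post",
    source: "hackernews",
    url: raw.link,
    title: raw.title,
    body: raw.text,
    processed: false,
  };
}

// Drop posts whose URL is already stored, before inserting into MongoDB.
function dedupeByUrl(posts: SoluvaPost[], knownUrls: Set<string>): SoluvaPost[] {
  return posts.filter((p) => !knownUrls.has(p.url));
}

// On each cron tick the service would: fetch → normalize → dedupeByUrl →
// insert into MongoDB → publish each new post to the soluva.raw_posts queue.
```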
| Variable | Required | Default | Description |
|---|---|---|---|
| `MONGODB_URI` | Yes | — | MongoDB connection string |
| `DATABASE_URL` | Yes | — | PostgreSQL connection string |
| `RABBITMQ_URL` | Yes | — | RabbitMQ connection string |
| `OPENAI_API_KEY` | Yes | — | OpenAI API key |
| `ANTHROPIC_API_KEY` | No | — | Anthropic API key (if using Claude) |
| `AI_PROVIDER` | No | `openai` | LLM provider (`openai` or `anthropic`) |
| `REDDIT_SUBREDDITS` | No | `startups,...` | Comma-separated subreddit list |
| `PIPELINE_CONCURRENCY` | No | `5` | Parallel post processing |
| `CLUSTER_SIMILARITY_THRESHOLD` | No | `0.82` | Cosine similarity threshold |
| `CLUSTER_REGEN_THRESHOLD` | No | `50` | Cluster rename every N problems |
| `API_PORT` | No | `4000` | API server port |
| `NEXT_PUBLIC_API_URL` | No | `http://localhost:4000` | API URL for frontend |
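A minimal `.env` for local development might look like the fragment below. The hostnames and ports assume the Docker Compose defaults (standard MongoDB, PostgreSQL, and RabbitMQ ports); the credentials and database names are illustrative assumptions, not values from the repository:

```
MONGODB_URI=mongodb://localhost:27017/soluva
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/soluva
RABBITMQ_URL=amqp://guest:guest@localhost:5672
OPENAI_API_KEY=sk-...
```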
- Monorepo: Turborepo
- Frontend: Next.js 14 (App Router), Tailwind CSS
- API: Fastify
- Pipeline: Custom Node.js service
- Databases: PostgreSQL + pgvector, MongoDB
- Queue: RabbitMQ (amqplib)
- AI: OpenAI GPT-4o + text-embedding-3-small (Anthropic Claude optional)
- ORM: Prisma (PostgreSQL), Mongoose (MongoDB)
- Runtime: Node.js 20, TypeScript
MIT