Drex72/Soluva

Soluva — AI-Powered Problem Aggregator

Soluva crawls public platforms (Reddit, Quora, blogs), extracts real problem statements from posts using AI, clusters similar problems together, and surfaces them on a public feed as structured insights.

Architecture

┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ source-reddit│────▶│  RabbitMQ   │────▶│   pipeline   │
│ source-quora │     │  (queues)   │     │  (AI stages) │
└──────────────┘     └─────────────┘     └──────┬───────┘
                                                │
                           ┌────────────────────┼────────────────────┐
                           ▼                    ▼                    ▼
                       ┌─────────┐        ┌────────────┐        ┌─────────┐
                       │ MongoDB │        │ PostgreSQL │        │ RabbitMQ│
                       │ (posts) │        │ + pgvector │        │ (events)│
                       └─────────┘        └─────┬──────┘        └─────────┘
                                                │
                                          ┌─────┴──────┐
                                          │  Fastify   │
                                          │    API     │
                                          └─────┬──────┘
                                                │
                                          ┌─────┴──────┐
                                          │  Next.js   │
                                          │  Frontend  │
                                          └────────────┘

Monorepo Structure

apps/
  web/              → Next.js 14 frontend (App Router, Tailwind CSS)
  api/              → Fastify REST API (public, no auth)
  pipeline/         → AI processing pipeline (5 stages)
  source-reddit/    → Reddit source microservice
  source-quora/     → Quora source scaffold

packages/
  types/            → Shared TypeScript interfaces
  db/               → Mongoose + Prisma clients
  queue/            → RabbitMQ connection helpers
  ai/               → Shared AI utilities (OpenAI/Anthropic)
  config/           → Environment configuration

Pipeline Stages

  1. Problem Extraction — LLM analyzes post text, extracts discrete problem statements
  2. Embedding Generation — OpenAI text-embedding-3-small (1536 dims)
  3. Cluster Matching — pgvector cosine similarity against existing clusters
  4. Cluster Regeneration — LLM regenerates cluster names at threshold intervals
  5. Mark Processed — Updates MongoDB post as processed
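As a rough illustration of stage 3, a problem joins the best-matching cluster only when cosine similarity clears CLUSTER_SIMILARITY_THRESHOLD (default 0.82); otherwise a new cluster is created. This is a minimal sketch of that rule in plain TypeScript (the real pipeline runs the comparison inside PostgreSQL via pgvector; names here are illustrative):

```typescript
interface ClusterCandidate {
  id: string;
  centroid: number[]; // 1536-dim embedding, same model as the problems
}

// Standard cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchCluster(
  embedding: number[],
  clusters: ClusterCandidate[],
  threshold = 0.82, // CLUSTER_SIMILARITY_THRESHOLD default
): ClusterCandidate | null {
  let best: ClusterCandidate | null = null;
  let bestScore = -1;
  for (const c of clusters) {
    const score = cosineSimilarity(embedding, c.centroid);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  // Below the threshold the problem seeds a new cluster instead
  return bestScore >= threshold ? best : null;
}
```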

Quick Start

Prerequisites

  • Node.js 20+
  • Docker & Docker Compose
  • An OpenAI API key

1. Clone and Install

git clone <repo-url> && cd soluva
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
npm install

2. Start Infrastructure

docker compose up -d postgres mongodb rabbitmq

3. Set Up Database

cd packages/db
npx prisma db push
cd ../..

4. Build Packages

npm run build

5. Run Services (in separate terminals)

# Terminal 1: API
cd apps/api && npm run dev

# Terminal 2: Pipeline
cd apps/pipeline && npm run dev

# Terminal 3: Reddit source
cd apps/source-reddit && npm run dev

# Terminal 4: Frontend
cd apps/web && npm run dev

Or: Run Everything with Docker

cp .env.example .env
# Edit .env with your OPENAI_API_KEY

docker compose up --build

Services will be available at the ports configured in .env (the API defaults to http://localhost:4000).

API Endpoints

Method  Path                  Description
GET     /feed                 Paginated cluster feed (sort: trending/recent, industry filter)
GET     /feed/search?q=...    Full-text + semantic search across clusters
GET     /clusters/:id         Cluster detail with top 20 problems
GET     /industries           List of all discovered industries
GET     /health               Health check
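A minimal client-side usage sketch for the feed endpoint. The sort values come straight from the table above; the industry and page query-param names are assumptions inferred from "Paginated" and "industry filter", so check the API routes before relying on them:

```typescript
const API_BASE = "http://localhost:4000"; // NEXT_PUBLIC_API_URL default

function feedUrl(
  opts: { sort?: "trending" | "recent"; industry?: string; page?: number } = {},
): string {
  const params = new URLSearchParams();
  if (opts.sort) params.set("sort", opts.sort);
  if (opts.industry) params.set("industry", opts.industry);
  if (opts.page !== undefined) params.set("page", String(opts.page));
  const qs = params.toString();
  return `${API_BASE}/feed${qs ? `?${qs}` : ""}`;
}

// Usage (requires the API to be running):
// const clusters = await fetch(feedUrl({ sort: "trending" })).then(r => r.json());
```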

Adding a New Source Service

To add a new data source (e.g., Hacker News, Twitter):

  1. Create apps/source-<name>/ following the pattern in apps/source-reddit/
  2. Implement a fetcher that returns SoluvaPost[]
  3. The service should:
    • Fetch posts from the platform on a cron schedule
    • Normalize into SoluvaPost with the appropriate type and source
    • Deduplicate via url against MongoDB
    • Store in MongoDB with processed: false
    • Publish to soluva.raw_posts queue
  4. Add the service to docker-compose.yml
  5. The pipeline will automatically process posts from any source
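Step 2 (normalization) for a hypothetical Hacker News source might look like the sketch below. The SoluvaPost fields shown are inferred from this README (url, type, source, processed) plus illustrative text fields, and HnItem mimics an item from the HN Algolia API; check packages/types for the real interface:

```typescript
interface SoluvaPost {
  url: string;
  title: string;
  body: string;
  type: string;
  source: string;
  processed: boolean;
}

// Illustrative shape of a fetched Hacker News item
interface HnItem {
  objectID: string;
  title: string;
  url?: string;        // absent for self-posts
  story_text?: string; // self-post body, when present
}

function normalizeHnItem(item: HnItem): SoluvaPost {
  return {
    // Fall back to the HN permalink so `url` always exists for dedupe
    url: item.url ?? `https://news.ycombinator.com/item?id=${item.objectID}`,
    title: item.title,
    body: item.story_text ?? "",
    type: "post",
    source: "hackernews",
    processed: false, // the pipeline flips this in stage 5
  };
}
```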

Environment Variables

Variable                      Required  Default                Description
MONGODB_URI                   Yes       —                      MongoDB connection string
DATABASE_URL                  Yes       —                      PostgreSQL connection string
RABBITMQ_URL                  Yes       —                      RabbitMQ connection string
OPENAI_API_KEY                Yes       —                      OpenAI API key
ANTHROPIC_API_KEY             No        —                      Anthropic API key (if using Claude)
AI_PROVIDER                   No        openai                 LLM provider (openai or anthropic)
REDDIT_SUBREDDITS             No        startups,...           Comma-separated subreddit list
PIPELINE_CONCURRENCY          No        5                      Number of posts processed in parallel
CLUSTER_SIMILARITY_THRESHOLD  No        0.82                   Cosine similarity threshold for cluster matching
CLUSTER_REGEN_THRESHOLD       No        50                     Regenerate cluster name every N new problems
API_PORT                      No        4000                   API server port
NEXT_PUBLIC_API_URL           No        http://localhost:4000  API URL for the frontend
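The numeric defaults above might be applied when reading the environment roughly as follows (the helper name numberFromEnv is illustrative; see packages/config for the real implementation):

```typescript
type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back when it is unset or malformed
function numberFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// e.g. with process.env:
// const threshold = numberFromEnv(process.env, "CLUSTER_SIMILARITY_THRESHOLD", 0.82);
// const concurrency = numberFromEnv(process.env, "PIPELINE_CONCURRENCY", 5);
```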

Tech Stack

  • Monorepo: Turborepo
  • Frontend: Next.js 14 (App Router), Tailwind CSS
  • API: Fastify
  • Pipeline: Custom Node.js service
  • Databases: PostgreSQL + pgvector, MongoDB
  • Queue: RabbitMQ (amqplib)
  • AI: OpenAI GPT-4o + text-embedding-3-small (Anthropic Claude optional)
  • ORM: Prisma (PostgreSQL), Mongoose (MongoDB)
  • Runtime: Node.js 20, TypeScript

License

MIT
