Drex72/Soluva

Soluva — AI-Powered Problem Aggregator

Soluva crawls public platforms (Reddit, Quora, blogs), extracts real problem statements from posts using AI, clusters similar problems together, and surfaces them on a public feed as structured insights.

Architecture

┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ source-reddit│────▶│  RabbitMQ   │────▶│   pipeline   │
│ source-quora │     │  (queues)   │     │  (AI stages) │
└──────────────┘     └─────────────┘     └──────┬───────┘
                                                │
                           ┌────────────────────┼────────────────────┐
                           ▼                    ▼                    ▼
                       ┌─────────┐        ┌────────────┐        ┌─────────┐
                       │ MongoDB │        │ PostgreSQL │        │ RabbitMQ│
                       │ (posts) │        │ + pgvector │        │ (events)│
                       └─────────┘        └─────┬──────┘        └─────────┘
                                                │
                                          ┌─────┴──────┐
                                          │  Fastify   │
                                          │    API     │
                                          └─────┬──────┘
                                                │
                                          ┌─────┴──────┐
                                          │  Next.js   │
                                          │  Frontend  │
                                          └────────────┘

Monorepo Structure

apps/
  web/              → Next.js 14 frontend (App Router, Tailwind CSS)
  api/              → Fastify REST API (public, no auth)
  pipeline/         → AI processing pipeline (5 stages)
  source-reddit/    → Reddit source microservice
  source-quora/     → Quora source scaffold

packages/
  types/            → Shared TypeScript interfaces
  db/               → Mongoose + Prisma clients
  queue/            → RabbitMQ connection helpers
  ai/               → Shared AI utilities (OpenAI/Anthropic)
  config/           → Environment configuration

Pipeline Stages

  1. Problem Extraction — LLM analyzes post text, extracts discrete problem statements
  2. Embedding Generation — OpenAI text-embedding-3-small (1536 dims)
  3. Cluster Matching — pgvector cosine similarity against existing clusters
  4. Cluster Regeneration — LLM regenerates cluster names at threshold intervals
  5. Mark Processed — Updates MongoDB post as processed
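As a rough illustration of stage 3, a problem joins the best-matching cluster only when cosine similarity clears CLUSTER_SIMILARITY_THRESHOLD (default 0.82); otherwise a new cluster is created. This is a minimal sketch of that rule in plain TypeScript (the real pipeline runs the comparison inside PostgreSQL via pgvector; names here are illustrative):

```typescript
interface ClusterCandidate {
  id: string;
  centroid: number[]; // 1536-dim embedding, same model as the problems
}

// Standard cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchCluster(
  embedding: number[],
  clusters: ClusterCandidate[],
  threshold = 0.82, // CLUSTER_SIMILARITY_THRESHOLD default
): ClusterCandidate | null {
  let best: ClusterCandidate | null = null;
  let bestScore = -1;
  for (const c of clusters) {
    const score = cosineSimilarity(embedding, c.centroid);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  // Below the threshold the problem seeds a new cluster instead
  return bestScore >= threshold ? best : null;
}
```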

Quick Start

Prerequisites

  • Node.js 20+
  • Docker & Docker Compose
  • An OpenAI API key

1. Clone and Install

git clone <repo-url> && cd soluva
cp .env.example .env
# Edit .env with your OPENAI_API_KEY
npm install

2. Start Infrastructure

docker compose up -d postgres mongodb rabbitmq

3. Set Up Database

cd packages/db
npx prisma db push
cd ../..

4. Build Packages

npm run build

5. Run Services (in separate terminals)

# Terminal 1: API
cd apps/api && npm run dev

# Terminal 2: Pipeline
cd apps/pipeline && npm run dev

# Terminal 3: Reddit source
cd apps/source-reddit && npm run dev

# Terminal 4: Frontend
cd apps/web && npm run dev

Or: Run Everything with Docker

cp .env.example .env
# Edit .env with your OPENAI_API_KEY

docker compose up --build

Services will be available at the ports configured in .env (the API defaults to http://localhost:4000).

API Endpoints

Method  Path                  Description
GET     /feed                 Paginated cluster feed (sort: trending/recent, industry filter)
GET     /feed/search?q=...    Full-text + semantic search across clusters
GET     /clusters/:id         Cluster detail with top 20 problems
GET     /industries           List of all discovered industries
GET     /health               Health check
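A minimal client-side usage sketch for the feed endpoint. The sort values come straight from the table above; the industry and page query-param names are assumptions inferred from "Paginated" and "industry filter", so check the API routes before relying on them:

```typescript
const API_BASE = "http://localhost:4000"; // NEXT_PUBLIC_API_URL default

function feedUrl(
  opts: { sort?: "trending" | "recent"; industry?: string; page?: number } = {},
): string {
  const params = new URLSearchParams();
  if (opts.sort) params.set("sort", opts.sort);
  if (opts.industry) params.set("industry", opts.industry);
  if (opts.page !== undefined) params.set("page", String(opts.page));
  const qs = params.toString();
  return `${API_BASE}/feed${qs ? `?${qs}` : ""}`;
}

// Usage (requires the API to be running):
// const clusters = await fetch(feedUrl({ sort: "trending" })).then(r => r.json());
```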

Adding a New Source Service

To add a new data source (e.g., Hacker News, Twitter):

  1. Create apps/source-<name>/ following the pattern in apps/source-reddit/
  2. Implement a fetcher that returns SoluvaPost[]
  3. The service should:
    • Fetch posts from the platform on a cron schedule
    • Normalize into SoluvaPost with the appropriate type and source
    • Deduplicate via url against MongoDB
    • Store in MongoDB with processed: false
    • Publish to soluva.raw_posts queue
  4. Add the service to docker-compose.yml
  5. The pipeline will automatically process posts from any source
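Step 2 (normalization) for a hypothetical Hacker News source might look like the sketch below. The SoluvaPost fields shown are inferred from this README (url, type, source, processed) plus illustrative text fields, and HnItem mimics an item from the HN Algolia API; check packages/types for the real interface:

```typescript
interface SoluvaPost {
  url: string;
  title: string;
  body: string;
  type: string;
  source: string;
  processed: boolean;
}

// Illustrative shape of a fetched Hacker News item
interface HnItem {
  objectID: string;
  title: string;
  url?: string;        // absent for self-posts
  story_text?: string; // self-post body, when present
}

function normalizeHnItem(item: HnItem): SoluvaPost {
  return {
    // Fall back to the HN permalink so `url` always exists for dedupe
    url: item.url ?? `https://news.ycombinator.com/item?id=${item.objectID}`,
    title: item.title,
    body: item.story_text ?? "",
    type: "post",
    source: "hackernews",
    processed: false, // the pipeline flips this in stage 5
  };
}
```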

Environment Variables

Variable                      Required  Default                Description
MONGODB_URI                   Yes       —                      MongoDB connection string
DATABASE_URL                  Yes       —                      PostgreSQL connection string
RABBITMQ_URL                  Yes       —                      RabbitMQ connection string
OPENAI_API_KEY                Yes       —                      OpenAI API key
ANTHROPIC_API_KEY             No        —                      Anthropic API key (if using Claude)
AI_PROVIDER                   No        openai                 LLM provider (openai or anthropic)
REDDIT_SUBREDDITS             No        startups,...           Comma-separated subreddit list
PIPELINE_CONCURRENCY          No        5                      Number of posts processed in parallel
CLUSTER_SIMILARITY_THRESHOLD  No        0.82                   Cosine similarity threshold for cluster matching
CLUSTER_REGEN_THRESHOLD       No        50                     Regenerate cluster name every N new problems
API_PORT                      No        4000                   API server port
NEXT_PUBLIC_API_URL           No        http://localhost:4000  API URL for the frontend
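The numeric defaults above might be applied when reading the environment roughly as follows (the helper name numberFromEnv is illustrative; see packages/config for the real implementation):

```typescript
type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back when it is unset or malformed
function numberFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// e.g. with process.env:
// const threshold = numberFromEnv(process.env, "CLUSTER_SIMILARITY_THRESHOLD", 0.82);
// const concurrency = numberFromEnv(process.env, "PIPELINE_CONCURRENCY", 5);
```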

Tech Stack

  • Monorepo: Turborepo
  • Frontend: Next.js 14 (App Router), Tailwind CSS
  • API: Fastify
  • Pipeline: Custom Node.js service
  • Databases: PostgreSQL + pgvector, MongoDB
  • Queue: RabbitMQ (amqplib)
  • AI: OpenAI GPT-4o + text-embedding-3-small (Anthropic Claude optional)
  • ORM: Prisma (PostgreSQL), Mongoose (MongoDB)
  • Runtime: Node.js 20, TypeScript

License

MIT
