A full-stack application that scrapes websites, processes content using Google Gemini AI, and manages high-load tasks using an asynchronous queue system.
- Frontend: Next.js 14 (App Router), TanStack Query, Tailwind CSS
- Backend: Node.js, Express
- Database: PostgreSQL + Drizzle ORM
- Queue: BullMQ + Redis
- AI/Scraping: Google Gemini API + Puppeteer
- Asynchronous Processing: Long-running scraping jobs are offloaded to a BullMQ queue backed by Redis, so API requests return immediately instead of timing out (see the worker sketch after this list).
- Live Polling: The frontend uses TanStack Query to poll job status until the job completes (see the polling sketch after this list).
- Robust Scraping: Puppeteer (headless Chrome) renders JavaScript-heavy sites before content is extracted.
- Task History: Persists all jobs and results in PostgreSQL.
- Dockerized: Entire stack runs with a single command.
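
The sketch below shows one way the queue-based flow could be wired up. It is a minimal illustration, not the project's actual code: the route paths (`/api/scrape`, `/api/jobs/:id`), queue name, and Gemini model name are assumptions, and in the real project the worker would typically run in its own process rather than alongside the API.

```ts
import express from "express";
import { Queue, Worker } from "bullmq";
import puppeteer from "puppeteer";
import { GoogleGenerativeAI } from "@google/generative-ai";

// Redis connection shared by the queue and the worker.
const connection = {
  host: process.env.REDIS_HOST ?? "redis",
  port: Number(process.env.REDIS_PORT ?? 6379),
};

const scrapeQueue = new Queue("scrape", { connection });

const app = express();
app.use(express.json());

// The route only enqueues the job and returns 202 immediately,
// so long-running scrapes never hit an HTTP request timeout.
app.post("/api/scrape", async (req, res) => {
  const job = await scrapeQueue.add("scrape-url", { url: req.body.url });
  res.status(202).json({ jobId: job.id });
});

// The frontend polls this endpoint for the job's state and result.
app.get("/api/jobs/:id", async (req, res) => {
  const job = await scrapeQueue.getJob(req.params.id);
  if (!job) {
    res.status(404).json({ error: "Job not found" });
    return;
  }
  res.json({ id: job.id, state: await job.getState(), result: job.returnvalue });
});

// Worker: render the page with Puppeteer, then summarize it with Gemini.
new Worker(
  "scrape",
  async (job) => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(job.data.url, { waitUntil: "networkidle0" });
    const text = await page.evaluate(() => document.body.innerText);
    await browser.close();

    const gemini = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
    const model = gemini.getGenerativeModel({ model: "gemini-1.5-flash" });
    const result = await model.generateContent(
      `Summarize this page:\n${text.slice(0, 10000)}`
    );
    // The return value is stored by BullMQ and exposed via job.returnvalue.
    return { summary: result.response.text() };
  },
  { connection }
);

app.listen(4000, () => console.log("API listening on :4000"));
```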
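
On the frontend, polling can be expressed as a small TanStack Query hook. This is a sketch assuming TanStack Query v5 and a `GET /api/jobs/:id` endpoint shaped like the one above; the hook name `useJobStatus` is illustrative.

```tsx
"use client";
import { useQuery } from "@tanstack/react-query";

type JobStatus = { id: string; state: string; result?: unknown };

export function useJobStatus(jobId: string) {
  return useQuery<JobStatus>({
    queryKey: ["job", jobId],
    queryFn: async () => {
      const res = await fetch(`/api/jobs/${jobId}`);
      if (!res.ok) throw new Error("Failed to fetch job status");
      return res.json();
    },
    // Poll every 2 seconds while the job is queued or running;
    // stop once it has completed or failed.
    refetchInterval: (query) => {
      const state = query.state.data?.state;
      return state === "completed" || state === "failed" ? false : 2000;
    },
  });
}
```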
Prerequisites: Docker Desktop installed.
- Create a `.env` file in the `server` folder:

  ```env
  DATABASE_URL=postgres://myuser:mypassword@postgres:5432/scraper_db
  REDIS_HOST=redis
  REDIS_PORT=6379
  GEMINI_API_KEY=your_gemini_api_key_here
  ```

- Run the application:

  ```bash
  docker-compose up --build
  ```

- Open your browser:
  - App: http://localhost:3000
  - API: http://localhost:4000
- `/project`: Next.js frontend application.
- `/server`: Express API and worker logic.
- `docker-compose.yml`: Orchestration for DB, Redis, backend, and frontend.
Bonus Implemented: The entire application is containerized using Docker Compose. Database schemas are automatically applied on container startup using Drizzle Migrations.
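
A minimal sketch of that startup step, assuming the `node-postgres` driver and a `./drizzle` migrations folder generated by `drizzle-kit` (the folder path and script name are assumptions):

```ts
import { drizzle } from "drizzle-orm/node-postgres";
import { migrate } from "drizzle-orm/node-postgres/migrator";
import { Pool } from "pg";

async function runMigrations() {
  const pool = new Pool({ connectionString: process.env.DATABASE_URL });
  const db = drizzle(pool);

  // Apply any pending SQL migrations from the migrations folder
  // before the API starts accepting requests.
  await migrate(db, { migrationsFolder: "./drizzle" });
  await pool.end();
}

runMigrations().catch((err) => {
  console.error("Migration failed", err);
  process.exit(1);
});
```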