A modern, client-side LLM chat application with RAG capabilities, built as a Progressive Web App (PWA) with Next.js and a liquid glass design.
- Client-side LLM Inference: Powered by wllama for browser-based AI
- RAG (Retrieval-Augmented Generation): Enhanced responses using a custom in-browser vector search (DuckWasm integration is on the roadmap)
- Real-time Metrics: Context usage tracking and tokens-per-second monitoring
- Progressive Web App: Installable, offline-capable experience
- Liquid Glass Theme: Modern glassmorphism UI with backdrop blur effects
- Responsive Design: Optimized for desktop and mobile devices
- Dark Mode: Consistent dark theme with glass aesthetics
- Smooth Animations: Fluid transitions and micro-interactions
- Model Management: Download, cache, and switch between different LLM models
- Document Processing: Upload and process PDFs, text files for RAG context
- Streaming Responses: Real-time text generation with live metrics
- PWA Capabilities: Service worker, offline support, app manifest
```
wnxt/
├── app/                          # Next.js App Router
│   ├── globals.css               # Global styles with liquid glass theme
│   ├── layout.tsx                # Root layout with PWA metadata
│   ├── not-found.tsx             # 404 error page
│   └── page.tsx                  # Main application entry point
├── components/                   # React components
│   ├── ui/                       # shadcn/ui component library
│   │   ├── button.tsx            # Button components
│   │   ├── input.tsx             # Input field components
│   │   ├── label.tsx             # Label components
│   │   ├── popover.tsx           # Popover components
│   │   ├── progress.tsx          # Progress bar components
│   │   ├── scroll-area.tsx       # Scrollable area components
│   │   ├── select.tsx            # Select dropdown components
│   │   ├── slider.tsx            # Slider components
│   │   ├── switch.tsx            # Toggle switch components
│   │   └── textarea.tsx          # Text area components
│   ├── chat-interface.tsx        # Chat interface components
│   ├── chat-screen.tsx           # Main chat screen with conversation management
│   ├── metrics-panel.tsx         # Real-time performance metrics display
│   ├── model-screen.tsx          # Model download and management interface
│   ├── model-selector.tsx        # Model selection components
│   ├── rag-panel.tsx             # RAG document management panel
│   ├── rag-screen.tsx            # Complete RAG document processing interface
│   ├── settings-panel.tsx        # Settings configuration panel
│   ├── settings-screen.tsx       # Complete settings management interface
│   └── tab-navigation.tsx        # Bottom tab navigation for mobile
├── lib/                          # Core business logic and utilities
│   ├── model-cache-manager.ts    # Model caching and storage management
│   ├── vector-db-manager.ts      # Vector database operations for RAG
│   ├── wllama-config.ts          # wllama configuration and model definitions
│   ├── wllama-context.tsx        # React context for wllama state management
│   ├── wllama-types.ts           # TypeScript type definitions
│   └── utils.ts                  # Utility functions (cn class merging)
├── public/                       # Static assets and PWA resources
│   ├── icon-192.png              # PWA icon (192x192)
│   ├── icon-512.png              # PWA icon (512x512)
│   ├── manifest.json             # PWA manifest configuration
│   ├── sw.js                     # Service worker for offline functionality
│   ├── wllama/                   # wllama WebAssembly builds
│   │   ├── multi-thread/wllama.wasm   # Multi-threaded WASM build
│   │   └── single-thread/wllama.wasm  # Single-threaded WASM build
│   └── workbox-*.js              # Workbox service worker utilities
└── Configuration Files
    ├── next.config.js            # Next.js configuration with PWA setup
    ├── tailwind.config.ts        # Tailwind CSS configuration
    ├── tsconfig.json             # TypeScript configuration
    └── package.json              # Dependencies and build scripts
```
The central React context provider (`lib/wllama-context.tsx`) orchestrates all LLM operations (a minimal sketch follows this list):
- Model Management: Download, load, unload, and switch between quantized GGUF models
- Inference Control: Streaming text generation with configurable parameters
- Conversation Management: Multi-conversation support with persistent chat history
- RAG Integration: Document processing and context injection into prompts
- Real-time Metrics: Token usage, generation speed, and memory monitoring
- Caching Strategy: IndexedDB-based model and document persistence
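As a minimal sketch of what such a provider can look like (the names below are illustrative, not the actual exports of `lib/wllama-context.tsx`, which also carries conversations, metrics, and RAG state):

```tsx
'use client';
import { createContext, useContext, useState, type ReactNode } from 'react';

// Illustrative context shape: which model is loaded and whether
// tokens are currently streaming.
interface LlmContextValue {
  modelId: string | null;
  isGenerating: boolean;
  setModelId: (id: string | null) => void;
  setIsGenerating: (generating: boolean) => void;
}

const LlmContext = createContext<LlmContextValue | null>(null);

export function LlmProvider({ children }: { children: ReactNode }) {
  const [modelId, setModelId] = useState<string | null>(null);
  const [isGenerating, setIsGenerating] = useState(false);
  return (
    <LlmContext.Provider
      value={{ modelId, isGenerating, setModelId, setIsGenerating }}
    >
      {children}
    </LlmContext.Provider>
  );
}

// Consumers call useLlm() anywhere below the provider.
export function useLlm(): LlmContextValue {
  const ctx = useContext(LlmContext);
  if (!ctx) throw new Error('useLlm must be used within <LlmProvider>');
  return ctx;
}
```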
A custom vector store (`lib/vector-db-manager.ts`) built on IndexedDB for browser persistence (the retrieval step is sketched after this list):
- Document Storage: Chunked document storage with metadata
- Embedding Management: Vector embeddings for semantic search
- Similarity Search: Cosine similarity-based retrieval for RAG context
- Persistent Storage: Browser-based persistence without external dependencies
- Performance Optimized: Indexed queries for efficient vector operations
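A sketch of the retrieval step: rank stored chunks by cosine similarity to a query embedding and keep the top k. The type and function names are illustrative, not the actual API of `lib/vector-db-manager.ts`:

```ts
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero-length vectors.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k = 4): StoredChunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```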
The model caching layer (`lib/model-cache-manager.ts`) avoids re-downloading models on every visit (a sketch follows this list):
- Download Management: Progress tracking and resumable downloads
- Cache Validation: Model integrity verification and cleanup
- Storage Optimization: Efficient use of browser storage quotas
- Cross-session Persistence: Models persist across browser sessions
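A compact sketch of cross-session model persistence in IndexedDB. The database, store, and function names are illustrative, not those used by `lib/model-cache-manager.ts`:

```ts
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('model-cache', 1);
    req.onupgradeneeded = () => req.result.createObjectStore('models');
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Look up a previously downloaded model by its URL.
async function getCachedModel(url: string): Promise<ArrayBuffer | undefined> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const req = db.transaction('models').objectStore('models').get(url);
    req.onsuccess = () => resolve(req.result as ArrayBuffer | undefined);
    req.onerror = () => reject(req.error);
  });
}

// Persist downloaded model bytes so the next session skips the download.
async function cacheModel(url: string, data: ArrayBuffer): Promise<void> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction('models', 'readwrite');
    tx.objectStore('models').put(data, url);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```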
- Next.js 14.1.0: App Router with TypeScript support
- React 18.2.0: Component-based UI with hooks and concurrent features
- Tailwind CSS 3.4.1: Utility-first CSS framework with custom glassmorphism utilities
- shadcn/ui: Complete component library built on Radix UI primitives
- Radix UI: Accessible, unstyled UI primitives (dialog, dropdown, popover, etc.)
- Lucide React 0.344.0: Modern icon library replacing FontAwesome
- Tailwind Animate: CSS animation utilities for smooth transitions
- @wllama/wllama v2.3.6: WebAssembly LLM inference runtime with multi-thread support
- @huggingface/jinja v0.5.1: Chat template processing for various model formats
- Vector Database: Custom IndexedDB-based vector storage with cosine similarity search
- Model Caching: IndexedDB-based model persistence and cache management
- next-pwa v5.6.0: Service worker generation and PWA utilities
- TypeScript 5.3.3: Type-safe development with strict configuration
- ESLint 8.56.0: Code linting with Next.js configuration
- PostCSS 8.4.33: CSS processing with Autoprefixer and Tailwind
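As a sketch of how the `@wllama/wllama` runtime listed above is typically driven, based on the library's published examples. The model URL is a placeholder, and the exact option names should be verified against the pinned v2.3.6:

```ts
import { Wllama } from '@wllama/wllama';

// Map wllama's expected asset names to the builds shipped under public/wllama/.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

// Placeholder: substitute any GGUF model URL served with CORS enabled.
const MODEL_URL = 'https://example.com/model.gguf';

async function runOnce(prompt: string): Promise<string> {
  const wllama = new Wllama(CONFIG_PATHS);
  await wllama.loadModelFromUrl(MODEL_URL);
  return wllama.createCompletion(prompt, {
    nPredict: 64,                        // cap on generated tokens
    sampling: { temp: 0.7, top_p: 0.9 }, // sampling parameters
  });
}
```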
- Custom CSS variables for glassmorphism effects
- Backdrop blur and transparency gradients
- Shimmer animations for loading states
- Consistent border radius and shadows
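As a hypothetical illustration (not a component from this repo), these utilities compose into a glass surface along these lines:

```tsx
import type { ReactNode } from 'react';

// Illustrative "glass" card: translucent background, backdrop blur,
// soft border, consistent radius and shadow.
export function GlassCard({ children }: { children: ReactNode }) {
  return (
    <div className="rounded-2xl border border-white/10 bg-white/5 shadow-lg backdrop-blur-md">
      {children}
    </div>
  );
}
```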
- Context Usage: Visual progress bar showing token consumption
- Tokens Per Second: Live TPS calculation during generation
- Memory Usage: Browser memory monitoring
- Generation Status: Visual indicators for AI state
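A minimal sketch of how a live tokens-per-second figure can be computed during streaming (illustrative, not the metrics panel's actual implementation):

```ts
// Count tokens since generation began and divide by elapsed wall-clock time.
function createTpsMeter() {
  let start = 0;
  let tokens = 0;
  return {
    begin() { start = performance.now(); tokens = 0; },
    onToken() { tokens++; },
    tps(): number {
      const elapsedSec = (performance.now() - start) / 1000;
      return elapsedSec > 0 ? tokens / elapsedSec : 0;
    },
  };
}
```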
- Multiple model support (Qwen, Llama, SmolLM, etc.)
- Download progress tracking
- Model caching and validation
- Status indicators (downloading, loading, ready, active)
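The status lifecycle maps naturally onto a small union type. The names below are illustrative rather than the actual definitions in `lib/wllama-types.ts`:

```ts
type ModelStatus = 'downloading' | 'loading' | 'ready' | 'active';

interface ModelEntry {
  id: string;          // e.g. a repo + GGUF file identifier
  sizeBytes: number;   // download size surfaced in the UI
  status: ModelStatus;
  progress?: number;   // 0..1 while status === 'downloading'
}
```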
- Document upload interface with drag-and-drop
- File type support (PDF, TXT, MD, and other text formats)
- IndexedDB-based vector storage with persistent caching
- Document chunking and metadata management
- Vector similarity search for context retrieval
- Real-time processing status and progress tracking
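Chunking is the step that precedes embedding and storage. A minimal fixed-size sketch with overlap (the sizes are illustrative, not the project's actual parameters):

```ts
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const step = Math.max(1, size - overlap); // guard against overlap >= size
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```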
- Message history with role indicators
- Streaming response simulation
- Auto-scroll to latest messages
- Token counting per message
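An illustrative message shape covering the role indicator and per-message token count (the real types live in `lib/wllama-types.ts` and may differ):

```ts
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  tokenCount: number;  // shown alongside each message
  createdAt: number;   // epoch ms, used for ordering and auto-scroll
}
```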
This project migrates and enhances the original wllama examples/main demo:
- Model downloading and caching
- Chat completion interface
- Inference parameter controls
- Model switching capabilities
- PWA Architecture: Installable web app
- Liquid Glass UI: Modern design system
- RAG Integration: Document-based context enhancement
- Real-time Metrics: Performance monitoring
- Responsive Design: Mobile-first approach
- Enhanced UX: Better user interactions and feedback
- RAG Embeddings: Document upload and processing are fully implemented, but embedding generation with wllama is currently simulated; real embeddings require additional integration work.
- Settings Management: Import/export functionality shows placeholder implementations and needs backend integration.
- Model Switching: While supported, switching between loaded models requires unloading the current model first.
- Memory Usage: Large models (7B+ parameters) require significant RAM and may not run on devices with limited memory.
- First Load: Initial model downloads can be large (hundreds of MB) and may take time on slower connections.
- Multi-threading: Requires Cross-Origin isolation headers, which may limit compatibility with some deployment environments.
- WebAssembly: Requires modern browsers with WebAssembly support
- IndexedDB: Used for model and document caching - some privacy-focused browsers may limit storage
- SharedArrayBuffer: Required for multi-threaded inference - needs COOP/COEP headers (see the config sketch below)
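For production deployments, the cross-origin isolation headers can be set in `next.config.js`. A sketch of the standard COOP/COEP pair (the repo's actual config may differ):

```js
// next.config.js (excerpt) -- headers that enable SharedArrayBuffer.
module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
        ],
      },
    ];
  },
};
```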
- Some features are marked with TODO comments in the codebase and represent planned enhancements rather than bugs.
- The RAG system architecture is designed to be extensible for future embedding model integration.
- Node.js 18.17+ (required by Next.js 14)
- npm 9+ (ships with recent Node releases)
- Modern web browser with WebAssembly support (Chrome 95+, Firefox 100+, Safari 16.4+)
- At least 2GB free RAM for model inference
```bash
cd wnxt
npm install
npm run dev
```

The app runs at http://localhost:3000 with hot reload enabled. In development, COOP/COEP headers are added automatically for multi-threaded WebAssembly support.
```bash
# Run linting
npm run lint

# Run TypeScript type checking
npm run type-check
```

To create and serve a production build:

```bash
npm run build
npm run start
```

The build pipeline compiles the Next.js app and generates the service worker through next-pwa. For full PWA capabilities in production, serve over HTTPS.
```bash
npm run pwa-install
```

This command helps set up the PWA for installation during development.
- DuckWasm Integration: Full RAG pipeline with vector search
- Advanced Model Support: More LLM architectures
- Conversation Persistence: Save/load chat sessions
- Voice Input: Speech-to-text capabilities
- Multi-modal Support: Image and file processing
- Custom Model Training: Fine-tuning interface
- Web Workers: Background model loading
- IndexedDB: Enhanced caching strategies
- WebGPU: Hardware acceleration (future)
- Model Sharding: Large model support
This project demonstrates the migration of wllama functionality into a modern PWA with enhanced UX and RAG capabilities. The architecture is designed to be extensible and maintainable.
MIT License - see LICENSE file for details.