
Client-hosted large language model chat application with Retrieval-Augmented Generation (RAG), built on Next.js 14, WebAssembly, and the `@wllama/wllama` runtime. Ships as a Progressive Web App so it can be installed and run offline after assets are cached.

WNXT LLM Chat PWA

A modern, client-side LLM chat application with RAG capabilities, built as a Progressive Web App (PWA) using Next.js and liquid glass design.

🚀 Features

Core Functionality

  • Client-side LLM Inference: Powered by wllama for browser-based AI
  • RAG (Retrieval-Augmented Generation): Enhanced responses using a custom IndexedDB-based vector search
  • Real-time Metrics: Context usage tracking and tokens-per-second monitoring
  • Progressive Web App: Installable, offline-capable experience

Design & UX

  • Liquid Glass Theme: Modern glassmorphism UI with backdrop blur effects
  • Responsive Design: Optimized for desktop and mobile devices
  • Dark Mode: Consistent dark theme with glass aesthetics
  • Smooth Animations: Fluid transitions and micro-interactions

Technical Features

  • Model Management: Download, cache, and switch between different LLM models
  • Document Processing: Upload and process PDF and text files for RAG context
  • Streaming Responses: Real-time text generation with live metrics
  • PWA Capabilities: Service worker, offline support, app manifest

🏗️ Project Structure

wnxt/
├── app/                    # Next.js App Router
│   ├── globals.css        # Global styles with liquid glass theme
│   ├── layout.tsx         # Root layout with PWA metadata
│   ├── not-found.tsx      # 404 error page
│   └── page.tsx           # Main application entry point
├── components/            # React components
│   ├── ui/               # shadcn/ui component library
│   │   ├── button.tsx    # Button components
│   │   ├── input.tsx     # Input field components
│   │   ├── label.tsx     # Label components
│   │   ├── popover.tsx   # Popover components
│   │   ├── progress.tsx  # Progress bar components
│   │   ├── scroll-area.tsx # Scrollable area components
│   │   ├── select.tsx    # Select dropdown components
│   │   ├── slider.tsx    # Slider components
│   │   ├── switch.tsx    # Toggle switch components
│   │   └── textarea.tsx  # Text area components
│   ├── chat-interface.tsx # Chat interface components
│   ├── chat-screen.tsx   # Main chat screen with conversation management
│   ├── metrics-panel.tsx  # Real-time performance metrics display
│   ├── model-screen.tsx  # Model download and management interface
│   ├── model-selector.tsx # Model selection components
│   ├── rag-panel.tsx     # RAG document management panel
│   ├── rag-screen.tsx    # Complete RAG document processing interface
│   ├── settings-panel.tsx # Settings configuration panel
│   ├── settings-screen.tsx # Complete settings management interface
│   └── tab-navigation.tsx # Bottom tab navigation for mobile
├── lib/                  # Core business logic and utilities
│   ├── model-cache-manager.ts # Model caching and storage management
│   ├── vector-db-manager.ts   # Vector database operations for RAG
│   ├── wllama-config.ts       # WLLama configuration and model definitions
│   ├── wllama-context.tsx     # React context for WLLama state management
│   ├── wllama-types.ts        # TypeScript type definitions
│   └── utils.ts               # Utility functions (cn class merging)
├── public/              # Static assets and PWA resources
│   ├── icon-192.png     # PWA icon (192x192)
│   ├── icon-512.png     # PWA icon (512x512)
│   ├── manifest.json    # PWA manifest configuration
│   ├── sw.js           # Service worker for offline functionality
│   ├── wllama/         # WLLama WebAssembly builds
│   │   ├── multi-thread/wllama.wasm    # Multi-threaded WASM build
│   │   └── single-thread/wllama.wasm   # Single-threaded WASM build
│   └── workbox-*.js    # Workbox service worker utilities
└── Configuration Files
    ├── next.config.js      # Next.js configuration with PWA setup
    ├── tailwind.config.ts  # Tailwind CSS configuration
    ├── tsconfig.json       # TypeScript configuration
    └── package.json        # Dependencies and build scripts

🏛️ Core Architecture

State Management (wllama-context.tsx)

The central React context provider that orchestrates all LLM operations:

  • Model Management: Download, load, unload, and switch between quantized GGUF models
  • Inference Control: Streaming text generation with configurable parameters
  • Conversation Management: Multi-conversation support with persistent chat history
  • RAG Integration: Document processing and context injection into prompts
  • Real-time Metrics: Token usage, generation speed, and memory monitoring
  • Caching Strategy: IndexedDB-based model and document persistence
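
The RAG context injection mentioned above can be sketched as a small pure function; the names and prompt wording here are illustrative, not the app's actual API:

```typescript
// Illustrative sketch (hypothetical names): merge the retrieved chunks,
// highest-scoring first, into a prompt before running inference.
interface RetrievedChunk {
  text: string;
  score: number; // cosine similarity reported by the vector search
}

function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = [...chunks]
    .sort((a, b) => b.score - a.score)
    .map((c, i) => `[${i + 1}] ${c.text}`)
    .join("\n");
  return [
    "Answer using the context below. If it is insufficient, say so.",
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```

The returned string would then be passed to the streaming completion call along with the conversation history.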

Vector Database (vector-db-manager.ts)

Custom vector storage implementation using IndexedDB for browser persistence:

  • Document Storage: Chunked document storage with metadata
  • Embedding Management: Vector embeddings for semantic search
  • Similarity Search: Cosine similarity-based retrieval for RAG context
  • Persistent Storage: Browser-based persistence without external dependencies
  • Performance Optimized: Indexed queries for efficient vector operations
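
The similarity search described above amounts to scoring every stored chunk against the query embedding. A minimal sketch, assuming chunks are persisted with precomputed embedding vectors (the `StoredChunk` shape is illustrative):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero-length vectors to avoid division by zero.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

interface StoredChunk { id: string; text: string; embedding: number[]; }

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

A brute-force scan like this is fine for browser-scale document sets; IndexedDB indexes help with loading chunks, not with the vector math itself.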

Model Caching (model-cache-manager.ts)

Intelligent model caching system for optimal performance:

  • Download Management: Progress tracking and resumable downloads
  • Cache Validation: Model integrity verification and cleanup
  • Storage Optimization: Efficient use of browser storage quotas
  • Cross-session Persistence: Models persist across browser sessions
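
The validation and progress-tracking logic above reduces to simple byte accounting; a sketch with illustrative names (the real manager persists these records in IndexedDB):

```typescript
// Hypothetical cache record: expected model size vs. bytes stored so far.
interface CachedModel { name: string; expectedBytes: number; cachedBytes: number; }

// A cache entry is valid only when the full model was stored.
function isCacheValid(m: CachedModel): boolean {
  return m.cachedBytes === m.expectedBytes;
}

// Download progress as a whole-number percentage, clamped to 100.
function downloadProgress(m: CachedModel): number {
  if (m.expectedBytes === 0) return 0;
  return Math.min(100, Math.round((m.cachedBytes / m.expectedBytes) * 100));
}
```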

🔧 Technology Stack

Frontend Framework

  • Next.js 14.1.0: App Router with TypeScript support
  • React 18.2.0: Component-based UI with hooks and concurrent features
  • Tailwind CSS 3.4.1: Utility-first CSS framework with custom glassmorphism utilities

UI Components

  • shadcn/ui: Complete component library built on Radix UI primitives
  • Radix UI: Accessible, unstyled UI primitives (dialog, dropdown, popover, etc.)
  • Lucide React 0.344.0: Modern icon library replacing FontAwesome
  • Tailwind Animate: CSS animation utilities for smooth transitions

AI/ML & Data Processing

  • @wllama/wllama v2.3.6: WebAssembly LLM inference runtime with multi-thread support
  • @huggingface/jinja v0.5.1: Chat template processing for various model formats
  • Vector Database: Custom IndexedDB-based vector storage with cosine similarity search
  • Model Caching: IndexedDB-based model persistence and cache management

PWA & Build Tools

  • next-pwa v5.6.0: Service worker generation and PWA utilities
  • TypeScript 5.3.3: Type-safe development with strict configuration
  • ESLint 8.56.0: Code linting with Next.js configuration
  • PostCSS 8.4.33: CSS processing with Autoprefixer and Tailwind

📊 Key Features Implemented

1. Liquid Glass Design System

  • Custom CSS variables for glassmorphism effects
  • Backdrop blur and transparency gradients
  • Shimmer animations for loading states
  • Consistent border radius and shadows

2. Real-time Metrics Tracking

  • Context Usage: Visual progress bar showing token consumption
  • Tokens Per Second: Live TPS calculation during generation
  • Memory Usage: Browser memory monitoring
  • Generation Status: Visual indicators for AI state
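
The tokens-per-second figure is just token count over wall-clock time, updated on every streamed token. A minimal sketch (class name is illustrative):

```typescript
// Track generation speed: count tokens against elapsed wall-clock time.
class TpsMeter {
  private startMs = 0;
  private tokens = 0;

  start(nowMs: number): void {
    this.startMs = nowMs;
    this.tokens = 0;
  }

  // Call once per streamed token; returns the current tokens/second.
  onToken(nowMs: number): number {
    this.tokens++;
    const elapsedSec = (nowMs - this.startMs) / 1000;
    return elapsedSec > 0 ? this.tokens / elapsedSec : 0;
  }
}
```

In the browser, `performance.now()` would supply the timestamps.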

3. Model Management System

  • Multiple model support (Qwen, Llama, SmolLM, etc.)
  • Download progress tracking
  • Model caching and validation
  • Status indicators (downloading, loading, ready, active)

4. RAG Document Pipeline

  • Document upload interface with drag-and-drop
  • File type support (PDF, TXT, MD, and other text formats)
  • IndexedDB-based vector storage with persistent caching
  • Document chunking and metadata management
  • Vector similarity search for context retrieval
  • Real-time processing status and progress tracking
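
The chunking step above can be sketched as a fixed-size sliding window with overlap, so text spanning a chunk boundary still appears whole in at least one chunk. The sizes are illustrative defaults, not the app's actual settings:

```typescript
// Split document text into overlapping character windows.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // advance less than `size` to create overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Token-aware or sentence-aware chunking would improve retrieval quality, but character windows keep the sketch self-contained.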

5. Chat Interface

  • Message history with role indicators
  • Streaming response simulation
  • Auto-scroll to latest messages
  • Token counting per message
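
The per-message bookkeeping can be sketched as an immutable history append with an approximate token count. The four-characters-per-token estimate is a rough heuristic for display purposes, not the model's real tokenizer:

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  tokens: number; // approximate, for the per-message display
}

// Append a message, estimating tokens at ~4 characters each.
function appendMessage(
  history: ChatMessage[],
  role: ChatMessage["role"],
  content: string,
): ChatMessage[] {
  const tokens = Math.ceil(content.length / 4);
  return [...history, { role, content, tokens }];
}
```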

🎯 Migration from wllama Demo

This project migrates and enhances the original wllama examples/main demo:

Original Features Migrated:

  • Model downloading and caching
  • Chat completion interface
  • Inference parameter controls
  • Model switching capabilities

New Features Added:

  • PWA Architecture: Installable web app
  • Liquid Glass UI: Modern design system
  • RAG Integration: Document-based context enhancement
  • Real-time Metrics: Performance monitoring
  • Responsive Design: Mobile-first approach
  • Enhanced UX: Better user interactions and feedback

⚠️ Known Issues & Limitations

Current Implementation Status

  • RAG Embeddings: Document upload and processing is fully implemented, but embedding generation using wllama is currently simulated. Real embeddings require additional integration work.
  • Settings Management: Import/export functionality shows placeholder implementations and needs backend integration.
  • Model Switching: While supported, switching between loaded models requires unloading the current model first.

Performance Considerations

  • Memory Usage: Large models (7B+ parameters) may require significant RAM and may not work on devices with limited memory.
  • First Load: Initial model downloads can be large (hundreds of MB) and may take time on slower connections.
  • Multi-threading: Requires cross-origin isolation headers (COOP/COEP), which may limit compatibility with some deployment environments.

Browser Compatibility

  • WebAssembly: Requires modern browsers with WebAssembly support
  • IndexedDB: Used for model and document caching - some privacy-focused browsers may limit storage
  • SharedArrayBuffer: Required for multi-threaded inference - needs COOP/COEP headers
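
The COOP/COEP requirement boils down to two response headers that enable cross-origin isolation; in Next.js they are typically returned from the async `headers()` option in `next.config.js`. A sketch of the values SharedArrayBuffer needs:

```typescript
// Cross-origin isolation headers required for SharedArrayBuffer.
// In Next.js, return an array like this from `headers()` in next.config.js.
const isolationHeaders = [
  {
    source: "/(.*)", // apply to every route
    headers: [
      { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
      { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
    ],
  },
];
```

Note that `require-corp` forces every embedded cross-origin resource to opt in via CORS or CORP, which is a common source of broken third-party assets after enabling isolation.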

Development Notes

  • Some features are marked with TODO comments in the codebase and represent planned enhancements rather than bugs.
  • The RAG system architecture is designed to be extensible for future embedding model integration.

🚀 Getting Started

Prerequisites

  • Node.js 18.17+ (required by Next.js 14)
  • npm 9+ (ships with recent Node releases)
  • Modern web browser with WebAssembly support (Chrome 95+, Firefox 100+, Safari 16.4+)
  • At least 2GB free RAM for model inference

Installation

cd wnxt
npm install

Development

npm run dev

The app runs at http://localhost:3000 with hot reload enabled. In development, COOP/COEP headers are added automatically for multi-threaded WebAssembly support.

Quality Assurance

# Run linting
npm run lint

# Run TypeScript type checking
npm run type-check

Production Build

npm run build
npm run start

The build pipeline compiles the Next.js app and generates the service worker through next-pwa. For full PWA capabilities in production, serve over HTTPS.

PWA Installation

npm run pwa-install

This command helps set up the PWA for installation during development.

🔮 Future Enhancements

Planned Features:

  • DuckWasm Integration: Full RAG pipeline with vector search
  • Advanced Model Support: More LLM architectures
  • Conversation Persistence: Save/load chat sessions
  • Voice Input: Speech-to-text capabilities
  • Multi-modal Support: Image and file processing
  • Custom Model Training: Fine-tuning interface

Performance Optimizations:

  • Web Workers: Background model loading
  • IndexedDB: Enhanced caching strategies
  • WebGPU: Hardware acceleration (future)
  • Model Sharding: Large model support

🤝 Contributing

This project demonstrates the migration of wllama functionality into a modern PWA with enhanced UX and RAG capabilities. The architecture is designed to be extensible and maintainable.

📄 License

MIT License - see LICENSE file for details.
