A modern, client-side LLM chat application with RAG capabilities, built as a Progressive Web App (PWA) with Next.js and a liquid glass design.
- Client-side LLM Inference: Powered by wllama for browser-based AI
- RAG (Retrieval-Augmented Generation): Enhanced responses using a custom in-browser vector search (DuckWasm integration is on the roadmap)
- Real-time Metrics: Context usage tracking and tokens-per-second monitoring
- Progressive Web App: Installable, offline-capable experience
- Liquid Glass Theme: Modern glassmorphism UI with backdrop blur effects
- Responsive Design: Optimized for desktop and mobile devices
- Dark Mode: Consistent dark theme with glass aesthetics
- Smooth Animations: Fluid transitions and micro-interactions
- Model Management: Download, cache, and switch between different LLM models
- Document Processing: Upload and process PDFs, text files for RAG context
- Streaming Responses: Real-time text generation with live metrics
- PWA Capabilities: Service worker, offline support, app manifest
```
wnxt/
├── app/                          # Next.js App Router
│   ├── globals.css               # Global styles with liquid glass theme
│   ├── layout.tsx                # Root layout with PWA metadata
│   ├── not-found.tsx             # 404 error page
│   └── page.tsx                  # Main application entry point
├── components/                   # React components
│   ├── ui/                       # shadcn/ui component library
│   │   ├── button.tsx            # Button components
│   │   ├── input.tsx             # Input field components
│   │   ├── label.tsx             # Label components
│   │   ├── popover.tsx           # Popover components
│   │   ├── progress.tsx          # Progress bar components
│   │   ├── scroll-area.tsx       # Scrollable area components
│   │   ├── select.tsx            # Select dropdown components
│   │   ├── slider.tsx            # Slider components
│   │   ├── switch.tsx            # Toggle switch components
│   │   └── textarea.tsx          # Text area components
│   ├── chat-interface.tsx        # Chat interface components
│   ├── chat-screen.tsx           # Main chat screen with conversation management
│   ├── metrics-panel.tsx         # Real-time performance metrics display
│   ├── model-screen.tsx          # Model download and management interface
│   ├── model-selector.tsx        # Model selection components
│   ├── rag-panel.tsx             # RAG document management panel
│   ├── rag-screen.tsx            # Complete RAG document processing interface
│   ├── settings-panel.tsx        # Settings configuration panel
│   ├── settings-screen.tsx       # Complete settings management interface
│   └── tab-navigation.tsx        # Bottom tab navigation for mobile
├── lib/                          # Core business logic and utilities
│   ├── model-cache-manager.ts    # Model caching and storage management
│   ├── vector-db-manager.ts      # Vector database operations for RAG
│   ├── wllama-config.ts          # wllama configuration and model definitions
│   ├── wllama-context.tsx        # React context for wllama state management
│   ├── wllama-types.ts           # TypeScript type definitions
│   └── utils.ts                  # Utility functions (cn class merging)
├── public/                       # Static assets and PWA resources
│   ├── icon-192.png              # PWA icon (192x192)
│   ├── icon-512.png              # PWA icon (512x512)
│   ├── manifest.json             # PWA manifest configuration
│   ├── sw.js                     # Service worker for offline functionality
│   ├── wllama/                   # wllama WebAssembly builds
│   │   ├── multi-thread/wllama.wasm   # Multi-threaded WASM build
│   │   └── single-thread/wllama.wasm  # Single-threaded WASM build
│   └── workbox-*.js              # Workbox service worker utilities
└── Configuration Files
    ├── next.config.js            # Next.js configuration with PWA setup
    ├── tailwind.config.ts        # Tailwind CSS configuration
    ├── tsconfig.json             # TypeScript configuration
    └── package.json              # Dependencies and build scripts
```
The central React context provider (`lib/wllama-context.tsx`) orchestrates all LLM operations (a minimal sketch follows this list):
- Model Management: Download, load, unload, and switch between quantized GGUF models
- Inference Control: Streaming text generation with configurable parameters
- Conversation Management: Multi-conversation support with persistent chat history
- RAG Integration: Document processing and context injection into prompts
- Real-time Metrics: Token usage, generation speed, and memory monitoring
- Caching Strategy: IndexedDB-based model and document persistence
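As a minimal sketch of what such a provider can look like (the names below are illustrative, not the actual exports of `lib/wllama-context.tsx`, which also carries conversations, metrics, and RAG state):

```tsx
'use client';
import { createContext, useContext, useState, type ReactNode } from 'react';

// Illustrative context shape: which model is loaded and whether
// tokens are currently streaming.
interface LlmContextValue {
  modelId: string | null;
  isGenerating: boolean;
  setModelId: (id: string | null) => void;
  setIsGenerating: (generating: boolean) => void;
}

const LlmContext = createContext<LlmContextValue | null>(null);

export function LlmProvider({ children }: { children: ReactNode }) {
  const [modelId, setModelId] = useState<string | null>(null);
  const [isGenerating, setIsGenerating] = useState(false);
  return (
    <LlmContext.Provider
      value={{ modelId, isGenerating, setModelId, setIsGenerating }}
    >
      {children}
    </LlmContext.Provider>
  );
}

// Consumers call useLlm() anywhere below the provider.
export function useLlm(): LlmContextValue {
  const ctx = useContext(LlmContext);
  if (!ctx) throw new Error('useLlm must be used within <LlmProvider>');
  return ctx;
}
```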
A custom vector store (`lib/vector-db-manager.ts`) built on IndexedDB for browser persistence (the retrieval step is sketched after this list):
- Document Storage: Chunked document storage with metadata
- Embedding Management: Vector embeddings for semantic search
- Similarity Search: Cosine similarity-based retrieval for RAG context
- Persistent Storage: Browser-based persistence without external dependencies
- Performance Optimized: Indexed queries for efficient vector operations
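A sketch of the retrieval step: rank stored chunks by cosine similarity to a query embedding and keep the top k. The type and function names are illustrative, not the actual API of `lib/vector-db-manager.ts`:

```ts
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero-length vectors.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k = 4): StoredChunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```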
The model caching layer (`lib/model-cache-manager.ts`) avoids re-downloading models on every visit (a sketch follows this list):
- Download Management: Progress tracking and resumable downloads
- Cache Validation: Model integrity verification and cleanup
- Storage Optimization: Efficient use of browser storage quotas
- Cross-session Persistence: Models persist across browser sessions
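A compact sketch of cross-session model persistence in IndexedDB. The database, store, and function names are illustrative, not those used by `lib/model-cache-manager.ts`:

```ts
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('model-cache', 1);
    req.onupgradeneeded = () => req.result.createObjectStore('models');
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Look up a previously downloaded model by its URL.
async function getCachedModel(url: string): Promise<ArrayBuffer | undefined> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const req = db.transaction('models').objectStore('models').get(url);
    req.onsuccess = () => resolve(req.result as ArrayBuffer | undefined);
    req.onerror = () => reject(req.error);
  });
}

// Persist downloaded model bytes so the next session skips the download.
async function cacheModel(url: string, data: ArrayBuffer): Promise<void> {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction('models', 'readwrite');
    tx.objectStore('models').put(data, url);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```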
- Next.js 14.1.0: App Router with TypeScript support
- React 18.2.0: Component-based UI with hooks and concurrent features
- Tailwind CSS 3.4.1: Utility-first CSS framework with custom glassmorphism utilities
- shadcn/ui: Complete component library built on Radix UI primitives
- Radix UI: Accessible, unstyled UI primitives (dialog, dropdown, popover, etc.)
- Lucide React 0.344.0: Modern icon library replacing FontAwesome
- Tailwind Animate: CSS animation utilities for smooth transitions
- @wllama/wllama v2.3.6: WebAssembly LLM inference runtime with multi-thread support
- @huggingface/jinja v0.5.1: Chat template processing for various model formats
- Vector Database: Custom IndexedDB-based vector storage with cosine similarity search
- Model Caching: IndexedDB-based model persistence and cache management
- next-pwa v5.6.0: Service worker generation and PWA utilities
- TypeScript 5.3.3: Type-safe development with strict configuration
- ESLint 8.56.0: Code linting with Next.js configuration
- PostCSS 8.4.33: CSS processing with Autoprefixer and Tailwind
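As a sketch of how the `@wllama/wllama` runtime listed above is typically driven, based on the library's published examples. The model URL is a placeholder, and the exact option names should be verified against the pinned v2.3.6:

```ts
import { Wllama } from '@wllama/wllama';

// Map wllama's expected asset names to the builds shipped under public/wllama/.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

// Placeholder: substitute any GGUF model URL served with CORS enabled.
const MODEL_URL = 'https://example.com/model.gguf';

async function runOnce(prompt: string): Promise<string> {
  const wllama = new Wllama(CONFIG_PATHS);
  await wllama.loadModelFromUrl(MODEL_URL);
  return wllama.createCompletion(prompt, {
    nPredict: 64,                        // cap on generated tokens
    sampling: { temp: 0.7, top_p: 0.9 }, // sampling parameters
  });
}
```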
- Custom CSS variables for glassmorphism effects
- Backdrop blur and transparency gradients
- Shimmer animations for loading states
- Consistent border radius and shadows
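As a hypothetical illustration (not a component from this repo), these utilities compose into a glass surface along these lines:

```tsx
import type { ReactNode } from 'react';

// Illustrative "glass" card: translucent background, backdrop blur,
// soft border, consistent radius and shadow.
export function GlassCard({ children }: { children: ReactNode }) {
  return (
    <div className="rounded-2xl border border-white/10 bg-white/5 shadow-lg backdrop-blur-md">
      {children}
    </div>
  );
}
```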
- Context Usage: Visual progress bar showing token consumption
- Tokens Per Second: Live TPS calculation during generation
- Memory Usage: Browser memory monitoring
- Generation Status: Visual indicators for AI state
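A minimal sketch of how a live tokens-per-second figure can be computed during streaming (illustrative, not the metrics panel's actual implementation):

```ts
// Count tokens since generation began and divide by elapsed wall-clock time.
function createTpsMeter() {
  let start = 0;
  let tokens = 0;
  return {
    begin() { start = performance.now(); tokens = 0; },
    onToken() { tokens++; },
    tps(): number {
      const elapsedSec = (performance.now() - start) / 1000;
      return elapsedSec > 0 ? tokens / elapsedSec : 0;
    },
  };
}
```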
- Multiple model support (Qwen, Llama, SmolLM, etc.)
- Download progress tracking
- Model caching and validation
- Status indicators (downloading, loading, ready, active)
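The status lifecycle maps naturally onto a small union type. The names below are illustrative rather than the actual definitions in `lib/wllama-types.ts`:

```ts
type ModelStatus = 'downloading' | 'loading' | 'ready' | 'active';

interface ModelEntry {
  id: string;          // e.g. a repo + GGUF file identifier
  sizeBytes: number;   // download size surfaced in the UI
  status: ModelStatus;
  progress?: number;   // 0..1 while status === 'downloading'
}
```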
- Document upload interface with drag-and-drop
- File type support (PDF, TXT, MD, and other text formats)
- IndexedDB-based vector storage with persistent caching
- Document chunking and metadata management
- Vector similarity search for context retrieval
- Real-time processing status and progress tracking
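Chunking is the step that precedes embedding and storage. A minimal fixed-size sketch with overlap (the sizes are illustrative, not the project's actual parameters):

```ts
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const step = Math.max(1, size - overlap); // guard against overlap >= size
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```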
- Message history with role indicators
- Streaming response simulation
- Auto-scroll to latest messages
- Token counting per message
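An illustrative message shape covering the role indicator and per-message token count (the real types live in `lib/wllama-types.ts` and may differ):

```ts
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  tokenCount: number;  // shown alongside each message
  createdAt: number;   // epoch ms, used for ordering and auto-scroll
}
```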
This project migrates and enhances the original wllama examples/main demo:
- Model downloading and caching
- Chat completion interface
- Inference parameter controls
- Model switching capabilities
- PWA Architecture: Installable web app
- Liquid Glass UI: Modern design system
- RAG Integration: Document-based context enhancement
- Real-time Metrics: Performance monitoring
- Responsive Design: Mobile-first approach
- Enhanced UX: Better user interactions and feedback
- RAG Embeddings: Document upload and processing are fully implemented, but embedding generation with wllama is currently simulated; real embeddings require additional integration work.
- Settings Management: Import/export functionality shows placeholder implementations and needs backend integration.
- Model Switching: While supported, switching between loaded models requires unloading the current model first.
- Memory Usage: Large models (7B+ parameters) require significant RAM and may not run on devices with limited memory.
- First Load: Initial model downloads can be large (hundreds of MB) and may take time on slower connections.
- Multi-threading: Requires Cross-Origin isolation headers, which may limit compatibility with some deployment environments.
- WebAssembly: Requires modern browsers with WebAssembly support
- IndexedDB: Used for model and document caching - some privacy-focused browsers may limit storage
- SharedArrayBuffer: Required for multi-threaded inference - needs COOP/COEP headers (see the config sketch below)
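For production deployments, the cross-origin isolation headers can be set in `next.config.js`. A sketch of the standard COOP/COEP pair (the repo's actual config may differ):

```js
// next.config.js (excerpt) -- headers that enable SharedArrayBuffer.
module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)',
        headers: [
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
        ],
      },
    ];
  },
};
```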
- Some features are marked with TODO comments in the codebase and represent planned enhancements rather than bugs.
- The RAG system architecture is designed to be extensible for future embedding model integration.
- Node.js 18.17+ (required by Next.js 14)
- npm 9+ (ships with recent Node releases)
- Modern web browser with WebAssembly support (Chrome 95+, Firefox 100+, Safari 16.4+)
- At least 2GB free RAM for model inference
```bash
cd wnxt
npm install
npm run dev
```

The app runs at http://localhost:3000 with hot reload enabled. In development, COOP/COEP headers are added automatically for multi-threaded WebAssembly support.
```bash
# Run linting
npm run lint

# Run TypeScript type checking
npm run type-check
```

To create and serve a production build:

```bash
npm run build
npm run start
```

The build pipeline compiles the Next.js app and generates the service worker through next-pwa. For full PWA capabilities in production, serve over HTTPS.
```bash
npm run pwa-install
```

This command helps set up the PWA for installation during development.
- DuckWasm Integration: Full RAG pipeline with vector search
- Advanced Model Support: More LLM architectures
- Conversation Persistence: Save/load chat sessions
- Voice Input: Speech-to-text capabilities
- Multi-modal Support: Image and file processing
- Custom Model Training: Fine-tuning interface
- Web Workers: Background model loading
- IndexedDB: Enhanced caching strategies
- WebGPU: Hardware acceleration (future)
- Model Sharding: Large model support
This project demonstrates the migration of wllama functionality into a modern PWA with enhanced UX and RAG capabilities. The architecture is designed to be extensible and maintainable.
MIT License - see LICENSE file for details.