A production-ready AI application built with Next.js 15, TypeScript, and Google Gemini. It demonstrates an advanced RAG (Retrieval-Augmented Generation) implementation with a Supabase vector store and LangSmith observability for document processing and intelligent question answering.
This project is created for educational and portfolio purposes to demonstrate:
- Advanced AI engineering skills with LangChain.js and Google Gemini
- Full-stack development with modern technologies (Next.js 15, TypeScript)
- Production-ready application architecture and best practices
- Complete RAG pipeline implementation with Supabase vector store
- LangSmith observability and tracing for AI applications
- Professional software development practices and documentation
This project is for educational and portfolio use only. Commercial use is prohibited without explicit permission.
- Document Upload: Support for PDF, TXT, DOCX, and Markdown files with robust parsing
- AI-Powered Q&A: Ask questions about uploaded documents using Google Gemini 2.0 Flash
- Source Citations: Get answers with references to specific document sections
- Document Management: View, filter, and manage uploaded documents
- Real-time Processing: Live status updates during document processing
- Chat Interface: Modern chat UI with typing indicators and message history
- Error Recovery: Comprehensive error handling with retry logic and graceful degradation
- Type-Safe: Full TypeScript implementation with strict mode
- Responsive Design: Mobile-first approach with Tailwind CSS
- Error Handling: Comprehensive error handling with retry logic
- Performance: Optimized with proper loading states and caching
- Security: Input validation and file type restrictions
- Smart Chunking: LangChain RecursiveCharacterTextSplitter with sentence boundary detection
- Vector Search: Supabase pgvector with HNSW indexing for efficient similarity search
- Embedding Generation: Google Gemini embeddings (3072 dimensions) for semantic understanding
- Relevance Scoring: Confidence and relevance metrics for answers
- Complete RAG Pipeline: Query → Embedding → Vector Search → Context Retrieval → AI Generation → Citation (see the sketch after this list)
- Observability: LangSmith tracing for AI pipeline monitoring and debugging
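To make the query-side pipeline concrete, here is a minimal, hedged sketch of how such a chain can be composed with LCEL and the Gemini chat model. It is an illustration only: the function name `buildRagChain` and the prompt are assumptions, and the project's actual chains live in `lib/langchain-chains.ts`.

```typescript
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";
import type { VectorStoreRetriever } from "@langchain/core/vectorstores";
import type { Document } from "@langchain/core/documents";

// Number each chunk so the model can cite it as [1], [2], ...
const formatDocs = (docs: Document[]) =>
  docs.map((doc, i) => `[${i + 1}] ${doc.pageContent}`).join("\n\n");

const prompt = ChatPromptTemplate.fromTemplate(
  `Answer the question using only the context below and cite sources as [n].

Context:
{context}

Question: {question}`
);

// Hypothetical helper; the retriever would come from the Supabase vector store.
export function buildRagChain(retriever: VectorStoreRetriever) {
  const llm = new ChatGoogleGenerativeAI({ model: "gemini-2.0-flash-exp", temperature: 0 });

  return RunnableSequence.from([
    {
      // Retrieve relevant chunks for the question and format them as context.
      context: async (input: { question: string }) =>
        formatDocs(await retriever.invoke(input.question)),
      question: (input: { question: string }) => input.question,
    },
    prompt,
    llm,
    new StringOutputParser(),
  ]);
}

// Usage sketch: const answer = await buildRagChain(retriever).invoke({ question: "..." });
```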
- Next.js 15 with App Router
- TypeScript (strict mode)
- Tailwind CSS for styling
- React 19 with modern hooks
- Next.js API Routes for backend logic
- LangChain.js for AI functionality and RAG pipeline
- Google Gemini models (gemini-2.0-flash-exp, gemini-embedding-001)
- Supabase with pgvector extension for vector storage
- Prisma ORM with Supabase PostgreSQL for metadata
- Zod for validation
- RAG Architecture with LangChain document chunking
- Supabase Vector Store with pgvector for efficient similarity search
- Google Gemini Embeddings (3072 dimensions) for semantic understanding
- LangChain Expression Language (LCEL) for chain composition
- Context-Aware Generation with source citations
- LangSmith Integration for observability and tracing
- Jest for unit testing
- Cypress for E2E testing
- ESLint for code quality
- TypeScript for type safety
document-assistant-ai/
├── app/
│   ├── api/                    # API routes
│   │   ├── health/             # Health check endpoint
│   │   ├── upload/             # File upload endpoint
│   │   ├── process/            # Document processing endpoint
│   │   ├── query/              # Query processing endpoint
│   │   └── documents/          # Document management endpoint
│   ├── client/                 # Client-side components
│   └── layout.tsx              # Root layout
├── components/                 # Reusable UI components
│   ├── FileUpload.tsx          # Drag-and-drop file upload
│   ├── DocumentList.tsx        # Document management interface
│   ├── QueryInterface.tsx      # Question input interface
│   ├── QueryResults.tsx        # AI response display
│   └── ...                     # Other UI components
├── lib/                        # Core business logic
│   ├── types.ts                # TypeScript type definitions
│   ├── utils.ts                # Utility functions
│   ├── ai-services.ts          # AI service integration with LangSmith
│   ├── database.ts             # Database service layer
│   ├── supabase.ts             # Supabase client configuration
│   ├── vector-store.ts         # Supabase vector store integration
│   ├── document-loaders.ts     # LangChain document processing
│   ├── langchain-chains.ts     # RAG chains with LCEL
│   ├── langsmith.ts            # LangSmith observability setup
│   ├── config.ts               # Application configuration
│   └── error-handling.ts       # Error handling utilities
├── prisma/                     # Database schema and migrations
│   └── schema.prisma           # Database models
├── __tests__/                  # Unit tests
├── cypress/                    # E2E tests
└── docs/                       # Documentation
- Node.js 18+
- npm or yarn
- Google AI API key
- Supabase account (for vector storage)
- LangSmith account (optional, for observability)
1. Clone the repository

   git clone <your-repo-url>
   cd document-assistant-ai

2. Install dependencies

   npm install

3. Set up environment variables

   cp env.example .env.local

   Edit `.env.local` and add your API keys:

   GOOGLE_API_KEY=your_google_api_key_here
   DATABASE_URL="postgres://postgres.[YOUR-PROJECT-REF]:[YOUR-PASSWORD]@aws-0-us-east-1.pooler.supabase.com:5432/postgres"
   SUPABASE_URL=https://[YOUR-PROJECT-REF].supabase.co
   SUPABASE_ANON_KEY=your_supabase_anon_key
   SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
   LANGCHAIN_TRACING_V2=true
   LANGCHAIN_API_KEY=your_langsmith_api_key
   LANGCHAIN_PROJECT=document-assistant-ai
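A hedged illustration of how these variables might be validated at startup with Zod (the real logic lives in `lib/config.ts` and may differ; the schema below is an assumption):

```typescript
import { z } from "zod";

// Hypothetical schema; the keys mirror the .env.local example above.
const envSchema = z.object({
  GOOGLE_API_KEY: z.string().min(1),
  DATABASE_URL: z.string().min(1),
  SUPABASE_URL: z.string().url(),
  SUPABASE_ANON_KEY: z.string().min(1),
  SUPABASE_SERVICE_ROLE_KEY: z.string().min(1),
  LANGCHAIN_TRACING_V2: z.string().optional(),
  LANGCHAIN_API_KEY: z.string().optional(),
  LANGCHAIN_PROJECT: z.string().optional(),
});

// Fails fast with a readable error if a required variable is missing or malformed.
export const env = envSchema.parse(process.env);
```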
4. Set up Supabase

   - Create a new Supabase project at supabase.com
   - Go to Settings → Database → Connect
   - Copy the Supavisor Session pooler connection string
   - Run the SQL script in `supabase-setup.sql` to create the vector store
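On the application side, the vector store client in `lib/vector-store.ts` can be wired up roughly like this (a sketch assuming the conventional LangChain table and RPC names, `documents` and `match_documents`; the actual names are defined in `supabase-setup.sql`):

```typescript
import { createClient } from "@supabase/supabase-js";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

// Server-side client; the service role key must never be shipped to the browser.
const client = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// gemini-embedding-001 produces the 3072-dimensional vectors used for similarity search.
const embeddings = new GoogleGenerativeAIEmbeddings({ model: "gemini-embedding-001" });

export const vectorStore = new SupabaseVectorStore(embeddings, {
  client,
  tableName: "documents",       // assumed table name
  queryName: "match_documents", // assumed similarity-search RPC
});
```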
5. Set up LangSmith (optional)

   - Create a LangSmith account at smith.langchain.com
   - Get your API key from the settings
   - Add the API key to `.env.local`

6. Set up the database

   npx prisma generate
   npx prisma db push

7. Get a Google AI API key

   - Go to Google AI Studio
   - Create a new API key
   - Copy the key to your `.env.local` file

8. Run the development server

   npm run dev

9. Open your browser

   Navigate to http://localhost:3000
- Click on the "Upload Documents" tab
- Drag and drop files or click "browse files"
- Supported formats: PDF, TXT, DOCX, MD
- Maximum file size: 10MB
- View uploaded documents in the "My Documents" tab
- Filter by status (uploading, processing, processed, error)
- Select specific documents for targeted queries
- Switch to the "Ask Questions" tab
- Type your question in the text area
- Use example questions or create your own
- Get AI-powered answers with source citations
- View chat history with typing indicators
GET /api/health
Returns the status of all services (AI, Database, File Storage).
POST /api/upload
Content-Type: multipart/form-data
file: [File]
Uploads and processes a document file with real text extraction.
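For example, a client component could call this endpoint roughly as follows (the response shape is an assumption; `app/api/upload/route.ts` defines the actual contract):

```typescript
// Hypothetical client-side helper for the upload endpoint.
export async function uploadDocument(file: File) {
  const formData = new FormData();
  formData.append("file", file); // field name "file", per the contract above

  const res = await fetch("/api/upload", { method: "POST", body: formData });
  if (!res.ok) throw new Error(`Upload failed with status ${res.status}`);
  return res.json();
}
```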
POST /api/process
Content-Type: application/json
{
"documentId": "string",
"forceReprocess": boolean
}
Processes a document and creates searchable chunks with embeddings.
POST /api/query
Content-Type: application/json
{
"question": "string",
"documentIds": ["string"],
"maxResults": number,
"includeSources": boolean
}
Processes a user query and returns AI-generated answers with sources.
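A client-side call might look like this (the request body mirrors the contract above; the response shape is an assumption defined by `app/api/query/route.ts`):

```typescript
// Hypothetical client-side helper for the query endpoint.
interface QueryRequest {
  question: string;
  documentIds?: string[];
  maxResults?: number;
  includeSources?: boolean;
}

export async function askQuestion(request: QueryRequest) {
  const res = await fetch("/api/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`Query failed with status ${res.status}`);
  return res.json(); // answer plus source citations
}
```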
GET /api/documents
Retrieves all uploaded documents with their processing status.
- `npm run test`: run unit tests (Jest)
- `npm run test:coverage`: run unit tests with a coverage report
- `npm run test:e2e`: run end-to-end tests (Cypress)
- `npm run test:all`: run the full test suite
- Push your code to GitHub
- Connect your repository to Vercel
- Add environment variables in the Vercel dashboard:
  - GOOGLE_API_KEY: your Google AI API key
  - DATABASE_URL: your database connection string
  - the remaining Supabase and LangSmith variables listed in the table below
- Deploy!

To build and run a production build locally:

npm run build
npm start
| Variable | Description | Required | Example |
|---|---|---|---|
| `GOOGLE_API_KEY` | Google AI API key | Yes | `AIzaSy...` |
| `DATABASE_URL` | Supabase PostgreSQL connection string (pooler) | Yes | `postgres://postgres.[PROJECT-REF]:[PASSWORD]@aws-0-us-east-1.pooler.supabase.com:5432/postgres` |
| `SUPABASE_URL` | Supabase project URL | Yes | `https://xxx.supabase.co` |
| `SUPABASE_ANON_KEY` | Supabase anonymous key | Yes | `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...` |
| `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key | Yes | `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...` |
| `LANGCHAIN_TRACING_V2` | Enable LangSmith tracing | No | `true` |
| `LANGCHAIN_API_KEY` | LangSmith API key | No | `ls__xxx` |
| `LANGCHAIN_PROJECT` | LangSmith project name | No | `document-assistant-ai` |
1. Document Upload & Processing (see the ingestion sketch after this list)

   File Upload → Text Extraction (pdf2json/mammoth) → LangChain Chunking → Gemini Embeddings → Supabase Vector Store

2. Query Processing

   User Query → Query Embedding → Supabase Vector Search → Context Retrieval → Gemini Generation → Response with Citations

3. Vector Similarity Search

   - Supabase pgvector with HNSW indexing for efficient similarity search
   - Google Gemini embeddings (3072 dimensions) for semantic understanding
   - Relevance scoring and ranking with confidence metrics
   - Context window management for optimal response generation
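To make flow 1 concrete, here is a minimal ingestion sketch (the helper name and chunking parameters are illustrative; the project's actual chunking strategy lives in `lib/document-loaders.ts`):

```typescript
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import type { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";

// Hypothetical helper: split extracted text into chunks and store their embeddings.
export async function ingestDocument(
  vectorStore: SupabaseVectorStore,
  text: string,
  documentId: string
) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,  // illustrative values, not the project's actual settings
    chunkOverlap: 200,
  });

  // Attach the document id so queries can be filtered and answers cited per document.
  const chunks = await splitter.createDocuments([text], [{ documentId }]);

  // addDocuments() embeds each chunk with Gemini and writes it to the pgvector table.
  await vectorStore.addDocuments(chunks);
}
```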
- Supabase PostgreSQL: Documents, DocumentChunks, Queries, QuerySources for metadata
- Supabase Vector Store: Document embeddings and vector similarity search
- Unified Architecture: All data in Supabase for optimal performance and scalability
- Exponential backoff retry logic (see the sketch after this list)
- Comprehensive error boundaries with React ErrorBoundary
- Graceful degradation with fallback UI
- User-friendly error messages
- Global error handlers to prevent page reloads
- Robust PDF parsing with pdf2json fallback
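A minimal sketch of such an exponential-backoff helper (the actual implementation in `lib/error-handling.ts` may differ, for example by adding jitter or checking which errors are retryable):

```typescript
// Hypothetical retry helper: waits 500ms, 1s, 2s, ... between attempts.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt; // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```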
- Lazy Loading: Components loaded on demand
- Caching: API responses cached where appropriate
- Chunking: Large documents processed in chunks with LangChain
- Debouncing: Search inputs debounced to reduce API calls (see the hook sketch after this list)
- Vector Search: Efficient similarity calculations with Supabase pgvector
- Progress Indicators: Real-time feedback for long-running operations
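As an illustration of the debouncing pattern, a generic hook like the one below could be used (the project's components may implement debouncing differently):

```typescript
import { useEffect, useState } from "react";

// Returns the value only after it has stopped changing for `delayMs` milliseconds.
export function useDebouncedValue<T>(value: T, delayMs = 300): T {
  const [debounced, setDebounced] = useState(value);

  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(timer); // reset the timer whenever the value changes
  }, [value, delayMs]);

  return debounced;
}
```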
- Health Checks: Regular service health monitoring
- Error Tracking: Comprehensive error logging
- Performance Metrics: Response time tracking
- Processing Status: Real-time status updates
- LangSmith Tracing: AI pipeline observability and debugging (see the sketch after this list)
- Vector Store Monitoring: Supabase performance metrics
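With `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` set, LangChain.js calls are traced automatically; surrounding steps can be wrapped as well. A hedged sketch (the wrapper name and run metadata are assumptions):

```typescript
import { traceable } from "langsmith/traceable";

// Wraps a pipeline step so it appears as its own run in the LangSmith project.
export const processQuery = traceable(
  async (question: string) => {
    // ... retrieval and answer generation would run here ...
    return { answer: "..." };
  },
  { name: "process-query", run_type: "chain" }
);
```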
To add support for a new file type:

- Update `DocumentType` in `lib/types.ts`
- Add extraction logic in `app/api/upload/route.ts`
- Update validation in `components/FileUpload.tsx`

To adjust AI behavior and the RAG pipeline:

- Adjust parameters in `lib/ai-services.ts`
- Update prompts in `lib/langchain-chains.ts`
- Modify the chunking strategy in `lib/document-loaders.ts`
- Configure LangSmith tracing in `lib/langsmith.ts`

To customize styling:

- Update Tailwind classes in components
- Modify the color scheme in `tailwind.config.js`
- Add custom CSS for specific components
- Complete RAG pipeline implementation with LangChain
- Supabase vector store with pgvector for efficient similarity search
- Google Gemini embeddings (3072 dimensions) for semantic understanding
- LangChain Expression Language (LCEL) for chain composition
- Context-aware AI responses with source citations
- LangSmith observability and tracing integration
- Type-safe API development with TypeScript
- Modern React patterns with hooks and context
- Unified Supabase architecture (PostgreSQL + Vector Store)
- File processing with robust PDF parsing (pdf2json)
- Error handling with React ErrorBoundary
- Real-time UI updates and progress indicators
- Comprehensive testing suite (Jest + Cypress)
- Error handling and retry logic
- Performance optimization with vector indexing
- Security considerations and input validation
- Professional documentation and setup guides
- LangSmith observability for production monitoring
- Document Processing: 5-15 seconds depending on file size
- Query Response: 2-5 seconds with Supabase vector search
- Vector Search: Sub-second similarity calculations with pgvector
- UI Responsiveness: <100ms for user interactions
- LangSmith Tracing: Real-time observability with minimal overhead
- Multi-language Support: Process documents in different languages
- Advanced Chunking: Semantic chunking based on content structure
- User Authentication: Multi-user support with document privacy
- API Rate Limiting: Production-grade rate limiting
- Caching Layer: Redis integration for improved performance
- Advanced Analytics: Document usage analytics and insights
- Custom Embeddings: Fine-tuned embeddings for specific domains
This project is currently not accepting contributions. See CONTRIBUTING.md for more information.
This project is licensed under the MIT License with commercial use restrictions. See LICENSE file for details.
For questions about this project or professional opportunities, please reach out through GitHub Issues or LinkedIn.
Built with ❤️ for an AI engineering portfolio