A powerful TypeScript backend for document AI processing, featuring PDF text extraction, semantic search, AI-powered Q&A, and user authentication with email verification.
- Features
- Tech Stack
- API Endpoints
- Database Schema
- Services
- Email Service
- Installation & Setup
- Environment Variables
- Usage Examples
- Rate Limiting
- Deployment
- Troubleshooting
- PDF Processing: Extract text from PDF documents using pdf2json
- Semantic Search: Vector-based document search using embeddings
- AI Q&A: GPT-4 powered question answering based on document content
- User Authentication: JWT-based auth with email verification
- Rate Limiting: Smart rate limiting for anonymous and authenticated users
- File Upload: Secure file upload with multer
- Email Service: Nodemailer integration for account activation
- Database: PostgreSQL with Supabase integration
- TypeScript: Full TypeScript support with type safety
- Runtime: Node.js with TypeScript
- Framework: Express.js
- Database: PostgreSQL (Supabase)
- AI/ML: OpenAI GPT-4, Xenova Transformers
- Email: Nodemailer with Gmail
- File Processing: pdf2json, multer
- Authentication: JWT, bcrypt
- Vector Search: Custom cosine similarity implementation
Register a new user account (POST /auth/signup).
Request Body:
{
"name": "John Doe",
"email": "john@example.com",
"password": "securepassword"
}
Response:
{
"message": "Signup successful! Check your email to activate.",
"userId": 123,
"emailSent": true,
"messageId": "email-message-id"
}
Authenticate a user and get a JWT token (POST /auth/login).
Request Body:
{
"email": "john@example.com",
"password": "securepassword"
}
Response:
{
"message": "Login successful",
"token": "jwt-token-here",
"user": {
"id": 123,
"name": "John Doe",
"email": "john@example.com"
}
}
Activate a user account using the token sent by email.
Response:
{
"status": "success",
"message": "✅ Hi John, your account is activated! You can now log in."
}
Upload and process PDF documents (POST /file/upload).
Request: Multipart form data
- `file`: PDF file
- `userId`: (optional) User ID for authenticated uploads
Response:
{
"message": "✅ Document uploaded successfully",
"document": {
"id": 456,
"filename": "document.pdf",
"fileType": "application/pdf",
"fileUrl": "/uploads/filename.pdf"
},
"chunks": 15
}
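The `chunks` count in the response reflects how many pieces the extracted text was split into before embedding. The chunking strategy is not described here, so the following is only a minimal sketch that assumes fixed-size character windows with overlap. The helper name `chunkText` and the `size` and `overlap` defaults are hypothetical, not the project's actual settings.

```typescript
// Hypothetical helper: split extracted text into overlapping character windows.
// The default size and overlap are illustrative assumptions.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be in [0, size)");
  }
  // Collapse runs of whitespace so PDF line breaks do not waste window space.
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < normalized.length; start += step) {
    chunks.push(normalized.slice(start, start + size));
    if (start + size >= normalized.length) break; // this window reached the end
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.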
Query documents using AI-powered semantic search (POST /query/query).
Request Body:
{
"documentId": 456,
"query": "What is the main topic of this document?",
"limit": 5,
"userId": 123
}
Response:
{
"answer": "Based on the document content, the main topic is...",
"sources": [
{
"chunk_text": "Relevant text chunk...",
"similarity": 0.85
}
],
"queryId": 789,
"rateLimitInfo": {
"limit": 50,
"remaining": 49,
"resetTime": "2024-01-02T00:00:00.000Z"
}
}
Get current rate limit status.
Response:
{
"limit": 50,
"remaining": 45,
"resetTime": "2024-01-02T00:00:00.000Z",
"requiresAuth": false
}
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
active BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
filename VARCHAR(255) NOT NULL,
file_type VARCHAR(100) NOT NULL,
file_url VARCHAR(500) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
chunk_text TEXT NOT NULL,
embedding TEXT NOT NULL, -- JSON string of vector
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE query_usage (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
ip_address VARCHAR(45) NOT NULL,
document_id INTEGER,
query TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE chat_history (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
query TEXT NOT NULL,
answer TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
- Purpose: Generate AI-powered answers using OpenAI GPT-4
- Model: GPT-4o-mini for cost efficiency
- Features: Context-aware responses based on document content
- Purpose: Generate text embeddings for semantic search
- Model: Xenova/all-MiniLM-L6-v2 (384 dimensions)
- Features: Singleton pattern for performance, mean pooling
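The two features named above can be sketched without the model library. This is an illustration under assumptions, not the service's actual code: `singleton` and `meanPool` are hypothetical names, and the loader passed to `singleton` stands in for whatever loads the Xenova model in practice.

```typescript
// Lazily create a resource once and hand back the same promise on later calls,
// so an expensive model load happens only on first use.
function singleton<T>(load: () => Promise<T>): () => Promise<T> {
  let instance: Promise<T> | null = null;
  return () => {
    if (instance === null) instance = load();
    return instance;
  };
}

// Mean pooling: average the per-token vectors into one sentence vector,
// skipping positions whose attention mask is 0 (padding).
function meanPool(tokenVectors: number[][], attentionMask: number[]): number[] {
  if (tokenVectors.length === 0) throw new Error("no token vectors");
  const dim = tokenVectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenVectors.forEach((vec, i) => {
    if (attentionMask[i] === 0) return;
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
    count++;
  });
  if (count === 0) throw new Error("all tokens are masked");
  return sum.map((v) => v / count);
}
```

With all-MiniLM-L6-v2, each token vector and the pooled result would have 384 dimensions.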
- Purpose: Semantic document search using cosine similarity
- Features: Vector-based search, configurable result limits
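A cosine similarity search over stored chunks can be sketched as follows. The row shape mirrors the `document_chunks` table, where `embedding` is a JSON string. The names `cosineSimilarity` and `searchChunks` are assumptions for illustration and may not match the project's own functions.

```typescript
interface ChunkRow {
  id: number;
  chunk_text: string;
  embedding: string; // JSON string of the vector, as stored in document_chunks
}

interface SearchHit {
  id: number;
  chunk_text: string;
  similarity: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query vector and keep the top `limit` hits.
function searchChunks(queryEmbedding: number[], rows: ChunkRow[], limit = 5): SearchHit[] {
  return rows
    .map((row) => ({
      id: row.id,
      chunk_text: row.chunk_text,
      similarity: cosineSimilarity(queryEmbedding, JSON.parse(row.embedding) as number[]),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A linear scan like this is simple and adequate for a single document's chunks; it would need an index to scale to large collections.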
- Purpose: Manage API usage limits
- Limits:
  - Anonymous users: 3 queries/day
  - Authenticated users: 50 queries/day
- Features: IP-based tracking, daily reset
- Purpose: Handle file metadata storage
- Features: File validation, metadata management
The email service uses Nodemailer with Gmail SMTP:
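import nodemailer from "nodemailer";

// Gmail SMTP transporter; credentials are read from EMAIL_USER and EMAIL_PASS.
const transporter = nodemailer.createTransport({
  service: "gmail",
  auth: {
    user: process.env.EMAIL_USER, // e.g. your-email@gmail.com
    pass: process.env.EMAIL_PASS  // Gmail App Password, not the account password
  }
});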
- Account Activation: Send activation emails during signup
- Email Verification: Verify email addresses before account activation
- Error Handling: Comprehensive error logging and fallback responses
- Template Support: HTML email templates with dynamic content
- Enable 2-Step Verification on your Gmail account
- Generate an App Password:
  - Go to Google Account → Security → 2-Step Verification → App passwords
  - Generate a password for "Mail"
- Use the App Password in your environment variables
Test email functionality with the test endpoint:
curl -X POST http://localhost:10000/api/auth/test-email \
-H "Content-Type: application/json" \
-d '{"testEmail": "test@example.com"}'
- Node.js 18+
- PostgreSQL database (or Supabase)
- Gmail account for email service
- OpenAI API key
git clone <repository-url>
cd DocAIBackend
npm install
Create a `.env` file in the root directory:
# Database Configuration
DB_HOST=aws-1-ap-south-1.pooler.supabase.com
DB_PORT=6543
DB_NAME=postgres
DB_USER=postgres.your-project-ref
DB_PASSWORD=your-database-password
# Supabase Configuration
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your-supabase-anon-key
# JWT Secret
JWT_SECRET=your-super-secret-jwt-key
# OpenAI API
OPENAI_API_KEY=your-openai-api-key
# Email Configuration
EMAIL_USER=your-email@gmail.com
EMAIL_PASS=your-gmail-app-password
# Server Configuration
PORT=10000
NODE_ENV=development
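Failing fast on a missing variable is usually easier to debug than a connection error later on. The sketch below checks the variables from the `.env` example at startup. Which variables are strictly required is an assumption, and `missingEnv` and `assertEnv` are hypothetical helper names.

```typescript
// Assumed set of required variables, taken from the .env example.
const REQUIRED_ENV = [
  "DB_HOST",
  "DB_PORT",
  "DB_NAME",
  "DB_USER",
  "DB_PASSWORD",
  "JWT_SECRET",
  "OPENAI_API_KEY",
  "EMAIL_USER",
  "EMAIL_PASS",
] as const;

type EnvSource = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function missingEnv(env: EnvSource): string[] {
  return REQUIRED_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw one error that lists every missing variable.
function assertEnv(env: EnvSource): void {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv(process.env)` once, after the `.env` file is loaded and before the database pool is created, would surface an unset `DB_PASSWORD` immediately.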
# Run database migrations
npm run migrate
# Development
npm run dev
# Production
npm run build
npm start
- Anonymous users:
  - Limit: 3 queries per day
  - Tracking: IP address based
  - Reset: Daily at midnight UTC
- Authenticated users:
  - Limit: 50 queries per day
  - Tracking: User ID based
  - Reset: Daily at midnight UTC
All responses include rate limit information:
{
"limit": 50,
"remaining": 45,
"resetTime": "2024-01-02T00:00:00.000Z",
"requiresAuth": false
}
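The fields in this object can be derived from a usage count and the current time. The following is a sketch, not the service's actual logic: `nextMidnightUtc` and `rateLimitInfo` are hypothetical names, and the meaning given to `requiresAuth` (an anonymous caller has used up the quota) is an assumption.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetTime: string;
  requiresAuth: boolean;
}

const ANONYMOUS_LIMIT = 3;
const AUTHENTICATED_LIMIT = 50;

// The next midnight UTC strictly after `now`; Date.UTC handles month rollover.
function nextMidnightUtc(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1));
}

// Build the status object from the number of queries already used today.
function rateLimitInfo(
  usedToday: number,
  isAuthenticated: boolean,
  now: Date = new Date()
): RateLimitInfo {
  const limit = isAuthenticated ? AUTHENTICATED_LIMIT : ANONYMOUS_LIMIT;
  const remaining = Math.max(0, limit - usedToday);
  return {
    limit,
    remaining,
    resetTime: nextMidnightUtc(now).toISOString(),
    // Assumption: anonymous callers are asked to log in once the quota is spent.
    requiresAuth: !isAuthenticated && remaining === 0,
  };
}
```

In the running service, `usedToday` would come from counting today's rows in `query_usage` by user ID or IP address.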
Ensure all environment variables are set in your deployment platform:
# Production Database
DB_HOST=your-production-db-host
DB_PORT=5432
DB_NAME=your-db-name
DB_USER=your-db-user
DB_PASSWORD=your-db-password
# Production URLs
SUPABASE_URL=https://your-project.supabase.co
OPENAI_API_KEY=your-openai-key
JWT_SECRET=your-production-jwt-secret
- Connect your GitHub repository
- Set environment variables in Vercel dashboard
- Deploy automatically on push
- Connect your repository
- Set environment variables
- Use Node.js buildpack
- Set build command: `npm run build`
- Set start command: `npm start`
- Connect your repository
- Set environment variables
- Deploy automatically
If you encounter IPv6 connection errors:
- Ensure your database host supports IPv4
- Add `family: 4` to your database configuration
- Use a connection string with an IPv4 address
Error: connect ENETUNREACH [IPv6 address]
Solution: Force IPv4 connection in database config:
const pool = new Pool({
// ... other config
family: 4, // Force IPv4
});
Error: SASL: SCRAM-SERVER-FIRST-MESSAGE: client password must be a string
Solution:
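- Ensure `DB_PASSWORD` is set in environment variables; this error means the database password reached the driver as undefined or as a non-string value
- Confirm the `.env` file is loaded before the database pool is created
- For email authentication failures, use a Gmail App Password instead of your regular password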
Error: Invalid API key
Solution:
- Verify `OPENAI_API_KEY` is set correctly
- Check that the API key has sufficient credits
- Ensure the API key has access to the model in use (GPT-4o-mini)
Error: Unsupported file type
Solution:
- Only PDF and plain text files are supported
- Check file MIME type is correct
- Ensure file is not corrupted
Enable debug logging by setting:
NODE_ENV=development
DEBUG=true
Check application logs for detailed error information:
# Development
npm run dev
# Production
npm start
- Sign up a new user:
curl -X POST http://localhost:10000/auth/signup \
-H "Content-Type: application/json" \
-d '{
"name": "John Doe",
"email": "john@example.com",
"password": "securepassword"
}'
- Activate account (click link in email)
- Login:
curl -X POST http://localhost:10000/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "john@example.com",
"password": "securepassword"
}'
- Upload a PDF:
curl -X POST http://localhost:10000/file/upload \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-F "file=@document.pdf" \
-F "userId=123"
- Query the document:
curl -X POST http://localhost:10000/query/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d '{
"documentId": 456,
"query": "What is the main topic of this document?",
"userId": 123
}'
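The same query can be issued from TypeScript. The sketch below only builds the request, mirroring the curl command above, so it stays independent of any HTTP library. `buildQueryRequest` and the two interfaces are hypothetical names introduced for illustration.

```typescript
interface QueryRequest {
  documentId: number;
  query: string;
  limit?: number;
  userId?: number;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the POST request for the query endpoint; the token is optional
// because anonymous queries are allowed within the daily limit.
function buildQueryRequest(baseUrl: string, payload: QueryRequest, token?: string): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/query/query`, // tolerate a trailing slash
    method: "POST",
    headers,
    body: JSON.stringify(payload),
  };
}
```

On Node.js 18+, the result can be sent with the built-in `fetch`, for example `fetch(req.url, req)`, and the JSON response read with `response.json()`.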
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the ISC License.
For support and questions:
- Create an issue in the repository
- Check the troubleshooting section
- Review the API documentation