SuiVerify is a comprehensive identity verification system built with Python FastAPI that provides secure offline Aadhaar verification with OCR and face recognition. The system is designed for KYC (Know Your Customer) compliance and supports multiple verification workflows.
- Framework: FastAPI 0.104.1
- Server: Uvicorn with auto-reload
- Database: MongoDB with Motor (async driver)
- Cache/Queue: Redis 5.0.0
- Computer Vision: OpenCV, Tesseract OCR, DeepFace, TensorFlow
- Authentication: JWT tokens with python-jose
verification-backend/
├── main.py # FastAPI application entry point
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── app/
│ ├── database/ # Database connection and initialization
│ │ ├── __init__.py
│ │ ├── connection.py # MongoDB connection management
│ │ └── init_db.py # Database initialization
│ ├── models/ # Data models and schemas
│ │ ├── schemas.py # Pydantic models for API
│ │ ├── user.py # User data models
│ │ └── encryption_metadata.py # Encryption metadata models
│ ├── routers/ # API route handlers
│ │ ├── aadhar.py # Aadhaar OCR endpoints
│ │ ├── face.py # Face recognition endpoints
│ │ ├── user.py # User management endpoints
│ │ ├── kyc.py # Complete KYC workflow
│ │ ├── encryption.py # Encryption metadata management
│ │ └── credentials.py # Credential management
│ └── services/ # Business logic services
│   ├── ocr_service.py # Aadhaar OCR processing
│   ├── face_recognition_service.py # Face matching algorithms
│   ├── user_service.py # User management
│   ├── encryption_service.py # Data encryption
│   ├── redis_service.py # Redis operations
│   └── kafka_service.py # Message queue service
- OCR Extraction: Extracts personal information from Aadhaar cards
- Photo Extraction: Isolates face photo from Aadhaar document
- Data Validation: Validates extracted data format and integrity
- Supported Fields: Name, DOB, Gender, Phone, Address, Aadhaar Number
- Service: PAN OCR uses Tesseract (via pytesseract) and an ensemble of preprocessing steps to robustly extract PAN card fields across formats.
- Supported Fields: PAN Number, Cardholder Name, Father's Name, Date of Birth, extracted photo (base64), raw OCR text for debugging.
- Endpoint: `POST /api/pan/extract-pan-data` (multipart/form-data with `file` field).
Usage notes:
- The service performs several image preprocessing strategies (contrast/brightness adjustments, adaptive thresholding, denoising) to improve OCR accuracy on varied card images.
- The `extract_name()` logic was improved to reliably pick the cardholder's name (it prefers the line immediately before the "Father's Name" label, with robust cleanup and validation).
- The response preserves `raw_text` (truncated in production responses) to assist debugging and monitoring.
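The line-before-"Father's Name" rule described above can be illustrated with a minimal sketch. This is not the service's actual `extract_name()`; the helper names `clean_line` and `pick_name`, the label pattern, and the name pattern are all assumptions made for illustration.

```python
import re
from typing import List, Optional

# Hypothetical patterns; real OCR output and card layouts vary.
FATHER_LABEL = re.compile(r"father'?s?\s*name", re.IGNORECASE)
NAME_PATTERN = re.compile(r"^[A-Z][A-Z .]{2,}$")


def clean_line(line: str) -> str:
    """Keep letters, dots and spaces, collapse whitespace, and uppercase."""
    kept = re.sub(r"[^A-Za-z. ]", " ", line)
    return re.sub(r"\s+", " ", kept).strip().upper()


def pick_name(ocr_lines: List[str]) -> Optional[str]:
    """Return the cleaned line just before the Father's Name label, if it looks like a name."""
    lines = [ln for ln in ocr_lines if ln.strip()]
    for i, line in enumerate(lines):
        if i > 0 and FATHER_LABEL.search(line):
            candidate = clean_line(lines[i - 1])
            if NAME_PATTERN.match(candidate):
                return candidate
    return None
```

When no label is found, the sketch returns `None` so a caller could fall back to other heuristics such as label detection or uppercase patterns.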
Example response (200):
{
"success": true,
"data": {
"pan_number": "AAAAA9999A",
"name": "JOHN DOE",
"father_name": "RICHARD DOE",
"dob": "01/01/1990",
"pan_photo_base64": "<base64>",
"raw_text": "..."
},
"message": "PAN data extracted successfully"
}
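A client consuming this response might unpack it roughly as follows. This is a sketch only: `PanData` and `parse_pan_response` are hypothetical names, and it assumes that a failed extraction returns `"success": false` together with a `message`, which the source does not confirm.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PanData:
    """Subset of the fields shown in the example response."""
    pan_number: Optional[str]
    name: Optional[str]
    father_name: Optional[str]
    dob: Optional[str]


def parse_pan_response(body: str) -> PanData:
    """Parse the JSON body and raise ValueError when success is not true."""
    payload = json.loads(body)
    if not payload.get("success"):
        # Assumed failure shape: {"success": false, "message": "..."}
        raise ValueError(payload.get("message", "PAN extraction failed"))
    data = payload.get("data") or {}
    return PanData(
        pan_number=data.get("pan_number"),
        name=data.get("name"),
        father_name=data.get("father_name"),
        dob=data.get("dob"),
    )
```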
Testing / Dev:
- Ensure Tesseract is installed and available on PATH (Windows: install via Scoop/Installer or set `pytesseract.pytesseract.tesseract_cmd`).
- Use `docs/PAN_OCR_Frontend_Test.html` (moved into `docs/`) for a quick browser-based upload test.
- Use the provided scripts in `scripts/` (e.g., `extract_pan.py`) to run local tests.
Edge cases & notes:
- OCR can produce noisy text; the service implements cleaning steps and name-specific heuristics (line-before-`Father` rule, label detection, uppercase patterns).
- For best accuracy, supply high-resolution, well-lit images and avoid heavy compression.
- Consider adding confidence scores in the future and per-field validation rules (regex for PAN format: `^[A-Z]{5}[0-9]{4}[A-Z]$`).
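The PAN format rule mentioned above could be applied as a per-field check along these lines. The function name `is_valid_pan` is hypothetical, and trimming surrounding whitespace before matching is an assumption about how OCR output would be normalized.

```python
import re

# Same pattern as the PAN format regex quoted above.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def is_valid_pan(value: str) -> bool:
    """Return True when the trimmed value is five letters, four digits, one letter."""
    return PAN_PATTERN.fullmatch(value.strip()) is not None
```

The check is deliberately strict about case: lowercase input is rejected rather than silently uppercased, so a caller can decide whether to normalize OCR output first.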
- Multi-Algorithm Support: Uses DeepFace with multiple models
- High Accuracy Mode: Advanced face matching with confidence scoring
- Live Photo Verification: Compares live selfies with Aadhaar photo
- Anti-Spoofing: Basic liveness detection capabilities
- Multi-Step Process: Aadhaar → Face Match → Verification
- Session Management: Temporary data storage during verification
- Verification Types: Supports 'above18' and 'citizenship' verification
- Audit Trail: Comprehensive logging of verification attempts
- Profile Management: User registration and profile updates
- Verification History: Track verification attempts and results
- Status Tracking: Monitor verification status and completion
- Data Encryption: Sensitive data encryption at rest
- Metadata Management: Encryption key and metadata tracking
- Secure Storage: MongoDB with encrypted collections
- JWT Authentication: Secure API access control
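As an illustration of the 'above18' verification type listed above, an age check could be derived from the extracted date of birth. This is a sketch, not the service's logic: the function name `is_above_18` is hypothetical, and the `DD/MM/YYYY` date format is an assumption.

```python
from datetime import date, datetime
from typing import Optional


def is_above_18(dob: str, today: Optional[date] = None) -> bool:
    """Return True if a person born on dob (assumed DD/MM/YYYY) is at least 18."""
    born = datetime.strptime(dob, "%d/%m/%Y").date()
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= 18
```

Passing `today` explicitly keeps the check deterministic in tests; a malformed date raises `ValueError` from `strptime`, which a caller would need to handle.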
- `GET /` - API health check and service status
- `GET /health` - Detailed health check of all services
- `POST /users/register` - User registration
- `GET /users/profile/{user_id}` - Get user profile
- `PUT /users/profile/{user_id}` - Update user profile
- `GET /users/{user_id}/verifications` - Get verification history
- `POST /start-verification` - Start complete KYC process
- `POST /complete-verification` - Complete KYC verification
- `GET /verification-status/{session_id}` - Check verification status
- `POST /extract` - Extract data from Aadhaar image
- `POST /extract-photo` - Extract photo from Aadhaar
- `POST /validate` - Validate Aadhaar data format
- `POST /match` - Compare two face images
- `POST /verify-with-aadhaar` - Verify face against Aadhaar photo
- `POST /encryption-metadata` - Store encryption metadata
- `GET /encryption-metadata/{user_id}` - Retrieve encryption metadata
# MongoDB Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
DATABASE_NAME=suiverify_db
# Redis Configuration (for caching/queuing)
REDIS_URL=redis://localhost:6379
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your_redis_password
REDIS_USERNAME=your_redis_username
REDIS_STREAM_NAME=verification_stream
REDIS_CONSUMER_GROUP=verification_group
REDIS_CONSUMER_NAME=verification_consumer
- Python 3.8+ (Python 3.12+ recommended for TensorFlow 2.16+)
- MongoDB instance
- Redis instance
- Tesseract OCR installed on system
- Python 3.12+: Use TensorFlow 2.16+ (current requirements.txt)
- Python 3.8-3.11: Use TensorFlow 2.10-2.13 (change tensorflow>=2.16.0 to tensorflow>=2.10.0,<2.14.0)
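The compatibility rule above can be expressed as a small helper that suggests which requirement line to use for the running interpreter. This is only a sketch; the function name `tensorflow_requirement` is hypothetical and is not part of the project.

```python
import sys
from typing import Tuple


def tensorflow_requirement(version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Map a (major, minor) Python version to the TensorFlow requirement listed above."""
    if version >= (3, 12):
        return "tensorflow>=2.16.0"
    if version >= (3, 8):
        return "tensorflow>=2.10.0,<2.14.0"
    raise RuntimeError("Python 3.8 or newer is required")
```

For example, `tensorflow_requirement((3, 10))` returns the pinned `tensorflow>=2.10.0,<2.14.0` range that would replace the default line in `requirements.txt`.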
# Clone and navigate to backend
cd verification-backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your configuration
# Run the server
python main.py
# Or with uvicorn directly:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
# 1. Install system dependencies
sudo apt update
sudo apt install -y \
tesseract-ocr tesseract-ocr-eng \
libglib2.0-0t64 libsm6 libxext6 libxrender-dev libgomp1 \
libgtk-3-0t64 mesa-common-dev libgl1-mesa-dev libglu1-mesa-dev \
build-essential python3-dev
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Upgrade pip and install Python packages
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
# 4. Install tf-keras if needed (for TensorFlow 2.16+)
pip install "tf-keras>=2.16.0"  # quoted so the shell does not treat >= as a redirect
# 5. Verify Tesseract installation
tesseract --version
# 6. Configure environment
cp .env.example .env
# Edit .env with your configuration
# 7. Run the server
python3 main.py
- Server: Deploy with Gunicorn + Uvicorn workers
- Database: MongoDB Atlas or self-hosted MongoDB cluster
- Cache: Redis Cloud or self-hosted Redis
- Storage: Secure file storage for temporary images
- Monitoring: Health check endpoints for load balancer
- Security: HTTPS, rate limiting, input validation
- User profiles and basic information
- Verification status and history
- Encrypted sensitive data
- Complete audit trail of verification attempts
- Success/failure reasons
- Timestamp and session information
- Encryption keys and metadata
- User data encryption tracking
- Key rotation information
- CORS Enabled: Supports React dev servers (ports 3000, 5173, 5175)
- File Uploads: Multipart form data for image uploads
- JSON APIs: RESTful JSON responses
- Error Handling: Structured error responses
- MongoDB: Primary data storage
- Redis: Caching and message queuing
- Kafka: Optional message streaming (service available)
- Sensitive data encryption at rest
- Temporary image data cleanup
- JWT token-based authentication
- Minimal data retention
- User consent tracking
- Data anonymization options
- Audit trail maintenance
- Input validation and sanitization
- Rate limiting on sensitive endpoints
- CORS configuration
- Environment variable protection
- Service-level health monitoring
- Database connection status
- External service availability
- Resource utilization tracking
- Structured logging with levels
- Verification attempt tracking
- Error logging and alerting
- Performance metrics
- MongoDB Connection: Check URI format and network access
- Redis Connection: Verify Redis server status and credentials
- Tesseract OCR: Ensure Tesseract is installed and in PATH
- Face Recognition: Check TensorFlow/DeepFace model downloads
- Image preprocessing and compression
- Database query optimization
- Redis caching strategies
- Async/await pattern usage
- Connection pooling
This backend serves as the core verification engine for the SuiVerify identity verification platform, providing secure, scalable, and compliant KYC services.