Framework-agnostic contracts for OCR, ETL, and document processing capabilities.
The DataProcessor package provides interface-only contracts for specialized data processing tasks. This is a pure interface package - all concrete implementations must be provided in the application layer (apps/Atomy) due to vendor SDK dependencies.
- OCR/Document Recognition: Extract structured data from images and PDFs
- Document Classification: Identify document types (invoice, receipt, contract, ID)
- Data Transformation: Format conversion and normalization
- Batch Processing: High-volume document processing queues
- Multi-Language Support: Process documents in various languages
- Confidence Scoring: Validation thresholds for extracted data
DocumentRecognizerInterface- OCR service contractDocumentParserInterface- Structured data extractionDocumentClassifierInterface- Document type identificationDataTransformerInterface- Data format conversionsDataValidatorInterface- Extracted data validationBatchProcessorInterface- Bulk document processing
ProcessingResult- OCR output with confidence scoresDocumentMetadata- Document properties (type, size, MIME)ExtractionConfidence- Confidence score (0-100%)
This package provides ONLY contracts. Vendor-specific implementations (Azure Cognitive Services, AWS Textract, Google Vision API) must be created in the application layer.
Recommended OCR vendors for implementation in apps/Atomy:
- Azure Cognitive Services - Form Recognizer, OCR
- AWS Textract - Document analysis, forms, tables
- Google Cloud Vision API - OCR, label detection
- Tesseract OCR - Open-source (lower accuracy)
// In application layer (Atomy), inject DocumentRecognizerInterface
use Nexus\DataProcessor\Contracts\DocumentRecognizerInterface;
public function __construct(
private readonly DocumentRecognizerInterface $ocr
) {}
public function processInvoice(string $filePath): array
{
$result = $this->ocr->recognizeDocument($filePath, 'invoice');
if ($result->getConfidence() < 80) {
// Queue for manual review
$this->queueForReview($result);
}
return $result->getExtractedData();
}This package is consumed by:
Nexus\Payable- for vendor bill OCR processingNexus\Receivable- for customer document processingNexus\Hrm- for employee document verificationNexus\Procurement- for PO/GR document scanning
This package integrates with:
Nexus\Storage- for document archiving (REQUIRED)Nexus\AuditLogger- for processing audit trails (REQUIRED)Nexus\Notifier- for processing completion notifications (REQUIRED)
- OCR processing: < 10s per document (single page) via async queue
- Batch processing: 100 documents per hour minimum
- Image preprocessing: < 2s per document
- Document classification: < 1s per document
- Small business: Basic OCR for common document types (< 100 docs/month)
- Medium business: Advanced OCR with field mapping and validation (100-1000 docs/month)
- Large enterprise: ML-powered OCR with continuous learning (1000+ docs/day)
// In apps/Atomy/app/Services/AzureOcrAdapter.php
namespace App\Services;
use Nexus\DataProcessor\Contracts\DocumentRecognizerInterface;
use Azure\AI\FormRecognizer\FormRecognizerClient;
final class AzureOcrAdapter implements DocumentRecognizerInterface
{
public function __construct(
private readonly FormRecognizerClient $client
) {}
public function recognizeDocument(string $filePath, string $documentType): ProcessingResult
{
// Azure-specific implementation
$response = $this->client->beginRecognizeCustomFormsFromUrl($filePath);
// Transform Azure response to ProcessingResult
return new ProcessingResult(
extractedData: $this->transformAzureData($response),
confidence: $this->calculateConfidence($response)
);
}
}- Encrypt documents at rest and in transit
- Sanitize extracted data to prevent injection attacks
- Support GDPR compliance with document retention and deletion policies
- Log all document access and processing events
- Implement rate limiting for OCR API usage
- Getting Started Guide - Installation, prerequisites, and your first OCR integration
- API Reference - Complete interface, value object, and exception documentation
- Integration Guide - Laravel and Symfony integration with vendor adapter examples
- Basic Usage - Simple OCR processing with confidence validation
- Advanced Usage - Multi-vendor fallback, batch processing, custom validation
- Requirements - Detailed package requirements (24 requirements, 87.5% complete)
- Implementation Summary - Development progress, metrics, and design decisions
- Test Suite Summary - Testing strategy for contract-only packages
- Valuation Matrix - Package value assessment ($475,000 estimated value)
- Package Type: Pure contract package (interface-only)
- Lines of Code: 196 lines across 5 files
- Dependencies: Zero external dependencies (PHP 8.3+ only)
- Framework: Framework-agnostic
MIT License - see LICENSE file for details.