Description
The current file-based implementation of the curation task queue (FileTaskQueue) has several critical problems in production environments.
Problems Identified
Data Loss in Multi-Instance Deployments or Parallel Processing
- Multiple DSpace instances writing to the same file system cause race conditions
- Queue entries get overwritten when multiple processes access the same queue file simultaneously
- No atomic operations to guarantee data consistency
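The overwrite problem can be reproduced with a minimal Java sketch. It assumes a simplified, hypothetical writer that reads the whole queue file, appends its entry in memory, and rewrites the file. This is not FileTaskQueue's actual code, and the class name, file format, and entry strings are illustrative only. Replaying one unlucky interleaving step by step makes the loss deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LostUpdateDemo {

    // Hypothetical writer, step 1: read every entry of the queue file into memory.
    static List<String> readQueue(Path queueFile) throws IOException {
        return new ArrayList<>(Files.readAllLines(queueFile));
    }

    // Hypothetical writer, step 2: truncate the file and rewrite every entry.
    static void writeQueue(Path queueFile, List<String> entries) throws IOException {
        Files.write(queueFile, entries);
    }

    /** Replays one interleaving of two instances and returns what survives on disk. */
    static List<String> replayInterleaving() {
        try {
            Path queueFile = Files.createTempFile("curation-queue", ".txt");
            try {
                writeQueue(queueFile, List.of("existing-task"));
                // Both instances read before either writes: this is the race window.
                List<String> seenByA = readQueue(queueFile);
                List<String> seenByB = readQueue(queueFile);
                seenByA.add("task-from-A");
                seenByB.add("task-from-B");
                writeQueue(queueFile, seenByA);
                writeQueue(queueFile, seenByB); // silently overwrites A's entry
                return readQueue(queueFile);
            } finally {
                Files.deleteIfExists(queueFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving()); // [existing-task, task-from-B]
    }
}
```

Neither write fails or reports an error, which is why this kind of loss is hard to notice in production.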
Poor Concurrency Management
- No proper locking mechanism between different DSpace instances
- Tasks can be processed multiple times by different instances
- Queue state becomes inconsistent across cluster nodes
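A database-backed queue can prevent double processing by having each instance claim a task with one atomic conditional update, for example an UPDATE that only succeeds while the row is still unclaimed, checked through its affected-row count. The Java sketch below models that idea in memory. `ConcurrentMap.replace(key, expected, newValue)` stands in for the conditional update, and the class, table, and column names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ClaimOnceSketch {
    // ConcurrentHashMap rejects null values, so "" marks an unclaimed task.
    static final String UNCLAIMED = "";

    // In-memory stand-in for a queue table: taskId -> id of the claiming instance.
    private final ConcurrentMap<Integer, String> claimedBy = new ConcurrentHashMap<>();

    ClaimOnceSketch(int taskCount) {
        for (int id = 0; id < taskCount; id++) {
            claimedBy.put(id, UNCLAIMED);
        }
    }

    /**
     * Atomic compare-and-set, analogous to a conditional update such as
     * UPDATE curation_queue SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL
     * (hypothetical table/columns) that succeeds only if exactly one row changed.
     */
    boolean tryClaim(int taskId, String instance) {
        return claimedBy.replace(taskId, UNCLAIMED, instance);
    }

    /** Every instance tries to claim every task; returns how many tasks were processed in total. */
    static int runInstances(int taskCount, int instanceCount) {
        ClaimOnceSketch queue = new ClaimOnceSketch(taskCount);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(instanceCount);
        for (int i = 0; i < instanceCount; i++) {
            String instance = "instance-" + i;
            pool.submit(() -> {
                for (int id = 0; id < taskCount; id++) {
                    if (queue.tryClaim(id, instance)) {
                        processed.incrementAndGet(); // only the winning instance "processes" it
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runInstances(10_000, 4)); // 10000: each task processed exactly once
    }
}
```

However many instances race, each task is processed exactly once, because only one compare-and-set per task can succeed.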
Performance Degradation
- File I/O operations become a bottleneck under high load
- No indexing capabilities for efficient queue searching
- Queue operations require a linear scan through the entire file
Operational and Maintenance Overhead
- Difficult to monitor queue status through standard database tools
- No transaction support for queue operations
- Manual file system cleanup required
- Backup and recovery procedures are complex
- Migration is more difficult because the queue is stored in separate files on the filesystem (legacy approach) instead of in the database
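The indexing point under Performance Degradation can be illustrated with a small Java sketch that compares a file-style linear scan for one queue's entries with a lookup keyed by queue name, which is the role a database index on a queue-name column would play. The line format `queueName|handle|task` and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueLookupSketch {

    // Hypothetical flat-file format, one entry per line: "queueName|handle|task".
    static String queueOf(String line) {
        return line.substring(0, line.indexOf('|'));
    }

    /** File-style lookup: every line is examined, however few belong to the queue. */
    static List<String> linearScan(List<String> lines, String queueName) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (queueOf(line).equals(queueName)) {
                hits.add(line);
            }
        }
        return hits;
    }

    /** Index keyed by queue name, playing the role a database index would. */
    static Map<String, List<String>> buildIndex(List<String> lines) {
        Map<String, List<String>> index = new HashMap<>();
        for (String line : lines) {
            index.computeIfAbsent(queueOf(line), q -> new ArrayList<>()).add(line);
        }
        return index;
    }

    /** Indexed lookup: one hash probe, then only the matching entries are touched. */
    static List<String> indexedLookup(Map<String, List<String>> index, String queueName) {
        return index.getOrDefault(queueName, List.of());
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add((i % 100 == 0 ? "urgent" : "bulk") + "|123456789/" + i + "|some-task");
        }
        Map<String, List<String>> index = buildIndex(lines);
        System.out.println(linearScan(lines, "urgent").size());    // 1000, after reading all 100000 lines
        System.out.println(indexedLookup(index, "urgent").size()); // 1000, without a full scan
    }
}
```

Building the map here is itself a full pass; a database avoids repeating that cost by maintaining its index incrementally on every insert and delete.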
Proposed Solution:
Migrate the curation task queue to database tables, providing:
- Concurrent access control through database locking
- Performance optimization (e.g., indexed queue lookups)
- Integration with the existing DSpace database infrastructure
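Assuming the queue table records which ticket (if any) currently holds each entry, a worker's batch can behave like a transaction: it claims pending entries, then either deletes them on success (commit) or returns them to pending on failure (rollback), so nothing is lost when a worker dies mid-batch. The Java sketch below models that lifecycle in memory. `synchronized` stands in for the database's row locks and transaction isolation, and all names (`TicketedQueueSketch`, `dequeue`, `release`, the entry format) are hypothetical, not DSpace's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TicketedQueueSketch {
    // Hypothetical row model: entry -> ticket currently holding it (null = pending).
    private final Map<String, Long> rows = new LinkedHashMap<>();

    synchronized void enqueue(String entry) {
        if (!rows.containsKey(entry)) {
            rows.put(entry, null);
        }
    }

    /** Claims up to `limit` pending entries for `ticket` in one critical section. */
    synchronized List<String> dequeue(long ticket, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> row : rows.entrySet()) {
            if (claimed.size() == limit) {
                break;
            }
            if (row.getValue() == null) {
                row.setValue(ticket);
                claimed.add(row.getKey());
            }
        }
        return claimed;
    }

    /** removeEntries=true acts like a commit (delete); false acts like a rollback (back to pending). */
    synchronized void release(long ticket, boolean removeEntries) {
        if (removeEntries) {
            rows.values().removeIf(t -> t != null && t == ticket);
        } else {
            rows.replaceAll((entry, t) -> (t != null && t == ticket) ? null : t);
        }
    }

    synchronized int pendingCount() {
        int pending = 0;
        for (Long t : rows.values()) {
            if (t == null) {
                pending++;
            }
        }
        return pending;
    }

    synchronized int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        TicketedQueueSketch queue = new TicketedQueueSketch();
        for (int i = 1; i <= 5; i++) {
            queue.enqueue("123456789/" + i + "|some-task");
        }
        List<String> batchA = queue.dequeue(1L, 3); // worker A claims three entries
        List<String> batchB = queue.dequeue(2L, 3); // worker B sees only the two still pending
        queue.release(1L, false); // A fails: roll back, its entries become pending again
        queue.release(2L, true);  // B succeeds: commit, its entries are removed
        System.out.println(batchA.size() + " " + batchB.size() + " " + queue.size() + " " + queue.pendingCount()); // 3 2 3 3
    }
}
```

In a real table the same effect would come from the database's own locking and transactions rather than a Java monitor, which is what allows several DSpace instances to share one queue safely.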