Curation Queue Improvement/Refactoring #543

@steph-ieffam

Description

The current file-based implementation of the curation task queue (DBTaskQueue) presents several critical issues in production environments.

Problems Identified

Data Loss in Multi-Instance Deployments or Parallel Processing

  • Multiple DSpace instances writing to the same file system cause race conditions
  • Queue entries get overwritten when multiple processes access the same queue file simultaneously
  • No atomic operations guarantee data consistency
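The overwrite described above can be replayed deterministically. The following is a minimal sketch, assuming the queue is updated with a read-modify-write cycle on a plain text file; the class `LostUpdateDemo`, its method names, and the task strings are hypothetical and not taken from the DSpace code base. Two "instances" are simulated sequentially in one process so that the problematic interleaving always occurs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of DSpace.
class LostUpdateDemo {

    // Read-modify-write, step 1: load the whole queue file into memory.
    static List<String> readQueue(Path file) throws IOException {
        return new ArrayList<>(Files.readAllLines(file));
    }

    // Read-modify-write, step 2: rewrite the whole file from the in-memory copy.
    static void writeQueue(Path file, List<String> entries) throws IOException {
        Files.write(file, entries);
    }

    // Replays the interleaving in a fixed order and returns the final file content.
    static List<String> simulate() {
        try {
            Path queue = Files.createTempFile("curation-queue", ".txt");
            try {
                writeQueue(queue, List.of("task-1"));

                // Both instances read the same snapshot before either one writes.
                List<String> seenByA = readQueue(queue);
                List<String> seenByB = readQueue(queue);

                seenByA.add("task-2-from-A");
                seenByB.add("task-3-from-B");

                writeQueue(queue, seenByA);
                writeQueue(queue, seenByB); // silently replaces A's write

                return readQueue(queue);
            } finally {
                Files.deleteIfExists(queue);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [task-1, task-3-from-B]
    }
}
```

The entry added by instance A is gone even though neither write failed, which is why this kind of loss tends to go unnoticed until tasks are found missing.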

Poor Concurrency Management

  • No proper locking mechanism between different DSpace instances
  • Tasks can be processed multiple times by different instances
  • Queue state becomes inconsistent across cluster nodes

Performance Degradation

  • File I/O operations become a bottleneck under high load
  • No indexing capabilities for efficient queue searching
  • Linear scan through entire file for queue operations
  • Difficult to monitor queue status through standard database tools
  • No transaction support for queue operations
  • Manual file system cleanup required
  • Backup and recovery procedures are complex
  • Migration is harder because the queue lives in separate files on the filesystem (legacy approach) rather than in the database

Proposed Solution

Migrate the curation task queue to database tables, providing:

  • Concurrent access control through database locking
  • Performance optimization through indexed queries instead of linear file scans
  • Integration with existing DSpace database infrastructure
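One possible shape for such a table and for an atomic "claim next task" operation is sketched below. This is only an illustration under stated assumptions: PostgreSQL as the database (the `FOR UPDATE SKIP LOCKED` and `RETURNING` clauses are PostgreSQL syntax), plain JDBC rather than DSpace's own persistence layer, and a hypothetical table `curation_task_queue` whose columns are invented for this example. None of these names come from the issue or from DSpace.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

// Hypothetical sketch; table, column, and class names are illustrative only.
class DbQueueSketch {

    // Assumed schema (PostgreSQL syntax).
    static final String CREATE_TABLE =
        "CREATE TABLE curation_task_queue ("
        + " id BIGSERIAL PRIMARY KEY,"
        + " queue_name VARCHAR(64) NOT NULL,"
        + " task_names TEXT NOT NULL,"
        + " object_id VARCHAR(256) NOT NULL,"
        + " enqueued_at TIMESTAMP NOT NULL DEFAULT now(),"
        + " claimed_by VARCHAR(64))";

    // Partial index so that finding pending entries does not scan the whole table.
    static final String CREATE_INDEX =
        "CREATE INDEX curation_task_queue_pending_idx"
        + " ON curation_task_queue (queue_name, id)"
        + " WHERE claimed_by IS NULL";

    // Claims the oldest unclaimed entry in a single statement. SKIP LOCKED makes
    // concurrent instances pass over rows another transaction is already claiming,
    // so the same entry is not handed to two workers.
    static final String CLAIM_NEXT =
        "UPDATE curation_task_queue SET claimed_by = ?"
        + " WHERE id = (SELECT id FROM curation_task_queue"
        + "  WHERE queue_name = ? AND claimed_by IS NULL"
        + "  ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)"
        + " RETURNING id";

    // Returns the id of the claimed entry, or empty if nothing is pending.
    static Optional<Long> claimNext(Connection conn, String queueName, String instanceId)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_NEXT)) {
            ps.setString(1, instanceId);
            ps.setString(2, queueName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.of(rs.getLong(1)) : Optional.empty();
            }
        }
    }
}
```

Each instance would call `claimNext` with its own identifier and process only the entry it gets back. Because the claim is a single statement, it either marks a row for exactly one instance or returns nothing, which addresses both the overwrite and the duplicate-processing problems listed above. A real implementation would more likely go through DSpace's existing database infrastructure than raw JDBC.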
