Context & Problem Statement
Currently, Reitti processes data imports (GPX/Google JSON) and API batches by writing directly to our production tables. This presents a few risks:
- Lack of Resiliency: If the server crashes mid-import, we end up with partial/corrupt data.
- No Undo: Once an import starts, it is difficult to stop or revert before it affects the user's timeline.
- Contention: High-volume imports and real-time API points can cause lock contention on the same tables.
Proposed Solution: The "Staging-to-Promotion" Workflow
I will implement a job system using JobRunr (OSS) to manage all data entry points. We will introduce a Staging Table to act as a buffer before data is promoted to the final timeline.
Architectural Overview
The system relies on three pillars:
- Staging Table (`staged_gps_points`): A lightweight table with minimal indexing to store raw data during the parsing phase.
- JobRunr OSS: The background job engine. It handles scheduling, retries, and persistence using our existing SQL database.
- Manual Sequential Lock: Since JobRunr OSS does not support named Mutexes, I will implement a manual database lock mechanism to ensure only one promotion job runs at a time.
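As a sketch of the manual lock, promotion jobs could serialize on a Postgres advisory lock (Reitti runs on PostgreSQL; the lock key `42` is an arbitrary illustrative constant):

```sql
-- Hypothetical sketch (Postgres): serialize promotion jobs on an advisory lock.
SELECT pg_try_advisory_lock(42);   -- true if acquired; false means another
                                   -- promotion is running (reschedule and retry)

-- ... promotion work happens here ...

SELECT pg_advisory_unlock(42);     -- release so the next promotion can run
```

A dedicated `job_locks` row with `SELECT ... FOR UPDATE` would work equally well; the advisory-lock variant avoids an extra table.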
The Data Lifecycle
1. Ingestion Phase (Parallel)
- Files: User uploads a file. We stream-parse it into `staged_gps_points`, tagged with a `job_id`.
- API Batches: The `PointCollector` accumulates 1,000 points and flushes them into `staged_gps_points`, immediately triggering a promotion job.
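To illustrate the hand-off, a minimal sketch of the batching logic (the `stagingSink` callback is a stand-in for the real staging insert, e.g. `stagingService::insertBatch`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the PointCollector hand-off: buffer incoming
// points and flush a batch of 1,000 to the staging pipeline.
class PointCollector {
    static final int BATCH_SIZE = 1000;

    private final List<double[]> buffer = new ArrayList<>();
    private final Consumer<List<double[]>> stagingSink; // receives each flushed batch

    PointCollector(Consumer<List<double[]>> stagingSink) {
        this.stagingSink = stagingSink;
    }

    void add(double lat, double lon) {
        buffer.add(new double[] { lat, lon });
        if (buffer.size() >= BATCH_SIZE) {
            flush(); // batch is full: hand it off immediately
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        stagingSink.accept(new ArrayList<>(buffer)); // hand off a copy of the batch
        buffer.clear();                              // ready for the next batch
    }
}
```
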
2. The Grace Period (Cancelable)
- For file imports, once parsing is complete, the promotion step is scheduled with a 5-minute delay.
- If a user cancels during this window, the scheduled job is deleted via `jobScheduler.delete(jobId)` and staging rows are purged.
3. Promotion Phase (Sequential & Resilient)
- The job wakes and attempts to acquire a database-level lock.
- Data is moved via `INSERT INTO final_table SELECT ... FROM staged_gps_points WHERE job_id = X`.
- This phase is non-cancelable to maintain transactional integrity.
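The move itself can be wrapped in a single transaction so a crash mid-promotion leaves the staging rows intact and the job can simply retry (column lists and the `:job_id` parameter are illustrative):

```sql
-- Hypothetical sketch: promote one job's rows atomically.
BEGIN;
INSERT INTO final_table (user_id, lat, lon, recorded_at)
SELECT user_id, lat, lon, recorded_at
FROM staged_gps_points
WHERE job_id = :job_id;

DELETE FROM staged_gps_points
WHERE job_id = :job_id;   -- clean up staging only after the copy succeeds
COMMIT;
```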
4. Enrichment Phase (Background)
- Upon successful promotion, a continuation job is fired for reverse geocoding. This runs in parallel as it does not block the primary import flow.
Job Monitoring and Progress Estimation
Since JobRunr OSS does not provide a native progress tracking field or custom UI, I will implement the following:
- Job Visibility: I will query the `StorageProvider` directly to retrieve lists of `SCHEDULED` (pending) and `PROCESSING` (running) jobs for the Reitti dashboard.
- Progress Tracking: Progress will be calculated by comparing the row count in `staged_gps_points` for a specific `job_id` against the total point count recorded during the initial ingestion phase.
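The estimate itself is simple integer arithmetic; a sketch (the helper name and signature are illustrative):

```java
// Hypothetical helper: estimate import progress from staged row counts.
class ImportProgress {
    // stagedRows:  current COUNT(*) in staged_gps_points for the job_id
    // totalPoints: total point count recorded during the ingestion phase
    static int progressPercent(long stagedRows, long totalPoints) {
        if (totalPoints <= 0) return 0;                    // avoid divide-by-zero
        return (int) Math.min(100, stagedRows * 100 / totalPoints); // clamp to 100
    }
}
```
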
Implementation Plan
- Dependencies: Add the `jobrunr-spring-boot-3-starter`.
- Schema: Create the `staged_gps_points` table, plus a `job_locks` table or a `SELECT FOR UPDATE` mechanism for sequential promotion.
- `StagingService`: Handle high-speed batch inserts using `JdbcTemplate`.
- `PromotionService`: Implement promotion logic with manual locking.
- Refactor the `PointCollector` to hand off batches to the staging pipeline.
Example Implementation Sketch
```java
// Scheduling an import with a 5-minute grace period.
// Note: JobRunr serializes job lambdas, so we pass a reference to the stored
// upload (not the raw InputStream, which is not serializable) into the job.
public void startImport(String uploadId, Long userId) {
    JobId parseJob = jobScheduler.enqueue(() -> stagingService.parse(uploadId, userId));
    String jobId = parseJob.toString();
    jobScheduler.schedule(
        Instant.now().plus(5, ChronoUnit.MINUTES),
        () -> promotionService.promote(jobId)
    );
}

// Sequential promotion using the manual lock
public void promote(String jobId) {
    lockService.executeWithLock("IMPORT_LOCK", () -> {
        promotionService.moveDataToFinal(jobId);
        jobScheduler.enqueue(() -> enrichmentService.process(jobId));
    });
}

// Fetching pending jobs for the UI (StorageProvider returns JobRunr's Job type)
public List<Job> getPendingJobs() {
    return storageProvider.getJobs(StateName.SCHEDULED, PageRequest.ascOnUpdatedAt(100));
}
```
Why this is a win for Reitti
- Stability: JobRunr ensures that no imports are lost if the application restarts.
- Scalability: The staging table prevents heavy write operations from affecting production table performance.
- Data Integrity: The sequential lock and staging pattern prevent partial imports and database bloat.
- User Control: The grace period allows for easy cancellation of accidental uploads.