
jena-rdf-delta module proposal: Enhanced dataset synchronization and high availability #3191

@plturrell

Version

Jena 5.5.0-SNAPSHOT

Feature

Overview

This proposal concerns adding a new jena-rdf-delta module to Apache Jena that provides enhanced dataset synchronization and high-availability features.

Important Note About Existing rdf-delta
I understand that an rdf-delta project (https://github.com/afs/rdf-delta) already exists in the wider Jena ecosystem. This proposal aims to discuss how these enhancements relate to that project and whether:

  1. These enhancements should be contributed to the existing project instead
  2. The existing project should be considered for integration into core Jena
  3. This should be a separate module with different goals

Motivation
Distributed Jena deployments need robust synchronization and high-availability mechanisms. This module aims to extend RDF Delta's existing capabilities in these areas.

Key Features

  • Optimized patch compression for reduced bandwidth usage
  • Advanced conflict resolution strategies
  • Improved high-availability configuration
  • Enhanced performance for patch application
  • Better recovery mechanisms
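The proposal does not yet say how "optimized patch compression" would work. As a baseline that any optimized scheme would need to beat, here is a sketch that gzip-compresses the text form of an RDF Patch using only `java.util.zip`. The class name `PatchCompression`, the sample patch, and the choice of GZIP are illustrative assumptions, not part of rdf-delta or of this proposal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip-compresses RDF Patch text for transfer. */
class PatchCompression {

    /** Compress UTF-8 patch text with GZIP. */
    static byte[] compress(String patchText) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer before we read the bytes.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(patchText.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Inverse of compress(): restore the original patch text. */
    static String decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Build a sample patch adding n triples that share subject and predicate IRIs. */
    static String samplePatch(int n) {
        StringBuilder sb = new StringBuilder("TX .\n");
        for (int i = 0; i < n; i++) {
            sb.append("A <http://example/s> <http://example/p> \"value ")
              .append(i).append("\" .\n");
        }
        return sb.append("TC .\n").toString();
    }

    public static void main(String[] args) throws IOException {
        String patch = samplePatch(1000);
        byte[] packed = compress(patch);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n",
                patch.getBytes(StandardCharsets.UTF_8).length, packed.length);
        if (!decompress(packed).equals(patch)) {
            throw new AssertionError("round trip failed");
        }
    }
}
```

Because RDF Patch rows repeat the same IRIs heavily, generic compression typically shrinks them a lot; the benchmarks mentioned under Next Steps would need to show that any custom encoding improves on this baseline.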

Benefits

  • More efficient synchronization across distributed datasets
  • Improved resilience in high-availability configurations
  • Reduced network overhead through optimized patches
  • Better handling of conflict scenarios
  • Simplified client API
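Better conflict handling first needs a reliable way to detect that two writers built on the same version. As I understand RDF Delta, patches are chained through `id` and `prev` headers, and the patch log only accepts a patch whose `prev` is the current head. A minimal in-memory sketch of that optimistic check follows; `PatchLog`, `Patch`, and `Result` are hypothetical names, and any more advanced strategy (for example fetching newer patches, rebasing, and retrying) would start from the `CONFLICT` result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical in-memory patch log. Each patch names its predecessor (prev);
 * an append succeeds only if prev equals the current head, so two clients that
 * both built on the same version cannot both succeed.
 */
class PatchLog {

    /** Minimal patch record: id, predecessor id (null for the first patch), body. */
    record Patch(String id, String prev, String body) {}

    /** Outcome of an append attempt. */
    enum Result { ACCEPTED, CONFLICT }

    private final List<Patch> log = new ArrayList<>();

    /** Id of the latest accepted patch, or null if the log is empty. */
    synchronized String head() {
        return log.isEmpty() ? null : log.get(log.size() - 1).id();
    }

    /** Optimistic check: reject a patch that was not built on the current head. */
    synchronized Result append(Patch p) {
        if (!Objects.equals(p.prev(), head())) {
            return Result.CONFLICT; // caller must catch up, rebase, and retry
        }
        log.add(p);
        return Result.ACCEPTED;
    }

    synchronized int size() {
        return log.size();
    }

    public static void main(String[] args) {
        PatchLog log = new PatchLog();
        log.append(new Patch("p1", null, "A <http://example/s> <http://example/p> \"v1\" ."));
        // Two clients both saw head "p1" and append concurrently.
        Result a = log.append(new Patch("p2a", "p1", "A <http://example/s> <http://example/p> \"a\" ."));
        Result b = log.append(new Patch("p2b", "p1", "A <http://example/s> <http://example/p> \"b\" ."));
        System.out.println(a + " " + b); // ACCEPTED CONFLICT
    }
}
```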

Integration with Jena
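The module would provide a simplified API for connecting to Delta servers. `DeltaClientBuilder.createOptimizedConnection` and the `update` callback shown here are proposed API, not yet implemented:

    import org.apache.jena.graph.*;  // Graph, Node, NodeFactory, Triple

    // Proposed API: Delta server URL, local zone name, data source name
    DeltaConnection conn = DeltaClientBuilder.createOptimizedConnection(
        "http://localhost:1066/", "zone", "ds1");

    Node subject   = NodeFactory.createURI("http://example/s");
    Node predicate = NodeFactory.createURI("http://example/p");
    Node object    = NodeFactory.createURI("http://example/o");

    // Use normally: changes made inside update() are intended to form one write transaction
    conn.update(txn -> {
        Graph graph = txn.getDefaultGraph();
        graph.add(Triple.create(subject, predicate, object));
    });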
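To make the proposed `conn.update(...)` more concrete, one plausible design is to record each change made inside the lambda and send the batch as a single RDF Patch transaction: a `TX .` row, the change rows, then `TC .`. The stdlib-only sketch below assumes that design; `PatchRecorder`, `add`, and `delete` are hypothetical names, IRIs are plain strings instead of Jena `Node`s, and it is not the rdf-delta client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical recorder: collects changes made in an update block as RDF Patch rows. */
class PatchRecorder {

    private final List<String> rows = new ArrayList<>();

    /** Record an added triple; IRIs are passed without angle brackets. */
    void add(String s, String p, String o) {
        rows.add("A <" + s + "> <" + p + "> <" + o + "> .");
    }

    /** Record a deleted triple. */
    void delete(String s, String p, String o) {
        rows.add("D <" + s + "> <" + p + "> <" + o + "> .");
    }

    /**
     * Run the update block, then wrap the recorded rows in TX/TC so the block
     * is sent as one patch. If the block throws, the exception propagates and
     * no patch text is produced.
     */
    static String update(Consumer<PatchRecorder> block) {
        PatchRecorder rec = new PatchRecorder();
        block.accept(rec);
        StringBuilder sb = new StringBuilder("TX .\n");
        for (String row : rec.rows) {
            sb.append(row).append('\n');
        }
        return sb.append("TC .\n").toString();
    }

    public static void main(String[] args) {
        String patch = update(txn ->
                txn.add("http://example/s", "http://example/p", "http://example/o"));
        System.out.print(patch);
    }
}
```

Running `main` prints three lines: `TX .`, the `A` row for the added triple, and `TC .`. Batching at the transaction boundary means a failed update block sends nothing, which keeps the patch log free of half-applied changes.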

Relationship to Existing rdf-delta
I'd like to understand how the Jena community would prefer to handle this:

  1. Should these enhancements be contributed to the existing rdf-delta project?
  2. Is there a plan to bring rdf-delta into core Jena?
  3. Are these enhancements different enough to warrant a separate module?

Questions

  1. What is the preferred approach for contributing these enhancements?
  2. What features would be most valuable to the community?
  3. How should this relate to the existing rdf-delta project?
  4. What testing and documentation would be needed?

Next Steps
Based on community feedback, I would propose:

  1. Coordinating with maintainers of the existing rdf-delta project
  2. Determining the best approach for contribution
  3. Running performance benchmarks for new features
  4. Submitting appropriate pull requests based on consensus

Are you interested in contributing a solution yourself?

None
