Implement SnapshotValidator #2243

@CTTY

Is your feature request related to a problem or challenge?

We need a mechanism to validate new snapshots so that conflicting commits can be detected. Right now there is no way to do this.

Describe the solution you'd like

SnapshotValidator trait:

pub(crate) trait SnapshotValidator {
    /// Validates a snapshot against a table.
    ///
    /// # Arguments
    ///
    /// * `base` - The base table to validate against
    /// * `parent_snapshot_id` - The ID of the parent snapshot, if any. This is usually
    ///   the latest snapshot of the base table, unless it's a non-main branch
    ///   (note: writing to branches is not currently supported)
    ///
    /// # Returns
    ///
    /// A `Result` indicating success or an error if validation fails
    async fn validate(&self, _base: &Table, _parent_snapshot_id: Option<i64>) -> Result<()> {
        Ok(())
    }

    /// Retrieves the history of snapshots between two points with matching operations and content type.
    ///
    /// # Arguments
    ///
    /// * `base` - The base table to retrieve history from
    /// * `from_snapshot_id` - The starting snapshot ID (exclusive), or None to start from the beginning
    /// * `to_snapshot_id` - The ending snapshot ID (inclusive)
    /// * `matching_operations` - Set of operations to match when collecting snapshots
    /// * `manifest_content_type` - The content type of manifests to collect
    ///
    /// # Returns
    ///
    /// A tuple containing:
    /// * A vector of manifest files matching the criteria
    /// * A set of snapshot IDs that were collected
    ///
    /// # Errors
    ///
    /// Returns an error if the history between the snapshots cannot be determined
    async fn validation_history(
        &self,
        base: &Table,
        from_snapshot_id: Option<i64>,
        to_snapshot_id: i64,
        matching_operations: &HashSet<Operation>,
        manifest_content_type: ManifestContentType,
    ) -> Result<(Vec<ManifestFile>, HashSet<i64>)> {
        // Body not included in this issue; placeholder so the trait compiles.
        todo!()
    }

    /// Validates that there are no new delete files for the given data files.
    ///
    /// # Arguments
    ///
    /// * `base` - The base table to validate against
    /// * `from_snapshot_id` - The starting snapshot ID (exclusive), or None to start from the beginning
    /// * `to_snapshot_id` - The ending snapshot ID (inclusive), or None if there is no current table state
    /// * `data_files` - The data files to check for conflicting delete files
    /// * `ignore_equality_deletes` - Whether to ignore equality deletes and only check for positional deletes
    ///
    /// # Returns
    ///
    /// A `Result` indicating success or an error if validation fails
    ///
    /// # Errors
    ///
    /// Returns an error if new delete files are found for any of the data files
    async fn validate_no_new_deletes_for_data_files(
        &self,
        base: &Table,
        from_snapshot_id: Option<i64>,
        to_snapshot_id: Option<i64>,
        data_files: &[DataFile],
        ignore_equality_deletes: bool,
    ) -> Result<()> {
        // Body not included in this issue; placeholder so the trait compiles.
        todo!()
    }
}

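To make the `validation_history` contract concrete, here is a minimal, synchronous sketch of the ancestry walk its doc comment describes: start at `to_snapshot_id` (inclusive), follow parent pointers back to `from_snapshot_id` (exclusive), and collect manifests from snapshots whose operation matches. It is not the proposed implementation. `Op`, `Snap`, and the `added_manifests` field are hypothetical stand-ins for the crate's `Operation`, snapshot, and `ManifestFile` types, the snapshot map stands in for `Table`, errors are plain strings, and filtering by `ManifestContentType` is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the crate's `Operation` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Op {
    Append,
    Overwrite,
    Delete,
}

/// Hypothetical, simplified snapshot: only the fields the walk needs.
#[derive(Debug, Clone)]
struct Snap {
    id: i64,
    parent_id: Option<i64>,
    op: Op,
    /// Manifests added by this snapshot, identified by path (assumption).
    added_manifests: Vec<String>,
}

/// Walks parent pointers from `to_id` (inclusive) back to `from_id`
/// (exclusive) and collects manifests from snapshots whose operation
/// is in `matching_ops`. With `from_id == None` the walk runs to the root.
fn validation_history(
    snapshots: &HashMap<i64, Snap>,
    from_id: Option<i64>,
    to_id: i64,
    matching_ops: &HashSet<Op>,
) -> Result<(Vec<String>, HashSet<i64>), String> {
    let mut manifests = Vec::new();
    let mut collected = HashSet::new();
    let mut current = Some(to_id);

    while let Some(id) = current {
        // The starting snapshot is exclusive: stop before collecting it.
        if Some(id) == from_id {
            return Ok((manifests, collected));
        }
        let snap = snapshots
            .get(&id)
            .ok_or_else(|| format!("snapshot {} not found", id))?;
        if matching_ops.contains(&snap.op) {
            manifests.extend(snap.added_manifests.iter().cloned());
            collected.insert(snap.id);
        }
        current = snap.parent_id;
    }

    // The root was reached. That is fine when no starting point was given;
    // otherwise `from_id` is not an ancestor and the history is unknown.
    match from_id {
        None => Ok((manifests, collected)),
        Some(id) => Err(format!(
            "snapshot {} is not an ancestor of snapshot {}",
            id, to_id
        )),
    }
}
```

Returning an error when `from_id` is never reached mirrors the "history between the snapshots cannot be determined" case in the doc comment. A validator such as `validate_no_new_deletes_for_data_files` could then inspect only the manifests returned for the relevant operations, though how it does so is not specified in this issue.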
Willingness to contribute

None
