Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the equality delete writer #341

Open
Tracked by #346
ZENOTME opened this issue Apr 23, 2024 · 5 comments · May be fixed by #372
Open
Tracked by #346

Implement the equality delete writer #341

ZENOTME opened this issue Apr 23, 2024 · 5 comments · May be fixed by #372
Assignees

Comments

@ZENOTME
Copy link
Contributor

ZENOTME commented Apr 23, 2024

After we finish #275, we can implement the equality delete writer based on this framework.

There is a rust implementation that can be referred to in icelake. But better design is acceptable.

related spec:
https://iceberg.apache.org/spec/#equality-delete-files

@Dysprosium0626
Copy link
Contributor

Hi @ZENOTME, Maybe I can take this issue after you complete #345

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Apr 24, 2024

Hi @ZENOTME, Maybe I can take this issue after you complete #345

Sure! Thanks!

@liurenjie1024
Copy link
Collaborator

Assigned to you, thanks @Dysprosium0626 !

@Dysprosium0626
Copy link
Contributor

Hi I nearly complete adding EqualityDeleteWriter but I encounter some problem.
My impl is here: https://github.com/Dysprosium0626/iceberg-rust/blob/add_equality_delete_writer/crates/iceberg/src/writer/base_writer/equality_delete_writer.rs

Basically, in my test case, I write some schema to build up a ParquetWriterBuilder and pass it into EqualityDeleteFileWriterBuilder.

        // prepare writer
        let pb = ParquetWriterBuilder::new(
            WriterProperties::builder().build(),
            to_write.schema(),
            file_io.clone(),
            location_gen,
            file_name_gen,
        );
        let equality_ids = vec![1, 3];
        let mut equality_delete_writer = EqualityDeleteFileWriterBuilder::new(pb)
            .build(EqualityDeleteWriterConfig::new(
                equality_ids,
                schema.clone(),
                PARQUET_FIELD_ID_META_KEY,
            ))
            .await?;

The FieldProjector will filter columns in schema by the equality_ids and I tried to generate a delete_schema with fields after projection.

    async fn build(self, config: Self::C) -> Result<Self::R> {
        let (projector, fields) = FieldProjector::new(
            config.schema.fields(),
            &config.equality_ids,
            &config.column_id_meta_key,
        )?;
        let delete_schema = Arc::new(arrow_schema::Schema::new(fields));
        Ok(EqualityDeleteFileWriter {
            inner_writer: Some(self.inner.clone().build().await?),
            projector,
            delete_schema,
            equality_ids: config.equality_ids,
        })
    }

The problem is I cannot pass the delete_schema to FileWriterBuilder(ParquetWriterBuilder in this case), and the schema for inner writer is the old version(without projection), so the inner writer canno write file with properly.
Do you have any ideas? @ZENOTME

@ZENOTME
Copy link
Contributor Author

ZENOTME commented May 6, 2024

Thanks! @Dysprosium0626 Sorry for replying late. Our original idea here is to construct the delete schema outside the EqualityDeleteFileWriter.

 let equality_ids = vec![1, 3];
 let delete_schema = ...;
 let pb = ParquetWriterBuilder::new(
            WriterProperties::builder().build(),
            delete_schema,
            file_io.clone(),
            location_gen,
            file_name_gen,
 );
 let mut equality_delete_writer = EqualityDeleteFileWriterBuilder::new(pb)
          .build(EqualityDeleteWriterConfig::new(
                equality_ids,
                PARQUET_FIELD_ID_META_KEY,
            ))
            .await?;

Looks like the schema always can be determined before we build the writer rather than "run time".

@Dysprosium0626 Dysprosium0626 linked a pull request May 13, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants