Skip to content

Data: Add TCK for Writer builder in FileFormat API#16575

Open
Guosmilesmile wants to merge 3 commits into
apache:mainfrom
Guosmilesmile:writebuilder_tck
Open

Data: Add TCK for Writer builder in FileFormat API#16575
Guosmilesmile wants to merge 3 commits into
apache:mainfrom
Guosmilesmile:writebuilder_tck

Conversation

@Guosmilesmile
Copy link
Copy Markdown
Contributor

Add TCK coverage for the missing parts of the Writer builder in the File Format API:

  • set(...)
  • setAll(...)
  • meta(key, value)
  • meta(Map)
  • overwrite()

withFileEncryptionKey(...) and withAADPrefix(...) are Parquet-only and related to encryption. They will be covered in a follow-up PR.

@github-actions github-actions Bot added the data label May 27, 2026
/** Write with engine type T without explicit engineSchema, read with Generic Record */
@ParameterizedTest
@FieldSource("FORMAT_AND_GENERATOR")
void testDataWriterEngineWriteWithoutEngineSchema(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have fix #15688. testDataWriterEngineWriteWithoutEngineSchema is the same testDataWriterEngineWriteGenericRead. Remove this ut.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove it in a different PR pls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this change. I will raise a new pr to fix this when this pr is merge.


@ParameterizedTest
@FieldSource("FILE_FORMATS")
void testDataWriterSet(FileFormat fileFormat) throws IOException {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I can found the same property for all format is comress type, so test it in this part.

.isInstanceOf(AlreadyExistsException.class)
.hasMessageContaining("Already exists");

genericRecords = dataGenerator.generateRecords();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that this is the same records as previously. We might want to generate new records

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. My original intent was to use different data, but I didn't realize I was using the same seed. This time I'm generating a different number of records to produce different data.

return dataFile;
}

private static Reader newOrcReader(InputFile inputFile, Configuration conf) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create an interface for the FileFormats to collect these methods?

So every FF needs to implement these for testing, and BaseFormatModelTests doesn't include file format specific code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce a new interface FileFormatTestSupport and move file-format-specific logic into its implementations, such as compressionProperties, writeRecordsWithoutFieldIds, and supportsFeature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants