
Iceberg ITs #219

Merged
vamshigv merged 21 commits into main from iceberg_its on Nov 14, 2023
Conversation

vamshigv
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

  1. Adds Integration tests for Conversion from Iceberg.
  2. Built on Iceberg Data generation + IT source tests #201

Brief change log

  1. Adds Integration tests for Conversion from Iceberg.
  2. Built on Iceberg Data generation + IT source tests #201

Verify this pull request


This change added tests and can be verified as follows:

@vamshigv vamshigv changed the base branch from iceberg_source_it to main November 12, 2023 00:49
@vamshigv
Contributor Author

@the-other-tim-brown Please take a look.

@@ -255,14 +280,11 @@ private Object generateRandomValueForType(
       return LocalDate.ofEpochDay(randomDay);
     case TIME:
       long totalMicrosInDay = ChronoUnit.DAYS.getDuration().toMillis() * 1000;
-      long randomTimeInMicros = ThreadLocalRandom.current().nextLong(totalMicrosInDay);
-      return randomTimeInMicros;
+      return ThreadLocalRandom.current().nextLong(totalMicrosInDay);
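The TIME case above can be sketched as a standalone snippet; the class and method names here are assumptions for illustration, not part of the PR:

```java
import java.time.temporal.ChronoUnit;
import java.util.concurrent.ThreadLocalRandom;

public class RandomTimeExample {
    // Returns a random time-of-day in microseconds, mirroring the TIME case above:
    // 86_400_000 millis per day * 1000 = 86_400_000_000 micros per day.
    static long randomTimeMicros() {
        long totalMicrosInDay = ChronoUnit.DAYS.getDuration().toMillis() * 1000;
        return ThreadLocalRandom.current().nextLong(totalMicrosInDay);
    }

    public static void main(String[] args) {
        long t = randomTimeMicros();
        // Always within [0, micros-per-day).
        System.out.println(t >= 0 && t < 86_400_000_000L); // prints true
    }
}
```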
Contributor
Should we just use RANDOM here instead of ThreadLocalRandom?

Contributor Author
`Random` doesn't have a method to generate a long in a range, so this now generates a double and applies a multiplier.
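The workaround described here can be sketched as follows. Pre-Java-17 `java.util.Random` has no bounded `nextLong`, so the sketch emulates it via `nextDouble()`; the helper name is hypothetical:

```java
import java.util.Random;

public class BoundedLong {
    // Hypothetical helper: emulate a bounded nextLong for java.util.Random,
    // which only gained nextLong(bound) in Java 17.
    static long nextLong(Random random, long bound) {
        // nextDouble() is in [0.0, 1.0), so the truncated product is in [0, bound).
        return (long) (random.nextDouble() * bound);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        long v = nextLong(random, 86_400_000_000L);
        System.out.println(v >= 0 && v < 86_400_000_000L); // prints true
    }
}
```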

return icebergDataHelper.getTableSchema().columns().stream()
.map(Types.NestedField::name)
.filter(name -> !name.equals("timestamp_local_micros_nullable_field"))
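The column-filtering pattern above (Iceberg `Types.NestedField`s mapped to names, then filtered) can be illustrated with plain strings so it runs without an Iceberg dependency; the column names here are assumptions:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ColumnFilterExample {
    public static void main(String[] args) {
        // Stand-in for icebergDataHelper.getTableSchema().columns(); names are hypothetical.
        List<String> columns = List.of(
            "id", "timestamp_local_micros_nullable_field", "name");

        // Same stream pattern as the PR: drop the unsupported timestamp column.
        List<String> validated = columns.stream()
            .filter(name -> !name.equals("timestamp_local_micros_nullable_field"))
            .collect(Collectors.toList());

        System.out.println(validated); // prints [id, name]
    }
}
```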
Contributor
Can we check if there is a way for Hudi to read this back as the logical type? Do the values read back as timestamps in Delta?

Contributor Author
vamshigv Nov 13, 2023

For Hudi there is a bug (linked in the comments), and it doesn't work for Delta either. Spark 3.2 has no support for that granularity, and converting to TimestampType doesn't work on the read side (it throws errors during the parquet read).

@vamshigv vamshigv merged commit 252b4f4 into main Nov 14, 2023
1 check passed
@vamshigv vamshigv deleted the iceberg_its branch November 14, 2023 00:31