Adding map type read support #914

Merged on Mar 6, 2023 (6 commits)

@davidrabinowitz (Member) commented Mar 2, 2023

This PR adds support for reading ARRAY<STRUCT<key, value>> fields into a Spark Map. Write support and documentation will follow in additional PRs, to keep each one (relatively) small.
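
To make the mapping concrete, here is a minimal plain-Java sketch of the data-level idea: a repeated field whose elements are key/value structs is folded into a map. It is an illustration only, not the connector's code. The class `KeyValueArrayToMap` and the method `toMap` are hypothetical names, each `Map.Entry` stands in for one STRUCT<key, value> element, and the handling of duplicate keys is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the spark-bigquery-connector.
class KeyValueArrayToMap {

  // Folds the elements of a repeated key/value struct field into a map.
  // Each list element stands in for one STRUCT<key, value>.
  static <K, V> Map<K, V> toMap(List<Map.Entry<K, V>> structs) {
    Map<K, V> result = new LinkedHashMap<>();
    for (Map.Entry<K, V> struct : structs) {
      // Assumption: if a key repeats, the later element wins.
      result.put(struct.getKey(), struct.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> row = new ArrayList<>();
    row.add(new SimpleEntry<>("a", 1L));
    row.add(new SimpleEntry<>("b", 2L));
    System.out.println(toMap(row)); // prints {a=1, b=2}
  }
}
```

An insertion-ordered map is used here only so the printed output follows the order of the array elements; Spark maps make no such promise.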

@davidrabinowitz
Member Author

/gcbrun

sonatype-lift bot commented Mar 2, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing. [1]

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/spark-bigquery-connector/914.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/spark-bigquery-connector/914.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

@davidrabinowitz self-assigned this on Mar 4, 2023
@davidrabinowitz changed the title from "Adding map type support" to "Adding map type read support" on Mar 6, 2023
Field key = subFields.get("key");
Field value = subFields.get("value");
MapType mapType = DataTypes.createMapType(convert(key).dataType(), convert(value).dataType());
return Optional.of(new StructField(field.getName(), mapType, /* nullable */ false, metadata));
Contributor

Why is nullable always false?

Member Author

Because it is a REPEATED field, not a NULLABLE one. We need to make some compromises, as BigQuery has no native MAP type.
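
One way to read this reply, as a hedged sketch: a BigQuery column's mode is one of NULLABLE, REQUIRED, or REPEATED, and only a NULLABLE column can hold NULL itself, while a REPEATED column with no elements is an empty array. The local `Mode` enum below is a stand-in that only mirrors those mode names, and `valueMayBeNull` is a hypothetical helper, not a connector or BigQuery client method.

```java
// Hypothetical illustration; not part of the spark-bigquery-connector.
class FieldModeSketch {

  // Local stand-in mirroring the three BigQuery column modes.
  enum Mode { NULLABLE, REQUIRED, REPEATED }

  // Whether the column value itself may be NULL.
  // A REPEATED column with no elements is an empty array rather than NULL.
  static boolean valueMayBeNull(Mode mode) {
    return mode == Mode.NULLABLE;
  }

  public static void main(String[] args) {
    // The map column is derived from a REPEATED field, hence nullable = false.
    System.out.println(valueMayBeNull(Mode.REPEATED)); // prints false
  }
}
```

Under this reading, a row without entries would surface as an empty map rather than a null one, which is the compromise the reply refers to.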

MapType mapType = DataTypes.createMapType(convert(key).dataType(), convert(value).dataType());
return Optional.of(new StructField(field.getName(), mapType, /* nullable */ false, metadata));
} catch (IllegalArgumentException e) {
// no "key" or "value" fields
Contributor

Can we check whether subFields contains "key" and "value" instead of using try/catch?

Member Author

Good point, fixed that.
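
The thread does not show the resulting code, so the following is only a sketch of what an explicit check might look like in place of the try/catch. It works on a plain list of sub-field names rather than on the BigQuery client's field list; `MapShapeCheck` and `looksLikeMapEntry` are hypothetical names, and the requirement of exactly two sub-fields is an assumption that goes beyond what the reviewer asked for.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the spark-bigquery-connector.
class MapShapeCheck {

  // Returns true when a struct's sub-fields have the key/value shape,
  // decided by inspecting names up front instead of catching an exception.
  static boolean looksLikeMapEntry(List<String> subFieldNames) {
    // Assumption: exactly two sub-fields, so extra columns are not silently dropped.
    return subFieldNames.size() == 2
        && subFieldNames.contains("key")
        && subFieldNames.contains("value");
  }

  public static void main(String[] args) {
    System.out.println(looksLikeMapEntry(Arrays.asList("key", "value"))); // prints true
    System.out.println(looksLikeMapEntry(Arrays.asList("key", "other"))); // prints false
  }
}
```

Checking the shape first keeps the ordinary "not a map" case out of exception handling, which is the point of the review comment.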

e18cheng added a commit to ascend-io/spark-bigquery-connector that referenced this pull request Oct 3, 2023
3 participants