Skip to content

Conversation

@jaidisido
Copy link
Contributor

Issue #, if available:
#672

Description of changes:
Added some enhancements to previous PR #673

  • Leverage the catalog_table_input details when available to extract serde info from existing catalog tables
  • Extend the logic to the add_csv_partitions method

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jaidisido
Copy link
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: 09ec000
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Comment on lines +160 to +165
serde_info = {
"SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
if serde_library is None
else serde_library,
"Parameters": {"field.delim": sep, "escape.delim": "\\"} if serde_parameters is None else serde_parameters,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we missed this method in the previous PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxispeicher, apologies, I am not sure if I already tagged you to this PR or not for review?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either. If so, I completely missed it 🙈
I also thought that I did include it but seems like I forgot about it eventually 😓.

Comment on lines +504 to +508
serde_info: Dict[str, Any] = {}
if catalog_table_input:
serde_info = catalog_table_input["StorageDescriptor"]["SerdeInfo"]
serde_library: Optional[str] = serde_info.get("SerializationLibrary", None)
serde_parameters: Optional[Dict[str, str]] = serde_info.get("Parameters", None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the to_csv method, instead of passing None to the create_ csv_table and add_csv_partitions method, we extract the serde info from the existing catalog table if it exists

Comment on lines +475 to +478
wr.catalog.add_csv_partitions(
database=glue_database,
table=glue_table,
partitions_values=response["partitions_values"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the test to include add_csv_partitions

@jaidisido
Copy link
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-sDRE8Pq0duHT
  • Commit ID: 979f728
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@maxispeicher maxispeicher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you for adding the missing parts 🙂

Comment on lines +160 to +165
serde_info = {
"SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
if serde_library is None
else serde_library,
"Parameters": {"field.delim": sep, "escape.delim": "\\"} if serde_parameters is None else serde_parameters,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either. If so, I completely missed it 🙈
I also thought that I did include it but seems like I forgot about it eventually 😓.

@jaidisido jaidisido merged commit dff4aa6 into main May 5, 2021
@jaidisido jaidisido deleted the feat-672-add-serde-info branch May 5, 2021 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants