@kukushking
Contributor

Issue #787

Description of changes:
Add schema evolution check to wr.s3.to_csv

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kukushking requested a review from @jaidisido on July 7, 2021, 10:32
@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 1b64e6a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 3745e85
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


Comment on lines +483 to +492

    columns_types: Dict[str, str] = {}
    partitions_types: Dict[str, str] = {}
    if (database is not None) and (table is not None):
        columns_types, partitions_types = _data_types.athena_types_from_pandas_partitioned(
            df=df, index=index, partition_cols=partition_cols, dtype=dtype, index_left=True
        )
        if schema_evolution is False:
            _check_schema_changes(columns_types=columns_types, table_input=catalog_table_input, mode=mode)

Contributor

@jaidisido Jul 7, 2021


Why was it necessary to move this block outside the existing if condition for database and table on line 499? Mostly asking because it's likely to create a conflict with my Governed table branch which I would prefer to avoid :)

@kukushking
Contributor, Author

Hahah, yes, I wanted to do it before the _to_dataset call, which in some cases sends a request to delete the files, and right after df = df[columns] if columns else df, which is the point where we form the "final" dataframe that should be checked for schema evolution.
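
To illustrate the ordering argument above, here is a minimal, self-contained sketch of a schema check that runs before any destructive step. It is not the code from this PR: `SchemaChangeError`, `check_schema_changes`, and `write_dataset` are hypothetical names, the table definition is assumed to follow the AWS Glue `TableInput` layout (`StorageDescriptor.Columns` with `Name` and `Type`), and limiting the check to the `append` and `overwrite_partitions` modes is an assumption.

```python
from typing import Any, Dict, List, Optional


class SchemaChangeError(ValueError):
    """Hypothetical exception for a schema mismatch against the catalog."""


def check_schema_changes(
    columns_types: Dict[str, str],
    table_input: Optional[Dict[str, Any]],
    mode: str,
) -> None:
    """Raise if the dataframe adds a column or changes a column type."""
    # No existing table means nothing to compare against. A full overwrite
    # replaces the schema anyway, so only incremental modes are checked
    # (assumption for this sketch).
    if table_input is None or mode not in ("append", "overwrite_partitions"):
        return
    catalog_cols = {
        col["Name"]: col["Type"]
        for col in table_input["StorageDescriptor"]["Columns"]
    }
    for name, new_type in columns_types.items():
        if name not in catalog_cols:
            raise SchemaChangeError(f"New column {name} with type {new_type}")
        if catalog_cols[name] != new_type:
            raise SchemaChangeError(
                f"Type change on column {name}: {catalog_cols[name]} -> {new_type}"
            )


def write_dataset(
    columns_types: Dict[str, str],
    table_input: Optional[Dict[str, Any]],
    mode: str,
    schema_evolution: bool,
    log: List[str],
) -> None:
    """Toy writer: validate first, then run the simulated destructive steps."""
    if schema_evolution is False:
        check_schema_changes(columns_types, table_input, mode)
    # Only reached when the check passed or schema evolution is allowed.
    log.append("delete old files")
    log.append("write new files")
```

Because the check raises before anything is appended to `log`, a rejected schema change leaves the simulated dataset untouched, which is the property the placement in the diff is meant to guarantee.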

@jaidisido merged commit 4edc97d into main on Jul 9, 2021
@jaidisido deleted the feat-to-csv-schema-evolution branch on July 9, 2021, 11:06
@kukushking self-assigned this on Nov 28, 2022