Add schema evolution to s3.to_csv #799

kukushking · 2021-07-07T10:25:42Z

Issue #787

Description of changes:
Add schema evolution check to wr.s3.to_csv

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jaidisido · 2021-07-07T10:37:39Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
Commit ID: 1b64e6a
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jaidisido · 2021-07-07T11:14:37Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
Commit ID: 3745e85
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido · 2021-07-07T16:12:22Z

awswrangler/s3/_write_text.py

+
+        columns_types: Dict[str, str] = {}
+        partitions_types: Dict[str, str] = {}
+        if (database is not None) and (table is not None):
+            columns_types, partitions_types = _data_types.athena_types_from_pandas_partitioned(
+                df=df, index=index, partition_cols=partition_cols, dtype=dtype, index_left=True
+            )
+            if schema_evolution is False:
+                _check_schema_changes(columns_types=columns_types, table_input=catalog_table_input, mode=mode)
+


Why was it necessary to move this block outside the existing if condition for database and table on line 499? Mostly asking because it's likely to create a conflict with my Governed table branch which I would prefer to avoid :)

kukushking · 2021-07-07

Hahah, yes. I wanted the check to run before the _to_dataset call, which in some cases sends a request to delete the files, and right after df = df[columns] if columns else df, which is the point where we have the "final" dataframe that should be checked for schema evolution.
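The ordering argument above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the awswrangler implementation: `check_schema_changes`, `write_dataset`, `SchemaEvolutionError`, and the in-memory `storage` dict are hypothetical stand-ins for `_check_schema_changes`, `_to_dataset`, and S3, and the exact rules the real check applies are assumed here to be "no new columns, no type changes".

```python
from typing import Dict, List


class SchemaEvolutionError(Exception):
    """Hypothetical error raised when the data no longer matches the catalog table."""


def check_schema_changes(columns_types: Dict[str, str], catalog_types: Dict[str, str]) -> None:
    # Assumed rule set: reject new columns and changed types.
    for name, new_type in columns_types.items():
        if name not in catalog_types:
            raise SchemaEvolutionError(f"new column not allowed: {name}")
        if catalog_types[name] != new_type:
            raise SchemaEvolutionError(
                f"type change not allowed: {name} ({catalog_types[name]} -> {new_type})"
            )


def write_dataset(
    storage: Dict[str, List[str]],
    prefix: str,
    rows: List[str],
    columns_types: Dict[str, str],
    catalog_types: Dict[str, str],
    mode: str = "append",
    schema_evolution: bool = False,
) -> None:
    # 1. Validate the final schema first, while nothing has been touched yet.
    if schema_evolution is False and catalog_types:
        check_schema_changes(columns_types, catalog_types)
    # 2. Only then perform the destructive step (stand-in for deleting files on overwrite).
    if mode == "overwrite":
        storage[prefix] = []
    # 3. Write the new data.
    storage.setdefault(prefix, []).extend(rows)
```

With the check placed first, a rejected write leaves the existing data in place; if the check ran after step 2, an overwrite could delete files and then fail.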

Add schema evolution to s3.to_csv

1b64e6a

kukushking requested a review from jaidisido July 7, 2021 10:32

Merge branch 'main' into feat-to-csv-schema-evolution

3745e85

jaidisido reviewed Jul 7, 2021

jaidisido merged commit 4edc97d into main Jul 9, 2021

jaidisido deleted the feat-to-csv-schema-evolution branch July 9, 2021 11:06

jaidisido mentioned this pull request Aug 16, 2021

Update docs for awswrangler.s3.to_csv #868

Closed

kukushking self-assigned this Nov 28, 2022
