🐛 Destination AWS Datalake: fix KeyError issue and add parquet support #17193
Signed-off-by: Henri Blancke <blanckehenri@gmail.com>
Sorry for the delay in reviewing this amazing contribution, @henriblancke! I hope to get to a review by Wednesday.
@henriblancke sorry for the delay here. I didn't have time this week and I'm OOO today. I added this to my priority to-do list for Monday.
@marcosmarxm thanks for the update and thanks for adding this to your priority list for Monday 😄! Enjoy your day off today!
@henriblancke tests are failing for me; it looks like I can't find the database for the integration tests. I need to do some testing to check what is causing it.
@marcosmarxm sorry to hear that! What test is failing for you? Do you have All integration tests seem to pass for me locally. I may be able to help troubleshoot, let me know what test(s) are failing for you. Thanks!
draft PR to run CI: #23855
The integration tests are not passing. Here is my latest run https://github.com/airbytehq/airbyte/actions/runs/4359907106/jobs/7622249697
@grishick it looks like
@grishick for the connector icon check, there was no icon defined in the previous version. Should I go ahead and define one?
yes please
airbyte-integrations/connectors/destination-aws-datalake/destination_aws_datalake/spec.json
@grishick let me know what the next steps here are. Thanks again for all the help!
...e-integrations/connectors/destination-aws-datalake/destination_aws_datalake/config_reader.py
)
if table is None:
    message = f"Could not create a table in database {connector_config.lakeformation_database_name}"
tbl = "airbyte_test"
This table name should be randomized, and check should also attempt to delete this table before exiting. Right now, subsequent test runs fail with this error (https://github.com/airbytehq/airbyte/actions/runs/4483018644):
ERROR airbyte:destination.py:147 Could not create table airbyte_*** in database airbyte-integration: AlreadyExistsException('An error occurred (AlreadyExistsException) when calling the CreateTable operation: Table already exists.')
=========================== short *** summary info ============================
FAILED integration_***s/integration_***.py::***_check_valid_config - Asser...
========================= 1 failed, 3 passed in 48.70s =========================
Hmm, weird: aws_handler.reset_table(db, tbl), a couple of lines below this one, should delete the table if it exists, along with all its underlying data in S3, before it tries to create it again. Does your configured IAM role or user have permissions to access/delete the table? The integration test logs show the following sequence of events:
- A call made to GetTable returning "Table not found" (user may not have permissions to access it)
- Calls made to s3, ListObjects, PutObject, etc (writing data to s3)
- A call made to CreateTable returning "Table already exists" (error could be caused by table existing but user not being able to access it)
- A call made to DeleteTable returning "Table not found"
This makes me think that, for some reason, the role or user does not have permissions to access the existing table, but I may be overlooking something. I'll make sure to randomize the table name in my next commit.
A scenario that the integration test should account for is that a previous run did not clean up properly (it could have been cancelled, interrupted, or failed due to a hosting problem, a CI problem, or anything else).
Another problem is that CHECK should not assume that the airbyte_test table does not exist in the destination, because a user could be using multiple connections to write to the same destination. It is safer to add a random suffix/prefix to the tables created by the CHECK method to avoid a name collision.
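The suggestion above can be sketched roughly as follows. This is only an illustration, not the connector's code: FakeCatalog, random_table_name, and this check function are hypothetical names, and the in-memory catalog stands in for the real Glue/Lake Formation calls so that the snippet runs without AWS credentials.

```python
import uuid


class FakeCatalog:
    """Hypothetical in-memory stand-in for the table catalog (assumption)."""

    def __init__(self):
        self.tables = set()

    def create_table(self, database, table):
        key = (database, table)
        if key in self.tables:
            raise RuntimeError(f"Table already exists: {database}.{table}")
        self.tables.add(key)

    def delete_table(self, database, table):
        # Deleting a missing table is a no-op, so cleanup is always safe to call.
        self.tables.discard((database, table))


def random_table_name(prefix="airbyte_test"):
    """Return a name such as airbyte_test_3f9a1c2b to avoid collisions."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def check(catalog, database):
    """Create a throwaway table and always try to delete it before returning."""
    table = random_table_name()
    try:
        catalog.create_table(database, table)
        return True
    except Exception:
        return False
    finally:
        # Runs on success and on failure, so this run leaves nothing behind.
        catalog.delete_table(database, table)


catalog = FakeCatalog()
# Simulate a leftover table from an earlier, interrupted run.
catalog.create_table("airbyte-integration", "airbyte_test")
print(check(catalog, "airbyte-integration"))
```

Because the name carries a random suffix, the leftover airbyte_test table no longer causes a collision, and the finally block removes only the table that this run created.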
That makes sense!
@henriblancke I re-ran integration tests and added several more comments based on the failures. I modified the config that we have saved in GSM in order to get past failures caused by missing config values, and added one more comment about a failure in
@grishick thanks again for the review 🚀, I really appreciate all the help here! I responded to your
Rerunning the integration tests and CI checks: #23855
/test connector=connectors/destination-aws-datalake
Build Passed
Test summary info:
Published and merged here: #23855
What
Describe what the change is solving
"type":"object", resulting in a KeyError('properties') and breaking the sync
How
Describe the solution
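As an illustration of the KeyError('properties') failure mentioned under What, here is a minimal sketch of how a schema walker could tolerate an object field that has no properties key. The helper names (is_object, field_names) and the sample schema are hypothetical and are not taken from the connector.

```python
def is_object(node):
    """True if a JSON-schema node declares type "object" (string or list form)."""
    declared = node.get("type")
    return declared == "object" or (isinstance(declared, list) and "object" in declared)


def field_names(node, path=""):
    """Collect dotted field names, treating a missing "properties" key as empty."""
    names = []
    # node["properties"] would raise KeyError('properties') for a bare
    # {"type": "object"}; .get() with a default avoids that.
    for name, child in node.get("properties", {}).items():
        full = f"{path}.{name}" if path else name
        names.append(full)
        if is_object(child):
            names.extend(field_names(child, full))
    return names


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "metadata": {"type": "object"},  # object without "properties"
        "address": {
            "type": ["null", "object"],
            "properties": {"city": {"type": "string"}},
        },
    },
}

print(field_names(schema))
```

The metadata field is kept as a leaf rather than aborting the walk, which is the behaviour a sync would need when a source emits loosely specified object types.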
This change uses awswrangler to get a lot of additional functionality for this destination out of the box, such as:
Additional features include:
- json-schema to infer pyarrow and athena types
- namespaces so a single destination can create multiple databases (useful, for example, when you want multiple sources to use their own database).
- governed vs external tables
- double vs decimal).
- check.
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
There are some spec and configuration changes
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
- airbyte_secret
- ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
- README.md
- bootstrap.md. See description and examples
- docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
- docs/integrations/README.md
- airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- /publish command described here
Updating a connector
Community member or Airbyter
- airbyte_secret
- ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
- README.md
- bootstrap.md. See description and examples
- docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- /publish command described here
Connector Generator
- -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
Tests
Unit
Integration
Acceptance
Put your acceptance tests output here.