[QUESTION] DynamoDb connector - refresh outdated table definition #348

Closed
yrybak opened this issue Feb 9, 2021 · 6 comments
Labels
question Further information is requested

Comments

yrybak commented Feb 9, 2021

Hi,

I configured a DynamoDB data source in Athena and everything seems to work well.
New tables created in DynamoDB appear automatically in the Athena data source's table list.

Lambda stack shows version 2020.33.1 in tags.

But I noticed that for some tables the field definitions in Athena are not updated automatically. New fields added to the DynamoDB table don't appear in the list of table fields in Athena, and querying the new columns throws an error.

Is there any way to refresh the table definition, or should it work automatically?

Thanks,
Yaroslav

yrybak added the "question" (Further information is requested) label Feb 9, 2021
Contributor

avirtuos commented Feb 9, 2021

The tables and columns should update automatically. It is possible that your new columns either use unsupported data types, or that the connector's schema inference, which reads N rows from the table in order to discover columns, isn't finding any records with the columns you added. As a workaround, you can define your table in the AWS Glue Data Catalog and the connector will use the schema from Glue as a supplement to what it finds in DDB. The README for the connector outlines how to do this. Let us know if this helps.
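
To illustrate why sampled inference can miss a newly added column, here is a minimal Python sketch. It is not the connector's code: the function names, the sample data, and the rule that inferred columns win over catalog columns on a name clash are all assumptions made for illustration.

```python
def infer_columns(items, sample_size):
    """Collect attribute names (and a rough type name) from the first
    `sample_size` items only, mimicking sampled schema inference."""
    columns = {}
    for item in items[:sample_size]:
        for name, value in item.items():
            # Keep the first type seen for each attribute.
            columns.setdefault(name, type(value).__name__)
    return columns


def merge_with_catalog(inferred, catalog_columns):
    """Add catalog-defined columns that inference did not find.
    Precedence on a name clash is an assumption of this sketch."""
    merged = dict(inferred)
    for name, type_name in catalog_columns.items():
        merged.setdefault(name, type_name)
    return merged


# Hypothetical table contents: only the third item has the new attribute.
items = [
    {"id": "1", "name": "a"},
    {"id": "2", "name": "b"},
    {"id": "3", "name": "c", "email": "c@example.com"},
]

inferred = infer_columns(items, sample_size=2)
merged = merge_with_catalog(inferred, {"email": "str"})
```

With a sample of two items, `email` never shows up in `inferred`; it only appears in `merged` because the catalog definition supplies it.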

Author

yrybak commented Feb 9, 2021

Anthony, thanks for the clarification. I have only simple fields, except for one Map of strings, so the issue is probably with schema inference.
I created the table using a DynamoDB crawler, but that didn't help. I will update my connector, as my current version doesn't support Glue (e.g. there is no disable_glue parameter), and will try again.

Author

yrybak commented Feb 10, 2021

@avirtuos I am still struggling to make the connector use the table in the Glue Data Catalog.
I updated AthenaDynamoDBConnector to the latest release tag, v2021.6.1.
I used a DynamoDB crawler to create the table in the Glue Data Catalog, and I see its classification set to "dynamodb".
I do see all fields in the Glue table, but when I query the table in Athena using the DynamoDB connector, I still cannot query the new fields.

Questions:

  1. The README mentions the following Lambda parameters: kms_key_id, disable_glue and glue_catalog. I don't see these parameters in the CloudFormation template - https://github.com/awslabs/aws-athena-query-federation/blob/master/athena-dynamodb/athena-dynamodb.yaml#L45-L47. However, I see in the code that if the "disable_glue" variable is missing, it is treated as "false", so Glue metadata should be used. Is that correct?
  2. Does it matter in which Glue Data Catalog database the DynamoDB table is created? Mine is currently not in the "default" database but in a database I created.
  3. I don't see the table created by the crawler in Athena's AwsDataCatalog data source under my database. Is this expected for tables with the "dynamodb" classification?
  4. Assuming Glue metadata starts working, should I see all table fields in Athena's left pane (DynamoDB connector data source / default database / my table / fields)? Or will the fields there not be updated from Glue metadata, even though I can still query them using SQL?

Thanks a lot,
Yaroslav
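
The "missing means false" reading of disable_glue in question 1 can be sketched as below. This is a hypothetical Python illustration of that default, not the connector's actual code; the function name and the exact string parsing are assumptions.

```python
import os


def glue_enabled(env=None):
    """Return True unless disable_glue is explicitly set to "true".
    An absent variable is treated as "false", so Glue stays enabled."""
    env = os.environ if env is None else env
    return env.get("disable_glue", "false").strip().lower() != "true"
```

Under this reading, a Lambda deployed from a template that never defines disable_glue would still consult Glue.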

Author

yrybak commented Feb 10, 2021

I see failed requests in CloudTrail to get the table in the "default" database.
I will change the crawler to use the default database...

Contributor

I'm not sure the crawler will produce a table that is usable by the connector. You need to ensure the Glue table uses the supported types. Just keep this in mind.

Author

yrybak commented Feb 10, 2021

Thanks, Anthony.
I finally managed to make Glue metadata work.
When using any Glue database other than "default" (with "dynamo-db-flag"), this database appears in Athena under the DynamoDB connector data source. So the correct database name must be specified in queries, because the table in the connector's "default" database still has metadata retrieved by the connector's schema inference.
When using the "default" database, everything is straightforward: metadata in the "default" database is taken from the Glue catalog table.
The crawler replaces hyphens with underscores in table names, so for such tables you need to specify sourceTable, or recreate the table manually with hyphens.
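
For anyone hitting the hyphen issue, here is a rough Python sketch of a Glue table definition that keeps the underscore name but points back at the original DynamoDB table through sourceTable. The helper name, the example table name, and the column list are hypothetical, and the column types still have to be ones the connector supports.

```python
def glue_table_input(dynamodb_table_name, columns):
    """Build a Glue TableInput-shaped dict for a DynamoDB table.

    `columns` is a list of (name, glue_type) pairs chosen by the caller.
    """
    # Same renaming the crawler was observed to do in this thread.
    glue_name = dynamodb_table_name.replace("-", "_")
    return {
        "Name": glue_name,
        "Parameters": {
            "classification": "dynamodb",
            # Tells the connector which DynamoDB table this entry describes.
            "sourceTable": dynamodb_table_name,
        },
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
        },
    }


# Hypothetical example table.
table_input = glue_table_input("user-events", [("id", "string"), ("email", "string")])
```

A dict of this shape is what boto3's `glue.create_table(DatabaseName=..., TableInput=...)` takes as `TableInput`; the sketch only builds the dict and makes no AWS calls.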
