[glue] Table format is Unknown #9902
Comments
Hi @SZubarev - Thanks for reporting this. Indeed looks like an oversight. I did some digging and created a Glue table with the console, then described it with
{
  "TableList": [
    {
      "Name": "epolon-test",
      "DatabaseName": "triage",
      "CreateTime": 1598175131.0,
      "UpdateTime": 1598175131.0,
      "Retention": 0,
      "StorageDescriptor": {
        "Columns": [
          {
            "Name": "count",
            "Type": "smallint"
          }
        ],
        "Location": "<location>",
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "Compressed": false,
        "NumberOfBuckets": 0,
        "SerdeInfo": {
          "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
          "Parameters": {
            "separatorChar": ","
          }
        },
        "SortColumns": [],
        "StoredAsSubDirectories": false
      },
      "PartitionKeys": [],
      "TableType": "EXTERNAL_TABLE",
      "Parameters": {
        "classification": "csv" // this parameter is not being passed by CDK.
      },
      "CreatedBy": "<role>",
      "IsRegisteredWithLakeFormation": false
    }
  ]
}
Looks like the aws-cdk/packages/@aws-cdk/aws-glue/lib/table.ts Lines 262 to 264 in 25a9cc7
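For illustration, the missing piece can be sketched as a lookup from the table's serialization library to a `classification` entry in the table-level parameters. This is a minimal sketch under stated assumptions, not the CDK implementation: `CLASSIFICATION_BY_SERDE`, `classificationFor` and `buildParameters` are hypothetical names, and only the OpenCSVSerde/"csv" pair is taken from the table description above.

```typescript
// Hypothetical lookup from serialization library to Glue classification.
// Only the OpenCSVSerde -> "csv" pair comes from the described table.
const CLASSIFICATION_BY_SERDE: Record<string, string> = {
  "org.apache.hadoop.hive.serde2.OpenCSVSerde": "csv",
};

// Returns the classification for a SerDe, or undefined when it is not known.
function classificationFor(serializationLibrary: string): string | undefined {
  return CLASSIFICATION_BY_SERDE[serializationLibrary];
}

// Builds the table-level Parameters map, adding "classification" only when
// one is known. Keys in `extra` are copied through unchanged.
function buildParameters(
  serializationLibrary: string,
  extra: Record<string, string> = {},
): Record<string, string> {
  const classification = classificationFor(serializationLibrary);
  return classification === undefined
    ? { ...extra }
    : { classification, ...extra };
}
```

With a mapping like this, a table using an unrecognised SerDe simply gets no `classification` key, which would match the "Unknown" state shown in the console.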
I think I can try and implement the fix for this. One detail I noticed: when creating a Glue table in the console, some of the data formats require extra options, namely the choice of separator for CSV files and the row tag for XML files. So, along these lines, I think it's worth also extending the DataFormat class to support specifying these options (since they are required in the AWS console table creation workflow).

As a temporary workaround, @SZubarev, you can use code like this to override the property, so the table is marked as having the CSV classification:

const cfnTable = myTable.node.defaultChild as glue.CfnTable;
// Cast to `any` so the object literal can be assigned to `parameters`.
((cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters as any) = {
  classification: "csv",
  ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters
};

[It seems that in
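One detail of the workaround above is worth spelling out: because `classification` is written before the spread, any `classification` already present in the table parameters takes precedence over the default. A minimal sketch with plain objects, assuming nothing about the CDK itself (`withDefaultClassification` and the `owner` key are hypothetical and used only for illustration):

```typescript
type Params = Record<string, string>;

// Mirrors the object literal in the workaround: the default comes first,
// so keys already present in `existing` overwrite it.
function withDefaultClassification(
  existing: Params | undefined,
  fallback: string,
): Params {
  return { classification: fallback, ...existing };
}

// No classification yet: the fallback is added next to the existing keys.
const added = withDefaultClassification({ owner: "triage" }, "csv");

// A classification is already set: it is kept, the fallback is ignored.
const kept = withDefaultClassification({ classification: "json" }, "csv");
```

If the intent were to force the value regardless of what is already there, the spread would go first and `classification` last.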
@Chriscbr Thanks for the workaround code! The code is OK; those casts appear a lot when doing this sort of thing. Another way of doing it would be:

cfnTable.addPropertyOverride('TableInput.Parameters', { classification: "csv", ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters });

But that's essentially the same. In general, for anyone interested, these approaches are described here: CDK Escape Hatches
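To make the path-based variant more concrete: `addPropertyOverride` is given a dot-separated path (`'TableInput.Parameters'`) rather than a typed property. The following is a simplified stand-in for that idea on a plain object, not the CDK's actual implementation; it ignores any merging or escaping the real method may do, and `setByPath` is a hypothetical helper.

```typescript
type Json = { [key: string]: unknown };

// Hypothetical helper: writes `value` at a dot-separated path,
// creating intermediate objects where none exist yet.
function setByPath(target: Json, path: string, value: unknown): void {
  const keys = path.split(".");
  const last = keys.pop();
  if (last === undefined) {
    return;
  }
  let node: Json = target;
  for (const key of keys) {
    const next = node[key];
    if (typeof next !== "object" || next === null) {
      node[key] = {};
    }
    node = node[key] as Json;
  }
  node[last] = value;
}

// Plain-object stand-in for a resource's properties.
const properties: Json = { TableInput: { Name: "epolon-test" } };
setByPath(properties, "TableInput.Parameters", { classification: "csv" });
```

The sibling key `Name` is left untouched, and only the `Parameters` entry under `TableInput` is written.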
Fixes #9902

~I also added support for the XML data type that's available as a choice when creating Glue tables in the AWS console.~

~I've also added a commit which adds optional parameters csvSeparator and rowTag props. I'm not super experienced with Glue so I'm not sure how much value this provides and if this is the best way to organize the API, so I'm open to scrapping those changes for later.~

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Creating a Glue table. After the table is created, the "Classification" field in the Glue console is "Unknown", even though the table data format is specified as CSV.
Reproduction Steps
What did you expect to happen?
Create Glue Table with CSV format
What actually happened?
Table format is shown as "Unknown" in Glue console.
Environment
Other
This is 🐛 Bug Report