Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[glue] Table format is Unknown #9902

Closed
SZubarev opened this issue Aug 21, 2020 · 3 comments · Fixed by #9923
Closed

[glue] Table format is Unknown #9902

SZubarev opened this issue Aug 21, 2020 · 3 comments · Fixed by #9923
Assignees
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/small Small work item – less than a day of effort in-progress This issue is being actively worked on. p1

Comments

@SZubarev
Copy link

Creating Glue Table. After table created in Glue console field "Classification" is "Unknown" while table data format is specified as CSV.

Reproduction Steps

const targetTable = new glue.Table(this,"TargetTable",{
      database: glueDatabase,
      dataFormat: glue.DataFormat.CSV,
      bucket: s3Bucket,
      tableName: "target_s3_table",
      columns:[
        {
         name:"sample_field",
         type:glue.Schema.STRING
        },
      ]
    })

What did you expect to happen?

Create Glue Table with CSV format

What actually happened?

Table format is shown as "Unknown" in Glue console.

Environment

  • CLI Version :
  • Framework Version: 1.60.0 (build 8e3f53a)
  • Node.js Version: v14.8.0
  • OS : MacOS 10.14.6
  • Language (Version): TypeScript 3.7.2

Other


This is 🐛 Bug Report

@SZubarev SZubarev added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 21, 2020
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Aug 21, 2020
@iliapolo
Copy link
Contributor

iliapolo commented Aug 23, 2020

Hi @SZubarev - Thanks for reporting this. Indeed looks like an oversight.

I did some digging and created a Glue table with the console, then described it with aws glue get-tables:

{
    "TableList": [
        {
            "Name": "epolon-test",
            "DatabaseName": "triage",
            "CreateTime": 1598175131.0,
            "UpdateTime": 1598175131.0,
            "Retention": 0,
            "StorageDescriptor": {
                "Columns": [
                    {
                        "Name": "count",
                        "Type": "smallint"
                    }
                ],
                "Location": "<location>",
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                "Compressed": false,
                "NumberOfBuckets": 0,
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde",
                    "Parameters": {
                        "separatorChar": ","
                    }
                },
                "SortColumns": [],
                "StoredAsSubDirectories": false
            },
            "PartitionKeys": [],
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {
                "classification": "csv" // this parameter is not being passed by CDK.
            },
            "CreatedBy": "<role>",
            "IsRegisteredWithLakeFormation": false
        }
    ]
}

Looks like the Classification field in the console is controlled by the classification parameter. We need to add this parameter:

parameters: {
has_encrypted_data: this.encryption !== TableEncryption.UNENCRYPTED,
},

@iliapolo iliapolo added effort/small Small work item – less than a day of effort p1 p2 and removed needs-triage This issue or PR still needs to be triaged. p2 labels Aug 23, 2020
@Chriscbr
Copy link
Contributor

I think I can try and implement the fix for this. Though one detail I noticed is, when creating a glue table in the console, some of the data formats require extra options: the choice of separator for CSV files, and the row tag for XML files. So, along these lines I think it's worth also extending the DataFormat class to support specifying these options (since they are required in the AWS console table creation workflow).

As a temporary workaround, @SZubarev you can use this code like this to override the property, so the table is marked as having the CSV classification.

const cfnTable = myTable.node.defaultChild as glue.CfnTable;
((cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters as string) = {
  classification: "csv",
  ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters
};

[It seems that in glue.generated.ts, all fields of CfnTable.TableInputProperty are marked as readonly, so in order to override them for an existing L2 construct, we need to do extra typecasting, but maybe there's a better way to do this which I don't see...]

@iliapolo
Copy link
Contributor

@Chriscbr Thanks for the workaround code! The code is ok, those castings appear a lot when doing this sort of stuff.

Another way of doing it would be:

cfnTable.addPropertyOverride('TableInput.Parameters', { classification: "csv", ...(cfnTable.tableInput as glue.CfnTable.TableInputProperty).parameters});

But that essentially the same. In general, to anyone interested, these approaches are described here: CDK Escape Hatches

@iliapolo iliapolo added the in-progress This issue is being actively worked on. label Aug 24, 2020
@mergify mergify bot closed this as completed in #9923 Sep 2, 2020
mergify bot pushed a commit that referenced this issue Sep 2, 2020
Fixes #9902 

~I also added support for the XML data type that's available as a choice when creating Glue tables in the AWS console.~

~I've also added a commit which adds optional parameters csvSeparator and rowTag props. I'm not super experienced with Glue so I'm not sure how much value this provides and if this is the best way to organize the API, so I'm open to scrapping those changes for later.~

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/small Small work item – less than a day of effort in-progress This issue is being actively worked on. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants