## The CDK Construct Library for AWS Glue

This module is part of the [AWS Cloud Development Kit](https://github.com/awslabs/aws-cdk) project.

### Database

A `Database` is a logical grouping of `Tables` in the Glue Data Catalog.

```ts
new glue.Database(stack, 'MyDatabase', {
  databaseName: 'my_database'
});
```

By default, an S3 bucket is created and the Database is stored under `s3://<bucket-name>/`, but you can specify another location with `locationUri`:

```ts
new glue.Database(stack, 'MyDatabase', {
  databaseName: 'my_database',
  locationUri: 's3://explicit-bucket/some-path/'
});
```
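
The default-location rule above boils down to "use `locationUri` when it is given, otherwise the root of the generated bucket". The sketch below makes that rule explicit with a hypothetical helper, `resolveDatabaseLocation`, which is not part of the construct library and simply takes the generated bucket's name as an input. Requiring the `s3://` scheme is an assumption of the sketch, not a documented constraint.

```ts
// Hypothetical helper (not part of the Glue construct library) that mirrors
// the rule described above for where a Database is stored.
function resolveDatabaseLocation(generatedBucketName: string, locationUri?: string): string {
  if (locationUri !== undefined) {
    // Assumption of this sketch: only s3:// URIs are accepted.
    if (locationUri.indexOf('s3://') !== 0) {
      throw new Error(`expected an s3:// URI, got '${locationUri}'`);
    }
    return locationUri;
  }
  // Default: the root of the automatically created bucket.
  return `s3://${generatedBucketName}/`;
}

console.log(resolveDatabaseLocation('my-generated-bucket'));
// s3://my-generated-bucket/
console.log(resolveDatabaseLocation('my-generated-bucket', 's3://explicit-bucket/some-path/'));
// s3://explicit-bucket/some-path/
```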

### Table

A Glue table describes a table of data in S3: its structure (column names and types), the location of the data (S3 objects with a common prefix in an S3 bucket), and the format of the files (JSON, Avro, Parquet, etc.):

```ts
new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.string,
  }, {
    name: 'col2',
    type: glue.Schema.array(glue.Schema.string),
    comment: 'col2 is an array of strings' // comment is optional
  }],
  dataFormat: glue.DataFormat.Json
});
```
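
Each entry in `columns` is plain data: a `name`, a `type`, and an optional `comment`. The sketch below models that shape with a hypothetical `ColumnSketch` interface (it uses a plain string where the library uses `glue.Schema` values) and a hypothetical `checkColumns` function that checks two things a table definition should satisfy: it has at least one column, and no column name is repeated. It illustrates the input shape only; it is not the construct's own validation.

```ts
// Hypothetical, simplified model of one entry in `columns`. The real library
// types `type` with glue.Schema values; a plain string stands in for it here.
interface ColumnSketch {
  name: string;
  type: string;
  comment?: string; // optional, as in the example above
}

// Throws if there are no columns or if a column name is used twice.
function checkColumns(columns: ColumnSketch[]): void {
  if (columns.length === 0) {
    throw new Error('a table needs at least one column');
  }
  const seen: string[] = [];
  for (const column of columns) {
    if (seen.indexOf(column.name) !== -1) {
      throw new Error(`duplicate column name '${column.name}'`);
    }
    seen.push(column.name);
  }
}

// The two columns from the example above pass the check.
checkColumns([
  { name: 'col1', type: 'string' },
  { name: 'col2', type: 'array<string>', comment: 'col2 is an array of strings' },
]);
```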

By default, an S3 bucket is created to store the table's data, but you can pass your own `bucket` and `s3Prefix`:

```ts
new glue.Table(stack, 'MyTable', {
  bucket: myBucket,
  s3Prefix: 'my-table/',
  // ...remaining table properties
});
```
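
Conceptually, the table's data lives under `s3Prefix` inside `bucket`, so its location is `s3://<bucket-name>/<s3Prefix>`. The hypothetical `tableLocation` helper below (an illustration, not a library function) spells that out. Appending a trailing `/` to the prefix is a choice made by this sketch: S3 prefixes match on raw strings, so a prefix of `my-table` would also match keys such as `my-table-old/...`.

```ts
// Hypothetical helper: builds the S3 location a table's data lives under.
// Normalising the prefix to end with '/' is a choice made by this sketch.
function tableLocation(bucketName: string, s3Prefix = ''): string {
  const prefix =
    s3Prefix === '' || s3Prefix.charAt(s3Prefix.length - 1) === '/'
      ? s3Prefix
      : `${s3Prefix}/`;
  return `s3://${bucketName}/${prefix}`;
}

console.log(tableLocation('my-bucket', 'my-table/')); // s3://my-bucket/my-table/
console.log(tableLocation('my-bucket', 'my-table'));  // s3://my-bucket/my-table/
```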

#### Partitions

To improve query performance, a table can specify `partitionKeys`: columns whose values split the data into partitions that are stored and queried separately. For example, you might partition a table by `year` and `month` to optimize queries based on a time window:

```ts
new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.string
  }],
  partitionKeys: [{
    name: 'year',
    type: glue.Schema.smallint
  }, {
    name: 'month',
    type: glue.Schema.smallint
  }],
  dataFormat: glue.DataFormat.Json
});
```
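
Partitioned data is commonly laid out in S3 with one Hive-style `key=value` path segment per partition key, in the order the keys are declared, which lets engines such as Athena skip partitions that a query's filter excludes. The hypothetical `partitionPrefix` helper below (an illustration, not a library function) computes that prefix for the `year`/`month` table above. The layout is an assumed convention: the construct only declares the partition keys and does not write any data itself.

```ts
// Hypothetical helper: appends Hive-style `key=value/` segments, one per
// partition key in declaration order, to a table location. The layout is a
// common convention assumed here; the construct does not enforce it.
interface PartitionKeySketch {
  name: string;
}

function partitionPrefix(
  tableLocation: string,
  partitionKeys: PartitionKeySketch[],
  values: { [key: string]: string | number }
): string {
  if (partitionKeys.length === 0) {
    return tableLocation; // unpartitioned: data sits directly under the table
  }
  const segments = partitionKeys.map((key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key.name)) {
      throw new Error(`missing value for partition key '${key.name}'`);
    }
    return `${key.name}=${values[key.name]}`;
  });
  return `${tableLocation}${segments.join('/')}/`;
}

// The year/month keys from the example above.
const yearMonth: PartitionKeySketch[] = [{ name: 'year' }, { name: 'month' }];
console.log(partitionPrefix('s3://my-bucket/my_table/', yearMonth, { year: 2018, month: 11 }));
// s3://my-bucket/my_table/year=2018/month=11/
```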

### [Encryption](https://docs.aws.amazon.com/athena/latest/ug/encryption.html)

You can enable encryption on a Table's data:
* `Unencrypted` - files are not encrypted. This is the default setting.
* [S3Managed](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html) - server-side encryption (`SSE-S3`) with an Amazon S3-managed key.

```ts
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.S3Managed,
  // ...
});
```

* [Kms](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html) - server-side encryption (`SSE-KMS`) with an AWS KMS key managed by the account owner.

```ts
// a KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.Kms,
  // ...
});

// or, with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.Kms,
  encryptionKey: new kms.EncryptionKey(stack, 'MyKey'),
  // ...
});
```

* [KmsManaged](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html) - server-side encryption (`SSE-KMS`), like `Kms`, except that the KMS key is managed by AWS rather than by the account owner.

```ts
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.KmsManaged,
  // ...
});
```

* [ClientSideKms](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html#client-side-encryption-kms-managed-master-key-intro) - client-side encryption (`CSE-KMS`) with an AWS KMS key managed by the account owner.

```ts
// a KMS key is created automatically
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.ClientSideKms,
  // ...
});

// or, with an explicit KMS key
new glue.Table(stack, 'MyTable', {
  encryption: glue.TableEncryption.ClientSideKms,
  encryptionKey: new kms.EncryptionKey(stack, 'MyKey'),
  // ...
});
```

*Note: you cannot provide a `bucket` when creating the `Table` if you wish to use server-side encryption (`Kms`, `KmsManaged` or `S3Managed`).*
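
The options differ in who encrypts the objects. With the server-side options, S3 encrypts data at rest, using the `AES256` algorithm for SSE-S3 or `aws:kms` for SSE-KMS; with `ClientSideKms`, whatever writes the data must encrypt it with the KMS key before uploading. The sketch below encodes that split together with the bucket restriction from the note above. `Encryption` and `planEncryption` are hypothetical names used only for illustration; they are not `glue.TableEncryption` or the construct's internal logic.

```ts
// Hypothetical mirror of the encryption options listed above.
enum Encryption {
  Unencrypted = 'Unencrypted',
  S3Managed = 'S3Managed',
  Kms = 'Kms',
  KmsManaged = 'KmsManaged',
  ClientSideKms = 'ClientSideKms',
}

interface EncryptionPlan {
  // S3 server-side algorithm name, or undefined when S3 does not encrypt.
  sseAlgorithm?: 'AES256' | 'aws:kms';
  // True when the data producer must encrypt before uploading (CSE-KMS).
  clientSide: boolean;
}

function planEncryption(encryption: Encryption, userProvidedBucket: boolean): EncryptionPlan {
  const serverSide =
    encryption === Encryption.S3Managed ||
    encryption === Encryption.Kms ||
    encryption === Encryption.KmsManaged;
  // Mirrors the note: server-side options cannot be combined with your own bucket.
  if (serverSide && userProvidedBucket) {
    throw new Error(`cannot use ${encryption} with a user-provided bucket`);
  }
  switch (encryption) {
    case Encryption.S3Managed:
      return { sseAlgorithm: 'AES256', clientSide: false };
    case Encryption.Kms:
    case Encryption.KmsManaged:
      return { sseAlgorithm: 'aws:kms', clientSide: false };
    case Encryption.ClientSideKms:
      return { clientSide: true };
    default:
      // Unencrypted: neither S3 nor the producer encrypts the files.
      return { clientSide: false };
  }
}
```

Keeping the bucket check separate from the algorithm mapping makes the note's restriction visible in one place, independent of which server-side option is chosen.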

### Types

A table's schema is a collection of columns, each of which has a `name` and a `type`. Types are recursive structures, consisting of primitive and complex types:

```ts
new glue.Table(stack, 'MyTable', {
  columns: [{
    name: 'primitive_column',
    type: glue.Schema.string
  }, {
    name: 'array_column',
    type: glue.Schema.array(glue.Schema.integer),
    comment: 'array<integer>'
  }, {
    name: 'map_column',
    type: glue.Schema.map(
      glue.Schema.string,
      glue.Schema.timestamp),
    comment: 'map<string,timestamp>'
  }, {
    name: 'struct_column',
    type: glue.Schema.struct([{
      name: 'nested_column',
      type: glue.Schema.date,
      comment: 'nested comment'
    }]),
    comment: "struct<nested_column:date COMMENT 'nested comment'>"
  }],
  // ...
});
```

#### Primitive

Numeric:
* `bigint`
* `decimal`
* `float`
* `integer`
* `smallint`
* `tinyint`

Date and Time:
* `date`
* `timestamp`

String:
* `string`
* `char`
* `varchar`

Misc:
* `boolean`
* `binary`

#### Complex

* `array` - array of some other type.
* `map` - map of some primitive key type to any value type.
* `struct` - nested structure containing individually named and typed columns.
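
Because types nest, it can help to think of them as a small recursive data type that renders to Hive-style type strings such as `array<int>` or `struct<nested_column:date COMMENT 'nested comment'>`. The sketch below is a hypothetical model (`GlueType`, `primitive`, `prim` and `renderType` are illustrative names, not the library's `Schema` API) covering a few primitives and the three complex types. It assumes Hive spellings such as `int` for `integer` and leaves out parameterised types such as `decimal`, `char` and `varchar`.

```ts
// Hypothetical recursive model of column types, for illustration only;
// it is not the library's Schema class.
type GlueType =
  | { kind: 'primitive'; hiveName: string }
  | { kind: 'array'; element: GlueType }
  | { kind: 'map'; key: GlueType; value: GlueType }
  | { kind: 'struct'; fields: StructField[] };

interface StructField {
  name: string;
  type: GlueType;
  comment?: string;
}

function primitive(hiveName: string): GlueType {
  return { kind: 'primitive', hiveName };
}

// A few primitives; 'int' for `integer` follows Hive's spelling (an assumption here).
const prim = {
  string: primitive('string'),
  integer: primitive('int'),
  date: primitive('date'),
  timestamp: primitive('timestamp'),
};

// Renders a type to a Hive-style type string, recursing into complex types.
// Comments are inserted verbatim; quotes inside them are not escaped.
function renderType(t: GlueType): string {
  switch (t.kind) {
    case 'primitive':
      return t.hiveName;
    case 'array':
      return `array<${renderType(t.element)}>`;
    case 'map':
      // Map keys must be primitive, as described above.
      if (t.key.kind !== 'primitive') {
        throw new Error('map keys must be primitive types');
      }
      return `map<${renderType(t.key)},${renderType(t.value)}>`;
    case 'struct':
      return `struct<${t.fields
        .map((f) => {
          const comment = f.comment !== undefined ? ` COMMENT '${f.comment}'` : '';
          return `${f.name}:${renderType(f.type)}${comment}`;
        })
        .join(',')}>`;
    default:
      throw new Error('unknown type');
  }
}

console.log(renderType({ kind: 'array', element: prim.integer }));
// array<int>
console.log(renderType({ kind: 'map', key: prim.string, value: prim.timestamp }));
// map<string,timestamp>
console.log(renderType({
  kind: 'struct',
  fields: [{ name: 'nested_column', type: prim.date, comment: 'nested comment' }],
}));
// struct<nested_column:date COMMENT 'nested comment'>
```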