feat(glue): add L2 resources for Database and Table #1988

Merged
merged 25 commits into from
Mar 14, 2019

sam-goodwin
Contributor

This change adds L2 resources for Database and Table.

const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'my_database'
});

new glue.Table(stack, 'MyTable', {
  database,
  tableName: 'my_table',
  columns: [{
    name: 'col1',
    type: glue.Schema.string
  }],
  partitionKeys: [{
    name: 'year',
    type: glue.Schema.smallint
  }, {
    name: 'month',
    type: glue.Schema.smallint
  }],
  storageType: glue.StorageType.Json
});

Schemas are defined as an array of Column, each of which has a name and a Type:

Types

A table's schema is a collection of columns, each of which has a name and a type. Types are recursive structures, consisting of primitive and complex types:

Primitive

Numeric:

  • bigint
  • float
  • integer
  • smallint
  • tinyint

Date and Time:

  • date
  • timestamp

String Types:

  • string
  • decimal
  • char
  • varchar

Misc:

  • boolean
  • binary

Complex

  • array - array of some other type.
  • map - map of some primitive key type to any value type.
  • struct - nested structure containing individually named and typed columns.
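
To illustrate how the complex types nest, here is a minimal, self-contained sketch. It is not the API added by this change: the `Type` interface, the `arrayOf`, `mapOf` and `structOf` helpers, and the Hive-style rendering in `inputString` are all hypothetical names chosen for the example.

```ts
// Hypothetical model of a recursive column type; not the actual aws-glue API.
interface Type {
  readonly isPrimitive: boolean;
  readonly inputString: string; // assumed rendering, e.g. "array<string>"
}

interface Column {
  readonly name: string;
  readonly type: Type;
}

const primitive = (name: string): Type => ({ isPrimitive: true, inputString: name });

const STRING = primitive('string');
const SMALLINT = primitive('smallint');

// array - array of some other type.
function arrayOf(itemType: Type): Type {
  return { isPrimitive: false, inputString: `array<${itemType.inputString}>` };
}

// map - map of some primitive key type to any value type.
function mapOf(keyType: Type, valueType: Type): Type {
  if (!keyType.isPrimitive) {
    throw new Error(`map key must be a primitive type, got '${keyType.inputString}'`);
  }
  return {
    isPrimitive: false,
    inputString: `map<${keyType.inputString},${valueType.inputString}>`,
  };
}

// struct - nested structure containing individually named and typed columns.
function structOf(columns: Column[]): Type {
  const fields = columns.map(c => `${c.name}:${c.type.inputString}`).join(',');
  return { isPrimitive: false, inputString: `struct<${fields}>` };
}

const nested = mapOf(STRING, arrayOf(structOf([{ name: 'year', type: SMALLINT }])));
console.log(nested.inputString); // map<string,array<struct<year:smallint>>>
```

Because every helper returns a `Type`, complex types can be nested to any depth, while the primitive-key restriction on maps is checked when the map type is constructed.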

Pull Request Checklist

  • Testing
    • Unit test added (prefer not to modify an existing test, otherwise, it's probably a breaking change)
    • CLI change?: coordinate update of integration tests with team
    • cdk-init template change?: coordinate update of integration tests with team
  • Docs
    • jsdocs: All public APIs documented
    • README: README and/or documentation topic updated
  • Title and Description
    • Change type: a title prefixed with fix or feat will appear in the changelog
    • Title: uses lower-case and doesn't end with a period
    • Breaking?: last paragraph: "BREAKING CHANGE: <describe what changed + link for details>"
    • Issues: Indicate issues fixed via: "Fixes #xxx" or "Closes #xxx"
  • Sensitive Modules (requires 2 PR approvers)
    • IAM Policy Document (in @aws-cdk/aws-iam)
    • EC2 Security Groups and ACLs (in @aws-cdk/aws-ec2)
    • Grant APIs (only if not based on official documentation with a reference)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license.

@sam-goodwin sam-goodwin added the @aws-cdk/aws-glue Related to AWS Glue label Mar 11, 2019
@sam-goodwin sam-goodwin requested a review from a team as a code owner March 11, 2019 06:19
Contributor

@eladb eladb left a comment

beautiful

new glue.Table(stack, 'MyTable', {
  database: myDatabase,
  tableName: 'my_table',
  columns: [{
Contributor

I am wondering, if name is unique, why not use a hash?

Contributor Author

@sam-goodwin sam-goodwin Mar 11, 2019

There are two semantics we want to model as strictly as we can: column uniqueness and ordering.

  • A hash models uniqueness well, but it does not model ordering. In Node.js, an object's keys are generally iterated in the order in which they were added, but that is not the case in other languages such as Java, where a developer would have to know to use a LinkedHashMap.
  • An array explicitly and intuitively defines the ordering in all languages, but it doesn't model column uniqueness.

I chose to statically model the ordering property with an array and check uniqueness at runtime because then, at least, the experience is consistent for all consumers. Using a hash might create confusion for consumers: they would not receive an error; the layout of their columns could just change arbitrarily.
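
As a rough sketch of that trade-off (not the code in this PR), the array keeps the declared order and a small runtime check covers uniqueness. The `Column` shape is simplified and `validateUniqueColumnNames` is a hypothetical helper name used only for illustration.

```ts
// Simplified, hypothetical column shape; the type is a plain string here.
interface Column {
  readonly name: string;
  readonly type: string;
}

// Throws on the first repeated column name; the array itself carries the ordering.
function validateUniqueColumnNames(columns: Column[]): void {
  const seen = new Set<string>();
  for (const column of columns) {
    if (seen.has(column.name)) {
      throw new Error(`column names must be unique, but '${column.name}' is duplicated`);
    }
    seen.add(column.name);
  }
}

const columns: Column[] = [
  { name: 'col1', type: 'string' },
  { name: 'col2', type: 'smallint' },
];
validateUniqueColumnNames(columns); // passes: both names are distinct
```

A duplicate name fails loudly at construction time in every target language, instead of silently reordering or overwriting columns.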

/**
 * Storage type of the table's data.
 */
storageType: StorageType;
Contributor

Default to JSON?

Contributor Author

I think it's best to ask customers to be explicit about their data format. Front-load the important questions: schema, file format, location and security.

* [SSE-S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html) - Server side encryption (SSE) with an Amazon S3-managed key.
```ts
new glue.Table(stack, 'MyTable', {
encryption: glue.TableEncryption.SSE_S3
Contributor

Enum names should be consistent with BucketEncryption

Contributor Author

I would argue the other way around - the enum values are consistent with the S3, Athena, Glue and EMR documentation. What would I name CSE-KMS if I were copying BucketEncryption?

Contributor

Fair enough, but I think we have a problem with ALL_CAPS when converting those member names to other languages. Can we find names that are PascalCase?

*/
// CSE_KMS = 'CSE-KMS'
CSE_KMS = 'CSE-KMS'
Contributor

Proposed names that pass the jsii naming bar:

  • Unencrypted
  • SSE_KMS => Kms
  • SSE_KMS_MANAGED => KmsManaged
  • SSE_S3 => S3Managed
  • CSE_KMS => ClientKms

Contributor Author

Your way distinguishes CSE with the prefix Client and implies SSE for the others. It's efficient. I think I'd prefer ClientSideKms over ClientKms, though.

Contributor

Sounds good to me.

I am okay with ServerSideXxx as well, but then we'll have to also change it in other places ;-) and I favor consistency at this point.
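
For reference, the names discussed in this thread could look roughly like the sketch below. This is an illustration rather than the merged code: only the `'CSE-KMS'` value appears in the diff above, the other string values are placeholders assumed to follow the same pattern, and `isClientSide` is a hypothetical helper.

```ts
// Hypothetical sketch of the renamed enum; only 'CSE-KMS' is taken from the diff.
enum TableEncryption {
  Unencrypted = 'Unencrypted',    // placeholder value
  S3Managed = 'SSE-S3',           // was SSE_S3
  Kms = 'SSE-KMS',                // was SSE_KMS
  KmsManaged = 'SSE-KMS-MANAGED', // was SSE_KMS_MANAGED (placeholder value)
  ClientSideKms = 'CSE-KMS',      // was CSE_KMS
}

// True only for the client-side encryption option; the rest imply server-side.
function isClientSide(encryption: TableEncryption): boolean {
  return encryption === TableEncryption.ClientSideKms;
}

// Every member name is PascalCase: no underscores and no ALL_CAPS.
const memberNames = Object.keys(TableEncryption);
const pascalCase = /^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)*$/;
console.log(memberNames.every(name => pascalCase.test(name))); // true
```

Keeping the documented strings such as `CSE-KMS` as the enum values while renaming only the members preserves the link to the S3, Athena, Glue and EMR documentation mentioned earlier in the thread.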
