feat(redshift): manage database users and tables via cdk #15931

BenChaimberg · 2021-08-07T07:18:58Z

This feature allows users to manage Redshift database resources, such as users, tables, and grants, within their CDK application. Because these resources do not have CloudFormation handlers, this feature leverages custom resources and the Amazon Redshift Data API for creation and modification.

The generic construct for this type of resource is DatabaseQuery. This construct provides the base functionality required for interacting with Redshift database resources, including configuring administrator credentials, creating a custom resource handler, and granting necessary IAM permissions. The custom resource handler code contains utility functions for executing query statements against the Redshift database.

Specific resources that use the DatabaseQuery construct, such as User and Table are responsible for providing the following to DatabaseQuery: generic database configuration properties, specific configuration properties that will get passed to the custom resource handler (eg., username for User). Specific resources are also responsible for writing the lifecycle-management code within the handler. In general, this consists of: configuration extraction (eg., pulling username from the AWSLambda.CloudFormationCustomResourceEvent passed to the handler) and one method for each lifecycle event (create, update, delete) that queries the database using calls to the generic utility function.

Users have a fairly simple lifecycle that allows them to be created, deleted, and updated when a secret containing a password is updated (secret rotation has not been implemented yet). Because of #9815, the custom resource provider queries Secrets Manager in order to access the password.

Tables have a more complicated lifecycle because we want to allow columns to be added to the table without resource replacement, as well as ensuring that dropped columns do not lose data. For these reasons, we generate a unique name per-deployment when the table name is requested to be generated by the end user. We also notify create a new table (using a new generated name) if a column is to be dropped and let CFN lifecycle rules dictate whether the old table should be removed or kept.

User privileges on tables are implemented via the UserTablePrivileges construct. This construct is located in the private directory to ensure that it is not exported for direct public use. This means that user privileges must be managed through the Table.grant method or the User.addTablePrivileges method. Thus, each User will have at most one UserTablePrivileges construct to manage its privileges. This is to avoid a situation where privileges could be erroneously removed when the same privilege is managed from two different CDK applications. For more details, see the README, under "Granting Privileges".

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

gitpod-io · 2021-08-07T07:19:00Z

…t-database-query

madeline-k

Some initial comments. I think the most interesting thing to decide here is what should we do on update of the table? A few options:

Current implementation
a. I think this is a good way to minimize clobbering data when the user changes things.
b. I am concerned this will cause some weird edge cases when the user changes the order of the columns in their table definitions. But maybe I am wrong about this?
Fully replace the table
a. I like this because it's simpler and will not result in confusion or drift between the actual table, and the table schema in CDK.
b. This is bad because users will not be able to extend their firehose with more fields being added without losing data. I think this is actually a common use case.
Add some kind of validation/restriction to limit what kinds of changes can be made to a table. Maybe only adding columns? I am not sure what this would look like.
Create a new table and do the data migration from original to new for them? Probably not possible, since volume of data could be too big for the Lambda timeout
Something else?

since the redshift L2 is experimental, we do have some wiggle room to make changes to this behavior later :)

madeline-k · 2021-08-19T17:17:54Z

packages/@aws-cdk/aws-redshift/README.md

+### Creating Users
+
+Create a user within a Redshift cluster database by instantiating a `User` construct. This
+will generate a username and password, store the credentials in a [Secrets Manager


Should we consider allowing setting the username?

Also can you include a brief description about how the username and password will be generated?

added descriptions, including the already-existing but undocumented ability to provide a username

do you think I should add the ability to provide a password in this PR?

packages/@aws-cdk/aws-redshift/README.md

packages/@aws-cdk/aws-redshift/lib/database-query-provider/create-table.ts

packages/@aws-cdk/aws-redshift/lib/table.ts

madeline-k · 2021-08-19T17:43:51Z

Can you add a PR description? Covering the important design decisions being made here?

nija-at · 2021-08-23T13:41:19Z

+1 to some of @madeline-k's concerns. Her comments are pointing to a larger issue of trying to manage 'relational' entities (in this case of this PR, Table) via CloudFormation.

I wonder if there is enough fidelity in CloudFormation's lifecycle to correctly manage them.

What is the motivation for adding these L2s? I would expect this to be captured in the PR description. It feels like these constructs would be the first of its kind.

…t-database-query

BenChaimberg · 2021-08-24T04:41:14Z

The motivation for this feature is to allow for one-click deployment of the feature in #15837. However, these certainly can be used outside of the Firehose context which is why @madeline-k and I decided to move the constructs to the Redshift module.

I see @nija-at's point about the limitations of CloudFormation when it comes to managing this resource. In my opinion, the ideal situation for schema updates would be to have column additions as "no/some-interruption" but column removals/type change as "require replacement". Then, CFN retention policy would handle whether or not the replacement causes the old table to be deleted. To make this work, we'd need to generate a PhysicalResourceId randomly as opposed to have it be based deterministically on the table name/columns.

I think this is feasible and align with my understanding of how CDK treats persistent resources. Maybe slightly better than how a DynamoDB table is treated (where there are no "no/some-interruption" replacements for column attributes). What do you think?

nija-at · 2021-08-24T13:16:56Z

The motivation for this feature is to allow for one-click deployment of the feature in #15837.

I can't quite make out what this feature is from the PR title and description.
Is this so that the redshift destination for kinesis firehose has to be a specific table?

Keeping these in the redshift module sounds logically correct.

To make this work, we'd need to generate a PhysicalResourceId randomly as opposed to have it be based deterministically on the table name/columns.

IIRC, changing PhysicalResourceId indicates to CFN that the resource needs to be replaced.
So, it should not change for column additions and should change for column deletions and modifications. Generating this randomly is going to trigger a replacement for every deployment irrespective of any changes here. Maybe I'm missing something?
If that is the case, you will then have to query the underlying redshift table to determine what kind of change is going to be deployed.

Overall, custom resources are expensive to maintain, test and get right.

As discussed above, custom resources pull in the responsibility of resource replacement onto the CDK. We don't have good mechanisms today to test or estimate this. (i.e., if the logic or calculating PhysicalResourceId, what is the impact to existing, new users, etc.)

I want to check to know if there is enough demand for this, and whether it's worth the effort to implement and maintain the redshift destination for kinesis firehose.
If we've done this analysis and have good confidence in its RoI, then go ahead by all means.

BenChaimberg · 2021-08-24T14:55:17Z

Is this so that the redshift destination for kinesis firehose has to be a specific table?

Added the missing description in the other PR as well (sorry about these – I need to be better about creating these the first time around). Essentially, I wrote an early version of this feature (Redshift tables and users managed via custom resources) for the Firehose Redshift destination PR so that we could create a Redshift user and table automatically within the Redshift destination. If we did not create the user and table automatically, the customer would need to manually create a user and table within the Redshift cluster (and pass the same username, password, and table name to the construct).

So, it should not change for column additions and should change for column deletions and modifications. Generating this randomly is going to trigger a replacement for every deployment irrespective of any changes here.

Sorry, I didn't explain my idea fully. Yes, the physical resource ID should not change for column additions and should change for column deletions and modifications. This means that the table name and columns alone cannot determine the physical resource ID since the same name and columns could produce different table resources depending on the previous state of the table. Thus, we need to add some randomness to the physical resource ID so that we can generate a new ID whenever we choose, regardless of the name and columns. However, we will only generate a new ID for column deletions and modifications; for additions, we will reuse the existing ID.

I want to check to know if there is enough demand for this, and whether it's worth the effort to implement and maintain the redshift destination for kinesis firehose.

As I stated above, this feature is not necessary for the Firehose destination; users can still create these Redshift resources manually. However, I believe that it is quite valuable, as there are several steps that would be necessary for a user to take (create the user, create the table, grant the user privileges on the table) that include secret management and translating table schema. There is an internal example already of customers using custom resources for this purpose: AKGamesCDKCommon. Furthermore, I envision these resources providing benefits beyond the Firehose destination table; customers that are building data analytics applications could use CDK to manage their database resources.

packages/@aws-cdk/aws-redshift/lib/database-query.ts

packages/@aws-cdk/aws-redshift/lib/database-query-provider/create-table.ts

packages/@aws-cdk/aws-redshift/lib/database-query.ts

…custom resource tests

…rops

BenChaimberg · 2021-08-30T03:24:06Z

@nija-at @madeline-k I re-worked the table update logic to replace in all situations except for a table addition. Let me know if you are happy with that decision. I also added unit tests for the handler logic which I hoped would assuage some of the concerns about maintainability. If you like the way they look, I can add them for the other handlers as well.

…t-database-query

nija-at

@njlynch - (tagging you since you're the owner of the redshift module)

I think it'll be prudent to mark these APIs as "beta" in case the redshift module comes up for graduation in the coming months.
(see #15931 (comment) on why I think this needs to bake in longer than the rest). OTOH, we can just treat this as any other API and keep trudging forward until there is an actual concern.
WDYT?

How should we approach this?
Should we tag this as "beta" already or wait to mark them during graduation (if/when we get there)?
Should we open an issue to track this and not forget?

nija-at

Provisional approval.

packages/@aws-cdk/aws-redshift/lib/private/privileges.ts

packages/@aws-cdk/aws-redshift/lib/private/database-query.ts

BenChaimberg · 2021-09-09T00:17:43Z

packages/@aws-cdk/aws-redshift/lib/database-options.ts

+   *
+   * @default - the admin secret is taken from the cluster
+   */
+  readonly adminUser?: secretsmanager.ISecret;


no need for User? suffix with secret?

Suggested change

readonly adminUser?: secretsmanager.ISecret;

readonly admin?: secretsmanager.ISecret;

BenChaimberg · 2021-09-09T00:18:24Z

packages/@aws-cdk/aws-redshift/lib/user.ts

+  /**
+   * The name of the user.
+   *
+   * For valid values, see: https://docs.aws.amazon.com/redshift/latest/dg/r_names.html


add validation at construction-time?

…t-database-query

nija-at

Took a glance through the PR. Nothing stood out to me.

Ship it! Let's give this a shot.

aws-cdk-automation · 2021-09-13T18:40:30Z

AWS CodeBuild CI Report

CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
Commit ID: 8add00c
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mergify · 2021-09-13T18:40:52Z

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

itajaja · 2021-09-27T14:25:32Z

this is great! would be amazing if this could be expanded to other schemas. do you think it would be easy? I could try to make a PR. any suggestion?

BenChaimberg · 2021-09-27T16:51:57Z

@itajaja By other schema do you mean the Redshift schema resource or some other database resources? In either case, if you can make an argument that customers will benefit from managing the resource via code, then the team may consider it a valuable contribution to this library! I would recommend opening a feature request issue to discuss the use case as well as the design going forward for these sorts of resources

BenChaimberg added 2 commits August 6, 2021 23:24

feat(redshift): manage database users and tables via custom resources

1d923e3

add cluster and databaseName to interfaces

9d3827d

BenChaimberg added the @aws-cdk/aws-redshift Related to Amazon Redshift label Aug 7, 2021

BenChaimberg requested a review from madeline-k August 7, 2021 07:18

mergify bot added the contribution/core This is a PR that came from AWS. label Aug 7, 2021

BenChaimberg added 5 commits August 7, 2021 00:20

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

6fef1ec

…t-database-query

remove outdated docs

2708e9b

update integ test

015d05d

move database props to separate file to pass monocdk build

1f58168

add to README

06a7570

BenChaimberg requested a review from a team August 17, 2021 22:38

madeline-k reviewed Aug 19, 2021

View reviewed changes

BenChaimberg added 2 commits August 23, 2021 20:51

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

a474705

…t-database-query

updated documentation and linting

9d67a38

update integ tests

3bb16b1

rix0rrr reviewed Aug 25, 2021

View reviewed changes

packages/@aws-cdk/aws-redshift/lib/database-query.ts Outdated Show resolved Hide resolved

rix0rrr reviewed Aug 25, 2021

View reviewed changes

packages/@aws-cdk/aws-redshift/lib/database-query.ts Outdated Show resolved Hide resolved

packages/@aws-cdk/aws-redshift/lib/database-query-provider/create-table.ts Outdated Show resolved Hide resolved

packages/@aws-cdk/aws-redshift/lib/database-query.ts Outdated Show resolved Hide resolved

BenChaimberg added this to the [DevPreview] Kinesis Data Firehose milestone Aug 25, 2021

BenChaimberg added 3 commits August 29, 2021 21:57

move user table privileges to separate construct

f228c18

only update table if columns added (replace in all other cases); add …

bd1fef5

…custom resource tests

make DatabaseQuery generic, strongly type handlers, don't serialize p…

e889618

…rops

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

10bb1e7

…t-database-query

nija-at changed the title ~~feat(redshift): manage database users and tables via custom resources~~ feat(redshift): manage database users and tables via cdk Sep 3, 2021

nija-at reviewed Sep 8, 2021

View reviewed changes

nija-at added the pr/do-not-merge This PR should not be merged at this time. label Sep 8, 2021

nija-at approved these changes Sep 8, 2021

View reviewed changes

document reasons for not granting duplicate privileges in multiple apps

6e75d11

BenChaimberg commented Sep 9, 2021

View reviewed changes

BenChaimberg added 13 commits September 8, 2021 20:19

remove TODOs in code

156e72e

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

b7610dc

…t-database-query

rewrite opening paragraph in readme

1a44e32

lint readme snippets

378f101

reduce table privileges to combine actions across multiple grant calls

307214f

remove magic handler names and replace with enum

83b933a

add tests to entrypoint of database query provider

2a5f636

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

67114bc

…t-database-query

ignore asset hash in integration tests

ada89e7

Merge branch 'master' of github.com:aws/aws-cdk into chaimber/redshif…

df7b1df

…t-database-query

fix location of handler name

f533242

refactor in prep of correctly generating table name

2152d53

properly generate table name so update replacements succeed

f5ab16d

nija-at approved these changes Sep 13, 2021

View reviewed changes

BenChaimberg added pr/do-not-merge This PR should not be merged at this time. and removed pr/do-not-merge This PR should not be merged at this time. labels Sep 13, 2021

BenChaimberg added 2 commits September 13, 2021 13:35

Merge branch 'master' into chaimber/redshift-database-query

17b9e2f

typo

8add00c

BenChaimberg removed the pr/do-not-merge This PR should not be merged at this time. label Sep 13, 2021

mergify bot merged commit a9d5118 into master Sep 13, 2021

mergify bot deleted the chaimber/redshift-database-query branch September 13, 2021 18:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(redshift): manage database users and tables via cdk #15931

feat(redshift): manage database users and tables via cdk #15931

BenChaimberg commented Aug 7, 2021 •

edited

gitpod-io bot commented Aug 7, 2021

madeline-k left a comment

madeline-k Aug 19, 2021

BenChaimberg Aug 24, 2021

madeline-k commented Aug 19, 2021

nija-at commented Aug 23, 2021

BenChaimberg commented Aug 24, 2021

nija-at commented Aug 24, 2021 •

edited

BenChaimberg commented Aug 24, 2021

BenChaimberg commented Aug 30, 2021

nija-at left a comment •

edited

nija-at left a comment

BenChaimberg Sep 9, 2021

BenChaimberg Sep 9, 2021

nija-at left a comment

aws-cdk-automation commented Sep 13, 2021

mergify bot commented Sep 13, 2021

itajaja commented Sep 27, 2021

BenChaimberg commented Sep 27, 2021

	readonly adminUser?: secretsmanager.ISecret;
	readonly admin?: secretsmanager.ISecret;

feat(redshift): manage database users and tables via cdk #15931

feat(redshift): manage database users and tables via cdk #15931

Conversation

BenChaimberg commented Aug 7, 2021 • edited

gitpod-io bot commented Aug 7, 2021

madeline-k left a comment

Choose a reason for hiding this comment

madeline-k Aug 19, 2021

Choose a reason for hiding this comment

BenChaimberg Aug 24, 2021

Choose a reason for hiding this comment

madeline-k commented Aug 19, 2021

nija-at commented Aug 23, 2021

BenChaimberg commented Aug 24, 2021

nija-at commented Aug 24, 2021 • edited

BenChaimberg commented Aug 24, 2021

BenChaimberg commented Aug 30, 2021

nija-at left a comment • edited

Choose a reason for hiding this comment

nija-at left a comment

Choose a reason for hiding this comment

BenChaimberg Sep 9, 2021

Choose a reason for hiding this comment

BenChaimberg Sep 9, 2021

Choose a reason for hiding this comment

nija-at left a comment

Choose a reason for hiding this comment

aws-cdk-automation commented Sep 13, 2021

AWS CodeBuild CI Report

mergify bot commented Sep 13, 2021

itajaja commented Sep 27, 2021

BenChaimberg commented Sep 27, 2021

BenChaimberg commented Aug 7, 2021 •

edited

nija-at commented Aug 24, 2021 •

edited

nija-at left a comment •

edited