
naming conventions: cross-service integrations #1743

Closed
rix0rrr opened this issue Feb 12, 2019 · 16 comments
Labels
@aws-cdk/core Related to core CDK functionality

Comments

@rix0rrr
Contributor

rix0rrr commented Feb 12, 2019

The acute need came out of #1646, but this pattern crops up in more places, for example in #1663 (TBD).

Problem Statement

We have classes that users can use to link certain services together. Examples are CloudWatch Event Targets, Step Functions Tasks, and Lambda Event Sources, and there are probably more occurrences of this pattern that I can't think of right now.

Usually such an integration is event-based, and there are typically three entities involved:

                                                                          
    ┌───────────────┐    Customization    ┌───────────────┐               
    │               │                     │               │               
    │ Event Source  │────────────────────▶│ Event Target  │               
    │               │                     │               │               
    └───────────────┘                     └───────────────┘               
StepFunctions Workflow                       DDB Table                    
CloudWatch Event                             SNS Topic                    
S3/SQS/.. Lambda Event Source                ECS Task                     
                                             Lambda Function              
                                             ...                          

Obviously there will be a source and a target, and usually the event passing is customized in some way as well (by specifying additional properties).


Today, our integrations look like this:

CloudWatch Events

// Pretty easy, run the Lambda
eventRule.addTarget(myLambda);

// Still okay, run the Lambda with a JSON payload
eventRule.addTarget(myLambda, {
  jsonTemplate: {
    SomeArgument: 'Value'
  }
});


// Oops, the configuration object becomes super complex all of a sudden
const target = new ecs.Ec2EventRuleTarget(this, 'EventTarget', {
  cluster,
  taskDefinition,
  taskCount: 1
});

rule.addTarget(target, {
  jsonTemplate: {
    containerOverrides: [{
      name: 'TheContainer',
      environment: [{ name: 'I_WAS_TRIGGERED', value: 'From CloudWatch Events' }]
    }]
  }
});

The configuration becomes super complex when we try to enable interactions between more flexible services, such as ECS, which has various parameters that control the invocation. With the code as written, we even have to split those parameters over two statements, and there's no static or runtime safety in the second statement.

In this case, we would benefit from having everything in the Ec2EventRuleTarget class, and we should probably make similar classes for the other service integrations in order to stay consistent across the framework (instead of using the resource directly as the integration point).
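For illustration, the consolidated target could look roughly like this (the containerOverrides property here is hypothetical, shown only to make the point that all invocation parameters would live on one type-checked class):

// Sketch only: every ECS invocation parameter lives on the target class,
// so the rule needs a single, statically checked statement.
const target = new ecs.Ec2EventRuleTarget(this, 'EventTarget', {
  cluster,
  taskDefinition,
  taskCount: 1,
  // hypothetical property, shown for illustration
  containerOverrides: [{
    name: 'TheContainer',
    environment: [{ name: 'I_WAS_TRIGGERED', value: 'From CloudWatch Events' }]
  }]
});

rule.addTarget(target);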

Question is, what should those classes be named, and in what module should they live?

Right now, the integrations are implemented in the module of the resource.

Step Functions

To invoke a Lambda, Step Functions used to work like this:

new sfn.Task(this, 'Task', {
  // Returns 'arn:aws:lambda:region:account-id:function:function-name', configures permissions on bind(),
  // done.
  resource: myLambda
});

We would take the ARN of the Lambda, give that to StepFunctions, and everything would be good.

Step Functions has now launched integrations with other services, and there are some complications.

  • Targets are no longer represented by the resource ARN, but by a magic ARN. For example, to run an ECS Task you don't specify any resource's ARN, but you specify Resource: "arn:aws:states:::ecs:runTask.sync", and in an additional field Parameters you put something like:
     "Parameters": {
            "Cluster": "cluster-arn",
            "TaskDefinition": "job-id",
            "Overrides": {
                "ContainerOverrides": [
                    {
                        "Name": "container-name",
                        "Command.$": "$.commands" 
                    }
                ]
            }
        },

We could model this as follows:

new sfn.Task(this, 'Task', {
  resource: new Ec2StepFunctionsTask(this, 'Ec2Task', {
    cluster,
    taskDefinition,
    overrides: [ ... ]
  })
});

But it would be even nicer to combine the two:

new Ec2StepFunctionsTask(this, 'Ec2Task', {
  cluster,
  taskDefinition,
  overrides: [ ... ]
})

And be able to use that object in a workflow directly. Again, what do we name this class, and in what module does it live?
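For example, something along these lines (a sketch only, assuming the combined class can act directly as a state in the definition):

// Sketch, not an existing API: the integration object doubles as a workflow step.
const runTask = new Ec2StepFunctionsTask(this, 'Ec2Task', {
  cluster,
  taskDefinition,
  overrides: [ ... ]
});

new sfn.StateMachine(this, 'StateMachine', {
  definition: runTask
});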

Right now, the integrations are implemented in the module of the resource.

Lambda Event Sources

Lambda Event Sources already have taken the model where we actually have a class per integration (FooEventSource) and they all live in their own module @aws-cdk/aws-lambda-event-sources.

Used as:

const source = new SqsEventSource(...);

lambda.addEventSource(source);
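For a slightly fuller picture, usage looks roughly like this (assuming an existing queue and function fn; batchSize is shown purely as an illustrative option):

// One class per integration, all living in @aws-cdk/aws-lambda-event-sources.
import { SqsEventSource } from '@aws-cdk/aws-lambda-event-sources';

const source = new SqsEventSource(queue, {
  batchSize: 10   // illustrative option
});

fn.addEventSource(source);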

Complications

NAMING

  • First and foremost for the ECS case, we have an overload of the word Task. A Task is a running instance of a task definition (same as a process is a running instance of an executable, etc.), but a Task is also a step in a Step Functions workflow that performs an action. So to be fully correct, a StepFunctions Task to run an ECS Task should be named something like FargateRunTaskTask.
  • Do we prefix or suffix? Prefix will be easier for discoverability and autocomplete, but suffix makes more sense for readability (compare: FargateRunTask vs. SfnFargateRunTask)
  • Understandability of the name depends on the package. A simple name like FargateRunTask might suffice in a package named @aws-cdk/aws-stepfunctions-tasks, but it might not be so clear in the aws-ecs package proper.

LOCATION

  • If we implement integrations in the integrated resource's package, we might run into code ownership issues down the line. If in some hypothetical future StepFunctions adds an integration with BloopService, they will need the cooperation of the owners of @aws-cdk/aws-bloop to land that integration in the CDK. This might be desirable as a forcing function, or it might not be, for the sake of expediency.
  • What is the most discoverable location for integration classes? Next to the resources that they integrate with, or in a special package that contains all integrations? I have a feeling the latter is more discoverable, but I'm not sure.

Honestly, I don't have any good proposals here yet. I'm leaning towards following the Lambda Event Sources model, where integrations go into their own package with suffix naming. So I think I'd like:

  • aws-stepfunctions-tasks with FargateRunTask, TablePutItemTask, etc.
  • aws-events-targets with FargateRun(Task)Target, InvokeFunctionTarget, etc.
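Under that proposal, call sites would read roughly as follows; the packages, classes, and property names below are just the proposal above, purely illustrative:

// Sketch only: none of these packages or classes exist yet.
import * as tasks from '@aws-cdk/aws-stepfunctions-tasks';
import * as targets from '@aws-cdk/aws-events-targets';

const runTask = new tasks.FargateRunTask(this, 'RunTask', {
  cluster,
  taskDefinition
});

rule.addTarget(new targets.InvokeFunctionTarget(myLambda));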

I'm interested in hearing opinions though.

@eladb
Contributor

eladb commented Feb 18, 2019

There is something that "works" with the lambda-event-sources approach, I agree. The main problem I have with it is that in many cases it will pull down many dependencies you don't really need. But I am not sure that's a huge issue to be honest (especially given the single version line approach we are now taking).

I think CodePipeline is an example of how this can go really far, with classes sprinkled across the entire framework. Initially it felt like a good idea, but I am seeing @skinny85 constantly needing to make modifications across the entire repo, which is not going to be very sustainable as we have more and more maintainers, so I can see the value in moving all of the codepipeline actions into a single module.

Discoverability is tricky. On one hand, when you know where to look, a single package with all integrations is highly discoverable. On the other hand, the IDE doesn't help you there: you will have addAction(action: IPipelineAction) and have to figure out on your own where actions come from. But I guess docs can easily solve this, and our docs can list all possible concrete implementations, so there's that.

@skinny85
Contributor

I think CodePipeline is an example of how this can go really far, with classes sprinkled across the entire framework. Initially it felt like a good idea, but I am seeing @skinny85 constantly needing to make modifications across the entire repo, which is not going to be very sustainable as we have more and more maintainers, so I can see the value in moving all of the codepipeline actions into a single module.

I think the reason for these repo-wide changes is that we were making breaking changes to the API of the Actions. The moment we stabilize the API at 1.0, we will no longer need to (or in fact even be able to) do these, and then the separation will be much more beneficial.

@eladb
Contributor

eladb commented Feb 18, 2019

I still think it would make sense to bring all the codepipeline stuff into a single library for discoverability purposes (similar to lambda-event-sources). Any reason why this wouldn't be a more sustainable model?

@skinny85
Contributor

I can think of 2 reasons against it:

  • The ownership model gets very fuzzy. The Actions were meant to be owned by the appropriate service teams. So you will have a package jointly owned (and collaborated on) by ~10 service teams. Seems like a bottleneck waiting to happen.
  • You will pull in pretty much the entire CDK Construct Library as dependencies if you want to use even a single Action.

@rix0rrr
Contributor Author

rix0rrr commented Feb 18, 2019

The Actions were meant to be owned by the appropriate service teams.

I actually think this is backwards. The Actions encode something about the pipeline model owned by the pipeline team. Why should the Lambda team own the class that generates a piece of CodePipeline's JSON model?

You will pull in pretty much the entire CDK Construct Library as dependencies if you want to use even a single Action.

Today I'm pulling in CodePipeline and all of its dependencies if I want to instantiate a Lambda (regardless of any pipelines). How does that make sense?

@skinny85
Contributor

I actually think this is backwards. The Actions encode something about the pipeline model owned by the pipeline team. Why should the Lambda team own the class that generates a piece of CodePipeline's JSON model?

It's not part of the CodePipeline JSON. It is 100% owned and maintained by the service team. The CodePipeline team does not have to know anything about that JSON for it to work (see: ECR Action that was added recently).

Today I'm pulling in CodePipeline and all of its dependencies if I want to instantiate a Lambda (regardless of any pipelines). How does that make sense?

You're not. You're just pulling in the codepipeline-api package, which is very small and doesn't have any dependencies (besides the foundational packages like IAM and Events).

@eladb
Contributor

eladb commented Feb 27, 2019

@rix0rrr @skinny85 @RomainMuller I am leaning towards @rix0rrr's proposal to organize all integration types for a service in a central package.

The main factor in this decision for me (as with most design decisions we make) is to put myself in the shoes of a user and ask what their mental model is when they want to find and use those classes, and what the natural name for these types would be.

For example, if I define a pipeline, there are a few things I want to be able to do, ideally without leaving my IDE:

  1. Find out what actions are supported by CodePipeline
  2. Use a specific CodePipeline action (e.g. Lambda or CloudFormation).

In both those scenarios, I would argue that the mental hierarchy is "pipeline => actions => specific-action". So pipeline-actions.InvokeLambdaAction is more natural than lambda.InvokeLambdaPipelineAction.

This approach also resolves some of the naming issues @rix0rrr raises: stepfunction-tasks.FargateRunTask is way cleaner than ecs.FargateRunStepFunctionsTask.

I am not very concerned about pulling in all the dependencies with the integration module. We have that today with lambda-event-sources and haven't seen any issues. If we do, we can consider @rix0rrr's idea of modeling those dependencies as "weak" dependencies ("peer dependencies") in environments that support them.


One thing we haven't discussed explicitly, but I am assuming it falls out of this decision: does that mean that we now let go of all these adaptor interfaces with their ugly asXxx methods?

Do we want to get rid of eventRule.addTarget(lambda) so it becomes:

eventRule.addTarget(new event_targets.LambdaTarget(lambda));

Or do we leave it as an option for specific classes to implement IEventRuleTarget so they can be used as a target without an intermediary? I am leaning towards consistency...

What about our onXxx methods? Are we getting rid of those?

@rix0rrr
Contributor Author

rix0rrr commented Feb 28, 2019

does that mean that we now let go of all these adaptor interfaces with their ugly asXxx methods

Imho, yes.

What about our onXxx methods? Are we getting rid of those?

We can't, really. It's similar to addTarget() in that sense, I feel. We still need something to represent the event, either a method or an object. We need either:

new LambdaTarget({ on: pipeline.failureEvent, lambda: lambda });

Or what we have today.

@eladb eladb added the @aws-cdk/core Related to core CDK functionality label Feb 28, 2019
@skinny85
Contributor

Can we talk about what the names of these classes should be in languages that don't support qualified imports, like Java, Go (and I believe Python as well), etc.?

In the CodePipeline case, should the name be LambdaInvokeAction? LambdaInvokePipelineAction? Something else?

@eladb
Contributor

eladb commented Feb 28, 2019

The name of the class should be LambdaInvokeAction.

In all languages, types are qualified by the module they reside in. This is why we call the Lambda function class Function. It is a basic notion in all languages; otherwise every class name in Java, for example, would have to carry all the qualifiers in the class name itself.

This is what my IDE shows me when I type Node:

[Screenshot: IDE autocomplete suggestions for classes named Node, each listed with its fully qualified module]

When I pick one of the options (based on the fully qualified name), the IDE will create an import for me that allows me to refer to this class as Node in my code. If I need another Node class from another namespace, I can use the fully qualified names.

To me this is one of the benefits of this approach. If this class resided under the aws-lambda module, we would need to include the CodePipeline terminology there (see @rix0rrr's notes about how messed up it can get with Step Functions and ECS).

@skinny85
Contributor

I'm a little worried that the Action suffix is too ambiguous, and a different AWS service might come along and use that word in a different context, and we'll have a problem on our hands.

However, assuming you want to stick with LambdaInvokeAction, I think the best solution would be to have the central integrations package (codepipeline-actions) depend on all of the service packages, re-exporting the classes it needs, while keeping the integrations (Actions in the case of CodePipeline) in their respective service packages.

It combines the strengths of both approaches: the ownership story is clear, discoverability is aided by the central package, while dependency-conscious customers can use the Actions only through the service packages, without pulling in the entire Construct Library along with codepipeline-actions.
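Mechanically, the central package's entry point would be little more than re-exports, something along these lines (class names below are illustrative; as noted further down, jsii doesn't currently allow this):

// Hypothetical index.ts of @aws-cdk/aws-codepipeline-actions under this scheme:
// the Actions stay implemented in their service packages and are only re-exported here.
export { LambdaInvokeAction } from '@aws-cdk/aws-lambda';
export { EcrSourceAction } from '@aws-cdk/aws-ecr';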

The reason I think this is fine is that the names in this scheme are redundant anyway. If you assume Java/Go customers will use an unqualified LambdaInvokeAction, then the TypeScript codepipeline_actions.LambdaInvokeAction is obviously redundant, just as redundant as lambda.LambdaInvokeAction. (Either way, I actually think making our names read well only under a particular namespace prefix is a mistake; we don't control that.)

I understand this is currently not possible with JSII, but I believe the advantages are large enough to at least try to make it work.

Would love to hear your opinion on this.

@eladb
Contributor

eladb commented Feb 28, 2019

We won't be able to call it LambdaInvokeAction and have it located in the lambda package, because we would lose the CodePipeline lexical context, and then the collisions become something we can't resolve (again, see @rix0rrr's example about Step Functions and ECS).

Collisions on the consumer side can always be resolved by the user. So even if, for some odd reason, there were two modules in our framework that had a class LambdaInvokeAction and a user wanted to refer to both in the same source file, they could always use fully qualified names. It occasionally happens in Java and is common practice. I doubt this will happen with this specific class name, but it can, and it's not a problem.
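In TypeScript terms this is just an import alias; the second module below is hypothetical, shown only to illustrate the mechanism:

// Resolving a name collision on the consumer side with import aliases.
import { LambdaInvokeAction } from '@aws-cdk/aws-codepipeline-actions';
import { LambdaInvokeAction as OtherLambdaInvokeAction } from '@aws-cdk/aws-hypothetical-service';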

A dependency-only package with re-exports is not possible in jsii, and adding it only in TypeScript would break a fundamental concept of jsii: that all languages get the exact same type system. Those re-exported types would be hidden in all languages besides TypeScript, so people would go to the documentation, see a module called "codepipeline-actions", and find that it has no Java classes in it. That's not something we can afford. It might be possible to consider adding a type-aliasing feature in jsii, but I don't see that happening in the short term, and I am not convinced it's a good idea in general. The constraints we get from jsii have proven to be quite helpful at keeping our code organization simple and straightforward, and I think that's an important aspect of the framework design.

I don't see the dependency issue as a major concern based on our experience so far, and the ownership story of "polluting" modules with integration classes throughout the entire framework is not any clearer in my mind. If anything, I suspect it might create confusion.

@skinny85
Contributor

I don't see the dependency issue as a major concern based on our experience so far, and the ownership story of "polluting" modules with integration classes throughout the entire framework is not any clearer in my mind. If anything, I suspect it might create confusion.

I don't agree, but I'm willing to agree to disagree.

A consequence of this decision is that the toCodePipelineAction helpers that we have currently will have to be removed as well.

@skinny85
Contributor

skinny85 commented Mar 4, 2019

To sum up the discussion: we're going with the 'single package for all integrations' approach.

@kevinslin

Seems like a decision has been settled on here. Can this issue be closed?

@eladb
Contributor

eladb commented Aug 7, 2019

Yes, thanks!

@eladb eladb closed this as completed Aug 7, 2019