
Service Discovery for @plugins #1091

Closed · filmaj opened this issue Mar 8, 2021 · 35 comments

@filmaj (Member) commented Mar 8, 2021

This is related to the @plugins work currently in a release candidate (arc v8.5.0-RC.3). The RFC for the @plugins work is in #1062.

First came @macros, soon we will have @plugins, but we're always looking to expand capabilities and tackle interesting use cases. Early on we identified that @plugins may need a runtime component, that is, an ability to interact with the service(s) exposed by the plugin during runtime from within a Lambda. In my arc adventures, I have recently hit the need for this ability, so I am here to sketch out the two rough solutions I can imagine and collect input from the community - whether different approaches or feedback on the existing approaches I list out.

To clarify the scope of this issue, let me use an existing feature of arc as a means of detailing what I mean by service discovery and runtime capabilities: @tables. @tables allows someone to define DynamoDB tables, but also, via the use of @architect/functions, gives other Lambdas within your project runtime access to these DynamoDB tables via an arc-specific API.
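To make that runtime access concrete, here's a minimal sketch in the spirit of the `@tables` API. The `notes` table name is a hypothetical `@tables notes` entry, and the table client is injected as a parameter (in a real Lambda you'd use `require('@architect/functions').tables`) so the handler can be exercised without AWS:

```javascript
// Sketch of @tables-style runtime access. `tables` is injected; in a
// deployed Lambda it would be require('@architect/functions').tables.
// arc.tables() resolves to an object keyed by table name, each table
// exposing DynamoDB Document-client-style methods (get, put, etc.).
async function getNote (tables, noteId) {
  const data = await tables()       // resolves the table-name map
  return data.notes.get({ noteId }) // 'notes' is a hypothetical table
}
```

The point is that handlers never hard-code physical DynamoDB table names; arc resolves them behind the scenes.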

So this issue / brainstorm is about discussing how we can offer this capability to third-party architect plugins.

I have two options to offer, one unstructured and one structured. This is not an exhaustive list, just what was banging around in my head. I hope this issue can be a brainstorming session 😄

1. Environment variables

I have seen this technique employed in the wild with existing @macros: the solution for runtime discovery is to drop an environment variable into the CloudFormation template for your application's stack that contains the physical ID of the service you are wrapping. All Lambdas have access to this environment variable, and thus can use the AWS SDK to directly reference the service via its physical ID. For an example of this approach, see the macro-upload macro, which adds an S3 bucket physical ID to the environment variables for all defined Lambdas.

  • Pros: no API to maintain in arc, this whole topic becomes a userland problem, no need for API calls to AWS at runtime
  • Cons: no structure, potential for multiple plugins in the future to clobber identical environment variable names, no sanctioned / golden path for plugin authors to follow
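For illustration, a minimal sketch of the environment-variable technique as a macro. The `MyPluginBucket` resource and `MY_PLUGIN_BUCKET` variable name are hypothetical, and the `(arc, cloudformation)` signature is a simplification of the macro interface:

```javascript
// Hypothetical macro: inject a plugin resource's physical ID (via Ref)
// into every Lambda's environment in the CloudFormation template.
function envVarMacro (arc, cloudformation) {
  for (let resource of Object.values(cloudformation.Resources)) {
    if (resource.Type !== 'AWS::Serverless::Function') continue
    let props = resource.Properties
    props.Environment = props.Environment || { Variables: {} }
    props.Environment.Variables = props.Environment.Variables || {}
    props.Environment.Variables.MY_PLUGIN_BUCKET = { Ref: 'MyPluginBucket' }
  }
  return cloudformation
}
module.exports = envVarMacro
```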

2. Generalize the existing arc service discovery mechanism.

The way architect implements service discovery today is composed of two parts:

  1. At package time (right before deploying), package will look at the inventory of your application and, depending on what it contains, will create SSM Parameters that follow a specific naming convention. The values for these parameters contain the physical IDs of the resources in question. The parameter names follow a format like CloudFormationStackName/resourceType/resourceName, e.g. MyAppStaging/events/email-outbox. Physical IDs are needed to interact with the resource at runtime via the AWS SDK.
  2. On instantiation at runtime within a running Lambda context, @architect/functions will search these parameters and create a hash table / object lookup.

The (current) limitation for this working-today service discovery mechanism is that it is scoped only to three built-in arc pragmas: @tables, @events and @queues.

To generalize the first part for use with plugins (creating the SSM Parameters), perhaps the plugin interface could be expanded with another method, maybe plugins.services, which is a function that the plugin author must implement if they are adding resources to the CloudFormation stack whose physical IDs they want to expose at runtime. This function accepts as arguments the parsed arc project manifest and app inventory, and is expected to return an array of logical IDs used to represent the services. The package SSM assembler can then leverage this method to create one SSM Parameter per logical ID exposed in this way, right before deploy, containing the physical ID of the AWS resource. These plugin-specific Parameters can be namespaced so it is clear they are plugin IDs; they could be named e.g. CloudFormationStackName/plugins/pluginName/resourceName, such as MyAppStaging/plugins/copper-plugin-cognito-user-pool/MyUserPool.

To generalize the second part for use with plugins (@architect/functions retrieving the physical IDs), the same mechanism could be employed as exists today: at runtime within a running Lambda context, functions can look up parameters of the type plugins, construct the hash map of arc-project-resource-names to physical IDs, and provide that object on the arc/functions module itself, i.e. arc.services. So a parameter named MyAppStaging/plugins/copper-plugin-cognito-user-pool/MyUserPool whose value contains the physical ID for the Cognito User Pool "MyUserPool" would yield a service map in arc functions containing the physical ID through a reference like arc.services.plugins['copper-plugin-cognito-user-pool'].MyUserPool.
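Concretely, the resulting service-map lookup might look like this (all names and the pool ID are illustrative, not real values):

```javascript
// Illustrative shape of arc.services for a plugin-created resource
const services = {
  plugins: {
    'copper-plugin-cognito-user-pool': {
      MyUserPool: 'us-west-2_EXAMPLE' // physical ID surfaced via SSM
    }
  }
}
// A Lambda would hand this physical ID to the AWS SDK, e.g.
// cognito.listUsers({ UserPoolId: poolId })
const poolId = services.plugins['copper-plugin-cognito-user-pool'].MyUserPool
```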

Using the arc.services "service map" at runtime, any Lambda could then use the AWS SDK to interact with the service in question. Perhaps there is a stretch/bonus goal available here for somehow mounting a plugin-authored API wrapper around instantiating the AWS SDK resource / instance appropriately, but that is IMO syntactical sugar, plus we'd need to tackle the issue of how to distribute the plugin API wrapper across all project Lambdas (src/shared? munging the architect/functions module packaged in each Lambda with the plugin API wrapper?).


I would prefer to aim for the latter option. One goal I have in my mind for arc plugins is to get to a reality where all built-in arc pragmas are authored as plugins. I think that yields a clean API and implementation and also provides loads of examples to the arc community for how to author their own plugins.

filmaj self-assigned this Mar 8, 2021

@brianleroux (Member):

I def think we need to execute on option 2. The added bonus scope here is this will enable us to generalize the underlying storage mechanism for discovery and, should we need to, change it out. SSM is great but folks might want to use Secrets Manager or CloudMap, and enabling that flexibility would be extra awesome. (Also love the idea of making this abstraction complete enough that we can begin moving core primitives into plugins!)

@filmaj (Member, Author) commented Mar 9, 2021

One thing that we should keep in mind is ensuring a third-party local development experience can be crafted with this mechanism. Not sure if that comes built into this mechanism somehow and we keep the details transparent to the plugin author, or if we have to explicitly surface that.

Use case to use as a thought experiment for this is the way DynamoDB tables work today. Based on the value of NODE_ENV, arc/functions will look to connect to either a local dynalite or a remote DynamoDB instance. This kind of support would be a prerequisite, I think. So, reasoning through not just the way in which service discovery works in a plugin-based reality, but also how it interacts with the existing plugin sandbox hooks.
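A minimal sketch of that NODE_ENV-style switch: pick a local endpoint when testing, otherwise let the AWS SDK resolve the real service from its default config chain. The local port is an assumption here, standing in for whatever sandbox's local DynamoDB listens on:

```javascript
// Local vs remote DynamoDB config switch, keyed off the stage/env value.
// The localhost port is a hypothetical stand-in for sandbox's dynalite.
function dynamoConfig (env) {
  if (env === 'testing') {
    return { endpoint: 'http://localhost:5000', region: 'us-west-2' }
  }
  return {} // defer to the AWS SDK's default region/credential resolution
}
```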

@kristoferjoseph (Contributor) commented Mar 12, 2021

I ran into a similar issue while adding configurable file paths to our starter app repos.

We need this service discovery for our local sandbox experience as well.

Example:
When a user has configured a file path, say /api instead of the default src/http/get-api/, and is using the root package.json as the project's Architect manifest, then @begin/data looks in the wrong place for the Architect manifest, resulting in a reference error when it can't find a table named "data".

There are two ways to fix this:

  1. We check a bunch of places and let Satan take the wheel.
     • This will work most of the time, until someone has configured a deeply nested directory structure.
  2. We write an environment variable from sandbox when it starts up that is the path to the Architect manifest file to use.

@filmaj my 2 sounds a lot like your 1

Should we do both of your solutions?
I can add writing an environment variable today.

@brianleroux (Member):

I'm thinking we may want to make sandbox WAY more powerful and expose an API from it…maybe on another port idk. But it'd be cool to just ask the sandbox for table names, etc since it already looked those up. We can switch on ARC_ENV to figure out if we're doing that at runtime.

@kristoferjoseph (Contributor) commented Mar 12, 2021

@brianleroux I agree that a sandbox api could be generally useful.

Currently arc modules interrogate inventory for resources, but they still need to know where the arc manifest is to do so.
We could make a runtime helper that is arc manifest location aware* and solely returns inventory data.
This runtime helper would have access to a map of resources and IDs.

*Locally sandbox finds the arc manifest at start time. Inside a Lambda all deps are copied to root or node_modules so we always know the arc manifest will be at process.cwd()

How do we avoid collisions and duplication when plugins generate resources?
Do these plugins have access to life-cycle hooks for the CI-CD flow?

@filmaj (Member, Author) commented Mar 13, 2021

Should we do both of your solutions?
I can add writing an environment variable today.

@dam exactly why I bring this up. Since environment variables seem to be 'good enough' for folks today, is that sufficient? Is expanding arc with this 'service map' and runtime discovery ability for plugins going too far?

How do we avoid collisions and duplication when plugins generate resources?

For plugin-generated Lambdas, the logical IDs are pre-determined if you use the package-provided helper function. But anything else is the wild west.

Do these plugins have access to life-cycle hooks for the CI-CD flow?

As it stands, the current plugin package method (and @macros method) executes pre-deploy. Plugins also now have sandbox startup and shutdown hooks. I could see this list of hooks expanding (pre and post destroy, for example).

@kristoferjoseph (Contributor):

@filmaj for my current use case an environment variable will work.
For plugins I don't think it is overkill since I can see plugins having more specific use cases that would be hard to satisfy with just an environment variable.

@filmaj (Member, Author) commented Mar 16, 2021

As I've been building more and more plugins, I've run into a limitation of the environment variable approach. As you add more plugins and/or macros, you can't rely on the technique of adding environment variables to all Lambdas in the CloudFormation template. Since each plugin or macro runs once, and in sequence, middleware-style, each plugin's execution context is not aware of any plugins that may wire themselves up to the CloudFormation template later. If I have two plugins that register their services by leaving their fingerprints on every Lambda's environment variables, the total set of Lambdas is incomplete for the first plugin by the time the second one runs.

@filmaj (Member, Author) commented Mar 16, 2021

Building upon the idea of using environment variables, what if we used the CloudFormation template's Outputs section as a means of injecting environment variables into all of an arc app's runtime functions? It's familiar to macro and plugin authors, and intended for descriptive information about the makeup of a stack - exactly what we need. The CloudFormation Outputs docs' second sentence reads "For example, you can output the S3 bucket name for a stack to make the bucket easier to find."

Arc would convert CFN Outputs into runtime environment variables for all Lambdas in the app in a single pass, after all macros/plugins have spit out their CFN modifications (including any new Outputs).
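That single pass could be sketched roughly like so. The `ARC_OUTPUT_` env var prefix is an assumption for illustration, not an agreed-upon convention:

```javascript
// Hypothetical single-pass transform: copy every CFN Output's Value
// expression into each Lambda's environment variables.
function outputsToEnvVars (cloudformation) {
  let outputs = cloudformation.Outputs || {}
  for (let resource of Object.values(cloudformation.Resources)) {
    if (resource.Type !== 'AWS::Serverless::Function') continue
    let props = resource.Properties
    props.Environment = props.Environment || { Variables: {} }
    props.Environment.Variables = props.Environment.Variables || {}
    for (let [ name, output ] of Object.entries(outputs)) {
      // ARC_OUTPUT_ prefix is a hypothetical naming convention
      props.Environment.Variables[`ARC_OUTPUT_${name.toUpperCase()}`] = output.Value
    }
  }
  return cloudformation
}
```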

@filmaj (Member, Author) commented Mar 16, 2021

Currently there's a max of 200 Outputs per stack (quotas here). That should be enough environment variables for anyone, right? But, unfortunately, there's a 4kb limit on env vars per Lambda. Would that be enough?

One benefit to this approach is a performance boost: arc/functions no longer has to do the SSM lookup when Lambdas boot up (it instead uses environment variables, which are already there). I would guess that would shave off a few ms?

One downside is that this makes the service map 'static' per deploy. I suppose it already is - arc doesn't provide an API to modify the SSM-based service map - but I never thought about making it dynamic. Not sure if that would be useful...

@filmaj (Member, Author) commented Mar 16, 2021

I'm going to take a stab at rejigging arc/functions so that the SSM lookup happens once, at Lambda bootup, for all SSM parameters associated with the app, and assembles tables, events, queues and plugins parameters in one go on startup. Currently, this happens separately for each of tables, events and queues; meaning, every time you invoke arc.events' or queues' publish, we do a separate SSM lookup (in fact, every time you call publish with a different topic name, it does a separate SSM lookup).

Instead, when arc/functions boots up, can we execute a single recursive SSM lookup for all parameters, assemble a 'service map', and use that service map to create the usual arc/functions runtime interfaces for events, tables and queues (not to mention populate the service map for plugin use)?

The premium support article I linked to above about the 4kb env var max even says to use SSM if you need more than 4kb of env vars 😂. So, I think, moving forward, we want plugin authors to standardize on SSM because:

  1. There is effectively no limit on SSM parameters, whereas as arc app users consume more and more plugins, we would quickly hit the env var limit, and
  2. No race condition / order-of-plugins issues when it comes to ensuring parameters are distributed across all lambdas.

Might be tricky to implement it as the lookup is async, but I'll see what I can hack up.
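The async lookup loop might look something like this sketch. The `ssm` client is injected; with the real AWS SDK v2 the call would be `ssm.getParametersByPath(args).promise()`, and GetParametersByPath pages through results via `NextToken`:

```javascript
// Sketch of a single recursive, paginated SSM parameter fetch.
// `ssm` is an injected client exposing getParametersByPath.
async function getAllParams (ssm, path) {
  let parameters = []
  let NextToken
  do {
    let page = await ssm.getParametersByPath({ Path: path, Recursive: true, NextToken })
    parameters.push(...page.Parameters)
    NextToken = page.NextToken // undefined on the last page
  } while (NextToken)
  return parameters
}
```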

@filmaj (Member, Author) commented Mar 16, 2021

I hacked up a branch of arc/functions to use the generalized service discovery. It does the discovery once on bootup (so we save on SSM roundtrips). Identical existing events/queues/tables API. But now there's also an arc.services promise that you can get a service map from (it resolves once the recursive SSM get completes on bootup, and resolves immediately any time it is awaited afterwards).

Here's the compare view: https://github.com/architect/functions/compare/service-map?expand=1

It's mostly the same code, the main differences are:

  1. The services promise instantiated in src/index.js.

  2. How events, tables, queues use this services promise instead of calling into lookup (which does the recursive SSM search)

  3. The SSM lookup is generalized to support a service map of arbitrary depths by mapping each / character in SSM names as being something one level deeper. To date all SSM names have been of the form /<CloudFormationStackName>/<ResourceType>/<ResourceName>. This is still supported, but if you want to nest that deeper (for whatever reason), you can. This parameter name format is compiled by lookup to yield a single object, aka the service map, that looks something like this:

     {
       events: {
         sum: 'arn:aws:sns:us-west-2:417817716689:PluginTimestreamDemoStaging-SumEventTopic-DGK81FX4TO5T'
       },
       queues: {
         ting: 'https://sqs.us-west-2.amazonaws.com/417817716689/PluginTimestreamDemoStaging-TingQueue-W4HZTI3LM2V7.fifo'
       },
       static: {
         bucket: 'plugintimestreamdemostaging-staticbucket-juq4ab2l4fry',
         fingerprint: 'true'
       },
       tables: { data: 'PluginTimestreamDemoStaging-DataTable-AL710VWYNI9M' }
     }
    

So the idea is that if plugin authors want to surface some data about their plugin resources or services, they simply add an SSM Parameter whose name follows the format /<CloudFormationStackName>/<ResourceType>/<ResourceName> and whose value contains whatever information needs to be surfaced at runtime: physical IDs, ARNs, URLs, names, etc.
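The parameter-name-to-nested-object compilation described above can be sketched as a small reducer (a sketch under the stated naming convention, not the actual code in the branch):

```javascript
// Compile /<Stack>/<type>/.../<name> SSM parameters into a nested
// service map, supporting arbitrary depth via the / separator.
function toServiceMap (parameters) {
  let map = {}
  for (let { Name, Value } of parameters) {
    // drop the leading slash and the stack-name segment
    let keys = Name.replace(/^\//, '').split('/').slice(1)
    let leaf = keys.pop()
    // walk (creating as needed) down to the parent node, then set the leaf
    let node = keys.reduce((acc, key) => acc[key] = acc[key] || {}, map)
    node[leaf] = Value
  }
  return map
}
```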

@brianleroux (Member):

Think we'll want to defer init to the callee (talked about this a bit on Slack), otherwise this hit happens on every cold start and larger graphs could be expensive. This does mean the implementor needs to consider caching on the runtime lookup side, usually best done outside the handler logic so warm invocations can reuse the values.
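That deferred-init-plus-caching pattern might look like this sketch, where `load` stands in for the expensive SSM lookup:

```javascript
// Module-scope cache: the first caller kicks off the lookup; every
// later (warm) invocation reuses the same pending or settled promise.
let servicesPromise
function getServices (load) {
  if (!servicesPromise) servicesPromise = load()
  return servicesPromise
}
```

Because the cache lives at module scope (outside the handler), it survives across warm invocations of the same Lambda container.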

I'm going to take a stab at sandbox tomorrow with @kristoferjoseph … think we'll shim an endpoint /_arc that returns a payload that looks like the above for local resource discovery. In a future version we could (maybe should) implement the SSM interface so the runtime lookup semantics can be shared.

@filmaj (Member, Author) commented Mar 16, 2021

Ya good point re: deferred init. I'll tackle updating that.

Sounds good re: sandbox. Makes sense re: shimming the SSM params interface. Would need to mock a single API (getParametersByPath).

@filmaj (Member, Author) commented Mar 18, 2021

K I updated the service-map branch of functions with the 'deferred init' functionality.

So, with the enhancement PR to sandbox adding to it a new internal arc service discovery API (architect/sandbox#557), I would say we should wire that up to the service discovery mechanism in arc. One way I can think of connecting that work with the plugin work is to add a new method to the plugins interface, perhaps dataExports or serviceDiscovery, that would "export" whatever data the plugin needs for use in runtime functions. The plugin author would implement this method to return a JS object mapping names to scalar values (strings, numbers, booleans). package would use this to automatically assemble SSM parameters for each plugin before a deploy, and sandbox would use this when it mounts the new /_asd API. This way plugins could implement runtime clients/libraries for use with their plugin that would work in both deployed and sandbox contexts.

FYI, a very back-of-the-napkin benchmark, but I did some testing with an arc app deployed to us-west-2 using the service-map branch of arc/functions. Doing the SSM round trip in a deployed app to pull in parameters takes about 100ms (and this is only for a single page of SSM parameters; some of these operations may require multiple calls to SSM, as SSM parameter search only supports retrieving 10 parameters at a time).

With those timings in mind, I am thinking that perhaps in the future we could make this service discovery mechanism in arc smarter so that it can leverage one or more backing services to do service discovery. I am thinking we could try to leverage environment variables in functions as much as we can up to its maximum (of 4kb of env var data per function) before deferring to SSM. There is zero latency in using environment variables. Just some food for thought.

@ryanblock (Member):

An idea to cut potential SSM latency: aggregate all service discovery into a single parameter (as JSON) instead of across many params potentially requiring pagination.

@filmaj (Member, Author) commented Mar 18, 2021

Maximum free tier parameter value size is 4kb - same as environment variables. There is an 'advanced tier' for Parameter Store that bumps this limit to 8kb, but it charges $0.05 per parameter per month and $0.05 per 10,000 API interactions; I don't think that's feasible for arc, since every (pessimistic/non-cached) lambda invocation would very likely require that API call.

Given these constraints, still think we should look at creating a standard interface in arc that leverages environment variables for service discovery - but really this would be more like deploy-time config injection and less like a dynamic service discovery mechanism.

@brianleroux (Member):

falls apart at scale also (e.g. at Begin we'd be over the env var limits too) … think SSM is the right solution, and caching values on cold start makes it a mostly rare 100ms hit, which in most cases is totally acceptable.

@filmaj (Member, Author) commented Mar 18, 2021

aight

@filmaj (Member, Author) commented Mar 19, 2021

Regarding this point I made earlier:

So, with the enhancement PR to sandbox adding to it a new internal arc service discovery API (architect/sandbox#557), I would say we should wire that up to the service discovery mechanism in arc. One way I can think of connecting that work with the plugin work is to add a new method to the plugins interface, perhaps dataExports or serviceDiscovery, that would "export" whatever data the plugin needs for use in runtime functions. The plugin author would implement this method to return a JS object mapping names to scalar values (strings, numbers, booleans). package would use this to automatically assemble SSM parameters for each plugin before a deploy, and sandbox would use this when it mounts the new /_asd API. This way plugins could implement runtime clients/libraries for use with their plugin that would work in both deployed and sandbox contexts.

I've prototyped this out in deploy, sandbox and functions:

  1. deploy will now look for a variables function in the plugin module export and assemble SSM Parameters based on its output. This will execute as almost the last thing before deploy hands everything off to CloudFormation (see this deploy diff I just pushed up)
  2. sandbox's new /_asd API will look for this same variables function and register the plugin variables in the service map sandbox keeps (see this sandbox addition I just pushed up)
  3. functions will have a tweaked service discovery lookup mechanism that searches SSM Parameters when not running locally, otherwise will ask sandbox's /_asd API for the service map (see this functions commit I just pushed up)

Finally, what would this look like from the plugin author/module point of view? Like so:

module.exports = {
  variables: function s3ImageBucketVars ({ arc, stage /* , inventory */ }) {
    if (!arc['image-bucket']) return {};
    const isLocal = stage === 'testing';
    const bukkit = `${arc.app}-image-bucket`;
    // we need to expose the S3 bucket name, access key and secret key for uploading to the bucket
    return {
      accessKey: isLocal ? 'S3RVER' : { Ref: 'ImageBucketCreds' },
      name: bukkit,
      secretKey: isLocal ? 'S3RVER' : { 'Fn::GetAtt': [ 'ImageBucketCreds', 'SecretAccessKey' ] }
    };
  },
  // ... other plugin methods ...
};

What do you think?

Based on this, I can write a client library/wrapper for my s3 image bucket plugin that uses arc/functions service discovery / service map in BOTH sandbox and deployed contexts, and provide an identical local dev experience in sandbox as the functionality when the plugin is deployed to AWS.

@filmaj (Member, Author) commented Apr 1, 2021

Bumping this thread! @ryanblock / @brianleroux WDYT about my last comment?

I like that this gives me as a plugin consumer transparent plugin support in both sandbox and deployed contexts. I dislike adding another method that plugin authors must implement. I'm torn!

@ryanblock (Member):

This is dope! I prototyped mocking SSM this weekend instead of /_asd, want to look at that as an alt approach?

@filmaj (Member, Author) commented Apr 1, 2021

Yes!

@filmaj (Member, Author) commented Apr 8, 2021

Thanks for the PR prototype for the SSM mock in sandbox @ryanblock! (for reference, that is here: architect/sandbox#563)

With that PR as a base, I will continue with a service discovery prototype using my above-mentioned new variables plugin method interface. Summary of required and completed work on the prototype thus far:

  • deploy support (of variables being translated into SSM parameters in CloudFormation transparently to the plugin author) would remain as-is.
  • the sandbox SSM mock would need to be expanded to reach into plugins and use their variables method and wire those up to the fake SSM server
  • functions would need tweaking to use the sandbox SSM mock instead of the old /_asd route prototype
  • expand on the arc.codes plugin docs to describe how to use all of this (first stab at service discovery documentation arc.codes#341)
  • all of the above modules would need new tests to cover all of this - tho maybe I wait on implementing these until after the prototype is complete in case we decide the API needs changing (wouldn't surprise me tbh)

Will keep everyone posted on progress in here as I move forward.

@filmaj (Member, Author) commented Apr 9, 2021

Took a first stab at writing up the plugin and (node) runtime documentation expansion to tackle this: architect/arc.codes#341

@filmaj (Member, Author) commented Apr 13, 2021

I was having a read through the first draft of the docs and I'm not sure about the arc.services / await arc._loadServices() runtime API. Feels like we should combine that into a single let services = await arc.services() method for symmetry with arc.tables().

Thoughts / preferences?

@filmaj (Member, Author) commented Apr 13, 2021

Went ahead and updated the docs to collapse the two into a single let services = await arc.services() call. Cleaner and consistent with the rest of the arc/functions API.

@filmaj (Member, Author) commented Apr 13, 2021

Updated the arc/functions API (as described in the docs) here: architect/functions#413

@filmaj (Member, Author) commented Apr 16, 2021

OK, 4 PRs are up; anyone interested can start reviewing them:

Going to work on updating the tests for these as folks hopefully can drop reviews.

I got the above combination working for an S3 Image Bucket plugin I have been working on, both locally and remotely.

@filmaj (Member, Author) commented Apr 20, 2021

Note to self: need to do a once-over of the docs for this and see what's missing. I think at a minimum the interfaces for extending the built-in http/events/tables services.

@filmaj (Member, Author) commented Apr 20, 2021

K expanded on the docs re: built-in sandbox services. I think the PRs are ready to merge, or at least cut an RC with.

@filmaj (Member, Author) commented Apr 23, 2021

Update: I now have 5 PRs up related to this work; anyone interested can start reviewing them:

The package PR includes the renamed createLambdaJSON -> createFunction helper method. This PR needs to be merged/deployed first before it can be leveraged in the deploy PR.

I also just updated the docs PR with all of these latest createFunction and invokeFunction additions.

Need to still work on the sandbox PR to get the tests passing.

Inching closer!

@filmaj (Member, Author) commented Apr 27, 2021

OK, Sandbox tests passing, I added a few more unit tests to the new modules and one extra integration test.

I think this is ready to cut an RC from.

@filmaj (Member, Author) commented May 11, 2021

A few final todo items:

Aiming to cut an RC from this tomorrow.

@filmaj (Member, Author) commented May 25, 2021

Alright, this is now merged into all the master branches and deployed as:

  • functions 3.14.0
  • architect 8.6.0, which bundles:
    • inventory 1.3.3
    • package 6.2.1
    • hydrate 1.9.11
    • create 1.4.2
    • deploy 2.6.0
    • sandbox 3.6.0
    • destroy 1.2.1
    • env 1.2.1
    • logs 2.1.1
