Service Discovery for @plugins
#1091
I def think we need to execute on option 2. The added bonus scope here is this will enable us to generalize the underlying storage mechanism for discovery and, should we need to, change it out. SSM is great, but folks might want to use Secrets Manager or CloudMap, and enabling that flexibility would be extra awesome. (Also love the idea of making this abstraction complete enough that we can begin moving core primitives into plugins!)
One thing that we should keep in mind is ensuring a third-party local development experience can be crafted with this mechanism. Not sure if that comes built into this mechanism somehow and we keep the details transparent to the plugin author, or if we have to explicitly surface that. The use case to use as a thought experiment for this is the way DynamoDB tables work today. Based on the value of …
I ran into a similar issue while adding configurable file paths to our starter app repos. We need this service discovery for our local sandbox experience as well. Example: There are two ways to fix this:
@filmaj my 2 sounds a lot like your 1. Should we do both of your solutions?
I'm thinking we may want to make sandbox WAY more powerful and expose an API from it…maybe on another port, idk. But it'd be cool to just ask the sandbox for table names, etc. since it already looked those up. We can switch on ARC_ENV to figure out if we're doing that at runtime.
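A minimal sketch of what that ARC_ENV switch could look like; the sandbox URL, port, and path here are made up for illustration, not an actual sandbox endpoint:

```javascript
// Hypothetical sketch: pick a discovery backend based on ARC_ENV.
// 'testing' is the value sandbox sets locally; anything else is a
// deployed environment where SSM is the source of truth.
function discoveryBackend (env) {
  if (env === 'testing') {
    // Locally, ask the sandbox process, which already looked up table
    // names etc. at startup; port and path are illustrative only
    return { kind: 'sandbox', url: 'http://localhost:3333/_arc/services' }
  }
  // Deployed: fall back to SSM Parameter Store
  return { kind: 'ssm' }
}
```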
@brianleroux I agree that a sandbox API could be generally useful. Currently arc modules interrogate inventory for resources, but they still need to know where the arc manifest is to do so. Locally, sandbox finds the arc manifest at start time. Inside a Lambda, all deps are copied to root or node_modules, so we always know where the arc manifest will be. How do we avoid collisions and duplication when plugins generate resources?
@dam exactly why I bring this up. Since environment variables seem to be 'good enough' for folks today, is that sufficient? Or is expanding arc with this 'service map' and runtime discovery ability for plugins going too far?
For plugin-generated lambdas, the logical IDs are pre-determined if you use the …
As it stands, the current plugin …
@filmaj for my current use case, an environment variable will work.
As I've been building more and more plugins, I've run into a limitation of the environment variable approach. As you add more plugins and/or macros, you can't rely on the technique of adding environment variables to all Lambdas in the CloudFormation template. Since each plugin or macro runs once, and in sequence, middleware-style, each plugin's execution context is not aware of any plugins that may wire themselves up to the CloudFormation template in the future. If I have two plugins that register their services by leaving their fingerprints on every Lambda's environment variables, the total set of Lambdas is incomplete for the first plugin by the time the second one is running.
Building upon the idea of using environment variables, what if we used the CloudFormation template's Outputs? Arc would convert CFN Outputs into runtime environment variables for all Lambdas in the app in a single pass after all macros/plugins have spit out their CFN modifications - including new Outputs. These Outputs get automatically mapped to env vars for all Lambda runtimes.
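A rough sketch of what that single pass could look like; this is illustrative rather than arc's implementation, and the env var naming convention is an assumption:

```javascript
// Sketch of the single-pass idea: after all macros/plugins have
// modified the CloudFormation template, copy every Output into the
// environment variables of every Lambda in the template
function outputsToEnvVars (template) {
  let outputs = template.Outputs || {}
  for (let resource of Object.values(template.Resources || {})) {
    if (resource.Type !== 'AWS::Serverless::Function') continue
    let props = resource.Properties = resource.Properties || {}
    let env = props.Environment = props.Environment || {}
    env.Variables = env.Variables || {}
    for (let [ name, output ] of Object.entries(outputs)) {
      // Output Values are usually Ref / GetAtt expressions that
      // resolve to physical IDs at deploy time
      env.Variables[name.toUpperCase()] = output.Value
    }
  }
  return template
}
```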
Currently there's a max of 200 Outputs per Stack (quotas here). That should be enough environment variables for anyone, right? But, unfortunately, there's a 4KB limit on env vars per Lambda. Would that be enough? One benefit to this approach is a performance boost: no SSM round trip at runtime. One downside is that this makes the service map 'static' per deploy. I suppose it already is - arc doesn't provide an API to modify the SSM-based service map - but I never thought about making it dynamic. Not sure if that would be useful...
I'm going to take a stab at rejigging arc/functions so that the SSM lookup happens once, at Lambda bootup, for all SSM parameters associated with the app, and assembles table, events, queues and plugins parameters in one go on startup. Currently, this happens separately for each of tables, events and queues, meaning every time you invoke … Instead, when arc/functions boots up, can we execute a single recursive SSM lookup for all parameters, assemble a 'service map', and use that service map to create the usual arc/functions runtime interfaces for events, tables and queues (not to mention populate the service map for plugin use)? The premium support thing I linked to above about the 4KB env var max even says that if you need more than 4KB of env vars, to use SSM 😂. So, I think, moving forward, we want plugin authors to standardize on SSM because:
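The single-lookup idea can be sketched as a pure function over the parameter list; in a real implementation the list would come from paginated `GetParametersByPath` calls, and the parameter name scheme follows the convention discussed in this issue:

```javascript
// Sketch: build a service map from one recursive SSM lookup. The
// `params` array stands in for results of paginated calls to SSM's
// GetParametersByPath({ Path: `/${StackName}`, Recursive: true })
function toServiceMap (params) {
  let map = {}
  for (let { Name, Value } of params) {
    // '/MyAppStaging/tables/cats' -> [ 'tables', 'cats' ]
    // (the stack name segment is dropped)
    let parts = Name.split('/').filter(Boolean).slice(1)
    let node = map
    while (parts.length > 1) {
      let key = parts.shift()
      node = node[key] = node[key] || {}
    }
    node[parts[0]] = Value // leaf holds the physical ID
  }
  return map
}
```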
Might be tricky to implement as the lookup is async, but I'll see what I can hack up.
I hacked up a branch of arc/functions. Here's the compare view: https://github.com/architect/functions/compare/service-map?expand=1 It's mostly the same code; the main differences are:
So the idea is that if plugin authors want to surface some data about their plugin resources or services, they simply add an SSM Parameter whose name follows the `CloudFormationStackName/plugins/pluginName/resourceName` format.
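The deploy-time half might look roughly like this; the helper name and logical ID scheme are illustrative, not arc's API:

```javascript
// Sketch: a plugin (or arc's package step) adding an AWS::SSM::Parameter
// to the CloudFormation template advertising one plugin resource under
// the /StackName/plugins/pluginName/resourceName convention
function addServiceParam (template, stackName, pluginName, resourceName, logicalId) {
  // CFN logical IDs must be alphanumeric, so strip other characters
  let id = `${pluginName}${resourceName}Param`.replace(/[^A-Za-z0-9]/g, '')
  template.Resources[id] = {
    Type: 'AWS::SSM::Parameter',
    Properties: {
      Type: 'String',
      Name: `/${stackName}/plugins/${pluginName}/${resourceName}`,
      Value: { Ref: logicalId } // Ref resolves to the physical ID at deploy
    }
  }
  return template
}
```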
Think we'll want to defer init to the callee (talked about this a bit on Slack), otherwise this hit happens every coldstart and larger graphs could be expensive. This does mean the implementor needs to consider caching on the runtime lookup side - usually best done outside the handler logic so warm invocations can reuse the values. I'm going to take a stab at sandbox tomorrow with @kristoferjoseph … think we'll shim an endpoint …
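Deferred init with warm-invocation caching can be sketched like so; `load` stands in for the actual SSM round trip:

```javascript
// Sketch of deferred init: the lookup runs on first call only, and the
// resulting promise is cached at module scope (outside the handler) so
// warm invocations reuse it instead of paying the hit again
function deferredServices (load) {
  let cached = null
  return function services () {
    if (!cached) cached = load() // first call starts the lookup
    return cached                // later calls share the same promise
  }
}
```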
Ya good point re: deferred init. I'll tackle updating that. Sounds good re: sandbox. Makes sense re: shimming the SSM params interface. Would need to mock a single API (…).
K I updated the service-map branch of functions with the 'deferred init' functionality. So, with the enhancement PR to sandbox adding a new internal service discovery arc API (architect/sandbox#557), I would say we should wire that up to the service discovery mechanism in arc. One way I can think of connecting that work with the plugin work is to add a new method to the plugins interface, perhaps …

FYI, a very back-of-napkin benchmark, but I did some testing with an arc app deployed to us-west-2 using the service-map branch of arc/functions. Doing the SSM round trip in a deployed app to pull in parameters takes about 100ms (and this is only for a single page of SSM parameters; some of these operations may require multiple calls to SSM, as SSM parameter search only supports retrieving 10 parameters at a time).

With those timings in mind, I am thinking that perhaps in the future we could make this service discovery mechanism in arc smarter so that it can leverage one or more backing services to do service discovery. I am thinking we could try to leverage environment variables in functions as much as we can, up to the maximum (4KB of env var data per function), before deferring to SSM. There is zero latency in using environment variables. Just some food for thought.
An idea to cut potential SSM latency: aggregate all service discovery data into a single parameter (as JSON) instead of across many parameters, potentially requiring pagination.
Maximum free tier parameter value size is 4KB - same as environment variables. There is an 'advanced tier' for Parameter Store that bumps this limit to 8KB, but it charges $0.05 per parameter per month, and $0.05 per 10,000 API interactions; I don't think that's feasible for arc, since every (pessimistic/non-cached) Lambda invocation would very likely require that API call. Given these constraints, I still think we should look at creating a standard interface in arc that leverages environment variables for service discovery - but really this would be more like deploy-time config injection and less like a dynamic service discovery mechanism.
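A sketch of the 'env vars first, SSM as fallback' decision; the `ARC_SERVICES` variable name and the rough byte-counting rule are assumptions:

```javascript
// Sketch: check whether a serialized service map fits in the remaining
// Lambda env var budget (AWS caps all env vars at 4KB total), and
// defer to SSM otherwise. Counting here is approximate.
const LAMBDA_ENV_LIMIT = 4096 // bytes across all env var names + values

function pickBackend (existingEnv, serviceMapJson) {
  let used = 0
  for (let [ key, value ] of Object.entries(existingEnv)) {
    used += key.length + String(value).length
  }
  // Hypothetical single env var holding the JSON service map
  let needed = 'ARC_SERVICES'.length + serviceMapJson.length
  return used + needed <= LAMBDA_ENV_LIMIT ? 'env' : 'ssm'
}
```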
falls apart at scale also (e.g. at Begin we'd be over the env var limits too) … think SSM is the right solution, and caching values on cold start makes it a mostly rare 100ms hit, which in most cases is totally acceptable.
aight
Regarding this point I made earlier:
I've prototyped this out in
Finally, what would this look like from the plugin author/module point of view? Like so:
What do you think? Based on this, I can write a client library/wrapper for my s3 image bucket plugin that uses …
Bumping this thread! @ryanblock / @brianleroux WDYT about my last comment? I like that this gives me as a plugin consumer transparent plugin support in both sandbox and deployed contexts. I dislike adding another method that plugin authors must implement. I'm torn!
This is dope! I prototyped mocking SSM this weekend instead of /_asd, want to look at that as an alt approach?
Yes!
Thanks for the PR prototype for the SSM mock in sandbox @ryanblock! (For reference, that is here: architect/sandbox#563.) With that PR as a base, I will continue with a service discovery prototype using my above-mentioned new …
Will keep everyone posted on progress in here as I move forward.
Took a first stab at writing up the plugin and (node) runtime documentation expansion to tackle this: architect/arc.codes#341 |
I was having a read through the first draft of the docs and I'm not sure about the … Thoughts / preferences?
Went ahead and updated the docs to collapse the two into a single …
Updated the arc/functions API (as described in the docs) here: architect/functions#413
OK, 4 PRs are up - anyone interested can start reviewing them:
Going to work on updating the tests for these while folks hopefully drop reviews. I got the above combination working for an S3 image bucket plugin I have been working on, both locally and remotely.
Note to self: need to do a once-over of the docs for this and see what's missing. I think at a minimum the interfaces for extending the built-in http/events/tables services.
K expanded on the docs re: built-in sandbox services. I think the PRs are ready to merge, or at least cut an RC with.
Update: now have 5 PRs up related to this work - anyone interested can start reviewing them:
The … I also just updated the docs PR with all of these latest … Still need to work on the sandbox PR to get the tests passing. Inching closer!
OK, Sandbox tests passing. I added a few more unit tests to the new modules and one extra integration test. I think this is ready to cut an RC from.
A few final todo items:
Aiming to cut an RC from this tomorrow.
Alright, this is now merged into all the master branches and deployed as:
This is related to the `@plugins` work currently in a release candidate (arc v8.5.0-RC.3). The RFC for the `@plugins` work is in #1062.

First came `@macros`, soon we will have `@plugins`, but we're always looking to expand capabilities and tackle interesting use cases. Early on we identified that `@plugins` may need a runtime component, that is, an ability to interact with the service(s) exposed by the plugin during runtime from within a Lambda. In my arc adventures, I have recently hit the need for this ability, so I am here to sketch out the two rough solutions I can imagine and collect input from the community - whether different approaches or feedback on the existing approaches I list out.

To clarify the scope of this issue, let me use an existing feature of arc as a means of detailing what I mean by service discovery and runtime capabilities: `@tables`. `@tables` allows someone to define DynamoDB tables, but also, via the use of `@architect/functions`, gives other Lambdas within your project runtime access to these DynamoDB tables via an arc-specific API.

So this issue / brainstorm is about discussing: how can we offer this capability to third-party architect plugins?
I have two options to offer, one unstructured and one structured. This is not an exhaustive list, just what was banging around in my head. I hope this issue can be a brainstorming session 😄
1. Environment variables
I have seen this technique employed in the wild with existing `@macros`: the solution for runtime discovery is to drop an environment variable into the CloudFormation template for your application's stack that contains the physical ID of the service you are wrapping. All Lambdas have access to this environment variable, and thus can use the AWS SDK to directly reference the service via its physical ID. For an example of this approach, see the macro-upload macro, which adds an S3 bucket physical ID to the environment variables for all defined Lambdas.

2. Generalize the existing arc service discovery mechanism
The way architect implements service discovery today is composed of two parts:

1. At `package` time (right before deploying), `package` will look at the inventory of your application and, depending on what it contains, will create SSM Parameters that follow a specific naming convention. The values for these parameters contain the physical IDs of the resources in question. The parameter names follow a format like `CloudFormationStackName/resourceType/resourceName`, i.e. `MyAppStaging/events/email-outbox`. Physical IDs are needed to interact with the resource at runtime via the AWS SDK.
2. At runtime, `@architect/functions` will search these parameters and create a hash table / object lookup.

The (current) limitation for this working-today service discovery mechanism is that it is scoped only to three built-in arc pragmas: `@tables`, `@events` and `@queues`.
To generalize the first part for use with plugins (creating the SSM Parameters), perhaps the plugin interface could be expanded with another method, maybe `plugins.services`, which is a function that the plugin author must implement if they are adding resources to the CloudFormation stack whose physical IDs they want to expose at runtime. This function accepts as arguments the parsed arc project manifest and app inventory, and is expected to return an array of logical IDs used to represent the services. The `package` SSM assembler can then leverage this method to create an SSM Parameter per logical ID exposed in this way, right before deploy, containing the physical ID of the AWS resource. These plugin-specific Parameters can be namespaced so it is clear they are plugin IDs, and could be named e.g. `CloudFormationStackName/plugins/pluginName/resourceName`, i.e. `MyAppStaging/plugins/copper-plugin-cognito-user-pool/MyUserPool`.
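A hypothetical shape for such a `services` plugin method, following the sketch in this issue rather than a finalized API:

```javascript
// Hypothetical plugin module: given the parsed arc manifest and app
// inventory, return the logical IDs whose physical IDs should be
// published to SSM. Method name, signature, and logical ID are all
// illustrative.
let plugin = {
  services: function ({ arc, inventory }) {
    // This example plugin added one Cognito User Pool to the stack
    // under the logical ID 'MyUserPool'
    return [ 'MyUserPool' ]
  }
}
```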
To generalize the second part for use with plugins (`@architect/functions` retrieving the physical IDs), the same mechanism could be employed as exists today: at runtime within a running Lambda context, functions can `lookup` parameters for the type `plugins`, construct the hash map of arc-project-resource-names to physical IDs, and provide that object on the `arc/functions` module itself, i.e. `arc.services`. So a parameter with the name `MyAppStaging/plugins/copper-plugin-cognito-user-pool/MyUserPool` and a value containing the physical ID for the Cognito User Pool "MyUserPool" would yield a service map in arc functions that would contain the physical ID through a reference like `arc.services.plugins['copper-plugin-cognito-user-pool'].MyUserPool`.
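Concretely, consuming that mapping from a Lambda might look like this, with a stand-in object in place of the real module and a made-up physical ID:

```javascript
// Stand-in object shaped like the proposed service map; the shipped
// @architect/functions API may differ, and the value is a made-up
// physical ID
let arc = {
  services: {
    plugins: {
      'copper-plugin-cognito-user-pool': { MyUserPool: 'us-west-2_EXAMPLE' }
    }
  }
}

// Any Lambda could pull the physical ID and hand it to the AWS SDK, e.g.
// new CognitoIdentityServiceProvider().listUsers({ UserPoolId: poolId })
let poolId = arc.services.plugins['copper-plugin-cognito-user-pool'].MyUserPool
```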
Using the `arc.services` "service map" at runtime, any Lambda could then use the AWS SDK to interact with the service in question. Perhaps there is a stretch/bonus goal available here for somehow mounting a plugin-authored API wrapper around instantiating the AWS SDK resource / instance appropriately, but that is IMO syntactical sugar; plus, we'd need to tackle the issue of how to distribute the plugin API wrapper across all project Lambdas (`src/shared`? munging the `architect/functions` module packaged in each Lambda with the plugin API wrapper?).

I would prefer to aim for the latter option. One goal I have in mind for arc plugins is to get to a reality where all built-in arc pragmas are authored as plugins. I think that yields a clean API and implementation, and also provides loads of examples to the arc community for how to author their own plugins.