Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws-lambda) Lambda authentication via EC2 IAM roles #2777

Closed

Conversation

Mgutjahr
Copy link

@Mgutjahr Mgutjahr commented Aug 4, 2017

Summary

Adds support for querying temporary credentials from EC2 metadata service instead of fetching long-lived key-pairs from the database.

The credentials are fetched from the EC2 metadata service and are cached for 60 seconds. I assume that many users are running kong on EC2 instances. For these users this is the recommended way to authenticate requests for AWS services as key-rotation and credential provisioning are managed by AWS.

For further details refer to the AWS documentation about EC2 roles

Full changelog

  • Added kong.plugins.aws-lambda.iam-role-credentials for fetching and caching temporary credentials from the metadata service (default 169.254.169.254 port 80)
  • Added plugins config flag use_ec2_iam_role to kong.plugins.aws-lambda.schema and declared aws_key and aws_secret as optional
  • Added self_check function to check whether either use_ec2_iam_role is set to true or both aws_key and aws_secret as configured
  • Added if branch in kong.plugins.aws-lambda.handler to either fetch credentials from the metadata service or use credentials stored in database

Copy link
Contributor

@p0pr0ck5 p0pr0ck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Mgutjahr for the PR! Overall I think this will be a very useful addition. A handful of mostly minor style/nitpick comments attached. Additionally, we would like to see tests, both for the schema change, and an integration test for the plugin behavior change as well. Admittedly mocking out something for this will be challenging; perhaps draw on something like #2287 for past discussions of how we've handled tests for this plugin.

}
},
self_check = function(schema, plugin_t, dao, is_update)
if not plugin_t["use_ec2_iam_role"] or plugin_t["use_ec2_iam_role"] == false then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to use the dot notation to access table members in this case (e.g. plugin_t.use_ec2_iam_role)

local iam_role_credentials = get_credentials_from_iam_role()
opts.access_key = iam_role_credentials.access_key
opts.secret_key = iam_role_credentials.secret_key
opts.headers['X-Amz-Security-Token'] = iam_role_credentials.session_token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor style nitpick, prefer to use double quotes instead of single quotes.


local CACHE_IAM_INSTANCE_CREDS_DURATION = 60 -- seconds to cache credentials from metadata service

local function get_iam_credentials_cache_key()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could probably be a constant instead of a function


local function get_iam_credentials_from_instance_profile()
local iam_profile_credentials, err = cache.get_or_set(get_iam_credentials_cache_key(), CACHE_IAM_INSTANCE_CREDS_DURATION,
fetch_iam_credentials_from_metadata_service)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to see the fetch_iam_credentials_from_metadata_service call explicitly defining params, instead of relying on the defaults defined in the function itself. this would likely make testing easier.

iam_role_name .. "' with session token " .. iam_security_token_data['Token'])

return {
access_key = iam_security_token_data['AccessKeyId'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: prefer to use dot notation to access these field members

iam_role_name .. "' with session token " .. iam_security_token_data['Token'])

return {
access_key = iam_security_token_data['AccessKeyId'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style: spacing seems off here, please use two spaces to indent.

@@ -0,0 +1,91 @@
local http = require 'resty.http'
local json = require 'cjson'
local cache = require "kong.tools.database_cache"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style comment: please align assignment operators here

metadata_service_port = metadata_service_port or 80

local client = http.new()
local ok, err = client:connect(metadata_service_host, metadata_service_port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: please align assignment operators here

end

if role_name_request_res.status ~= 200 then
return nil, "[aws-lambda] Fetching role name from metadata service returned status code ".. role_name_request_res.status ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: can we try to adjust line breaks and such so as to avoid the line being much longer than 80 chars


local iam_security_token_data = json.decode(iam_security_token_request:read_body())

ngx.log(ngx.INFO, "[aws-lambda] Received temporary IAM credential from metadata service for role '" ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also seems like it should be a debug level log entry

@p0pr0ck5 p0pr0ck5 added pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. and removed pr/status/please review labels Aug 4, 2017
@Mgutjahr
Copy link
Author

Mgutjahr commented Aug 7, 2017

Hi @p0pr0ck5,
thanks for the detailed feedback and for pointing out references to possible solutions!

I addressed most of your recommendations and fixed the mentioned style issues. On the testing side I added a tests for the schema extension. In addition, I added an integration test which loads credentials from a mocked-out metadata service (using the same approach aws-lambda uses)

I decided not to add support for configuring the role the plugin should load as one EC2 instance can only have role configured at a time. This can be found from the AWS documentation:

An instance profile can contain only one IAM role. This limit cannot be increased.

This is also how almost all of the available AWS SDK's decided to implement support for instance roles.

Sorry again for the style issues and the missing tests.

@p0pr0ck5 p0pr0ck5 added pr/status/please review and removed pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. labels Aug 7, 2017
Copy link
Contributor

@p0pr0ck5 p0pr0ck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mgutjahr thanks for the refactors! a lot of these comments are ticky-tack style notes, just good to be consistent, certainly not a huge problem. we appreciate your feedback! at the moment we are in a merge freeze on master as the release of 0.11 stable is closing in (i believe @thibaultcha should have details on that), but barring that i think this is close to being merged. thanks again for the contribution!

@@ -81,4 +85,14 @@ return {
func = check_status,
},
},
self_check = function(schema, plugin_t, dao, is_update)
if not plugin_t.use_ec2_iam_role or plugin_t.use_ec2_iam_role == false then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this second condition is redundant, if not should handle both nil and false

metadata_service_host = metadata_service_host or '169.254.169.254'
metadata_service_port = metadata_service_port or 80

local iam_profile_credentials, err = cache.get_or_set(IAM_CREDENTIALS_CACHE_KEY, CACHE_IAM_INSTANCE_CREDS_DURATION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note here, this will require refactoring for the upcoming Kong 0.11, as the cache mechanism has changed and this function call is no longer valid. We should probably have PRs for 0.10.x and 0.11 and merge them simultaneously, with the latter being generated from rebasing this commit in its final form onto next and adjusting as needed. this might need coordination/communication with @thibaultcha prior to releasing a stable 0.11?

end

if role_name_request_res.status ~= 200 then
return nil, "[aws-lambda] Fetching role name from metadata service returned status code "..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style point: need a space between " and ..

end

if iam_security_token_request.status == 404 then
return nil, '[aws-lambda] Unable to request IAM credentials for role' .. iam_role_name ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style point: prefer double quotes " to single quote '

@@ -0,0 +1,88 @@
local http = require 'resty.http'
local json = require 'cjson'
local cache = require 'kong.tools.database_cache'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style point: prefer double quotes " to single quote ' for the entries (generally we prefer double quotes everywhere)

required = true,
},
use_ec2_iam_role = {
type="boolean",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: spacing around assignment operator

return {
access_key = iam_security_token_data.AccessKeyId,
secret_key = iam_security_token_data.SecretAccessKey,
session_token = iam_security_token_data.Token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor style comments:

  • align assignment operators
  • append a trailing comma to the end of array elements to minimize future diffs


local iam_role_name = role_name_request_res:read_body()

ngx.log(ngx.DEBUG, "[aws-lambda] IAM role '" .. iam_role_name ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here (and a few other places) we can use variadic arguments instead of string concatenation to build the message


local iam_security_token_request, err = client:request {
method = "GET",
path = "/latest/meta-data/iam/security-credentials/" .. iam_role_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: trailing comma here for consistency

AccessKeyId = "test_iam_access_key_id",
SecretAccessKey ="test_iam_secret_access_key",
Token = "test_session_token"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style: i think we're one space too far indented here?

@p0pr0ck5 p0pr0ck5 added pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. and removed pr/status/please review labels Aug 8, 2017
@Mgutjahr
Copy link
Author

Hi @p0pr0ck5,
I fixed the requested style issues. Hope it looks better now. Good call on the testing front, we should definitely add a test for a non-existing metadata service as this could be a possible error scenario.

On creating the test I realised that the metadata service call had no connection timeout configured, as this could eat up resources I added a 500ms timeout for the metadata service calls (this should be sufficient as the metadata service runs locally on EC2 instances. Requests usually take ~1ms).


local function fetch_iam_credentials_from_metadata_service(metadata_service_host, metadata_service_port)
local client = http.new()
client:set_timeout(500)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dangerous to make such assumptions considering Kong might be running in a variety of infrastructures with different limitations or requirements out there. Such timeout variables should ideally be made configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure about the validity of this comment, given the very narrow infrastructure scope of the context at play here. Adding a tunable for this seems like unnecessary cognitive load.

Copy link
Member

@thibaultcha thibaultcha Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our "mantra" of being opinionated (in order to not oversight any potential use-case or to be compatible with as many infrastructures as possible), hints that we should provide customizable timeouts. In fact there shouldn't be a single timeout (to the best of my knowledge) that isn't configurable in Kong already, including this module itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea here is that inter-infrastructure compatibility is meaningless here. Is it worth sticking to a hard-line mantra in such a case? IMO it's not, but this isn't a hill for me to die on, just chiming in that this is a very specific use case :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your concerns, I tried to omit any unnecessary config properties to keep the plugin configuration relatively clean. Especially as the timeout would be another semi-optional config property (only applies if role based authentication is used). From my experience (using native AWS sdks on a variety of platforms) this is a property you would never need to touch. But maybe there are cases where you have too.

I checked other "official" SDK's from Amazon. Most set a default timeout which cannot be changed.

The timeouts from these SDK's are higher than the one set by me, but most of timeouts mentioned above apply for all services offered by the metadata service (and not just the credentials service).

I'd suggest that I increase the timeout to 5 seconds by default and expose it as a parameter to get_iam_credentials_from_instance_profile(). - This way we can tweak it for the test and exposing it to the plugins config afterwards would be trivial if needed.

If you want me to add it to the config I can do this of course too. Just let me know what you'd prefer.

}

if not role_name_request_res or role_name_request_res == "" then
ngx.log(ngx.ERR, "[aws-lambda] Could not fetch role name from metadata service: ", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Some weird indentation here again)

},
aws_secret = {
type = "string",
required = true,
},
use_ec2_iam_role = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our policy is that no plugin config already in the database should be left in a previous version of the plugin's schema. This is because we want to avoid complicated code paths handling different versions of the same plugin's schema in the plugin business logic itself. This is a policy that has unfortunately not always been followed, and such oversight lead us to a few bugs/overhead maintenance issues recently.

So: new schema value, new migration for Cassandra/PostgreSQL :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thibaultcha,
are you referring to the use_ec2_iam_role config property introduced in this PR? - The aws_secret and aws_key were changed to semi-optional, so whatever config was in the database is still correct after the PR (it will continue to use secrets & keys from the db config if use_ec2_iam_role is not set / set to false). In other words, as long as use_ec2_iam_role is not set to true the old schema is still enforced.

Do you mean I should configure use_ec2_iam_role to a required property and set it to false by default through a migration script? I'm unsure of the benefits of this. Wouldn't it be better to remove the default=false flag here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm indeed... I do agree with that reasoning. We kind of set this "unsaid" policy of enforcing migrations when introducing new values, but optional values would not benefit from it in any way, because they might be nil per definition already.

Alright then, this sounds good! Sorry for the noise!

@p0pr0ck5
Copy link
Contributor

@Mgutjahr can you rebase this commit to fix the conflict with the rockspec? After that I think we're good to go here.

@Mgutjahr
Copy link
Author

Mgutjahr commented Sep 10, 2017

Hi @p0pr0ck5 & @thibaultcha ,
sorry for my late reply. I rebased the branch to the current master branch. Note that I had to move the caching of IAM credentials to the handler due to the change of the caching in 0.11.

@Mgutjahr
Copy link
Author

Hi @thibaultcha, do you have any updates/comments on this PR?

@p0pr0ck5
Copy link
Contributor

p0pr0ck5 commented Dec 8, 2017

@Mgutjahr sorry for the delay on our end. Apologies, since we've had another release since the last change, can we update the rockspec here? I'd like to see this merged into 0.12. @thibaultcha thoughts?

@fyntic
Copy link

fyntic commented Feb 20, 2018

Hello @p0pr0ck5 , @thibaultcha and @Mgutjahr .

Sorry for disturb you, guys, but we really need this feature available in Kong for AWS.
When do you think it might be merged and released?

Thanks in advance.

@aloisbarreras
Copy link
Contributor

Is all that needs to happen just rebasing the rockspec? I'm happy to do that real fast if that will help this get merged in.

@vvondra
Copy link

vvondra commented Jun 4, 2018

is it ok if i re-open this fixed?

@farazs
Copy link

farazs commented Jul 16, 2018

@thibaultcha is all that's left for this to update the rockspec and resolve the handler.lua conflict? seems something trivial and we can merge this.

@Mgutjahr would you be able to do that or someone else can fork this and make the necessary updates. would be really nice to get this merged since it's so close and quite useful.

@Mgutjahr
Copy link
Author

@farazs I can try to rebase it this weekend. I want to test it at least once also on an actual EC2 on AWS with the current version of kong which is why it is a bit more work than just rebasing it.

@rkbasa rkbasa mentioned this pull request Jul 23, 2018
@tarciosaraiva
Copy link

Any chance of this getting reviewed soon?

Manuel Gutjahr added 3 commits October 21, 2018 13:17
Adds support for querying temporary credentials from EC2 metadata service
instead of fetching long-lived key-pairs from the database.
The credentials are fetched from the EC2 metadata service and are cached
for 60 seconds. I assume that many users are running kong on EC2 instances.
For these users this is the recommended way to authenticate requests for AWS
services as key-rotation and credential provisioning are managed by AWS.
For further details refer to the [AWS documentation about EC2 roles](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html)
@Mgutjahr Mgutjahr force-pushed the feat/aws-lambda-instance-profile-roles branch from 76184c0 to 70c116f Compare October 21, 2018 12:47
@Mgutjahr
Copy link
Author

Mgutjahr commented Oct 21, 2018

Hey, I rebased the PR again to the current master branch. The branch could be merged if you want to. There are just two things I'd like to point out:

  • The IAM role authentication currently only works if kong is running on an EC2 instance. It would not work if Kong is running on ECS, Fargate or EKS. We can add support for ECS and Fargate but I would not do this in the same PR. Inspiration on how to do this can be found in the various official AWS SDK's out there (e.g. Java or Node ) .. there is unfortunately no real documentation out there from Amazon (at least I could not find it)
  • This PR is also covered in PR Iam roles with proxy #3639. I'm ok if you close this PR and continue on Iam roles with proxy #3639, the PR mentioned has however additional features & functionality we might want to discuss in more detail. Especially proxying of the metadata service could be a bit risky from a security perspective (as you get IAM credentials through this service and proxying could increase the attack surface for a man in the middle attack)

@Tieske Tieske added pr/do not merge and removed pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. labels Mar 15, 2019
@CLAassistant
Copy link

CLAassistant commented Jul 1, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Manuel Gutjahr seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@guanlan guanlan closed this Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.