Please cache results #15

Open
cpaelzer opened this issue Feb 10, 2020 · 3 comments


@cpaelzer

Hi,
I reviewed the general concepts of EIC a while ago and now want to file this report for discussion. One thing I started to wonder about is what happens with, for example, remotely driven automation that may issue hundreds of ssh calls per second.

Obviously one could say "push a script to the system and execute that", but many automation solutions just don't work that way. In that context EIC works like an amplifier: each of those ssh logins triggers a multitude of curl calls, each adding latency and overhead.

I was wondering if it would seem reasonable to you to rate-limit this.
You could use timestamps and only re-check everything once every x seconds.

The first login won't find a timestamp and has to work it out, but every later login for some time doesn't need to do the same work over and over again.
That could help scalability and drop overhead a lot at almost no loss IMHO.
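To make the idea concrete, here is a minimal sketch of such a timestamp check. It is not taken from the EIC code: `TtlKeyCache`, `fetch`, and `ttl_seconds` are hypothetical names, and `fetch` stands in for whatever expensive lookup (the curl calls) a login triggers today. It also assumes the state lives in one long-running process; if the lookup runs as a fresh process per login, the timestamp and result would have to be persisted somewhere outside it.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlKeyCache:
    """Re-run the expensive lookup at most once per `ttl_seconds` per user."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # hypothetical: the expensive key lookup
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # user -> (time of last lookup, keys returned by that lookup)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[0] < self._ttl:
            # Fast path: a recent lookup exists, skip the remote calls.
            return entry[1]
        # Slow path: no timestamp yet, or it is too old. Do the full lookup.
        keys = self._fetch(user)
        self._entries[user] = (now, keys)
        return keys
```

With a TTL of `x` seconds, a burst of logins for the same user costs one lookup per `x` seconds instead of one per login. The trade-off is that a key change can take up to `x` seconds to become visible.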

I have not found an "I already got my auth data, fast-path skip" in the code - if there is one that I missed, please just let me know and consider this almost resolved :-)

@LordAlfredo
Contributor

Thank you for the request. I will bring this up with our product management, but I would not get my hopes up. There are two reasons it is unlikely: security and product goals.

I will avoid getting too deep into the system threat model, but as part of key verification at some point the active key list must be processed. By doing all of this within the scope of the ssh daemon's memory, there is practically nothing for malicious software to manipulate - it would have to crack the daemon process memory to add an undesired key, which would mean your system would have to already be totally compromised. On the other hand, a cache introduces a new potential attack surface.

As for product goals, the main goals of EC2 Instance Connect are to:

  • Simplify ssh key management on instances for end customers
  • Enable an ssh "pseudo-session" to the instance using IAM credentials
  • Enable scoping of ssh keys to a single session
    I won't get into the full details of why a "single session" key is desirable here, but there are a number of benefits from a security and auditing standpoint.

In an absolutely perfect world, we would not be doing the key timestamp piece that you've noted. Instead, a key would be trusted by the ssh daemon once and then never again (unless it was published through EIC a second time). The problem is, it turns out doing this is incredibly complex - if you check the instance's auth logs, you can even see that the ssh daemon pulls the set of available ssh keys multiple times. It's much more nuanced than just "trust this specific request ID once" and would either require a full-featured sibling daemon for sshd to hook into or would require deep changes to sshd itself. The 60 second expiration is an approximation for single-session scoping without needing to make these deeper, riskier changes to the ssh daemon (60 seconds in particular was chosen as sufficient time for all parts of the ssh handshake to complete in all testing).
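As an illustration only, the 60 second expiration described above amounts to a filter like the following. The record layout (`PublishedKey`, `published_at`) and the function name are assumptions made for this sketch; the thread does not show how EIC actually stores or reads key timestamps.

```python
from typing import Iterable, List, NamedTuple

KEY_LIFETIME_SECONDS = 60  # the expiration window mentioned in the thread


class PublishedKey(NamedTuple):
    public_key: str
    published_at: float  # hypothetical field: Unix time the key was published


def active_keys(
    keys: Iterable[PublishedKey],
    now: float,
    lifetime: float = KEY_LIFETIME_SECONDS,
) -> List[str]:
    """Return the keys that are still inside their lifetime window at `now`."""
    return [
        k.public_key
        for k in keys
        # Reject expired keys and keys stamped in the future.
        if 0 <= now - k.published_at < lifetime
    ]
```

Because the filter is stateless, it gives the same answer however many times the ssh daemon asks for the key list during one handshake, which is why a time window is a workable stand-in for "trust this key once".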

@cpaelzer
Author

Thanks for the answer @LordAlfredo, I can see the threat model POV here. Maybe this can be a long-term goal implemented inside the ssh daemon itself (or a plugin, a sibling daemon, or maybe even a PAM module or something like it). That could grant the benefits of reduced overhead and latency without adding the attack surface that an on-disk cache of any kind would.

@raharper

What about making use of the kernel keyring to store the session data and timestamps needed to implement a cache?
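A rough sketch of what that could look like, driving the `keyctl` command-line tool from keyutils. Everything here is an assumption for discussion: the helper names, the use of the user keyring (`@u`), and the description format are made up, and whether the account running the key lookup has a usable keyring at all would need checking.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]


def run_keyctl(args: List[str]) -> str:
    """Run `keyctl` with `args`; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        ["keyctl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def store(description: str, payload: str, ttl_seconds: int,
          run: Runner = run_keyctl) -> str:
    """Add a `user` key to the user keyring and let the kernel expire it."""
    # `keyctl add` prints the ID of the new key.
    key_id = run(["add", "user", description, payload, "@u"])
    # The kernel removes the key after ttl_seconds, so no timestamp file is needed.
    run(["timeout", key_id, str(ttl_seconds)])
    return key_id


def load(description: str, run: Runner = run_keyctl) -> Optional[str]:
    """Return the cached payload, or None if it is missing or has expired."""
    try:
        key_id = run(["search", "@u", "user", description])
        return run(["pipe", key_id])
    except subprocess.CalledProcessError:
        return None
```

Usage would be along the lines of `store("eic:alice", keys_blob, 60)` after a full lookup and `load("eic:alice")` on later logins. Note that passing the payload as an argument exposes it in the process list; `keyctl padd` reads it from stdin instead.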


3 participants