Threat model for EdgeX Secret Store #1

Closed
wants to merge 3 commits into from

5 participants
@bnevis-i
Owner

commented Apr 2, 2019

To review the formatted document, go here and click on "View file" for README.md

@bnevis-i

Owner Author

commented Apr 3, 2019

Discussion question: Should the system management agent (SMA) be modified to pass bootstrapping secrets to services, or should system management and security be entirely separate?

Answer: No. They should be separate functionality. The current design does not hand off from a native executor to the SMA in the same way that a Linux kernel hands off to init.

@bnevis-i

Owner Author

commented Apr 3, 2019

Discussion question: Should secret management be unsupported when using Linux OS executor or should Linux OS executor require UID/GID separation of users?

Answer: Yes. The two supported executors in the reference implementation are docker and snap. Implementations based on standard processes will be missing important threat mitigations.

@tonyespy

left a comment

Wow...

Pretty thorough assessment Bryon!

This is a first pass at a review, but I have to say I didn't expect such a comprehensive and lengthy assessment. I also mention somewhere in my comments that at first I thought this document was an assessment of the current state of security, but in fact this is a full-blown detailed design and plan for mitigating the existing vulnerabilities over the course of multiple releases.

It's not clear to me how much of this is actually being proposed for Edinburgh vs. the recent proposal put forth by Jim & Ting-yu. Frankly, there's enough work detailed here for a couple of releases, and I think it will be a challenge to identify how much of this is actually doable in the Fuji time frame.

Finally, while some folks will absolutely want the most secure implementation possible, there will be deployments of EdgeX where the OS and/or specialized hardware provide a greater level of protection than the base-line assumptions you've made. There's no mention of any choice of how much is turned on by default, or mandatory vs. optional.

@bnevis-i

Owner Author

commented Apr 10, 2019

It's not clear to me how much of this is actually being proposed for Edinburgh vs. the recent proposal put forth by Jim & Ting-yu.

My understanding is Ting-yu will finish up Edinburgh, and then Jim and I will start working on some version of this for Fuji.

@jpwhitemn


commented Apr 15, 2019

README comments
SMA - I am not in favor of the SMA being required. There are too many cases where something else (Kubernetes, Swarm, Docker Compose, a script, etc.) will orchestrate, deploy, and otherwise manage EdgeX. I understand the need stated in the README file, but couldn't (shouldn't) we do this via some other service if it is required?

The SMA executor could be something other than a docker-compose or Linux OS executor. Those are just provided by ref impl, but others can and probably will exist.

Security Roadmap
same comments with regard to SMA

Remediation Plan
What happens when the platform does not offer hardware secure storage (TPM) or offers a TEE? I think we need an abstraction for TPM or TEE, and even a file-based alternative.
I also don't know that it is EdgeX's job to maintain TPM/TEE type code.
Per phase 4, what is meant by mandatory MAC for EdgeX services?

I echo a lot of the sentiments...
- A lot of great thought and input went into this - thanks Bryon
- What about the performance impact of much of this?
- What about the impact on development of much of this? Requiring dependencies creates issues.
- Choice is critical in EdgeX. Security implementations suggested may not apply in many use cases such as those in a factory and running on a closed and inaccessible network.
- Some services (especially device services) may function on hardware that is incapable of some of the prescribed security. Have we considered these options?
- How do we crawl, walk, run this to the finish? The phases lay out the general plan, but if the phases are equal to releases, this could be a challenge.

@bnevis-i

Owner Author

commented Apr 15, 2019

@jpwhitemn

SMA - I am not in favor of the SMA being required.

OK. Won't require it then.

The SMA executor could be something other than a docker-compose or Linux OS executor. Those are just provided by ref impl, but others can and probably will exist.

The feedback, which I have confirmed in the systems-management WG, is that the two executors are docker and snap. Someone running a Linux OS executor would have to understand that the reference mitigations may be insufficient.

Remediation Plan

Here are my thoughts: How about if the vault-worker read an environment variable of an executable that outputs a passphrase that is used to protect the Vault master key? This passphrase could be locked in a TEE, a TPM, or other secure hardware.
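
One way to picture this proposal is the minimal sketch below. It assumes a hypothetical environment variable name, `VAULT_PASSPHRASE_CMD`, and hypothetical helpers `load_passphrase` and `derive_wrapping_key`; none of these is an existing EdgeX or Vault interface. The variable names an executable whose standard output is the passphrase, so that executable could equally be a tool that talks to a TPM, a TEE, or other secure hardware. The choice of PBKDF2 for key stretching is also an assumption made for this example.

```python
import hashlib
import os
import shlex
import subprocess

# Hypothetical variable name, chosen only for this illustration.
ENV_VAR = "VAULT_PASSPHRASE_CMD"


def load_passphrase(env=None):
    """Run the executable named in ENV_VAR and return its stdout as bytes."""
    env = os.environ if env is None else env
    command = env.get(ENV_VAR)
    if not command:
        raise RuntimeError(f"{ENV_VAR} is not set")
    # shlex.split yields an argument list, so no shell is invoked.
    result = subprocess.run(shlex.split(command), capture_output=True, check=True)
    passphrase = result.stdout.rstrip(b"\r\n")
    if not passphrase:
        raise RuntimeError("passphrase provider produced no output")
    return passphrase


def derive_wrapping_key(passphrase, salt):
    """Stretch the passphrase into a 32-byte key (assumed use: wrapping the master key)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 200_000, dklen=32)
```

Because the caller only sees the executable's output, swapping a file-based provider for a hardware-backed one would not require changing the calling code.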

what is meant by mandatory MAC for EdgeX services

I have since learned that these types of protections (mandatory access control) are implemented by the snap executor.

Security implementations suggested may not apply in many use cases such as those in a factory and running on a closed and inaccessible network.

The threat model should be aligned with the intended use case of EdgeX. If a closed network were the assumption, that would mean that the secret engine would have to be replaced by someone who intended to use EdgeX on a non-closed network.

In a similar vein, the sentiment expressed thus far is that a docker or snap executor should be assumed; someone implementing on less than that would need to implement equivalent protections on their own as part of their security design.

Some services (especially device services) may function on hardware that is incapable of some of the prescribed security. Have we considered these options?

Yes. The action plan focuses on a software-first implementation. However, I have not been clear that hardware is an option. Moreover, the sentiment is that a hardware implementation should be out-of-tree. Correct me if I am wrong.

@anonymouse64


commented Apr 15, 2019

@jpwhitemn some comments/questions on your response:

SMA - I am not in favor of the SMA being required. There are too many cases where something else (Kubernetes, Swarm, Docker Compose, a script, etc.) will orchestrate, deploy, and otherwise manage EdgeX. I understand the need stated in the README file, but couldn't (shouldn't) we do this via some other service if it is required?

If this is the case then it seems like the token-issuing service will need to somehow integrate into that orchestration service. I think that this requires close design work between the SMA and the token-issuing service proposed here, such that the following situations are covered:

  1. Both an external orchestration mechanism and the SMA are used (like in the snap or in docker)
  2. Only the SMA is used with something like the mysterious Linux OS executor, where really all the bootstrapping that happens is that on system startup the external parts of EdgeX such as vault, consul, etc. are started, then the SMA starts up and the SMA is responsible for everything.
  3. The SMA is not used at all, i.e. k8s or swarm

With the following capabilities handled correctly:

  • System bootup, services are started and get their tokens from the token-issuing service correctly
  • A service such as core-data dies and is restarted by the orchestration mechanism if it exists (i.e. k8s or systemd or some such) and then somehow gets a new token from the token-issuing service
  • A service is requested to be restarted either through the SMA or through the orchestration mechanism and gets a new token from the token-issuing service
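
The three capabilities above share one rule: every start or restart of a service yields a new token, regardless of whether the SMA or an external orchestrator triggered it. The sketch below is a minimal in-memory illustration built around a hypothetical `TokenIssuer` class; it is not the proposed token-issuing service and not a Vault API. It additionally assumes that issuing a new token invalidates the previous one, which the discussion does not state.

```python
import secrets


class TokenIssuer:
    """Hypothetical in-memory issuer: at most one live token per service name."""

    def __init__(self):
        self._live = {}  # service name -> currently valid token

    def issue(self, service):
        """Called on every start or restart; replaces any earlier token."""
        token = secrets.token_urlsafe(32)
        self._live[service] = token
        return token

    def is_valid(self, service, token):
        """A token is valid only if it is the latest one issued to that service."""
        current = self._live.get(service)
        return current is not None and secrets.compare_digest(current, token)
```

With this shape, the issuer does not need to know who restarted a service, which is what lets the same logic cover all three deployment situations.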

The SMA executor could be something other than a docker-compose or Linux OS executor. Those are just provided by ref impl, but others can and probably will exist.

I keep hearing references to this Linux OS executor. Is this actually being worked on, or did you mean to say the snap executor?

Security Roadmap
same comments with regard to SMA

Remediation Plan
What happens when the platform does not offer hardware secure storage (TPM) or offers a TEE? I think we need an abstraction for TPM or TEE, and even a file-based alternative.

Before we get to defining an abstraction layer, we have to define what's in scope and out of scope here. I think that utilizing a TPM or TEE is very much in-scope and so an abstraction layer may be needed.
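
As a rough illustration of what such an abstraction layer might look like, the sketch below defines a hypothetical `SecretProtector` interface with a file-based fallback, `FileProtector`. All names are invented for this example. A TPM- or TEE-backed class would implement the same two methods against real hardware, which is not shown here.

```python
import os
from abc import ABC, abstractmethod


class SecretProtector(ABC):
    """Hypothetical interface: store and retrieve one named secret."""

    @abstractmethod
    def store(self, name: str, secret: bytes) -> None: ...

    @abstractmethod
    def load(self, name: str) -> bytes: ...


class FileProtector(SecretProtector):
    """File-based fallback that relies only on owner-only file permissions."""

    def __init__(self, directory: str):
        self._dir = directory

    def _path(self, name: str) -> str:
        # Reject names that could escape the storage directory.
        if name in ("", ".", "..") or os.path.basename(name) != name:
            raise ValueError(f"invalid secret name: {name!r}")
        return os.path.join(self._dir, name)

    def store(self, name: str, secret: bytes) -> None:
        # Mode 0o600 applies when the file is newly created.
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        fd = os.open(self._path(name), flags, 0o600)
        with os.fdopen(fd, "wb") as handle:
            handle.write(secret)

    def load(self, name: str) -> bytes:
        with open(self._path(name), "rb") as handle:
            return handle.read()
```

Callers would depend only on `SecretProtector`, so the choice between file, TPM, and TEE backends could be made per deployment.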

Also note that I agree that, in general, the system needs to accommodate running without hardware security, and without security in general, as it does today. However, we shouldn't try to target running securely in situations where the deployment system doesn't support any of the features we could take advantage of, such as AppArmor, SELinux, kernel namespaces in the case of containers, etc. We should evaluate different software and hardware confinement mechanisms and choose a set of features that we can rely on that is wide enough to allow for many different deployment scenarios but not every deployment scenario. Trying to securely support every possible combination will be too much work, and as I pointed out above, we shouldn't ignore the work that has already been done to make this easy for us in the form of Linux security modules, etc.

I also don't know that it is EdgeX's job to maintain TPM/TEE type code.

I think that we could try to be as generic as possible with a reference implementation, utilizing something like https://github.com/Microsoft/openenclave in order to write the most generically useful code possible for a reference implementation or abstraction layer.
Side note: I realize the openenclave project currently only supports Intel SGX and not ARM TrustZone, but it's on the roadmap and it's supposed to support many such TEEs eventually.

-Some services (especially device services) may function on hardware that is incapable of some of the prescribed security. Have we considered these options?

Can you explain more or give an example? Why would a device service be incapable of working within the prescribed security?

-How do we crawl, walk, run this to the finish? The phases lay out the general plan, but if the phases are equal to releases, this could be a challenge.

To me at least, I think it's too early to be putting these steps into concrete plans. We still are discussing the scope of a secret store and before we can decide on steps to implement, we have to finalize the design and scope of it.

@anonymouse64


commented Apr 15, 2019

@jpwhitemn

SMA - I am not in favor of the SMA being required.

OK. Won't require it then.

The SMA executor could be something other than a docker-compose or Linux OS executor. Those are just provided by ref impl, but others can and probably will exist.

The feedback, which I have confirmed in the systems-management WG, is that the two executors are docker and snap. Someone running a Linux OS executor would have to understand that the reference mitigations may be insufficient.

I can't say I agree with that until we hear more details on what exactly this Linux OS executor is... If the Linux OS executor is simply running processes without any software or hardware confinement then I agree that they need to implement their own mitigations or run EdgeX within their own security confinement systems.

Remediation Plan

Here are my thoughts: How about if the vault-worker read an environment variable of an executable that outputs a passphrase that is used to protect the Vault master key? This passphrase could be locked in a TEE, a TPM, or other secure hardware.

Well if we are designing around a TEE, wouldn't the TEE have the passphrase inside it and not share it outside the TEE? This could mean for example that vault-worker runs inside the TEE or that Vault runs inside the TEE and the token-issuing service or vault-worker talks to Vault inside the TEE.

what is meant by mandatory MAC for EdgeX services

I have since learned that these types of protections (mandatory access control) are implemented by the snap executor.

To be clear, the snap implements both MAC- and LSM-based software confinement. MAC is implemented such that only the root user can access sensitive files in the snap's config (such as the TLS keys for kong/vault), and only root can write to particular configuration files in the snap's config.
Docker similarly implements some LSM-based software confinement, but it is much more generic and not specific to EdgeX at all, because AFAICT the docker containers are always run with the default seccomp and AppArmor profiles that docker uses for all containers.
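
The root-only file restriction described above is something a deployment could verify with a short check. The sketch below uses a hypothetical helper, `is_owner_only`, which is not part of the snap or of EdgeX; it only inspects ownership and permission bits and says nothing about AppArmor or seccomp confinement.

```python
import os
import stat


def is_owner_only(path, owner_uid=0):
    """True if path is owned by owner_uid and grants nothing to group or other."""
    info = os.stat(path)
    no_group_other = (info.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return info.st_uid == owner_uid and no_group_other
```

The default `owner_uid=0` corresponds to root; passing another UID allows the same check for a dedicated service account.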

Security implementations suggested may not apply in many use cases such as those in a factory and running on a closed and inaccessible network.

The threat model should be aligned with the intended use case of EdgeX. If a closed network were the assumption, that would mean that the secret engine would have to be replaced by someone who intended to use EdgeX on a non-closed network.

In a similar vein, the sentiment expressed thus far is that a docker or snap executor should be assumed; someone implementing on less than that would need to implement equivalent protections on their own as part of their security design.

I agree with this sentiment. I also think that the current proposal can still work within a closed network like a factory. I don't see anywhere in this design where we depend upon access to the open internet.

Some services (especially device services) may function on hardware that is incapable of some of the prescribed security. Have we considered these options?

Yes. The action plan focuses on a software-first implementation. However, I have not been clear that hardware is an option. Moreover, the sentiment is that a hardware implementation should be out-of-tree. Correct me if I am wrong.

I think you have been clear that hardware backed security is an option, but some level of software security will be mandatory to fit into this design.

@bnevis-i

Owner Author

commented Apr 15, 2019

Remediation Plan

Here are my thoughts: How about if the vault-worker read an environment variable of an executable that outputs a passphrase that is used to protect the Vault master key? This passphrase could be locked in a TEE, a TPM, or other secure hardware.

Well if we are designing around a TEE, wouldn't the TEE have the passphrase inside it and not share it outside the TEE? This could mean for example that vault-worker runs inside the TEE or that Vault runs inside the TEE and the token-issuing service or vault-worker talks to Vault inside the TEE.

There are several possible ways to accommodate a TEE:

  • Devise some IPC boundary such that some input can originate from a TEE. I think this is in scope because hooks can be in the reference code.
  • Run an entire service (process, container) in a TEE. I think this is good and relatively easy to do, but out of scope because it is not expressible as Go code and is hardware-dependent. I think the only requirement here is to not do anything that interferes with this model.
  • Run part of a service in a TEE by splitting it into trusted and untrusted parts. I think this is where openenclave might have a role when it supports more than one type of enclave and offers support for a simulated enclave (so that it could be used even if there is no hardware support at all).
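
The third option, splitting a service into trusted and untrusted parts, can be pictured with a simulated enclave. In the sketch below, a hypothetical `SimulatedEnclave` class stands in for the trusted part: the key is generated inside it, and only derived results (message tags) cross the boundary. This is plain Python with no real isolation, and it does not use the openenclave API; it only illustrates the shape of the split.

```python
import hashlib
import hmac
import secrets


class SimulatedEnclave:
    """Trusted part (simulated): holds a key that is never returned to callers."""

    def __init__(self):
        # Generated inside the "enclave"; no method exposes it.
        self.__key = secrets.token_bytes(32)

    def tag(self, message: bytes) -> bytes:
        """Return an HMAC-SHA256 tag over message, computed with the hidden key."""
        return hmac.new(self.__key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Check a tag in constant time without revealing the key."""
        return hmac.compare_digest(self.tag(message), tag)
```

The untrusted part of a service would call only `tag` and `verify`, which is the same calling pattern it would use if a hardware enclave sat behind the interface.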
@bnevis-i

Owner Author

commented Apr 19, 2019

Second round of reviews is starting. See content for pull request 2

@bnevis-i bnevis-i closed this Apr 19, 2019
