Bicep's public registry feature should include security features to protect enterprise environments #5803

Open
yobyot opened this issue Jan 29, 2022 · 16 comments
Labels
enhancement New feature or request story: registry

Comments

@yobyot

yobyot commented Jan 29, 2022


Problem description

Bicep is planning to release a version containing functionality that implements a public registry. While this is both commonplace and useful, from a security standpoint it is far from an unalloyed good.

Public registries, in the wrong hands (not just black hats but also those of amateur and non-security-oriented developers), can present a real threat to enterprise environments. Consider the experience of Alex Birsan, who found it astonishingly easy to hack Apple and others, including Microsoft. His experience led him to coin the term "dependency confusion," which describes the fundamental challenges of public registries.

In Bicep's case, the prospect of a supply chain attack is extraordinarily frightening. Because Bicep can deploy anything ARM can request of a back-end Azure provider, it's easy to imagine some disturbing attacks and not just of the supply chain (dependency confusion) variety. Some examples include:

  • Deploying a malware-infected VM image into an enterprise network.
  • Poisoning a web app and/or the owning web app service.
  • Deploying an Azure Data Factory pipeline to exfiltrate data.
  • Injecting a malformed Apache Spark or Python notebook into Databricks.
  • Removing or reducing defenses like firewall exclusions in network security groups to whitelist command and control servers.
  • Opening a storage account to "all networks" and/or turning off TLS-only connections.

(I just made these up as I write this -- making the case this is scarier than one might at first think. I'm not half or even a quarter as clever as the bad guys so just think what they could come up with.)

What should be done?

I'll leave it to the community and the Microsoft team to debate both the merits of this issue (though I think you have to believe in "alternate facts" to think this isn't a pressing issue) as well as the potential solution.

However, I hope any solution would offer at least some of the following capabilities:

  • Some way to audit deployments for inclusion of modules included from any registry (public or private). Maybe Bicep could insert automatic output variables with info about included modules. Then Azure Policy could be written to make sure they conform to standards.
  • A way to turn off, at a subscription level, the ability to include modules not on the file system of a local device. (This might make Cloud Shell less attractive to deploy from -- but who does that in production anyway?)
  • A more rigorous semantic versioning scheme, auditable by Azure Policy. No more "this release or later" open-doors just inviting some black hat in.
  • A "cop" for the public registry (ARIN for Bicep). I don't know who could or would do this. But a moderated registry might be feasible in some shape or form, though it would be a hard sell to my clients (and I'd predict generally in healthcare and finance).
  • A version of Bicep that doesn't support registries at all, be they public or private. I'll bet the dev team and the community will hate this idea but it's my personal favorite. In a large enterprise environment, dev workstations and their dev toolsets can be controlled centrally. This would be an ideal way to eliminate the problem altogether. Two versions isn't as crazy as it sounds; this is done all the time in products that have a freeware/commercial go-to-market.
@yobyot yobyot added the enhancement New feature or request label Jan 29, 2022
@ghost ghost added the Needs: Triage 🔍 label Jan 29, 2022
@alex-frankel
Collaborator

Some notes:

  • At least in principle, the ask makes sense, though doing a quick search, I'm not seeing features like this built into other programming languages. It looks like a lot of organizations will set up a package approval process of some kind to manage this and then, as part of CI automation, check to ensure only "allowed" packages are being used. @yobyot -- how have you seen this problem solved for other languages besides Bicep? I know that for us, all of our source code is scanned for external dependencies and flagged if we attempt to use something that is not explicitly allowed. I don't think this uses any particular feature of the language itself. @majastrz / @anthony-c-martin / @shenglol -- not sure if you have more context on this one.
  • Regarding semantic versioning, we already do not allow the concept of "latest" for external registries. Is there something missing in our current implementation? In the public registry, we will not allow any code updates of a current module version. Any change to a module will result in a version bump.
  • Any implementation would need to be done client side. By the time it is sent to ARM, the module will have already been packaged into the transpiled ARM template. Even if we added metadata to identify in the template that an external module had been used, I would suspect it would be easy enough to remove/alter this metadata to get around the restriction.

One solution could look like the following:

  • Create a new linter rule like "list of allowed modules and/or allowed registries" and set this rule to be an error instead of a warning in bicepconfig. This would make authors aware at authoring time that they are attempting to use a disallowed module reference.
  • CI at the organization ensures that this linter rule is enabled with a specific list such that it is impossible to check in code without this rule being evaluated.
  • Organization runs an approval process to onboard new registries or modules which would change the prescribed bicepconfig.

A workflow like that should be able to be used to be completely restrictive (no external references) or partially restrictive. Thoughts?
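For illustration, the CI side of such a workflow could be sketched in Python as below. This is a rough sketch, not an existing Bicep feature: the allowlist contents, the function names, and the regex-based parsing are all assumptions (a real check would want a proper parser and would need to resolve `br/<alias>:` references through bicepconfig). It relies only on the `module <name> '<reference>'` declaration form and the `br:<registry>/<path>:<tag>` reference form.

```python
import re

# Hypothetical allowlist; an organization would maintain its own.
ALLOWED_REGISTRIES = {"mcr.microsoft.com", "contoso.azurecr.io"}

# Matches `module <name> '<reference>'` declarations and captures the quoted reference.
MODULE_DECL = re.compile(r"^\s*module\s+\w+\s+'([^']+)'", re.MULTILINE)


def registry_of(reference):
    """Return the registry host for a `br:` reference, or None for local paths.

    Aliased (`br/<alias>:`) and template spec (`ts:`) references are returned
    unchanged, so they are rejected unless the allowlist names them exactly.
    """
    if reference.startswith("br:"):
        return reference[len("br:"):].split("/", 1)[0]
    if reference.startswith(("br/", "ts:", "ts/")):
        return reference
    return None  # relative file path on the local file system


def disallowed_modules(bicep_source, allowed=ALLOWED_REGISTRIES):
    """List module references whose registry is not on the allowlist."""
    bad = []
    for ref in MODULE_DECL.findall(bicep_source):
        registry = registry_of(ref)
        if registry is not None and registry not in allowed:
            bad.append(ref)
    return bad
```

A pipeline step could run `disallowed_modules` over every `.bicep` file in the repository and fail the build when the returned list is non-empty. Passing an empty allowlist gives the fully restrictive mode (no external references at all).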

@shenglol
Contributor

shenglol commented Feb 2, 2022

Package managers like NuGet and npm allow users to specify package registries to use in config files. Although Bicep is a programming language, the package manager is built into the language server and CLI, so IMHO I feel like it makes sense to add support for whitelisting or blacklisting module registries. We would have to create a dedicated configuration section for this instead of adding a new linter rule though, since the Bicep linter is independent from module restoration (modules being blocked will be downloaded anyway regardless of linting errors).

@yobyot
Author

yobyot commented Feb 2, 2022

CI at the organization ensures that this linter rule is enabled with a specific list such that it is impossible to check in code without this rule being evaluated.

I think this still devolves too much control to the individual developer's workstation. A developer could still attempt to use a non-approved registry and, on receiving the "error," try to work around it.

If I may, you may be overthinking the requirement a bit, @alex-frankel. While I understand you have to consider all the possible implementations of Bicep across many different enterprises, users with very large, dispersed development groups will want the ability to "just shut it off." IOW, a more blunt-force, absolute and no-nuance way to control this. Managing lists and encouraging users to work around something that's still there -- only generating errors -- isn't as complete a solution. A binary "on" or "off" is better, IMHO.

@alex-frankel
Collaborator

IOW, a more blunt-force, absolute and no-nuance way to control this.

What about a property in bicepconfig.json that accepts a boolean, such as allowModules, which would be set to true or false? It would be on the org to have a CI test which would enforce that a bicepconfig.json exists and has this one property (set to false). Would that work better for you @yobyot?
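The CI test described here might look roughly like the Python sketch below. Note that `allowModules` is only the property name floated in this comment, not an existing bicepconfig.json setting, and the function name is made up for illustration. The sketch also assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path


def modules_disabled(config_path):
    """Return True only if the config exists and sets allowModules to false.

    `allowModules` is the hypothetical property proposed in this thread;
    it is not an existing bicepconfig.json setting.
    """
    path = Path(config_path)
    if not path.is_file():
        return False  # a missing config counts as a failure
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # an unreadable config also counts as a failure
    # Require an explicit boolean false; absent or truthy values fail the check.
    return isinstance(config, dict) and config.get("allowModules") is False
```

The check fails closed: a missing file, malformed JSON, or an absent property are all treated the same as `allowModules: true`.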

@wsmelton

wsmelton commented Feb 3, 2022

I think it is unrealistic to put the whole ownership of security on Microsoft and Bicep itself. No other cloud provider has these types of controls for their deployment mechanisms (GCP or AWS). Minor things could be accomplished with config options, but in the end it is a tool that has to fully support performing a deployment as the user wants. Business rules and security controls have to be done elsewhere, which is why Azure offers RBAC controls on resources and lock mechanisms to prevent unwanted deployments that have not been checked.

You are not going to be able to fully control what a dev can do from their workstation (or Azure Cloud Shell) except through the permissions of their account to your tenants. If you want full control over developers, don't give them rights to deploy anything from their work accounts. Put locks on all of your resources in Azure...there are a ton of other things that can be done here, and I think that is where the responsibility should be.

@yobyot
Author

yobyot commented Feb 3, 2022

What about a property in bicepconfig.json...

I don't think this will cut the mustard, @alex-frankel. Consider this scenario:

  • An enterprise is the victim of an on-premises Active Directory attack that yields an identity synced via AD Connect that has Contributor (or worse, Owner) access to an Azure subscription.
  • The attacker doesn't bother with CI, turns off the setting and...
  • Using Azure PowerShell and/or the CLI and those purloined credentials, the attacker launches a VM in a Vnet with internal and external connectivity.
  • Game over. That VM can load whatever it wants and pivot to any attack it chooses.

That's all because Bicep views its role in an enterprise narrowly and/or places too heavy a requirement on the enterprise to monitor deployment pipelines.

I know it's not what you and your colleagues want to hear. But the best solution for Bicep is to make it very hard to use any external libraries by default.

@yobyot
Author

yobyot commented Feb 3, 2022

No other cloud provider has these types of controls for their deployment mechanisms...

Thanks for joining the conversation, @wsmelton.

Sorry, I respectfully disagree that the failure of other cloud systems to address their configuration language security vulnerabilities means that Microsoft isn't obligated to do better.

@alex-frankel
Collaborator

alex-frankel commented Feb 3, 2022

If they have compromised an organization so deeply that they can bypass the CI system and have owner permissions on a subscription, then they can deploy malicious resources through any number of channels and tools. They could cut bicep entirely out of the picture at that point, no?

I think the most actionable next step is to look at some prior art. Can you share some examples of good systems you have seen deployed for managing this problem in other programming languages? How is this handled in the C# and PowerShell ecosystems?

@yobyot
Author

yobyot commented Feb 3, 2022

How is this handled in the C# and PowerShell ecosystems?

It's not -- and that's the problem. Just in the last few weeks, we've seen reports of government-linked hackers pivoting from the Log4j debacle to PowerShell to gain/maintain a foothold.

Can you share some examples of good systems you have seen deployed for managing this problem in other programming languages?

I'll leave this up to you and the team. Consider it a chance to innovate. :-) All I know is that you guys should spend the time and effort to get ahead of this and think about it fundamentally, not as a tack-on or accommodation.

@wsmelton

wsmelton commented Feb 3, 2022

How is this handled in the C# and PowerShell ecosystems?

The PowerShell ecosystem provides extremely detailed logging (more than any other language) and configuration options that the enterprise would use to monitor how user accounts are utilizing PowerShell. The responsibility is placed fully on the enterprise to implement. Lee Holmes has presented on this: "Defending against PowerShell attacks - in theory, and in practice."

PowerShell Gallery scans (PSScriptAnalyzer is used for some of this scanning) are performed when modules get published and if those scans find critical issues the owner is notified. If the owner never fixes it the module will be removed, or unpublished. (PowerShell Team can share further details as I'm only aware of a few details).

@wsmelton

wsmelton commented Feb 3, 2022

I know it's not what you and your colleagues want to hear. But the best solution for Bicep is to make it very hard to use any external libraries by default.

Terraform does not prevent the use of external providers, and in fact requires them for those deploying to AWS and Azure (and not all of the Azure providers are owned by MS either). That is the biggest feature it offers in extensibility. I know a number of highly security-minded organizations (military/gov't branches, banks, hospitals, etc.) that utilize this tool for deployments. They have no issues meeting their compliance and security standards.

Bicep is just a DSL. ARM is the controlling arm of what gets deployed in Azure and does not even know the deployment came from a Bicep file (the JSON is generated before anything is sent to the ARM API). I am finding it difficult to understand the belief that Bicep suddenly needs to be the security guard for all of this.

What industry are you speaking of that needs the level of security you are describing? That may help in finding the right path to discuss within MS.

@alex-frankel
Collaborator

I tend to agree that this is beyond the scope of Bicep to solve. Dependency management is a very large problem to solve and we'd be happy to snap to industry-accepted solutions, but we don't feel comfortable charting new territory here since Bicep is such a small project.

I think it makes sense to add some easy on/off switches for allowing external registries (or a specific set) and provide a way to test that the policy is being enforced, but beyond that I am not sure there is much more we can do here.

@BernieWhite

In terms of PowerShell and .NET there is code signing. While it is up to the organization to configure the specifics, some restrictions can be put in place.

Some longer term options might be:

  • Provide a code signing option for the OCI artifacts/ JSON document.
  • Emit metadata for SBOM-type validation, i.e. this deployment builds on these module hashes.

Currently ARM/ Policy has no specific way to validate these, but maybe that is possible in the future or can be added to the tool chain as an easy step such as a bicep verify.
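To make the SBOM-style idea concrete, here is a minimal Python sketch of what a verification step could do: hash each module's content and compare it against an approved manifest. The manifest shape and the function names are hypothetical, there is no `bicep verify` command today, and this does not implement real artifact signing.

```python
import hashlib


def sha256_digest(data):
    """Return an OCI-style digest string ("sha256:<hex>") for raw bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(modules):
    """Map each module name to the digest of its content (bytes)."""
    return {name: sha256_digest(content) for name, content in modules.items()}


def verify(modules, approved_manifest):
    """Return the sorted names of modules that are missing from,
    or differ from, the approved manifest."""
    return sorted(
        name
        for name, content in modules.items()
        if approved_manifest.get(name) != sha256_digest(content)
    )
```

An approval process would produce the manifest once via `build_manifest`. The toolchain step would then call `verify` before deployment and refuse to continue if it returns any names.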

@alex-frankel
Collaborator

These are great points and suggestions @BernieWhite! Thinking about this problem from the perspective of supply chain security is probably the most promising approach. @SteveLasker I know supply chain is top of mind for container registries. Are there any best practices for securing the supply chain of OCI artifacts?

@SteveLasker

Hi folks,
Thanks @yobyot for the great summary, and @alex-frankel for the ping.

The asks, from auditing to signing, are right up the alley of the secure supply chain work we've been doing here at Microsoft with the SCITT project and contributing to the broad community through ORAS Artifacts and Notary v2.

Your private registry isn't crazy; it's actually a best practice to own the content you depend on. See Consuming Public Content for more details. @alex-frankel references this as well here.

Versioning is also greatness. Here are some opinions on how to version public content in registries.

For the importing/approval process, ACR is building a gated/import cache feature, that aligns with the consuming public content post above. Ralph is driving that feature and can provide more insight.

@shenglol is right-on with the configuration for the registry endpoint. This is a huge gap with container registries. Here are some early thoughts around this: (Is It Time to Change How We Reference Container Images?) (Enabling Artifact CLIs to Reference Environment Specific Registries Through Configuration)

With great power comes responsibility, and trust

@wsmelton and @yobyot, it looks like you're kinda hitting at the core. Only allow users to do as much as required (least-privilege requirements), while also assuring you only consume content from entities you trust (signed and verified). And, I'm excited to see the team push forward with better security, and not be in the taillights of what others do. Security is about constantly moving forward, as the bad-folks keep catching up. Terraform is a good model to consider, as they have a great model that we see a LOT of use.

@BernieWhite seems to have teased in the ORAS Artifacts and Notary v2 work.

Notary v2 is on the cusp of releasing. We will have support across Azure and AWS, with others hopefully joining soon. The Zot project has also added support for Notary v2 through ORAS Artifact support.

Here's a script for how you can sign any artifact (bicep, sbom, container image, kids photos) and push the signature to a registry.

@BernieWhite

BernieWhite commented May 29, 2024

Without having to add the whole set of features within the Bicep CLI alone, some thoughts on how we could make this easier would be:

  • Additional OCI annotations that can be passed via bicep publish arguments in addition to existing documentation URI.
    • org.opencontainers.image.source to point to the repository where the source bicep is from.
    • org.opencontainers.image.revision to point to the git commit hash in the repository.
    • org.opencontainers.image.vendor to allow the distributing entity to be specified.
    • Would be great to also get a hover and click-through to the source on these when in VSCode.
  • Emit the digest of the module artifact once it is published in a form easy to parse (like JSON), since signing and attaching artifacts based on digest, instead of via the tag, is the recommended practice.
  • A dependency chain of each module with nested modules (digest, source, vendor).
    • i.e. In the example case, a module references two other modules, one from say the Azure Bicep registry and the other from a relative path within the repository.
    • It's not clear from the final built module JSON where the modules came from in the current case, even though they originate from two different parties.
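
As a rough illustration of that last point, the Python sketch below walks a compiled ARM template and lists every nested deployment it contains. It assumes modules show up in the built JSON as `Microsoft.Resources/deployments` resources with an inline `properties.template`. The function name and the path format are made up for this example, and the hash is computed over the re-serialized nested template, so it is not the OCI artifact digest.

```python
import hashlib
import json

NESTED_TYPE = "Microsoft.Resources/deployments"


def nested_templates(template, path="main"):
    """Yield (path, sha256 hex digest) for every nested deployment template,
    depth first."""
    resources = template.get("resources", [])
    # Templates may store resources as a list or as a name-keyed object.
    if isinstance(resources, dict):
        resources = list(resources.values())
    for resource in resources:
        if resource.get("type") != NESTED_TYPE:
            continue
        inner = resource.get("properties", {}).get("template")
        if not isinstance(inner, dict):
            continue  # linked templates (templateLink) carry no inline body
        child = f"{path}/{resource.get('name', '?')}"
        # Serialize deterministically so the same template always hashes the same.
        canonical = json.dumps(inner, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        yield child, digest
        yield from nested_templates(inner, child)
```

The output gives the nesting structure and a content hash per module, but nothing about source or vendor, which is the gap described above.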

6 participants