AWS CodeArtifact is a fully managed artifact repository service that makes it easy for organizations to securely store and share software packages used for application development. On [launch date] we introduced a feature called “Package Origin Control”, which allows customers to protect themselves against “dependency substitution” or “dependency confusion” attacks.
While this feature protects new packages by default, packages which lived in CodeArtifact repositories prior to the feature release are not protected without explicit configuration.
The purpose of this toolkit is to provide repository administrators with an easy way to set Origin Control policies in bulk on packages that have not received the default protection because they predate the feature's release. This can be achieved by blocking upstream versions for internal packages. The toolkit also supports blocking the publishing of new package versions for external packages, which avoids creating a potentially vulnerable mixed state.
More information can be found in the origin control feature documentation as well as in the blog post announcing the availability of this toolkit.
The toolkit consists of two scripts: the first, generate_package_configurations.py, creates a manifest file listing the packages in a domain alongside the proposed origin configuration to apply; the second, apply_package_configurations.py, reads the manifest file and applies the configuration within.
generate_package_configurations.py can operate on a whole repository, or on a subset of packages (specified either via filters, or through a list), and supports two origin control resolution modes:
- Manual: Supply the origin configuration yourself via a manifest file. This is an appropriate option if you already maintain a list of internal packages, or if they are published in a consistent internal namespace which allows for them to be easily selected.
- Automated: Identifies which packages should have their upstreams blocked by analyzing the upstream repository graph and external connections, looking for evidence that package versions are only available from the repository at hand. In that case, it determines that sourcing of upstream versions can be disabled without risk of breaking builds. This is a good option if you want a quick way to tighten your security posture without having to manually analyze your whole repository.
apply_package_configurations.py takes the manifest file generated by generate_package_configurations.py as input, and applies the origin control changes by calling the new PutPackageOriginConfiguration API.
Precisely because it is meant to set these values in bulk, this script supports backup and revert operations by default, as well as dry-run and step-by-step confirmation options. If you identify an issue after applying origin control changes, you will be able to safely revert to the original, working configuration before trying again. See the Backup and restore section for details.
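As a rough illustration of what applying a single manifest entry involves, the sketch below wraps one PutPackageOriginConfiguration call. It assumes the client was created with boto3.client("codeartifact", region_name=...); the helper name set_origin_restrictions is hypothetical and is not part of the toolkit.

```python
def set_origin_restrictions(client, domain, repository, package_format,
                            package, publish, upstream, namespace=None):
    """Set origin controls on one package (hypothetical helper).

    `client` is assumed to be boto3.client("codeartifact", region_name=...).
    """
    for value in (publish, upstream):
        if value not in ("ALLOW", "BLOCK"):
            raise ValueError(f"expected ALLOW or BLOCK, got {value!r}")
    params = {
        "domain": domain,
        "repository": repository,
        "format": package_format,
        "package": package,
        "restrictions": {"publish": publish, "upstream": upstream},
    }
    if namespace:  # only passed for packages that live in a namespace
        params["namespace"] = namespace
    return client.put_package_origin_configuration(**params)
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real repository.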
The toolkit only depends on the boto3 and tqdm packages. To install them, simply run:
pip install -r requirements.txt
The toolkit uses the same configuration as the AWS CLI to run. This means that you can set either of the following two sets of environment variables:
AWS_PROFILE
or
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
Alternatively, you can use the --profile flag to indicate which AWS CLI profile you want to use. Please note that this flag is only used for authentication purposes: even if you have specified a region parameter in your AWS CLI profile, you will still be required to pass in a --region flag to the script.
Additionally, the account you use to authenticate must have at least repository-level read permissions to run the first stage in manual mode, and read permissions on all repositories upstream of the target repository in auto mode. The second stage requires repository read permissions if the backup feature is enabled (the default), as well as write permissions unless you use dry-run mode (see the "More Options" section below).
The toolkit works on a per-repository basis. It is structured in two stages: in the first one a manifest is produced consisting of a CSV listing all the packages in the target repository, alongside their desired origin control configuration. The second stage is responsible for taking the generated CSV and setting the desired origin control configuration on every package listed within.
The first stage is invoked through generate_package_configurations.py. It requires values for domain, repository, and region to be supplied.
Origin configuration is always supplied as a string like
publish=[value],upstream=[value]
where [value] can be either ALLOW or BLOCK. By default, all existing packages have
publish=ALLOW,upstream=ALLOW
In order to tighten security for an internally-published package, you would disable upstream versions:
publish=ALLOW,upstream=BLOCK
Conversely, if you wanted to prevent users from publishing new versions to a package, you would set:
publish=BLOCK,upstream=ALLOW
These settings are always supplied as a tuple and should be thought of as working in concert.
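To make the string format concrete, here is a minimal sketch of how such a value could be parsed and validated. The function name parse_restrictions is hypothetical, and the strict validation rules (both keys required, exact upper-case values) are assumptions rather than the toolkit's documented behavior.

```python
VALID_VALUES = {"ALLOW", "BLOCK"}
VALID_KEYS = {"publish", "upstream"}


def parse_restrictions(text):
    """Parse 'publish=ALLOW,upstream=BLOCK' into a dict (keys in any order)."""
    restrictions = {}
    for part in text.split(","):
        key, separator, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if not separator or key not in VALID_KEYS or value not in VALID_VALUES:
            raise ValueError(f"invalid restriction: {part!r}")
        restrictions[key] = value
    if set(restrictions) != VALID_KEYS:
        # The two settings work in concert, so require both of them.
        raise ValueError("both publish and upstream must be supplied")
    return restrictions
```

Requiring both keys mirrors the point that the two settings are always supplied together.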
Once the repository, domain, and region values have been supplied, you must select which packages to generate origin control configurations for. It is possible to select either all packages within the repository, or a subset.
In order to select a subset of packages available in the repository, two options are available: either by supplying a list of package names or through a query. Please note that multiple namespaces and package formats aren't supported at once and you will have to repeat this operation explicitly for each one.
Working with a supplied packages list is as easy as specifying the input file name, which should have one name per line.
For example, if you wanted to BLOCK upstreams for some internal npm packages as listed in an inputfile.csv file:
internal-package-1
internal-package-2
internal-package-3
You would call the first stage script:
python generate_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --from-list inputfile.csv \
    --format npm
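For reference, reading a list file with one package name per line could look like the sketch below. The function name read_package_list is hypothetical, and skipping blank lines and duplicates is an assumption about sensible behavior, not a description of what the toolkit does.

```python
from pathlib import Path


def read_package_list(path):
    """Return the package names found in a file with one name per line."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name and name not in names:  # skip blank lines and duplicates
            names.append(name)
    return names
```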
Alternatively, you can select packages through a query, for example by format and namespace:
python generate_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --format maven \
    --namespace example-namespace
Once you have selected a package set, you have two ways of bulk-setting the origin configuration for each package in it. The simplest is to set the policy explicitly via the --set-restrictions flag, which we refer to as "manual" mode. For example:
python generate_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --format pypi \
    --prefix some-prefix \
    --set-restrictions upstream=ALLOW,publish=BLOCK
Otherwise, you can use "automatic" mode simply by omitting the above flag. This mode is meant for administrators who want the most hassle-free experience: the toolkit will try to identify packages which can have their upstreams blocked safely, and will otherwise fall back on allowing upstreams.
The heuristic will block acquisition of new versions from upstreams if and only if the target repository doesn't have direct access to an external connection, and no versions of the package are available via any of the upstreams, either because the target repository doesn't have any upstreams or because none of the upstreams have the package. The script therefore assumes that no external connection is attached directly to the repository for the package format(s) you run it against.
In order to generate the list of new origin control configurations for the same subset of packages as in the previous example, simply omit the --set-restrictions flag and run:
python generate_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --format pypi \
    --prefix some-prefix
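The decision rule used by automatic mode, as described above, can be sketched as a small pure function. This is an illustration of the stated rule only, not the toolkit's actual code; the function and parameter names are hypothetical, and gathering the inputs (external connections and upstream contents) is left out.

```python
def should_block_upstream(has_external_connection, upstream_has_package):
    """Return True when blocking upstream versions looks safe for a package.

    has_external_connection: True if the target repository has a direct
        external connection for the package format.
    upstream_has_package: one boolean per upstream repository, True when
        that upstream holds at least one version of the package. An empty
        sequence means the repository has no upstreams at all.
    """
    if has_external_connection:
        return False  # new versions could still arrive from outside
    # Block only if no upstream offers the package (or there are none).
    return not any(upstream_has_package)
```

Whenever either condition fails, the function returns False, which corresponds to falling back on allowing upstreams.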
By default, the script saves its results to a file called origin_configuration.csv. You can use the --output flag to change this to a path of your liking.
Once a well-formed CSV has been produced, it can be fed to the second stage, apply_package_configurations.py.
The same parameters as before (region, domain, repository) must be provided even though they are also present in the CSV columns. This is to ensure there is no ambiguity and to confirm you are operating on the right repository.
Invoking the second stage on origin_configuration.csv therefore looks like this:
python apply_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --input origin_configuration.csv
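The reason for repeating the parameters on the command line is that they can be cross-checked against the manifest. A minimal sketch of such a check follows; the column names used here ("domain", "repository") are assumptions for illustration, since the exact CSV layout is not described in this document, and mismatched_rows is a hypothetical helper.

```python
import csv
import io


def mismatched_rows(manifest_text, **expected):
    """Return 1-based data-row numbers whose columns differ from `expected`.

    `expected` maps assumed column names to the values given on the
    command line, e.g. domain="test-domain".
    """
    reader = csv.DictReader(io.StringIO(manifest_text))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if any(row.get(column) != value for column, value in expected.items())
    ]
```

An empty result means every row agrees with the supplied parameters; anything else is a sign the manifest was generated for a different repository.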
The second stage also accepts the following options:
- --validate-only: Verifies that the CSV is well-formed.
- --dry-run: Doesn't actually call the API, but shows what the script would do.
- --trace: Enables a more verbose mode.
- --list-failed: In case of failure, lists the packages whose origin control configuration failed to update.
- --retry-failed: Tries again to set the origin control configuration for packages that previously failed.
- --ask-confirmation: Requires step-by-step confirmation for all write actions.
- --num-workers: Controls the number of parallel workers making calls to CodeArtifact (default: 4).
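To show how a dry-run switch, a worker count, and failure tracking can fit together, here is a hedged sketch built on the standard library's ThreadPoolExecutor. It is not the toolkit's implementation; apply_all and its parameters are hypothetical names, and apply_one stands for whatever function performs the real update for one entry.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_all(entries, apply_one, num_workers=4, dry_run=False):
    """Run `apply_one` on every entry with a pool of worker threads.

    Returns (succeeded, failed) lists in input order. In dry-run mode
    `apply_one` is never called; each entry is only printed.
    """
    def describe(entry):
        print(f"[dry-run] would update {entry}")

    action = describe if dry_run else apply_one

    def attempt(entry):
        try:
            action(entry)
            return entry, None
        except Exception as error:  # record the failure, keep going
            return entry, error

    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for entry, error in pool.map(attempt, entries):
            (succeeded if error is None else failed).append(entry)
    return succeeded, failed
```

Collecting failures instead of aborting on the first one is what makes list-and-retry style options possible: the failed list can simply be passed back in for another attempt.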
By default, before changing any origin control configurations, the script backs up the existing configuration for every package it touches (this behavior can be disabled with the --no-backup flag). Should you want to revert to the previous configuration, you can simply use the --restore flag on the same input file:
python apply_package_configurations.py \
    --region us-west-2 \
    --domain test-domain \
    --repository test-repository \
    --input origin_configuration.csv \
    --restore
This software is released under the Apache 2.0 license.