diff --git a/docs/faq.rst b/docs/faq.rst
index fab51aa115..615bc63c1b 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -95,6 +95,30 @@ existing data, allowing for more comprehensive analysis and insights.

 It's essential to set up :ref:`MatchCode.io ` before executing this pipeline.

+What input types are supported?
+-------------------------------
+
+ScanCode.io supports **multiple input types** for your projects:
+
+- **File Upload**: Upload archives, source files, packages, or SBOMs directly.
+  See :ref:`inputs_file_upload`.
+
+- **Download URL**: Provide an HTTP/HTTPS URL to fetch remote files.
+  See :ref:`inputs_download_url`.
+
+- **Package URL (PURL)**: Reference packages from popular registries (npm, PyPI,
+  Maven, Cargo, NuGet, RubyGems, and more) using the PURL specification.
+  See :ref:`inputs_package_url`.
+
+- **Docker Reference**: Fetch Docker images directly from container registries
+  using the ``docker://`` syntax.
+  See :ref:`inputs_docker_reference`.
+
+- **Git Repository**: Clone a Git repository using its HTTPS URL.
+  See :ref:`inputs_git_repository`.
+
+For complete details on all input methods, refer to the :ref:`inputs` documentation.
+
 What is the difference between scan_codebase and scan_single_package pipelines?
 -------------------------------------------------------------------------------

diff --git a/docs/index.rst b/docs/index.rst
index 8a32c9e26c..d1ce2299c6 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,11 +47,12 @@ In this documentation, you’ll find:
    custom-pipelines
    scanpipe-pipes
    project-configuration
-   policies
-   data-models
+   inputs
    output-files
    command-line-interface
    rest-api
+   policies
+   data-models
    automation
    webhooks
    application-settings
diff --git a/docs/inputs.rst b/docs/inputs.rst
new file mode 100644
index 0000000000..cf16e9d3ed
--- /dev/null
+++ b/docs/inputs.rst
@@ -0,0 +1,203 @@
+.. _inputs:
+
+Inputs
+======
+
+ScanCode.io supports multiple input types for projects, providing flexibility in how
+you provide data for analysis. This section covers all supported input methods.
+
+.. _inputs_file_upload:
+
+File Upload
+-----------
+
+You can **upload files directly** to a project through the Web UI or REST API.
+Supported file types include archives (e.g., ``.tar``, ``.zip``, ``.tar.gz``),
+individual source files, pre-built packages, and **SBOMs** (SPDX or CycloneDX in
+JSON format).
+
+When uploading through the Web UI, navigate to your project and use the upload
+interface in the "Inputs" panel.
+
+For REST API uploads, refer to the :ref:`rest_api` documentation for endpoint details.
+
+.. _inputs_download_url:
+
+Download URL
+------------
+
+Instead of uploading files directly, you can provide a **URL pointing to a remote file**.
+ScanCode.io will fetch the file and add it to your project inputs.
+
+**HTTP and HTTPS URLs** are supported::
+
+    https://example.com/path/to/archive.tar.gz
+
+The fetcher handles HTTP redirects and extracts the filename from either the
+``Content-Disposition`` header or the URL path.
+
+.. tip::
+    For files behind authentication, see :ref:`inputs_authentication`.
+
+.. _inputs_package_url:
+
+Package URL (PURL)
+------------------
+
+ScanCode.io integrates with most package repositories using the
+`Package URL (PURL) specification `_.
+
+A **PURL** is a URL string used to identify and locate a software package in a
+mostly universal and uniform way across package managers and ecosystems.
+
+The **general PURL syntax** is::
+
+    pkg:<type>/<namespace>/<name>@<version>?<qualifiers>#<subpath>
+
+Cargo (Rust)
+^^^^^^^^^^^^
+
+Fetches packages from `crates.io `_::
+
+    pkg:cargo/rand@0.7.2
+
+Resolves to: ``https://crates.io/api/v1/crates/rand/0.7.2/download``
+
+RubyGems
+^^^^^^^^
+
+Fetches packages from `rubygems.org `_::
+
+    pkg:gem/bundler@2.3.23
+
+Resolves to: ``https://rubygems.org/downloads/bundler-2.3.23.gem``
+
+npm
+^^^
+
+Fetches packages from the `npm registry `_::
+
+    pkg:npm/is-npm@1.0.0
+
+Resolves to: ``https://registry.npmjs.org/is-npm/-/is-npm-1.0.0.tgz``
+
+Hackage (Haskell)
+^^^^^^^^^^^^^^^^^
+
+Fetches packages from `Hackage `_::
+
+    pkg:hackage/cli-extras@0.2.0.0
+
+Resolves to: ``https://hackage.haskell.org/package/cli-extras-0.2.0.0/cli-extras-0.2.0.0.tar.gz``
+
+NuGet (.NET)
+^^^^^^^^^^^^
+
+Fetches packages from `nuget.org `_::
+
+    pkg:nuget/System.Text.Json@6.0.6
+
+Resolves to: ``https://www.nuget.org/api/v2/package/System.Text.Json/6.0.6``
+
+GitHub
+^^^^^^
+
+Fetches release archives from `GitHub `_ repositories::
+
+    pkg:github/aboutcode-org/scancode-toolkit@3.1.1?version_prefix=v
+
+Resolves to: ``https://github.com/aboutcode-org/scancode-toolkit/archive/v3.1.1.tar.gz``
+
+The ``version_prefix`` qualifier is used when the repository tags include a prefix
+(commonly ``v``) before the version number.
+
+Bitbucket
+^^^^^^^^^
+
+Fetches archives from `Bitbucket `_ repositories::
+
+    pkg:bitbucket/robeden/trove@3.0.3
+
+Resolves to: ``https://bitbucket.org/robeden/trove/get/3.0.3.tar.gz``
+
+GitLab
+^^^^^^
+
+Fetches archives from `GitLab `_ repositories::
+
+    pkg:gitlab/tg1999/firebase@1a122122
+
+Resolves to: ``https://gitlab.com/tg1999/firebase/-/archive/1a122122/firebase-1a122122.tar.gz``
+
+Maven (Java)
+^^^^^^^^^^^^
+
+Fetches artifacts from Maven repositories. The default repository is Maven Central::
+
+    pkg:maven/org.apache.commons/commons-io@1.3.2
+
+Resolves to: ``https://repo.maven.apache.org/maven2/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar``
+
+You can specify an alternative repository using the ``repository_url`` qualifier::
+
+    pkg:maven/org.apache.commons/commons-io@1.3.2?repository_url=https://repo1.maven.org/maven2
+
+You can also fetch POM files or source JARs using the ``type`` and ``classifier``
+qualifiers::
+
+    pkg:maven/org.apache.commons/commons-io@1.3.2?type=pom
+    pkg:maven/org.apache.commons/commons-math3@3.6.1?classifier=sources
+
+.. _inputs_docker_reference:
+
+Docker Reference
+----------------
+
+ScanCode.io can **fetch Docker images directly** from container registries using the
+``docker://`` reference syntax.
+
+Examples::
+
+    docker://nginx:latest
+    docker://alpine:3.22.1
+    docker://ghcr.io/perfai-inc/perfai-engine:main
+    docker://osadl/alpine-docker-base-image:v3.22-latest
+
+The Docker image fetcher uses `Skopeo `_ under
+the hood. When fetching multi-platform images, ScanCode.io automatically selects the
+first available platform.
+
+For private registries requiring authentication, see the following settings:
+
+- :ref:`SCANCODEIO_SKOPEO_CREDENTIALS `
+- :ref:`SCANCODEIO_SKOPEO_AUTHFILE_LOCATION `
+
+.. _inputs_git_repository:
+
+Git Repository
+--------------
+
+You can provide a **Git repository URL** as project input. The repository will be cloned
+(with only the latest commit history) at the start of pipeline execution.
+
+Example::
+
+    https://github.com/aboutcode-org/scancode.io.git
+
+.. note::
+    SSH URLs (``git@github.com:...``) are not supported. Use HTTPS URLs instead.
+
+.. _inputs_authentication:
+
+Authentication
+--------------
+
+For files hosted on private servers or behind authentication, several settings are
+available to configure credentials. See :ref:`scancodeio_settings_fetch_authentication`
+for details on:
+
+- :ref:`Basic authentication `
+- :ref:`Digest authentication `
+- :ref:`HTTP request headers ` (e.g., for GitHub tokens)
+- :ref:`.netrc file `
+- :ref:`Docker private registries `
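
The PURL-to-URL mappings documented in ``docs/inputs.rst`` can be illustrated with a small sketch. This is not ScanCode.io's actual fetcher: ``parse_purl`` and ``resolve_download_url`` are hypothetical names, the parser is simplified (no percent-decoding or validation), and only four package types are covered, using the URL templates quoted in the "Resolves to" examples of the patch.

```python
from urllib.parse import parse_qsl


def parse_purl(purl):
    """Split a PURL string into its components (simplified, hypothetical helper)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl!r}")
    remainder = purl[len("pkg:"):]
    # Strip the optional subpath and qualifiers first, so that characters
    # inside qualifier values cannot be mistaken for the version separator.
    remainder, _, subpath = remainder.partition("#")
    remainder, _, query = remainder.partition("?")
    path, _, version = remainder.partition("@")
    type_, _, full_name = path.partition("/")
    # Everything before the last "/" is the namespace (may be empty).
    namespace, _, name = full_name.rpartition("/")
    return {
        "type": type_,
        "namespace": namespace,
        "name": name,
        "version": version,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath,
    }


def resolve_download_url(purl):
    """Map a PURL to a download URL, following the documented examples only."""
    p = parse_purl(purl)
    name, version = p["name"], p["version"]
    if p["type"] == "cargo":
        return f"https://crates.io/api/v1/crates/{name}/{version}/download"
    if p["type"] == "gem":
        return f"https://rubygems.org/downloads/{name}-{version}.gem"
    if p["type"] == "npm":
        return f"https://registry.npmjs.org/{name}/-/{name}-{version}.tgz"
    if p["type"] == "github":
        # The version_prefix qualifier is prepended to the version to form the tag.
        tag = p["qualifiers"].get("version_prefix", "") + version
        return f"https://github.com/{p['namespace']}/{name}/archive/{tag}.tar.gz"
    raise ValueError(f"Unsupported package type in this sketch: {p['type']!r}")


if __name__ == "__main__":
    print(resolve_download_url("pkg:cargo/rand@0.7.2"))
    print(resolve_download_url(
        "pkg:github/aboutcode-org/scancode-toolkit@3.1.1?version_prefix=v"
    ))
```

Parsing qualifiers before splitting on ``@`` keeps values such as ``repository_url=https://repo1.maven.org/maven2`` intact. A real resolver would also need to handle percent-encoded names and the remaining package types.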