[SPARK-41893][BUILD] Publish SBOM artifacts #39401

Closed

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jan 5, 2023

What changes were proposed in this pull request?

This PR aims to publish SBOM artifacts.

Why are the changes needed?

Here is an article to give some context.

Software Bills of Materials (SBOMs) are additional artifacts containing the aggregate of all direct and transitive dependencies of a project. The US Government (based on NIST recommendations) currently accepts only the three most popular SBOM standards as valid, namely CycloneDX, Software Identification (SWID) tags, and Software Package Data Exchange® (SPDX).

This PR uses the CycloneDX Maven plugin. CycloneDX is a lightweight software bill of materials (SBOM) standard designed for use in application security contexts and supply chain component analysis.

For example, spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml and spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.json files are attached to spark-tags_2.12-3.4.0-SNAPSHOT.jar.

$ ls -al ~/.m2/repository/org/apache/spark/spark-tags_2.12/3.4.0-SNAPSHOT
total 2488
drwxr-xr-x  12 dongjoon  staff      384 Jan  4 23:36 .
drwxr-xr-x   4 dongjoon  staff      128 Jan  4 23:36 ..
-rw-r--r--   1 dongjoon  staff      492 Jan  4 23:36 _remote.repositories
-rw-r--r--   1 dongjoon  staff     1955 Jan  4 23:36 maven-metadata-local.xml
-rw-r--r--   1 dongjoon  staff    16310 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.json
-rw-r--r--   1 dongjoon  staff    14045 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
-rw-r--r--   1 dongjoon  staff  1162027 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-javadoc.jar
-rw-r--r--   1 dongjoon  staff    16272 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-sources.jar
-rw-r--r--   1 dongjoon  staff    12453 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-test-sources.jar
-rw-r--r--   1 dongjoon  staff    10387 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-tests.jar
-rw-r--r--   1 dongjoon  staff    15181 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT.jar
-rw-r--r--   1 dongjoon  staff     5822 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT.pom

Does this PR introduce any user-facing change?

Yes, but the changes are dev-only.

How was this patch tested?

Tested manually.

$ mvn install -DskipTests
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 10.501 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 12.900 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 24.315 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 25.406 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 36.217 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 31.532 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 33.338 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 19.204 s]
[INFO] Spark Project Core ................................. SUCCESS [05:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:20 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:41 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:36 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [06:44 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:10 min]
[INFO] Spark Project ML Library ........................... SUCCESS [05:48 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.132 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:49 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 50.149 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  6.706 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 44.131 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:08 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [01:45 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:19 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 11.574 s]
[INFO] Spark Avro ......................................... SUCCESS [01:33 min]
[INFO] Spark Project Connect Common ....................... SUCCESS [ 48.653 s]
[INFO] Spark Project Connect Server ....................... SUCCESS [01:28 min]
[INFO] Spark Project Connect Client ....................... SUCCESS [ 19.989 s]
[INFO] Spark Protobuf ..................................... SUCCESS [01:24 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  49:49 min
[INFO] Finished at: 2023-01-05T02:06:51-08:00
[INFO] ------------------------------------------------------------------------

$ tree ~/.m2/repository/org/apache/spark | grep cyclonedx.xml
│   │   ├── spark-avro_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-catalyst_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect-client-jvm_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect-common_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-graphx_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-hive_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-kvstore_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-launcher_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-mllib-local_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-mllib_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-network-common_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-network-shuffle_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-parent_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-protobuf_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-repl_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sketch_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sql-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sql_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming-kafka-0-10-assembly_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-token-provider-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
    │   ├── spark-unsafe_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
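
The `tree | grep` check above covers only the XML files. As a rough sketch, the same check could be scripted so that it looks for both SBOM formats next to every installed POM. The function name `find_missing_sboms` is hypothetical, and the sketch assumes the local repository layout shown in the listings above (one `*.pom` per version directory, with the SBOM files using the same base name).

```python
from pathlib import Path

# File name suffixes of the two SBOM formats shown in the listings above.
SBOM_SUFFIXES = ("-cyclonedx.xml", "-cyclonedx.json")


def find_missing_sboms(repo_dir):
    """Map each artifact base name to the SBOM suffixes missing next to its POM.

    Hypothetical helper: an artifact appears in the result only if at least
    one of the two SBOM files is absent from the POM's directory.
    """
    missing = {}
    for pom in sorted(Path(repo_dir).rglob("*.pom")):
        base = pom.name[: -len(".pom")]
        absent = [
            suffix
            for suffix in SBOM_SUFFIXES
            if not (pom.parent / (base + suffix)).exists()
        ]
        if absent:
            missing[base] = absent
    return missing
```

Calling `find_missing_sboms(Path.home() / ".m2/repository/org/apache/spark")` after `mvn install` should return an empty dict if every module has both files.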

@github-actions github-actions bot added the BUILD label Jan 5, 2023
@dongjoon-hyun
Member Author

cc @srowen and @HyukjinKwon

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 5, 2023 07:44
@dongjoon-hyun
Member Author

Ah, it seems that I missed some failures. I'm converting this to Draft. Let me dig into this.

[WARNING] An unexpected issue occurred attempting to resolve the effective pom for  org.xerial.snappy:snappy-java:1.1.8.4
org.apache.maven.project.ProjectBuildingException: Some problems were encountered while processing the POMs:
[ERROR] Unknown packaging: bundle @ line 6, column 16

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 5, 2023 09:51
@dongjoon-hyun
Member Author

The PR is ready for review now. Could you review when you have some time?
@srowen , @HyukjinKwon , @cloud-fan , @viirya , @sunchao , @huaxingao

@srowen
Member

srowen commented Jan 5, 2023

Seems fine to me. I'm not sure if the maven release plugin will also push this to Maven Central, but maybe that's not essential. Do the files look plausible, like they appear to contain the transitive dependencies and more or less match what's in the "deps" files in the repo?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 5, 2023

Thank you, @srowen .

  1. It goes to Maven Central. I did this in the Apache ORC project first via ORC-1342. Here are the SBOMs published with the snapshot versions after ORC-1342 was merged. Apache Spark snapshots will also have them, so we can verify this before releasing.
**1.9.0**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.9.0-SNAPSHOT/orc-core-1.9.0-20230105.074036-185-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.9.0-SNAPSHOT/orc-core-1.9.0-20230105.074036-185-cyclonedx.json

**1.8.2**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.8.2-SNAPSHOT/orc-core-1.8.2-20230105.074040-16-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.8.2-SNAPSHOT/orc-core-1.8.2-20230105.074040-16-cyclonedx.json

**1.7.8**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.7.8-SNAPSHOT/orc-core-1.7.8-20230105.074050-2-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.7.8-SNAPSHOT/orc-core-1.7.8-20230105.074050-2-cyclonedx.json
  2. While the deps files cover only the Spark binary distributions, this is a jar-level manifest. For example, spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.xml lists only the dependencies of spark-core itself, without the Kubernetes dependencies.
$ cat ~/.m2/repository/org/apache/spark/spark-core_2.12/3.4.0-SNAPSHOT/spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.json  | jq .components | grep \"name\" | head -n5
    "name": "avro",
    "name": "jackson-core",
    "name": "commons-compress",
    "name": "avro-mapred",
    "name": "avro-ipc",
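
For a comparison against the deps files, the component names alone may not be enough. Below is a minimal sketch that reads the `components` array of a CycloneDX JSON document and returns `group:name:version` coordinates. The function name `component_coordinates` is hypothetical, and the embedded sample is made-up illustrative data (placeholder versions), not the content of an actual Spark SBOM.

```python
import json


def component_coordinates(bom_text):
    """Return 'group:name:version' strings for each component in a CycloneDX JSON BOM.

    Hypothetical helper: components without a 'group' field are reported
    as 'name:version'.
    """
    bom = json.loads(bom_text)
    coords = []
    for component in bom.get("components", []):
        group = component.get("group", "")
        name = component["name"]
        version = component.get("version", "")
        if group:
            coords.append(f"{group}:{name}:{version}")
        else:
            coords.append(f"{name}:{version}")
    return coords


# Illustrative input only; the versions are placeholders, not real data.
SAMPLE_BOM = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "group": "org.apache.avro", "name": "avro", "version": "0.0.0"},
    {"type": "library", "name": "jackson-core", "version": "0.0.0"}
  ]
}
"""

if __name__ == "__main__":
    for coordinate in component_coordinates(SAMPLE_BOM):
        print(coordinate)
```

To run it on a real file, pass the text of one of the installed `*-cyclonedx.json` files instead of `SAMPLE_BOM`.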

Member

@viirya viirya left a comment

So it basically generates the aggregate of dependencies as XML and JSON files and attaches them to the jar files, right?

Member

@viirya viirya left a comment

Based on the context, looks good to me.

@viirya
Member

viirya commented Jan 5, 2023

Looks good, but maybe wait a while for others to chime in if they have opinions.

@dongjoon-hyun
Member Author

Yes, right. Thank you, @viirya .

So it basically generates the aggregate of dependencies as XML and JSON files and attaches them to the jar files, right?

Member

@sunchao sunchao left a comment

Looks fine to me too.

@dongjoon-hyun
Member Author

Thank you, @sunchao

@dongjoon-hyun
Member Author

Thank you all. Let me merge this.

@dongjoon-hyun
Member Author

We will be able to see the published SBOMs tomorrow, after the next snapshot is published.

@viirya
Member

viirya commented Jan 7, 2023

Thanks @dongjoon-hyun !
