Package URL #227

Open
aamedina opened this issue Apr 15, 2024 · 3 comments
@aamedina
Contributor

aamedina commented Apr 15, 2024

d3f:PackageURL

I'm not exactly sure how to best model this yet, but I think it's important to have a way to represent package URLs and be able to make queries based on the type of the package (Maven, NPM, etc.) and its data components (namespace, name, version, qualifiers, subpath).

Before I open a PR for this I wanted to get feedback on the approach I'm taking and if this would be useful to others. I plan on using these properties to annotate software composition analysis with D3FEND and would like to use PURLs to uniquely identify software packages across databases.

Definition

@prefix d3f: <http://d3fend.mitre.org/ontologies/d3fend.owl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

d3f:PackageURL a owl:Class ;
    rdfs:label "Package URL" ;
    rdfs:subClassOf d3f:Identifier, 
        [ a owl:Restriction ;
          owl:onProperty d3f:identifies ;
          owl:someValuesFrom d3f:SoftwarePackage
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-type ;
          owl:someValuesFrom d3f:PackageURLType
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-namespace ;
          owl:someValuesFrom xsd:string
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-name ;
          owl:someValuesFrom xsd:string
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-version ;
          owl:someValuesFrom xsd:string
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-qualifiers ;
          owl:someValuesFrom xsd:string
        ],
        [ a owl:Restriction ;
          owl:onProperty d3f:package-subpath ;
          owl:someValuesFrom xsd:string
        ] ;
    d3f:definition """purl stands for package URL.

A purl is a URL composed of seven components:

scheme:type/namespace/name@version?qualifiers#subpath

Components are separated by a specific character for unambiguous parsing.

The definition for each component is:

* scheme: this is the URL scheme with the constant value of \"pkg\". This is not modeled in the RDF representation as it is tautological.
* type: the package \"type\" or package \"protocol\" such as maven, npm, nuget, gem, pypi, etc. Required.
* namespace: some name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Optional and type-specific.
* name: the name of the package. Required.
* version: the version of the package. Optional.
* qualifiers: extra qualifying data for a package such as an OS, architecture, a distro, etc. Optional and type-specific.
* subpath: extra subpath within a package, relative to the package root. Optional.

Components are designed such that they form a hierarchy from the most significant component on the left to the least significant component on the right.""" ;
    rdfs:isDefinedBy <https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst> ;
    rdfs:seeAlso <https://github.com/package-url/purl-spec> .

d3f:PackageURLType a owl:Class ;
    rdfs:label "Package URL Type" ;
    rdfs:seeAlso <https://github.com/package-url/purl-spec> ;
    d3f:definition """Each package manager, platform, type, or ecosystem has its own conventions and protocols to identify, locate, and provision software packages.

The package type is the component of a package URL that is used to capture this information with a short string such as maven, npm, nuget, gem, pypi, etc.""" .

# Example individual for Maven. The purl string value for the type is stored using the rdf:value property.

d3f:PackageURLType-Maven a owl:NamedIndividual, d3f:PackageURLType ;
    rdfs:label "Maven" ;
    rdf:value "maven" ;
    rdfs:isDefinedBy <https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#maven> ;
    rdfs:comment "maven for Maven JARs and related artifacts" ;
    skos:example "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1", 
        "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?type=pom",
        "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?classifier=sources",
        "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?type=zip&classifier=dist",
        "pkg:maven/net.sf.jacob-projec/jacob@1.14.3?classifier=x86&type=dll",
        "pkg:maven/net.sf.jacob-projec/jacob@1.14.3?classifier=x64&type=dll" ;
    rdfs:seeAlso <https://repo.maven.apache.org/maven2> ;
    d3f:definition """The default repository is https://repo.maven.apache.org/maven2.

    The group id is the ``namespace`` and the artifact id is the ``name``.

    Known qualifiers keys are: ``classifier`` and ``type`` as defined in the POM documentation. Note that Maven uses a concept / coordinate called packaging which does not map directly 1:1 to a file extension. In this use case, we need to construct a link to one of many possible artifacts. Maven itself uses type in a dependency declaration when needed to disambiguate between them.""" .

d3f:package-property a owl:ObjectProperty ;
    rdfs:subPropertyOf d3f:associated-with ;
    rdfs:domain d3f:SoftwarePackage ;
    rdfs:label "package-property" ;
    d3f:definition "x package-property y: The package x has the object property y." .

d3f:package-data-property a owl:DatatypeProperty ;
    rdfs:subPropertyOf d3f:d3fend-artifact-data-property ;
    rdfs:domain d3f:SoftwarePackage ;
    rdfs:label "package-data-property" ;
    d3f:definition "x package-data-property y: The package x has the data property y." .

d3f:package-type a owl:ObjectProperty, owl:FunctionalProperty ;
    rdfs:subPropertyOf d3f:package-property ;
    rdfs:range d3f:PackageURLType ;
    rdfs:label "package-type" ;
    skos:example "maven", "npm", "nuget", "gem", "pypi" ;
    d3f:definition "x package-type y: The package x has the type y." .

d3f:package-namespace a owl:DatatypeProperty, owl:FunctionalProperty ;
    rdfs:subPropertyOf d3f:package-data-property ;
    rdfs:label "package-namespace" ;
    d3f:definition "x package-namespace y: The package x has the namespace y." .

d3f:package-name a owl:DatatypeProperty, owl:FunctionalProperty ;
    rdfs:subPropertyOf d3f:package-data-property ;
    rdfs:label "package-name" ;
    d3f:definition "x package-name y: The package x has the name y." .

d3f:package-version a owl:DatatypeProperty, owl:FunctionalProperty ;
    rdfs:subPropertyOf d3f:package-data-property ;
    rdfs:label "package-version" ;
    d3f:definition "x package-version y: The package x has the version y." .

d3f:package-qualifiers a owl:DatatypeProperty ;
    rdfs:subPropertyOf d3f:package-data-property ;
    rdfs:label "package-qualifiers" ;
    skos:example "arch=i386", "platform=java", "repository_url=gcr.io" ;
    d3f:definition "x package-qualifiers y: The package x has the qualifiers y." .

d3f:package-subpath a owl:DatatypeProperty, owl:FunctionalProperty ;
    rdfs:subPropertyOf d3f:package-data-property ;
    rdfs:label "package-subpath" ;
    d3f:definition "x package-subpath y: The package x has the subpath y." .

d3f:purl a owl:ObjectProperty, owl:InverseFunctionalProperty ;
    rdfs:subPropertyOf d3f:package-property ;
    rdfs:range xsd:anyURI ;
    rdfs:label "purl" ;
    d3f:definition "A package URL (purl) is a URL for identifying software packages." .
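The component layout described in the `d3f:PackageURL` definition above (`scheme:type/namespace/name@version?qualifiers#subpath`) can be sketched as a small parser. This is a minimal illustration using only the Python standard library, not a spec-complete implementation: the `Purl` tuple and the `parse_purl` and `_split_right` names are hypothetical, and type-specific normalization rules from the purl spec are not applied.

```python
from typing import Dict, NamedTuple, Optional, Tuple
from urllib.parse import unquote


class Purl(NamedTuple):
    """Hypothetical container mirroring the d3f:package-* properties."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def _split_right(text: str, sep: str) -> Tuple[str, Optional[str]]:
    """Split once on the right-most separator; tail is None if it is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl(purl: str) -> Purl:
    # Optional tail components are peeled off from the right first.
    remainder, subpath = _split_right(purl, "#")
    remainder, query = _split_right(remainder, "?")

    # The scheme is the constant "pkg"; it is checked, then discarded.
    scheme, found, remainder = remainder.partition(":")
    if not found or scheme.lower() != "pkg":
        raise ValueError("a purl must start with the 'pkg:' scheme")

    # The type is the first path segment and is required.
    ptype, found, remainder = remainder.strip("/").partition("/")
    if not found or not ptype or not remainder:
        raise ValueError("a purl requires both a type and a name")

    # Version follows the right-most '@'; name is the last path segment.
    remainder, version = _split_right(remainder, "@")
    namespace, _, name = remainder.rpartition("/")
    if not name:
        raise ValueError("a purl requires a name")

    # Qualifiers are key=value pairs joined by '&'; empty values are dropped.
    qualifiers: Dict[str, str] = {}
    for pair in (query or "").split("&"):
        key, has_value, value = pair.partition("=")
        if has_value and key and value:
            qualifiers[key.lower()] = unquote(value)

    return Purl(
        type=ptype.lower(),
        namespace=unquote(namespace) if namespace else None,
        name=unquote(name),
        version=unquote(version) if version else None,
        qualifiers=qualifiers,
        subpath=unquote(subpath.strip("/")) if subpath else None,
    )
```

Calling `parse_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie")` yields one field per proposed property, which is the shape a query by type or by component would rely on.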

Examples

<pkg:bitbucket/birkenfeld/pygments-main@244fd47e07d1014f0aed9c> a d3f:PackageURL .
<pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie> a d3f:PackageURL .
<pkg:docker/cassandra@sha256:244fd47e07d1004f0aed9c> a d3f:PackageURL .
<pkg:docker/customer/dockerimage@sha256:244fd47e07d1004f0aed9c?repository_url=gcr.io> a d3f:PackageURL .

asserting triples:

:jruby-launcher d3f:purl <pkg:gem/jruby-launcher@1.1.2?platform=java> .

could materialize:

<pkg:gem/jruby-launcher@1.1.2?platform=java> a d3f:SoftwarePackage .
<#curl-debian> a d3f:SoftwarePackage ;
    d3f:package-type d3f:PackageURLType-Deb ;
    d3f:package-namespace "debian" ;
    d3f:package-name "curl" ;
    d3f:package-version "7.50.3-1" ;
    d3f:package-qualifiers "arch=i386", "distro=jessie" .
<#batik-anim> a d3f:SoftwarePackage ;
    d3f:package-type d3f:PackageURLType-Maven ;
    d3f:package-namespace "org.apache.xmlgraphics" ;
    d3f:package-name "batik-anim" ;
    d3f:package-version "1.9.1" .
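The materialization step shown above could be sketched as a function that turns already-parsed purl components into a Turtle description. This is only an illustration under stated assumptions: `package_to_turtle` and `_literal` are hypothetical names, and the `d3f:PackageURLType-<Type>` naming for type individuals is inferred from the `-Maven` and `-Deb` examples in this issue rather than from any agreed convention.

```python
from typing import Dict, List, Optional, Tuple


def _literal(value: str) -> str:
    """Render a plain Turtle string literal with minimal escaping."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def package_to_turtle(
    subject: str,
    ptype: str,
    name: str,
    namespace: Optional[str] = None,
    version: Optional[str] = None,
    qualifiers: Optional[Dict[str, str]] = None,
) -> str:
    """Emit one d3f:SoftwarePackage description; optional parts are skipped."""
    # Assumption: type individuals are named d3f:PackageURLType-<Capitalized>.
    pairs: List[Tuple[str, str]] = [
        ("a", "d3f:SoftwarePackage"),
        ("d3f:package-type", f"d3f:PackageURLType-{ptype.capitalize()}"),
    ]
    if namespace:
        pairs.append(("d3f:package-namespace", _literal(namespace)))
    pairs.append(("d3f:package-name", _literal(name)))
    if version:
        pairs.append(("d3f:package-version", _literal(version)))
    if qualifiers:
        # Each qualifier becomes its own "key=value" literal, sorted by key.
        values = ", ".join(
            _literal(f"{key}={value}")
            for key, value in sorted(qualifiers.items())
        )
        pairs.append(("d3f:package-qualifiers", values))

    (first_pred, first_obj), rest = pairs[0], pairs[1:]
    lines = [f"<{subject}> {first_pred} {first_obj}"]
    lines += [f"    {pred} {obj}" for pred, obj in rest]
    return " ;\n".join(lines) + " ."
```

Passing the components of the Debian curl purl with the subject `#curl-debian` produces the same shape as the `<#curl-debian>` example above. The required prefixes are assumed to be declared elsewhere in the output document.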


@netfl0
Contributor

netfl0 commented Apr 26, 2024

Definition

@prefix d3f: <http://d3fend.mitre.org/ontologies/d3fend.owl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

d3f:PackageURL a owl:Class ;
    rdfs:label "Package URL" ;
    rdfs:subClassOf d3f:Identifier, 

^ Should be subclass of d3f:URL

    [ a owl:Restriction ;
      owl:onProperty d3f:identifies ;
      owl:someValuesFrom d3f:SoftwarePackage
    ],
    [ a owl:Restriction ;
      owl:onProperty d3f:package-type ;
      owl:someValuesFrom d3f:PackageURLType

...

skos:example "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1", 
    "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?type=pom",
    "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?classifier=sources",
    "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?type=zip&classifier=dist",
    "pkg:maven/net.sf.jacob-projec/jacob@1.14.3?classifier=x86&type=dll",
    "pkg:maven/net.sf.jacob-projec/jacob@1.14.3?classifier=x64&type=dll" ;
rdfs:seeAlso <https://repo.maven.apache.org/maven2> ;
d3f:definition """The default repository is https://repo.maven.apache.org/maven2.

The group id is the ``namespace`` and the artifact id is the ``name``.

Known qualifiers keys are: ``classifier`` and ``type`` as defined in the POM documentation. Note that Maven uses a concept / coordinate called packaging which does not map directly 1:1 to a file extension. In this use case, we need to construct a link to one of many possible artifacts. Maven itself uses type in a dependency declaration when needed to disambiguate between them.""" .

d3f:package-property a owl:ObjectProperty ;
rdfs:subPropertyOf d3f:associated-with ;

^^ This is key and something we've needed to decide on. This issue is forcing me to decide :)

These are more "schema"-oriented fields than what we intend with d3f:associated-with. For us, "associated" is shorthand for "inferentially associated with": we use these properties to produce all of our various inferred relationships. We almost need a high-level property, called something like d3f:schema-property, to indicate that a property sits a bit outside our general model. This is where we'd want to fold in OCSF fields as well. I think you've had some other content that might fall under there; you had an OCSF property generation script. I'd like to see them in our proper namespace, but with links back to OCSF. CC @hack-sentinel

In general most of these we'd want to be in sync with OCSF. If we uncover an issue with OCSF, we should engage them to help improve OCSF to make it ontologically sound.

rdfs:domain d3f:SoftwarePackage ;
rdfs:label "package-property" ;
d3f:definition "x package-property y: The package x has the object property y." .

d3f:package-data-property a owl:DatatypeProperty ;
rdfs:subPropertyOf d3f:d3fend-artifact-data-property ;
rdfs:domain d3f:SoftwarePackage ;
rdfs:label "package-data-property" ;
d3f:definition "x package-data-property y: The package x has the data property y." .

....

netfl0 self-assigned this Apr 26, 2024
netfl0 added this to the 0.16.0 milestone Apr 26, 2024
netfl0 assigned aamedina and unassigned netfl0 Apr 26, 2024
@aamedina
Contributor Author

Thank you for your feedback! I'll take what you said into consideration and put together a first pass as a PR.

Here is the OCSF extension branch for reference (a bit out of date, for 1.1.0-dev, but same concepts apply): https://github.com/aamedina/d3fend-ontology/tree/aamedina/ocsf/extensions/ocsf, which uses SPARQL-Generate to produce https://github.com/aamedina/d3fend-ontology/blob/aamedina/ocsf/extensions/ocsf/dataset/ocsf.ttl

An advantage of OCSF's dictionary design is that it plays well with the RDF notion of how properties relate to classes. OCSF classes can have their own "restrictions" that pull in OCSF dictionary properties. We don't need to have process-specific or package-specific properties, but we could have a single super property that indicates OCSF alignment and have the process, package, etc. properties (whatever they end up looking like) inherit from it (for example d3f:schema-property, or d3f:ocsf-property to indicate OCSF provenance for the property semantics).

@netfl0
Contributor

netfl0 commented Apr 26, 2024

A schema property could be used to unify and relate the various schemas, so a generic d3f:schema-property is more appropriate.

If we intend to assert provenance, we can use rdfs:isDefinedBy.
