Semantic Data Model (Ontology) for Company Data

The euBusinessGraph project aims at simplifying cross-border and cross-lingual collection, reconciliation, aggregation and analysis of company-related information from several authoritative and non-authoritative sources.

The euBusinessGraph project has drawn on the experience of its data providers and technology providers to tackle the complex task of combining company data from multiple sources. We have defined a common semantic model (ontology) to represent companies and their attributes in a consistent way.

Based on project needs and provider datasets
Rooted in existing ontologies and datasets, which it reuses
Expressed in the comprehensive EBG Semantic Model document
Formalized as the ebg: ontology using schema:(domain|range)Includes
Defines URL patterns and the authorities and lookup lists to use
Will be validated with RDF Shapes

Common Semantic Model (ontology)

We created an initial company data model considering related works, data available from the partners, and the needs of their business cases. The model covers the following requirements:

Capture the concept of a company and represent different types of companies.
Represent company jurisdictions and registration information.
Capture company contact information, such as the address and other locations.
Capture social data of companies, such as their websites (together with Web languages), RSS/Atom feeds and Wikipedia URLs.
Answer whether a company is publicly traded, whether it is state-owned, and whether it is registered in a startup register.
Support languages: EN, IT, NO.

In developing the company data model we have reused terms from existing ontologies such as:

EU Core Vocabs: W3C Org, RegOrg, Location, Person (not W3C)
schema.org: widely used, some relevant properties (e.g., dates)
ADMS: datasets and identifiers

Since none of the existing ontologies covers the complete scope we need, we reuse where possible, and extend and compose as follows:

Add some classes and properties of our own (ebg: ontology)
Use schema:(domain|range)Includes instead of rdfs:(domain|range) for easier composition (polymorphic vs monomorphic)
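
The difference between the two styles can be illustrated with a minimal Python sketch over toy triples. It assumes one common reading of the terms: rdfs:domain entails that the subject belongs to every declared class, whereas schema:domainIncludes only states that the subject is expected to belong to at least one of them. All ex: names and both function names are hypothetical and not part of the ebg: ontology.

```python
# Toy triples: (subject, predicate, object). All ex: names are illustrative.
RDF_TYPE = "rdf:type"


def rdfs_domain_entailment(triples, domains):
    """rdfs:domain reading: using a property makes the subject an
    instance of EVERY declared domain class (monomorphic)."""
    inferred = set()
    for s, p, _ in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred


def domain_includes_violations(triples, domain_includes):
    """schema:domainIncludes reading: the subject is expected to be an
    instance of AT LEAST ONE listed class (polymorphic). Nothing is
    inferred; subjects matching none of the classes are reported."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    problems = []
    for s, p, _ in triples:
        allowed = domain_includes.get(p)
        if allowed is not None and not (types.get(s, set()) & allowed):
            problems.append((s, p))
    return problems


data = [
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:acme", "ex:name", "ACME Ltd"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
]
name_classes = {"ex:name": {"ex:Company", "ex:Person"}}

# Under rdfs:domain, ex:acme would also be inferred to be an ex:Person.
entailed = rdfs_domain_entailment(data, name_classes)
# Under schema:domainIncludes, the same data raises no problem.
violations = domain_includes_violations(data, name_classes)
```

With rdfs:domain, sharing ex:name between companies and persons would type every company as a person as well; the domainIncludes reading lets the property be shared without that side effect.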

In addition, we define RDF Shapes (SHACL and ShEx) to validate incoming data.
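
As a rough illustration of what shape-based validation checks, the following Python sketch applies cardinality constraints to toy triples. It is not SHACL or ShEx and does not reproduce the project's actual shapes: the PropertyConstraint class, the validate function and the constraints chosen for rov:legalName and rov:registration are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConstraint:
    path: str                        # property being constrained
    min_count: int = 0               # minimum number of values
    max_count: Optional[int] = None  # None means unbounded


# Hypothetical shape: these cardinalities are assumptions for the sketch.
COMPANY_SHAPE = [
    PropertyConstraint("rov:legalName", min_count=1, max_count=1),
    PropertyConstraint("rov:registration", min_count=1),
]


def validate(node, triples, shape):
    """Return one message per cardinality violation for the given node."""
    report = []
    for c in shape:
        count = sum(1 for s, p, _ in triples if s == node and p == c.path)
        if count < c.min_count:
            report.append(
                f"{node}: {c.path} has {count} value(s), "
                f"expected at least {c.min_count}"
            )
        if c.max_count is not None and count > c.max_count:
            report.append(
                f"{node}: {c.path} has {count} value(s), "
                f"expected at most {c.max_count}"
            )
    return report


incoming = [
    ("ex:acme", "rov:legalName", "ACME Ltd"),
    ("ex:acme", "rov:legalName", "ACME Limited"),
]

# Two legal names and no registration: both constraints are violated.
report = validate("ex:acme", incoming, COMPANY_SHAPE)
```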

Company data

In its first release, the model focuses on key company information present in official registers, such as legal name, registered address and economic classification, and on information from online resources related to the company, such as company websites, blogs and social media accounts. These aspects are explicitly incorporated into the model; they describe company information that is shared across data providers and directly accessible through the graph. Additionally, the model supports advertising other company-related information that is available directly from data providers.

Identifier System

We have performed a thorough analysis of identifiers in the context of euBusinessGraph. From the analysis of the different identifier systems and the requirements of the business cases of the project, we singled out key aspects about identifiers and addressed them in the common semantic model.

Achieving matching and reconciliation across jurisdictions and registers requires careful modelling of identifier use. This release models the different cases through properties that describe the lifecycle of each identifier issued and by encoding a series of characteristics of the identifier system to which the identifier belongs. We follow a pragmatic approach when describing identifier systems in terms of these characteristics.

We model expectations of a particular system that should help determine to what extent an identifier can be used for matching and reconciliation. Additionally, we model web resources that are frequently found for identifier systems, such as search endpoints, templates for building identifier URLs through which company information can be reached, and other resources that describe the system’s rules. Finally, the model supports the representation of the different agents that are in charge of setting and maintaining rules, issuing identifiers and publishing identifier databases.
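
A minimal Python sketch of how an identifier system description with a URL template might be used. The IdentifierSystem class, its field names, the two characteristic flags and the example.org template are all hypothetical; the model's actual properties are defined in the ebg: ontology.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class IdentifierSystem:
    name: str
    url_template: Optional[str] = None  # "{id}" marks the identifier slot
    unique: bool = False                # assumed: one entity per identifier
    persistent: bool = False            # assumed: identifiers never reissued

    def url_for(self, notation: str) -> Optional[str]:
        """Build the URL for one identifier, or None without a template."""
        if self.url_template is None:
            return None
        # Percent-encode the identifier so it is safe inside a URL path.
        return self.url_template.replace("{id}", quote(notation, safe=""))

    def usable_for_matching(self) -> bool:
        """Crude rule for the sketch: match only on unique, persistent IDs."""
        return self.unique and self.persistent


register = IdentifierSystem(
    name="Example Business Register",
    url_template="https://example.org/company/{id}",
    unique=True,
    persistent=True,
)

url = register.url_for("AB 123/45")
```

Here url is "https://example.org/company/AB%20123%2F45". Keeping the characteristics on the system rather than on each identifier means a reconciliation step can decide once per register whether its identifiers are safe to match on.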

GitHub Repository

This repository contains the sources for the euBusinessGraph Semantic Model for representing company-related data. Here we will keep:

Prefixes file
Instance model file in Turtle format
Instance model files for diagrams
Generated ontology file in RDF format
Generated online documentation using LODE
RDF shapes for validation
RDF data (e.g. NACE csv sheet), conversion scripts and resulting RDF
Diagrams for the master document (links to full-size diagrams and source files here)

References

For further details about the euBusinessGraph ontology:
