SimE4KG - Release
SimE4KG: Explainable Distributed Multi-modal Semantic Similarity Estimation for Knowledge Graphs (pre-release)
This release includes all of the most recent developments of the SimE4KG framework.
SimE4KG stands for Explainable Distributed In-Memory Multi-modal Semantic Similarity Estimation for Knowledge Graphs.
Overview
In this release, we introduce multiple changes to the SANSA Stack to provide the SimE4KG functionality.
The content is structured as follows:
- Databricks Notebooks
- ReadMe of Novel Modules
- Novel Classes
- Unit Tests
- Datasets
- Further Reading
SimE4KG Databricks Notebooks
To showcase the usage of the SimE4KG modules in a hands-on session, we introduce multiple Databricks notebooks. They show the full pipeline as well as dedicated parts such as the SmartFeatureExtractor. The notebooks combine explanations, sample code, and the output of each code snippet. With the notebooks, you can reproduce the functionality in your browser without installing the framework locally.
The Notebooks can be found here:
- SimE4KG Databricks Notebook: sample pipeline building, including outputs
- SmartFeatureExtractor Databricks Notebook: multi-modal feature extraction with the novel SmartFeatureExtractor
- SimE4KG Semantic Pipeline for Similarity-Based Recommendations: sample pipeline that uses semantified results to create recommendations
- Further use cases are under ongoing development and can be found here
ReadMe
The novel SimE4KG modules are documented in the SANSA ML ReadMe. For quick access to the high-level SimE4KG transformer and the SmartFeatureExtractor, use these two links:
- SimE4KG/Dasim Transformer ReadMe: the high-level similarity estimation transformer that calls the entire pipeline
- SmartFeatureExtractor ReadMe: the newly developed generic multi-modal feature extractor transformer
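The SmartFeatureExtractor runs on Spark within SANSA, and its actual API is described in the SmartFeatureExtractor ReadMe. Purely as a rough, non-authoritative illustration of what multi-modal feature extraction means, the plain-Python sketch below pivots RDF triples into one feature record per entity and infers a simple modality for each value (URI, number, ISO date). The functions `infer_value` and `extract_features`, the `ex:` names, and the typing rules are hypothetical assumptions, not part of SANSA.

```python
from collections import defaultdict
from datetime import date


def infer_value(obj: str):
    """Hypothetical literal typing: URI, number, ISO date, else plain string."""
    if obj.startswith("http://") or obj.startswith("https://"):
        return obj  # keep URIs as identifiers
    try:
        return float(obj)  # numeric literal, e.g. a runtime
    except ValueError:
        pass
    try:
        return date.fromisoformat(obj)  # timestamp literal, e.g. a release date
    except ValueError:
        return obj  # fall back to a plain string


def extract_features(triples):
    """Pivot (subject, predicate, object) triples into one feature dict per subject.

    A predicate that has several objects for at least one subject (e.g. actors)
    is treated as list-valued for every subject; other predicates stay scalar.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(infer_value(o))
    multi = {p for preds in grouped.values() for p, vals in preds.items() if len(vals) > 1}
    return {
        s: {p: (vals if p in multi else vals[0]) for p, vals in preds.items()}
        for s, preds in grouped.items()
    }


if __name__ == "__main__":
    triples = [
        ("ex:m1", "ex:actor", "http://ex.org/a1"),
        ("ex:m1", "ex:actor", "http://ex.org/a2"),
        ("ex:m1", "ex:runtime", "142"),
        ("ex:m1", "ex:released", "1994-07-06"),
        ("ex:m2", "ex:actor", "http://ex.org/a1"),
        ("ex:m2", "ex:runtime", "118"),
    ]
    features = extract_features(triples)
    print(features["ex:m2"]["ex:actor"])  # ['http://ex.org/a1']
```

Deciding list-valuedness per predicate rather than per entity keeps a feature's modality consistent across all entities, so a movie with a single actor still gets a list.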
Novel Classes
The novel classes developed within this release are primarily the DasimTransformer and the SmartFeatureExtractor, together with their corresponding unit tests and the evaluation scripts used to test module performance:
- DasimTransformer class, unit test
- SmartFeatureExtractor class, unit test
- Evaluation classes, such as data size scalability, feature availability evaluation, SmartFeatureExtractor evaluation, and more
Datasets
As a starting point for experimenting with this framework, we recommend the Linked Movie Database RDF knowledge graph. This KG contains millions of triples about movies and includes multi-modal features: lists of URIs such as the list of actors, numeric features such as the runtime, and timestamp data such as the release date. For unit tests, we also provide an extract of this data that follows the same schema.
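To make multi-modal similarity concrete for movie data like this, here is a minimal, hypothetical sketch that compares two movie entities feature by feature: Jaccard overlap for URI lists such as actors, a relative difference for numeric values such as runtime, and a linear decay for release dates, averaged into one score. Returning the per-feature breakdown loosely mirrors the idea of an explainable similarity. The function names, the ten-year `scale_days` window, and the unweighted average are illustrative assumptions, not the measures SimE4KG actually uses.

```python
from datetime import date


def jaccard(a, b):
    """Set overlap for URI lists, e.g. shared actors."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def numeric_sim(a, b):
    """1 minus the relative difference, clipped to [0, 1]."""
    denom = max(abs(a), abs(b))
    return 1.0 if denom == 0 else max(0.0, 1.0 - abs(a - b) / denom)


def date_sim(a, b, scale_days=3650):
    """Linear decay over an assumed ten-year window."""
    return max(0.0, 1.0 - abs((a - b).days) / scale_days)


def similarity(x, y):
    """Average per-feature similarity over the features both entities share.

    Returns (score, breakdown) so each feature's contribution stays inspectable.
    Features of other types (e.g. plain strings) are skipped in this sketch.
    """
    breakdown = {}
    for feat in x.keys() & y.keys():
        va, vb = x[feat], y[feat]
        if isinstance(va, list) and isinstance(vb, list):
            breakdown[feat] = jaccard(va, vb)
        elif isinstance(va, date) and isinstance(vb, date):
            breakdown[feat] = date_sim(va, vb)
        elif isinstance(va, (int, float)) and isinstance(vb, (int, float)):
            breakdown[feat] = numeric_sim(va, vb)
    score = sum(breakdown.values()) / len(breakdown) if breakdown else 0.0
    return score, breakdown


if __name__ == "__main__":
    m1 = {"actors": ["a1", "a2"], "runtime": 120.0, "release": date(2000, 1, 1)}
    m2 = {"actors": ["a2", "a3"], "runtime": 90.0, "release": date(2000, 1, 1)}
    score, breakdown = similarity(m1, m2)
    print(round(score, 3))  # 0.694
```

A weighted average, or a different measure per modality, would slot into the same structure; the unweighted mean here is only the simplest choice.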
Further Reading
If you are interested in further reading and background information on other related modules, we recommend the following papers:
- Distributed semantic analytics using the SANSA stack
- Sparklify: A Scalable Software Component for Efficient Evaluation of SPARQL Queries over Distributed RDF Datasets
- DistSim - Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs
- DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs
Other
- In addition, the full JAR of this version is provided below.