✨ SPARQL endpoint profiler

A package to profile SPARQL endpoints to extract the nodes and relations represented in the knowledge graph.

This package follows the recommendations defined by the HCLS Community Profile (Health Care and Life Sciences) to generate metadata describing the content of a SPARQL endpoint.
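
For illustration, this kind of profiling boils down to running aggregation queries against the endpoint. The sketch below is not the package's internal code, just a minimal example (using the SPARQLWrapper library) of the kind of query a profiler runs to count the instances of each class:

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://graphdb.dumontierlab.com/repositories/umids-kg")
# Count how many subjects are typed with each class in the endpoint
endpoint.setQuery("""
    SELECT ?class (COUNT(?s) AS ?count)
    WHERE { ?s a ?class }
    GROUP BY ?class
    ORDER BY DESC(?count)
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["class"]["value"], row["count"]["value"])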

📦️ Installation

This package requires Python >=3.7. Install it with pip:

pip install sparql-profiler

🪄 Usage

⌨️ Use as a command-line interface

You can easily use the sparql-profiler from your terminal after installing with pip.

Run profiling

Quickly profile a small SPARQL endpoint to generate HCLS descriptive metadata for each graph:

sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg

Profiling a bigger SPARQL endpoint will take more time:

sparql-profiler profile https://bio2rdf.org/sparql

Display more debugging logs with -l debug:

sparql-profiler profile https://bio2rdf.org/sparql -l debug

Use the profiling method specific to Bio2RDF endpoints:

sparql-profiler profile https://bio2rdf.org/sparql --profiler bio2rdf

You can also add metadata for the dataset distribution (description, license, etc.) by answering interactive questions with the -q flag:

sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg -q

Help

See all options for the profile command with:

sparql-profiler profile --help

Get a full rundown of all available commands with:

sparql-profiler --help

🐍 Use with Python

Use the sparql-profiler in Python scripts:

from sparql_profiler import SparqlProfiler

sp = SparqlProfiler("https://graphdb.dumontierlab.com/repositories/umids-kg")
print(sp.metadata.serialize(format="turtle"))
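
Assuming sp.metadata is an rdflib Graph (as the serialize() call above suggests), you can also write the generated metadata straight to a Turtle file:

# Write the generated HCLS metadata to a file (assumes sp.metadata is an rdflib Graph)
sp.metadata.serialize(destination="metadata.ttl", format="turtle")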

🧑‍💻 Development setup

This final section of the README is for developers who want to run the package in development and get involved by making a code contribution.

📥️ Clone

Clone the repository:

git clone https://github.com/MaastrichtU-IDS/sparql-profiler
cd sparql-profiler

🐣 Install dependencies

Install Hatch; it will automatically handle virtual environments and make sure all dependencies are installed when you run a script in the project:

pip install --upgrade hatch

Install the dependencies in a local virtual environment:

hatch -v env create

Alternatively, if you are already handling the virtual environment yourself or installing in a Docker container, you can use:

pip install -e ".[test,dev]"

🏗️ Run in development

While in development, you can easily run the sparql-profiler from your terminal with hatch to profile a specific SPARQL endpoint:

hatch run sparql-profiler profile https://graphdb.dumontierlab.com/repositories/umids-kg

☑️ Run tests

Make sure the existing tests still pass by running pytest. Note that any pull request to the sparql-profiler repository on GitHub will automatically trigger the test suite:

hatch run test

To display all print() outputs when running the tests:

hatch run test -s
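
If you contribute a new feature, a test for it could look like the following minimal sketch (a hypothetical test, assuming the SparqlProfiler API shown above and network access to the endpoint):

from sparql_profiler import SparqlProfiler

def test_profile_endpoint():
    # Profile a small endpoint and check that some metadata was generated
    sp = SparqlProfiler("https://graphdb.dumontierlab.com/repositories/umids-kg")
    assert len(sp.metadata.serialize(format="turtle")) > 0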

🧹 Code formatting

The code will be automatically formatted by pre-commit when you commit your changes. But you can also run the script to format the code yourself:

hatch run fmt

Check the code for errors and for compliance with the PEP8 style guide by running flake8 and mypy:

hatch run check

♻️ Reset the environment

In case you are facing issues with dependencies not updating properly, you can easily reset the virtual environment with:

hatch env prune

🏷️ New release process

The deployment of new releases is done automatically by a GitHub Action workflow when a new release is created on GitHub. To release a new version:

  1. Make sure the PYPI_TOKEN secret has been defined in the GitHub repository (in Settings > Secrets > Actions). You can get an API token from PyPI at pypi.org/manage/account.
  2. Increment the version number in the pyproject.toml file in the root folder of the repository.
  3. Create a new release on GitHub, which will automatically trigger the publish workflow, and publish the new release to PyPI.

You can also manually trigger the workflow from the Actions tab in your GitHub repository webpage.

Notes

The Umaka viewer (https://umaka-viewer.dbcls.jp/, code at https://github.com/dbcls/umakaparser) uses a SPARQL profiler written in Java, which should be able to work with large graphs: https://bitbucket.org/yayamamo/tripledataprofiler/src/master/src/jp/ac/rois/dbcls/TripleDataProfiler.java

Run:

java -jar TripleDataProfiler.jar -ep https://bio2rdf.org/sparql

Build:

javac -cp commons-cli-1.2.jar:commons-lang3-3.3.2.jar:apache-jena-2.11.1/lib/*:./src ./src/jp/ac/rois/dbcls/TripleDataProfiler.java

Note that the project does not provide a Maven build, so you are expected to track down the deprecated Jena and Commons jars required by the build yourself.
