Open Knowledge Graphs is a static, daily-refreshed catalog of ontology and semantic software records sourced from Wikidata. It publishes both machine-readable artifacts (Turtle + JSON) and a searchable browser UI.
- Site: https://openknowledgegraphs.com/
- Semantic Search API: https://api.openknowledgegraphs.com/
- Ontology schema (Turtle): https://openknowledgegraphs.com/ontology.ttl
- Ontologies dataset (Turtle): https://openknowledgegraphs.com/data/ontologies.ttl
- Ontologies dataset (JSON): https://openknowledgegraphs.com/data/ontologies.json
- Software dataset (Turtle): https://openknowledgegraphs.com/data/software.ttl
- Software dataset (JSON): https://openknowledgegraphs.com/data/software.json
Semantic search over the full catalog.
- `GET https://api.openknowledgegraphs.com/search?q=movie+ontology&limit=5`
- `GET https://api.openknowledgegraphs.com/ontologies?q=healthcare+vocabulary`
- `GET https://api.openknowledgegraphs.com/software?q=rdf+triplestore`
Parameters: `q` (required), `category`, `type` (`ontology` | `software`), `limit` (default 20, max 100)
Categories: Life Sciences & Healthcare, Geospatial, Government & Public Sector, International Development, Finance & Business, Library & Cultural Heritage, Technology & Web, Environment & Agriculture, General / Cross-domain
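As a sketch, the documented parameters can be assembled into request URLs with the standard library; the helper name and the commented fetch step are illustrative, not part of the project:

```python
from urllib.parse import urlencode

API = "https://api.openknowledgegraphs.com"

def search_url(q, endpoint="search", **params):
    """Build a catalog search URL from the documented parameters
    (q, category, type, limit)."""
    return f"{API}/{endpoint}?" + urlencode({"q": q, **params})

url = search_url("movie ontology", limit=5)
# → 'https://api.openknowledgegraphs.com/search?q=movie+ontology&limit=5'

# To fetch and decode the response (network required):
#   import json, urllib.request
#   results = json.load(urllib.request.urlopen(url))
```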
- `scripts/fetch_data.py` queries Wikidata (WDQS), normalizes records, and writes:
  - `data/ontologies.ttl`
  - `data/ontologies.json`
  - `data/software.ttl`
  - `data/software.json`
- Category enrichment is maintained in `data/categories.json`:
  - one-time backfill: `scripts/classify_categories.py`
  - incremental on daily refresh: `scripts/fetch_data.py`
- `site/index.html` + `site/app.js` + `site/style.css` render the client-side catalog UI.
- GitHub Actions refreshes the data daily and deploys the static site + datasets to GitHub Pages.
- `data/`: published datasets and category mappings
- `scripts/`: data refresh and LLM classification scripts
- `site/`: static frontend (HTML/CSS/JS + assets)
- `.github/workflows/`: CI/CD for data refresh and Pages deploy
- `ontology.ttl`: ontology and SHACL shape definitions
- Python 3.11+ (3.12 recommended)
- `pip`
Setup:

- `python3 -m venv .venv`
- `source .venv/bin/activate`
- `pip install -r requirements.txt`

Refresh the datasets:

- `python scripts/fetch_data.py`

Optional category backfill (Anthropic):

- `export ANTHROPIC_API_KEY=your_key_here`
- `python scripts/classify_categories.py`

Serve the site locally:

- `python -m http.server 8000`

Then open: http://localhost:8000/site/
Aside from the semantic search endpoints listed above, there is no server-side API; the published JSON files are the data API surface.
Top-level object:
{
"generatedAt": "2026-03-08T03:21:55Z",
"items": []
}

`items[]` fields:
- Required:
  - `title` (string)
  - `wikidataId` (IRI string to Wikidata page)
  - `types` (string array, may contain multiple values)
- Optional (omitted when absent):
  - `description` (string)
  - `homepage` (IRI string)
  - `partOf` (string)
  - `licenses` (string array)
  - `category` (string, one of the predefined domain categories)
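A quick sanity check against the required fields above can be sketched in a few lines of Python; the sample records below are invented for illustration and are not real catalog entries:

```python
REQUIRED = ("title", "wikidataId", "types")

def check_items(doc):
    """Yield (index, missing-field) pairs for items lacking a required field."""
    for i, item in enumerate(doc.get("items", [])):
        for field in REQUIRED:
            if field not in item:
                yield i, field

# Illustrative document shaped like the spec above:
doc = {
    "generatedAt": "2026-03-08T03:21:55Z",
    "items": [
        {"title": "Example Ontology",
         "wikidataId": "https://www.wikidata.org/wiki/Q0",
         "types": ["Ontology"]},
        {"title": "Missing id", "types": []},
    ],
}
print(list(check_items(doc)))  # → [(1, 'wikidataId')]
```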
Top-level object:
{
"generatedAt": "2026-03-08T03:21:55Z",
"items": []
}

`items[]` fields:
- Required:
  - `title` (string)
  - `wikidataId` (IRI string to Wikidata page)
  - `types` (string array)
- Optional:
  - `description` (string)
  - `homepage` (IRI string)
  - `licenses` (string array)
  - `latestVersion` (string)
  - `releaseDate` (ISO date string)
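Because `releaseDate` is optional, consumers have to tolerate its absence; one way to sort records newest-first, sketched with hypothetical entries (not real catalog data):

```python
from datetime import date

def newest_first(items):
    """Sort software records by releaseDate (ISO date string), newest
    first; records without a releaseDate sort last."""
    def key(item):
        raw = item.get("releaseDate")
        return date.fromisoformat(raw) if raw else date.min
    return sorted(items, key=key, reverse=True)

items = [
    {"title": "Old Tool", "releaseDate": "2019-05-01"},
    {"title": "Undated Tool"},
    {"title": "New Tool", "releaseDate": "2025-11-30"},
]
print([i["title"] for i in newest_first(items)])
# → ['New Tool', 'Old Tool', 'Undated Tool']
```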
Schema source: https://openknowledgegraphs.com/ontology.ttl
| Class | Description |
|---|---|
| `okg:Resource` | Base class for all catalog resources |
| `okg:Ontology` | Ontology resources |
| `okg:ControlledVocabulary` | Controlled vocabulary resources |
| `okg:Taxonomy` | Taxonomy resources |
| `okg:Software` | Software/tooling resources |
| `okg:License` | License nodes attached to resources |
| Property | Range | Notes |
|---|---|---|
| `okg:title` | `xsd:string` | required; max 1 |
| `okg:wikidataId` | IRI | required; max 1 |
| `okg:description` | `xsd:string` | optional; max 1 |
| `okg:category` | `okg:Category` | optional; max 1 |
| `okg:homepage` | IRI | optional; max 1 |
| `okg:hasLicense` | `okg:License` | optional; multi-valued |
| `okg:partOf` | `xsd:string` | optional; max 1 |
| `okg:latestVersion` | `xsd:string` | software only; optional; max 1 |
| `okg:releaseDate` | `xsd:date` | software only; optional; max 1 |
| `okg:licenseName` | `xsd:string` | license node label |
SHACL constraints are defined in `okg:ResourceShape`, `okg:SoftwareShape`, and related shapes in `ontology.ttl`.
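The cardinality rules in the property table can be mirrored in plain Python as a rough stand-in for the SHACL shapes; the `{property: [values]}` representation and the example node are assumptions for illustration, and `ontology.ttl` remains the authoritative definition:

```python
# Max occurrences per property, from the table above (None = multi-valued).
MAX_COUNT = {
    "okg:title": 1, "okg:wikidataId": 1, "okg:description": 1,
    "okg:category": 1, "okg:homepage": 1, "okg:hasLicense": None,
    "okg:partOf": 1, "okg:latestVersion": 1, "okg:releaseDate": 1,
}
REQUIRED = ("okg:title", "okg:wikidataId")

def violations(resource):
    """Return constraint violations for a resource given as a toy
    {property: [values]} mapping (a stand-in for an RDF node)."""
    out = []
    for prop in REQUIRED:
        if not resource.get(prop):
            out.append(f"{prop}: missing required value")
    for prop, values in resource.items():
        limit = MAX_COUNT.get(prop)
        if limit is not None and len(values) > limit:
            out.append(f"{prop}: {len(values)} values exceeds max {limit}")
    return out

node = {"okg:title": ["Example Ontology", "Duplicate title"],
        "okg:hasLicense": ["CC-BY", "GPL"]}
print(violations(node))
```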
File: `.github/workflows/update-data.yml`

- Trigger: daily at `0 6 * * *` (06:00 UTC) + manual dispatch
- Installs Python dependencies
- Runs `python scripts/fetch_data.py`
- Commits changed data files as `github-actions[bot]` with: `chore(data): refresh catalog from Wikidata`
File: `.github/workflows/deploy.yml`

- Trigger: pushes to `main` affecting `site/**`, `data/**`, `ontology.ttl`, or the workflow file
- Builds the Pages artifact from:
  - `site/` (frontend)
  - `data/` (datasets)
  - `ontology.ttl` (schema)
- Deploys via the GitHub Pages actions
- Fork the repository.
- In your fork, enable GitHub Pages with the source set to GitHub Actions.
- (Optional) Configure a custom domain:
  - add `site/CNAME`
  - set DNS records
  - enable HTTPS in Pages settings
- If using category classification, add `ANTHROPIC_API_KEY` as a repository secret or environment variable for the workflow runtime.
- Run `Update Catalog Data` manually once to generate/refresh the data.
- Push any change to `site/`, `data/`, or `ontology.ttl` to trigger a deploy.
The Streamlit app has been removed from `main` (`app.py` no longer exists). See the full migration and feature mapping guide:
The legacy reference data model remains available in `dist/catalog.ttl` (ignored in git).
- `fetch_data.py` fails with HTTP/timeout errors:
  - rerun; the script has retry/backoff for WDQS throttling
- No category assignments added:
  - confirm `ANTHROPIC_API_KEY` is set
  - run `python scripts/classify_categories.py` manually
- Site loads but data is empty locally:
  - serve from the repo root and open http://localhost:8000/site/
  - verify `data/*.json` exists and is valid JSON
- Workflow runs but no commit is created:
  - no data diff was detected in the tracked outputs
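For the "data is empty locally" case, a small stdlib-only checker can confirm that the published JSON files parse and carry an `items` list; the helper below is a hypothetical aid, not part of `scripts/`:

```python
import json
from pathlib import Path

def check_dataset(path):
    """Return (ok, detail) for one published JSON dataset: the file
    must exist, parse as JSON, and contain a top-level 'items' list."""
    p = Path(path)
    if not p.exists():
        return False, "file missing"
    try:
        doc = json.loads(p.read_text())
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    if not isinstance(doc.get("items"), list):
        return False, "no top-level 'items' list"
    return True, f"{len(doc['items'])} items"

# Run from the repo root:
for name in ("data/ontologies.json", "data/software.json"):
    ok, detail = check_dataset(name)
    print(f"{name}: {'OK' if ok else 'FAIL'} ({detail})")
```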
No. The catalog includes both open and proprietary resources if they are represented in Wikidata.
Primary metadata is sourced from Wikidata queries. Category labels can be added via LLM classification and then frozen in `data/categories.json`.
Wikidata coverage is uneven. Optional fields (homepage, license, version, release date, category) are omitted when unavailable.
Daily at 06:00 UTC via GitHub Actions, plus manual runs.
See `CONTRIBUTING.md` for the contribution workflow and quality checks.
- Data source: Wikidata (CC0)
- Code license: MIT (see `LICENSE`)