Data Science Ontology

The Data Science Ontology is a knowledge base about data science that aims to:

  • catalog and formalize the concepts of data science
  • semantically annotate popular software packages for data science
  • power new AI assistants for data scientists

To learn more about the Data Science Ontology, start here.

The Data Science Ontology is young but growing! We welcome contributions of concepts and annotations. Learn how to contribute. For improvements to the web frontend, please visit the dedicated frontend repository.

Developer documentation

Getting started on your machine

Ensure jq, pandoc-citeproc, and npm are installed.

To install the JavaScript-based dependencies: npm install

To build the ontology into the build folder: npm run build

To validate the ontology after building: npm run validate
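The setup steps above can be chained into one script. The following is a minimal sketch, not part of the repository: the function names `missing_tools` and `build_ontology` are hypothetical, and it assumes each prerequisite is available as an executable on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given tool that is not on PATH,
# one per line. Prints nothing when all tools are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical wrapper: check the prerequisites, then install the dependencies,
# build the ontology into the build folder, and validate the result.
build_ontology() {
  missing=$(missing_tools jq pandoc-citeproc npm)
  if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
    return 1
  fi
  # Stop at the first failing step so validation never runs on a broken build.
  npm install && npm run build && npm run validate
}
```

After sourcing the script, run `build_ontology` from the repository root.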

Uploading to a CouchDB database

The following steps assume that you are using the IBM Cloud free tier, but they can be adapted to other CouchDB services.

  1. Create a Cloudant resource.
  2. On the Service Details page, choose Launch Cloudant Dashboard.
  3. On the Databases page, choose Create Database.
  4. Name your database data-science-ontology, choose Non-partitioned, and then choose Create.
  5. On the Account page, under the Settings tab, copy the External Endpoint (preferred) value, and assign it to the COUCH_URL environment variable (note: do not use a trailing slash).
  6. Use IAM to set up an API key and assign it to the IAM_API_KEY environment variable.
  7. Run npm run upload-couchdb.

If you want to re-run step 7 after a new build, run npm run clean-couchdb first. Note that this removes all non-design documents from your database.
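Steps 5 through 7 and the re-run case can be sketched as a small shell helper. This is an illustration under stated assumptions, not part of the repository: the function names `strip_trailing_slash` and `upload_ontology` and the `--clean` flag are hypothetical. Only the environment variables COUCH_URL and IAM_API_KEY and the two npm scripts come from the instructions above.

```shell
#!/bin/sh
# Hypothetical helper: remove a single trailing slash from a URL,
# because COUCH_URL must not end with one.
strip_trailing_slash() {
  printf '%s\n' "${1%/}"
}

# Hypothetical wrapper around step 7. Pass --clean (an illustrative flag)
# to remove the previously uploaded documents before uploading again.
upload_ontology() {
  if [ -z "${COUCH_URL:-}" ] || [ -z "${IAM_API_KEY:-}" ]; then
    echo "Set COUCH_URL and IAM_API_KEY before uploading." >&2
    return 1
  fi
  COUCH_URL=$(strip_trailing_slash "$COUCH_URL")
  export COUCH_URL IAM_API_KEY
  if [ "${1:-}" = "--clean" ]; then
    # Removes all non-design documents from the database.
    npm run clean-couchdb || return 1
  fi
  npm run upload-couchdb
}
```

After a new build, `upload_ontology --clean` would clear the earlier upload and then upload again. Keep the API key out of version control, for example by exporting it only in your current shell session.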