soweego: link Wikidata to large catalogs

soweego is a pipeline that connects Wikidata to large-scale third-party catalogs.

soweego is the only system that makes statisticians, epidemiologists, historians, and computer scientists agree. Why? Because it performs record linkage, data matching, and entity resolution at the same time. Too easy, they all seem to be synonyms!

Oh, soweego also embeds Machine Learning and advocates for Linked Data.

Is soweego similar to the Go game?

Official Project Page

soweego is made possible thanks to the Wikimedia Foundation.

Get Ready

Install Docker and Compose, then enter soweego:

$ git clone
$ cd soweego
$ ./docker/
Building soweego


Now it's too late to get out!

Run the Pipeline

Piece of cake:

:/app/soweego# python -m soweego run CATALOG

Pick CATALOG from discogs, imdb, or musicbrainz.
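
When scripting several runs, it can help to check the catalog name before invoking the pipeline. The following is a minimal sketch and not part of soweego: `SUPPORTED_CATALOGS`, `build_run_command`, and `run_pipeline` are hypothetical names, and the command simply mirrors the `python -m soweego run CATALOG` invocation shown above.

```python
import subprocess

# Catalog names listed in this README; the tuple itself is an assumption
# made for illustration, not a soweego constant.
SUPPORTED_CATALOGS = ('discogs', 'imdb', 'musicbrainz')


def build_run_command(catalog):
    """Return the argument list for `python -m soweego run CATALOG`."""
    if catalog not in SUPPORTED_CATALOGS:
        raise ValueError(
            f'Unknown catalog {catalog!r}: pick one of {SUPPORTED_CATALOGS}'
        )
    return ['python', '-m', 'soweego', 'run', catalog]


def run_pipeline(catalog):
    """Launch the pipeline; meant to be called inside the soweego container."""
    return subprocess.run(build_run_command(catalog), check=True)


if __name__ == '__main__':
    # Only print the commands here; call run_pipeline() to actually run them.
    for name in SUPPORTED_CATALOGS:
        print(' '.join(build_run_command(name)))
```

Building the argument list separately from running it keeps the validation easy to test outside the container.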

These steps are executed by default:

  1. import the target catalog into a local database;
  2. link Wikidata to the target with a supervised linker;
  3. synchronize Wikidata to the target.

Results are in /app/shared/results.
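
To inspect what a run produced, you can list the contents of the results folder. This is a minimal sketch: `list_results` is a hypothetical helper, and it makes no assumption about the names or formats of the files soweego writes. It only relies on the `/app/shared/results` path mentioned above.

```python
from pathlib import Path

# Results folder inside the container, as stated above.
RESULTS_DIR = Path('/app/shared/results')


def list_results(results_dir=RESULTS_DIR):
    """Return sorted (relative path, size in bytes) pairs for every file found."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        # Nothing to list, for instance when no run has completed yet.
        return []
    return sorted(
        (str(path.relative_to(results_dir)), path.stat().st_size)
        for path in results_dir.rglob('*')
        if path.is_file()
    )


if __name__ == '__main__':
    for name, size in list_results():
        print(f'{size:>12}  {name}')
```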

Use the Command Line

You can launch every single soweego action with CLI commands:

:/app/soweego# python -m soweego
Usage: soweego [OPTIONS] COMMAND [ARGS]...

  Link Wikidata to large catalogs.

Options:
  -l, --log-level <TEXT CHOICE>...
                                  Module name followed by one of [DEBUG, INFO,
                                  WARNING, ERROR, CRITICAL]. Multiple pairs
  --help                          Show this message and exit.

Commands:
  importer  Import target catalog dumps into a SQL database.
  ingester  Take soweego output into Wikidata items.
  linker    Link Wikidata items to target catalog identifiers.
  run       Launch the whole pipeline.
  sync      Sync Wikidata to target catalogs.

Just two things to remember:

  1. you can always get --help;
  2. each command may have sub-commands.
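
Both points can be explored by asking each command for its help text. The sketch below only assembles command lines and is not part of soweego: `help_command` and `with_log_level` are hypothetical helpers, the module name `soweego.importer` is a placeholder, and placing `-l` before the command is an assumption based on the top-level usage message.

```python
# Top-level commands and log levels, copied from the usage message above.
COMMANDS = ('importer', 'ingester', 'linker', 'run', 'sync')
LOG_LEVELS = ('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL')
BASE = ['python', '-m', 'soweego']


def help_command(*path):
    """Build `python -m soweego [COMMAND [SUB-COMMAND ...]] --help`."""
    if path and path[0] not in COMMANDS:
        raise ValueError(f'Unknown command: {path[0]!r}')
    return BASE + list(path) + ['--help']


def with_log_level(module, level, *args):
    """Prefix a command with a `-l MODULE LEVEL` pair (assumed placement)."""
    if level not in LOG_LEVELS:
        raise ValueError(f'Unknown log level: {level!r}')
    return BASE + ['-l', module, level] + list(args)


if __name__ == '__main__':
    for command in COMMANDS:
        print(' '.join(help_command(command)))
    # 'soweego.importer' is a placeholder module name.
    print(' '.join(with_log_level('soweego.importer', 'DEBUG', 'run', 'imdb')))
```

Sub-command names are not validated here, since this README does not list them; `--help` on the parent command is the way to discover them.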


Contribute

The best way to contribute is to import a new catalog. Please also have a look at the guidelines.


License

The source code is released under the terms of the GNU General Public License, version 3.
