Link Wikidata items to large catalogs

soweego: link Wikidata to large catalogs

soweego is a pipeline that connects Wikidata to large-scale third-party catalogs.

soweego is the only system that makes statisticians, epidemiologists, historians, and computer scientists agree. Why? Because it performs record linkage, data matching, and entity resolution at the same time. Too easy, they all seem to be synonyms!

Oh, soweego also embeds Machine Learning and advocates for Linked Data.
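To give a rough idea of what record linkage means, here is a toy sketch that pairs records from two small lists when their normalized names match exactly. It is not how soweego's linkers work: the sample records, the identifiers, and the `normalize` and `link_by_name` helpers are all made up for this example.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def link_by_name(source, target):
    """Return (source_id, target_id) pairs whose normalized names are equal."""
    # Index the target records by normalized name for constant-time lookups
    index = {}
    for target_id, name in target:
        index.setdefault(normalize(name), []).append(target_id)

    links = []
    for source_id, name in source:
        for target_id in index.get(normalize(name), []):
            links.append((source_id, target_id))
    return links


# Made-up sample data: (identifier, name) pairs
wikidata_like = [("wd-1", "Björk"), ("wd-2", "Miles  Davis")]
catalog_like = [("a-10", "bjork"), ("a-11", "MILES DAVIS"), ("a-12", "John Doe")]

links = link_by_name(wikidata_like, catalog_like)
print(links)
```

Exact matching on names is the simplest possible baseline. Real catalogs need more evidence than a name, which is where a supervised linker comes in.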

Is soweego similar to the Go game?

Official Project Page

soweego is made possible thanks to the Wikimedia Foundation:

https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego

Documentation

https://soweego.readthedocs.io/

Highlights

Get Ready

Install Docker and Compose, then enter soweego:

$ git clone https://github.com/Wikidata/soweego.git
$ cd soweego
$ ./docker/run.sh
Building soweego
...

root@70c9b4894a30:/app/soweego#

Now it's too late to get out!

Run the Pipeline

Piece of cake:

:/app/soweego# python -m soweego run CATALOG

Pick CATALOG from discogs, imdb, or musicbrainz.
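If you prefer to drive the run from a script, the command above can be assembled in Python. This is only a sketch: `build_run_command` is a hypothetical helper, not part of soweego, and it assumes you are inside the soweego container where `python -m soweego` is available.

```python
import sys

# The catalogs listed above
CATALOGS = ("discogs", "imdb", "musicbrainz")


def build_run_command(catalog):
    """Return the argument list for `python -m soweego run CATALOG`."""
    if catalog not in CATALOGS:
        raise ValueError(f"unknown catalog {catalog!r}, pick one of {CATALOGS}")
    return [sys.executable, "-m", "soweego", "run", catalog]


commands = [build_run_command(catalog) for catalog in CATALOGS]
for command in commands:
    print(" ".join(command))
    # Inside the container, each command could then be executed with:
    # subprocess.run(command, check=True)
```

Validating the catalog name up front avoids starting a long run with a typo.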

These steps are executed by default:

  1. import the target catalog into a local database;
  2. link Wikidata to the target with a supervised linker;
  3. synchronize Wikidata to the target.

Results are in /app/shared/results.
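To see what a run produced, you can list that folder from Python. The sketch below makes no assumption on file names or formats: it just walks the directory and sorts files by modification time. `list_results` is a hypothetical helper, and the recursive walk is a guess in case results are nested.

```python
from pathlib import Path


def list_results(results_dir="/app/shared/results"):
    """Return the files under results_dir, newest first."""
    root = Path(results_dir)
    if not root.is_dir():
        # Nothing to show if the pipeline has not run yet
        return []
    files = [path for path in root.rglob("*") if path.is_file()]
    return sorted(files, key=lambda path: path.stat().st_mtime, reverse=True)


for path in list_results():
    print(path)
```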

Use the Command Line

You can launch every single soweego action with CLI commands:

:/app/soweego# python -m soweego
Usage: soweego [OPTIONS] COMMAND [ARGS]...

  Link Wikidata to large catalogs.

Options:
  -l, --log-level <TEXT CHOICE>...
                                  Module name followed by one of [DEBUG, INFO,
                                  WARNING, ERROR, CRITICAL]. Multiple pairs
                                  allowed.
  --help                          Show this message and exit.

Commands:
  importer  Import target catalog dumps into a SQL database.
  ingester  Take soweego output into Wikidata items.
  linker    Link Wikidata items to target catalog identifiers.
  run       Launch the whole pipeline.
  sync      Sync Wikidata to target catalogs.
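The `--log-level` option takes a module name followed by a level. Here is a sketch of how the pairs could be assembled in Python. It assumes the option is repeated once per pair and placed before the command, which is how the usage line reads but is not spelled out in the help text. The module names and the `log_level_args` helper are made up for the example.

```python
import sys

# The levels listed in the help message
LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def log_level_args(pairs):
    """Turn (module, level) pairs into repeated `-l MODULE LEVEL` arguments."""
    args = []
    for module, level in pairs:
        if level not in LEVELS:
            raise ValueError(f"invalid level {level!r}, expected one of {LEVELS}")
        args += ["-l", module, level]
    return args


# Assumed module names: adjust them to the modules you want to trace
pairs = [("soweego.importer", "DEBUG"), ("soweego.linker", "WARNING")]
command = [sys.executable, "-m", "soweego", *log_level_args(pairs), "run", "imdb"]
print(" ".join(command))
```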

Just two things to remember:

  1. you can always get --help;
  2. each command may have sub-commands.
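Both points can be explored with a small script that builds one `--help` call per command. This is a sketch: `help_command` is a hypothetical helper, and no sub-command names are assumed since none are listed in the output above.

```python
import sys

# The commands listed in the help message
COMMANDS = ("importer", "ingester", "linker", "run", "sync")


def help_command(*path):
    """Return the argument list that asks for --help at the given command path."""
    if path and path[0] not in COMMANDS:
        raise ValueError(f"unknown command {path[0]!r}, expected one of {COMMANDS}")
    return [sys.executable, "-m", "soweego", *path, "--help"]


for name in COMMANDS:
    print(" ".join(help_command(name)))
    # Inside the container, run it to discover the sub-commands:
    # subprocess.run(help_command(name), check=True)
```

Once a sub-command shows up in the help output, pass it as a second argument, for instance `help_command("importer", SUBCOMMAND)`, to go one level deeper.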

Contribute

The best way to contribute is to import a new catalog. Please also have a look at the guidelines.

License

The source code is released under the terms of the GNU General Public License, version 3.
