Download and manage Wikidata dumpfiles #8

Closed
mkroetzsch opened this issue Feb 18, 2014 · 2 comments
@mkroetzsch
Member

There should be a component to download and manage dump files in the format provided for Wikidata.org. It should access dumps from a specified location, find out which dumps are available, and fetch dumps as needed. Relevant types of dumps (current revisions, full, daily) should be distinguished and treated suitably. The component should provide transparent access to any of these files, so that client components need not know about a file's location or compression format.
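
A rough Java sketch of the "find out which dumps are available" step, assuming the dump index is an HTML directory listing in which each dump appears as a link of the form `href="YYYYMMDD/"`. The class `DumpIndexParser` and its methods are hypothetical illustrations, not an existing API, and the real page layout may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts dump dates (YYYYMMDD) from the HTML of a
 * directory listing. Assumes each dump is a subdirectory linked as
 * href="YYYYMMDD/".
 */
public class DumpIndexParser {

    // Matches links to date-named subdirectories, e.g. href="20140210/"
    private static final Pattern DATE_LINK =
            Pattern.compile("href=\"(\\d{8})/\"");

    /** Returns the distinct dump dates found, oldest first. */
    public static List<String> findDumpDates(String html) {
        // YYYYMMDD strings sort chronologically as plain text
        TreeSet<String> dates = new TreeSet<>();
        Matcher m = DATE_LINK.matcher(html);
        while (m.find()) {
            dates.add(m.group(1));
        }
        return new ArrayList<>(dates);
    }

    /** Returns the most recent dump date, or null if none was found. */
    public static String latestDumpDate(String html) {
        List<String> dates = findDumpDates(html);
        return dates.isEmpty() ? null : dates.get(dates.size() - 1);
    }
}
```

Because the dates sort correctly as strings, a `TreeSet` is enough to order them and to drop links that appear more than once; non-date entries such as `latest/` are simply ignored.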

@mkroetzsch mkroetzsch added this to the Wikidata Toolkit 0.1 milestone Feb 18, 2014
@mkroetzsch mkroetzsch self-assigned this Feb 18, 2014
mkroetzsch added a commit that referenced this issue Feb 21, 2014
There are several types of dump files provided: full dumps,
dumps of current versions, and incremental (daily) dumps.
They can be found online and, once downloaded, also
locally. The class MediaWikiDumpFile represents such a dump
and provides transparent access to its contents (wherever
it comes from and whatever type it is).
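
One way to get transparent access of this kind, sketched in Java under the assumption that the caller already holds a raw `InputStream` for the file: detect gzip by its magic bytes and wrap the stream accordingly. `TransparentDumpAccess` and its `DumpType` enum are hypothetical names, not the toolkit's API. Bzip2, which dumps.wikimedia.org also uses, would need a third-party decompressor and is left out here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/**
 * Sketch: callers get a plain InputStream whether or not the underlying
 * data was gzip-compressed. Names are illustrative only.
 */
public class TransparentDumpAccess {

    /** The three kinds of dumps named in the commit message. */
    public enum DumpType { FULL, CURRENT, DAILY }

    /**
     * Wraps the raw stream so that gzip data is decompressed on the fly.
     * Detection uses the gzip magic bytes 0x1f 0x8b, not the file name.
     */
    public static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset(); // rewind so the peeked bytes are not lost
        if (b1 == 0x1f && b2 == 0x8b) {
            return new GZIPInputStream(in);
        }
        return in; // not gzip: pass the data through unchanged
    }
}
```

Sniffing the content rather than trusting the file extension keeps the caller independent of how the file was named or where it was stored.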

The WmfDumpFileManager provides methods for creating
MediaWikiDumpFile objects from the data seen online or in local
files. Classes that implement the interface DumpFileProcessor
can ask WmfDumpFileManager to call them for all relevant dump
files in the right order. This is the preferred way of processing
all dumps.
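
A minimal sketch of what "all relevant dump files in the right order" could mean, assuming the order is: the most recent current-revisions dump first, then every newer daily dump, oldest first. The types `Dump`, `Processor`, and the method `processAll` are hypothetical stand-ins, not the actual `WmfDumpFileManager` or `DumpFileProcessor` signatures, and full dumps are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of ordered dump processing; all names are hypothetical. */
public class DumpOrdering {

    public enum DumpType { CURRENT, DAILY }

    /** Minimal stand-in for a dump file: its type and YYYYMMDD date. */
    public static final class Dump {
        public final DumpType type;
        public final String date;

        public Dump(DumpType type, String date) {
            this.type = type;
            this.date = date;
        }

        @Override
        public String toString() {
            return type + "@" + date;
        }
    }

    /** Hypothetical callback, similar in spirit to DumpFileProcessor. */
    public interface Processor {
        void process(Dump dump);
    }

    /** Calls the processor for all relevant dumps, in processing order. */
    public static void processAll(List<Dump> available, Processor processor) {
        // Pick the newest current-revisions dump as the starting point.
        Dump base = null;
        for (Dump d : available) {
            if (d.type == DumpType.CURRENT
                    && (base == null || d.date.compareTo(base.date) > 0)) {
                base = d;
            }
        }
        // Only dailies newer than the base dump add information.
        List<Dump> dailies = new ArrayList<>();
        for (Dump d : available) {
            if (d.type == DumpType.DAILY
                    && (base == null || d.date.compareTo(base.date) > 0)) {
                dailies.add(d);
            }
        }
        dailies.sort(Comparator.comparing((Dump d) -> d.date));
        if (base != null) {
            processor.process(base);
        }
        for (Dump d : dailies) {
            processor.process(d);
        }
    }
}
```

Keeping the ordering logic in the manager means each processor only has to handle one dump at a time and never needs to know which files exist.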

There are no tests yet. Testing will require suitable mock objects
to simulate the Web and the file system. The code provides places
to inject these mock objects.
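
A sketch of the kind of injection point meant here, assuming constructor injection of a small interface that abstracts web access. `WebFetcher`, `MockFetcher`, and `DumpIndexReader` are hypothetical names; a mock file system could be injected the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of constructor injection for testability. Production code would
 * pass a fetcher that opens real HTTP connections; tests pass MockFetcher.
 */
public class InjectionSketch {

    /** Abstraction over "the Web", so it can be replaced in tests. */
    public interface WebFetcher {
        InputStream open(String url) throws IOException;
    }

    /** Test double that serves canned responses from memory. */
    public static class MockFetcher implements WebFetcher {
        private final Map<String, String> pages = new HashMap<>();

        public void setPage(String url, String content) {
            pages.put(url, content);
        }

        @Override
        public InputStream open(String url) throws IOException {
            String content = pages.get(url);
            if (content == null) {
                throw new IOException("No mock content for " + url);
            }
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Component under test: depends only on the injected fetcher. */
    public static class DumpIndexReader {
        private final WebFetcher fetcher;

        public DumpIndexReader(WebFetcher fetcher) {
            this.fetcher = fetcher;
        }

        public String readIndex(String url) throws IOException {
            try (InputStream in = fetcher.open(url)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

A test can then preload `MockFetcher` with a fake index page and check the component's behavior without any network access.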

This contributes to issue #8.
@nlothian

nlothian commented Mar 2, 2014

Are these dumps going to be in RDF format?

@mkroetzsch
Member Author

We are talking about the dumps at http://dumps.wikimedia.org/ here. The goal of this task is to be able to import the data from Wikidata into our system. We plan to create RDF output dumps from that data, but this is another issue: #14.
