Don't download dump files that are not done yet #63

vrandezo · 2014-04-21T22:08:39Z

For example, wdtk-example is currently downloading the April 20 dump even though that dump has not been fully generated yet. Can we first check that a dump is complete and available before starting to download it?

mkroetzsch · 2014-04-22T07:13:58Z

I thought we were already doing this. The script always fetches the maxrevid before attempting a download, and it only starts if this id can be found. There are also some checks for the "done" status that are independent of this, but maybe there is a gap in one case. Which kind of dump are you talking about: daily, current, or full?

vrandezo · 2014-04-22T17:52:41Z

It was the current one.

guenthermi · 2014-04-22T23:03:18Z

I ran into the same problem while testing my json-serializer example code. At first I got an error saying that there is no maximal revision id. Later it downloaded the incomplete dump and reported that processing had finished after downloading and processing only 160 MB of the dump file (wikidatawiki-20140420-pages-meta-current).

mkroetzsch · 2014-04-23T07:06:28Z

Confirmed. We have code that checks the md5sums to see if a dump is done, but the download does not use this and relies on the maxrevid alone. My assumption was that the maxrevid is not published before the dump is done, but that seems to be wrong.
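To illustrate the gap described above, here is a minimal sketch of a download guard that requires both signals: a maxrevid and a checksum entry for the dump file. It is not the Wikidata Toolkit code. The class name, the method names, the sample file names, the placeholder hash, and the assumed line format of the md5sums listing (`<hash> <file name>`) are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical,
// not part of the Wikidata Toolkit API.
class DumpAvailabilityCheck {

    /**
     * Returns true if some line of the checksum listing names the dump file.
     * Assumes each line has the form "<md5 hash> <file name>".
     */
    static boolean isListedInChecksums(List<String> checksumLines, String dumpFileName) {
        for (String line : checksumLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[1].equals(dumpFileName)) {
                return true;
            }
        }
        return false;
    }

    /**
     * A dump counts as downloadable only if both signals agree: a maxrevid
     * was found (null means "not found") and the file has a checksum entry.
     */
    static boolean isReadyForDownload(Long maxRevId, List<String> checksumLines,
            String dumpFileName) {
        return maxRevId != null && isListedInChecksums(checksumLines, dumpFileName);
    }

    public static void main(String[] args) {
        // Assumed file names and a placeholder hash, for demonstration only.
        String hash = "d41d8cd98f00b204e9800998ecf8427e";
        String dumpFile = "wikidatawiki-20140420-pages-meta-current.xml.bz2";

        // Listing of a dump that is still being generated: the file is missing.
        List<String> unfinished = Arrays.asList(
                hash + "  wikidatawiki-20140420-site_stats.sql.gz");
        // Listing of a completed dump: the file has a checksum entry.
        List<String> finished = Arrays.asList(
                hash + "  wikidatawiki-20140420-site_stats.sql.gz",
                hash + "  " + dumpFile);

        System.out.println(isReadyForDownload(123L, unfinished, dumpFile));
        System.out.println(isReadyForDownload(123L, finished, dumpFile));
    }
}
```

With a guard of this shape, a published maxrevid alone would no longer be enough to start a download, which is the case reported in this issue.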

* Fix issue #63 by checking availability explicitly
* Finding most recent dump of some type now checks availability
* Better logging output to show which dumps are found/processed
* Updated tests to work with new code

Improve dump downloading behaviour

vrandezo · 2014-04-23T22:42:24Z

Thx!

mkroetzsch added the bug label Apr 23, 2014

mkroetzsch self-assigned this Apr 23, 2014

mkroetzsch mentioned this issue Apr 23, 2014

Improve dump downloading behaviour #65

Merged

vrandezo referenced this issue Apr 23, 2014

Merge pull request #65 from Wikidata/issue-#63

11fb88e

Improve dump downloading behaviour

vrandezo closed this as completed Apr 23, 2014
