Catch unexpected API errors in getPageTitlesAPI
Apparently the initial JSON test is not enough: the JSON can be broken
or unexpected in other ways or at other points.
Fall back to the old scraper in such a case.

Fixes #295, perhaps.

If the scraper doesn't work for the wiki, the dump will fail entirely,
even if the list of titles from the API was almost complete. A different
solution may be in order.
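
One possible shape for such a different solution, as a rough sketch only: keep whatever titles the API returned before it failed, then top up from the scraper, and report whether the list is known to be complete. The names `collect_titles`, `fetch_api` and `fetch_scraper` are hypothetical stand-ins, not functions from dumpgenerator.py, and the sketch is written for Python 3 rather than the Python 2 used by the script.

```python
def collect_titles(fetch_api, fetch_scraper):
    """Gather titles from the API, keeping partial results on failure.

    fetch_api and fetch_scraper are hypothetical callables that return
    an iterable of page titles; either may raise partway through.
    Returns (titles, complete), where complete is False if both
    sources failed before finishing.
    """
    titles = []
    try:
        for title in fetch_api():
            titles.append(title)
        return titles, True
    except Exception as exc:
        print("Error: API listing stopped after %d titles: %s"
              % (len(titles), exc))

    # Fall back to the scraper, but do not discard what the API gave us.
    seen = set(titles)
    try:
        for title in fetch_scraper():
            if title not in seen:
                seen.add(title)
                titles.append(title)
        return titles, True
    except Exception as exc:
        print("Error: scraper failed too, keeping %d titles: %s"
              % (len(titles), exc))
        return titles, False
```

With this shape the caller can decide whether an incomplete list is still worth dumping, instead of aborting the whole run.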
nemobis committed May 19, 2018
1 parent 59c4c54 commit 1ff5af7
Showing 1 changed file with 3 additions and 8 deletions.
11 changes: 3 additions & 8 deletions dumpgenerator.py
@@ -2,7 +2,7 @@
# -*- coding: utf-8 -*-

# dumpgenerator.py A generator of dumps for wikis
-# Copyright (C) 2011-2016 WikiTeam developers
+# Copyright (C) 2011-2018 WikiTeam developers
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
@@ -396,16 +396,11 @@ def getPageTitles(config={}, session=None):

     titles = []
     if 'api' in config and config['api']:
-        r = session.post(config['api'], params={'action': 'query', 'list': 'allpages', 'format': 'json'}, timeout=30)
         try:
-            test = getJSON(r)
+            titles = getPageTitlesAPI(config=config, session=session)
         except:
-            test = None
-        if not test or ('warnings' in test and 'allpages' in test['warnings'] and '*' in test['warnings']['allpages']
-                        and test['warnings']['allpages']['*'] == 'The "allpages" module has been disabled.'):
+            print "Error: could not get page titles from the API"
             titles = getPageTitlesScraper(config=config, session=session)
-        else:
-            titles = getPageTitlesAPI(config=config, session=session)
     elif 'index' in config and config['index']:
         titles = getPageTitlesScraper(config=config, session=session)
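
For readers who want to try the resulting control flow in isolation, here is a minimal Python 3 sketch of the try-the-API-then-fall-back pattern. It is not the code from dumpgenerator.py: `get_page_titles`, `api_fetch` and `scraper_fetch` are hypothetical stand-ins, and the sketch catches `Exception` instead of using a bare `except:`, since a bare `except:` also swallows `KeyboardInterrupt` and `SystemExit`.

```python
def get_page_titles(config, api_fetch, scraper_fetch):
    """Return page titles, preferring the API and falling back to the scraper.

    api_fetch and scraper_fetch are hypothetical callables taking the
    config dict and returning a list of titles.
    """
    titles = []
    if config.get('api'):
        try:
            titles = api_fetch(config)
        except Exception:
            # Any failure inside the API path triggers the fallback,
            # not just a failed initial JSON probe.
            print("Error: could not get page titles from the API")
            titles = scraper_fetch(config)
    elif config.get('index'):
        titles = scraper_fetch(config)
    return titles
```

Note that if the scraper raises as well, the exception still propagates to the caller, which matches the concern raised in the commit message.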
