Pause when running with "jawiki-20160111-pages-articles-multistream.xml" #52

Closed
YMMS opened this issue Mar 2, 2016 · 3 comments

Comments

@YMMS

YMMS commented Mar 2, 2016

Hi Sir,
I ran your tool "wikiextractor.py" on Ubuntu 14.04 with "jawiki-20160111-pages-articles-multistream.xml", but it stalled partway through.
The last output lines are below:
779 WARNING: Template errors in article 'ファイル:Hakutaka-485.jpeg' (303793): title(0) recursion(5, 0, 0)
780 WARNING: Template errors in article 'ファイル:Maizurukoyuuransen01.jpg' (303877): title(0) recursion(5, 0, 0)
781 WARNING: Template errors in article 'ファイル:Chigasakishiyakusyo041111.jpg' (303902): title(0) recursion(5, 0, 0)

The process state is shown below:
  PID USER     PR NI   VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
22415 yangming 20  0 517612 397328 2012 R 99.7  0.2 134:37.88 python

@attardi
Owner

attardi commented Mar 5, 2016

Looks like the extractor processes have run into a deadlock.
For the moment, try running with a single process, using the option:

--processes 1
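
For anyone scripting the run, here is a minimal Python sketch of assembling that command line. Only `--processes 1` and the dump filename come from this thread; the helper name `build_command`, the `python` interpreter name, and the script path `WikiExtractor.py` are assumptions for illustration and may need adjusting for your checkout.

```python
import subprocess


def build_command(dump_path, script="WikiExtractor.py", processes=1):
    """Return the argv list for an extractor run with a fixed process count.

    `script` and the "python" interpreter name are assumed values, not
    something this thread specifies.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return ["python", script, "--processes", str(processes), dump_path]


def run_extractor(dump_path):
    """Run the extractor with a single process (can take hours on a full dump)."""
    subprocess.run(build_command(dump_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_extractor() to actually start it.
    print(" ".join(build_command("jawiki-20160111-pages-articles-multistream.xml")))
```

Keeping the command construction separate from the call that launches it makes it easy to inspect the exact arguments before starting a long extraction.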

@attardi
Owner

attardi commented Mar 6, 2016

Please check that you are using version 2.51 or later; with that version it works for me on that particular dump.
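
A small sketch of that version check in Python, in case it helps. It assumes you already have the version string of your copy in hand and that version components compare as integers (so "2.6" would count as older than "2.51"); the function names are hypothetical, not part of the extractor.

```python
def version_tuple(version):
    """Convert a dotted version string such as '2.51' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_recent_enough(installed, minimum="2.51"):
    """Return True if `installed` is at least `minimum`, comparing component-wise."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ("2.50", "2.51", "2.55"):
        print(candidate, is_recent_enough(candidate))
```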

@YMMS
Author

YMMS commented Mar 8, 2016

Thank you, Sir.

YMMS closed this as completed Mar 8, 2016