Stopped making progress #60

Closed
ghost opened this issue Apr 25, 2016 · 8 comments

Comments

ghost commented Apr 25, 2016

I am processing the dump of 20160305.

The script ran for about 20 hours and then just stopped making any further progress. I saw two Unix processes but they were both sleeping.

The last few outputs were:

WARNING: Template errors in article 'Shawn Matthias' (15299966): title(2) recursion(0, 0, 0)
WARNING: Template errors in article 'Rainey Street Historic District (Austin, Texas)' (15301930): title(0) recursion(116, 0, 0)
WARNING: Template errors in article 'Alfred Neuland' (15304281): title(2) recursion(0, 0, 0)
WARNING: Template errors in article 'Humberto Mariles' (15305453): title(2) recursion(0, 0, 0)
WARNING: Template errors in article 'Rubén Uriza' (15305737): title(2) recursion(0, 0, 0)
WARNING: Template errors in article 'Santiago Ramírez' (15306967): title(2) recursion(0, 0, 0)
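To see which page was the last one reported before the stall, the warning lines above can be parsed mechanically. The following is only a rough sketch: the regular expression is inferred from the log lines shown here, and the function names and dictionary keys (`parse_warning`, `last_reported_page`, `title_count`, `recursion`) are hypothetical labels, not part of the extractor.

```python
import re

# Pattern inferred from the log lines quoted in this thread, e.g.
# WARNING: Template errors in article 'Shawn Matthias' (15299966): title(2) recursion(0, 0, 0)
WARNING_RE = re.compile(
    r"^WARNING: Template errors in article '(?P<title>.*)' \((?P<page_id>\d+)\): "
    r"title\((?P<title_count>\d+)\) recursion\((?P<recursion>\d+(?:, \d+)*)\)$"
)


def parse_warning(line):
    """Return the fields of one warning line as a dict, or None if it does not match."""
    match = WARNING_RE.match(line.strip())
    if match is None:
        return None
    return {
        "title": match.group("title"),
        "page_id": int(match.group("page_id")),
        # The key names below are illustrative; the log only prints the counters.
        "title_count": int(match.group("title_count")),
        "recursion": tuple(int(n) for n in match.group("recursion").split(", ")),
    }


def last_reported_page(lines):
    """Return the parsed fields of the last warning line found, or None."""
    last = None
    for line in lines:
        parsed = parse_warning(line)
        if parsed is not None:
            last = parsed
    return last
```

With more than one worker process, the last warning printed is not necessarily the page a worker is stuck on, so treat the result as a hint rather than a diagnosis.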
Owner

attardi commented Apr 25, 2016

This looks like a bug in an earlier version that has been fixed.
Which version are you running?

Author

ghost commented Apr 25, 2016

I pulled from GitHub; the copy I have has a changelog whose most recent entry is dated 2016-03-23, and setup.py gives version 2.42. So it appears to be the latest.

Owner

attardi commented Apr 25, 2016

Try running with a single process using the option:

--processes 1

Owner

attardi commented Apr 25, 2016

Which dump are you using?

Author

ghost commented Apr 26, 2016

The full English dump. It's about 55 GB.

It would be great to have some kind of resume option for cases like this, so I don't have to throw away many hours of compute time.
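To make the request concrete, here is a minimal sketch of what such a resume mechanism could look like. It is not something the extractor provides: `load_checkpoint`, `save_checkpoint`, `process_with_resume`, the JSON checkpoint file, and the `every` parameter are all hypothetical, and the sketch assumes pages are read in the same order on every run.

```python
import json
import os


def load_checkpoint(path):
    """Return how many pages earlier runs completed, or 0 if there is no checkpoint."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f)["pages_done"]


def save_checkpoint(path, pages_done):
    """Write the checkpoint to a temporary file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"pages_done": pages_done}, f)
    os.replace(tmp_path, path)  # a rename avoids leaving a half-written checkpoint


def process_with_resume(pages, handle_page, checkpoint_path, every=1000):
    """Call handle_page on each page, skipping those already covered by the checkpoint.

    `pages` is any iterable that yields pages in a stable order (an assumption).
    """
    already_done = load_checkpoint(checkpoint_path)
    completed = already_done
    for index, page in enumerate(pages):
        if index < already_done:
            continue  # handled in an earlier run
        handle_page(page)
        completed = index + 1
        if completed % every == 0:
            save_checkpoint(checkpoint_path, completed)
    save_checkpoint(checkpoint_path, completed)
    return completed
```

Skipped pages still have to be read from the dump, but they are not processed again. Pages handled after the last saved checkpoint are processed a second time on restart, so the output step would need to tolerate duplicates or truncate back to the checkpoint.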

Author

ghost commented Apr 27, 2016

I restarted the job about 36 hours ago and it has progressed beyond that point. Not sure what happened.


benjaminderei commented May 7, 2016

Same problem with the latest wikiextractor on the latest French dump!

WARNING: Template errors in article 'Canadair CL-215' (795038): title(1) recursion(0, 0, 0)
WARNING: Template errors in article 'Douglas DC-8' (795419): title(1) recursion(0, 0, 0)
WARNING: Template errors in article 'Douglas DC-7' (795676): title(1) recursion(0, 0, 0)
WARNING: Template errors in article 'Wikipédia:Bulletin des administrateurs/2006/Semaine 22' (797869): title(0) recursion(142, 0, 0)

And when I kill the task:
ERROR: Processing page: 767718 Wikipédia:Le Bistro/24 octobre 2006
Link to the blocking article: https://fr.wikipedia.org/wiki/Wikipédia:Le_Bistro/24_octobre_2006


Update: I relaunched the process with only one thread, and it now seems to get further.

attardi closed this as completed Aug 9, 2016