'maximum template recursion' error after a few hours #2

Closed
agoyaliitk opened this issue Apr 10, 2015 · 15 comments

Comments

@agoyaliitk

Can you explain why this error occurs?
I used the updated version of the script uploaded yesterday.
Now it's giving this error.

Traceback (most recent call last):
  File "./WikiExtractor.py", line 1797, in <module>
    main()
  File "./WikiExtractor.py", line 1793, in main
    process_data(input_file, args.templates, output_splitter)
  File "./WikiExtractor.py", line 1621, in process_data
    extract(id, title, page, output)
  File "./WikiExtractor.py", line 132, in extract
    text = clean(text)
  File "./WikiExtractor.py", line 1256, in clean
    text = expandTemplates(text)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 313, in expandTemplates
    res += text[cur:]
MemoryError

@attardi
Owner

attardi commented Apr 10, 2015

Because there are template definitions that invoke themselves recursively.
In the case in question the template invocation

{{Multiple sclerosis}}

expands to a body

{{Navbox
| name = Demyelinating diseases of CNS
| title = [[Multiple sclerosis]] and other [[demyelinating disease]]s of [[Central nervous system|CNS]]([[ICD-10 Chapter VI: Diseases of the nervous system#%28G35–G37%29 Demyelinating diseases of the central nervous system|G35–G37]], [[List of ICD-9 codes 320–359: diseases of the nervous system#Other disorders of the central nervous system %28340–349%29|340–341]])
|bodyclass = hlist
|{{Multiple sclerosis|state=expanded}})
| titlestyle = background: Silver;
...

and the template expansion procedure would keep expanding forever.
Templates are to be considered as macros, in which recursion is not allowed.

I added a check on the depth of recursive expansion, similar to the one
used in the official code from MediaWiki, to handle these malformed
templates.

-- Beppe
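
Editor's sketch of the kind of depth check described above. This is a minimal illustration, not the code in WikiExtractor.py: the names `MAX_DEPTH`, `TEMPLATES`, `INVOCATION` and `expand` are hypothetical, and the matcher only handles parameterless invocations such as `{{Name}}`. The cap of 16 mirrors the "Reached max template recursion: 16" warning seen in the logs in this thread.

```python
import logging
import re

# Hypothetical stand-ins: a depth cap and a tiny template table.
MAX_DEPTH = 16
TEMPLATES = {
    "Greeting": "Hello, {{Name}}!",
    "Name": "world",
    "Loop": "again {{Loop}}",  # invokes itself, like the navbox above
}

# Matches only innermost invocations without parameters, e.g. {{Name}}.
INVOCATION = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(text, depth=0):
    """Expand {{Name}} invocations, giving up once depth reaches MAX_DEPTH."""
    if depth >= MAX_DEPTH:
        logging.warning("Reached max template recursion: %d", depth)
        return ""  # drop whatever is still unexpanded at this level

    def replace(match):
        # Unknown templates expand to the empty string in this sketch.
        body = TEMPLATES.get(match.group(1).strip(), "")
        return expand(body, depth + 1)

    return INVOCATION.sub(replace, text)
```

With this cap, `expand("{{Loop}}")` terminates instead of recursing forever, while a well-formed template such as `{{Greeting}}` expands normally. Note that a depth cap only bounds how deep the expansion goes; it does not by itself bound how much text each level produces.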

@agoyaliitk
Author

What can I do now to get past this?
It's giving the memory error.

@attardi
Owner

attardi commented Apr 10, 2015

Please tell me the ID number of the article that was printed just before the traceback, and the version of the Wikipedia dump you are using, so that I can investigate.

@agoyaliitk
Author

I guess the id you are asking for would be 66512.
Wikipedia dump https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

I have again attached the error with some more detail.
Thanks for your help:)

INFO:root:66495 Final Fantasy III
INFO:root:66496 Hippogriff
INFO:root:66499 Informal sector
INFO:root:66505 Secrecy
INFO:root:66511 MX record
INFO:root:66512 Fern
WARNING:root:Reached max template recursion: 16
WARNING:root:Reached max template recursion: 16
Traceback (most recent call last):
  File "./WikiExtractor.py", line 1797, in <module>
    main()
  File "./WikiExtractor.py", line 1793, in main
    process_data(input_file, args.templates, output_splitter)
  File "./WikiExtractor.py", line 1621, in process_data
    extract(id, title, page, output)
  File "./WikiExtractor.py", line 132, in extract
    text = clean(text)
  File "./WikiExtractor.py", line 1256, in clean
    text = expandTemplates(text)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 313, in expandTemplates
    res += text[cur:]
MemoryError

@agoyaliitk
Author

Wikipedia dump: enwiki-latest-pages-articles.xml.bz2, dated 06-Apr-2015 22:06, size 11820881800 bytes

@sanja7s

sanja7s commented Apr 10, 2015

I get a similar error (I edited the file a bit as I need only raw text output, no titles or urls, but that should not have changed anything in the core program):

  File "WikiExtractor_v27s.py", line 789, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "WikiExtractor_v27s.py", line 416, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "WikiExtractor_v27s.py", line 327, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "WikiExtractor_v27s.py", line 828, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "WikiExtractor_v27s.py", line 333, in expandTemplates
    res += text[cur:]
MemoryError

And in my case, it has reached 313280 articles before this error. The last article is:

945695 Canada at the 1904 Summer Olympics

The memory consumption I saw during execution was rather interesting, so I took a screenshot at some point:

[screenshot: mem_consumption_wikiextract]

and the Wikipedia dump I use is:
-- 2015-03-07 Recombine articles, templates, media/file descriptions, and primary meta-pages.
-- enwiki-20150304-pages-articles.xml.bz2 10.9 GB

@attardi
Owner

attardi commented Apr 11, 2015

I fixed a few issues and I was able to process the latest Wikipedia dump.
Processing the dump requires about 3GB of memory and runs for several hours.
I have added the option:
--no-templates
for extracting text without expanding templates, as in the previous releases of WikiExtractor.
This reduces the memory needed to about 500MB and significantly speeds up processing, but all templates will be replaced with blanks.
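
Editor's sketch of what "replaced with blanks" can mean in practice: a single pass that drops every balanced `{{...}}` span without looking up or expanding anything. The function name `strip_templates` is hypothetical and this is not taken from WikiExtractor.py. It is only meant to show why skipping expansion needs so little memory: there is no recursion and no template table, and the output is never longer than the input.

```python
def strip_templates(text):
    """Replace every balanced {{...}} span, nested or not, with one space."""
    out = []
    depth = 0  # how many "{{" are currently open
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")  # the blank left where the template was
        else:
            if depth == 0:
                out.append(text[i])  # keep text outside any template
            i += 1
    return "".join(out)
```

The sketch treats an unclosed `{{` as running to the end of the text and does not distinguish triple-brace parameters; a real extractor would need to decide how to handle both.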

@agoyaliitk
Author

I will try it again.
Thanks

@agoyaliitk
Author

No luck.
Still giving the same error.

INFO:root:66499 Informal sector
INFO:root:66505 Secrecy
INFO:root:66511 MX record
INFO:root:66512 Fern
WARNING:root:Max template recursion exceeded!
WARNING:root:Skipping page with empty title
WARNING:root:Max template recursion exceeded!
WARNING:root:Skipping page with empty title
WARNING:root:Max template recursion exceeded!
WARNING:root:Skipping page with empty title
Traceback (most recent call last):
  File "./WikiExtractor.py", line 1838, in <module>
    main()
  File "./WikiExtractor.py", line 1834, in main
    process_data(input_file, args.templates, output_splitter)
  File "./WikiExtractor.py", line 1658, in process_data
    extract(id, title, page, output)
  File "./WikiExtractor.py", line 154, in extract
    text = clean(text)
  File "./WikiExtractor.py", line 1293, in clean
    text = expandTemplates(text)
  File "./WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 838, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 799, in expandTemplate
    params = templateParams(parts[1:], depth+1)
  File "./WikiExtractor.py", line 423, in templateParams
    parameters = [expandTemplates(p, depth) for p in parameters]
  File "./WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 799, in expandTemplate
    params = templateParams(parts[1:], depth+1)
  File "./WikiExtractor.py", line 423, in templateParams
    parameters = [expandTemplates(p, depth) for p in parameters]
  File "./WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 838, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 338, in expandTemplates
    res += text[cur:]
MemoryError

@attardi
Owner

attardi commented Apr 12, 2015

Processing that file on my machine required 5GB of memory.
So it is possible that on your machine the memory gets exhausted.

You can try reducing the maximum depth of recursion, by setting for example

maxTemplateRecursionLevels = 8

If that does not help, you will have to disable templates with option

--no-templates.

Let me know.

-- Beppe


@agoyaliitk
Author

I'll try.
thanks

@cifkao

cifkao commented Apr 12, 2015

I have a similar problem with this article: INFO:root:1908699 Lepospondyli. It takes a lot more time than other articles and then I get this output:

INFO:root:1908699       Lepospondyli
WARNING:root:Max template recursion exceeded!
WARNING:root:Skipping page with empty title
WARNING:root:Max template recursion exceeded!
WARNING:root:Skipping page with empty title
Traceback (most recent call last):
  File "wikiextractor/WikiExtractor.py", line 1838, in <module>
    main()
  File "wikiextractor/WikiExtractor.py", line 1834, in main
    process_data(input_file, args.templates, output_splitter)
  File "wikiextractor/WikiExtractor.py", line 1658, in process_data
    extract(id, title, page, output)
  File "wikiextractor/WikiExtractor.py", line 154, in extract
    text = clean(text)
  File "wikiextractor/WikiExtractor.py", line 1293, in clean
    text = expandTemplates(text)
  File "wikiextractor/WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "wikiextractor/WikiExtractor.py", line 799, in expandTemplate
    params = templateParams(parts[1:], depth+1)
  File "wikiextractor/WikiExtractor.py", line 423, in templateParams
    parameters = [expandTemplates(p, depth) for p in parameters]
  File "wikiextractor/WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "wikiextractor/WikiExtractor.py", line 799, in expandTemplate
    params = templateParams(parts[1:], depth+1)
  File "wikiextractor/WikiExtractor.py", line 423, in templateParams
    parameters = [expandTemplates(p, depth) for p in parameters]
  File "wikiextractor/WikiExtractor.py", line 331, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "wikiextractor/WikiExtractor.py", line 838, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "wikiextractor/WikiExtractor.py", line 338, in expandTemplates
    res += text[cur:]
MemoryError

I had 8 GB of memory reserved for the process.

@agoyaliitk
Author

Got the same error as cifkao.
Memory error after article no. 1908699 Lepospondyli

@attardi
Owner

attardi commented Apr 15, 2015

I have committed a new version that should fix the memory problems.
I completely revised the strategy of parameter evaluation.
For example, in article no. 3616279 Arthrodira, there was a very deep dendrogram whose expansion was exponential in its depth.
Now parameters are expanded before substitution and this solves the problem.
Please try it.
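
Editor's sketch of why the order of evaluation matters. This is a toy model, not the committed fix: invocations are represented as nested tuples rather than wikitext, and the names `nest`, `expand_late` and `expand_early` are hypothetical. The assumed template body mentions its single parameter twice (once in a test, once in the output), which is enough to make late expansion do exponential work in the nesting depth.

```python
def nest(depth):
    """Build T(T(...T("x")...)), nested `depth` times, as tuples."""
    node = "x"
    for _ in range(depth):
        node = ("T", node)
    return node


def expand_late(node, stats):
    """Substitute the raw argument first; expand it at every use in the body."""
    if isinstance(node, str):
        return node
    stats["expansions"] += 1
    arg = node[1]
    # The body mentions its parameter twice, so the still-unexpanded
    # argument is expanded twice: once for the test, once for the output.
    return "[" + expand_late(arg, stats) + "]" if expand_late(arg, stats) else ""


def expand_early(node, stats):
    """Expand the argument once, then substitute the resulting string."""
    if isinstance(node, str):
        return node
    stats["expansions"] += 1
    value = expand_early(node[1], stats)  # single expansion of the argument
    return "[" + value + "]" if value else ""


late, early = {"expansions": 0}, {"expansions": 0}
same_output = expand_late(nest(10), late) == expand_early(nest(10), early)
print(same_output, late["expansions"], early["expansions"])
```

Both strategies produce the same text, but for nesting depth d the late strategy performs 2^d - 1 expansions against d for the early one. In a text-based expander the late strategy also copies the unexpanded argument text at each use, which is one plausible way memory could grow as reported in this thread.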

@attardi attardi closed this as completed Apr 15, 2015
@attardi
Owner

attardi commented Apr 21, 2015 via email
