'maximum template recursion' error after a few hours #2
On 4/10/2015 09:00, agoyaliitk wrote:
{{Multiple sclerosis}} expands to a body {{Navbox and the template expansion procedure would keep expanding forever. I added a check on the depth of recursive expansion, similar to the one -- Beppe |
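For readers following along, the kind of depth check described above can be sketched roughly like this. This is only an illustration, not the actual WikiExtractor code: the `TEMPLATES` table, the `MAX_DEPTH` value and the `expand` function are all made up for the example, and the regex only handles invocations that contain no nested braces.

```python
import re

# Hypothetical template store (name -> body), not real Wikipedia data.
# The two templates refer to each other, so naive expansion never terminates.
TEMPLATES = {
    "Multiple sclerosis": "{{Navbox|name=Multiple sclerosis}}",
    "Navbox": "{{Multiple sclerosis}}",
}

MAX_DEPTH = 8  # hypothetical limit, chosen only for this sketch

# Matches a {{...}} invocation that contains no nested braces.
INNERMOST = re.compile(r"\{\{([^{}]*)\}\}")


def expand(text, depth=0):
    """Expand {{name|args}} invocations, giving up once MAX_DEPTH is reached."""
    if depth >= MAX_DEPTH:
        # Too deep: drop any remaining invocations instead of recursing further.
        return INNERMOST.sub("", text)

    def replace(match):
        # The template name is everything before the first '|'.
        name = match.group(1).split("|", 1)[0].strip()
        body = TEMPLATES.get(name, "")  # unknown templates expand to nothing
        return expand(body, depth + 1)

    return INNERMOST.sub(replace, text)


print(repr(expand("MS: {{Multiple sclerosis}}")))
```

Without the `depth >= MAX_DEPTH` test, the two templates above would call each other until Python itself raised a RecursionError.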
What can I do now to get past this? |
Please tell me the ID number of the article that was printed before the traceback, and the version of the Wikipedia dump you are using, so that I can investigate. |
I guess the id you are asking for would be 66512. I have again attached the error with some more detail. INFO:root:66495 Final Fantasy III |
Wikipedia dump 06-Apr-2015 22:06 11820881800 |
I get a similar error (I edited the file a bit because I need only raw text output, without titles or URLs, but that should not have changed anything in the core program): File "WikiExtractor_v27s.py", line 789, in expandTemplate. In my case, it reached 313280 articles before this error. The last article is: 945695 Canada at the 1904 Summer Olympics. The memory consumption during the execution looked rather interesting, so I took a screenshot at some point: and the Wikipedia dump I use is: |
I fixed a few issues and I was able to process the latest Wikipedia dump. |
I will try it again. |
No luck. INFO:root:66499 Informal sector |
Processing that file on my machine required 5GB of memory. You can try reducing the maximum depth of recursion by setting, for example, maxTemplateRecursionLevels = 8. If that does not help, you will have to disable templates with the option --no-templates. Let me know. -- Beppe On 4/11/2015 22:39, agoyaliitk wrote:
|
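A rough way to see why lowering the recursion limit can help with memory: if every template pulled in a few others at each level, the worst-case number of expansions would grow exponentially with the allowed depth. The sketch below is only back-of-the-envelope arithmetic. The fan-out of 3 and the comparison value of 16 are assumptions made up for the illustration, not measurements or WikiExtractor defaults, and `worst_case_expansions` is a hypothetical helper.

```python
def worst_case_expansions(fan_out, max_depth):
    """Count expansions if every template invokes `fan_out` others at each
    level, from depth 0 down to and including `max_depth`."""
    return sum(fan_out ** level for level in range(max_depth + 1))


# Compare an assumed larger limit (16) with the suggested value of 8.
for limit in (16, 8):
    print(limit, worst_case_expansions(3, limit))
```

Real articles are nowhere near this regular, so treat the numbers only as an indication of how quickly the work can grow with depth.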
I'll try. |
I have a similar problem with this article:
I had 8 GB of memory reserved for the process. |
Got the same error as cifkao. |
I have committed a new version that should fix the memory problems. |
To all of you who complained about memory or speed problems with
WikiExtractor, I released a new version that performs better and keeps a
cache of parsed templates.
I have tested it on the English Wikipedia and it runs 5 times faster,
while using 20% more memory (4GB).
There were also numerous bug fixes.
There is also a new command line option --xml that attempts to produce
HTML instead of pure text, preserving headings, lists and links.
Thank you for your patience.
-- Beppe Attardi
|
Can you explain why this error occurs?
I used the updated version of the script uploaded yesterday.
Now it's giving this error.
Traceback (most recent call last):
  File "./WikiExtractor.py", line 1797, in <module>
    main()
  File "./WikiExtractor.py", line 1793, in main
    process_data(input_file, args.templates, output_splitter)
  File "./WikiExtractor.py", line 1621, in process_data
    extract(id, title, page, output)
  File "./WikiExtractor.py", line 132, in extract
    text = clean(text)
  File "./WikiExtractor.py", line 1256, in clean
    text = expandTemplates(text)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 769, in expandTemplate
    params = templateParams(parts[1:], depth)
  File "./WikiExtractor.py", line 396, in templateParams
    parameters = [expandTemplates(p, frame) for p in parameters]
  File "./WikiExtractor.py", line 307, in expandTemplates
    res += expandTemplate(text[s+2:e-2], depth+l)
  File "./WikiExtractor.py", line 808, in expandTemplate
    ret = expandTemplates(template, depth + 1)
  File "./WikiExtractor.py", line 313, in expandTemplates
    res += text[cur:]
MemoryError