NameError: global name 'templatePrefix' is not defined #34

ehsanasgarian · 2015-10-04T03:38:16Z

I encountered this error after running WikiExtractor.py (Python 2.7 on Windows 8.1 x64) on a Farsi wiki dump.
Can you explain why this error occurs?

python h:\wiki\WikiExtractor.py h:\wiki\fawiki-20150602-pages-articles.xml.bz2 -cb 5M -o h:\wiki\extracted --processes 1
INFO: Preprocessing 'h:\wiki\fawiki-20150602-pages-articles.xml.bz2' to collect template definitions: this may take some time.
INFO: Preprocessed 100000 pages
INFO: Preprocessed 200000 pages
INFO: Preprocessed 300000 pages
INFO: Preprocessed 400000 pages
INFO: Preprocessed 500000 pages
INFO: Preprocessed 600000 pages
INFO: Preprocessed 700000 pages
INFO: Preprocessed 800000 pages
INFO: Preprocessed 900000 pages
INFO: Preprocessed 1000000 pages
INFO: Preprocessed 1100000 pages
INFO: Preprocessed 1200000 pages
INFO: Preprocessed 1300000 pages
INFO: Preprocessed 1400000 pages
INFO: Preprocessed 1500000 pages
INFO: Preprocessed 1600000 pages
INFO: Preprocessed 1700000 pages
INFO: Preprocessed 1800000 pages
INFO: Preprocessed 1900000 pages
INFO: Preprocessed 2000000 pages
INFO: Preprocessed 2100000 pages
INFO: Preprocessed 2200000 pages
INFO: Loaded 109314 templates in 685.3s
INFO: Starting page extraction from h:\wiki\fawiki-20150602-pages-articles.xml.bz2.
INFO: Using 1 extract processes.
Process Process-2:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "h:\wiki\WikiExtractor.py", line 2427, in extract_process
Extractor(*job[:3]).extract(out) # (id, title, page)
File "h:\wiki\WikiExtractor.py", line 423, in extract
text = clean(self, text)
File "h:\wiki\WikiExtractor.py", line 1896, in clean
text = extractor.expandTemplates(text)
File "h:\wiki\WikiExtractor.py", line 479, in expandTemplates
res += wikitext[cur:s] + self.expandTemplate(wikitext[s+2:e-2])
File "h:\wiki\WikiExtractor.py", line 636, in expandTemplate
title = fullyQualifiedTemplateTitle(title)
File "h:\wiki\WikiExtractor.py", line 1121, in fullyQualifiedTemplateTitle
return templatePrefix + ucfirst(templateTitle)
NameError: global name 'templatePrefix' is not defined

xt2357 · 2015-10-16T05:16:49Z

I have the same problem too.

attardi · 2015-10-16T09:58:35Z

I tested the extractor (version 2.39) on the Farsi dump you mention and it works correctly on my Ubuntu machine.

templatePrefix is a global variable. Within function load_templates, it is assigned a value obtained from this field in the siteinfo XML element:
<namespace key="10" case="first-letter">الگو</namespace>
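
A minimal sketch of how such a prefix could be derived from the siteinfo lines. The regular expression, the helper name template_prefix_from_siteinfo, and the trailing ':' on the prefix are assumptions made for illustration, not the extractor's actual code:

```python
# -*- coding: utf-8 -*-
import re

# Assumed shape of a siteinfo namespace line; the pattern is illustrative.
NAMESPACE_RE = re.compile(r'<namespace key="(-?\d+)"[^>]*>([^<]*)</namespace>')

# Key that appears in the namespace line quoted above.
TEMPLATE_NAMESPACE_KEY = "10"


def template_prefix_from_siteinfo(lines):
    """Return the localized template namespace name plus ':' (assumed), or None."""
    for line in lines:
        match = NAMESPACE_RE.search(line)
        if match and match.group(1) == TEMPLATE_NAMESPACE_KEY:
            return match.group(2) + ":"
    return None


siteinfo = [
    '<namespace key="0" case="first-letter" />',
    '<namespace key="10" case="first-letter">الگو</namespace>',
]
prefix = template_prefix_from_siteinfo(siteinfo)
```

If no line with key 10 is seen, the helper returns None, which makes a missing prefix easy to detect before extraction starts.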

xt2357 · 2015-10-16T12:05:56Z

I downloaded the source code of version 2.39 from this site, but the variable 'templatePrefix' is never defined at module level: every occurrence of the identifier is inside a function. My PyCharm IDE also reports: 'Global variable 'templatePrefix' is undefined at the module level.'

attardi · 2015-10-16T13:08:08Z

It is declared as global wherever it is used.
That cannot be the cause of your problems, otherwise it would have failed within function load_templates, when it is first used.
Anyhow, you can try to add an assignment at the beginning of the file
templatePrefix = ''
and see if that fixes it.
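
One possible reason a global assigned at run time is missing in a worker: on Windows, multiprocessing starts each child as a fresh interpreter that re-imports the module instead of forking the parent, so a value assigned inside a function in the parent is not carried over. A minimal sketch of handing such a value to the workers explicitly follows. ucfirst and fullyQualifiedTemplateTitle are simplified stand-ins, init_worker and qualify_all are hypothetical names, and none of this is the extractor's actual code:

```python
import multiprocessing

# Module-level default, so the name always exists after import.
templatePrefix = ''


def ucfirst(text):
    """Simplified stand-in: upper-case the first character only."""
    return text[:1].upper() + text[1:]


def fullyQualifiedTemplateTitle(templateTitle):
    """Simplified stand-in for the function named in the traceback."""
    return templatePrefix + ucfirst(templateTitle)


def init_worker(prefix):
    """Runs once in each worker process and sets the global there."""
    global templatePrefix
    templatePrefix = prefix


def qualify_all(titles, prefix, processes=2):
    """Hypothetical helper: pass the prefix to every worker explicitly."""
    pool = multiprocessing.Pool(processes, initializer=init_worker,
                                initargs=(prefix,))
    try:
        return pool.map(fullyQualifiedTemplateTitle, titles)
    finally:
        pool.close()
        pool.join()
```

On Windows, qualify_all should be called from under an `if __name__ == '__main__':` guard, because the child processes import the main module when they start.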

xt2357 · 2015-10-16T13:29:13Z

I added the statement templatePrefix = '' at the beginning of the file, but then I hit a new exception:

INFO: Starting page extraction from zhwiki-20151002-pages-articles-multistream.xml.
INFO: Using 3 extract processes.
Process Process-1:
Traceback (most recent call last):
File "C:\Anaconda\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Anaconda\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "F:\wiki_dump\WikiExtractor.py", line 2431, in reduce_process
output.write(ordering_buffer.pop(next_ordinal))
File "F:\wiki_dump\WikiExtractor.py", line 2137, in write
self.reserve(len(data))
File "F:\wiki_dump\WikiExtractor.py", line 2132, in reserve
if self.file.tell() + size > self.max_file_size:
ValueError: I/O operation on closed file

It seems the process terminated for some unexpected reason. (Does the message 'Process Process-1:' mean that the process terminated with return value -1?)

attardi · 2015-10-16T13:55:45Z

You are using 3 processes, and process no. 1 raises the error.
You are running under Windows: it might be that the implementation of memory buffers works differently from Linux.
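
On the 'Process Process-1:' line asked about above: that label is the name multiprocessing gives a Process object by default, not an exit status. The exit status is exposed separately as the exitcode attribute. A small sketch, where noop is a hypothetical placeholder target:

```python
import multiprocessing


def noop():
    """Placeholder target; never run in this sketch."""
    pass


# Creating a Process object does not start it.
proc = multiprocessing.Process(target=noop)

# Auto-generated label of the form "Process-N"; it appears in tracebacks.
default_name = proc.name

# None until the process has been started and has finished.
code_before_start = proc.exitcode
```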

xt2357 · 2015-10-16T14:19:42Z

Thanks for your advice, I'll try it on Linux.

attardi · 2015-10-16T14:21:08Z

You might try using the Python version of StringIO.
Modify the import to this:

from StringIO import StringIO

xt2357 · 2015-10-16T14:37:45Z

Unfortunately it doesn't work in my environment, thank you anyway :)

attardi closed this as completed Oct 16, 2015
