NameError: global name 'templatePrefix' is not defined #34

Closed
ehsanasgarian opened this issue Oct 4, 2015 · 9 comments

Comments

@ehsanasgarian

I encountered this problem after running WikiExtractor.py (with Python 2.7 on Windows 8.1 x64) on a Farsi wiki dump.
Can you explain why this error occurs?

python h:\wiki\WikiExtractor.py h:\wiki\fawiki-20150602-pages-articles.xml.bz2 -cb 5M -o h:\wiki\extracted --processes 1
INFO: Preprocessing 'h:\wiki\fawiki-20150602-pages-articles.xml.bz2' to collect template definitions: this may take some time.
INFO: Preprocessed 100000 pages
INFO: Preprocessed 200000 pages
INFO: Preprocessed 300000 pages
INFO: Preprocessed 400000 pages
INFO: Preprocessed 500000 pages
INFO: Preprocessed 600000 pages
INFO: Preprocessed 700000 pages
INFO: Preprocessed 800000 pages
INFO: Preprocessed 900000 pages
INFO: Preprocessed 1000000 pages
INFO: Preprocessed 1100000 pages
INFO: Preprocessed 1200000 pages
INFO: Preprocessed 1300000 pages
INFO: Preprocessed 1400000 pages
INFO: Preprocessed 1500000 pages
INFO: Preprocessed 1600000 pages
INFO: Preprocessed 1700000 pages
INFO: Preprocessed 1800000 pages
INFO: Preprocessed 1900000 pages
INFO: Preprocessed 2000000 pages
INFO: Preprocessed 2100000 pages
INFO: Preprocessed 2200000 pages
INFO: Loaded 109314 templates in 685.3s
INFO: Starting page extraction from h:\wiki\fawiki-20150602-pages-articles.xml.bz2.
INFO: Using 1 extract processes.
Process Process-2:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "h:\wiki\WikiExtractor.py", line 2427, in extract_process
Extractor(*job[:3]).extract(out) # (id, title, page)
File "h:\wiki\WikiExtractor.py", line 423, in extract
text = clean(self, text)
File "h:\wiki\WikiExtractor.py", line 1896, in clean
text = extractor.expandTemplates(text)
File "h:\wiki\WikiExtractor.py", line 479, in expandTemplates
res += wikitext[cur:s] + self.expandTemplate(wikitext[s+2:e-2])
File "h:\wiki\WikiExtractor.py", line 636, in expandTemplate
title = fullyQualifiedTemplateTitle(title)
File "h:\wiki\WikiExtractor.py", line 1121, in fullyQualifiedTemplateTitle
return templatePrefix + ucfirst(templateTitle)
NameError: global name 'templatePrefix' is not defined

@xt2357

xt2357 commented Oct 16, 2015

I have the same problem too.

@attardi
Owner

attardi commented Oct 16, 2015

I tested the extractor (version 2.39) on the Farsi dump you mention and it works correctly on my Ubuntu machine.

templatePrefix is a global variable. It is assigned within the function load_templates, using the value of this field in the siteinfo XML element:
<namespace key="10" case="first-letter">الگو</namespace>
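To illustrate the mechanism described above, here is a minimal Python 3 sketch of how a template prefix could be derived from that siteinfo line and then used to qualify a title. It is not the actual WikiExtractor code: the regular expression, the helper names `read_template_prefix` and `fully_qualified_template_title`, and the convention of appending `':'` to the namespace name are assumptions made for the example. Only the `prefix + ucfirst(title)` shape comes from the traceback.

```python
import re

# Hypothetical pattern for a siteinfo namespace line (an assumption,
# not the expression used by WikiExtractor itself).
NAMESPACE_RE = re.compile(r'<namespace key="(-?\d+)"[^>]*>([^<]*)</namespace>')


def read_template_prefix(siteinfo_lines):
    """Return the name of namespace 10 (Template) plus ':', or '' if absent."""
    for line in siteinfo_lines:
        m = NAMESPACE_RE.search(line)
        if m and m.group(1) == '10':
            return m.group(2) + ':'
    return ''


def ucfirst(s):
    """Uppercase only the first character and leave the rest untouched."""
    return s[:1].upper() + s[1:]


def fully_qualified_template_title(title, prefix):
    """Mirror the shape seen in the traceback: prefix + ucfirst(title)."""
    return prefix + ucfirst(title)


sample = ['<namespace key="10" case="first-letter">الگو</namespace>']
prefix = read_template_prefix(sample)
print(fully_qualified_template_title('infobox', prefix))
```

Passing the prefix as an argument, as in this sketch, avoids depending on a global that has to be set before any title is qualified.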

@xt2357

xt2357 commented Oct 16, 2015

I downloaded the source code of version 2.39 from this site, but the variable 'templatePrefix' is never defined at global scope: every occurrence of the identifier is inside a function. My PyCharm IDE also reports 'Global variable 'templatePrefix' is undefined at the module level.'

@attardi
Owner

attardi commented Oct 16, 2015

It is declared as global wherever it is used.
That cannot be the cause of your problems, otherwise it would have failed within function load_templates, when it is first used.
Anyhow, you can try adding this assignment at the beginning of the file:
templatePrefix = ''
and see if that fixes it.
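A minimal Python 3 sketch of the behaviour under discussion: a `global` statement only declares intent, so reading the name before any assignment still raises NameError, while a module-level default guarantees that the name exists. One plausible explanation for the Windows-only failure, not confirmed in this thread, is that on Windows `multiprocessing` starts each child in a fresh interpreter that re-imports the module, so a global assigned at runtime in the parent is not visible in the child. The names `read_before_assignment` and `init_worker` are hypothetical and do not appear in WikiExtractor.

```python
template_prefix = ''  # module-level default: the name exists in every process


def read_before_assignment():
    """Show that `global` alone does not create the name."""
    global never_assigned_prefix  # declaration only, nothing is assigned
    try:
        return never_assigned_prefix  # read before any assignment
    except NameError as exc:
        return 'NameError: %s' % exc


def init_worker(prefix):
    """Hypothetical initializer: copy the parent's value into this process."""
    global template_prefix
    template_prefix = prefix


def fully_qualified_template_title(title):
    """Qualify a title using whatever prefix this process currently holds."""
    return template_prefix + title[:1].upper() + title[1:]


print(read_before_assignment())
init_worker('Template:')
print(fully_qualified_template_title('infobox'))
```

Under that assumption, the empty default only silences the NameError; the child would still need the real value. With `multiprocessing.Pool`, a function like `init_worker` could be supplied through the `initializer` and `initargs` parameters, and with plain `Process` objects the prefix could be passed through `args`.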

@xt2357

xt2357 commented Oct 16, 2015

I added the statement templatePrefix = '' at the beginning of the file, but then I hit a new exception:

INFO: Starting page extraction from zhwiki-20151002-pages-articles-multistream.xml.
INFO: Using 3 extract processes.
Process Process-1:
Traceback (most recent call last):
File "C:\Anaconda\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Anaconda\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "F:\wiki_dump\WikiExtractor.py", line 2431, in reduce_process
output.write(ordering_buffer.pop(next_ordinal))
File "F:\wiki_dump\WikiExtractor.py", line 2137, in write
self.reserve(len(data))
File "F:\wiki_dump\WikiExtractor.py", line 2132, in reserve
if self.file.tell() + size > self.max_file_size:
ValueError: I/O operation on closed file

It seems the process terminated for some unexpected reason. Does the message 'Process Process-1:' mean that the process terminated with return value -1?

@attardi
Owner

attardi commented Oct 16, 2015

You are using 3 processes, and process no. 1 raises the error.
You are running under Windows: it might be that the implementation of memory buffers works differently from Linux.

@xt2357

xt2357 commented Oct 16, 2015

Thanks for your advice, I'll try it on Linux.

@attardi
Owner

attardi commented Oct 16, 2015

You might try using the pure-Python version of StringIO.
Change the import to this:

from StringIO import StringIO

@xt2357

xt2357 commented Oct 16, 2015

Unfortunately, it doesn't work in my environment. Thank you anyway :)

@attardi attardi closed this as completed Oct 16, 2015