
global name 'extract' is not defined #5

Closed
cifkao opened this issue Apr 12, 2015 · 3 comments

Comments

@cifkao

cifkao commented Apr 12, 2015

The multithreaded version doesn't work with the --article option. WikiExtractor.py:1831 causes NameError: global name 'extract' is not defined.
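This `NameError` is what Python raises when code calls a bare name that exists only as a method. Below is a minimal Python 3 reproduction. It assumes, as the owner's fix later in this thread suggests, that `extract` lives on an `Extractor` class rather than at module level. `process_page_buggy` and `process_page_fixed` are hypothetical names, not WikiExtractor functions. Python 2 words the message as "global name ... is not defined", and Python 3 drops the word "global".

```python
class Extractor:
    """Simplified stand-in: extract() exists only as a method."""

    def __init__(self, id, title, page):
        self.id = id
        self.title = title
        self.page = page  # list of raw text chunks

    def extract(self):
        return ''.join(self.page)


def process_page_buggy(id, title, page):
    # 'extract' is looked up in the module's globals, where no such
    # function exists, so this raises NameError at call time.
    return extract(id, title, page)  # noqa: F821


def process_page_fixed(id, title, page):
    # Build an Extractor and call its bound method instead.
    return Extractor(id, title, [page]).extract()


try:
    process_page_buggy(1, 'Title', 'some text')
except NameError as err:
    print(err)  # name 'extract' is not defined

print(process_page_fixed(1, 'Title', 'some text'))  # some text
```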

@zimmeee

zimmeee commented Apr 13, 2015

Does the single-threaded version work with the --article option? If so, how do you invoke it?

python WikiExtractor.py -a /Wikipedia/Train.xml --threads 1

This returns the same NameError about 'extract' not being defined.

@attardi
Owner

attardi commented Apr 13, 2015

I have a fix for this problem, but I can't commit it yet because a number of other fixes are coming.
The fix is to replace the call to extract() with:
Extractor(id, title, [page]).extract()
and then change the extract method to this:
def extract(self, out=sys.stdout):
    text = ''.join(self.page)
    url = get_url(self.id)
    header = '<doc id="%s" url="%s" title="%s">\n' % (self.id, url, self.title)
    # Separate header from text with a newline.
    header += self.title + '\n\n'
    header = header.encode('utf-8')
    text = clean(text)
    footer = "\n</doc>\n"
    if out != sys.stdout:
        # Tell the output splitter how much space this page needs,
        # so it can start a new file first if necessary.
        out.reserve(len(header) + len(text) + len(footer))
    out.write(header)
    for line in compact(text):
        out.write(line.encode('utf-8'))
        out.write('\n')
    out.write(footer)
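To try the pattern in isolation, here is a minimal Python 3 sketch of the method above. It assumes the `<doc id=... url=... title=...>` / `</doc>` wrapper that WikiExtractor emits. `get_url`, `clean`, `compact`, and the `SizedBuffer` sink are simplified, hypothetical stand-ins for the real helpers and output splitter. Because it targets Python 3, it writes `str` directly instead of UTF-8-encoded bytes. It also calls `reserve()` only when the sink provides one, rather than comparing against `sys.stdout`.

```python
import io
import sys


def get_url(page_id):
    # Hypothetical stand-in: the real WikiExtractor builds URLs its own way.
    return 'https://example.org/wiki?curid=%s' % page_id


def clean(text):
    # Hypothetical stand-in for the markup cleaner: strip bold/italic quotes.
    return text.replace("'''", '').replace("''", '')


def compact(text):
    # Hypothetical stand-in: yield only the non-empty lines.
    for line in text.split('\n'):
        if line.strip():
            yield line


class SizedBuffer(io.StringIO):
    """Hypothetical sink exposing the reserve() hook the fix calls."""

    def __init__(self):
        super().__init__()
        self.reserved = 0

    def reserve(self, size):
        # A real splitter could roll over to a new file here.
        self.reserved += size


class Extractor:
    def __init__(self, id, title, page):
        self.id = id
        self.title = title
        self.page = page  # list of raw text chunks

    def extract(self, out=sys.stdout):
        text = ''.join(self.page)
        url = get_url(self.id)
        header = '<doc id="%s" url="%s" title="%s">\n' % (self.id, url, self.title)
        # Separate header from text with a newline.
        header += self.title + '\n\n'
        text = clean(text)
        footer = '\n</doc>\n'
        reserve = getattr(out, 'reserve', None)
        if reserve is not None:
            reserve(len(header) + len(text) + len(footer))
        out.write(header)
        for line in compact(text):
            out.write(line)
            out.write('\n')
        out.write(footer)


buf = SizedBuffer()
Extractor(12, 'Example', ["'''Example''' is a page.\n\nSecond line."]).extract(buf)
print(buf.getvalue())
```

Checking for a `reserve` attribute keeps `extract()` usable with any file-like object, including a redirected `sys.stdout`.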

@attardi
Owner

attardi commented Apr 15, 2015

Fixed in the latest commit.

attardi closed this as completed Apr 15, 2015