mw.xml_dump crashes when encountering LiquidThreads #12

Open
he7d3r opened this issue Oct 13, 2014 · 2 comments

Comments

@he7d3r
Contributor

he7d3r commented Oct 13, 2014

When I execute the following code:

from mw import xml_dump
import sys

def page_info(dump, path):
    for page in dump:
        yield page.id, page.namespace, page.title

for page_id, page_namespace, page_title in xml_dump.map(["ptwikibooks-20140905-pages-meta-current.xml"], page_info):
    print("\t".join([str(page_id), str(page_namespace), page_title]))

I get the following error after a few moments:

35575   0   Geometria descritiva/Introdução
Failed while processing dump 'ptwikibooks-20140905-pages-meta-current.xml': 
Traceback (most recent call last):
  File "<xml_dump>/processor.py", line 35, in run
    for out in self.process_dump(dump, path):
  File "lqt.py", line 7, in page_info
    for page in dump:
  File "<xml_dump>/iteration/iterator.py", line 112, in load_pages
    yield Page.from_element(sub_element)
  File "<xml_dump>/iteration/page.py", line 110, in from_element
    "a <page>: '{0}'".format(tag))
mw.xml_dump.errors.MalformedXML: Unexpected tag found when processing a <page>: 'DiscussionThreading'

35576   0   Geometria descritiva
Traceback (most recent call last):
  File "lqt.py", line 10, in <module>
    for page_id, page_namespace, page_title in xml_dump.map(["ptwikibooks-20140905-pages-meta-current.xml"], page_info):
  File "<xml_dump>/map.py", line 86, in map
    re_raise(error, processor)
  File "<xml_dump>/map.py", line 12, in re_raise
    raise error
mw.xml_dump.errors.MalformedXML: Unexpected tag found when processing a <page>: 'DiscussionThreading'

Trying again, I got a different error:

35576   0   Geometria descritiva
Traceback (most recent call last):
  File "lqt.py", line 10, in <module>
    for page_id, page_namespace, page_title in xml_dump.map(["ptwikibooks-20140905-pages-meta-current.xml"], page_info):
  File "<xml_dump>/map.py", line 100, in map
    re_raise(error, path)
NameError: name 'path' is not defined
@he7d3r
Contributor Author

he7d3r commented Oct 13, 2014

For the record, a similar error happens if I run examples/xml_dump.iteration.py against the same ptwikibooks dump:

251676
Traceback (most recent call last):
  File "lqt.py", line 21, in <module>
    for page in dump:
  File "<xml_dump>/iteration/iterator.py", line 112, in load_pages
    yield Page.from_element(sub_element)
  File "<xml_dump>/iteration/page.py", line 110, in from_element
    "a <page>: '{0}'".format(tag))
mw.xml_dump.errors.MalformedXML: Unexpected tag found when processing a <page>: 'DiscussionThreading'

@halfak changed the title from "mw.xml_dump is not compatible with LiquidThreads" to "mw.xml_dump crashes when encountering LiquidThreads" on Aug 25, 2015
@halfak
Member

halfak commented Aug 25, 2015

This error is addressed with #39.

But we still don't support LiquidThreads, so I filed #40 to track that issue.
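
For anyone hitting the same crash before upgrading, here is a minimal sketch of the general idea of tolerating unknown children of a `<page>` instead of raising. It is not the change made in #39 and does not use `mw.xml_dump` at all; it relies only on the standard library's `xml.etree.ElementTree`. The sample XML is hand-written (the content of the `<DiscussionThreading>` element is a placeholder, and the XML namespace that real dumps declare is omitted), and `KNOWN_TAGS` and `iter_pages` are hypothetical names chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample, NOT taken from the real ptwikibooks dump.
# Real dumps declare an XML namespace on <mediawiki>; it is omitted here
# so that tag names can be compared directly.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Geometria descritiva</title>
    <ns>0</ns>
    <id>35576</id>
    <DiscussionThreading>placeholder</DiscussionThreading>
  </page>
  <page>
    <title>Another page</title>
    <ns>0</ns>
    <id>35577</id>
  </page>
</mediawiki>"""

# Illustrative subset of the children a <page> is expected to have.
KNOWN_TAGS = {"title", "ns", "id"}


def iter_pages(source):
    """Yield (id, namespace, title, skipped_tags) for each <page>.

    Children with unrecognized tags are recorded and skipped
    rather than treated as malformed XML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue
        fields = {}
        skipped = []
        for child in elem:
            if child.tag in KNOWN_TAGS:
                fields[child.tag] = child.text
            else:
                skipped.append(child.tag)
        yield int(fields["id"]), int(fields["ns"]), fields["title"], skipped
        elem.clear()  # release the page's children once it has been consumed


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_id, page_namespace, page_title, skipped in pages:
    print("\t".join([str(page_id), str(page_namespace), page_title]), skipped)
```

Skipping unknown tags only avoids the crash; it discards whatever LiquidThreads stores in that element, which is the separate gap tracked in #40.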
