Problem with sections from RfA pages #218

Open
ananth1996 opened this issue Jun 7, 2019 · 3 comments

@ananth1996

I'm trying to parse the sections of RfA pages such as https://en.wikipedia.org/wiki/Wikipedia:Requests_for_adminship/7. get_sections() always seems to return a single section, even if I use skip_style_tags=True. Is there any fix for this? The filter_headings() function does return all the headings.
I want to parse the Support, Oppose and Neutral votes. Is there a better way to do this in Python?

@earwig
Owner

earwig commented Jun 9, 2019

Hi @ananth1996,

The issue is basically that the entire RfA content is inside a <div> tag, and get_sections() expects headings to be nodes at the top level of the wikicode. Since all headings are inside that <div>, it considers the entire page to be one section.

Here's a cheap workaround:

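>>> import mwparserfromhell
>>> # text holds the raw wikitext of the RfA page
>>> code = mwparserfromhell.parse(text, skip_style_tags=True)
>>> if code:
...     first = code.get(0)
...     # If the whole page is wrapped in a <div>, descend into its contents
...     if isinstance(first, mwparserfromhell.nodes.Tag) and first.tag == 'div':
...         code = first.contents
...
>>> len(code.get_sections())
9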

I'll think more about a way to fix this inside the parser.

@ananth1996
Author

Thank you for the workaround; it works properly.
I also wanted to ask whether there is any particular way to iterate through list items, similar to some of the methods in wikitextparser. I am also looking to extract the user signature at the end of every vote, and was wondering whether a template or general regex pattern for this is already available in some parser.
Thanks in advance.
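
For the list-item and signature questions above, one possible starting point is plain regular expressions over the text of a section. The sketch below is not part of mwparserfromhell or wikitextparser; the names `iter_votes`, `extract_signature`, `TIMESTAMP_RE` and `USER_LINK_RE` are hypothetical. It assumes that each vote is a top-level numbered list item (a line starting with a single `#`), that a signature contains a `[[User:...]]` or `[[User talk:...]]` link, and that it ends with the usual `HH:MM, D Month YYYY (UTC)` timestamp. Custom signatures that break these assumptions will be missed.

```python
import re

# Assumption: a signature ends with the standard English Wikipedia timestamp,
# e.g. "12:34, 7 June 2019 (UTC)".
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")

# Assumption: the signing user is named in a [[User:...]] or [[User talk:...]] link.
USER_LINK_RE = re.compile(r"\[\[\s*User(?: talk)?:([^|\]#/]+)", re.IGNORECASE)


def iter_votes(section_text):
    """Yield the text of each top-level numbered list item ("#" lines)."""
    for line in section_text.splitlines():
        # "##", "#:" and "#*" mark nested items or replies, so skip them.
        if line.startswith("#") and not line.startswith(("##", "#:", "#*")):
            yield line[1:].strip()


def extract_signature(vote_text):
    """Return (username, timestamp) for the last signature in a vote, or None."""
    stamps = list(TIMESTAMP_RE.finditer(vote_text))
    if not stamps:
        return None
    last_stamp = stamps[-1]
    # Use the last user link that appears before the final timestamp.
    users = USER_LINK_RE.findall(vote_text[: last_stamp.start()])
    if not users:
        return None
    return users[-1].strip(), last_stamp.group(0)
```

With the workaround above, each section from `code.get_sections()` could be converted with `str(section)` and passed to `iter_votes`. Replies nested under a vote are deliberately skipped, so a multi-line vote is only seen up to the end of its first line.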

@earwig
Owner

earwig commented Jun 10, 2019 via email
