Problem with sections from RfA pages #218
Comments
Hi @ananth1996, the issue is basically that the entire RfA content is inside a `<div>` tag. Here's a cheap workaround:

```python
>>> code = mwparserfromhell.parse(text, skip_style_tags=True)
>>> if code:
...     first = code.get(0)
...     if isinstance(first, mwparserfromhell.nodes.Tag) and first.tag == 'div':
...         code = first.contents
...
>>> len(code.get_sections())
9
```

I'll think more about a way to fix this inside the parser.
Thank you for the workaround; it works properly.
I don’t think there’s a good built-in way to do that, unfortunately. You would need to do some manual node iteration. For example: for each unnested li tag, find the last wikilink to a user page or user talk page before the next li tag. Something like that might work.
… On Jun 10, 2019, at 4:06 AM, Ananth Mahadevan ***@***.***> wrote:

> Thank you for the workaround; it works properly.
> I also wanted to ask if there is any particular way to iterate through list items, such as some of the methods in wikitextparser. I am also looking to extract the user signature at the end of every vote and was wondering if there is a template or general regex pattern already available in some parser.
> Thanks in advance.
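On the signature question: no parser ships a ready-made pattern, but a regex sketch like the one below can work. The `SIG_RE` pattern and sample line are my own assumptions, tuned to the standard English-Wikipedia signature shape (a user link followed by a UTC timestamp on the same line); custom signatures will need extra handling.

```python
import re

# Hypothetical pattern (my own sketch, not from any parser library):
# a wikilink to a User: or User talk: page, followed on the same line
# by a standard English-Wikipedia UTC timestamp.
SIG_RE = re.compile(
    r"\[\[(?:User|User talk):(?P<user>[^|\]/#]+)[^\]]*\]\]"  # user link
    r".*?"                                                   # talk link, etc.
    r"(?P<ts>\d{1,2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\))"      # timestamp
)

line = ("# '''Support''' per nom. [[User:Alice|Alice]] "
        "([[User talk:Alice|talk]]) 04:06, 10 June 2019 (UTC)")
match = SIG_RE.search(line)
if match:
    print(match.group("user"), "signed at", match.group("ts"))
```

The non-greedy `.*?` lets decoration between the user link and the timestamp (talk-page links, contribs links) pass through without being captured.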
I'm trying to parse the sections from RfA pages such as https://en.wikipedia.org/wiki/Wikipedia:Requests_for_adminship/7. `get_sections()` always seems to return just one section, even when I use `skip_style_tags=True`. Is there any fix for this? Meanwhile, `filter_headings()` does return all the headings. I want to parse the Support, Oppose, and Neutral votes. Is there a better way to do this in Python?