Text node inside Template params shall strip leading and trailing whitespaces #266

Closed
DaxServer opened this issue Apr 4, 2021 · 3 comments

DaxServer commented Apr 4, 2021

Possibly related to #55 and #265.

{{Cite web | url= https://www.example.com | title=Example }}

After parsing the text and filtering for templates, the value of the url parameter is returned with whitespace at both ends.

import mwparserfromhell
from urllib.parse import urlparse

text = '{{Cite web | url= https://www.example.com | title=Example }}'
wikicode = mwparserfromhell.parse(text)

templates = wikicode.filter_templates()
print(templates[0].get('url').value)

o = urlparse(str(templates[0].get('url').value))
print(o)

The URL is not parsed correctly. urlparse returns

ParseResult(scheme='', netloc='', path=' https://www.example.com ', params='', query='', fragment='')

when it should return

ParseResult(scheme='https', netloc='www.example.com', path='', params='', query='', fragment='')

JJMC89 commented Apr 4, 2021

Strip the value instead of casting it to a string.

>>> urlparse(templates[0].get('url').value.strip())
ParseResult(scheme='https', netloc='www.example.com', path='', params='', query='', fragment='')

lahwaacz (Contributor) commented Apr 4, 2021

This is about templates, not headings, so those issues are not relevant. mwparserfromhell keeps the whitespace because serializing the parsed object should reproduce the original wikitext exactly, with no changes at all, not even to whitespace. Calling .strip() when you need it is just a minor inconvenience for the user.
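
For anyone landing here with the same problem, the advice in this thread can be wrapped in a small helper. This is only a sketch: parse_param_url is a hypothetical name, not part of mwparserfromhell, and it assumes the value you pass converts to the raw parameter text via str(). It strips a copy of the text, so the parsed wikicode itself is left unchanged.

```python
from urllib.parse import ParseResult, urlparse


def parse_param_url(value: object) -> ParseResult:
    """Parse a URL taken from a template parameter value.

    Hypothetical helper for illustration. `value` may be a plain string
    or any object whose str() is the raw parameter text.
    """
    # Strip only the copy handed to urlparse; the caller's object is untouched.
    return urlparse(str(value).strip())


# A plain string stands in for the parameter value from the example above.
result = parse_param_url(" https://www.example.com ")
print(result.scheme, result.netloc)
```

With the code from the original report, the call would presumably look like parse_param_url(templates[0].get('url').value).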

DaxServer (Author) commented

Thanks for your answers!
