Tag type matching case sensitive #245

Closed
Pimmetje opened this issue Oct 13, 2018 · 5 comments

Comments

@Pimmetje

When a form writes the value of the type attribute in upper case, this match fails:

if tag.get("type") in ("radio", "checkbox"):

I suggest adding .lower():
if tag.get("type").lower() in ("radio", "checkbox"):
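One caveat with the suggested line: `.get("type")` returns `None` when the attribute is absent, so calling `.lower()` on it directly would raise an `AttributeError`. A minimal sketch of a tolerant check follows. The helper name `is_radio_or_checkbox` is hypothetical, and plain dicts stand in for parsed tags (anything with a dict-style `.get()` would work the same way); this is not necessarily how the actual patch was written.

```python
def is_radio_or_checkbox(tag):
    """Return True if the tag's type is radio or checkbox, ignoring case.

    `tag` is assumed to be any object with a dict-style .get() method.
    """
    # .get() returns None when the attribute is missing, so fall back
    # to an empty string before lowercasing.
    input_type = tag.get("type") or ""
    return input_type.lower() in ("radio", "checkbox")


# Plain dicts stand in for parsed tags here (an assumption for the demo).
print(is_radio_or_checkbox({"type": "CHECKBOX"}))  # True
print(is_radio_or_checkbox({"type": "Radio"}))     # True
print(is_radio_or_checkbox({"type": "text"}))      # False
print(is_radio_or_checkbox({}))                    # False
```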

@moy
Collaborator

moy commented Oct 13, 2018

Not sure the upper-case version is correct HTML, but anyway, we should be tolerant in what we accept.

Patch follows.

@Pimmetje
Author

Never trust user input :)

I could not find a hard reference, but this one comes close to what I think is right: https://stackoverflow.com/a/19808575/559333

@moy
Collaborator

moy commented Oct 13, 2018

The Stack Overflow link is about tag names, but the issue here is about the attribute's value.

@Pimmetje
Author

You're right, that calls for more research.
"Attribute names are case-insensitive, but attribute values may be case-sensitive." [1]
So I guess you could be right.
Reading [2], I get the feeling it means more that you have to use the same casing consistently within the document. But I can't find a reference stating the case requirements for basic attributes, so I think it's best to assume they could be anything.

[1] http://www.htmlhelp.com/reference/html40/structure.html#attributes
[2] http://www.htmlhelp.com/reference/html40/values.html
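The distinction between attribute names and attribute values can be seen with Python's standard-library `html.parser`, which lowercases tag and attribute names but leaves values untouched. This is only a small illustration; the `AttrCollector` class and the sample markup are made up for the demo, and other parsers may behave differently.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over tag and attribute names already
        # lowercased; attribute values keep their original casing.
        self.seen.append((tag, attrs))


parser = AttrCollector()
parser.feed('<INPUT TYPE="CHECKBOX" NAME="Agree">')
print(parser.seen)
# [('input', [('type', 'CHECKBOX'), ('name', 'Agree')])]
```

So even when the parser normalizes names, a comparison against the value still has to handle casing itself.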

@moy
Collaborator

moy commented Oct 13, 2018

Anyway, "How should HTML be written?" is not very important in real life for tools like MechanicalSoup. "What does the HTML you're parsing actually look like?" is the right question (well-designed websites tend to have a proper API that avoids having to parse HTML, so they don't need MechanicalSoup ...).


2 participants