Enable huge_tree option for parse_xml #210
Importing DFN metadata for EduGAIN fails because the provided XML document triggers a huge input lookup:
Error parsing http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-edugain+idp-metadata.xml: internal error: Huge input lookup
Since metadata files are frequently large, it seems sensible to enable this option by default.
Nice. Is there any reason to expect negative side effects, such that we would need to wrap this in a configuration setting?
I am not too familiar with the security aspects of XML. I assume that an adversary who can manipulate traffic during metadata download could more easily cause a denial of service with this setting enabled than without it, as the metadata can only be validated after it has been parsed. (Yay XML security) I am not sure whether this scenario is reason enough to make this configurable. If it is, I would need a bit of guidance on how best to approach this.
Wrapping this in a config should not be hard though. I could do that after I merge the PR.
@c00kiemon5ter do you have any experience with this option?
quoting https://lxml.de/api/lxml.etree.XMLParser-class.html
This shouldn't be needed for what we do, normally. The deepest node has 6 levels, so that is not an issue, but it seems that this particular feed contains In general, I don't think most people will need this. And most probably, fixing some of the logos might fix this issue for this feed as well (we need to figure out the actual limits that trigger the "Huge input lookup" error).
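To help figure out what in a feed trips the limit, here is a minimal diagnostic sketch. It assumes lxml is installed; `measure_xml` is a hypothetical helper, not part of this project. It parses with `huge_tree=True` only so that the oversized document can be inspected, then reports the deepest element nesting and the longest text node.

```python
from lxml import etree


def measure_xml(xml_bytes):
    """Return (max element depth, longest text or tail length in characters)."""
    # huge_tree=True lifts libxml2's built-in safety limits so that an
    # oversized feed can be parsed at all; use it only on input you trust.
    parser = etree.XMLParser(huge_tree=True, resolve_entities=False)
    root = etree.fromstring(xml_bytes, parser)

    max_depth = 0
    max_text = 0
    stack = [(root, 1)]  # iterative walk, so deep trees cannot hit the recursion limit
    while stack:
        elem, depth = stack.pop()
        max_depth = max(max_depth, depth)
        max_text = max(max_text, len(elem.text or ""), len(elem.tail or ""))
        for child in elem:
            # Comments and processing instructions have a non-string tag; skip them.
            if isinstance(child.tag, str):
                stack.append((child, depth + 1))
    return max_depth, max_text
```

Running this over a downloaded copy of the feed would show whether nesting depth or a single very large text node (for example an embedded logo) is the likely culprit. Attribute values are not measured in this sketch.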
to me this sounds like it should be wrapped in a default-off config option at the very least
I agree; it's good to have that option. Having it off by default to catch deeply nested nodes is great, as this shouldn't happen in such documents. Catching very big text nodes might also indicate that the feed could be further optimized. By having that option we give users the choice to consume the feed, and the action is explicit: they enable it themselves, knowing why it is needed.
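A rough sketch of what a default-off option could look like, assuming lxml. The names `HUGE_TREE_DEFAULT` and `parse_metadata` are hypothetical and only illustrate the idea; this is not the project's actual `parse_xml` or its config mechanism.

```python
from lxml import etree

# Hypothetical setting: off unless the operator explicitly opts in.
HUGE_TREE_DEFAULT = False


def parse_metadata(data, huge_tree=HUGE_TREE_DEFAULT):
    """Parse XML bytes, lifting libxml2's size and depth limits only on request."""
    parser = etree.XMLParser(resolve_entities=False, huge_tree=huge_tree)
    return etree.fromstring(data, parser)
```

With the default, oversized or deeply nested input is still rejected by the parser; a user who needs a feed like the DFN one would call `parse_metadata(data, huge_tree=True)` as a deliberate choice.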