Skip to content

Conversation

@cfra
Copy link
Contributor

@cfra cfra commented Mar 23, 2021

Importing DFN metadata for EduGAIN fails because the provided XML
document triggers a huge input lookup:

Error parsing http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-edugain+idp-metadata.xml: internal error: Huge input lookup

As metadata can be large quite frequently it seems sensible to enable
this option by default.

All Submissions:

  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what problem you are trying to solve with this PR?
  • Have you added information on what your changes do and why you chose this as your solution?
  • Have you written new tests for your changes?
  • Does your submission pass tests?
  • This project follows PEP8 style guide. Have you run your code against the 'flake8' linter?

Importing DFN metadata for EduGAIN fails because the provided XML
document triggers a huge input lookup:

   Error parsing http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-edugain+idp-metadata.xml: internal error: Huge input lookup

As metadata can be large quite frequently it seems sensible to enable
this option by default.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 68.381% when pulling 1d8f4c8 on cfra:feature/huge-trees into ea26ca4 on IdentityPython:master.

@leifj
Copy link
Contributor

leifj commented Mar 23, 2021

Nice - Is there any reason to think that there are negative side-effects so we need to wrap this in a configuration setting?

@cfra
Copy link
Contributor Author

cfra commented Mar 25, 2021

I am not too familiar with the security aspects of XML.

I assume that an adversary that can manipulate traffic on metadata download would be easier able to cause a denial of service event with this setting enabled than without, as the metadata can only be validated after it has been parsed. (Yay XML security)

I am not sure if this scenario is reason enough to warrant making this configurable.

If yes, I would need a bit of guidance how I should best approach this.

@leifj
Copy link
Contributor

leifj commented Mar 25, 2021

Wrapping this in a config should not be hard tho - I could do that after I merge the PR.

@leifj leifj requested a review from c00kiemon5ter March 25, 2021 21:25
@leifj
Copy link
Contributor

leifj commented Mar 25, 2021

@c00kiemon5ter do you have any experience with this option?

@c00kiemon5ter
Copy link
Member

c00kiemon5ter commented Mar 25, 2021

quoting https://lxml.de/api/lxml.etree.XMLParser-class.html

huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)

This shouldn't be needed for what we do - normally..

The deepest node has 6 levels, so that is not an issue, but it seems that this particular feed contains mdui:UIInfo/mdui:Logo elements that contain serialized images - data:image/png;base64,... - some of which blow up the text-node size - ie, see the logo for <md:EntityDescriptor entityID="https://idp.ung.si/idp/20111102">.

In general, I don't think most people will need this. And most probably, fixing some of the logos might fix this issue for this feed as well (we need to figure out the actual limits that trigger the "Huge input lookup" error).

@leifj
Copy link
Contributor

leifj commented Mar 26, 2021

to me this sounds like it should be wrapped in a default-off config option at the very least

@c00kiemon5ter
Copy link
Member

I agree; it's good to have that option. Having off by default to catch deeply nested nodes is great, as this shouldn't happen in such documents. Catching very big text nodes might also indicate that the feed could be further optimized. By having that option we give users the choice to consume the feed, and the action is explicit - it is done by them knowing why it is needed.

@leifj leifj merged commit 0458423 into IdentityPython:master Mar 30, 2021
@leifj
Copy link
Contributor

leifj commented Mar 30, 2021

Fixed - a config option is in #210 4357c4b

@cfra cfra deleted the feature/huge-trees branch March 30, 2021 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants