lenient DomXmpParser #123

Open
wants to merge 1 commit into
base: 2.0

gunnar-ifp

The XMP box library is nice, but there are PDF files out in the wild that fail to parse. For example, dc.creator is a Bag instead of a Seq.

Ideally, the parser would have a mode in which it reads as many properties as possible and simply discards the unreadable ones. That is not good if you want to write the PDF back, but if you just want to extract metadata, such a mode would be nice. In this case, the invalid dc.creator value would be dropped. This would require some more work.
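As a rough sketch of the lenient idea (all names here are hypothetical and not part of xmpbox), each property is read on its own, and a failure drops only that property instead of aborting the whole parse:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from xmpbox.
final class LenientExtraction {

    /** Outcome of a lenient pass: the values that could be read, plus the names that were dropped. */
    static final class Result {
        final Map<String, Object> values = new LinkedHashMap<>();
        final List<String> dropped = new ArrayList<>();
    }

    /**
     * Applies the reader to every raw property. A property whose reader
     * throws is recorded as dropped and skipped; the remaining properties
     * are still processed.
     */
    static Result extract(Map<String, String> raw, Function<String, Object> reader) {
        Result result = new Result();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            try {
                result.values.put(entry.getKey(), reader.apply(entry.getValue()));
            } catch (RuntimeException ex) {
                result.dropped.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Keeping the list of dropped names lets a caller decide whether the result is complete enough to use, which matters if the metadata is ever going to be written back.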

I've seen that there is a non-strict parsing mode. I don't think it should be confused with the proposed lenient mode but, as the name suggests, it should be less strict. So with this change, in non-strict mode a Seq can be read from a Bag and vice versa. I left Alt cardinality as an error because it doesn't really fit in.
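The rule described above could look roughly like this. It is only an illustration: the enum and method are made up for the example and are not the actual xmpbox types or the code in this commit.

```java
// Hypothetical sketch: the enum and method are illustrative, not xmpbox API.
final class CardinalityCheck {

    enum Cardinality { SIMPLE, BAG, SEQ, ALT }

    /**
     * Returns true if a property declared with the expected cardinality
     * may be read from an element that has the actual cardinality.
     */
    static boolean accepts(Cardinality expected, Cardinality actual, boolean strict) {
        if (expected == actual) {
            return true;
        }
        if (strict) {
            return false;
        }
        // Non-strict: Bag and Seq are interchangeable.
        // Alt (and Simple) must still match exactly.
        return isBagOrSeq(expected) && isBagOrSeq(actual);
    }

    private static boolean isBagOrSeq(Cardinality c) {
        return c == Cardinality.BAG || c == Cardinality.SEQ;
    }
}
```

With this rule, a dc.creator written as a Bag is accepted where a Seq is expected in non-strict mode, while strict mode still rejects it.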

Maybe in one of the modes an element that should be an array but isn't could automagically be wrapped into one...
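A minimal sketch of such wrapping, assuming values are modelled as plain Java objects (a hypothetical helper, not something xmpbox provides):

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not part of xmpbox.
final class ArrayWrapping {

    /**
     * Returns the value unchanged if it is already a list,
     * otherwise wraps it in an immutable one-element list.
     */
    static List<?> asList(Object value) {
        if (value instanceof List) {
            return (List<?>) value;
        }
        return Collections.singletonList(value);
    }
}
```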

(I also believe that a Bag could always be read from a Sequence...)

Allow to read bags as sequence and vice versa in non-strict parsing mode