Skip to content
This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

Parsing seems to fail on innocuous attributes #216

Open
technicaltitch opened this issue Apr 27, 2020 · 25 comments
Open

Parsing seems to fail on innocuous attributes #216

technicaltitch opened this issue Apr 27, 2020 · 25 comments

Comments

@technicaltitch
Copy link

technicaltitch commented Apr 27, 2020

I have a lot of files to parse, generated by pdf.js. As far as I can tell, and as far as I've found parsing them myself into a database, and rendering in Firefox and Chrome, the SVG is OK. However the majority of my files get the error:

Error: invalid attribute value at 1:29465.

Here are some examples of the characters at the associated positions:

The second 5 in fill="rgb(255,242,0)" line 1 pos 20350:

page00004.svg.zip

The t in width in stroke-width="1.5px" line 1 pos 2224:

page00008.svg.zip

Others were stroke-dashoffset="0px" and stroke-linejoin="".

All pages except one fail in this book:

Tigrigna Student Textbook - Grade 11.zip

Is there anything I can do to relax the parsing, eg, so it just ignores attributes it doesn't like rather than fails the file? I am only SAX parsing as XML but the documents seem well formed (if undoubtedly pretty complex). I can't see a pattern, but any hints on what might be going wrong so I can preprocess?

Huge thanks

@RazrFalcon
Copy link
Owner

It doesn't like empty attributes. So you should try removing them.

As for errors, the is a bug, so it prints line/column in bytes, not in characters. And your files has some multibyte characters, so it produces an invalid position.

@JoKalliauer
Copy link
Contributor

@technicaltitch

You might think of running svgomg first, that resolves the problem.

Comparision between optimizers for page00004.svg.zip
-svgo produces: SVGO.svg.txt
-scour produces: scour.svg.txt
-svgcleaner produces: svgcleaner.svg.txt

@daviddeutsch
Copy link

@RazrFalcon I found another issue and I think I've narrowed it down to this example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="250" height="250" viewbox="0 0 950 250">
  <defs>
    <clipPath id="1">
      <path style="opacity:0" d="M 0,157.57797 H 250 V -92.422028 H 0 Z" transform="translate(700,92.422028)"/>
    </clipPath>
  </defs>
        <g clip-path="url(#1)">
      </g>
</svg>

The error seems to be: Any ID that starts with a number. I understand that this is not valid XML, but perhaps the error could be a bit more helpful, here?

@RazrFalcon
Copy link
Owner

What it prints now?

@daviddeutsch
Copy link

@RazrFalcon not sure whether you were replying to my request, but the error was the same as in the OP message:

Error: invalid attribute value at 8:23.

so right at the url(#1) mark. When I change the id to #a, it works just fine.

@RazrFalcon
Copy link
Owner

RazrFalcon commented May 5, 2020

@daviddeutsch Thanks, the error is definitely obscure. And yes, id cannot start from a number. But it will be relaxed in the next release, since this is just too common.

@daviddeutsch
Copy link

@RazrFalcon Thank you. I agree that it does make sense to relax the restriction here.

And thank you for svgcleaner! I was desparately looking for a replacement for svgo which just kept breaking in a million ways and this one has been a true blessing 🙇

@JoKalliauer
Copy link
Contributor

@daviddeutsch Every svg optimizer has its advantages and disavantages, and correctness of svgs is not svgo's strength see https://github.com/RazrFalcon/svgcleaner#correctness

advantages of svgcleaner compared to svgo

  • svgo is quite buggy (but every optimizer has bugs)
  • svgcleaner is incredibly fast (few exceptions)
  • CSS support is imho a bit better (depends on the files)
  • fast support/answer on bug/feature reports (generally)
  • make workarounds for renderer, see (e.g. stroke-dasharray : comma - space #113 or https://tools.wmflabs.org/svgworkaroundbot/ or https://phabricator.wikimedia.org/T217990 )
  • option for group-by-style
  • option for resolve-use
  • imho prettier text-ident
  • generally works for text containing xml:space:"preserve"
  • throws errors&warning, if it detects something invalid/unsupported
  • more options for different values of precission

disadvantages of svgcleaner compared to svgo

But as you already highlighted, correctness might be the most important feature.

@daviddeutsch
Copy link

daviddeutsch commented May 5, 2020

@JoKalliauer Indeed. I need to process a lot of SVGs and even slight errors are completely unacceptable.

It's astonishing to me that svgo is as lossy/destructive as it is with its default settings and the options you have don't really get you to a lossless place. I've tried for about three years now and have finally given up when the (indeed, nice if it'd work) mergePaths completely destroyed a rendered text.

That it's also a lot faster is really the icing on the cake. I'm kicking myself for not finding it earlier.

Hey @RazrFalcon what can I do to help out? I mean on the project in general.

@RazrFalcon
Copy link
Owner

@daviddeutsch Nothing, really. The project has some fundamental flaws, so it has to be rewritten from scratch. It will take 2-3 months, and I simply don't have so much time at the moment.

Also, the new version will lose most of the options, since "selective cleaning" is a dead end and can't be implemented correctly. This will probably disappoint some users.

@daviddeutsch
Copy link

@RazrFalcon to be honest, the "cleaning" part of svgcleaner doesn't even interest me 😉

Sorry to hear that there is so much work to be done. It would be helpful if you could provide some details on the fundamental flaws you mentioned. If somebody else decides to fork or copy your work (perhaps in another language), knowing the problems you ran into would be a great asset.

@RazrFalcon
Copy link
Owner

The main problem is that you have to know how SVG works very-very well. I thought that I do, but then I wrote resvg and learned that I didn't know anything. There are just too many edge-cases in SVG and svgcleaner ignores most of them.

The main flaw of svgcleaner is that it tries to preserve the input XML. This is a dead end. You just can't. XML/SVG simply aren't lossless. So in a new version, the SVG structure reconstruction will be mandatory. Which will make most of the options obsolete.

Also, svgcleaner doesn't really support filters. But I guess this can be fixed without a rewrite.

@daviddeutsch
Copy link

@RazrFalcon Interesting! That brings me back to my original plan. Since I really only need a very fast svg compressor, I'd be willing to trade percentage points of compression for speed and going trough your - very excellent! - documentation, maybe it'd be easy to get most of the cleaning options down so you get, say, 90% of the "perfect" compression while staying away from those pitfalls that you mentioned.

So the plan that I might try one day is to take your documentation, take all the rules that have a meaningful impact on file size (say, everything that reduces a file by more than 20%) and implement them in something fast like luajit.

@RazrFalcon
Copy link
Owner

What do you mean by compressor?

@daviddeutsch
Copy link

@RazrFalcon minifying the SVG - throwing out everything that is unnecessary to increase download speed.

@JoKalliauer
Copy link
Contributor

@daviddeutsch. I think the save options, where you stay away from any pitfall weill be lower than 10% of the perfect optimization. CSS can break almost anything. (I don't know if there exist any optimization without any pitfall, maybe display="none".) A 100% safe optimization will have neglegtable reduction.

@daviddeutsch
Copy link

@JoKalliauer So far, I've only worked on HTML compression which, to my surprise, improves upon gzip compression with simple tricks like removing all the whitespace outside of text content.

From that experience, I would very much agree - you often only need a handful of tools to get 80 or 90% of the way to "perfect" file size reduction and those last 10-20% are simply not worth the effort, especially because you end up treading dangerous waters where you think your optimization works but you actually broke your content for some browser or use case. So not only do you have to invest time into programming a complex solution (for negligible benefit), you actually have to maintain that complexity which can be a tough lot and sometimes even a moving target.

'Choose your battles wisely', or so the say 😉

@RazrFalcon
Copy link
Owner

@daviddeutsch I see. I thought about a gzip compression.

Essentially, this is what svgc is trying to do. Remove as much as possible, without changing the way an image is rendered. Sadly, I do not have time to work on it right now.

And svgc doesn't implement a lot of things yet. Text processing doesn't exist. Filters are resaved as is. Etc.

By the way, the hardest part of an app like svgcleaner is not to clean an SVG, but to parse it correctly in the first place. You will need a very good XML (DTD, namespaces, etc.) and CSS (parser + selector) libraries. Then you have to parse SVG value types (something like Path Data or Transform list), which is not trivial. And then you have to create an SVG tree (not an XML tree). The tree itself is pretty tricky, because you have to preprocess whitespaces correctly to get a valid text (yes, xml:space and stuff). And use/xlink resolving isn't trivial too. And only then you can actually start "cleaning" an SVG.

@RazrFalcon
Copy link
Owner

@daviddeutsch Also, you can use resvg-test-suite as a test polygon. As long as you can process all those files correctly - you are pretty much done. And this test suite contains most absurd SVG you could find. Sadly, I wrote it after svgcleaner. So it would probably fail pretty badly.

@daviddeutsch
Copy link

daviddeutsch commented May 11, 2020

@RazrFalcon You are right, which is why I usually don't do parsing 😉

The full code for the html compressor isn't in any shape that I'd be comfortable to share, but it's literally just text substitution. This, for instance, is how I compress whitespace:

local html = io.read("*all")

html = string.gsub(html, "[%s\r\n]*<", "<")

html = string.gsub(html, ">[%s\r\n]+<", "><")

I haven't done any kind of scientific comparison, but I just kept adding small optimizations like that until I came near (like within 15% of) the best "perfect compression" html tools that I found and called it a day.

Thank you very much for mentioning the test suite. I'm sure it'll be a valuable asset 🙇

@daviddeutsch
Copy link

@RazrFalcon Woah, I just realized that your resvg cli tool supports basically all of the inkscape CLI options that I need - exporting individual IDs (to SVGs, even! with --dump-svg), exporting to pngs, even listing all IDs with their coordinates.

I think the only thing that is missing is explicitly deciding whether the exported SVG is cropped to the size of the element or if it

inkscape is of course very slow for those tasks, like exporting an individual ID takes 300ms - and I often run tasks where I'm exporting hundreds of those… So I was already on the lookout for an alternative.

@RazrFalcon
Copy link
Owner

RazrFalcon commented May 11, 2020

@daviddeutsch Yeap, and it has a far better SVG support.

explicitly deciding whether the exported SVG is cropped to the size of the element or if it

What do you mean by that? Automatic SVG size detection?

@daviddeutsch
Copy link

daviddeutsch commented May 11, 2020

@RazrFalcon In inkscape CLI, exporting an ID means that a 50x50px rectangle on a 200x200px viewBox will result in a 50x50px image. If you use --export-area-page, you get a 200x200px image.

In my application, I'm taking multiple input SVGs that have the same viewBox size, export some elements of each and compose those into a new file - by copying the elements into a new SVG with the same viewBox. So for that, I need to keep the coordinates as they were in the original source SVG.

In step 2, I take that composite SVG and split it up into files (some PNG, some SVG) which I position in an HTML file with the coordinates that I read out earlier (well, a percentage value so it's somewhat 'responsive'). For this step, I need cropped images where every SVG element starts at 0,0, internally.

@RazrFalcon
Copy link
Owner

Yes, right now you have to compose those images manually.

I will think about adding such an option to resvg.

@daviddeutsch
Copy link

@RazrFalcon That'd be great! Thanks for considering! 🙇

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants