problem file encoding (Umlaute) for external PlantUML diagrams #586

Open
2 of 3 tasks
wumpz opened this issue Jul 15, 2022 · 4 comments

Comments

@wumpz

wumpz commented Jul 15, 2022

  • Bug report
  • Feature request
  • Question

I am not sure whether this belongs here or in the asciidoctor-diagram project, so hopefully this is the right place.

My Maven project's source code is (or should be) completely UTF-8. I want to build a Maven site whose pages are Asciidoctor files, and one page should include a PlantUML diagram that comes from an external file. That diagram is generated, but it always seems to be read with the wrong encoding, whereas the inline diagrams are rendered correctly.

So how do I tell Asciidoctor that these diagram files are UTF-8?

What I did / tried so far:

  1. set file.encoding when starting Maven (-Dfile.encoding=UTF-8)
  2. defined the project source encoding in Maven
  3. defined the project reporting encoding in Maven
  4. tried different Java versions
  5. tried to configure the default_external parameter, which had no effect
  6. changed the configured project encodings to see whether anything would change

BTW my environment is Windows 11, Java 8, 11, 17, Maven 3.6, 3.8.

I attached a minimal Maven project (asciidoctor1.zip). Just run site:site or look into the target directory I sent.

Look in the target/site directory:

  • diag-....png is correct. It is defined using UTF-8 in overview.adoc

  • test_class_utf8.png is wrong. It is defined using UTF-8 in test_class_utf8.puml

  • test_class_cp1252.png is correct. It is defined using CP1252 in test_class_cp1252.puml

So it seems that Asciidoctor (or asciidoctor-diagram) always uses Cp1252 for external PlantUML files, which is strange, since I already set the file encoding to UTF-8.

So what did I do wrong?
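For readers unfamiliar with the symptom, here is a minimal Java sketch of what happens when UTF-8 bytes are decoded as Cp1252 (windows-1252). It is only an illustration of the suspected mismatch, not code from Asciidoctor or asciidoctor-diagram; the class and method names are hypothetical. Each umlaut occupies two bytes in UTF-8, and a Cp1252 reader turns each of those bytes into its own character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the garbling, not the actual plugin code path.
class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes with the charset
    // a reader wrongly (or rightly) assumes for the file.
    static String misread(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Three lowercase umlauts, written as Unicode escapes so that the
        // encoding of this source file itself does not matter.
        String umlauts = "\u00e4\u00f6\u00fc";

        // Wrong assumption: six characters come back instead of three.
        System.out.println(misread(umlauts, cp1252).length());
        // Correct assumption: the original text round-trips unchanged.
        System.out.println(misread(umlauts, StandardCharsets.UTF_8).equals(umlauts));
    }
}
```

If the labels in test_class_utf8.png show pairs such as an "Ã" followed by a symbol where one umlaut should be, that pattern points to this kind of mismatch.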

@abelsromero
Member

There's something here, but I need to set up a Windows VM, so it may take some extra time to answer.

Files should already be UTF-8; Asciidoctor does not understand other encodings, and on non-Windows OSs the example just crashes when processing the cp1252 file. Why cp1252 works on Windows and UTF-8 does not is what I need to research. We only use project.build.sourceEncoding to copy resources, which you don't do in the example.

I understand that the end goal is to have all files in UTF-8, right? Mixing encodings is never going to work.

@wumpz
Author

wumpz commented Jul 15, 2022

Right. Everything should be UTF-8. I only included the cp1252 file as a test and got lucky. However, using ISO-8859-1 works as well, since it encodes at least those characters the same way.

If you remove the cp1252 file, does a non-Windows machine render the UTF-8 .puml files correctly?
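The observation that ISO-8859-1 and Cp1252 behave alike for umlauts can be checked with a small Java sketch. The class and method names are hypothetical. The two encodings share the same byte values for the German umlauts and ß, and differ only in the 0x80 to 0x9F range, where Cp1252 places characters such as the euro sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class comparing the two single-byte encodings.
class Latin1VsCp1252 {
    // True when both charsets produce identical bytes for the given text.
    static boolean sameBytes(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] cp1252 = text.getBytes(Charset.forName("windows-1252"));
        return Arrays.equals(latin1, cp1252);
    }

    public static void main(String[] args) {
        // German umlauts and sharp s, as Unicode escapes.
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";
        // Euro sign: encodable in Cp1252, not in ISO-8859-1.
        String euro = "\u20ac";

        System.out.println(sameBytes(umlauts)); // identical bytes
        System.out.println(sameBytes(euro));    // bytes differ
    }
}
```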

@abelsromero
Member

If you remove the cp1252 file, does a non-Windows machine render the UTF-8 .puml files correctly?

Yes.
In fact, non-Windows systems (testing on macOS now) crash outright with org.jruby.exceptions.ArgumentError: (ArgumentError) asciidoctor: FAILED: <stdin>: Failed to load AsciiDoc document - invalid byte sequence in UTF-8. That's a common thing for people to ask about Asciidoctor; you can find several reports by googling for it.

That's why I am puzzled that you get the opposite effect, and I need to do some research. I know Windows does not crash, but using cp1252 as the default 🤔
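The "invalid byte sequence in UTF-8" failure has a close analogue in plain Java. The sketch below is an analogy under that assumption, not the JRuby or Asciidoctor code path, and the class and method names are hypothetical. It runs a strict UTF-8 decoder over bytes that were written as Cp1252; umlaut bytes such as 0xF6 are not valid UTF-8, so a strict decoder rejects them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte array is well-formed UTF-8.
class StrictUtf8Check {
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // Report errors instead of silently substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String label = "Gr\u00f6\u00dfe"; // a word containing an umlaut and a sharp s
        byte[] asCp1252 = label.getBytes(Charset.forName("windows-1252"));
        byte[] asUtf8 = label.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(asCp1252)); // Cp1252 bytes are rejected
        System.out.println(isValidUtf8(asUtf8));   // UTF-8 bytes are accepted
    }
}
```

A check like this could be run over the .puml and .adoc files to find any that are not actually stored as UTF-8.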

@wumpz
Author

wumpz commented Jul 15, 2022

Strange. This should be the same as starting Java with -Dfile.encoding=UTF-8. Is another JVM instance started somewhere in the rendering process? At the moment, Cp1252 is the default encoding for Java on Windows, while on Linux and macOS it's UTF-8.
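One way to test the "second JVM" suspicion is to print the effective default charset inside each Java process involved. The sketch below uses a hypothetical class name and only standard JDK calls. On Java 17 and earlier the default is derived from the OS locale unless file.encoding overrides it at startup; since Java 18 (JEP 400) the default is UTF-8 on all platforms.

```java
import java.nio.charset.Charset;

// Hypothetical probe: reports the encoding settings of the running JVM.
class DefaultCharsetProbe {
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Compare the output with and without -Dfile.encoding=UTF-8.
        System.out.println(describe());
    }
}
```

If a child process were forked without inheriting the flag, it would report the platform default instead, which on a Western-locale Windows installation with Java 17 or earlier would typically be windows-1252.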
