Another Chinese character issue #754

Closed
rkhlin opened this issue Sep 4, 2021 · 16 comments

@rkhlin

rkhlin commented Sep 4, 2021

Hi there,

I was using the Flying Saucer library before and it all went fine until I upgraded my Spring framework to v5, so now I have to look for a newer library to generate PDFs with Chinese text in them.

I have read quite a few posts and tried many approaches, but I still see ###. I am not sure what is going on. Am I missing a step?
Please help.

My css file:
@charset "UTF-8";
@font-face {
    font-family: "Arial Unicode MS";
    src: url('arialuni.ttf');
}

* {
    font-family: "Arial Unicode MS", sans-serif;
}

HTML file:

<title>Testing</title>

每星期三
Every Wednesday

Java:
public static void main(String[] args) throws Exception {
    try (OutputStream os = new FileOutputStream("E:/Temp/out.pdf")) {
        PdfRendererBuilder builder = new PdfRendererBuilder();
        // builder.useFont(new File("E:/Temp/arialuni.ttf"), "Arial Unicode MS");
        builder.useFastMode();
        builder.withUri("file:///E:/Temp/index.html");
        builder.toStream(os);
        builder.run();
    }
}

Result:

[image: out]

@rototor
Contributor

rototor commented Sep 5, 2021

In your Java example you are not registering a font that contains CJK characters.

You must register a font which contains the characters you want to display. Note: the built-in sans-serif does not support anything outside of Latin-1.

"Arial Unicode MS" may contain the characters you need, but whether it is available depends on the platform (i.e. which exact Windows version you installed; it will not be there on Linux, macOS or any other Unix). You must specify the full path to it.

You may try, for example, the BabelStone Han font (https://www.babelstone.co.uk/Fonts/Han.html); you need the (rather big) .ttf font file (https://www.babelstone.co.uk/Fonts/Download/BabelStoneHan.ttf).

The best approach is to register the font in the builder:

builder.useFont(new File("/path/to/BabelStoneHan.ttf"),"BabelStoneHan");

and then reference it in CSS:

* { font-family: "BabelStoneHan",sans-serif; }

Unlike browsers, OpenHTMLtoPDF does not try to use local, OS-dependent fonts to substitute missing glyphs. You have to provide fonts with all required characters as .ttf files. If a glyph cannot be found in one font, the next fallback font in the font-family list is tried. If no font has the required glyph, you get a # sign.
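
As a quick way to see which characters in a document depend on a registered font, here is a minimal sketch using only the JDK. It takes the Latin-1 remark above at face value and treats every code point above U+00FF as needing a font supplied by you; that threshold is an assumption, not something read out of the library. The names NonLatin1Finder, outsideLatin1 and containsHan are made up for illustration and are not part of OpenHTMLtoPDF.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NonLatin1Finder {

    /** Returns the distinct code points above U+00FF, in order of first appearance. */
    public static Set<String> outsideLatin1(String text) {
        Set<String> found = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> cp > 0xFF) // assumption: anything past Latin-1 needs a registered font
            .forEach(cp -> found.add(new String(Character.toChars(cp))));
        return found;
    }

    /** True if at least one code point belongs to the Han script. */
    public static boolean containsHan(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // The sample text from this issue, written with Unicode escapes
        // so that the source file encoding cannot damage it.
        String sample = "\u6BCF\u661F\u671F\u4E09 Every Wednesday";
        System.out.println("Needs a registered font: " + outsideLatin1(sample));
        System.out.println("Contains Han characters: " + containsHan(sample));
    }
}
```

If the first set is not empty, at least one font in the font-family list has to cover those characters, otherwise they come out as #.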

If you want to reference fonts purely from CSS you can do that; you just have to give the full classpath URL for it, e.g.:

@font-face {
	font-family: "Noto Sans";
	src: url("/de/resolveit/ww/common/print/htmlpdf/NotoSans-Regular.ttf");
	/*noinspection CssUnknownProperty*/
	-fs-pdf-font-embed: embed;

}

I.e. in this case the font must be part of your classpath. Both of these approaches work for me, so they should also work for you. Including "Arial Unicode MS" in your project jars is a bad idea because of the license. You are better off using BabelStoneHan, or UnDotum (https://kldp.net/unfonts/).

@rkhlin
Author

rkhlin commented Sep 6, 2021

Thanks rototor,

I downloaded BabelStoneHan.ttf you suggested.
I develop in Win 10, deploy to Linux or FreeBSD.
Run with JDK8.

I tried:
File font = new File("E:/Temp/BabelStoneHan.ttf");
builder.useFont(font, "BabelStoneHan");

Also tried CSS approach:
@font-face {
    font-family: "BabelStoneHan";
    src: url('BabelStoneHan.ttf');
    -fs-pdf-font-embed: embed;
}

* {
    font-family: "BabelStoneHan", sans-serif;
}

BUT I still get ###

Console:
com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
com.openhtmltopdf.load INFO:: SAX XMLReader in use (parser): com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
com.openhtmltopdf.load INFO:: Loaded document in ~15ms
com.openhtmltopdf.load INFO:: TIME: parse stylesheets 52ms
com.openhtmltopdf.match INFO:: media = print
com.openhtmltopdf.match INFO:: Matcher created with 160 selectors
com.openhtmltopdf.general INFO:: Using fast-mode renderer. Prepare to fly.

I am wondering whether this could be a library conflict.

My library snapshot is attached.

I tried to debug the openhtmltopdf source, but I am not sure which Java file decides whether to print the glyph from the font or ###, since there is no ERROR message to help me find it.

[image: lib]

@rototor
Contributor

rototor commented Sep 6, 2021

The classpath should be fine. When trying the CSS approach, you must put the font into your classpath, i.e. if you use Maven, it should be in src/main/resources/your/domain/and/package/BabelStoneHan.ttf and you would reference it with

@font-face {
    font-family: "BabelStoneHan";
    src: url('your/domain/and/package/BabelStoneHan.ttf');
    -fs-pdf-font-embed: embed;
}
* {
    font-family: "BabelStoneHan", sans-serif;
}

Please check that the Maven build worked correctly, i.e. that the target/classes directory contains your/domain/and/package/BabelStoneHan.ttf

Of course, replace your/domain/and/package with your package name.
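
The same check can be done from code. The following is a minimal sketch that asks the class loader whether the font file made it onto the classpath; it only uses ClassLoader.getResource from the JDK. The class name ClasspathFontCheck and its methods are hypothetical, and the check only confirms that the file is in the build output. It says nothing about how the library itself resolves url().

```java
import java.net.URL;

class ClasspathFontCheck {

    /** ClassLoader lookups expect a path without a leading slash. */
    public static String toClassLoaderPath(String resourcePath) {
        return resourcePath.startsWith("/") ? resourcePath.substring(1) : resourcePath;
    }

    /** Returns the URL of the resource, or null if it is not on the classpath. */
    public static URL locate(String resourcePath) {
        ClassLoader loader = ClasspathFontCheck.class.getClassLoader();
        return loader.getResource(toClassLoaderPath(resourcePath));
    }

    public static void main(String[] args) {
        // Placeholder path from the comment above; pass your real one as the first argument.
        String path = args.length > 0 ? args[0] : "your/domain/and/package/BabelStoneHan.ttf";
        URL url = locate(path);
        System.out.println(url != null ? "Found: " + url : "Not on the classpath: " + path);
    }
}
```

If this reports that the file is missing, the font was probably not copied from src/main/resources during the build.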

@rkhlin
Author

rkhlin commented Sep 6, 2021

When I used Flying Saucer I just put the ttf file in the same folder as the css file,
so in the css I just put
...
src: url('BabelStoneHan.ttf');
...

Is that not the case in OpenHTMLtoPDF?
It should be the same, right? Since it is the CSS approach.

@rkhlin
Author

rkhlin commented Sep 6, 2021

I can even see it read the font in, which takes almost 16 sec.

Really weird that I still can't generate the PDF.

[image: nvt]

@rototor
Contributor

rototor commented Sep 6, 2021

No, this is not the case. Don't even think of using a browser to compare the output. OpenHTMLtoPDF can only do a very small subset of what browsers do, and it has changed many things from Flying Saucer. Quote from the README.md:

Open HTML to PDF is a pure-Java library for rendering a reasonable subset of well-formed XML/XHTML (and even some HTML5) using CSS 2.1 (and later standards) for layout and formatting, outputting to PDF or images.

Use this library to generated nice looking PDF documents. But be aware that you can not throw modern HTML5+ at this engine and expect a great result. You must special craft the HTML document for this library and use it's extended CSS feature like #31 or #32 to get good results. Avoid floats near page breaks and use table layouts.

Without a reduced sample (i.e. make a small GitHub repository and put a compilable and runnable example in it) I cannot really help you, as it works for me when putting the font into the classpath and referencing it in url() with the package name.

@rkhlin
Author

rkhlin commented Sep 7, 2021

This is the only Java file:

package test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class TestOpenHtml2Pdf {

    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("E:/Temp/out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();

            // File font = new File("BabelStoneHan.ttf");
            // builder.useFont(font, "BabelStoneHan");
            builder.useFastMode();
            builder.withUri("https://newvision.net.au/app/test/testPDF.html");
            builder.toStream(os);
            builder.run();
        }
    }
}

I have pointed it at a web page you can access:

https://newvision.net.au/app/test/testPDF.html
The stylesheet it refers to:
https://newvision.net.au/app/test/test.css

@vipcxj
Contributor

vipcxj commented Sep 8, 2021

If you have called builder.useFont with the absolute path in the code, don't specify the url in CSS; use the font name directly.
If you don't want to use an absolute path, you can provide an FSSupplier instance to the useFont API, like this:

                builder.useFont(new FSSupplier<InputStream>() {
                    @Override
                    public InputStream supply() {
                        return getClass().getResourceAsStream(media.getPath());
                    }
                }, media.getFamilyName());

@rkhlin
Author

rkhlin commented Sep 9, 2021

I tried many combinations. If the useFont approach with an absolute path is not giving me the result, will FSSupplier give me the result?

@danfickle
Owner

Hi @rkhlin,

I can't open your links. Perhaps try with local files to start with. If they work, we can narrow it down to a resource loading issue.

@rkhlin
Author

rkhlin commented Sep 10, 2021

HTML file:

<title>Testing</title>

每星期三
Every Wednesday

@rkhlin
Author

rkhlin commented Sep 10, 2021

CSS file:
@charset "UTF-8";
@font-face {
    font-family: "BabelStoneHan";
    src: url('BabelStoneHan.ttf');
    -fs-pdf-font-embed: embed;
}

* {
    font-family: "BabelStoneHan", "Arial Unicode MS", sans-serif;
}

@rkhlin
Author

rkhlin commented Sep 10, 2021

The link normally works, but while I am debugging I may restart the server.
I also created a Maven project and ran more tests, but I still get ###.

@rkhlin
Author

rkhlin commented Sep 11, 2021 via email

@danfickle
Owner

Hi @rkhlin,

I used this simple source code and got Chinese characters in the PDF:

package com.example;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.File;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class ChineseSample {
    public static void main( String[] args ) throws IOException
    {
        try (FileOutputStream fos = new FileOutputStream("C:\\Users\\me\\Desktop\\res.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();

            builder.useFont(new File("C:\\Users\\me\\Desktop\\fonts\\BabelStoneHan.ttf"),"BabelStoneHan");
            builder.toStream(fos);
            builder.useFastMode();
            builder.withHtmlContent("<body style=\"font-family: 'BabelStoneHan', sans-serif; font-size: 20px;\">Hello world! <br/> 每星期三 </body>", null);
            builder.run();
        }
    }
}

Please try it and see if it works for you. You will have to change the paths in two places (to output PDF and font file). Here is the pom.xml I used for reference:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <name>demo</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
        <groupId>com.openhtmltopdf</groupId>
        <artifactId>openhtmltopdf-pdfbox</artifactId>
        <version>1.0.9</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.example.ChineseSample</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>

    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
        <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
        <plugin>
          <artifactId>maven-site-plugin</artifactId>
          <version>3.7.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-project-info-reports-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

Note the project.build.sourceEncoding is set to UTF-8 and your code editor should be set to save in UTF-8 as well.
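
To illustrate why the encoding setting matters, here is a minimal sketch using only the JDK. It encodes the sample text as UTF-8 and then decodes the bytes once with the right charset and once with a wrong one, which is roughly what happens when a UTF-8 source or HTML file is read with a different default encoding. The class name EncodingDemo and its method are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    /** Encodes the text as UTF-8, then decodes those bytes with the given charset. */
    public static String decodeUtf8BytesAs(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        // The Chinese sample text from this issue, written with Unicode escapes
        // so that the encoding of this source file cannot damage it.
        String text = "\u6BCF\u661F\u671F\u4E09";

        String right = decodeUtf8BytesAs(text, StandardCharsets.UTF_8);
        String wrong = decodeUtf8BytesAs(text, StandardCharsets.ISO_8859_1);

        System.out.println("Decoded as UTF-8 matches the original:      " + right.equals(text));
        System.out.println("Decoded as ISO-8859-1 matches the original: " + wrong.equals(text));
        System.out.println("Characters after the wrong decoding: " + wrong.length());
    }
}
```

With the wrong charset every byte becomes its own character, so the Chinese text is lost before any font lookup takes place.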

@rkhlin
Author

rkhlin commented Sep 15, 2021

Phew! Finally resolved the problem.

Placing the styles or the CSS link between ... is the key.
Then it works really well.

It is great that OpenHTMLtoPDF is strict about HTML. A bit of logging on those style placement errors would really help others, though,
since the Chrome browser does a good job of displaying the page but, by being too helpful, may give us a false sense that the HTML is valid.
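
For anyone who wants such a check before rendering, here is a rough sketch using only the JDK's XML parser. It assumes that the tag names lost from the sentence above were the head element, which is a guess based on where HTML expects style and link elements; the original post does not confirm it. The class StylePlacementCheck is hypothetical and not part of OpenHTMLtoPDF, and it only handles well-formed XHTML without a DOCTYPE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StylePlacementCheck {

    /** Returns the tag name of every style or link element that has no head ancestor. */
    public static List<String> misplaced(String xhtml) {
        Document doc;
        try {
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = parser.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        List<String> problems = new ArrayList<>();
        for (String tag : new String[] {"style", "link"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                if (!hasHeadAncestor(nodes.item(i))) {
                    problems.add(tag);
                }
            }
        }
        return problems;
    }

    /** Walks up the parent chain looking for an element named head. */
    private static boolean hasHeadAncestor(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE && "head".equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String inBody = "<html><body><link rel=\"stylesheet\" href=\"test.css\"/>"
                      + "<p>text</p></body></html>";
        String inHead = "<html><head><link rel=\"stylesheet\" href=\"test.css\"/></head>"
                      + "<body><p>text</p></body></html>";
        System.out.println("link in body, misplaced: " + misplaced(inBody));
        System.out.println("link in head, misplaced: " + misplaced(inHead));
    }
}
```

A non-empty result would be a reason to move the style or link element before passing the document to the renderer.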

Thanks to Dan and others. Great community to be in.

@rkhlin rkhlin closed this as completed Sep 15, 2021