Include Chinese/Japanese/Korean/more fonts in headless Chrome binary #49

Open
adieuadieu opened this issue Jul 28, 2017 · 34 comments

@adieuadieu
Owner

@adieuadieu adieuadieu commented Jul 28, 2017

From downstream issue in prisma-archive/chromeless#43

Should include the relevant fonts for all scripts, not just kanji.

Resources:

@adieuadieu adieuadieu self-assigned this Jul 28, 2017
@toddwprice

@toddwprice toddwprice commented Jul 29, 2017

@adieuadieu we solved this for a subset of languages (CJK) in our lambda function that uses phantomjs today. We were able to do it by:

  • packaging a .fonts directory containing the TrueType collection NotoSansCJK-Regular.ttc and including it in our function's upload zip.
  • adding an environment variable in our lambda function console (on AWS) called HOME set to /var/task. This allows Qt to pick up the included font.
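Below is a minimal bash sketch of the packaging step above. The helper name stage_fonts and the staging layout are hypothetical. The sketch assumes the zip is built from the staging directory's contents, so .fonts unpacks to /var/task/.fonts. Per this thread, HOME=/var/task then makes that directory visible to fontconfig.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): copy font files from SRC_DIR into
# STAGE_DIR/.fonts. Zip STAGE_DIR's contents into the function package so that
# .fonts unpacks to /var/task/.fonts, then set HOME=/var/task on the function.
stage_fonts() {
  local src_dir=$1 stage_dir=$2 f
  mkdir -p "$stage_dir/.fonts"
  # nullglob: a pattern with no matches expands to nothing instead of itself.
  shopt -s nullglob
  for f in "$src_dir"/*.ttf "$src_dir"/*.ttc "$src_dir"/*.otf; do
    cp "$f" "$stage_dir/.fonts/"
  done
  shopt -u nullglob
  # Make every staged font readable by all users (-rw-r--r--).
  find "$stage_dir/.fonts" -type f -exec chmod 644 {} +
}
```

To use it, run `stage_fonts ./my-fonts ./stage`, add the handler to ./stage, and build the package with `(cd stage && zip -r ../function.zip .)`. Running zip -r on `.` includes the hidden .fonts directory.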

The drawback is that, due to the 50MB size limit on Lambda function packages, you have to choose carefully which fonts to include. In our case Noto Sans solved our issues, but I'm sure other fonts will be needed for other purposes.

I'm relaying most of this second-hand because one of my colleagues did most of the work earlier today. I'm going to dig in further to make sure I got it right, but it might be a possible solution for this project as well. I can try taking a crack at it next week if that would help, but I'd better get acclimated with this project before doing any work. Nice work on this BTW!

@adieuadieu
Owner Author

@adieuadieu adieuadieu commented Jul 29, 2017

@toddwprice oh that's great news! I hope Chrome looks for fonts in the same place.

About the 50MB limit: if you deploy your Lambda function with the deployment package in S3, the package's size limit increases dramatically, to technically 250MB (realistically more like 100MB when packaging less compressible data like executable binaries). Forgive me for linking to myself: I recently wrote more about it in this article.

@toddwprice

@toddwprice toddwprice commented Jul 29, 2017

@adieuadieu wow we were early adopters of Lambda but never questioned the 5B limit. Great article! I will see if I can include Noto Sans for starters and if that works then we could add other fonts to plug other common holes.

@toddwprice

@toddwprice toddwprice commented Jul 29, 2017

*50MB

@toddwprice

@toddwprice toddwprice commented Aug 4, 2017

@adieuadieu I'm trying to get going on the project but getting errors with some missing dependencies and files when running npm test. Let me know if you want me to post my errors here or ping you somewhere else. I'm using the develop branch by the way. Thanks.

@adieuadieu
Owner Author

@adieuadieu adieuadieu commented Aug 4, 2017

@toddwprice Skip npm test. Which folder are you working in? packages/lambda may be the best place to play around in. There's a pseudo-integration test for Serverless there which you can use to deploy to Lambda. Run npm run build in packages/lambda, then create a symlink named dist in packages/lambda/integration-test that points to the parent directory's dist folder (packages/lambda/dist).

My local setup (it's not so pretty..):

marco:integration-test marco$ pwd
/Users/marco/src/github/serverless-chrome/packages/lambda/integration-test
marco:integration-test marco$ ls -lhtra
total 260800
-rwxr-xr-x   1 marco  502   127M May  9 07:14 headless_shell
lrwxr-xr-x   1 marco  502     8B Jun 18 23:21 dist -> ../dist/
-rw-r--r--   1 marco  502   463B Jul 10 18:48 serverless.yml
-rw-r--r--   1 marco  502   789B Jul 10 18:48 handler.js
drwxr-xr-x  13 marco  502   442B Jul 10 18:48 ..
drwxr-xr-x   7 marco  502   238B Jul 10 18:48 .
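The symlink step can be sketched as follows. link_integration_dist is a hypothetical helper name. The sketch assumes npm run build has already produced packages/lambda/dist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create packages/lambda/integration-test/dist as a
# relative symlink to packages/lambda/dist (i.e. "dist -> ../dist").
link_integration_dist() {
  local lambda_dir=$1
  mkdir -p "$lambda_dir/integration-test"
  # -s: symbolic link; -f: replace an existing link; -n: treat an existing
  # link-to-directory as a file, so re-running doesn't create dist/dist.
  ln -sfn ../dist "$lambda_dir/integration-test/dist"
}
```

To use it, run `(cd packages/lambda && npm run build) && link_integration_dist packages/lambda`. The link target is relative to the link's own directory, so the link keeps working if the checkout is moved.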
@adieuadieu
Owner Author

@adieuadieu adieuadieu commented Aug 4, 2017

@toddwprice Sure, would 18:30 CEST work? Could you DM me on Twitter or Gitter (@adieuadieu), or email me (address on my GitHub profile), so we can settle on a tool/service and share usernames for screen sharing?

@toddwprice

@toddwprice toddwprice commented Aug 4, 2017

@toddwprice

@toddwprice toddwprice commented Aug 10, 2017

@adieuadieu I'm struggling to get a good test running in Lambda without adding too many other dependencies. My current approach is to use chrome-remote-interface directly inside the test handler. See this file: handler.js.zip.

Two problems so far:

  1. Chrome spins up fine the first time, but fails afterwards. The logic around recognizing a running instance when a container is re-used is either not working or I've configured it wrong.

  2. Screenshots are returning a blank page. I saw this behavior in the past when testing chrome --headless with chrome-remote-interface so it's likely something I'm doing wrong there.

Any pointers you could give me to get me on track with a valid test would be appreciated.

@adieuadieu
Owner Author

@adieuadieu adieuadieu commented Aug 10, 2017

Hi @toddwprice thanks for the update. I would not worry too much about the first problem or adding too many dependencies. I would focus on just getting fonts working correctly with blatant disregard for anything else. Once fonts work, we'd have a proof-of-concept that it's possible. We can iterate from there to make it cleaner/easier.

With that in mind: the example handler in this repository should work for capturing a screenshot, at least on Lambda. You might need to wait for the page to load before taking the screenshot. For a simple, mostly static page without any ajax-y behavior after the DOMContentLoaded event fires, you can wait for CDP's Page.loadEventFired() Promise to resolve before calling Page.captureScreenshot().

@adieuadieu adieuadieu changed the title Include Chinese/Japanese fonts in headless Chrome binary Include Chinese/Japanese/Korean/more fonts in headless Chrome binary Aug 12, 2017
@kumorig

@kumorig kumorig commented Aug 20, 2017

Well, this is probably no help since I don't even use AWS, but I'll share anyway. I run chromeless in a Docker container alongside knqz/chrome-headless, and to that image I add:

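ADD https://noto-website.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip /tmp
# Extract into its own directory. The font directory needs the execute bit to be
# traversable; the font files themselves only need to be readable.
RUN unzip /tmp/NotoSansCJKjp-hinted.zip -d /tmp/noto && \
    mkdir -p /usr/share/fonts/noto && \
    cp /tmp/noto/*.otf /usr/share/fonts/noto && \
    chmod 755 /usr/share/fonts/noto && \
    chmod 644 /usr/share/fonts/noto/* && \
    fc-cache -fv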

(All of Noto is 120MB; the Regular weight alone is 15MB.) After that, Japanese at least works fine. You probably have all that figured out already, so sorry for the noise!

http://qiita.com/dd511805/items/dfe03c5486bf1421875a

@adieuadieu
Copy link
Owner Author

@adieuadieu adieuadieu commented Aug 23, 2017

Thank you for the tip, @kumorig!

@fd00

@fd00 fd00 commented Sep 11, 2017

Please use it as a reference
http://fd0.hatenablog.jp/entry/2017/09/10/223042 (sorry, written in Japanese)

  • use custom fontconfig
  • use small size font
  • strip chrome binary
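Here is a rough sketch of the first bullet (a custom fontconfig). The helper name write_fonts_conf, the bundled font directory /var/task/fonts and the cache location are all hypothetical; none of them come from the linked post. The function writes a minimal fonts.conf that lists one font directory and keeps the cache under /tmp, the only writable location on Lambda. FONTCONFIG_PATH would then point at the directory that contains this file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write OUT_DIR/fonts.conf telling fontconfig to scan
# FONT_DIR and to keep its cache in CACHE_DIR (default /tmp/fonts-cache).
write_fonts_conf() {
  local out_dir=$1 font_dir=$2 cache_dir=${3:-/tmp/fonts-cache}
  mkdir -p "$out_dir"
  cat > "$out_dir/fonts.conf" <<EOF
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <dir>${font_dir}</dir>
  <cachedir>${cache_dir}</cachedir>
</fontconfig>
EOF
}
```

For example, run `write_fonts_conf ./fontconfig /var/task/fonts`, ship ./fontconfig at /var/task/fontconfig, and set FONTCONFIG_PATH=/var/task/fontconfig. The third bullet corresponds to running the standard strip utility on the binary (e.g. `strip headless_shell`), which removes symbol and debug information to shrink the package.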
@nmqanh

@nmqanh nmqanh commented Sep 24, 2017

Thanks @fd00, I followed the guide in your blog exactly, but it doesn't seem to work for me: I still get a lot of tofu after deploying to Lambda. Not sure if I missed anything :(.
Edit: It works with the IPAexfont font, but not with Noto fonts. I managed to include the Noto fonts in the package, but they don't seem to work properly with fontconfig. (Uploading via S3 allows you to deploy up to 250MB.)

@nmqanh

@nmqanh nmqanh commented Sep 25, 2017

Update: Google's OTF fonts do not work for me, but Google's TTC fonts work well with fontconfig when following @fd00's guide. Thanks a lot, mate :).
TTC fonts can be downloaded via https://github.com/googlei18n/noto-cjk

@adieuadieu adieuadieu added this to the 1.0 milestone Nov 19, 2017
@nmqanh

@nmqanh nmqanh commented Dec 4, 2017

Another update: since serverless-chrome@1.0.0-6, where headless_shell was renamed to headless-chromium, @fd00's method stopped working and the tofu is back :(

@adieuadieu
Owner Author

@adieuadieu adieuadieu commented Dec 4, 2017

@nmqanh you might just need to change the names/paths in a few steps from headless_shell to headless-chromium. For example, in the Deploy section of the article, CHROME_PATH points at headless_shell. Change it to headless-chromium.

I have an implementation of font support in progress that I'll finish sometime over the next week or two which will close this Issue.

@nmqanh

@nmqanh nmqanh commented Dec 6, 2017

Just a quick update: I tried updating CHROME_PATH and rebuilt the font cache from step 0 as described in the article, but it does not work; the tofu is still there with serverless-chrome@1.0.0-6 and later. Thanks for the good news that the new release is going to support CJK fonts by default in 1-2 weeks :). Would love to try it soon. Please let me know if there is anything I can help with.

@eggnita

@eggnita eggnita commented Dec 13, 2017

@adieuadieu We have tried several times to add support for this, but keep hitting a dead end. Have you managed to figure it out? Can we assist with anything?
Thank you for your great work.

@nmqanh

@nmqanh nmqanh commented Jan 9, 2018

I tried many ways to bring CJK fonts back to headless Chrome but could not :(. Was anyone here able to do it? Please help; I'd really appreciate it. This only started breaking with serverless-chrome@1.0.0-6; it works fine with serverless-chrome@1.0.0-5 and lower versions.

Thanks all.

@nat-n

@nat-n nat-n commented Jan 12, 2018

I got it to work for my own setup. I documented the process I used with a little more detail than the other blog post here: https://gist.github.com/nat-n/c3429d29f2478ccb3de243810bb12956

@nmqanh

@nmqanh nmqanh commented Jan 22, 2018

Thanks @nat-n, it works like a charm. The main reason was that from version 1.0.0-6 the symlink failed to run; it used to work with 1.0.0-5 and lower versions.

@LeeGardiner

@LeeGardiner LeeGardiner commented Jan 31, 2018

@nat-n I've been able to include the ipaexg font in chromeless using this method.

I created the Docker container and, once done, rsync'd the result into the chromeless path so it looks like the following...

chromeless/serverless/node_modules/@serverless-chrome/lambda/dist/fontconfig/etc/fonts

And within my serverless.yml:

provider:
  name: aws
  runtime: nodejs6.10
  stage: ${self:custom.stage}
  region: eu-west-1
  environment:
    DEBUG: ${self:custom.debug}
    AWS_IOT_HOST: ${self:custom.awsIotHost}
    FONTCONFIG_PATH: /var/task/node_modules/@serverless-chrome/lambda/dist/fontconfig/etc/fonts
    LD_LIBRARY_PATH: /var/task/node_modules/@serverless-chrome/lambda/dist/fontconfig/usr/lib
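Before deploying, a quick check can confirm that the rsync'd tree matches the two paths above. check_fontconfig_tree is a hypothetical helper. The expected file names (fonts.conf, libfontconfig.so*) are assumptions about what the Docker build produced.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check: confirm that DIST/fontconfig contains the
# config dir used as FONTCONFIG_PATH (.../fontconfig/etc/fonts) and the
# library dir used as LD_LIBRARY_PATH (.../fontconfig/usr/lib).
check_fontconfig_tree() {
  local dist=$1 status=0
  local conf="$dist/fontconfig/etc/fonts/fonts.conf"
  if [ ! -f "$conf" ]; then
    echo "missing: $conf" >&2
    status=1
  fi
  # Assumed library name. An unmatched glob stays literal, so -e fails.
  local libs=("$dist"/fontconfig/usr/lib/libfontconfig.so*)
  if [ ! -e "${libs[0]}" ]; then
    echo "missing: $dist/fontconfig/usr/lib/libfontconfig.so*" >&2
    status=1
  fi
  return "$status"
}
```

For example: `check_fontconfig_tree chromeless/serverless/node_modules/@serverless-chrome/lambda/dist && serverless deploy`.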
@luminous8

@luminous8 luminous8 commented Feb 4, 2018

@nat-n I tried to follow your notes, but I'm missing the background knowledge for everything from "Configuring fontconfig" to the end. Could you please go into more detail on how to do it, or give links where I can learn what I'm missing? Thanks

@nat-n

@nat-n nat-n commented Feb 5, 2018

@luminous8 I can try to help, but I'm not sure what you're missing. The general idea is that the fontconfig built inside the container also exists under /tmp outside the container, so you can make the required changes to it there before running some commands from inside the container to complete the setup.
I've just fixed a formatting issue that might have made part of it less clear, but I'm afraid I can't make the instructions much more concrete without making them too specific to a particular setup (which may differ from your own).

@arikfr

@arikfr arikfr commented Mar 29, 2018

For anyone stumbling on this in the future, I just wanted to mention that what @toddwprice did:

  • Upload fonts in a .fonts directory.
  • Set $HOME env var to /var/task.

Worked just fine without the need to build fontconfig or the other extra steps.

@abargnesi

@abargnesi abargnesi commented Apr 3, 2018

Agree with @arikfr. Shipped the following with our λ.

$ tree -la
.
├── chromium
└── .fonts
    ├── NotoColorEmoji.ttf
    ├── NotoEmoji-Regular.ttf
    ├── NotoSansArabic-Bold.ttf
    ├── NotoSansArabic-Regular.ttf
    ├── NotoSansCJKjp-Bold.otf
    ├── NotoSansCJKjp-Regular.otf
    ├── NotoSansCJKkr-Bold.otf
    ├── NotoSansCJKkr-Regular.otf
    ├── NotoSansCJKsc-Bold.otf
    ├── NotoSansCJKsc-Regular.otf
    ├── NotoSansCJKtc-Bold.otf
    ├── NotoSansCJKtc-Regular.otf
    ├── NotoSansHebrew-Bold.ttf
    ├── NotoSansHebrew-Regular.ttf
    ├── NotoSansMongolian-Regular.ttf
    ├── NotoSansThai-Bold.ttf
    └── NotoSansThai-Regular.ttf

Unpacked as is to /var/task. After setting $HOME to /var/task we were able to confirm CJK characters rendered.

@liwaiwai

@liwaiwai liwaiwai commented May 17, 2018

I followed @toddwprice's suggestion, and it works locally but not in Lambda. Then I tried an *.otf file instead of a *.ttc file, as @abargnesi suggests, and it works both locally and in Lambda. The font I used is NotoSansCJKtc-Black.otf.

So you may want to try both font formats and see which one works.

@NickBlow

@NickBlow NickBlow commented Oct 11, 2018

In case this saves someone else some work.

I got fonts working by putting them in .fonts and setting HOME=/var/task, however it didn't work for me until I made the font files have permission 644 (-rw-r--r--).
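The permissions step is easy to miss, so here is a small sketch for finding and fixing font files that other users cannot read. The helper names are hypothetical. The sketch assumes, consistent with the comment above, that Lambda may read the files as a user other than their owner, so the "others" read bit matters.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Font files are matched by extension (.ttf/.otf/.ttc).

# Print font files under DIR whose "others" read bit (004) is not set.
find_unreadable_fonts() {
  find "$1" -type f \( -name '*.ttf' -o -name '*.otf' -o -name '*.ttc' \) ! -perm -004
}

# Set every font file under DIR to 644 (-rw-r--r--).
fix_font_permissions() {
  find "$1" -type f \( -name '*.ttf' -o -name '*.otf' -o -name '*.ttc' \) -exec chmod 644 {} +
}
```

To use them, run `find_unreadable_fonts .fonts` before zipping. If it prints anything, run `fix_font_permissions .fonts`. Zip records Unix permission bits, so fixing the bits before zipping is what carries them into the package.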

@kirilledelman

@kirilledelman kirilledelman commented Dec 3, 2018

@NickBlow, @abargnesi

I'm still unable to get these characters to render. Can somebody clarify where this .fonts directory goes?

I added the same fonts as listed in @abargnesi's comment, changed their permissions to 644, and tried:

  1. adding this directory to the chrome/chrome-headless-lambda-linux-x64.tar.gz archive
  2. adding this directory at the root of the serverless-chrome git repo
  3. adding it to the .serverless/serverless-chrome.zip archive

and deploying, but my characters are rendered as empty space (not even squares).

Any ideas?

Thank you.

@kirilledelman

@kirilledelman kirilledelman commented Dec 20, 2018

Bump

@abadraja

@abadraja abadraja commented Feb 7, 2019

@kirilledelman Any updates? I'm not able to render special characters either.

@snene

@snene snene commented May 30, 2019

@NickBlow @abargnesi I am copying the fonts to the .fonts folder and have also set HOME to /var/task, but I noticed that OTF fonts do NOT work with this method: they work locally but not on Lambda.
TTF fonts, on the other hand, work both locally and on Lambda. Any pointers on what could be going wrong with the OTF fonts?

@moroine

@moroine moroine commented Nov 22, 2019

I've managed to make it work with layers. @NickBlow's comment is not visible enough; I had a lot of trouble due to permissions. As he wrote: "I got fonts working by putting them in .fonts and setting HOME=/var/task, however it didn't work for me until I made the font files have permission 644 (-rw-r--r--)."

I wanted to provide fonts for every possible script using Noto, so I downloaded all the unhinted versions (using "Download all fonts").

Then I ran the following commands to extract the archive and copy only the variants I want, to keep the layer size as small as possible.

mkdir -p fonts/.fonts
unzip Noto-unhinted.zip
cp NotoSans*-Regular.* fonts/.fonts
cp NotoColorEmoji.ttf fonts/.fonts
chmod 644 fonts/.fonts/*
zip fonts.zip -r fonts

Then I store the file in my project under layers/fonts.zip and I use the following serverless config:

service: chrome-lambda
provider:
  name: aws
  runtime: nodejs10.x
  timeout: 30

package:
  exclude:
    - node_modules/puppeteer/.local-chromium/**
    - layers/**

functions:
  headless-chrome:
    handler: src/index.handler
    memorySize: 1600
    timeout: 30
    environment:
      HOME: /opt/fonts
    layers:
      - { Ref: FontsLambdaLayer }
    events:
      - http:
          path: /
          method: post

layers:
  fonts:
    package:
      artifact: layers/fonts.zip

custom:
  chrome:
    flags:
        - --hide-scrollbars
        - --ignore-certificate-errors
    functions:
        - headless-chrome

Note: Make sure to use the hinted version of Noto font!
