Get rid of external dependencies #841

Open
mojimi opened this issue Mar 29, 2019 · 18 comments
Labels
documentation Problems or improvements needed on the documentation or on the website

Comments

@mojimi

mojimi commented Mar 29, 2019

I'm just opening this issue to have a discussion on it. And because it doesn't exist 😏

WeasyPrint is amazing and stands out from every other PDF-generation library because it implements its own CSS engine, but at the moment its many dependencies are the biggest downside in my view.

All the libraries it requires can be a hassle when building cross-platform: I've had several issues deploying to AWS services and testing on Windows.

Points to discuss:

  1. Is it feasible to think about removing the dependencies in the near future?
  2. Do you guys plan on ever doing it, or would there need to be some type of investment/external incentive?
  3. Which of the dependencies would be the hardest to redo in Python?
  4. What are the biggest challenges here?

*Btw I'm only talking about external libs like Pango/Cairo/GDK

@mojimi mojimi changed the title from "Get rid of dependencies" to "Get rid of external dependencies" Mar 29, 2019
@MindFluid

You could set up a docker container that can be easily deployed to other instances.

@liZe
Member

liZe commented Apr 1, 2019

Hello,

I love removing external dependencies. Using something different from cairo has already been discussed in #342 for example. Really, I'd love to.

I can imagine rewriting cairo, or at least imagine generating PDF files out of simple drawing operations. I would love to, and maybe one day will. But there's one big problem: text.

Drawing text is not difficult. It's not really difficult. It's a nightmare. Well, it's actually never-ending nightmares in a never-ending night. You can even call that "hell" if you want.

So… Maybe one day I'll drop Pango and use HarfBuzz instead (it means rewriting the whole line-breaking algorithm in WeasyPrint, that's already frightening). But I can't even imagine not relying on HarfBuzz. And I'm not the only one:

HarfBuzz is used in Android, Chrome, ChromeOS, Firefox, GNOME, GTK+, KDE, LibreOffice, OpenJDK, PlayStation, Qt, XeTeX, and other places.

Bad news: Pango is not the only library WeasyPrint relies on to render text. It also relies on Fontconfig to find and configure fonts, and FreeType to render TrueType fonts. In case you're wondering: it's really painful too.

TL;DR: Replacing cairo and Pango can be done with a lot of work. Getting rid of all the non-Python external dependencies is nothing more than an illusion.

You could set up a docker container that can be easily deployed to other instances.

Yes. Providing Snap or Flatpak packages could be another solution.

@mojimi
Author

mojimi commented Apr 1, 2019

I guess more batteries-included documentation could also be a decent solution.

As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.

@liZe liZe added the documentation Problems or improvements needed on the documentation or on the website label Apr 12, 2019
@pperona

pperona commented Apr 13, 2019

Here is an example of what can go wrong:

In [2]: import jinja2                                                           
 
In [3]: import weasyprint                                                       
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-4d0739b75804> in <module>
----> 1 import weasyprint
 
~/anaconda3/lib/python3.6/site-packages/weasyprint/__init__.py in <module>
    439 
    440 # Work around circular imports.
--> 441 from .css import preprocess_stylesheet  # noqa isort:skip
    442 from .html import (  # noqa isort:skip
    443     HTML5_UA_STYLESHEET, HTML5_PH_STYLESHEET, find_base_url, get_html_metadata)
 
~/anaconda3/lib/python3.6/site-packages/weasyprint/css/__init__.py in <module>
     28 from ..logger import LOGGER, PROGRESS_LOGGER
     29 from ..urls import URLFetchingError, get_url_attribute, url_join
---> 30 from . import computed_values, media_queries
     31 from .properties import INHERITED, INITIAL_NOT_COMPUTED, INITIAL_VALUES
     32 from .utils import remove_whitespace
 
~/anaconda3/lib/python3.6/site-packages/weasyprint/css/computed_values.py in <module>
     15 from tinycss2.color3 import parse_color
     16 
---> 17 from .. import text
     18 from ..logger import LOGGER
     19 from ..urls import get_link_attribute
 
~/anaconda3/lib/python3.6/site-packages/weasyprint/text.py in <module>
     12 import re
     13 
---> 14 import cairocffi as cairo
     15 import cffi
     16 import pyphen
 
~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in <module>
     37 
     38 
---> 39 cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
     40 
     41 
 
~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in dlopen(ffi, *names)
     34             except OSError:
     35                 pass
---> 36     raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
     37 
     38 
 
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2
 
In [4]: quit 
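
A failure like the `dlopen()` error above can be checked for before importing anything. The sketch below is only an illustration, not part of WeasyPrint or cairocffi: it uses the standard-library `ctypes.util.find_library` to see which shared libraries can be located on the current system. The helper name `find_missing_libraries` and the list of library names are assumptions made for this example.

```python
import ctypes.util

# Library names are assumptions for illustration; the exact names that
# WeasyPrint and its bindings look up depend on the version and platform.
CANDIDATE_LIBRARIES = ["cairo", "pango-1.0", "gobject-2.0", "fontconfig", "harfbuzz"]


def find_missing_libraries(names):
    """Return the names that ctypes cannot locate on this system."""
    return [name for name in names if ctypes.util.find_library(name) is None]


missing = find_missing_libraries(CANDIDATE_LIBRARIES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All candidate libraries were found.")
```

A library reported as "not found" here is not proof that an import will fail, because bindings may try other names or paths, but it is a cheap first hint when deploying to a new environment.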

@liZe
Member

liZe commented Apr 13, 2019

Here is an example of what can go wrong:

There's no need to convince anyone that things may go wrong when you have external dependencies (especially when you don't follow the installation guide and try to use Anaconda, but that's another story 😉). We'd really like to find a solution to this, but without a port of Pango and Cairo (and Fontconfig and Freetype and GDK-Pixbuf and …) in Python, the only solutions we have are better documentation or packaged distributions of WeasyPrint.

As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.

Yes, I've done my best to provide pretty good documentation for many Linux distributions, and @Tontyna has really improved both the code and the documentation about installation on Windows, but we can't cover all the cases needed by users, and we have to rely on everybody's work for that. I'd be really happy to merge pull requests adding more documentation about Docker images, AWS and Anaconda ❤️.

@pperona

pperona commented Apr 14, 2019 via email

@stuaxo

stuaxo commented Dec 4, 2019

A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.

@liZe
Member

liZe commented Dec 5, 2019

A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.

CairoCFFI requires a file generation step during its installation, and I think that the generated file depends on the version of Cairo installed on the system. I'd be happy to discuss this on a separate issue for CairoCFFI.

@liZe
Member

liZe commented Jan 16, 2021

Cairo is now gone :).

@liZe
Member

liZe commented Feb 5, 2021

It’s summary time about non-Python dependencies!

We’ve removed direct dependencies:

  • Cairo
  • GDK-Pixbuf

We use these libraries as direct dependencies:

  • Pango (high-level text layout)
  • Harfbuzz (text shaping)
  • GObject (management of Pango objects references)
  • Fontconfig (font matching and substitution, for Pango)
  • Pango-Freetype (font rendering, for Pango)

We use these libraries as indirect dependencies (not exhaustive):

  • Freetype (font rendering)
  • Fontconfig (font matching and substitution)
  • Fribidi (bidirectional algorithm)

Removing Pango is the next step, because it’s too limited to render HTML+CSS text. But it’s really useful now: it’s used to split lines, with a lot of workarounds because of its limits. Removing it would require using Harfbuzz directly for text shaping (as many browsers do) and handling bidirectional text manually (or using a Python library for that).
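
To give an idea of what handling bidirectional text manually involves, here is a minimal sketch of just the first step: guessing a paragraph's base direction from its first strong character. It is a simplification of the Unicode bidirectional algorithm (it ignores isolates and explicit embeddings), the function name `base_direction` is hypothetical, and it is not what WeasyPrint or Pango actually do.

```python
import unicodedata


def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strong character (default 'ltr')."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # Hebrew-like or Arabic-like letters
            return "rtl"
        if bidi_class == "L":  # Latin-like letters
            return "ltr"
    # No strong character found (digits, spaces, punctuation only).
    return "ltr"


print(base_direction("Hello"))
print(base_direction("123 שלום"))
```

Everything after this step (resolving embedding levels and reordering runs) is where most of the real work lies, which is why relying on an existing library is the more realistic option.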

Having only Harfbuzz as a direct dependency could be possible. It’s available with pygobject, which is probably more reliable than our current code. There are also Python bindings with wheels for major OSes, so we can imagine a full WeasyPrint installation using only pip.

Of course: we’ll see that in the future. Not now.

@stuaxo

stuaxo commented Feb 5, 2021

This is cool. It would be amazing if the library that you replace Pango with were its own project; there are definitely other Python text/graphics projects that would benefit.

@DidierLoiseau

@liZe:

We’ve removed direct dependencies:

  • Cairo
  • GTK-Pixbuf

Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.

(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)

@liZe
Member

liZe commented Feb 9, 2022

Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.

We ask users to install GTK with the installer because it’s the easiest way to get Pango, Harfbuzz and Fontconfig (and others) installed on Windows. So, GTK is technically not needed (and it never has been, as GDK-Pixbuf is separate from GTK), but it’s in the documentation because it’s the easy way to get everything installed.

(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)

A couple of days ago I tested installing WeasyPrint on a fresh Windows 11 VM, and it was just:

  • install Python 3.10 from Microsoft Store,
  • install GTK with the default options (ie. click "Next" until it’s installed),
  • create a virtual environment and activate it,
  • pip install weasyprint.

All the Windows problems come from different versions of libraries installed elsewhere on the system, or from different package managers that install broken libraries for some reason. If the problem is different, please open a new issue 😀.
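
Once those steps are done, a short smoke test can confirm that WeasyPrint and its native libraries load correctly. This is a minimal sketch under assumptions: the helper name `check_weasyprint` is made up for this example, and catching `ImportError` and `OSError` is a guess at the most likely failure modes (package not installed, shared library not found), not an exhaustive list.

```python
def check_weasyprint():
    """Try to render a one-line document; return (ok, detail)."""
    try:
        from weasyprint import HTML

        # With no target given, write_pdf() returns the PDF as bytes.
        pdf = HTML(string="<p>Hello, WeasyPrint!</p>").write_pdf()
    except (ImportError, OSError) as error:
        return False, f"{type(error).__name__}: {error}"
    return pdf.startswith(b"%PDF"), f"{len(pdf)} bytes rendered"


ok, detail = check_weasyprint()
print("OK" if ok else "FAILED", "-", detail)
```

Running it inside the activated virtual environment keeps the check tied to the same interpreter and libraries that the application will use.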

@vaughnkoch

Hi, thanks for this beautiful and useful library.

Is there a good way to install Weasyprint and its dependencies to reduce the total image size in Docker? It seems that including Weasyprint adds about 500mb to my Debian Docker image.

@liZe
Member

liZe commented Aug 31, 2023

Hi @vaughnkoch,

If you don’t care too much about performance and just want to use WeasyPrint as a binary, you can try to test this binary (⚠️ that’s not officially supported yet!)

Otherwise, you can test other distributions that may include fewer optional dependencies. But I’m curious, and 500MB seems to be a lot: which packages do you include in this size, and does it include Python?

@vaughnkoch

The total size using the recommended installs (e.g. libgtk) was more like 1GB, up from a previous non-weasyprint image size of 460mb, which includes python and many other packages.

However, I was able to get the additional size down to just 80mb (not great, but not that much worse) by using just this in my Dockerfile and adding weasyprint to my Pipfile. Total image size of 460MB.

RUN apt-get update -y && \
    apt-get -y install \
    libpango-1.0-0 \
    libpangoft2-1.0-0 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

@liZe
Member

liZe commented Aug 31, 2023

That’s pretty much what’s proposed in the documentation. Installing GTK is only recommended for Windows.

@vaughnkoch

Ah, I see now. Sorry I missed that. I think I initially saw the 'missing gobject' error (from the lack of a Pango installation), did some googling, and was probably led to install GTK by a StackOverflow entry or similar. Thanks for the confirmation that this is the right way to install weasyprint.


7 participants