PDF on Chromium #9343

Closed
15 tasks done
yufeih opened this issue Oct 26, 2023 · 12 comments
Labels
pdf Produce PDF as the output format
Milestone

Comments

@yufeih
Contributor

yufeih commented Oct 26, 2023

The wkhtmltopdf-based PDF solution is flawed in many areas:

This is a new PDF solution based on Chromium: it prints PDF files from HTML pages rendered with the same HTML template as the ordinary build.
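
As a rough illustration of printing a PDF out of an already-built HTML page with Chromium, the sketch below builds a headless Chromium command line. It drives Chromium through its `--headless` and `--print-to-pdf` switches rather than through Playwright, which is what this thread actually discusses. The binary name `chromium`, the function names, and the paths are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def build_print_command(html_file: str, pdf_file: str, chromium: str = "chromium") -> list[str]:
    """Build a headless Chromium command that prints one HTML page to PDF.

    `chromium` is an assumed binary name; it differs per platform and install.
    """
    page_url = Path(html_file).resolve().as_uri()  # file:// URL of the built page
    return [chromium, "--headless", f"--print-to-pdf={pdf_file}", page_url]


def print_to_pdf(html_file: str, pdf_file: str, chromium: str = "chromium") -> None:
    """Run Chromium and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_print_command(html_file, pdf_file, chromium), check=True)
```

A call such as `print_to_pdf("_site/index.html", "index.pdf")` would then print a single page, assuming the site was built into a `_site` folder and a Chromium binary is on PATH.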

@yufeih yufeih added the pdf Produce PDF as the output format label Oct 26, 2023
@yufeih yufeih added this to the Working Set milestone Oct 26, 2023
@paulushub

paulushub commented Oct 26, 2023

About wkhtmltopdf: This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

Edited (wanted to verify it myself): Is Playwright packaged with the releases?

@yufeih
Contributor Author

yufeih commented Oct 26, 2023

Playwright isn't packaged today; it is used only for testing. If packaged, it would increase the size of the published zip releases for each platform from ~60MB to ~100MB, and increase the size of the docfx tool package to 300MB due to redistribution of the NodeJS runtime for many platforms.

Dependency on NodeJS isn't a bad thing as we probably need it for #661 and to replace JINT. We can probably set NodeJS as a pre-requisite instead of redistributing it if size is a concern.

@paulushub

Dependency on NodeJS isn't a bad thing as we probably need it for #661 and to replace JINT.

Actually waiting for this for my solution. The release v2.72.1 does not include the usual binary. Is this intentional?

@yufeih
Contributor Author

yufeih commented Oct 26, 2023

Dependency on NodeJS isn't a bad thing as we probably need it for #661 and to replace JINT.

Actually waiting for this for my solution. The release v2.72.1 does not include the usual binary. Is this intentional?

No. Problems with NuGet.org's signing key setup are causing publish failures, and I need to consult their support about them. They work in a different time zone, so it could take a while.

@filzrev
Contributor

filzrev commented Oct 26, 2023

If packaged, it would increase the size of published zip releases of each platform from ~60MB to ~100MB, and increase the size of the docfx tool package to 300MB due to redistribution of NodeJS runtime for many platforms.

I'm using the Docfx.App NuGet package, and an increase in package size is undesirable.
(Though it might be possible to exclude the Playwright assets via PackageReference.)

Is it hard to separate the PDF generation functionality into a separate docfx-pdf.exe process?
docfx.exe's docfx pdf command would then delegate to that process, the way dotnet tools work.
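
The delegation idea can be sketched roughly as follows, in Python for brevity. The `dotnet` CLI resolves `dotnet foo` to a `dotnet-foo` executable found on PATH; the sketch applies the same convention to a hypothetical `docfx-pdf` executable. The function name and error handling are illustrative assumptions, not existing docfx behavior.

```python
import shutil
from typing import Callable, Optional


def resolve_external_command(
    args: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Map ['pdf', 'docfx.json'] to ['<path>/docfx-pdf', 'docfx.json'].

    `which` is injectable so the lookup can be exercised without touching PATH.
    """
    if not args:
        raise ValueError("no subcommand given")
    subcommand, rest = args[0], args[1:]
    executable = which(f"docfx-{subcommand}")  # hypothetical naming convention
    if executable is None:
        raise FileNotFoundError(f"docfx-{subcommand} is not installed or not on PATH")
    return [executable, *rest]
```

With this shape, the main executable would carry no Playwright assets; only users who install the separate PDF executable would pay the size cost.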

@yufeih
Contributor Author

yufeih commented Oct 26, 2023

It only affects the size of the docfx package, not the size of Docfx.App.

@filzrev
Contributor

filzrev commented Oct 27, 2023

It only affects the size of the docfx package, not the size of Docfx.App.

I meant the output bin size of a project that references the Docfx.App NuGet package.
It increases the bin size from ~60MB to ~100MB, as noted above.

It's desirable to use a pre-installed Node runtime.
(This is planned in microsoft/playwright-dotnet#1850.)

Additionally, a build of the latest docfx main branch consumes twice as much disk space as before (an increase of about 5GB),
because the .playwright resources are included separately for each of the following output combinations.

  • Projects
    • docfx
    • docfx.App
    • docfx.Tests
    • docfx.Snapshot.Tests
  • TargetFrameworks
    • net6.0
    • net7.0
    • net8.0
  • Configuration
    • Debug
    • Release

Note:
It might not be a problem in a Dev Drive environment,
because of its CopyOnWrite feature.
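
To make the multiplication explicit, here is a small sketch that enumerates the combinations listed above. The `bin/<Configuration>/<TargetFramework>` layout is the usual .NET output convention and is assumed here, as is the idea that the ~5GB increase is spread evenly across all copies.

```python
from itertools import product

projects = ["docfx", "docfx.App", "docfx.Tests", "docfx.Snapshot.Tests"]
target_frameworks = ["net6.0", "net7.0", "net8.0"]
configurations = ["Debug", "Release"]

# One output directory, and so one .playwright copy, per combination.
output_dirs = [
    f"{project}/bin/{config}/{tfm}"
    for project, tfm, config in product(projects, target_frameworks, configurations)
]

copies = len(output_dirs)  # 4 projects * 3 frameworks * 2 configurations
approx_mb_per_copy = 5 * 1024 / copies  # assumes the ~5GB is spread evenly

print(copies, round(approx_mb_per_copy))  # prints: 24 213
```

Under those assumptions, a full build of every combination carries 24 copies of the assets at roughly 200MB each.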

@paulushub

Issue: API Page
Namespace links point to localhost.

@yufeih
Contributor Author

yufeih commented Oct 31, 2023

Issue: API Page. Namespace links point to localhost.

Seems to be caused by 404s on the pages themselves.

@paulushub

Task: Optimize Playwright binary size?
Since DocFX is plugin-based, would it be possible to install Playwright only when needed?

@yufeih
Contributor Author

yufeih commented Oct 31, 2023

The biggest size factor is the redistribution of NodeJS for 4 platforms in Playwright dotnet: microsoft/playwright-dotnet#1850

At this moment, I am leaning toward waiting for the Playwright team to fix the issue, with approaches such as using a system-wide NodeJS or dynamically installing NodeJS.
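
The "system-wide NodeJS" approach amounts to a lookup order, which might look like the sketch below. This is not how Playwright for .NET resolves its runtime today; the function name, the optional bundled fallback, and the error message are assumptions for illustration.

```python
import shutil
from typing import Callable, Optional


def resolve_node(
    bundled_path: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Prefer a system-wide `node`; fall back to a bundled copy if one is given."""
    system_node = which("node")
    if system_node is not None:
        return system_node
    if bundled_path is not None:
        return bundled_path  # hypothetical redistributed or downloaded runtime
    raise FileNotFoundError("NodeJS not found; install it or provide a bundled runtime")
```

A dynamic-install variant would replace the final `raise` with a download step, at the cost of needing network access on first use.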

@paulushub

paulushub commented Oct 31, 2023

We know this is going to be difficult. For MS (on GitHub and Azure), more time used is better business.
For developers, we need to cut this time, also for better business.
I hope tools like DocFX make the right choice to avoid fragmentation.

Since Playwright brings in quite a lot of baggage, defining PDF build interfaces, with implementations
through Playwright and others, might be a good compromise in my view.
I am looking at other options, including WeasyPrint, but it will also require NodeJS
for the conversion of mermaid-js graphics. WeasyPrint depends on GTK, so there is no simple road, but having a choice
is better.
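
One way to picture the interface idea is the sketch below, written in Python for brevity. Every name here (`PdfBackend`, `register_backend`, `build_pdf`) is hypothetical; nothing like this is confirmed to exist in DocFX. Real backends would wrap Playwright, WeasyPrint, or another converter behind the same `render` method.

```python
from typing import Protocol


class PdfBackend(Protocol):
    """Hypothetical interface: turn one built HTML file into one PDF file."""

    def render(self, html_path: str, pdf_path: str) -> None: ...


_backends: dict[str, PdfBackend] = {}


def register_backend(name: str, backend: PdfBackend) -> None:
    """Make a backend selectable by name, e.g. from a plugin's startup code."""
    _backends[name] = backend


def build_pdf(backend_name: str, html_path: str, pdf_path: str) -> None:
    """Dispatch to whichever backend the user selected, e.g. in configuration."""
    try:
        backend = _backends[backend_name]
    except KeyError:
        raise ValueError(f"unknown PDF backend: {backend_name!r}") from None
    backend.render(html_path, pdf_path)
```

With this split, the core tool depends only on the interface, and the heavy dependencies live in whichever backend package a user chooses to install.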

Here are sample PDF outputs of the MkDocs pdf-plugin using WeasyPrint:

Not that bad in my view!


3 participants