Inserting HTML to PDF #134

kurtisane · 2022-03-07T10:42:55Z

Hi,
first off, thanks for all of this!
We are currently not using this library due to the lack of some features, but I have loved, appreciated, and tracked it for some time.
I see this as the "rescue" from headless chromium to render stuff.

One of the requirements for us is most definitely the rendering of HTML content.
So I would love to basically create a PDF and insert HTML snippets into it.

Is this something that can be added to the roadmap ?

Regards !

MarcinZiabek · 2022-03-07T20:39:53Z

> Hi,
> first off, thanks for all of this!
> We are currently not using this library due to the lack of some features, but I have loved, appreciated, and tracked it for some time.
> I see this as the "rescue" from headless chromium to render stuff.

Thank you for your kind words. I really appreciate them 😁 Can you please share what features are missing? I will consider placing them somewhere on the roadmap.

> So I would love to basically create a PDF and insert HTML snippets into it.

Generally speaking, I am against creating functionality that resembles HTML. HTML and CSS are really complex and trying to recreate them would be an enormous effort. On the other hand, if we decide to limit the scope, I expect a constant stream of requests (looking like bug reports) to add various functionalities to the HTML parser.

However, please elaborate more about your idea 😁

kurtisane · 2022-03-08T19:52:53Z

My use case might be a little bit special. We are getting a certain html snippet from some third party and need to wrap it into a pdf. This html snippet can contain tables, text, images etc. There is not a lot of CSS going on. And the length of that content can span across multiple pages.

But I need control over where to do the page break, and maybe to add some text above the footer except on the last page, and so on.

I know that the main tasks are not to be handled by QuestPDF and are more on my side, but since there is no HTML support yet, I can't even look into figuring stuff out around it.

MarcinZiabek · 2022-03-08T20:41:53Z

I understand your point of view. I am also afraid that QuestPDF is not the right library for your case. You specifically want to render HTML content as PDF files. The easiest way is just to use any HTML-to-PDF converter that exists on the market.

As stated before, HTML and CSS are so complex that I am not planning to support anything that resembles such a format, especially since the aforementioned converters (usually based on Chromium, which is a web page engine) are simply better in this regard (if you accept their paging limitations). I want to give more granular control, specifically designed for dynamic PDF generation with paging support in mind. I hope you understand 😁

lmingle · 2022-03-08T21:01:56Z

Take a look at the HTML-to-PDF converter https://github.com/Kemsty2/HtmlConverter.

MarcinZiabek · 2022-03-09T14:20:09Z

Yes, this library is one of the potential solutions. There are free and paid products based on this idea, built on either wkhtmltopdf or the Chromium engine. Basically, they run an entire web browser inside (including JavaScript). This is slow, usually unstable, and has many limitations. I strongly believe that QuestPDF offers, and will continue to offer, features that are very specific to the PDF generation domain, e.g. advanced paging support. However, I do not deny that there are use cases (like yours) where alternatives fit better. I am afraid that I will never attempt to support the HTML format in QuestPDF at this level of complexity.

bgiromini · 2022-03-13T09:39:24Z

I too have a similar use case, as we have tried several solutions but the page break is never followed. What if we pre-rendered the html to an image and then placed it into the document? Would that work?

MarcinZiabek · 2022-03-13T18:59:06Z

Prerendering HTML as an image is a good idea. I suggest rendering at a higher resolution so the content looks sharp.
Please note that QuestPDF will display images as is. It is on your side to properly divide the HTML content for each page to achieve correct page breaks.
QuestPDF currently treats images as solid blocks. That means images cannot be broken across multiple pages automatically.

bgiromini · 2022-03-13T19:22:15Z

So if I have an image that spans 2 and a half pages, QuestPDF will not break it up over those pages but just have 1 very tall page, is that correct? Bummer. We have users who input formatted text and unfortunately that can't be removed.


bgiromini · 2022-03-13T19:24:05Z

I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

MarcinZiabek · 2022-03-13T19:28:11Z

> I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

That would be a preferred solution if your HTML content is shaped in a well-known and predictable way.

You can perform web scraping to extract data from the HTML content and then display it using QuestPDF.
Or you can implement some translation layer. The complexity of this step depends on how many building blocks / styles the HTML content uses.

MarcinZiabek · 2022-03-13T19:30:59Z

> So if I have an image that spans 2 and a half pages, QuestPDF will not break it up over those pages but just have 1 very tall page, is that correct? Bummer. We have users who input formatted text and unfortunately that can't be removed.

I was not aware of such a requirement. If more people ask for it, I can implement native support for image breaking. So far, I expect that the vast majority of use cases want to scale or wrap the image and keep it as a whole.

Please remember that you can always predict the size of the available space in the document and divide your image into smaller image chunks 😁
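
The chunking suggested here comes down to simple arithmetic. The sketch below is a language-neutral illustration in Python, not QuestPDF code: the function name `split_into_page_slices` and its parameters are hypothetical, and it assumes the available space on each page has already been converted to the same pixel scale as the rendered image. It only computes row ranges; cropping the image and placing each chunk in the document is left to whatever imaging library and layout code you use.

```python
from typing import List, Tuple


def split_into_page_slices(
    image_height: int,
    first_page_space: int,
    full_page_space: int,
) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel rows for each chunk of a tall image.

    first_page_space: rows still free on the page where the image starts
                      (assumed value, supplied by the caller).
    full_page_space:  rows available on every following page (assumed value).
    """
    if image_height < 0:
        raise ValueError("image_height must not be negative")
    if first_page_space <= 0 or full_page_space <= 0:
        raise ValueError("available space must be positive")

    slices: List[Tuple[int, int]] = []
    top = 0
    available = first_page_space
    while top < image_height:
        # Take as many rows as fit, but never read past the end of the image.
        bottom = min(top + available, image_height)
        slices.append((top, bottom))
        top = bottom
        # Every page after the first offers the full content height.
        available = full_page_space
    return slices


# An image 2.5 pages tall, starting at the top of a page with 1000 usable rows.
print(split_into_page_slices(2500, 1000, 1000))
```

A cut at a fixed row can slice through a line of text or a table row. If that matters, one possible refinement is to move each cut up to the nearest fully blank row of the rendered image before cropping.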

vanwinkelseppe · 2022-03-14T16:21:19Z

> I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

Hi @bgiromini

I have faced the same issues when getting HTML snippets from third parties that need to be integrated. One solution could be to use the HtmlAgilityPack package.

With this, you can load in an HTML file or snippet, then loop over the nodes inside. If you know what to expect, you can make sure everything is covered. Maybe if there is enough need for this, the community can come together and create a separate package that includes some basic HTML components, for example render p, span, b, strong, ul, ol, li.
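
As a rough illustration of the "loop over the nodes you expect" idea, here is a minimal sketch in Python using the standard library's `html.parser` as a stand-in for HtmlAgilityPack. Everything in it is an assumption made for illustration: the class name `SnippetParser`, the helper `parse_snippet`, and the shape of the output (a list of blocks made of text runs) are hypothetical, and only p, span, b, strong, ul, ol, and li are handled. A real translation layer would then map each block to the corresponding QuestPDF text or list element.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

BLOCK_TAGS = {"p", "li"}
BOLD_TAGS = {"b", "strong"}
LIST_TAGS = {"ul", "ol"}


class SnippetParser(HTMLParser):
    """Collects blocks of styled text runs from a small, known HTML subset."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Dict] = []       # finished paragraphs / list items
        self.unsupported: List[str] = []   # tags seen but not handled
        self._current: Optional[Dict] = None
        self._bold_depth = 0
        self._list_stack: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            self._list_stack.append(tag)
        elif tag in BLOCK_TAGS:
            kind = "paragraph"
            if tag == "li":
                in_ordered = bool(self._list_stack) and self._list_stack[-1] == "ol"
                kind = "ordered_item" if in_ordered else "bullet_item"
            self._current = {"kind": kind, "runs": []}
        elif tag in BOLD_TAGS:
            self._bold_depth += 1
        elif tag != "span":
            # span carries no styling here; anything else is reported, not guessed.
            self.unsupported.append(tag)

    def handle_endtag(self, tag):
        if tag in LIST_TAGS and self._list_stack:
            self._list_stack.pop()
        elif tag in BLOCK_TAGS and self._current is not None:
            self.blocks.append(self._current)
            self._current = None
        elif tag in BOLD_TAGS and self._bold_depth > 0:
            self._bold_depth -= 1

    def handle_data(self, data):
        # Text outside a p or li block is ignored in this sketch.
        if self._current is not None and data.strip():
            run = {"text": data, "bold": self._bold_depth > 0}
            self._current["runs"].append(run)


def parse_snippet(html: str) -> SnippetParser:
    parser = SnippetParser()
    parser.feed(html)
    parser.close()
    return parser


result = parse_snippet("<p>Hello <b>world</b></p><ul><li>one</li></ul><table></table>")
print(result.blocks)
print(result.unsupported)
```

Recording unsupported tags instead of silently dropping them makes it easy to see when a third-party snippet steps outside the subset you decided to cover.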

adamfoneil · 2022-05-01T12:06:50Z

I started on a lightweight HTML conversion library here: https://github.com/adamfoneil/QuestPdfUtil

MarcinZiabek · 2022-05-05T16:10:43Z

> I started on a lightweight HTML conversion library here: https://github.com/adamfoneil/QuestPdfUtil

I am excited to observe your progress! It really depends on how many HTML/CSS features we want to support 😁 At this moment, I am not sure if a semi-HTML parser has any benefits over the existing API. I expect that developers will constantly hit some incompatibilities or missing features, making such an effort hell. Let's hope that I am wrong!

P.S. I am very sorry for replying so late. There is just a lot going on in my personal life, very positive yet time-consuming events. I hope that you understand 😁

adamfoneil · 2022-05-05T20:41:15Z

Thank you @MarcinZiabek ! I was pleasantly surprised you starred my repo. No idea what will come of it. I just needed something in a pinch to handle the HTML fragments that are edited in my application. I totally understand why it would not be mainline functionality in a PDF library. Yes I think it would be pretty difficult to make it more full-featured, but what's there now works for my use case.

bgiromini · 2022-05-05T21:42:32Z

In my case our users use TinyMCE to edit text for a proposal.


MarcinZiabek · 2022-05-05T21:53:49Z

TinyMCE sounds like a limited and well-known environment. In such a case, where you can accurately predict all requirements and corner cases, the solution proposed by @adamfoneil may work really nicely. Of course, assuming that his library is extended to support output from TinyMCE 😁

GeeSuth · 2022-10-14T14:10:09Z

@adamfoneil, your idea is very useful. I looked at your repository and found it interesting.
@MarcinZiabek Did you think about making it built-in in QuestPDF?

Relorer · 2022-10-15T01:20:14Z

I needed HTML support, so I wrote a small library for this. I will be glad if it will be useful to someone else.
You can also find it in NuGet

@MarcinZiabek I used the outline of your icon. If you are against it, let me know, and I will replace it.

MarcinZiabek · 2022-10-15T22:35:31Z

> Did you think about making it built-in in QuestPDF?

@GeeSuth This is a really interesting concept, but I am not planning to integrate it into the library. The complexity of HTML and CSS (especially modern versions) is overwhelming. It is just not possible to create a translation layer that will work for so many cases. Not to mention, QuestPDF is not that powerful. After all, behind Chromium (and any other web rendering engine) there are dozens of full-time experts. Whatever such a library does, it will always be a small subset of the actual HTML+CSS technology.

That being said, I am very happy to see new libraries that are attempting to fill that gap. All similar efforts are very welcome! I can't wait to observe their development.

MarcinZiabek · 2022-10-15T22:46:19Z

> I needed HTML support, so I wrote a small library for this. I will be glad if it will be useful to someone else. You can also find it in NuGet

@Relorer Your library looks very interesting and is already quite impressive 😁 Keep going!

> @MarcinZiabek I used the outline of your icon. If you are against it, let me know, and I will replace it.

I am totally ok with using the library logo (in the original or modified form). I only suggest to use the full name of QuestPDF - this may help you position your nuget in the results. For example, Relorer.QuestPDF.HTML (or something similar)

Also, I have a question 😁 As I said in my message above, writing a fully capable HTML+CSS to QuestPDF converter is just not possible, for multiple reasons. No matter how many features you add, there will always be more features to support. This was nicely shown in your first issue Relorer/HTMLToQPDF#1

However, I expect that many projects may benefit from supporting a very limited and predictable set of HTML content, e.g. something that is an output of WYSIWYG editor. Do you plan supporting this type of scenario? I think this may be a great niche for your project 😁

Relorer · 2022-10-15T23:17:53Z

@MarcinZiabek I hadn't thought of such editors, and I didn't even know they had such a name 😆
But now I realize that it seems quite interesting

Now I plan to add minimal CSS support. And then I'll probably really try to explore the existing WYSIWYG editors to add the missing features

Thank you for advice 😁

girlpunk · 2022-10-17T09:53:00Z

> e.g. something that is an output of WYSIWYG editor

Slightly off-topic from HTML parsing, but this is significantly easier when using a reduced markup such as markdown. Unfortunately I can't share it, but I do have a (mostly) working markdown to PDF converter that was created without too much trouble with the Markdig library.

MarcinZiabek · 2022-10-17T09:55:41Z

> e.g. something that is an output of WYSIWYG editor
>
> Slightly off-topic from HTML parsing, but this is significantly easier when using a reduced markup such as markdown. Unfortunately I can't share it, but I do have a (mostly) working markdown to PDF converter that was created without too much trouble with the Markdig library.

I was thinking about markdown too 😁 Great idea, thank you!

I am not sure though, if all editors are capable of outputting markdown. Also, markdown is not as powerful compared to HTML. Maybe there is space for both approaches?

bgiromini · 2022-10-17T09:55:42Z

In my use case, I have over 10 years' worth of data stored as HTML that would need to be converted or deprecated.

PrzemyslawKlys · 2022-10-17T09:59:14Z

You can always use chrome to convert HTML to PDF using PowerShell

https://gist.github.com/ilovefreesw/da435865a443a62923d67e6af6c6b2a8

If it's business oriented that's the shortest way

girlpunk · 2022-10-17T10:00:34Z

You're correct that it's not as powerful, however that probably works in your favour, given it means there's a significantly limited set of possibilities for formatting, layout, etc, compared with HTML

GeeSuth · 2022-10-17T10:57:01Z

Finally, after a lot of searching, I decided to learn the QuestPDF concepts. I find it a very useful
and very easy way to generate PDF files.
There are great methods in it like "Element" | "Column" | "Row".
Thank you @MarcinZiabek GREAT JOB 👍

GFoley83 · 2022-12-01T10:11:57Z

If you have to work with html it might be easier to use something else like Puppeteer Sharp to load/inject a html page in to a headless Chromium browser and just save html to PDF

https://github.com/hardkoded/puppeteer-sharp#generate-pdf-files

bgiromini · 2023-03-01T17:36:25Z

> If you have to work with html it might be easier to use something else like Puppeteer Sharp to load/inject a html page in to a headless Chromium browser and just save html to PDF
>
> https://github.com/hardkoded/puppeteer-sharp#generate-pdf-files

The problem with headless Chrome is the flaky paging support for table headers and footers

humphrey · 2023-06-06T02:45:33Z

What about something simple like Markdown? I suspect there would be almost a 1:1 correlation between Markdown features and QuestPDF features. The use case would be those complex PDFs that have snippets of text or cover letter content that you might want the end user to be able to edit.

christiaanderidder · 2023-11-12T19:53:52Z

@kurtisane @humphrey In case you are still looking for something like this using markdown, I have decided to create a small library that does exactly that, and expect to release a non-preview version in the upcoming days. You can find it here: https://github.com/christiaanderidder/QuestPDF.Markdown

mjefim · 2024-01-05T09:46:30Z

Hi everyone! I am enjoying QuestPDF very much, @MarcinZiabek thanks! I need to output simple HTML content that is the product of a WYSIWYG editor. I have noticed there are a few suggested solutions, some from a while ago, others quite new. My question is, which is currently the most viable solution?

MarcinZiabek · 2024-01-05T17:53:43Z

At this moment, there is no official support for this feature 😥

Based on nuget and github statistics, this https://github.com/Relorer/HTMLToQPDF is a viable solution. Huge shout out to its authors!

Once I finish working on a couple of high-priority features, I will consider introducing official HTML support or (more likely) helping existing projects if their authors are interested in collaboration 😀

noptools mentioned this issue Oct 8, 2022

Convert docx to pdf #363

Open

luis-fss mentioned this issue Mar 2, 2023

How to place an element in the middle of the page taking up as much space as possible, but keeping the elements below on the same page? #509

Closed

girlpunk mentioned this issue Feb 16, 2024

Inject HTML into table cell or column or any where in pdf. #790

Open
