Inserting HTML to PDF #134

Open
kurtisane opened this issue Mar 7, 2022 · 34 comments

Comments

@kurtisane

Hi,
First off, thanks for all of this!
We are currently not using this library due to the lack of some features, but I love and appreciate it and have been tracking it for some time.
I see this as the "rescue" from headless chromium to render stuff.

One of the requirements for us is most definitely the rendering of HTML content.
So I would love to basically create a PDF and insert HTML snippets into it.

Is this something that can be added to the roadmap?

Regards !

@MarcinZiabek
Member

Hi,
First off, thanks for all of this!
We are currently not using this library due to the lack of some features, but I love and appreciate it and have been tracking it for some time.
I see this as the "rescue" from headless chromium to render stuff.

Thank you for your kind words. I really appreciate them 😁 Can you please share which features are missing? I will consider placing them somewhere on the roadmap.

So I would love to basically create a PDF and insert HTML snippets into it.

Generally speaking, I am against creating functionality that resembles HTML. HTML and CSS are really complex and trying to recreate them would be an enormous effort. On the other hand, if we decide to limit the scope, I expect a constant stream of requests (looking like bug reports) to add various functionalities to the HTML parser.

However, please elaborate more about your idea 😁

@kurtisane
Author

My use case might be a little special. We receive an HTML snippet from a third party and need to wrap it into a PDF. The snippet can contain tables, text, images, etc. There is not a lot of CSS going on, and the content can span multiple pages.

But I need to control where the page breaks occur and, for example, add some text above the footer on every page except the last.

I know that most of this work would be on my side rather than QuestPDF's, but since there is no HTML support yet, I can't even start figuring out the rest.

@MarcinZiabek
Member

I understand your point of view. I am also afraid that QuestPDF is not the right library in your case. You specifically want to render HTML content as PDF files. The easiest way is to use any HTML-to-PDF converter available on the market.

As stated before, HTML and CSS are so complex that I am not planning to support anything that resembles such a format, especially since the aforementioned converters (usually based on Chromium, which is simply a web page engine) are better in this regard, as long as you accept their paging limitations. I want to give more granular control, specifically designed for dynamic PDF generation with paging support in mind. I hope you understand 😁

@lmingle
Contributor

lmingle commented Mar 8, 2022

Take a look at the HTML-to-PDF converter https://github.com/Kemsty2/HtmlConverter.

@MarcinZiabek
Member

MarcinZiabek commented Mar 9, 2022

Yes, this library is one of the potential solutions. There are free and paid products based on this idea, built on either wkhtmltopdf or the Chromium engine. Basically, they run an entire web browser inside (including JavaScript). This is slow, usually unstable, and has many limitations. I strongly believe that QuestPDF offers, and will continue to offer, features that are very specific to the PDF generation domain, e.g. advanced paging support. However, I do not deny that there are use cases (like yours) where alternatives fit better. I am afraid that I will never attempt to support the HTML format in QuestPDF at this level of complexity.

@bgiromini

I have a similar use case too. We have tried several solutions, but the page breaks are never respected. What if we pre-rendered the HTML to an image and then placed it into the document? Would that work?

@MarcinZiabek
Member

  1. Pre-rendering HTML as an image is a good idea. I suggest rendering at a higher resolution so the content looks sharp.
  2. Please note that QuestPDF will display images as is. It is up to you to divide the HTML content properly for each page in order to achieve correct page breaks.
  3. QuestPDF currently treats images as solid blocks. That means images cannot be split across multiple pages automatically.

@bgiromini

bgiromini commented Mar 13, 2022 via email

@bgiromini

I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

@MarcinZiabek
Member

I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

That would be the preferred solution if your HTML content is shaped in a well-known and predictable way.

  1. You can perform web scraping to extract data from the HTML content and then display it using QuestPDF.
  2. Or you can implement a translation layer. The complexity of this step depends on how many building blocks / styles the HTML content uses.

@MarcinZiabek
Member

So if I have an image that spans 2 and a half pages, QuestPDF will not break it up over those pages but just have 1 very tall page, is that correct? Bummer. We have users who input formatted text and unfortunately that can't be removed.

I was not aware of such a requirement. If more people ask for it, I can implement native support for image breaking. So far, I expect that the vast majority of use cases want to scale or wrap the image and keep it as a whole.

Please remember that you can always predict the size of the available space in the document and divide your image into smaller chunks 😁
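
As a rough illustration of the chunking idea above, here is a minimal Python sketch of the arithmetic only. It assumes the image is scaled to the full available width of the page, and the function name and parameters are hypothetical. It is not QuestPDF API; the actual cropping would be done with whatever imaging library you already use.

```python
from typing import List, Tuple


def split_image_rows(
    image_width_px: int,
    image_height_px: int,
    page_width_pt: float,
    page_height_pt: float,
) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel rows for each page-sized slice of a tall image.

    Assumption: the image is scaled so that its width fills the available
    page width, which fixes how many pixel rows fit on one page.
    """
    if min(image_width_px, image_height_px) <= 0:
        raise ValueError("image dimensions must be positive")
    if min(page_width_pt, page_height_pt) <= 0:
        raise ValueError("page dimensions must be positive")

    # Pixels per point after scaling to the page width.
    pixels_per_point = image_width_px / page_width_pt
    rows_per_page = int(page_height_pt * pixels_per_point)
    if rows_per_page < 1:
        raise ValueError("available page height is too small for this image")

    slices = []
    top = 0
    while top < image_height_px:
        bottom = min(top + rows_per_page, image_height_px)
        slices.append((top, bottom))
        top = bottom
    return slices
```

Each returned pair can then be cropped into its own image and placed on its own page. Note that cutting at fixed pixel rows can slice through a line of text, so a real implementation would probably need to look for a blank row near each boundary.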

@vanwinkelseppe

I guess the next possible step would be to somehow parse the html and convert it to QuestPDF syntax.

Hi @bgiromini

I have faced the same issues when getting HTML snippets from third parties that need to be integrated. One solution could be to use the HtmlAgilityPack package.

With this, you can load an HTML file or snippet and then loop over the nodes inside. If you know what to expect, you can make sure everything is covered. Maybe, if there is enough need for this, the community can come together and create a separate package that includes some basic HTML components, for example rendering p, span, b, strong, ul, ol, li.
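
To make the "loop over the nodes" idea concrete, here is a small sketch in Python that uses the standard library's `html.parser` as a stand-in for HtmlAgilityPack. The class and function names are hypothetical, and the output is only an intermediate list of blocks; mapping those blocks onto QuestPDF text and column elements is left out on purpose.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collects <p> and <li> blocks as lists of (text, is_bold) runs."""

    BLOCK_TAGS = {"p", "li"}
    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks: (kind, [(text, is_bold), ...])
        self._kind = None      # tag name of the block currently open, if any
        self._runs = []        # runs collected for the open block
        self._bold_depth = 0   # nesting depth of <b>/<strong>

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.flush()
            self._kind = tag
        elif tag in self.BOLD_TAGS:
            self._bold_depth += 1
        # Every other tag (span, ul, ol, ...) is ignored in this sketch.

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.flush()
        elif tag in self.BOLD_TAGS and self._bold_depth > 0:
            self._bold_depth -= 1

    def handle_data(self, data):
        # Text outside a block and whitespace-only text are dropped.
        if self._kind is not None and data.strip():
            self._runs.append((data, self._bold_depth > 0))

    def flush(self):
        """Close the open block, keeping it only if it has content."""
        if self._kind is not None and self._runs:
            self.blocks.append((self._kind, self._runs))
        self._kind, self._runs = None, []


def html_to_blocks(html):
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep a trailing block whose end tag is missing
    return parser.blocks
```

For example, `html_to_blocks("<p>Hello <b>world</b></p>")` yields one `p` block with a plain run and a bold run. The sketch does not distinguish ordered from unordered lists and silently ignores anything it does not know, which is exactly where a real translation layer gets complicated.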

@adamfoneil

I started on a lightweight HTML conversion library here: https://github.com/adamfoneil/QuestPdfUtil

@MarcinZiabek
Member

I started on a lightweight HTML conversion library here: https://github.com/adamfoneil/QuestPdfUtil

I am excited to observe your progress! It really depends on how many HTML/CSS features we want to support 😁 At this moment, I am not sure whether a semi-HTML parser has any benefits over the existing API. I expect that developers will constantly hit incompatibilities or missing features, making such an effort hell. Let's hope that I am wrong!

P.S. I am very sorry for replying so late. There is just a lot going on in my personal life, very positive yet time-consuming events. I hope that you understand 😁

@adamfoneil

Thank you @MarcinZiabek! I was pleasantly surprised that you starred my repo. No idea what will come of it. I just needed something in a pinch to handle the HTML fragments that are edited in my application. I totally understand why it would not be mainline functionality in a PDF library. Yes, I think it would be pretty difficult to make it more full-featured, but what's there now works for my use case.

@bgiromini

bgiromini commented May 5, 2022 via email

@MarcinZiabek
Member

TinyMCE sounds like a limited and well-known environment. In such a case, where you can accurately predict all requirements and corner cases, the solution proposed by @adamfoneil may work really nicely, assuming, of course, that his library is extended to support output from TinyMCE 😁

@GeeSuth

GeeSuth commented Oct 14, 2022

@adamfoneil, your idea is very useful. I looked at your repository and found it interesting.
@MarcinZiabek Did you think about making it built-in in QuestPDF?

@Relorer

Relorer commented Oct 15, 2022

I needed HTML support, so I wrote a small library for this. I would be glad if it is useful to someone else.
You can also find it on NuGet.

@MarcinZiabek I used the outline of your icon. If you are against it, let me know, and I will replace it.

@MarcinZiabek
Member

MarcinZiabek commented Oct 15, 2022

Did you think about making it built-in in QuestPDF?

@GeeSuth This is a really interesting concept, but I am not planning to integrate it into the library. The complexity of HTML and CSS (especially modern versions) is overwhelming. It is just not possible to create a translation layer that will work for so many cases. Not to mention, QuestPDF is not that powerful. After all, behind Chromium (and any other web rendering engine) there are dozens of full-time experts. Whatever such a library does, it will always cover only a small subset of the actual HTML+CSS technology.

That being said, I am very happy to see new libraries that are attempting to fill that gap. All similar efforts are very welcome! I can't wait to observe their development.

@MarcinZiabek
Member

I needed HTML support, so I wrote a small library for this. I would be glad if it is useful to someone else. You can also find it on NuGet.

@Relorer Your library looks very interesting and is already quite impressive 😁 Keep going!

@MarcinZiabek I used the outline of your icon. If you are against it, let me know, and I will replace it.

I am totally OK with using the library logo (in the original or modified form). I only suggest using the full name of QuestPDF, as this may help you position your NuGet package in the search results. For example, Relorer.QuestPDF.HTML (or something similar).

Also, I have a question 😁 As I said in my message above, writing a fully capable HTML+CSS to QuestPDF converter is just not possible, for multiple reasons. No matter how many features you add, there will always be more features to support. This was nicely shown in your first issue Relorer/HTMLToQPDF#1

However, I expect that many projects may benefit from supporting a very limited and predictable set of HTML content, e.g. something that is an output of a WYSIWYG editor. Do you plan to support this type of scenario? I think this may be a great niche for your project 😁
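
One way to check whether existing content really stays within such a "limited and predictable set" is to audit it before choosing a converter. The Python sketch below is only an illustration: the function name is hypothetical, and the supported set simply reuses the tags suggested earlier in this thread (p, span, b, strong, ul, ol, li), not the feature list of any particular library.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: the tag set proposed earlier in this thread.
SUPPORTED_TAGS = frozenset({"p", "span", "b", "strong", "ul", "ol", "li"})


class TagCounter(HTMLParser):
    """Counts every start tag (including self-closing ones) in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def unsupported_tags(html, supported=SUPPORTED_TAGS):
    """Return {tag: occurrences} for every tag outside the supported set."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {tag: n for tag, n in counter.counts.items() if tag not in supported}
```

Running this over a sample of stored editor output gives a quick picture of how much of it a limited converter could handle, and which tags would need extra work.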

@Relorer

Relorer commented Oct 15, 2022

@MarcinZiabek I hadn't thought of such editors, and I didn't even know they had such a name 😆
But now I realize that it seems quite interesting

Now I plan to add minimal CSS support. And then I'll probably really try to explore the existing WYSIWYG editors to add the missing features

Thank you for the advice 😁

@girlpunk

e.g. something that is an output of a WYSIWYG editor

Slightly off-topic from HTML parsing, but this is significantly easier when using a reduced markup such as markdown. Unfortunately I can't share it, but I do have a (mostly) working markdown to PDF converter that was created without too much trouble with the Markdig library.

@MarcinZiabek
Member

e.g. something that is an output of a WYSIWYG editor

Slightly off-topic from HTML parsing, but this is significantly easier when using a reduced markup such as markdown. Unfortunately I can't share it, but I do have a (mostly) working markdown to PDF converter that was created without too much trouble with the Markdig library.

I was thinking about markdown too 😁 Great idea, thank you!

I am not sure, though, whether all editors are capable of outputting Markdown. Also, Markdown is not as powerful as HTML. Maybe there is space for both approaches?

@bgiromini

In my use case, I have over 10 years' worth of data stored as HTML that would need to be converted or deprecated.

@PrzemyslawKlys

You can always use Chrome to convert HTML to PDF using PowerShell:

https://gist.github.com/ilovefreesw/da435865a443a62923d67e6af6c6b2a8

If it's business oriented that's the shortest way
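
For readers who prefer not to open the gist, the general idea can be sketched as follows. This is not a reproduction of the linked script: it is a hypothetical Python wrapper around Chrome's `--headless` and `--print-to-pdf` command-line switches, and the path to the Chrome binary is an assumption you need to supply for your own machine.

```python
import subprocess
from pathlib import Path


def chrome_pdf_command(chrome_binary, html_path, pdf_path):
    """Build the argument list for a headless Chrome HTML-to-PDF run."""
    source_uri = Path(html_path).resolve().as_uri()  # file:// URI of the input
    target = Path(pdf_path).resolve()
    return [
        str(chrome_binary),
        "--headless",
        f"--print-to-pdf={target}",
        source_uri,
    ]


def convert_html_to_pdf(chrome_binary, html_path, pdf_path):
    """Run Chrome and raise CalledProcessError if it exits with an error."""
    subprocess.run(chrome_pdf_command(chrome_binary, html_path, pdf_path), check=True)
```

The paging limitations of browser-based converters mentioned earlier in this thread still apply to this approach.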

@girlpunk

You're correct that it's not as powerful, however that probably works in your favour, given it means there's a significantly limited set of possibilities for formatting, layout, etc, compared with HTML

@GeeSuth

GeeSuth commented Oct 17, 2022

Finally, after a lot of searching, I decided to learn the QuestPDF concepts. I find it a very useful
and very easy way to generate PDF files.
There are great methods in it, like "Element" | "Column" | "Row".
Thank you @MarcinZiabek GREAT JOB 👍

@GFoley83

GFoley83 commented Dec 1, 2022

If you have to work with HTML, it might be easier to use something else, like Puppeteer Sharp, to load/inject an HTML page into a headless Chromium browser and just save the HTML to PDF

https://github.com/hardkoded/puppeteer-sharp#generate-pdf-files

@bgiromini

If you have to work with HTML, it might be easier to use something else, like Puppeteer Sharp, to load/inject an HTML page into a headless Chromium browser and just save the HTML to PDF

https://github.com/hardkoded/puppeteer-sharp#generate-pdf-files

The problem with headless Chrome is the flaky paging support for table headers and footers

@humphrey

humphrey commented Jun 6, 2023

What about something simple like Markdown? I suspect there would be an almost 1:1 correlation between Markdown features and QuestPDF features. The use case would be those complex PDFs that have snippets of text or cover letter content that you might want the end user to be able to edit.

@christiaanderidder

@kurtisane @humphrey In case you are still looking for something like this using Markdown, I have decided to create a small library that does exactly that, and I expect to release a non-preview version in the coming days. You can find it here: https://github.com/christiaanderidder/QuestPDF.Markdown

@mjefim

mjefim commented Jan 5, 2024

Hi everyone! I am enjoying QuestPDF very much, @MarcinZiabek thanks! I need to output simple HTML content that is the product of a WYSIWYG editor. I have noticed there are a few suggested solutions, some from a while ago, others quite new. My question is: which is currently the most viable solution?

@MarcinZiabek
Member

At this moment, there is no official support for this feature 😥

Based on NuGet and GitHub statistics, https://github.com/Relorer/HTMLToQPDF is a viable solution. Huge shout-out to its authors!

Once I finish working on a couple of high-priority features, I will consider introducing official HTML support or (more likely) helping existing projects if their authors are interested in collaboration 😀
