HTML export taking longer than 1 minute #5048

Open
jonathon2nd opened this issue Jun 3, 2024 · 10 comments
@jonathon2nd

Describe the Bug

Attempting an HTML export fails after one minute and results in a 504 error.

Steps to Reproduce

Using either export-books.php or via UI

Attempt to generate an HTML export of a book.

Expected Behaviour

HTML would be downloaded.

Screenshots or Additional Context

The txt download is ~533kB

Log from console.

2024-06-03T18:51:46.963675560Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
2024-06-03T18:51:47.105897343Z 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  2572  100  2572    0     0  18402      0 --:--:-- --:--:-- --:--:-- 18503
2024-06-03T18:53:00.705073606Z PHP Warning:  file_get_contents(http://bookstack-service.wiki/api/books/28/export/html): Failed to open stream: HTTP request failed! HTTP/1.1 504 Gateway Time-out
2024-06-03T18:53:00.705112372Z  in /export-books.php on line 74

Browser Details

No response

Exact BookStack Version

v24.05.1

@jonathon2nd
Author

Screenshot from 2024-06-03 13-13-01
PDF also times out

@ssddanbrown
Member

Hi @jonathon2nd,
Exports can take a while if there's a lot of content, and sometimes in rare cases specific content can trip up the exports system and cause more work than expected to be done.
Really, this is the kind of thing I'd need to replicate with the same content to actually test.

Do other books in the system also time-out, even if simple?
You could maybe clone the book and delete parts of it to help identify if it's mainly down to a specific page or collection of pages.
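
As a lighter-weight alternative to cloning and trimming the book, each page could be exported on its own and timed. The following is only a rough sketch: `time_exports` and `make_http_exporter` are hypothetical helper names, and the per-page endpoint `/api/pages/{id}/export/html` plus the `Token id:secret` header are assumed by analogy with the book export URL in the log above, so check them against your instance's API docs.

```python
import time
from typing import Callable, Iterable
from urllib.request import Request, urlopen


def time_exports(page_ids: Iterable[int],
                 export_fn: Callable[[int], bytes],
                 clock: Callable[[], float] = time.perf_counter):
    """Export each page once; return (page_id, seconds, size_bytes), slowest first."""
    results = []
    for page_id in page_ids:
        start = clock()
        body = export_fn(page_id)
        results.append((page_id, clock() - start, len(body)))
    return sorted(results, key=lambda r: r[1], reverse=True)


def make_http_exporter(base_url: str, token_id: str, token_secret: str):
    """Hypothetical helper: returns a function that fetches one page's HTML export.

    The URL shape and auth header are assumptions, not taken from this thread.
    """
    def export(page_id: int) -> bytes:
        req = Request(
            f"{base_url}/api/pages/{page_id}/export/html",
            headers={"Authorization": f"Token {token_id}:{token_secret}"},
        )
        with urlopen(req, timeout=300) as resp:
            return resp.read()
    return export
```

Calling `time_exports(page_ids, make_http_exporter(base_url, token_id, token_secret))` would then list the slowest and largest pages first, which may narrow down whether one page or the combination is the problem.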

@M0n7y5

M0n7y5 commented Jun 4, 2024

Check your logs ... you may need to change memory limits or execution timeout in php.ini

@jonathon2nd
Author

@M0n7y5 Both had already been increased. I am now running into the Cloudflare timeout. No errors in container logs.

@ssddanbrown We have no other books that have the timeout. Once the book is split up, we will export each one and see if it is a problem because of content type, not necessarily the size of the book.

The txt download is ~533kB
The md download is ~775kB

@jonathon2nd
Author

The book has been refactored, but it is still failing to export to HTML within 1 minute.

txt export size: ~150kB
md export size: ~250kB

Able to export each page individually
image

@M0n7y5

M0n7y5 commented Jun 5, 2024

You need to tell Cloudflare to wait longer for the server to respond. Cloudflare thinks the server is down while your book is being converted to PDF.

@M0n7y5

M0n7y5 commented Jun 5, 2024

Also one page taking 120MB is crazy ... What kind of content do you have on your pages?

@jonathon2nd
Author

Lots of photos.

What's strange is that those couple of huge individual pages take no more than ~3 seconds. Most others were instant, so I'm not sure why the book export explodes.

@ssddanbrown
Member

Yeah, 120MB is super high. If the pages are exporting quickly, that might indicate hitting some kind of memory limit or exhaustion, or just that the HTML is too large to handle without problems.
There might be a more efficient way for us to do the embed/parsing (placeholders, then simple string replacements at the end) but at those kinds of sizes, I'd be surprised if there weren't other issues that pop up anyway.
The formats we produce aren't really great for high-image/data content tbh.
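
To illustrate the "placeholder then string replacement" idea in general terms, here is a minimal Python sketch. It is not BookStack's implementation (which is PHP); `embed_images`, `load_image`, `transform` and the `@@IMG-n@@` token format are all made up for illustration. The point is only that the expensive parsing step sees short tokens instead of multi-megabyte base64 strings.

```python
import base64
import re

# Matches the src attribute of <img> tags; a sketch, not a full HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')


def embed_images(html, load_image, transform=lambda h: h):
    """Inline images as data URIs without passing the image data through `transform`.

    load_image(url) -> (mime_type, raw_bytes)   # hypothetical callback
    transform(html) -> html                      # stands in for the costly DOM work
    """
    urls = []

    def swap(match):
        # Replace the real URL with a short token and remember the URL.
        urls.append(match.group(2))
        return f"{match.group(1)}@@IMG-{len(urls) - 1}@@{match.group(3)}"

    light = transform(IMG_SRC.sub(swap, html))  # parsing only ever sees tokens

    # Final pass: plain string replacement, no parsing of the large payloads.
    for index, url in enumerate(urls):
        mime, data = load_image(url)
        uri = f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"
        light = light.replace(f"@@IMG-{index}@@", uri)
    return light
```

A real version would need tokens that cannot collide with page content, and the final output is still just as large, so this would only reduce parsing overhead, not the overall size problem.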

@M0n7y5

M0n7y5 commented Jun 18, 2024

The issue here is that parsing HTML takes a lot of memory and converting it to PDF is a CPU-intensive task, because all of this is done in an old PHP library. PHP itself is just slow. I solved my issue by using https://gotenberg.dev/ and overriding the PDF export. It also solves a lot of weird issues with some Unicode stuff. It uses headless Chrome under the hood.
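
For anyone curious what handing HTML to Gotenberg looks like, here is a rough standalone sketch in Python. It does not show how the BookStack PDF export was overridden (that is not described in this thread). It assumes a Gotenberg instance reachable at `base_url` and uses its Chromium HTML route, which expects an uploaded file named `index.html`; `build_gotenberg_request` and `html_to_pdf` are hypothetical helper names.

```python
import uuid
from urllib.request import Request, urlopen


def build_gotenberg_request(html: str, base_url: str = "http://localhost:3000") -> Request:
    """Build a multipart POST carrying the HTML as a file named index.html."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        f"{base_url}/forms/chromium/convert/html",
        data=head + html.encode("utf-8") + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def html_to_pdf(html: str, base_url: str = "http://localhost:3000", timeout: int = 300) -> bytes:
    """Send the HTML to Gotenberg and return the PDF bytes (needs a running instance)."""
    with urlopen(build_gotenberg_request(html, base_url), timeout=timeout) as resp:
        return resp.read()
```

Any proxy timeout in front of the server (such as the Cloudflare one mentioned earlier) would still apply to the overall request, so this mainly addresses conversion speed and memory use.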
