[Bug] PDF generation fails when TOC has many bookmarks #9926

Open
iti-pawel-wach opened this issue May 15, 2024 · 1 comment
Labels: bug (A bug to fix), pdf (Produce PDF as the output format)

Comments


iti-pawel-wach commented May 15, 2024

Describe the bug
Hi, I have a strange problem generating a PDF when TOC.md contains links with bookmarks. It works locally on my laptop, but the same build fails on Windows Server.

  • With PDF generation turned off, there is no error.
  • The links in the generated TOC.html work.
  • I tried TOC.yml instead of TOC.md and got the same error.
  • It works locally on Windows 11 but fails on Windows Server.

To Reproduce
Steps to reproduce the behavior:

  1. Create a file "article1.md" with the following content:
# header1
## header2
### header3-1
### header3-2
### header3-3
### header3-4
### header3-5
  2. Create "TOC.md" with the following content:
# [header1](article1.md#header1)
## [header2](article1.md#header2)
### [header3-1](article1.md#header3-1)
### [header3-2](article1.md#header3-2)
### [header3-3](article1.md#header3-3)
### [header3-4](article1.md#header3-4)
### [header3-5](article1.md#header3-5)
  3. Run docfx with PDF generation enabled

docfx.json:

{
  "build": {
    "content": [
      {
        "files": ["**/*.{md,yml}"],
        "exclude": ["**.*.pdf"]
      }
    ],
    "resource": [
      {
        "files": ["**/media/**"],
        "exclude": ["**/obj/**", "**/includes/**"]
      }
    ],
    "overwrite": [
      {
        "exclude": ["obj/**", "_site/**"]
      }
    ],
    "dest": "_site",
    "globalMetadata": {
      "_appTitle": "Test",
      "_disableContribution": "true",
      "_enableSearch": true,
      "pdf": true,
      "pdfTocPage": true,
      "pdfFileName": "doc-pdf.pdf"
    },
    "globalMetadataFiles": [],
    "fileMetadataFiles": [],
    "template": [
      "default",
      "modern"
    ],
    "postProcessors": ["ExtractSearchIndex"],
    "markdownEngineName": "markdig",
    "noLangKeyword": false,
    "keepFileLink": false,
    "cleanupCacheHistory": false,
    "disableGitFeatures": false
  }
}
  4. Get the error:
(...)
XRef map exported.
Extracting index data from 4 html files
Content\doc-pdf.pdf: 0%
InvalidOperationException: Failed to build PDF page []: 
http://127.0.0.1:58412/Content/article1.html#header3-5
  at void MoveNext() in PdfBuilder.cs:156                                       
  at void MoveNext() in PdfBuilder.cs:238                                       
  at void MoveNext()                                                            
  at async Task CreatePdf(Func<Outline, Uri, Task<byte[]>> printPdf,            
     Func<Outline, int, int, Task<byte[]>> printHeaderFooter, ProgressTask task,
     Uri outlineUrl, Outline outline, string outputPath,                        
     Action<Dictionary<Outline, int>> updatePageNumbers) in PdfBuilder.cs:235   
  at void MoveNext() in PdfBuilder.cs:105                                       
  at void MoveNext()                                                            
  at void MoveNext() in PdfBuilder.cs:98                                        
  at void MoveNext() in Progress.cs:103                                         
  at void MoveNext() in Progress.cs:138                                         
  at async Task<T> RunAsync<T>(Func<Task<T>> func) in DefaultExclusivityMode.cs:
     40                                                                         
  at async Task<T> StartAsync<T>(Func<ProgressContext, Task<T>> action) in      
     Progress.cs:121                                                            
  at async Task StartAsync(Func<ProgressContext, Task> action) in Progress.cs:  
     101                                                                        
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:96              
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:114             
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:114             
  at void <Execute>b__0() in DefaultCommand.cs:53                               
  at int Run(LogOptions options, Action run) in CommandHelper.cs:48             
  at int Execute(CommandContext context, Options options) in DefaultCommand.cs: 
     31                                                                         
  at Task<int> Execute(CommandContext context, CommandSettings settings) in     
     CommandOfT.cs:40                                                           
  at Task<int> Execute(CommandTree leaf, CommandTree tree, CommandContext       
     context, ITypeResolver resolver, IConfiguration configuration) in          
     CommandExecutor.cs:144                                                     
  at async Task<int> Execute(IConfiguration configuration, IEnumerable<string>  
     args) in CommandExecutor.cs:83                                             

Full log: log.txt

Expected behavior
PDF should be generated without any errors/warnings.

Context (please complete the following information):

  • OS: Windows Server 2019, 2022
  • Docfx version: 2.76.0

Additional context

I noticed that when I remove the last line in TOC.md, this one:

### [header3-5](article1.md#header3-5)

everything works and there is no error:

XRef map exported.
Extracting index data from 4 html files
Content\doc-pdf.pdf: 0%
Content\doc-pdf.pdf: 53%
Content\doc-pdf.pdf: 99%

Build succeeded.

    0 warning(s)
    0 error(s)

Adding any further link with a bookmark causes the error, but adding a link without a bookmark works. So it looks like there is a problem once the TOC has more than 6 links with bookmarks, or maybe there is a limitation I don't know about. Could someone check this issue or tell me how to resolve or work around it?

@iti-pawel-wach iti-pawel-wach added the bug A bug to fix label May 15, 2024
@iti-pawel-wach iti-pawel-wach changed the title from "[Bug] PDF generation fails when TOC has many bookmarks - Azure DevOps pipeline" to "[Bug] PDF generation fails when TOC has many bookmarks" May 15, 2024
filzrev (Contributor) commented May 15, 2024

I can also reproduce the problem in my environment.

I've checked the related source code, and the error seems to occur when both of the following conditions are met:

  • The IPage object is taken from the cached page pool.
  • GotoAsync is called on that page with the same URL as before, but pointing to a different anchor (#).

async Task<byte[]?> PrintPdf(Outline outline, Uri url)
{
    await pageLimiter.WaitAsync();
    var page = pagePool.TryTake(out var pooled) ? pooled : await context.NewPageAsync();
    try
    {
        var response = await page.GotoAsync(url.ToString(), new() { WaitUntil = WaitUntilState.DOMContentLoaded });
        if (response?.Status is 404)
            return null;
        if (response is null || !response.Ok)
            throw new InvalidOperationException($"Failed to build PDF page [{response?.Status}]: {url}");

The Playwright documentation describes when the response is null:
https://playwright.dev/docs/api/class-page#page-goto

The method either throws an error or returns a main resource response. The only exceptions are navigation to about:blank or navigation to the same URL with a different hash, which would succeed and return null.
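
To make that condition concrete, here is a minimal C# sketch of a check that could tell the documented null-response case (same URL, different hash) apart from a real failure. The helper name `IsHashOnlyNavigation` and the idea of comparing against the pooled page's previous URL are assumptions for illustration only. This is not the code in PdfBuilder.cs, and not necessarily how the maintainers would fix it.

```csharp
using System;

// Hypothetical helper (not part of docfx or Playwright).
// Returns true when navigating from currentUrl to target only changes the
// fragment, which is the case where the documentation says the navigation
// succeeds but the returned response is null.
static bool IsHashOnlyNavigation(string currentUrl, Uri target)
{
    // A pooled page should report an absolute URL; anything else is not a match.
    if (!Uri.TryCreate(currentUrl, UriKind.Absolute, out var current))
        return false;

    // GetLeftPart(UriPartial.Query) yields scheme, authority, path and query,
    // i.e. everything except the fragment.
    var sameDocument = string.Equals(
        current.GetLeftPart(UriPartial.Query),
        target.GetLeftPart(UriPartial.Query),
        StringComparison.Ordinal);

    // Only count it when the target actually has a fragment and it differs.
    return sameDocument
        && target.Fragment.Length > 0
        && current.Fragment != target.Fragment;
}

// URLs taken from the error log in this issue.
var target = new Uri("http://127.0.0.1:58412/Content/article1.html#header3-5");

// Same page, different anchor: the case that leads to a null response.
Console.WriteLine(IsHashOnlyNavigation(
    "http://127.0.0.1:58412/Content/article1.html#header3-4", target)); // True

// Different page: a null response here would still be a real failure.
Console.WriteLine(IsHashOnlyNavigation(
    "http://127.0.0.1:58412/Content/other.html", target)); // False
```

In `PrintPdf`, a check like this might be fed the page's URL captured before `GotoAsync`, so that a null response is only treated as an error when the navigation was not hash-only. That wiring is also an assumption and is untested against docfx.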

@yufeih yufeih added the pdf Produce PDF as the output format label May 21, 2024