PDF header signature not found (Error happen when conversion toc.json to Pdf) #4999

Closed
marco-bertschi opened this issue Aug 16, 2019 · 17 comments
@marco-bertschi

Operating System: Windows 10

DocFX Version Used: 2.43.2
Template used: default

Steps to Reproduce:

  1. Take artifacts from documentation
  2. Run DocFX
  3. Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.

Expected Behavior:
No error

Actual Behavior:
Error

I tried to add PDF generation to one of my own DocFX repos, but the build failed with the error above. After that I downloaded the artifacts mentioned above, to no avail: the build still fails, even with the standard documentation.
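
For context, a well-formed PDF file begins with the bytes `%PDF-`, so this exception generally means the reader was handed data without that signature (for example, empty or truncated converter output). Below is a minimal Python sketch for checking an intermediate file by hand. The function names are hypothetical, and the 1024-byte search window is an assumption rather than something confirmed in this thread.

```python
PDF_MAGIC = b"%PDF-"


def has_pdf_header(data: bytes, search_window: int = 1024) -> bool:
    """Return True if the PDF signature appears near the start of the data.

    search_window is an assumed limit; readers typically only look at the
    beginning of the file for the signature.
    """
    return PDF_MAGIC in data[:search_window]


def file_has_pdf_header(path: str) -> bool:
    """Read the first 1024 bytes of a file and check them for the signature."""
    with open(path, "rb") as handle:
        return has_pdf_header(handle.read(1024))
```

If `file_has_pdf_header` returns False for a file that wkhtmltopdf was supposed to produce, the converter output is worth inspecting before suspecting the PDF reader.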

@superyyrrzz superyyrrzz added this to To do in DocFX v2 via automation Aug 27, 2019
@marco-bertschi
Author

marco-bertschi commented Sep 27, 2019

@superyyrrzz, is there any progress on this? It would be nice to have at least a working example.

@icnocop
Contributor

icnocop commented Mar 25, 2020

Which version of wkhtmltox are you using?

I tested it with wkhtmltox v0.12.5-1.msvc2015-win64 and it worked without issues.

Try passing --logLevel Verbose to docfx.exe and share the output please.

Thank you.

@icnocop
Contributor

icnocop commented Mar 31, 2020

I have received this error when the HTML file passed to the wkhtmltopdf cover parameter contains invalid references.

For example:

    "wkhtmltopdf": {
      "additionalArguments": "--quiet cover \"C:\\contains invalid references.html\""
    }

If the html contains a relative file path like <img src="../test.png" /> and test.png can't be found, then the error occurs.
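
One way to look for this situation is to scan the cover HTML for local references that do not resolve on disk. The following is a minimal Python sketch using only the standard library; `missing_local_refs` is a hypothetical helper, and it only inspects `src` and `href` attributes, which is an assumption about where broken references usually sit.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class _RefCollector(HTMLParser):
    """Collects the values of src/href attributes from every tag."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)


def missing_local_refs(html_text: str, base_dir: str) -> list:
    """Return relative references that do not exist under base_dir."""
    collector = _RefCollector()
    collector.feed(html_text)
    missing = []
    for ref in collector.refs:
        parsed = urlparse(ref)
        # Skip remote URLs, data:/mailto: links, and pure #fragment links.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        if not (Path(base_dir) / parsed.path).exists():
            missing.append(ref)
    return missing
```

Calling `missing_local_refs(html, folder_of_cover_file)` on the cover page lists the references to fix before running the PDF build again.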

@icnocop
Contributor

icnocop commented Mar 31, 2020

#4488 seems related.

@marco-bertschi
Author

@icnocop I was using the version available from chocolatey.org, which at the time of testing was 0.12.5.

@icnocop
Contributor

icnocop commented May 2, 2020

Sorry, I couldn't reproduce.

Make sure your current directory is the folder that contains docfx.json.

Steps I took:

  1. Download walkthrough3.zip and extract to c:\walkthrough3

  2. Download wkhtmltox-0.12.5-1.msvc2015-win64.exe and extract to c:\wkhtmltox

  3. Copy c:\wkhtmltox\bin\wkhtmltopdf.exe to c:\walkthrough3\wkhtmltopdf.exe

  4. Download docfx.zip and extract to c:\docfx

  5. Open a command prompt and run the following commands:

    cd C:\walkthrough3
    c:\docfx\docfx.exe docfx.json --logLevel Verbose
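
The setup above can be sanity-checked before running docfx. This is a minimal Python sketch; `preflight` is a hypothetical helper, and it assumes (this is not confirmed in the thread) that wkhtmltopdf is picked up either from the folder containing docfx.json or from PATH.

```python
import shutil
from pathlib import Path


def preflight(project_dir: str = ".") -> list:
    """Return a list of human-readable problems found in project_dir."""
    problems = []
    root = Path(project_dir)

    # docfx is expected to be run from the folder that holds docfx.json.
    if not (root / "docfx.json").is_file():
        problems.append(f"docfx.json not found in {root.resolve()}")

    # Assumption: the converter sits next to docfx.json or is on PATH.
    local_exe = any(
        (root / name).is_file() for name in ("wkhtmltopdf.exe", "wkhtmltopdf")
    )
    if not local_exe and shutil.which("wkhtmltopdf") is None:
        problems.append("wkhtmltopdf not found next to docfx.json or on PATH")

    return problems
```

An empty list from `preflight()` means both checks passed for the current directory; it does not guarantee that the PDF build itself will succeed.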

@marco-bertschi
Author

Per my employer, I'm unable to spend any more time on this. Please close the issue.

@alexrp
Contributor

alexrp commented Sep 7, 2020

I've run into this issue as well. It only happens on a Windows runner on GitHub Actions; it works fine locally. All versions are the same between the runner and my local machine (DocFX 2.56.2, wkhtmltopdf 0.12.6).

@icnocop
Contributor

icnocop commented Sep 7, 2020

@alexrp can you provide a repository that can reproduce the issue?

Thank you.

@alexrp
Contributor

alexrp commented Sep 7, 2020

@icnocop over here: https://github.com/flare-lang/flare-lang.github.io

Note that I removed PDF generation from CI in https://github.com/flare-lang/flare-lang.github.io/commit/1d905f487328f578062b08c41f7da15c16f9f085. You'll need to revert that commit to reproduce.

Here's an example of a run where the problem occurred: https://github.com/flare-lang/flare-lang.github.io/runs/1080331536?check_suite_focus=true#step:7:45

@icnocop
Contributor

icnocop commented Nov 9, 2020

Thank you, @alexrp.

I was able to reproduce the issue using your repo.

I was able to "work-around" the issue by specifying "noStdin": true in docfx.json as follows:

    {
        ...
        "pdf": {
            ...
            "noStdin": true,
            ...
        }
    }

Example commit: icnocop/flare-lang.github.io@d34621a

Example build: https://github.com/icnocop/flare-lang.github.io/runs/1372135172

For reference, see: #4488
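
For anyone who wants to apply the workaround from a script (for example, in CI), here is a minimal Python sketch. The helper names are hypothetical, and it assumes docfx.json is plain JSON without comments and that the `pdf` section sits at the top level, as in the snippet above.

```python
import json


def enable_no_stdin(config: dict) -> dict:
    """Return a copy of the config with pdf.noStdin set to true."""
    updated = dict(config)
    pdf_section = dict(updated.get("pdf", {}))
    pdf_section["noStdin"] = True
    updated["pdf"] = pdf_section
    return updated


def patch_docfx_json(path: str = "docfx.json") -> None:
    """Rewrite the file at path in place with the workaround applied."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(enable_no_stdin(config), handle, indent=2)
        handle.write("\n")
```

`enable_no_stdin` leaves every other key untouched, so the change is limited to the single setting described above.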

@CalvinWilkinson

Hello @icnocop!!

I just started using this for documentation for my library and so far it is great, but I ran into this issue.

The versions that I used when I ran into the issue are below:

  1. docfx 👉🏼 v2.59.2.0
  2. wkhtmltopdf 👉🏼 v0.12.6 (with patched qt)

I did indeed get it working by adding noStdin: true to the pdf section of the docfx.json.
My questions are these:

  1. Is this an "issue" that is in the works on getting fixed and this is just a workaround?
  2. I did not see anything about noStdin in the walkthrough and struggled with this issue for hours. If this is not a workaround and it is meant to be used like this, is the documentation/tutorial on the website going to be updated?
  3. Is this a Windows-only thing? I did notice that somebody in the comments mentioned that they only ran into the issue with a Windows runner on GitHub Actions.

Just for clarity and to hopefully help with the issue, below is the error I got in Windows Terminal.

[22-05-12 03:40:51.420]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetPartialPdfModels(IList`1 htmlFilePaths)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.ConvertOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
---> (Inner Exception #0) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #1) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #2) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #3) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #4) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #5) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #6) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #7) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<

Cheers!!

@icnocop
Contributor

icnocop commented May 15, 2022

Hi @CalvinWilkinson.

  1. Is this an "issue" that is in the works on getting fixed and this is just a workaround?

I'm not exactly sure if this is an issue in docfx, wkhtmltopdf, or iTextSharp for example.

  2. I did not see anything about noStdin in the walkthrough and struggled with this issue for hours. If this is not a workaround and it is meant to be used like this, is the documentation/tutorial on the website going to be updated?

The tutorial seems to work without issues for some users, so I'm not exactly sure the tutorial is the actual issue.
I'm sure the docfx project maintainers will provide feedback to a pull request to update the tutorial if this is an issue.

  3. Is this a Windows-only thing? I did notice that somebody in the comments mentioned that they only ran into the issue with a Windows runner on GitHub Actions.

It could be a Windows-only and/or a GitHub Actions-only thing; sorry, I'm not exactly sure what the underlying issue is.
I've personally only used docfx on Windows.

I'm interested to know if the same error occurs when wkhtmltopdf is replaced with another compatible exe and noStdin: true is removed.

For example, I'm using https://github.com/icnocop/HtmlToPdf instead of wkhtmltopdf and it meets my requirements.
HtmlToPdf is not 100% compatible with wkhtmltopdf, and that's okay because I don't use all the features of wkhtmltopdf with docfx anyways.
Disclaimer: I'm the creator of https://github.com/icnocop/HtmlToPdf.

If HtmlToPdf works instead of wkhtmltopdf, then the issue seems to be in wkhtmltopdf or iTextSharp.

Thank you.

@CalvinWilkinson

Ok. Sounds good.

Thanks for your response!!

@melanchall

I have the same issue. Everything is OK locally, but I have the problem within Azure Pipelines.

@CalvinWilkinson

> I have the same issue. Everything is OK locally, but I have the problem within Azure Pipelines.

I have the issue locally and in GitHub Actions.

@yufeih yufeih added the pdf Produce PDF as the output format label Dec 15, 2022
@yufeih
Contributor

yufeih commented Nov 2, 2023

Addressed in v2.73.0 with a new PDF engine.

@yufeih yufeih closed this as completed Nov 2, 2023