Merge PDF page number error #5

Open
aMagicalpole opened this issue Jun 8, 2023 · 2 comments
Labels: enhancement, help wanted, question

Comments

aMagicalpole commented Jun 8, 2023

Question:

The page numbers are not merged: each source PDF keeps its own numbering. Is there any way to combine the page numbers or customize them? Thank you very much.

Code:

const footerTemplate = `<div style="margin-bottom: -0.4cm; height: 70%; width: 100%; display: flex; justify-content: space-between; align-items: center; color: lightgray; border-top: solid lightgray 1px; font-size: 10px;">
	<span style="margin-left: 15px;" class="url"></span><span style="margin-right: 15px;"><span class="pageNumber"></span>/<span class="totalPages"></span></span>
</div>`;

condorheroblog commented Jun 8, 2023

The PDF file format is all about producing the desired visual result for printing. It was not created for parsing the content. PDF files don’t contain a semantic layer.

Specifically, there is no information about what the header, footer, page numbers, tables, and paragraphs are. The visual appearance is there and people might find heuristics to make educated guesses, but there is no way of being certain.

This is a shortcoming of the PDF file format.
https://pypdf.readthedocs.io/en/stable/user/extract-text.html#missing-semantic-layer

The page description language inside a PDF plays a role similar to HTML, except that it lacks semantics. It describes the content of PDF pages through objects, as in the following example:

3 0 obj
<< /Filter /FlateDecode /Length 191 >>
stream
(191 bytes of Flate-compressed binary data, not printable)
endstream
endobj
1 0 obj
<< /Type /Page /Parent 2 0 R /Resources 4 0 R /Contents 3 0 R /MediaBox [0 0 595.28 841.89]
>>
endobj
4 0 obj
<< /ProcSet [ /PDF /Text ] /ColorSpace << /Cs1 5 0 R >> /Font << /TT1 6 0 R
>> >>
endobj

As a result, many PDF parsing libraries cannot extract page numbers, and I cannot modify the existing page numbers when merging PDFs. I have thought about this for a long time without finding a solution.

However, there is an imperfect workaround: turn off page numbers when generating the PDF, leave room for them in the footer, and add them yourself afterwards. Here is an example:
d26383d#diff-79cab662fb8d5527d226a743033ffdfd879fcb65489faa6eabe35ca25a7906d5

import { readFileSync, writeFileSync } from "node:fs";
import { PDFDocument, StandardFonts, rgb } from "pdf-lib";

// Load the PDF that was generated without page numbers.
const existingPdfBytes = readFileSync("./vitepress.dev.pdf");
const pdfDoc = await PDFDocument.load(existingPdfBytes);
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);

const pages = pdfDoc.getPages();
const totalPages = pages.length;

for (let i = 0; i < totalPages; i++) {
	const page = pages[i];
	const { width } = page.getSize();
	// Label in the form "current / total", with 1-based page numbers.
	const text = `${i + 1} / ${totalPages}`;
	const fontSize = 9;
	// PDF coordinates start at the bottom-left corner of the page.
	const textX = width - 50;
	const textY = fontSize;
	page.drawText(text, {
		x: textX,
		y: textY + 5,
		size: fontSize,
		font: helveticaFont,
		// rgb() takes components in the range 0..1, so divide 8-bit values by 255.
		color: rgb(127 / 255, 127 / 255, 127 / 255),
	});
}

// Write the numbered document to a new file.
const pdfBytes = await pdfDoc.save();
writeFileSync("pagination.pdf", pdfBytes);

It's not perfect, but it's good.
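
One rough edge of the loop above is the fixed `width - 50` offset: wider labels such as `100 / 120` drift toward the page edge. Below is a minimal sketch of a right-aligned variant, assuming an already loaded `pdfDoc` and an embedded font as in the example. The names `stampPageNumbers`, `rightMargin`, and `bottomMargin` are hypothetical; `widthOfTextAtSize` is the pdf-lib font method used to measure the label.

```javascript
// Sketch: draw "n / total" right-aligned at the bottom of every page.
// `pdfDoc` is a loaded pdf-lib PDFDocument and `font` an embedded PDFFont.
// The function name and the margin options are hypothetical, not part of pdf-lib.
function stampPageNumbers(pdfDoc, font, { size = 9, rightMargin = 15, bottomMargin = 14, color } = {}) {
	const pages = pdfDoc.getPages();
	const total = pages.length;
	pages.forEach((page, index) => {
		const text = `${index + 1} / ${total}`;
		const { width } = page.getSize();
		// Measure the label so its right edge stays `rightMargin` points from the page edge.
		const textWidth = font.widthOfTextAtSize(text, size);
		const options = { x: width - rightMargin - textWidth, y: bottomMargin, size, font };
		// Only pass a color when the caller supplied one (for example rgb(0.5, 0.5, 0.5)).
		if (color !== undefined) options.color = color;
		page.drawText(text, options);
	});
	return total;
}
```

In the script above, a call such as `stampPageNumbers(pdfDoc, helveticaFont, { color: rgb(0.5, 0.5, 0.5) })` could stand in for the `for` loop. The default margins are guesses and would need tuning to match the space left in the footer.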

@asinghvi17

I know that Cairo at least supports page labels. Perhaps pdf-lib does as well?
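
For reference, page labels in the PDF format live in a `/PageLabels` number tree on the document catalog. As far as I can tell pdf-lib has no dedicated high-level method for them, so the following is only a sketch that goes through its low-level objects (`pdfDoc.catalog`, `pdfDoc.context.obj`, `PDFName.of`). The helper name `setDecimalPageLabels` is hypothetical, and it takes pdf-lib's `PDFName` class as an argument.

```javascript
// Sketch: give the whole document one decimal page-label range (1, 2, 3, ...).
// `pdfDoc` is a loaded pdf-lib PDFDocument; `PDFName` is the class exported by pdf-lib.
// The helper itself is hypothetical and relies on pdf-lib's low-level object API.
function setDecimalPageLabels(pdfDoc, PDFName, firstNumber = 1) {
	// /Nums pairs a starting page index with a label dictionary:
	// index 0, style /D (decimal Arabic numerals), numbering starts at `firstNumber`.
	const pageLabels = pdfDoc.context.obj({ Nums: [0, { S: "D", St: firstNumber }] });
	pdfDoc.catalog.set(PDFName.of("PageLabels"), pageLabels);
	return pageLabels;
}
```

This would be called before `pdfDoc.save()`. Note that page labels only change the numbers a viewer shows in its navigation UI; they do not alter any text drawn in the page footer, so they would not replace the stamping approach above.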
