Custom metadata #352

Closed
nicmeriano opened this issue Feb 12, 2020 · 4 comments

Comments

@nicmeriano

@Hopding is there a way to add custom metadata? I'm looking to embed some JSON data in a PDF in the form of XMP custom key value pairs. I noticed you only support editing the default XMP tags for PDFs.

@Hopding
Owner

Hopding commented Feb 16, 2020

Hello @nicmeriano!

You certainly can add custom metadata. However, there are no high level APIs for this. Most readers are not capable of reading or editing XMP metadata, so most users of pdf-lib wouldn't derive value from this. Instead, pdf-lib exposes high-level APIs for setting the standard metadata that most readers support (which, actually, isn't stored in XMP format).

I've provided an example demonstrating how to embed XMP metadata in an older version of pdf-lib here: #55 (comment). Please take a look and see if it's what you're looking for. If it is, and you're unable to get it working in the latest version of pdf-lib, let me know and I'll provide an updated example here.

I hope this helps. Please let me know if you have any additional questions!
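
For readers on a recent pdf-lib release, here is a minimal sketch of the general idea (it is not the code from #55). It assumes pdf-lib 1.x, where `pdfDoc.context.stream`, `pdfDoc.context.register`, `pdfDoc.context.obj`, and `pdfDoc.catalog.set` are available. The helper names `escapeXml`, `buildXmpPacket`, and `embedXmp`, the `custom` prefix, and the `http://example.com/ns/custom/1.0/` namespace are all made up for illustration.

```javascript
// Escape the five XML special characters so arbitrary values are safe in XMP.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build an XMP packet holding custom key/value pairs.
// Assumption: keys are valid XML element names; the namespace is a placeholder.
function buildXmpPacket(
  properties,
  namespaceUri = 'http://example.com/ns/custom/1.0/',
  prefix = 'custom',
) {
  const props = Object.entries(properties)
    .map(([key, value]) => `      <${prefix}:${key}>${escapeXml(value)}</${prefix}:${key}>`)
    .join('\n');
  return [
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>',
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
    '  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
    `    <rdf:Description rdf:about="" xmlns:${prefix}="${namespaceUri}">`,
    props,
    '    </rdf:Description>',
    '  </rdf:RDF>',
    '</x:xmpmeta>',
    '<?xpacket end="w"?>',
  ].join('\n');
}

// Attach the packet to an already loaded or created PDFDocument.
// Assumption: pdfDoc is a pdf-lib 1.x PDFDocument instance.
function embedXmp(pdfDoc, xml) {
  // Encode to UTF-8 bytes so non-ASCII values survive and Length is a byte count.
  const bytes = new TextEncoder().encode(xml);
  const stream = pdfDoc.context.stream(bytes, {
    Type: 'Metadata',
    Subtype: 'XML',
    Length: bytes.length,
  });
  const ref = pdfDoc.context.register(stream);
  // context.obj turns a plain string into a PDF name object (/Metadata).
  pdfDoc.catalog.set(pdfDoc.context.obj('Metadata'), ref);
  return ref;
}
```

To embed JSON, something like `embedXmp(pdfDoc, buildXmpPacket({ payload: JSON.stringify(data) }))` before `pdfDoc.save()` should work, though whether a given reader displays the custom fields is a separate question.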

@Hopding Hopding closed this as completed Feb 16, 2020
@nicmeriano
Author

@Hopding thanks for your quick response. I implemented your example and it seems to be working fine. That being said, does pdf-lib offer any tools to edit the PDF Info dictionary to add custom tags? As you mentioned, XMP metadata is extracted differently by different readers, however I'm trying to get my custom XMP fields to show up in Acrobat.

Thanks again.

@Hopding
Owner

Hopding commented Feb 23, 2020

@nicmeriano None of the APIs provided by pdf-lib for editing the info dict deal with XMP data. There is, however, a getter on the PDFDocument class that returns the info dict (or creates one if none exists): https://github.com/Hopding/pdf-lib/blob/master/src/api/PDFDocument.ts#L797-L805. It's technically private right now, but that will likely change in a future release.
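
As a rough sketch of what adding a custom entry to the info dict could look like without calling the private getter, the helpers below go through the document's public `context` instead. This assumes pdf-lib 1.x (`context.lookup`, `context.trailerInfo.Info`, `context.obj`, `context.register`). `getOrCreateInfoDict` and `setCustomInfoEntry` are hypothetical names, and the `typeof existing.set` check is a simplification standing in for a proper type check.

```javascript
// Return the document's info dict, creating and registering one if it is missing.
// Assumption: pdfDoc is a pdf-lib 1.x PDFDocument instance.
function getOrCreateInfoDict(pdfDoc) {
  const { context } = pdfDoc;
  const existing = context.lookup(context.trailerInfo.Info);
  // Duck-typed check: treat anything with a set() method as the info dict.
  if (existing && typeof existing.set === 'function') return existing;

  const created = context.obj({}); // empty PDF dictionary
  context.trailerInfo.Info = context.register(created);
  return created;
}

// Store a custom key in the info dict.
// `pdfValue` must already be a pdf-lib object (the caller builds it).
function setCustomInfoEntry(pdfDoc, key, pdfValue) {
  const infoDict = getOrCreateInfoDict(pdfDoc);
  // context.obj turns a plain string into a PDF name object (e.g. /ClaimId).
  infoDict.set(pdfDoc.context.obj(key), pdfValue);
}
```

A call might look like `setCustomInfoEntry(pdfDoc, 'ClaimId', PDFHexString.fromText('abc-123'))`, with `PDFHexString` imported from pdf-lib and `ClaimId` being a made-up key.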

@piyushpd

Hello @Hopding, this is Piyush. I am working on a project to detect duplicate hospital prescriptions. Patients resubmit the same prescription again and again to claim the amount spent, changing only details such as the patient name and age. These details are printed at the top of the prescription sheet, and the doctor's diagnosis and medicines are written below them. Think of a prescription as two parts: a header with the patient's details (name, age, address, etc.) and a body with the doctor's handwritten diagnosis and medicines. To detect duplicates, I split the prescription (which is always a PDF) into header and body and compute the SHA-256 hash of the body only. This SHA-256 value serves as my unique ID. If a patient changes something in the header and resubmits the prescription, expecting it to count as a new one and be paid again, it won't work: the handwritten body hasn't changed, so it should give the same SHA-256 value. I can compare that value against my database and throw an error if it matches.
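
The check described above can be sketched roughly as follows, using Node's built-in `crypto` module. `sha256Hex` and `PrescriptionRegistry` are hypothetical names, and the in-memory `Set` only stands in for the database lookup.

```javascript
const { createHash } = require('node:crypto');

// Hex-encoded SHA-256 digest of a Buffer or Uint8Array.
function sha256Hex(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

// In-memory stand-in for the database of previously seen prescription bodies.
class PrescriptionRegistry {
  constructor() {
    this.seenHashes = new Set();
  }

  // Registers the body bytes and returns their hash; throws on a duplicate.
  submit(bodyBytes) {
    const id = sha256Hex(bodyBytes);
    if (this.seenHashes.has(id)) {
      throw new Error(`Duplicate prescription body: ${id}`);
    }
    this.seenHashes.add(id);
    return id;
  }
}
```

This only works if the body bytes are identical byte for byte between submissions, which is exactly where the metadata becomes a problem.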

The problem is that the metadata is also included when the hash is calculated. If a patient edits the prescription and changes only the name, the metadata changes and that produces a different hash. So I hardcoded the metadata, as in the code below:
secondPartDoc.setTitle('NA');
secondPartDoc.setAuthor('NA');
secondPartDoc.setSubject('NA');
secondPartDoc.setKeywords(['NA']);
secondPartDoc.setCreator('NA');
secondPartDoc.setProducer('NA');
secondPartDoc.setLanguage('en-us');
secondPartDoc.setCreationDate(new Date('2018-06-24T01:58:37.228Z'));
secondPartDoc.setModificationDate(new Date('2019-12-21T07:00:11.000Z'));

I was hoping this would solve my problem, but it did not: I got a different hash value again. So there must be other metadata fields that need to be hardcoded, but pdf-lib only provides setters for the fields above. I used exiftool (run on Ubuntu) to print all the metadata. Below is the output:
exiftool edited_name.pdf
ExifTool Version Number : 11.88
File Name : edited_name.pdf
Directory : .
File Size : 544 kB
File Modification Date/Time : 2024:01:09 17:06:25+05:30
File Access Date/Time : 2024:01:11 17:38:51+05:30
File Inode Change Date/Time : 2024:01:11 17:38:46+05:30
File Permissions : rw-rw-r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.7
Linearized : No
Author : Amit Chandra
Create Date : 2023:12:07 11:40:39+05:30
Modify Date : 2024:01:09 17:01:25+05:30
XMP Toolkit : XMP Core 6.0.0
Creator : Amit Chandra
Title : Prescription Format_RGHS.pdf
Format : application/pdf
Producer : Microsoft: Print To PDF
Document ID : uuid:e5dbc5f4-a9cf-4e1b-997d-c8810325b49f
Instance ID : uuid:46561721-06ce-4efd-ab54-d73dab920115
Page Count : 1

My question: I want to set all of the above fields to fixed, hardcoded values. The Document ID changes even if you type a word inside the PDF and then delete it, so my hash value is always different even for the same PDF document. Can you suggest a way to hardcode all of the above metadata fields? Feel free to reach out on WhatsApp at +91-9916651980 or by email at piyushdutta1@gmail.com.
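
One idea that might be worth trying, as an untested sketch only: the Document ID and Instance ID in the exiftool output are XMP fields, so dropping the XMP metadata stream from the document catalog before saving should remove them from the output. This assumes pdf-lib 1.x (`catalog.get`, `catalog.delete`, `context.obj`, `context.delete`). `stripXmpMetadata` is a hypothetical helper name. Whether the resulting hashes then match still depends on everything else in the file being byte-identical.

```javascript
// Remove the catalog's /Metadata entry (the XMP packet), if present.
// Assumption: pdfDoc is a pdf-lib 1.x PDFDocument instance.
function stripXmpMetadata(pdfDoc) {
  // context.obj turns a plain string into a PDF name object (/Metadata).
  const key = pdfDoc.context.obj('Metadata');
  const ref = pdfDoc.catalog.get(key);
  if (ref === undefined) return false; // nothing to remove

  pdfDoc.catalog.delete(key); // unlink the stream from the catalog
  pdfDoc.context.delete(ref); // drop the stream object so it is not written out
  return true;
}
```

It would be called as `stripXmpMetadata(secondPartDoc)` just before `secondPartDoc.save()`, alongside the hardcoded setters above.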
