Custom metadata #352

nicmeriano · 2020-02-12T21:34:36Z

@Hopding is there a way to add custom metadata? I'm looking to embed some JSON data in a PDF in the form of XMP custom key value pairs. I noticed you only support editing the default XMP tags for PDFs.

Hopding · 2020-02-16T20:27:40Z

Hello @nicmeriano!

You certainly can add custom metadata. However, there are no high-level APIs for this. Most readers are not capable of reading or editing XMP metadata, so most users of pdf-lib wouldn't derive value from this. Instead, pdf-lib exposes high-level APIs for setting the standard metadata that most readers support (which, actually, isn't stored in XMP format).

I've provided an example demonstrating how to embed XMP metadata in an older version of pdf-lib here: #55 (comment). Please take a look and see if it's what you're looking for. If it is, and you're unable to get it working in the latest version of pdf-lib, let me know and I'll provide an updated example here.
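For readers who want to see the shape of the data involved, here is a minimal sketch of building an XMP packet with custom key-value pairs, including a JSON payload stored as a string. It is not the example from #55. The `custom` prefix, the `http://example.com/ns/custom/1.0/` namespace URI, and the helper names `escapeXml` and `buildXmpPacket` are all assumptions made up for illustration. The sketch only produces the XML text; in a PDF, that text would then be stored in a stream with `/Type /Metadata` and `/Subtype /XML` that the document catalog's `Metadata` entry points to.

```typescript
// Hypothetical namespace URI for the custom fields; replace it with your own.
const CUSTOM_NS = "http://example.com/ns/custom/1.0/";

// Escape characters that cannot appear verbatim in XML text content.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an XMP packet whose rdf:Description carries one element per custom key.
// Assumes every key is already a valid XML element name.
function buildXmpPacket(fields: Record<string, string>): string {
  const properties = Object.entries(fields)
    .map(([key, value]) => `      <custom:${key}>${escapeXml(value)}</custom:${key}>`)
    .join("\n");

  return [
    `<?xpacket begin="\uFEFF" id="W5M0MpCehiHzreSzNTczkc9d"?>`,
    `<x:xmpmeta xmlns:x="adobe:ns:meta/">`,
    `  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">`,
    `    <rdf:Description rdf:about="" xmlns:custom="${CUSTOM_NS}">`,
    properties,
    `    </rdf:Description>`,
    `  </rdf:RDF>`,
    `</x:xmpmeta>`,
    `<?xpacket end="w"?>`,
  ].join("\n");
}

// A JSON payload can be stored as a single escaped string value.
const packet = buildXmpPacket({
  orderId: "12345",
  payload: JSON.stringify({ source: "example", tags: ["a", "b"] }),
});
console.log(packet);
```

Storing the JSON as one string property keeps the packet simple, at the cost that XMP-aware tools will show it as opaque text rather than as structured fields.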

I hope this helps. Please let me know if you have any additional questions!

nicmeriano · 2020-02-17T22:37:03Z

@Hopding thanks for your quick response. I implemented your example and it seems to be working fine. That being said, does pdf-lib offer any tools to edit the PDF Info dictionary to add custom tags? As you mentioned, XMP metadata is extracted differently by different readers; however, I'm trying to get my custom XMP fields to show up in Acrobat.

Thanks again.

Hopding · 2020-02-23T23:44:03Z

@nicmeriano None of the APIs provided by pdf-lib for editing the info dict deal with XMP data. Although, there is a getter on the PDFDocument class that returns the info dict (or creates one if none exists): https://github.com/Hopding/pdf-lib/blob/master/src/api/PDFDocument.ts#L797-L805. It's technically private right now, but that will likely change in a future release.

piyushpd · 2024-01-12T07:27:36Z

Hello @Hopding. This is Piyush. I am working on a project to detect duplicate hospital prescriptions. Patients submit the same prescription again and again and claim the amount spent, changing only details such as the patient name or age. These details are printed at the top of the prescription sheet, and the doctor's diagnosis and medicines are written below that. Think of a prescription as two parts: a header and a body. The header contains the patient's details (name, age, address, etc.), and the body contains the doctor's handwritten diagnosis and medicines.

To detect duplicates, I split the prescription (which is always a PDF) into header and body and calculate the SHA-256 hash of the body only. This SHA-256 value serves as my unique ID. If a patient changes something in the header and resubmits, expecting it to count as a new prescription and a new claim, it won't, because the handwritten body hasn't changed and so produces the same SHA-256 value. I can compare it against my database and throw an error if it matches.
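The scheme described above can be sketched roughly as follows, assuming Node.js and its built-in `crypto` module. The `sha256Hex` helper and the in-memory `PrescriptionRegistry` class are hypothetical names used only for illustration; a real system would look the ID up in a database. Note that the digest covers every byte passed in, so the ID is only stable if the body bytes themselves are identical between submissions.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the given bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory registry standing in for the database lookup.
class PrescriptionRegistry {
  private readonly seen = new Set<string>();

  // Returns true if this body was submitted before; otherwise records it.
  isDuplicate(bodyBytes: Uint8Array): boolean {
    const id = sha256Hex(bodyBytes);
    if (this.seen.has(id)) {
      return true;
    }
    this.seen.add(id);
    return false;
  }
}

// Placeholder bytes standing in for the extracted prescription body.
const registry = new PrescriptionRegistry();
const body = Buffer.from("handwritten diagnosis and medicines", "utf8");
console.log(registry.isDuplicate(body)); // false: first submission
console.log(registry.isDuplicate(body)); // true: same bytes submitted again
```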

The problem is that the metadata also comes into the picture when calculating the hash. If a patient edits the prescription and changes only the name, the metadata changes, and that produces a different hash. So I hardcoded the metadata; see the code below:
secondPartDoc.setTitle('NA');
secondPartDoc.setAuthor('NA');
secondPartDoc.setSubject('NA');
secondPartDoc.setKeywords(['NA']);
secondPartDoc.setCreator('NA');
secondPartDoc.setProducer('NA');
secondPartDoc.setLanguage('en-us');
secondPartDoc.setCreationDate(new Date('2018-06-24T01:58:37.228Z'));
secondPartDoc.setModificationDate(new Date('2019-12-21T07:00:11.000Z'));

I was hoping this would solve my problem, but it did not. I got a different hash value again, so there must be other metadata fields that need to be hardcoded, but pdf-lib only exposes setters for the fields above. I used exiftool to print all the metadata (I ran the command on Ubuntu). The output is below:
exiftool edited_name.pdf
ExifTool Version Number : 11.88
File Name : edited_name.pdf
Directory : .
File Size : 544 kB
File Modification Date/Time : 2024:01:09 17:06:25+05:30
File Access Date/Time : 2024:01:11 17:38:51+05:30
File Inode Change Date/Time : 2024:01:11 17:38:46+05:30
File Permissions : rw-rw-r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.7
Linearized : No
Author : Amit Chandra
Create Date : 2023:12:07 11:40:39+05:30
Modify Date : 2024:01:09 17:01:25+05:30
XMP Toolkit : XMP Core 6.0.0
Creator : Amit Chandra
Title : Prescription Format_RGHS.pdf
Format : application/pdf
Producer : Microsoft: Print To PDF
Document ID : uuid:e5dbc5f4-a9cf-4e1b-997d-c8810325b49f
Instance ID : uuid:46561721-06ce-4efd-ab54-d73dab920115
Page Count : 1

My question: I want to set all of the above fields to default hardcoded values. The Document ID changes even if you type a word inside the PDF and then delete it, so my hash value is always different, even for the same PDF document. Can you suggest a way to hardcode all of the above metadata fields? Feel free to reach out to me directly.

Hopding closed this as completed Feb 16, 2020
