Ghostscript - completely REMOVE METADATA from pdf files #114

Closed
Geo-Van opened this issue Sep 19, 2023 · 5 comments


Geo-Van commented Sep 19, 2023

Hello,
I have some questions about using Ghostscript to completely remove metadata from PDF files.

(1) Given a pdfmark.txt file (a text file with the content shown below, saved in the same directory as the Ghostscript executable, gs.exe), we can use the following command to completely remove any metadata from a PDF file:
Command: gs.exe -o output.pdf -sDEVICE=pdfwrite input.pdf pdfmark.txt

Content of pdfmark.txt:

[ /Title ()
/Author ()
/Subject ()
/Creator ()
/ModDate ()
/Producer ()
/Keywords ()
/CreationDate ()
/DOCINFO pdfmark
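
The steps described above can be scripted. The following is a minimal Python sketch, not part of the original post: the helper names `pdfmark_text`, `write_pdfmark`, and `build_gs_command` are hypothetical, and the sketch assumes `gs.exe` can be found from the working directory. It only reproduces the quoted pdfmark file and command line; it does not change how Ghostscript treats any of the keys.

```python
from pathlib import Path

# Info-dictionary keys blanked by the pdfmark file quoted above.
DOCINFO_KEYS = ["Title", "Author", "Subject", "Creator",
                "ModDate", "Producer", "Keywords", "CreationDate"]


def pdfmark_text(keys=DOCINFO_KEYS):
    """Return a DOCINFO pdfmark that sets each key to an empty string."""
    lines = [f"/{key} ()" for key in keys]
    lines[0] = "[ " + lines[0]          # the mark that opens the pdfmark
    lines.append("/DOCINFO pdfmark")
    return "\n".join(lines) + "\n"


def write_pdfmark(path="pdfmark.txt"):
    """Write the pdfmark file to disk and return its path."""
    Path(path).write_text(pdfmark_text(), encoding="ascii")
    return path


def build_gs_command(src, dst, pdfmark="pdfmark.txt", gs="gs.exe"):
    """Build the argument list for the command quoted above."""
    return [gs, "-o", dst, "-sDEVICE=pdfwrite", src, pdfmark]
```

To run it, call `write_pdfmark()` and pass the result of `build_gs_command("input.pdf", "output.pdf")` to `subprocess.run(..., check=True)`.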

My questions:
(a) Is this method of removing metadata permanent and irreversible, or is there a way to recover the original metadata?
(b) This method used to work fully in previous Ghostscript releases, including for the Producer name.
In the latest versions, however, the Producer name cannot be changed: Ghostscript is always written as the
producer name, and the /Producer () entry in pdfmark.txt appears to be ignored.
Is there anything we can do to change the Producer name?

(2) XMP metadata:
How can we also completely remove any XMP metadata from a PDF file?
I use the following single pdfmark.txt, but the output appears to contain newly created XMP metadata:

[ /Title ()
/Author ()
/Subject ()
/Creator ()
/ModDate ()
/Producer ()
/Keywords ()
/CreationDate ()
/DOCINFO pdfmark

[ /XML ()
/Ext_Metadata pdfmark

My questions:
(a) Is the above pdfmark.txt correct?
(b) If so, why does the new PDF file still appear to contain newly created XMP metadata?

I look forward to your reply.
Many thanks!

Contributor

jhabjan commented Sep 19, 2023

Try using -dOmitXMP=true
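
A minimal Python sketch of how this switch could be added to the command from the first comment. The function name `gs_args_without_xmp` is hypothetical. As far as the Ghostscript documentation reads, the sentence about PDF/A describes the XMP metadata itself as required for PDF/A output, not the switch, so whether the switch suits a given non-PDF/A workflow should be checked against the documentation for the installed version.

```python
def gs_args_without_xmp(src, dst, pdfmark=None, gs="gs.exe"):
    """Build a pdfwrite command line that asks Ghostscript to omit XMP."""
    args = [gs, "-o", dst, "-sDEVICE=pdfwrite", "-dOmitXMP=true"]
    args.append(src)             # switches go before the input files
    if pdfmark:
        args.append(pdfmark)     # optional DOCINFO pdfmark, processed after the PDF
    return args
```

For example, `gs_args_without_xmp("input.pdf", "output.pdf", "pdfmark.txt")` yields the original command with `-dOmitXMP=true` placed before the input file.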

Author

Geo-Van commented Sep 19, 2023

Thanks for your reply, but please note that, according to the documentation, -dOmitXMP=true is required when producing PDF/A output.

My question is not about PDF/A output (sorry, I did not mention that before).

So, are there any other ideas?

By the way, is the syntax of the following pdfmark.txt correct (including the entry meant to leave the XMP metadata empty)?

[ /Title ()
/Author ()
/Subject ()
/Creator ()
/ModDate ()
/Producer ()
/Keywords ()
/CreationDate ()
/DOCINFO pdfmark

[ /XML ()
/Ext_Metadata pdfmark

Contributor

jhabjan commented Sep 21, 2023

You cannot remove the XMP metadata from a conforming PDF/A file; it is a requirement of the specification. Other PDF types may be similar, and Ghostscript will, of course, always insert XMP if the specification requires it.

You can't touch the Producer key/value pair in the Info dictionary: Ghostscript always writes the Producer, and it is always set to "Ghostscript". This is to prevent people from changing it and passing off Ghostscript-produced PDF files as their own work.
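
One way to see what a particular Ghostscript run actually left in the output is to scan the file for metadata markers. The following is a minimal Python sketch under stated assumptions: the function name `metadata_markers` is hypothetical, and it only inspects raw bytes, so it can miss metadata stored inside compressed object streams or encrypted files. Treat a negative result as a hint, not as proof.

```python
import re


def metadata_markers(pdf_bytes):
    """Report which metadata markers are visible in the raw bytes of a PDF."""
    return {
        # XMP packets start with an <?xpacket ...?> processing instruction.
        "xmp_packet": b"<?xpacket" in pdf_bytes,
        # The XMP stream dictionary is typed /Metadata.
        "metadata_stream": re.search(rb"/Type\s*/Metadata", pdf_bytes) is not None,
        # A /Producer entry followed by a literal or hex string.
        "producer": re.search(rb"/Producer\s*[(<]", pdf_bytes) is not None,
    }
```

Typical use would be `metadata_markers(open("output.pdf", "rb").read())`, run on both the input and the output file to compare them.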

NOTE:
This is the Ghostscript.NET repository, which is a .NET wrapper for the core Ghostscript library. If you have questions about core Ghostscript functionality, I suggest asking in the #ghostscript Discord channel.

jhabjan closed this as completed Sep 21, 2023

Author

Geo-Van commented Sep 21, 2023

Thank you very much for your reply.

So, if I understand correctly:

If I use Ghostscript to remove all metadata from a PDF file using the pdfmark.txt shown at the end of this comment,
XMP metadata will still be created inside the new PDF file, but it will be "empty" XMP metadata ("empty" meaning that it carries no metadata values, only the XMP structure, which is the same for every PDF file).
So the command gs.exe -o output.pdf -sDEVICE=pdfwrite input.pdf pdfmark.txt
effectively removes all metadata, including the XMP metadata.

Is my understanding correct?

NOTE:
The following pdfmark.txt removes all metadata but creates new, "empty" XMP metadata
(note that the [ /XML () /Ext_Metadata pdfmark entry is not needed):

[ /Title ()
/Author ()
/Subject ()
/Creator ()
/ModDate ()
/Producer ()
/Keywords ()
/CreationDate ()
/DOCINFO pdfmark

Author

Geo-Van commented Sep 27, 2023

Please see #117
