# Writing Data in PDFs

Alejandro Ricciardi (Omegapy)  
created date: 01/17/2024   
[GitHub](https://github.com/Omegapy)  

Credit: 
[Control PDF with Python & PyPDF2](https://www.udemy.com/course/control-pdf-with-python-pypdf2) Udemy - Conny Soderholm
The original code was substantially modified: it was ported from PyPDF2 to pypdf v3.17.4, adapted to my requirements, and extended with additional functionality.

Project Description:  
Using the ```PdfWriter``` class to:
- Add JavaScript
- Add bookmarks
- Add links
- Add metadata
- Fill in form fields
- Open password-protected files

The [PdfWriter](https://pypdf.readthedocs.io/en/stable/modules/PdfWriter.html?highlight=PdfFileWriter%20class) class writes PDF files out, given pages produced by another class.
Typically, the pages come from a ```PdfReader``` object.
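
The reader-to-writer flow described above can be sketched as a small helper. This is a minimal sketch, not part of the original project: `copy_page_range` is a hypothetical name, and it assumes only that the reader exposes a `pages` sequence and the writer an `add_page` method, as pypdf's `PdfReader` and `PdfWriter` do.

```python
def copy_page_range(reader, writer, start=0, stop=None):
    """Copy reader pages [start, stop) into writer and return how many were copied.

    Hypothetical helper: `reader` is expected to behave like pypdf.PdfReader
    (a `pages` sequence) and `writer` like pypdf.PdfWriter (an `add_page` method).
    """
    page_count = len(reader.pages)
    # Clamp the upper bound so an oversized `stop` never raises IndexError
    if stop is None or stop > page_count:
        stop = page_count
    copied = 0
    for index in range(max(start, 0), stop):
        writer.add_page(reader.pages[index])
        copied += 1
    return copied
```

With pypdf, a call such as `copy_page_range(pdf_reader, pdf_writer, 0, 2)` would copy the first two pages, provided the source file has at least two.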


Project map:
- Add JavaScript -```pdf_writer.add_js(javascript_code)```-
- Encrypting and Decrypting (Password Protection)
    - Encrypting PDF -```pdf_writer.encrypt("my-secret-password", algorithm="AES-256")```-
    - Decrypting PDF -```if pdf_reader.is_encrypted: pdf_reader.decrypt("my-secret-password")```-
- Filling In Fields -```pdf_writer.update_page_form_field_values(page, field_dictionary)```-
- Adding Bookmarks -```pdf_writer.add_outline_item(title="Go to page 1", page_number=0, color=(0.1,0.1,0.5)) # Index 0 is page 1```-


In [1]:
from pypdf import PdfReader, PdfWriter
import subprocess
import os

### Add JavaScript
```add_js(javascript: str)→ None```


In [2]:
pdf_reader = PdfReader("docs/p17.pdf")
# Create a writer object
pdf_writer = PdfWriter()


# Opens the PDF editor print window 
#pdf_writer.add_js("this.print({bUI:true,bSilent:false,bShrinkToFit:true});")

# Opens a messagebox
#pdf_writer.add_js(r'app.alert("\nHello World!\nUsing PyPDF to manipulate PDFs\n")' )

# Activates editor fullscreen mode
pdf_writer.add_js(r'app.fs.isFullScreen = true;')

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)

# Save the new PDF to disk
with open("Manipulated PDFs/p17_add_javascript.pdf", "wb") as f:
    pdf_writer.write(f)
# Opens the PDF in the default viewer (shell=True with a file path relies on Windows file associations)
subprocess.Popen(["Manipulated PDFs/p17_add_javascript.pdf"], shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/p17_add_javascript.pdf']>
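
The alert example above embeds its message directly in a raw string. When the message comes from Python data, it may be safer to let `json.dumps` do the escaping. The following is a minimal sketch under that assumption; `build_alert_js` is a hypothetical helper, not part of pypdf.

```python
import json


def build_alert_js(message: str) -> str:
    """Return a JavaScript snippet that shows `message` in an alert box.

    json.dumps escapes quotes, backslashes and newlines, producing a
    double-quoted literal that is also a valid JavaScript string.
    """
    return f"app.alert({json.dumps(message)});"


script = build_alert_js('Hello "World"!\nUsing pypdf to manipulate PDFs')
print(script)
```

The resulting string could then be passed to `pdf_writer.add_js(script)`. Whether the script runs depends on the PDF viewer's JavaScript support.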

### Encrypting and Decrypting (Password protection)
[Encrypt and Decrypt](https://pypdf.readthedocs.io/en/stable/user/encryption-decryption.html?highlight=encrypt#encrypt): you can encrypt and decrypt a PDF with a password.

Encrypt: ```pdf_writer.encrypt("my-secret-password", algorithm="AES-256")```  
Decrypt: ```if pdf_reader.is_encrypted: pdf_reader.decrypt("my-secret-password")```

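Note: The algorithm can be one of RC4-40, RC4-128, AES-128, AES-256-R5, AES-256. pypdf recommends using AES-256-R5. The AES algorithms need an extra cryptography dependency, which can be installed with ```pip install pypdf[crypto]```.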
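
The two calls above can be wrapped with some input checking. This is a minimal sketch: `encrypt_writer` and `unlock_reader` are hypothetical helpers, the algorithm list is copied from the note above, and the check on the return value assumes that `decrypt()` returns a falsy value when the password is rejected, as pypdf 3.x does.

```python
# Algorithm names as listed in the note above
SUPPORTED_ALGORITHMS = ("RC4-40", "RC4-128", "AES-128", "AES-256-R5", "AES-256")


def encrypt_writer(writer, password, algorithm="AES-256"):
    """Encrypt a PdfWriter-like object after validating the inputs (hypothetical helper)."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported algorithm: {algorithm!r}")
    if not password:
        raise ValueError("The password must not be empty")
    writer.encrypt(password, algorithm=algorithm)


def unlock_reader(reader, password):
    """Decrypt a PdfReader-like object; raise if the password is rejected (hypothetical helper)."""
    if reader.is_encrypted and not reader.decrypt(password):
        # Assumption: decrypt() returns a falsy value when the password does not match
        raise ValueError("Wrong password")
    return reader
```

Failing early on a rejected password avoids a less obvious error later, when the pages of a still-encrypted reader are accessed.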

##### Encrypting PDF

In [3]:
pdf_reader = PdfReader("docs/p17.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)

# Encrypts the writer object
pdf_writer.encrypt("password", algorithm="AES-256")

# Save the encrypted PDF to disk
with open("Manipulated PDFs/p17_encrypted_password.pdf", "wb") as f:
    pdf_writer.write(f)


##### Decrypting PDF

In [4]:
pdf_reader = PdfReader("Manipulated PDFs/p17_encrypted_password.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Decrypts the reader object using a password
if pdf_reader.is_encrypted:
    pdf_reader.decrypt("password")

# Adds all decrypted reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)

# Creates a temp decrypted PDF on disk
with open("Manipulated PDFs/~Temp-p17_decrypted_password.pdf", "wb") as f:
    pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/~Temp-p17_decrypted_password.pdf"], shell=True)

# Keeps the temp decrypted PDF until the user closes it, then deletes it
# (relies on the viewer locking the open file, as is typical on Windows)
import time
time.sleep(1)  # Allows time for the PDF editor to open the PDF file
while True:  # Waiting loop
    try:
        # If the file is open in another process, opening it in writing mode raises PermissionError
        temp_file = open("Manipulated PDFs/~Temp-p17_decrypted_password.pdf", "w")
        # No exception: the file is not open in another process
        temp_file.close()
        os.remove("Manipulated PDFs/~Temp-p17_decrypted_password.pdf")
        break  # Exits the waiting loop
    except PermissionError:
        # The file is still open in the viewer: wait for the user to close it, then retry
        time.sleep(0.5)
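
The waiting loop above could also be written as a reusable function with a timeout, so that it cannot spin forever. This is a minimal sketch: `remove_when_released` is a hypothetical helper, and it assumes that deleting a file locked by another process raises `PermissionError`, which is the usual behavior on Windows.

```python
import os
import time


def remove_when_released(path, poll_seconds=0.5, timeout_seconds=300.0):
    """Delete `path` once no other process keeps it locked.

    Returns True if the file was removed, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            os.remove(path)
            return True
        except PermissionError:
            # Still locked (assumed: open in the PDF viewer); give up after the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_seconds)
```

On systems that do not lock open files, the file is removed on the first attempt, just as in the original loop.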


### Filling In Fields
[Field](https://pypdf.readthedocs.io/en/stable/modules/Field.html)  
Fields can be filled in with the method ```pdf_writer.update_page_form_field_values(page, field_dictionary)```
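
Because the method above works one page at a time, a form that spans several pages needs one call per page. A minimal sketch, assuming a writer that already holds the cloned form; `fill_all_pages` is a hypothetical helper, not a pypdf function.

```python
def fill_all_pages(writer, values):
    """Apply the same field dictionary to every page of a PdfWriter-like object.

    Hypothetical helper: returns the number of pages visited.
    """
    visited = 0
    for page in writer.pages:
        # Field names in `values` that are absent from a page are simply not matched
        writer.update_page_form_field_values(page, values)
        visited += 1
    return visited
```

A call such as `fill_all_pages(pdf_writer, {"020": "Alex Ricciardi"})` would try the field on every page. Pages without form fields may only produce a warning from pypdf.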

In [5]:
pdf_reader = PdfReader("docs/APPLICATION FOR TAX CARD.pdf", strict=False)
# Create a writer object
pdf_writer = PdfWriter()

# Gets the fields
# get_fields() returns a dictionary
fields = pdf_reader.get_fields()

# Prints each field's type, name, and value
for field in fields:
    field_type = fields[field].field_type
    name = fields[field].name
    value = fields[field].value
    
    print(field_type, name,  value)

# The fields to modify are the writer's, not the reader's
# clone_reader_document_root() clones the reader's document root, including the form and all the pages
pdf_writer.clone_reader_document_root(pdf_reader)
# Fills in the field named "020"
pdf_writer.update_page_form_field_values(pdf_writer.pages[0], {"020":"Alex Ricciardi"})

# Save the new PDF to disk
with open("Manipulated PDFs/Field_in_fields_APPLICATION FOR TAX CARD.pdf", "wb") as f:
    pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Field_in_fields_APPLICATION FOR TAX CARD.pdf"], shell=True)

/Tx 020 None
/Tx 477 None
/Btn 476 None
/Btn 479 None
/Btn 478 None
/Tx 488 None
/Tx s488 None
/Tx 489 None
/Tx s489 None
/Tx 562 None
/Tx s562 None
/Tx 563 None
/Tx s563 None
/Tx 564 None
/Tx s564 None
/Tx 565 None
/Tx s565 None
/Tx 566 None
/Tx s566 None
/Tx 567 None
/Tx s567 None
/Tx 568 None
/Tx s568 None
/Tx 569 None
/Tx s569 None
/Btn 481 None
/Btn 572 None
/Btn 573 None
/Tx 482 None
/Tx s482 None
/Tx 483 None
/Tx s483 None
/Tx 484 None
/Tx s484 None
/Tx 574 None
/Tx 575 None
/Tx s575 None
/Tx 576 None
/Tx s576 None
/Tx s574 None
/Tx 010 None
/Tx 053 None
/Tx 580;1 None
/Tx 581;1 None
/Tx 582;1 None
/Tx 583;1 None
/Tx 584;1 None
/Tx s584;1 None
/Tx 585;1 None
/Tx s585;1 None
/Tx 586;1 None
/Tx s586;1 None
/Tx 580;2 None
/Tx 581;2 None
/Tx 582;2 None
/Tx 583;2 None
/Tx 584;2 None
/Tx s584;2 None
/Tx 585;2 None
/Tx s585;2 None
/Tx 586;2 None
/Tx s586;2 None
/Tx 587;1 None
/Tx 588;1 None
/Tx 589;1 None
/Tx 590;1 None
/Tx 591;1 None
/Tx s591;1 None
/Tx 592;1 None
/Tx s592;1 None
/Tx 

<Popen: returncode: None args: ['Manipulated PDFs/Field_in_fields_APPLICATIO...>
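
A long field listing like the one above may be easier to read when grouped by field type. This is a minimal sketch: `group_field_names_by_type` is a hypothetical helper that relies only on the `field_type` attribute used in the cell above, and it treats a missing form (`get_fields()` returning `None`) as empty.

```python
from collections import defaultdict


def group_field_names_by_type(fields):
    """Group field names by their type code, e.g. "/Tx" (text) or "/Btn" (button)."""
    grouped = defaultdict(list)
    # `fields or {}` covers documents where get_fields() returns None
    for name, field in (fields or {}).items():
        grouped[field.field_type].append(name)
    return dict(grouped)
```

With the reader above, `group_field_names_by_type(pdf_reader.get_fields())` would return a dictionary whose keys are the type codes seen in the output.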

### Adding Bookmarks

```pdf_writer.add_outline_item(title="Go to page 1", page_number=0, color=(0.1,0.1,0.5)) # Index 0 is page 1```

Parameters
- **title:** Title to use for this outline item.
- **page_number:** Page number this outline item will point to.
- **parent:** A reference to a parent outline item to create nested outline items.
- **color:** Color of the outline item’s font as a red, green, blue tuple from 0.0 to 1.0
- **bold:** Outline item font is bold
- **italic:** Outline item font is italic
- **fit:** The fit of the destination page.
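
The `parent` parameter described above can be used to nest one bookmark per page under a single root item. This is a minimal sketch for a `PdfWriter`-like object; `add_page_bookmarks` and the default `root_title` are hypothetical names chosen for illustration.

```python
def add_page_bookmarks(writer, page_count, root_title="Pages", color=(0.1, 0.1, 0.5)):
    """Add one outline item per page, nested under a single root item.

    Hypothetical helper for a PdfWriter-like object; returns the root item.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # The root item points to the first page (index 0 is page 1)
    root = writer.add_outline_item(title=root_title, page_number=0, color=color, bold=True)
    for index in range(page_count):
        writer.add_outline_item(
            title=f"Go to page {index + 1}",
            page_number=index,
            parent=root,
            color=color,
        )
    return root
```

It would be called after the pages have been added to the writer, for example `add_page_bookmarks(pdf_writer, len(pdf_writer.pages))`.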

In [6]:
pdf_reader = PdfReader("docs/CRISPR–Cas9.pdf", strict=False)
# Create a writer object
pdf_writer = PdfWriter()

print(f"Number of pages: {len(pdf_reader.pages)}")

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)
# Alternative: clone_reader_document_root() clones the reader's document root, including the form and all the pages
# pdf_writer.clone_reader_document_root(pdf_reader)

for i in range(len(pdf_reader.pages)):
    if i != len(pdf_reader.pages) - 1:
        pdf_writer.add_outline_item(title=f"Go to page {i+1}", page_number=i, color=(0.1,0.1,0.5), bold=True)
        # or
        #pdf_writer.add_outline_item(title=f"Go to page {i+1}", page_number=i, color=(0,0,1), italic=True)
    else:
        # Keeps a reference to the last page's outline item to use it as a parent
        parent = pdf_writer.add_outline_item(title=f"Go to page {i+1}", page_number=i, color=(0.1,0.1,0.5), bold=True)

# Nests a child outline item under the last page's outline item
pdf_writer.add_outline_item(title="Go to page 1 again", parent=parent, page_number=0, color=(0.1,0.1,0.5), bold=True)

# Save the new PDF to disk
with open("Manipulated PDFs/CRISPR–Cas9_Bookmarks.pdf", "wb") as f:
    pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/CRISPR–Cas9_Bookmarks.pdf"], shell=True)

Number of pages: 12


<Popen: returncode: None args: ['Manipulated PDFs/CRISPR–Cas9_Bookmarks.pdf']>