# PyPDF PDF Manipulation

Alejandro Ricciardi (Omegapy)  
created date: 01/10/2024   
[GitHub](https://github.com/Omegapy)  

Credit: 
[Control PDF with Python & PyPDF2](https://www.udemy.com/course/control-pdf-with-python-pypdf2) Udemy - Conny Soderholm
The original code was substantially modified: it was ported from PyPDF2 to pypdf v3.17.4, adapted to my requirements, and extended with additional functionality.

Project Description:  
Using pypdf to manipulate PDF files.
- How to work with pages
- How to scale, rotate, crop, clip, and watermark pages
- How to split and join pages
- How to read a PDF from memory instead of writing it to disk first

The [PageObject class](https://pypdf.readthedocs.io/en/stable/modules/PageObject.html?highlight=add_transformation#the-pageobject-class) represents a single page within a PDF file. 

A page is typically obtained from ```pdf_reader_object.pages``` or ```pdf_writer_object.pages```, each of which is a list-like sequence of all the pages in the document. Every item in that sequence is a ```PageObject``` that belongs to the reader or writer it came from. It is also possible to create an empty page with ```create_blank_page()```.


Project map:
- Transformation Matrix 
    - Shear Transformation x Axis To The Right -```page.add_transformation( (1,0,.5,1,0,0) )```-
    - Shear Transformation y Axis Up -```page.add_transformation( (1,0.5,0,1,0,0) )```-
    - Scaling Pages -```page.add_transformation( (scale,0,0,scale,0,0) )```-
    - More Transformation Using the Transformation Matrix Method
- Rotated Page -```page.rotate(90)```-
- Creating Blank Pages
    - Stand alone blank page -```blank_page = pages[0].create_blank_page(pdf=pdf_reader, width=None, height=None)```-
    - Add a blank page at a particular index -```pdf_writer.insert_blank_page(None, None, 2)```-
    - Add a blank page at the end of a PDF -```pdf_writer.add_blank_page()```-
- Splitting PDFs
    - Splitting Documents in Half Or Thirds -```pdf_writer_0_to_5.append( pdf_reader, pages=(0, 5) )```-
    - Splitting Documents by Individual Pages -```pdf_writer_even.append( pdf_reader, pages=[0,2,4,6,8,10] )```-
- Merging PDFs -```for pdf in PDFList : pdf_writer_merged.append(pdf)```-
- PDF Boxes
    - Display Boxes Boundaries Functions
    - Format Page From A4 to A5 
- Clipping - Merge Pages -```page.merge_page(overlay_page)```-
- Watermarking - Merge Pages -```page.merge_page(page_watermark, True, False)```-
- Read PDF From Memory
- Decreasing PDF File Size - PDF Compression -```page.compress_content_streams()```-

In [1]:
from pypdf import PdfReader, PdfWriter
import subprocess

### Transformation Matrix

[The Transformation Class](https://pypdf.readthedocs.io/en/stable/modules/Transformation.html)

Represents a 2D transformation.

The transformation between two coordinate systems is represented by a 3-by-3 transformation matrix of the following form:

$$
\begin{bmatrix} a & b & \mathbf{0} \\ c & d & \mathbf{0} \\ e & f & \mathbf{1} \end{bmatrix}
$$

Because only six elements of the transformation matrix can change,  
it is usually specified in PDF as the six-element array ```[ a b c d e f ]```:  
```page.add_transformation( (a,b,c,d,e,f) )```

Coordinate transformations are expressed as matrix multiplications:

                           
$$
\begin{bmatrix} x' & y' & 1 \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \times \begin{bmatrix} a & b & \mathbf{0} \\ c & d & \mathbf{0} \\ e & f & \mathbf{1} \end{bmatrix}
$$


<p></p>
<img src="pics/2D Matrix Transformation.png" alt="Transformation Matrix"  width="863" height="483.2"/>
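
As a minimal sketch of the arithmetic above (plain Python, no pypdf needed), the helper below maps a point through a six-element matrix given in the order `(a, b, c, d, e, f)`. The name `apply_matrix` and the sample points are assumptions made for illustration; they are not part of pypdf.

```python
def apply_matrix(matrix, x, y):
    """Map the point (x, y) through a six-element PDF matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = matrix
    # [x' y' 1] = [x y 1] x [[a b 0], [c d 0], [e f 1]]
    return (a * x + c * y + e, b * x + d * y + f)


# Shear with c = 0.5: x grows with y, so points higher on the page move right.
shear_x = (1, 0, 0.5, 1, 0, 0)
print(apply_matrix(shear_x, 0, 100))   # (50.0, 100)

# Uniform scaling by 0.5 halves both coordinates.
half = (0.5, 0, 0, 0.5, 0, 0)
print(apply_matrix(half, 200, 100))    # (100.0, 50.0)
```

Working through a few points this way makes it easier to predict what a matrix will do before passing it to ```page.add_transformation()```.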


##### Shear Transformation x Axis To The Right

In [2]:
pdf_reader = PdfReader("docs/camera.pdf")

# Create a writer object
pdf_writer_x = PdfWriter()

# Get all the pages
pages = pdf_reader.pages

for page in pages:
    page.add_transformation( (1,0,.5,1,0,0) ) # Apply a transformation matrix to the page.
    pdf_writer_x.add_page(page)

# Save the new PDF to disk
with open("Manipulated PDFs/camera_sheer_x_right.pdf", "wb") as f:
        pdf_writer_x.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/camera_sheer_x_right.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/camera_sheer_x_right.pdf']>

##### Shear Transformation y Axis Up

In [3]:
pdf_reader = PdfReader("docs/camera.pdf")

# Create a writer object
pdf_writer_y = PdfWriter()

# Get all the pages
pages = pdf_reader.pages

# The entire document is transformed
for page in pages:
    page.add_transformation( (1,0.5,0,1,0,0) ) # Apply a transformation matrix to the page.
    pdf_writer_y.add_page(page)

# Save the new PDF to disk
with open("Manipulated PDFs/camera_sheer_y_up.pdf", "wb") as f:
        pdf_writer_y.write(f)

# Opens PDF
subprocess.Popen(["Manipulated PDFs/camera_sheer_y_up.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/camera_sheer_y_up.pdf']>

##### Scale Transformation Using the Transformation Matrix Method

In [4]:
pdf_reader = PdfReader("docs/camera.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Get all the pages
pages = pdf_reader.pages

# scale factor
scale = 0.5

for page in pages:
    page.add_transformation( (scale,0,0,scale,0,0) )# Apply a transformation matrix to the page.
    pdf_writer.add_page(page)

# Save the new PDF to disk
with open("Manipulated PDFs/camera_T_scaledby_05.pdf", "wb") as f:
        pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/camera_T_scaledby_05.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/camera_T_scaledby_05.pdf']>

##### More Transformation Using the Transformation Matrix Method

In [5]:
# `page` here is the page object left over from the previous cell's loop
# Default units are points (pt)
# 1 pt = 1/72 inch
# 1 pt = 0.0352778 cm


# move 100 points from the left side to the right side
page.add_transformation( (1,0,0,1,100,0) )

# move 3 (216/72) inches from the top to the bottom
page.add_transformation( (1,0,0,1,0,-216) )

# scale the document to half its size
page.add_transformation( (0.5,0,0,0.5,0,0) )

# scale the document 2 times larger
page.add_transformation( (2,0,0,2,0,0) )

# rotate the document 30 degrees counterclockwise (cos θ sin θ −sin θ cos θ)
page.add_transformation( (0.87, 0.5, -0.5, 0.87, 0, 0) )

# rotate the document 60 degrees counterclockwise (cos θ sin θ −sin θ cos θ)
page.add_transformation( (0.5, 0.8660, -0.8660, 0.5, 0, 0) )

# Skew the x axis by 30 degrees (1 tan α tan β 1 0 0), with α = 30° and β = 0
page.add_transformation( (1, 0.5773, 0, 1, 0, 0) )

# Skew the y axis by 30 degrees (1 tan α tan β 1 0 0), with α = 0 and β = 30°
page.add_transformation( (1, 0, 0.5773, 1, 0, 0) )
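
The rotation and skew values above are rounded sines, cosines, and tangents. As a rough sketch, they can be computed instead of typed by hand. The helper names `rotation_matrix` and `skew_matrix` are hypothetical, and the tuples they return follow the same `(a, b, c, d, e, f)` order used in this notebook.

```python
from math import cos, sin, tan, radians


def rotation_matrix(degrees):
    """Six-element matrix for a counterclockwise rotation about the origin."""
    t = radians(degrees)
    return (cos(t), sin(t), -sin(t), cos(t), 0, 0)


def skew_matrix(alpha_degrees=0.0, beta_degrees=0.0):
    """Six-element matrix skewing the x axis by alpha and the y axis by beta."""
    return (1, tan(radians(alpha_degrees)), tan(radians(beta_degrees)), 1, 0, 0)


print([round(v, 4) for v in rotation_matrix(30)])
# [0.866, 0.5, -0.5, 0.866, 0, 0]
print([round(v, 4) for v in skew_matrix(alpha_degrees=30)])
# [1, 0.5774, 0.0, 1, 0, 0]
```

Either tuple could then be passed to ```page.add_transformation()``` in place of the hard-coded values.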
    

### Rotated Page
```page.rotate()``` rotates a page clockwise in increments of 90 degrees.

In [6]:
pdf_reader = PdfReader("docs/camera.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Get all the pages
pages = pdf_reader.pages

for page in pages:
    page.rotate(90)
    pdf_writer.add_page(page)

# Save the new PDF to disk
with open("Manipulated PDFs/camera_rotated_clockwise_90.pdf", "wb") as f:
        pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/camera_rotated_clockwise_90.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/camera_rotated_clockwise_9...>

### Creating and Adding Blank Pages

Default units are points (pt):
- 1 pt = 1/72 inch
- 1 pt = 0.0352778 cm = 0.352778 mm
- 200 pt = 70.5556 mm
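
A small sketch of these conversions, assuming nothing beyond the definition 1 pt = 1/72 inch and 1 inch = 25.4 mm. The function names `pt_to_mm` and `mm_to_pt` are made up for this example.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72


def pt_to_mm(points):
    """Convert PDF points to millimeters."""
    return points * MM_PER_INCH / PT_PER_INCH


def mm_to_pt(mm):
    """Convert millimeters to PDF points."""
    return mm * PT_PER_INCH / MM_PER_INCH


print(round(pt_to_mm(200), 4))   # 70.5556
print(round(mm_to_pt(210), 2))   # 595.28 (the width of an A4 page in points)
```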


##### Stand alone blank page

In [7]:
pdf_reader = PdfReader("docs/Pages.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Gets all the pages from the reader
pages = pdf_reader.pages

# Creates a blank page; with width and height set to None, the size is taken from the last page of pdf_reader
blank_page = pages[0].create_blank_page(pdf=pdf_reader, width=None, height=None)
# Adds blank page to writer
pdf_writer.add_page(blank_page)

# Save the new PDF to disk
with open("Manipulated PDFs/blank_page.pdf", "wb") as f:
        pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/blank_page.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/blank_page.pdf']>

##### Add a blank page at a particular index

Inserts a blank page at the given index. If no page size is specified, the size of the last page is used.


In [8]:
pdf_reader = PdfReader("docs/Pages.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)
# Inserts a blank page at index 2 (page 3)
pdf_writer.insert_blank_page(None, None, 2)
        
# Save the new PDF to disk
with open("Manipulated PDFs/Pages_add_blankpage_atindex.pdf", "wb") as f:
        pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_add_blankpage_atindex.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/Pages_add_blankpage_atinde...>

##### Add a blank page at the end of a PDF

In [9]:
pdf_reader = PdfReader("docs/Pages.pdf")
# Create a writer object
pdf_writer = PdfWriter()

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)
# Adds a blank page at the end of the PDF
pdf_writer.add_blank_page()

# Save the new PDF to disk
with open("Manipulated PDFs/Pages_add_blankpage_atenddoc.pdf", "wb") as f:
    pdf_writer.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_add_blankpage_atenddoc.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/Pages_add_blankpage_atendd...>

### Splitting PDFs

##### Splitting Documents in Half Or Thirds

In [10]:
pdf_reader = PdfReader("docs/Pages.pdf")
# Create the writer objects
pdf_writer_0_to_5 = PdfWriter()
pdf_writer_6_to_10 = PdfWriter()

# Pages 1 to 5: indices 0 to 4 (the stop index 5 is excluded)
pdf_writer_0_to_5.append( pdf_reader, pages=(0, 5) )
# Pages 6 to the last page: indices 5 to len(pdf_reader.pages) - 1
pdf_writer_6_to_10.append( pdf_reader, pages=(5, len(pdf_reader.pages)) )

# Save the new PDF to disk
with open("Manipulated PDFs/Pages_split_pages-1-5.pdf", "wb") as f:
    pdf_writer_0_to_5.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_split_pages-1-5.pdf"],shell=True)
# Save the new PDF to disk
with open("Manipulated PDFs/Pages_split_pages-6-10.pdf", "wb") as f:
    pdf_writer_6_to_10.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_split_pages-6-10.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/Pages_split_pages-6-10.pdf']>
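
The ```pages=(start, stop)``` tuples used above exclude the stop index, as ```range(start, stop)``` does. As a sketch for splitting into halves, thirds, or any number of parts, the hypothetical helper `split_ranges` below computes those tuples for a given page count. It is plain Python and does not call pypdf.

```python
def split_ranges(n_pages, parts):
    """Return (start, stop) index pairs that divide n_pages into `parts` chunks.

    When the pages do not divide evenly, the first chunks get one extra page.
    """
    base, extra = divmod(n_pages, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(split_ranges(10, 2))  # [(0, 5), (5, 10)]
print(split_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each tuple could then be passed as the ```pages``` argument of a separate writer's ```append()``` call, in the same way as ```pages=(0, 5)``` above.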

##### Splitting Documents by Extracting individual pages

In [11]:
pdf_reader = PdfReader("docs/Pages.pdf")
# Create the writer objects
pdf_writer_even = PdfWriter()
pdf_writer_odd = PdfWriter()

# Even page indices 0, 2, 4, ... (the 1st, 3rd, 5th, ... pages)
# pdf_writer_even.append( pdf_reader, pages=[0,2,4,6,8,10] )
pdf_writer_even.append( pdf_reader, pages=list(range(0, len(pdf_reader.pages), 2)) )
# Odd page indices 1, 3, 5, ... (the 2nd, 4th, 6th, ... pages)
pdf_writer_odd.append( pdf_reader, pages=list(range(1, len(pdf_reader.pages), 2)) )

# Save the new PDF to disk
with open("Manipulated PDFs/Pages_split_even_pages.pdf", "wb") as f:
    pdf_writer_even.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_split_even_pages.pdf"],shell=True)
# Save the new PDF to disk
with open("Manipulated PDFs/Pages_split_odd_pages.pdf", "wb") as f:
    pdf_writer_odd.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/Pages_split_odd_pages.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/Pages_split_odd_pages.pdf']>

### Merging PDFs

In [12]:
pdf_reader_camera = PdfReader("docs/camera.pdf")
pdf_reader_workstation = PdfReader("docs/WorkStation.pdf")
pdf_reader_pages = PdfReader("docs/Pages.pdf")

# Writer object
pdf_writer_merged = PdfWriter()

# Merge PDFs
for pdf in [pdf_reader_camera , pdf_reader_workstation, pdf_reader_pages]:
    pdf_writer_merged.append(pdf)
# Save the new PDF to disk
with open("Manipulated PDFs/merged_pdf.pdf", "wb") as f:
    pdf_writer_merged.write(f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/merged_pdf.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/merged_pdf.pdf']>

### PDF Boxes

See [The PDF page boxes](https://www.prepressure.com/pdf/basics/page-boxes)

- **The MediaBox:** specifies the width and height of the page, that is, the actual page size.
- **The BleedBox:** determines the region to which the page contents need to be clipped when output in a production environment. Usually the BleedBox is 3 to 5 millimeters larger than the TrimBox.
- **The TrimBox:** defines the intended dimensions of the finished page. Unlike the CropBox, the TrimBox matters for print production because it defines the actual page size that gets printed. 
- **The CropBox:** defines the region that the PDF viewer application is expected to display or print (the visible area of the page when you open the PDF).
- **The ArtBox:** is a bit of a special case. It was originally added to indicate the area covered by the artwork of the page.

**Note:** Only the MediaBox is required; all the other boxes are optional.

<p></p>
<img src="pics/PDF boxes.png" alt="PDF boxes"  width="400" height="500"/>
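
To make the "3 to 5 millimeters larger" relationship concrete, here is a rough sketch that grows a trim box by a bleed margin on every side. The helper `expand_box`, the 3 mm margin, and the A4 trim box values are assumptions for illustration. In a real file the MediaBox would have to be at least as large as the resulting bleed box.

```python
MM_TO_PT = 72 / 25.4  # points per millimeter


def expand_box(box, margin_mm):
    """Grow a (left, bottom, right, top) box in points by margin_mm on every side."""
    x0, y0, x1, y1 = box
    m = margin_mm * MM_TO_PT
    return (x0 - m, y0 - m, x1 + m, y1 + m)


trim_box = (0.0, 0.0, 595.28, 841.89)   # assumed A4 trim box, in points
bleed_box = expand_box(trim_box, 3)     # 3 mm bleed on each side
print([round(v, 2) for v in bleed_box])
```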

##### Display Boxes Boundaries Functions

In [13]:
def print_page_media_box_boundaries(page):
    
    print("Media box:")
    print("upper_right XY", page.mediabox.right, page.mediabox.top)
    print("upper_left XY", page.mediabox.left, page.mediabox.top)
    print("LowerRight XY", page.mediabox.right, page.mediabox.bottom)
    print("LowerLeft XY", page.mediabox.left, page.mediabox.bottom)
    print()  
    
def print_page_crop_box_boundaries(page):
    try:
        print("Crop box:")
        print("upper_right XY", page.cropbox.right, page.cropbox.top)
        print("upper_left XY", page.cropbox.left, page.cropbox.top)
        print("LowerRight XY", page.cropbox.right, page.cropbox.bottom)
        print("LowerLeft XY", page.cropbox.left, page.cropbox.bottom)
        print()
    except AttributeError:
        print("No Crop box defined")
        print()
    
def print_page_bleed_box_boundaries(page):
    try:
        print("Bleed box:")
        print("upper_right XY", page.bleedbox.right, page.bleedbox.top)
        print("upper_left XY", page.bleedbox.left, page.bleedbox.top)
        print("LowerRight XY", page.bleedbox.right, page.bleedbox.bottom)
        print("LowerLeft XY", page.bleedbox.left, page.bleedbox.bottom)
        print()
    except AttributeError:
        print("No Bleed box defined")
        print()
    
def print_page_trim_box_boundaries(page):
    try:
        print("Trim box:")
        print("upper_right XY", page.trimbox.right, page.trimbox.top)
        print("upper_left XY", page.trimbox.left, page.trimbox.top)
        print("LowerRight XY", page.trimbox.right, page.trimbox.bottom)
        print("LowerLeft XY", page.trimbox.left, page.trimbox.bottom)
        print()
    except AttributeError:
        print("No Trim box defined")
        print()
    
def print_page_art_box_boundaries(page):
    try:
        print("Art box:")
        print("upper_right XY", page.artbox.right, page.artbox.top)
        print("upper_left XY", page.artbox.left, page.artbox.top)
        print("LowerRight XY", page.artbox.right, page.artbox.bottom)
        print("LowerLeft XY", page.artbox.left, page.artbox.bottom)
        print()
    except AttributeError:
        print("No Art box defined")
        print()

##### Format Page From A4 to A5

In [14]:
# Load the pdf to the PdfReader object with default settings
pdf_reader = PdfReader("docs/camera.pdf")

# Writer object
pdf_writer = PdfWriter()
      
page = pdf_reader.pages[0]
print_page_media_box_boundaries(page)
print_page_crop_box_boundaries(page)
print_page_bleed_box_boundaries(page)
print_page_trim_box_boundaries(page)
print_page_art_box_boundaries(page)

# Record the original media box coordinates
# X, Y
old_left_x = page.mediabox.left
old_right_x = page.mediabox.right
old_lower_y = page.mediabox.bottom
old_upper_y = page.mediabox.top


# The camera PDF is A4 size = 210 x 297 mm
# Default units are points (pt)
# 1 pt = 1/72 inch
# 1 pt = 0.0352778 cm = 0.352778 mm
# A5 size = 148 x 210 mm
# A5 size in points = 148/0.352778 x 210/0.352778
# RectangleObject([x0, y0, x1, y1]) = [left, bottom, right, top]
# e.g. RectangleObject([old_left_x, old_lower_y, old_right_x, old_upper_y])
pt_mm = 0.352778  # millimeters per point

"""
X = 0 from left
Y = 0 from bottom
upper_left                             upper_right
left, top_____________________________ right, top
|                                              |
|                                              |
|                                              |
|                                              |
left, bottom________________________right, bottom
lower_left                             lower_right
"""

# Scale the page content before changing the media box size
from math import sqrt
page.add_transformation( (sqrt(0.5),0,0,sqrt(0.5),0,0) ) # A4 to A5 scale factor: sqrt(0.5), about 71%
# New media box corners for A5, in points
upper_y = round(210.0/pt_mm,2)
right_x = round(148.0/pt_mm,2)
page.mediabox.upper_left = (old_left_x, upper_y)
page.mediabox.upper_right = (right_x, upper_y)
page.mediabox.lower_left = (old_left_x , old_lower_y)
page.mediabox.lower_right = (right_x , old_lower_y)

crop = 50
page.cropbox.upper_left = (old_left_x+crop, upper_y-crop)
page.cropbox.upper_right = (right_x-crop, upper_y-crop)
page.cropbox.lower_left = (old_left_x+crop , old_lower_y+crop)
page.cropbox.lower_right = (right_x-crop , old_lower_y+crop)        

pdf_writer.add_page(page)
        
print("*"*20)
print_page_media_box_boundaries(page)
print_page_crop_box_boundaries(page)
print_page_trim_box_boundaries(page)

with open("Manipulated PDFs/camera_A4_to_A5.pdf", "wb") as out_f:
    pdf_writer.write(out_f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/camera_A4_to_A5.pdf"],shell=True)

Media box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0

Crop box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0

Bleed box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0

Trim box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0

Art box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0

********************
Media box:
upper_right XY 419.53 595.28
upper_left XY 0.0 595.28
LowerRight XY 419.53 0.0
LowerLeft XY 0.0 0.0

Crop box:
upper_right XY 369.53 545.28
upper_left XY 50 545.28
LowerRight XY 369.53 50
LowerLeft XY 50 50

Trim box:
upper_right XY 595.27576 841.89001
upper_left XY 0.0 841.89001
LowerRight XY 595.27576 0.0
LowerLeft XY 0.0 0.0


<Popen: returncode: None args: ['Manipulated PDFs/camera_A4_to_A5.pdf']>
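
The scale factor and the new media box size used in the cell above can be checked with a few lines of arithmetic. This is only a sketch of the numbers. It does not touch a PDF, and the variable names are chosen for this example.

```python
from math import sqrt

A4_MM = (210.0, 297.0)
scale = sqrt(0.5)  # each A-series size has half the area of the previous one

# Scaling both sides of A4 by sqrt(0.5) lands on A5 (148 x 210 mm) to within rounding.
scaled_mm = tuple(round(side * scale, 1) for side in A4_MM)
print(round(scale, 4), scaled_mm)   # 0.7071 (148.5, 210.0)

# The A5 media box in points, using the same conversion constant as the cell above.
pt_mm = 0.352778
print(round(148.0 / pt_mm, 2), round(210.0 / pt_mm, 2))   # 419.53 595.28
```

The two printed point values match the media box reported after the transformation.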

### Cropping Pages

In [15]:
pdf_reader = PdfReader("docs/p17.pdf")
# Writer object
pdf_writer = PdfWriter()

page = pdf_reader.pages[0]
print("Media box")
print("UpperRight XY", page.mediabox.right, page.mediabox.top)
print("UpperLeft XY", page.mediabox.left, page.mediabox.top)
print("LowerRight XY", page.mediabox.right, page.mediabox.bottom)
print("LowerLeft XY", page.mediabox.left, page.mediabox.bottom)
print()

print("Crop box")
print("UpperRight XY", page.cropbox.right, page.cropbox.top)
print("UpperLeft XY", page.cropbox.left, page.cropbox.top)
print("LowerRight XY", page.cropbox.right, page.cropbox.bottom)
print("LowerLeft XY", page.cropbox.left, page.cropbox.bottom)
print()

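# To crop permanently, change the media box instead of the crop box:
#page.mediabox.lower_left = (100, 0)
#page.mediabox.upper_right = (595.276, 741.89)

# Move the crop box's lower-left corner 100 points in from the left and set its
# upper-right corner to (595.276, 741.89). On this 612 x 792 pt page that trims
# about 17 points from the right and about 50 points from the top.
page.cropbox.lower_left = (100, 0)
page.cropbox.upper_right = (595.276, 741.89)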

pdf_writer.add_page(page)

print("Media box after")
print("UpperRight XY", page.mediabox.right, page.mediabox.top)
print("UpperLeft XY", page.mediabox.left, page.mediabox.top)
print("LowerRight XY", page.mediabox.right, page.mediabox.bottom)
print("LowerLeft XY", page.mediabox.left, page.mediabox.bottom)
print()

print("Crop box after")
print("UpperRight XY", page.cropbox.right, page.cropbox.top)
print("UpperLeft XY", page.cropbox.left, page.cropbox.top)
print("LowerRight XY", page.cropbox.right, page.cropbox.bottom)
print("LowerLeft XY", page.cropbox.left, page.cropbox.bottom)
print()

# saves PDF
with open("Manipulated PDFs/p17_cropped.pdf", "wb") as out_f:
    pdf_writer.write(out_f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/p17_cropped.pdf"],shell=True)

Media box
UpperRight XY 612 792
UpperLeft XY 0 792
LowerRight XY 612 0
LowerLeft XY 0 0

Crop box
UpperRight XY 612 792
UpperLeft XY 0 792
LowerRight XY 612 0
LowerLeft XY 0 0

Media box after
UpperRight XY 612 792
UpperLeft XY 0 792
LowerRight XY 612 0
LowerLeft XY 0 0

Crop box after
UpperRight XY 595.276 741.89
UpperLeft XY 100 741.89
LowerRight XY 595.276 0.0
LowerLeft XY 100 0.0


<Popen: returncode: None args: ['Manipulated PDFs/p17_cropped.pdf']>
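
Hard-coded corner values only fit one page size. As a sketch, the crop box corners can instead be derived from the page's own media box and the margins to remove. The helper `crop_rect` is hypothetical, and the 612 x 792 media box is taken from the output above.

```python
def crop_rect(media_box, left=0, bottom=0, right=0, top=0):
    """Return (x0, y0, x1, y1) after trimming the given margins, in points."""
    x0, y0, x1, y1 = media_box
    return (x0 + left, y0 + bottom, x1 - right, y1 - top)


media_box = (0, 0, 612, 792)   # left, bottom, right, top of the page above
x0, y0, x1, y1 = crop_rect(media_box, left=100, top=100)
print((x0, y0), (x1, y1))      # (100, 0) (612, 692)

# With a pypdf page, the two corners would then be assigned as in the cell above:
#   page.cropbox.lower_left = (x0, y0)
#   page.cropbox.upper_right = (x1, y1)
```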

### Clipping  - Merge Pages
Removing part of a page by overlaying a picture over the section to clip, using the merge page method.

In [16]:
# Load the pdf to the PdfReader object with default settings
pdf_reader = PdfReader("docs/camera.pdf")
pdf_overlay = PdfReader("docs/overlay_white.pdf")
# Writer object
pdf_writer = PdfWriter()

page = pdf_reader.pages[0]
overlay_page = pdf_overlay.pages[0]

# Signature: page1.merge_page(page2: PageObject, expand: bool = False, over: bool = True) -> None
# expand=False (default): page1 keeps its size; over=True (default): page2 is drawn on top of page1
page.merge_page(overlay_page)
pdf_writer.add_page(page)

# saves PDF
with open("Manipulated PDFs/clipped_camera.pdf", "wb") as out_f:
    pdf_writer.write(out_f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/clipped_camera.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/clipped_camera.pdf']>

### Watermarking
Similar to clipping, but here the watermark page is merged beneath the page content (```over=False```) using the merge page method.

In [17]:
# Load the pdf to the PdfReader object with default settings
pdf_reader = PdfReader("docs/p17.pdf")
pdf_watermark_reader = PdfReader("docs/watermark_date.pdf")
# Writer object
pdf_writer = PdfWriter()

# Signature: page1.merge_page(page2: PageObject, expand: bool = False, over: bool = True) -> None
# expand=True enlarges the page if the watermark page is bigger; over=False draws the watermark beneath the page content
page_watermark = pdf_watermark_reader.pages[0]
for page in pdf_reader.pages:
    page.merge_page(page_watermark, expand=True, over=False)
    pdf_writer.add_page(page)

# saves PDF
with open("Manipulated PDFs/p17_watermarked.pdf", "wb") as out_f:
    pdf_writer.write(out_f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/p17_watermarked.pdf"],shell=True)

<Popen: returncode: None args: ['Manipulated PDFs/p17_watermarked.pdf']>

### Read PDF From Memory

In [18]:
# https://www.digitalocean.com/community/tutorials/python-io-bytesio-stringio
from io import BytesIO # https://docs.python.org/3/library/io.html

In [19]:
# Load the pdf to the PdfReader object with default settings
pdf_reader = PdfReader("docs/p17.pdf")
# Writer object
pdf_writer = PdfWriter()
pdf_writer_memory = PdfWriter()

pdf_in_memory = BytesIO()

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)

# Write the pdf_writer output into the in-memory buffer
pdf_writer.write(pdf_in_memory)


# Loads the pdf in memory to a reader object
pdf_reader_memory = PdfReader(pdf_in_memory)
# Copy the reader document root to the writer and all sub elements, including pages, threads, outlines.
pdf_writer_memory.clone_reader_document_root(pdf_reader_memory)

# saves PDF
with open("Manipulated PDFs/p17_in_memory.pdf", "wb") as out_f:
    pdf_writer_memory.write(out_f)
# Opens PDF
subprocess.Popen("Manipulated PDFs/p17_in_memory.pdf" ,shell=True)
    

<Popen: returncode: None args: 'Manipulated PDFs/p17_in_memory.pdf'>
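
The cell above relies on ```BytesIO``` behaving like a file that lives in memory. The sketch below shows that behaviour on its own with a few placeholder bytes. It deliberately does not use pypdf, and the bytes are not a valid PDF.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"%PDF-1.7 placeholder bytes, not a real PDF")

# After writing, the position sits at the end of the buffer.
print(buffer.tell() == len(buffer.getvalue()))   # True

# Rewind before reading sequentially, exactly as with a file on disk.
buffer.seek(0)
header = buffer.read(8)
print(header)                                    # b'%PDF-1.7'
```

Any code that accepts a binary file object can be handed such a buffer, which is what allows a PDF to be written and read back without a round trip to disk.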

### Decreasing PDF File Size - PDF Compression

In [20]:
from pathlib import Path

In [21]:
# Load the pdf to the PdfReader object with default settings
pdf_reader = PdfReader("docs/2554ci Operation Guide.pdf")
# Writer object
pdf_writer = PdfWriter()

file_size_before = Path("docs/2554ci Operation Guide.pdf").stat().st_size / 1e+6
print("File size before compression:", round(file_size_before, 2), "MB")

# Adds all reader pages to writer
pdf_writer.append_pages_from_reader(pdf_reader)
# Compress the pages
for page in pdf_writer.pages: # Using writer pages
    page.compress_content_streams()

# saves PDF
with open("Manipulated PDFs/2554ci Operation Guide.pdf", "wb") as out_f:
    pdf_writer.write(out_f)
# Opens PDF
subprocess.Popen(["Manipulated PDFs/2554ci Operation Guide.pdf"],shell=True)

file_size_after = Path("Manipulated PDFs/2554ci Operation Guide.pdf").stat().st_size / 1e+6
print("File size after compression:", round(file_size_after, 2), "MB")

if file_size_after < file_size_before:
    print("File size decreased by", round(file_size_before - file_size_after, 2), "MB")


File size before compression: 33.17 MB
File size after compression: 32.01 MB
File size decreased by 1.16 MB
