# PDFs with Python

Python's standard library has no PDF module, so we need a third-party library to process PDF files. Here we use `PyPDF2`.

In [None]:
# pdf.py

import PyPDF2

with open('dummy.pdf', 'r') as file:
  reader = PyPDF2.PdfFileReader(file)
  print(reader.numPages)

This gives us a warning, followed by an error:

<br />

```bash
PdfReadWarning: PdfFileReader stream/file object is not in binary mode. It may not be read correctly. [pdf.py:1079]
Traceback (most recent call last):
  File "/home/ct/Documents/comp-python-2022-Neagoie/section-17-scripting/pdf.py", line 4, in <module>
    reader = PyPDF2.PdfFileReader(file)
  File "/home/ct/.local/lib/python3.10/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/home/ct/.local/lib/python3.10/site-packages/PyPDF2/pdf.py", line 1689, in read
    stream.seek(-1, 2)
io.UnsupportedOperation: can't do nonzero end-relative seeks
```
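
The last line of that traceback can be reproduced without PyPDF2, which helps show that the problem is the file mode rather than the PDF itself. The following is a minimal sketch: the helper name `can_seek_from_end` and the throwaway file are assumptions made for illustration, and only the standard library is used.

```python
import io
import os
import tempfile


def can_seek_from_end(path, mode):
    """Return True if seek(-1, 2) works on a file opened with `mode`."""
    with open(path, mode) as stream:
        try:
            # Same call as in the traceback: move to one byte before the end.
            stream.seek(-1, 2)
        except io.UnsupportedOperation:
            return False
        return True


# A throwaway file stands in for dummy.pdf; its content does not matter here.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(b"%PDF-1.4\n")
    sample_path = tmp.name

text_ok = can_seek_from_end(sample_path, "r")     # text mode
binary_ok = can_seek_from_end(sample_path, "rb")  # binary mode
os.remove(sample_path)

print(text_ok, binary_ok)  # False True
```

Text-mode streams only allow end-relative seeks with an offset of zero, which is why the reader fails on a file opened with `'r'`.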

<br />

We need to open the file in __binary__ mode:

In [None]:
# pdf.py

import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  print(reader.numPages)

Okay, so using `'rb'` as the second argument allows us to read the binary and output the following:

``` bash
ct@pop-os:~/Documents/comp-python-2022-Neagoie/section-17-scripting$ /bin/python3 /home/ct/Documents/comp-python-2022-Neagoie/section-17-scripting/pdf.py
1
```

Passing `'rb'` opens the file in __binary mode__, so the file object hands raw bytes to the PyPDF2 file reader, which can then seek within the file and parse it.
The same works for `twopage.pdf`. Next, we try to fetch a page by index:

In [3]:
# pdf.py

import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  print(reader.numPages)
  print(reader.getPage(1))

1


IndexError: list index out of range

This makes sense because `dummy.pdf` has only one page: page indices start at 0, so the only valid index is 0. We can also try to rotate the page:

In [4]:
# pdf.py

import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  print(reader.numPages)
  print(reader.rotate(180))

1


AttributeError: 'PdfFileReader' object has no attribute 'rotate'

Of course, PyPDF2 rotates page objects, not the reader: it needs to know __what__ to rotate. We have to get a page from the reader in order to rotate it.

In [5]:
# pdf.py

import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  page = reader.getPage(0)
  print(page.rotate(180)) 

AttributeError: 'PageObject' object has no attribute 'rotate'

`rotate` is not the method name we want in this version of PyPDF2. Its page objects provide `rotateCounterClockwise` (and `rotateClockwise`) instead:

In [6]:
import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  page = reader.getPage(0)
  print(page.rotateCounterClockwise(90)) 


{'/Type': '/Page', '/Parent': IndirectObject(4, 0), '/Resources': IndirectObject(11, 0), '/MediaBox': [0, 0, 595, 842], '/Group': {'/S': '/Transparency', '/CS': '/DeviceRGB', '/I': <PyPDF2.generic.BooleanObject object at 0x7f51c160b9d0>}, '/Contents': IndirectObject(2, 0), '/Rotate': -90}
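
The two `AttributeError`s above suggest that the rotation method name depends on the installed release. As far as I know, later PyPDF2 releases and the successor package `pypdf` expose `page.rotate(angle)` with a clockwise angle, but treat that as an assumption and check the documentation for your version. The sketch below shows one way to cope with both names. The helper `rotate_counter_clockwise` and the two stand-in page classes are hypothetical, so the block runs without any PDF library installed.

```python
def rotate_counter_clockwise(page, degrees=90):
    """Rotate `page` counter-clockwise using whichever method it offers."""
    if hasattr(page, "rotate"):
        # Assumed newer API: rotate() takes a clockwise angle, so negate it.
        return page.rotate(-degrees)
    # Older API, as used in this section.
    return page.rotateCounterClockwise(degrees)


# Stand-in page classes (not real PyPDF2 classes), used only for the demo.
class OldStylePage:
    def __init__(self):
        self.rotation = 0

    def rotateCounterClockwise(self, angle):
        self.rotation -= angle
        return self


class NewStylePage:
    def __init__(self):
        self.rotation = 0

    def rotate(self, angle):
        self.rotation += angle
        return self


old_result = rotate_counter_clockwise(OldStylePage()).rotation
new_result = rotate_counter_clockwise(NewStylePage()).rotation
print(old_result, new_result)  # -90 -90
```

Both stand-ins end up at `-90`, matching the `'/Rotate': -90` entry in the output above.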


`rotateCounterClockwise` returns the page object, but the rotation exists only in memory. To get a rotated PDF on disk, we need to write the result to a new file, `tilt.pdf`.
We'll do this with a `PdfFileWriter` inside a `with` block, opening the output file in binary write mode (`'wb'`):

In [1]:
import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  page = reader.getPage(0)
  print(page.rotateCounterClockwise(90))
  writer = PyPDF2.PdfFileWriter()
  with open('tilt.pdf', 'wb') as new_file:
    writer.write(new_file)


{'/Type': '/Page', '/Parent': IndirectObject(4, 0), '/Resources': IndirectObject(11, 0), '/MediaBox': [0, 0, 595, 842], '/Group': {'/S': '/Transparency', '/CS': '/DeviceRGB', '/I': <PyPDF2.generic.BooleanObject object at 0x7f78582801f0>}, '/Contents': IndirectObject(2, 0), '/Rotate': -90}


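This doesn't do anything for us because we never added `page` to the writer, so `tilt.pdf` contains no pages. In the version below, the rotation call is no longer wrapped in `print`, and `writer.addPage(page)` adds the rotated page to the writer before it is written out. `tilt.pdf` is now a rotated file for real and not just in memory.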

In [2]:
import PyPDF2

with open('dummy.pdf', 'rb') as file:
  reader = PyPDF2.PdfFileReader(file)
  page = reader.getPage(0)
  page.rotateCounterClockwise(90)
  writer = PyPDF2.PdfFileWriter()
  writer.addPage(page)
  with open('tilt.pdf', 'wb') as new_file:
    writer.write(new_file)
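
A quick sanity check on the written file needs no PDF library: PDF files begin with the bytes `%PDF-`. The sketch below reads those bytes in binary mode. The helper name `looks_like_pdf` and the two throwaway files are assumptions for illustration; with the real output you would call `looks_like_pdf('tilt.pdf')`. It checks only the header, not the page count or the rotation.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes."""
    with open(path, "rb") as stream:  # binary mode, as with the reader
        return stream.read(5) == b"%PDF-"


def write_temp_file(data):
    """Write `data` to a throwaway file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Two throwaway files stand in for a real PDF and a non-PDF file.
pdf_path = write_temp_file(b"%PDF-1.4\n% rest of the file omitted\n")
txt_path = write_temp_file(b"just some text\n")

pdf_check = looks_like_pdf(pdf_path)
txt_check = looks_like_pdf(txt_path)

os.remove(pdf_path)
os.remove(txt_path)

print(pdf_check, txt_check)  # True False
```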