# Description
The eleventh practical project in the [Super Data Science](https://www.superdatascience.com) [Python 3 Masterclass](https://www.superdatascience.com/courses/python-3-programming-beginner-to-pro-masterclass) is a series of exercises that manipulate a PDF file with PyPDF2:

1) Extract Information from a PDF

2) Copy a Single Page; Paste It into a New PDF

3) Rotate a PDF; Write to a New PDF

4) Read Multiple Pages from a PDF

5) Add a Watermark to a PDF

In [1]:
from PyPDF2 import PdfFileReader, PdfFileWriter

## Extract Information from a PDF
### Create PDF Reader

In [2]:
pdf_file = open('Harvard_Business_School.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)

### Get Document Information

In [3]:
pdf_reader.getDocumentInfo()

{'/Author': 'Barlow, Andrew Jonathan',
 '/Company': 'Harvard University',
 '/CreationDate': "D:20180817171357-04'00'",
 '/Creator': 'Acrobat PDFMaker 18 for Word',
 '/ModDate': "D:20180817171437-04'00'",
 '/Producer': 'Adobe PDF Library 15.0',
 '/SourceModified': 'D:20180817211351',
 '/Title': 'I'}
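
The `/CreationDate` and `/ModDate` values above use the PDF date format (`D:` followed by a timestamp and an optional UTC offset). As a rough sketch, they could be turned into `datetime` objects with the standard library alone. The helper name `parse_pdf_date` is hypothetical, and it only handles the full 14-digit form seen in this output, not the shorter variants the format also allows.

```python
import re
from datetime import datetime, timedelta, timezone

# Full form only: D:YYYYMMDDHHmmSS plus an optional offset such as -04'00' or Z.
_PDF_DATE = re.compile(r"^D:(\d{14})(?:(Z)|([+-])(\d{2})'?(\d{2})'?)?$")


def parse_pdf_date(value):
    """Convert a PDF date string to a datetime (hypothetical helper)."""
    match = _PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    stamp, zulu, sign, hours, minutes = match.groups()
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    if zulu:
        return parsed.replace(tzinfo=timezone.utc)
    if sign:
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        if sign == "-":
            offset = -offset
        return parsed.replace(tzinfo=timezone(offset))
    # No offset in the string, so the result is a naive datetime.
    return parsed


created = parse_pdf_date("D:20180817171357-04'00'")
print(created.isoformat())  # 2018-08-17T17:13:57-04:00
```

The `/SourceModified` value has no offset, so the same helper would return a naive `datetime` for it.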

### Get Page Count

In [4]:
pdf_reader.numPages

8

### Extract Text from a Page

In [5]:
pdf_reader.getPage(0).extractText()

'Undergraduate Resource Series\nO˜ce of Career Services | 54 Dunster Street     \nHarvard University | Faculty of Arts and Sciences | 617.495.2595\nwww.ocs.fas.harvard.edu\nOCSAPPLYING  TO \nBUSINESS  SCHOOLPhoto: Harvard University News O˜ce\n'
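
The extracted text shows two common side effects of PDF text extraction: `O˜ce` appears where the cover page presumably reads "Office", and there are runs of trailing spaces. A minimal cleanup sketch follows. The mapping of `˜` to `ffi` is an assumption based on this one output, and `clean_extracted_text` is a hypothetical helper, not part of PyPDF2.

```python
import re

# Assumption: in this particular PDF the "ffi" ligature was extracted as "˜".
# Other documents will likely need a different mapping.
LIGATURE_FIXES = {"˜": "ffi"}


def clean_extracted_text(text):
    """Repair known ligature debris and tidy whitespace (hypothetical helper)."""
    for broken, fixed in LIGATURE_FIXES.items():
        text = text.replace(broken, fixed)
    # Collapse runs of spaces and tabs, then trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Drop lines that are empty after trimming.
    return "\n".join(line for line in lines if line)


raw = "O˜ce of Career Services | 54 Dunster Street     \nHarvard University\n"
print(clean_extracted_text(raw))
```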

In [None]:
pdf_file.close()

## Extract a Page from a PDF and Create a New PDF

### Create PDF Reader

In [2]:
pdf_file = open('Harvard_Business_School.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)

### Extract a Page

In [3]:
cover_page = pdf_reader.getPage(0)

### Create a New PDF

In [4]:
pdf_writer = PdfFileWriter()
pdf_writer.addPage(cover_page)

### Write PDF to Disk

In [5]:
new_pdf_file = open('Harvard_Business_School_Cover_Page.pdf', 'wb')
pdf_writer.write(new_pdf_file)

In [6]:
pdf_file.close()
new_pdf_file.close()
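
The output filenames in this notebook all follow the pattern "source name plus a suffix". One way to build them without retyping the base name is sketched below with `pathlib`; `derived_name` is a hypothetical helper introduced only for illustration.

```python
from pathlib import Path


def derived_name(source, suffix):
    """Return the source filename with _suffix added before the extension."""
    path = Path(source)
    return str(path.with_name(f"{path.stem}_{suffix}{path.suffix}"))


print(derived_name("Harvard_Business_School.pdf", "Cover_Page"))
# Harvard_Business_School_Cover_Page.pdf
```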

## Extract a Page, Rotate It and Create a New PDF

### Create PDF Reader

In [7]:
pdf_file = open('Harvard_Business_School.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)

### Extract and Rotate a Page 90 Degrees

In [8]:
cover_page_rotated = pdf_reader.getPage(0).rotateClockwise(90)
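
PDF pages store their rotation in steps of 90 degrees, so an angle such as 45 is not meaningful here. The sketch below shows one way to validate and normalize an angle before passing it to a rotation call. `normalize_rotation` is a hypothetical helper and is not part of PyPDF2.

```python
def normalize_rotation(angle):
    """Return a clockwise angle of 0, 90, 180 or 270 (hypothetical helper)."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {angle}")
    # Fold full turns and counter-clockwise values into the 0-270 range.
    return angle % 360


print(normalize_rotation(90))   # 90
print(normalize_rotation(-90))  # 270
print(normalize_rotation(450))  # 90
```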

### Create a New PDF

In [9]:
pdf_writer = PdfFileWriter()
pdf_writer.addPage(cover_page_rotated)

### Write PDF to Disk

In [10]:
new_pdf_file = open('Harvard_Business_School_Cover_Page_Rotated.pdf', 'wb')
pdf_writer.write(new_pdf_file)

In [11]:
pdf_file.close()
new_pdf_file.close()

## Read Multiple Pages

### Create PDF Reader

In [12]:
pdf_file = open('Harvard_Business_School.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)

### Read Multiple Pages

In [14]:
pdf_text = []

for page in range(pdf_reader.numPages):
    page_text = pdf_reader.getPage(page).extractText()
    pdf_text.append(page_text)

len(pdf_text)

8
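
Once the text of every page is in a list, simple questions such as "which pages mention a given word?" need no further PDF handling. The following is a small sketch using stand-in strings rather than the real document text; `pages_containing` is a hypothetical helper.

```python
def pages_containing(page_texts, term):
    """Return 1-based page numbers whose text contains term, ignoring case."""
    needle = term.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]


# Stand-in data; in the notebook this would be the pdf_text list built above.
sample_pages = [
    "Applying to Business School",
    "Timeline and tests",
    "Business school essays",
]
print(pages_containing(sample_pages, "business"))  # [1, 3]
```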

In [15]:
pdf_file.close()

## Watermark a PDF

### Read the Watermark

In [13]:
watermark_file = open('Watermark.pdf', 'rb')
watermark_reader = PdfFileReader(watermark_file)
watermark = watermark_reader.getPage(0)

### Read Source PDF

In [14]:
source_pdf_file = open('Harvard_Business_School.pdf', 'rb')
source_pdf_reader = PdfFileReader(source_pdf_file)

### Merge PDFs

In [15]:
output_pdf_writer = PdfFileWriter()

for page in range(source_pdf_reader.numPages):
    current_page = source_pdf_reader.getPage(page)
    current_page.mergePage(watermark)
    output_pdf_writer.addPage(current_page)

### Write PDF to Disk

In [16]:
output_pdf_file = open('Harvard_Business_School_Watermarked.pdf', 'wb')
output_pdf_writer.write(output_pdf_file)

In [17]:
watermark_file.close()
source_pdf_file.close()
output_pdf_file.close()
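
Every exercise here opens files and closes them by hand, and the watermark step juggles three at once. If an exception occurs before the `close()` calls, the files stay open. A possible alternative is `contextlib.ExitStack`, sketched below with throwaway files standing in for the PDFs; the `open_all` helper and the file names are assumptions made for the demonstration.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


def open_all(stack, paths, mode):
    """Open every path and register it with the stack for automatic closing."""
    return [stack.enter_context(open(path, mode)) for path in paths]


# Demonstration with placeholder files instead of real PDFs.
with tempfile.TemporaryDirectory() as folder:
    paths = [Path(folder) / name for name in ("a.bin", "b.bin", "c.bin")]
    for path in paths:
        path.write_bytes(b"placeholder")

    with ExitStack() as stack:
        handles = open_all(stack, paths, "rb")
        contents = [handle.read() for handle in handles]

    # Leaving the with block closed every handle, error or not.
    all_closed = all(handle.closed for handle in handles)

print(all_closed)  # True
```

For a single file, a plain `with open(...) as pdf_file:` block gives the same guarantee with less ceremony.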