## PyPDF2 Module 
In this section we will work with the PyPDF2 module. We will learn how to open a PDF document, extract information from it, and split and merge PDF files.

This section will cover:
* Extract document information from a PDF 
* Rotate pages
* Merge PDFs
* Split PDFs
* Add watermarks
* Encrypt a PDF

In [4]:
import PyPDF2 as pdf # to work with PDF files
import os # for the os.path and directory helpers

## 1. Extract the document information from a PDF file
In this section we will extract some information about the document, such as the author, creator, producer, subject, title, and number of pages.

In [17]:
# Here we read a few pieces of document information:
# author, creator, producer, subject and title.

# We use the PdfFileReader class to read the PDF file.

with open('testpdf.pdf', 'rb') as file:
    # Note: the PDF file must be opened in binary mode.
    read = pdf.PdfFileReader(file)

    # Now that we have a reader object, we can ask it for
    # information about the document.

    doc_info = read.getDocumentInfo()
    Pages = read.getNumPages()

    # Now we have the values, so we can format and print them.

    txt = f"""
        Information about testpdf.pdf file: 

        Author: {doc_info.author}
        Creator: {doc_info.creator}
        Producer: {doc_info.producer}
        Subject: {doc_info.subject}
        Title: {doc_info.title}
        Number of pages: {Pages}
        """
    print(txt)


        Information about testpdf.pdf file: 

        Author: None
        Creator: Chromium
        Producer: Skia/PDF m88
        Subject: None
        Title: None
        Number of pages: 5
        


Since this PDF file was downloaded from Wikipedia, it does not contain much metadata.
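Fields that are not set come back as `None`, as the output above shows. The following is a minimal sketch of a formatter that substitutes a placeholder for them. It assumes the metadata object behaves like a dictionary keyed by PDF names such as `/Author`, which is how PyPDF2 1.x exposes it. The names `FIELDS` and `summarize_info` are made up for this illustration.

```python
# Hypothetical helper: format document metadata with a fallback for missing fields.
FIELDS = ("/Author", "/Creator", "/Producer", "/Subject", "/Title")


def summarize_info(doc_info, num_pages, missing="(not set)"):
    """Return a multi-line summary of a dict-like metadata object.

    `doc_info` may be None or empty when the PDF carries no metadata.
    """
    lines = []
    for key in FIELDS:
        value = doc_info.get(key) if doc_info else None
        shown = value if value is not None else missing
        lines.append(f"{key.lstrip('/')}: {shown}")
    lines.append(f"Number of pages: {num_pages}")
    return "\n".join(lines)
```

With the reader from the cell above, this could be called as `print(summarize_info(read.getDocumentInfo(), read.getNumPages()))` while the file is still open.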

## 2. Rotate a pdf page
In this section we will look at rotating PDF pages. There are many situations where a page needs to be rotated, for example when a scanned page is sideways.

In [20]:
# Rotating a PDF page

# A reader object to read the source file.
pdf_reader = pdf.PdfFileReader("testpdf.pdf")

# A writer object to build the output PDF.
pdf_writer = pdf.PdfFileWriter()
# Pages are added to this object and then written to a new file.

# Choose the page to rotate: page 1 (index 0).
# We will rotate it by 90 degrees.

page_1 = pdf_reader.getPage(0)
# pdf_reader.getPage() returns a page object.

rotate_page_1 = page_1.rotateClockwise(90)
# rotateClockwise() rotates the page object and returns it,
# so rotate_page_1 is the rotated page_1.

# Writing the page takes two steps:
# 1. Add the page to the pdf_writer object with pdf_writer.addPage().
# 2. Save the contents of the pdf_writer object to a file.

# 1. Add the page to the pdf_writer object.
pdf_writer.addPage(rotate_page_1)

# 2. Save the file to the local disk with the .write() method
#    of the pdf_writer object.
with open("testedpdf.pdf",'wb') as file:
    pdf_writer.write(file)

# testedpdf.pdf now contains the rotated first page of testpdf.pdf.


For this example, you need PdfFileWriter in addition to PdfFileReader because you will write out a new PDF. The reader object, pdf_reader, opens the PDF that you want to modify, and the writer object, pdf_writer, collects the pages for the output file.

Next, you use .getPage() to get the desired page. Here you grab page zero, which is the first page. Then you call the page object’s .rotateClockwise() method and pass in 90 degrees. To turn a page the other way, you call .rotateCounterClockwise() and pass it 90 degrees as well.

After the call to the rotation method, you call .addPage(). This adds the rotated version of the page to the writer object. A page that you add without calling a rotation method is written out unchanged.

Finally you write out the new PDF using .write(). It takes a file-like object as its parameter. In this example the new PDF contains a single page: the first page of testpdf.pdf, rotated by 90 degrees.
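The steps above can be wrapped in a reusable function that rotates some pages and copies the rest unchanged. This is only a sketch under a few assumptions: it uses the PyPDF2 1.x method names seen in this section, and the `rotations` parameter and the `normalize_rotation` helper are made up for this illustration.

```python
def normalize_rotation(angle):
    """Return `angle` as a clockwise rotation of 0, 90, 180 or 270 degrees."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return angle % 360


def rotate_pages(input_path, output_path, rotations):
    """Copy a PDF, rotating the pages listed in `rotations`.

    `rotations` maps a zero-based page index to a clockwise angle;
    a negative angle turns the page counter-clockwise.
    """
    import PyPDF2 as pdf  # assumes the PyPDF2 1.x API used in this section

    pdf_reader = pdf.PdfFileReader(input_path)
    pdf_writer = pdf.PdfFileWriter()
    for index in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(index)
        angle = normalize_rotation(rotations.get(index, 0))
        if angle:
            # Only pages with a non-zero angle are rotated.
            page.rotateClockwise(angle)
        pdf_writer.addPage(page)
    with open(output_path, "wb") as file:
        pdf_writer.write(file)
```

For example, `rotate_pages("testpdf.pdf", "testedpdf.pdf", {0: 90, 1: -90})` would rotate the first page clockwise, the second counter-clockwise, and leave the remaining pages as they are.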

Now let’s learn how you can merge multiple PDFs into one.

## 3. Merge pdf files
There are many situations where we need to merge several PDFs into one, either in a particular order or as they are. We can do this with the help of the addPage() method.

Note: we have some PDF files in the "./test/pdf" directory, so we will use them for the merge.

We will store the merged file in the "./test/pdf_merged" location.

In [8]:
# First we need the location of every file to merge.
# A simple list comprehension does this; sorted() keeps the merge order predictable.
pdfs = [os.path.join(os.getcwd(), "test", "pdf", i) for i in sorted(os.listdir("./test/pdf"))]

# The line above gives us all the files to be merged.
# Next we need a pdf_writer object that holds the pages in memory;
# when all the work is done we dump these pages into a file.

pdf_writer = pdf.PdfFileWriter()
# We will use this pdf_writer object inside the reading loop,
# where we read each page of each PDF file.

for file in pdfs:
    # Pass the file path to the PdfFileReader class
    # to create a reader object for each file.
    pdf_reader = pdf.PdfFileReader(file)

    # Get the total number of pages in the current PDF file
    # with the getNumPages() method.
    total_pages = pdf_reader.getNumPages()

    # Loop over the page numbers
    # to add each page to the pdf_writer object.
    for page in range(total_pages):

        # Get the page from the reader and add it to the writer.
        pdf_writer.addPage(pdf_reader.getPage(page))

# After adding all the pages to the pdf_writer object,
# we dump them into a PDF file on the local disk.

with open("./test/pdf_merged/merged_file.pdf",'wb') as mergedfile:
    pdf_writer.write(mergedfile)
    

You can use this approach when you have a list of PDFs that you want to merge together. You also need to know where to save the result, so the script needs a list of input paths and an output path.

You loop over the inputs and create a PDF reader object for each of them. Next you iterate over all the pages in the PDF file and use .addPage() to add each of those pages to the writer object.

Once you’re finished iterating over all of the pages of all of the PDFs in your list, you write out the result at the end.

One item I would like to point out is that you could enhance this script a bit by adding in a range of pages to be added if you didn’t want to merge all the pages of each PDF. If you’d like a challenge, you could also create a command line interface for this script using Python’s argparse module.
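One possible shape for both enhancements is sketched below. It is an illustration only: the function names, the `--pages` option and its "2-4" syntax are assumptions, and the merge itself uses the same PyPDF2 1.x calls as the cell above.

```python
import argparse


def parse_page_range(spec, total_pages):
    """Turn a 1-based, inclusive range such as "2-4" into 0-based page indices."""
    first, _, last = spec.partition("-")
    start = int(first)
    stop = int(last) if last else start
    if not 1 <= start <= stop <= total_pages:
        raise ValueError(f"page range {spec!r} is outside 1-{total_pages}")
    return list(range(start - 1, stop))


def merge_pdfs(paths, output, page_range=None):
    """Merge `paths` into `output`, keeping only `page_range` of each input if given."""
    import PyPDF2 as pdf  # assumes the PyPDF2 1.x API used in this section

    pdf_writer = pdf.PdfFileWriter()
    for path in paths:
        pdf_reader = pdf.PdfFileReader(path)
        total_pages = pdf_reader.getNumPages()
        if page_range:
            indices = parse_page_range(page_range, total_pages)
        else:
            indices = range(total_pages)
        for index in indices:
            pdf_writer.addPage(pdf_reader.getPage(index))
    with open(output, "wb") as merged_file:
        pdf_writer.write(merged_file)


def build_parser():
    """Command line interface: input files, an output path, an optional page range."""
    parser = argparse.ArgumentParser(description="Merge PDF files into one.")
    parser.add_argument("inputs", nargs="+", help="PDF files to merge, in order")
    parser.add_argument("-o", "--output", required=True, help="path of the merged PDF")
    parser.add_argument("--pages", help='1-based inclusive range taken from every input, e.g. "2-4"')
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    merge_pdfs(args.inputs, args.output, args.pages)
```

Saved as a script that calls `main()`, this could be run as `python merge.py a.pdf b.pdf -o merged.pdf --pages 1-2`, where `merge.py` is a hypothetical file name.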

Let’s find out how to do the opposite of merging!

---
## 4. Splitting the pdf file.
In this section we will split a PDF file into single-page files, each named with its page number.

In [12]:
# The idea behind splitting a PDF file.
'''
To split a PDF file we write each page to the local disk as its own
file by using a pdf_writer object. To do that we create and use a
pdf_writer object inside a for loop that runs over every page of the PDF.
'''

# First we need a pdf_reader object to read the pages and to get the number of pages in the PDF file.
read_path = './test/pdf/1.pdf'
write_path ='./test/pdf_splited'

pdf_reader = pdf.PdfFileReader(read_path)

# Now we run a for loop over the page numbers of the file.
for page in range(pdf_reader.getNumPages()):
    # Create a separate writer for each page.
    pdf_writer = pdf.PdfFileWriter()
    # Here page is the zero-based index of the current page of the PDF file.
    # After creating the pdf_writer object we add the page and write it out.

    pdf_writer.addPage(pdf_reader.getPage(page))
    # After adding the page we save it as a PDF document named with the page index.
    
    path = os.path.normcase(os.path.join(write_path ,f"Page-{page}.pdf"))
    
    with open(path, 'wb') as write_pdf:
        pdf_writer.write(write_pdf)
    

In this example, you once again create a PDF reader object and loop over its pages. For each page in the PDF, you will create a new PDF writer instance and add a single page to it. Then you will write that page out to a uniquely named file. When the script is finished running, you should have each page of the original PDF split into separate PDFs.
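The loop above names its files from the zero-based index, so the first page becomes Page-0.pdf. If you prefer names that start at 1 and sort correctly in a file browser, the paths can be built up front. This is a small sketch; `split_output_paths` and its `prefix` parameter are made up for this illustration.

```python
import os


def split_output_paths(output_dir, total_pages, prefix="Page"):
    """Return one output path per page, numbered from 1 and zero-padded."""
    width = len(str(total_pages))  # e.g. 2 digits for a 12-page document
    return [
        os.path.join(output_dir, f"{prefix}-{number:0{width}d}.pdf")
        for number in range(1, total_pages + 1)
    ]
```

Inside the split loop, `paths[page]` would then replace the `path` variable, with `paths = split_output_paths(write_path, pdf_reader.getNumPages())` computed before the loop.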

---
## 5. Add Watermarks
Watermarks are identifying images or patterns on printed and digital documents. Some watermarks can only be seen in special lighting conditions. Watermarking is important because it allows you to protect your intellectual property, such as your images or PDFs.

To add a watermark we need a PDF file that contains the watermark. We get the page on which the watermark appears and then merge it with each page of the target document by using the `mergePage()` method.

In [16]:
# First we need two pdf_reader objects:
# one for the watermark file and another for the file on which we want to put the watermark.

wm_file = './test/watermark.pdf'
test_file = 'testpdf.pdf'
out_put = './test/watermarked_testpdf.pdf'

watermark_reader = pdf.PdfFileReader(wm_file)
watermark = watermark_reader.getPage(0)  # the first page holds the watermark

# Now we have our watermark.
# Next we need the file on which we want to put the watermark.
pdf_reader = pdf.PdfFileReader(test_file)

# We also need a pdf_writer object.
pdf_writer = pdf.PdfFileWriter()

for page in range(pdf_reader.getNumPages()):
    
    # Merge every page with the watermark.
    # First we get the page.
    
    real_page = pdf_reader.getPage(page)
    
    real_page.mergePage(watermark)
    # After merging, add the merged page to the pdf_writer object.
    
    pdf_writer.addPage(real_page)
    
    # This way every page is watermarked.
    
# Now every page has been merged with the watermark,
# so we save the contents of the pdf_writer object.

with open(out_put, 'wb') as out:
    pdf_writer.write(out)




Note: we will ignore the warning for now.

In the code, you open up the watermark PDF and grab just the first page from the document, as that is where your watermark should reside. Then you create a PDF reader object using test_file and a generic pdf_writer object for writing out the watermarked PDF.

The next step is to iterate over the pages in test_file. This is where the magic happens. You call .mergePage() on each page and pass it the watermark page. When you do that, it overlays the watermark on top of the current page. Then you add that newly merged page to your pdf_writer object.

---
## 6. Encrypt a pdf file.

There are many situations where we want to protect a PDF document with a password and some encryption. We can do this by using the encrypt() method of the PdfFileWriter object of the PyPDF2 module.

PyPDF2 currently only supports adding a user password and an owner password to a preexisting PDF. In PDF land, an owner password will basically give you administrator privileges over the PDF and allow you to set permissions on the document. On the other hand, the user password just allows you to open the document.

As far as I can tell, PyPDF2 doesn’t actually allow you to set any permissions on the document even though it does allow you to set the owner password.

Regardless, this is how you can add a password, which will also inherently encrypt the PDF:



In [19]:
# To encrypt a PDF file we need two objects:
# a pdf_reader object to read the file to be encrypted, and
# a pdf_writer object to apply the encryption.

test_file = 'testpdf.pdf'
out_put = './test/encrypted_testpdf.pdf'

pdf_reader = pdf.PdfFileReader(test_file)
pdf_writer = pdf.PdfFileWriter()
# Get the total number of pages and iterate through the PDF file, adding each page to the writer.

for page in range(pdf_reader.getNumPages()):
    pdf_writer.addPage(pdf_reader.getPage(page))
    
# After adding all the pages to the pdf_writer object we apply the encryption to it.
pdf_writer.encrypt(user_pwd="manishaim")

# After applying the encryption we write the object to the local disk.
with open(out_put , 'wb') as outfile:
    pdf_writer.write(outfile)

----

## 7. tkPdfviewer