
# **File Handling**

When you’re working with Python, you don’t need to import a library to read and write files: file handling is built into the language.

The first step is to call Python’s built-in open() function, which opens a file and returns a file object.

A file object provides methods and attributes that you can use to get information about the file you opened, and to read from or write to it.

For example, the mode attribute of a file object tells you which mode the file was opened in, and the name attribute tells you the name of the file it refers to.

Keep in mind that a file (the data stored on disk) and a file object (the Python object you use to work with that data) are separate, though related, things.
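Here is a minimal sketch of the attributes described above. It uses a throwaway file named `demo.txt` inside a temporary directory; both are illustrative choices, not part of the original notebook. The sketch records the file object's `name`, `mode` and `closed` attributes, and shows that the file on disk outlives the file object.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    f = open(path, "w")           # open() returns a file object
    name_attr = f.name            # the path that was passed to open()
    mode_attr = f.mode            # the mode string, here 'w'
    closed_before = f.closed      # False while the file is open
    f.write("some text")
    f.close()
    closed_after = f.closed       # True once close() has been called

    # The file on disk still exists after the file object is closed.
    file_still_exists = os.path.exists(path)

print(mode_attr, closed_before, closed_after, file_still_exists)
```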
 
**File Types**

What counts as a file is slightly different in Python.
In Windows, for example, a file can be any item created, edited or manipulated by the user or the OS. Files can be images, text documents, executables, and much more, and most are organized in folders.
In Python, a file is categorized as either text or binary, and the difference between the two file types is important.
Text files are structured as a sequence of lines, where each line is a sequence of human-readable characters. Python source code, plain-text notes and CSV files are all examples of text files.
Each line is terminated with a special character called the EOL (End of Line) character. In Python this is the newline character, written `\n` inside a string. Windows text files end lines with the two-character sequence `\r\n`, which Python converts to `\n` when it reads a file in text mode. The EOL character ends the current line and signals that a new one has begun.
Inside a string literal, the backslash starts an escape sequence: `\n` stands for a single newline character and `\t` for a tab. Separately, a backslash at the very end of a line of code tells the interpreter that the statement continues on the next line. This is useful when you want to break a long line of code without adding a new line to the text itself.
A binary file is any file that is not a text file, such as an image, a PDF or an executable. Binary files can only be interpreted by an application that knows the file's structure. In Python, you open them by adding ‘b’ to the mode (for example ‘rb’), and reading returns raw bytes instead of text.
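The following sketch contrasts text and binary mode on the same file. It assumes a hypothetical file `lines.txt` in a temporary directory. The file is written with `newline="\n"` so that the bytes on disk are identical on every platform; on Windows, the default would translate `\n` to `\r\n`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical file name

    # Text mode: write str objects; "\n" is the newline escape sequence.
    with open(path, "w", newline="\n") as f:
        f.write("first line\nsecond line\n")

    # Reading in text mode returns a str.
    with open(path, "r") as f:
        text = f.read()

    # Reading in binary mode ('b' in the mode) returns raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, repr(text))
print(type(raw).__name__, repr(raw))
```

The escape sequence `"\n"` is a single character (`len("\n") == 1`), even though it is typed as two.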

**open() Function**

To open a file for reading or writing in Python, use the built-in open() function.
open() returns a file object and is most commonly called with two arguments: the name of the file and the mode.
An argument is simply a value you pass to a function when you call it. For instance, in open("Test File.txt", "r"), the file name "Test File.txt" and the mode "r" are both arguments.
The syntax to open a file and get a file object is:

*File_object = open("filename","mode")*

**Mode**

The mode argument is optional: if you omit it, the default value ‘r’ (read mode) is used. The most common modes are:
1. ‘r’ – Read mode, used when the file is only being read. The file must already exist.
2. ‘w’ – Write mode, used to write new information to the file. The file is created if it does not exist; if it does exist, its contents are erased as soon as it is opened.
3. ‘a’ – Append mode, used to add new data to the end of the file. Existing contents are kept, and the file is created if it does not exist.
4. ‘r+’ – Read and write mode, used to do both with the same file object. The file must already exist, and its contents are not erased.
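A small sketch of how the four modes behave on one file. It assumes a hypothetical file `modes.txt` in a temporary directory. Each step reads the file back, so you can see what the previous mode did to its contents.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")  # hypothetical file name

    # 'w' creates the file, or erases it if it already exists.
    with open(path, "w") as f:
        f.write("one\n")
    with open(path, "w") as f:
        f.write("two\n")
    with open(path) as f:          # no mode given, so 'r' is used
        after_w = f.read()         # 'two\n': the first write was erased

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a") as f:
        f.write("three\n")
    with open(path) as f:
        after_a = f.read()         # 'two\nthree\n'

    # 'r+' reads and writes without erasing anything.
    with open(path, "r+") as f:
        first_line = f.readline()  # 'two\n'
        f.seek(0)                  # go back to the start before writing
        f.write("TWO")             # overwrite the first three characters in place
    with open(path) as f:
        after_r_plus = f.read()    # 'TWO\nthree\n'

    # Unlike 'w' and 'a', 'r+' does not create a missing file.
    try:
        open(os.path.join(tmp, "missing.txt"), "r+")
        r_plus_on_missing = "opened"
    except FileNotFoundError:
        r_plus_on_missing = "FileNotFoundError"

print(repr(after_w), repr(after_a), repr(after_r_plus), r_plus_on_missing)
```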



In [None]:
# write to a text file; write() does not add newlines, so these strings run together

f = open("test.txt","w")
f.write("Hello")
f.write("Welcome to Python class")
f.write("Python is fun")
f.write("Python is easy")
f.close()

In [None]:
# end each string with \n so that it is written as a separate line
f = open("test1.txt","w")
f.write("Hello \n")
f.write("Welcome to Python class \n")
f.write("Python is fun \n")
f.write("Python is easy")
f.close()

In [None]:
#read from a text file
f = open("test1.txt","r")
d = f.read()
print(d)
f.close()

Hello 
Welcome to Python class 
Python is fun 
Python is easy


In [None]:
f = open("test1.txt","r")
d = f.read(5)   # read at most 5 characters
print(d)
f.close()

Hello


In [None]:
f = open("test1.txt","r")
d = f.readline()   # read one line; it keeps its trailing newline, so print() shows a blank line after it
print(d)
f.close()

Hello 



In [None]:
f1 = open("test.txt","r")
d1 = f1.readlines()   # test.txt contains no newlines, so the list holds a single line
print(d1)
f1.close()

['HelloWelcome to Python classPython is funPython is easy']


In [None]:
f1 = open("test1.txt","r")
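d1 = f1.readlines(1)   # hint=1: stop as soon as more than 1 character has been read, i.e. after the first line
print(d1)
f1.close()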

['Hello \n']


In [None]:
f = open("test2.txt","a")
f.writelines(["Hello \n", "Welcome to Python class\n","Python is fun\n", "Python is easy\n"])
f.close()

In [None]:
f1 = open("test2.txt","r")
d1 = f1.readlines()
print(d1)
f1.close()

['Hello \n', 'Welcome to Python class\n', 'Python is fun\n', 'Python is easy\n']


In [None]:
f=open("test2.txt","r")
for txt in f:
    print(txt)
f.close()


Hello 

Welcome to Python class

Python is fun

Python is easy
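The blank lines in the output above appear because each line read from the file still ends in `\n`, and `print()` adds its own newline. The sketch below shows one way to avoid this: it strips the trailing newline from each line. To run on its own, it recreates the same four lines in a stand-in `test2.txt` inside a temporary directory. Passing `end=""` to `print()` would be an equally valid choice.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test2.txt")  # stand-in for the notebook's test2.txt
    with open(path, "w") as f:
        f.writelines(["Hello \n", "Welcome to Python class\n",
                      "Python is fun\n", "Python is easy\n"])

    cleaned = []
    with open(path) as f:
        for line in f:                         # iterating yields one line at a time
            cleaned.append(line.rstrip("\n"))  # drop only the trailing newline

for line in cleaned:
    print(line)   # now printed without blank lines in between
```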



In [None]:
f=open("test2.txt","r")
for txt in f:
    wrd = txt.split()   # split the line on whitespace into a list of words
    print(wrd)
f.close()

['Hello']
['Welcome', 'to', 'Python', 'class']
['Python', 'is', 'fun']
['Python', 'is', 'easy']


In [None]:
with open("test3.txt",'w') as f:
  f.write("Welcome to my python online classes !!!")
  f.write("\n institue de informatica !!!")

In [None]:
with open("test3.txt",'r') as f:
  d=f.readlines()
  print(d)
for txt in d:
  wrd = txt.split()
  print(wrd)

['Welcome to my python online classes !!!\n', ' institue de informatica !!!']
['Welcome', 'to', 'my', 'python', 'online', 'classes', '!!!']
['institue', 'de', 'informatica', '!!!']
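The `with` statement used in the two cells above closes the file automatically when the block ends, even if the block exits because of an error. Here is a short sketch that checks the `closed` attribute at each point. It uses a stand-in `test3.txt` in a temporary directory and a deliberately raised `ValueError`, both illustrative assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test3.txt")  # stand-in file name

    with open(path, "w") as f:
        f.write("inside the with block\n")
        closed_inside = f.closed      # False: still open inside the block
    closed_after = f.closed           # True: closed automatically on exit

    # The file is also closed when the block is left because of an exception.
    try:
        with open(path) as g:
            raise ValueError("simulated error")
    except ValueError:
        pass
    closed_after_error = g.closed     # True

print(closed_inside, closed_after, closed_after_error)
```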


# **CSV files**

**Module**

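Python’s standard library includes the csv module for reading and writing CSV (comma-separated values) files.

The csv module documentation lists the following functions, classes and constants:
1. csv.field_size_limit – return (or set) the maximum field size allowed by the parser
2. csv.get_dialect – return the dialect registered under a given name
3. csv.list_dialects – return the names of all registered dialects
4. csv.reader – return a reader object that iterates over the rows of a CSV file
5. csv.register_dialect – associate a dialect with a name
6. csv.writer – return a writer object that writes rows to a CSV file
7. csv.unregister_dialect – delete the dialect associated with a name from the dialect registry
8. csv.DictReader / csv.DictWriter – read and write rows as dictionaries keyed by column name
9. csv.QUOTE_ALL – quote every field, regardless of type
10. csv.QUOTE_MINIMAL – quote only fields that contain special characters such as the delimiter, the quote character or a newline (the default)
11. csv.QUOTE_NONNUMERIC – quote all non-numeric fields
12. csv.QUOTE_NONE – never quote fields in the output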
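A sketch of several of the items above in use. It writes a hypothetical row into an in-memory `io.StringIO` buffer instead of a file, so you can inspect the exact text. It compares three quoting constants, then registers, uses and unregisters a dialect with the made-up name `"pipes"`.

```python
import csv
import io

row = ["Nikhil", "COE, Delhi", 2021, 9.0]   # hypothetical row; one field contains a comma

def render(quoting):
    """Write one row with the given quoting constant and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()

minimal = render(csv.QUOTE_MINIMAL)        # only the field containing ',' is quoted
all_quoted = render(csv.QUOTE_ALL)         # every field is quoted, numbers included
nonnumeric = render(csv.QUOTE_NONNUMERIC)  # strings quoted, numbers left bare

# Register a named dialect, write with it, then remove it again.
csv.register_dialect("pipes", delimiter="|", lineterminator="\n")  # hypothetical name
buf = io.StringIO()
csv.writer(buf, dialect="pipes").writerow(["a", "b", "c"])
piped = buf.getvalue()
was_registered = "pipes" in csv.list_dialects()
csv.unregister_dialect("pipes")

print(minimal, all_quoted, nonnumeric, piped, sep="")
```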




In [12]:
# importing the csv module 
import csv 
  
# field names 
fields = ['Name', 'Branch', 'Year', 'CGPA'] 
  
# data rows of csv file 
rows = [ ['Nikhil', 'COE', '2021', '9.0'], 
         ['Sanchit', 'COE', '2020', '9.1'], 
         ['Aditya', 'IT', '2021', '9.3'], 
         ['Sagar', 'SE', '2021', '9.5'], 
         ['Prateek', 'MCE', '2020', '7.8'], 
         ['Sahil', 'EP', '2019', '9.1']] 
  
# name of csv file 
filename = "university_records.csv"
  
# writing to csv file; newline='' lets the csv writer control line endings
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv writer object 
    csvwriter = csv.writer(csvfile)      
    # writing the headers 
    csvwriter.writerow(fields) 
    # writing the data rows 
    csvwriter.writerows(rows)
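The csv module documentation recommends opening CSV files with `newline=''`. The sketch below writes to an in-memory `io.StringIO` buffer instead of a real file, which is an assumption made for illustration. It shows the reason for the recommendation: `csv.writer` already ends every row with `\r\n` itself. Without `newline=''`, text mode on Windows would translate the `\n` again, which shows up as blank lines between rows.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)                 # default 'excel' dialect
writer.writerow(["Name", "Branch"])
writer.writerows([["Nikhil", "COE"], ["Aditya", "IT"]])
text = buf.getvalue()

# Each row already ends in '\r\n', added by the writer itself.
print(repr(text))
```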


In [3]:
#from a dictionary

import csv

# my data rows as dictionary objects 
mydict =[{'branch': 'COE', 'cgpa': '9.0', 'name': 'Nikhil', 'year': '2020'}, 
         {'branch': 'COE', 'cgpa': '9.1', 'name': 'Sanchit', 'year': '2021'}, 
         {'branch': 'IT', 'cgpa': '9.3', 'name': 'Aditya', 'year': '2020'}, 
         {'branch': 'SE', 'cgpa': '9.5', 'name': 'Sagar', 'year': '2021'}, 
         {'branch': 'MCE', 'cgpa': '7.8', 'name': 'Prateek', 'year': '2021'}, 
         {'branch': 'EP', 'cgpa': '9.1', 'name': 'Sahil', 'year': '2019'}] 
  
# field names 
fields = ['name', 'branch', 'year', 'cgpa'] 
  
# name of csv file 
filename = "university_dict.csv"
# writing to csv file; newline='' lets the csv writer control line endings
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv dict writer object 
    writer = csv.DictWriter(csvfile, fieldnames = fields)  
    # writing headers (field names) 
    writer.writeheader() 
    # writing data rows 
    writer.writerows(mydict) 



In [4]:
#Read CSV file

import csv
with open("university_records.csv", newline='') as f:
    data = csv.reader(f)
    for row in data:
        print(row)



['Name', 'Branch', 'Year', 'CGPA']
['Nikhil', 'COE', '2021', '9.0']
['Sanchit', 'COE', '2020', '9.1']
['Aditya', 'IT', '2021', '9.3']
['Sagar', 'SE', '2021', '9.5']
['Prateek', 'MCE', '2020', '7.8']
['Sahil', 'EP', '2019', '9.1']


In [5]:
import csv

# csv file name
filename = "university_records.csv"

# initializing the headers and rows list
fields = []
rows = []

# reading csv file
with open(filename, 'r', newline='') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)

    # extracting field names from the first row (Python 3 uses next(), not .next())
    fields = next(csvreader)

    # extracting each data row one by one
    for row in csvreader:
        rows.append(row)

    # line_num counts every line read so far, including the header
    print("Total no. of rows: %d" % (csvreader.line_num))

# printing the field names
print('Header names are: ' + ', '.join(fields))

# printing first 5 data rows
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    # parsing each column of a row, right-aligned in 10-character columns
    for col in row:
        print("%10s" % col, end="")
    print()


Total no. of rows: 7
Header names are: Name, Branch, Year, CGPA

First 5 rows are:

    Nikhil       COE      2021       9.0
   Sanchit       COE      2020       9.1
    Aditya        IT      2021       9.3
     Sagar        SE      2021       9.5
   Prateek       MCE      2020       7.8



In [6]:
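import csv
# each row is a mapping from header name to value; Python 3.8+ returns plain dicts,
# while older versions (like the one that produced the output below) return OrderedDict
with open("university_records.csv", newline='') as f:
    reader = csv.DictReader(f)
    for txt in reader:
        print(txt)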


OrderedDict([('Name', 'Nikhil'), ('Branch', 'COE'), ('Year', '2021'), ('CGPA', '9.0')])
OrderedDict([('Name', 'Sanchit'), ('Branch', 'COE'), ('Year', '2020'), ('CGPA', '9.1')])
OrderedDict([('Name', 'Aditya'), ('Branch', 'IT'), ('Year', '2021'), ('CGPA', '9.3')])
OrderedDict([('Name', 'Sagar'), ('Branch', 'SE'), ('Year', '2021'), ('CGPA', '9.5')])
OrderedDict([('Name', 'Prateek'), ('Branch', 'MCE'), ('Year', '2020'), ('CGPA', '7.8')])
OrderedDict([('Name', 'Sahil'), ('Branch', 'EP'), ('Year', '2019'), ('CGPA', '9.1')])
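A sketch of why DictReader is convenient: fields are looked up by column name instead of position. To run without the file, the sketch inlines the first few records of `university_records.csv` into an `io.StringIO` buffer; that is an assumption for illustration. Every value comes back as a string, so numeric columns must be converted before doing arithmetic.

```python
import csv
import io

# First records of university_records.csv, inlined so the sketch runs on its own.
data = io.StringIO(
    "Name,Branch,Year,CGPA\r\n"
    "Nikhil,COE,2021,9.0\r\n"
    "Sanchit,COE,2020,9.1\r\n"
    "Aditya,IT,2021,9.3\r\n"
)

reader = csv.DictReader(data)
records = list(reader)                 # each row is a mapping keyed by the header

# csv returns every value as a string, so convert before doing arithmetic.
cgpas = [float(r["CGPA"]) for r in records]
average = sum(cgpas) / len(cgpas)

# Group student names by branch using the column name, not a position.
by_branch = {}
for r in records:
    by_branch.setdefault(r["Branch"], []).append(r["Name"])

print(reader.fieldnames)
print(round(average, 2))
print(by_branch)
```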


# **Pandas for CSV files**

In [7]:
import pandas as pd
data=pd.read_csv("university_records.csv", header=0)
print(data)
print("--- header=1: the second line of the file is used as the header ---")
data=pd.read_csv("university_records.csv", header=1)
print(data)
# shape of the DataFrame as (rows, columns)
print(data.shape)
print(data.columns)
print(data.dtypes)



      Name Branch  Year  CGPA
0   Nikhil    COE  2021   9.0
1  Sanchit    COE  2020   9.1
2   Aditya     IT  2021   9.3
3    Sagar     SE  2021   9.5
4  Prateek    MCE  2020   7.8
5    Sahil     EP  2019   9.1
--- header=1: the second line of the file is used as the header ---
    Nikhil  COE  2021  9.0
0  Sanchit  COE  2020  9.1
1   Aditya   IT  2021  9.3
2    Sagar   SE  2021  9.5
3  Prateek  MCE  2020  7.8
4    Sahil   EP  2019  9.1
(5, 4)
Index(['Nikhil', 'COE', '2021', '9.0'], dtype='object')
Nikhil     object
COE        object
2021        int64
9.0       float64
dtype: object


# **Add your own headers**

In [8]:
import pandas as pd 
# skiprows=1 drops the file's own header line; names supplies the new column names
data=pd.read_csv("university_records.csv",skiprows=1, names=['SName', 'Course', 'Duration', 'Score']) 
print(data) 

     SName Course  Duration  Score
0   Nikhil    COE      2021    9.0
1  Sanchit    COE      2020    9.1
2   Aditya     IT      2021    9.3
3    Sagar     SE      2021    9.5
4  Prateek    MCE      2020    7.8
5    Sahil     EP      2019    9.1


In [13]:
# without skiprows, the original header line is read as an ordinary data row
data=pd.read_csv("university_records.csv", names=['SName', 'Course', 'Duration', 'Score']) 
print(data) 

     SName  Course Duration Score
0     Name  Branch     Year  CGPA
1   Nikhil     COE     2021   9.0
2  Sanchit     COE     2020   9.1
3   Aditya      IT     2021   9.3
4    Sagar      SE     2021   9.5
5  Prateek     MCE     2020   7.8
6    Sahil      EP     2019   9.1


# **Skip rows and retain headers**

In [14]:
# skiprows=[1, 2] skips lines 1 and 2 (0-based); line 0 is still used as the header
data=pd.read_csv("university_records.csv",skiprows=[1,2]) 
print(data) 

      Name Branch  Year  CGPA
0   Aditya     IT  2021   9.3
1    Sagar     SE  2021   9.5
2  Prateek    MCE  2020   7.8
3    Sahil     EP  2019   9.1


In [15]:
data=pd.read_csv("university_records.csv",header=None) 
print(data) 

         0       1     2     3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021   9.3
4    Sagar      SE  2021   9.5
5  Prateek     MCE  2020   7.8
6    Sahil      EP  2019   9.1


# **Add a prefix to column names**

In [16]:
# the prefix= argument was removed in pandas 2.0; add_prefix() produces the same column names
data=pd.read_csv("university_records.csv",header=None).add_prefix("var") 
print(data) 

      var0    var1  var2  var3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021   9.3
4    Sagar      SE  2021   9.5
5  Prateek     MCE  2020   7.8
6    Sahil      EP  2019   9.1


In [20]:
# importing the csv module 
import csv 
  
# field names 
fields = ['Name', 'Branch', 'Year', 'CGPA'] 
  
# data rows of csv file 
rows = [ ['Nikhil', 'COE', '2021', '9.0'], 
         ['Sanchit', 'COE', '2020', '9.1'], 
         ['Aditya', 'IT', '2021', ' '], 
         ['Sagar', 'SE', '2021', '9.5'], 
         ['Prateek', 'MCE', ' ', '7.8'], 
         ['Sahil', 'EP', '2019', '9.1']] 
  
# name of csv file 
filename = "university_records2.csv"
  
# writing to csv file; newline='' lets the csv writer control line endings
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv writer object 
    csvwriter = csv.writer(csvfile)      
    # writing the headers 
    csvwriter.writerow(fields) 
    # writing the data rows 
    csvwriter.writerows(rows)

In [22]:
# the missing values were written as a single space, so treat " " as NaN
data=pd.read_csv("university_records2.csv",header=None, na_values=" ") 
print(data)

         0       1     2     3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021   NaN
4    Sagar      SE  2021   9.5
5  Prateek     MCE   NaN   7.8
6    Sahil      EP  2019   9.1


# **CSV file from a URL**

In [23]:
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv") 
print(medals) 

      Year      City       Sport  ...            Event Event gender   Medal
0     1924  Chamonix     Skating  ...       individual            M  Silver
1     1924  Chamonix     Skating  ...       individual            W    Gold
2     1924  Chamonix     Skating  ...            pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...         four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...       ice hockey            M    Gold
...    ...       ...         ...  ...              ...          ...     ...
2306  2006     Turin      Skiing  ...        Half-pipe            M  Silver
2307  2006     Turin      Skiing  ...        Half-pipe            W    Gold
2308  2006     Turin      Skiing  ...        Half-pipe            W  Silver
2309  2006     Turin      Skiing  ...  Snowboard Cross            M    Gold
2310  2006     Turin      Skiing  ...  Snowboard Cross            W  Silver

[2311 rows x 8 columns]


# **Skip the last few lines**

In [24]:
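# skipfooter is only supported by the python engine; naming it avoids a ParserWarning
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python") 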
print(medals) 

      Year      City       Sport  ...                  Event Event gender   Medal
0     1924  Chamonix     Skating  ...             individual            M  Silver
1     1924  Chamonix     Skating  ...             individual            W    Gold
2     1924  Chamonix     Skating  ...                  pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...               four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...             ice hockey            M    Gold
...    ...       ...         ...  ...                    ...          ...     ...
2301  2006     Turin      Skiing  ...        Alpine combined            M    Gold
2302  2006     Turin      Skiing  ...           giant slalom            W    Gold
2303  2006     Turin      Skiing  ...                 moguls            M  Bronze
2304  2006     Turin      Skiing  ...  Giant parallel slalom            W  Bronze
2305  2006     Turin      Skiing  ...              Half-pipe            M    Gold

[2306 rows x 8 columns]


# **Read only the first few rows**

In [25]:
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5) 
print(medals) 

   Year      City       Sport  ...       Event Event gender   Medal
0  1924  Chamonix     Skating  ...  individual            M  Silver
1  1924  Chamonix     Skating  ...  individual            W    Gold
2  1924  Chamonix     Skating  ...       pairs            X    Gold
3  1924  Chamonix   Bobsleigh  ...    four-man            M  Bronze
4  1924  Chamonix  Ice Hockey  ...  ice hockey            M    Gold

[5 rows x 8 columns]


# **Interpret "," as the thousands separator**

In [26]:
# only Year is numeric in this file and it has no separators, so the output is unchanged
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5, thousands=",") 
print(medals) 

 

   Year      City       Sport  ...       Event Event gender   Medal
0  1924  Chamonix     Skating  ...  individual            M  Silver
1  1924  Chamonix     Skating  ...  individual            W    Gold
2  1924  Chamonix     Skating  ...       pairs            X    Gold
3  1924  Chamonix   Bobsleigh  ...    four-man            M  Bronze
4  1924  Chamonix  Ice Hockey  ...  ice hockey            M    Gold

[5 rows x 8 columns]
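The medals file has no numbers with separators, so `thousands=","` has no visible effect above. The sketch below uses hypothetical pipe-separated data whose counts do contain commas, to show what the parameter changes. The `|` delimiter is chosen so that the commas are not mistaken for field separators.

```python
import io

import pandas as pd

# Hypothetical data: ',' is used as a thousands separator inside the numbers.
text = "item|count\nA|1,200\nB|15,000\n"

plain = pd.read_csv(io.StringIO(text), sep="|")                   # '1,200' stays a string
parsed = pd.read_csv(io.StringIO(text), sep="|", thousands=",")   # commas removed, ints parsed

print(plain["count"].tolist())
print(parsed["count"].tolist())
print(parsed["count"].sum())
```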


# **Read specific Columns**

In [27]:
# usecols selects columns by 0-based position (column names also work)
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7]) 
print(medals) 

          City            Event   Medal
0     Chamonix       individual  Silver
1     Chamonix       individual    Gold
2     Chamonix            pairs    Gold
3     Chamonix         four-man  Bronze
4     Chamonix       ice hockey    Gold
...        ...              ...     ...
2306     Turin        Half-pipe  Silver
2307     Turin        Half-pipe    Gold
2308     Turin        Half-pipe  Silver
2309     Turin  Snowboard Cross    Gold
2310     Turin  Snowboard Cross  Silver

[2311 rows x 3 columns]


# **Time taken to read the file**

In [28]:
# verbose=True prints parser timing; the verbose parameter is deprecated as of pandas 2.2
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", verbose=True) 
print(medals) 

Tokenization took: 0.76 ms
Type conversion took: 1.05 ms
Parser memory cleanup took: 0.04 ms
      Year      City       Sport  ...            Event Event gender   Medal
0     1924  Chamonix     Skating  ...       individual            M  Silver
1     1924  Chamonix     Skating  ...       individual            W    Gold
2     1924  Chamonix     Skating  ...            pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...         four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...       ice hockey            M    Gold
...    ...       ...         ...  ...              ...          ...     ...
2306  2006     Turin      Skiing  ...        Half-pipe            M  Silver
2307  2006     Turin      Skiing  ...        Half-pipe            W    Gold
2308  2006     Turin      Skiing  ...        Half-pipe            W  Silver
2309  2006     Turin      Skiing  ...  Snowboard Cross            M    Gold
2310  2006     Turin      Skiing  ...  Snowboard Cross            W  Silver

[2311 rows x 8 columns]

# **Read a file with a custom delimiter**

In [29]:
# sep=":" does not match this comma-separated file, so each whole line lands in a single column
data = pd.read_csv("university_dict.csv", sep=":") 
print(data) 

  name,branch,year,cgpa
0   Nikhil,COE,2020,9.0
1  Sanchit,COE,2021,9.1
2    Aditya,IT,2020,9.3
3     Sagar,SE,2021,9.5
4  Prateek,MCE,2021,7.8
5     Sahil,EP,2019,9.1
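The cell above shows what happens when `sep` does not match the file. The sketch below shows the matching case, using hypothetical semicolon-separated data in an `io.StringIO` buffer. With the default `sep=","`, each line becomes a single column; with `sep=";"`, the fields are split correctly.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data.
text = "name;branch;year\nNikhil;COE;2020\nAditya;IT;2020\n"

wrong = pd.read_csv(io.StringIO(text))            # default sep=',' finds no commas
right = pd.read_csv(io.StringIO(text), sep=";")   # delimiter matches the data

print(wrong.shape)   # one column holding whole lines
print(right.shape)
print(right)
```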


# **Change column type while importing**

In [30]:
# dtype maps a column name to the type to use while parsing
data = pd.read_csv("university_dict.csv", dtype = {"year" : "float64"}) 
print(data) 

      name branch    year  cgpa
0   Nikhil    COE  2020.0   9.0
1  Sanchit    COE  2021.0   9.1
2   Aditya     IT  2020.0   9.3
3    Sagar     SE  2021.0   9.5
4  Prateek    MCE  2021.0   7.8
5    Sahil     EP  2019.0   9.1


In [33]:
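# the cells below use the PyPDF2 1.x API (PdfFileReader, getPage, extractText), which no longer works in PyPDF2 3.x
!pip install PyPDF2==1.26.0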


In [35]:
import PyPDF2
pdfName = 'Numpy.pdf' 
read_pdf = PyPDF2.PdfFileReader(pdfName) 
page = read_pdf.getPage(0) 
page_content = page.extractText() 
print(page_content) 

 

Numpy
 
Nump
y
 
is a Python package. It stands for 'Numerical Python'. It is a library consisting of 
multidimensional array objects and a collection of routines for processing of array. 
 
 
Numeric
, the ancestor of NumPy, was developed by Jim Hugunin. Another package Numar
ray 
was also developed, having some additional functionalities. In 2005, Travis Oliphant created 
NumPy package by incorporating the features of Num
 
array into Numeric package. There are 
many contributors to this open source project.
 
 
Operations using NumPy
 
 
Using NumPy, a developer can perform the following operations: 
 
 
1.
 
Mathematical and logical operations on arrays. 
 
2.
 
Fourier transforms and routines for shape manipulation. 
 
3.
 
Operations related to linear algebra. NumPy has in
-
built functions for linear algebr
a and 
random number generation. 
 
 
How to use
 
It creates an ndarray from any object exposing array interface, or from any method that returns 
an array.
 
 
numpy.array(

In [36]:
for i in range(read_pdf.getNumPages()): 
    page = read_pdf.getPage(i) 
    print('Page No - ' + str(1+read_pdf.getPageNumber(page))) 
    page_content = page.extractText() 
    print(page_content) 

Page No - 1
Numpy
 
Nump
y
 
is a Python package. It stands for 'Numerical Python'. It is a library consisting of 
multidimensional array objects and a collection of routines for processing of array. 
 
 
Numeric
, the ancestor of NumPy, was developed by Jim Hugunin. Another package Numar
ray 
was also developed, having some additional functionalities. In 2005, Travis Oliphant created 
NumPy package by incorporating the features of Num
 
array into Numeric package. There are 
many contributors to this open source project.
 
 
Operations using NumPy
 
 
Using NumPy, a developer can perform the following operations: 
 
 
1.
 
Mathematical and logical operations on arrays. 
 
2.
 
Fourier transforms and routines for shape manipulation. 
 
3.
 
Operations related to linear algebra. NumPy has in
-
built functions for linear algebr
a and 
random number generation. 
 
 
How to use
 
It creates an ndarray from any object exposing array interface, or from any method that returns 
an array.
 
 


# **Read a PDF file and display information**

In [38]:
with open(pdfName, 'rb') as f:
    pdf = PyPDF2.PdfFileReader(f)
    info = pdf.getDocumentInfo()
    number_of_pages = pdf.getNumPages()
    print(info)
    # read the metadata fields while the file is still open
    author = info.author
    creator = info.creator
    producer = info.producer
    subject = info.subject
    title = info.title

print(author)
print(creator)
print(producer)
print(subject)
print(title)

{'/Author': 'anuj saxena', '/Creator': 'Microsoft® Word for Microsoft 365', '/CreationDate': "D:20200713165335+05'30'", '/ModDate': "D:20200713165335+05'30'", '/Producer': 'Microsoft® Word for Microsoft 365'}
anuj saxena
Microsoft® Word for Microsoft 365
Microsoft® Word for Microsoft 365
None
None


In [39]:
import os 
from PyPDF2 import PdfFileReader, PdfFileWriter  

pdfName = 'Numpy.pdf' 
fname = os.path.splitext(os.path.basename(pdfName))[0] 
pdf = PdfFileReader(pdfName) 
for page in range(pdf.getNumPages()): 
        pdf_writer = PdfFileWriter() 
        pdf_writer.addPage(pdf.getPage(page)) 
        output_filename = '{}_page_{}.pdf'.format(fname, page+1) 

        with open(output_filename, 'wb') as out: 
            pdf_writer.write(out) 
        print('Created: {}'.format(output_filename)) 

 

Created: Numpy_page_1.pdf
Created: Numpy_page_2.pdf
