<a href="https://colab.research.google.com/github/anujsaxena/AIML/blob/main/AIML_Lab_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **File Handling**

When you’re working with Python, you don’t need to import a library in order to read and write files. It’s handled natively in the language, albeit in a unique manner.

The first thing you’ll need to do is use Python’s built-in open function to get a file object. 

The open function opens a file. It’s simple. 

When you use the open function, it returns something called a file object. File objects contain methods and attributes that can be used to collect information about the file you opened. They can also be used to manipulate said file.

For example, the mode attribute of a file object tells you which mode a file was opened in. And the name attribute tells you the name of the file that the file object has opened. 
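As a minimal sketch of these attributes, the cell below opens a file and inspects its mode, name, and closed state. The file name attributes_demo.txt and the temporary directory are assumptions made for illustration; they are not part of the lab files.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration only.
path = os.path.join(tempfile.mkdtemp(), "attributes_demo.txt")

f = open(path, "w")
file_mode = f.mode            # the mode the file was opened in
file_name = f.name            # the name (path) that was passed to open()
closed_before = f.closed      # False while the file is still open
f.close()
closed_after = f.closed       # True once close() has been called

print(file_mode)
print(file_name)
print(closed_before, closed_after)
```

The file object remains available after close(), but it can no longer be used to read from or write to the file on disk.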

You must understand that a file and file object are two wholly separate – yet related – things.
 
**File Types**

What you may know as a file is slightly different in Python. 
In Windows, for example, a file can be any item manipulated, edited or created by the user/OS. That means files can be images, text documents, executables, and much more. Most files are organized by keeping them in individual folders. 
In Python, a file is categorized as either text or binary, and the difference between the two file types is important. 
Text files are structured as a sequence of lines, where each line is a sequence of characters. Source code, CSV data, and plain-text notes are all stored this way. 
Each line is terminated with a special character, called the EOL or End of Line character. The most common is the newline character, written ‘\n’ in Python; Windows text files traditionally end each line with the two-character sequence ‘\r\n’. It ends the current line and signals that a new one has begun. 
The backslash starts an escape sequence inside a string: in ‘\n’, the backslash tells the interpreter that the n following it stands for a newline rather than the letter n. This is how you place a line break in the text written to a file without breaking the line in your code. 
A binary file is any type of file that is not a text file. Because of their nature, binary files can only be processed by an application that knows or understands the file’s structure. In other words, the application must be able to read and interpret the binary format.
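The difference between the two file types can be sketched by reading the same file in text mode and in binary mode. The file name and its contents are assumptions chosen for illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "text_vs_binary.txt")

# newline="\n" writes the newline character exactly as given on every platform.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("Hello\nWorld\n")

# Text mode ('r') decodes the bytes and returns a str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode ('rb') returns the raw bytes without decoding them.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text), repr(as_text))
print(type(as_bytes), repr(as_bytes))
```

Adding b to a mode ('rb', 'wb') is what selects binary handling; it is also the mode used for PDF files later in this lab.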

**Open ( ) Function**

In order to open a file for reading or writing in Python, you must rely on the built-in open() function. 
As explained above, open() returns a file object. It is most commonly called with two arguments: the file name and the mode.  
An argument is nothing more than a value that is passed to a function when you call it. So, for instance, if we pass the file name “test.txt” to open(), that name is an argument. 
The syntax to open a file object in Python is:

*File_object = open("filename","mode")*

**Mode**

Including a mode argument is optional because a default value of ‘r’ will be assumed if it is omitted. The ‘r’ value stands for read mode, which is just one of many. 
The modes are: 
1. ‘r’ – Read mode, which is used when the file is only being read (the file must already exist) 
2. ‘w’ – Write mode, which is used to write new information to the file (the file is created if it does not exist, and any existing content is erased as soon as the file is opened in this mode) 
3. ‘a’ – Append mode, which is used to add new data to the end of the file; existing content is kept and new information is automatically appended after it 
4. ‘r+’ – Read and write mode, which is used to handle both actions on an existing file without erasing it 


# **Difference between r+ and a+**

r+ Open for reading and writing. The stream is positioned at the beginning of the file.

a+ Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file; on some Unix systems this happens regardless of the current seek position.
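The contrast between the two modes can be sketched as follows. The file name and the sample text "abcdef" are assumptions used only for illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:
    f.write("abcdef")

# r+ starts at the beginning, so writing overwrites the existing characters.
with open(path, "r+") as f:
    f.write("XY")
with open(path, "r") as f:
    after_r_plus = f.read()

# a+ always writes at the end; seek(0) is needed before reading from the start.
with open(path, "a+") as f:
    f.write("!!")
    f.seek(0)
    after_a_plus = f.read()

print(after_r_plus)
print(after_a_plus)
```

Unlike ‘w’, neither mode erases the existing content when the file is opened.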

In [None]:
f = open("test.txt", "w")
# no newline characters are written, so the four strings end up on a single line
f.write("Hello")
f.write("How are you today")
f.write("What are your plans for your future")
f.write("Python is an interesting language")
f.close()  # closing the file flushes any buffered data to disk

In [None]:
#open the file again in 'w' mode: the old content is erased before the new data is written
f = open("test.txt", "w")
f.write("Hi Guys !! ")
f.write("Where are you ? ")
f.write("What are your plans for today ? ")
f.write("The class should be interesting ")
f.close()  # closing the file flushes any buffered data to disk

In [None]:
f = open("test.txt", "w")
f.write("Hello \n")
f.write("How are you today \n")
f.write("What are your plans for your future \n")
f.write("Python is an interesting language")
f.close()  # you have to close the file so that the data can be written

In [None]:
f = open('test.txt', 'r')
d = f.read()
print(d)

Hello 
How are you today 
What are your plans for your future 
Python is an interesting language


In [None]:
f = open('test.txt', 'r')
d = f.read(5)
print(d)

Hello


In [None]:
f = open('test.txt', 'r')
d = f.read(24)
print(d)

Hello 
How are you today
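Each cell above reopens the file, so every read(n) starts from the first character. As a sketch of what happens when the file stays open, successive reads continue from where the previous one stopped. The file name is a hypothetical one in a temporary directory.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")

with open(path, "w", newline="\n") as f:
    f.write("Hello \nHow are you today \n")

with open(path, "r") as f:
    first = f.read(5)    # the first five characters
    second = f.read(2)   # continues after 'Hello': a space and the newline
    rest = f.read()      # everything that is left

print(repr(first))
print(repr(second))
print(repr(rest))
```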


In [None]:
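f = open("test.txt", "w")
# writelines() expects an iterable of strings; a single string also works
# because it is an iterable of characters, and no newlines are added
f.writelines("Hello")
f.writelines("How are you today ")
f.writelines("What are your plans for your future ")
f.writelines("Python is an interesting language")
f.close()  # closing the file flushes any buffered data to disk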

In [None]:
f = open("test.txt", "w")
f.write("Hello \n")
f.write("How are you today \n")
f.write("What are your plans for your future \n")
f.write("Python is an interesting language")
f.close()  # you have to close the file so that the data can be written

In [None]:
f = open('test.txt', 'r')
d = f.readlines()
print(d)

['Hello \n', 'How are you today \n', 'What are your plans for your future \n', 'Python is an interesting language']


In [None]:
f = open('test.txt', 'r')
d = f.read()
print(d)

Hello 
How are you today 
What are your plans for your future 
Python is an interesting language


In [None]:
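f = open('test.txt', 'r')
# the argument is a size hint, not a line count: no further lines are read
# once the lines read so far exceed 1 character
d = f.readlines(1)
print(d)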

['Hello \n']
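The argument to readlines() is a size hint in characters rather than a number of lines. A small sketch of this behaviour, using a hypothetical file with the same kind of content:

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "hint_demo.txt")

with open(path, "w", newline="\n") as f:
    f.write("Hello \nHow are you today \nPython is an interesting language")

with open(path) as f:
    one_line = f.readlines(1)     # the first line already exceeds the hint
with open(path) as f:
    two_lines = f.readlines(10)   # line one has 7 characters, so line two is read too
with open(path) as f:
    all_lines = f.readlines()     # no hint: every line is returned

print(one_line)
print(two_lines)
print(len(all_lines))
```

To read an exact number of lines, calling readline() in a loop or slicing the list returned by readlines() is more predictable than relying on the hint.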


In [None]:
f = open('test2.txt', 'a')
f.writelines(['Hello \n','How are you \n', 'Have a nice day'])
f.close()

In [None]:
f = open('test2.txt', 'r')
d = f.readlines()
print(d)

['Hello \n', 'How are you \n', 'Have a nice day']


In [None]:
f = open('test2.txt', 'a')
f.writelines(['In ML \n','Data Handling is very important \n', 'We should have clean data'])
f.close()

In [None]:
f = open('test2.txt', 'r')
for text in f:
  print(text)

Hello 

How are you 

Have a nice dayIn ML 

Data Handling is very important 

We should have clean data
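The blank lines in the output above appear because each line read from the file still ends with its own newline, and print() adds another one. One way to avoid this, sketched on a hypothetical file with similar content, is to strip the trailing newline before printing.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "iterate_demo.txt")

with open(path, "w", newline="\n") as f:
    f.writelines(["Hello \n", "How are you \n", "Have a nice day"])

# Iterating over the file object yields each line with its newline attached.
with open(path) as f:
    raw_lines = [line for line in f]

# rstrip("\n") removes only the trailing newline and keeps other spaces.
with open(path) as f:
    clean_lines = [line.rstrip("\n") for line in f]

for text in clean_lines:
    print(text)
```

Calling print(text, end="") inside the loop is an alternative that leaves the lines untouched.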


In [None]:
f=open("test2.txt","r")
for txt in f:
  wrd = txt.split() #split the line into tokens (words)
  print(wrd)

['Hello']
['How', 'are', 'you']
['Have', 'a', 'nice', 'dayIn', 'ML']
['Data', 'Handling', 'is', 'very', 'important']
['We', 'should', 'have', 'clean', 'data']
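The merged token 'dayIn' arises because writelines() does not add line separators, and the last string of the first batch had no newline when the second batch was appended. A sketch of one way to avoid this is to end every line with a newline before appending. The file name is hypothetical, and the lists are shortened versions of the ones above.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")

first_batch = ["Hello", "How are you", "Have a nice day"]
second_batch = ["In ML", "Data Handling is very important"]

# Every line gets its own newline, including the last one of each batch.
with open(path, "a") as f:
    f.writelines(line + "\n" for line in first_batch)
with open(path, "a") as f:
    f.writelines(line + "\n" for line in second_batch)

with open(path) as f:
    tokens = [line.split() for line in f]

for wrd in tokens:
    print(wrd)
```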


In [None]:
with open("test3.txt",'w') as f:
  f.write("Welcome to my python online classes !!!")
  f.write("\n institue de informatica !!!")

In [None]:
with open('test3.txt', 'r') as f:
  d=f.readlines()
  print(d)

['Welcome to my python online classes !!!\n', ' institue de informatica !!!']


In [None]:
for txt in d:
  wrd = txt.split()
  print(wrd)

['Welcome', 'to', 'my', 'python', 'online', 'classes', '!!!']
['institue', 'de', 'informatica', '!!!']
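The with blocks above need no call to close(): the file is closed automatically when the block ends, even if an error occurs inside it. A minimal sketch of this, with a hypothetical file name and a deliberately raised error:

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as f:
    f.write("Welcome")
closed_after_with = f.closed      # the file is closed as soon as the block ends

try:
    with open(path) as g:
        raise ValueError("simulated failure while reading")
except ValueError:
    closed_after_error = g.closed  # closed even though the block raised an error

print(closed_after_with, closed_after_error)
```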


# **CSV files**

**Module**

CSV

In the csv module documentation you can find the following functions and constants:
1. csv.field_size_limit – return the maximum field size
2. csv.get_dialect – get the dialect associated with a name
3. csv.list_dialects – show all registered dialects
4. csv.reader – read data from a csv file
5. csv.register_dialect – associate a dialect with a name
6. csv.writer – write data to a csv file
7. csv.unregister_dialect – delete the dialect associated with a name from the dialect registry
8. csv.QUOTE_ALL – quote everything, regardless of type
9. csv.QUOTE_MINIMAL – quote only fields that contain special characters
10. csv.QUOTE_NONNUMERIC – quote all fields that are not numbers
11. csv.QUOTE_NONE – don't quote anything in output
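The quoting constants and the dialect registry can be sketched with an in-memory buffer instead of a file. The sample row, the helper render(), and the dialect name "pipes" are assumptions introduced for illustration.

```python
import csv
import io

# Illustrative row; the comma inside the last field forces quoting.
row = ["Nikhil", "COE", 2021, "9.0, provisional"]

def render(quoting):
    """Write the row to an in-memory buffer with the given quoting rule."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()

minimal = render(csv.QUOTE_MINIMAL)
quote_all = render(csv.QUOTE_ALL)
non_numeric = render(csv.QUOTE_NONNUMERIC)
print(minimal, quote_all, non_numeric, sep="")

# Register a hypothetical pipe-separated dialect, use it, then remove it.
csv.register_dialect("pipes", delimiter="|", lineterminator="\n")
registered = "pipes" in csv.list_dialects()
buffer = io.StringIO()
csv.writer(buffer, dialect="pipes").writerow(row)
piped = buffer.getvalue()
csv.unregister_dialect("pipes")
unregistered = "pipes" not in csv.list_dialects()
print(piped, end="")
```

With the "pipes" dialect the comma is no longer a special character, so the last field is written without quotes.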


In [None]:
#import module to read a csv file

import csv
# field names 
fields = ['Name', 'Branch', 'Year', 'CGPA'] 
  
# data rows of csv file 
rows = [ ['Nikhil', 'COE', '2021', '9.0'], 
         ['Sanchit', 'COE', '2020', '9.1'], 
         ['Aditya', 'IT', '2021', '9.3'], 
         ['Sagar', 'SE', '2021', '9.5'], 
         ['Prateek', 'MCE', '2020', '7.8'], 
         ['Sahil', 'EP', '2019', '9.1']] 
  
# name of csv file 
filename ="university_records.csv"
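# writing to csv file (newline='' lets the csv module control the line endings,
# which avoids blank lines between rows on Windows)
with open(filename, 'w', newline='') as csvfile: 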
    # creating a csv writer object 
    csvwriter = csv.writer(csvfile)
    # write the header
    csvwriter.writerow(fields)
    # write the rows
    csvwriter.writerows(rows)


In [None]:
# write from a dictionary

# my data rows as dictionary objects 
mydict =[{'branch': 'COE', 'cgpa': '9.0', 'name': 'Nikhil', 'year': '2020'}, 
         {'branch': 'COE', 'cgpa': '9.1', 'name': 'Sanchit', 'year': '2021'}, 
         {'branch': 'IT', 'cgpa': '9.3', 'name': 'Aditya', 'year': '2020'}, 
         {'branch': 'SE', 'cgpa': '9.5', 'name': 'Sagar', 'year': '2021'}, 
         {'branch': 'MCE', 'cgpa': '7.8', 'name': 'Prateek', 'year': '2021'}, 
         {'branch': 'EP', 'cgpa': '9.1', 'name': 'Sahil', 'year': '2019'}] 
  
# field names 
fields = ['name', 'branch', 'year', 'cgpa'] 
  
# name of csv file 
filename = "university_dict.csv"

# newline='' lets the csv module control the line endings
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv writer object 
    writer = csv.DictWriter(csvfile, fieldnames = fields) 
    # write the header
    writer.writeheader() 
    # write the rows
    writer.writerows(mydict)

In [None]:
#read CSV file

with open("university_records.csv") as f:
  data = csv.reader(f)
  for row in data:
    print(row)


['Name', 'Branch', 'Year', 'CGPA']
['Nikhil', 'COE', '2021', '9.0']
['Sanchit', 'COE', '2020', '9.1']
['Aditya', 'IT', '2021', '9.3']
['Sagar', 'SE', '2021', '9.5']
['Prateek', 'MCE', '2020', '7.8']
['Sahil', 'EP', '2019', '9.1']


In [None]:
# csv file name 
filename = "university_records.csv"
  
# initializing the headers and rows list 
fields = [] 
rows = [] 
  
# reading csv file 
with open(filename, 'r', newline='') as csvfile: 
    # creating a csv reader object 
    csvreader = csv.reader(csvfile) 
      
    # extracting field names through first row 
    fields = next(csvreader) 
    
    # collecting the remaining data rows
    for row in csvreader:
      rows.append(row)
    # line_num counts every line read so far, including the header row
    print("Total no. of rows: %d"%(csvreader.line_num))
# printing the field names 
print('Header names are: ' + ', '.join(fields))

#  printing first 5 rows 
print('\nFirst 5 rows are:\n') 
for row in rows[:5]: 
    # right-align each column of a row in a field of width 10
    print(' '.join("%10s"%col for col in row))

Total no. of rows: 7
Header names are: Name, Branch, Year, CGPA

First 5 rows are:

    Nikhil        COE       2021        9.0
   Sanchit        COE       2020        9.1
    Aditya         IT       2021        9.3
     Sagar         SE       2021        9.5
   Prateek        MCE       2020        7.8



In [None]:
# the with block closes the file once all rows have been read
with open("university_records.csv", newline='') as f:
  reader = csv.DictReader(f)
  for txt in reader:
    print(txt)


OrderedDict([('Name', 'Nikhil'), ('Branch', 'COE'), ('Year', '2021'), ('CGPA', '9.0')])
OrderedDict([('Name', 'Sanchit'), ('Branch', 'COE'), ('Year', '2020'), ('CGPA', '9.1')])
OrderedDict([('Name', 'Aditya'), ('Branch', 'IT'), ('Year', '2021'), ('CGPA', '9.3')])
OrderedDict([('Name', 'Sagar'), ('Branch', 'SE'), ('Year', '2021'), ('CGPA', '9.5')])
OrderedDict([('Name', 'Prateek'), ('Branch', 'MCE'), ('Year', '2020'), ('CGPA', '7.8')])
OrderedDict([('Name', 'Sahil'), ('Branch', 'EP'), ('Year', '2019'), ('CGPA', '9.1')])
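Every value returned by csv.DictReader is a string, so numeric columns have to be converted before they can be used in calculations. A sketch of that step, assuming the same records held in an in-memory string rather than in university_records.csv:

```python
import csv
import io

# The same records as university_records.csv, kept in memory for illustration.
csv_text = (
    "Name,Branch,Year,CGPA\n"
    "Nikhil,COE,2021,9.0\n"
    "Sanchit,COE,2020,9.1\n"
    "Aditya,IT,2021,9.3\n"
    "Sagar,SE,2021,9.5\n"
    "Prateek,MCE,2020,7.8\n"
    "Sahil,EP,2019,9.1\n"
)

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Convert the numeric columns; Name and Branch stay as strings.
    records.append({
        "Name": row["Name"],
        "Branch": row["Branch"],
        "Year": int(row["Year"]),
        "CGPA": float(row["CGPA"]),
    })

average_cgpa = sum(r["CGPA"] for r in records) / len(records)
batch_2021 = [r["Name"] for r in records if r["Year"] == 2021]

print(round(average_cgpa, 2))
print(batch_2021)
```

pandas, used in the following cells, performs this type inference automatically.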


In [None]:
import pandas as pd
data=pd.read_csv("university_records.csv", header=0)
print(data)

      Name Branch  Year  CGPA
0   Nikhil    COE  2021   9.0
1  Sanchit    COE  2020   9.1
2   Aditya     IT  2021   9.3
3    Sagar     SE  2021   9.5
4  Prateek    MCE  2020   7.8
5    Sahil     EP  2019   9.1


In [None]:
data=pd.read_csv("university_records.csv", header=1)
print(data)

    Nikhil  COE  2021  9.0
0  Sanchit  COE  2020  9.1
1   Aditya   IT  2021  9.3
2    Sagar   SE  2021  9.5
3  Prateek  MCE  2020  7.8
4    Sahil   EP  2019  9.1


In [None]:
#get the shape of the matrix(data)
data=pd.read_csv("university_records.csv", header=0)
print(data.shape)
print(data.columns)
print(data.dtypes)

(6, 4)
Index(['Name', 'Branch', 'Year', 'CGPA'], dtype='object')
Name       object
Branch     object
Year        int64
CGPA      float64
dtype: object


In [None]:
#add own headers
import pandas as pd 
data=pd.read_csv("university_records.csv",skiprows=1, names=['SName', 'Course', 'Duration', 'Score']) 
print(data) 

     SName Course  Duration  Score
0   Nikhil    COE      2021    9.0
1  Sanchit    COE      2020    9.1
2   Aditya     IT      2021    9.3
3    Sagar     SE      2021    9.5
4  Prateek    MCE      2020    7.8
5    Sahil     EP      2019    9.1


In [None]:
#skip the rows and retain headers
data=pd.read_csv("university_records.csv",skiprows=[1,2]) 
print(data) 

      Name Branch  Year  CGPA
0   Aditya     IT  2021   9.3
1    Sagar     SE  2021   9.5
2  Prateek    MCE  2020   7.8
3    Sahil     EP  2019   9.1


In [None]:
data=pd.read_csv("university_records.csv",header=None) 
print(data) 

         0       1     2     3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021   9.3
4    Sagar      SE  2021   9.5
5  Prateek     MCE  2020   7.8
6    Sahil      EP  2019   9.1


In [None]:
#Add Prefix to column names
#(the prefix argument of read_csv was removed in pandas 2.0; add_prefix gives the same labels)
data=pd.read_csv("university_records.csv",header=None).add_prefix('var') 
print(data) 

      var0    var1  var2  var3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021   9.3
4    Sagar      SE  2021   9.5
5  Prateek     MCE  2020   7.8
6    Sahil      EP  2019   9.1


In [None]:
# field names 
fields = ['Name', 'Branch', 'Year', 'CGPA'] 
  
# data rows of csv file 
rows = [ ['Nikhil', 'COE', '2021', '9.0'], 
         ['Sanchit', 'COE', '2020', '9.1'], 
         ['Aditya', 'IT', '2021', ' '], 
         ['Sagar', 'SE', '2021', '9.5'], 
         ['Prateek', 'MCE', 'nan', '7.8'], 
         ['Sahil', 'EP', '2019', '9.1']] 
  
# name of csv file 
filename = "university_records2.csv"

# writing to csv file (newline='' lets the csv module control the line endings)
with open(filename, 'w', newline='') as csvfile: 
    # creating a csv writer object 
    csvwriter = csv.writer(csvfile)      
    # writing the headers 
    csvwriter.writerow(fields) 
    # writing the data rows 
    csvwriter.writerows(rows)

In [None]:
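#'nan' is treated as missing by default; na_values adds "-" as a further marker,
#while the single-space CGPA field is not a missing-value marker and stays as text
data=pd.read_csv("university_records2.csv",header=None, na_values="-") 
print(data)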


         0       1     2     3
0     Name  Branch  Year  CGPA
1   Nikhil     COE  2021   9.0
2  Sanchit     COE  2020   9.1
3   Aditya      IT  2021      
4    Sagar      SE  2021   9.5
5  Prateek     MCE   NaN   7.8
6    Sahil      EP  2019   9.1
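In the output above the blank CGPA value is still text, because a single space is not one of the default missing-value markers. A sketch of how na_values can cover it, assuming a shortened copy of the data held in memory and the header row used as column names:

```python
import io
import pandas as pd

# Shortened copy of university_records2.csv; the CGPA of Aditya is a single space.
csv_text = (
    "Name,Branch,Year,CGPA\n"
    "Nikhil,COE,2021,9.0\n"
    "Aditya,IT,2021, \n"
    "Prateek,MCE,nan,7.8\n"
)

# Default behaviour: 'nan' becomes NaN, the single space does not.
default = pd.read_csv(io.StringIO(csv_text))

# Listing " " in na_values marks the blank field as missing as well.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=[" "])

print(default.isna().sum())
print(cleaned.isna().sum())
print(cleaned.dtypes)
```

Once the blank is recognised as missing, the CGPA column can be parsed as a numeric column.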


In [None]:
print(type(data))

<class 'pandas.core.frame.DataFrame'>


In [None]:
#reading CSV file from a url
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv")
print(medals)


      Year      City       Sport  ...            Event Event gender   Medal
0     1924  Chamonix     Skating  ...       individual            M  Silver
1     1924  Chamonix     Skating  ...       individual            W    Gold
2     1924  Chamonix     Skating  ...            pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...         four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...       ice hockey            M    Gold
...    ...       ...         ...  ...              ...          ...     ...
2306  2006     Turin      Skiing  ...        Half-pipe            M  Silver
2307  2006     Turin      Skiing  ...        Half-pipe            W    Gold
2308  2006     Turin      Skiing  ...        Half-pipe            W  Silver
2309  2006     Turin      Skiing  ...  Snowboard Cross            M    Gold
2310  2006     Turin      Skiing  ...  Snowboard Cross            W  Silver

[2311 rows x 8 columns]


In [None]:
#skip last 5 rows (skipfooter is only supported by the Python parsing engine)
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine='python') 
print(medals) 

      Year      City       Sport  ...                  Event Event gender   Medal
0     1924  Chamonix     Skating  ...             individual            M  Silver
1     1924  Chamonix     Skating  ...             individual            W    Gold
2     1924  Chamonix     Skating  ...                  pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...               four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...             ice hockey            M    Gold
...    ...       ...         ...  ...                    ...          ...     ...
2301  2006     Turin      Skiing  ...        Alpine combined            M    Gold
2302  2006     Turin      Skiing  ...           giant slalom            W    Gold
2303  2006     Turin      Skiing  ...                 moguls            M  Bronze
2304  2006     Turin      Skiing  ...  Giant parallel slalom            W  Bronze
2305  2006     Turin      Skiing  ...              Half-pipe            M    Gold

[2306 rows x 8 columns]

  


In [None]:
#Read few rows
medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5) 
print(medals) 

   Year      City       Sport  ...       Event Event gender   Medal
0  1924  Chamonix     Skating  ...  individual            M  Silver
1  1924  Chamonix     Skating  ...  individual            W    Gold
2  1924  Chamonix     Skating  ...       pairs            X    Gold
3  1924  Chamonix   Bobsleigh  ...    four-man            M  Bronze
4  1924  Chamonix  Ice Hockey  ...  ice hockey            M    Gold

[5 rows x 8 columns]


In [None]:
#read specific column

medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7]) 
print(medals) 

          City            Event   Medal
0     Chamonix       individual  Silver
1     Chamonix       individual    Gold
2     Chamonix            pairs    Gold
3     Chamonix         four-man  Bronze
4     Chamonix       ice hockey    Gold
...        ...              ...     ...
2306     Turin        Half-pipe  Silver
2307     Turin        Half-pipe    Gold
2308     Turin        Half-pipe  Silver
2309     Turin  Snowboard Cross    Gold
2310     Turin  Snowboard Cross  Silver

[2311 rows x 3 columns]


In [None]:
#time taken to read the file

medals = pd.read_csv("http://winterolympicsmedals.com/medals.csv", verbose=True) 
print(medals) 

Tokenization took: 0.73 ms
Type conversion took: 2.49 ms
Parser memory cleanup took: 0.01 ms
      Year      City       Sport  ...            Event Event gender   Medal
0     1924  Chamonix     Skating  ...       individual            M  Silver
1     1924  Chamonix     Skating  ...       individual            W    Gold
2     1924  Chamonix     Skating  ...            pairs            X    Gold
3     1924  Chamonix   Bobsleigh  ...         four-man            M  Bronze
4     1924  Chamonix  Ice Hockey  ...       ice hockey            M    Gold
...    ...       ...         ...  ...              ...          ...     ...
2306  2006     Turin      Skiing  ...        Half-pipe            M  Silver
2307  2006     Turin      Skiing  ...        Half-pipe            W    Gold
2308  2006     Turin      Skiing  ...        Half-pipe            W  Silver
2309  2006     Turin      Skiing  ...  Snowboard Cross            M    Gold
2310  2006     Turin      Skiing  ...  Snowboard Cross            W  Silver

In [None]:
#read the file with a custom delimiter
#":" does not occur in this comma-separated file, so each line is read as a single column
medals = pd.read_csv("university_dict.csv", sep=":") 
print(medals) 

  name,branch,year,cgpa
0   Nikhil,COE,2020,9.0
1  Sanchit,COE,2021,9.1
2    Aditya,IT,2020,9.3
3     Sagar,SE,2021,9.5
4  Prateek,MCE,2021,7.8
5     Sahil,EP,2019,9.1
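The single wide column above is what a mismatched separator produces. For comparison, here is a sketch of the matching case, assuming a small colon-separated sample held in memory:

```python
import io
import pandas as pd

# Hypothetical colon-separated data, kept in memory for illustration.
colon_text = "name:branch:year\nNikhil:COE:2020\nSanchit:COE:2021\n"

# Default separator (comma): no commas are found, so each line is one column.
wrong = pd.read_csv(io.StringIO(colon_text))

# sep=":" matches the data, so the three columns are split correctly.
right = pd.read_csv(io.StringIO(colon_text), sep=":")

print(wrong.shape)
print(right.shape)
print(right)
```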


In [None]:
#change column type while importing data
data = pd.read_csv("university_dict.csv", dtype={"year":"float64"}) 
print(data) 

      name branch    year  cgpa
0   Nikhil    COE  2020.0   9.0
1  Sanchit    COE  2021.0   9.1
2   Aditya     IT  2020.0   9.3
3    Sagar     SE  2021.0   9.5
4  Prateek    MCE  2021.0   7.8
5    Sahil     EP  2019.0   9.1


# **Read PDF files**

In [None]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading PyPDF2-1.26.0.tar.gz (77 kB)
Building wheels for collected packages: PyPDF2
  Building wheel for PyPDF2 (setup.py) ... done
  Created wheel for PyPDF2: filename=PyPDF2-1.26.0-py3-none-any.whl size=61102 sha256=a1c908a32a75584b0300bcbd6aa91ec4aeaa2f5f27faf204724c9f708ea8782a
  Stored in directory: /root/.cache/pip/wheels/80/1a/24/648467ade3a77ed20f35cfd2badd32134e96dd25ca811e64b3
Successf

In [None]:
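import PyPDF2
#PdfFileReader, getPage and extractText belong to the PyPDF2 1.x API (1.26.0 is installed above);
#PyPDF2 3.x replaced them with PdfReader, reader.pages[i] and extract_text()
pdfName = "AI.pdf"
read_pdf = PyPDF2.PdfFileReader(pdfName) 
#read the first page (pages are numbered from 0)
page = read_pdf.getPage(0) 
page_content = page.extractText() 
print(page_content) 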


Artificial Intelligence
 
 
Artificial Intelligence refers to the intelligence displayed by computers. In today's world, 
Artificial Intelligence has become highly popular. It is the replication of human intelligence in 
computers that have been programmed to learn and replicate human 
activities. These computers 
can learn from their mistakes and do human
-
like jobs. Artificial intelligence (AI) will have a 
significant influence on our quality of life as it develops. It's only natural that everyone 
nowadays wants to engage with AI technol
ogy in some way, whether as a consumer or as a 
professional in the field.
 
 
What is Intelligence
?
 
Calculation, reasoning, perceiving relationships and analogies, learning from experience, 
storing and retrieving information from memory, solving problems, comprehending complex 
ideas
, fluently using natural language, classifying, generalising, and adapting to new situations 
are all capabilities of a system.
 
 
Types
 
Intelligence occurs i

In [None]:
for i in range(read_pdf.getNumPages()):
  page = read_pdf.getPage(i) 
  print('Page No - ' + str(1+read_pdf.getPageNumber(page))) 
  page_content = page.extractText() 
  print(page_content) 

Page No - 1
Artificial Intelligence
 
 
Artificial Intelligence refers to the intelligence displayed by computers. In today's world, 
Artificial Intelligence has become highly popular. It is the replication of human intelligence in 
computers that have been programmed to learn and replicate human 
activities. These computers 
can learn from their mistakes and do human
-
like jobs. Artificial intelligence (AI) will have a 
significant influence on our quality of life as it develops. It's only natural that everyone 
nowadays wants to engage with AI technol
ogy in some way, whether as a consumer or as a 
professional in the field.
 
 
What is Intelligence
?
 
Calculation, reasoning, perceiving relationships and analogies, learning from experience, 
storing and retrieving information from memory, solving problems, comprehending complex 
ideas
, fluently using natural language, classifying, generalising, and adapting to new situations 
are all capabilities of a system.
 
 
Types
 
Intellige

In [None]:
#display information of the file

with open(pdfName, 'rb') as f: 
        pdf = PyPDF2.PdfFileReader(f) 
        info = pdf.getDocumentInfo() 
        number_of_pages = pdf.getNumPages() 

print(info) 
author = info.author 
creator = info.creator 
producer = info.producer 
subject = info.subject 
title = info.title 
print(author)
print(creator)
print(producer)
print(subject)
print(title)

{'/Author': 'Anuj Saxena', '/Creator': 'Microsoft® Word for Microsoft 365', '/CreationDate': "D:20220121194836+05'30'", '/ModDate': "D:20220121194836+05'30'", '/Producer': 'Microsoft® Word for Microsoft 365'}
Anuj Saxena
Microsoft® Word for Microsoft 365
Microsoft® Word for Microsoft 365
None
None


In [None]:
import os #operating system
from PyPDF2 import PdfFileReader, PdfFileWriter  

pdfName = 'AI.pdf' 
fname = os.path.splitext(os.path.basename(pdfName))[0] 
pdf = PdfFileReader(pdfName) 
for page in range(pdf.getNumPages()): 
        pdf_writer = PdfFileWriter() 
        pdf_writer.addPage(pdf.getPage(page)) 
        output_filename = '{}_page_{}.pdf'.format(fname, page+1) 

        with open(output_filename, 'wb') as out: 
            pdf_writer.write(out) 
        print('Created: {}'.format(output_filename)) 

Created: AI_page_1.pdf
Created: AI_page_2.pdf
Created: AI_page_3.pdf
Created: AI_page_4.pdf
Created: AI_page_5.pdf


# **Read / Write a DOCX file**

In [None]:
!pip install python-docx

Collecting python-docx
  Downloading python-docx-0.8.11.tar.gz (5.6 MB)
Building wheels for collected packages: python-docx
  Building wheel for python-docx (setup.py) ... done
  Created wheel for python-docx: filename=python_docx-0.8.11-py3-none-any.whl size=184507 sha256=742ba82f2527807630814ae76a1ed0ec91b7ddd6adb55bb39df4a680fd5f987c
  Stored in directory: /root/.cache/pip/wheels/f6/6f/b9/d798122a8b55b74ad30b5f52b01482169b445fbb84a11797a6
Successfully built python-docx
Installing collected packages: python-docx
Successfully installed python-docx-0.8.11


In [None]:
import docx
#create an instance
doc = docx.Document()
doc.add_heading('My document file using Python', 0)
#add a paragraph
doc_p = doc.add_paragraph("We are adding a paragraph in the document")

#add_run to add bold, italic, underline
doc_p.add_run("We will make the text bold").bold=True
doc_p.add_run(', and')
doc_p.add_run("This text will be italic").italic = True

#break the page
doc.add_page_break()

#add level 1 and level 2 headings
doc.add_heading("First heading level 1", 1)
doc.add_heading("Second heading level 2", 2)

#save the file
doc.save("test1.docx")

In [None]:
from docx.shared import Inches

document = docx.Document()

document.add_heading('Document with text and image', 0)

p = document.add_paragraph('We will be adding an image in this document ')
p.add_run('bold').bold = True
p.add_run(' and some text also ')
p.add_run('and make it italic.').italic = True

document.add_heading('Let us add Heading, level 1', level=1)
document.add_paragraph('Check the intense quote', style='Intense Quote')

document.add_paragraph('First item in the Bullet', style='List Bullet')
document.add_paragraph('First item in the ordered list', style='List Number')
document.add_picture('lenna.tif', width=Inches(1.25))

records = ((3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam'))

#add a table with a single row for the column headers
table = document.add_table(rows=1, cols=3)
#fill the cells of the header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
#add data (item_id avoids shadowing the built-in id function)
for qty, item_id, desc in records:
    #create row
    row_cells = table.add_row().cells
    #add data in row
    row_cells[0].text = str(qty)
    row_cells[1].text = item_id
    row_cells[2].text = desc

document.add_page_break()

document.save('test2.docx')

In [None]:
#read a docx file
from docx import Document
#read a document file
doc = Document("test2.docx")

#read the contents
#list paragraphs 
print("Paragraphs ...")
print(doc.paragraphs)

Paragraphs ...
[<docx.text.paragraph.Paragraph object at 0x7f471d04cd50>, <docx.text.paragraph.Paragraph object at 0x7f471d04c0d0>, <docx.text.paragraph.Paragraph object at 0x7f471d04cf50>, <docx.text.paragraph.Paragraph object at 0x7f471d04c290>, <docx.text.paragraph.Paragraph object at 0x7f471d04c310>, <docx.text.paragraph.Paragraph object at 0x7f471d04c2d0>, <docx.text.paragraph.Paragraph object at 0x7f471d04c390>, <docx.text.paragraph.Paragraph object at 0x7f471d04ced0>]


In [None]:
#print list of runs

print("\nlist of runs..")
print(doc.paragraphs[0].runs)



list of runs..
[<docx.text.run.Run object at 0x7f471d044c10>]


In [None]:
#print the text of the first paragraph

print(doc.paragraphs[0].text)

Document with text and image


In [None]:
#print the entire document

for p in doc.paragraphs:
  print(p.text)

Document with text and image
We will be adding an image in this document bold and some text also and make it italic.
Let us add Heading, level 1
Check the intense quote
First item in the Bullet
First item in the ordered list





In [None]:
doc = Document("def.docx")
for p in doc.paragraphs:
  print(p.text)

Public Relations(PR)
Public Relations(PR) is a strategic communication process that builds mutually beneficaial relationships between organization and their public. 


Public Relations(PR) in Media
Public Relations(PR) in media is defined as the practice of deliberately managing the release and spread of information between an individual or an organization(such as business, government agency, or nonprofit organization) and the public.
Strategies used for Public Relations(PR)
1)Developing Relations with Media and Influncers-  by interacting with journalists, media outlets, and influencers on social media they extend the public relationship beyond the press release pitch. This increases the chances of gaining media coverage with right outlets.
2)Facilitate the Brand Message through Content Distribution- by distributing original and branded content audience is a tried and true social media tactic. In addition, it allows public relations teams to strategically facilitate the brand message 