### SESSION 10 - File Handling, Serialization & Deserialization


### Some Theory

**Types of data used for I/O:**
- Text - '12345' stored as a sequence of Unicode characters
- Binary - 12345 stored as the sequence of bytes of its binary representation

**Hence there are 2 file types to deal with:**
- Text files - for example, all program source code files are text files
- Binary files - images, music, video, exe files

**How file I/O is done in most programming languages (basic steps):**
- Open a file
- Read/Write data
- Close the file
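
The three steps above can be sketched as follows. This is a minimal illustration rather than part of the session's files: the temporary directory and the file name `steps_demo.txt` are assumptions chosen so that the example runs anywhere.

```python
import os
import tempfile

# Hypothetical location: a throwaway directory keeps the sketch self-contained
path = os.path.join(tempfile.mkdtemp(), 'steps_demo.txt')

# Step 1: open the file (here for writing, in text mode)
f = open(path, 'w', encoding='utf-8')
try:
    # Step 2: write data
    f.write('12345')
finally:
    # Step 3: close the file, even if the write fails
    f.close()

# The same three steps for reading
f = open(path, 'r', encoding='utf-8')
try:
    content = f.read()
finally:
    f.close()

print(content)   # 12345
print(f.closed)  # True
```

Closing in a `finally` block is one way to make sure step 3 always happens; the `with` statement shown later in this session does the same thing more compactly.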

#### File Access Modes :

![Access%20file%20mode%20in%20file%20handling.png](attachment:Access%20file%20mode%20in%20file%20handling.png)
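
The cells that follow demonstrate `'w'` and `'a'`. As a complement, here is a small sketch of three other standard modes of the built-in `open()`: `'x'`, `'r+'` and `'a+'`. It assumes these are among the modes listed in the image above; the scratch file name is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch the session's files
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'x' creates a new file and fails if the file already exists
with open(path, 'x') as f:
    f.write('hello world')

try:
    open(path, 'x')
    second_x_failed = False
except FileExistsError:
    second_x_failed = True

# 'r+' opens an existing file for reading and writing without truncating it;
# the file handle starts at the beginning
with open(path, 'r+') as f:
    f.write('J')          # overwrites the first character only
    f.seek(0)
    after_r_plus = f.read()

# 'a+' appends and also allows reading; writes always go to the end
with open(path, 'a+') as f:
    f.write('!')
    f.seek(0)
    after_a_plus = f.read()

print(second_x_failed)  # True
print(after_r_plus)     # Jello world
print(after_a_plus)     # Jello world!
```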

##### Writing to a file :

In [15]:
# case 1 - if the file is not present
f = open('File Handling Files\Write.txt','w')
f.write('The Attack on Titan The Final Season anime premiered on NHK')
f.close()

In [16]:
# write multiline strings
f = open('File Handling Files\WriteMulti.txt','w')
f.write('The Attack on Titan The Final Season anime premiered on NHK')
f.write('\nThe Final Season') # add new line
f.write('\n23-FEB-2023')
f.close()

In [17]:
# case 2 - if the file is already present
f = open('File Handling Files\Write.txt','w')
f.write('New content')
f.close()

In [23]:
# Problem: 'w' mode overwrites the existing content of the file
# Solution: append mode ('a') adds new content at the end without overwriting
f = open('File Handling Files/Write.txt','a') # append mode
f.write('\nNew line added without overwrite')
f.close()

In [24]:
# Writelines 
# add multiple lines at once
L = ['Hello\n','How are you?\n','I am fine\n']
f = open('File Handling Files\WriteLines.txt','w')
f.writelines(L)
f.close()

##### Read File in Python :
- In Python, temporary data that is used locally in a program is stored in variables.
- Larger volumes of data, or data that must outlive the program, are stored in files such as text and CSV files, and Python provides methods to read and write data in those files.

##### Access Modes for Reading a file :

![Access%20file%20mode%20for%20read%20a%20files.png](attachment:Access%20file%20mode%20for%20read%20a%20files.png)

##### File Read Methods :

![File%20read%20methods.png](attachment:File%20read%20methods.png)
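
As a quick comparison, the sketch below calls `read()`, `readline()` and `readlines()` on the same three-line file. It assumes these are the methods listed in the image above; the scratch file is hypothetical and mirrors the content of `WriteLines.txt`.

```python
import os
import tempfile

# Hypothetical scratch file with the same content as WriteLines.txt
path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')
with open(path, 'w') as f:
    f.writelines(['Hello\n', 'How are you?\n', 'I am fine\n'])

with open(path, 'r') as f:
    whole = f.read()        # the entire file as one string

with open(path, 'r') as f:
    first = f.readline()    # one line, including its trailing '\n'
    rest = f.readlines()    # the remaining lines as a list of strings

print(repr(whole))  # 'Hello\nHow are you?\nI am fine\n'
print(repr(first))  # 'Hello\n'
print(rest)         # ['How are you?\n', 'I am fine\n']
```

Note that `readlines()` continues from the current position of the file handle, so it returns only the lines that `readline()` has not consumed yet.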

In [2]:
# Read a Text File
# using the read() method

f = open('File Handling Files\WriteLines.txt','r')
r = f.read()
print(r)
f.close()

Hello
How are you?
I am fine



In [4]:
# reading up to n chars

f = open('File Handling Files\WriteLines.txt','r')
r = f.read(15)
print(r)
f.close()

Hello
How are y


In [8]:
# readline() :
# > reads the file one line at a time: each call returns the next line, starting with the first line of the file.
f = open('File Handling Files\WriteLines.txt','r')
s = f.readline()
s1 = f.readline()
print(s,end='')
print(s1,end='')
f.close()


Hello
How are you?


**[IQ]** 
**When to use the readline() method?**
- When a file is too large to load into memory at once: readline() reads one line at a time, so only the current line is held in memory.
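
A related idiom, shown here as a sketch rather than as part of the original session, is to loop over the file object directly. The file object yields one line per iteration, so memory use stays small in the same way as with `readline()`. The scratch file and its content are assumptions for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file with 1000 short lines
path = os.path.join(tempfile.mkdtemp(), 'many_lines.txt')
with open(path, 'w') as f:
    for i in range(1000):
        f.write('line {}\n'.format(i))

line_count = 0
last_line = None
with open(path, 'r') as f:
    for line in f:                    # one line at a time, never the whole file
        line_count += 1
        last_line = line.rstrip('\n')

print(line_count)  # 1000
print(last_line)   # line 999
```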

In [None]:
# reading the entire file using readline()
# We can use readline() inside a while loop to read the whole file line by line.
# readline() returns an empty string '' at the end of the file, which is when we stop.

f = open('File Handling Files/WriteLines.txt','r')
while True:
    data = f.readline()
    if data == '':  # empty string (not a space) means end of file
        break
    else:
        print(data,end='')
f.close()

Hello
How are you?
I am fine


#### Using Context Manager (with statement) :

In [9]:
#Ex. with write
with open('File Handling Files\WriteLines.txt','w') as f:
    print(f.write('Writing file Eren Yeager'))

24


In [10]:
#Ex. with read
with open('File Handling Files\WriteLines.txt','r') as f:
    print(f.read())

Writing file Eren Yeager


In [12]:
# moving within a file -> each read(10) continues where the previous read stopped
with open('File Handling Files\WriteLines.txt','r') as f:
    print(f.read(10))
    print(f.read(10))

Writing fi
le Eren Ye


##### Reading a big file in chunks :

In [18]:
# benefit? -> a big file can be processed without loading all of it into memory at once
BigFile = ['OPM' for i in range(1000)]

with open('File Handling Files\BigFile.txt','w') as f:
    f.writelines(BigFile)

In [22]:
# loading the file in chunks

with open('File Handling Files/BigFile.txt','r') as f:

    chunk_size = 50

    # read() returns an empty string at the end of the file, which ends the loop
    chunk = f.read(chunk_size)
    while len(chunk) > 0:

        # print the chunk, followed by '---' to mark the chunk boundary
        print(chunk, end='---')

        # load the next chunk
        chunk = f.read(chunk_size)

# The content is the same as reading the whole file, but only 50 characters (one chunk) are held in memory at a time

OPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOP---MOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMOPMO---... (output truncated: 60 chunks of 50 characters, each followed by '---')
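
The chunk-reading idea can also be wrapped in a reusable generator. The sketch below is one possible way to do it; the helper name `read_in_chunks` and the scratch file are hypothetical. It uses `iter()` with a sentinel, which keeps calling `f.read(chunk_size)` until it returns the empty string.

```python
import os
import tempfile
from functools import partial

def read_in_chunks(path, chunk_size=50):
    """Yield successive pieces of a text file, each at most chunk_size characters."""
    with open(path, 'r') as f:
        # iter(callable, sentinel) stops when f.read() returns '' (end of file)
        for chunk in iter(partial(f.read, chunk_size), ''):
            yield chunk

# Hypothetical scratch file with the same content as BigFile.txt
path = os.path.join(tempfile.mkdtemp(), 'big_demo.txt')
with open(path, 'w') as f:
    f.write('OPM' * 1000)

chunks = list(read_in_chunks(path))

print(len(chunks))                       # 60
print(len(chunks[0]))                    # 50
print(''.join(chunks) == 'OPM' * 1000)   # True
```

Collecting the chunks into a list is only done here to check the result; in real use each chunk would be processed and discarded inside the loop.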

#### Tell() Function To Get File Handle Position :
- **tell() method:**
  - The tell() method returns the current file position in a file stream.
- **syntax :** file_object.tell()

In [49]:
with open('File Handling Files\Seek.txt','w') as f:
    print(f.write('First line\n'))
    print(f.write('Writing file Eren Yeager\n'))
    print(f.tell())

with open('File Handling Files\Seek.txt','r') as r:
    print(r.read(5))
    print(r.tell())
    
    print(r.read(5))
    print(r.tell())

11
25
38
First
5
 line
10
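
The write cell above reports position 38, although only 11 + 25 = 36 characters were written. A likely explanation, assuming the notebook was run on Windows, is that text mode stored each `'\n'` as the two bytes `'\r\n'`. The sketch below imitates this on any platform by passing `newline='\r\n'` (an assumption made for the demonstration; the original cell does not set it) and then checks positions in binary mode, where `tell()` is a plain byte offset.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'tell_demo.txt')  # hypothetical file

# newline='\r\n' forces Windows-style line endings regardless of platform
with open(path, 'w', newline='\r\n') as f:
    chars_written = f.write('First line\n') + f.write('Writing file Eren Yeager\n')

with open(path, 'rb') as f:
    f.read(5)
    pos_after_5 = f.tell()   # byte offset after reading 5 bytes
    f.read()
    pos_at_end = f.tell()    # byte offset at the end of the file

print(chars_written)  # 36
print(pos_after_5)    # 5
print(pos_at_end)     # 38
```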


#### Seek() Method : Move File Pointer Position :
- **seek() method:**
  - In Python, the seek() method changes the position of the file handle to a given position.

  - Positions start from 0 (the beginning of the file) and must be non-negative integers.

  - The file handle is like a cursor: it defines where in the file data will be read from or written to.

- **syntax :** file_object.seek(position)
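
`seek()` also accepts an optional second argument, `whence`, which selects the reference point: `os.SEEK_SET` (start, the default), `os.SEEK_CUR` (current position) or `os.SEEK_END` (end of file). The sketch below uses binary mode, because text files only support seeking relative to the start (apart from a few special cases such as `seek(0, os.SEEK_END)`). The scratch file is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')  # hypothetical file
with open(path, 'wb') as f:
    f.write(b'First line\nWriting file Eren Yeager\n')

with open(path, 'rb') as f:
    f.seek(6)                    # absolute position 6 from the start
    word = f.read(4)

    f.seek(-7, os.SEEK_END)      # 7 bytes before the end of the file
    tail = f.read(6)

    f.seek(-6, os.SEEK_CUR)      # step back 6 bytes from the current position
    tail_again = f.read(6)

print(word)        # b'line'
print(tail)        # b'Yeager'
print(tail_again)  # b'Yeager'
```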

In [52]:
# Ex. read 5 characters, then use seek() to move the file handle so that the next read starts from a chosen position
with open('File Handling Files/Seek.txt','r') as f:
    print(f.read(5))
    print(f.seek(0)) # 0 indicates the first byte, which is the beginning of the file.
    print(f.read(5))
    print(f.seek(6)) # move to position 6; the next read would start from there
    print(f.tell())

First
0
First
6
6


In [54]:
# seek() during a write
with open('File Handling Files/Seek.txt','w') as f:
    print(f.write('Second line\n'))
    print(f.seek(0))
    print(f.write('x')) # overwrites the first character: the file now starts with 'xecond line'

12
0
1


#### Problems with working in text mode
- It can't handle binary files such as images
- It only reads and writes strings, so other data types (int/float/list/tuple) must be converted by hand
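
The first problem can be reproduced without a real image. The sketch below writes four bytes of the kind that typically begin a JPEG file (an assumption for illustration) and shows that decoding them as UTF-8 text fails, while binary mode returns them unchanged. The scratch file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fake_image.bin')  # hypothetical file

# Bytes of the kind that typically start a JPEG file; 0xFF is never valid UTF-8
payload = bytes([0xFF, 0xD8, 0xFF, 0xE0])
with open(path, 'wb') as f:
    f.write(payload)

# Text mode tries to decode the bytes into characters and fails
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    text_mode_worked = True
except UnicodeDecodeError:
    text_mode_worked = False

# Binary mode returns the raw bytes untouched
with open(path, 'rb') as f:
    data = f.read()

print(text_mode_worked)  # False
print(data == payload)   # True
```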

In [11]:
# working with a binary file: copy an image byte for byte
# Forward slashes are used because in 'File Handling Files\New_Kilota.jpg' the '\N' starts a
# named-Unicode escape and raises "SyntaxError: (unicode error) ... malformed \N character escape"
with open('File Handling Files/Kilota.jpg',mode='rb') as rb:
    with open('File Handling Files/New_Kilota.jpg',mode='wb') as wb:
        wb.write(rb.read())

In [27]:
# working with other data types
with open('File Handling Files\Datatype.txt','w') as f:
    f.write('5') # write() argument must be str, not int

with open('File Handling Files\Datatype.txt','r') as r:
    #print( r.read() + 5)
    #convert into int
    print(int(r.read()) + 5)
    

10


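In [3]:
# more complex data

d = {
    'name' : 'akash',
    'age' : 21
}

with open('File Handling Files/complexdata.txt','w') as f:
    # f.write(d) would raise TypeError: write() argument must be str, not dict
    # so convert the dictionary to a string first
    f.write(str(d))

In [32]:
with open('File Handling Files/complexdata.txt','r') as f:
    data = f.read()  # read once; a second read() would return '' at the end of the file
    print(data)
    print(type(data))  # the dictionary comes back as a plain string

{'name': 'akash', 'age': 21}
<class 'str'>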


### Serialization and Deserialization :
- Useful for solving the problem of writing complex data types to a file and reading them back.
- **Serialization** - the process of converting Python data types to a storable format, here JSON
- **Deserialization** - the process of converting JSON back to Python data types
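
Both directions can be tried in memory with `json.dumps()` and `json.loads()`, the string-based counterparts of the file-based functions covered below. This is a minimal sketch using the same dictionary as the earlier cells.

```python
import json

d = {'name': 'akash', 'age': 21}

# Serialization: Python dict -> JSON text
text = json.dumps(d)

# Deserialization: JSON text -> Python dict
restored = json.loads(text)

print(text)            # {"name": "akash", "age": 21}
print(type(text))      # <class 'str'>
print(restored == d)   # True
print(type(restored))  # <class 'dict'>
```

Unlike `str(d)`, the JSON text can be turned back into a real dictionary, not just displayed.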

#### What is JSON ?
- JSON stands for JavaScript Object Notation
- It is a universal data format
- JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java™, JavaScript, Perl, Python, and many others.

![JSON.png](attachment:JSON.png)

**Serialization using the json module (after importing the json module) :**
- **json.dump() :**
   - The json module provides a function called dump(), which converts a Python object into JSON and writes it to a file object. It is a slight variant of dumps(), which returns the JSON as a string instead.
- In simple words, dump() is used to write a Python object to a JSON file.
- **Syntax** : 
  - json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
  - Where obj is the Python object to serialize (a string, list, dictionary, etc.) and fp is a file object opened for writing in text mode.
  - indent : adds indentation in the JSON output so that every key-value pair appears on a new line.
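
To see what some of these parameters do without creating a file, the sketch below passes an `io.StringIO` object as `fp`. That in-memory buffer is an assumption made for the demonstration; it stands in for a file opened in text mode.

```python
import io
import json

d = {'name': 'akash', 'age': 21}

# indent=4 puts every key-value pair on its own indented line
pretty = io.StringIO()
json.dump(d, pretty, indent=4)

# separators removes the spaces after ',' and ':'; sort_keys orders the keys
compact = io.StringIO()
json.dump(d, compact, separators=(',', ':'), sort_keys=True)

print(pretty.getvalue())
print(compact.getvalue())  # {"age":21,"name":"akash"}
```

The first `print` shows the dictionary spread over four lines, the same layout that ends up in a file when `indent=4` is used.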

In [1]:
# importing json module 
# with list
import json
L = [1,2,3,4,5]

with open('File Handling Files/demo.json','w') as w:
    json.dump(L,w)

In [9]:
# with dictionary
import json 

d = {
    'name' : 'akash',
    'age' : 21
}

with open('File Handling Files\json_with_dictionary.json','w') as f:
    json.dump(d, f ,indent=4)

**Deserialization using the json module :**
- **json.load():** 
json.load() accepts a file object, parses the JSON data, and returns the corresponding Python object (for example, a dictionary for a JSON object or a list for a JSON array).

- **Syntax :** json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

In [13]:
import json 
with open('File Handling Files/json_with_dictionary.json','r') as r:
    d =json.load(r)
    print(d)
    print(type(d))

# Here we stored a dictionary as JSON and got a dictionary back. Lists and nested dictionaries round-trip the same way; tuples come back as lists, and sets are not supported by JSON.

{'name': 'akash', 'age': 21}
<class 'dict'>
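
Not every Python type survives a JSON round trip unchanged. The sketch below (illustrative values only) shows a tuple coming back as a list, an integer key coming back as a string, and a set being rejected.

```python
import json

original = {'point': (1, 2), 1: 'one', 'ok': True, 'nothing': None}

text = json.dumps(original)
restored = json.loads(text)

print(text)      # {"point": [1, 2], "1": "one", "ok": true, "nothing": null}
print(restored)  # {'point': [1, 2], '1': 'one', 'ok': True, 'nothing': None}

# Sets have no JSON equivalent, so the json module raises TypeError
try:
    json.dumps({1, 2, 3})
    set_supported = True
except TypeError:
    set_supported = False

print(set_supported)  # False
```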


**Serializing and Deserializing custom objects**

In [12]:
class Person:
    def __init__(self,fname,lname,age):
        self.fname = fname
        self.lname = lname
        self.age = age
    
# format to be printed in
# -> Akash Pagi age -> 21 

In [13]:
# we want to dump this object into JSON format
p = Person('Akash','pagi',21) 

**isinstance() function :**
- **Syntax:**
   - **isinstance(object, classinfo)**
   - It takes two arguments, and both are mandatory.
   - The isinstance() function checks whether the object argument is an instance of the classinfo argument, or of a subclass of it.
- Using isinstance(), we can test whether an object/variable is an instance of a specified type or class such as int or list. In the case of inheritance, we can check whether the specified class is a parent class of the object's class.
- isinstance() returns True if the object or variable is of the specified type, otherwise False.
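
A short sketch of these rules follows. The `Student` subclass is hypothetical and exists only to illustrate the inheritance case.

```python
class Person:
    pass

class Student(Person):   # hypothetical subclass used only for this illustration
    pass

s = Student()

print(isinstance(s, Student))          # True: s is an instance of Student
print(isinstance(s, Person))           # True: Student inherits from Person
print(isinstance(Person(), Student))   # False: a parent instance is not a Student
print(isinstance(21, (int, float)))    # True: classinfo may be a tuple of types
print(isinstance('21', int))           # False
```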



In [14]:
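# As a string
# Here we specify how the object is serialized
import json

def Show_object(p):
    if isinstance(p,Person):
        return 'Name is {} {} and my age is {}'.format(p.fname,p.lname,p.age)
    # any other unsupported type: raise instead of silently writing null
    raise TypeError('Object of type {} is not JSON serializable'.format(type(p).__name__))

with open('File Handling Files/Seril_obj.json','w') as w:
    json.dump(p, w, default=Show_object)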

In [20]:
# As a Dictionary
# serializing
import json

def Show_object(p):
    if isinstance(p,Person):
        return {'Name':p.fname +' '+ p.lname ,'age':p.age}

with open('File Handling Files\Seril_obj_Dictionary.json','w') as w:
    json.dump(p, w, default=Show_object, indent=4)

In [23]:
# As a Dictionary
# Deserializing
with open('File Handling Files\Seril_obj_Dictionary.json','r') as r:
    print(json.load(r))

{'Name': 'Akash pagi', 'age': 21}
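
The load above returns a plain dictionary, not a `Person`. One way to get the object back, sketched here under assumptions, is the `object_hook` parameter of `json.load()` / `json.loads()`. The helper names `person_to_dict` and `dict_to_person` and the `'__person__'` marker key are hypothetical choices for this example, not part of the original session.

```python
import json

class Person:
    def __init__(self, fname, lname, age):
        self.fname = fname
        self.lname = lname
        self.age = age

def person_to_dict(obj):
    # Called by json for objects it cannot serialize on its own
    if isinstance(obj, Person):
        return {'__person__': True, 'fname': obj.fname,
                'lname': obj.lname, 'age': obj.age}
    raise TypeError('Object of type {} is not JSON serializable'.format(type(obj).__name__))

def dict_to_person(d):
    # Called by json for every decoded JSON object (dict)
    if d.get('__person__'):
        return Person(d['fname'], d['lname'], d['age'])
    return d

text = json.dumps(Person('Akash', 'pagi', 21), default=person_to_dict)
p2 = json.loads(text, object_hook=dict_to_person)

print(type(p2).__name__)            # Person
print(p2.fname, p2.lname, p2.age)   # Akash pagi 21
```

Storing the attributes separately, with a marker key, is a design choice that lets the object be rebuilt without having to parse a combined name string.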


### Pickling in python :
- **Pickling :**
  - It is the process whereby a **Python object hierarchy is converted into a byte stream (binary file)**
- **Unpickling:** 
  - It is the inverse operation, whereby a **byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy**.
- **importing:**
  - Before writing or reading a pickle file we have to import the pickle module - **import pickle**
  
- **pickle.dump() function :**
  - pickle.dump() is used to **store the object data in a file**. It takes up to 3 main arguments.
    - The first argument is the **object** that you want to store. 
    - The second argument is the file object you get by opening the desired file in **write-binary (wb) mode**. 
    - The third argument is the optional **protocol** argument, an integer that selects the pickle protocol version. The module provides two constants for it - **pickle.HIGHEST_PROTOCOL and pickle.DEFAULT_PROTOCOL**. If it is omitted, the default protocol is used.
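
Pickling and unpickling can be tried in memory with `pickle.dumps()` and `pickle.loads()`, the bytes-based counterparts of `pickle.dump()` and `pickle.load()`. The sketch below uses illustrative data and shows the `protocol` argument in use.

```python
import pickle

data = {'name': 'Saitama', 'age': 33, 'tags': ('hero', 'bald'), 'ids': {1, 2}}

# Pickling: object hierarchy -> byte stream
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Unpickling: byte stream -> object hierarchy
restored = pickle.loads(blob)

print(type(blob))                                         # <class 'bytes'>
print(restored == data)                                   # True
print(type(restored['tags']), type(restored['ids']))      # <class 'tuple'> <class 'set'>
print(pickle.DEFAULT_PROTOCOL <= pickle.HIGHEST_PROTOCOL) # True
```

Unlike JSON, pickle keeps Python-specific types such as tuples and sets intact, at the cost of producing a binary format that only Python can read.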

In [37]:
# Ex.
class Person:
    def __init__(self,name,age):
        self.name = name
        self.age = age
        
    def display_info(self):
        print('Hi my name is',self.name,'and I am', self.age,'years old')

In [38]:
pp = Person('Saitama', 33)

#### Pickling Function/Method :




In [43]:
# importing the pickle & use pickle.dump()
import pickle
with open('File Handling Files\Pick_dump.pkl','wb') as wb:
    pickle.dump(pp, wb)

- **pickle.load() function :**
  - pickle.load() is used to **retrieve pickled data**.
  - Its main argument is the file object that you get by opening the file in **read-binary (rb) mode**.
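
Each call to `pickle.load()` reads exactly one pickled object from the file and raises `EOFError` when nothing is left. The sketch below, which uses a hypothetical scratch file and illustrative data, stores two objects in one file and reads them back in order.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'multi_demo.pkl')  # hypothetical file

# Two separate dump() calls write two objects one after the other
with open(path, 'wb') as wb:
    pickle.dump([1, 2, 3], wb)
    pickle.dump({'name': 'Saitama'}, wb)

objects = []
with open(path, 'rb') as rb:
    while True:
        try:
            objects.append(pickle.load(rb))  # one object per call
        except EOFError:                     # raised at the end of the file
            break

print(objects)  # [[1, 2, 3], {'name': 'Saitama'}]
```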

In [48]:
# importing the pickle & use pickle.load()
import pickle
with open('File Handling Files\Pick_dump.pkl','rb') as rb:
    r = pickle.load(rb)
r.display_info()

Hi my name is Saitama and I am 33 years old