## File I/O

### Reading from a file

For convenience, we will call our file *pi.txt*; it contains the digits of pi split across several lines.

To read the entire file, we can do this:

In [1]:
with open('pi.txt') as f: # Opens our file and we can refer to it using f
    contents = f.read() # Read everything in the file as a string using the read method
print(contents)

3.14159265358979323846264338327950288419716939937510582097494
4592307816406286208998628034825342117067982148086513282306647
0938446095505822317253594081284811174502841027019385211055596
4462294895493038196442881097566593344612847564823378678316527
1201909145648566923460348610454326648213393607260249141273724
5870066063155881748815209209628292540917153643678925903600113
3053054882046652138414695194151160943305727036575959195309218
6117381932611793105118548074462379962749567351885752724891227
9381830119491298336733624406566430860213949463952247371907021
7986094370277053921717629317675238467481846766940513200056812
7145263560827785771342757789609173637178721468440901224953430
1465495853710507922796892589235420199561121290219608640344181
5981362977477130996051870721134999999837297804995105973173281
6096318595024459455346908302642522308253344685035261931188171
0100031378387528865875332083814206171776691473035982534904287
5546873115956286388235378759375195778185778053217122680661300
19278766

The `open()` function needs one argument, the name of the file you want to open, and returns an object representing the file. Given a bare file name, Python looks for the file in the current working directory, which is usually the directory the script or notebook was started from.

The `with` keyword closes the file automatically once the block ends, even if an error occurs inside it. We can also open and close files explicitly as follows, though the `with` form is usually preferred.

In [2]:
f = open('pi.txt')
# Do stuff on f
f.close()
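If the file really has to be closed by hand, wrapping the work in `try`/`finally` gives roughly the same guarantee that `with` provides. The sketch below is only an illustration: it creates its own throwaway file with `tempfile` (an assumption, so that it runs anywhere) instead of relying on *pi.txt*.

```python
import os
import tempfile

# Create a small throwaway file so the sketch runs on its own
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("3.14\n")

f = open(path)
try:
    first_line = f.readline()   # do stuff on f
finally:
    f.close()                   # runs even if the read above raises

print(first_line.rstrip())
print(f.closed)                 # True: the file was closed explicitly

os.remove(path)                 # clean up the throwaway file
```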

The text file does not have to be in the same directory; you can also provide an absolute or relative *file path*. For example:

In [3]:
path = r"C:\Users\shubh\Desktop\PyCk\Lecture 4\pi.txt"
with open(path) as f:
    print(f.read())

3.14159265358979323846264338327950288419716939937510582097494
4592307816406286208998628034825342117067982148086513282306647
0938446095505822317253594081284811174502841027019385211055596
4462294895493038196442881097566593344612847564823378678316527
1201909145648566923460348610454326648213393607260249141273724
5870066063155881748815209209628292540917153643678925903600113
3053054882046652138414695194151160943305727036575959195309218
6117381932611793105118548074462379962749567351885752724891227
9381830119491298336733624406566430860213949463952247371907021
7986094370277053921717629317675238467481846766940513200056812
7145263560827785771342757789609173637178721468440901224953430
1465495853710507922796892589235420199561121290219608640344181
5981362977477130996051870721134999999837297804995105973173281
6096318595024459455346908302642522308253344685035261931188171
0100031378387528865875332083814206171776691473035982534904287
5546873115956286388235378759375195778185778053217122680661300
19278766
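A relative path is resolved against the current working directory, while an absolute path works from anywhere. The sketch below illustrates both, using `pathlib` and a temporary directory so it does not depend on any real folder; the file name `demo.txt` is made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    previous_cwd = os.getcwd()
    os.chdir(tmp)                          # pretend the notebook was started here
    try:
        Path("demo.txt").write_text("3.14\n")

        # Relative path: resolved against the current working directory
        with open("demo.txt") as f:
            via_relative = f.read()

        # Absolute path: works no matter what the working directory is
        absolute_path = Path.cwd() / "demo.txt"
        with open(absolute_path) as f:
            via_absolute = f.read()
    finally:
        os.chdir(previous_cwd)             # restore before the temp dir is removed

print(via_relative == via_absolute)        # True
```

Building paths with `pathlib.Path` and the `/` operator also avoids hard-coding backslashes or forward slashes for a particular operating system.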

We often need to process a file line by line, which can be done using a `for` loop:

In [4]:
"""
Printing each line as-is would leave an extra blank line after every line of output.
This is because each line in the file ends with a \n character, and print adds its
own \n. To avoid this, strip the line with rstrip() or pass the end="" keyword argument.
"""
with open('pi.txt') as f: # Open the file
    for line in f: # Loop over all the lines
        print(line.rstrip()) # or
        # print(line, end="")

3.14159265358979323846264338327950288419716939937510582097494
4592307816406286208998628034825342117067982148086513282306647
0938446095505822317253594081284811174502841027019385211055596
4462294895493038196442881097566593344612847564823378678316527
1201909145648566923460348610454326648213393607260249141273724
5870066063155881748815209209628292540917153643678925903600113
3053054882046652138414695194151160943305727036575959195309218
6117381932611793105118548074462379962749567351885752724891227
9381830119491298336733624406566430860213949463952247371907021
7986094370277053921717629317675238467481846766940513200056812
7145263560827785771342757789609173637178721468440901224953430
1465495853710507922796892589235420199561121290219608640344181
5981362977477130996051870721134999999837297804995105973173281
6096318595024459455346908302642522308253344685035261931188171
0100031378387528865875332083814206171776691473035982534904287
5546873115956286388235378759375195778185778053217122680661300
19278766

To store the lines as a list, we can use the `readlines()` method:

In [5]:
with open('pi.txt') as f:
    lines = f.readlines()
lines[:3]

['3.14159265358979323846264338327950288419716939937510582097494\n',
 '4592307816406286208998628034825342117067982148086513282306647\n',
 '0938446095505822317253594081284811174502841027019385211055596\n']

In [6]:
# Alternative: split on "\n" so the newline characters are dropped
with open('pi.txt') as f:
    lines = f.read().split("\n")
lines[:3]

['3.14159265358979323846264338327950288419716939937510582097494',
 '4592307816406286208998628034825342117067982148086513282306647',
 '0938446095505822317253594081284811174502841027019385211055596']
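One caveat with splitting on `"\n"`: if the text ends with a newline, the result contains a trailing empty string. The built-in `str.splitlines()` method avoids this. A minimal sketch on a short made-up string rather than the real file:

```python
text = "3.14\n15\n92\n"            # stand-in for f.read() on a file ending in a newline

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)                    # ['3.14', '15', '92', '']
print(by_splitlines)               # ['3.14', '15', '92']
```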

To store all the digits of π in the file as a single string, we can append each line to a variable. Using the `lines` variable from above:

In [7]:
pi_string = ""
for line in lines:
    pi_string += line.strip() 
    
pi_string

'3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201
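The same result can be written more compactly with `str.join()`, which concatenates all the stripped lines in one step. The sketch below uses a short made-up list in place of the real `lines` variable:

```python
lines = ["3.14159\n", "26535\n", "89793\n"]   # stand-in for the list read from the file

# Strip each line, then join the pieces with an empty separator
pi_string = "".join(line.strip() for line in lines)

print(pi_string)        # 3.141592653589793
print(len(pi_string))   # 17 characters, including the decimal point
```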

### Writing to an Empty File

To write text to a file, you need to call `open()` with a second argument telling
Python that you want to write to the file.

In [8]:
with open('my_file.txt', 'w') as f: 
    f.write('Hello World\n')
    f.write('Good Night\n')

In [9]:
# with open('my_file.txt', 'w') as f: 
#     f.write('Adding another line\n')
# # This would overwrite our file, and the previous data would be lost

Note that we can optionally pass `'r'` to open the file in read mode, but that is the default, so we can omit it if we only want to read from the file.

### Appending to a File

We can use the `'a'` value of the second argument to open the file in append mode, where Python doesn’t erase the contents of the file before returning the file object. If the file doesn’t exist yet, Python will create an empty file for you.

In [10]:
with open('my_file.txt', 'a') as f:
    f.write('Oh I did not overwrite the file!\n')
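To see the difference between the `'w'` and `'a'` modes side by side, here is a small sketch that works inside a temporary directory, so no real file is touched. The strings written are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.txt")

    with open(path, "w") as f:          # creates the file
        f.write("Hello World\n")

    with open(path, "w") as f:          # 'w' again: old contents are erased
        f.write("Second write\n")
    with open(path) as f:
        after_overwrite = f.read()

    with open(path, "a") as f:          # 'a': new text goes after existing contents
        f.write("Appended line\n")
    with open(path) as f:
        after_append = f.read()

print(repr(after_overwrite))   # 'Second write\n'
print(repr(after_append))      # 'Second write\nAppended line\n'
```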

## The Python `pickle` Module: Persisting Objects in Python

You may sometimes need to share or transfer complex object hierarchies out of your session or save the internal state of your objects to a disk or database for later use. 

To accomplish this, you can use a process called **serialization**, which is fully supported by the Python standard library thanks to the `pickle` module.

### Serialization in Python

The **serialization** process is a way to convert a data structure into a linear form that can be stored or transmitted over a network.

In Python, serialization allows you to take a complex object structure and transform it into a stream of bytes that can be saved to a disk or sent over a network.

Example class whose objects we'll learn to pickle:

In [11]:
class example_class:
    a_number = 35
    a_string = "hey"
    a_list = [1, 2, 3]
    a_dict = {"first": "a", "second": 2, "third": [1, 2, 3]}
    a_tuple = (22, 23)

my_object = example_class()

Importing the module:

In [12]:
import pickle

### Functions in the Python `pickle` module

### `pickle.dumps()` and `pickle.loads()`

These functions pickle an object to a **bytes object** or load an object from one.

Let's see how this works:

In [13]:
my_pickled_object = pickle.dumps(my_object)  # Pickling the object
print(f"This is my pickled object:\n{my_pickled_object}\n")

This is my pickled object:
b'\x80\x03c__main__\nexample_class\nq\x00)\x81q\x01.'



In [14]:
my_unpickled_object = pickle.loads(my_pickled_object)  # Unpickling the object
print(f"This is a_dict of the unpickled object:\n{my_unpickled_object.a_dict}\n")
print(f"This is a_list of the unpickled object:\n{my_unpickled_object.a_list}\n")

This is a_dict of the unpickled object:
{'first': 'a', 'second': 2, 'third': [1, 2, 3]}

This is a_list of the unpickled object:
[1, 2, 3]
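One detail worth noting about the example above: `a_number`, `a_dict`, and the rest are class attributes, so they live on `example_class` itself rather than in the pickled bytes. Pickle stores an instance's own attributes plus a reference to its class by name, and the class definition must be available when unpickling. A minimal sketch of the difference, using a hypothetical `Point` class that is not part of the original material:

```python
import pickle

class Point:
    kind = "class-level"      # class attribute: lives on the class, not in the pickle

    def __init__(self, x, y):
        self.x = x            # instance attributes: stored in the pickle
        self.y = y

original = Point(1, 2)
payload = pickle.dumps(original)     # a bytes object
restored = pickle.loads(payload)

print(restored.x, restored.y)        # instance state comes from the pickle
print(restored.kind)                 # looked up on the class definition
print(b"class-level" in payload)     # False: the class attribute was never serialized
```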



### `pickle.dump()` and `pickle.load()`

These functions pickle an object to a **file object** or load an object from one.

Let's see how this works:

In [15]:
with open('my_pickled_object', 'wb') as f:
    pickle.dump(my_object, f) # Pickling the object

In [16]:
with open('my_pickled_object', 'rb') as f:
    my_unpickled_object = pickle.load(f) # Unpickling the object

print(f"This is a_dict of the unpickled object:\n{my_unpickled_object.a_dict}\n")
print(f"This is a_list of the unpickled object:\n{my_unpickled_object.a_list}\n")

This is a_dict of the unpickled object:
{'first': 'a', 'second': 2, 'third': [1, 2, 3]}

This is a_list of the unpickled object:
[1, 2, 3]
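Any binary file-like object works with `pickle.dump()` and `pickle.load()`, and several objects can be written to the same stream one after another. The sketch below uses an in-memory `io.BytesIO` buffer as a stand-in for a file opened in `'wb'` or `'rb'` mode; the objects pickled are made up for the illustration.

```python
import io
import pickle

buffer = io.BytesIO()               # in-memory stand-in for a file opened with 'wb'
pickle.dump({"a": 1}, buffer)       # first object
pickle.dump([1, 2, 3], buffer)      # second object, written right after the first

buffer.seek(0)                      # rewind, as if reopening the file with 'rb'
first = pickle.load(buffer)         # each call reads back exactly one object
second = pickle.load(buffer)

print(first)                        # {'a': 1}
print(second)                       # [1, 2, 3]
```

Calling `pickle.load()` once more on the exhausted stream raises `EOFError`, which is one way to detect that there is nothing left to read.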



Most object types can be pickled but not all. Read more about pickling objects [here](https://realpython.com/python-pickle-module/).

## The Python `json` Module

JSON (JavaScript Object Notation) is a lightweight text format that has become a de facto standard for exchanging data between programs.

Read more about JSON and how it works [here](https://www.json.org/json-en.html).

### Differences between `json` and `pickle` serialization:

- In CPython, pickle has an optimized C implementation, and its format is specific to Python. JSON is derived from JavaScript, but despite its name it is language-independent and can be read by most programming languages.
- Pickle uses a binary serialization format, whereas JSON is a human-readable text format.
- JSON supports only a limited set of types (objects, arrays, strings, numbers, booleans, and null), so it cannot serialize and deserialize every Python object directly. Pickle handles a much wider range, including lists, tuples, dictionaries, and instances of user-defined classes, though not every object can be pickled. Classes and functions are pickled by reference to their name rather than by value.
- Pickle's serialization can be faster than JSON's, though the difference depends on the data and the protocol used.
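The difference in supported types can be seen by sending the same data through both modules. This is a small sketch with a made-up `record` dictionary: pickle preserves the tuple, JSON turns it into a list, and JSON refuses a `set` outright.

```python
import json
import pickle

record = {"point": (22, 23), "tags": ["a", "b"]}

via_json = json.loads(json.dumps(record))
via_pickle = pickle.loads(pickle.dumps(record))

print(via_json["point"])     # [22, 23]  (tuple became a list)
print(via_pickle["point"])   # (22, 23)  (tuple preserved)

# Sets have no JSON equivalent, so json.dumps raises TypeError
try:
    json.dumps({"seen": {1, 2}})
    json_error = None
except TypeError as exc:
    json_error = type(exc).__name__

print(json_error)            # TypeError
```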

Example dictionary that we will learn to serialize to JSON:

In [17]:
data = {
    "user1": {
        "name": "Shubham Lohiya",
        "age": 21,
        "Place": "Mumbai",
        "Interests": ['Star Wars', 'Food', 'Tech']
    },
    "user2": {
        "name": "Eeshaan Jain",
        "age": 19,
        "Place": "Pune",
        "Interests": ["Cubing", "Piano", "Machine Learning"]
    }
}

Importing the module:

In [18]:
import json

### Functions in the `json` module

### `json.dumps()` and `json.loads()`

These functions serialize an object to a JSON **string** or load an object from one.

Let's see how this works:

In [19]:
encoded_data = json.dumps(data)
print(encoded_data)

{"user1": {"name": "Shubham Lohiya", "age": 21, "Place": "Mumbai", "Interests": ["Star Wars", "Food", "Tech"]}, "user2": {"name": "Eeshaan Jain", "age": 19, "Place": "Pune", "Interests": ["Cubing", "Piano", "Machine Learning"]}}
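For output meant to be read by people, `json.dumps()` accepts an `indent` argument that pretty-prints the result, and `sort_keys=True` orders the keys alphabetically. A minimal sketch on a reduced, made-up version of the dictionary:

```python
import json

small_data = {"user1": {"name": "A", "age": 21}}

pretty = json.dumps(small_data, indent=2, sort_keys=True)
print(pretty)
# {
#   "user1": {
#     "age": 21,
#     "name": "A"
#   }
# }
```

The indented string decodes to the same dictionary as the compact one; only the whitespace differs.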


In [20]:
decoded_data = json.loads(encoded_data)
print(decoded_data["user1"]["name"])

Shubham Lohiya


### `json.dump()` and `json.load()`

These functions serialize an object as JSON to a **file object** or load an object from one.

Let's see how this works:

In [21]:
with open('encoded_data.json', 'w') as f:
    json.dump(data, f) # Encoding the object

In [22]:
with open('encoded_data.json', 'r') as f:
    decoded_data = json.load(f) # Decoding the object
    
print(decoded_data["user1"]["name"])

Shubham Lohiya
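A round trip through JSON does not always return exactly the object that went in. In particular, JSON object keys are always strings, so integer dictionary keys come back as strings. The sketch below shows this with an in-memory `io.StringIO` buffer standing in for a file opened in text mode; the `scores` dictionary is made up for the illustration.

```python
import io
import json

scores = {1: "first", 2: "second"}   # integer keys

buffer = io.StringIO()               # in-memory stand-in for a file opened with 'w'
json.dump(scores, buffer)

buffer.seek(0)                       # rewind, as if reopening the file with 'r'
loaded = json.load(buffer)

print(loaded)                        # {'1': 'first', '2': 'second'}
print(loaded == scores)              # False: the keys are now strings
```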
