
# File Handling and I/O in Python : -



File handling in Python allows you to create, read, write, update, and delete files directly from programs. It provides a way to store and access data permanently beyond program execution. Python uses the built-in open() function for file operations.

File Operations : -

1. Creating and Opening Files – Using open(filename, mode)

2. Reading from a File – Using read(), readline(), readlines()

3. Writing to a File – Using write(), writelines()

4. Closing a File – Using close() to free resources

5. Appending to a File – Using 'a' mode

6. Deleting Files – Using os.remove()



File Opening Modes : -

| Mode   | Description                                          |
| ------ | ---------------------------------------------------- |
| `'r'`  | Read-only (default); error if file does not exist    |
| `'w'`  | Write mode; creates file or overwrites existing file |
| `'a'`  | Append mode; creates file if it does not exist       |
| `'x'`  | Creates new file; error if file already exists       |
| `'b'`  | Binary mode (e.g., `'rb'`, `'wb'`)                   |
| `'t'`  | Text mode (default)                                  |
| `'r+'` | Read and write; file must exist, contents kept       |
| `'w+'` | Write and read; creates file or truncates existing   |
| `'a+'` | Append and read; creates file if it does not exist   |
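
The differences between these modes are easiest to see side by side. The following is a minimal sketch, assuming a scratch file called modes_demo.txt (a hypothetical name) inside a temporary directory, so nothing in your working folder is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical scratch file

    # 'x' creates the file, but only if it does not exist yet
    with open(path, "x") as f:
        f.write("first\n")

    # A second 'x' on the same path fails instead of overwriting
    try:
        with open(path, "x"):
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

    # 'a' keeps the existing content and adds to the end
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # 'w' discards the existing content before writing
    with open(path, "w") as f:
        f.write("third\n")
    with open(path, "r") as f:
        after_write = f.read()

print(x_failed)
print(repr(after_append))
print(repr(after_write))
```

Here 'a' preserves what was already in the file, while 'w' silently replaces it, which is why 'x' is a safer choice when overwriting would be a mistake.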


File Handling Methods/Functions : -

| Method             | Description                                |
| ------------------ | ------------------------------------------ |
| `open()`           | Opens or creates a file                    |
| `close()`          | Closes the file                            |
| `read(size)`       | Reads specified number of characters/bytes |
| `readline()`       | Reads a single line                        |
| `readlines()`      | Reads all lines as a list                  |
| `write(string)`    | Writes string to file                      |
| `writelines(list)` | Writes multiple lines                      |
| `seek(pos)`        | Moves file pointer to specific position    |
| `tell()`           | Returns current cursor position            |
| `flush()`          | Flushes internal buffer                    |
| `truncate(size)`   | Resizes the file to given size             |
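
A few of the methods in this table are worth trying together. This is a small sketch, assuming a scratch file named lines_demo.txt (hypothetical) in a temporary directory; it shows that writelines() does not add newline characters on its own and how read(size), readline() and readlines() each continue from the current position.

```python
import os
import tempfile

lines = ["alpha", "beta", "gamma"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines_demo.txt")  # hypothetical scratch file

    # writelines() writes each string as-is, so we add "\n" ourselves
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)

    with open(path, "r") as f:
        first_two = f.read(2)        # the first 2 characters
        rest_of_line = f.readline()  # the rest of the first line, newline included
        remaining = f.readlines()    # every line that is left, as a list

print(repr(first_two))
print(repr(rest_of_line))
print(remaining)
```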


File Handling Code Examples :


In [1]:
# Creating and Writing to a File :

file = open("example.txt", "w")
file.write("Hello, this is a sample text file.\n")
file.write("This is line 2.")
file.close()

In [2]:
# Reading a File :

file = open("example.txt", "r")
print(file.read())        # Reads entire file
file.close()

Hello, this is a sample text file.
This is line 2.


In [3]:
# Reading Line by Line :

file = open("example.txt", "r")
print(file.readline())    # Reads first line
print(file.readline())    # Reads next line
file.close()

Hello, this is a sample text file.

This is line 2.


In [4]:
# Reading all Lines as a list :

file = open("example.txt", "r")
lines = file.readlines()
for line in lines:
    print(line.strip())
file.close()

Hello, this is a sample text file.
This is line 2.


In [5]:
# Appending to a File :

file = open("example.txt", "a")
file.write("\nThis line is added using append mode.")
file.close()

In [6]:
# Using 'with' Statement (Best Practice) :
# Automatically closes the file after use, avoiding manual close() calls.

with open("example.txt", "r") as file:
    data = file.read()
    print(data)

Hello, this is a sample text file.
This is line 2.
This line is added using append mode.


In [7]:
# File Pointer Position :

with open("example.txt", "r") as file:
    print(file.tell())   # Prints current position
    file.read(5)         # Reads 5 chars
    print(file.tell())   # Prints new position

0
5


In [8]:
# Truncating a File :

with open("example.txt", "w") as file:
    file.write("1234567890")
    file.truncate(5)  # File will have only first 5 characters

In [9]:
# Deleting a File :

import os
if os.path.exists("example.txt"):
    os.remove("example.txt")
else:
    print("File not found.")



---



Seek and Tell :

In Python, seek() and tell() are file handling methods used to manipulate and track the file pointer position.

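* tell(): Returns the current position of the file pointer. In binary mode this is the number of bytes from the start of the file; in text mode it is an opaque number that is only meaningful when passed back to seek(). It helps determine where reading or writing will occur next.

* seek(offset, whence=0): Moves the file pointer to a specific position.

  * offset: Number of bytes to move (in text mode, use 0 or a value previously returned by tell()).

  * whence: Reference position (0=start, 1=current, 2=end). In text mode, seeking is generally only allowed relative to the start, apart from seek(0, 2) to jump to the end of the file.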

These methods are essential for random file access, allowing precise control over reading and writing data within a file.
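
To make the three whence values concrete, here is a minimal sketch. It assumes a scratch file named seek_demo.bin (hypothetical) and uses binary mode, where offsets are plain byte counts and all three reference positions are allowed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seek_demo.bin")  # hypothetical scratch file

    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(3)                   # 3 bytes from the start (whence=0)
        a = f.read(2)               # reads bytes 3 and 4
        pos_after_read = f.tell()   # pointer is now at byte 5
        f.seek(2, 1)                # skip 2 bytes forward from the current position
        b = f.read(1)               # reads byte 7
        f.seek(-3, 2)               # 3 bytes before the end of the file
        c = f.read()                # reads the last 3 bytes

print(a, pos_after_read, b, c)
```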



---



Summary :

* Use open() with proper mode to access a file.

* Always use close() or a with statement to free resources.

* Modes control whether you read, write, append, or create a file.

* Python provides many methods (read(), write(), seek(), etc.) for fine control.

* Files can be text or binary, depending on mode.

---

Problems On Working with Text File : -

While working with text files in Python, several problems may occur:

* FileNotFoundError arises if the file does not exist or the path is incorrect.

* PermissionError occurs when the program lacks read/write permissions.

* Data written to a file can be lost if the file is not closed properly, because buffered output may never reach the disk.

* Encoding issues, like UnicodeDecodeError, occur when a file is read with an encoding that does not match the one it was written with.

* Accidentally opening a file in write mode ('w') erases its existing data.

* Reading a large file all at once with read() or readlines() may exhaust memory.

* Concurrent file access by multiple processes may lead to race conditions.

* Improper handling of newline characters can cause formatting issues.

Most of these problems can be avoided with proper error handling, an explicit encoding, and the correct file mode.
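
Several of these problems can be guarded against in a few lines. The following is one possible sketch, not the only approach: count_lines is a hypothetical helper, and the file names are made up for the demonstration. It uses with so the file is always closed, passes an explicit encoding, reads line by line to keep memory use low, and catches the most common exceptions.

```python
import os
import tempfile

def count_lines(path):
    """Return the number of lines in a text file, or None if it cannot be read."""
    try:
        # An explicit encoding avoids depending on the platform default
        with open(path, "r", encoding="utf-8") as f:
            # Iterating over the file reads one line at a time,
            # so a large file is never loaded into memory all at once
            return sum(1 for _ in f)
    except FileNotFoundError:
        print("File not found:", path)
    except PermissionError:
        print("No permission to read:", path)
    except UnicodeDecodeError:
        print("File is not valid UTF-8:", path)
    return None

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "notes.txt")  # hypothetical file names
    with open(good, "w", encoding="utf-8") as f:
        f.write("one\ntwo\nthree\n")
    line_count = count_lines(good)
    missing_result = count_lines(os.path.join(tmp, "missing.txt"))

print(line_count, missing_result)
```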



---



Working with Binary Files : -

Binary files store data as raw bytes that are not meant to be interpreted as text. They are often used for images, audio, video, executables, or serialized objects. Python handles them using the built-in open() function with a mode containing 'b', such as 'rb' (read binary) or 'wb' (write binary); reads then return bytes objects instead of strings.

Key Points :  

* Open in binary mode: 'rb', 'wb', 'ab' (append), 'rb+' (read/write).

* Read/Write methods: read(), readline() (rarely used for binary), write().

* Use with to ensure automatic file closing.

* Binary mode prevents Python from interpreting data as text, avoiding encoding issues.

In [10]:
# Code Example :
# Writing and Reading Binary File :

# Writing binary data
data = bytes([65, 66, 67, 68])  # ASCII values for A, B, C, D
with open("binary_file.bin", "wb") as f:
    f.write(data)

# Reading binary data
with open("binary_file.bin", "rb") as f:
    content = f.read()
    print(content)  # b'ABCD'


b'ABCD'




---



In [None]:
# Code Example for Copying a File In Binary Mode :

# Copy file in binary mode
with open("source_file.bin", "rb") as src:   # Open source in read-binary mode
    with open("copy_file.bin", "wb") as dest:  # Open destination in write-binary mode
        dest.write(src.read())  # Read all binary data and write to new file

print("File copied successfully in binary mode!")


Why binary mode for copying?

* Prevents encoding/decoding issues (important for images, videos, executables).

* Preserves exact byte data without modification.

* Works for any file type (not just text).

💡 If the file is very large, reading it in chunks is better to save memory :

In [None]:
# Copy large file in chunks
with open("big_source.bin", "rb") as src, open("big_copy.bin", "wb") as dest:
    while chunk := src.read(4096):  # Read 4KB at a time
        dest.write(chunk)




---



Working with Other DataTypes in File Handling :

When working with file handling in Python, you’re not limited to just text or binary files — you can also store and retrieve other data types like integers, floats, lists, tuples, dictionaries, or even entire objects.
This is done using serialization (converting Python objects into storable formats) and deserialization (reading them back).



1. Using str() and eval() / int() / float() for Simple Data Types :

  * You can write numbers or strings to a text file and convert them back while reading.

In [15]:
# Writing integers and floats to a file
with open("data.txt", "w") as f:
    f.write(str(100) + "\n")
    f.write(str(45.67) + "\n")

# Reading and converting back to numbers
with open("data.txt", "r") as f:
    int_val = int(f.readline())
    float_val = float(f.readline())

print("Integer:", int_val, "Float:", float_val)


Integer: 100 Float: 45.67


2. Storing Complex Data Types as Strings :

  * You can store lists, tuples, or dictionaries by converting them to strings and then back with eval() or ast.literal_eval() (safer).

In [14]:
import ast

# Writing list and dictionary to file
with open("complex_data.txt", "w") as f:
    f.write(str([1, 2, 3, 4]) + "\n")
    f.write(str({"name": "Yash", "age": 21}) + "\n")

# Reading and converting back
with open("complex_data.txt", "r") as f:
    my_list = ast.literal_eval(f.readline())
    my_dict = ast.literal_eval(f.readline())

print("List:", my_list)
print("Dictionary:", my_dict)


List: [1, 2, 3, 4]
Dictionary: {'name': 'Yash', 'age': 21}


3. Using pickle for Any Python Object :

  * pickle can serialize and deserialize most Python objects (lists, sets, class instances, etc.) in binary format. A few things, such as open file handles and generators, cannot be pickled.

In [13]:
import pickle

# Example object
student = {"name": "Yash", "age": 21, "marks": [85, 90, 95]}

# Writing object to binary file
with open("student.pkl", "wb") as f:
    pickle.dump(student, f)

# Reading object from binary file
with open("student.pkl", "rb") as f:
    loaded_student = pickle.load(f)

print("Loaded Object:", loaded_student)


Loaded Object: {'name': 'Yash', 'age': 21, 'marks': [85, 90, 95]}


4. Using json for Human-Readable Storage :

  * json works best for storing dictionaries and lists in text form, especially if you need cross-language compatibility.

In [12]:
import json

# Writing
data = {"name": "Yash", "skills": ["Python", "AI"], "score": 98}
with open("data.json", "w") as f:
    json.dump(data, f)

# Reading
with open("data.json", "r") as f:
    loaded_data = json.load(f)

print("JSON Data:", loaded_data)


JSON Data: {'name': 'Yash', 'skills': ['Python', 'AI'], 'score': 98}


Summary Table :

| Data Type           | Method to Store                   | File Mode       | Module Needed |
| ------------------- | --------------------------------- | --------------- | ------------- |
| int, float, str     | `str()` / `int()` / `float()`     | `"w"` / `"r"`   | No            |
| list, tuple, dict   | `str()` + `ast.literal_eval()`    | `"w"` / `"r"`   | `ast`         |
| Most Python objects | `pickle.dump()` / `pickle.load()` | `"wb"` / `"rb"` | `pickle`      |
| JSON-compatible     | `json.dump()` / `json.load()`     | `"w"` / `"r"`   | `json`        |


---

Serialization : -

Serialization is the process of converting a Python object into a format that can be stored or transmitted (such as a file, database, or over a network) and later reconstructed back into the original object.
The reverse process is called deserialization.

Why Serialization is Needed :

* Save program data to a file (for persistence).

* Send Python objects over a network (e.g., API calls, sockets).

* Store complex data structures (lists, dicts, custom objects) in binary or text format.

* Share data between programs (possibly written in different languages).



Serialization Formats :

| Format     | Characteristics                      | Common Python Module    |
| ---------- | ------------------------------------ | ----------------------- |
| **Binary** | Compact, fast, Python-specific       | `pickle`                |
| **JSON**   | Human-readable, language-independent | `json`                  |
| **XML**    | Structured, verbose, cross-platform  | `xml.etree.ElementTree` |
| **CSV**    | Tabular text data                    | `csv`                   |
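
The pickle and json formats are covered by the examples in this section; for the CSV row of the table, a minimal sketch with the standard csv module could look like this. The file name students.csv and the sample rows are assumptions made for illustration.

```python
import csv
import os
import tempfile

rows = [
    {"name": "Yash", "age": 21},
    {"name": "Alice", "age": 25},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "students.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, "r", newline="") as f:
        loaded_rows = list(csv.DictReader(f))

print(loaded_rows)
```

Note that CSV stores everything as text, so the ages come back as strings and have to be converted with int() if numbers are needed.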


In [17]:
# Code Example :
# Serialization Using 'pickle' :

import pickle

data = {"name": "Yash", "age": 21, "skills": ["Python", "AI"]}

# Serialize (dump) to file
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Deserialize (load) from file
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print("Loaded Data:", loaded_data)


Loaded Data: {'name': 'Yash', 'age': 21, 'skills': ['Python', 'AI']}


In [18]:
# Code Example :
# Serialization Using 'json' :

import json

data = {"name": "Yash", "age": 21, "skills": ["Python", "AI"]}

# Serialize to JSON file
with open("data.json", "w") as f:
    json.dump(data, f)

# Deserialize from JSON file
with open("data.json", "r") as f:
    loaded_data = json.load(f)

print("Loaded Data:", loaded_data)


Loaded Data: {'name': 'Yash', 'age': 21, 'skills': ['Python', 'AI']}


In [19]:
# Code Example :
# Serialization Using Multiple Datatypes in One File :

import pickle

int_data = 42
list_data = [1, 2, 3]
dict_data = {"x": 1, "y": 2}

with open("multi.pkl", "wb") as f:
    pickle.dump(int_data, f)
    pickle.dump(list_data, f)
    pickle.dump(dict_data, f)

with open("multi.pkl", "rb") as f:
    i = pickle.load(f)
    l = pickle.load(f)
    d = pickle.load(f)

print(i, l, d)


42 [1, 2, 3] {'x': 1, 'y': 2}


✅ Key Points:

* Serialization = Python object → Storable/transferable format.

* Deserialization = Storable/transferable format → Python object.

* pickle is Python-specific but supports most Python objects, including instances of custom classes.

* json is text-based, more universal, but supports only JSON-compatible types (no custom Python objects directly).
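
The last point can be checked directly. In this sketch, the dictionary contains a set, which has no JSON equivalent, so json.dumps() raises a TypeError; converting the set to a list first is one simple workaround (the sample data is made up).

```python
import json

data = {"name": "Yash", "tags": {"python", "ai"}}  # a set is not a JSON type

try:
    json.dumps(data)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# One workaround: convert unsupported values to a JSON-compatible type first
safe_data = {"name": data["name"], "tags": sorted(data["tags"])}
text = json.dumps(safe_data)

print(error_message)
print(text)
```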



---

De - Serialization : -

Deserialization is the reverse process of serialization — it means converting stored or transmitted data back into a Python object so it can be used in your program.

If serialization is “packing data into a file or string,” deserialization is “unpacking it back into its original form.”

Why Deserialization is Used :

* Load saved program state from a file (like game progress).

* Read data sent over a network.

* Restore objects for further processing.

* Share data between different programs or languages.

In [20]:
# Code Example :
# Deserialization Using 'pickle' :

import pickle

# Assume data.pkl was created earlier using pickle.dump()
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print("Deserialized Data:", loaded_data)


Deserialized Data: {'name': 'Yash', 'age': 21, 'skills': ['Python', 'AI']}


In [21]:
# Code Example :
# Deserialization with 'json' :

import json

# Assume data.json was created earlier using json.dump()
with open("data.json", "r") as f:
    loaded_data = json.load(f)

print("Deserialized Data:", loaded_data)


Deserialized Data: {'name': 'Yash', 'age': 21, 'skills': ['Python', 'AI']}


Key Points:

* Serialization = Object → Storable format.

* Deserialization = Storable format → Object.

* Binary deserialization (pickle) is often faster but is Python-specific and should only be used on data you trust.

* Text-based deserialization (json, xml, csv) is more universal but limited to supported data types.



---



Serialization vs Deserialization In Python :

| **Aspect**                    | **Serialization**                                                                    | **Deserialization**                                                           |
| ----------------------------- | ------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------- |
| **Definition**                | Converting a Python object into a storable or transmittable format (binary or text). | Converting stored/transmitted data back into a Python object.                 |
| **Purpose**                   | Save data to files, send over network, store in DB, share between programs.          | Restore data for use, reload program state, receive and use transmitted data. |
| **Input**                     | Python object (list, dict, custom class object, etc.).                               | Serialized file/string/data.                                                  |
| **Output**                    | Serialized file/string/data.                                                         | Original Python object.                                                       |
| **Binary Example** (`pickle`) | `pickle.dump(obj, file)`                                                             | `pickle.load(file)`                                                           |
| **Text Example** (`json`)     | `json.dump(obj, file)`                                                               | `json.load(file)`                                                             |
| **Pros**                      | Saves exact object structure; can be fast in binary mode.                            | Restores exact objects easily.                                                |
| **Cons**                      | Binary is Python-specific; JSON limited to certain data types.                       | Requires same format & structure as during serialization.                     |


1. Binary Serialization Example (pickle) :

In [22]:
import pickle

# Serialization
data = {"name": "Alice", "age": 25, "skills": ["Python", "ML"]}
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Deserialization
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print("Deserialized:", loaded_data)


Deserialized: {'name': 'Alice', 'age': 25, 'skills': ['Python', 'ML']}


2. Text Serialization Example (json) :

In [23]:
import json

# Serialization
data = {"name": "Bob", "age": 30, "skills": ["JavaScript", "React"]}
with open("data.json", "w") as f:
    json.dump(data, f)

# Deserialization
with open("data.json", "r") as f:
    loaded_data = json.load(f)

print("Deserialized:", loaded_data)


Deserialized: {'name': 'Bob', 'age': 30, 'skills': ['JavaScript', 'React']}


Quick Analogy :

* Serialization → Like packing clothes into a suitcase.

* Deserialization → Like unpacking clothes when you reach your destination.





---



When a tuple is serialized with json, it is stored in the file as a JSON array, because JSON has no tuple type. When we deserialize it, we therefore get a list back, not a tuple. pickle, by contrast, preserves the tuple type.
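
A quick way to check this behaviour is to round-trip a tuple through both modules in memory. In this small sketch (the variable names are arbitrary), the tuple comes back from json as a list, while pickle returns a tuple.

```python
import json
import pickle

point = (1, 2, 3)

from_json = json.loads(json.dumps(point))        # goes through a JSON array
from_pickle = pickle.loads(pickle.dumps(point))  # keeps the original type

print(type(from_json).__name__, from_json)
print(type(from_pickle).__name__, from_pickle)
```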




---



Serialization and Deserialization of a Custom Object or DataType :

1. Using pickle (Binary Format)
pickle can handle almost any Python object, including custom classes.

In [24]:
import pickle

# Custom class
class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f"Student(name={self.name}, age={self.age})"

# Create object
student_obj = Student("Alice", 22)

# --- Serialization ---
with open("student.pkl", "wb") as f:
    pickle.dump(student_obj, f)

# --- Deserialization ---
with open("student.pkl", "rb") as f:
    loaded_student = pickle.load(f)

print("Deserialized Object:", loaded_student)


Deserialized Object: Student(name=Alice, age=22)


✅ Works for most Python objects without extra conversion (the class definition must be importable when unpickling).
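
pickle does have limits: a few kinds of objects, such as generators, open files and thread locks, cannot be pickled. The sketch below uses a generator as an example and shows one way around the problem, which is to convert it to a list first.

```python
import pickle

numbers = (n * n for n in range(3))  # a generator object

try:
    pickle.dumps(numbers)
    pickled_ok = True
except TypeError as exc:
    pickled_ok = False
    print("Cannot pickle:", exc)

# Converting to a list first gives pickle something it can store
restored = pickle.loads(pickle.dumps(list(numbers)))
print(restored)
```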

2. Using json (Text Format)
JSON cannot directly store custom objects — you must convert them to dict form first (and back later).

In [25]:
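import json

# Custom class
class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def to_dict(self):
        return {"name": self.name, "age": self.age}
    @classmethod
    def from_dict(cls, data):
        return cls(data["name"], data["age"])

student_obj = Student("Bob", 23)

# --- Serialization: convert to a dict, then dump as JSON ---
with open("student.json", "w") as f:
    json.dump(student_obj.to_dict(), f)

# --- Deserialization: load the dict, then rebuild the object ---
with open("student.json", "r") as f:
    loaded_student = Student.from_dict(json.load(f))

print("Deserialized Object:", loaded_student.name, loaded_student.age)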



Key Takeaways :

* pickle → Best when you work only in Python and want exact object restoration.

* json → Best for portability across languages but needs manual conversion.

* Custom objects must be handled carefully so attributes are stored & restored correctly.



---



Pickling : -

Pickling is the process of converting a Python object (like lists, dictionaries, or even custom objects) into a byte stream so it can be stored in a file or sent over a network. It is done using the pickle module. This byte stream can later be unpickled (deserialized) to recreate the original object.



In [26]:
import pickle

data = {"name": "Alice", "age": 25, "marks": [90, 85, 88]}

# Serialize (pickle) the object into a file
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)


Unpickling (Deserialization) : -

Unpickling is the reverse of pickling — it reads the byte stream and reconstructs the original Python object.

In [27]:
# Deserialize (unpickle) the object from file
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print(loaded_data)


{'name': 'Alice', 'age': 25, 'marks': [90, 85, 88]}


Key Points :

*  Pickle functions:

    * pickle.dump(obj, file) → Writes pickled object to a binary file.

    * pickle.load(file) → Reads pickled object from a binary file.

    * pickle.dumps(obj) → Returns pickled object as a bytes object.

    * pickle.loads(bytes) → Loads object from a bytes object.

* Pros: Works with complex Python objects, very easy to use.

* Cons: Python-specific, not secure against untrusted data (can execute arbitrary code when unpickling).
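
The file-based functions are shown in the cells above; dumps() and loads() work the same way but in memory. A minimal sketch, reusing the same sample dictionary:

```python
import pickle

data = {"name": "Alice", "age": 25, "marks": [90, 85, 88]}

payload = pickle.dumps(data)      # a bytes object, no file involved
restored = pickle.loads(payload)  # only do this with bytes you trust

print(type(payload).__name__)
print(restored == data)   # same content
print(restored is data)   # but a new, separate object
```

The bytes in payload could be sent over a socket or stored in a database field, as long as the receiving side is also Python and the data comes from a trusted source.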





---

