## File I/O Basics

**Data Types**:
- *Text*: Unicode characters (e.g., '12345' stored as five characters in UTF-8/ASCII)
- *Binary*: Raw bytes (e.g., the number 12345 stored as its byte representation)

**File Types**:
- *Text Files*: Human-readable (e.g., source code, config files)
- *Binary Files*: Not human-readable (e.g., images, multimedia)

**Process**:
1. *Open*: Connects program to file
2. *Read/Write*: Handles data based on type
3. *Close*: Completes operations, frees resources

In [None]:
Writing to a File ---> `.txt` extension (Notepad).

In [None]:
# Case 1 - File Not Present

f = open('sample.txt', 'w')
f.write('Hello world')
f.close()

# 'w' mode creates the file in the current directory if it does not exist

In [None]:
# Error: File Closed ---> writing to a closed file raises ValueError

f.write('hello')

In [None]:
# Write multiline strings to a file

f = open('sample1.txt', 'w')
f.write('hello world')
f.write('\n how are you?')
f.close()

In [None]:
# Case 2 - File Overwrite in Write Mode ('w')

f = open('sample.txt', 'w')
f.write('salman khan')
f.close()

# Note: Opening in 'w' mode replaces all existing content in 'sample.txt'.

## How `open()` Works in Python

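`open()` returns a file object that the program uses to read from or write to a file on disk.

Example: `f = open('sample.txt', 'w')` - opens 'sample.txt' in write mode.

**File Access & RAM Interaction:** `open()` asks the operating system for a handle to the file on disk (persistent storage, not ROM). The whole file is not loaded into RAM; Python keeps a small in-memory buffer for it.

**File Operations & Modes:** Modes (e.g., 'w' for write) determine how the file can be used (`f.write('salman khan')` writes into the RAM buffer first).

**Data Integrity:** `f.close()` flushes any buffered changes to disk and releases the file handle.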

In [None]:
`open()`  ---> File object + RAM buffer.

`write()` ---> Modify RAM buffer.

`close()` ---> Flush buffer to disk, release file.

**Source:** [Python Documentation](https://docs.python.org/3/library/functions.html#open).
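
The open/write/close flow can be observed by checking the file size on disk. This is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name `buffer_demo.txt`. The size seen while the file is still open depends on Python's buffering and is not guaranteed, so only the size after `close()` should be relied on:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()                      # throwaway directory
path = os.path.join(tmp, 'buffer_demo.txt')   # hypothetical file name

f = open(path, 'w')
f.write('salman khan')                        # Lands in the in-memory buffer first
size_while_open = os.path.getsize(path)       # Often 0 here: not flushed yet

f.close()                                     # Flushes the buffer to disk
size_after_close = os.path.getsize(path)      # 11: all characters are on disk

shutil.rmtree(tmp)                            # Clean up the demo directory
print(size_while_open, size_after_close)
```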

In [None]:
# Problem with 'w' mode ---> Overwrites file content.
# To preserve existing content, use 'a' mode (append).

f = open('/content/sample1.txt', 'a')
f.write('\nI am fine')
f.close()
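
The difference between `'w'` and `'a'` can be checked side by side. This is a small sketch, assuming a temporary directory and a hypothetical file name `mode_demo.txt` so that no existing file is touched:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()                     # throwaway directory
path = os.path.join(tmp, 'mode_demo.txt')    # hypothetical file name

f = open(path, 'w')      # Creates the file
f.write('first')
f.close()

f = open(path, 'w')      # 'w' truncates the file: 'first' is gone
f.write('second')
f.close()

f = open(path, 'a')      # 'a' keeps existing content and writes at the end
f.write('\nthird')
f.close()

f = open(path, 'r')
content = f.read()
f.close()

shutil.rmtree(tmp)       # Clean up the demo directory
print(content)
```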

In [None]:
# Write Multiple Lines to a File

L = ['hello\n','hi\n','how are you\n', 'I am fine']

f = open('/content/sample.txt', 'w')
f.writelines(L) # Writes each string as-is; newlines are not added automatically
f.close()

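When you call `f.close()`, it serves two main purposes:

1. **Resource Management:**
- Flushes buffered writes to disk and releases the file handle and its memory.
- Crucial for large/multiple files (the OS limits how many files can be open).

2. **Data Safety:**
- Data still sitting in the buffer can be lost if the program exits abnormally before closing.
- An open file may stay locked for other programs on some systems.

*Always call `f.close()` after file operations so that data is saved and resources are released.*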

## Reading from Files

1. **`read()`**: Reads all content into a single string. Efficient for small files.

   **Pros**: Simple. **Cons**: Memory-heavy for large files.

2. **`readline()`**: Reads one line at a time. Good for large files and sequential processing.

   **Pros**: Memory-efficient. **Cons**: Slower for full content access.

In [None]:
# `read()` Usage

f = open('/content/sample.txt', 'r')
s = f.read()
print(s)
f.close()

# NOTE: In text mode, read() always returns a str.
#       Numbers and other types must be converted explicitly after reading.

hello
hi
how are you
I am fine


In [None]:
# Read up to n chars

f = open('/content/sample.txt', 'r')
s = f.read(10)
print(s)
f.close()

hello
hi
h


In [None]:
# Using `readline()`

f = open('/content/sample.txt', 'r')
print(f.readline(), end='') # Avoid auto newline
print(f.readline(), end='')
f.close()

hello
hi


In [None]:
`read()` Method:

Smaller files    ---> loads entire content.

Immediate access ---> full data available.

Memory use       ---> risky for large files.

`readline()` Method:

Large files      ---> processes line-by-line.

Memory-efficient ---> avoids full file load.

Handles big datasets ---> avoids running out of memory.

In [None]:
# Print Every Line with readline() ---> Loop until readline() returns '' (end of file).

f = open('/content/sample.txt', 'r')
while True:
  data = f.readline()
  if data == '':
    break
  else:
    print(data, end='')
f.close()

hello
hi
how are you
I am fine
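
A file object can also be iterated directly, which yields one line per step without an explicit `readline()` call or end-of-file check. This is a minimal sketch that counts lines this way, assuming a temporary directory and a hypothetical file name `lines_demo.txt`:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()                      # throwaway directory
path = os.path.join(tmp, 'lines_demo.txt')    # hypothetical file name

f = open(path, 'w')
f.writelines(['hello\n', 'hi\n', 'how are you\n', 'I am fine'])
f.close()

line_count = 0
f = open(path, 'r')
for line in f:           # One line per iteration; the whole file is never loaded
    line_count += 1
f.close()

shutil.rmtree(tmp)       # Clean up the demo directory
print(line_count)        # 4
```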

## Context Manager (`with`)

Efficient resource management (e.g., files).

`with` ensures auto cleanup, no manual file close needed.
  
**Purpose of `with` Statement**
- **File Management**: Handles file operations (read/write).
- **Resource Release**: Auto-closes files, freeing system resources.
  
**Avoids**
- **Resource Leaks**: A forgotten `close()` leaves file handles open; `with` closes them for you
- **File Locking**: An unclosed file can stay locked for other programs
  
**Benefits**:
- **Automated Cleanup**: Ensures auto-closure of files
- **Exception Handling**: Closes files if exceptions occur
- **Readability**: Clarifies file access scopes
- **Reliability**: Reduces bugs, ensures robust resource management
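
The exception-handling benefit can be checked with a short experiment. This is a sketch that deliberately raises an error inside the `with` block; the file name `with_demo.txt` and the simulated `RuntimeError` are assumptions made for the demo:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'with_demo.txt')   # hypothetical file name

    try:
        with open(path, 'w') as f:
            f.write('partial data')
            raise RuntimeError('simulated failure')   # Error inside the block
    except RuntimeError:
        pass

    closed_after_error = f.closed    # True: `with` closed the file anyway

    with open(path, 'r') as check:
        saved = check.read()         # Buffered text was flushed when closing

print(closed_after_error, saved)
```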

In [None]:
# `with` Statement

with open('/content/sample1.txt', 'w') as f:
  f.write('selmon bhai')

In [None]:
f.write('hello')

ValueError: I/O operation on closed file.

In [None]:
# `f.readline()`

with open('/content/sample.txt', 'r') as f:
  print(f.readline())

hello



In [None]:
# Reading 10 Characters at a Time

with open('sample.txt', 'r') as f:
    print(f.read(10))  # First 10 chars
    print(f.read(10))  # Next 10 chars
    print(f.read(10))  # Next 10 chars
    print(f.read(10))  # Next 10 chars
    # Each `print(f.read(10))` reads next 10 chars sequentially.
    
# The file object keeps a current position; each `read()` resumes where the previous one stopped.

hello
hi
h
ow are you

I am fine



## File Processing Strategy for Large Files

*Crucial for files > RAM.*

**Chunk-Based Processing**
- Process in chunks, not all at once. e.g., 10 GB file, 8 GB RAM ---> 2000 chars/chunk.

**Advantages**
- *Memory Efficiency*: RAM used for one chunk only.
- *Scalability*: Handles files > RAM.
- *Performance*: Avoids system slowdowns.
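
Chunk-based processing can be wrapped in a small generator. This is a sketch, not a definitive implementation: `read_in_chunks` is a hypothetical helper name, the chunk size of 2000 characters follows the example above, and the demo file is created in a temporary directory:

```python
import os
import tempfile

def read_in_chunks(file_obj, chunk_size=2000):
    """Yield successive chunks of at most chunk_size characters."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:        # Empty string means end of file
            break
        yield chunk

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'chunk_demo.txt')   # hypothetical file name
    with open(path, 'w') as f:
        f.write('hello world ' * 1000)           # 12,000 characters

    total_chars = 0
    chunk_count = 0
    with open(path, 'r') as f:
        for chunk in read_in_chunks(f):
            total_chars += len(chunk)            # Only one chunk is held at a time
            chunk_count += 1

print(chunk_count, total_chars)                  # 6 12000
```

Because the generator reads each chunk exactly once and stops on the empty string, no data is skipped and memory use stays bounded by `chunk_size`.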

In [None]:
# Purpose: Create a larger sample file (1000 x 'hello world ' = 12,000 chars) for chunked reading.

big_L = ['hello world ' for i in range(1000)]

with open('big.txt', 'w') as f:
  f.writelines(big_L)

In [None]:
with open('big.txt', 'r') as f:
  chunk_size = 10
  while True:
    chunk = f.read(chunk_size) # Read each chunk exactly once
    if chunk == '':            # Empty string ---> end of file
      break
    print(chunk, end='***')

# Handles large files, processes in chunks, avoiding memory overload.
# Libraries like Pandas, Keras can also process data in chunks/batches.

hello worl***d hello wo***rld hello ***world hell***o world he***llo world ***hello worl***d hello wo*** ... (output truncated)

In [None]:
# Seek and Tell Function

with open('sample.txt', 'r') as f:
    f.seek(15)         # Move to offset 15 (counted from 0)
    print(f.read(10))  # Read 10 chars
    print(f.tell())    # Position after read
    print(f.read(10))  # Read next 10 chars
    print(f.tell())    # New position

e you
I am
25
 fine
30


In [None]:
`seek` ---> Moves the file position to a given offset.
       ---> Like dragging the YouTube red line to jump to a precise point.
       ---> The next read/write starts from that offset.

`tell` ---> Returns the current file position.
       ---> Acts as a marker showing where the next read/write will happen.
       ---> Provides feedback without changing position.

In [None]:
# Seek during write

with open('sample.txt', 'w') as f:
    f.write('Hello')
    f.seek(0)          # Cursor to start
    f.write('Xa')      # Overwrite 'He' ---> 'Xa'
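
Reading the file back shows what seeking during a write does: only the characters at the new position are replaced and the rest stay in place. This is a small sketch using a temporary directory and a hypothetical file name `seek_demo.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'seek_demo.txt')   # hypothetical file name

    with open(path, 'w') as f:
        f.write('Hello')
        f.seek(0)           # Back to the start of the file
        f.write('Xa')       # Replaces 'He'; 'llo' is left untouched

    with open(path, 'r') as f:
        content = f.read()

print(content)              # Xallo
```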

## Limitations of Text Mode

- **Binary Files**: Text mode decodes bytes as characters, so it cannot read non-text data (e.g., images, binaries).
- **Data Type Efficiency**: Text mode stores only strings; other types (integers, floats, lists, tuples) must be converted to and from `str` by hand.

**Binary Files**: 
- Contain raw bytes that are not encoded text.
- Must be opened in binary mode (`'rb'`, `'wb'`) instead.

**Non-Textual Data**:
- Cannot be passed to `write()` directly in text mode.
- Must be converted to `str` first or stored in a binary format.

**Structured Data**:
- Lists, tuples, and dicts lose their structure when stored as plain strings.
- Need serialization (e.g., JSON, pickle) to be restored reliably.

In [None]:
# Read Binary File in Text Mode ---> Fails (the bytes cannot be decoded as text)

with open('screenshot1.png', 'r') as f:
  f.read()

UnicodeDecodeError: ignored

In [None]:
# Binary File I/O

with open('screenshot1.png', 'rb') as f:          # Read binary
    with open('screenshot_copy.png', 'wb') as wf: # Write binary
        wf.write(f.read())

In [None]:
# Working with a Large Binary File
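
For a large binary file, a single `read()` may not fit in memory, so the copy can be done chunk by chunk. This is a minimal sketch under stated assumptions: the file names, the 1024-byte `CHUNK_SIZE`, and the generated sample data are all made up for the demo:

```python
import os
import tempfile

CHUNK_SIZE = 1024   # Bytes per read; an arbitrary value chosen for the demo

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'source.bin')   # hypothetical file names
    dst = os.path.join(tmp, 'copy.bin')

    with open(src, 'wb') as f:
        f.write(bytes(range(256)) * 40)     # 10,240 bytes of sample data

    with open(src, 'rb') as rf, open(dst, 'wb') as wf:
        while True:
            chunk = rf.read(CHUNK_SIZE)     # Returns b'' at end of file
            if not chunk:
                break
            wf.write(chunk)

    copied_size = os.path.getsize(dst)
    with open(src, 'rb') as a, open(dst, 'rb') as b:
        identical = a.read() == b.read()    # Verify the copy (fine for a small demo)

print(copied_size, identical)               # 10240 True
```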

In [None]:
# Working with Different Data Types

with open('sample.txt', 'w') as f:
    f.write(5)

# Error: write() accepts only strings in text mode; convert first, e.g. str(5).

TypeError: write() argument must be str, not int

In [None]:
with open('sample.txt', 'w') as f:
  f.write('5')

In [None]:
with open('sample.txt', 'r') as f:
  print(int(f.read()) + 5) # convert read() output to int

10


In [None]:
# More Complex Data

d = {
    'name':'nitish',
     'age':33,
     'gender':'male'
}

with open('sample.txt', 'w') as f:
  f.write(str(d))

In [None]:
with open('sample.txt', 'r') as f:
  print(dict(f.read()))

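# Error: str ---> dict (dict() cannot parse the text back into a dictionary)

ValueError: dictionary update sequence element #0 has length 1; 2 is required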

In [None]:
# Text-based Limitations for Complex Data Storage:

1. Storage    ---> Plain text files are ideal for simple textual data.
                   Complex data (e.g., Python dicts) is structured as key-value pairs.

2. Conversion ---> Saving a dict with `write()` requires converting it to a string first.
                  `{'name': 'John', 'age': 30}` ---> `"{'name': 'John', 'age': 30}"`
                   The result is plain text; Python no longer sees its structure or types.

3. Retrieval  ---> Reading returns a string; it must be parsed to rebuild the original dict.
                   Hand-written parsing is error-prone.

# NOTE: for Simple Data use text files; for Complex Data use serialization libraries or binary formats.
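
The retrieval problem can be seen without touching a file. The sketch below uses `ast.literal_eval` from the standard library as one possible way to parse the text produced by `str(d)`. The notebook itself does not use this function; it appears here only to illustrate the extra parsing step, and JSON is the more common choice:

```python
import ast

d = {'name': 'nitish', 'age': 33, 'gender': 'male'}

text = str(d)                       # What write(str(d)) stores in the file
print(type(text))                   # <class 'str'>: the structure is just text now

restored = ast.literal_eval(text)   # Parses Python literals only, never runs code
print(restored['age'] + 1)          # 34
```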

## JSON Serialization & Deserialization

**Serialization**:

Convert Python data ---> JSON.

`json.dumps()`

Human-readable & machine-parsable.

**Deserialization**:

Convert JSON ---> Python.

`json.loads()`

Manipulate JSON data in Python.
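
A quick round trip with the two functions named above, as a minimal sketch with made-up sample data:

```python
import json

d = {'name': 'nitish', 'age': 33, 'marks': [23, 14, 34]}

text = json.dumps(d)          # Python dict ---> JSON string
restored = json.loads(text)   # JSON string ---> Python dict

print(text)
print(restored == d)          # True
```

`dumps`/`loads` work with strings, while `dump`/`load` take a file object and write to or read from the file directly.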

## What is JSON?

JavaScript Object Notation

Widely adopted in Web apps, APIs, data interchange.

Simple syntax, supports key-value pairs, arrays, nested objects.

```json
{
  "d": {
    "results": [
      {
        "_metadata": {
          "type": "Employee Details. Employee"
        },
        "UserID": "E12012",
        "RoleCode": "35"
      }
    ]
  }
}
```

*JSON is a widely-used text format across languages.*

In [None]:
# JSON Serialization

# List to JSON
import json

L = [1, 2, 3, 4]

with open('demo.json', 'w') as f:
  json.dump(L, f) # Serialize L to 'demo.json'

In [None]:
# Dict to JSON
d = {
    'name':'nitish',
     'age':33,
     'gender':'male'
}

with open('demo.json', 'w') as f:
  json.dump(d, f, indent=4) # Serialize dict d with indentation

In [None]:
# Deserialization

import json

with open('demo.json', 'r') as f:
  d = json.load(f)
  print(d)
  print(type(d))

{'name': 'nitish', 'age': 33, 'gender': 'male'}
<class 'dict'>


`Serialization` and `Deserialization` convert complex data (lists, dicts, nested dicts, tuples) to/from JSON. Sets are not supported by `json` by default; convert them (e.g., to a list) first.

`Serialization`: Complex ---> JSON (for storage).

`Deserialization`: JSON ---> Original (for retrieval).

Handles complex data efficiently, overcoming string-based limitations.

In [None]:
# Serialize/Deserialize Tuple

import json

t = (1, 2, 3, 4, 5)

with open('demo.json', 'w') as f:
  json.dump(t, f)

In [None]:
# Note: Serialization/Deserialization

Serialize tuple   ---> List (using `dump`)

Deserialize       ---> List (not tuple)

Need tuple later? ---> Explicit conversion required
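
The tuple behaviour can be confirmed in memory with `dumps`/`loads`, which follow the same type mapping as `dump`/`load`. A short sketch:

```python
import json

t = (1, 2, 3, 4, 5)

restored = json.loads(json.dumps(t))   # Tuple is written as a JSON array
print(type(restored))                  # <class 'list'>

t_again = tuple(restored)              # Explicit conversion back to a tuple
print(t_again == t)                    # True
```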

In [None]:
# Serialize/Deserialize Nested Dict

d = {
    'student':'nitish',
     'marks':[23, 14, 34, 45, 56]
}

with open('demo.json', 'w') as f:
  json.dump(d, f)

## Serializing & Deserializing Custom Objects

In [None]:
class Person:

  def __init__(self, fname, lname, age, gender):
    self.fname = fname
    self.lname = lname
    self.age = age
    self.gender = gender

# Print format:
# Name: {fname} {lname}
# Age: {age}
# Gender: {gender}

In [None]:
person = Person('Nitish', 'Singh', 33, 'male')

The `json` module serializes built-in types natively (e.g., dicts, lists, strings, numbers).

*Custom classes need explicit, custom serialization.*

In [None]:
# String Representation

import json

def show_object(person):
  if isinstance(person, Person):
    return "{} {} age -> {} gender -> {}".format(person.fname, person.lname, person.age, person.gender)

with open('demo.json', 'w') as f:
  json.dump(person, f, default=show_object)

In [None]:
# Dictionary Representation

import json

def show_object(person):
  if isinstance(person, Person):
    return {'name':person.fname + ' ' + person.lname, 'age':person.age, 'gender':person.gender}

with open('demo.json', 'w') as f:
  json.dump(person, f, default=show_object, indent=4)

In [None]:
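# indent attribute ---> `indent=4` pretty-prints the JSON with 4-space indentation.

# As a dict ---> the object is stored as a JSON object with name, age, and gender keys.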

In [None]:
# Deserializing JSON

import json

with open('demo.json', 'r') as f:
  d = json.load(f)
  print(d)
  print(type(d))

{'name': 'Nitish Singh', 'age': 33, 'gender': 'male'}
<class 'dict'>
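
Loading the JSON returns a plain dict, not a `Person`. If the object is needed again, it has to be rebuilt explicitly. The sketch below shows one way to do that; `person_from_dict` is a hypothetical helper, and it assumes the `name` field holds a first and last name separated by a single space:

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

def person_from_dict(d):
    """Hypothetical helper: rebuild a Person from the dict layout used above."""
    fname, lname = d['name'].split(' ', 1)   # Assumes 'first last'
    return Person(fname, lname, d['age'], d['gender'])

text = json.dumps({'name': 'Nitish Singh', 'age': 33, 'gender': 'male'})

person = person_from_dict(json.loads(text))
print(type(person).__name__, person.fname, person.age)   # Person Nitish 33
```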


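Until now, we've stored only a **representation** of a custom object (a string or a dict) in a chosen format.

**Limitation:** Loading that JSON gives back a dict or string, not the original object; its class and methods are not restored, so another program cannot use it directly as an object.

**Solution:** Convert the object itself to a binary byte stream (pickling) so that Python can rebuild it later.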

## Pickling and Unpickling

In [None]:
+-------------------------------------------+----------------------------------------------+
|                 Pickling                  |               Unpickling                     |
+-------------------------------------------+----------------------------------------------+
|                                           |                                              |
| Serialize Python objects to byte stream.  | Deserialize byte stream to original objects. |
|                                           |                                              |
| Byte stream compactly represents objects. | Reconstructs objects/data structures.        |
|                                           |                                              |
| Enables storage/transmission of objects.  | Restores objects for use by Python.          |
|                                           |                                              |
+------------------------------------------------------------------------------------------+
|                                        Purpose                                           |
+------------------------------------------------------------------------------------------+
|                                           |                                              |
| Convert objects to portable byte format.  | Restore objects from byte format.            |
|                                           |                                              |
| Save/load data, caching, IPC.             | Save/load data, caching, IPC.                |
|                                           |                                              |
+------------------------------------------------------------------------------------------+
|                                      Applications                                        |
+------------------------------------------------------------------------------------------+
|                                           |                                              |
| Save/load complex data.                   | Restore complex data.                        |
|                                           |                                              |
| Cache objects.                            | Rebuild cached objects.                      |
|                                           |                                              |
| Transmit objects over networks.           | Handle transmitted objects.                  |
|                                           |                                              |
+------------------------------------------------------------------------------------------+

In [None]:
class Person:

  def __init__(self, name, age):
    self.name = name
    self.age = age

  def display_info(self):
    print('Hi my name is', self.name, 'and I am ', self.age, 'years old')

In [None]:
p = Person('nitish', 33)

In [None]:
# Pickle Dump

import pickle
with open('person.pkl', 'wb') as f:
  pickle.dump(p, f)

In [None]:
# Pickle Load

import pickle
with open('person.pkl', 'rb') as f:
  p = pickle.load(f)

p.display_info()

Hi my name is nitish and I am  33 years old


In [None]:
Obj  ---> Bin File

Send ---> Extract ---> Use

Works like original

## Pickle vs JSON

In [None]:
+-----------------------------------------------+-------------------------------------------------+
|                    Pickle                     |                      JSON                       |
+-----------------------------------------------+-------------------------------------------------+
|                                               |                                                 |
| Binary format; Python-specific.               | Text-based; cross-platform.                     |
|                                               |                                                 |
| Non-human-readable, Python-only.              | Human-readable, interoperable.                  |
|                                               |                                                 |
| Potential security risks with untrusted data. | Safer for untrusted data.                       |
|                                               |                                                 |
| Efficient for complex Python structures.      | Ideal for web APIs, configs, and data exchange. |
|                                               |                                                 |
+-----------------------------------------------+-------------------------------------------------+
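
A few of the differences in the table can be observed directly. This is a minimal sketch with made-up sample data; it uses only built-in types so that both modules can handle the same input:

```python
import json
import pickle

data = {'point': (1, 2), 'tags': ['a', 'b']}

as_json = json.dumps(data)        # str: human-readable text
as_pickle = pickle.dumps(data)    # bytes: Python-specific binary format

from_json = json.loads(as_json)
from_pickle = pickle.loads(as_pickle)   # Only unpickle data from a trusted source

print(type(as_json).__name__, type(as_pickle).__name__)   # str bytes
print(type(from_json['point']).__name__)                   # list
print(type(from_pickle['point']).__name__)                 # tuple
```

Pickle preserves the Python-specific tuple, while JSON maps it to an array that comes back as a list.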