## File I/O Basics
**Data Types:**

- Text: Unicode characters (e.g., the string '12345', stored as one byte per digit in UTF-8/ASCII)
- Binary: Raw bytes (e.g., the number 12345 stored as its binary value)
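A small sketch of the difference, using only built-ins: the string '12345' encodes to five bytes (one per digit), while the integer 12345 fits in two raw bytes. The 2-byte, little-endian layout is an illustrative assumption, not a file-format standard.

```python
# Text: each character of '12345' becomes one byte in UTF-8/ASCII
text = '12345'
text_bytes = text.encode('utf-8')
print(list(text_bytes))        # [49, 50, 51, 52, 53] -> codes for '1'..'5'

# Binary: the number 12345 stored as a raw integer
# (2 bytes, little-endian: an illustrative choice)
number = 12345
number_bytes = number.to_bytes(2, 'little')
print(len(text_bytes), len(number_bytes))   # 5 2

# Converting the raw bytes back gives the original number
restored = int.from_bytes(number_bytes, 'little')
print(restored)                # 12345
```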
  
**File Types:**

- Text Files: Human-readable (e.g., source code, config files)
- Binary Files: Not human-readable (e.g., images, audio, video)
  
**Process:**

- Open: Connects program to file
- Read/Write: Handles data based on type
- Close: Completes operations, frees resources

In [None]:
Writing to a File ---> plain-text `.txt` files (openable in Notepad).

In [16]:
# Case 1 - File Not Present

f = open('sample.txt', 'w')
f.write('Hello world')
f.close()

# 'w' mode creates 'sample.txt' in the current working directory if it doesn't exist.

In [17]:
# Error: File Closed

f.write('hello')

ValueError: I/O operation on closed file.

In [18]:
# Write multiline strings to a file

f = open('sample1.txt', 'w')
f.write('hello world')
f.write('\nhow are you?')  # '\n' starts a new line
f.close()

In [19]:
# Case 2 - File Overwrite in Write Mode ('w')

f = open('sample.txt', 'w')
f.write('salman khan')
f.close()

# Note: Opening in 'w' mode replaces all existing content in 'sample.txt'.

## How `open()` Works in Python
Handles file I/O by connecting the program to a file on disk.

Example: `f = open('sample.txt', 'w')` opens 'sample.txt' in write mode and returns a file object `f`.

**File Access & RAM Interaction:** Data moves between the file on disk (secondary storage, not ROM) and a buffer in RAM.

**File Operations & Modes:** The mode (e.g., 'w' for write) determines which operations are allowed; `f.write('salman khan')` writes into the RAM buffer first.

**Data Integrity:** `f.close()` flushes the buffer to disk and releases the file.
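The buffering described above can be observed with `flush()`, which pushes buffered data out without closing the file. A minimal sketch; the file name `buffer_demo.txt` is hypothetical and is written to the system temp directory. Note that `flush()` hands data to the operating system; it does not by itself guarantee the bytes have physically reached the disk (that needs `os.fsync`).

```python
import os
import tempfile

# Hypothetical file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'buffer_demo.txt')

f = open(path, 'w')
f.write('salman khan')   # lands in Python's in-memory buffer first
f.flush()                # push buffered data to the OS without closing

reader = open(path)      # a second, independent handle sees the flushed data
after_flush = reader.read()
reader.close()

f.close()                # close() flushes anything left, then releases the handle
print(after_flush)       # salman khan
print(f.closed)          # True
```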

In [20]:
# Problem with 'w' mode ---> Overwrites file content.
# To preserve existing content, use 'a' mode (append).

f = open('sample1.txt', 'a')
f.write('\nI am fine')
f.close()
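A side-by-side sketch of 'w' versus 'a' on the same file, reading the result back to confirm. The file name `mode_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical file name, created in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'mode_demo.txt')

f = open(path, 'w')        # 'w' creates the file (or empties it if it exists)
f.write('hello world')
f.close()

f = open(path, 'w')        # 'w' again: 'hello world' is discarded
f.write('salman khan')
f.close()

f = open(path, 'a')        # 'a' keeps existing content and writes at the end
f.write('\nI am fine')
f.close()

f = open(path, 'r')
content = f.read()
f.close()

print(content)
# salman khan
# I am fine
```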

In [21]:
# Write Multiple Lines to a File

L = ['hello\n','hi\n','how are you\n', 'I am fine']

f = open('sample.txt', 'w')
f.writelines(L) # Writes each string as-is; it does NOT add '\n' between items
f.close()
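Because `writelines()` adds no separators, the newlines have to be part of the strings. A short sketch contrasting both cases; the file name `lines_demo.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'lines_demo.txt')  # hypothetical name
lines = ['hello', 'hi', 'how are you']

f = open(path, 'w')
f.writelines(lines)                          # no separators added
f.close()
f = open(path, 'r')
glued = f.read()
f.close()

f = open(path, 'w')
f.writelines(line + '\n' for line in lines)  # add the newline yourself
f.close()
f = open(path, 'r')
separated = f.read()
f.close()

print(repr(glued))      # 'hellohihow are you'
print(repr(separated))  # 'hello\nhi\nhow are you\n'
```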

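When you use `f.close()` to close a file, it serves two main purposes:

1. **Resource Management:**
    
- Releases the OS file handle and the RAM used by the buffer.
- Crucial when working with large or many files (the OS limits how many files can be open at once).

2. **Data Safety:**
- Flushes buffered writes to disk so no data is lost.
- Blocks further reads/writes through `f` (they raise `ValueError`).
  
*Always call `f.close()` after file operations to free resources and keep data safe.*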
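If an exception occurs between `open()` and `close()`, the `close()` call can be skipped. One common pattern, shown here as a sketch, is `try`/`finally`, which guarantees `close()` runs. The simulated `RuntimeError` and the file name `close_demo.txt` are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'close_demo.txt')  # hypothetical name

f = open(path, 'w')
try:
    f.write('hello')
    raise RuntimeError('something went wrong mid-write')  # simulated failure
except RuntimeError as exc:
    print('error:', exc)
finally:
    f.close()            # runs whether or not an exception occurred

print(f.closed)          # True

f = open(path)
saved = f.read()         # the data written before the error was flushed by close()
f.close()
print(saved)             # hello
```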

## Reading from Files

1. **read()**: Reads all content into a single string. Efficient for small files.
- **Pros**: Simple.
- **Cons**: Memory-heavy for large files.

2. **readline()**: Reads one line at a time. Good for large files and sequential processing.
- **Pros**: Memory-efficient.
- **Cons**: Slower for full content access.

In [22]:
# `read()` Usage

f = open('sample.txt', 'r')
s = f.read()
print(s)
f.close()

# NOTE: In text mode ('r'), read() always returns a str.
#       Numbers or other types must be converted after reading.

hello
hi
how are you
I am fine


In [23]:
# Read up to n chars

f = open('sample.txt', 'r')
s = f.read(10)
print(s)
f.close()

hello
hi
h


In [24]:
# Using `readline()`

f = open('sample.txt', 'r')
print(f.readline(), end='') # end='' because each line already ends with '\n'
print(f.readline(), end='')
f.close()

hello
hi


In [None]:
`read()` Method:

Smaller files    ---> loads entire content.

Immediate access ---> full data available.

Memory use       ---> risky for large files.

`readline()` Method:

Large files      ---> processes line-by-line.

Memory-efficient ---> avoids full file load.

Handles datasets ---> avoids running out of memory.

In [25]:
# Read a File Line by Line ---> loop on readline() until it returns '' (end of file).

f = open('sample.txt', 'r')
while True:
  data = f.readline()
  if data == '':
    break
  else:
    print(data, end='')
f.close()

hello
hi
how are you
I am fine
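Iterating directly over the file object is an idiomatic alternative to the `while`/`readline()` loop: it also reads one line at a time, so it suits counting lines in large files. A sketch; the file name `count_demo.txt` is hypothetical and its content mirrors `sample.txt`.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'count_demo.txt')  # hypothetical name

f = open(path, 'w')
f.writelines(['hello\n', 'hi\n', 'how are you\n', 'I am fine'])
f.close()

line_count = 0
f = open(path, 'r')
for line in f:           # the file object yields one line at a time
    line_count += 1
f.close()

print(line_count)        # 4
```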

## Context Manager (`with`)

Efficient resource management (e.g., files).

`with` ensures auto cleanup, no manual file close needed.

**Purpose of `with` Statement**

- **File Management**: Handles file operations (read/write).
- **Resource Release**: Auto-closes files, freeing system resources.

**Avoids**

- **Resource Leaks**: Automatic closure prevents file handles from staying open
- **File Locking**: Prevents files from staying locked by forgotten open handles

**Benefits:**

- **Automated Cleanup**: Ensures auto-closure of files
- **Exception Handling**: Closes files if exceptions occur
- **Readability**: Clarifies file access scopes
- **Reliability**: Reduces bugs, ensures robust resource management
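A sketch of the "closes files if exceptions occur" benefit: an error is raised inside the `with` block, yet the file is still closed and the data written before the error is saved. The simulated `ValueError` and the file name `with_demo.txt` are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'with_demo.txt')  # hypothetical name

try:
    with open(path, 'w') as f:
        f.write('selmon bhai')
        raise ValueError('simulated error inside the with block')
except ValueError:
    pass                 # the error is handled out here...

print(f.closed)          # True: ...and the file was closed anyway

with open(path) as f:
    saved = f.read()
print(saved)             # selmon bhai
```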

In [2]:
# `with` Statement

with open('sample1.txt', 'w') as f:
  f.write('selmon bhai')

In [3]:
f.write('hello')  # f was closed automatically when the with block ended

ValueError: I/O operation on closed file.

In [2]:
# `f.readline()`

with open('sample.txt', 'r') as f:
  print(f.readline())

hello



In [1]:
# Reading 10 Characters at a Time

with open('sample.txt', 'r') as f:
    print(f.read(10))  # First 10 chars
    print(f.read(10))  # Next 10 chars
    print(f.read(10))  # Next 10 chars
    print(f.read(10))  # Next 10 chars
    # Each `print(f.read(10))` reads next 10 chars sequentially.
    
# The file object keeps a current position; each read() continues where the previous one stopped.

hello
hi
h
ow are you

I am fine



## File Processing Strategy for Large Files

*Crucial for files larger than the available RAM.*

**Chunk-Based Processing**
- Process in chunks, not all at once. e.g., a 10 GB file on a machine with 8 GB RAM can be read 2000 chars at a time, so only one small chunk is in memory.

**Advantages**
- Memory Efficiency: RAM used for one chunk only.
- Scalability: Handles files > RAM.
- Performance: Avoids system slowdowns.
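The chunking idea can be wrapped in a small generator so the rest of the code just loops over chunks. A minimal sketch; `read_in_chunks` and the file name `chunk_demo.txt` are hypothetical, and the chunk size of 2000 characters follows the example above.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=2000):
    """Yield successive chunks of at most chunk_size characters (hypothetical helper)."""
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:      # '' signals end of file
                break
            yield chunk

path = os.path.join(tempfile.gettempdir(), 'chunk_demo.txt')  # hypothetical name
with open(path, 'w') as f:
    f.write('hello world ' * 1000)     # 12,000 characters

total_chars = 0
num_chunks = 0
for chunk in read_in_chunks(path, chunk_size=2000):
    total_chars += len(chunk)          # only one chunk is held in memory here
    num_chunks += 1

print(num_chunks, total_chars)         # 6 12000
```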

In [1]:
# Purpose: Create a sample 'big' file (1000 x 'hello world ') for the chunking demo.

big_L = ['hello world ' for i in range(1000)]

with open('big.txt', 'w') as f:
  f.writelines(big_L)

In [1]:
with open('big.txt', 'r') as f:
  chunk_size = 10
  while True:
    chunk = f.read(chunk_size)  # Read the next chunk exactly once
    if not chunk:               # '' means end of file
      break
    print(chunk, end='***')

# Handles large files, processes in chunks, avoiding memory overload.
# Libraries like Pandas, Keras use chunk-based data processing.
# Prints all 1,200 chunks (12,000 chars / 10 per chunk); output truncated below.

hello worl***d hello wo***rld hello ***world hell***o world he***llo world ***hello worl***d hello wo***...

In [1]:
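# Seek and Tell Functions
# The output below is consistent with Windows, where each '\n' is stored as '\r\n' (2 bytes) on disk;
# on Linux/macOS the same code prints different text and positions.

with open('sample.txt', 'r') as f:
    f.seek(15)         # Move to position 15 (counted in bytes on disk, starting at 0)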
    print(f.read(10))  # Read 10 chars
    print(f.tell())    # Position after read
    print(f.read(10))  # Read next 10 chars
    print(f.tell())    # New position

are you
I 
26
am fine
33


In [None]:
`seek(offset)` ---> Moves the file cursor to the given position.
               ---> Like dragging YouTube's red progress bar to a timestamp.
               ---> Later reads/writes start from that position.

`tell()`       ---> Returns the current cursor position.
               ---> Acts as a marker of where you are in the file.
               ---> Does not move the cursor.
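In text mode the numbers used by `seek()` and `tell()` are opaque (they track bytes on disk, which is why Windows line endings shift them). Binary mode gives plain byte offsets on every platform. A small sketch; the file name `seek_demo.bin` is hypothetical and the content mirrors `sample.txt`.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'seek_demo.bin')  # hypothetical name
with open(path, 'wb') as f:
    f.write(b'hello\nhi\nhow are you\nI am fine')   # newlines stay single bytes

with open(path, 'rb') as f:
    f.seek(15)             # jump to byte offset 15 (0-based)
    first = f.read(10)     # bytes 15..24
    pos_after = f.tell()   # position right after the read
    f.seek(0)              # rewind to the start
    start = f.read(5)

print(first)               # b'e you\nI am'
print(pos_after)           # 25
print(start)               # b'hello'
```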

In [1]:
# Seek during write

with open('sample.txt', 'w') as f:
    f.write('Hello')
    f.seek(0)          # Cursor to start
    f.write('Xa')      # Overwrites 'He' with 'Xa' ---> file now contains 'Xallo'
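To confirm that writing after `seek(0)` overwrites characters rather than inserting them, the file can be read back. A sketch; the file name `overwrite_demo.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'overwrite_demo.txt')  # hypothetical name

with open(path, 'w') as f:
    f.write('Hello')
    f.seek(0)          # back to the start (0 is always a valid text-mode offset)
    f.write('Xa')      # replaces the first two characters, does not insert

with open(path) as f:
    result = f.read()

print(result)          # Xallo
```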

## Limitations of Text Mode
- **Binary Files**: Text mode decodes bytes as characters (e.g., UTF-8), so images and other binary files fail to decode or get corrupted.
- **Data Types**: `write()` accepts only `str`; integers, floats, lists, and tuples must be converted first, and everything reads back as `str`.

**Binary Files:**
- Contain raw bytes, not encoded text.
- Must be opened in binary mode (`'rb'`, `'wb'`).

**Non-Textual Data:**
- Cannot be written directly in text mode.
- Requires conversion (e.g., `str()`) or binary handling.

**Structured Data:**
- Types like integers, floats, lists, tuples lose their type when stored as text.
- Need conversion back after reading (e.g., `int()`, `float()`).
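A self-contained sketch of text mode failing on binary data. Instead of a real image, it writes the 8-byte PNG signature to a hypothetical file `fake_image.bin`; the first byte, `0x89`, is not a valid leading byte in UTF-8, so decoding fails, while binary mode reads it fine.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'fake_image.bin')  # hypothetical name

# The PNG signature; 0x89 cannot start a UTF-8 character
with open(path, 'wb') as f:
    f.write(b'\x89PNG\r\n\x1a\n')

try:
    with open(path, 'r', encoding='utf-8') as f:   # text mode tries to decode
        f.read()
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

with open(path, 'rb') as f:                        # binary mode returns raw bytes
    raw = f.read()

print(error_name)    # UnicodeDecodeError
print(raw[:4])       # b'\x89PNG'
```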

In [1]:
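# Read Binary File in text mode ('r')
# Intended point: decoding binary data as text typically raises UnicodeDecodeError.
# Here 'screenshot1.png' did not exist, so FileNotFoundError was raised instead.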

with open('screenshot1.png', 'r') as f:
  f.read()

FileNotFoundError: [Errno 2] No such file or directory: 'screenshot1.png'

In [None]:
# Binary File I/O

with open('screenshot1.png', 'rb') as f:          # Read binary
    with open('screenshot_copy.png', 'wb') as wf: # Write binary
        wf.write(f.read())

In [None]:
# Working with a Large Binary File
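The chunk-based strategy from earlier applies to binary files too: copy a large file piece by piece instead of calling `read()` once. A sketch under stated assumptions: the file names are hypothetical, the source is 1 MB of random bytes, and the 64 KiB chunk size is an arbitrary, tunable choice.

```python
import os
import tempfile

tmp = tempfile.gettempdir()
src = os.path.join(tmp, 'big_source.bin')   # hypothetical file names
dst = os.path.join(tmp, 'big_copy.bin')

# Create a sample binary file (1 MB of random bytes)
with open(src, 'wb') as f:
    f.write(os.urandom(1_000_000))

chunk_size = 64 * 1024       # 64 KiB per read
with open(src, 'rb') as f, open(dst, 'wb') as wf:
    while True:
        chunk = f.read(chunk_size)   # at most chunk_size bytes
        if not chunk:                # b'' means end of file
            break
        wf.write(chunk)

# Verify the copy (reading whole files is fine for this small demo)
with open(src, 'rb') as a, open(dst, 'rb') as b:
    same = a.read() == b.read()
print(same)   # True
```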

In [None]:
# Working with Different Data Types

with open('sample.txt', 'w') as f:
    f.write(str(5))

# f.write(5) would raise TypeError: write() argument must be str, not int.
# Convert non-string data with str() before writing.
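A sketch of the full round trip for numbers: convert to `str` before writing, then convert back after reading, since text mode always returns strings. The file name `numbers_demo.txt` and the sample values are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'numbers_demo.txt')  # hypothetical name
values = [5, 3.14, 42]

with open(path, 'w') as f:
    for v in values:
        f.write(str(v) + '\n')       # write() only accepts str in text mode

with open(path) as f:
    raw_lines = f.read().split()     # everything comes back as str

restored = [int(raw_lines[0]), float(raw_lines[1]), int(raw_lines[2])]
print(raw_lines)   # ['5', '3.14', '42']
print(restored)    # [5, 3.14, 42]

try:
    with open(path, 'w') as f:
        f.write(5)                   # passing an int directly
except TypeError as exc:
    print('TypeError:', exc)
```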