
# Python File Handling




## Table of Contents
1. [Why `with open()`?](#why-with-open)
2. [Reading Text Files](#reading-text-files)
3. [Writing Text Files](#writing-text-files)
4. [CSV Files: Read & Write (built-in `csv` module)](#csv-files)
5. [Encodings 101 (UTF‑8, errors, BOM)](#encodings)
6. [Mini Exercises](#exercises)
7. [Common Pitfalls & Tips](#pitfalls)



## 1. Why `with open()`? <a id="why-with-open"></a>

`with open(path, mode, encoding=...) as f:` is the **safest** way to work with files.

- It **automatically closes** the file, even if an error occurs.
- It keeps the file open only for the duration of the `with` block, so you cannot forget to close it (the variable still exists afterwards, but the file is closed).
- It makes your code shorter and clearer.

**Common modes:**
- `'r'` – read (default), error if file does not exist
- `'w'` – write (create/overwrite)
- `'a'` – append (create if missing)
- `'x'` – create (error if exists)
- add `'b'` for binary (e.g., `'rb'`, `'wb'`)

**Always set an encoding for text files (e.g., `encoding='utf-8'`).**
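A minimal sketch of the automatic-close behaviour described above, assuming a throwaway file in a temporary directory. The name `demo_path` and the simulated `ValueError` are illustrative only.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, used only for this demonstration.
demo_path = Path(tempfile.mkdtemp()) / "with_demo.txt"

try:
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("partial data\n")
        # Simulate something going wrong in the middle of the block.
        raise ValueError("simulated failure inside the block")
except ValueError as exc:
    print("Caught:", exc)

# The name `f` still exists here, but the file behind it has been closed.
print("Closed after error?", f.closed)
```

The data written before the error still reaches the file, because closing flushes the buffer.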


## 2. Reading Text Files <a id="reading-text-files"></a>

Text files store **human-readable data** such as letters, numbers, and symbols.  
Python provides several methods to read them:

- **`.read()`** → Reads the **entire file** as one string.  
- **`.readline()`** → Reads the **next single line**.  
- **`for line in file:`** → Iterates **line by line** (best for large files).  
- **`.readlines()`** → Reads all lines into a **list of strings**.

### When to use what
- `.read()` → Small files, when you need all content at once.  
- `.readline()` → Step through a file gradually.  
- `for line in file` → Efficient for large files or logs.  
- `.readlines()` → Random access to lines by index (loads the whole file into memory).


In [None]:
# Setup: create a sample text file
from pathlib import Path

data_dir = Path('/mnt/data')
data_dir.mkdir(parents=True, exist_ok=True)  # also create missing parent folders
sample_txt = data_dir / 'sample.txt'

lines = [
    "Hello, world!\n",
    "This is a sample file.\n",
    "Python makes file handling easy.\n",
    "Accented: café, naïve, jalapeño\n",
]

with open(sample_txt, 'w', encoding='utf-8') as f:
    f.writelines(lines)

sample_txt, sample_txt.exists()


### 2.1 Read the whole file with `.read()`
Use when the file is small enough to fit in memory.


In [None]:
with open(sample_txt, 'r', encoding='utf-8') as f:
    content = f.read()

print("Type:", type(content))
print("--- File contents ---")
print(content)


### 2.2 Read line-by-line with `.readline()` or iterate
Use this when you want to process one line at a time.


In [None]:
# Using a loop
with open(sample_txt, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f, start=1):
        print(f"{i:02d}: {line.strip()}")
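The loop above covers iteration. For completeness, here is a small sketch of stepping through a file with `.readline()`, assuming a hypothetical three-line file created just for this example. `.readline()` returns an empty string once the end of the file is reached, which is what ends the loop.

```python
import tempfile
from pathlib import Path

# Hypothetical file, created only for this example.
readline_demo = Path(tempfile.mkdtemp()) / "readline_demo.txt"
readline_demo.write_text("header\nrow 1\nrow 2\n", encoding="utf-8")

with open(readline_demo, "r", encoding="utf-8") as f:
    header = f.readline()            # first line, trailing newline included
    body = []
    while True:
        line = f.readline()
        if line == "":               # empty string signals end of file
            break
        body.append(line.rstrip("\n"))

print(repr(header))
print(body)
```

This pattern is handy when the first line needs different treatment from the rest, such as a header row.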


### 2.3 Read all lines as a list with `.readlines()`
Useful if you need random access to lines.


In [None]:
with open(sample_txt, 'r', encoding='utf-8') as f:
    lines_list = f.readlines()

print(type(lines_list), "length:", len(lines_list))
print(lines_list[:2])  # preview


## 3. Writing Text Files <a id="writing-text-files"></a>

To write data to a file, open it in the appropriate **mode**:

- `'w'` → **Write**: Creates a new file or **overwrites** if it exists.  
- `'a'` → **Append**: Adds new content at the **end** of the file.  
- `'x'` → **Exclusive create**: Errors out if the file already exists.  
- Add `'b'` → **Binary mode** (e.g., `'wb'` for images).

### Important Notes
- Writing does **not** add newlines automatically; you must include `\n` where needed.  
- Opening a file with `'w'` truncates it immediately, deleting existing content without warning.  
- Use `'a'` when keeping logs or history.
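One way to guard against accidental overwrites is the `'x'` mode listed above. The sketch below assumes a hypothetical helper `write_new_file` and a scratch path; neither is part of any library.

```python
import tempfile
from pathlib import Path

# Hypothetical target file in a temporary directory.
report_path = Path(tempfile.mkdtemp()) / "report.txt"

def write_new_file(path, text):
    """Write text to path, refusing to overwrite. Return True if written."""
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
        return True
    except FileExistsError:
        # 'x' mode raises this when the file is already there.
        return False

first = write_new_file(report_path, "original\n")
second = write_new_file(report_path, "replacement\n")

print(first, second)
print(report_path.read_text(encoding="utf-8"))
```

The second call leaves the file untouched, so the original content survives.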



### 3.1 Overwrite with `'w'` mode
If the file exists, its contents are replaced.


In [None]:
new_txt = data_dir / 'output_overwrite.txt'

with open(new_txt, 'w', encoding='utf-8') as f:
    f.write("First line\n")
    f.write("Second line\n")

print("Wrote:", new_txt)


### 3.2 Append with `'a'` mode
Adds to the end of the file or creates a new file if missing.


In [None]:
append_txt = data_dir / 'output_append.txt'

with open(append_txt, 'a', encoding='utf-8') as f:
    f.write("Log entry 1\n")

with open(append_txt, 'a', encoding='utf-8') as f:
    f.write("Log entry 2\n")

print(append_txt.read_text(encoding='utf-8'))


### 3.3 Write many lines with `writelines()`
Remember: **you** must include `\n` if you want newlines.


In [None]:
writelines_txt = data_dir / 'output_writelines.txt'
lines = [f"Item {i}\n" for i in range(1, 6)]

with open(writelines_txt, 'w', encoding='utf-8') as f:
    f.writelines(lines)

print(writelines_txt.read_text(encoding='utf-8'))
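To see why the newline reminder matters, the following sketch writes the same list twice: once as-is and once with `\n` appended to each item. The file names and the `items` list are made up for illustration.

```python
import tempfile
from pathlib import Path

out_dir = Path(tempfile.mkdtemp())
items = ["apple", "banana", "cherry"]    # note: no trailing newlines

# Without newlines, writelines() simply concatenates the strings.
glued = out_dir / "glued.txt"
with open(glued, "w", encoding="utf-8") as f:
    f.writelines(items)

# writelines() accepts any iterable, so a generator can add the newlines.
separated = out_dir / "separated.txt"
with open(separated, "w", encoding="utf-8") as f:
    f.writelines(f"{item}\n" for item in items)

glued_text = glued.read_text(encoding="utf-8")
separated_text = separated.read_text(encoding="utf-8")
print(repr(glued_text))
print(repr(separated_text))
```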

## 4. CSV Files: Read & Write (built-in `csv` module) <a id="csv-files"></a>

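CSV (**Comma-Separated Values**) files are plain-text tables: each line is normally one row, and commas separate the columns. Fields that themselves contain commas, quotes, or newlines are wrapped in double quotes, which is why the `csv` module is safer than splitting lines by hand.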

### Why CSV?
- Simple, lightweight, human-readable.
- Widely used in data exchange between Excel, databases, and Python.
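The format looks simple, but fields can contain commas or quotes themselves. The sketch below uses an in-memory buffer and a made-up row to compare a naive `split(',')` with `csv.reader`. It is an illustration of the default quoting behaviour, not a full treatment of CSV dialects.

```python
import csv
import io

# Made-up row with a comma and quotes inside the fields.
row = ["Smith, Jane", 'She said "hi"', "Boston"]

# Write the row to an in-memory text buffer instead of a real file.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()
print(repr(encoded))

# Splitting on commas breaks the quoted field apart.
naive = encoded.rstrip("\r\n").split(",")

# csv.reader understands the quoting and recovers the original fields.
parsed = next(csv.reader(io.StringIO(encoded, newline="")))

print(len(naive), naive)
print(len(parsed), parsed)
```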





### 4.1 Writing CSVs
- `csv.writer` → Writes rows from lists.  
- `csv.DictWriter` → Writes rows from dictionaries with headers.  

In [1]:
import csv

csv_path = 'people.csv'

rows = [
    ['name', 'age', 'city'],
    ['Alice', 30, 'Boston'],
    ['Bob', 25, 'Austin'],
    ['Zoë', 28, 'Zürich'],  # non-ASCII characters
]

with open(csv_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

print("Wrote CSV:", csv_path)

Wrote CSV: people.csv


### 4.2 Reading CSVs
- `csv.reader` → Reads rows as lists (`['Alice', '30', 'Boston']`).  
- `csv.DictReader` → Reads rows as dictionaries (`{'name': 'Alice', 'age': '30', 'city': 'Boston'}`).

In [2]:
with open(csv_path, 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    data = list(reader)

data

[['name', 'age', 'city'],
 ['Alice', '30', 'Boston'],
 ['Bob', '25', 'Austin'],
 ['Zoë', '28', 'Zürich']]


### 4.3 Read CSV (as dicts) with `csv.DictReader`
Access columns by **name**—often easier to work with.


In [3]:
with open(csv_path, 'r', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    people = list(reader)

people

[{'name': 'Alice', 'age': '30', 'city': 'Boston'},
 {'name': 'Bob', 'age': '25', 'city': 'Austin'},
 {'name': 'Zoë', 'age': '28', 'city': 'Zürich'}]
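Note that every value comes back as a string, including `age`. A minimal sketch of converting a column while reading, assuming a small in-memory stand-in for the file (the names `csv_text` and `people_typed` are illustrative):

```python
import csv
import io

# Small in-memory stand-in for a CSV file.
csv_text = "name,age,city\nAlice,30,Boston\nBob,25,Austin\n"

with io.StringIO(csv_text, newline="") as f:
    people_typed = [
        {**row, "age": int(row["age"])}   # copy the row, converting age to int
        for row in csv.DictReader(f)
    ]

average_age = sum(p["age"] for p in people_typed) / len(people_typed)
print(people_typed)
print("Average age:", average_age)
```

Without the `int()` conversion, the `sum()` call would fail because it cannot add strings to numbers.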


### 4.4 Write CSV (from dicts) with `csv.DictWriter`
Each dictionary becomes one row; `writeheader()` writes the column names from `fieldnames` first.


In [6]:
csv_out = 'people_out.csv'
fieldnames = ['name', 'age', 'city']

with open(csv_out, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(people)  # one row per dictionary
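Real-world dictionaries do not always match `fieldnames` exactly. The sketch below shows one possible way to handle that with the `restval` and `extrasaction` parameters of `csv.DictWriter`. The records and the `"unknown"` placeholder are invented for the example.

```python
import csv
import io

# Invented records: one has an extra key, one is missing 'city'.
records = [
    {"name": "Alice", "age": 30, "city": "Boston", "email": "a@example.com"},
    {"name": "Bob", "age": 25},
]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(
    buffer,
    fieldnames=["name", "age", "city"],
    restval="unknown",        # value written for missing keys
    extrasaction="ignore",    # silently skip keys not in fieldnames
)
writer.writeheader()
writer.writerows(records)

output_lines = buffer.getvalue().splitlines()
print(output_lines)
```

With the default `extrasaction='raise'`, the first record would trigger a `ValueError` because of its extra `email` key, which can be the safer choice when unexpected columns should not pass unnoticed.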
