# 🗄️ Mastering File Handling & OS Interactions in Python

**Welcome!** This notebook is your guide to effectively interacting with the file system and operating system using Python. We'll cover everything from basic file reading/writing to modern path manipulation with `pathlib`, essential `os` module functions, and handling common data formats like CSV, JSON, and XML.

**Target Audience:** Python developers needing to read, write, manipulate files, and interact with the underlying operating system environment.

**Learning Objectives:**
*   Understand fundamental file I/O operations using `open()` and context managers (`with`).
*   Master modern, object-oriented path manipulation using the `pathlib` module.
*   Learn key functions from the `os` module for environment variables, process interaction, and directory management.
*   Effectively read and write structured data in CSV, JSON, and XML formats.
*   Implement robust error handling for file operations.
*   Understand best practices for cross-platform compatibility, security, and performance.
*   Explore advanced topics like temporary files and high-level file operations (`shutil`).
*   Identify common pitfalls and prepare for related interview questions.

## 1. Introduction: Python as Your Digital Filing Assistant

Many applications need to interact with files and the operating system. Whether it's reading configuration, writing logs, processing user data, managing datasets, or organizing project files, Python provides powerful tools to handle these tasks.

**Analogy: The Digital Filing Cabinet**

Think of your computer's file system as a complex filing cabinet. 
*   `open()` is like taking out a specific file to read or write in it.
*   `pathlib` is like having an intelligent assistant who understands the cabinet's structure (folders, files), can find items easily, label them, move them, and tell you properties about them (is it a folder? when was it created?).
*   The `os` module gives you tools to interact with the cabinet room itself – checking the environment (temperature/environment variables) or managing who's working (`os.getpid`), while the separate `subprocess` module lets you run external tools.
*   Handling CSV, JSON, or XML is like knowing how to read and write specific *types* of documents within the files (spreadsheets, structured notes, tagged documents).

Mastering these tools allows you to automate tasks, manage data efficiently, and build more sophisticated applications.

## 2. Basic File I/O: The `open()` Function and Context Managers

The built-in `open()` function is the fundamental way to interact with individual files.

`open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)`

*   `file`: Path to the file (string or `pathlib.Path` object).
*   `mode`: A string indicating how the file is to be opened.
*   `encoding`: **Crucial!** The encoding used to decode/encode the file in text mode (e.g., `'utf-8'`, `'cp1252'`). If omitted, Python uses a platform-dependent default, which you can inspect with `locale.getpreferredencoding(False)`; **explicit `'utf-8'` is strongly recommended** for cross-platform compatibility and modern text handling.
*   `errors`: How encoding/decoding errors should be handled ('strict', 'ignore', 'replace').
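
To make the `errors` options concrete, here is a minimal sketch that writes one byte sequence that is not valid UTF-8 and reads it back under each policy. The file name `bad_bytes.txt` and the sample bytes are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample: 'caf' followed by a lone 0xE9 byte (Latin-1 'é'),
# which is not a valid UTF-8 sequence.
raw = b"caf\xe9\n"

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "bad_bytes.txt"
    sample.write_bytes(raw)

    # errors='strict' (the default) raises on the invalid byte
    try:
        with open(sample, encoding="utf-8", errors="strict") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors='ignore' silently drops the invalid byte
    with open(sample, encoding="utf-8", errors="ignore") as f:
        ignored = f.read()

    # errors='replace' substitutes U+FFFD (the replacement character)
    with open(sample, encoding="utf-8", errors="replace") as f:
        replaced = f.read()

print(strict_failed)        # True
print(ascii(ignored))       # 'caf\n'
print(ascii(replaced))      # 'caf\ufffd\n'
```

Both `'ignore'` and `'replace'` lose information, so they are best reserved for cases where a partially readable result is acceptable.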

### 2.1 File Modes

| Mode | Description                                                     | Behavior if File Exists | Behavior if File Doesn't Exist |
| :--- | :-------------------------------------------------------------- | :---------------------- | :----------------------------- |
| `'r'` | **Read** (default).                                             | Read from start         | `FileNotFoundError`            |
| `'w'` | **Write**. Truncates file to zero length or creates new file. | **Overwrites**          | Creates new file               |
| `'a'` | **Append**. Writes to end of file or creates new file.        | Appends to end          | Creates new file               |
| `'x'` | **Exclusive Creation**. Writes only if file does not exist.     | `FileExistsError`       | Creates new file               |
| `'b'` | **Binary mode**. Combine with another mode (e.g., `'rb'`, `'wb'`). | N/A | N/A |
| `'t'` | **Text mode** (default). Combine with another mode (e.g., `'rt'`, `'wt'`). | N/A | N/A |
| `'+'` | **Update** (reading and writing). Combine with another mode (e.g., `'r+'`, `'w+'`). | Varies (see below) | Varies (see below) |

*   `'r+'`: Read/Write. Doesn't truncate. `FileNotFoundError` if not exists.
*   `'w+'`: Write/Read. **Overwrites** or creates. 
*   `'a+'`: Append/Read. Appends or creates. The file position starts at the end, so call `f.seek(0)` before reading; writes are appended at the end.
*   `'x+'`: Exclusive Create/Read/Write. `FileExistsError` if exists.
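
The mode behaviors listed above can be checked with a short experiment. This is a sketch that works inside a throwaway directory; the file name `modes_demo.txt` is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "modes_demo.txt"  # hypothetical file name

    # 'x' creates the file and refuses to touch an existing one
    with open(target, "x", encoding="utf-8") as f:
        f.write("first\n")
    try:
        with open(target, "x", encoding="utf-8"):
            pass
        second_x_failed = False
    except FileExistsError:
        second_x_failed = True

    # 'a+' opens positioned at the end: seek(0) before reading
    with open(target, "a+", encoding="utf-8") as f:
        f.write("second\n")
        f.seek(0)
        after_append = f.read()

    # 'r+' keeps existing content; writing at offset 0 overwrites in place
    with open(target, "r+", encoding="utf-8") as f:
        f.write("FIRST")
        f.seek(0)
        after_update = f.read()

    # 'w' truncates the file before writing
    with open(target, "w", encoding="utf-8") as f:
        f.write("replaced\n")
    after_truncate = target.read_text(encoding="utf-8")

print(second_x_failed)        # True
print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_update))     # 'FIRST\nsecond\n'
print(repr(after_truncate))   # 'replaced\n'
```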


### 2.2 The `with` Statement (Context Manager) - **The Right Way**

**Best Practice:** Always use the `with` statement when working with files. It ensures the file is automatically closed, even if errors occur within the block. This prevents resource leaks.
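
As a rough illustration of what `with` saves you from writing, the sketch below contrasts a manual `try`/`finally` with a `with` block that raises partway through. The file name and the simulated `ValueError` are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "with_demo.txt"  # hypothetical file name

    # Manual version: close() must sit in 'finally' to run on every path
    f = open(path, "w", encoding="utf-8")
    try:
        f.write("manual\n")
    finally:
        f.close()
    closed_manually = f.closed

    # 'with' version: the file is closed even though the block raises
    try:
        with open(path, "a", encoding="utf-8") as g:
            g.write("before error\n")
            raise ValueError("simulated failure")
    except ValueError:
        pass
    closed_by_with = g.closed

    content = path.read_text(encoding="utf-8")

print(closed_manually, closed_by_with)  # True True
print(repr(content))                    # 'manual\nbefore error\n'
```

Note that the data written before the error still reaches the file, because closing flushes the buffer.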


In [1]:
import logging
import locale

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s', force=True)

# Determine preferred encoding, but default to UTF-8
preferred_encoding = locale.getpreferredencoding(False)
print(f"System preferred encoding: {preferred_encoding}")
# For consistency and broader compatibility, we'll explicitly use UTF-8
FILE_ENCODING = 'utf-8'

file_path = "my_sample_file.txt"

# --- Writing to a file ('w' mode - overwrites or creates) --- 
try:
    # 'with' ensures file is closed automatically
    with open(file_path, mode='w', encoding=FILE_ENCODING) as f:
        f.write("Hello from Python!\n")
        f.write("This is the second line. Special chars: äöüß €\n")
        lines_to_write = ["Third line\n", "Fourth line\n"]
        f.writelines(lines_to_write) # Writes a list of strings
    logging.info(f"Successfully wrote to {file_path}")
except IOError as e:
    logging.error(f"Error writing to {file_path}: {e}")

# --- Appending to a file ('a' mode) --- 
try:
    with open(file_path, mode='a', encoding=FILE_ENCODING) as f:
        f.write("This line was appended.\n")
    logging.info(f"Successfully appended to {file_path}")
except IOError as e:
    logging.error(f"Error appending to {file_path}: {e}")

# --- Reading from a file ('r' mode) --- 
try:
    with open(file_path, mode='r', encoding=FILE_ENCODING) as f:
        print(f"\n--- Reading {file_path} --- ")
        
        # Method 1: Read the whole file into a string
        # content = f.read()
        # print("Full content:\n", content)
        
        # Method 2: Read line by line (memory efficient for large files)
        print("Content line by line:")
        # f.seek(0) # Go back to start if you already read()
        # for line in f:
        #     print(f"  Line: {line.strip()}") # strip() removes leading/trailing whitespace/newline
        
        # Method 3: Read all lines into a list
        f.seek(0) # Go back to start
        all_lines = f.readlines()
        print(f"Content as list of lines: {all_lines}")
        
        # Method 4: Read specific number of characters
        f.seek(0) # Go back to start
        first_chars = f.read(10) 
        print(f"First 10 chars: '{first_chars}'")
        
except FileNotFoundError:
    logging.error(f"Error: File {file_path} not found.")
except IOError as e:
    logging.error(f"Error reading {file_path}: {e}")

# --- Reading/Writing Binary Files ('rb', 'wb') --- 
binary_file_path = "my_binary_file.bin"
try:
    some_bytes = b'\x00\x01\x02\xFF\xFEHello binary!' # Note the b'' prefix
    with open(binary_file_path, mode='wb') as bf: # No encoding for binary
        bf.write(some_bytes)
    logging.info(f"Successfully wrote binary data to {binary_file_path}")
    
    with open(binary_file_path, mode='rb') as bf:
        read_bytes = bf.read()
        print(f"\nRead binary data: {read_bytes}")
        assert read_bytes == some_bytes # Verify data integrity

except IOError as e:
    logging.error(f"Error with binary file {binary_file_path}: {e}")

INFO: Successfully wrote to my_sample_file.txt
INFO: Successfully appended to my_sample_file.txt
INFO: Successfully wrote binary data to my_binary_file.bin


System preferred encoding: UTF-8

--- Reading my_sample_file.txt --- 
Content line by line:
Content as list of lines: ['Hello from Python!\n', 'This is the second line. Special chars: äöüß €\n', 'Third line\n', 'Fourth line\n', 'This line was appended.\n']
First 10 chars: 'Hello from'

Read binary data: b'\x00\x01\x02\xff\xfeHello binary!'


**Pitfalls with `open()`:**
*   **Forgetting `with`:** Leading to resource leaks if files aren't closed, especially if errors occur.
*   **Incorrect Mode:** Using `'r'` when intending to write, or `'w'` when intending to append, leading to errors or data loss.
*   **Encoding Issues:** Not specifying `encoding='utf-8'` (or the correct encoding) can lead to `UnicodeDecodeError`, `UnicodeEncodeError`, or silently corrupted data, especially across different operating systems.
*   **Not Handling `FileNotFoundError`:** Trying to read a file that doesn't exist without a `try...except` block will crash the program.
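
The "silently corrupted data" pitfall is easy to reproduce. In this sketch the sample text and file names are invented, and `latin-1` stands in for any single-byte codec that accepts every byte value.

```python
import tempfile
from pathlib import Path

text = "Grüße, 5 €"  # made-up sample text

with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "greeting.txt"        # hypothetical file names
    ascii_file = Path(tmp) / "greeting_ascii.txt"
    utf8_file.write_text(text, encoding="utf-8")

    # Matching codec: round-trips cleanly
    correct = utf8_file.read_text(encoding="utf-8")

    # Wrong codec that accepts every byte: no exception, just wrong text
    garbled = utf8_file.read_text(encoding="latin-1")

    # Codec that cannot represent the characters: fails loudly
    try:
        ascii_file.write_text(text, encoding="ascii")
        ascii_failed = False
    except UnicodeEncodeError:
        ascii_failed = True

print(correct == text)   # True
print(garbled == text)   # False
print(ascii_failed)      # True
```

The loud failure is the easier one to deal with; the silent mismatch only shows up later as mojibake in the output.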

## 3. Modern Path Handling: The `pathlib` Module

Introduced in Python 3.4, `pathlib` offers an **object-oriented** way to represent and interact with filesystem paths, making code cleaner, more readable, and less error-prone than traditional string manipulation with `os.path`.

**Key Advantages:**
*   **Object-Oriented:** Paths are objects with methods and attributes, not just strings.
*   **Cross-Platform:** Handles path separator differences (`/` vs `\`) automatically.
*   **Readability:** Operations like joining paths or getting parent directories are more intuitive.
*   **Integrated Operations:** Many common file operations (checking existence, reading/writing basic files, iterating directories) are built into the `Path` object.

**Recommendation:** Use `pathlib` for all new code involving path manipulation.
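
For a side-by-side feel of the two styles, the sketch below builds the same path with `os.path` and with `pathlib`. The directory and file names are hypothetical, and nothing is read from or written to disk.

```python
import os.path
from pathlib import Path

base = "project"  # hypothetical directory and file names

# os.path: nested function calls on plain strings
old_style = os.path.join(base, "data", "report.final.csv")
old_name = os.path.basename(old_style)
old_stem, old_ext = os.path.splitext(old_name)
old_parent = os.path.basename(os.path.dirname(old_style))

# pathlib: the same steps as operators and attributes on one object
new_style = Path(base) / "data" / "report.final.csv"

print(old_style == str(new_style))            # True
print(new_style.name)                         # report.final.csv
print(new_style.stem, new_style.suffix)       # report.final .csv
print(new_style.with_suffix(".json").name)    # report.final.json
print(new_style.parent.name)                  # data
```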

In [2]:
from pathlib import Path
import os # Import os for comparison/specific tasks

# --- Creating Path Objects --- 
# Current working directory
cwd = Path.cwd()
print(f"Current Directory (cwd): {cwd}")

# Home directory
home = Path.home()
print(f"Home Directory: {home}")

# Creating paths from strings (automatically handles separators)
file_path_obj = Path("./my_sample_file.txt") # Relative path
config_dir = Path("/etc/my_app/config") # Absolute path (example)
log_dir = Path("logs")
log_file = log_dir / "app.log" # Intuitive joining using '/'
print(f"File Path Object: {file_path_obj}")
print(f"Log File Path: {log_file}")

# --- Path Components & Properties --- 
print(f"\n--- Path Components for: {log_file} ---")
print(f"Parent directory: {log_file.parent}")
print(f"Filename: {log_file.name}")
print(f"Stem (name without suffix): {log_file.stem}")
print(f"Suffix (extension): {log_file.suffix}")
print(f"Is absolute path? {log_file.is_absolute()}")

# Get absolute path
print(f"Absolute path of log_file: {log_file.resolve()}")

# --- Checking Existence and Type --- 
print("\n--- Existence Checks ---")
print(f"Does '{file_path_obj}' exist? {file_path_obj.exists()}")
print(f"Is '{file_path_obj}' a file? {file_path_obj.is_file()}")
print(f"Is '{file_path_obj}' a directory? {file_path_obj.is_dir()}")

non_existent = Path("no_such_file_here.dat")
print(f"Does '{non_existent}' exist? {non_existent.exists()}")

# --- Creating Files and Directories --- 
print("\n--- Creating and Deleting (Use with caution!) ---")
new_dir = Path("my_temp_dir/subdir")
new_file = new_dir / "temp_file.txt"

try:
    # Create directories recursively (like os.makedirs)
    new_dir.mkdir(parents=True, exist_ok=True) # exist_ok=True prevents error if dir exists
    print(f"Directory '{new_dir}' created or already exists.")
    
    # Create an empty file (like 'touch' command)
    if not new_file.exists():
        new_file.touch()
        print(f"File '{new_file}' created.")
    else:
        print(f"File '{new_file}' already exists.")
        
    # --- Basic File Read/Write with pathlib --- 
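    # Simple text writing (overwrites)
    # Note: write_text returns the number of characters written; it matches
    # the byte count here only because the text is pure ASCII
    bytes_written = new_file.write_text("Content written by pathlib!", encoding=FILE_ENCODING)
    print(f"Wrote {bytes_written} bytes to {new_file}")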
    
    # Simple text reading
    content = new_file.read_text(encoding=FILE_ENCODING)
    print(f"Read from {new_file}: '{content}'")
    
    # Simple binary writing/reading
    new_file.write_bytes(b'\xCA\xFE\xBA\xBE')
    binary_content = new_file.read_bytes()
    print(f"Read binary from {new_file}: {binary_content}")
    
    # --- Deleting Files and Directories --- 
    new_file.unlink() # Delete the file (remove link)
    print(f"File '{new_file}' deleted.")
    new_file.unlink(missing_ok=True) # missing_ok (Python 3.8+) prevents an error if the file is already gone
    
    # Delete empty directories (must be empty)
    new_dir.rmdir()
    print(f"Directory '{new_dir}' deleted.")
    # To remove parent as well if empty:
    new_dir.parent.rmdir()
    print(f"Directory '{new_dir.parent}' deleted.")
    
except OSError as e:
    print(f"Error during file/dir operation: {e}")

# --- Iterating Directory Contents --- 
print(f"\n--- Iterating CWD ({cwd}) ---")
for item in cwd.iterdir():
    item_type = "Dir" if item.is_dir() else "File" if item.is_file() else "Other"
    # print(f"  [{item_type}] {item.name}")
    # Pass on printing all CWD items for brevity in example output

# --- Globbing (Pattern Matching) --- 
print(f"\n--- Globbing for '.txt' files in CWD ({cwd}) ---")
txt_files = list(cwd.glob('*.txt'))
print(f"Found .txt files: {txt_files}")

print("\n--- Recursive Globbing for '*.py' files starting from CWD's parent ---")
# rglob() searches recursively
# py_files = list(cwd.parent.rglob('*.py')) 
# print(f"Found .py files recursively: {py_files}") # Might find many files!
print("(Skipping recursive glob output for brevity)")

Current Directory (cwd): /mnt/Study/Python/THEORY/2-Advance
Home Directory: /home/ansh
File Path Object: my_sample_file.txt
Log File Path: logs/app.log

--- Path Components for: logs/app.log ---
Parent directory: logs
Filename: app.log
Stem (name without suffix): app
Suffix (extension): .log
Is absolute path? False
Absolute path of log_file: /mnt/Study/Python/THEORY/2-Advance/logs/app.log

--- Existence Checks ---
Does 'my_sample_file.txt' exist? True
Is 'my_sample_file.txt' a file? True
Is 'my_sample_file.txt' a directory? False
Does 'no_such_file_here.dat' exist? False

--- Creating and Deleting (Use with caution!) ---
Directory 'my_temp_dir/subdir' created or already exists.
File 'my_temp_dir/subdir/temp_file.txt' created.
Wrote 27 bytes to my_temp_dir/subdir/temp_file.txt
Read from my_temp_dir/subdir/temp_file.txt: 'Content written by pathlib!'
Read binary from my_temp_dir/subdir/temp_file.txt: b'\xca\xfe\xba\xbe'
File 'my_temp_dir/subdir/temp_file.txt' deleted.
Directory 'my_temp_

## 4. OS Module Interactions (`os`)

While `pathlib` handles *path manipulation* beautifully, the `os` module provides lower-level access to operating system functionality, including:
*   Environment variables
*   Process information and management (though `subprocess` is often preferred for running external commands)
*   User/Group information (less common in general scripts)
*   Some directory operations (often have `pathlib` equivalents)

### 4.1 Environment Variables

In [3]:
import os

# Get all environment variables (returns a dict-like object)
environment_vars = os.environ
# print("--- All Environment Variables ---")
# for key, value in environment_vars.items():
#     print(f"  {key}={value}")
print("(Skipping printing all env vars for brevity)")

# Get a specific environment variable (case-sensitive, raises KeyError if not found)
try:
    user_home_os = os.environ['HOME'] # Common on Linux/macOS
    # user_home_os = os.environ['USERPROFILE'] # Common on Windows
    print(f"\nUser Home (from os.environ): {user_home_os}")
except KeyError:
    print("\nCould not find standard home directory variable in os.environ")

# Get a specific environment variable safely (returns None or default if not found)
api_key = os.getenv('MY_APP_API_KEY')
db_host = os.getenv('DB_HOST', 'localhost') # Provide a default value

print(f"API Key (from os.getenv): {api_key}") # Likely None unless set externally
print(f"DB Host (from os.getenv): {db_host}")

# Set an environment variable (for the current process and its children)
# os.environ['MY_TEMP_VAR'] = 'my_value'
# print(f"MY_TEMP_VAR: {os.getenv('MY_TEMP_VAR')}")
# del os.environ['MY_TEMP_VAR'] # Unset it

(Skipping printing all env vars for brevity)

User Home (from os.environ): /home/ansh
API Key (from os.getenv): None
DB Host (from os.getenv): localhost
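
The commented-out lines at the end of the cell note that a variable set through `os.environ` is visible to the current process and its children. A minimal sketch of that behavior follows; the variable names `MY_TEMP_VAR` and `MY_APP_PORT` are placeholders, and a child Python interpreter is used so the example does not depend on a particular shell.

```python
import os
import subprocess
import sys

VAR = "MY_TEMP_VAR"  # hypothetical variable name

os.environ[VAR] = "my_value"  # visible to this process and to children started later
child = subprocess.run(
    [sys.executable, "-c", f"import os; print(os.environ.get('{VAR}'))"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()

del os.environ[VAR]  # unset again
after_delete = os.getenv(VAR, "<unset>")

# Environment values are always strings, so convert explicitly
port = int(os.getenv("MY_APP_PORT", "8080"))  # hypothetical setting

print(seen_by_child)   # my_value
print(after_delete)    # <unset>
```

Changes made this way never propagate back to the parent shell that launched the script.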


### 4.2 Running External Commands (`subprocess` - Preferred over `os.system`)

While `os.system("command")` exists, it's generally **discouraged** due to security risks (shell injection) and lack of control over input/output.

**Best Practice:** Use the `subprocess` module for running external commands. It's more secure and flexible.
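
To see why passing arguments as a list is safer, consider this sketch. The `user_input` string is an invented example of hostile input, and the child process is simply another Python interpreter that echoes its first argument.

```python
import shlex
import subprocess
import sys

# Hypothetical untrusted input that tries to smuggle in a second command
user_input = "notes.txt; echo INJECTED"

# Passed as one list element, it reaches the child as a single argument;
# no shell ever parses the ';'
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.strip()

# If a shell string is unavoidable on POSIX, quote each untrusted piece
quoted = shlex.quote(user_input)

print(echoed)   # notes.txt; echo INJECTED
print(quoted)   # 'notes.txt; echo INJECTED'
```

With `shell=True` and plain string formatting, the same input would have been split at the `;` and the second command executed.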

In [4]:
import subprocess
import sys

# Determine the command based on OS
list_command = "dir" if sys.platform == "win32" else "ls -lha"

print(f"\n--- Running '{list_command}' using subprocess --- ")
try:
    # Run command, capture output, check for errors
    # text=True decodes output as text using default encoding
    # shell=True can be a security risk if command includes untrusted input!
    # Better to pass command and args as a list: ['ls', '-lha'] when shell=False
    process = subprocess.run(list_command, 
                             capture_output=True, 
                             text=True, 
                             check=True, # Raises CalledProcessError if command returns non-zero exit code
                             shell=True, # Use with caution 
                             encoding=locale.getpreferredencoding(False))
    
    print(f"Command executed successfully (Return Code: {process.returncode})")
    print("Output:")
    print(process.stdout[:500] + "...") # Print first 500 chars of output

except FileNotFoundError as e:
    # If the command itself isn't found
    print(f"Error: Command not found: {e}")
except subprocess.CalledProcessError as e:
    # If the command returns an error exit code
    print(f"Error: Command '{e.cmd}' failed with return code {e.returncode}")
    print(f"Stderr:\n{e.stderr}")
except Exception as e:
    print(f"An unexpected error occurred running subprocess: {e}")

# Example without shell=True (safer)
list_command_parts = ["ls", "-lha"] if sys.platform != "win32" else ["cmd", "/c", "dir"] 
print(f"\n--- Running '{' '.join(list_command_parts)}' using subprocess (shell=False) --- ")
try:
    process_safe = subprocess.run(list_command_parts,
                                capture_output=True,
                                text=True,
                                check=True,
                                encoding=locale.getpreferredencoding(False))
    print("Command executed successfully.")
    # print(process_safe.stdout[:500] + "...")
except Exception as e:  # Also covers FileNotFoundError and CalledProcessError
    print(f"Error running command safely: {e}")



--- Running 'ls -lha' using subprocess --- 
Command executed successfully (Return Code: 0)
Output:
total 750K
drwxrwxrwx 1 root root 4.0K Apr 20 16:33 .
drwxrwxrwx 1 root root  12K Apr 19 16:56 ..
-rwxrwxrwx 1 root root  74K Apr 20 16:13 01-OOPS.ipynb
-rwxrwxrwx 1 root root  47K Apr 20 16:13 02-Logging.ipynb
-rwxrwxrwx 1 root root  37K Apr 20 16:13 03-Generators.ipynb
-rwxrwxrwx 1 root root  56K Apr 20 16:14 04-Decoratos.ipynb
-rwxrwxrwx 1 root root  46K Apr 20 16:14 05-Exceptions.ipynb
-rwxrwxrwx 1 root root  13K Apr 19 14:27 06-Concurrency.ipynb
-rwxrwxrwx 1 root root  11K Apr 20 16:16 06-C...

--- Running 'ls -lha' using subprocess (shell=False) --- 
Command executed successfully.


### 4.3 Other Useful `os` Functions

*   `os.getcwd()`: Get current working directory (returns string - use `Path.cwd()` for a Path object).
*   `os.chdir(path)`: Change current working directory.
*   `os.listdir(path='.')`: List directory contents (returns list of strings - `Path.iterdir()` is often preferred).
*   `os.makedirs(name, mode=0o777, exist_ok=False)`: Recursive directory creation (like `Path.mkdir(parents=True)`).
*   `os.remove(path)` or `os.unlink(path)`: Remove/delete a file (like `Path.unlink()`).
*   `os.rmdir(path)`: Remove an *empty* directory (like `Path.rmdir()`).
*   `os.getpid()`: Get the current process ID.
*   `os.path.join(path, *paths)`: Join path components intelligently (use `Path / ...` instead).
*   `os.path.exists(path)`: Check if path exists (use `Path.exists()` instead).
*   `os.path.isfile(path)` / `os.path.isdir(path)`: Check type (use `Path.is_file()` / `Path.is_dir()` instead).

**Takeaway:** While `pathlib` replaces many `os.path` functions, `os` itself is still needed for environment variables, process interactions, and sometimes lower-level file descriptor operations.
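
The pairs listed above can be tried side by side. This sketch runs inside a temporary directory with made-up file names, and also shows that `os` functions accept `Path` objects directly.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os style: strings in, strings out
    nested = os.path.join(tmp, "a", "b")
    os.makedirs(nested, exist_ok=True)
    with open(os.path.join(nested, "one.txt"), "w", encoding="utf-8") as f:
        f.write("1")
    os_listing = sorted(os.listdir(nested))

    # pathlib style: Path objects in, Path objects out
    nested_path = Path(tmp) / "a" / "b"
    (nested_path / "two.txt").write_text("2", encoding="utf-8")
    path_listing = sorted(p.name for p in nested_path.iterdir())

    # The two APIs interoperate: os functions accept Path objects
    os.remove(nested_path / "one.txt")
    remaining = os.listdir(nested_path)

pid = os.getpid()  # no pathlib equivalent; process info lives in os

print(os_listing)     # ['one.txt']
print(path_listing)   # ['one.txt', 'two.txt']
print(remaining)      # ['two.txt']
```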

## 5. Working with Common File Formats

Beyond plain text, you'll often work with structured data formats.

### 5.1 CSV (Comma Separated Values)

Use the built-in `csv` module for basic reading and writing.
**Note:** For complex CSV manipulation, filtering, and analysis, the `pandas` library is highly recommended.

In [5]:
import csv
from pathlib import Path

csv_file_path = Path("data.csv")
dict_csv_file_path = Path("data_dict.csv")

# --- Writing CSV Data (List of Lists) --- 
data_to_write = [
    ["Name", "Department", "Salary"],
    ["Alice", "Engineering", 80000],
    ["Bob", "Sales", 75000],
    ["Charlie", "Engineering", 90000]
]

try:
    # newline='' lets the csv module control line endings; without it, Windows output gets blank rows between records
    with csv_file_path.open(mode='w', newline='', encoding=FILE_ENCODING) as f:
        writer = csv.writer(f)
        writer.writerows(data_to_write) # Write all rows at once
        # Or write row by row: 
        # writer.writerow(["David", "HR", 60000])
    logging.info(f"Successfully wrote list data to {csv_file_path}")
except IOError as e:
    logging.error(f"Error writing CSV: {e}")

# --- Reading CSV Data (as Lists) --- 
try:
    with csv_file_path.open(mode='r', newline='', encoding=FILE_ENCODING) as f:
        reader = csv.reader(f)
        print(f"\n--- Reading {csv_file_path} (as lists) ---")
        header = next(reader) # Read the header row
        print(f"Header: {header}")
        for i, row in enumerate(reader):
            # Row is a list of strings
            print(f"  Row {i+1}: {row}") 
            # Example: Accessing data
            # name = row[0]
            # salary = int(row[2]) # Remember data is read as strings
except (IOError, StopIteration) as e:
    logging.error(f"Error reading CSV: {e}")

# --- Writing CSV Data (List of Dictionaries) --- 
dict_data_to_write = [
    {'ID': 1, 'Product': 'Laptop', 'Price': 1200.50},
    {'ID': 2, 'Product': 'Mouse', 'Price': 25.99},
    {'ID': 3, 'Product': 'Keyboard', 'Price': 75.00}
]
fieldnames = ['ID', 'Product', 'Price'] # Must specify fieldnames

try:
    with dict_csv_file_path.open(mode='w', newline='', encoding=FILE_ENCODING) as f:
        # Use DictWriter for writing lists of dictionaries
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        
        writer.writeheader() # Write the header row from fieldnames
        writer.writerows(dict_data_to_write) # Write all dicts
        # Or write single dict: writer.writerow({'ID': 4, ...})
    logging.info(f"Successfully wrote dict data to {dict_csv_file_path}")
except IOError as e:
    logging.error(f"Error writing Dict CSV: {e}")

# --- Reading CSV Data (as Dictionaries) --- 
try:
    with dict_csv_file_path.open(mode='r', newline='', encoding=FILE_ENCODING) as f:
        # Use DictReader for reading rows as dictionaries
        reader = csv.DictReader(f)
        print(f"\n--- Reading {dict_csv_file_path} (as dicts) ---")
        print(f"Fieldnames detected: {reader.fieldnames}")
        for i, row_dict in enumerate(reader):
            # row_dict is a plain dict on Python 3.8+ (an OrderedDict on 3.6 and 3.7)
            print(f"  Row {i+1}: {row_dict}")
            # Example: Accessing data by key
            # product_name = row_dict['Product']
            # price = float(row_dict['Price']) # Data still read as strings
except IOError as e:
     logging.error(f"Error reading Dict CSV: {e}")


INFO: Successfully wrote list data to data.csv
INFO: Successfully wrote dict data to data_dict.csv



--- Reading data.csv (as lists) ---
Header: ['Name', 'Department', 'Salary']
  Row 1: ['Alice', 'Engineering', '80000']
  Row 2: ['Bob', 'Sales', '75000']
  Row 3: ['Charlie', 'Engineering', '90000']

--- Reading data_dict.csv (as dicts) ---
Fieldnames detected: ['ID', 'Product', 'Price']
  Row 1: {'ID': '1', 'Product': 'Laptop', 'Price': '1200.5'}
  Row 2: {'ID': '2', 'Product': 'Mouse', 'Price': '25.99'}
  Row 3: {'ID': '3', 'Product': 'Keyboard', 'Price': '75.0'}
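
Two details from the output above are worth a small experiment: every value comes back as a string, and the `csv` module quotes fields for you. The sketch below uses an in-memory `io.StringIO` buffer in place of a file, with made-up rows.

```python
import csv
import io

# In-memory stand-in for a file; the rows are made-up sample data
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Name", "Title", "Salary"])
writer.writerow(["Alice", "Engineer, Senior", 80000])  # comma inside a field
writer.writerow(["Bob", 'Sales "Lead"', 75000])        # quotes inside a field
raw_text = buffer.getvalue()

# Reading restores the original fields, but every value is a string
buffer.seek(0)
rows = list(csv.DictReader(buffer))
salaries = [int(row["Salary"]) for row in rows]  # convert explicitly

print(raw_text)
print(rows[0]["Title"])   # Engineer, Senior
print(rows[1]["Title"])   # Sales "Lead"
print(salaries)           # [80000, 75000]
```

Splitting lines on `,` by hand would break on the first row, which is the main reason to prefer the `csv` module over string methods.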


### 5.2 JSON (JavaScript Object Notation)

Use the built-in `json` module. This section assumes familiarity with the dedicated JSON notebook and focuses on the file I/O aspects.

In [6]:
import json
from pathlib import Path

json_file_path = Path("data.json")
python_object = {
    "name": "Sensor Array",
    "location": {"lat": 40.7128, "lon": -74.0060},
    "readings": [
        {"timestamp": "2023-10-26T10:00:00Z", "value": 15.5, "unit": "C"},
        {"timestamp": "2023-10-26T10:05:00Z", "value": 15.8, "unit": "C"}
    ],
    "active": True,
    "calibration_factor": None
}

# --- Writing JSON to File (json.dump) --- 
try:
    with json_file_path.open(mode='w', encoding=FILE_ENCODING) as f:
        # dump serializes and writes to file object
        json.dump(python_object, f, indent=4) # indent for pretty printing
    logging.info(f"Successfully wrote JSON to {json_file_path}")
except (IOError, TypeError) as e:
    logging.error(f"Error writing JSON: {e}")

# --- Reading JSON from File (json.load) --- 
try:
    with json_file_path.open(mode='r', encoding=FILE_ENCODING) as f:
        # load reads from file object and deserializes
        loaded_data = json.load(f)
        print(f"\n--- Reading {json_file_path} --- ")
        print(f"Loaded data type: {type(loaded_data)}")
        # print(json.dumps(loaded_data, indent=2)) # Pretty print the loaded data
        print(f"Location: {loaded_data.get('location')}")
except (IOError, json.JSONDecodeError) as e:
    logging.error(f"Error reading JSON: {e}")

# For details on dumps/loads (strings) and custom objects, see the JSON notebook.

INFO: Successfully wrote JSON to data.json



--- Reading data.json --- 
Loaded data type: <class 'dict'>
Location: {'lat': 40.7128, 'lon': -74.006}
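
One file-level detail worth knowing is how `json.dump` treats non-ASCII text. The sketch below compares the default behavior with `ensure_ascii=False`; the record and file names are invented for illustration.

```python
import json
import tempfile
from pathlib import Path

record = {"city": "Zürich", "price": "12 €"}  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    escaped_path = Path(tmp) / "escaped.json"    # hypothetical file names
    readable_path = Path(tmp) / "readable.json"

    # Default: non-ASCII characters are written as \uXXXX escapes
    with escaped_path.open("w", encoding="utf-8") as f:
        json.dump(record, f)

    # ensure_ascii=False keeps the characters as they are
    with readable_path.open("w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)

    escaped_text = escaped_path.read_text(encoding="utf-8")
    readable_text = readable_path.read_text(encoding="utf-8")

    with readable_path.open(encoding="utf-8") as f:
        round_trip = json.load(f)

print(escaped_text)            # {"city": "Z\u00fcrich", "price": "12 \u20ac"}
print(round_trip == record)    # True
```

Both files load back to the same object; `ensure_ascii=False` mainly helps when people need to read the file, and it relies on the file being opened with a Unicode-capable encoding such as UTF-8.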


### 5.3 XML (eXtensible Markup Language)

Use the built-in `xml.etree.ElementTree` module for basic XML parsing and creation.
**Note:** For complex XML, namespaces, validation, and transformations (XPath, XSLT), the third-party `lxml` library is significantly more powerful and generally recommended.

In [7]:
import xml.etree.ElementTree as ET
from pathlib import Path
import logging

xml_file_path = Path("data.xml")

# --- Creating and Writing XML --- 
# Create root element
root = ET.Element("catalog")

# Create child elements
book1 = ET.SubElement(root, "book", isbn="978-0321765723")
title1 = ET.SubElement(book1, "title")
title1.text = "The Pragmatic Programmer"
author1 = ET.SubElement(book1, "author")
author1.text = "Andrew Hunt"

book2 = ET.SubElement(root, "book", isbn="978-0132350884")
title2 = ET.SubElement(book2, "title")
title2.text = "Clean Code"
author2 = ET.SubElement(book2, "author")
author2.text = "Robert C. Martin"

# Build the tree structure
tree = ET.ElementTree(root)

# Pretty print function (optional, for nicer output)
def indent_xml(elem, level=0):
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent_xml(subelem, level+1)
        if not subelem.tail or not subelem.tail.strip():
            subelem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

try:
    indent_xml(root) # Apply indentation
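    # Write the tree to a file
    # encoding='utf-8' keeps the file consistent with FILE_ENCODING and names it in the XML declaration;
    # encoding='unicode' is meant for text output such as ET.tostring(root, encoding='unicode')
    tree.write(xml_file_path, encoding='utf-8', xml_declaration=True)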
    logging.info(f"Successfully wrote XML to {xml_file_path}")

    # Get XML as string
    # xml_string = ET.tostring(root, encoding='unicode')
    # print(xml_string)

except IOError as e:
    logging.error(f"Error writing XML: {e}")

# --- Parsing and Reading XML --- 
try:
    # Parse an XML file
    parsed_tree = ET.parse(xml_file_path)
    parsed_root = parsed_tree.getroot()
    
    print(f"\n--- Reading {xml_file_path} --- ")
    print(f"Root element tag: {parsed_root.tag}")
    
    # Find specific elements
    print("Books found:")
    # findall looks for direct children matching the tag
    for book in parsed_root.findall('book'): 
        isbn = book.get('isbn') # Get attribute value
        title = book.find('title').text # Find child and get its text content
        author = book.find('author').text
        print(f"  - Title: {title}, Author: {author}, ISBN: {isbn}")
        
    # Example: Find book by attribute
    clean_code_book = parsed_root.find(".//book[@isbn='978-0132350884']") # Basic XPath support
    if clean_code_book is not None:
        print(f"Found Clean Code title: {clean_code_book.find('title').text}")

except FileNotFoundError:
    logging.error(f"XML file not found: {xml_file_path}")
except ET.ParseError as e:
    logging.error(f"Error parsing XML file {xml_file_path}: {e}")
except IOError as e:
    logging.error(f"Error reading XML file {xml_file_path}: {e}")

INFO: Successfully wrote XML to data.xml



--- Reading data.xml --- 
Root element tag: catalog
Books found:
  - Title: The Pragmatic Programmer, Author: Andrew Hunt, ISBN: 978-0321765723
  - Title: Clean Code, Author: Robert C. Martin, ISBN: 978-0132350884
Found Clean Code title: Clean Code
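
On Python 3.9 and later, the standard library's `ET.indent()` can stand in for a hand-written indentation helper, and `findtext()` avoids an `AttributeError` when a child element is missing. A small sketch under those assumptions, with a deliberately incomplete sample record:

```python
import xml.etree.ElementTree as ET

root = ET.Element("catalog")
book = ET.SubElement(root, "book", isbn="978-0132350884")
ET.SubElement(book, "title").text = "Clean Code"
# No <author> child on purpose, to show the missing-element case

ET.indent(root, space="  ")  # available since Python 3.9
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# find('author') would return None here, so .text on it would fail;
# findtext returns a default instead
author = book.findtext("author", default="(unknown)")
title = book.findtext("title", default="(untitled)")
print(title, "/", author)  # Clean Code / (unknown)
```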


## 6. Advanced Topics & Enterprise Considerations

### 6.1 Temporary Files and Directories (`tempfile`)
When you need temporary storage that gets cleaned up automatically.


In [8]:
import tempfile
import os

# --- Create a temporary file --- 
# NamedTemporaryFile is deleted when closed
# Use delete=False to keep the file after closing (manual cleanup needed)
try:
    with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix=".tmp", prefix="myapp_", encoding=FILE_ENCODING) as temp_file:
        temp_file_path = temp_file.name
        print(f"Created temporary file: {temp_file_path}")
        temp_file.write("Temporary data\n")
        temp_file.seek(0)
        print(f"  Content: {temp_file.read().strip()}")
    # File is now closed. If delete=False, it remains.
    print(f"Does temp file exist after close? {os.path.exists(temp_file_path)}")
    # Manual cleanup if delete=False
    if os.path.exists(temp_file_path):
       os.remove(temp_file_path)
       print(f"Manually removed temp file: {temp_file_path}")
except OSError as e:  # IOError is an alias of OSError in Python 3
    print(f"Error with temporary file: {e}")

# --- Create a temporary directory --- 
# TemporaryDirectory is automatically cleaned up when the context manager exits
try:
    with tempfile.TemporaryDirectory(suffix="_work", prefix="data_") as temp_dir_path:
        print(f"\nCreated temporary directory: {temp_dir_path}")
        # You can create files inside this directory
        temp_subfile = Path(temp_dir_path) / "results.txt"
        temp_subfile.write_text("Results data", encoding=FILE_ENCODING)
        print(f"  Created subfile: {temp_subfile}")
        print(f"  Does temp dir exist inside 'with'? {os.path.exists(temp_dir_path)}")
    # The directory and its contents are deleted upon exiting the 'with' block
    print(f"Does temp dir exist after 'with'? {os.path.exists(temp_dir_path)}") 
except OSError as e:  # IOError is just an alias of OSError in Python 3
    print(f"Error with temporary directory: {e}")

Created temporary file: /tmp/myapp_7nkys_05.tmp
  Content: Temporary data
Does temp file exist after close? True
Manually removed temp file: /tmp/myapp_7nkys_05.tmp

Created temporary directory: /tmp/data_trwhuihr_work
  Created subfile: /tmp/data_trwhuihr_work/results.txt
  Does temp dir exist inside 'with'? True
Does temp dir exist after 'with'? False


### 6.2 High-Level File Operations (`shutil`)
Use `shutil` for operations such as copying files and directories, moving them, or deleting entire directory trees.

In [9]:
import shutil
from pathlib import Path

# Setup: Create source file and destination directory
source_file = Path("source_to_copy.txt")
dest_dir = Path("destination_dir")
dest_file = dest_dir / source_file.name
source_dir = Path("source_dir_to_copy")
source_dir_subfile = source_dir / "subfile.txt"
dest_dir_tree = Path("dest_dir_tree")
move_dest = dest_dir / "moved_file.txt"  # Defined before 'try' so the cleanup code can never hit an undefined name

try:
    source_file.write_text("Data to be copied.", encoding=FILE_ENCODING)
    dest_dir.mkdir(exist_ok=True)
    source_dir.mkdir(exist_ok=True)
    source_dir_subfile.write_text("Subfile data", encoding=FILE_ENCODING)

    print("\n--- shutil Operations ---")
    # Copy file (src, dst) - dst can be file or dir
    shutil.copy(source_file, dest_dir)
    print(f"Copied {source_file} to {dest_dir}. Exists: {dest_file.exists()}")
    
    # Copy file preserving metadata (copy2)
    # shutil.copy2(source_file, dest_dir / "copied_meta.txt")
    
    # Copy directory tree
    if dest_dir_tree.exists(): # copytree fails if the destination already exists (by default)
        shutil.rmtree(dest_dir_tree)
    shutil.copytree(source_dir, dest_dir_tree)
    print(f"Copied directory tree {source_dir} to {dest_dir_tree}. Exists: {(dest_dir_tree / source_dir_subfile.name).exists()}")

    # Move file or directory
    shutil.move(dest_file, move_dest)
    print(f"Moved {dest_file} to {move_dest}. Exists: {move_dest.exists()}")

    # Remove directory tree (use with extreme caution!)
    # shutil.rmtree(dest_dir_tree)
    # print(f"Removed directory tree {dest_dir_tree}")

except OSError as e:  # Also catches shutil.Error, which is a subclass of OSError
    print(f"Error during shutil operation: {e}")
finally:
    # Cleanup demo files/dirs; ignore_errors keeps this safe even if setup failed part-way
    source_file.unlink(missing_ok=True)
    for demo_dir in (dest_dir, source_dir, dest_dir_tree):
        shutil.rmtree(demo_dir, ignore_errors=True)


--- shutil Operations ---
Copied source_to_copy.txt to destination_dir. Exists: True
Copied directory tree source_dir_to_copy to dest_dir_tree. Exists: True
Moved destination_dir/source_to_copy.txt to destination_dir/moved_file.txt. Exists: True


### 6.3 Performance Considerations
*   **Buffering:** File I/O is buffered by default: Python's `open()` returns a buffered file object, and the OS adds its own caching on top. Reading/writing in larger chunks can be more efficient than many small operations, but the defaults are usually good enough.
*   **Memory Usage:** Avoid reading entire large files into memory (`read()`, `readlines()`). Iterate line by line or process in chunks.
*   **`pathlib` vs `os.path`:** `pathlib` might have a small overhead for object creation, but it's negligible for most applications and the readability gain is significant.
*   **`shutil`:** Generally efficient for high-level operations. Since Python 3.8, its file-copying functions can use platform-specific fast-copy system calls where available.
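
The memory-usage advice above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the helper names `count_lines` and `iter_chunks` and the 64 KiB default chunk size are assumptions chosen for the example.

```python
def count_lines(path, encoding="utf-8"):
    """Count lines by iterating over the file object.

    Only one line is held in memory at a time, unlike read() or readlines().
    """
    with open(path, "r", encoding=encoding) as handle:
        return sum(1 for _ in handle)


def iter_chunks(path, chunk_size=64 * 1024):
    """Yield successive binary chunks of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which ends the loop
        while chunk := handle.read(chunk_size):
            yield chunk
```

Line iteration suits text formats such as logs, while fixed-size chunks suit binary data (for example, hashing or copying a large file).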

### 6.4 Security Considerations
*   **Path Traversal:** Never construct file paths directly from untrusted user input (e.g., web requests). An attacker could provide input like `../../../etc/passwd` to access sensitive files. Sanitize and validate input, or use safer methods like generating unique IDs for user files stored in a designated base directory.
*   **Permissions:** Be mindful of file permissions when creating or modifying files, especially in multi-user environments. Use `os.chmod` if needed.
*   **Running Commands:** Be extremely careful when running external commands (`subprocess`). Avoid `shell=True` with untrusted input. Sanitize arguments passed to commands.
*   **Race Conditions:** In concurrent environments, multiple processes/threads accessing the same file can lead to race conditions (e.g., checking `exists()` then trying to `open()` might fail if another process deletes the file in between). Use file locking mechanisms or atomic operations where necessary.
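
One way to guard against the path traversal problem described above is to resolve the requested path and confirm it still lies inside the designated base directory. The sketch below assumes a hypothetical helper name, `safe_join`; it is one possible check, not a complete security solution.

```python
from pathlib import Path


def safe_join(base_dir, user_supplied):
    """Return base_dir/user_supplied, or raise ValueError if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks before the check
    candidate = (base / user_supplied).resolve()
    try:
        # relative_to() raises ValueError when candidate is not inside base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"Path escapes base directory: {user_supplied!r}") from None
    return candidate
```

Input such as `../../../etc/passwd`, or an absolute path, resolves to a location outside the base directory and is rejected before any file is opened.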

### 6.5 Idempotency
*   Ensure file operations can be repeated safely without unintended side effects (e.g., using `Path.mkdir(exist_ok=True)`, checking existence before writing if overwriting is not desired).
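
A small sketch of repeatable operations, assuming the illustrative helper names `ensure_dir` and `write_if_absent`. The exclusive-creation mode `'x'` makes the existence check and the file creation a single step, which also avoids the check-then-open race mentioned in the security notes.

```python
from pathlib import Path


def ensure_dir(path):
    """Create the directory (and any parents) if needed; safe to call repeatedly."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def write_if_absent(path, text, encoding="utf-8"):
    """Write text only if the file does not exist yet.

    Returns True if the file was created, False if it was already there.
    """
    try:
        # Mode 'x' fails with FileExistsError instead of overwriting
        with open(path, "x", encoding=encoding) as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True
```

Running either helper twice leaves the file system in the same state as running it once.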

## 7. Pitfalls and Common Interview Questions

**Common Pitfalls:**

*   **Not using `with open(...)`:** Forgetting to close files, leading to resource leaks or data not being flushed.
*   **Encoding Errors:** Assuming default encoding or using the wrong one, causing `UnicodeDecodeError`/`UnicodeEncodeError` or corrupted text.
*   **Path Separator Issues:** Hardcoding `\` or `/` instead of using `pathlib` or `os.path.join` for cross-platform compatibility.
*   **Overwriting Files:** Using `'w'` mode accidentally when `'a'` (append) or existence checks were needed.
*   **Path Traversal Vulnerabilities:** Building paths from unsanitized user input.
*   **Permissions Errors:** Trying to read/write files without sufficient OS permissions.
*   **Forgetting `newline=''` for `csv`:** Resulting in extra blank rows in the CSV file on Windows.
*   **Reading Large Files into Memory:** Causing performance issues or crashes.
*   **Race Conditions:** In concurrent access scenarios without proper locking.
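
Several of these pitfalls (missing `with`, implicit encoding, missing `newline=''`) can be avoided in one place. The sketch below uses hypothetical helper names, `write_rows` and `read_rows`, purely for illustration.

```python
import csv


def write_rows(path, rows):
    """Write rows to a CSV file with explicit encoding and newline handling."""
    # newline='' lets the csv module control line endings itself,
    # which prevents blank rows between records on Windows
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_rows(path):
    """Read all rows back as lists of strings."""
    with open(path, "r", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```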

**Common Interview Questions:**

1.  How do you open a file in Python? What is the importance of the `with` statement?
2.  Explain the different file modes (`r`, `w`, `a`, `b`, `+`).
3.  Why is specifying file encoding (like `utf-8`) important?
4.  What is `pathlib` and why is it preferred over `os.path`?
5.  Show how to join path components using `pathlib`.
6.  How do you check if a file or directory exists using `pathlib`?
7.  How can you list the contents of a directory?
8.  How do you read/write a CSV file in Python?
9.  How do you read/write a JSON file?
10. How do you run an external command safely in Python? (Mention `subprocess`). Why avoid `os.system`?
11. How do you get the value of an environment variable?
12. What are some security considerations when working with files and paths?
13. How would you handle potential `FileNotFoundError` when opening a file?
14. How can you copy or move files/directories using Python?
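
As a starting point for questions 10, 11, and 13, here is a hedged sketch. The helper names `run_command` and `read_text_or_default`, the 10-second timeout, and the `APP_MODE` variable are assumptions made for the example.

```python
import os
import subprocess


def run_command(args, timeout=10):
    """Run an external command without a shell and return its stdout.

    args is a list, so each element reaches the program as one literal
    argument; shell metacharacters in it are never interpreted.
    """
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout


def read_text_or_default(path, default=""):
    """Return the file's text, or default if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        return default


# Environment variables: .get() returns a fallback instead of raising KeyError
app_mode = os.environ.get("APP_MODE", "development")  # APP_MODE is a hypothetical name
```

`os.system` passes a single string to the shell and returns only an exit status, which is why `subprocess.run` with an argument list is usually the safer answer.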

## 8. Mini-Project: File Organizer

**Goal:** Create a script that organizes files in a specified directory into subdirectories based on their file extension.

**Tasks:**

1.  **Input:** The script should take a source directory path as input (e.g., via command-line argument using `argparse`, or just hardcoded for simplicity in the notebook).
2.  **Logic:**
    *   Use `pathlib` to represent the source directory.
    *   Iterate through all *files* (not directories) directly within the source directory (`Path.iterdir()` combined with `is_file()`).
    *   For each file, get its extension (e.g., `.txt`, `.jpg`, `.pdf`). Handle files with no extension (e.g., place them in a `no_extension` folder or skip them).
    *   Create a subdirectory named after the extension (lowercase, without the leading dot, e.g., `txt`, `jpg`, `pdf`) inside the source directory if it doesn't already exist (`Path.mkdir(exist_ok=True)`).
    *   Move the file into the corresponding subdirectory (`shutil.move` or `Path.rename`).
3.  **Error Handling:** Handle potential errors like the source directory not existing or permission issues during file moving/directory creation (use `try...except`).
4.  **Logging:** Add logging messages (INFO level) indicating which file is being moved and to which directory, and log any errors encountered.
5.  **Setup:** Before running, manually create a test directory (e.g., `organize_me`) and place a few dummy files with different extensions inside it (e.g., `report.txt`, `image.jpg`, `document.pdf`, `archive.zip`, `no_extension_file`).

**(Bonus):** Add an option to organize recursively through subdirectories.

In [10]:
# --- Solution Space for Mini-Project ---
import logging
from pathlib import Path
import shutil
import os

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

def organize_files(source_dir_path_str: str):
    """Organizes files in a source directory into subdirs by extension.

    Args:
        source_dir_path_str: The path to the directory to organize.
    """
    source_dir = Path(source_dir_path_str)

    if not source_dir.is_dir():
        logging.error(f"Source directory '{source_dir}' not found or is not a directory.")
        return

    logging.info(f"Starting organization for directory: {source_dir}")

    try:
        for item in list(source_dir.iterdir()): # Snapshot first: the loop creates new subdirectories in source_dir
            if item.is_file(): # Process only files
                file_extension = item.suffix.lower() # Get extension (e.g., '.txt')

                if not file_extension: # Handle files with no extension
                    dest_folder_name = "no_extension"
                else:
                    # Remove leading dot for folder name
                    dest_folder_name = file_extension[1:] 
                
                destination_dir = source_dir / dest_folder_name
                
                try:
                    # Create destination directory if it doesn't exist
                    destination_dir.mkdir(exist_ok=True)
                    
                    # Move the file
                    destination_path = destination_dir / item.name
                    shutil.move(str(item), str(destination_path)) # str() keeps this working on Python < 3.9, where shutil.move did not fully support Path objects
                    logging.info(f"Moved '{item.name}' to '{dest_folder_name}/'")
                    
                except shutil.Error as e: # Must come before OSError: shutil.Error is a subclass of it
                    logging.error(f"Error moving file '{item.name}' using shutil: {e}")
                except OSError as e:
                    logging.error(f"Could not move file '{item.name}' or create directory '{destination_dir}': {e}")

    except OSError as e:
        logging.error(f"Error iterating source directory '{source_dir}': {e}")

    logging.info(f"Organization finished for directory: {source_dir}")

# --- Setup and Run --- 
# The lines below create 'organize_me' and a few dummy files automatically,
# so no manual setup is needed when running this cell.
test_directory = "organize_me"

# Create directory and dummy files if they don't exist
setup_dir = Path(test_directory)
setup_dir.mkdir(exist_ok=True)
(setup_dir / "report.txt").touch()
(setup_dir / "image.jpg").touch()
(setup_dir / "document.PDF").touch() # Test case insensitivity
(setup_dir / "data.csv").touch()
(setup_dir / "NO_EXTENSION_FILE").touch()
(setup_dir / "archive.tar.gz").touch() # Multi-part extension: Path.suffix returns only '.gz'

# Run the organizer
organize_files(test_directory)

# Check the 'organize_me' directory afterwards!

2025-04-20 16:33:10,192 - INFO - Starting organization for directory: organize_me
2025-04-20 16:33:10,195 - INFO - Moved 'archive.tar.gz' to 'gz/'
2025-04-20 16:33:10,198 - INFO - Moved 'data.csv' to 'csv/'
2025-04-20 16:33:10,200 - INFO - Moved 'document.PDF' to 'pdf/'
2025-04-20 16:33:10,202 - INFO - Moved 'image.jpg' to 'jpg/'
2025-04-20 16:33:10,204 - INFO - Moved 'NO_EXTENSION_FILE' to 'no_extension/'
2025-04-20 16:33:10,207 - INFO - Moved 'report.txt' to 'txt/'
2025-04-20 16:33:10,208 - INFO - Organization finished for directory: organize_me
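
For the bonus task, one possible approach is sketched below. It assumes a hypothetical function name, `organize_recursively`, and makes two design choices that the task does not specify: files from nested folders are all collected into extension folders at the top level, and name collisions are skipped rather than overwritten.

```python
import shutil
from pathlib import Path


def organize_recursively(source_dir):
    """Move every file under source_dir into source_dir/<extension>/.

    Returns the number of files moved.
    """
    root = Path(source_dir)
    # Snapshot the file list first, because the loop changes the tree
    files = [p for p in root.rglob("*") if p.is_file()]
    moved = 0
    for file_path in files:
        folder = file_path.suffix.lower().lstrip(".") or "no_extension"
        target_dir = root / folder
        if file_path.parent == target_dir:
            continue  # Already organized; makes repeated runs harmless
        target_dir.mkdir(exist_ok=True)
        target = target_dir / file_path.name
        if target.exists():
            continue  # Assumption: skip collisions instead of overwriting
        shutil.move(str(file_path), str(target))
        moved += 1
    return moved
```

Emptied subdirectories are left in place here; removing them would be a separate clean-up step.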


## 9. Conclusion

Python provides a rich and robust ecosystem for interacting with the file system and operating system. By leveraging the modern `pathlib` module for path manipulation, understanding the fundamentals of `open()` with context managers for file I/O, knowing key `os` functions for environment/process interactions, and utilizing specialized modules (`csv`, `json`, `xml`) for data formats, you can write clean, efficient, cross-platform, and maintainable code.

Always prioritize using `with` for resource management, specify encoding (prefer `'utf-8'`), handle exceptions gracefully, and be mindful of security implications, especially when dealing with external input or running commands. With these tools and practices, you can confidently tackle a wide range of file and OS-related tasks in your Python projects.