# A Practical Course on Handling Binary Files in Python

Welcome! So far, you've likely worked with **text files** (`.txt`, `.csv`, `.json`), which are human-readable. This notebook dives into the world of **binary files**, which are designed to be read by machines.

Understanding binary files is essential for working with images, audio, video, compressed archives, compiled code, and high-performance data storage.

### What is a Binary File?

Think of it this way:
- **Text File**: A sequence of characters. Python automatically applies an encoding (e.g., UTF-8) to translate characters into bytes and back. In text mode it also translates newlines, so `\n` can be written as `\r\n` on Windows.
- **Binary File**: A sequence of raw **bytes**. There is no encoding/decoding and no special handling of characters. What you write is exactly what gets stored, byte for byte.

| Feature | Text Files | Binary Files |
| :--- | :--- | :--- |
| **Content** | Human-readable characters (letters, numbers, symbols) | Raw bytes (numbers from 0-255) | 
| **Mode** | `'r'`, `'w'`, `'a'`, `'r+'` (`'t'` is the default and may be omitted) | `'rb'`, `'wb'`, `'ab'`, `'rb+'` (must include `'b'`) |
| **Python Type**| `str` | `bytes` |
| **Use Cases** | Code, configuration, CSV, JSON, logs | Images, audio, executables, compressed files, `pickle` |

### Table of Contents
1. [The Basics: Reading and Writing in Binary Mode](#basics)
2. [Use Case 1: Copying an Image File](#images)
3. [Use Case 2: Storing Structured Data with `struct`](#struct)
4. [Use Case 3: Saving Python Objects with `pickle`](#pickle)
5. [In-Memory Binary Data with `io.BytesIO`](#bytesio)

<a id='basics'></a>
## 1. The Basics: Reading and Writing in Binary Mode

To work with binary files, you simply add a `'b'` to the mode string in the `open()` function.
- `'rb'`: Read Binary
- `'wb'`: Write Binary
- `'ab'`: Append Binary
- `'rb+'`: Read and Write Binary

When you read from a binary file, you get a `bytes` object, not a `str`. You can identify a bytes literal in code by the `b''` prefix.
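
As a quick illustration of how `bytes` objects behave, here is a minimal sketch. The sample values (`b'Hi!'` and `'café'`) are made up for this example; the point is that indexing a `bytes` object yields integers from 0 to 255, and that converting between `str` and `bytes` always goes through an encoding.

```python
# A bytes literal looks like a string with a b prefix.
greeting = b'Hi!'

print(type(greeting))   # <class 'bytes'>
print(greeting[0])      # 72, the integer value of the byte for 'H'
print(greeting[0:2])    # b'Hi', slicing returns bytes again
print(list(greeting))   # [72, 105, 33]

# Converting between str and bytes always involves an encoding.
text = 'café'
encoded = text.encode('utf-8')
print(encoded)          # b'caf\xc3\xa9': 4 characters become 5 bytes
print(encoded.decode('utf-8') == text)   # True
```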

In [None]:
# Writing a str to a file opened in binary mode.
# This cell fails: in 'wb' mode, write() accepts only bytes-like objects.

info = "My name is Shuvam and I am a Python Developer. 😀"

with open("input.txt", "wb") as file:
    file.write(f"{info}\n")  # raises TypeError: the argument is a str, not bytes

# Never reached, because the write above raises
with open("input.txt", "rb") as file:
    content = file.read()
    print(content)

TypeError: a bytes-like object is required, not 'str'
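
The `TypeError` above goes away once the string is encoded to bytes before it is written. The sketch below is one way to do that. It assumes UTF-8 as the encoding and writes into a temporary directory (an addition for this example) so that no files are left behind.

```python
import os
import tempfile

info = "My name is Shuvam and I am a Python Developer. 😀"

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "input.txt")

    # Encode the str to bytes before writing in binary mode.
    with open(path, "wb") as file:
        file.write(f"{info}\n".encode("utf-8"))

    with open(path, "rb") as file:
        content = file.read()

print(content)                                   # raw bytes; the emoji appears as \xf0\x9f\x98\x80
print(content.decode("utf-8") == f"{info}\n")    # True: decoding restores the text
```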

In [8]:
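info = "😀"

# In text mode, pass the encoding explicitly: the platform default
# (for example cp1252 on some Windows setups) may not be able to encode emoji.
with open("input.txt", "w", encoding="utf-8") as file:
    file.write(f"{info}\n")

with open("input.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)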

😀



In [None]:
# Let's write some raw bytes to a file
byte_data = b'\x48\x65\x6c\x6c\x6f' # This is 'Hello' in ASCII hexadecimal bytes

with open('data.bin', 'wb') as f:
    f.write(byte_data)

print("Wrote 5 bytes to data.bin")

# Now, let's read them back
with open('data.bin', 'rb') as f:
    read_bytes = f.read()

print(f"\nRead back: {read_bytes}")
print(f"Type of data read: {type(read_bytes)}")

# You can decode bytes back into a string if you know the encoding
decoded_string = read_bytes.decode('utf-8', errors='ignore')
print(f"Decoded string: {decoded_string}")

Wrote 5 bytes to data.bin

Read back: b'Hello'
Type of data read: <class 'bytes'>
Decoded string: Hello
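
The cell above uses only `'wb'` and `'rb'`. The following sketch shows the other two modes from the list at the start of this section. The file name and contents are placeholders, and a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.bin')

    with open(path, 'wb') as f:      # create (or truncate) and write
        f.write(b'Hello')

    with open(path, 'ab') as f:      # append to the end
        f.write(b' World')

    with open(path, 'rb+') as f:     # read and write an existing file
        f.seek(6)                    # jump to byte offset 6
        f.write(b'w')                # overwrite one byte in place
        f.seek(0)                    # rewind before reading
        result = f.read()

print(result)   # b'Hello world'
```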


<a id='images'></a>
## 2. Use Case 1: Copying an Image File

Images are a perfect example of binary files. You can't open a `.jpg` or `.png` in a text editor and make sense of it. To work with an image, we can read its raw bytes and write them to a new file, effectively creating a perfect copy.

First, we'll use the `Pillow` library to create a sample image to work with.

In [15]:
!pip install pillow



In [18]:
# You may need to install Pillow: pip install Pillow
from PIL import Image
import os

# Create a simple 100x100 red image
try:
    img = Image.new('RGB', (100, 100), color = 'red')
    img.save('original_image.png')
    print("Created 'original_image.png'")

    # --- The Binary File Handling Part ---
    # 1. Read all bytes from the original image file
    with open('original_image.png', 'rb') as original_file:
        image_bytes = original_file.read()

    print(f"Read {len(image_bytes)} bytes from the original image.")

    # 2. Write those exact same bytes to a new file
    with open('copied_image.png', 'wb') as new_file:
        new_file.write(image_bytes)

    print("Wrote bytes to 'copied_image.png', creating a perfect copy.")

finally:
    # Clean up the created files
    for path in ('original_image.png', 'copied_image.png'):
        if os.path.exists(path):
            os.remove(path)
    print("\nCleaned up image files.")

Created 'original_image.png'
Read 287 bytes from the original image.
Wrote bytes to 'copied_image.png', creating a perfect copy.

Cleaned up image files.
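
Reading a whole file with a single `read()` is fine for a small image, but a large file is better copied in pieces. The sketch below is one possible approach, not the only one. The helper names `copy_in_chunks` and `sha256_of` and the 64 KiB chunk size are assumptions made for this example, and random bytes stand in for real image data. A hash comparison confirms that the copy is byte-for-byte identical.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice for this sketch


def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy a file without loading it fully into memory."""
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:          # b'' means end of file
                break
            dst.write(chunk)


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b''):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'original.bin')
    dst = os.path.join(tmp_dir, 'copy.bin')

    # Stand-in for an image: 200,000 random bytes.
    with open(src, 'wb') as f:
        f.write(os.urandom(200_000))

    copy_in_chunks(src, dst)
    identical = sha256_of(src) == sha256_of(dst)
    copied_size = os.path.getsize(dst)

print(f"Copied {copied_size} bytes; identical: {identical}")
```

In everyday code, `shutil.copyfile` from the standard library does the same job; the manual loop is shown only to make the byte-level mechanics visible.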


<a id='struct'></a>
## 3. Use Case 2: Storing Structured Data with `struct`

How do you store numbers such as integers and floats compactly? As text, every digit costs one byte: `'123'` takes 3 bytes and `'2147483647'` takes 10. As a signed 4-byte binary integer, any value up to 2,147,483,647 (about 2 billion) takes exactly 4 bytes.

The `struct` module is Python's way of converting between Python values and C structs represented as Python `bytes` objects. It lets you "pack" data into a compact binary format according to a format string.

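**Common Format Characters** (the sizes are the standard sizes that apply when the format string starts with a byte-order prefix such as `>`):
- `i`: signed integer (4 bytes)
- `f`: float (4 bytes)
- `d`: double (8 bytes)
- `q`: signed long long integer (8 bytes)
- `s`: bytes (e.g., `4s` means 4 bytes)
- `>`: big-endian byte order (the order used by network protocols)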
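
The sizes in the list can be checked with `struct.calcsize`. The short sketch below also packs the same integer in both byte orders. The `<` prefix (little-endian) does not appear in the list above and is included here only for comparison.

```python
import struct

# With an explicit byte-order prefix, sizes are fixed across platforms.
for fmt in ('>i', '>f', '>d', '>q', '>4s'):
    print(fmt, struct.calcsize(fmt))

# The same integer packed in both byte orders.
big = struct.pack('>i', 1)
little = struct.pack('<i', 1)
print(big)      # b'\x00\x00\x00\x01' (most significant byte first)
print(little)   # b'\x01\x00\x00\x00' (least significant byte first)
```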

In [None]:
import struct

# Let's pack an integer, a float, and a 4-char string
record_id = 101
temperature = 98.6
sensor_code = b'TEMP' # Must be bytes

# Format: > (big-endian), i (integer), f (float), 4s (4 bytes)
record_format = '>if4s'

# 1. Pack the data into a bytes object
packed_data = struct.pack(record_format, record_id, temperature, sensor_code)

print(f"Packed data: {packed_data}")
print(f"Size of packed data: {len(packed_data)} bytes") # 4 (int) + 4 (float) + 4 (string) = 12 bytes

# 2. Write the packed data to a binary file
with open('sensor_reading.dat', 'wb') as f:
    f.write(packed_data)

# 3. Read the data back
with open('sensor_reading.dat', 'rb') as f:
    read_data = f.read()

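# 4. Unpack the bytes back into Python objects
unpacked_data = struct.unpack(record_format, read_data)

# unpack() always returns a tuple. The temperature comes back as roughly
# 98.59999847 rather than 98.6, because 'f' stores only 4 bytes of precision.
print(f"\nUnpacked data: {unpacked_data}")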
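
Because every packed record has the same size, many of them can be stored back to back and read one at a time. The following is a minimal sketch of that idea. It assumes the same `'>if4s'` layout; the sensor readings and the file name `readings.dat` are invented for the example, with float values chosen so that they survive the 4-byte round trip exactly.

```python
import os
import struct
import tempfile

# Same layout as above: big-endian int, float, 4 bytes.
record = struct.Struct('>if4s')

# Hypothetical readings for illustration.
readings = [
    (101, 98.5, b'TEMP'),
    (102, 45.25, b'HUMI'),
    (103, 1013.0, b'PRES'),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'readings.dat')

    # Write the records back to back.
    with open(path, 'wb') as f:
        for reading in readings:
            f.write(record.pack(*reading))

    file_size = os.path.getsize(path)

    # Read them back one fixed-size record at a time.
    decoded = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(record.size)
            if len(chunk) < record.size:   # end of file (or a truncated record)
                break
            decoded.append(record.unpack(chunk))

print(f"{file_size} bytes for {len(decoded)} records of {record.size} bytes each")
for row in decoded:
    print(row)
```

Precompiling the format with `struct.Struct` avoids re-parsing the format string on every call and exposes the record size as `record.size`.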

<a id='pickle'></a>
## 4. Use Case 3: Saving Python Objects with `pickle`

The `pickle` module provides an even higher-level way to handle binary data. It **serializes** entire Python objects (like dictionaries, lists, or custom classes) into a byte stream. This is incredibly convenient for saving program state.

> **Security Warning:** `pickle` is not secure. Never unpickle data from an untrusted source, as it can be forced to execute arbitrary code.

In [21]:
import pickle

my_config = {
    'api_key': 'xyz-123-abc',
    'retries': 3,
    'endpoints': ['/api/v1', '/api/v2']
}

# Save the dictionary to a pickle file using 'wb'
with open('config.pkl', 'wb') as f:
    pickle.dump(my_config, f)

print("Saved config dictionary to config.pkl")

# Load the object back from the file using 'rb'
with open('config.pkl', 'rb') as f:
    loaded_config = pickle.load(f)

print(f"\nLoaded config: {loaded_config}")
print(f"Type of loaded object: {type(loaded_config)}")

Saved config dictionary to config.pkl

Loaded config: {'api_key': 'xyz-123-abc', 'retries': 3, 'endpoints': ['/api/v1', '/api/v2']}
Type of loaded object: <class 'dict'>
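
To make the security warning above concrete, the sketch below shows that loading a pickle can call a function chosen by whoever created it. The class name `Sneaky` is hypothetical, and the function it triggers is the harmless built-in `len`; a malicious pickle could name a far more dangerous callable in the same way.

```python
import pickle


class Sneaky:
    """Hypothetical class whose pickle calls a function when loaded."""

    def __reduce__(self):
        # Tell pickle to store "call len('surprise')" instead of object state.
        return (len, ("surprise",))


payload = pickle.dumps(Sneaky())

# Loading runs len("surprise"); the result is an int, not a Sneaky instance.
result = pickle.loads(payload)
print(type(result), result)   # <class 'int'> 8
```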


<a id='bytesio'></a>
## 5. In-Memory Binary Data with `io.BytesIO`

Sometimes, you have binary data in memory (e.g., downloaded from an API) and need to pass it to a library that expects a file. `io.BytesIO` creates an **in-memory binary stream** (a file-like object) that you can use for this.

This avoids the need to save the data to a temporary file on disk.

In [None]:
import io

# Imagine this byte string came from a network request
# We'll re-use the packed data from our struct example
in_memory_data = packed_data 

# Create an in-memory binary file from the bytes
memory_file = io.BytesIO(in_memory_data)

# Now we can .read() from it just like a real file
read_from_memory = memory_file.read()

# memory_file itself can be passed to any function that expects a file opened in 'rb' mode;
# here we simply unpack the bytes we read from it
unpacked_from_memory = struct.unpack(record_format, read_from_memory)

print(f"Data read from memory file: {read_from_memory}")
print(f"Unpacked from memory file: {unpacked_from_memory}")
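
The cell above reads the bytes out of the stream itself. As a complementary sketch, the example below hands the `BytesIO` object directly to functions that expect a binary file, here `pickle.dump` and `pickle.load`. The `settings` dictionary is a made-up value for illustration.

```python
import io
import pickle

settings = {'retries': 3, 'endpoints': ['/api/v1', '/api/v2']}

# pickle.dump expects a writable binary file; a BytesIO object qualifies.
buffer = io.BytesIO()
pickle.dump(settings, buffer)
print(f"Buffer holds {len(buffer.getvalue())} bytes")

# Writing leaves the position at the end, so rewind before reading.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored == settings)   # True
```

Forgetting the `seek(0)` is a common source of confusion: without it, the load starts at the end of the stream and finds no data.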

## Conclusion

You now have a solid foundation for working with binary files in Python!

**Key Takeaways:**
- **Use `'b'` in the mode** (`'rb'`, `'wb'`) to enter binary mode.
- You will work with `bytes` objects, not `str` objects.
- For simple byte-for-byte operations like copying files, just `read()` and `write()`.
- For high-performance, compact storage of numerical data, use the **`struct`** module.
- For conveniently saving and loading entire Python objects, use the **`pickle`** module (with caution!).
- To treat in-memory bytes as a file, use **`io.BytesIO`**.

This knowledge opens the door to a huge range of applications that go far beyond simple text processing.