# Byte Analysis and Huffman Compression Testing

This notebook reads the raw bytes of three file formats (.txt, .png, .wav) and tests the project's Huffman compression implementation on a short text file.

## Setup

First, let's import the necessary modules and set up our environment.

In [2]:
import os
import sys
import wave  # Standard-library reader for WAV files

from PIL import Image  # Pillow, used to create the test PNG

# Add the current directory to the path so the project modules can be imported
sys.path.append(os.path.abspath('.'))

from core.file_utils import read_binary_file, display_all_bytes
from core.huffman import huffman_encode, huffman_decode
from utils.bit_utils import bits_to_bytes
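
The helper `read_binary_file` lives in `core.file_utils` and is not shown in this notebook. A minimal sketch of what such a helper could look like, assuming it simply returns the file's contents as a `bytes` object (the name `read_binary_file_sketch` is hypothetical and not part of the project):

```python
from pathlib import Path
import tempfile

def read_binary_file_sketch(path):
    """Return the full contents of `path` as bytes (hypothetical stand-in)."""
    # Mode 'rb' disables newline translation and text decoding.
    with open(path, "rb") as f:
        return f.read()

# Usage: write a small file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello huffman")
    data = read_binary_file_sketch(sample)

print(len(data), list(data[:5]))
```

A missing file makes `open` raise `FileNotFoundError`, which is the exception the cells below catch.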

## 1. Reading Text Files

Let's start by reading a text file in binary mode to examine its bytes.

In [3]:
# Read the example text file in binary mode
text_path = os.path.join('examples', 'input_text.txt')

try:
    text_bytes = read_binary_file(text_path)
    print(f"Number of bytes in text file: {len(text_bytes)}")
    print("\nFirst 50 bytes as integers:")
    print(list(text_bytes[:50]))
    print("\nFirst 50 bytes as ASCII:")
    # errors='replace' keeps the cell from failing on non-ASCII bytes
    print(text_bytes[:50].decode('ascii', errors='replace'))
except FileNotFoundError:
    print(f"Text file not found at {text_path}")

Number of bytes in text file: 13

First 50 bytes as integers:
[104, 101, 108, 108, 111, 32, 104, 117, 102, 102, 109, 97, 110]

First 50 bytes as ASCII:
hello huffman
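
The integer list and the ASCII string printed above are two views of the same data. A short sketch of the conversions involved, using only built-in `bytes` behavior (the variable names are illustrative):

```python
data = b"hello huffman"

# Iterating over (or indexing) a bytes object yields integers in 0..255.
as_ints = list(data)

# bytes(...) reverses the conversion.
round_trip = bytes(as_ints)

# Strict ASCII decoding raises UnicodeDecodeError on any byte >= 0x80;
# errors="replace" substitutes U+FFFD for each such byte instead of failing.
mixed = b"caf\xc3\xa9"
safe_text = mixed.decode("ascii", errors="replace")

print(as_ints)
print(round_trip.decode("ascii"))
print(safe_text)
```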


## 2. Reading PNG Files

Now let's examine a PNG file's binary data, starting with the 8-byte signature that opens every PNG file.

In [4]:
# Create a simple PNG file if it doesn't exist
png_path = os.path.join('examples', 'test.png')

if not os.path.exists(png_path):
    # Create a small colored image
    img = Image.new('RGB', (100, 100), color='red')
    img.save(png_path)
    print(f"Created test PNG file at {png_path}")

# Read the PNG file in binary mode
png_bytes = read_binary_file(png_path)

# Examine the PNG header (first 8 bytes)
png_header = png_bytes[:8]
print("PNG Header (8 bytes):")
print(f"As integers: {list(png_header)}")
print(f"As hex: {png_header.hex()}")

# Expected PNG header: 89 50 4E 47 0D 0A 1A 0A
print("\nVerifying PNG signature...")
is_png = png_header.startswith(b'\x89PNG\r\n\x1a\n')
print(f"Is valid PNG: {is_png}")

print(f"\nTotal file size: {len(png_bytes)} bytes")

PNG Header (8 bytes):
As integers: [137, 80, 78, 71, 13, 10, 26, 10]
As hex: 89504e470d0a1a0a

Verifying PNG signature...
Is valid PNG: True

Total file size: 287 bytes
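
The signature check above generalizes to other formats, since many binary files start with a fixed "magic number". A small sketch, assuming only the two binary formats used in this notebook need to be recognized (`sniff_format` is a hypothetical helper, not part of the project):

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def sniff_format(data):
    """Guess a file format from its leading bytes (hypothetical helper)."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    # A WAV file is a RIFF container whose form type, at offset 8, is 'WAVE'.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"

print(sniff_format(bytes([137, 80, 78, 71, 13, 10, 26, 10])))
print(sniff_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(sniff_format(b"hello huffman"))
```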


## 3. Reading WAV Files

Let's examine a WAV file's binary data, focusing on the first 44 bytes, which hold the header of a canonical PCM WAV file.

In [5]:
# Function to analyze WAV header
def analyze_wav_header(wav_path):
    with wave.open(wav_path, 'rb') as wav_file:
        print(f"Number of channels: {wav_file.getnchannels()}")
        print(f"Sample width: {wav_file.getsampwidth()} bytes")
        print(f"Frame rate: {wav_file.getframerate()} Hz")
        print(f"Number of frames: {wav_file.getnframes()}")
        print(f"Compression type: {wav_file.getcomptype()}")

# Read a WAV file if it exists
wav_path = os.path.join('examples', 'test.wav')

try:
    # Read the WAV file in binary mode
    wav_bytes = read_binary_file(wav_path)
    
    # Examine the WAV header (first 44 bytes)
    wav_header = wav_bytes[:44]
    print("WAV Header (44 bytes):")
    print(f"As integers: {list(wav_header)}")
    print(f"As hex: {wav_header.hex()}")
    
    print("\nWAV File Analysis:")
    analyze_wav_header(wav_path)
    
    print(f"\nTotal file size: {len(wav_bytes)} bytes")
except FileNotFoundError:
    print(f"WAV file not found at {wav_path}")

WAV file not found at examples/test.wav
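
The cell above found no `examples/test.wav`, so no header was displayed. A sketch of one way to produce test data, mirroring the PNG cell: the standard-library `wave` module writes a short silent clip into memory, and `struct` unpacks the 44 bytes of the canonical PCM header. The in-memory buffer and all variable names are assumptions for illustration; WAV files with extra chunks can have longer headers.

```python
import io
import struct
import wave

# Write 0.1 s of 16-bit mono silence at 8 kHz into an in-memory buffer.
n_frames = 800
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # bytes per sample
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * n_frames)
wav_bytes = buf.getvalue()

# Canonical PCM layout: RIFF descriptor, 'fmt ' chunk, 'data' chunk header.
# All multi-byte integers are little-endian.
(riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
 sample_rate, byte_rate, block_align, bits_per_sample,
 data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", wav_bytes[:44])

print(riff, wave_id, fmt_id, data_id)
print(channels, sample_rate, bits_per_sample, data_size)
```

To create the file the notebook expects, the same `wave.open` call could be given the path `examples/test.wav` instead of the buffer.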


## 4. Testing Huffman Compression

Let's run the project's Huffman encoder on the example text file, measure the compressed size, and check that decoding restores the original text.
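
For context, Huffman coding repeatedly merges the two least frequent subtrees, so frequent symbols end up with shorter codes. The sketch below is a generic heap-based construction, not the project's `core.huffman` implementation; `build_codes` and `decode_bits` are hypothetical names, and tie-breaking may yield different (but equally short) codes than `huffman_encode` produces.

```python
import heapq
from collections import Counter

def build_codes(text):
    """Return a {symbol: bit string} Huffman code table for `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prefix '0' for the left subtree and '1' for the right subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def decode_bits(bits, codes):
    """Decode a '0'/'1' string using a prefix-free code table."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

text = "hello huffman"
codes = build_codes(text)
bits = "".join(codes[ch] for ch in text)
print(len(bits), decode_bits(bits, codes) == text)
```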

In [6]:
# Read the text file content
try:
    with open(text_path, 'r', encoding='utf-8') as f:
        text_content = f.read()
    
    print(f"Original text length: {len(text_content)} characters")
    print(f"Original size: {len(text_content.encode('utf-8'))} bytes")
    
    # Compress using Huffman coding
    encoded_data, codes = huffman_encode(text_content)
    print(f"\nNumber of unique characters: {len(codes)}")
    print("Huffman codes:")
    for char, code in sorted(codes.items()):
        if char.isprintable():
            print(f"'{char}': {code}")
        else:
            print(f"code point {ord(char)}: {code}")
    
    # Convert encoded data to bytes for storage
    encoded_bytes = bits_to_bytes(encoded_data)
    print(f"\nCompressed size: {len(encoded_bytes)} bytes")
    
    # Calculate compression ratio (payload only; the code table is not counted)
    compression_ratio = len(text_content.encode('utf-8')) / len(encoded_bytes)
    print(f"Compression ratio: {compression_ratio:.2f}x")
    
    # Test decompression
    decoded_text = huffman_decode(encoded_data, codes)
    print("\nDecompression test:")
    print(f"Decompressed length: {len(decoded_text)} characters")
    print(f"Decompression successful: {decoded_text == text_content}")
except FileNotFoundError:
    print(f"Text file not found at {text_path}")

Original text length: 13 characters
Original size: 13 bytes
Counter({'h': 2, 'l': 2, 'f': 2, 'e': 1, 'o': 1, ' ': 1, 'u': 1, 'm': 1, 'a': 1, 'n': 1})

Number of unique characters: 10
Huffman codes:
' ': 1110
'a': 1011
'e': 1111
'f': 011
'h': 00
'l': 100
'm': 010
'n': 1101
'o': 1010
'u': 1100

Compressed size: 6 bytes
Compression ratio: 2.17x

Decompression test:
Decompressed length: 13 characters
Decompression successful: True
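
The 6-byte figure can be checked by hand from the code table printed above. The sketch below recomputes the payload size and compares it with the Shannon entropy of the text, which is a lower bound for any symbol-by-symbol code. Note that the 2.17x ratio counts only the encoded payload; a self-contained file would also have to store the code table, an overhead this notebook does not measure.

```python
import math
from collections import Counter

text = "hello huffman"
# Code table copied from the output above.
codes = {' ': '1110', 'a': '1011', 'e': '1111', 'f': '011', 'h': '00',
         'l': '100', 'm': '010', 'n': '1101', 'o': '1010', 'u': '1100'}

freq = Counter(text)

# Payload size: each symbol contributes (count x code length) bits.
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
padded_bytes = math.ceil(total_bits / 8)
padding_bits = padded_bytes * 8 - total_bits

# Shannon entropy of the whole message, in bits.
n = len(text)
entropy_bits = -sum(c * math.log2(c / n) for c in freq.values())

print(total_bits, padded_bytes, padding_bits)
print(f"{entropy_bits:.1f}")
```

The payload is 43 bits, which rounds up to 6 bytes with 5 padding bits, and sits just above the entropy bound of about 42.1 bits.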


## 5. Detailed Byte Value Analysis

Let's display every byte of each file, and of the Huffman output, as an integer, hex, binary, and ASCII value.

### Text File Byte Values

In [7]:
try:
    text_bytes = read_binary_file(text_path)
    display_all_bytes(text_bytes, "Text File")
except FileNotFoundError:
    print(f"Text file not found at {text_path}")

=== Text File ===

Total bytes: 13

All bytes as:
Byte 0:
  Integer: 104
  Hex: 0x68
  Binary: 01101000
  ASCII: h

Byte 1:
  Integer: 101
  Hex: 0x65
  Binary: 01100101
  ASCII: e

Byte 2:
  Integer: 108
  Hex: 0x6c
  Binary: 01101100
  ASCII: l

Byte 3:
  Integer: 108
  Hex: 0x6c
  Binary: 01101100
  ASCII: l

Byte 4:
  Integer: 111
  Hex: 0x6f
  Binary: 01101111
  ASCII: o

Byte 5:
  Integer: 32
  Hex: 0x20
  Binary: 00100000
  ASCII:  

Byte 6:
  Integer: 104
  Hex: 0x68
  Binary: 01101000
  ASCII: h

Byte 7:
  Integer: 117
  Hex: 0x75
  Binary: 01110101
  ASCII: u

Byte 8:
  Integer: 102
  Hex: 0x66
  Binary: 01100110
  ASCII: f

Byte 9:
  Integer: 102
  Hex: 0x66
  Binary: 01100110
  ASCII: f

Byte 10:
  Integer: 109
  Hex: 0x6d
  Binary: 01101101
  ASCII: m

Byte 11:
  Integer: 97
  Hex: 0x61
  Binary: 01100001
  ASCII: a

Byte 12:
  Integer: 110
  Hex: 0x6e
  Binary: 01101110
  ASCII: n
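
The four views in this listing can each be produced with a format specifier. A minimal sketch of the per-byte formatting, assuming that bytes outside the printable ASCII range 32..126 are shown as `.` (which matches the outputs in this notebook); `describe_byte` is a hypothetical helper, not the project's `display_all_bytes`:

```python
def describe_byte(value):
    """Render one byte in the four views used above (hypothetical helper)."""
    # Assumption: bytes outside printable ASCII (32..126) are shown as '.'.
    ascii_view = chr(value) if 32 <= value <= 126 else "."
    return {
        "Integer": value,
        "Hex": f"0x{value:02x}",      # two lowercase hex digits
        "Binary": f"{value:08b}",     # eight bits, zero-padded
        "ASCII": ascii_view,
    }

for i, b in enumerate(b"he"):
    print(f"Byte {i}:")
    for label, shown in describe_byte(b).items():
        print(f"  {label}: {shown}")
    print()
```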



### PNG File Byte Values

In [8]:
try:
    png_bytes = read_binary_file(png_path)
    display_all_bytes(png_bytes, "PNG File")
except FileNotFoundError:
    print(f"PNG file not found at {png_path}")

=== PNG File ===

Total bytes: 287

All bytes as:
Byte 0:
  Integer: 137
  Hex: 0x89
  Binary: 10001001
  ASCII: .

Byte 1:
  Integer: 80
  Hex: 0x50
  Binary: 01010000
  ASCII: P

Byte 2:
  Integer: 78
  Hex: 0x4e
  Binary: 01001110
  ASCII: N

Byte 3:
  Integer: 71
  Hex: 0x47
  Binary: 01000111
  ASCII: G

Byte 4:
  Integer: 13
  Hex: 0x0d
  Binary: 00001101
  ASCII: .

Byte 5:
  Integer: 10
  Hex: 0x0a
  Binary: 00001010
  ASCII: .

Byte 6:
  Integer: 26
  Hex: 0x1a
  Binary: 00011010
  ASCII: .

Byte 7:
  Integer: 10
  Hex: 0x0a
  Binary: 00001010
  ASCII: .

Byte 8:
  Integer: 0
  Hex: 0x00
  Binary: 00000000
  ASCII: .

Byte 9:
  Integer: 0
  Hex: 0x00
  Binary: 00000000
  ASCII: .

Byte 10:
  Integer: 0
  Hex: 0x00
  Binary: 00000000
  ASCII: .

Byte 11:
  Integer: 13
  Hex: 0x0d
  Binary: 00001101
  ASCII: .

Byte 12:
  Integer: 73
  Hex: 0x49
  Binary: 01001001
  ASCII: I

Byte 13:
  Integer: 72
  Hex: 0x48
  Binary: 01001000
  ASCII: H

Byte 14:
  Integer: 68
  Hex: 0x44
  Binary: 01000100
  ASCII: D

... (output truncated)
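
Bytes 8 to 15 in the listing above are the start of the first PNG chunk: a 4-byte big-endian length (13) followed by the chunk type `IHDR`. A sketch of how the image dimensions could be read from that chunk with `struct`; `parse_ihdr` is a hypothetical helper, and the sample bytes are built in code on the assumption that the test image is a 100x100, 8-bit RGB PNG, as created earlier.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def parse_ihdr(png_bytes):
    """Return the IHDR fields of a PNG as a dict (hypothetical helper)."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    length, chunk_type = struct.unpack(">I4s", png_bytes[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    fields = struct.unpack(">IIBBBBB", png_bytes[16:29])
    names = ("width", "height", "bit_depth", "color_type",
             "compression", "filter", "interlace")
    return dict(zip(names, fields))

# Build the first 33 bytes of a 100x100, 8-bit RGB PNG for demonstration.
ihdr_data = struct.pack(">IIBBBBB", 100, 100, 8, 2, 0, 0, 0)
crc = zlib.crc32(b"IHDR" + ihdr_data)
sample = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + ihdr_data + struct.pack(">I", crc))

print(parse_ihdr(sample))
```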

### Encoded Data Byte Values

In [9]:
if 'encoded_bytes' in locals():
    display_all_bytes(encoded_bytes, "Huffman Encoded Data")
else:
    print("No encoded data available. Run the Huffman compression cells first.")

=== Huffman Encoded Data ===

Total bytes: 6

All bytes as:
Byte 0:
  Integer: 62
  Hex: 0x3e
  Binary: 00111110
  ASCII: >

Byte 1:
  Integer: 74
  Hex: 0x4a
  Binary: 01001010
  ASCII: J

Byte 2:
  Integer: 227
  Hex: 0xe3
  Binary: 11100011
  ASCII: .

Byte 3:
  Integer: 27
  Hex: 0x1b
  Binary: 00011011
  ASCII: .

Byte 4:
  Integer: 87
  Hex: 0x57
  Binary: 01010111
  ASCII: W

Byte 5:
  Integer: 160
  Hex: 0xa0
  Binary: 10100000
  ASCII: .
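
These six bytes can be reproduced from the code table in section 4. The sketch below assumes that bits fill each byte from the most significant bit down and that the final byte is padded on the right with zeros, which is consistent with the output above; `pack_bits` is a hypothetical stand-in for the project's `bits_to_bytes`.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes (hypothetical helper).

    Assumption: bits fill each byte MSB-first, and the last byte is
    padded on the right with zeros.
    """
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

# Code table copied from the Huffman output in section 4.
codes = {' ': '1110', 'a': '1011', 'e': '1111', 'f': '011', 'h': '00',
         'l': '100', 'm': '010', 'n': '1101', 'o': '1010', 'u': '1100'}

bits = "".join(codes[ch] for ch in "hello huffman")
packed = pack_bits(bits)

print(len(bits), packed.hex())
```

Only the first 43 bits carry data. Because `00` is the code for `h`, a decoder that reads the padded bytes without knowing the bit count would emit extra symbols, so a stored format would need to record the payload length alongside the bytes.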

