# Lab 02 Notebook: The Hex Detective

**Binary Forensics Investigation**

---

## Introduction

This notebook guides you through binary file analysis step-by-step. You will learn to examine files at the byte level, identify file types by their signatures, and repair corrupted files.

**Before you begin:**

- Read `README.md` for the full case briefing
- Review `concepts.md` for technical background
- Ensure all files in `data/` are present

---

## Phase 1: Field Work - CLI Binary Forensics

### Exercise 1.1: Initial File Identification

Use the `file` command to identify file types.

In [None]:
# Run shell commands using ! prefix
!file data/*

In [None]:
# Check file sizes
!ls -lh data/

### Exercise 1.2: Hex Dump Examination

Use `xxd` to view the first bytes of each file.

In [None]:
# View first 64 bytes of unknown_a.bin
!xxd -l 64 data/unknown_a.bin

In [None]:
# View first 64 bytes of unknown_b.bin
!xxd -l 64 data/unknown_b.bin

In [None]:
# View first 64 bytes of unknown_c.bin
!xxd -l 64 data/unknown_c.bin

In [None]:
# View first 64 bytes of corrupted.png
!xxd -l 64 data/corrupted.png

### Exercise 1.3: String Extraction

In [None]:
# Extract strings from unknown_c.bin
!strings data/unknown_c.bin

In [None]:
# Extract strings from hidden_message.bin
!strings data/hidden_message.bin
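
For comparison, here is a minimal Python sketch of roughly what `strings` does: it collects runs of printable ASCII bytes that are at least four characters long (the default minimum for `strings`). The function name `ascii_strings` and the sample bytes are illustrative assumptions, not part of the lab files.

```python
import re


def ascii_strings(data, min_length=4):
    """Return runs of printable ASCII (0x20-0x7e) at least min_length long."""
    # Build a bytes regex such as rb'[\x20-\x7e]{4,}' for the requested length
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_length)
    return [match.decode('ascii') for match in pattern.findall(data)]


# Hypothetical sample: two readable runs surrounded by non-printable bytes
sample = b'\x00\x01secret-flag\xff\xfeabc\x00hello world\x00'
print(ascii_strings(sample))  # ['secret-flag', 'hello world']
```

The run `abc` is dropped because it is shorter than the four-character minimum.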

### Exercise 1.4: Compare Corrupted vs Original

In [None]:
# Compare headers
print("Reference (original) PNG:")
!xxd -l 16 data/reference/original.png
print("\nCorrupted PNG:")
!xxd -l 16 data/corrupted.png
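
The same comparison can be done programmatically. The sketch below lists every offset where two headers differ. The helper name `diff_bytes` and the corruption pattern (first two bytes zeroed) are assumptions made for illustration; the actual damage in `data/corrupted.png` may differ.

```python
# Standard 8-byte PNG signature
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def diff_bytes(expected, actual):
    """Return (offset, expected_byte, actual_byte) for each mismatch.

    Only the overlapping prefix of the two inputs is compared.
    """
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


# Hypothetical corrupted header: the first two bytes have been zeroed out
corrupted = bytes([0x00, 0x00]) + PNG_SIGNATURE[2:]
for offset, want, got in diff_bytes(PNG_SIGNATURE, corrupted):
    print(f"offset {offset}: got {got:02x}, expected {want:02x}")
```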

---

## Phase 2: The Build - BinaryAnalyzer Class

### Exercise 2.1: Class Constructor

In [None]:
%%writefile binary_analyzer.py
# binary_analyzer.py
"""Binary forensics analysis toolkit."""

import os


class BinaryAnalyzer:
    """A forensic tool for analyzing binary files."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.data = b''
        self.file_type = 'Unknown'

        # Magic signature database
        self.magic_db = {
            b'\x89PNG\r\n\x1a\n': 'PNG',
            b'\xff\xd8\xff': 'JPEG',
            b'GIF87a': 'GIF',
            b'GIF89a': 'GIF',
            b'%PDF': 'PDF',
            b'PK\x03\x04': 'ZIP',
            b'\x7fELF': 'ELF',
            b'MZ': 'PE/EXE',
        }

    def load_file(self):
        """Load file in binary mode."""
        try:
            with open(self.filepath, 'rb') as f:
                self.data = f.read()
            print(f"Loaded {len(self.data)} bytes")
        except FileNotFoundError:
            print(f"Error: File not found: {self.filepath}")
            self.data = b''

    def get_header(self, num_bytes=16):
        """Return first N bytes as hex string."""
        # bytes.hex() accepts a separator from Python 3.8 onward
        return self.data[:num_bytes].hex(' ')

    def detect_type(self):
        """Detect file type by magic signature."""
        for sig, ftype in self.magic_db.items():
            if self.data.startswith(sig):
                self.file_type = ftype
                return ftype

        # Check if text: more than 85% of the first 512 bytes are printable
        sample = self.data[:512]
        if sample and sum(32 <= b <= 126 or b in (9, 10, 13) for b in sample) / len(sample) > 0.85:
            self.file_type = 'Text/ASCII'
            return self.file_type

        self.file_type = 'Unknown'
        return self.file_type

    def extract_strings(self, min_length=4):
        """Extract printable ASCII strings."""
        strings = []
        current = []
        for byte in self.data:
            if 32 <= byte <= 126:
                current.append(chr(byte))
            else:
                if len(current) >= min_length:
                    strings.append(''.join(current))
                current = []
        # Flush a run that reaches the end of the file
        if len(current) >= min_length:
            strings.append(''.join(current))
        return strings

    def hexdump(self, start=0, length=256):
        """Generate formatted hex dump."""
        lines = []
        for i in range(0, min(length, len(self.data) - start), 16):
            chunk = self.data[start+i:start+i+16]
            offset = f'{start+i:08x}'
            hex_bytes = ' '.join(f'{b:02x}' for b in chunk).ljust(47)
            ascii_repr = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
            lines.append(f'{offset}  {hex_bytes}  |{ascii_repr}|')
        return '\n'.join(lines)

    def report(self):
        """Print analysis report."""
        print("=" * 50)
        print(f"File: {self.filepath}")
        print(f"Size: {len(self.data)} bytes")
        print(f"Type: {self.file_type}")
        print(f"Header: {self.get_header(16)}")
        strings = self.extract_strings(6)[:5]
        if strings:
            print(f"Strings: {strings}")
        print("=" * 50)


class FileRepairer:
    """Tool for repairing corrupted file headers."""

    SIGNATURES = {
        'PNG': bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
        'JPEG': bytes([0xFF, 0xD8, 0xFF, 0xE0]),
        'GIF': b'GIF89a',
        'PDF': b'%PDF-1.4',
    }

    def __init__(self, filepath):
        self.filepath = filepath
        self.data = None
        with open(filepath, 'rb') as f:
            self.data = bytearray(f.read())
        print(f"Loaded {len(self.data)} bytes")

    def diagnose(self, file_type='PNG'):
        """Compare header to expected signature."""
        sig = self.SIGNATURES.get(file_type)
        if not sig:
            print(f"Unknown type: {file_type}")
            return []

        print(f"Expected: {sig.hex(' ')}")
        print(f"Actual:   {self.data[:len(sig)].hex(' ')}")

        corrupted = []
        for i, expected in enumerate(sig):
            if self.data[i] != expected:
                corrupted.append((i, expected, self.data[i]))
                print(f"Byte {i}: {self.data[i]:02x} should be {expected:02x}")
        return corrupted

    def repair(self, file_type='PNG', output_path=None):
        """Repair header and save."""
        sig = self.SIGNATURES.get(file_type)
        if not sig:
            return False

        if output_path is None:
            # Insert "_repaired" before the extension only; a plain
            # str.replace('.', ...) would also rewrite dots elsewhere in the path
            root, ext = os.path.splitext(self.filepath)
            output_path = f"{root}_repaired{ext}"

        # Overwrite the leading bytes with the expected signature
        for i, byte in enumerate(sig):
            self.data[i] = byte

        with open(output_path, 'wb') as f:
            f.write(self.data)
        print(f"Saved to: {output_path}")
        return True

    def verify(self, filepath):
        """Verify file type."""
        with open(filepath, 'rb') as f:
            header = f.read(16)
        for ftype, sig in self.SIGNATURES.items():
            if header.startswith(sig):
                return ftype
        return 'Unknown'

### Exercise 2.2: Test the BinaryAnalyzer

In [None]:
from binary_analyzer import BinaryAnalyzer

# Test on unknown_a.bin
analyzer = BinaryAnalyzer('data/unknown_a.bin')
analyzer.load_file()
analyzer.detect_type()
analyzer.report()

In [None]:
# Test on all unknown files
for fname in ['unknown_a.bin', 'unknown_b.bin', 'unknown_c.bin']:
    print(f"\n--- {fname} ---")
    a = BinaryAnalyzer(f'data/{fname}')
    a.load_file()
    print(f"Type: {a.detect_type()}")
    print(f"Header: {a.get_header(8)}")
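
When no magic signature matches, `detect_type` falls back to a printable-byte ratio over the first 512 bytes and labels the file `Text/ASCII` if the ratio exceeds 0.85. The sketch below isolates that heuristic so you can see the numbers it produces. The helper name `printable_ratio` and the sample inputs are illustrative assumptions.

```python
def printable_ratio(sample):
    """Fraction of bytes that are printable ASCII or tab/LF/CR."""
    if not sample:
        return 0.0
    printable = sum(32 <= b <= 126 or b in (9, 10, 13) for b in sample)
    return printable / len(sample)


# A log-like line is fully printable
print(printable_ratio(b'GET /index.html HTTP/1.1\r\n'))  # 1.0

# Every byte value once: 95 printable characters plus tab, LF and CR out of 256
print(printable_ratio(bytes(range(256))))  # 0.3828125
```

A ratio near 1.0 suggests text, while evenly spread byte values fall well below the 0.85 threshold.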

### Exercise 2.3: Extract Hidden Messages

In [None]:
analyzer = BinaryAnalyzer('data/hidden_message.bin')
analyzer.load_file()
strings = analyzer.extract_strings(min_length=10)
print("Hidden messages found:")
for s in strings:
    print(f"  - {s}")

---

## Phase 3: Critical Incident - File Repair

### Exercise 3.1: Diagnose the Corruption

In [None]:
from binary_analyzer import FileRepairer

# Diagnose the corrupted file
repairer = FileRepairer('data/corrupted.png')
print("\nDiagnosis:")
corrupted_bytes = repairer.diagnose('PNG')

### Exercise 3.2: Repair the File

In [None]:
# Perform the repair
success = repairer.repair('PNG', 'data/repaired.png')
if success:
    print("\nVerification:")
    result = repairer.verify('data/repaired.png')
    print(f"Repaired file type: {result}")

In [None]:
# Verify with file command
!file data/corrupted.png
!file data/repaired.png
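
A matching signature only shows that the first eight bytes are correct. As an optional extra check, the sketch below also inspects the first chunk, which in a valid PNG is a 13-byte `IHDR` chunk followed by a CRC-32 over the chunk type and data. The function name `read_png_ihdr` and the synthetic 4x4 header are assumptions for illustration; they are not part of the lab toolkit.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def read_png_ihdr(data):
    """Return (width, height) if data starts with a PNG signature and a
    well-formed IHDR chunk; otherwise return None."""
    # Signature (8) + length (4) + type (4) + IHDR body (13) + CRC (4) = 33 bytes
    if not data.startswith(PNG_SIGNATURE) or len(data) < 33:
        return None
    length, chunk_type = struct.unpack('>I4s', data[8:16])
    if chunk_type != b'IHDR' or length != 13:
        return None
    body = data[16:29]
    (stored_crc,) = struct.unpack('>I', data[29:33])
    # PNG chunk CRCs cover the chunk type and data, not the length field
    if zlib.crc32(chunk_type + body) != stored_crc:
        return None
    width, height = struct.unpack('>II', body[:8])
    return width, height


# Synthetic header for a hypothetical 4x4, 8-bit RGB image
ihdr_body = struct.pack('>IIBBBBB', 4, 4, 8, 2, 0, 0, 0)
chunk = b'IHDR' + ihdr_body
demo = PNG_SIGNATURE + struct.pack('>I', 13) + chunk + struct.pack('>I', zlib.crc32(chunk))

print(read_png_ihdr(demo))                # (4, 4)
print(read_png_ihdr(b'\x00' + demo[1:]))  # None (signature broken)
```

To try it on the lab file, you could read `data/repaired.png` in `'rb'` mode and pass the bytes to `read_png_ihdr`.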

### Exercise 3.3: View the Repaired Image

In [None]:
# Display the repaired image (if in Jupyter with PIL)
try:
    from PIL import Image
    import matplotlib.pyplot as plt

    img = Image.open('data/repaired.png')
    plt.figure(figsize=(4, 4))
    plt.imshow(img)
    plt.title('Repaired Image')
    plt.axis('off')
    plt.show()
except ImportError:
    print("Install PIL and matplotlib to display: pip install pillow matplotlib")
except Exception as e:
    print(f"Could not display image: {e}")

---

## Summary

### Files Analyzed:

| File | Actual Type | Notes |
|------|-------------|-------|
| unknown_a.bin | PNG | Image disguised as .bin |
| unknown_b.bin | JPEG | Image disguised as .bin |
| unknown_c.bin | Text | Server log file |
| hidden_message.bin | Binary | Contains embedded messages |
| corrupted.png | PNG | Header was corrupted, now repaired |

### Key Takeaways:

1. File extensions can be misleading - always check magic bytes
2. Binary analysis requires reading files in `'rb'` mode
3. Magic signatures identify file types reliably
4. Corrupted headers can be repaired by patching bytes

---

**End of Notebook**