# Interactive Image Hash Parameter Tuner

This notebook helps you find optimal parameters for perceptual hash-based duplicate image detection:
- **Hash Algorithm**: aHash, dHash, pHash, or wHash
- **Hash Size**: Side length N of the N×N-bit hash; larger values capture finer detail, smaller values tolerate more variation
- **Threshold**: Maximum Hamming distance at which two images still count as duplicates

Navigate through frames to see how each image compares to previous images in the sequence.
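
To make the three parameters concrete: a hash of size N carries N×N bits, and the Hamming distance counts the bit positions where two hashes differ. The following is a minimal sketch in plain Python using made-up integer hashes; the helper `hamming_distance` is illustrative only and is not part of `imagehash`.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two integer-encoded hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

hash_size = 8
total_bits = hash_size * hash_size  # an 8x8 hash carries 64 bits

# Two made-up 64-bit hashes that differ in exactly three bit positions
a = 0xFFFF0000FFFF0000
b = a ^ 0b10000101  # flip bits 0, 2 and 7

threshold = 5
distance = hamming_distance(a, b)
is_duplicate = distance <= threshold
print(distance, total_bits, is_duplicate)
```

With a threshold of 5, these two hashes would be treated as duplicates because only 3 of the 64 bits differ.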

## 1. Install and Import Required Libraries

In [1]:
# Install required packages (run once)
%pip install imagehash pillow ipywidgets matplotlib -q

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
import imagehash
from PIL import Image
import ipywidgets as widgets
from IPython.display import display, HTML
import matplotlib.pyplot as plt
from pathlib import Path

print("✓ All libraries imported successfully")

✓ All libraries imported successfully


## 2. Load Images from Directory

Specify the directory containing your image frames. Images should be named with frame numbers as prefix (e.g., `0001_image.jpg`).
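
The numeric prefix matters because a plain string sort misorders frame numbers that are not zero-padded. A small sketch with made-up filenames (the helper `frame_prefix` is hypothetical and only mirrors the naming convention described above):

```python
def frame_prefix(filename: str) -> int:
    """Return the integer before the first underscore."""
    return int(filename.split("_", 1)[0])

names = ["10_b.jpg", "2_a.jpg", "0001_c.jpg"]

by_string = sorted(names)                    # lexicographic order
by_frame = sorted(names, key=frame_prefix)   # numeric order by frame prefix

print(by_string)
print(by_frame)
```

The string sort places `10_b.jpg` before `2_a.jpg`, while sorting on the parsed prefix gives frames 1, 2, 10.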

In [3]:
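def parse_frame_number(filename):
	"""Return the integer frame prefix of a filename, or None if it has none."""
	try:
		prefix = filename.split('_', 1)[0]
		return int(prefix)
	except ValueError:
		# Prefix is not a number, so this file is not a frame
		return None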

def load_image_list(image_dir, extensions=('.png', '.jpg', '.jpeg')):
	"""
	Load all images from directory and sort by frame number.
	Returns list of (frame_num, filepath) tuples.
	"""
	images = []
	for filename in os.listdir(image_dir):
		if filename.lower().endswith(extensions):
			frame_num = parse_frame_number(filename)
			if frame_num is not None:
				filepath = os.path.join(image_dir, filename)
				images.append((frame_num, filepath))
	
	# Sort by frame number
	images.sort(key=lambda x: x[0])
	return images

# Configure your image directory here
IMAGE_DIR = r"D:\Projects\+embedded\YellowToyCar\private\capture\JPEG_640x480\I\frames"  # ← CHANGE THIS

# Load images
try:
	image_list = load_image_list(IMAGE_DIR)
	print(f"✓ Loaded {len(image_list)} images from {IMAGE_DIR}")
	if image_list:
		print(f"  Frame range: {image_list[0][0]} to {image_list[-1][0]}")
except Exception as e:
	print(f"✗ Error loading images: {e}")
	print("  Please update IMAGE_DIR path above")
	image_list = []

✓ Loaded 1457 images from D:\Projects\+embedded\YellowToyCar\private\capture\JPEG_640x480\I\frames
  Frame range: 1 to 1457


## 3. Initialize State Variables

Create state to track current frame index and hash database.

In [4]:
# Global state
state = {
	'current_idx': 0,  # Index in image_list
	'hash_db': {},     # {filepath: hash_value} for all images up to current
}

## 4. Create Interactive UI Controls

Configure hash algorithm, size, and threshold parameters.
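
Because the largest possible distance is `hash_size`² bits, the same threshold value is stricter at larger hash sizes. One heuristic for comparing settings, shown here as a sketch, is to keep the fraction of differing bits constant. The helper `scale_threshold` is hypothetical and is not defined elsewhere in this notebook; proportional scaling is an assumption, not a guarantee of equivalent results.

```python
def scale_threshold(threshold: int, old_size: int, new_size: int) -> int:
    """Rescale a Hamming threshold so it covers the same fraction of bits."""
    old_bits = old_size * old_size
    new_bits = new_size * new_size
    return round(threshold * new_bits / old_bits)

# A threshold of 5 on an 8x8 hash is 5/64 of the bits;
# the same fraction of a 16x16 hash (256 bits) is 20.
scaled = scale_threshold(5, 8, 16)
print(scaled)
```

Treat the scaled value as a starting point for tuning and confirm it visually with the frame navigator.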

In [5]:
# Hash parameter controls
hash_algo_dropdown = widgets.Dropdown(
	options=['ahash', 'dhash', 'phash', 'whash'],
	value='phash',
	description='Algorithm:',
	style={'description_width': '100px'},
	layout=widgets.Layout(width='300px')
)

hash_size_slider = widgets.IntSlider(
	value=8,
	min=4,
	max=256,
	step=1,
	description='Hash Size:',
	continuous_update=False,
	style={'description_width': '100px'},
	layout=widgets.Layout(width='800px')
)

# Threshold max will be adjusted dynamically based on hash_size^2
threshold_slider = widgets.IntSlider(
	value=5,
	min=0,
	max=4096,
	step=1,
	description='Threshold:',
	continuous_update=False,
	style={'description_width': '100px'},
	layout=widgets.Layout(width='800px')
)

# Navigation controls
prev_button = widgets.Button(description='← Previous', button_style='info')
next_button = widgets.Button(description='Next →', button_style='info')
frame_input = widgets.IntText(value=0, description='Frame:', style={'description_width': '70px'}, layout=widgets.Layout(width='200px'))

# Output widgets
output_display = widgets.Output()
info_display = widgets.Output()

print("✓ UI controls created")

✓ UI controls created


## 5. Hash Computation Functions

Create functions to compute and compare perceptual hashes.
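
As intuition for what these hashes encode, here is a toy average hash in plain Python. It is a simplified illustration and not the `imagehash` implementation: real perceptual hashes first convert the image to grayscale and downscale it, which this sketch skips by starting from a tiny made-up grid of brightness values. Both helper names are hypothetical.

```python
def toy_average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)

def bit_distance(bits_a, bits_b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(bits_a, bits_b))

frame_a = [[10, 200], [220, 30]]
frame_b = [[12, 190], [40, 35]]  # bottom-left pixel darkened

hash_a = toy_average_hash(frame_a)
hash_b = toy_average_hash(frame_b)
print(hash_a, hash_b, bit_distance(hash_a, hash_b))
```

Small brightness changes leave most bits untouched, which is why near-identical frames end up a short Hamming distance apart.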

In [6]:
def compute_hash(filepath, algorithm='phash', hash_size=8):
	"""Compute perceptual hash of an image."""
	hash_functions = {
		'ahash': imagehash.average_hash,
		'dhash': imagehash.dhash,
		'phash': imagehash.phash,
		# whash expects hash_size to be a power of 2 (per its docstring)
		'whash': imagehash.whash,
	}
	if algorithm not in hash_functions:
		raise ValueError(f"Unknown algorithm: {algorithm}")
	# Close the file handle once the hash has been computed
	with Image.open(filepath) as img:
		return hash_functions[algorithm](img, hash_size=hash_size)

def find_most_similar(current_hash, hash_db):
	"""
	Find the most similar image from hash database.
	Returns (min_distance, similar_filepath) or (None, None) if database is empty.
	"""
	if not hash_db:
		return None, None
	
	min_distance = None
	similar_path = None
	
	for filepath, stored_hash in hash_db.items():
		distance = current_hash - stored_hash  # Hamming distance
		if min_distance is None or distance < min_distance:
			min_distance = distance
			similar_path = filepath
	
	return min_distance, similar_path

print("✓ Hash functions defined")

✓ Hash functions defined


## 6. Display Update Function

Main function to update the display when parameters change or navigation occurs.
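
The update function in this section recomputes the hash of every earlier frame on each call, which can get slow deep into a long sequence. A possible optimization, sketched here under the assumption that the image files do not change on disk, is to memoize results per `(filepath, algorithm, hash_size)`. The `HashCache` class and `fake_compute` stand-in are hypothetical and not part of this notebook.

```python
class HashCache:
    """Memoize hash results per (filepath, algorithm, hash_size)."""

    def __init__(self, compute):
        self._compute = compute  # callable(filepath, algorithm, hash_size)
        self._store = {}

    def get(self, filepath, algorithm, hash_size):
        key = (filepath, algorithm, hash_size)
        if key not in self._store:
            self._store[key] = self._compute(filepath, algorithm, hash_size)
        return self._store[key]

calls = []

def fake_compute(filepath, algorithm, hash_size):
    # Stand-in for an expensive hash computation; records each real call
    calls.append((filepath, algorithm, hash_size))
    return len(filepath) * hash_size

cache = HashCache(fake_compute)
cache.get("0001_a.jpg", "phash", 8)
cache.get("0001_a.jpg", "phash", 8)   # served from the cache
cache.get("0001_a.jpg", "phash", 16)  # new parameters, computed again
print(len(calls))
```

In the notebook, the real hash function would be passed in place of `fake_compute`, so that changing only the threshold or stepping between frames reuses earlier results.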

In [7]:
def update_display():
	"""Update the image display and distance calculation."""
	if not image_list:
		with output_display:
			output_display.clear_output(wait=True)
			print("No images loaded. Please configure IMAGE_DIR in section 2 and rerun that cell.")
		return
	
	idx = state['current_idx']
	if idx < 0 or idx >= len(image_list):
		return
	
	# Get current image
	frame_num, filepath = image_list[idx]
	
	# Get hash parameters
	algorithm = hash_algo_dropdown.value
	hash_size = hash_size_slider.value
	# Keep threshold within valid range based on bits (hash_size^2)
	threshold_slider.max = hash_size * hash_size
	threshold = threshold_slider.value
	
	# Rebuild hash database for all images up to current (when params change)
	# This ensures we recalculate with new parameters
	state['hash_db'].clear()
	for i in range(idx):
		prev_frame, prev_path = image_list[i]
		prev_hash = compute_hash(prev_path, algorithm, hash_size)
		state['hash_db'][prev_path] = prev_hash
	
	# Compute hash for current image
	current_hash = compute_hash(filepath, algorithm, hash_size)
	
	# Find most similar image
	min_distance, similar_path = find_most_similar(current_hash, state['hash_db'])
	
	# Add current image to database for next iteration
	state['hash_db'][filepath] = current_hash
	
	# Count distinct hash values in the db (exact matches only; the threshold is not applied)
	unique_count = len({str(h) for h in state['hash_db'].values()})
	
	# Update frame input widget
	frame_input.value = frame_num
	
	# Display images and info
	with output_display:
		output_display.clear_output(wait=True)
		
		fig, axes = plt.subplots(1, 2, figsize=(14, 6))
		# Use default Matplotlib backgrounds (no transparency forced)
		# This avoids blending with editor and matches the output area's solid background.
		
		# Current image
		current_img = Image.open(filepath)
		axes[0].imshow(current_img)
		axes[0].set_title(f'Current Frame #{frame_num}\n{Path(filepath).name}', fontsize=10)
		axes[0].axis('off')
		
		# Most similar image
		if similar_path:
			similar_img = Image.open(similar_path)
			similar_frame = parse_frame_number(Path(similar_path).name)
			axes[1].imshow(similar_img)
			axes[1].set_title(f'Most Similar Frame #{similar_frame}\n{Path(similar_path).name}', fontsize=10)
			axes[1].axis('off')
		else:
			axes[1].text(0.5, 0.5, 'First Image\n(No previous images to compare)', 
						ha='center', va='center', fontsize=12)
			axes[1].axis('off')
		
		plt.tight_layout()
		plt.show()
	
	# Display distance info
	with info_display:
		info_display.clear_output(wait=True)
		
		if min_distance is not None:
			is_duplicate = min_distance <= threshold
			status = "🔴 DUPLICATE" if is_duplicate else "✅ UNIQUE"
			status_class = "duplicate" if is_duplicate else "unique"
			
			html_content = f"""
			<div class=\"status-card {status_class}\">
				<h3>{status}</h3>
				<p><b>Hamming Distance:</b> {min_distance}</p>
				<p><b>Threshold:</b> {threshold} (distances ≤ threshold are duplicates)</p>
				<p><b>Algorithm:</b> {algorithm}, <b>Hash Size:</b> {hash_size}×{hash_size}</p>
				<p><b>Unique frames so far:</b> {unique_count}</p>
				<p><b>Progress:</b> Image {idx + 1} of {len(image_list)}</p>
			</div>
			"""
			display(HTML(html_content))
		else:
			display(HTML("""
			<div class=\"status-card\">
				<h3>First Image</h3>
				<p>This is the first image in the sequence. No previous images to compare.</p>
			</div>
			"""))

print("✓ Display function defined")

✓ Display function defined


## 7. Navigation Event Handlers

Wire up the buttons and input controls.
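
One thing to watch when wiring observers: if the display function assigns a new value to the frame input widget, that assignment may itself fire the frame input observer and trigger a second, nested update. A common pattern is a guard flag that ignores programmatic changes. The sketch below simulates the idea without any widgets; the function bodies and the `updating` flag are simplified stand-ins, not this notebook's code.

```python
updating = {"active": False}
log = []

def update_display():
    updating["active"] = True
    try:
        # Simulates the observer firing when the widget value is set in code
        on_frame_input_change({"new": 7})
        log.append("rendered")
    finally:
        updating["active"] = False

def on_frame_input_change(change):
    if updating["active"]:
        return  # ignore changes caused by update_display itself
    log.append(f"jump to {change['new']}")
    update_display()

on_frame_input_change({"new": 7})  # a user edit
print(log)
```

The user edit is handled once and the programmatic change is skipped, so the display renders a single time.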

In [8]:
def on_prev_clicked(b):
	"""Handle previous button click."""
	if state['current_idx'] > 0:
		state['current_idx'] -= 1
		update_display()

def on_next_clicked(b):
	"""Handle next button click."""
	if state['current_idx'] < len(image_list) - 1:
		state['current_idx'] += 1
		update_display()

def on_frame_input_change(change):
	"""Handle manual frame number input."""
	target_frame = change['new']
	# Find index of image with this frame number
	for idx, (frame_num, _) in enumerate(image_list):
		if frame_num == target_frame:
			state['current_idx'] = idx
			update_display()
			return
	# If frame not found, show message
	print(f"Frame #{target_frame} not found in image list")

def on_param_change(change):
	"""Handle hash parameter changes."""
	# Keep threshold within valid range based on bits (hash_size^2)
	threshold_slider.max = hash_size_slider.value * hash_size_slider.value
	update_display()

# Attach event handlers
prev_button.on_click(on_prev_clicked)
next_button.on_click(on_next_clicked)
frame_input.observe(on_frame_input_change, names='value')

# Attach parameter change handlers
hash_algo_dropdown.observe(on_param_change, names='value')
hash_size_slider.observe(on_param_change, names='value')
threshold_slider.observe(on_param_change, names='value')

print("✓ Event handlers attached")

✓ Event handlers attached


## 8. Display Interactive UI

Run the cells below to apply the styling and show the interactive interface!

In [9]:
%%html
<style>
.cell-output-ipywidget-background {
	background: transparent !important;
}
:root {
	--jp-layout-color0: var(--vscode-panel-background);
	--jp-layout-color1: var(--vscode-panel-background);
	--jp-layout-color2: color-mix(in srgb,var(--jp-layout-color1),#fff 20%);
	--jp-layout-color3: color-mix(in srgb,var(--jp-layout-color2),#fff 20%);
	--jp-content-font-color0: var(--vscode-foreground);
	--jp-content-font-color1: var(--vscode-foreground);
	--jp-content-font-color2: color-mix(in srgb,var(--jp-content-font-color1),#fff 20%);
	--jp-content-font-color3: color-mix(in srgb,var(--jp-content-font-color2),#fff 20%);
	--jp-ui-font-color0: var(--jp-content-font-color0);
	--jp-ui-font-color1: var(--jp-content-font-color1);
	--jp-ui-font-color2: var(--jp-content-font-color2);
	--jp-ui-font-color3: var(--jp-content-font-color3);
}

.status-card {
	border: 1px solid var(--vscode-panel-border);
	padding: 0 1em;
	margin: 1em 0;
	border-radius: 6px;
	box-shadow: 0 1px 2px var(--vscode-widget-shadow);
}
.status-card.duplicate { border-color: var(--vscode-inputValidation-warningBorder); }
.status-card.unique    { border-color: var(--vscode-testing-iconPassed); }
</style>

In [10]:
# Build UI layout
param_box = widgets.VBox([
	widgets.HTML("<h3>Hash Parameters</h3>"),
	hash_algo_dropdown,
	hash_size_slider,
	threshold_slider
])

nav_box = widgets.HBox([prev_button, frame_input, next_button])

ui = widgets.VBox([
	param_box,
	widgets.HTML("<h3>Navigation</h3>"),
	nav_box,
	info_display,
	output_display
])

# Display UI and trigger initial update
display(ui)
if image_list:
	update_display()
else:
	with output_display:
		print("⚠️ No images loaded. Please configure IMAGE_DIR in section 2 and rerun.")

VBox(children=(VBox(children=(HTML(value='<h3>Hash Parameters</h3>'), Dropdown(description='Algorithm:', index…