# ImageNet-1K Dataset Explorer 📊

This notebook provides tools for exploring and visualizing the ImageNet-1K dataset. You can:

- 🔍 Browse the dataset structure and class distribution
- 🖼️ View sample images from different classes
- 📈 Analyze image properties and metadata
- 🎛️ Use interactive widgets to explore the dataset
- 📊 Generate detailed statistics and visualizations

## Requirements
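- ImageNet-1K dataset downloaded and organized with one sub-directory per class under both `train/` and `val/` (e.g. `train/n01440764/*.JPEG`); the cells below scan for class folders in both splits
- Python libraries: Pillow (PIL), matplotlib, seaborn, pandas, numpy, and optionally ipywidgets for the interactive explorer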
- Jupyter notebook environment
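
Before running the cells below, it can help to confirm that the dataset root matches the expected layout. The following is a minimal sketch, not part of the notebook itself: `count_class_dirs` is a hypothetical helper name, and it only counts class sub-directories under `train/` and `val/`.

```python
from pathlib import Path


def count_class_dirs(root):
    """Return {split: number of class sub-directories}, or None if the split is missing.

    Hypothetical helper for a quick layout check; it does not inspect image files.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if split_dir.is_dir():
            # Each class is expected to be one sub-directory of the split
            summary[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
        else:
            summary[split] = None
    return summary
```

For a complete ImageNet-1K copy in this layout, both entries should be 1000. A `val` count of 0 suggests the validation images are still in a single flat folder and need to be sorted into class sub-directories first.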

In [None]:
# Import Required Libraries
import os
import sys
import random
import json
from pathlib import Path
from collections import defaultdict, Counter
import warnings
warnings.filterwarnings('ignore')

# Image processing and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image, ExifTags
import seaborn as sns

# Interactive widgets
try:
    import ipywidgets as widgets
    from IPython.display import display, HTML
    WIDGETS_AVAILABLE = True
    print("✅ Interactive widgets available")
except ImportError:
    WIDGETS_AVAILABLE = False
    print("⚠️ ipywidgets not available. Install with: pip install ipywidgets")

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

print("📚 All required libraries imported successfully!")

In [None]:
# Setup Dataset Path and Configuration
# ⚠️ UPDATE THIS PATH TO YOUR IMAGENET DATASET LOCATION
IMAGENET_ROOT = r"C:\path\to\imagenet"  # Change this to your ImageNet path

# Alternative paths you might need to try:
# IMAGENET_ROOT = r"D:\datasets\imagenet"
# IMAGENET_ROOT = r"/data/imagenet"
# IMAGENET_ROOT = r"/home/user/datasets/imagenet"

TRAIN_DIR = os.path.join(IMAGENET_ROOT, "train")
VAL_DIR = os.path.join(IMAGENET_ROOT, "val")

# Configuration settings
MAX_IMAGES_PER_CLASS = 10  # Max images to display per class
FIGURE_SIZE = (15, 10)     # Default figure size
THUMBNAIL_SIZE = (224, 224) # Thumbnail size for previews

print(f"📁 Dataset root: {IMAGENET_ROOT}")
print(f"📁 Train directory: {TRAIN_DIR}")
print(f"📁 Validation directory: {VAL_DIR}")

# Check if directories exist
if os.path.exists(TRAIN_DIR):
    print("✅ Training directory found")
else:
    print("❌ Training directory not found - please update IMAGENET_ROOT path")

if os.path.exists(VAL_DIR):
    print("✅ Validation directory found")
else:
    print("❌ Validation directory not found - please update IMAGENET_ROOT path")

In [None]:
# Load ImageNet Class Labels
def load_imagenet_classes():
    """Load ImageNet class labels and create mappings"""
    
    # Class names are taken from the sub-directory names of TRAIN_DIR. In the usual
    # layout these are WordNet IDs (e.g. "n01440764"), not human-readable labels.
    # A list of the 1000 human-readable labels is available at:
    # https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
    # Alternatively, load the dataset with torchvision.datasets.ImageNet.
    
    classes = {}
    idx_to_class = {}
    
    try:
        # Try to get classes from the dataset directory structure
        if os.path.exists(TRAIN_DIR):
            class_dirs = sorted([d for d in os.listdir(TRAIN_DIR) 
                               if os.path.isdir(os.path.join(TRAIN_DIR, d))])
            
            for idx, class_dir in enumerate(class_dirs):
                classes[class_dir] = idx
                idx_to_class[idx] = class_dir
            
            print(f"✅ Loaded {len(classes)} classes from dataset structure")
            
        else:
            print("⚠️ No dataset directory found, using sample classes")
            # Sample classes for demonstration
            sample_classes = [
                "n01440764", "n01443537", "n01484850", "n01491361", "n01494475",
                "n01496331", "n01498041", "n01514668", "n01514859", "n01518878"
            ]
            for idx, class_id in enumerate(sample_classes):
                classes[class_id] = idx
                idx_to_class[idx] = class_id
                
    except Exception as e:
        print(f"❌ Error loading classes: {e}")
        return {}, {}
    
    return classes, idx_to_class

# Load the classes
class_to_idx, idx_to_class = load_imagenet_classes()
num_classes = len(class_to_idx)

print(f"📊 Total classes: {num_classes}")
if num_classes > 0:
    print(f"📝 Sample classes: {list(class_to_idx.keys())[:5]}...")
else:
    print("❌ No classes loaded. Please check your dataset path.")

In [None]:
# Browse Dataset Structure
def analyze_dataset_structure():
    """Analyze the structure of the ImageNet dataset"""
    
    structure_info = {
        'train': {},
        'val': {},
        'total_train_images': 0,
        'total_val_images': 0
    }
    
    print("🔍 Analyzing dataset structure...")
    
    # Analyze training data
    if os.path.exists(TRAIN_DIR):
        print("📁 Scanning training directory...")
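        # os.listdir returns entries in arbitrary order; sort so the sampled
        # subset is the same on every run
        train_classes = sorted(os.listdir(TRAIN_DIR))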
        
        for class_dir in train_classes[:20]:  # Analyze first 20 classes for speed
            class_path = os.path.join(TRAIN_DIR, class_dir)
            if os.path.isdir(class_path):
                images = [f for f in os.listdir(class_path) 
                         if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
                structure_info['train'][class_dir] = len(images)
                structure_info['total_train_images'] += len(images)
        
        print(f"✅ Analyzed {len(structure_info['train'])} training classes")
    
    # Analyze validation data
    if os.path.exists(VAL_DIR):
        print("📁 Scanning validation directory...")
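        # os.listdir returns entries in arbitrary order; sort so the sampled
        # subset is the same on every run
        val_classes = sorted(os.listdir(VAL_DIR))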
        
        for class_dir in val_classes[:20]:  # Analyze first 20 classes for speed
            class_path = os.path.join(VAL_DIR, class_dir)
            if os.path.isdir(class_path):
                images = [f for f in os.listdir(class_path) 
                         if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
                structure_info['val'][class_dir] = len(images)
                structure_info['total_val_images'] += len(images)
        
        print(f"✅ Analyzed {len(structure_info['val'])} validation classes")
    
    return structure_info

# Run the analysis
dataset_info = analyze_dataset_structure()

# Display results
print("\n📊 Dataset Structure Summary:")
print(f"🏋️ Training classes analyzed: {len(dataset_info['train'])}")
print(f"🏋️ Total training images (sample): {dataset_info['total_train_images']}")
print(f"✅ Validation classes analyzed: {len(dataset_info['val'])}")
print(f"✅ Total validation images (sample): {dataset_info['total_val_images']}")

if dataset_info['train']:
    train_counts = list(dataset_info['train'].values())
    print(f"📈 Training images per class: min={min(train_counts)}, max={max(train_counts)}, avg={np.mean(train_counts):.1f}")

if dataset_info['val']:
    val_counts = list(dataset_info['val'].values())
    print(f"📈 Validation images per class: min={min(val_counts)}, max={max(val_counts)}, avg={np.mean(val_counts):.1f}")

In [None]:
# Display Sample Images by Class
def display_sample_images(class_name, num_images=6, split='train'):
    """Display sample images from a specific class"""
    
    data_dir = TRAIN_DIR if split == 'train' else VAL_DIR
    class_path = os.path.join(data_dir, class_name)
    
    if not os.path.exists(class_path):
        print(f"❌ Class '{class_name}' not found in {split} directory")
        return
    
    # Get image files
    image_files = [f for f in os.listdir(class_path) 
                   if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
    
    if len(image_files) == 0:
        print(f"❌ No images found in class '{class_name}'")
        return
    
    # Randomly sample images
    sample_images = random.sample(image_files, min(num_images, len(image_files)))
    
    # Create subplot
    cols = 3
    rows = (len(sample_images) + cols - 1) // cols
    # squeeze=False keeps `axes` two-dimensional even when there is a single row
    fig, axes = plt.subplots(rows, cols, figsize=(15, 5*rows), squeeze=False)
    
    fig.suptitle(f"Sample Images from Class: {class_name} ({split})", fontsize=16, fontweight='bold')
    
    for idx, img_file in enumerate(sample_images):
        row = idx // cols
        col = idx % cols
        
        try:
            img_path = os.path.join(class_path, img_file)
            
            # Open inside a context manager so the file handle is released promptly
            with Image.open(img_path) as img:
                axes[row, col].imshow(img)
                img_size = img.size
            
            axes[row, col].set_title(f"{img_file}\nSize: {img_size}", fontsize=10)
            axes[row, col].axis('off')
            
        except Exception as e:
            axes[row, col].text(0.5, 0.5, f"Error loading\n{img_file}", 
                               ha='center', va='center', transform=axes[row, col].transAxes)
            axes[row, col].axis('off')
    
    # Hide empty subplots
    for idx in range(len(sample_images), rows * cols):
        row = idx // cols
        col = idx % cols
        axes[row, col].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print(f"📊 Displayed {len(sample_images)} out of {len(image_files)} images from class '{class_name}'")

# Example: Display images from the first available class
if class_to_idx:
    first_class = list(class_to_idx.keys())[0]
    print(f"🖼️ Displaying sample images from class: {first_class}")
    display_sample_images(first_class, num_images=6)
else:
    print("⚠️ No classes available. Please check your dataset path.")

In [None]:
# Image Metadata Analysis
def analyze_image_metadata(class_name, max_images=50, split='train'):
    """Analyze metadata for images in a specific class"""
    
    data_dir = TRAIN_DIR if split == 'train' else VAL_DIR
    class_path = os.path.join(data_dir, class_name)
    
    if not os.path.exists(class_path):
        print(f"❌ Class '{class_name}' not found")
        return None
    
    image_files = [f for f in os.listdir(class_path) 
                   if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
    
    sample_files = random.sample(image_files, min(max_images, len(image_files)))
    
    metadata = {
        'filenames': [],
        'widths': [],
        'heights': [],
        'file_sizes': [],
        'formats': [],
        'modes': [],
        'aspect_ratios': []
    }
    
    print(f"🔍 Analyzing metadata for {len(sample_files)} images from class '{class_name}'...")
    
    for img_file in sample_files:
        try:
            img_path = os.path.join(class_path, img_file)
            
            # File size
            file_size = os.path.getsize(img_path)
            
            # Image properties
            with Image.open(img_path) as img:
                width, height = img.size
                format_type = img.format
                mode = img.mode
                aspect_ratio = width / height
                
                metadata['filenames'].append(img_file)
                metadata['widths'].append(width)
                metadata['heights'].append(height)
                metadata['file_sizes'].append(file_size)
                metadata['formats'].append(format_type)
                metadata['modes'].append(mode)
                metadata['aspect_ratios'].append(aspect_ratio)
                
        except Exception as e:
            print(f"⚠️ Error processing {img_file}: {e}")
    
    return metadata

def visualize_metadata(metadata, class_name):
    """Visualize image metadata"""
    
    if not metadata or len(metadata['widths']) == 0:
        print("❌ No metadata to visualize")
        return
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle(f"Image Metadata Analysis - Class: {class_name}", fontsize=16, fontweight='bold')
    
    # Width distribution
    axes[0, 0].hist(metadata['widths'], bins=20, alpha=0.7, color='skyblue', edgecolor='black')
    axes[0, 0].set_title('Width Distribution')
    axes[0, 0].set_xlabel('Width (pixels)')
    axes[0, 0].set_ylabel('Frequency')
    
    # Height distribution
    axes[0, 1].hist(metadata['heights'], bins=20, alpha=0.7, color='lightgreen', edgecolor='black')
    axes[0, 1].set_title('Height Distribution')
    axes[0, 1].set_xlabel('Height (pixels)')
    axes[0, 1].set_ylabel('Frequency')
    
    # File size distribution
    file_sizes_mb = [size / (1024*1024) for size in metadata['file_sizes']]
    axes[0, 2].hist(file_sizes_mb, bins=20, alpha=0.7, color='coral', edgecolor='black')
    axes[0, 2].set_title('File Size Distribution')
    axes[0, 2].set_xlabel('File Size (MB)')
    axes[0, 2].set_ylabel('Frequency')
    
    # Aspect ratio distribution
    axes[1, 0].hist(metadata['aspect_ratios'], bins=20, alpha=0.7, color='gold', edgecolor='black')
    axes[1, 0].set_title('Aspect Ratio Distribution')
    axes[1, 0].set_xlabel('Aspect Ratio (W/H)')
    axes[1, 0].set_ylabel('Frequency')
    
    # Format distribution
    format_counts = Counter(metadata['formats'])
    axes[1, 1].bar(format_counts.keys(), format_counts.values(), color='lightpink', edgecolor='black')
    axes[1, 1].set_title('Image Format Distribution')
    axes[1, 1].set_xlabel('Format')
    axes[1, 1].set_ylabel('Count')
    
    # Mode distribution
    mode_counts = Counter(metadata['modes'])
    axes[1, 2].bar(mode_counts.keys(), mode_counts.values(), color='lightcyan', edgecolor='black')
    axes[1, 2].set_title('Color Mode Distribution')
    axes[1, 2].set_xlabel('Mode')
    axes[1, 2].set_ylabel('Count')
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print(f"\n📊 Metadata Summary for class '{class_name}':")
    print(f"📏 Width: min={min(metadata['widths'])}, max={max(metadata['widths'])}, avg={np.mean(metadata['widths']):.1f}")
    print(f"📏 Height: min={min(metadata['heights'])}, max={max(metadata['heights'])}, avg={np.mean(metadata['heights']):.1f}")
    print(f"💾 File size: min={min(file_sizes_mb):.2f}MB, max={max(file_sizes_mb):.2f}MB, avg={np.mean(file_sizes_mb):.2f}MB")
    print(f"📐 Aspect ratio: min={min(metadata['aspect_ratios']):.2f}, max={max(metadata['aspect_ratios']):.2f}, avg={np.mean(metadata['aspect_ratios']):.2f}")

# Example: Analyze metadata for the first available class
if class_to_idx:
    first_class = list(class_to_idx.keys())[0]
    metadata = analyze_image_metadata(first_class, max_images=30)
    if metadata:
        visualize_metadata(metadata, first_class)
else:
    print("⚠️ No classes available for metadata analysis.")

In [None]:
# Interactive Image Viewer
if WIDGETS_AVAILABLE:
    
    class ImageNetExplorer:
        def __init__(self):
            self.current_images = []
            self.current_class = None
            self.current_index = 0
            
        def create_explorer_widget(self):
            """Create interactive explorer widget"""
            
            # Class dropdown
            available_classes = list(class_to_idx.keys())[:20]  # Limit for performance
            self.class_dropdown = widgets.Dropdown(
                options=available_classes,
                value=available_classes[0] if available_classes else None,
                description='Class:',
                style={'description_width': 'initial'}
            )
            
            # Split dropdown
            self.split_dropdown = widgets.Dropdown(
                options=['train', 'val'],
                value='train',
                description='Split:',
                style={'description_width': 'initial'}
            )
            
            # Navigation buttons
            self.prev_button = widgets.Button(description='◀ Previous', button_style='info')
            self.next_button = widgets.Button(description='Next ▶', button_style='info')
            self.random_button = widgets.Button(description='🎲 Random', button_style='warning')
            
            # Image info
            self.image_info = widgets.HTML(value="Select a class to start exploring")
            
            # Image display
            self.image_output = widgets.Output()
            
            # Set up event handlers
            self.class_dropdown.observe(self.on_class_change, names='value')
            self.split_dropdown.observe(self.on_split_change, names='value')
            self.prev_button.on_click(self.show_previous)
            self.next_button.on_click(self.show_next)
            self.random_button.on_click(self.show_random)
            
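            # Layout
            controls = widgets.HBox([self.class_dropdown, self.split_dropdown])
            navigation = widgets.HBox([self.prev_button, self.next_button, self.random_button])
            
            # Load the preselected class up front: `observe` only fires when the value
            # changes, so otherwise the navigation buttons do nothing until the user
            # switches to a different class
            if available_classes and self.load_class_images(available_classes[0], self.split_dropdown.value):
                self.show_current_image()
            
            return widgets.VBox([controls, navigation, self.image_info, self.image_output])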
        
        def load_class_images(self, class_name, split):
            """Load images for a specific class"""
            data_dir = TRAIN_DIR if split == 'train' else VAL_DIR
            class_path = os.path.join(data_dir, class_name)
            
            if os.path.exists(class_path):
                self.current_images = [f for f in os.listdir(class_path) 
                                     if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
                self.current_class = class_name
                self.current_index = 0
                return True
            return False
        
        def on_class_change(self, change):
            """Handle class selection change"""
            if self.load_class_images(change['new'], self.split_dropdown.value):
                self.show_current_image()
        
        def on_split_change(self, change):
            """Handle split selection change"""
            if self.current_class and self.load_class_images(self.current_class, change['new']):
                self.show_current_image()
        
        def show_previous(self, button):
            """Show previous image"""
            if self.current_images:
                self.current_index = (self.current_index - 1) % len(self.current_images)
                self.show_current_image()
        
        def show_next(self, button):
            """Show next image"""
            if self.current_images:
                self.current_index = (self.current_index + 1) % len(self.current_images)
                self.show_current_image()
        
        def show_random(self, button):
            """Show random image"""
            if self.current_images:
                self.current_index = random.randint(0, len(self.current_images) - 1)
                self.show_current_image()
        
        def show_current_image(self):
            """Display current image with info"""
            with self.image_output:
                self.image_output.clear_output(wait=True)
                
                if not self.current_images:
                    print("No images found for this class")
                    return
                
                try:
                    data_dir = TRAIN_DIR if self.split_dropdown.value == 'train' else VAL_DIR
                    img_path = os.path.join(data_dir, self.current_class, self.current_images[self.current_index])
                    
                    img = Image.open(img_path)
                    
                    # Display image
                    plt.figure(figsize=(10, 8))
                    plt.imshow(img)
                    plt.axis('off')
                    plt.title(f"{self.current_images[self.current_index]}", fontsize=14, fontweight='bold')
                    plt.tight_layout()
                    plt.show()
                    
                    # Update info
                    file_size = os.path.getsize(img_path) / (1024*1024)
                    info_html = f'''
                    <div style="font-family: Arial, sans-serif; background-color: #f0f0f0; padding: 10px; border-radius: 5px;">
                        <h4>Image Information</h4>
                        <p><strong>Class:</strong> {self.current_class}</p>
                        <p><strong>Filename:</strong> {self.current_images[self.current_index]}</p>
                        <p><strong>Image {self.current_index + 1} of {len(self.current_images)}</strong></p>
                        <p><strong>Dimensions:</strong> {img.size[0]} × {img.size[1]} pixels</p>
                        <p><strong>Format:</strong> {img.format}</p>
                        <p><strong>Mode:</strong> {img.mode}</p>
                        <p><strong>File Size:</strong> {file_size:.2f} MB</p>
                    </div>
                    '''
                    self.image_info.value = info_html
                    
                except Exception as e:
                    print(f"Error loading image: {e}")
    
    # Create and display the explorer
    explorer = ImageNetExplorer()
    explorer_widget = explorer.create_explorer_widget()
    
    print("🎛️ Interactive ImageNet Explorer")
    print("Use the dropdown to select a class and navigate through images!")
    display(explorer_widget)
    
else:
    print("⚠️ Interactive widgets not available. Install ipywidgets to enable the interactive explorer.")
    print("Run: pip install ipywidgets")

In [None]:
# Class Distribution Visualization
def visualize_class_distribution(max_classes=50):
    """Visualize the distribution of images across classes"""
    
    print(f"📊 Analyzing class distribution (max {max_classes} classes)...")
    
    train_counts = {}
    val_counts = {}
    
    # Count images in training set
    if os.path.exists(TRAIN_DIR):
        train_classes = sorted(os.listdir(TRAIN_DIR))[:max_classes]
        for class_dir in train_classes:
            class_path = os.path.join(TRAIN_DIR, class_dir)
            if os.path.isdir(class_path):
                image_count = len([f for f in os.listdir(class_path) 
                                 if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))])
                train_counts[class_dir] = image_count
    
    # Count images in validation set
    if os.path.exists(VAL_DIR):
        val_classes = sorted(os.listdir(VAL_DIR))[:max_classes]
        for class_dir in val_classes:
            class_path = os.path.join(VAL_DIR, class_dir)
            if os.path.isdir(class_path):
                image_count = len([f for f in os.listdir(class_path) 
                                 if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))])
                val_counts[class_dir] = image_count
    
    # Create visualizations
    fig, axes = plt.subplots(2, 2, figsize=(20, 12))
    fig.suptitle('ImageNet Class Distribution Analysis', fontsize=16, fontweight='bold')
    
    # Training set bar chart
    if train_counts:
        classes = list(train_counts.keys())
        counts = list(train_counts.values())
        
        axes[0, 0].bar(range(len(classes)), counts, color='skyblue', alpha=0.7)
        axes[0, 0].set_title(f'Training Set - Images per Class (First {len(classes)} classes)')
        axes[0, 0].set_xlabel('Class Index')
        axes[0, 0].set_ylabel('Number of Images')
        axes[0, 0].tick_params(axis='x', rotation=45)
        
        # Training set histogram
        axes[0, 1].hist(counts, bins=20, color='skyblue', alpha=0.7, edgecolor='black')
        axes[0, 1].set_title('Training Set - Distribution of Image Counts')
        axes[0, 1].set_xlabel('Number of Images per Class')
        axes[0, 1].set_ylabel('Number of Classes')
    
    # Validation set bar chart
    if val_counts:
        classes = list(val_counts.keys())
        counts = list(val_counts.values())
        
        axes[1, 0].bar(range(len(classes)), counts, color='lightgreen', alpha=0.7)
        axes[1, 0].set_title(f'Validation Set - Images per Class (First {len(classes)} classes)')
        axes[1, 0].set_xlabel('Class Index')
        axes[1, 0].set_ylabel('Number of Images')
        axes[1, 0].tick_params(axis='x', rotation=45)
        
        # Validation set histogram
        axes[1, 1].hist(counts, bins=20, color='lightgreen', alpha=0.7, edgecolor='black')
        axes[1, 1].set_title('Validation Set - Distribution of Image Counts')
        axes[1, 1].set_xlabel('Number of Images per Class')
        axes[1, 1].set_ylabel('Number of Classes')
    
    plt.tight_layout()
    plt.show()
    
    # Print statistics
    if train_counts:
        train_values = list(train_counts.values())
        print(f"\n📈 Training Set Statistics ({len(train_counts)} classes):")
        print(f"   Total images: {sum(train_values):,}")
        print(f"   Min images per class: {min(train_values)}")
        print(f"   Max images per class: {max(train_values)}")
        print(f"   Average images per class: {np.mean(train_values):.1f}")
        print(f"   Std deviation: {np.std(train_values):.1f}")
    
    if val_counts:
        val_values = list(val_counts.values())
        print(f"\n📈 Validation Set Statistics ({len(val_counts)} classes):")
        print(f"   Total images: {sum(val_values):,}")
        print(f"   Min images per class: {min(val_values)}")
        print(f"   Max images per class: {max(val_values)}")
        print(f"   Average images per class: {np.mean(val_values):.1f}")
        print(f"   Std deviation: {np.std(val_values):.1f}")

# Run the visualization
visualize_class_distribution(max_classes=30)

In [None]:
# Image Statistics and Properties
def analyze_image_properties(class_name, max_images=100, split='train'):
    """Analyze color and pixel properties of images"""
    
    data_dir = TRAIN_DIR if split == 'train' else VAL_DIR
    class_path = os.path.join(data_dir, class_name)
    
    if not os.path.exists(class_path):
        print(f"❌ Class '{class_name}' not found")
        return
    
    image_files = [f for f in os.listdir(class_path) 
                   if f.lower().endswith(('.jpg', '.jpeg', '.png', '.bmp'))]
    
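    # Bail out early so the plots and summary statistics are not computed on empty data
    if not image_files:
        print(f"❌ No images found in class '{class_name}'")
        return
    
    sample_files = random.sample(image_files, min(max_images, len(image_files)))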
    
    print(f"🎨 Analyzing color properties for {len(sample_files)} images from class '{class_name}'...")
    
    # Collect color statistics
    red_means = []
    green_means = []
    blue_means = []
    brightness_values = []
    contrast_values = []
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle(f'Image Properties Analysis - Class: {class_name}', fontsize=16, fontweight='bold')
    
    # Sample a few images for detailed analysis
    sample_for_hist = random.sample(sample_files, min(5, len(sample_files)))
    combined_red = []
    combined_green = []
    combined_blue = []
    
    for img_file in sample_files:
        try:
            img_path = os.path.join(class_path, img_file)
            # Open inside a context manager so file handles are released promptly,
            # and convert to RGB so grayscale or CMYK images also yield three channels
            with Image.open(img_path) as img:
                img_array = np.array(img.convert('RGB'))
            
            # Calculate channel means
            red_mean = np.mean(img_array[:, :, 0])
            green_mean = np.mean(img_array[:, :, 1])
            blue_mean = np.mean(img_array[:, :, 2])
            
            red_means.append(red_mean)
            green_means.append(green_mean)
            blue_means.append(blue_mean)
            
            # Calculate brightness (luminance)
            brightness = 0.299 * red_mean + 0.587 * green_mean + 0.114 * blue_mean
            brightness_values.append(brightness)
            
            # Calculate contrast (standard deviation of grayscale)
            gray = 0.299 * img_array[:, :, 0] + 0.587 * img_array[:, :, 1] + 0.114 * img_array[:, :, 2]
            contrast = np.std(gray)
            contrast_values.append(contrast)
            
            # Collect pixel values for histogram (sample images)
            if img_file in sample_for_hist:
                combined_red.extend(img_array[:, :, 0].flatten())
                combined_green.extend(img_array[:, :, 1].flatten())
                combined_blue.extend(img_array[:, :, 2].flatten())
        
        except Exception as e:
            print(f"⚠️ Error processing {img_file}: {e}")
    
    # Plot RGB channel means
    axes[0, 0].hist([red_means, green_means, blue_means], bins=20, alpha=0.7, 
                   label=['Red', 'Green', 'Blue'], color=['red', 'green', 'blue'])
    axes[0, 0].set_title('Average RGB Channel Values')
    axes[0, 0].set_xlabel('Average Pixel Value')
    axes[0, 0].set_ylabel('Frequency')
    axes[0, 0].legend()
    
    # Plot brightness distribution
    axes[0, 1].hist(brightness_values, bins=20, alpha=0.7, color='gold', edgecolor='black')
    axes[0, 1].set_title('Brightness Distribution')
    axes[0, 1].set_xlabel('Brightness')
    axes[0, 1].set_ylabel('Frequency')
    
    # Plot contrast distribution
    axes[0, 2].hist(contrast_values, bins=20, alpha=0.7, color='purple', edgecolor='black')
    axes[0, 2].set_title('Contrast Distribution')
    axes[0, 2].set_xlabel('Contrast (Std Dev)')
    axes[0, 2].set_ylabel('Frequency')
    
    # Plot combined pixel histograms
    if combined_red:
        axes[1, 0].hist(combined_red, bins=50, alpha=0.7, color='red', edgecolor='none')
        axes[1, 0].set_title('Combined Red Channel Histogram')
        axes[1, 0].set_xlabel('Pixel Value')
        axes[1, 0].set_ylabel('Frequency')
        
        axes[1, 1].hist(combined_green, bins=50, alpha=0.7, color='green', edgecolor='none')
        axes[1, 1].set_title('Combined Green Channel Histogram')
        axes[1, 1].set_xlabel('Pixel Value')
        axes[1, 1].set_ylabel('Frequency')
        
        axes[1, 2].hist(combined_blue, bins=50, alpha=0.7, color='blue', edgecolor='none')
        axes[1, 2].set_title('Combined Blue Channel Histogram')
        axes[1, 2].set_xlabel('Pixel Value')
        axes[1, 2].set_ylabel('Frequency')
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print(f"\n🎨 Color Properties Summary for class '{class_name}':")
    print(f"🔴 Red channel: mean={np.mean(red_means):.1f}, std={np.std(red_means):.1f}")
    print(f"🟢 Green channel: mean={np.mean(green_means):.1f}, std={np.std(green_means):.1f}")
    print(f"🔵 Blue channel: mean={np.mean(blue_means):.1f}, std={np.std(blue_means):.1f}")
    print(f"💡 Brightness: mean={np.mean(brightness_values):.1f}, std={np.std(brightness_values):.1f}")
    print(f"🌈 Contrast: mean={np.mean(contrast_values):.1f}, std={np.std(contrast_values):.1f}")

# Example: Analyze properties for the first available class
if class_to_idx:
    first_class = list(class_to_idx.keys())[0]
    analyze_image_properties(first_class, max_images=50)
else:
    print("⚠️ No classes available for property analysis.")

## 🛠️ Utility Functions

The notebook provides several utility functions you can use for further exploration:

### Available Functions:
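- `analyze_dataset_structure()` - Count images per class for the first 20 class folders of each split
- `display_sample_images(class_name, num_images, split)` - Display randomly sampled images from a class
- `analyze_image_metadata(class_name, max_images, split)` - Collect image dimensions, file sizes, formats, and color modes; returns a dict
- `visualize_metadata(metadata, class_name)` - Plot and summarize the dict returned by `analyze_image_metadata`
- `analyze_image_properties(class_name, max_images, split)` - Analyze color and pixel properties
- `visualize_class_distribution(max_classes)` - Show how images are distributed across the first `max_classes` classes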

### Interactive Explorer:
If ipywidgets is available, use the interactive explorer above to browse images by class with:
- Class selection dropdown (limited to the first 20 classes for performance)
- Train/validation split selection
- Navigation buttons (Previous, Next, Random)
- Detailed image information display

### Tips for Usage:
1. **Update the dataset path** in the second cell to point to your ImageNet location
2. **Start with small samples** by keeping `max_images` and `max_classes` low at first, since every analysis function opens each sampled image from disk
3. **Use the interactive explorer** for quick browsing and inspection
4. **Analyze metadata first** to understand file formats and sizes before diving into pixel analysis
5. **Compare classes** by running the same analysis on different class names
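
One way to compare classes is to reduce each class's metadata to a single summary row. The sketch below assumes each input dict has the keys produced by `analyze_image_metadata` (`widths`, `heights`, `aspect_ratios`, `file_sizes` in bytes). `summarize_metadata` is a hypothetical helper, not a function defined in the notebook.

```python
from statistics import fmean


def summarize_metadata(metadata_by_class):
    """Build one summary row (a dict) per class from per-class metadata dicts.

    Hypothetical helper; input values are assumed to look like the dicts
    returned by analyze_image_metadata, or None for classes that were not found.
    """
    rows = []
    for class_name, meta in metadata_by_class.items():
        # Skip classes that were not found (None) or had no readable images
        if not meta or not meta["widths"]:
            continue
        rows.append({
            "class": class_name,
            "n_images": len(meta["widths"]),
            "mean_width": fmean(meta["widths"]),
            "mean_height": fmean(meta["heights"]),
            "mean_aspect_ratio": fmean(meta["aspect_ratios"]),
            "mean_file_size_kb": fmean(meta["file_sizes"]) / 1024,
        })
    return rows
```

In the notebook, the input could be built with `{c: analyze_image_metadata(c, max_images=30) for c in list(class_to_idx)[:3]}`, and the resulting rows passed to `pd.DataFrame(...)` for a side-by-side table.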

### Next Steps:
- Identify classes with unusual properties (size, color distribution, etc.)
- Compare training vs validation set characteristics
- Use insights for data preprocessing decisions
- Select representative classes for model testing
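
As a starting point for the first item, classes with unusual image counts can be flagged with a simple z-score. This is a minimal sketch under stated assumptions: `flag_unusual_classes` and the default threshold of 2.0 are illustrative choices, not part of the notebook, and the input is a `{class_name: image_count}` dict such as `dataset_info['train']`.

```python
from statistics import fmean, pstdev


def flag_unusual_classes(counts, z_threshold=2.0):
    """Return {class_name: z_score} for classes whose count is far from the mean.

    Hypothetical helper. Uses the population standard deviation, which matches
    np.std's default as used elsewhere in the notebook.
    """
    values = list(counts.values())
    if len(values) < 2:
        return {}
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # All classes have the same count, so nothing stands out
        return {}
    flagged = {}
    for name, n in counts.items():
        z = (n - mean) / std
        if abs(z) >= z_threshold:
            flagged[name] = z
    return flagged
```

For example, `flag_unusual_classes(dataset_info['train'])` would list the scanned training classes whose image count lies at least two standard deviations from the average. The same idea applies to other per-class quantities, such as mean brightness or mean file size.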