# 03 debugging profiling
**Location: TensorVerseHub/notebooks/01_tensorflow_foundations/03_debugging_profiling.ipynb**

# TensorFlow Debugging & Profiling

**File Location:** `notebooks/01_tensorflow_foundations/03_debugging_profiling.ipynb`

Master TensorFlow debugging techniques, profiling tools, and reproducibility practices. Learn to identify bottlenecks, debug complex models, and ensure consistent results across different environments using TensorBoard, tf.keras callbacks, and advanced debugging utilities.

## Learning Objectives
- Master TensorBoard for visualization and debugging
- Implement tf.keras callbacks for training monitoring
- Ensure reproducibility across different environments
- Profile and optimize TensorFlow operations
- Debug complex neural network architectures
- Handle common TensorFlow errors and performance issues

---

## 1. TensorBoard Integration and Visualization

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os
import datetime
import tempfile
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

print(f"TensorFlow version: {tf.__version__}")

# Setup logging directory
log_dir = os.path.join("logs", "tensorboard", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
os.makedirs(log_dir, exist_ok=True)
print(f"TensorBoard logs: {log_dir}")
```

```python
# Basic TensorBoard logging with tf.summary
def create_sample_data():
    """Create sample data for demonstration"""
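    # scikit-learn requires n_classes * n_clusters_per_class <= 2**n_informative.
    # The default n_informative=2 is too small for 3 classes, so set it explicitly.
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                               n_classes=3, n_redundant=0, random_state=42)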
    return train_test_split(X.astype(np.float32), y, test_size=0.2, random_state=42)

# Create data
X_train, X_test, y_train, y_test = create_sample_data()

# Custom training loop with TensorBoard logging
def train_with_tensorboard_logging():
    """Demonstrate custom training loop with detailed logging"""
    
    # Create model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    
    # Setup optimizer and loss
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    
    # Metrics
    train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    val_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    
    # Create datasets
    train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).batch(32)
    val_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(32)
    
    # TensorBoard writers
    train_log_dir = os.path.join(log_dir, 'train')
    val_log_dir = os.path.join(log_dir, 'validation')
    train_writer = tf.summary.create_file_writer(train_log_dir)
    val_writer = tf.summary.create_file_writer(val_log_dir)
    
    # Training loop with detailed logging
    epochs = 20
    
    for epoch in range(epochs):
        print(f"Epoch {epoch + 1}/{epochs}")
        
        # Training phase
        train_accuracy.reset_state()  # reset_states() is the deprecated spelling
        train_loss_avg = tf.keras.metrics.Mean()
        
        for batch_idx, (x_batch, y_batch) in enumerate(train_ds):
            with tf.GradientTape() as tape:
                predictions = model(x_batch, training=True)
                loss = loss_fn(y_batch, predictions)
            
            # Compute gradients and update weights
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            
            # Update metrics
            train_accuracy.update_state(y_batch, predictions)
            train_loss_avg.update_state(loss)
            
            # Log detailed metrics every 10 batches
            if batch_idx % 10 == 0:
                with train_writer.as_default():
                    tf.summary.scalar('batch_loss', loss, step=epoch * len(train_ds) + batch_idx)
                    tf.summary.scalar('batch_accuracy', train_accuracy.result(), 
                                    step=epoch * len(train_ds) + batch_idx)
                    
                    # Log gradient norms
                    for i, grad in enumerate(gradients):
                        if grad is not None:
                            grad_norm = tf.norm(grad)
                            tf.summary.scalar(f'gradient_norm_layer_{i}', grad_norm, 
                                            step=epoch * len(train_ds) + batch_idx)
        
        # Validation phase
        val_accuracy.reset_state()  # reset_states() is the deprecated spelling
        val_loss_avg = tf.keras.metrics.Mean()
        
        for x_batch, y_batch in val_ds:
            predictions = model(x_batch, training=False)
            loss = loss_fn(y_batch, predictions)
            val_accuracy.update_state(y_batch, predictions)
            val_loss_avg.update_state(loss)
        
        # Log epoch metrics
        with train_writer.as_default():
            tf.summary.scalar('epoch_loss', train_loss_avg.result(), step=epoch)
            tf.summary.scalar('epoch_accuracy', train_accuracy.result(), step=epoch)
            tf.summary.scalar('learning_rate', optimizer.learning_rate, step=epoch)
        
        with val_writer.as_default():
            tf.summary.scalar('epoch_loss', val_loss_avg.result(), step=epoch)
            tf.summary.scalar('epoch_accuracy', val_accuracy.result(), step=epoch)
        
        # Log model weights distribution
        if epoch % 5 == 0:
            with train_writer.as_default():
                for layer in model.layers:
                    if hasattr(layer, 'kernel'):
                        tf.summary.histogram(f'{layer.name}/weights', layer.kernel, step=epoch)
                    if hasattr(layer, 'bias') and layer.bias is not None:
                        tf.summary.histogram(f'{layer.name}/bias', layer.bias, step=epoch)
        
        print(f"  Train Acc: {train_accuracy.result():.4f}, Val Acc: {val_accuracy.result():.4f}")
    
    return model

# Run training with logging
trained_model = train_with_tensorboard_logging()

print(f"\nTraining complete! View logs with:")
print(f"tensorboard --logdir {log_dir}")
```
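
The batch-level summaries above use `epoch * len(train_ds) + batch_idx` as the TensorBoard step, so the curves continue across epochs instead of restarting at zero. The arithmetic can be sketched in plain Python, assuming the 800-sample training split and batch size of 32 used above. The helper names `batches_per_epoch` and `global_step` are illustrative and not part of TensorFlow.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

def global_step(epoch, batch_idx, steps_per_epoch):
    """Step index that keeps increasing across epochs (hypothetical helper)."""
    return epoch * steps_per_epoch + batch_idx

# 800 samples = 80% of the 1000 generated samples
steps = batches_per_epoch(800, 32)

# Steps at which the "every 10 batches" summaries land during the first two epochs
logged = [
    global_step(epoch, batch_idx, steps)
    for epoch in range(2)
    for batch_idx in range(steps)
    if batch_idx % 10 == 0
]
print(steps, logged)
```

Because the number of batches per epoch is not a multiple of 10 here, the logged steps are not evenly spaced across the epoch boundary. That is harmless for TensorBoard, but worth knowing when reading the batch-level curves.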

```python
# Advanced TensorBoard features
def advanced_tensorboard_logging():
    """Demonstrate advanced TensorBoard features"""
    
    # Create a more complex model for demonstration
    def create_complex_model():
        inputs = tf.keras.layers.Input(shape=(20,))
        
        # Branch 1
        x1 = tf.keras.layers.Dense(32, activation='relu', name='branch1_dense1')(inputs)
        x1 = tf.keras.layers.Dropout(0.3, name='branch1_dropout')(x1)
        x1 = tf.keras.layers.Dense(16, activation='relu', name='branch1_dense2')(x1)
        
        # Branch 2
        x2 = tf.keras.layers.Dense(24, activation='relu', name='branch2_dense1')(inputs)
        x2 = tf.keras.layers.Dropout(0.2, name='branch2_dropout')(x2)
        x2 = tf.keras.layers.Dense(12, activation='relu', name='branch2_dense2')(x2)
        
        # Combine branches
        combined = tf.keras.layers.Concatenate(name='combine_branches')([x1, x2])
        outputs = tf.keras.layers.Dense(3, activation='softmax', name='output')(combined)
        
        return tf.keras.Model(inputs=inputs, outputs=outputs, name='complex_model')
    
    model = create_complex_model()
    
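    # Log model architecture. model.call is a plain Python method without
    # get_concrete_function, so wrap the model in tf.function before tracing.
    traced = tf.function(lambda x: model(x, training=False))
    with tf.summary.create_file_writer(os.path.join(log_dir, 'model')).as_default():
        tf.summary.graph(traced.get_concrete_function(
            tf.TensorSpec(shape=[None, 20], dtype=tf.float32)
        ).graph)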
    
    # Log model structure
    model.summary()
    
    # Create confusion matrix logging
    def log_confusion_matrix(epoch, y_true, y_pred, class_names):
        """Log confusion matrix to TensorBoard"""
        import io  # needed for the in-memory PNG buffer below
        from sklearn.metrics import confusion_matrix
        import seaborn as sns
        
        cm = confusion_matrix(y_true, y_pred)
        
        plt.figure(figsize=(8, 6))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                   xticklabels=class_names, yticklabels=class_names)
        plt.title(f'Confusion Matrix - Epoch {epoch}')
        plt.ylabel('True Label')
        plt.xlabel('Predicted Label')
        
        # Convert plot to image and log to TensorBoard
        buf = io.BytesIO()
        plt.savefig(buf, format='png')
        buf.seek(0)
        image = tf.image.decode_png(buf.getvalue(), channels=4)
        image = tf.expand_dims(image, 0)
        
        with tf.summary.create_file_writer(os.path.join(log_dir, 'images')).as_default():
            tf.summary.image("Confusion Matrix", image, step=epoch)
        
        plt.close()
    
    # Example of logging custom images and text
    with tf.summary.create_file_writer(os.path.join(log_dir, 'custom')).as_default():
        # Log text summary
        tf.summary.text("model_info", f"Model has {model.count_params()} parameters", step=0)
        
        # Log custom scalar with metadata
        tf.summary.scalar("custom_metric", 0.85, step=0, 
                         description="Custom performance metric")

advanced_tensorboard_logging()
print("Advanced TensorBoard logging complete!")
```

## 2. tf.keras Callbacks for Monitoring

```python
import time  # used by the timing callback below

# Comprehensive callback system
class CustomCallbacks:
    """Collection of custom callbacks for training monitoring"""
    
    @staticmethod
    def create_standard_callbacks(log_dir, patience=5):
        """Create standard set of callbacks"""
        
        callbacks = [
            # Early stopping
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=patience,
                restore_best_weights=True,
                verbose=1
            ),
            
            # Learning rate reduction
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.2,
                patience=3,
                min_lr=1e-7,
                verbose=1
            ),
            
            # Model checkpointing
            tf.keras.callbacks.ModelCheckpoint(
                filepath=os.path.join(log_dir, 'best_model.h5'),
                monitor='val_accuracy',
                save_best_only=True,
                verbose=1
            ),
            
            # TensorBoard logging
            tf.keras.callbacks.TensorBoard(
                log_dir=log_dir,
                histogram_freq=1,
                write_graph=True,
                write_images=True,
                update_freq='epoch'
            ),
            
            # CSV logging
            tf.keras.callbacks.CSVLogger(
                filename=os.path.join(log_dir, 'training_log.csv'),
                separator=',',
                append=False
            )
        ]
        
        return callbacks
    
    @staticmethod
    def create_custom_monitoring_callback():
        """Create custom callback for detailed monitoring"""
        
        class DetailedMonitoringCallback(tf.keras.callbacks.Callback):
            def __init__(self):
                super().__init__()
                self.epoch_times = []
                self.batch_times = []
                
            def on_epoch_begin(self, epoch, logs=None):
                self.epoch_start_time = time.time()
                print(f"\n--- Starting Epoch {epoch + 1} ---")
                
            def on_epoch_end(self, epoch, logs=None):
                epoch_time = time.time() - self.epoch_start_time
                self.epoch_times.append(epoch_time)
                
                print(f"Epoch {epoch + 1} completed in {epoch_time:.2f} seconds")
                print(f"Train Loss: {logs.get('loss', 0):.4f}, Train Acc: {logs.get('accuracy', 0):.4f}")
                print(f"Val Loss: {logs.get('val_loss', 0):.4f}, Val Acc: {logs.get('val_accuracy', 0):.4f}")
                
                # Log to TensorBoard
                with tf.summary.create_file_writer(os.path.join(log_dir, 'timing')).as_default():
                    tf.summary.scalar('epoch_duration', epoch_time, step=epoch)
                    tf.summary.scalar('avg_epoch_duration', np.mean(self.epoch_times), step=epoch)
                
            def on_batch_begin(self, batch, logs=None):
                self.batch_start_time = time.time()
                
            def on_batch_end(self, batch, logs=None):
                batch_time = time.time() - self.batch_start_time
                self.batch_times.append(batch_time)
                
                if batch % 20 == 0:  # Log every 20 batches
                    print(f"  Batch {batch}: loss={logs.get('loss', 0):.4f}, "
                          f"time={batch_time:.3f}s")
            
            def on_train_end(self, logs=None):
                print(f"\nTraining Summary:")
                print(f"Total epochs: {len(self.epoch_times)}")
                print(f"Average epoch time: {np.mean(self.epoch_times):.2f} ± {np.std(self.epoch_times):.2f} seconds")
                print(f"Total training time: {sum(self.epoch_times):.2f} seconds")
        
        return DetailedMonitoringCallback()

# Custom learning rate scheduler callback
class CustomLRScheduler(tf.keras.callbacks.Callback):
    """Custom learning rate scheduler with warm-up and decay"""
    
    def __init__(self, warmup_epochs=5, decay_epochs=10, initial_lr=1e-3):
        super().__init__()
        self.warmup_epochs = warmup_epochs
        self.decay_epochs = decay_epochs
        self.initial_lr = initial_lr
        
    def on_epoch_begin(self, epoch, logs=None):
        if epoch < self.warmup_epochs:
            # Warm-up phase
            lr = self.initial_lr * (epoch + 1) / self.warmup_epochs
        elif epoch < self.warmup_epochs + self.decay_epochs:
            # Decay phase
            decay_epoch = epoch - self.warmup_epochs
            lr = self.initial_lr * (0.5 ** (decay_epoch / 5))
        else:
            # Stable phase
            lr = self.initial_lr * 0.1
        
        # Assign directly to the optimizer's learning-rate variable
        self.model.optimizer.learning_rate.assign(lr)
        print(f"Epoch {epoch + 1}: Learning rate set to {lr:.6f}")

# Gradient monitoring callback
class GradientMonitoringCallback(tf.keras.callbacks.Callback):
    """Monitor gradient statistics during training"""
    
    def __init__(self, log_dir, log_freq=5):
        super().__init__()
        self.log_dir = log_dir
        self.log_freq = log_freq
        self.writer = tf.summary.create_file_writer(os.path.join(log_dir, 'gradients'))
    
    def on_batch_end(self, batch, logs=None):
        if batch % self.log_freq == 0:
            # Gradients are not exposed to callbacks, so this logs weight statistics as a
            # proxy; true gradient logging needs a custom training loop or train_step override
            with self.writer.as_default():
                for i, layer in enumerate(self.model.layers):
                    if hasattr(layer, 'kernel') and layer.kernel is not None:
                        weights = layer.kernel
                        tf.summary.scalar(f'layer_{i}/weight_mean', tf.reduce_mean(weights), step=batch)
                        tf.summary.scalar(f'layer_{i}/weight_std', tf.math.reduce_std(weights), step=batch)
                        tf.summary.histogram(f'layer_{i}/weights', weights, step=batch)

# Demonstrate callback usage
def train_with_callbacks():
    """Train model with comprehensive callback monitoring"""
    
    # Create model
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(20,)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.4),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    
    # Compile model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Setup callbacks
    callback_log_dir = os.path.join(log_dir, 'callbacks')
    os.makedirs(callback_log_dir, exist_ok=True)
    
    callbacks = CustomCallbacks.create_standard_callbacks(callback_log_dir)
    callbacks.extend([
        CustomCallbacks.create_custom_monitoring_callback(),
        CustomLRScheduler(warmup_epochs=3, decay_epochs=7),
        GradientMonitoringCallback(callback_log_dir)
    ])
    
    # Train model
    history = model.fit(
        X_train, y_train,
        validation_data=(X_test, y_test),
        epochs=15,
        batch_size=32,
        callbacks=callbacks,
        verbose=0  # Custom callback handles verbose output
    )
    
    return model, history

# Train with callbacks
callback_model, callback_history = train_with_callbacks()
print("Training with callbacks completed!")
```
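
Before attaching `CustomLRScheduler` to a training run, it can help to print the learning rates it would produce. The sketch below reproduces the warm-up, decay, and stable rule from `on_epoch_begin` as a plain function, using the `warmup_epochs=3, decay_epochs=7` settings from `train_with_callbacks`. The function name `scheduled_lr` is an illustrative stand-in, not part of the notebook's callback.

```python
def scheduled_lr(epoch, warmup_epochs=5, decay_epochs=10, initial_lr=1e-3):
    """Learning rate for a zero-based epoch, mirroring CustomLRScheduler."""
    if epoch < warmup_epochs:
        # Linear warm-up from initial_lr / warmup_epochs up to initial_lr
        return initial_lr * (epoch + 1) / warmup_epochs
    if epoch < warmup_epochs + decay_epochs:
        # Halve the rate every 5 epochs of the decay phase
        return initial_lr * 0.5 ** ((epoch - warmup_epochs) / 5)
    # Stable phase: fixed at 10% of the initial rate
    return initial_lr * 0.1

schedule = [scheduled_lr(e, warmup_epochs=3, decay_epochs=7) for e in range(15)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch + 1:2d}: lr={lr:.6f}")
```

Two things stand out when reading the printed schedule. The rate drops abruptly when the decay phase hands over to the stable phase, since the decay has not yet reached `0.1 * initial_lr` by then. Also, because the scheduler sets the rate at the start of every epoch, it appears to overwrite any reduction that `ReduceLROnPlateau` applied at the end of the previous one, so combining the two callbacks as above may not behave as intended.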

## 3. Reproducibility and Debugging

```python
import sys  # used by debug_environment() below

# Comprehensive reproducibility setup
def setup_reproducibility(seed=42):
    """Setup reproducible environment"""
    
    # Set random seeds
    np.random.seed(seed)
    tf.random.set_seed(seed)
    
    # For Python's random module
    import random
    random.seed(seed)
    
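    # Configure TensorFlow for reproducibility.
    # The environment variables are honored by older TF 2.x releases;
    # enable_op_determinism() is the switch in recent releases and only needs
    # to be called once, independent of how many GPUs are present.
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    if hasattr(tf.config.experimental, 'enable_op_determinism'):
        tf.config.experimental.enable_op_determinism()
    
    # Configure GPU memory growth (if available)
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as e:
            # Raised when the GPUs were already initialized earlier in the session
            print(f"GPU configuration warning: {e}")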
    
    print(f"Reproducibility setup complete with seed: {seed}")

# Environment debugging utilities
def debug_environment():
    """Comprehensive environment debugging information"""
    
    print("=== TensorFlow Environment Debug Info ===")
    print(f"TensorFlow version: {tf.__version__}")
    print(f"Keras version: {tf.keras.__version__}")
    print(f"Python executable: {sys.executable}")
    
    # Device information
    print(f"\nDevice Information:")
    print(f"GPUs available: {len(tf.config.list_physical_devices('GPU'))}")
    print(f"CPUs available: {len(tf.config.list_physical_devices('CPU'))}")
    
    for device in tf.config.list_physical_devices():
        print(f"  {device}")
    
    # Memory information
    if tf.config.list_physical_devices('GPU'):
        print(f"\nGPU Memory Info:")
        for gpu in tf.config.list_physical_devices('GPU'):
            try:
                memory_info = tf.config.experimental.get_memory_info(gpu.name.replace('/physical_device:', ''))
                print(f"  {gpu}: {memory_info}")
            except Exception:  # avoid a bare except, which also swallows KeyboardInterrupt
                print(f"  {gpu}: Memory info not available")
    
    # Environment variables
    important_vars = ['TF_CPP_MIN_LOG_LEVEL', 'TF_DETERMINISTIC_OPS', 'CUDA_VISIBLE_DEVICES']
    print(f"\nImportant Environment Variables:")
    for var in important_vars:
        value = os.environ.get(var, 'Not set')
        print(f"  {var}: {value}")
    
    # Test basic operations
    print(f"\nBasic Operation Tests:")
    try:
        # CPU test
        with tf.device('/CPU:0'):
            cpu_test = tf.reduce_sum(tf.random.normal([100, 100]))
            print(f"  CPU operation: {cpu_test.numpy():.4f}")
        
        # GPU test (if available)
        if tf.config.list_physical_devices('GPU'):
            with tf.device('/GPU:0'):
                gpu_test = tf.reduce_sum(tf.random.normal([100, 100]))
                print(f"  GPU operation: {gpu_test.numpy():.4f}")
    except Exception as e:
        print(f"  Operation test failed: {e}")

# Setup reproducibility and debug environment
setup_reproducibility(42)
debug_environment()
```
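
Setting seeds is only half of the job; the other half is checking that two runs really agree. The following is a minimal sketch of such a check, using Python's `random` module as a stand-in so that it runs without a GPU or TensorFlow. The names `is_reproducible` and `toy_run` are hypothetical. In the notebook, `seed_fn` would be `setup_reproducibility` and `run_fn` a short training run that returns, for example, the final loss as a Python float.

```python
import random

def is_reproducible(seed_fn, run_fn, seed=42, runs=3):
    """Re-seed and re-run `run_fn`; True if every run returns the same value."""
    results = []
    for _ in range(runs):
        seed_fn(seed)
        results.append(run_fn())
    return all(result == results[0] for result in results)

def toy_run():
    """Stand-in for a short training run whose result depends on the RNG."""
    return [random.random() for _ in range(5)]

# Seeded before each run: the results should match
print(is_reproducible(random.seed, toy_run))

# Never re-seeded: each run continues the RNG stream, so the results differ
print(is_reproducible(lambda seed: None, toy_run))
```

The comparison uses `==`, which works for floats and lists of floats. For NumPy arrays or tensors, a comparison such as `np.array_equal` would be needed instead.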

```python
# Advanced debugging utilities
class ModelDebugger:
    """Comprehensive model debugging utilities"""
    
    def __init__(self, model):
        self.model = model
        self.layer_outputs = {}
        
    def create_layer_output_model(self):
        """Create model that outputs all intermediate layer results"""
        
        layer_outputs = []
        layer_names = []
        
        for layer in self.model.layers:
            if len(layer.output.shape) > 0:  # Skip layers without output
                layer_outputs.append(layer.output)
                layer_names.append(layer.name)
        
        debug_model = tf.keras.Model(
            inputs=self.model.input,
            outputs=layer_outputs
        )
        
        return debug_model, layer_names
    
    def analyze_layer_outputs(self, X_sample):
        """Analyze outputs from each layer"""
        
        debug_model, layer_names = self.create_layer_output_model()
        layer_outputs = debug_model(X_sample)
        
        print("Layer Output Analysis:")
        print("-" * 50)
        
        for i, (output, name) in enumerate(zip(layer_outputs, layer_names)):
            output_stats = {
                'shape': output.shape,
                'mean': tf.reduce_mean(output).numpy(),
                'std': tf.math.reduce_std(output).numpy(),
                'min': tf.reduce_min(output).numpy(),
                'max': tf.reduce_max(output).numpy(),
                'zeros': tf.reduce_sum(tf.cast(tf.equal(output, 0), tf.float32)).numpy(),
                'infs': tf.reduce_sum(tf.cast(tf.math.is_inf(output), tf.float32)).numpy(),
                'nans': tf.reduce_sum(tf.cast(tf.math.is_nan(output), tf.float32)).numpy()
            }
            
            print(f"Layer {i}: {name}")
            print(f"  Shape: {output_stats['shape']}")
            print(f"  Stats: mean={output_stats['mean']:.4f}, std={output_stats['std']:.4f}")
            print(f"  Range: [{output_stats['min']:.4f}, {output_stats['max']:.4f}]")
            
            # Warning for potential issues
            num_elements = int(tf.size(output))  # tf.Tensor has no .size attribute by default
            if output_stats['zeros'] > num_elements * 0.5:
                print(f"  ⚠️  Warning: {output_stats['zeros']:.0f}/{num_elements} outputs are zero (dead neurons?)")
            if output_stats['infs'] > 0:
                print(f"  ❌ Error: {output_stats['infs']:.0f} infinite values detected")
            if output_stats['nans'] > 0:
                print(f"  ❌ Error: {output_stats['nans']:.0f} NaN values detected")
            
            print()
    
    def check_gradient_flow(self, X_sample, y_sample):
        """Check gradient flow through the model"""
        
        with tf.GradientTape(persistent=True) as tape:
            predictions = self.model(X_sample, training=True)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y_sample, predictions)
            loss = tf.reduce_mean(loss)
        
        print("Gradient Flow Analysis:")
        print("-" * 50)
        
        for i, layer in enumerate(self.model.layers):
            if hasattr(layer, 'trainable_variables') and layer.trainable_variables:
                for j, var in enumerate(layer.trainable_variables):
                    grad = tape.gradient(loss, var)
                    if grad is not None:
                        grad_norm = tf.norm(grad).numpy()
                        grad_mean = tf.reduce_mean(tf.abs(grad)).numpy()
                        
                        print(f"Layer {i} ({layer.name}) - Variable {j}:")
                        print(f"  Gradient norm: {grad_norm:.6f}")
                        print(f"  Gradient mean abs: {grad_mean:.6f}")
                        
                        if grad_norm < 1e-7:
                            print(f"  ⚠️  Warning: Very small gradients (vanishing gradient?)")
                        elif grad_norm > 10:
                            print(f"  ⚠️  Warning: Large gradients (exploding gradient?)")
                    else:
                        print(f"Layer {i} ({layer.name}) - Variable {j}: No gradient")
        
        del tape  # Clean up persistent tape

# Debugging utilities in action
def demonstrate_debugging():
    """Demonstrate comprehensive model debugging"""
    
    # Create a potentially problematic model
    problematic_model = tf.keras.Sequential([
        tf.keras.layers.Dense(1000, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dense(1000, activation='relu'),
        tf.keras.layers.Dense(1000, activation='relu'),  # Wide stack with no normalization
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    
    problematic_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-1),  # High learning rate
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Create debugger
    debugger = ModelDebugger(problematic_model)
    
    # Sample data for debugging
    X_debug = X_train[:10]
    y_debug = y_train[:10]
    
    print("=== Model Architecture Debug ===")
    problematic_model.summary()
    
    print("\n=== Layer Output Analysis ===")
    debugger.analyze_layer_outputs(X_debug)
    
    print("\n=== Gradient Flow Analysis ===")
    debugger.check_gradient_flow(X_debug, y_debug)

demonstrate_debugging()
```

## 4. Performance Profiling

```python
import time  # used by PerformanceBenchmark below

# TensorFlow Profiler integration
def profile_model_training():
    """Profile model training with TensorFlow Profiler"""
    
    # Create model and data
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation='relu', input_shape=(20,)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    # Create dataset
    dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    dataset = dataset.batch(64).prefetch(tf.data.AUTOTUNE)
    
    # Profile training
    profile_dir = os.path.join(log_dir, 'profiler')
    os.makedirs(profile_dir, exist_ok=True)
    
    # Start profiling
    tf.profiler.experimental.start(profile_dir)
    
    # Run training steps
    print("Starting profiled training...")
    for step, (x_batch, y_batch) in enumerate(dataset.take(50)):
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y_batch, predictions)
            loss = tf.reduce_mean(loss)
        
        gradients = tape.gradient(loss, model.trainable_variables)
        model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        
        if step % 10 == 0:
            print(f"Step {step}, Loss: {loss.numpy():.4f}")
    
    # Stop profiling
    tf.profiler.experimental.stop()
    
    print(f"Profiling complete! View with:")
    print(f"tensorboard --logdir {profile_dir}")

# Custom performance benchmarking
class PerformanceBenchmark:
    """Custom performance benchmarking utilities"""
    
    def __init__(self):
        self.results = {}
    
    def benchmark_operation(self, operation_fn, inputs, num_runs=100, warmup_runs=10):
        """Benchmark a specific operation"""
        
        # Warmup
        for _ in range(warmup_runs):
            _ = operation_fn(inputs)
        
        # Actual timing
        times = []
        for _ in range(num_runs):
            start_time = time.time()
            result = operation_fn(inputs)
            if hasattr(result, 'numpy'):
                _ = result.numpy()  # Ensure computation is complete
            end_time = time.time()
            times.append(end_time - start_time)
        
        return {
            'mean_time': np.mean(times),
            'std_time': np.std(times),
            'min_time': np.min(times),
            'max_time': np.max(times),
            'median_time': np.median(times)
        }
    
    def benchmark_model_inference(self, model, test_data, batch_sizes=(1, 8, 32, 64)):
        """Benchmark model inference at different batch sizes"""
        
        print("Model Inference Benchmarking:")
        print("-" * 50)
        
        results = {}
        
        for batch_size in batch_sizes:
            # Prepare batched data
            batched_data = []
            for i in range(0, len(test_data), batch_size):
                batch = test_data[i:i + batch_size]
                if len(batch) == batch_size:  # Only use full batches
                    batched_data.append(batch)
                if len(batched_data) >= 20:  # Limit number of batches
                    break
            
            # Benchmark
            def inference_fn(batch_data):
                total_predictions = 0
                for batch in batch_data:
                    predictions = model(batch, training=False)
                    total_predictions += predictions.shape[0]
                return total_predictions
            
            benchmark_result = self.benchmark_operation(inference_fn, batched_data, num_runs=10)
            
            # Calculate throughput
            total_samples = len(batched_data) * batch_size
            throughput = total_samples / benchmark_result['mean_time']
            
            results[batch_size] = {
                **benchmark_result,
                'throughput': throughput,
                'samples_per_batch': batch_size
            }
            
            print(f"Batch size {batch_size:2d}: {throughput:8.2f} samples/sec, "
                  f"latency: {benchmark_result['mean_time']*1000:6.2f} ± {benchmark_result['std_time']*1000:5.2f} ms")
        
        return results
    
    def profile_memory_usage(self, operation_fn, inputs):
        """Profile memory usage of an operation"""
        
        if tf.config.list_physical_devices('GPU'):
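            # GPU memory profiling: reset the peak counter so 'peak' reflects this call only
            tf.config.experimental.reset_memory_stats('GPU:0')
            initial_memory = tf.config.experimental.get_memory_info('GPU:0')['current']
            
            result = operation_fn(inputs)
            
            # 'peak' is the high-water mark since the reset; 'current' would miss freed temporaries
            peak_memory = tf.config.experimental.get_memory_info('GPU:0')['peak']
            memory_used = peak_memory - initial_memory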
            
            return {
                'gpu_memory_used_mb': memory_used / (1024**2),
                'gpu_initial_mb': initial_memory / (1024**2),
                'gpu_peak_mb': peak_memory / (1024**2)
            }
        else:
            # CPU memory profiling would require additional libraries like psutil
            return {'message': 'GPU not available for memory profiling'}

# Run performance profiling
print("=== Performance Profiling ===")

# Simple profiling example
benchmark = PerformanceBenchmark()

# Benchmark matrix operations
def matrix_op(size):
    a = tf.random.normal((size, size))
    b = tf.random.normal((size, size))
    return tf.matmul(a, b)

print("Matrix multiplication benchmarking:")
for size in [100, 500, 1000]:
    result = benchmark.benchmark_operation(lambda _: matrix_op(size), None, num_runs=10)
    print(f"Size {size:4d}: {result['mean_time']*1000:6.2f} ± {result['std_time']*1000:4.2f} ms")

# Benchmark model inference
if 'trained_model' in locals():
    inference_results = benchmark.benchmark_model_inference(trained_model, X_test)

# Profile TensorFlow operations
profile_model_training()
```
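
The benchmarks above report `mean_time`, `std_time`, and a throughput figure. The sketch below shows one way such numbers can be produced with untimed warm-up calls, so one-off costs (tracing, kernel selection, cache misses) do not skew the mean. `time_operation` and `throughput` are hypothetical helpers written for illustration; whether `benchmark_operation` already warms up is not shown in this section. The workload is plain Python so the example runs without TensorFlow.

```python
import statistics
import time


def time_operation(fn, num_warmup=3, num_runs=10):
    """Time fn() after a few untimed warm-up calls (hypothetical helper)."""
    for _ in range(num_warmup):
        fn()  # warm-up: excluded from the statistics

    durations = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)

    return {
        "mean_time": statistics.mean(durations),
        "std_time": statistics.pstdev(durations),
        "min_time": min(durations),
    }


def throughput(batch_size, mean_time):
    """Samples per second when one batch takes mean_time seconds."""
    return batch_size / mean_time


# Stand-in workload; replace with the operation you want to measure.
stats = time_operation(lambda: sum(i * i for i in range(10_000)))
print(f"{stats['mean_time'] * 1000:.3f} ± {stats['std_time'] * 1000:.3f} ms")
print(f"throughput at batch size 32: {throughput(32, stats['mean_time']):.1f} samples/sec")
```

When timing TensorFlow operations on a GPU, it is usually safer to materialize the result inside the timed function (for example by calling `.numpy()` on it), since device work may otherwise still be in flight when the timer stops.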

## 5. Common Debugging Scenarios
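
The vanishing-gradient scenario in this section stacks 20 sigmoid layers. A short back-of-the-envelope calculation shows why that depth is a problem: the sigmoid derivative never exceeds 0.25, so the product of activation derivatives along the backward path shrinks at least geometrically. The sketch below deliberately ignores the weight matrices (a simplifying assumption), so it gives a rough bound on the activation factor only, not a prediction of the gradient norms the scenario prints.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


# The derivative peaks at z = 0, where it equals exactly 0.25.
max_slope = sigmoid_derivative(0.0)

# Upper bound on the activation-derivative factor after n sigmoid layers.
for depth in (1, 5, 10, 20):
    print(f"{depth:2d} layers: factor <= {max_slope ** depth:.3e}")
```

At 20 layers the bound is 0.25^20, roughly 9.1e-13, which is why the earliest layers in a deep sigmoid stack tend to receive almost no gradient signal.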

```python
# Common debugging scenarios and solutions
class DebuggingScenarios:
    """Collection of common debugging scenarios and solutions"""
    
    @staticmethod
    def debug_nan_gradients():
        """Debug and fix NaN gradients"""
        
        print("=== Debugging NaN Gradients ===")
        
        # Create model that might produce NaN gradients
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, input_shape=(5,)),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ])
        
        # Use high learning rate that might cause issues
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=10.0),
                     loss='binary_crossentropy')
        
        # Generate data
        X = tf.random.normal((100, 5))
        y = tf.random.uniform((100, 1)) > 0.5
        y = tf.cast(y, tf.float32)
        
        # Training step that might produce NaN
        with tf.GradientTape() as tape:
            predictions = model(X)
            loss = tf.keras.losses.binary_crossentropy(y, predictions)
            loss = tf.reduce_mean(loss)
        
        gradients = tape.gradient(loss, model.trainable_variables)
        
        # Check for NaN gradients
        nan_detected = False
        # gradients align with model.trainable_variables (kernel, bias, kernel, ...),
        # so the index below identifies a variable, not a layer
        for i, grad in enumerate(gradients):
            if grad is not None:
                nan_count = tf.reduce_sum(tf.cast(tf.math.is_nan(grad), tf.int32))
                if nan_count > 0:
                    print(f"NaN gradients detected in variable {i}: {nan_count.numpy()} NaN values")
                    nan_detected = True
                else:
                    grad_norm = tf.norm(grad)
                    print(f"Variable {i} gradient norm: {grad_norm.numpy():.6f}")
        
        if not nan_detected:
            print("No NaN gradients detected with current settings")
        
        # Solutions for NaN gradients
        print("\nSolutions for NaN gradients:")
        print("1. Reduce learning rate")
        print("2. Add gradient clipping")
        print("3. Use batch normalization")
        print("4. Check for inf/nan in input data")
        print("5. Use more stable loss functions")
    
    @staticmethod
    def debug_vanishing_gradients():
        """Debug vanishing gradients in deep networks"""
        
        print("\n=== Debugging Vanishing Gradients ===")
        
        # Create very deep network
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(50, input_shape=(10,)))
        
        for _ in range(20):  # 20 hidden layers
            model.add(tf.keras.layers.Dense(50, activation='sigmoid'))  # Sigmoid can cause vanishing gradients
        
        model.add(tf.keras.layers.Dense(1, activation='linear'))
        model.compile(optimizer='adam', loss='mse')
        
        # Generate data
        X = tf.random.normal((100, 10))
        y = tf.random.normal((100, 1))
        
        # Analyze gradient magnitudes
        with tf.GradientTape() as tape:
            predictions = model(X)
            loss = tf.keras.losses.mse(y, predictions)
            loss = tf.reduce_mean(loss)
        
        gradients = tape.gradient(loss, model.trainable_variables)
        
        print("Gradient analysis for deep network:")
        layer_idx = 0
        for i in range(0, len(gradients), 2):  # Every 2 gradients (weights and biases)
            if gradients[i] is not None:
                grad_norm = tf.norm(gradients[i])
                print(f"Layer {layer_idx:2d} gradient norm: {grad_norm.numpy():.8f}")
                layer_idx += 1
        
        print("\nSolutions for vanishing gradients:")
        print("1. Use ReLU or LeakyReLU activations instead of sigmoid/tanh")
        print("2. Use residual connections (ResNet)")
        print("3. Use batch normalization")
        print("4. Reduce network depth where the extra layers are not needed")
        print("5. Use proper weight initialization (He, Xavier)")
    
    @staticmethod
    def debug_exploding_gradients():
        """Debug exploding gradients"""
        
        print("\n=== Debugging Exploding Gradients ===")
        
        # Create model with potential for exploding gradients
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(100, activation='relu', input_shape=(10,),
                                kernel_initializer='random_normal'),  # Poor initialization
            tf.keras.layers.Dense(100, activation='relu',
                                kernel_initializer='random_normal'),
            tf.keras.layers.Dense(1, activation='linear')
        ])
        
        # Use high learning rate
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1.0),
                     loss='mse')
        
        # Generate data
        X = tf.random.normal((50, 10)) * 10  # Large input values
        y = tf.random.normal((50, 1))
        
        # Check for exploding gradients
        with tf.GradientTape() as tape:
            predictions = model(X)
            loss = tf.keras.losses.mse(y, predictions)
            loss = tf.reduce_mean(loss)
        
        gradients = tape.gradient(loss, model.trainable_variables)
        
        max_grad_norm = 0
        total_grad_norm = 0
        
        for grad in gradients:
            if grad is not None:
                grad_norm = tf.norm(grad).numpy()
                max_grad_norm = max(max_grad_norm, grad_norm)
                total_grad_norm += grad_norm
        
        print(f"Maximum gradient norm: {max_grad_norm:.4f}")
        print(f"Sum of per-variable gradient norms: {total_grad_norm:.4f}")
        
        if max_grad_norm > 10:
            print("⚠️  Large gradients detected - potential exploding gradient problem")
        
        print("\nSolutions for exploding gradients:")
        print("1. Gradient clipping")
        print("2. Lower learning rate")
        print("3. Better weight initialization")
        print("4. Batch normalization")
        print("5. L2 regularization")

# Run debugging scenarios
debugging_scenarios = DebuggingScenarios()
debugging_scenarios.debug_nan_gradients()
debugging_scenarios.debug_vanishing_gradients()
debugging_scenarios.debug_exploding_gradients()
```
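
Gradient clipping appears in the solution lists for both NaN and exploding gradients. As a minimal, framework-free sketch of clipping by global norm: all gradients are rescaled by one shared factor so that their combined L2 norm does not exceed `clip_norm`, which preserves the direction of the update. The function names and the toy gradient values are made up for illustration.

```python
import math


def global_norm(grads):
    """L2 norm over every value in a list of flat gradient lists."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their global norm is at most clip_norm."""
    norm = global_norm(grads)
    scale = clip_norm / max(norm, clip_norm)  # equals 1.0 when norm <= clip_norm
    return [[g * scale for g in grad] for grad in grads], norm


# Toy "gradients" for two variables (made-up numbers for illustration).
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_norm=1.0)

print(f"global norm before: {norm_before:.1f}")
print(f"global norm after:  {global_norm(clipped):.1f}")
```

In TensorFlow the same arithmetic is available as `tf.clip_by_global_norm(gradients, clip_norm)`, which returns the clipped list together with the pre-clipping global norm. Keras optimizers also accept `clipnorm`, `clipvalue`, and `global_clipnorm` arguments, so clipping can often be enabled without a custom training loop.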

## 6. Reproducibility Testing

```python
# Comprehensive reproducibility testing
class ReproducibilityTester:
    """Test and ensure model reproducibility"""
    
    def __init__(self, seed=42):
        self.seed = seed
        
    def test_deterministic_training(self, num_runs=3):
        """Test if training produces identical results across runs"""
        
        print("=== Reproducibility Testing ===")
        
        training_histories = []
        model_weights = []
        
        for run in range(num_runs):
            print(f"Run {run + 1}/{num_runs}")
            
            # Reset environment
            setup_reproducibility(self.seed)
            
            # Create and train model
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(32, activation='relu', input_shape=(20,)),
                tf.keras.layers.Dense(16, activation='relu'),
                tf.keras.layers.Dense(3, activation='softmax')
            ])
            
            model.compile(
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy']
            )
            
            # Train for few epochs
            history = model.fit(X_train, y_train, epochs=5, verbose=0)
            
            # Store results
            training_histories.append(history.history)
            # get_weights() already returns a list of NumPy arrays
            model_weights.append(model.get_weights())
        
        # Compare results
        self._compare_training_results(training_histories, model_weights)
        
    def _compare_training_results(self, histories, weights):
        """Compare training results across runs"""
        
        print("\nReproducibility Analysis:")
        print("-" * 40)
        
        # Compare training histories
        metric_names = list(histories[0].keys())
        
        for metric in metric_names:
            values_per_epoch = []
            for epoch in range(len(histories[0][metric])):
                epoch_values = [hist[metric][epoch] for hist in histories]
                values_per_epoch.append(epoch_values)
                
                # Check if all runs produced identical values
                if len(set(f"{v:.6f}" for v in epoch_values)) == 1:
                    status = "✅ Identical"
                else:
                    max_diff = max(epoch_values) - min(epoch_values)
                    status = f"❌ Different (max diff: {max_diff:.6f})"
                
                print(f"{metric} epoch {epoch}: {status}")
        
        # Compare final model weights
        print(f"\nModel Weights Comparison:")
        all_weights_identical = True
        
        for layer_idx in range(len(weights[0])):
            layer_weights_identical = True
            
            for run_idx in range(1, len(weights)):
                if not np.allclose(weights[0][layer_idx], weights[run_idx][layer_idx], rtol=1e-6):
                    layer_weights_identical = False
                    all_weights_identical = False
            
            status = "✅ Identical" if layer_weights_identical else "❌ Different"
            print(f"Layer {layer_idx} weights: {status}")
        
        if all_weights_identical:
            print("\n🎉 All runs produced identical results - Reproducibility confirmed!")
        else:
            print("\n⚠️  Results differ between runs - Check reproducibility setup")
    
    def test_cross_platform_reproducibility(self):
        """Test reproducibility across different execution contexts"""
        
        print("\n=== Cross-Platform Reproducibility Test ===")
        
        # Test eager vs graph execution. Both paths draw their weights from a
        # stateless RNG with the same seed, so the comparison does not depend on
        # how global and op-level seeds interact inside tf.function.
        stateless_seed = [self.seed, 0]
        
        # Graph execution
        @tf.function
        def graph_computation(x):
            weights = tf.random.stateless_normal([10, 5], seed=stateless_seed)
            return tf.nn.relu(tf.matmul(x, weights))
        
        # Eager execution
        def eager_computation(x):
            weights = tf.random.stateless_normal([10, 5], seed=stateless_seed)
            return tf.nn.relu(tf.matmul(x, weights))
        
        setup_reproducibility(self.seed)
        test_input = tf.random.normal([5, 10], seed=self.seed)
        
        # Compare results on the same input
        eager_result = eager_computation(test_input)
        graph_result = graph_computation(test_input)
        
        if np.allclose(eager_result.numpy(), graph_result.numpy()):
            print("✅ Eager and graph execution produce identical results")
        else:
            print("❌ Eager and graph execution produce different results")
            print(f"Max difference: {np.max(np.abs(eager_result.numpy() - graph_result.numpy()))}")

# Run reproducibility tests
reproducibility_tester = ReproducibilityTester(seed=42)
reproducibility_tester.test_deterministic_training(num_runs=3)
reproducibility_tester.test_cross_platform_reproducibility()
```
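
The history comparison above formats each value to six decimals and compares the strings. Two values that differ only in a late decimal place can still land on opposite sides of a rounding boundary and be reported as different. A tolerance-based check avoids that edge case. The sketch below is one possible alternative, not the notebook's method: `compare_histories` is a hypothetical helper, the tolerances are arbitrary defaults, and the sample histories are made-up numbers shaped like Keras `History.history` dictionaries.

```python
import math


def compare_histories(histories, rel_tol=1e-6, abs_tol=1e-9):
    """Return {metric: [bool per epoch]}, True where every run matches run 0."""
    reference = histories[0]
    report = {}
    for metric, ref_values in reference.items():
        report[metric] = [
            all(
                math.isclose(run[metric][epoch], ref, rel_tol=rel_tol, abs_tol=abs_tol)
                for run in histories[1:]
            )
            for epoch, ref in enumerate(ref_values)
        ]
    return report


# Made-up histories for three runs; the third diverges in the second epoch.
runs = [
    {"loss": [0.90, 0.50], "accuracy": [0.60, 0.80]},
    {"loss": [0.90, 0.50], "accuracy": [0.60, 0.80]},
    {"loss": [0.90, 0.51], "accuracy": [0.60, 0.80]},
]
print(compare_histories(runs))
```

For a strict bit-for-bit check, both tolerances can be set to zero, which reduces the comparison to exact equality.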

---

## Summary

This notebook covered the following debugging and profiling techniques for TensorFlow development:

### Key Debugging Skills Mastered:
1. **TensorBoard Integration**: Advanced logging, visualization, and monitoring
2. **Custom Callbacks**: Detailed training monitoring and automated interventions
3. **Environment Setup**: Reproducibility configuration and environment debugging
4. **Model Debugging**: Layer-by-layer analysis and gradient flow inspection
5. **Performance Profiling**: TensorFlow Profiler and custom benchmarking
6. **Common Issues**: NaN gradients, vanishing/exploding gradients solutions
7. **Reproducibility Testing**: Cross-run and cross-platform consistency verification

### Critical Debugging Tools:
- **TensorBoard**: Comprehensive visualization and profiling
- **Custom Callbacks**: Real-time monitoring and intervention
- **tf.GradientTape**: Gradient analysis and debugging
- **tf.profiler**: Performance bottleneck identification
- **Environment Configuration**: Reproducible research practices

### Production-Ready Practices:
- Comprehensive logging strategies
- Automated performance monitoring  
- Systematic debugging workflows
- Cross-platform reproducibility testing
- Performance optimization techniques

### Next Steps:
- Apply these debugging techniques with tf.keras models (Notebook 04)
- Use profiling insights for model optimization
- Implement monitoring in production deployments

These debugging and profiling skills are essential for developing robust, reproducible, and high-performance TensorFlow applications!