# 04 keras sequential functional apis
**Location: TensorVerseHub/notebooks/02_neural_networks_with_keras/04_keras_sequential_functional_apis.ipynb**


# tf.keras Sequential & Functional APIs + Model Subclassing

**File Location:** `notebooks/02_neural_networks_with_keras/04_tf_keras_sequential_functional.ipynb`

Master the three ways to build models in tf.keras: Sequential API for simple architectures, Functional API for complex topologies, and Model Subclassing for maximum flexibility. Learn when to use each approach and build sophisticated neural network architectures.

## Learning Objectives
- Master tf.keras Sequential API for linear model architectures
- Build complex models using tf.keras Functional API  
- Implement custom models with Model Subclassing
- Compare approaches and choose the right one for each scenario
- Build multi-input, multi-output, and branched architectures
- Handle shared layers and model composition
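To make the comparison concrete, here is a minimal sketch that builds the same two-layer classifier with each of the three APIs. The layer sizes, the helper names `build_sequential` and `build_functional`, and the class `TinySubclassed` are illustrative assumptions and are not used by the later cells.

```python
import tensorflow as tf


def build_sequential(input_dim=20, num_classes=3):
    """Sequential API: a plain stack of layers."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


def build_functional(input_dim=20, num_classes=3):
    """Functional API: layers are called on tensors to form a graph."""
    inputs = tf.keras.layers.Input(shape=(input_dim,))
    x = tf.keras.layers.Dense(16, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


class TinySubclassed(tf.keras.Model):
    """Model subclassing: layers in __init__, forward pass in call()."""

    def __init__(self, num_classes=3, **kwargs):
        super().__init__(**kwargs)
        self.hidden = tf.keras.layers.Dense(16, activation="relu")
        self.classifier = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=None):
        return self.classifier(self.hidden(inputs))


seq_model = build_sequential()
fn_model = build_functional()
sub_model = TinySubclassed()
_ = sub_model(tf.zeros((1, 20)))  # a subclassed model creates its weights on first call

# All three describe the same architecture, so the parameter counts agree
param_counts = [m.count_params() for m in (seq_model, fn_model, sub_model)]
print(param_counts)
```

The practical difference is in what each style can express: the Sequential version cannot branch, the Functional version can but is still a static graph, and the subclassed version can run arbitrary Python in `call()` at the cost of needing one forward pass before its weights exist.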

---

## 1. Sequential API - Linear Stack Models

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

print(f"TensorFlow version: {tf.__version__}")
tf.random.set_seed(42)
np.random.seed(42)

# Create sample datasets for demonstrations
def create_datasets():
    """Create various datasets for different model types"""
    
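    # Classification dataset
    # scikit-learn requires n_classes * n_clusters_per_class <= 2**n_informative;
    # the default n_informative=2 is too small for 3 classes, so set it explicitly
    X_class, y_class = make_classification(
        n_samples=1000, n_features=20, n_informative=10, n_classes=3,
        n_redundant=0, random_state=42
    )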
    
    # Regression dataset  
    X_reg, y_reg = make_regression(
        n_samples=1000, n_features=15, noise=0.1, random_state=42
    )
    
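    # Multi-class classification with one-hot encoded labels
    # (n_informative is set explicitly for the same reason as above)
    X_multi, y_multi = make_classification(
        n_samples=1000, n_features=25, n_informative=10, n_classes=5,
        n_redundant=0, random_state=42
    )
    y_multi = tf.keras.utils.to_categorical(y_multi, 5)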
    
    return (
        (X_class.astype(np.float32), y_class),
        (X_reg.astype(np.float32), y_reg.astype(np.float32)),
        (X_multi.astype(np.float32), y_multi.astype(np.float32)),
    )

# Create datasets
(X_class, y_class), (X_reg, y_reg), (X_multi, y_multi) = create_datasets()

# Split datasets
X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42
)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42  
)

In [None]:
# Basic Sequential models for different tasks
class SequentialModels:
    """Collection of Sequential model architectures"""
    
    @staticmethod
    def create_simple_classifier(input_dim, num_classes):
        """Simple feed-forward classifier"""
        
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(num_classes, activation='softmax')
        ], name='simple_classifier')
        
        return model
    
    @staticmethod
    def create_deep_classifier(input_dim, num_classes, depth=5):
        """Deep classifier with batch normalization"""
        
        model = tf.keras.Sequential(name='deep_classifier')
        model.add(tf.keras.layers.Input(shape=(input_dim,)))
        
        # First layer
        model.add(tf.keras.layers.Dense(256, activation='relu'))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.Dropout(0.4))
        
        # Hidden layers
        for i in range(depth - 2):
            units = max(64, 256 // (2 ** i))
            model.add(tf.keras.layers.Dense(units, activation='relu'))
            model.add(tf.keras.layers.BatchNormalization())
            model.add(tf.keras.layers.Dropout(0.3))
        
        # Output layer
        model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
        
        return model
    
    @staticmethod
    def create_regressor(input_dim):
        """Regression model with regularization"""
        
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(64, activation='relu', 
                                kernel_regularizer=tf.keras.regularizers.l2(0.01)),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(32, activation='relu',
                                kernel_regularizer=tf.keras.regularizers.l2(0.01)),
            tf.keras.layers.Dense(1, activation='linear')
        ], name='regressor')
        
        return model
    
    @staticmethod
    def create_autoencoder(input_dim, encoding_dim=32):
        """Simple autoencoder using Sequential API"""
        
        # Encoder
        encoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(encoding_dim, activation='relu')
        ], name='encoder')
        
        # Decoder
        decoder = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(encoding_dim,)),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(input_dim, activation='sigmoid')
        ], name='decoder')
        
        # Full autoencoder
        autoencoder = tf.keras.Sequential([encoder, decoder], name='autoencoder')
        
        return autoencoder, encoder, decoder

# Build and train Sequential models
print("=== Sequential API Examples ===")

# Simple classifier
simple_model = SequentialModels.create_simple_classifier(X_class.shape[1], 3)
simple_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print("Simple Classifier Architecture:")
simple_model.summary()

# Train simple model
history_simple = simple_model.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=20, batch_size=32, verbose=0
)

print(f"Simple Model - Final Val Accuracy: {history_simple.history['val_accuracy'][-1]:.4f}")

# Deep classifier
deep_model = SequentialModels.create_deep_classifier(X_class.shape[1], 3, depth=6)
deep_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(f"\nDeep Model Parameters: {deep_model.count_params():,}")

# Regressor
reg_model = SequentialModels.create_regressor(X_reg.shape[1])
reg_model.compile(optimizer='adam', loss='mse', metrics=['mae'])

history_reg = reg_model.fit(
    X_reg_train, y_reg_train,
    validation_data=(X_reg_test, y_reg_test),
    epochs=30, batch_size=32, verbose=0
)

print(f"Regression Model - Final Val MAE: {history_reg.history['val_mae'][-1]:.4f}")
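The parameter totals that `summary()` and `count_params()` report for these dense stacks can be checked by hand: each `Dense` layer holds `fan_in * fan_out` weights plus `fan_out` biases, and `Dropout` adds none. The helper `dense_param_count` below is a hypothetical name introduced only for this check, and the layer sizes assume the 20-feature classification dataset used above.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each Dense layer's units.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # kernel + bias
    return total


# Simple classifier: 20 inputs -> 128 -> 64 -> 3 classes
simple_classifier_params = dense_param_count([20, 128, 64, 3])
print(simple_classifier_params)  # 2688 + 8256 + 195 = 11139
```

Batch normalization layers are not covered by this helper; each one adds four values per feature (two trainable, two non-trainable), which is why the deep classifier and regressor totals are larger than the dense-only arithmetic suggests.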

## 2. Functional API - Complex Architectures

In [None]:
# Advanced Functional API models
class FunctionalModels:
    """Collection of Functional API model architectures"""
    
    @staticmethod
    def create_multi_input_model():
        """Multi-input model for different data types"""
        
        # Define inputs
        numerical_input = tf.keras.layers.Input(shape=(10,), name='numerical_features')
        categorical_input = tf.keras.layers.Input(shape=(5,), name='categorical_features')
        text_input = tf.keras.layers.Input(shape=(100,), name='text_features')
        
        # Process numerical features
        numerical_branch = tf.keras.layers.Dense(64, activation='relu')(numerical_input)
        numerical_branch = tf.keras.layers.BatchNormalization()(numerical_branch)
        numerical_branch = tf.keras.layers.Dropout(0.3)(numerical_branch)
        numerical_branch = tf.keras.layers.Dense(32, activation='relu')(numerical_branch)
        
        # Process categorical features
        categorical_branch = tf.keras.layers.Dense(32, activation='relu')(categorical_input)
        categorical_branch = tf.keras.layers.Dropout(0.2)(categorical_branch)
        categorical_branch = tf.keras.layers.Dense(16, activation='relu')(categorical_branch)
        
        # Process text features
        text_branch = tf.keras.layers.Dense(128, activation='relu')(text_input)
        text_branch = tf.keras.layers.Dropout(0.4)(text_branch)
        text_branch = tf.keras.layers.Dense(64, activation='relu')(text_branch)
        text_branch = tf.keras.layers.Dense(32, activation='relu')(text_branch)
        
        # Combine all branches
        combined = tf.keras.layers.Concatenate()([
            numerical_branch, categorical_branch, text_branch
        ])
        
        # Final layers
        x = tf.keras.layers.Dense(128, activation='relu')(combined)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Dropout(0.4)(x)
        x = tf.keras.layers.Dense(64, activation='relu')(x)
        x = tf.keras.layers.Dropout(0.3)(x)
        
        # Multiple outputs
        main_output = tf.keras.layers.Dense(3, activation='softmax', name='main_prediction')(x)
        auxiliary_output = tf.keras.layers.Dense(1, activation='sigmoid', name='auxiliary_prediction')(x)
        
        model = tf.keras.Model(
            inputs=[numerical_input, categorical_input, text_input],
            outputs=[main_output, auxiliary_output],
            name='multi_input_output_model'
        )
        
        return model
    
    @staticmethod
    def create_residual_block_model(input_shape, num_classes):
        """Model with residual blocks using Functional API"""
        
        inputs = tf.keras.layers.Input(shape=input_shape)
        
        # Initial processing
        x = tf.keras.layers.Dense(128, activation='relu')(inputs)
        x = tf.keras.layers.BatchNormalization()(x)
        
        # Residual blocks
        for i in range(3):
            # Main path
            residual = tf.keras.layers.Dense(128, activation='relu')(x)
            residual = tf.keras.layers.BatchNormalization()(residual)
            residual = tf.keras.layers.Dropout(0.3)(residual)
            residual = tf.keras.layers.Dense(128, activation='linear')(residual)
            residual = tf.keras.layers.BatchNormalization()(residual)
            
            # Skip connection
            x = tf.keras.layers.Add()([x, residual])
            x = tf.keras.layers.Activation('relu')(x)
            x = tf.keras.layers.Dropout(0.2)(x)
        
        # Final layers
        x = tf.keras.layers.Dense(64, activation='relu')(x)
        x = tf.keras.layers.Dropout(0.4)(x)
        outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
        
        model = tf.keras.Model(inputs=inputs, outputs=outputs, name='residual_model')
        return model
    
    @staticmethod
    def create_attention_model(input_shape, num_classes):
        """Model with self-attention mechanism"""
        
        inputs = tf.keras.layers.Input(shape=input_shape)
        
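        # Project to 64 features, then view them as a sequence of 8 tokens with
        # 8 features each so that attention has a sequence axis to operate on.
        # The target shape must multiply out to exactly 64 for any input width.
        x = tf.keras.layers.Dense(64)(inputs)
        x = tf.keras.layers.Reshape((8, 8))(x)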
        
        # Multi-head attention
        attention_output = tf.keras.layers.MultiHeadAttention(
            num_heads=8, key_dim=32
        )(x, x)
        
        # Add & Norm
        x = tf.keras.layers.Add()([x, attention_output])
        x = tf.keras.layers.LayerNormalization()(x)
        
        # Feed Forward Network
        ffn = tf.keras.layers.Dense(128, activation='relu')(x)
        ffn = tf.keras.layers.Dropout(0.3)(ffn)
        ffn = tf.keras.layers.Dense(x.shape[-1])(ffn)
        
        # Add & Norm
        x = tf.keras.layers.Add()([x, ffn])
        x = tf.keras.layers.LayerNormalization()(x)
        
        # Global average pooling and classification
        x = tf.keras.layers.GlobalAveragePooling1D()(x)
        x = tf.keras.layers.Dense(128, activation='relu')(x)
        x = tf.keras.layers.Dropout(0.4)(x)
        outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
        
        model = tf.keras.Model(inputs=inputs, outputs=outputs, name='attention_model')
        return model
    
    @staticmethod
    def create_branched_model(input_shape, num_classes):
        """Model with multiple processing branches"""
        
        inputs = tf.keras.layers.Input(shape=input_shape)
        
        # Shared initial layers
        shared = tf.keras.layers.Dense(128, activation='relu')(inputs)
        shared = tf.keras.layers.BatchNormalization()(shared)
        
        # Branch 1: Deep narrow path
        branch1 = tf.keras.layers.Dense(64, activation='relu')(shared)
        for _ in range(4):
            branch1 = tf.keras.layers.Dense(64, activation='relu')(branch1)
            branch1 = tf.keras.layers.Dropout(0.2)(branch1)
        branch1 = tf.keras.layers.Dense(32, activation='relu')(branch1)
        
        # Branch 2: Wide shallow path  
        branch2 = tf.keras.layers.Dense(256, activation='relu')(shared)
        branch2 = tf.keras.layers.Dropout(0.4)(branch2)
        branch2 = tf.keras.layers.Dense(128, activation='relu')(branch2)
        branch2 = tf.keras.layers.Dropout(0.3)(branch2)
        
        # Branch 3: Regularized path
        branch3 = tf.keras.layers.Dense(128, activation='relu', 
                                       kernel_regularizer=tf.keras.regularizers.l1_l2(0.01, 0.01))(shared)
        branch3 = tf.keras.layers.BatchNormalization()(branch3)
        branch3 = tf.keras.layers.Dense(64, activation='relu',
                                       kernel_regularizer=tf.keras.regularizers.l2(0.01))(branch3)
        
        # Combine branches
        combined = tf.keras.layers.Concatenate()([branch1, branch2, branch3])
        
        # Final processing
        x = tf.keras.layers.Dense(128, activation='relu')(combined)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Dropout(0.4)(x)
        outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
        
        model = tf.keras.Model(inputs=inputs, outputs=outputs, name='branched_model')
        return model

# Build and test Functional API models
print("\n=== Functional API Examples ===")

# Multi-input model
multi_input_model = FunctionalModels.create_multi_input_model()
print("Multi-input Model Architecture:")
multi_input_model.summary()

# Compile with multiple losses
multi_input_model.compile(
    optimizer='adam',
    loss={'main_prediction': 'sparse_categorical_crossentropy',
          'auxiliary_prediction': 'binary_crossentropy'},
    loss_weights={'main_prediction': 1.0, 'auxiliary_prediction': 0.3},
    metrics={'main_prediction': 'accuracy', 'auxiliary_prediction': 'accuracy'}
)

# Residual model
residual_model = FunctionalModels.create_residual_block_model(X_class.shape[1:], 3)
residual_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(f"Residual Model Parameters: {residual_model.count_params():,}")

# Train residual model
history_residual = residual_model.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=25, batch_size=32, verbose=0
)

print(f"Residual Model - Final Val Accuracy: {history_residual.history['val_accuracy'][-1]:.4f}")

# Attention model (if input shape allows)
if X_class.shape[1] >= 16:  # Ensure minimum size for attention
    attention_model = FunctionalModels.create_attention_model(X_class.shape[1:], 3)
    attention_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    print(f"Attention Model Parameters: {attention_model.count_params():,}")

# Branched model
branched_model = FunctionalModels.create_branched_model(X_class.shape[1:], 3)
branched_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(f"Branched Model Parameters: {branched_model.count_params():,}")
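The multi-input model above is compiled but never trained. As a minimal sketch of how such a model is fed, the example below builds a smaller two-input, two-output model and fits it on random data. The input widths, sample count, and random targets are assumptions made for illustration; inputs are passed as a dict keyed by the `Input` layer names, while targets, losses, and loss weights are given as lists in output order.

```python
import numpy as np
import tensorflow as tf

# Two named inputs, each with its own small branch
num_in = tf.keras.layers.Input(shape=(4,), name="numerical_features")
cat_in = tf.keras.layers.Input(shape=(3,), name="categorical_features")
merged = tf.keras.layers.Concatenate()([
    tf.keras.layers.Dense(8, activation="relu")(num_in),
    tf.keras.layers.Dense(8, activation="relu")(cat_in),
])

# Two heads sharing the merged representation
main_out = tf.keras.layers.Dense(3, activation="softmax", name="main_prediction")(merged)
aux_out = tf.keras.layers.Dense(1, activation="sigmoid", name="auxiliary_prediction")(merged)

model = tf.keras.Model(inputs=[num_in, cat_in], outputs=[main_out, aux_out])
model.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy", "binary_crossentropy"],
    loss_weights=[1.0, 0.3],  # total loss = 1.0 * main loss + 0.3 * auxiliary loss
)

# Synthetic data, for shape illustration only
rng = np.random.default_rng(0)
n = 64
features = {
    "numerical_features": rng.normal(size=(n, 4)).astype("float32"),
    "categorical_features": rng.normal(size=(n, 3)).astype("float32"),
}
targets = [
    rng.integers(0, 3, size=(n,)),                      # class ids for the main head
    rng.integers(0, 2, size=(n, 1)).astype("float32"),  # binary labels for the auxiliary head
]

history = model.fit(features, targets, epochs=2, batch_size=16, verbose=0)
main_pred, aux_pred = model.predict(features, verbose=0)
print(main_pred.shape, aux_pred.shape)
```

Because the targets here are random, the loss values carry no meaning; the point is the data layout that a multi-input, multi-output model expects.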

## 3. Model Subclassing - Maximum Flexibility

In [None]:
# Custom models using Model Subclassing
class CustomResNet(tf.keras.Model):
    """Custom ResNet implementation using Model Subclassing"""
    
    def __init__(self, num_classes, num_blocks=3, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        self.num_blocks = num_blocks
        
        # Initial layers
        self.initial_dense = tf.keras.layers.Dense(128, activation='relu')
        self.initial_bn = tf.keras.layers.BatchNormalization()
        
        # Residual blocks
        self.residual_blocks = []
        for i in range(num_blocks):
            self.residual_blocks.append({
                'dense1': tf.keras.layers.Dense(128, activation='relu'),
                'bn1': tf.keras.layers.BatchNormalization(),
                'dropout1': tf.keras.layers.Dropout(0.3),
                'dense2': tf.keras.layers.Dense(128),
                'bn2': tf.keras.layers.BatchNormalization(),
                'add': tf.keras.layers.Add(),
                'activation': tf.keras.layers.Activation('relu'),
                'dropout2': tf.keras.layers.Dropout(0.2)
            })
        
        # Final layers
        self.final_dense = tf.keras.layers.Dense(64, activation='relu')
        self.final_dropout = tf.keras.layers.Dropout(0.4)
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
    
    def call(self, inputs, training=None):
        # Initial processing
        x = self.initial_dense(inputs)
        x = self.initial_bn(x, training=training)
        
        # Residual blocks
        for block in self.residual_blocks:
            # Main path
            residual = block['dense1'](x)
            residual = block['bn1'](residual, training=training)
            residual = block['dropout1'](residual, training=training)
            residual = block['dense2'](residual)
            residual = block['bn2'](residual, training=training)
            
            # Skip connection
            x = block['add']([x, residual])
            x = block['activation'](x)
            x = block['dropout2'](x, training=training)
        
        # Final processing
        x = self.final_dense(x)
        x = self.final_dropout(x, training=training)
        return self.classifier(x)
    
    def get_config(self):
        config = super().get_config()
        config.update({
            'num_classes': self.num_classes,
            'num_blocks': self.num_blocks
        })
        return config

class CustomVariationalAutoencoder(tf.keras.Model):
    """Variational Autoencoder with custom training logic"""
    
    def __init__(self, original_dim, latent_dim=32, **kwargs):
        super().__init__(**kwargs)
        self.original_dim = original_dim
        self.latent_dim = latent_dim
        
        # Encoder
        self.encoder_layers = [
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
        ]
        
        # Latent space
        self.z_mean_layer = tf.keras.layers.Dense(latent_dim)
        self.z_log_var_layer = tf.keras.layers.Dense(latent_dim)
        
        # Decoder
        self.decoder_layers = [
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(original_dim, activation='sigmoid')
        ]
        
        # Metrics
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")
    
    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker, 
            self.kl_loss_tracker
        ]
    
    def encoder(self, inputs, training=None):
        x = inputs
        for layer in self.encoder_layers:
            x = layer(x, training=training)
        
        z_mean = self.z_mean_layer(x)
        z_log_var = self.z_log_var_layer(x)
        return z_mean, z_log_var
    
    def reparameterize(self, z_mean, z_log_var):
        # Sample epsilon ~ N(0, I) with the same shape as z_mean, then shift and scale it
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
    
    def decoder(self, z, training=None):
        x = z
        for layer in self.decoder_layers:
            x = layer(x, training=training)
        return x
    
    def call(self, inputs, training=None):
        z_mean, z_log_var = self.encoder(inputs, training=training)
        z = self.reparameterize(z_mean, z_log_var)
        reconstruction = self.decoder(z, training=training)
        return reconstruction, z_mean, z_log_var
    
    def train_step(self, data):
        with tf.GradientTape() as tape:
            reconstruction, z_mean, z_log_var = self(data, training=True)
            
            # Reconstruction loss
            reconstruction_loss = tf.reduce_mean(
                tf.keras.losses.binary_crossentropy(data, reconstruction)
            )
            
            # KL divergence loss
            kl_loss = -0.5 * tf.reduce_mean(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
            )
            
            total_loss = reconstruction_loss + kl_loss
        
        gradients = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        
        # Update metrics
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

class CustomGAN(tf.keras.Model):
    """Simple GAN implementation with custom training loop"""
    
    def __init__(self, latent_dim=100, data_dim=20, **kwargs):
        super().__init__(**kwargs)
        self.latent_dim = latent_dim
        self.data_dim = data_dim
        
        # Generator (an explicit Input layer replaces the legacy input_shape argument)
        self.generator = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(latent_dim,)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dense(data_dim, activation='tanh')
        ], name='generator')
        
        # Discriminator
        self.discriminator = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(data_dim,)),
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(1, activation='sigmoid')
        ], name='discriminator')
        
        # Optimizers
        self.gen_optimizer = tf.keras.optimizers.Adam(0.0002, beta_1=0.5)
        self.disc_optimizer = tf.keras.optimizers.Adam(0.0002, beta_1=0.5)
        
        # Loss functions
        self.loss_fn = tf.keras.losses.BinaryCrossentropy()
        
        # Metrics
        self.gen_loss_tracker = tf.keras.metrics.Mean(name="generator_loss")
        self.disc_loss_tracker = tf.keras.metrics.Mean(name="discriminator_loss")
    
    @property
    def metrics(self):
        return [self.gen_loss_tracker, self.disc_loss_tracker]
    
    def compile(self, **kwargs):
        super().compile(**kwargs)
    
    def train_step(self, real_data):
        batch_size = tf.shape(real_data)[0]
        
        # Train discriminator
        noise = tf.random.normal([batch_size, self.latent_dim])
        
        with tf.GradientTape() as disc_tape:
            generated_data = self.generator(noise, training=True)
            
            real_predictions = self.discriminator(real_data, training=True)
            fake_predictions = self.discriminator(generated_data, training=True)
            
            real_loss = self.loss_fn(tf.ones_like(real_predictions), real_predictions)
            fake_loss = self.loss_fn(tf.zeros_like(fake_predictions), fake_predictions)
            disc_loss = real_loss + fake_loss
        
        disc_gradients = disc_tape.gradient(disc_loss, self.discriminator.trainable_variables)
        self.disc_optimizer.apply_gradients(zip(disc_gradients, self.discriminator.trainable_variables))
        
        # Train generator
        noise = tf.random.normal([batch_size, self.latent_dim])
        
        with tf.GradientTape() as gen_tape:
            generated_data = self.generator(noise, training=True)
            fake_predictions = self.discriminator(generated_data, training=True)
            gen_loss = self.loss_fn(tf.ones_like(fake_predictions), fake_predictions)
        
        gen_gradients = gen_tape.gradient(gen_loss, self.generator.trainable_variables)
        self.gen_optimizer.apply_gradients(zip(gen_gradients, self.generator.trainable_variables))
        
        # Update metrics
        self.gen_loss_tracker.update_state(gen_loss)
        self.disc_loss_tracker.update_state(disc_loss)
        
        return {
            "generator_loss": self.gen_loss_tracker.result(),
            "discriminator_loss": self.disc_loss_tracker.result()
        }

# Test custom models
print("\n=== Model Subclassing Examples ===")

# Custom ResNet
custom_resnet = CustomResNet(num_classes=3, num_blocks=4, name='custom_resnet')
custom_resnet.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Build the model by calling it
_ = custom_resnet(X_class_train[:1])  # Build model
print(f"Custom ResNet Parameters: {custom_resnet.count_params():,}")

# Train custom ResNet
history_custom = custom_resnet.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=20, batch_size=32, verbose=0
)

print(f"Custom ResNet - Final Val Accuracy: {history_custom.history['val_accuracy'][-1]:.4f}")

# Variational Autoencoder
vae = CustomVariationalAutoencoder(original_dim=X_class.shape[1], latent_dim=16)
vae.compile(optimizer='adam')

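# The decoder ends in a sigmoid and the reconstruction loss is binary
# cross-entropy, so rescale each feature to [0, 1] before training
X_vae_min = X_class_train.min(axis=0)
X_vae_max = X_class_train.max(axis=0)
X_vae_train = (X_class_train - X_vae_min) / (X_vae_max - X_vae_min + 1e-8)

# Train VAE
print("\nTraining Variational Autoencoder...")
vae_history = vae.fit(X_vae_train, epochs=15, batch_size=32, verbose=0)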

print(f"VAE - Final Loss: {vae_history.history['loss'][-1]:.4f}")
print(f"VAE - Reconstruction Loss: {vae_history.history['reconstruction_loss'][-1]:.4f}")
print(f"VAE - KL Loss: {vae_history.history['kl_loss'][-1]:.4f}")

# Simple GAN
gan = CustomGAN(latent_dim=50, data_dim=X_class.shape[1])
gan.compile()

# Standardize each feature, then clip to [-1, 1] to match the generator's tanh output range
X_class_normalized = (X_class_train - X_class_train.mean(axis=0)) / X_class_train.std(axis=0)
X_class_normalized = np.clip(X_class_normalized, -1, 1)

print("\nTraining GAN...")
gan_history = gan.fit(X_class_normalized, epochs=20, batch_size=32, verbose=0)

print(f"GAN - Final Generator Loss: {gan_history.history['generator_loss'][-1]:.4f}")
print(f"GAN - Final Discriminator Loss: {gan_history.history['discriminator_loss'][-1]:.4f}")
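The KL term in `CustomVariationalAutoencoder.train_step` is the closed-form divergence between a diagonal Gaussian and a standard normal, averaged over latent dimensions. A small worked example in plain Python shows how it behaves; the function names `kl_term` and `kl_loss` are hypothetical helpers introduced here, and the inputs are made-up values rather than outputs of the trained model.

```python
import math


def kl_term(z_mean, z_log_var):
    """KL(N(z_mean, exp(z_log_var)) || N(0, 1)) for a single latent dimension."""
    return -0.5 * (1.0 + z_log_var - z_mean ** 2 - math.exp(z_log_var))


def kl_loss(z_means, z_log_vars):
    """Average the per-dimension terms, mirroring the reduce_mean used in train_step."""
    terms = [kl_term(m, lv) for m, lv in zip(z_means, z_log_vars)]
    return sum(terms) / len(terms)


# A latent code that already matches the prior incurs no penalty
matched = kl_loss([0.0, 0.0], [0.0, 0.0])

# Shifting the mean by 1 in one of two dimensions costs 0.5 there, 0.25 on average
shifted = kl_loss([1.0, 0.0], [0.0, 0.0])

print(matched, shifted)
```

The term is zero only when the encoder outputs a mean of 0 and a log-variance of 0, which is what pulls the latent distribution toward the prior during training.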

## 4. Model Composition and Advanced Patterns
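The ensemble pattern in this section combines member models by averaging their softmax outputs. A minimal sketch in plain Python shows why that is safe: the element-wise mean of probability vectors is itself a probability vector. The helper name `average_predictions` and the three example vectors are assumptions chosen for illustration.

```python
def average_predictions(member_probs):
    """Element-wise mean of several class-probability vectors."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]


# Hypothetical softmax outputs from three ensemble members for one sample
member_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]

ensemble = average_predictions(member_probs)
predicted_class = max(range(len(ensemble)), key=lambda c: ensemble[c])
print(ensemble, predicted_class)
```

Since the averaged vector still sums to 1, the ensemble output can be trained with the same cross-entropy loss as a single softmax head.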

In [None]:
# Advanced model composition patterns
class ModelComposition:
    """Advanced patterns for combining and composing models"""
    
    @staticmethod
    def create_ensemble_model(input_shape, num_classes, num_models=3):
        """Ensemble of three different architectures (num_models is currently unused)"""
        
        inputs = tf.keras.layers.Input(shape=input_shape)
        
        # Model 1: Deep narrow network
        model1 = tf.keras.layers.Dense(64, activation='relu')(inputs)
        for _ in range(5):
            model1 = tf.keras.layers.Dense(64, activation='relu')(model1)
            model1 = tf.keras.layers.Dropout(0.2)(model1)
        model1_out = tf.keras.layers.Dense(num_classes, activation='softmax')(model1)
        
        # Model 2: Wide shallow network
        model2 = tf.keras.layers.Dense(512, activation='relu')(inputs)
        model2 = tf.keras.layers.Dropout(0.4)(model2)
        model2 = tf.keras.layers.Dense(256, activation='relu')(model2)
        model2 = tf.keras.layers.Dropout(0.3)(model2)
        model2_out = tf.keras.layers.Dense(num_classes, activation='softmax')(model2)
        
        # Model 3: Residual network
        model3 = tf.keras.layers.Dense(128, activation='relu')(inputs)
        residual = tf.keras.layers.Dense(128, activation='relu')(model3)
        residual = tf.keras.layers.Dense(128)(residual)
        model3 = tf.keras.layers.Add()([model3, residual])
        model3 = tf.keras.layers.Activation('relu')(model3)
        model3_out = tf.keras.layers.Dense(num_classes, activation='softmax')(model3)
        
        # Average ensemble predictions
        ensemble_output = tf.keras.layers.Average()([model1_out, model2_out, model3_out])
        
        model = tf.keras.Model(inputs=inputs, outputs=ensemble_output, name='ensemble_model')
        return model
    
    @staticmethod
    def create_hierarchical_model(input_shape, num_classes):
        """Hierarchical model with multiple prediction levels"""
        
        inputs = tf.keras.layers.Input(shape=input_shape)
        
        # Shared feature extraction
        x = tf.keras.layers.Dense(256, activation='relu')(inputs)
        x = tf.keras.layers.BatchNormalization()(x)
        shared_features = tf.keras.layers.Dropout(0.3)(x)
        
        # Level 1: Coarse classification (binary)
        level1 = tf.keras.layers.Dense(128, activation='relu')(shared_features)
        level1 = tf.keras.layers.Dropout(0.3)(level1)
        level1_output = tf.keras.layers.Dense(2, activation='softmax', name='level1_prediction')(level1)
        
        # Level 2: Fine classification (conditional on level 1)
        level2_input = tf.keras.layers.Concatenate()([shared_features, level1])
        level2 = tf.keras.layers.Dense(128, activation='relu')(level2_input)
        level2 = tf.keras.layers.Dropout(0.3)(level2)
        level2_output = tf.keras.layers.Dense(num_classes, activation='softmax', name='level2_prediction')(level2)
        
        # Final combined prediction
        combined_input = tf.keras.layers.Concatenate()([shared_features, level1, level2])
        final = tf.keras.layers.Dense(64, activation='relu')(combined_input)
        final = tf.keras.layers.Dropout(0.3)(final)
        final_output = tf.keras.layers.Dense(num_classes, activation='softmax', name='final_prediction')(final)
        
        model = tf.keras.Model(
            inputs=inputs, 
            outputs=[level1_output, level2_output, final_output],
            name='hierarchical_model'
        )
        return model
    
    @staticmethod
    def create_progressive_model(input_shape, num_classes):
        """Progressive model that can be grown during training"""
        
        # Base model
        base_inputs = tf.keras.layers.Input(shape=input_shape)
        base_x = tf.keras.layers.Dense(64, activation='relu')(base_inputs)
        base_x = tf.keras.layers.BatchNormalization()(base_x)
        base_outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(base_x)
        
        base_model = tf.keras.Model(inputs=base_inputs, outputs=base_outputs, name='base_model')
        
        # Extended model (adds complexity)
        extended_inputs = tf.keras.layers.Input(shape=input_shape)
        
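        # Reuse the base model's Dense and BatchNormalization layers as a shared
        # feature extractor. They are NOT frozen here; set `trainable = False` on
        # them after training the base model if the base features should stay fixed.
        base_features = base_model.layers[-2](base_model.layers[-3](extended_inputs))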
        
        # Add more complexity
        extended_x = tf.keras.layers.Dense(128, activation='relu')(extended_inputs)
        extended_x = tf.keras.layers.BatchNormalization()(extended_x)
        extended_x = tf.keras.layers.Dropout(0.3)(extended_x)
        
        # Combine with base features
        combined = tf.keras.layers.Concatenate()([base_features, extended_x])
        combined = tf.keras.layers.Dense(64, activation='relu')(combined)
        extended_outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(combined)
        
        extended_model = tf.keras.Model(inputs=extended_inputs, outputs=extended_outputs, name='extended_model')
        
        return base_model, extended_model

# Test advanced composition patterns
print("\n=== Advanced Model Composition ===")

# Ensemble model
ensemble_model = ModelComposition.create_ensemble_model(X_class.shape[1:], 3)
ensemble_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(f"Ensemble Model Parameters: {ensemble_model.count_params():,}")

# Train ensemble model
history_ensemble = ensemble_model.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=15, batch_size=32, verbose=0
)

print(f"Ensemble Model - Final Val Accuracy: {history_ensemble.history['val_accuracy'][-1]:.4f}")

# Hierarchical model
hierarchical_model = ModelComposition.create_hierarchical_model(X_class.shape[1:], 3)
hierarchical_model.compile(
    optimizer='adam',
    loss={
        'level1_prediction': 'sparse_categorical_crossentropy',
        'level2_prediction': 'sparse_categorical_crossentropy', 
        'final_prediction': 'sparse_categorical_crossentropy'
    },
    loss_weights={'level1_prediction': 0.3, 'level2_prediction': 0.3, 'final_prediction': 0.4},
    metrics=['accuracy']
)

print(f"Hierarchical Model Parameters: {hierarchical_model.count_params():,}")

# Progressive models
base_model, extended_model = ModelComposition.create_progressive_model(X_class.shape[1:], 3)

# Train base model first
base_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
base_history = base_model.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=10, batch_size=32, verbose=0
)

print(f"Base Model - Final Val Accuracy: {base_history.history['val_accuracy'][-1]:.4f}")

# Then train extended model
extended_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
extended_history = extended_model.fit(
    X_class_train, y_class_train,
    validation_data=(X_class_test, y_class_test),
    epochs=10, batch_size=32, verbose=0
)

print(f"Extended Model - Final Val Accuracy: {extended_history.history['val_accuracy'][-1]:.4f}")
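
The hierarchical model above is compiled but never fitted. A multi-output model expects one target per named output, so the coarse (binary) head needs its own labels. The sketch below shows one way that could look on a reduced two-head stand-in model. The toy data, the names `X_demo`, `y_fine`, `y_coarse`, and `demo_model`, and the coarse grouping (class 0 versus the rest) are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 64 samples, 20 features, 3 fine-grained classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 20)).astype("float32")
y_fine = rng.integers(0, 3, size=64)

# Assumed coarse grouping for the binary head: class 0 vs. classes 1 and 2.
y_coarse = (y_fine > 0).astype("int64")

# Reduced two-head model; output layer names double as target keys.
inputs = tf.keras.layers.Input(shape=(20,))
shared = tf.keras.layers.Dense(32, activation="relu")(inputs)
coarse_out = tf.keras.layers.Dense(2, activation="softmax", name="level1_prediction")(shared)
fine_out = tf.keras.layers.Dense(3, activation="softmax", name="final_prediction")(shared)
demo_model = tf.keras.Model(inputs=inputs, outputs=[coarse_out, fine_out])

demo_model.compile(
    optimizer="adam",
    loss={
        "level1_prediction": "sparse_categorical_crossentropy",
        "final_prediction": "sparse_categorical_crossentropy",
    },
    loss_weights={"level1_prediction": 0.4, "final_prediction": 0.6},
    metrics={"level1_prediction": ["accuracy"], "final_prediction": ["accuracy"]},
)

# Targets are passed as a dict keyed by output layer name.
demo_history = demo_model.fit(
    X_demo,
    {"level1_prediction": y_coarse, "final_prediction": y_fine},
    epochs=2, batch_size=16, verbose=0,
)
print(sorted(demo_history.history.keys()))
```

The same dict-of-targets pattern would apply to the three-output `hierarchical_model`, with one entry each for `level1_prediction`, `level2_prediction`, and `final_prediction`.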

## 5. Model Comparison and Selection

In [None]:
# Comprehensive model comparison
def compare_models():
    """Compare different model architectures and approaches"""
    
    models_to_compare = {
        'Sequential_Simple': SequentialModels.create_simple_classifier(X_class.shape[1], 3),
        'Sequential_Deep': SequentialModels.create_deep_classifier(X_class.shape[1], 3),
        'Functional_Residual': FunctionalModels.create_residual_block_model(X_class.shape[1:], 3),
        'Functional_Branched': FunctionalModels.create_branched_model(X_class.shape[1:], 3),
        'Custom_ResNet': CustomResNet(num_classes=3, num_blocks=3),
        'Ensemble': ModelComposition.create_ensemble_model(X_class.shape[1:], 3)
    }
    
    results = {}
    
    print("=== Model Comparison Results ===")
    print("-" * 80)
    print(f"{'Model':<20} {'Parameters':<12} {'Train Acc':<12} {'Val Acc':<12} {'Train Time':<12}")
    print("-" * 80)
    
    import time  # imported once, outside the training loop

    for name, model in models_to_compare.items():
        # Compile model
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
        
        # Make sure the weights exist before counting parameters. Models defined
        # with a known input shape are already built; a subclassed model creates
        # its weights on the first call, so run one sample through it.
        if not model.built:
            _ = model(X_class_train[:1])
        
        # Train model and measure time
        start_time = time.time()
        
        history = model.fit(
            X_class_train, y_class_train,
            validation_data=(X_class_test, y_class_test),
            epochs=15, batch_size=32, verbose=0
        )
        
        train_time = time.time() - start_time
        
        # Store results
        results[name] = {
            'parameters': model.count_params(),
            'train_accuracy': history.history['accuracy'][-1],
            'val_accuracy': history.history['val_accuracy'][-1],
            'train_time': train_time,
            'history': history.history
        }
        
        print(f"{name:<20} {results[name]['parameters']:<12,} "
              f"{results[name]['train_accuracy']:<12.4f} "
              f"{results[name]['val_accuracy']:<12.4f} "
              f"{train_time:<12.2f}")
    
    return results

# Visualization of model performance
def visualize_model_comparison(results):
    """Visualize model comparison results"""
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Performance vs Parameters
    params = [results[name]['parameters'] for name in results.keys()]
    val_accs = [results[name]['val_accuracy'] for name in results.keys()]
    names = list(results.keys())
    
    axes[0, 0].scatter(params, val_accs, s=100, alpha=0.7)
    for i, name in enumerate(names):
        axes[0, 0].annotate(name, (params[i], val_accs[i]), xytext=(5, 5), textcoords='offset points')
    axes[0, 0].set_xlabel('Number of Parameters')
    axes[0, 0].set_ylabel('Validation Accuracy')
    axes[0, 0].set_title('Performance vs Model Complexity')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Training Time vs Performance
    train_times = [results[name]['train_time'] for name in results.keys()]
    
    axes[0, 1].scatter(train_times, val_accs, s=100, alpha=0.7, color='orange')
    for i, name in enumerate(names):
        axes[0, 1].annotate(name, (train_times[i], val_accs[i]), xytext=(5, 5), textcoords='offset points')
    axes[0, 1].set_xlabel('Training Time (seconds)')
    axes[0, 1].set_ylabel('Validation Accuracy')
    axes[0, 1].set_title('Performance vs Training Time')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Model Performance Bar Chart
    axes[1, 0].bar(names, val_accs, alpha=0.7, color='green')
    axes[1, 0].set_ylabel('Validation Accuracy')
    axes[1, 0].set_title('Model Performance Comparison')
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Parameter Count Bar Chart
    axes[1, 1].bar(names, params, alpha=0.7, color='red')
    axes[1, 1].set_ylabel('Number of Parameters')
    axes[1, 1].set_title('Model Complexity Comparison')
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].set_yscale('log')
    
    plt.tight_layout()
    plt.show()
    
    # Performance summary
    print("\n=== Performance Summary ===")
    best_acc_name = max(results, key=lambda name: results[name]['val_accuracy'])
    fastest_name = min(results, key=lambda name: results[name]['train_time'])
    smallest_name = min(results, key=lambda name: results[name]['parameters'])
    
    print(f"Best Accuracy: {best_acc_name} ({results[best_acc_name]['val_accuracy']:.4f})")
    print(f"Fastest Training: {fastest_name} ({results[fastest_name]['train_time']:.2f}s)")
    print(f"Fewest Parameters: {smallest_name} ({results[smallest_name]['parameters']:,} params)")

# Run model comparison
comparison_results = compare_models()
visualize_model_comparison(comparison_results)
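
The summary above reports a single winner per criterion. As a complement, here is a minimal sketch of ranking every model by validation accuracy, breaking ties in favor of the smaller model. The helper `rank_models` and the `toy_results` dictionary are hypothetical. The numbers are placeholders chosen for illustration, not measurements from this notebook.

```python
# Hypothetical placeholder results, NOT measurements from this notebook.
toy_results = {
    "model_a": {"parameters": 5_000, "val_accuracy": 0.80},
    "model_b": {"parameters": 50_000, "val_accuracy": 0.82},
    "model_c": {"parameters": 20_000, "val_accuracy": 0.82},
}


def rank_models(results):
    """Sort model names by validation accuracy (descending),
    then by parameter count (ascending) to break ties."""
    return sorted(
        results,
        key=lambda name: (-results[name]["val_accuracy"], results[name]["parameters"]),
    )


ranking = rank_models(toy_results)
for position, name in enumerate(ranking, start=1):
    entry = toy_results[name]
    print(f"{position}. {name:<10} val_acc={entry['val_accuracy']:.2f} "
          f"params={entry['parameters']:,}")
```

Because the helper only reads the `parameters` and `val_accuracy` keys, it should also accept the `comparison_results` dictionary returned by `compare_models()`.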

## 6. Best Practices and Guidelines

In [None]:
# Best practices and design guidelines
class ModelDesignGuidelines:
    """Guidelines for choosing the right model architecture approach"""
    
    @staticmethod
    def print_guidelines():
        """Print comprehensive guidelines for model selection"""
        
        print("=== tf.keras Model Architecture Guidelines ===")
        print()
        
        print("🔹 SEQUENTIAL API - Use when:")
        print("  ✓ Linear stack of layers (no branching/merging)")
        print("  ✓ Simple feedforward networks")
        print("  ✓ Quick prototyping and experimentation")
        print("  ✓ Standard architectures (MLP, simple CNN/RNN)")
        print("  ✗ Avoid for: Multi-input/output, complex topologies")
        print()
        
        print("🔹 FUNCTIONAL API - Use when:")
        print("  ✓ Multi-input or multi-output models")
        print("  ✓ Models with shared layers")
        print("  ✓ Non-linear topology (skip connections, branches)")
        print("  ✓ Complex architectures (ResNet, U-Net, etc.)")
        print("  ✓ Need to access intermediate layer outputs")
        print("  ✗ Avoid for: Very simple linear models")
        print()
        
        print("🔹 MODEL SUBCLASSING - Use when:")
        print("  ✓ Need custom training loops or loss functions")
        print("  ✓ Dynamic architectures that change during forward pass")
        print("  ✓ Complex control flow in the model")
        print("  ✓ Research-level customizations (GANs, VAEs, etc.)")
        print("  ✓ Need to track custom metrics or states")
        print("  ✗ Avoid for: Standard architectures, production models")
        print()
        
        print("🔹 PERFORMANCE CONSIDERATIONS:")
        print("  • Sequential: Least code to write; trains like the equivalent Functional model")
        print("  • Functional: Good performance, flexible")
        print("  • Subclassing: Most flexible, potentially slower")
        print()
        
        print("🔹 MAINTAINABILITY:")
        print("  • Sequential: Easiest to understand and modify")
        print("  • Functional: Good balance of flexibility and clarity")
        print("  • Subclassing: Requires careful documentation")
        print()
        
        print("🔹 DEPLOYMENT CONSIDERATIONS:")
        print("  • Sequential/Functional: Easy to save/load, convert to other formats")
        print("  • Subclassing: May require custom code for deployment")
    
    @staticmethod
    def architecture_recommendations():
        """Specific architecture recommendations for common use cases"""
        
        print("\n=== Architecture Recommendations ===")
        print()
        
        recommendations = {
            "Binary Classification": {
                "approach": "Sequential",
                "architecture": "Dense(128) → ReLU → Dropout → Dense(64) → ReLU → Dense(1) → Sigmoid",
                "considerations": "Use batch normalization for deeper networks"
            },
            "Multi-class Classification": {
                "approach": "Sequential or Functional",
                "architecture": "Dense(256) → ReLU → BatchNorm → Dropout → Dense(128) → ReLU → Dense(num_classes) → Softmax",
                "considerations": "Consider residual connections for very deep networks"
            },
            "Regression": {
                "approach": "Sequential",
                "architecture": "Dense(128) → ReLU → Dropout → Dense(64) → ReLU → Dense(1) → Linear",
                "considerations": "Add L2 regularization, careful with output activation"
            },
            "Multi-input Model": {
                "approach": "Functional",
                "architecture": "Separate branches for each input type, concatenate, then dense layers",
                "considerations": "Normalize inputs differently, consider input-specific preprocessing"
            },
            "Time Series": {
                "approach": "Sequential or Functional",
                "architecture": "LSTM/GRU layers followed by dense layers",
                "considerations": "Consider attention mechanisms for long sequences"
            },
            "Autoencoder": {
                "approach": "Functional or Subclassing",
                "architecture": "Symmetric encoder-decoder with bottleneck",
                "considerations": "Skip connections (U-Net style) can sharpen reconstructions but bypass the bottleneck"
            },
            "GAN": {
                "approach": "Model Subclassing",
                "architecture": "Separate generator and discriminator with custom training loop",
                "considerations": "Careful learning rate scheduling and loss balancing"
            }
        }
        
        for use_case, rec in recommendations.items():
            print(f"📋 {use_case}:")
            print(f"   Approach: {rec['approach']}")
            print(f"   Architecture: {rec['architecture']}")
            print(f"   Considerations: {rec['considerations']}")
            print()

# Print guidelines and recommendations
ModelDesignGuidelines.print_guidelines()
ModelDesignGuidelines.architecture_recommendations()

# Final comparison summary
print("\n=== Final Summary ===")
print("This notebook demonstrated three approaches to building tf.keras models:")
print("1. Sequential API: 🚀 Fast prototyping, linear architectures")
print("2. Functional API: 🔧 Flexible, complex topologies")
print("3. Model Subclassing: 🎯 Maximum control, research applications")
print("\nChoose based on your specific requirements for flexibility vs. simplicity!")

## Summary

This notebook covered all three tf.keras model-building approaches:

### Key Concepts Covered:
1. **Sequential API**: Linear layer stacking for simple architectures
2. **Functional API**: Complex topologies with branching and merging  
3. **Model Subclassing**: Custom training loops and dynamic architectures
4. **Advanced Patterns**: Ensembles, hierarchical models, progressive training
5. **Model Composition**: Combining different architectural approaches
6. **Performance Comparison**: Systematic evaluation of different approaches

### Architecture Patterns Mastered:
- **Simple Classifiers**: Basic feedforward networks
- **Deep Networks**: Batch normalization and residual connections
- **Multi-input/Output**: Complex data fusion architectures
- **Attention Mechanisms**: Self-attention for improved performance
- **Custom Models**: VAEs, GANs with custom training logic
- **Ensemble Methods**: Averaging multiple model predictions

### Decision Framework:
- **Sequential**: Simple, linear architectures; fastest development
- **Functional**: Complex topologies; good balance of flexibility/performance  
- **Subclassing**: Research applications; maximum customization capability
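
The decision framework above can be written as a small helper. This is a sketch of one possible encoding, not an official rule: the function name `suggest_api` and its boolean flags are hypothetical, and real projects often weigh more factors, such as team familiarity and deployment targets.

```python
def suggest_api(multi_io=False, shared_layers=False, non_linear_topology=False,
                dynamic_control_flow=False, custom_training_step=False):
    """Return a suggested tf.keras model-building style for the given needs.

    All flags are hypothetical simplifications of the guidelines in this notebook.
    """
    # Data-dependent control flow or custom training logic calls for subclassing.
    if dynamic_control_flow or custom_training_step:
        return "Model Subclassing"
    # Static graphs with branches, merges, or several inputs/outputs fit the Functional API.
    if multi_io or shared_layers or non_linear_topology:
        return "Functional API"
    # Everything else is a plain linear stack of layers.
    return "Sequential API"


print(suggest_api())                          # plain MLP
print(suggest_api(non_linear_topology=True))  # e.g. residual connections
print(suggest_api(custom_training_step=True)) # e.g. a GAN training step
```

The checks run from most to least flexible, so a model that needs both skip connections and a custom training step is routed to subclassing.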

### Best Practices Learned:
- Choose architecture complexity based on data complexity
- Use appropriate regularization (dropout, batch norm, L2)
- Consider residual connections for very deep networks
- Implement proper train/validation/test splitting
- Profile different approaches for your specific use case
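
The splitting point above can be illustrated with a three-way split, so that model selection uses a validation set and the test set is held back for the final check. This is a minimal NumPy sketch. The helper `three_way_split`, the fractions, and the toy arrays are assumptions for illustration, and the split is random rather than stratified.

```python
import numpy as np


def three_way_split(X, y, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut the indices into test, validation, and train parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])


# Hypothetical toy data: 100 samples with 2 features and 3 classes.
X_toy = np.arange(200, dtype="float32").reshape(100, 2)
y_toy = np.arange(100) % 3

(train_X, train_y), (val_X, val_y), (test_X, test_y) = three_way_split(X_toy, y_toy)
print(len(train_X), len(val_X), len(test_X))  # 70 15 15
```

With such a split, `validation_data=(val_X, val_y)` would go to `fit`, and `evaluate` on `(test_X, test_y)` would run once at the end.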

### Next Steps:
- Apply custom layers and advanced training techniques (Notebook 05)
- Implement domain-specific architectures (CNN, RNN, Transformers)
- Scale to production deployments with optimized architectures

This foundation supports building more sophisticated neural networks for a wide range of machine learning tasks.