### Explanation of the parameters for the code above and why I use these values

* Normalize the input data

RGB values are typically in the range 0-255. Normalizing them to 0-1 can help the model learn more effectively. More specifically, normalization can help the model learn faster, because optimization algorithms (such as gradient descent) work more efficiently when the inputs share a small, common scale. It can also lead to better model performance, as it reduces the risk of problems such as vanishing or exploding gradients. Many neural networks use activation functions such as sigmoid or tanh, which saturate for large-magnitude inputs, so small inputs keep them in the range where their gradients are still informative.
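
As a minimal sketch of this step, the snippet below rescales a small hand-made array the same way the notebook does (`/ 255.0`). The sample values, and the reading of the 9 features as three RGB triplets, are assumptions for illustration only.

```python
import numpy as np

# Hypothetical batch of two samples with 9 features each
# (assumed here to be three RGB triplets per sample)
rgb = np.array([[255, 128, 0, 34, 200, 90, 0, 0, 255],
                [12, 64, 250, 255, 255, 255, 100, 50, 25]], dtype=np.float64)

# Divide by the maximum channel value so every feature falls in [0, 1]
rgb_scaled = rgb / 255.0

print(rgb_scaled.min(), rgb_scaled.max())
```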

- Encode the target value

Machine learning models, especially neural networks, work with numbers, not text. Encoding transforms these text labels into a numerical format the model can understand and process. For multi-class problems, encoded labels allow the use of appropriate loss functions like categorical cross-entropy.
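
The two encoding steps can be sketched with plain NumPy. This is only a stand-in for `LabelEncoder` and `to_categorical`, and the label names are hypothetical.

```python
import numpy as np

labels = np.array(["red", "green", "blue", "green"])  # hypothetical class names

# Step 1: map each distinct label to an integer (classes come back sorted)
classes, y_int = np.unique(labels, return_inverse=True)

# Step 2: one-hot encode, so row i has a single 1 in column y_int[i]
y_onehot = np.eye(len(classes))[y_int]

print(classes)   # the sorted class names
print(y_int)     # integer code per sample
print(y_onehot)  # one row per sample, one column per class
```

The one-hot form is what categorical cross-entropy expects, while the integer form is what stratified splitting and `classification_report` use.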

- MLP Model overall structure 

1. Activation Functions:

- ReLU: 

ReLU introduces non-linearity into the network, allowing it to learn complex patterns. Unlike sigmoid or tanh, ReLU does not squash gradients in the positive region, which allows better gradient flow. ReLU is also cheap to compute, which speeds up training.

- Softmax:  

The appropriate choice for multi-class classification problems, since it converts the model's outputs into a probability for each class.
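
To make both activations concrete, here is a small NumPy sketch. The function names and the example logits are illustrative, not taken from Keras.

```python
import numpy as np

def relu(z):
    # Pass positive values through unchanged, clamp negatives to zero
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the maximum first for numerical stability
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, -1.0, 0.5])  # hypothetical pre-activation values
hidden = relu(logits)
probs = softmax(logits)

print(hidden)
print(probs, probs.sum())
```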


2. Regularization Techniques:

- L2 Regularization (0.01): 

L2 regularization penalizes large weights, which discourages the model from relying too heavily on any single feature. It's like telling the model "don't put all your eggs in one basket". It helps prevent overfitting. The value 0.001 is simply a common choice.
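
A rough sketch of the penalty term, assuming the usual form `strength * sum(w ** 2)`. The two weight vectors are made up to show why concentrated weights cost more.

```python
import numpy as np

def l2_penalty(weights, strength):
    # Penalty added to the loss: strength times the sum of squared weights
    return strength * np.sum(np.square(weights))

# Two hypothetical weight vectors with the same total absolute weight (2.0)
w_spread = np.array([0.5, -0.5, 0.5, -0.5])  # weight shared across features
w_spiky = np.array([2.0, 0.0, 0.0, 0.0])     # all weight on one feature

penalty_spread = l2_penalty(w_spread, strength=0.001)
penalty_spiky = l2_penalty(w_spiky, strength=0.001)

print(penalty_spread, penalty_spiky)
```

Because the weights are squared, putting everything on one feature is penalized four times as much here as spreading the same total weight over four features.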

- Dropout (0.4): 

Dropout is a regularization technique that randomly deactivates a fraction of the neurons during training. This helps prevent overfitting by forcing the model to learn more general features. A value of 0.4 means that 40% of the neurons are randomly deactivated at each training step.
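
The mechanism can be sketched in NumPy as "inverted" dropout, where the surviving activations are scaled up so the expected value stays the same. This is an illustrative implementation under that assumption, not the Keras source.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Keep each unit with probability (1 - rate), then rescale the survivors
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))  # hypothetical layer output
dropped = dropout(activations, rate=0.4, rng=rng)

zero_fraction = np.mean(dropped == 0.0)
print(f"fraction zeroed: {zero_fraction:.3f}, mean after dropout: {dropped.mean():.3f}")
```

At inference time no units are dropped, which is why the rescaling during training matters.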

- BatchNormalization: 

This is also a normalization technique: it normalizes the neuron activations over each batch of data. It helps stabilize training, speeds up convergence, and can improve the model's performance.
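
A simplified sketch of the training-time computation: each feature is standardized with the batch mean and variance, then scaled and shifted by learnable parameters (called `gamma` and `beta` here). The moving averages used at inference time are left out, and the input data is random.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    # Standardize every feature (column) using statistics of the current batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # hypothetical activations
normalized = batch_norm(batch)

print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```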

3. Optimizer (Adam) and Learning Rate (0.001):

- Adam is like a smart teacher that adjusts how big of learning steps to take.

- 0.001 is a common starting point - not too fast, not too slow.
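
To show what "adjusting the step size" means, here is a sketch of one textbook Adam update in NumPy. Keras may differ in small details, and the parameter and gradient values are invented.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # Running averages of the gradient and of the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, -2.0])
grad = np.array([10.0, -0.01])  # one very large and one very small gradient
new_param, m, v = adam_step(param, grad, m=np.zeros(2), v=np.zeros(2), t=1)
step = param - new_param

print(step)
```

On the first step both parameters move by roughly the learning rate, even though their gradients differ by a factor of 1000. This is the sense in which Adam adapts the step per parameter.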


4. Loss Function (Categorical Cross-Entropy):

- This measures how wrong the model's predictions are. It's particularly good for problems with multiple classes like this one.
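
A small worked example of the loss, assuming one-hot targets and predicted probabilities. The numbers are made up, and `eps` only guards against `log(0)`.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    # Only the predicted probability of the true class contributes to the loss
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],   # fairly confident and correct
                   [0.1, 0.3, 0.6]])  # true class only gets probability 0.3

losses = categorical_cross_entropy(y_true, y_pred)
print(losses)
```

The second sample has the larger loss, because the model assigned a low probability to its true class.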


5. Training Parameters:

- Batch size (32): Processes 32 samples at a time. It's a balance between speed and memory use, and it can help stabilize the learning process.

- Epochs (300): Maximum number of times to go through the entire dataset.

- Validation split (0.2): Uses 20% of data to check how well the model is learning.
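
The arithmetic behind these three settings can be sketched as follows. The dataset size of 1000 is a hypothetical number chosen only to make the calculation concrete.

```python
import math

n_samples = 1000        # hypothetical dataset size
validation_split = 0.2
batch_size = 32
max_epochs = 300

# 20% of the samples are held out for validation
n_val = round(n_samples * validation_split)
n_train = n_samples - n_val

# One weight update per batch; the last batch may be smaller than 32
steps_per_epoch = math.ceil(n_train / batch_size)
max_updates = steps_per_epoch * max_epochs

print(n_train, n_val, steps_per_epoch, max_updates)
```

The epoch count is an upper bound, so the real number of updates is smaller whenever training stops early.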


6. Callbacks:

- ReduceLROnPlateau: 

Automatically reduces the learning rate when the model's performance on the validation set stops improving. This can help avoid overfitting and achieve better performance.

- EarlyStopping: 

Stops training when performance on the validation set has not improved for a given number of epochs. This helps avoid overfitting and saves training time.
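
The interaction of the two callbacks can be sketched with a pure-Python simulation over a list of validation losses. This is a simplified model of the patience logic that ignores options such as `min_delta` and `cooldown`. The function name and the loss curve are hypothetical, while the default values mirror those in the notebook code.

```python
def simulate_callbacks(val_losses, lr=0.001, factor=0.2, lr_patience=5,
                       stop_patience=15, min_lr=1e-5):
    best = float("inf")
    best_epoch = 0
    lr_wait = 0    # epochs without improvement since the last LR reduction
    stop_wait = 0  # epochs without improvement since the best epoch
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
            lr_wait = stop_wait = 0
            continue
        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:
            # Plateau: shrink the learning rate, but never below min_lr
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        if stop_wait >= stop_patience:
            # Give up; the weights from best_epoch would be restored
            return epoch, best_epoch, lr
    return len(val_losses) - 1, best_epoch, lr

# Validation loss improves for three epochs, then stays flat
val_losses = [1.0, 0.8, 0.7] + [0.75] * 20
stopped_epoch, best_epoch, final_lr = simulate_callbacks(val_losses)
print(stopped_epoch, best_epoch, final_lr)
```

In this run the learning rate is cut three times before training stops, 15 epochs after the best epoch.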

7. Metrics:

Because I'm dealing with an imbalanced dataset, it's better to track several metrics, since accuracy alone can be misleading.
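
A toy illustration of why accuracy can mislead, using an invented two-class dataset with a 95/5 split and a classifier that always predicts the majority class:

```python
import numpy as np

# Hypothetical imbalanced labels: 95 samples of class 0, 5 of class 1
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predicts the majority class

accuracy = np.mean(y_true == y_pred)

def recall_for(cls):
    # Fraction of the samples of class `cls` that were predicted as `cls`
    mask = y_true == cls
    return np.mean(y_pred[mask] == cls)

recall_majority = recall_for(0)
recall_minority = recall_for(1)
print(accuracy, recall_majority, recall_minority)
```

Accuracy looks excellent, yet the minority class is never detected. Per-class precision and recall expose this.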

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, LeakyReLU
from tensorflow.keras.optimizers import Adam, SGD, AdamW
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
from tensorflow.keras.metrics import Precision, Recall
import matplotlib.pyplot as plt
import seaborn as sns

# Load and preprocess the data
data = pd.read_csv('final_dataset.csv')
X = data.iloc[:, :-1].values / 255.0
y = data.iloc[:, -1].values

# Encode the target variable
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
y_categorical = to_categorical(y_encoded)

# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(y_encoded), y=y_encoded)
class_weight_dict = dict(enumerate(class_weights))

# `regularizer` is a Keras factory function (l1 or l2), not a string, because it is called below
def create_model(activation='relu', optimizer='adam', regularizer=l2, dropout_rate=0.3, learning_rate=0.001):
    model = Sequential([
        Dense(256, activation=activation, input_shape=(9,), kernel_regularizer=regularizer(0.001)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(128, activation=activation, kernel_regularizer=regularizer(0.001)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(64, activation=activation, kernel_regularizer=regularizer(0.001)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(182, activation='softmax')
    ])
    
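    if optimizer == 'adam':
        opt = Adam(learning_rate=learning_rate)
    elif optimizer == 'sgd':
        opt = SGD(learning_rate=learning_rate)
    elif optimizer == 'adamw':
        opt = AdamW(learning_rate=learning_rate)
    else:
        # Fail early instead of hitting an undefined `opt` below
        raise ValueError(f"Unknown optimizer: {optimizer}")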
    
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy', Precision(), Recall()])
    return model

# Define the hyperparameter grid
param_grid = {
    'activation': ['relu', 'tanh', 'leaky_relu'],
    'optimizer': ['adam', 'sgd', 'adamw'],
    'regularizer': ['l1', 'l2'],
    'dropout_rate': [0.2, 0.3, 0.4, 0.5],
    'learning_rate': [0.1, 0.01, 0.001, 0.0001]
}

# Custom Grid Search with Cross-Validation
def grid_search_cv(param_grid, X, y, n_splits=5):
    best_score = 0
    best_params = {}
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    
    for activation in param_grid['activation']:
        for optimizer in param_grid['optimizer']:
            for regularizer in param_grid['regularizer']:
                for dropout_rate in param_grid['dropout_rate']:
                    for learning_rate in param_grid['learning_rate']:
                        print(f"Testing: activation={activation}, optimizer={optimizer}, regularizer={regularizer}, dropout={dropout_rate}, lr={learning_rate}")
                        
                        scores = []
                        for train_index, val_index in cv.split(X, y_encoded):
                            X_train, X_val = X[train_index], X[val_index]
                            y_train, y_val = y_categorical[train_index], y_categorical[val_index]
                            
                            model = create_model(
                                activation=activation if activation != 'leaky_relu' else 'relu',
                                optimizer=optimizer,
                                regularizer=l1 if regularizer == 'l1' else l2,
                                dropout_rate=dropout_rate,
                                learning_rate=learning_rate
                            )
                            
                            if activation == 'leaky_relu':
                                model.layers[0].activation = LeakyReLU(alpha=0.01)
                                model.layers[3].activation = LeakyReLU(alpha=0.01)
                                model.layers[6].activation = LeakyReLU(alpha=0.01)
                            
                            reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.00001)
                            early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
                            
                            history = model.fit(
                                X_train, y_train,
                                epochs=300,
                                batch_size=32,
                                validation_data=(X_val, y_val),
                                callbacks=[reduce_lr, early_stopping],
                                class_weight=class_weight_dict,
                                verbose=0
                            )
                            
                            val_pred = model.predict(X_val)
                            val_pred_classes = np.argmax(val_pred, axis=1)
                            val_true_classes = np.argmax(y_val, axis=1)
                            score = accuracy_score(val_true_classes, val_pred_classes)
                            scores.append(score)
                        
                        avg_score = np.mean(scores)
                        print(f"Average score: {avg_score}")
                        
                        if avg_score > best_score:
                            best_score = avg_score
                            best_params = {
                                'activation': activation,
                                'optimizer': optimizer,
                                'regularizer': regularizer,
                                'dropout_rate': dropout_rate,
                                'learning_rate': learning_rate
                            }
    
    return best_params, best_score

# Perform Grid Search
best_params, best_score = grid_search_cv(param_grid, X, y_categorical)
print("Best hyperparameters:", best_params)
print("Best score:", best_score)

# Build a fresh model from the best hyperparameters
def build_best_model():
    model = create_model(
        activation=best_params['activation'] if best_params['activation'] != 'leaky_relu' else 'relu',
        optimizer=best_params['optimizer'],
        regularizer=l1 if best_params['regularizer'] == 'l1' else l2,
        dropout_rate=best_params['dropout_rate'],
        learning_rate=best_params['learning_rate']
    )
    if best_params['activation'] == 'leaky_relu':
        model.layers[0].activation = LeakyReLU(alpha=0.01)
        model.layers[3].activation = LeakyReLU(alpha=0.01)
        model.layers[6].activation = LeakyReLU(alpha=0.01)
    return model

# Perform final training and evaluation
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

histories = []
fold_accuracies = []

for fold, (train_index, val_index) in enumerate(skf.split(X, y_encoded), 1):
    print(f'Fold {fold}')
    
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y_categorical[train_index], y_categorical[val_index]
    
    # Re-create the model for every fold; reusing one model would carry weights
    # trained on earlier folds (which contain this fold's validation data) forward
    final_model = build_best_model()
    
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.00001)
    early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
    
    history = final_model.fit(
        X_train, y_train,
        epochs=300,
        batch_size=32,
        validation_data=(X_val, y_val),
        callbacks=[reduce_lr, early_stopping],
        class_weight=class_weight_dict,
        verbose=1
    )
    
    histories.append(history)
    fold_accuracies.append(max(history.history['val_accuracy']))

# Print average accuracy across folds
print(f"Average accuracy across {n_splits} folds: {np.mean(fold_accuracies):.4f}")

# Plot learning curves
plt.figure(figsize=(12, 4))
for i, history in enumerate(histories):
    plt.plot(history.history['val_accuracy'], label=f'Fold {i+1}')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

# Evaluate on the entire dataset (this includes the training samples, so the scores are optimistic)
y_pred = final_model.predict(X)
y_pred_classes = np.argmax(y_pred, axis=1)

# Print classification report
print(classification_report(y_encoded, y_pred_classes))

# Plot confusion matrix
cm = confusion_matrix(y_encoded, y_pred_classes)
plt.figure(figsize=(20, 20))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

In [None]:
# Feature importance analysis
feature_importance = np.abs(final_model.layers[0].get_weights()[0]).mean(axis=1)
feature_names = [f'Feature {i+1}' for i in range(9)]
importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})
importance_df = importance_df.sort_values('Importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importance_df)
plt.title('Feature Importance')
plt.show()
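
As a side note on the cost of the grid search in the first cell, the sketch below enumerates the same grid with `itertools.product` instead of nested loops and counts the training runs. It uses the grid values from that cell, and the variable names `combinations` and `n_fits` are illustrative.

```python
from itertools import product

param_grid = {
    'activation': ['relu', 'tanh', 'leaky_relu'],
    'optimizer': ['adam', 'sgd', 'adamw'],
    'regularizer': ['l1', 'l2'],
    'dropout_rate': [0.2, 0.3, 0.4, 0.5],
    'learning_rate': [0.1, 0.01, 0.001, 0.0001],
}
n_splits = 5

# Every combination becomes one dict of keyword arguments
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]

# Each combination is trained once per cross-validation fold
n_fits = len(combinations) * n_splits
print(len(combinations), n_fits)
```

With up to 300 epochs per fit, this count explains why an exhaustive search over this grid is slow. A flat loop over `combinations` would also replace the five nested `for` loops.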

### Adding improvements to the MLP model

In [None]:
# Import necessary libraries
import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation and analysis
from sklearn.model_selection import StratifiedKFold  # For stratified k-fold cross-validation
from sklearn.preprocessing import LabelEncoder  # For encoding categorical labels
from sklearn.utils.class_weight import compute_class_weight  # To handle class imbalance
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score  # For model evaluation
from tensorflow.keras.models import Sequential  # For creating the neural network model
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, LeakyReLU  # Neural network layers
from tensorflow.keras.optimizers import Adam, SGD, AdamW  # Optimization algorithms
from tensorflow.keras.utils import to_categorical  # For one-hot encoding of labels
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, LearningRateScheduler  # Training callbacks
from tensorflow.keras.regularizers import l1, l2, l1_l2  # Regularization techniques
from tensorflow.keras.metrics import Precision, Recall  # Additional evaluation metrics
import matplotlib.pyplot as plt  # For plotting
import seaborn as sns  # For enhanced visualizations

# Load and preprocess the data
data = pd.read_csv('final_dataset.csv')  # Load the dataset from a CSV file
X = data.iloc[:, :-1].values / 255.0  # Extract features and normalize to [0, 1] range
y = data.iloc[:, -1].values  # Extract the target variable (last column)

# Encode the target variable
label_encoder = LabelEncoder()  # Initialize the LabelEncoder
y_encoded = label_encoder.fit_transform(y)  # Encode categorical labels to integers
y_categorical = to_categorical(y_encoded)  # Convert integer labels to one-hot encoded format

# Compute class weights to handle class imbalance
class_weights = compute_class_weight('balanced', classes=np.unique(y_encoded), y=y_encoded)
class_weight_dict = dict(enumerate(class_weights))  # Create a dictionary of class weights

# Define the model creation function
# `regularizer` is one of the Keras factory functions l1, l2 or l1_l2 (not a string)
def create_model(num_layers, neurons_per_layer, activation='relu', optimizer='adam', regularizer=l2, dropout_rate=0.3, learning_rate=0.001):
    model = Sequential()  # Initialize a sequential model
    
    # Add layers to the model based on the specified parameters
    for i in range(num_layers):
        # l1_l2 takes two strengths, while l1 and l2 take a single one
        kernel_reg = l1_l2(l1=0.01, l2=0.01) if regularizer is l1_l2 else regularizer(0.01)
        # Only the first layer declares the input shape (9 features)
        layer_kwargs = {'input_shape': (9,)} if i == 0 else {}
        model.add(Dense(neurons_per_layer[i], activation=activation,
                        kernel_regularizer=kernel_reg, **layer_kwargs))
        model.add(BatchNormalization())  # Add batch normalization layer
        model.add(Dropout(dropout_rate))  # Add dropout layer for regularization
    
    model.add(Dense(182, activation='softmax'))  # Add output layer with softmax activation
    
    # Configure the optimizer based on the specified parameter
    if optimizer == 'adam':
        opt = Adam(learning_rate=learning_rate)
    elif optimizer == 'sgd':
        opt = SGD(learning_rate=learning_rate)
    elif optimizer == 'adamw':
        opt = AdamW(learning_rate=learning_rate)
    else:
        # Fail early instead of hitting an undefined `opt` below
        raise ValueError(f"Unknown optimizer: {optimizer}")
    
    # Compile the model with specified loss function and metrics
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy', Precision(), Recall()])
    return model

# Define the hyperparameter grid for search
param_grid = {
    'num_layers': [2, 3, 4],  # Number of hidden layers to try
    'neurons_per_layer': [(256, 128), (256, 128, 64), (512, 256, 128, 64)],  # Neurons in each layer
    'activation': ['relu', 'tanh', 'leaky_relu'],  # Activation functions to try
    'optimizer': ['adam', 'sgd', 'adamw'],  # Optimization algorithms to try
    'regularizer': ['l1', 'l2', 'l1_l2'],  # Regularization techniques to try
    'dropout_rate': [0.2, 0.3, 0.4, 0.5],  # Dropout rates to try
    'learning_rate': [0.1, 0.01, 0.001, 0.0001]  # Learning rates to try
}

# Implement a learning rate schedule function
def lr_schedule(epoch, initial_lr):
    drop = 0.5  # Factor by which the learning rate will be reduced
    epochs_drop = 10.0  # Number of epochs after which learning rate is reduced
    lr = initial_lr * (drop ** np.floor((1 + epoch) / epochs_drop))  # Calculate new learning rate
    return lr

# Custom Grid Search with Cross-Validation function
def grid_search_cv(param_grid, X, y, n_splits=10):
    best_score = 0
    best_params = {}
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)  # Initialize stratified k-fold cross-validation
    
    # Iterate through all combinations of hyperparameters
    for num_layers in param_grid['num_layers']:
        for neurons_per_layer in param_grid['neurons_per_layer']:
            if len(neurons_per_layer) != num_layers:
                continue
            for activation in param_grid['activation']:
                for optimizer in param_grid['optimizer']:
                    for regularizer in param_grid['regularizer']:
                        for dropout_rate in param_grid['dropout_rate']:
                            for learning_rate in param_grid['learning_rate']:
                                print(f"Testing: layers={num_layers}, neurons={neurons_per_layer}, "
                                      f"activation={activation}, optimizer={optimizer}, "
                                      f"regularizer={regularizer}, dropout={dropout_rate}, "
                                      f"lr={learning_rate}")
                                
                                scores = []
                                # Perform k-fold cross-validation
                                for fold, (train_index, val_index) in enumerate(cv.split(X, y_encoded), 1):
                                    X_train, X_val = X[train_index], X[val_index]
                                    y_train, y_val = y_categorical[train_index], y_categorical[val_index]
                                    
                                    # Create and configure the model
                                    model = create_model(
                                        num_layers=num_layers,
                                        neurons_per_layer=neurons_per_layer,
                                        activation=activation if activation != 'leaky_relu' else 'relu',
                                        optimizer=optimizer,
                                        regularizer=l1 if regularizer == 'l1' else (l2 if regularizer == 'l2' else l1_l2),
                                        dropout_rate=dropout_rate,
                                        learning_rate=learning_rate
                                    )
                                    
                                    # Apply LeakyReLU to the hidden Dense layers only;
                                    # the last layer must keep its softmax activation
                                    if activation == 'leaky_relu':
                                        for layer in model.layers[:-1]:
                                            if isinstance(layer, Dense):
                                                layer.activation = LeakyReLU(alpha=0.01)
                                    
                                    # Define callbacks for training
                                    # Note: LearningRateScheduler sets the learning rate at the start of every
                                    # epoch, so it overrides any reduction made by ReduceLROnPlateau
                                    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.00001)
                                    early_stopping = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
                                    lr_scheduler = LearningRateScheduler(lambda epoch: lr_schedule(epoch, learning_rate))
                                    
                                    # Train the model
                                    history = model.fit(
                                        X_train, y_train,
                                        epochs=500,
                                        batch_size=32,
                                        validation_data=(X_val, y_val),
                                        callbacks=[reduce_lr, early_stopping, lr_scheduler],
                                        class_weight=class_weight_dict,
                                        verbose=0
                                    )
                                    
                                    # Monitor for overfitting
                                    train_loss = history.history['loss'][-1]
                                    val_loss = history.history['val_loss'][-1]
                                    if train_loss < 0.3 * val_loss:
                                        print(f"Potential overfitting detected in fold {fold}")
                                    
                                    # Evaluate the model
                                    val_pred = model.predict(X_val)
                                    val_pred_classes = np.argmax(val_pred, axis=1)
                                    val_true_classes = np.argmax(y_val, axis=1)
                                    score = accuracy_score(val_true_classes, val_pred_classes)
                                    scores.append(score)
                                    
                                    print(f"Fold {fold} score: {score:.4f}")
                                
                                # Calculate average score across folds
                                avg_score = np.mean(scores)
                                std_score = np.std(scores)
                                print(f"Average score: {avg_score:.4f} (+/- {std_score:.4f})")
                                
                                # Update best parameters if current configuration is better
                                if avg_score > best_score:
                                    best_score = avg_score
                                    best_params = {
                                        'num_layers': num_layers,
                                        'neurons_per_layer': neurons_per_layer,
                                        'activation': activation,
                                        'optimizer': optimizer,
                                        'regularizer': regularizer,
                                        'dropout_rate': dropout_rate,
                                        'learning_rate': learning_rate
                                    }
                                    print("New best configuration found!")
                                
                                # Plot learning curves (from the last fold) when this configuration is the best so far
                                if avg_score == best_score:
                                    plt.figure(figsize=(12, 4))
                                    plt.subplot(1, 2, 1)
                                    plt.plot(history.history['loss'], label='Training Loss')
                                    plt.plot(history.history['val_loss'], label='Validation Loss')
                                    plt.title('Model Loss')
                                    plt.xlabel('Epoch')
                                    plt.ylabel('Loss')
                                    plt.legend()
                                    
                                    plt.subplot(1, 2, 2)
                                    plt.plot(history.history['accuracy'], label='Training Accuracy')
                                    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
                                    plt.title('Model Accuracy')
                                    plt.xlabel('Epoch')
                                    plt.ylabel('Accuracy')
                                    plt.legend()
                                    
                                    plt.tight_layout()
                                    plt.show()
    
    return best_params, best_score

# Perform Grid Search
best_params, best_score = grid_search_cv(param_grid, X, y_categorical)
print("Best hyperparameters:", best_params)
print("Best score:", best_score)

# Build a fresh model from the best hyperparameters
def build_best_model():
    model = create_model(
        num_layers=best_params['num_layers'],
        neurons_per_layer=best_params['neurons_per_layer'],
        activation=best_params['activation'] if best_params['activation'] != 'leaky_relu' else 'relu',
        optimizer=best_params['optimizer'],
        regularizer=l1 if best_params['regularizer'] == 'l1' else (l2 if best_params['regularizer'] == 'l2' else l1_l2),
        dropout_rate=best_params['dropout_rate'],
        learning_rate=best_params['learning_rate']
    )
    # Apply LeakyReLU to the hidden Dense layers only; the output layer keeps softmax
    if best_params['activation'] == 'leaky_relu':
        for layer in model.layers[:-1]:
            if isinstance(layer, Dense):
                layer.activation = LeakyReLU(alpha=0.01)
    return model

# Perform final training and evaluation
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

histories = []
fold_accuracies = []

# Train and evaluate the model using k-fold cross-validation
for fold, (train_index, val_index) in enumerate(skf.split(X, y_encoded), 1):
    print(f'Fold {fold}')
    
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y_categorical[train_index], y_categorical[val_index]
    
    # Re-create the model for every fold; reusing one model would carry weights
    # trained on earlier folds (which contain this fold's validation data) forward
    final_model = build_best_model()
    
    # Define callbacks for training
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.00001)
    early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
    lr_scheduler = LearningRateScheduler(lambda epoch: lr_schedule(epoch, best_params['learning_rate']))
    
    # Train the model
    history = final_model.fit(
        X_train, y_train,
        epochs=300,
        batch_size=32,
        validation_data=(X_val, y_val),
        callbacks=[reduce_lr, early_stopping, lr_scheduler],
        class_weight=class_weight_dict,
        verbose=1
    )
    
    histories.append(history)
    fold_accuracies.append(max(history.history['val_accuracy']))

# Print average accuracy across folds
print(f"Average accuracy across {n_splits} folds: {np.mean(fold_accuracies):.4f}")

# Evaluate on the entire dataset (this includes the training samples, so the scores are optimistic)
y_pred = final_model.predict(X)
y_pred_classes = np.argmax(y_pred, axis=1)

In [None]:
# Print classification report
print(classification_report(y_encoded, y_pred_classes))

# Plot confusion matrix
cm = confusion_matrix(y_encoded, y_pred_classes)
plt.figure(figsize=(20, 20))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
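
To see what the step-decay schedule in the cell above produces, the following sketch evaluates the same formula for a few epochs. It uses `math.floor` instead of NumPy, and the starting rate of 0.001 is just one of the grid values.

```python
import math

def lr_schedule(epoch, initial_lr, drop=0.5, epochs_drop=10):
    # Halve the learning rate every `epochs_drop` epochs (epoch index starts at 0)
    return initial_lr * drop ** math.floor((1 + epoch) / epochs_drop)

rates = {epoch: lr_schedule(epoch, 0.001) for epoch in (0, 8, 9, 19, 29)}
for epoch, rate in rates.items():
    print(epoch, rate)
```

Because of the `1 + epoch` term, the first halving happens at epoch index 9, that is, during the tenth epoch.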

### Use nested cross-validation with GridSearchCV for hyperparameter tuning, in order to check which configuration is better

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
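# Note: this wrapper module has been removed from recent TensorFlow releases;
# the separate scikeras package provides a replacement KerasClassifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier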
from tensorflow.keras.regularizers import l2
import matplotlib.pyplot as plt
import seaborn as sns

# Load and preprocess the data
data = pd.read_csv('final_dataset.csv')
X = data.iloc[:, :-1].values / 255.0
y = data.iloc[:, -1].values

# Encode the target variable
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
y_categorical = to_categorical(y_encoded)

# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(y_encoded), y=y_encoded)
class_weight_dict = dict(enumerate(class_weights))

# Define the model creation function
def create_model(learning_rate=0.001, dropout_rate=0.3, l2_reg=0.001):
    model = Sequential([
        Dense(256, activation='relu', input_shape=(X.shape[1],), kernel_regularizer=l2(l2_reg)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(128, activation='relu', kernel_regularizer=l2(l2_reg)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(64, activation='relu', kernel_regularizer=l2(l2_reg)),
        BatchNormalization(),
        Dropout(dropout_rate),
        Dense(y_categorical.shape[1], activation='softmax')
    ])
    
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Wrapper for sklearn's GridSearchCV
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=32, verbose=0)

# Define the parameter grid
param_grid = {
    'learning_rate': [0.1, 0.01, 0.001],
    'dropout_rate': [0.3, 0.4, 0.5],
    'l2_reg': [0.01, 0.001, 0.0001]
}

# Outer cross-validation
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Inner cross-validation
inner_cv = KFold(n_splits=3, shuffle=True, random_state=42)

# GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=inner_cv, n_jobs=-1, verbose=1)

# Nested cross-validation
cv_scores = []

for fold, (train_index, test_index) in enumerate(outer_cv.split(X, y_encoded), 1):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y_categorical[train_index], y_categorical[test_index]
    
    # Perform GridSearchCV on the training data
    grid_result = grid_search.fit(X_train, y_train, class_weight=class_weight_dict)
    
    # Get the best model
    best_model = grid_result.best_estimator_
    
    # Evaluate on the test set
    score = best_model.score(X_test, y_test)
    cv_scores.append(score)
    
    print(f"Fold {fold} - Best parameters: {grid_result.best_params_}, Score: {score:.4f}")

# Print the cross-validation results
print("\nCross-validation scores:", cv_scores)
print(f"Mean CV score: {np.mean(cv_scores):.4f} (+/- {np.std(cv_scores) * 2:.4f})")

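# Train the final model using the best parameters
# Note: grid_search holds the result of the last outer fold only, so these are
# the parameters selected on that fold's training data
best_params = grid_search.best_params_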
final_model = create_model(learning_rate=best_params['learning_rate'],
                           dropout_rate=best_params['dropout_rate'],
                           l2_reg=best_params['l2_reg'])

# Fit the final model on the entire dataset
history = final_model.fit(X, y_categorical, epochs=100, batch_size=32, 
                          validation_split=0.2, class_weight=class_weight_dict, verbose=1)

# Plot learning curves
plt.figure(figsize=(12, 4))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()

# Generate predictions on the entire dataset
y_pred = final_model.predict(X)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true_classes = y_encoded

# Print classification report
print("\nClassification Report:")
print(classification_report(y_true_classes, y_pred_classes))

# Plot confusion matrix
plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_true_classes, y_pred_classes)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
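
The structure of nested cross-validation can be sketched with index arithmetic alone. This is an illustrative NumPy stand-in for the two `KFold` objects above (5 outer folds, 3 inner folds), and the helper `kfold_indices` and the sample count are hypothetical.

```python
import numpy as np

def kfold_indices(n, k, rng):
    # Shuffle the indices 0..n-1 and cut them into k roughly equal folds
    return np.array_split(rng.permutation(n), k)

rng = np.random.default_rng(42)
n_samples = 30  # hypothetical dataset size
outer_folds = kfold_indices(n_samples, 5, rng)

leaks = 0
for i, test_idx in enumerate(outer_folds):
    # Outer loop: hold out one fold for the final, unbiased score
    train_idx = np.concatenate([f for j, f in enumerate(outer_folds) if j != i])
    # Inner loop: hyperparameters are tuned on splits of the outer training part only
    inner_folds = [train_idx[part] for part in kfold_indices(len(train_idx), 3, rng)]
    for inner_val in inner_folds:
        leaks += np.intersect1d(inner_val, test_idx).size

print("outer test samples seen during tuning:", leaks)
```

The point of the nesting is that the outer test fold never takes part in hyperparameter selection, so the outer scores estimate how the whole tuning procedure generalizes.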