# Multilayer Perceptum (MLP) <a name="multilayer"></a>


We are going to use **Multilayer Perceptron (MLP)** because it is a flexible neural network architecture. MLPs are great for solving **classification problems**

For this model we will define the model architecture and the training strategy consisting in:
- **Number of layers**
- **Number of neurons of each layer**
- **Choice of the activation functions**
- **Optimizer** 
- **Learning hyperparameters** (e.g., learning rate, mini-batch size, number of epochs, etc.)
- **Regularization techniques to adopt** (e.g., early stopping, weight regularization, dropout)

The network works by processing data through **multiple layers**, with each layer learning to capture different features of the input data.

### Model architecture definition and Training Strategy <a name="archi-train"></a>
[[go back to the top]](#multilayer)

For the architecture of our MLP model we need, as mentioned above, the number of layers, neurons, and choose the activation functions such as relu, softmax and Tanh for example.

We used **dictionaries** to organize and store different options for **hyperparameters**. This allows us to easily experiment with different configurations and manage the settings efficiently.

To optimize our model, we decided to do a grid search to **update and select the best hyperparameter combination in the first iteration**, for one epoch only in order to be less complex. This means that in the beginning, we test several combinations of hyperparameters to find the one that performs best. By doing this, we can quickly narrow down the best model for our task, improving the **accuracy** of the predictions.

Additionally, we will use the **ADAM** optimizer, which is a popular choice for training neural networks due to its adaptive learning rate and efficient performance.
We also implemented **early stopping** to prevent overfitting by monitoring the model's performance and halting training when it stops improving.

In this way, the process of **testing and updating** in the first iteration helps us fine-tune the model efficiently, and **selecting the best combination** ensures we are using the most effective settings for our dataset.

The following table defines the possible combinations of hyperparameters we tested:

| <span style="color: #C70039;">**Hyperparameter**</span> | <span style="color: #C70039;">**Options**</span>        |
|-----------------------------------------------------|-------------------------------------------------------|
| <span style="color: #00bfae;">**Hidden Units**</span> | [128, 64, 32], [256, 128, 64], [256, 128, 64, 32]                |
| <span style="color: #00bfae;">**Activation Functions**</span> | reLU, sigmoid, tanh                             |
| <span style="color: #00bfae;">**Dropout Rate**</span> | 0.3, 0.5                                          |
| <span style="color: #00bfae;">**Batch Size**</span>   | 32                                               |
| <span style="color: #00bfae;">**Epochs**</span>       | 1                                                 |
| <span style="color: #00bfae;">**Regularizations**</span>       | None, L1 (Lasso), L2 (Ridge)                                            |
| <span style="color: #00bfae;">**Learning Rate**</span> | 0.001, 0.0001                                     |


<span style="color: #C70039;">**Note:**</span>
- <span style="color: #00bfae;">**Hidden Units**</span> consists in the number of layers and the number of each neurons of each layer, for example in this case [256, 128, 64], it defines 3 layers with 256, 128 and 64 neurons, respectively.


##### Imports

In [26]:
import tensorflow as tf
import numpy as np
import pandas as pd
import itertools
from pathos.multiprocessing import Pool
import random 
import os

##### MLP Implementation

In [27]:
class MLP(tf.keras.Model):
    def __init__(self, input_dim, output_dim, hidden_units, dropout_rate, activations, regularization_type=None, regularization_value=0.01):
        super(MLP, self).__init__()
        self.hidden_layers = []
        self.regularization_type = regularization_type
        self.regularization_value = regularization_value

        # Construção das camadas ocultas
        for units, activation in zip(hidden_units, activations):
            self.hidden_layers.append(
                tf.keras.layers.Dense(units, activation=activation)
            )
            self.hidden_layers.append(tf.keras.layers.Dropout(dropout_rate))
        
        self.output_layer = tf.keras.layers.Dense(output_dim, activation='softmax')  # Classificação multi-classe


    def call(self, inputs):
        x = inputs
        for layer in self.hidden_layers:
            x = layer(x)
        return self.output_layer(x)
    
    def compute_regularization_loss(self):
        regularization_loss = 0.0
        if self.regularization_type:
            for layer in self.hidden_layers:
                if isinstance(layer, tf.keras.layers.Dense):
                    weights = layer.kernel
                    if self.regularization_type == 'l1':
                        regularization_loss += tf.reduce_sum(tf.abs(weights)) * self.regularization_value
                    elif self.regularization_type == 'l2':
                        regularization_loss += tf.reduce_sum(tf.square(weights)) * self.regularization_value
        return regularization_loss

In [28]:
def generate_configs(configurations):
    keys, values = zip(*configurations.items())
    return [dict(zip(keys, v)) for v in itertools.product(*values)]

# Função para carregar os dados de um fold específico
def load_fold_data(fold_number, files):
    data = pd.read_csv(files[fold_number])
    labels = data.pop('Label').values
    features = data.values
    return features, labels

In [None]:
# Treinar e avaliar o modelo
def train_evaluate_model(config, X_train, y_train, X_val, y_val):
    model = MLP(
        input_dim=X_train.shape[1],
        output_dim=10,
        hidden_units=config['hidden_units'],
        dropout_rate=config['dropout_rate'],
        activations=config['activations'],
        regularization_type=config.get('regularization_type', None),
        regularization_value=config.get('regularization_value', 0.01)
    )
    
    # Função de perda com regularização
    def loss_with_regularization(y_true, y_pred):
        base_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
        regularization_loss = model.compute_regularization_loss()
        return base_loss + regularization_loss

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=config['learning_rate']),
        loss=loss_with_regularization,
        metrics=['accuracy']
    )
    
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    )
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        batch_size=config['batch_size'],
        epochs=config['epochs'],
        callbacks=[early_stopping],
        verbose=0
    )

    return max(history.history['val_accuracy'])


## Cross-validation, apenas uma iteração (1 fold)
def cross_validate_model(config, files):
    # Apenas o primeiro fold para validação
    fold_number = 0
    X_val, y_val = load_fold_data(fold_number, files)
    X_train, y_train = [], []
    
    # Treino com os outros folds
    for i in range(len(files)):
        if i != fold_number:
            X_temp, y_temp = load_fold_data(i, files)
            X_train.append(X_temp)
            y_train.append(y_temp)
    
    X_train = np.concatenate(X_train, axis=0)
    y_train = np.concatenate(y_train, axis=0)
    
    # Treinar e avaliar para este fold
    accuracy = train_evaluate_model(config, X_train, y_train, X_val, y_val)
    return accuracy  # Retorna a acurácia deste único fold

# Função para avaliação em paralelo
def evaluate_config_parallel(args):
    config, files = args
    accuracy = cross_validate_model(config, files)
    
    # Salvar os resultados no arquivo
    with open("results_log.txt", "a") as f:
        f.write(f"Configuration: {config} | Accuracy: {accuracy}\n")
    
    return config, accuracy

def generate_activation_combinations(hidden_units_list, activations_list = ['relu', 'relu', 'relu'], num_combinations=1):
    activation_combinations = []
    for hidden_units in hidden_units_list:
        layers = len(hidden_units)
        # Gerar combinações aleatórias de ativações para este número de camadas
        for _ in range(num_combinations):
            random_combination = [random.choice(activations_list) for _ in range(layers)]
            activation_combinations.append((hidden_units, random_combination))  # Retorna como tupla
    return activation_combinations


hidden_units_list = [[128, 64, 32], [256, 128, 64]]

# Gerar combinações de hidden_units e activations
activation_combinations = generate_activation_combinations(hidden_units_list)

# Separar hidden_units e activations em listas distintas
hidden_units = [combo[0] for combo in activation_combinations]
activations = [combo[1] for combo in activation_combinations]

# Definições de hiperparâmetros corrigidas
configurations = {
    "hidden_units": hidden_units,  # Somente os hidden_units
    "activations": activations,    # Somente as ativações correspondentes
    "dropout_rate": [0.2, 0.3, 0.4],
    "batch_size": [32,64],
    "epochs": [20,50],
    "learning_rate": [0.001, 0.0001],
    "regularization_type": [None, 'l1', 'l2'],
    "regularization_value": [0.01, 0.001],
}

files = [f'datasets/urbansounds_features_fold{i}.csv' for i in range(1,11)] 

all_configs = generate_configs(configurations)
 
all_configs = random.sample(all_configs, k=300)  # Exemplo: selecionar 100 aleatoriamente


if __name__ == '__main__':
    
    if os.path.exists("results_log.txt"):
        os.remove("results_log.txt")
    if os.path.exists("erros.txt"):
        os.remove("erros.txt")
    
    num_workers = 8
    with Pool(num_workers) as pool:
        results = pool.map(evaluate_config_parallel, [(config, files) for config in all_configs])

    # Encontrar a melhor configuração
    best_config, best_accuracy = max(results, key=lambda x: x[1])
    print(f"Best configuration: {best_config}, Best accuracy: {best_accuracy}")

Best configuration: {'hidden_units': [256, 128, 64], 'activations': ['relu', 'relu', 'relu'], 'dropout_rate': 0.3, 'batch_size': 64, 'epochs': 20, 'learning_rate': 0.0001, 'regularization_type': None, 'regularization_value': 0.01}, Best accuracy: 0.7216494679450989


Based on the optimal hyperparameter configuration identified: **'hidden_units':** [256, 128, 64], **'activations':** ['relu', 'relu', 'relu'], **'dropout_rate':** 0.3, **'batch_size':** 64, **'epochs':** 20, **'learning_rate':** 0.0001, **'regularization_type':** None, **'regularization_value':** 0.01** with a **best accuracy** of 0.7216, we will now implement data augmentation techniques to evaluate and enhance model robustness.

In [None]:
import pandas as pd
import numpy as np
import os

#Pitching
# Funções de data augmentation
def add_noise(features, noise_level=0.01):
    noise = np.random.normal(0, noise_level, features.shape)
    return features + noise

def scale_features(features, scale_range=(0.9, 1.1)):
    scale_factor = np.random.uniform(scale_range[0], scale_range[1], features.shape)
    return features * scale_factor

def augment_data(features, labels, augmentation_count=2):
    augmented_features = []
    augmented_labels = []
    for _ in range(augmentation_count):
        augmented_features.append(add_noise(features))
        augmented_features.append(scale_features(features))
        augmented_labels.extend(labels)
        augmented_labels.extend(labels)
    return np.vstack(augmented_features), np.array(augmented_labels)

# Processar os arquivos CSVs
def augment_csv_files(files, output_dir="augmented_datasets"):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    
    for file in files:
        data = pd.read_csv(file)
        labels = data.pop('Label').values
        features = data.values
        
        # Realizar data augmentation
        aug_features, aug_labels = augment_data(features, labels, augmentation_count=2)
        
        # Combinar os dados originais com os aumentados
        combined_features = np.vstack((features, aug_features))
        combined_labels = np.hstack((labels, aug_labels))
        
        # Salvar o novo CSV
        combined_data = pd.DataFrame(combined_features, columns=data.columns)
        combined_data['Label'] = combined_labels
        output_file = os.path.join(output_dir, os.path.basename(file))
        combined_data.to_csv(output_file, index=False)
        print(f"Augmented data saved to {output_file}")

# Aplicar a função nos arquivos originais
files = [f'datasets/urbansounds_features_fold{i}.csv' for i in range(1, 11)]
augment_csv_files(files)


Augmented data saved to augmented_datasets/urbansounds_features_fold1.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold2.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold3.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold4.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold5.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold6.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold7.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold8.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold9.csv
Augmented data saved to augmented_datasets/urbansounds_features_fold10.csv


In [14]:
import numpy as np

# Função para executar validação cruzada com a melhor configuração
def cross_validate_best_config(config, files):
    accuracies = []
    
    # Para cada fold, use-o como validação e os demais como treino
    for fold_number in range(len(files)):
        # Carregar o fold atual como validação
        X_val, y_val = load_fold_data(fold_number, files)
        X_train, y_train = [], []
        
        # Combinar os outros folds como treino
        for i in range(len(files)):
            if i != fold_number:
                X_temp, y_temp = load_fold_data(i, files)
                X_train.append(X_temp)
                y_train.append(y_temp)
        
        X_train = np.concatenate(X_train, axis=0)
        y_train = np.concatenate(y_train, axis=0)
        
        # Treinar e avaliar para este fold
        accuracy = train_evaluate_model(config, X_train, y_train, X_val, y_val)
        accuracies.append(accuracy)
        print(f"Fold {fold_number + 1}/{len(files)}: Accuracy = {accuracy:.4f}")
    
    # Calcular média e desvio-padrão
    mean_accuracy = np.mean(accuracies)
    std_accuracy = np.std(accuracies)
    print(f"\nCross-Validation Results: Mean Accuracy = {mean_accuracy:.4f}, Std Dev = {std_accuracy:.4f}")
    return mean_accuracy, std_accuracy

# Melhor configuração encontrada
best_config = {
    "hidden_units": [256, 128, 64],
    "activations": ['relu', 'relu', 'relu'],
    "dropout_rate": 0.3,
    "batch_size": 64,
    "epochs": 20,
    "learning_rate": 0.0001,
    "regularization_type": None,
    "regularization_value": 0.01
}

# Lista de arquivos dos folds (os dados aumentados também podem ser usados aqui)
files = [f'augmented_datasets/urbansounds_features_fold{i}.csv' for i in range(1, 11)]

if __name__ == '__main__':
    # Rodar a validação cruzada com a melhor configuração
    mean_accuracy, std_accuracy = cross_validate_best_config(best_config, files)


Fold 1/10: Accuracy = 0.6706
Fold 2/10: Accuracy = 0.6036
Fold 3/10: Accuracy = 0.6045
Fold 4/10: Accuracy = 0.6097
Fold 5/10: Accuracy = 0.6799
Fold 6/10: Accuracy = 0.5548
Fold 7/10: Accuracy = 0.5804
Fold 8/10: Accuracy = 0.6476
Fold 9/10: Accuracy = 0.6172
Fold 10/10: Accuracy = 0.6562

Cross-Validation Results: Mean Accuracy = 0.6224, Std Dev = 0.0382


In [25]:
import itertools
import numpy as np

def load_augmented_data(fold_number, augmented_files):
    # Função para carregar dados aumentados
    return load_fold_data(fold_number, augmented_files)

def cross_validate_with_test(files, augmented_files, best_config):
    """
    Executa a validação cruzada 10-fold com o seguinte esquema:
    - 1 fold para teste (dados originais).
    - 1 fold para validação (dados originais).
    - 8 folds para treinamento (dados aumentados).
    """
    # Combinações de folds: [(test_fold, val_fold)]
    folds = list(range(1,len(files)))
    combinations = [(test, val) for test, val in itertools.permutations(folds, 2)]
    
    test_accuracies = []

    for test_fold, val_fold in combinations:
        # Carregar dados de teste (originais)
        X_test, y_test = load_fold_data(test_fold, files)

        # Carregar dados de validação (originais)
        X_val, y_val = load_fold_data(val_fold, files)

        # Carregar dados de treinamento (aumentados)
        train_folds = [i for i in folds if i != test_fold and i != val_fold]
        X_train, y_train = [], []
        for fold in train_folds:
            X_temp, y_temp = load_augmented_data(fold, augmented_files)
            X_train.append(X_temp)
            y_train.append(y_temp)
        
        X_train = np.concatenate(X_train, axis=0)
        y_train = np.concatenate(y_train, axis=0)

        # Treinar e avaliar o modelo
        accuracy = train_evaluate_model(best_config, X_train, y_train, X_val, y_val)

        # Avaliar no conjunto de teste
        test_loss, test_accuracy = evaluate_on_test(best_config, X_train, y_train, X_test, y_test)
        test_accuracies.append(test_accuracy)

        print(f"Test Fold: {test_fold}, Validation Fold: {val_fold}, Test Accuracy: {test_accuracy}")

    # Cálculo da média e desvio padrão da acurácia no teste
    mean_accuracy = np.mean(test_accuracies)
    std_accuracy = np.std(test_accuracies)

    print(f"Mean Test Accuracy: {mean_accuracy}, Std Dev: {std_accuracy}")
    return mean_accuracy, std_accuracy

def evaluate_on_test(config, X_train, y_train, X_test, y_test):
    """
    Treina o modelo com o conjunto completo de treinamento e avalia no conjunto de teste.
    """
    model = MLP(
        input_dim=X_train.shape[1],
        output_dim=10,
        hidden_units=config['hidden_units'],
        dropout_rate=config['dropout_rate'],
        activations=config['activations'],
        regularization_type=config.get('regularization_type', None),
        regularization_value=config.get('regularization_value', 0.01)
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=config['learning_rate']),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=['accuracy']
    )

    model.fit(X_train, y_train, batch_size=config['batch_size'], epochs=config['epochs'], verbose=0)
    

    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    return test_loss, test_accuracy

if __name__ == '__main__':
    files = [f'datasets/urbansounds_features_fold{i}.csv' for i in range(1, 11)]
    augmented_files = [f'augmented_datasets/urbansounds_features_fold{i}.csv' for i in range(1, 11)]
    
    mean_accuracy, std_accuracy = cross_validate_with_test(files, augmented_files, best_config)
    print(f"Final Mean Accuracy: {mean_accuracy}, Final Std Dev: {std_accuracy}")


Test Fold: 1, Validation Fold: 2, Test Accuracy: 0.5990990996360779
Test Fold: 1, Validation Fold: 3, Test Accuracy: 0.5495495200157166
Test Fold: 1, Validation Fold: 4, Test Accuracy: 0.537162184715271
Test Fold: 1, Validation Fold: 5, Test Accuracy: 0.6036036014556885
Test Fold: 1, Validation Fold: 6, Test Accuracy: 0.5213963985443115
Test Fold: 1, Validation Fold: 7, Test Accuracy: 0.5945945978164673
Test Fold: 1, Validation Fold: 8, Test Accuracy: 0.5765765905380249
Test Fold: 1, Validation Fold: 9, Test Accuracy: 0.5990990996360779
Test Fold: 2, Validation Fold: 1, Test Accuracy: 0.5448648929595947
Test Fold: 2, Validation Fold: 3, Test Accuracy: 0.5372973084449768
Test Fold: 2, Validation Fold: 4, Test Accuracy: 0.5751351118087769
Test Fold: 2, Validation Fold: 5, Test Accuracy: 0.5481081008911133
Test Fold: 2, Validation Fold: 6, Test Accuracy: 0.5243242979049683
Test Fold: 2, Validation Fold: 7, Test Accuracy: 0.4972972869873047
Test Fold: 2, Validation Fold: 8, Test Accuracy: 

2024-11-29 17:11:10.031142: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: INVALID_ARGUMENT: Incompatible shapes: [64] vs. [0]
	 [[{{function_node __inference_one_step_on_data_7025009}}{{node adam/truediv_7}}]]


InvalidArgumentError: Graph execution error:

Detected at node adam/truediv_7 defined at (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel_launcher.py", line 17, in <module>

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/traitlets/config/application.py", line 1043, in launch_instance

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/kernelapp.py", line 725, in start

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/tornado/platform/asyncio.py", line 215, in start

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 596, in run_forever

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once

  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/events.py", line 80, in _run

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/kernelbase.py", line 513, in dispatch_queue

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/kernelbase.py", line 502, in process_one

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/kernelbase.py", line 409, in dispatch_shell

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/kernelbase.py", line 729, in execute_request

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/ipkernel.py", line 422, in do_execute

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/ipykernel/zmqshell.py", line 540, in run_cell

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 2961, in run_cell

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3016, in _run_cell

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3221, in run_cell_async

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3400, in run_ast_nodes

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/IPython/core/interactiveshell.py", line 3460, in run_code

  File "/var/folders/qx/kgqzhwb50b7flqgr05m062t40000gn/T/ipykernel_57880/3902091085.py", line 85, in <module>

  File "/var/folders/qx/kgqzhwb50b7flqgr05m062t40000gn/T/ipykernel_57880/3902091085.py", line 43, in cross_validate_with_test

  File "/var/folders/qx/kgqzhwb50b7flqgr05m062t40000gn/T/ipykernel_57880/3902091085.py", line 75, in evaluate_on_test

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/trainer.py", line 320, in fit

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/trainer.py", line 121, in one_step_on_iterator

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/trainer.py", line 108, in one_step_on_data

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/trainer.py", line 73, in train_step

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/optimizers/base_optimizer.py", line 344, in apply_gradients

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/optimizers/base_optimizer.py", line 409, in apply

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/optimizers/base_optimizer.py", line 472, in _backend_apply_gradients

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/optimizer.py", line 122, in _backend_update_step

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/optimizer.py", line 136, in _distributed_tf_update_step

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/optimizer.py", line 133, in apply_grad_to_update_var

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/optimizers/adam.py", line 147, in update_step

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/ops/numpy.py", line 5876, in divide

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/sparse.py", line 780, in sparse_wrapper

  File "/Users/franciscamihalache/Library/Python/3.9/lib/python/site-packages/keras/src/backend/tensorflow/numpy.py", line 2316, in divide

Incompatible shapes: [64] vs. [0]
	 [[{{node adam/truediv_7}}]] [Op:__inference_one_step_on_iterator_7025062]