# Fragment Ion Intensity Prediction

This notebook is designed to run in Google [Colaboratory](https://colab.research.google.com/). To train the model faster, change the Colab runtime to use a hardware accelerator (GPU or TPU).

### Task 1: Learning Rate
The learning rate is one of the most influential hyperparameters in neural network training: it sets the size of each parameter update and therefore how quickly, and whether, the loss converges. Your objective is to find a learning rate that works well for our dataset by experimenting with several values. Observe how changing the learning rate affects the loss dynamics and the other metrics.
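
To build intuition before training the real model, here is a minimal sketch of plain gradient descent on the toy loss f(w) = w². It is not the Prosit model or the Adam optimizer used below. The function name `run_gradient_descent` and the three learning rates are illustrative assumptions, chosen to show slow convergence, fast convergence, and divergence.

```python
def run_gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize the toy loss f(w) = w**2 and return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - learning_rate * grad
        losses.append(w ** 2)
    return losses


# Hypothetical learning rates: too small, reasonable, and too large for this toy loss.
final_losses = {lr: run_gradient_descent(lr)[-1] for lr in (0.01, 0.1, 1.1)}
for lr, loss in final_losses.items():
    print(f"learning_rate={lr}: final loss = {loss:.4g}")
```

With the smallest rate the loss decreases only slowly over 20 steps, with the middle rate it approaches zero, and with the largest rate each update overshoots the minimum and the loss grows. These are the same three regimes to look for in the loss curves of the runs in this task.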

In [None]:
# install the dlomix package in the current environment using pip

!python -m pip install -q dlomix==0.0.4

In [None]:
!python -m pip install -q wandb

In [None]:
import numpy as np
import pandas as pd
import dlomix
from dlomix.models import PrositIntensityPredictor
import tensorflow as tf
from dlomix.losses import masked_spectral_distance, masked_pearson_correlation_distance
tf.get_logger().setLevel('ERROR')

import wandb
from wandb.keras import WandbCallback

In [None]:
# enter project name for weights and biases
project_name = 'learning_rate'

In [None]:
from dlomix.data import IntensityDataset

TRAIN_DATAPATH = 'https://raw.githubusercontent.com/wilhelm-lab/dlomix-resources/main/example_datasets/Intensity/proteomeTools_train_val.csv'
BATCH_SIZE = 64

int_data = IntensityDataset(data_source=TRAIN_DATAPATH, seq_length=30,
                            collision_energy_col='collision_energy', batch_size=BATCH_SIZE, val_ratio=0.2, test=False)

In [None]:
# Enter weights and biases run name. Make sure that different learning rates have different run names.
wandb.init(project=project_name, name=)

# create model
model = PrositIntensityPredictor(seq_length=30)

optimizer = tf.keras.optimizers.Adam(learning_rate=)

# compile the model with the optimizer, the loss, and the metrics we want to track
model.compile(optimizer=optimizer,
              loss=masked_spectral_distance, metrics=[masked_pearson_correlation_distance, 'mean_absolute_error', 'mse'])

history = model.fit(int_data.train_data, validation_data=int_data.val_data,
                    epochs=15, callbacks=[WandbCallback(save_model=False)])


# Mark the run as finished
wandb.finish()

### Task 2: Model Architecture
In this task, we change the model architecture (the embedding output dimensionality and the size of the recurrent encoder layers) to explore how this affects model performance. The parameter `embedding_output_dim` is the size of the vector representing each amino acid; a larger value gives the embedding more representational capacity. The parameter `recurrent_layers_sizes` sets the number of units in each of the two GRU layers in the model encoder; larger values give the model more parameters, which can help it detect complex patterns but can also lead to overfitting. Change one parameter at a time so that any change in performance can be attributed to it.
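
As a rough guide to how these settings affect model size, the sketch below counts the trainable parameters of an embedding followed by a stack of GRU layers. It assumes unidirectional GRU layers with the Keras default `reset_after=True` parameterization and a hypothetical vocabulary size of 22. The actual Prosit encoder may differ (for example, through bidirectional layers or attention), so treat the numbers as relative rather than exact. The helper names and the three configurations are illustrative and not part of dlomix.

```python
def gru_params(input_dim, units):
    """Trainable parameters of one GRU layer (Keras default, reset_after=True)."""
    # 3 gates, each with an input kernel, a recurrent kernel, and two bias vectors
    return 3 * (units * (input_dim + units) + 2 * units)


def encoder_params(embedding_output_dim, recurrent_layers_sizes, vocab_size=22):
    """Embedding table plus stacked GRU layers; vocab_size is an assumed value."""
    total = vocab_size * embedding_output_dim
    input_dim = embedding_output_dim
    for units in recurrent_layers_sizes:
        total += gru_params(input_dim, units)
        input_dim = units  # the next layer consumes this layer's output
    return total


# Hypothetical configurations to compare
for emb, sizes in [(16, (256, 512)), (32, (256, 512)), (16, (128, 256))]:
    print(f"embedding_output_dim={emb}, recurrent_layers_sizes={sizes}: "
          f"{encoder_params(emb, sizes):,} parameters")
```

Under these assumptions, doubling the embedding size adds comparatively few parameters, while the recurrent layer sizes dominate the total.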

In [None]:
# enter project name for weights and biases
project_name = 'model_architecture'

In [None]:
# Enter weights and biases run name. Make sure that different models have different run names.
wandb.init(project=project_name, name=)

# create model
model = PrositIntensityPredictor(seq_length=30, embedding_output_dim=,
        recurrent_layers_sizes=)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# compile the model with the optimizer, the loss, and the metric we want to track
model.compile(optimizer=optimizer,
              loss=masked_spectral_distance, metrics=[masked_pearson_correlation_distance])

history = model.fit(int_data.train_data, validation_data=int_data.val_data,
                    epochs=30, callbacks=[WandbCallback(save_model=False)])

# Mark the run as finished
wandb.finish()