### Description

Similar to RNN_1_2, this notebook trains an LSTM RNN that takes two inputs: a 150x5 one-hot array (4 nucleotide channels plus a fifth channel that flags padding; a missing section is encoded as all zeros) and, as a separate input, the gene expression value.

Changes:
1. Removed the masked_input layer from the model architecture.
2. The placeholders for missing sections and for padding are now one-hot encoded differently, which changes the input shape to 150x5.

Previously:
* '_': [0, 0, 0, 0],
* '0': [0, 0, 0, 0]

Now:
* '_': [0, 0, 0, 0, 0],
* '0': [0, 0, 0, 0, 1]
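The new encoding can be sketched as follows. This is a minimal illustration, not the encoder in RNN_1_3: the function name one_hot_encode, the A/C/G/T channel order, and right-padding with '0' up to 150 positions are all assumptions.

```python
import numpy as np

# Assumed channel order; the order actually used by RNN_1_3 may differ.
NUCLEOTIDE_CHANNELS = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
PADDING_CHANNEL = 4  # fifth channel flags padding ('0') positions


def one_hot_encode(sequence: str, length: int = 150) -> np.ndarray:
    """Hypothetical encoder returning a (length, 5) float32 array.

    'A'/'C'/'G'/'T' -> one-hot in channels 0-3
    '_' (missing section) -> [0, 0, 0, 0, 0]
    '0' (padding) -> [0, 0, 0, 0, 1]
    """
    if len(sequence) > length:
        raise ValueError(f"sequence longer than {length}")
    # Right-pad with '0' up to the fixed length (padding side is an assumption).
    padded = sequence.upper().ljust(length, '0')
    encoded = np.zeros((length, 5), dtype=np.float32)
    for i, char in enumerate(padded):
        if char in NUCLEOTIDE_CHANNELS:
            encoded[i, NUCLEOTIDE_CHANNELS[char]] = 1.0
        elif char == '0':
            encoded[i, PADDING_CHANNEL] = 1.0
        elif char == '_':
            pass  # missing section stays all zeros
        else:
            raise ValueError(f"unexpected character {char!r} at position {i}")
    return encoded


x = one_hot_encode("ACG_T")
print(x.shape)  # (150, 5)
print(x[3])     # [0. 0. 0. 0. 0.]  missing section
print(x[5])     # [0. 0. 0. 0. 1.]  padding
```

Giving padding its own channel, rather than a sixth channel for missing sections, keeps the width at 5 while still making every position type distinguishable from the input alone.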

Next: change masking function
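Because padding now has its own channel, a masking function can separate the two placeholders using the input alone, which the old four-channel encoding could not do. The helper position_masks below is a hypothetical sketch of that idea, assuming channel 4 is the padding flag as in the "Now" encoding; it is not the masking function RNN_1_3 uses.

```python
import numpy as np


def position_masks(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper for a (..., L, 5) one-hot array.

    Returns (padding_mask, missing_mask), boolean arrays of shape (..., L).
    """
    padding_mask = x[..., 4] == 1    # '0' -> [0, 0, 0, 0, 1]
    missing_mask = ~x.any(axis=-1)   # '_' -> [0, 0, 0, 0, 0]
    return padding_mask, missing_mask


# Toy input: positions 0-1 are nucleotides, 2 is missing ('_'), 3 is padding ('0').
x = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
], dtype=np.float32)
padding, missing = position_masks(x)
print(padding.tolist())  # [False, False, False, True]
print(missing.tolist())  # [False, False, True, False]

# Under the previous 4-channel encoding both placeholders were [0, 0, 0, 0],
# so an input-derived mask flags them identically.
old = x[:, :4]
print((~old.any(axis=-1)).tolist())  # [False, False, True, True]
```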

In [1]:
import RNN_1_3 as parent

In [2]:
name = 'RNN_1_3'

In [3]:
file_path = '../Data/combined/LaFleur_supp.csv'

df = parent.load_and_preprocess_data(file_path)

In [4]:
X_sequence, X_expressions, y = parent.preprocess_X_y(df)

In [5]:
X_sequence_train, X_sequence_test, X_expressions_train, X_expressions_test, y_train, y_test = parent.train_test_split(
        X_sequence, X_expressions, y, test_size=0.2, random_state=42)
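The split has to apply the same shuffled indices to X_sequence, X_expressions and y so that each row stays paired with its expression value and target. The argument and return order in the call above match scikit-learn's train_test_split, which parent.train_test_split may simply re-export (an assumption). The hypothetical split_arrays below is a NumPy-only sketch of the same idea; its shuffle will not reproduce scikit-learn's exact rows for random_state=42.

```python
import numpy as np


def split_arrays(*arrays, test_size=0.2, random_state=42):
    """Hypothetical stand-in: shuffle once, apply the same split to every array.

    Returns [a_train, a_test, b_train, b_test, ...], the ordering
    scikit-learn's train_test_split also uses.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))  # round the test size up, as scikit-learn does
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    result = []
    for a in arrays:
        a = np.asarray(a)
        result.extend([a[train_idx], a[test_idx]])
    return result


# Toy data where row i of every array encodes the same index i.
X_seq = np.arange(10 * 3).reshape(10, 3)
X_expr = np.arange(10)
y = np.arange(10) * 10
seq_tr, seq_te, expr_tr, expr_te, y_tr, y_te = split_arrays(X_seq, X_expr, y)
print(len(seq_tr), len(seq_te))  # 8 2
```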

In [6]:
model = parent.build_model(sequence_length=150, input_nucleotide_dim=5, output_nucleotide_dim=4, expression_dim=1)

In [7]:
parent.train_model(model, X_sequence_train, X_expressions_train, y_train, batch_size=32, epochs=10)

Epoch 1/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 45s 45ms/step - accuracy: 0.8358 - loss: 0.6541 - val_accuracy: 0.9655 - val_loss: 0.1604
Epoch 2/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 122s 125ms/step - accuracy: 0.9674 - loss: 0.1498 - val_accuracy: 0.9706 - val_loss: 0.1301
Epoch 3/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 77s 58ms/step - accuracy: 0.9707 - loss: 0.1267 - val_accuracy: 0.9703 - val_loss: 0.1162
Epoch 4/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 64s 40ms/step - accuracy: 0.9707 - loss: 0.1125 - val_accuracy: 0.9699 - val_loss: 0.1071
Epoch 5/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 54s 55ms/step - accuracy: 0.9711 - loss: 0.1051 - val_accuracy: 0.9714 - val_loss: 0.1022
Epoch 6/10
976/976 ━━━━━━━━━━━━━━━━━━━━ 72s 74ms/step - accuracy: 0.9716 - loss: 0.1005 - val_accuracy: 0.9716 - val_loss: 0.0975
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x222571c52b0>
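The step counts in the logs follow from the batch size: Keras runs ceil(samples / batch_size) batches per pass, so each count bounds the number of rows used. A small sketch, assuming batch_size=32 for both training and evaluation (32 is the Keras default for evaluate; parent.evaluate_model is not shown here):

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # Keras runs ceil(n / batch_size) batches; the last batch may be partial.
    return math.ceil(n_samples / batch_size)


def sample_range(steps: int, batch_size: int) -> tuple[int, int]:
    # Inverse: smallest and largest sample counts that produce `steps` batches.
    return (steps - 1) * batch_size + 1, steps * batch_size


print(sample_range(976, 32))  # (31201, 31232)  rows per training epoch
print(sample_range(305, 32))  # (9729, 9760)    rows in the test set
```

If the test set is 20% of the data, the training split holds roughly 39,000 rows, yet only about 80% of that (about 31,200) is consumed per epoch. That would be consistent with a 20% validation split inside parent.train_model, but this is an inference from the logs, not something the notebook confirms.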

In [8]:
loss, accuracy = parent.evaluate_model(model, X_sequence_test, X_expressions_test, y_test)
print(f'Loss: {loss}, Accuracy: {accuracy}')

[1m305/305[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 31ms/step - accuracy: 0.9719 - loss: 0.0877
Loss: 0.08784956485033035, Accuracy: 0.9717919230461121


In [9]:
model.save(f'../Models/{name}.keras')

#### From previous modeling:
Test Loss: 0.08784956485033035

Test Accuracy: 0.9717919230461121