## Description

### PyTorch Hyperparameter Tuning (Bayesian Optimization)

A hyperparameter is a setting chosen before training that configures the model or its learning process, rather than a value learned from the data. For this CNN, the hyperparameters we optimize are:
- The number and structure of Conv1D layers.
- Filter sizes, kernel sizes, and strides.
- Max-pooling sizes and activation functions for each layer.
- The size of the dense layer.
- The learning rate for optimization.
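The list above can be sketched as a search space. The block below is only an illustration: the value ranges, the names `SEARCH_SPACE`, `sample_config`, and `output_length` are hypothetical, and the sampler draws at random as a stand-in, where a Bayesian optimizer would instead propose each configuration from the results of earlier trials. The length check assumes unpadded convolutions with dilation 1 and max-pooling whose stride equals its window.

```python
import math
import random

# Hypothetical search space; these ranges are illustrative, not the notebook's.
SEARCH_SPACE = {
    "num_conv_layers": [1, 2, 3],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7, 9],
    "stride": [1, 2],
    "pool_size": [1, 2, 3],
    "activation": ["relu", "tanh", "elu"],
    "dense_units": [32, 64, 128],
}
LEARNING_RATE_RANGE = (1e-4, 1e-2)  # sampled on a log scale
PER_LAYER_KEYS = ("filters", "kernel_size", "stride", "pool_size", "activation")


def sample_config(rng):
    """Draw one random configuration with independent settings per conv layer."""
    n_layers = rng.choice(SEARCH_SPACE["num_conv_layers"])
    layers = [
        {key: rng.choice(SEARCH_SPACE[key]) for key in PER_LAYER_KEYS}
        for _ in range(n_layers)
    ]
    low, high = LEARNING_RATE_RANGE
    # Log-uniform draw, so each order of magnitude is equally likely.
    learning_rate = math.exp(rng.uniform(math.log(low), math.log(high)))
    return {
        "layers": layers,
        "dense_units": rng.choice(SEARCH_SPACE["dense_units"]),
        "learning_rate": learning_rate,
    }


def output_length(seq_len, layers):
    """Sequence length left after every conv + max-pool pair (0 if it collapses)."""
    for layer in layers:
        if seq_len < layer["kernel_size"]:
            return 0
        # Unpadded convolution, dilation 1.
        seq_len = (seq_len - layer["kernel_size"]) // layer["stride"] + 1
        # Max-pooling with stride equal to the pooling window.
        seq_len = seq_len // layer["pool_size"]
    return seq_len


rng = random.Random(0)
config = sample_config(rng)
print(config)
print("remaining length:", output_length(150, config["layers"]))
```

Checking the remaining length before training is one way to reject configurations whose kernels, strides, and pooling windows shrink the sequence to nothing.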

This version is implemented in PyTorch, which makes downstream sequence generation and infilling easier.

### Architecture

Model changes: higher TX/expression now corresponds to a higher prediction.

This version streamlines training and testing and adds hyperparameter tuning. It uses an architecture similar to CNN_5_0. It does not include augmented data; it uses only the data from La Fleur's supplemental materials, which include:
- La Fleur et al. (and De Novo Designs)
- Urtecho et al.
- Hossain et al.
- Yu et al.
- Lagator (36N, Pl, and Pr)
- Anderson Series

We one-hot encode each base and pad every sequence to a common length. Because a CNN is designed to identify local "features" wherever they occur, the input promoter can be any length (with padding) and the model can still predict its expression.
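A minimal sketch of the encoding step, assuming a channel order of A, C, G, T, right-padding with all-zero rows, and unknown symbols left as zeros. The helper `one_hot_pad` is hypothetical and is not the notebook's `load_features`; it only illustrates the array layout before and after the channel-first transpose that a 1D convolution expects.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_pad(sequences, max_len=None):
    """Encode sequences as a (n_sequences, max_len, 4) float32 array.

    Shorter sequences are right-padded with all-zero rows; sequences longer
    than max_len are truncated.
    """
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    encoded = np.zeros((len(sequences), max_len, len(BASES)), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, base in enumerate(seq.upper()[:max_len]):
            idx = BASE_INDEX.get(base)
            if idx is not None:  # unknown symbols (e.g. N) stay all-zero
                encoded[i, j, idx] = 1.0
    return encoded


X = one_hot_pad(["TTGACA", "TATAAT", "ACG"])
# Move channels ahead of length: (samples, length, 4) -> (samples, 4, length).
X_channels_first = X.transpose(0, 2, 1)
print(X.shape, X_channels_first.shape)
```

Zero-padding keeps padded positions from activating any base channel, so sequences of different lengths can share one fixed-size input array.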

In [1]:
from CNN_6_2 import *

In [2]:
epochs = 100

# Documentation variables
name = 'CNN_6_0'
model_path = f'v2/Models/{name}.pt'
data_dir = 'v2/Data/Train Test/'

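# Load the pre-split training and test sets
X_train, y_train = load_features(f'{data_dir}train_data.csv')
X_test, y_test = load_features(f'{data_dir}test_data.csv')

# Swap the last two axes, presumably (samples, length, channels) ->
# (samples, channels, length), the layout PyTorch's Conv1d expects
X_train = X_train.transpose(0, 2, 1)
X_test = X_test.transpose(0, 2, 1)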

input_shape = (X_train.shape[0], X_train.shape[1], X_train.shape[2])

In [None]:
# Perform hyperparameter search
best_params = hyperparameter_search(X_train, y_train, input_shape, epochs)
print("Best Hyperparameters:", best_params)

In [None]:
# Train the best model
model = PyTorchRegressor(input_shape, best_params, epochs=epochs)
model.fit(X_train, y_train)

In [None]:
# Make predictions and evaluate
y_pred = model.predict(X_test)
metrics = calc_metrics(y_test, y_pred)
print("Performance Metrics:", metrics)

In [None]:
# Save the model
save_model(model, model_path)