# 18S Model Training and Evaluation

**Objective:** To build, train, and evaluate a deep learning classifier that assigns eukaryotic 18S rRNA gene sequences to a genus, using the pre-processed data.

**Methodology:**
1. Load the 18S-specific training/testing data and encoders from disk.
2. Define the neural network architecture.
3. Train the model on the training data using the GPU.
4. Save and reload the final model, then evaluate its accuracy on the unseen test set.

In [1]:
import numpy as np
import tensorflow as tf
from scipy.sparse import load_npz
import pickle
from pathlib import Path
import sys

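# Set up project path: the project root is taken to be the parent of the
# current working directory (the directory the notebook is run from)
project_root = Path.cwd().parent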

# --- Verification Step: Check for GPU ---
print("--- TensorFlow Setup ---")
print(f"TensorFlow Version: {tf.__version__}")
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    print(f"GPU detected: {gpu_devices[0]}")
else:
    print("WARNING: No GPU detected. TensorFlow will run on CPU.")
print("-" * 26)

--- TensorFlow Setup ---
TensorFlow Version: 2.10.1
GPU detected: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
--------------------------
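
With a GPU detected, one optional setting is to let TensorFlow allocate GPU memory on demand instead of reserving most of it up front. The following is a minimal sketch and not part of the original notebook: `enable_gpu_memory_growth` is a hypothetical helper name, and the import is guarded only so the snippet also runs on a machine without TensorFlow.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs that were configured (0 if none are
    visible or TensorFlow is not installed).
    """
    try:
        import tensorflow as tf
    except ImportError:
        # Guard so the sketch still runs where TensorFlow is unavailable.
        return 0

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU was already initialized in this process.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return configured


n_configured = enable_gpu_memory_growth()
print(f"GPUs with memory growth enabled: {n_configured}")
```

Memory growth has to be set before the GPU is first used, so a call like this would belong near the top of the notebook, before any model or tensor is created.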


In [2]:
# --- Define 18S-specific file paths ---
PROCESSED_DATA_DIR = project_root / "data" / "processed"
MODELS_DIR = project_root / "models"

X_TRAIN_PATH = PROCESSED_DATA_DIR / "X_train_18s.npz"
X_TEST_PATH = PROCESSED_DATA_DIR / "X_test_18s.npz"
Y_TRAIN_PATH = PROCESSED_DATA_DIR / "y_train_18s.npy"
Y_TEST_PATH = PROCESSED_DATA_DIR / "y_test_18s.npy"

LABEL_ENCODER_PATH = MODELS_DIR / "18s_genus_label_encoder.pkl"

# --- Load the sparse feature matrices, label arrays, and label encoder ---
print("--- Loading 18S Data ---")
X_train = load_npz(X_TRAIN_PATH)
X_test = load_npz(X_TEST_PATH)
y_train = np.load(Y_TRAIN_PATH)
y_test = np.load(Y_TEST_PATH)

# pickle can execute arbitrary code on load, so only open files you created or trust
with open(LABEL_ENCODER_PATH, 'rb') as f:
    label_encoder = pickle.load(f)
print("Data loading complete.")

# --- Verification Step ---
print("\n--- Loaded Data Shapes ---")
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print("-" * 30)
print(f"Shape of X_test:  {X_test.shape}")
print(f"Shape of y_test:  {y_test.shape}")
print(f"Number of classes (genera): {len(label_encoder.classes_)}")

--- Loading 18S Data ---
Data loading complete.

--- Loaded Data Shapes ---
Shape of X_train: (6427, 14058)
Shape of y_train: (6427,)
------------------------------
Shape of X_test:  (1607, 14058)
Shape of y_test:  (1607,)
Number of classes (genera): 616
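
The shapes above also determine how much memory the feature matrices would need if they were converted from sparse to dense form before training. The sketch below is not part of the original notebook: `dense_size_mib` and `iter_dense_batches` are hypothetical helper names, `float32` is an assumed dtype, and the small random matrix stands in for `X_train` so the example runs on its own. It estimates the dense footprint and shows one way to densify a sparse matrix one batch at a time.

```python
import numpy as np
from scipy.sparse import random as sparse_random


def dense_size_mib(shape, dtype=np.float32):
    """Memory needed for a dense 2-D array of `shape` and `dtype`, in MiB."""
    n_rows, n_cols = shape
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**20


def iter_dense_batches(X_sparse, y, batch_size=256, dtype=np.float32):
    """Yield (X_batch, y_batch) pairs, densifying one row slice at a time."""
    X_csr = X_sparse.tocsr()  # CSR format supports efficient row slicing
    n_rows = X_csr.shape[0]
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield X_csr[start:stop].toarray().astype(dtype), y[start:stop]


# Dense footprint for the shapes reported above (float32 is an assumption)
print(f"X_train as dense float32: {dense_size_mib((6427, 14058)):.1f} MiB")
print(f"X_test  as dense float32: {dense_size_mib((1607, 14058)):.1f} MiB")

# Synthetic stand-in data: 10 samples, 6 features, about 30% non-zero entries
X_demo = sparse_random(10, 6, density=0.3, format="csr", random_state=0)
y_demo = np.arange(10)

for X_batch, y_batch in iter_dense_batches(X_demo, y_demo, batch_size=4):
    print(X_batch.shape, y_batch)
```

At these sizes a full dense conversion of the training matrix comes to roughly 345 MiB in `float32`, so calling `.toarray()` once may be the simpler choice; the batch-wise variant matters more if the feature matrix grows.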
