# MLP (PyTorch) — Step-by-Step Walkthrough (No CLI)

This notebook is a cell-by-cell version of your `run_model.py` script for training and evaluating an MLP
with repeated K-fold splits and scramble fractions, redesigned for interactive exploration:

- **Change parameters** in one place (no command line).
- **Print & inspect** every intermediate artifact (CSV heads, shapes, metrics).
- **Run one fold** or the **full loop**.
- Save per-fold predictions and aggregated metrics just like the script.
- Optionally enable early stopping (`USE_EARLY_STOPPING`, `PATIENCE`).
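
For the early-stopping option, a common patience-based scheme looks roughly like the sketch below. It is an illustration, not the code from `run_model.py`: the `EarlyStopper` class and its `min_delta` argument are hypothetical, and it assumes the patience counts epochs without improvement in a validation loss.

```python
class EarlyStopper:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=50, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            # No improvement this epoch (assumed unit: one epoch per call).
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy loss curve (made-up numbers) with a short patience for illustration.
stopper = EarlyStopper(patience=2)
for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7]):
    if stopper.should_stop(loss):
        print(f"stopping at epoch {epoch}, best loss {stopper.best_loss}")
        break
```

In a real training loop the best model weights would typically be saved whenever the counter resets, so the final model corresponds to the best epoch rather than the last one.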

> If a CSV or model is missing, the notebook prints a friendly message instead of crashing.
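
One way to get that behavior is a small guard around every file read. The sketch below is an assumption about how such a guard could look, not the notebook's actual code: `load_csv_or_warn` is a hypothetical helper, and it uses the standard-library `csv` module so it runs without extra dependencies (`pandas.read_csv` could be wrapped the same way).

```python
import csv
from pathlib import Path


def load_csv_or_warn(path):
    """Return the CSV rows as a list of dicts, or None if the file is missing.

    Hypothetical helper: the name and return type are illustrative only.
    """
    path = Path(path)
    if not path.is_file():
        # Report the problem and let the caller decide how to continue.
        print(f"[skip] {path} not found; check DATA_DIR and PREFIX.")
        return None
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


rows = load_csv_or_warn("Data/this_file_does_not_exist_12345.csv")
if rows is not None:
    print(rows[:5])  # inspect the first few rows, like a CSV "head"
```

Returning `None` instead of raising keeps later cells runnable: each cell can check for `None` and skip its work with its own message.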

## 1) Parameters (edit here)

In [None]:
# ---- Core task/IO ----
MODE = 0                                # 0=train, 1=evaluate
MODEL_TYPE = "reg"                      # "reg" | "bin" | "mclass"
NUM_CLASSES = 3
DATA_SCALE = "log"

PREFIX = "gbsa"
DATA_DIR = "Data"
MODEL_DIR = "Model"
OUTPUT_FILE = "predictions"

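# ---- Cross-validation ----
KFOLD = 5                               # folds per repeat
NUM_REPEATS = 1                         # repeats of the K-fold split
SCRAMBLE_FRACTIONS = [0.0]              # scramble fractions to sweep

# ---- Training hyperparameters ----
MAX_EPOCHS = 5000
LRN_RATE = 1e-4                         # learning rate
WT_DECAY = 1e-4                         # weight decay
DROPOUT_INPUT_OUTPUT = 0.1
DROPOUT_HIDDEN = 0.1
HIDDEN_SIZE = 44
HIDDEN_LAYERS = 3
USE_EARLY_STOPPING = True
PATIENCE = 50                           # early-stopping patience

# ---- Regression thresholds (log / non-log scale) ----
REGRESSION_THRESHOLD_LOG = 0.0
REGRESSION_THRESHOLD_NONLOG = 1.0

# ---- Column names ----
REF_ID_COL = "sequence"
REF_LABEL_COL = "label"

# ---- Single-run switch ----
RUN_ONLY_ONE_REPEAT_AND_FOLD = True     # False runs the full loop
ONE_REPEAT_IDX = 0
ONE_FOLD_IDX = 0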

from pathlib import Path

# Create the data and model directories if they do not exist yet.
Path(DATA_DIR).mkdir(exist_ok=True, parents=True)
Path(MODEL_DIR).mkdir(exist_ok=True, parents=True)
print("Parameters set.")
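
To see how the loop parameters might combine, the following sketch lists the runs that a given configuration implies. It is a guess at the structure, not the script's code: `planned_runs` is a hypothetical helper, the nesting order (scramble fraction, then repeat, then fold) is assumed, and so is the choice to keep every scramble fraction in single-run mode.

```python
from itertools import product


def planned_runs(scramble_fractions, num_repeats, kfold,
                 only_one=False, one_repeat_idx=0, one_fold_idx=0):
    """List the (scramble_fraction, repeat, fold) triples that would be run."""
    if only_one:
        # Single-run mode (assumed): every scramble fraction, one repeat/fold.
        repeats, folds = [one_repeat_idx], [one_fold_idx]
    else:
        repeats, folds = range(num_repeats), range(kfold)
    return list(product(scramble_fractions, repeats, folds))


# Values copied from the parameter cell above.
print(planned_runs([0.0], 1, 5, only_one=True))
print(len(planned_runs([0.0], 1, 5)))
```

With the defaults above, single-run mode yields one run and the full loop yields five, so the total cost grows as `len(SCRAMBLE_FRACTIONS) * NUM_REPEATS * KFOLD`.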

## 2) Imports & logging

In [None]:
import csv
import logging
import math
import os
from collections import defaultdict

import numpy as np
import pandas as pd
import torch as T
import matplotlib.pyplot as plt

from sklearn.metrics import matthews_corrcoef, accuracy_score, r2_score
from scipy.stats import pearsonr

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
print("Imports ready.")
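
The metric imports hint at how predictions might be scored. As a rough sketch only: `score_regression` is a hypothetical helper, and binarizing at a threshold (here `0.0`, the value of `REGRESSION_THRESHOLD_LOG`) to obtain MCC and accuracy from a regression output is an assumption about how the thresholds are used, not something the parameter cell confirms.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, matthews_corrcoef, r2_score


def score_regression(y_true, y_pred, threshold=0.0):
    """Continuous and thresholded metrics for one fold (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: values above the threshold count as the positive class.
    true_cls = (y_true > threshold).astype(int)
    pred_cls = (y_pred > threshold).astype(int)
    r_value, _p_value = pearsonr(y_true, y_pred)
    return {
        "r2": r2_score(y_true, y_pred),
        "pearson_r": r_value,
        "mcc": matthews_corrcoef(true_cls, pred_cls),
        "accuracy": accuracy_score(true_cls, pred_cls),
    }


# Toy values, not real results.
metrics = score_regression([-1.0, -0.5, 0.5, 1.0], [-0.8, 0.2, 0.4, 1.1])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting both kinds of metric is a design choice: R² and Pearson r describe how well the continuous values are ranked and fitted, while MCC and accuracy describe whether predictions land on the correct side of the threshold.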