# 13 - Scratchpad and Experimental Code

**Purpose**: This notebook serves as a development scratchpad for testing new ideas, running one-off analyses, and prototyping code before it is integrated into the main analysis workflows. The cells below may not represent a coherent, linear workflow and often contain fragmented or experimental code.

**Inputs**: Varies by cell, but often uses data loaded via `get_dataframes()` or `BayesianData`.

**Outputs**: Primarily in-line console output and visualizations for quick, interactive analysis.

### 13.1 PCA for Dimensionality Reduction

This cell explores using Principal Component Analysis (PCA) to reduce the dimensionality of the feature set...


In [None]:
%reload_ext autoreload
%autoreload 2

import polars as pl
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

from early_markers.cribsy.common.hold.data import get_dataframes


data = get_dataframes()

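# Build a combined "part_feature_name" key for each row
df = data["train"].with_columns(
    feature=pl.col("part") + "_" + pl.col("feature_name")
)
df_38 = data["features_38"]

# Keep only the features listed in features_38, then pivot to wide format
# (one row per infant, one column per feature)
df_train = df.join(
    df_38.with_columns(
        feature=pl.col("part") + "_" + pl.col("feature_name")
    ).select("feature"),
    on="feature",
    how="inner"
).pivot(
    on="feature",
    index="infant",
    values="Value",
)

# PCA to reduce dimensionality: project the feature columns onto 9 components
pca = PCA(n_components=9)
ref_pca = pca.fit_transform(df_train.drop("infant"))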

## Plan
- Get means and sds of features
- Calculate min/max N by number of classes
-
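
The first plan item can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the notebook's implementation: the `rows` records, the feature names `feat_a` and `feat_b`, and the `feature_stats` helper are all hypothetical, and the records only mimic a long format of (feature, value) pairs like the `feature` and `Value` columns used above.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical long-format records: (feature, value). The numbers are made up.
rows = [
    ("feat_a", 1.0),
    ("feat_a", 2.0),
    ("feat_a", 3.0),
    ("feat_b", 10.0),
    ("feat_b", 14.0),
]


def feature_stats(records):
    """Return {feature: (mean, sample SD)} for long-format records."""
    grouped = defaultdict(list)
    for feature, value in records:
        grouped[feature].append(value)
    # stdev needs at least two values, so single-observation features get None.
    return {
        name: (mean(values), stdev(values) if len(values) > 1 else None)
        for name, values in grouped.items()
    }


stats = feature_stats(rows)
for name, (m, sd) in stats.items():
    print(name, m, sd)
```

The sample standard deviation (n - 1 denominator) is used here; whether the sample or population form is wanted for the plan is an open choice.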

In [None]:
pca.explained_variance_ratio_

In [None]:


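# NOTE: X and y are not defined earlier in this notebook. X is assumed to be a
# feature matrix (e.g. df_train.drop("infant")) and y the matching class
# labels; define both before running this cell.

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply PCA to reduce dimensionality (rebinds `pca`, replacing the 9-component fit above)
pca = PCA(n_components=2)  # Reduce to 2 dimensions
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)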

# Train Naive Bayes classifier on reduced data
gnb = GaussianNB()
gnb.fit(X_train_pca, y_train)

# Evaluate model performance
accuracy = gnb.score(X_test_pca, y_test)
print(f"Accuracy after PCA: {accuracy}")
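
Because `X` and `y` are not defined in the cell above, the following is a self-contained sketch of the same PCA-then-Naive-Bayes idea on synthetic data. Everything about the data is an assumption: `make_classification` generates a stand-in matrix with 38 columns (chosen only to echo `features_38`) and artificial labels, so the printed accuracy says nothing about the real dataset. Wrapping both steps in a `Pipeline` is a design choice of this sketch, not something the notebook does; it keeps the PCA fit restricted to the training split.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 200 samples, 38 features, 2 classes.
X, y = make_classification(
    n_samples=200, n_features=38, n_informative=5, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# PCA is fitted on the training split only; the pipeline applies the same
# projection to the test split when scoring.
model = Pipeline([
    ("pca", PCA(n_components=2)),
    ("gnb", GaussianNB()),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Accuracy after PCA: {accuracy:.3f}")
```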




In [None]:
SIZES = (50, 100)
PROBS = (0.008, 0.99)

def linear_interpolation(a, a0, a1, b0, b1):
    """
    Linearly interpolate to map a value from range A to range B.

    Parameters:
    a: The value to interpolate
    a0, a1: The min and max values of range A
    b0, b1: The min and max values of range B

    Returns:
    The interpolated value in range B
    """
    if a0 == a1:  # Handle division by zero case
        return (b0 + b1) / 2

    # Apply the linear interpolation formula
    return b0 + (a - a0) * (b1 - b0) / (a1 - a0)

# linear_interpolation(75, 50, 100, 0.0, 0.99)
linear_interpolation(0.95, 0.0, 0.99, 50, 100)

In [None]:
import numpy as np


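def numpy_interpolation(a, range_a, range_b):
    """Map a from range_a to range_b with np.interp.

    np.interp clamps the result to the ends of range_b when a falls outside
    range_a, and expects range_a to be increasing.
    """
    if range_a[0] == range_a[1]:  # Degenerate range: return midpoint of range_b
        return (range_b[0] + range_b[1]) / 2

    return np.interp(a, range_a, range_b)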

numpy_interpolation(0.95, (0.0, 0.99), (50, 100))
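
One behavioral difference between the two helpers is worth making explicit: the hand-written formula extrapolates when `a` lies outside `[a0, a1]`, whereas `np.interp` clamps to the endpoint values. The sketch below reproduces both behaviors in plain Python so the difference is visible without NumPy; `lerp` and `lerp_clamped` are hypothetical names introduced only for illustration.

```python
def lerp(a, a0, a1, b0, b1):
    """Unclamped linear map from [a0, a1] to [b0, b1]; extrapolates outside."""
    if a0 == a1:  # Degenerate source range: fall back to the midpoint
        return (b0 + b1) / 2
    return b0 + (a - a0) * (b1 - b0) / (a1 - a0)


def lerp_clamped(a, a0, a1, b0, b1):
    """Same map, but a is first clipped to [a0, a1]; assumes a0 <= a1."""
    a = min(max(a, a0), a1)
    return lerp(a, a0, a1, b0, b1)


# Inside the source range both versions agree.
print(lerp(0.495, 0.0, 0.99, 50, 100))
print(lerp_clamped(0.495, 0.0, 0.99, 50, 100))

# Above the source range they differ: one extrapolates past b1,
# the other stops at b1.
print(lerp(1.5, 0.0, 0.99, 50, 100))
print(lerp_clamped(1.5, 0.0, 0.99, 50, 100))
```

Which behavior is preferable depends on whether inputs outside the source range are meaningful or should be treated as saturated.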

In [None]:
from statistics import mean

# Only the last expression in a cell is displayed, so name both results.
sens = mean([0.929, 0.857, 0.714, 0.857, 0.857, 0.786, 0.786])  # sens 0.8265714285714286
spec = mean([0.625, 0.675, 0.775, 0.650, 0.600, 0.775, 0.700])  # spec 0.6857142857142857
sens, spec