Expected output if all downloads succeed:
[INFO] Feature shape: (89598, 512)
[INFO] Labels shape: (89598,)
[INFO] Label distribution (0=normal, 1=tumor): [82115  7483]
[INFO] PCA explained variance ratio (2 components): [0.73787075 0.2177505 ]
[INFO] PCA mean for class 0: [ 0.1315507 -0.3937801]
[INFO] PCA mean for class 1: [-1.4437389  4.3208265]

Total: 89,598 patches

Normal: 82,115

Tumor: 7,483

Class imbalance: ~11:1 (normal:tumor)
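
The counts and ratio above can be recomputed from the label array. This is a minimal sketch, assuming integer labels with 0 = normal and 1 = tumor. The helper name `summarize_labels` is made up for illustration, and the array is rebuilt from the counts in the log rather than loaded from `patch_labels_3.npy`.

```python
import numpy as np

def summarize_labels(labels):
    """Return per-class counts, the normal:tumor ratio, and the accuracy
    of a classifier that always predicts the majority class."""
    counts = np.bincount(labels, minlength=2)
    ratio = counts[0] / counts[1]
    baseline = counts.max() / counts.sum()
    return counts, ratio, baseline

# Stand-in label array rebuilt from the counts in the log above
labels = np.concatenate([np.zeros(82115, dtype=int), np.ones(7483, dtype=int)])
counts, ratio, baseline = summarize_labels(labels)
print(f"Counts: {counts}, ratio: {ratio:.1f}:1, majority baseline: {baseline:.3f}")
```

The majority baseline (about 0.916 for these counts) is a useful reference point when reading any accuracy figure on this data.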

PC1: 73.8%, PC2: 21.8% → the first two components capture about 95.6% of the variance, so the 2D projection loses little information and is well suited to visual inspection.

Normal (label 0): [ 0.13, -0.39 ]

Tumor (label 1): [ -1.44, 4.32 ]

These means are well separated → a good sign that the ResNet features are class-discriminative, although the gap should be judged against the within-class spread, not on its own.
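
The per-class means shown in the log are not printed by the sanity-check cell itself. A minimal sketch of how they could be computed from any 2D embedding is below. The helper `class_means` is a hypothetical name, and the embedding is synthetic stand-in data, not the real `features_pca`.

```python
import numpy as np

def class_means(embedding, labels):
    """Mean 2D coordinate of each class in an embedding (e.g. features_pca)."""
    return {int(c): embedding[labels == c].mean(axis=0) for c in np.unique(labels)}

# Synthetic stand-in for features_pca / labels (assumption: 0 = normal, 1 = tumor)
rng = np.random.default_rng(0)
labels = np.array([0] * 900 + [1] * 100)
embedding = rng.normal(size=(1000, 2))
embedding[labels == 1] += [-1.5, 4.5]  # shift the "tumor" points away from the origin

means = class_means(embedding, labels)
for c, m in means.items():
    print(f"[INFO] PCA mean for class {c}: {m}")
```

In the notebook, the same call would take `features_pca` (or `features_tsne`) and `labels` in place of the synthetic arrays.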

### **Evaluate Feature Extractor**

In [None]:
# Sanity Check of Extracted Patch Features 

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
import seaborn as sns
import os
import numpy as np

# Load extracted features and labels
features = np.load("patch_features_3.npy")     # shape: (N, 512)
labels = np.load("patch_labels_3.npy")         # shape: (N,)

print(f"Feature shape: {features.shape}")
print(f"Labels shape: {labels.shape}")
print(f"Label distribution: {np.bincount(labels)}")  # 0 = normal, 1 = tumor

# --------------------------------------
# 1. PCA Visualization (first 2 components)
# --------------------------------------
pca = PCA(n_components=2)
features_pca = pca.fit_transform(features)

plt.figure(figsize=(8,6))
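# Pass class names as hue so seaborn builds a correct legend;
# calling plt.legend(["Normal", "Tumor"]) afterwards would overwrite it with mismatched handles.
sns.scatterplot(x=features_pca[:,0], y=features_pca[:,1],
                hue=np.where(labels == 1, "Tumor", "Normal"),
                hue_order=["Normal", "Tumor"], palette='Set1')
plt.title("PCA: Patch Features (2D)")
plt.xlabel("PC1")
plt.ylabel("PC2")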
plt.show()

# --------------------------------------
# 2. t-SNE Visualization
# --------------------------------------
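# The iteration count is left at its default (1000); the n_iter keyword was renamed
# to max_iter in newer scikit-learn releases, so omitting it keeps the cell portable.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)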
features_tsne = tsne.fit_transform(features)

plt.figure(figsize=(8,6))
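# Same legend handling as in the PCA plot: let seaborn label the classes via hue.
sns.scatterplot(x=features_tsne[:,0], y=features_tsne[:,1],
                hue=np.where(labels == 1, "Tumor", "Normal"),
                hue_order=["Normal", "Tumor"], palette='Set1')
plt.title("t-SNE: Patch Features (2D)")
plt.xlabel("Dim 1")
plt.ylabel("Dim 2")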
plt.show()

# --------------------------------------
# 3. Logistic Regression on Extracted Features
# --------------------------------------
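# Stratify so both splits keep the same normal:tumor ratio despite the imbalance
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)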

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Logistic Regression Accuracy: {acc:.4f}")
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Normal", "Tumor"], yticklabels=["Normal", "Tumor"])
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()


[INFO] Logistic Regression Accuracy: 0.8661 (as of 26.06.2025)

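The features look discriminative, but accuracy alone is a weak measure on a dataset this imbalanced: if the test split follows the ~11:1 distribution listed above, always predicting "normal" would already score about 0.92. Read the confusion matrix and the tumor-class recall before drawing conclusions.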
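
To put the accuracy in context, per-class recall and balanced accuracy can be derived from the confusion matrix. The sketch below assumes a 2x2 matrix laid out as `sklearn.metrics.confusion_matrix` returns it (rows = true class, columns = predicted class). The function name `imbalance_aware_metrics` and the example matrix are made up for illustration and are not results from this notebook.

```python
import numpy as np

def imbalance_aware_metrics(cm):
    """Summarize a 2x2 confusion matrix (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)          # per-class recall
    return {
        "accuracy": np.trace(cm) / cm.sum(),
        "recall_normal": recall[0],
        "recall_tumor": recall[1],
        "balanced_accuracy": recall.mean(),
        # accuracy of always predicting the most frequent true class
        "majority_baseline": cm.sum(axis=1).max() / cm.sum(),
    }

# Made-up confusion matrix, purely illustrative
cm_example = [[900, 100],
              [40, 60]]
metrics = imbalance_aware_metrics(cm_example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the accuracy (about 0.873) sits below the majority baseline (about 0.909) even though the classifier finds 60% of the tumor patches, which is why the per-class numbers are more informative than accuracy here.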

[INFO] t-SNE mean for class 0: [  1.60,  -3.26]
[INFO] t-SNE mean for class 1: [ -1.30,  36.63]


The class means are far apart, especially along the second dimension (a gap of about 40 units). t-SNE axes have no absolute scale and distances between clusters are not reliably preserved, so treat this as qualitative evidence of separation, not as a measurement.
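
One way to make the gap between class means less dependent on the arbitrary units of the embedding is to divide it by the within-class spread. The sketch below is a simple heuristic added for illustration, not something the notebook computes. The name `separation_ratio` is hypothetical and the data is synthetic.

```python
import numpy as np

def separation_ratio(embedding, labels):
    """Distance between the two class means divided by the average
    within-class spread (RMS distance of points to their class mean)."""
    a = embedding[labels == 0]
    b = embedding[labels == 1]
    gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread_a = np.sqrt(((a - a.mean(axis=0)) ** 2).sum(axis=1).mean())
    spread_b = np.sqrt(((b - b.mean(axis=0)) ** 2).sum(axis=1).mean())
    return gap / (0.5 * (spread_a + spread_b))

# Synthetic stand-in for features_tsne: two clusters roughly 40 units apart on the y-axis
rng = np.random.default_rng(0)
labels = np.array([0] * 500 + [1] * 500)
embedding = rng.normal(scale=10.0, size=(1000, 2))
embedding[labels == 1] += [0.0, 40.0]

ratio = separation_ratio(embedding, labels)
ratio_rescaled = separation_ratio(embedding * 0.1, labels)  # same layout, different units
print(f"ratio: {ratio:.2f}, after rescaling: {ratio_rescaled:.2f}")
```

The ratio is unchanged when the whole embedding is rescaled, which is the point: a 40-unit gap means little by itself, while a gap of several spreads does. Since t-SNE also distorts cluster sizes, the ratio is still best read as a rough indicator.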