The provided code is a machine learning pipeline that feeds 2048-dimensional features extracted with an Xception network (called XceptionNet2048 in the code) into a Support Vector Machine (SVM) for binary classification, specifically necrosis detection in images. The process involves several steps:

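Data Preparation: For each batch, the training and testing sets are read from CSV files containing XceptionNet features extracted from the original images and from their segmented counterparts. The segmented feature columns are renamed with a seg_ prefix, the two tables are concatenated column-wise, and columns that appear in both (such as Image_Name and label) are reduced to a single copy. Because the concatenation aligns rows by position, both tables must list the images in the same order.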
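The positional concatenation described above relies on both CSV files having identical row order. As a hedged alternative, the sketch below joins the two tables on Image_Name and label instead, so a reordered file cannot silently pair the wrong rows. The tiny DataFrames are made-up stand-ins for the real 2048-column tables, and this is not what the notebook itself does.

```python
import pandas as pd

# Toy stand-ins for the per-batch feature tables (values are invented).
original = pd.DataFrame({
    "Image_Name": ["a.png", "b.png", "c.png"],
    "feature_0": [0.1, 0.2, 0.3],
    "feature_1": [0.0, 0.5, 0.9],
    "label": [0, 1, 0],
})
# The segmented table is deliberately in a different row order.
segmented = pd.DataFrame({
    "Image_Name": ["c.png", "a.png", "b.png"],
    "feature_0": [3.0, 1.0, 2.0],
    "feature_1": [3.5, 1.5, 2.5],
    "label": [0, 0, 1],
})

# Prefix the segmented feature columns so they do not collide with the original ones.
segmented = segmented.rename(
    columns=lambda c: f"seg_{c}" if c.startswith("feature_") else c
)

# Join on image name and label; validate raises if a key is repeated on either side.
merged = original.merge(
    segmented, on=["Image_Name", "label"], how="inner", validate="one_to_one"
)

# Put the label last, matching the layout the notebook expects.
feature_cols = [c for c in merged.columns if c not in ("Image_Name", "label")]
merged = merged[["Image_Name"] + feature_cols + ["label"]]
print(merged)
```

With an inner join, images missing from either table are dropped, so comparing len(merged) against len(original) is a quick way to spot mismatches.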

Feature and Label Extraction: After the Image_Name column is dropped, the code takes the last column as the label (output) and all remaining columns as features (inputs), for both the training and testing sets. A label of 1 means the image contains necrosis; a label of 0 means it does not.
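Taking the label by position works only while the label stays in the last column. A minimal sketch of a name-based selection follows; it assumes the column naming seen in the notebook (feature_, seg_feature_, label) and uses invented values.

```python
import pandas as pd

# Toy table with the same column naming as the merged batch tables (values are invented).
df = pd.DataFrame({
    "Image_Name": ["a.png", "b.png"],
    "feature_0": [0.1, 0.2],
    "seg_feature_0": [1.0, 2.0],
    "label": [0, 1],
})

# Select feature columns by prefix rather than by position.
feature_cols = [c for c in df.columns if c.startswith(("feature_", "seg_feature_"))]
X = df[feature_cols].to_numpy()
y = df["label"].to_numpy()

# Guard against unexpected label values before training.
assert set(y.tolist()) <= {0, 1}
print(X.shape, y)
```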

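Model Training: A Support Vector Classifier (scikit-learn's SVC with its default settings, which means an RBF kernel) is fitted on the training features. It is created with probability=True so that class-probability estimates are available for the AUC calculation, and the fitted model is saved to disk with joblib.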
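As an illustration of the training step, the sketch below fits an SVC with probability=True on synthetic data. Wrapping it in a pipeline with StandardScaler is an assumption added here, since RBF kernels are sensitive to feature scale; the notebook imports StandardScaler but trains on the raw features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data: class 0 centred at 0, class 1 centred at 3.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 1.0, size=(40, 8)),
    rng.normal(3.0, 1.0, size=(40, 8)),
])
y_train = np.array([0] * 40 + [1] * 40)

# Scaling is an added assumption; SVC(probability=True) mirrors the notebook.
clf = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
clf.fit(X_train, y_train)

# One query point at each class centre.
X_new = np.vstack([np.zeros((1, 8)), np.full((1, 8), 3.0)])
labels = clf.predict(X_new)
proba = clf.predict_proba(X_new)  # columns follow clf.classes_, i.e. [0, 1]
print(labels, proba.round(3))
```

The probabilities come from an internal cross-validated calibration, so they can occasionally disagree with predict near the decision boundary.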

Model Evaluation: Once the model is trained, it is evaluated on the testing set. The predicted labels are compared with the true labels to calculate performance metrics, including:

Accuracy
Precision
Recall
F1 score
Sensitivity (true positive rate, the same quantity as recall)
Specificity (true negative rate)
Negative Predictive Value (NPV)
Positive Predictive Value (PPV, the same quantity as precision)
Matthews Correlation Coefficient (MCC)
Cohen's Kappa Score
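Area Under the ROC Curve (AUC)

Metrics Storage: The metrics, together with the confusion-matrix counts (TP, TN, FP, FN) and the number of necrosis and non-necrosis samples in the test set, are saved to a Final_Model.csv file in the batch's subfolder of the designated directory (save_drive_dir). NPV and PPV are computed but are not written as separate columns.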
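The rate-style metrics in the list above all derive from the four confusion-matrix counts. The sketch below shows one way to compute them; the helper names safe_div and binary_rates are hypothetical, and the labels are toy values rather than results from this pipeline. Passing labels=[0, 1] keeps the matrix 2x2 even when a test set contains a single class.

```python
from sklearn.metrics import confusion_matrix

def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def binary_rates(y_true, y_pred):
    """Confusion-matrix counts and the rates derived from them (positive class = 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tn, fp, fn, tp = int(tn), int(fp), int(fn), int(tp)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": safe_div(tp, tp + fn),  # same as recall
        "specificity": safe_div(tn, tn + fp),
        "PPV": safe_div(tp, tp + fp),          # same as precision
        "NPV": safe_div(tn, tn + fn),
    }

# Toy labels: 3 TN, 1 FP, 1 FN, 3 TP.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
rates = binary_rates(y_true, y_pred)
print(rates)
```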

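Batch Processing: The script processes each batch separately: it loads that batch's CSV files, trains a model, and saves the results in a per-batch folder. The loop shown in this notebook runs batches 16 to 19, while the final aggregation cell collects the results of batches 1 to 15 into a single Excel file.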
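Since each batch reads files from a Batch_N folder, a missing file stops the loop partway through. A small pre-check, sketched below, lists which batches are ready before any training starts. The helper name find_batch_files is hypothetical, and the temporary directory only stands in for the real results folder.

```python
import os
import tempfile

def find_batch_files(results_dir, batch_ids, filename):
    """Split batch ids into those whose file exists and those whose file is missing."""
    found, missing = {}, []
    for b in batch_ids:
        name = f"Batch_{b}"
        path = os.path.join(results_dir, name, filename)
        if os.path.exists(path):
            found[name] = path
        else:
            missing.append(name)
    return found, missing

# Demonstration on a throwaway directory containing only Batch_16 and Batch_17.
with tempfile.TemporaryDirectory() as tmp:
    for b in (16, 17):
        os.makedirs(os.path.join(tmp, f"Batch_{b}"))
        open(os.path.join(tmp, f"Batch_{b}", "XceptionNet2048_Training.csv"), "w").close()
    found, missing = find_batch_files(tmp, range(16, 20), "XceptionNet2048_Training.csv")

print(sorted(found), missing)
```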

In summary, the code uses XceptionNet2048 features, from both original and segmented images, as input to an SVM classifier. The performance metrics are calculated and saved for each batch, showing how well the model classifies necrosis in images and allowing results to be compared across batches.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import shutil
import os

In [2]:
# Standard library
import glob
import os
import shutil
import warnings
import zipfile
from pathlib import Path

# Arrays, tables, plotting, and image I/O
import cv2
import joblib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.pyplot as plot
import seaborn as sns
# from google.colab.patches import cv2_imshow
import skimage.feature as feature
from skimage.exposure import histogram
from skimage.io import imread, imshow, imsave
from skimage.transform import resize

# scikit-learn
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, recall_score, f1_score, precision_score,
                             cohen_kappa_score, matthews_corrcoef, roc_auc_score,
                             confusion_matrix)
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# TensorFlow / Keras
# %tensorflow_version 2.x
import keras
import tensorflow
import tensorflow as tf
import tensorflow.keras.backend as K
import tensorflow_datasets as tfds
import tensorflow_hub as hub
from tensorflow.keras import layers
from tensorflow.keras.applications import DenseNet169,Xception,MobileNet,ResNet50,DenseNet121,EfficientNetB0,VGG16,MobileNetV2,ResNet101,InceptionResNetV2,InceptionV3,NASNetMobile
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, BatchNormalization
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import SGD, Adam, RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# import imgaug as ia
# from imgaug import augmenters as iaa

warnings.filterwarnings("ignore")
tf.config.run_functions_eagerly(True)

2025-12-02 23:40:06.704832: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-12-02 23:40:06.786074: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-12-02 23:40:08.593719: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


In [3]:
# Local base directories (replace Colab drive paths with local repository paths)
import os
BASE_DIR = '/home/llmPathoUser/pathologyStudentsAug25/pathologyStudentsAug25'
BASE_RESULTS_DIR = os.path.join(BASE_DIR, 'Results')
os.makedirs(BASE_RESULTS_DIR, exist_ok=True)


In [4]:
Batches=[1,2,3,4,5,6,7]

In [4]:
def XceptionNet2048_SVM_Segmented_Original(train_df_ori,train_df_seg,test_df_ori,test_df_seg,Batch):
  save_drive_dir = os.path.join(BASE_RESULTS_DIR, 'Final_Model')

  train_df_seg = train_df_seg.rename(columns=lambda c: f"seg_{c}" if c.startswith("feature_") else c)
  test_df_seg  = test_df_seg.rename(columns=lambda c: f"seg_{c}"  if c.startswith("feature_") else c)

  final_training_df=pd.concat([train_df_ori,train_df_seg],axis=1)
  final_testing_df=pd.concat([test_df_ori,test_df_seg],axis=1)

  final_training_df = final_training_df.loc[:, ~final_training_df.columns.duplicated(keep='last')]
  final_testing_df = final_testing_df.loc[:, ~final_testing_df.columns.duplicated(keep='last')]

  print('Size of Training Set is', final_training_df.shape)
  print('Size of Testing Set is', final_testing_df.shape)

  print(final_training_df)
  print(final_testing_df)

  final_training_df=final_training_df.drop(['Image_Name'],axis=1)
  final_testing_df=final_testing_df.drop(['Image_Name'],axis=1)

  X_train=final_training_df.iloc[:,:-1].values
  Y_train=final_training_df.iloc[:,-1].values

  X_test=final_testing_df.iloc[:,:-1].values
  Y_test=final_testing_df.iloc[:,-1].values

  model=SVC(probability=True)
  model.fit(X_train,Y_train)
  
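  # Save the trained model; create the batch folder first so the dump cannot fail on a missing path
  os.makedirs(os.path.join(save_drive_dir, Batch), exist_ok=True)
  model_save_path = os.path.join(save_drive_dir, Batch, 'svm_model.joblib')
  joblib.dump(model, model_save_path)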
  print(f"Model saved to: {model_save_path}")

  y_pred=model.predict(X_test)
  probability=model.predict_proba(X_test)
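  # Fix the label order so cm is always 2x2 (rows: true class, columns: predicted class)
  cm= confusion_matrix(Y_test, y_pred, labels=[0,1])
  TN=cm[0][0]
  TP=cm[1][1]
  FP=cm[0][1]
  FN=cm[1][0]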
  precision=precision_score(Y_test,y_pred)
  recall=recall_score(Y_test,y_pred)
  f1score=f1_score(Y_test,y_pred)
  accuracy=accuracy_score(Y_test,y_pred)
  sensitivity=TP/(TP+FN)
  specificity=TN/(FP+TN)
  NPV=TN/(TN+FN)
  PPV=TP/(TP+FP)
  mcc=matthews_corrcoef(Y_test,y_pred)
  kappa_score=cohen_kappa_score(Y_test,y_pred)
  auc_score = roc_auc_score(Y_test, probability[:,1])
  columns_metrics=['model_Name','Necrosis','Non_Necrosis','TP','TN','FP','FN','Accuracy','Recall','F1_score','Precision','AUC_ROC','Sensitivity','Specificity','MCC','Kappa_Score']

  dff_metrics=pd.DataFrame(columns=columns_metrics)

  Necrosis_samples=final_testing_df[final_testing_df['label']==1].shape[0]
  Non_Necrosis_samples=final_testing_df[final_testing_df['label']==0].shape[0]

  values=['Final_Model',Necrosis_samples,Non_Necrosis_samples,TP,TN,FP,FN,accuracy,recall,f1score,precision,auc_score,sensitivity,specificity,mcc,kappa_score]
  dff_metrics.loc[0]=values
  os.makedirs(os.path.join(save_drive_dir, Batch), exist_ok=True)
  dff_metrics.to_csv(os.path.join(save_drive_dir, Batch, 'Final_Model.csv'))
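
A model written with joblib.dump, as in the function above, can be restored later with joblib.load and used for prediction without retraining. The sketch below shows the round trip on synthetic data; the temporary directory stands in for the real batch folder, and the file name svm_model.joblib is the only detail taken from the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (not pipeline data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 4)),
    rng.normal(4.0, 1.0, size=(30, 4)),
])
y = np.array([0] * 30 + [1] * 30)

model = SVC(probability=True, random_state=0).fit(X, y)

# Save and reload the fitted model.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(model.predict(X), restored.predict(X))
print("Predictions identical after reload:", same)
```

The restored model expects feature columns in the same order as at training time, and loading is safest with the same scikit-learn version that wrote the file.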

In [5]:
for x in range(16,20):
  batch_size = 32
  Batch_Name='Batch_'+str(x)
  print('Batch Name',Batch_Name)
  save_drive_dir = os.path.join(BASE_RESULTS_DIR, 'Final_Model')
  os.makedirs(os.path.join(save_drive_dir, Batch_Name), exist_ok=True)

  training_original = pd.read_csv(os.path.join(BASE_RESULTS_DIR, 'XceptionNet2048+SVM', Batch_Name, 'XceptionNet2048_Training.csv'))
  testing_original = pd.read_csv(os.path.join(BASE_RESULTS_DIR, 'XceptionNet2048+SVM', Batch_Name, 'XceptionNet2048_Testing.csv'))

  segmented_training = pd.read_csv(os.path.join(BASE_RESULTS_DIR, 'Segmented_unett_XceptionNet2048+SVM', Batch_Name, 'Segmented_XceptionNet2048_Training.csv'))
  segmented_testing = pd.read_csv(os.path.join(BASE_RESULTS_DIR, 'Segmented_unett_XceptionNet2048+SVM', Batch_Name, 'Segmented_XceptionNet2048_Testing.csv'))

  XceptionNet2048_SVM_Segmented_Original(training_original,segmented_training,testing_original,segmented_testing,Batch_Name)

Batch Name Batch_16
Size of Training Set is (4586, 4098)
Size of Training Set is (3514, 4098)
      feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  \
0      0.000000   0.060937   0.000000   0.000000        0.0        0.0   
1      0.000000   0.000000   0.000000   0.000000        0.0        0.0   
2      0.000000   0.000000   0.000000   0.000000        0.0        0.0   
3      0.000000   0.025690   0.000000   0.000000        0.0        0.0   
4      0.000000   0.225065   0.000000   0.000000        0.0        0.0   
...         ...        ...        ...        ...        ...        ...   
4581   0.430963   0.432432   0.265586   0.362980        0.0        0.0   
4582   0.464536   0.356456   0.384729   0.380096        0.0        0.0   
4583   0.313136   0.244241   0.319663   0.482630        0.0        0.0   
4584   0.495823   0.485285   0.377986   0.109761        0.0        0.0   
4585   0.366865   0.687881   0.570664   0.000000        0.0        0.0   

      feature_6  

In [18]:
# Aggregate Final_Model.csv from all batches into a single Excel file
import os
import glob
import pandas as pd

# results directory (expects BASE_RESULTS_DIR to be defined earlier in the notebook)
results_dir = os.path.join(BASE_RESULTS_DIR, 'Final_Model')
pattern = os.path.join(results_dir, 'Batch_*', 'Final_Model.csv')

files = sorted(glob.glob(pattern))
if not files:
    print(f'No Final_Model.csv files found in: {os.path.join(results_dir, "Batch_*")}.\nMake sure the Results/Final_Model/Batch_X/Final_Model.csv files exist and that BASE_RESULTS_DIR is set correctly.')
else:
    dfs = []
    missing_batches = []
    # prefer explicit batch order (Batch_1 .. Batch_15)
    for b in range(1, 16):
        batch_name = f'Batch_{b}'
        file_path = os.path.join(results_dir, batch_name, 'Final_Model.csv')
        if os.path.exists(file_path):
            try:
                # The per-batch CSV was written with its index, so read it back as the index
                df = pd.read_csv(file_path, index_col=0)
                df['Batch'] = batch_name
                dfs.append(df)
            except Exception as e:
                print(f'Failed to read {file_path}: {e}')
        else:
            missing_batches.append(batch_name)

    if not dfs:
        print('No CSV files were read successfully. Exiting.')
    else:
        aggregated = pd.concat(dfs, ignore_index=True)
        out_path = os.path.join(results_dir, 'All_Batches_Final_Model_unett6.xlsx')
        try:
            aggregated.to_excel(out_path, index=False)
            print(f'Wrote aggregated Excel to: {out_path} (rows={len(aggregated)})')
            if missing_batches:
                print('Note: the following batches were missing or had no Final_Model.csv:', missing_batches)
        except Exception as e:
            print(f'Failed to write Excel file: {e}')


Wrote aggregated Excel to: /home/llmPathoUser/pathologyStudentsAug25/pathologyStudentsAug25/Results/Final_Model/All_Batches_Final_Model_unett6.xlsx (rows=15)
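
Once the per-batch rows are in one table, a summary across batches is a natural next step. The sketch below computes the mean and standard deviation of selected metric columns. The numbers are invented placeholders, not results from this pipeline, and only the column names follow the saved CSV layout.

```python
import pandas as pd

# Placeholder values standing in for the aggregated per-batch table.
aggregated = pd.DataFrame({
    "Batch": ["Batch_1", "Batch_2", "Batch_3"],
    "Accuracy": [0.8, 0.9, 1.0],
    "AUC_ROC": [0.85, 0.90, 0.95],
})

# Mean and sample standard deviation per metric across batches.
metric_cols = ["Accuracy", "AUC_ROC"]
summary = aggregated[metric_cols].agg(["mean", "std"])
print(summary)
```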
