# Evaluation

In this notebook, we will compare the ML pipeline from notebook 3 with the Deep Learning pipeline from notebook 4.

---

## Ensuring Fair Evaluation

The code below sets the CPU affinity of the notebook process to core 0, so the process and every thread it spawns run on a single CPU core. Pinning both pipelines to the same core gives a controlled basis for comparison: neither method can gain an advantage by spreading work across more cores than the other.

In [1]:
import psutil
import os

# Set CPU affinity to core 0 (the first core)
p = psutil.Process(os.getpid())
p.cpu_affinity([0])
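
`psutil`'s `cpu_affinity` is not available on every platform (macOS, for example, does not support it), and core 0 may fall outside the set of cores a container is allowed to use. The following is a minimal sketch of a more defensive variant that uses only the standard library. The helper name `pin_to_single_core` is hypothetical and not part of the original notebook.

```python
import os

def pin_to_single_core(preferred_core=0):
    """Pin the current process to one core.

    Returns the resulting set of allowed cores, or None when the platform
    does not expose os.sched_setaffinity (e.g. macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    allowed = os.sched_getaffinity(0)  # 0 means "the current process"
    # Fall back to the lowest allowed core if the preferred one is unavailable
    core = preferred_core if preferred_core in allowed else min(allowed)
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

pinned = pin_to_single_core()
print("Pinned to:", pinned if pinned is not None else "not supported on this platform")
```

Reading the affinity back after setting it confirms that the restriction actually took effect before any timing is recorded.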

---

## Pipeline Evaluation

In this step, we define an evaluation strategy that assesses a machine-learning pipeline on both predictive performance and resource usage. Here’s what we’re trying to achieve:

1. **Performance Metrics Evaluation**:
    - **Accuracy and F1 Score**: Measure the model’s ability to correctly classify images.
    - **Confusion Matrix Analysis**: Identify specific classes where the model may be underperforming.
2. **Resource Utilization Assessment**:
    - **Evaluation Time**: Determine the total time taken for preprocessing and prediction.
    - **Pipeline Size**: Calculate the combined size of the trained model and preprocessing steps to understand storage requirements.
    - **Memory Consumption**: Monitor peak memory usage during evaluation to ensure it fits within hardware constraints.
    - **CPU Usage**: Measure average CPU utilization to evaluate computational efficiency.

The objective is not only high classification accuracy but also a clear picture of the pipeline’s computational cost. Measuring both shows whether a model is practical to deploy as well as effective, especially in environments with limited computational resources.
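
The notebook does not show how `evaluate_pipeline` collects its resource figures. As an illustration only, the sketch below measures wall-clock time and peak memory around an arbitrary call using the standard library. The helper `measure_resources` is hypothetical, and `tracemalloc` only tracks allocations made through Python's allocator, so its numbers will generally differ from process-level memory readings.

```python
import time
import tracemalloc

def measure_resources(fn, *args, **kwargs):
    """Run fn and report its wall-clock time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    usage = {
        "evaluation_time": elapsed,                    # seconds
        "peak_memory_usage": peak_bytes / 1024 ** 2,   # MB
    }
    return result, usage

# Toy workload standing in for "preprocess + predict"
def toy_workload(n):
    return sum(x * x for x in list(range(n)))

value, usage = measure_resources(toy_workload, 100_000)
print(f"evaluation_time: {usage['evaluation_time']:.4f} seconds")
print(f"peak_memory_usage: {usage['peak_memory_usage']:.2f} MB")
```

Wrapping preprocessing and prediction in a single call like this is what allows both stages to be counted in the same measurement.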


In [2]:
# evaluate_pipeline is a helper from this project's source package (not a library built-in).
# It preprocesses the test data, runs prediction, and computes the evaluation metrics.
from source.pre import evaluate_pipeline

**Inputs:**
- **model**: The trained machine learning model to evaluate.
- **X_test_raw**: Raw test data that needs to be preprocessed before evaluation.
- **y_test**: True labels corresponding to the test data for performance comparison.
- **preprocessing_fn**: A function used to preprocess the raw test data.
    
**Outputs:**

- **metrics**: A dictionary containing various evaluation metrics like accuracy, F1 score, evaluation time, memory usage, CPU usage, and pipeline size.
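
How `evaluate_pipeline` computes the pipeline size is not shown here. One plausible approach, sketched below under that assumption, is to serialize each component with `pickle` and sum the byte counts. The helper `pickled_size_mb` and the toy model are hypothetical.

```python
import pickle

def pickled_size_mb(*objects):
    """Return the combined pickled size of the given objects in MB."""
    total_bytes = sum(len(pickle.dumps(obj)) for obj in objects)
    return total_bytes / (1024 ** 2)

# Toy stand-in for a trained model: a dictionary of parameters
toy_model = {"weights": [0.0] * 10_000, "bias": 0.0}

size_mb = pickled_size_mb(toy_model)
print(f"pipeline_size: {size_mb:.2f} MB")
```

A module-level function is pickled by reference (its qualified name) rather than by its code, so under this approach a preprocessing function would contribute only a few bytes to the total.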

---

### Preprocessing (Testing data)

Applying the same preprocessing to the test data as was applied to the training data keeps the feature representation consistent: the model receives inputs with the shape and scale it was trained on, which avoids data-mismatch errors and makes the measured metrics meaningful. Gathering all preprocessing steps into a single function also lets the evaluation function include preprocessing in its measurements of execution time and resource usage, so the whole pipeline is assessed rather than the model alone.
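
The idea of gathering several steps into one callable can be sketched as follows. The helper `compose_steps` and the toy steps are hypothetical and operate on plain lists so the example has no dependencies; the notebook's own preprocessing function is written out by hand instead.

```python
from functools import reduce

def compose_steps(*steps):
    """Return one function that applies the given steps from left to right."""
    def pipeline(X):
        return reduce(lambda data, step: step(data), steps, X)
    return pipeline

# Toy stand-ins for real preprocessing steps
def scale(values):
    return [v / 255.0 for v in values]

def clip(values):
    return [min(max(v, 0.0), 1.0) for v in values]

preprocess = compose_steps(scale, clip)
result = preprocess([0, 127.5, 255, 510])
print(result)
```

Because `preprocess` is a single function, it can be passed as the `preprocessing_fn` argument and timed as one unit.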

### ML method Preprocessing

In [3]:
def preprocessing_fn_ML(X):
    """Apply the same preprocessing steps implemented in Notebook 3."""
    import numpy as np  # imported here so the function does not depend on cell order
    from skimage.color import rgb2gray
    from skimage.transform import resize

    # Normalize the data to [0, 1]
    X_pre = X.astype('float32') / 255.0
    
    # Convert to grayscale
    X_pre = np.array([rgb2gray(image) for image in X_pre])
    
    # Resize images to 64x64 pixels
    X_pre = np.array([resize(image, (64, 64), anti_aliasing=True) for image in X_pre])
    
    # Flatten the images
    num_samples = X_pre.shape[0]
    X_pre = X_pre.reshape(num_samples, -1)
    
    return X_pre

---

### ML Evaluation

#### Import data and ML model

In [4]:
import numpy as np
# first let us load the testing data
test_images = np.load('test_images.npy')      # Load image test data
test_labels = np.load('test_labels.npy')      # Load label test data

In [5]:
import pickle
# Load the ml model from the 3rd notebook
with open('sgd_model.pkl', 'rb') as file:
    sgd_model = pickle.load(file)

In [6]:
# evaluate_pipeline expects:
# - a trained model (here 'sgd_model')
# - raw test data (here 'test_images')
# - true labels (here 'test_labels')
# - a single function that applies all preprocessing steps

# Evaluate the pipeline
metrics = evaluate_pipeline(sgd_model, test_images, test_labels, preprocessing_fn_ML)

# Print the evaluation metrics
print("Evaluation Metrics:")
for key, value in metrics.items():
    if key == 'evaluation_time':
        print(f"{key}: {value:.2f} seconds")
    elif key == 'pipeline_size':
        print(f"{key}: {value:.2f} MB")
    elif key == 'peak_memory_usage':
        print(f"{key}: {value:.2f} MB")
    elif key == 'average_cpu_usage':
        print(f"{key}: {value:.2f}%")
    elif key == 'confusion_matrix':
        print(key)
        print(value) 
    else:
        print(f"{key}: {value:.4f}")

(2400,)
(2400,)
Evaluation Metrics:
evaluation_time: 31.41 seconds
peak_memory_usage: 13706.14 MB
average_cpu_usage: 99.64%
accuracy: 0.4317
f1_score: 0.3779
confusion_matrix
[[ 74   2  42  52 357]
 [  0 145   1   7   0]
 [ 32   0  76  38 135]
 [ 66   0  33  49 394]
 [ 92   0  54  59 692]]
pipeline_size: 0.08 MB
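
The reported accuracy can be cross-checked directly from the confusion matrix above. The sketch below assumes rows are true classes and columns are predicted classes (the scikit-learn convention). That convention is an assumption about `evaluate_pipeline`, although it is consistent with the row sums being identical for both models in this notebook.

```python
# Confusion matrix copied from the SGD model's output above
confusion = [
    [74, 2, 42, 52, 357],
    [0, 145, 1, 7, 0],
    [32, 0, 76, 38, 135],
    [66, 0, 33, 49, 394],
    [92, 0, 54, 59, 692],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = correct / total

# Per-class recall: correct predictions divided by the number of true samples
recalls = [confusion[i][i] / sum(confusion[i]) for i in range(len(confusion))]

print(f"samples: {total}, correct: {correct}, accuracy: {accuracy:.4f}")
for label, recall in enumerate(recalls):
    print(f"class {label}: recall {recall:.4f}")
```

The recomputed accuracy (1036 correct out of 2400) matches the reported 0.4317. The per-class recalls show that most of the error comes from classes 0, 2, and 3, which are frequently predicted as class 4.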


---

### CubeSatNet_CNN Evaluation

#### Preprocessing

In [7]:
def preprocessing_fn_CNN(X):
    # Notebook 4 applied no preprocessing, so the raw images are returned unchanged
    return X

#### Import CNN model

In [8]:
# Load the CNN model from the 4th notebook
with open('cnn_model.pkl', 'rb') as file:
    cnn_model = pickle.load(file)

2024-10-17 18:49:54.773744: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-10-17 18:49:54.779791: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-10-17 18:49:54.794494: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-17 18:49:54.815749: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-17 18:49:54.822115: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-17 18:49:54.841258: I tensorflow/core/platform/cpu_feature_gu

#### Evaluate the CNN pipeline

In [9]:
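from keras.utils import to_categorical

# One-hot encode the labels under a new name, so that re-running this cell
# does not try to encode labels that are already one-hot encoded
test_labels_onehot = to_categorical(test_labels, num_classes=5)

# Evaluate the pipeline
metrics = evaluate_pipeline(cnn_model, test_images, test_labels_onehot, preprocessing_fn_CNN)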

# Print the evaluation metrics
print("Evaluation Metrics:")
for key, value in metrics.items():
    if key == 'evaluation_time':
        print(f"{key}: {value:.2f} seconds")
    elif key == 'pipeline_size':
        print(f"{key}: {value:.2f} MB")
    elif key == 'peak_memory_usage':
        print(f"{key}: {value:.2f} MB")
    elif key == 'average_cpu_usage':
        print(f"{key}: {value:.2f}%")
    elif key == 'confusion_matrix':
        print(key)
        print(value) 
    else:
        print(f"{key}: {value:.4f}")

75/75 ━━━━━━━━━━━━━━━━━━━━ 232s 3s/step
(2400,)
(2400,)
Evaluation Metrics:
evaluation_time: 264.08 seconds
peak_memory_usage: 6542.15 MB
average_cpu_usage: 88.56%
accuracy: 0.9979
f1_score: 0.9979
confusion_matrix
[[524   0   0   3   0]
 [  0 153   0   0   0]
 [  0   0 281   0   0]
 [  2   0   0 540   0]
 [  0   0   0   0 897]]
pipeline_size: 1.16 MB


---

### Comparison between the two models
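
As a starting point, the figures reported in the two evaluation outputs above can be placed side by side. The sketch below copies those values by hand and computes the CNN-to-SGD ratio for each metric. The dictionary layout and the `ratios` name are illustrative only.

```python
# Values copied from the evaluation outputs earlier in this notebook
sgd_metrics = {
    "accuracy": 0.4317,
    "f1_score": 0.3779,
    "evaluation_time": 31.41,        # seconds
    "peak_memory_usage": 13706.14,   # MB
    "average_cpu_usage": 99.64,      # percent
    "pipeline_size": 0.08,           # MB
}
cnn_metrics = {
    "accuracy": 0.9979,
    "f1_score": 0.9979,
    "evaluation_time": 264.08,
    "peak_memory_usage": 6542.15,
    "average_cpu_usage": 88.56,
    "pipeline_size": 1.16,
}

# Ratio of the CNN value to the SGD value for each metric
ratios = {key: cnn_metrics[key] / sgd_metrics[key] for key in sgd_metrics}

print(f"{'metric':<20}{'SGD':>12}{'CNN':>12}{'CNN/SGD':>10}")
for key in sgd_metrics:
    print(f"{key:<20}{sgd_metrics[key]:>12.4f}{cnn_metrics[key]:>12.4f}{ratios[key]:>10.2f}")
```

On these numbers, the CNN is far more accurate (0.9979 versus 0.4317) but takes roughly 8.4 times longer to evaluate and has a larger pipeline size (1.16 MB versus 0.08 MB), while its reported peak memory usage is a little under half that of the SGD pipeline.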