# Team Viviane Solomon and Brandon Bonifacio

# HW6: Life Cycle of ML Model

Your goal is to develop a CNN model that, given a cell phone image
taken somewhere inside a building on HMC campus, can identify which building is being
photographed. You may use any online resources that you find helpful, but you must cite your
sources and indicate clearly what portions of your code have been copied and modified from
elsewhere. You may work individually or with a partner on this assignment. Please submit your
assignment as a single jupyter notebook on Sakai. If you work with a partner, make sure to
indicate both partners’ names clearly at the very top of your notebook.

## Part 1: Data Collection/Preparation

In the first part of the assignment, you will do the following:


• Data Collection (15 points). Each student/team will collect 50 cell phone pictures taken
of random locations inside a single building on campus. Please sign up for a building to
photograph in this spreadsheet, and upload your pictures to this shared google drive.
Since the data will be used by the entire class, please complete this portion of the
assignment by Saturday 1pm (-5 points if not done by then). If there are more than 5
buildings represented in the class data, you may simply select 5 buildings to use for this
assignment.


• Data Preparation (15 points). Download the class data onto your laptop. Prepare the
data for use in PyTorch by ensuring image format compatibility, putting the images in a
suitable directory structure, and creating train & validation partitions. Describe your
data preparation process in your notebook.
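
The directory-structure and partitioning step described above could be sketched as follows. This is a minimal illustration, not our final pipeline: the function name `split_into_train_val`, the 80/20 split, and the list of accepted file suffixes are all assumptions. It produces a `train/<building>/` and `val/<building>/` layout, which is the one-folder-per-class structure that `torchvision.datasets.ImageFolder` reads. Phone photos in other formats (for example HEIC) would need converting first, which this sketch does not do.

```python
import random
import shutil
from pathlib import Path

# Assumption: the class data has already been converted to one of these formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_into_train_val(raw_dir, out_dir, val_fraction=0.2, seed=0):
    """Copy raw_dir/<building>/* into out_dir/train/<building>/ and out_dir/val/<building>/.

    Returns a dict mapping each building name to (n_train, n_val).
    """
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    counts = {}
    for class_dir in sorted(p for p in raw_dir.iterdir() if p.is_dir()):
        images = sorted(
            f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(images)
        n_val = int(round(len(images) * val_fraction))
        partitions = {"val": images[:n_val], "train": images[n_val:]}
        for split_name, files in partitions.items():
            dest = out_dir / split_name / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, dest / f.name)
        counts[class_dir.name] = (len(images) - n_val, n_val)
    return counts
```

Splitting per building (rather than over the pooled images) keeps the class balance the same in both partitions.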

In [1]:
#Import Statements
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from torchvision import datasets
import pandas as pd

### In the cells below, we prepare the data for use in PyTorch by ensuring image format compatibility, putting the images in a suitable directory structure, and creating train & validation partitions. We describe our data preparation process in the markdown cell below, and then implement it in the code cells that follow.

***Input description here***

In [None]:
def load_prepare_data():
    """
    This function loads and partitions our image data.
    """
    #EXAMPLE MNIST CODE:
    mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)
    # Flatten each 28x28 image into a 784-vector and scale pixel values to [0, 1]
    X_train = np.array(mnist_trainset.data.reshape(len(mnist_trainset), -1), dtype = np.float64)/255
    Y_train = pd.get_dummies(np.array(mnist_trainset.targets)).to_numpy(dtype = np.float64) # one-hot encoding

    # The MNIST test split stands in for our validation partition
    mnist_valset = datasets.MNIST(root='./data', train=False, download=True, transform=None)
    X_val = np.array(mnist_valset.data.reshape(len(mnist_valset), -1), dtype = np.float64)/255
    Y_val = pd.get_dummies(np.array(mnist_valset.targets)).to_numpy(dtype = np.float64) # one-hot encoding
    
    return X_train, Y_train, X_val, Y_val

In [None]:
X_train, Y_train, X_val, Y_val = load_prepare_data()

In [None]:
sample_img = X_train[0,:].reshape((28,28))
plt.imshow(sample_img, cmap='gray')

## Part 2: Nearest Neighbors Approach

In the second part of the assignment, you will do the following:

• Feature extraction (15 points). Find a pretrained CNN model (e.g. ResNet) and use the
penultimate layer activations as a feature representation. Your jupyter notebook should
include code that demonstrates how to use the pretrained model to extract features
from an image in the dataset.


• Nearest Neighbor Method (15 points). Extract features from all the images in the
training set and store them in a single file along with the building labels. For each image
in the validation set, use the pretrained CNN model to extract the feature
representation, calculate which training image is closest in Euclidean distance, and use
its label as the prediction. Report your classification accuracy.

In [None]:
def load_resnet():
    """
    
    Loads a pretrained ResNet model. 
    
    """
    
    
    model = ##
    
    return model

In [None]:
model = load_resnet()

In [2]:
def feature_extraction(model, data):
    """
    
    Uses a pretrained model and returns the penultimate layer activations 
    as a feature representation. 
    
    This code demonstrates how to use a pretrained model to extract features from an image in the dataset
    
    """
    
    #We have 2 cases. The first case is if we have an array of images
    #The second case is if we only have one image
    
    try: #Case 1: Array of images
        m = len(data) #This will fail if it is case 2
        
        features = ##
    except: #Case 2: Single image
        
        
        features = ##
        
    return features
        
    
    
    

SyntaxError: invalid syntax (62451961.py, line 17)

In [None]:
X_train_features = feature_extraction(model, X_train)
X_val_features = feature_extraction(model, X_val)

In [None]:
def nearest_neighbors(X_train_features, Y_train, data_features):
    """
    
    Here, we calculate which training image is closest in Euclidean distance to each given validation image,
    and we use that training image's label as the prediction for the validation image. 
    
    """
    
    try: #Case 1: Array of images
        m = len(data_features) #This will fail if it is case 2
        
        labels = ##
    except: #Case 2: Single image
        
        
        labels = ##
    
    
    
    return labels

In [None]:
val_labels = nearest_neighbors(X_train_features, Y_train, X_val_features)
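
As a rough sketch of the nearest-neighbor rule, the function below works on plain NumPy feature arrays. The name `nearest_neighbor_labels` is hypothetical, and it assumes the training labels are integer class indices; with one-hot labels such as `Y_train`, `np.argmax(Y_train, axis=1)` would be applied first. It uses the identity $\|q - t\|^2 = \|q\|^2 - 2\,q \cdot t + \|t\|^2$ so that all pairwise distances come from one matrix product.

```python
import numpy as np


def nearest_neighbor_labels(train_features, train_labels, query_features):
    """Label each query with the label of its closest training feature vector."""
    train_features = np.asarray(train_features, dtype=np.float64)
    # np.atleast_2d lets a single feature vector be passed as well as a batch
    query_features = np.atleast_2d(np.asarray(query_features, dtype=np.float64))

    # Squared Euclidean distances, shape (n_queries, n_train).
    # The square root is skipped because it does not change which entry is smallest.
    sq_dists = (
        (query_features ** 2).sum(axis=1)[:, None]
        - 2.0 * query_features @ train_features.T
        + (train_features ** 2).sum(axis=1)[None, :]
    )
    nearest = sq_dists.argmin(axis=1)
    return np.asarray(train_labels)[nearest]


# Toy usage: two training points with labels 0 and 1
toy_train = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_labels = np.array([0, 1])
toy_queries = np.array([[1.0, 1.0], [9.0, 8.0]])
print(nearest_neighbor_labels(toy_train, toy_labels, toy_queries))  # prints [0 1]
```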

In [3]:
accuracy = 100*np.mean(val_labels==np.argmax(Y_val, axis=1))
print(f"Our classification accuracy using a nearest-single-neighbor approach is {accuracy:.2f}%")

NameError: name 'val_labels' is not defined

## Part 3: Fine-tuning Approach

In the third part of the assignment, you will do the following:


• Finetuning (25 points). Remove the output layer of the pretrained CNN model and
replace it with a randomly initialized output layer that classifies among the 5 buildings of
interest. Finetune the modified model on the training samples. Include your
training/validation loss curves in your notebook, along with the final validation
classification accuracy.


• Improvements (10 points). Experiment with different ways to improve the validation
accuracy. Include any results or figures to document your progress.


In [None]:
def load_random_resnet():
    """
    
    Loads a pretrained ResNet model, but with the final output layer removed and replaced with a randomly 
    initialized output layer that will classify among the 5 buildings. 
    
    """
    
    
    model = ##
    
    return model

In [None]:
part3_model = load_random_resnet()

In [None]:
def finetune(model, X_train, Y_train, X_val, Y_val):
    """
    
    Trains the given model on the X_train and Y_train data, and returns the trained model as well as the losses
    and final validation accuracy.
    
    """
    
    hist = [[], []] #History of training and validation losses
    
    
    
    #Train the model
    
    
    return model, hist, final_val_accuracy
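
A training loop of the kind `finetune` is meant to contain might look like the sketch below. It is illustrative only: it takes `DataLoader` objects rather than the `X_train`/`Y_train` arrays, the names `run_epoch` and `finetune_sketch` are hypothetical, and the Adam optimizer, learning rate, and epoch count are assumed defaults. Note that `nn.CrossEntropyLoss` is used here with integer class indices as targets, not one-hot vectors.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, loss_fn, optimizer=None):
    """One pass over loader. Trains if an optimizer is given, otherwise only evaluates.

    Returns (mean loss per sample, accuracy in percent).
    """
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            logits = model(inputs)
            loss = loss_fn(logits, targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(targets)
            correct += (logits.argmax(dim=1) == targets).sum().item()
            seen += len(targets)
    return total_loss / seen, 100.0 * correct / seen


def finetune_sketch(model, train_loader, val_loader, epochs=5, lr=1e-3):
    """Train model and return (model, hist, final validation accuracy in percent)."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    hist = [[], []]  # hist[0]: training loss per epoch, hist[1]: validation loss per epoch
    final_val_accuracy = float("nan")
    for _epoch in range(epochs):
        train_loss, _train_acc = run_epoch(model, train_loader, loss_fn, optimizer)
        val_loss, final_val_accuracy = run_epoch(model, val_loader, loss_fn)
        hist[0].append(train_loss)
        hist[1].append(val_loss)
    return model, hist, final_val_accuracy
```

The two lists in `hist` can be passed directly to `plt.plot` to draw the training and validation loss curves.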

In [None]:
part3_model, hist, final_val_accuracy = finetune(part3_model, X_train, Y_train, X_val, Y_val)
print(f"Our final validation accuracy using the full CNN is {final_val_accuracy:.2f}%")

## Below are some of the approaches we tried to improve accuracy. 

### Below we show the results of the augmentation approach. Using this method, we were able to improve the accuracy by _%. 

In [None]:
def load_prepare_data_augmentation():
    """
    This function loads and partitions our image data, and it also performs data augmentation to try to improve results. 
    """
    #EXAMPLE MNIST CODE:
    # NOTE: no augmentation is applied yet; this is still the example MNIST loading code
    mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=None)
    X_train = np.array(mnist_trainset.data.reshape(len(mnist_trainset), -1), dtype = np.float64)/255
    Y_train = pd.get_dummies(np.array(mnist_trainset.targets)).to_numpy(dtype = np.float64) # one-hot encoding

    # The MNIST test split stands in for our validation partition
    mnist_valset = datasets.MNIST(root='./data', train=False, download=True, transform=None)
    X_val = np.array(mnist_valset.data.reshape(len(mnist_valset), -1), dtype = np.float64)/255
    Y_val = pd.get_dummies(np.array(mnist_valset.targets)).to_numpy(dtype = np.float64) # one-hot encoding
    
    return X_train, Y_train, X_val, Y_val

In [None]:
X_train_aug, Y_train_aug, X_val_aug, Y_val_aug = load_prepare_data_augmentation()

In [None]:
sample_img = X_train_aug[0,:].reshape((28,28))
plt.imshow(sample_img, cmap='gray')

In [None]:
augmentation_model = load_random_resnet()

In [None]:
aug_model, hist_aug, final_val_accuracy_aug = finetune(augmentation_model, X_train_aug, Y_train_aug, X_val_aug, Y_val_aug)
print(f"Our final validation accuracy using the full CNN with data augmentation is {final_val_accuracy_aug:.2f}%")