# **Data Analysis Notebook**

This notebook refers back to ModelTraining&Testing.ipynb for details on how we processed the data, set up the runs, and chose the model. A later section covers visualizations and applications of the model.

## **Training & Validation Run Results**

### **Run 1**

#### **Description:**

For this run, we used a basic MLP model (class MLP) and no image transformations other than normalization (common_transformations). The model was trained using the entire training set and evaluated on the entire validation set.

**Note**: There was an issue with our evaluation function early in testing, which is why the evaluation accuracies and losses are missing.

#### **Loss & Accuracy:**

![image.png](attachment:image.png)

#### **Analysis:**

Training on the entire dataset took a total of 5 hours, and the training loss appeared to converge at around 3.9. After just 10 epochs, accuracy plateaued at about 10%, which is low but not unexpected with 20 unique car manufacturers (random guessing alone would score 5%). As mentioned, the evaluation function was broken, so it is impossible to tell whether the model is overfitting or underfitting.
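As a reference point for these numbers, here is a minimal sketch of the two chance baselines for a K-class problem. It assumes the training loss is standard cross-entropy with the natural log (ModelTraining&Testing.ipynb would confirm this) and K = 20 classes as stated above: uniform random guessing scores 1/K accuracy, and always predicting the uniform distribution scores a loss of ln K.

```python
import math

def chance_accuracy(num_classes: int) -> float:
    """Expected top-1 accuracy of uniform random guessing."""
    return 1.0 / num_classes

def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (natural log) of assigning probability 1/K to every class."""
    # -log(1/K) == log(K), regardless of which class is the true one
    return -math.log(1.0 / num_classes)

K = 20  # number of classes, taken from the Run 1 analysis (assumption: one class per manufacturer)
print(f"chance accuracy: {chance_accuracy(K):.1%}")            # 5.0%
print(f"uniform-guess loss: {uniform_cross_entropy(K):.2f}")  # 3.00
```

Comparing a run's accuracy and loss against these baselines is a quick sanity check on whether a model has learned anything beyond guessing.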

### **Run 2**

#### **Description:**

For this run, we used a basic MLP model (class MLP) and no image transformations other than normalization (common_transformations). The model was trained on a random 20% subset of the training set and evaluated on a random 20% subset of the validation set.
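The training notebook defines how the 20% subsets were actually drawn. As a hedged illustration only, one common way to take a reproducible random fraction of a dataset is to sample its indices with a seeded generator. The helper name `random_subset_indices`, the dataset size, and the seed below are hypothetical; in PyTorch, the resulting indices could be passed to `torch.utils.data.Subset`.

```python
import random

def random_subset_indices(dataset_size: int, fraction: float, seed: int = 0) -> list[int]:
    """Return a reproducible random subset of indices covering `fraction` of the data."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)           # seeded, so reruns draw the same subset
    k = int(dataset_size * fraction)    # e.g. 20% of the samples
    # Sample without replacement, then sort for deterministic iteration order
    return sorted(rng.sample(range(dataset_size), k))

# Hypothetical dataset size, for illustration only
train_idx = random_subset_indices(dataset_size=10_000, fraction=0.2, seed=42)
print(len(train_idx))  # 2000
```

Fixing the seed would also make repeated runs (such as Runs 2 and 3) use the same subset, removing one source of run-to-run variation.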

#### **Loss & Accuracy:**

![image.png](attachment:image.png)

#### **Analysis:**

As expected, this model performed worse than the one in Run 1, since it had less data to work with. However, it trained in only a fraction of the time (~1 hour, compared with 5 hours on the entire dataset). We did not plot the curves for this run, but the numerical values show no sign of severe overfitting.
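Reading overfitting off logged numbers, rather than a plot, can be made a little more systematic. Below is a minimal sketch of one heuristic, assuming per-epoch training and validation losses are available as lists; the function name `looks_overfit` and the tolerance value are hypothetical choices, not what the training notebook does. It flags overfitting when the validation loss has risen above its best value by more than a tolerance while the training loss kept falling.

```python
def looks_overfit(train_loss: list[float], val_loss: list[float], tol: float = 0.05) -> bool:
    """Heuristic: validation loss rose past its minimum while training loss kept improving."""
    if len(train_loss) != len(val_loss) or len(val_loss) < 2:
        raise ValueError("need matching histories with at least two epochs")
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    val_rise = val_loss[-1] - val_loss[best_epoch]        # how far val loss drifted back up
    train_drop = train_loss[best_epoch] - train_loss[-1]  # how much train loss kept falling
    return val_rise > tol and train_drop > 0

# Hypothetical loss histories, for illustration only
print(looks_overfit([2.9, 2.5, 2.2, 2.0], [3.0, 2.7, 2.6, 2.55]))  # False
print(looks_overfit([2.9, 2.0, 1.2, 0.6], [3.0, 2.6, 2.8, 3.1]))   # True
```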

### **Run 3**

#### **Description:**

This run was identical to Run 2 (a random 20% subset of the dataset).

#### **Loss & Accuracy:**

Since this run was nearly identical in setup and output, we omit the screenshots.

#### **Analysis:**

See Run 2 analysis.

### **Run 4**

#### **Description:**

For this run, we used a basic CNN model (class CNN) with image augmentations (train_trfms & val_trfms). The model was trained and evaluated on random 20% subsets of the training and validation sets.

#### **Loss & Accuracy:**

![image.png](attachment:image.png)

#### **Analysis:**

In this run, the results suggest the model is well fit, neither underfitting nor overfitting. Even when trained on only 1/5 of the dataset, the CNN performs the best so far, beating the MLP from Run 1, which was trained on the entire dataset for longer. The downside is that the CNN took a little over an hour to train, which is much longer than the MLP.

### **Run 5**

#### **Description:**

For this run, we used a basic MLP model (class MLP) with image augmentations (train_trfms & val_trfms). The model was trained and evaluated on random 20% subsets of the training and validation sets.

#### **Loss & Accuracy:**

![image.png](attachment:image.png)

#### **Analysis:**

Overall, compared with the previous MLP runs on 20% of the dataset, this MLP setup performed slightly worse, though the difference is small. The drop could be purely due to chance, since the subset was randomly selected.
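Whether a small accuracy gap between two runs is "purely due to chance" can be checked roughly with a two-proportion z-test on the validation accuracies. A minimal sketch follows, assuming both runs were scored on validation subsets of known size and treating the predictions as independent samples (only approximately true here). The function name and the accuracies and sample sizes in the example are hypothetical placeholders, not our measured values.

```python
import math

def two_proportion_p_value(acc_a: float, n_a: int, acc_b: float, n_b: int) -> float:
    """Two-sided p-value for H0: both runs have the same true accuracy."""
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)   # pooled accuracy under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / se
    # Two-sided standard-normal tail probability, P(|Z| > |z|), via the error function
    return 1 - math.erf(abs(z) / math.sqrt(2))

# Hypothetical example: 12.0% vs 11.5% accuracy on 2,000 validation images each
p = two_proportion_p_value(0.120, 2000, 0.115, 2000)
print(f"p = {p:.2f}")
```

A large p-value (for example, above 0.05) means the gap is within what sampling noise and random subset selection could plausibly produce.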

### **Run 6**

#### **Description:**

For this run, we ran the basic CNN on the entire dataset without using image augmentation. This run took a few hundred epochs (a couple of days in training time), so we split the training sessions so they were not running continuously for hours on our local devices.
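Splitting a long run across sessions requires saving and restoring the training state between them. In PyTorch this is usually done by passing a dictionary containing `model.state_dict()`, `optimizer.state_dict()`, and the epoch counter to `torch.save`, then restoring it with `torch.load` and `load_state_dict`. To keep the example self-contained, the sketch below uses `pickle` and a stand-in `train_one_epoch`; the helper names, file path, and fake loss values are hypothetical, not the training notebook's code.

```python
import os
import pickle

def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temporary file first so an interrupted save can't corrupt the checkpoint
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

def train_one_epoch(state: dict) -> float:
    # Hypothetical stand-in for a real training epoch: returns a fake, shrinking loss
    return 1.0 / (state["epoch"] + 1)

def run_session(path: str, epochs_per_session: int) -> dict:
    """Resume from `path` if it exists, train a fixed number of epochs, then save."""
    if os.path.exists(path):
        state = load_checkpoint(path)
    else:
        state = {"epoch": 0, "history": []}
    for _ in range(epochs_per_session):
        state["history"].append(train_one_epoch(state))
        state["epoch"] += 1
    save_checkpoint(path, state)
    return state
```

Calling `run_session("run6.ckpt", 10)` once per sitting (the filename is hypothetical) would continue from the last saved epoch each time, so no session has to run for days.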

#### **Loss & Accuracy:**

![Screenshot 2025-11-26 184322.png](<attachment:Screenshot 2025-11-26 184322.png>)

![Screenshot 2025-11-26 232245.png](<attachment:Screenshot 2025-11-26 232245.png>)

![Screenshot 2025-11-27 012241.png](<attachment:Screenshot 2025-11-27 012241.png>)

![Screenshot 2025-11-27 040527.png](<attachment:Screenshot 2025-11-27 040527.png>)

![Screenshot 2025-11-27 081551.png](<attachment:Screenshot 2025-11-27 081551.png>)

![Screenshot 2025-11-27 113253.png](<attachment:Screenshot 2025-11-27 113253.png>)

![Screenshot 2025-11-27 130613.png](<attachment:Screenshot 2025-11-27 130613.png>)

![Screenshot 2025-11-27 143026.png](<attachment:Screenshot 2025-11-27 143026.png>)

![Screenshot 2025-11-28 040827.png](<attachment:Screenshot 2025-11-28 040827.png>)

#### **Analysis:**

The graphs are hard to read because training was split into sessions of 10 or 20 epochs each. Even so, the model clearly performs very well on the validation set: the training and evaluation accuracies are probably the highest we have seen, and the training and evaluation losses the lowest. We stopped training here because of time constraints and because the model appeared to be converging (though another 20 or so epochs would have helped confirm this).
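Because each session's plot restarts at epoch 1, the separate curves are hard to compare. Below is a minimal sketch of stitching per-session histories onto one global epoch axis, assuming each session logged a list of per-epoch values; the function name and the sample numbers are hypothetical. The merged series could then be drawn as a single curve, for example with matplotlib.

```python
def stitch_sessions(sessions: list[list[float]]) -> tuple[list[int], list[float]]:
    """Concatenate per-session metric lists and number the epochs globally from 1."""
    epochs: list[int] = []
    values: list[float] = []
    offset = 0
    for session in sessions:
        for i, v in enumerate(session, start=1):
            epochs.append(offset + i)  # shift local epoch i by the epochs already completed
            values.append(v)
        offset += len(session)
    return epochs, values

# Hypothetical validation accuracies from three sessions of different lengths
epochs, acc = stitch_sessions([[0.20, 0.31], [0.38, 0.42, 0.45], [0.47]])
print(epochs)  # [1, 2, 3, 4, 5, 6]
```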

### **Run 7**

#### **Description:**

This was our last major run, and we saved this model for testing. It combined data augmentation with the CNN on the full dataset. Note the training time: it took almost two full days!

#### **Loss & Accuracy:**

![image.png](attachment:image.png)

#### **Analysis:**

The model here took about as long to complete 10 epochs as the Run 6 CNN took to complete a few hundred. This is because data augmentation is computationally expensive, and we applied it to the entire dataset. Even so, this run gave us the best performance, with no sign of overfitting and room to improve further.

## **Final Test Set Evaluation Result**

### **Description:**

We saved the model weights from Run 7 and loaded them into the CNN model to evaluate it on the full test set.

#### **Result Visualization:**

These results are taken from ModelTraining&Testing.ipynb. Since we couldn't display every image evaluation, the figures below show a representative sample.

Use FullResults.png.


Here are the top 20 predictions with class probabilities (excuse the formatting):

Use Top20.png.

Here's another one with correct predictions only:

Use CorrectResults.png.

And here's one with incorrect predictions only:

Use IncorrectResults.png.
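The sample figures above come down to two simple operations on the model's outputs: ranking each image's class probabilities, and splitting images by whether the top-1 prediction matches the label. The sketch below shows both, assuming the model yields one raw score (logit) per class and that class names are available in the same order; the helper names and example values are hypothetical, not taken from the training notebook. In PyTorch, `torch.softmax` and `torch.topk` would play the roles of `softmax` and `top_k` here.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs: list[float], class_names: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable (class, probability) pairs, highest first."""
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def split_by_correctness(preds: list[str], labels: list[str]) -> tuple[list[int], list[int], float]:
    """Return indices of correct and incorrect predictions, plus top-1 accuracy."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and the same length")
    correct = [i for i, (p, y) in enumerate(zip(preds, labels)) if p == y]
    incorrect = [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]
    return correct, incorrect, len(correct) / len(labels)

# Hypothetical scores for one image over four classes
names = ["Ford Fiesta", "Ford Focus", "Fiat Fiorino", "Volkswagen Jetta"]
probs = softmax([2.0, 1.5, 0.3, -1.0])
for name, prob in top_k(probs, names, k=2):
    print(f"{name}: {prob:.2f}")

# Hypothetical top-1 predictions and labels for three test images
preds = ["Ford Focus", "Ford Fiesta", "Fiat Fiorino"]
labels = ["Fiat Fiorino", "Fiat Fiorino", "Fiat Fiorino"]
correct, incorrect, acc = split_by_correctness(preds, labels)
print(correct, incorrect, round(acc, 3))  # [2] [0, 1] 0.333
```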

### **Analysis:**

Comparing the correct and incorrect results suggests that several factors influence whether the model makes a good guess. In other words, the model may be focusing on specific aspects of the images to make its predictions.

First, notice that the cars the model predicted correctly are generally photographed in natural proportions. That is, the camera does not distort the image so that certain parts of the car appear unnaturally magnified, as happens with many of the cars in the incorrect set.

Second, notice that most of the cars in the correct set have their logo in full view (most are head-on images). Compare this with the incorrect set, which contains many tilted images or views of only the side profile or back of the car. This is a strong sign that the logo is a very important feature for the model's predictions. In fact, look at the Volkswagen Jetta in the incorrect set: it is arguably a good-quality picture with the logo clearly visible, and the model correctly predicts the Volkswagen make, just not the model. This supports our hypothesis that the logo is a key predictive feature for the model.

Third, notice the apparent distance between the camera and the car in the correct set compared with the incorrect set. In the incorrect set, many of the cameras are either too close, causing distortion, or too far away, making it hard even for a human to make out the brand without enlarging the image.

Fourth, notice that in the first row of the incorrect set, the model twice predicts a Ford Fiesta or Ford Focus when the actual label is Fiat Fiorino. If you search for images of the Ford Fiesta and Ford Focus, both look quite similar to the Fiat Fiorino when viewed from the same angle as these images. This indicates that the model has yet to learn the finer-grained features that distinguish similar-looking car models.
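Confusions like the Fiat Fiorino being mistaken for a Ford Fiesta or Focus can be surfaced systematically by counting (true, predicted) pairs over all misclassified test images instead of spotting them in a sample figure. A minimal sketch, assuming parallel lists of true and predicted labels; `most_confused` and the example data are hypothetical. Pairs that recur often point to classes that may need more fine-grained features or more training examples.

```python
from collections import Counter

def most_confused(labels: list[str], preds: list[str], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Count (true, predicted) pairs among misclassifications, most frequent first."""
    errors = Counter((y, p) for y, p in zip(labels, preds) if y != p)
    return errors.most_common(n)

# Hypothetical labels and predictions, for illustration only
labels = ["Fiat Fiorino", "Fiat Fiorino", "Fiat Fiorino", "Ford Focus"]
preds = ["Ford Fiesta", "Ford Focus", "Ford Fiesta", "Ford Focus"]
print(most_confused(labels, preds, n=2))
# [(('Fiat Fiorino', 'Ford Fiesta'), 2), (('Fiat Fiorino', 'Ford Focus'), 1)]
```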

There are many other features that could contribute significantly to better predictive performance, such as lighting, background, body color, body shape, etc. These are all features that can be further investigated in the future should we continue this project.

## **Applications:**

Car image classifiers are widely used across many fields. For example, they can be combined with traffic cameras, in-car cameras (like those in a Tesla), security cameras, and so on to automatically recognize vehicle types, makes, models, colors, or damage. They can support automated traffic monitoring, where vehicles are identified for tolling purposes (some vehicles may pay more, others less). In law enforcement, they can flag suspicious vehicles (for example, when a person of interest is known to drive a certain make and model). Even in the insurance sector, a car classifier could be used to rapidly assess vehicle condition and generate insurance statements. Finally, these models could help a place like a car dealership keep track of new inventory.

## **Conclusions**

Based on our results, the CNN performs much better than the MLP on our dataset. The CNN trained on only 20% of the dataset outperformed the MLP trained on the entire dataset, as shown by the accuracy and loss plots. The main advantage of the MLP over the CNN is that it trains significantly faster (3 to 5 times faster). Even so, we recommend the CNN for image classification, since the extra training time yields much better results.