# Cars Unlimited

## Recommended Resources

- [Geoprocessing with Python][1] by Chris Garrard
- [Machine Learning Bookcamp][2] by Alexey Grigorev
- [Machine Learning, Data Science and Deep Learning with Python][3] by Frank Kane
- [Machine Learning with TensorFlow][4] by Nishant Shukla
- [3Blue1Brown: Neural Networks][5] by Grant Sanderson
- [Deep Learning with JavaScript][6] by Shanqing Cai, Stanley Bileschi, Eric D. Nielsen, and Francois Chollet

## 1.1 Data Preprocessing and Visualization

### Step 1. Download the images for this task

- See <https://www.kaggle.com/ajaykgp12/cars-wagonr-swift>

### Step 2. Walk each class (folder) and list the images it contains.

- Use Python to walk through the class folders and collect the image paths.
- Count the entries in each list to get the number of images per class.

[1]: https://livebook.manning.com/book/geoprocessing-with-python/about-this-book/
[2]: https://livebook.manning.com/book/machine-learning-bookcamp/welcome/v-7/
[3]: https://livevideo.manning.com/module/92_1_1/machine-learning-data-science-and-deep-learning-with-python/getting-started/introduction?
[4]: https://livebook.manning.com/book/machine-learning-with-tensorflow
[5]: https://livevideo.manning.com/module/97_1_1/3blue1brown-neural-networks/neural-networks/but-what-is-a-neural-network%3f?
[6]: https://livebook.manning.com/book/deep-learning-with-javascript/about-this-book/

In [None]:
import os
import random

import cv2
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid

In [None]:
def treedir(root):
    """Return dict mirroring directory structure starting at given root.

    Assumes each directory contains only directories or regular files, not both.
    In a directory containing both, only the subdirectories are represented.
    Each directory is represented by a dict nested within the dict of its parent
    directory, associated with the directory name. Regular files are represented
    by a list of file paths that include full relative paths from the root, and
    the list is associated with the parent directory name as the key.
    """
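    def recur(walker):
        # os.walk yields directories top-down in pre-order, so each next() call
        # returns the directory that this level of the recursion is handling.
        dirpath, dirnames, filenames = next(walker)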
        return (
            {dirname: recur(walker) for dirname in dirnames} if dirnames
            else [os.path.join(dirpath, filename) for filename in filenames]
        )

    return recur(os.walk(root))

In [None]:
files_by_split_by_class = treedir('data')

for split, files_by_class in files_by_split_by_class.items():
    for class_, files in files_by_class.items():
        print(split, class_, len(files))
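
The counts printed above include every regular file in a class folder, so a stray non-image file (for example `.DS_Store`) would inflate them. As a cross-check, here is a minimal sketch that counts only files with an image extension. It assumes the same `data/<split>/<class>/` layout used above; `count_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension set is a guess that may need adjusting for this dataset.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root):
    """Count image files per (split, class) under root/<split>/<class>/."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        # Keep regular files whose extension looks like an image.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            split, class_ = path.parts[-3], path.parts[-2]
            counts[split, class_] += 1
    return counts


if __name__ == "__main__":
    for (split, class_), n in sorted(count_images("data").items()):
        print(split, class_, n)
```

If these numbers differ from the ones printed by the `treedir`-based loop, the class folders probably contain files that are not images.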

### Step 3. Use OpenCV to read images from each class in the train, validation, and test splits.

- Choose ten images at random from each class.

In [None]:
n_images_per_class_per_split = 10

# {
#     split1: [
#         (class1, img111),
#         (class1, img112),
#         ...,
#         (class2, img121),
#         ...,
#     ],
#     split2: [
#         (class1, img211),
#         (class1, img212),
#         ...,
#         (class2, img221),
#         ...,
#     ],
#     ...
# }
random_labeled_images_by_split = {
    split: [
        (class_, cv2.imread(file))
        for class_, files in files_by_class.items()
        # random.sample draws without replacement, so the ten picks are distinct.
        for file in random.sample(files, k=n_images_per_class_per_split)
    ]
    for split, files_by_class in files_by_split_by_class.items()
}
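
Whether the ten picks can repeat depends on the sampling function: `random.choices` draws with replacement, while `random.sample` draws without. The sketch below contrasts the two and uses a seeded generator so the picks are reproducible between runs. `pick_images`, its `seed` parameter, and the file names are hypothetical and for illustration only.

```python
import random


def pick_images(files, k, *, seed=None):
    """Return up to k distinct entries of files, chosen at random."""
    rng = random.Random(seed)  # a seeded generator makes the picks reproducible
    return rng.sample(files, k=min(k, len(files)))


# Hypothetical file names, for illustration only.
files = [f"img_{i:03d}.jpg" for i in range(25)]

distinct = pick_images(files, 10, seed=0)
with_repeats = random.Random(0).choices(files, k=10)

print(len(set(distinct)))      # always 10: no file is picked twice
print(len(set(with_repeats)))  # at most 10: repeats are possible
```

Capping `k` at `len(files)` is a design choice for small classes; without it, `random.sample` raises `ValueError` when asked for more items than the list holds.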

### Step 4. Display each image along with its label (class name) below it.

- Use matplotlib to plot the images as a matrix.

In [None]:
# See https://stackoverflow.com/questions/46615554/how-to-display-multiple-images-in-one-figure-correctly

def plot_labeled_images(labeled_images, *, nrows=10, ncols=2, figsize=(10, 50)):
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize)

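    # Plot one image on each subplot.
    for i, ax in enumerate(axes.flat):
        # i runs from 0 to (nrows * ncols - 1);
        # axes.flat walks the grid row by row, so ax is axes[i // ncols, i % ncols].
        label, img = labeled_images[i]
        # cv2.imread returns BGR, but matplotlib expects RGB, so convert first.
        ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))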
        ax.set_xlabel(label)  # x-axis label sits below the image, as the step asks
        ax.set_xticks([])
        ax.set_yticks([])
        
    plt.tight_layout()
    plt.show()
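
The defaults in `plot_labeled_images` (`nrows=10`, `ncols=2`, `figsize=(10, 50)`) fit exactly twenty images, which is what two classes of ten images produce. A minimal sketch of deriving the grid from the number of images instead is shown here; `grid_shape`, `figsize_for`, and the five-inch cell size are hypothetical and chosen only to reproduce the defaults above.

```python
import math


def grid_shape(n_images, ncols=2):
    """Return (nrows, ncols) with enough cells for n_images."""
    if n_images < 1 or ncols < 1:
        raise ValueError("n_images and ncols must be positive")
    return math.ceil(n_images / ncols), ncols


def figsize_for(nrows, ncols, cell_inches=5.0):
    """Return a (width, height) figure size with square cells."""
    return ncols * cell_inches, nrows * cell_inches


nrows, ncols = grid_shape(20)      # two classes, ten images each
print(nrows, ncols)                # 10 2
print(figsize_for(nrows, ncols))   # (10.0, 50.0)
```

When the image count is not a multiple of `ncols`, the last row has unused cells. In that case, iterating with `zip(axes.flat, labeled_images)` rather than indexing by `i` would avoid an `IndexError`, and the leftover axes could be hidden with `ax.axis("off")`.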

In [None]:
plot_labeled_images(random_labeled_images_by_split['train'])

In [None]:
plot_labeled_images(random_labeled_images_by_split['validation'])

### Step 5. Log your observations with random images.

- Note the challenges in pose, color, and lighting in the train and validation data.
- What were your guesses for the classes?

Some images are rotated, cropped, or obstructed, which might cause problems. Others show only the front or rear of the vehicle, which might also present challenges. As for lighting, some images show glare or reflections on the windows or body of the vehicle.

Although the two classes of vehicle are quite similar, there is a noticeable difference in "boxiness": the WagonR is clearly boxier (less rounded) than the Swift, and it also appears somewhat taller.