# Machine Learning Project to Predict Vehicle Decade, Body Style, and Make

*by Chris Haynes, Ben Newell, and Ryan Williams 2019*

## Introduction

We set out to predict a vehicle's decade, body style, and make from an image of the car. To make the neural networks easier to use, and to satisfy an honors option, we also built a web application that lets the user upload a .JPG or .JPEG image of a car and select a classification type. We built the front end in React.js and hosted a Python Flask server that exposes a RESTful API for uploading an image and classifying it. For the neural networks, the hardest part was working with the data. We had to choose the image size, the aspect ratio, the cropping to the body of the car, the standardization of the data, and the batch size. Choosing the right parameters for the neural networks helped as well.

## Methods

### Web Application
Writing the web application was slightly challenging, especially passing the image from the website to the server. I began by running `npx create-react-app` to initialize a simple React web application. This command creates files such as `index.js` and `App.js` with just enough React to display a generic splash page. It also creates the `node_modules` directory, which contains `react-scripts`, the scripts that run the different lifecycle stages of the application. The last piece is `package.json`, which lists all of the packages that are necessary to run the application. Running `npm install` installs every dependency for the project at once, without the need to download them one by one as would be necessary with `pip install`. The `package.json` file also references the `reactstrap` node module and defines the scripts that run for certain commands. The most common command during development is `npm start`, which starts the development server. The start script does not produce an optimized build, but it is convenient because it hot-reloads whenever the React code changes. The command therefore usually only needs to be run once, and there is no need to restart the development server after each change to the website. For the actual server, we decided to use Python Flask, a lightweight web framework that worked just fine for providing the RESTful API to the React application. All of the server code for the API is contained in `server.py`.

To style the web application, we used the node module `reactstrap`, which meant we also needed to install `bootstrap`. Bootstrap is a common library for web applications, and its principles for user interfaces are widely known. Its layouts are based on a grid of 12 columns that is divided among the elements in a row, so three columns could each have a width of 4, or one large column could have a width of 6 and the other two a width of 3, and so on. Not only does it handle layouts well, it also has great default styling, so we did not need to spend much time working on CSS. `reactstrap` has a React component called `Jumbotron` that makes a nice page header, and a `Navbar` component that works for navigation. If there had been more pages than the one we used, their links would have been available in the nav bar of our web application.

The class where we made the most happen was `Loader.js`. This class handles the layout below the page header and includes all of the buttons and the image upload. We used some simple React principles to make this class work. We'll start with the buttons. To highlight the button that has been selected, we store the selected button in the state of the `Loader` component. If a button is the `activeButton`, it changes from the `secondary` color to the `primary` color. One function changes the `activeButton` state when a button is clicked. Once the user has pressed the Submit button to submit the image for classification, the classification buttons become disabled, since the type of classification has already been chosen.

The part that took the longest was getting the file to upload. We first track whether the user is still `loading` an image and has not yet submitted one. By default, the user starts with `loading` set to `true`. Conveniently, `reactstrap` has a `Form` component with an `Input`. The `Input` component takes a `type`, which we set to `file`. We validate that the file is an image on the server, so no validation was necessary on the client side. Using the `onChange` attribute, we pass the event to our `fileChanged` function. We use `event.target.files[0]` to store the first file submitted, and we display it on the page in a small preview. A conditional displays the image only if a file is present. If the file is not an image, the preview shows alternative text, and the file will not be accepted when it is submitted. The magic all happens in `submitImage`, where we followed the blog post from [Ashish Pandey](https://medium.com/@ashishpandey_1612/file-upload-with-react-flask-e115e6f2bf99) to handle submission (Pandey). The submit button cannot be clicked until a file has been loaded, and its `onClick` calls `submitImage`. First, `submitImage` creates a `FormData` object to pass to the server and appends the file to it. If a classification has not been chosen, it triggers an alert that a classification must be chosen. Otherwise, it makes a `fetch` call to the server, which runs on `http://127.0.0.1:5000`. The call targets the `upload` endpoint with the HTTP verb `POST` and sends the `FormData` object as the body of the request. On the server, we check whether the file is legal, namely a JPEG or JPG image, and return an error if it is not. If it is legal, we make a directory on the server called `uploads`, save the image to this directory, and return a JSON object containing the name of the file on the server.

On the front end, we use the `.then` function to wait for the response from the server, since `fetch` is asynchronous and returns a promise. We check whether there is an error, and if there is none, we call `getClassification` with the file name returned from the server. This function uses another `fetch` to send a `GET` request to the `classification` API, with a query string that contains the file name and the selected classification type. On the server, we read both the file name and the classification type from the query string and use them to select the correct neural network. We load the matching pickled neural network object and call its `use` function to get the classification. We call the `tbd` function to get the classification's name. The name is then returned to the web application and displayed on the web page in place of the file upload, with the image still displayed on the right.
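
The two endpoints described above could be sketched in Flask roughly as follows. This is a minimal illustration and not the contents of `server.py`: the form field name `file`, the query parameters `filename` and `type`, the `UPLOAD_DIR` config key, and the `classify` placeholder (which stands in for loading a pickled network and calling its `use` function) are all assumptions.

```python
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

ALLOWED_EXTENSIONS = {"jpg", "jpeg"}      # only JPG/JPEG images are accepted

app = Flask(__name__)
app.config["UPLOAD_DIR"] = "uploads"      # directory name taken from the text


def allowed_file(filename):
    """Return True when the name ends in .jpg or .jpeg (case-insensitive)."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def classify(path, kind):
    """Hypothetical placeholder for the pickled network's prediction."""
    return "unknown " + kind


@app.route("/upload", methods=["POST"])
def upload():
    # The client sends the image as multipart form data (FormData in the browser).
    file = request.files.get("file")
    if file is None or not allowed_file(file.filename or ""):
        return jsonify(error="A .jpg or .jpeg file is required"), 400
    os.makedirs(app.config["UPLOAD_DIR"], exist_ok=True)
    name = secure_filename(file.filename)
    file.save(os.path.join(app.config["UPLOAD_DIR"], name))
    return jsonify(filename=name)


@app.route("/classification", methods=["GET"])
def classification():
    # File name and classification type arrive in the query string.
    name = secure_filename(request.args.get("filename", ""))
    kind = request.args.get("type", "")
    path = os.path.join(app.config["UPLOAD_DIR"], name)
    if not os.path.isfile(path):
        return jsonify(error="Unknown file"), 404
    return jsonify(classification=classify(path, kind))
```

Returning JSON with an explicit status code from both routes lets the front end branch on the response inside its `.then` handlers.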

The first challenge we ran into was getting the image to the server. We first tried passing JSON objects to the server, but getting the file out of them was impossible as far as we could tell from debugging the request object. Using `FormData` proved much more successful. Another problem was a warning that we were not using `CORS` requests between the server and client. By wrapping the server app in a `CORS` object, we were able to make cross-origin requests without errors. We also had to work out how to parse the status returned from the API methods and how to handle different statuses, but it proved easier to return JSON strings for the front end to use. The challenge with the neural network objects is that they were trained on the GPU, and we got several errors related to loading them on a CPU. Changing the `map_location` for `torch.load` didn't work, so on Colab we call `nnet.to(torch.device('cpu'))` and then set `nnet.device = 'cpu'`. This allowed the pickled object to be loaded onto the CPU that the server resides on, and the server to use that network to classify the images.

## Image Processing and Data Preparation

The data used in this experiment comes from the Stanford Cars dataset. The photos are stored in jpg format. Two kinds of processing had to be done on this data. First, we had to read in the metadata containing each photo’s name, car type, and bounding box. Then, the photos themselves had to be read into a NumPy format compatible with the neural network class.

### Description of metadata

The metadata describing the images is provided in three MATLAB `.mat` files. The file `cars_meta.mat` contains the class names for each of the classes of cars, so a user who wants to know what car class 57 is, for example, would use this file. The other two files, `cars_test_annos.mat` and `cars_train_annos.mat`, contain information about every image file in the respective test and train folders. The folders `cars_test` and `cars_train` contain thousands of jpg images of cars. The file `cars_train_annos.mat` holds the bounding box and class label for each of the images in the `cars_train` folder. The bounding box is given as the X and Y coordinates of the rectangle enclosing the car in the image. This information is important for training the neural network: to give the network the best chance of picking up the trend, it should be shown only the relevant part of each image. The class number, in combination with `cars_meta.mat`, allows the user to find what car is shown in that image. Unfortunately, the file `cars_test_annos.mat` did not contain class labels for its entries. This meant that the images in the `cars_test` folder could not be used to train or test our neural network, because the correct class labels would have had to be determined by hand. However, there was enough data in the training dataset alone that we could set some aside to be used for testing later on. This process is described in detail in the data partitioning section.

### Description of car files


The `cars_train` folder contains ~8000 jpg images of cars ranging from the early 1990s to the early 2010s. They come from many different manufacturers and represent a full range of body styles. Unlike many machine learning datasets out there, the images in this folder do not have a standard size or quality. Some are very small, and others are 10 times the size of the smallest images. Finding a common size to work as input for the neural network would prove to be a challenge later on. However, they are all of the jpg file type. These files store pixel information in three channels of 8-bit integers. The next section describes how these files are read in and prepared for the neural network.

### Reading in jpgs with tensorflow

Methods for reading in and preparing the data are located in the notebook `read_car_data` and written out to the file `image_processing.py`. The end goal was to read the photos into a NumPy format that would work as valid input to the neural network code supplied in class. The first step was to figure out a way to read the files in. While many libraries for reading images exist, I went with TensorFlow because it has many useful ways of manipulating images that I was already familiar with, and because its tensors are readily convertible to NumPy in the 2.0 version. The alpha 2.0 version was selected for a few reasons. Mainly, it uses eager execution by default, which makes debugging much easier and more similar to the NumPy style of arrays I am used to.

First, I read each file in the `cars_train` folder into a raw tensor. The TensorFlow function `tf.io.decode_jpeg` then decodes the raw JPEG data into a tf tensor of pixel values. By calling `.numpy()` on these images, I was able to verify with `plt.imshow()` that they had been read in successfully. This is where I encountered my first problem. I was running the GPU distribution of TensorFlow because of its speed advantages during training. However, reading in over 6500 images for the training set greatly exceeded the capacity of my GPU. Even with 8GB of memory, the image set could not fit all at once. To work within this limitation, the images had to be split up into batches for preprocessing. This meant that the test and training partitioning had to occur first, and with a set seed so that the batches would be consistent if a failure occurred during processing. After partitioning, batches could be made from the selected images for each set. The function `make_batches` takes the `X` and `T` matrices and a batch size and returns lists containing the batched indices. After this, only one more helper function was needed to read all of the images in. The function `matlab_to_dict` takes the metadata in the MATLAB format and changes it into a much more usable dictionary format that aids in processing later on.
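
A seeded partition followed by batching can be sketched as below. This is an illustration only and not the project's `make_batches`, which takes the `X` and `T` matrices directly; here the helpers work on index arrays, and the 80/20 split, the seed, and the batch size of 1000 are assumed values. The count 8144 is the size of the published Stanford Cars training split.

```python
import numpy as np


def partition(n_samples, train_fraction=0.8, seed=42):
    """Shuffle sample indices with a fixed seed and split them into train/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


def make_batches(indices, batch_size):
    """Split an index array into consecutive batches of at most batch_size."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


train_idx, test_idx = partition(8144)
train_batches = make_batches(train_idx, 1000)
test_batches = make_batches(test_idx, 1000)
```

Because the seed is fixed, rerunning the partition after a crash yields the same index arrays, so batches that were already processed do not need to be redone.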

At this point, all of the tensors can be read, as long as one batch is processed at a time. The first step of the preparation was to crop the images to the bounding boxes provided in the metadata files. Cropping removes the extra information in the picture and leaves just the relevant part containing the labeled car. This is an important step because some of the pictures contain multiple vehicles or large scenes. The function `crop` crops a tf tensor to the bounding box provided in the metadata using the `tf.image.crop_to_bounding_box` function. This function is highly efficient and can run on the GPU, making large-scale cropping of images a quick procedure.
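
The cropping step can be illustrated with plain NumPy slicing. This is a sketch, not the project's `crop` function, and it assumes the box is given as two corners `(x1, y1)` and `(x2, y2)` in pixel coordinates. The same box would map onto `tf.image.crop_to_bounding_box(image, y1, x1, y2 - y1, x2 - x1)`, whose arguments are the offset height, offset width, target height, and target width.

```python
import numpy as np


def crop_to_bbox(image, x1, y1, x2, y2):
    """Crop an (H, W, C) image to the box with corners (x1, y1) and (x2, y2).

    The box is clipped to the image bounds before slicing.
    """
    height, width = image.shape[:2]
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return image[y1:y2, x1:x2, :]


# A blank 100x200 RGB image stands in for a decoded photo.
image = np.zeros((100, 200, 3), dtype=np.uint8)
car = crop_to_bbox(image, x1=30, y1=20, x2=150, y2=90)
print(car.shape)  # (70, 120, 3)
```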
   
Second, I went through and found the largest and smallest cropped images so I had an idea of the range of image sizes. The largest was around 3000x4000, and the smallest around 50x100. This was quite the gap in both size and aspect ratio. At this point, resizing would be required for the majority of images. While a network with only convolutional layers can sometimes handle variable input sizes, a network with a fixed-size fully connected layer at the end of several convolutional layers needs a fixed input size. Because of this, I set out to find the best aspect ratio to resize the images to. My goal was to minimize the padded area of the image. While some sites argued for stretching images to fit a desired size and not using padding, I believed this could cause issues with identifying some of the important traits of a car, especially its body type. A BMW X3 SUV, for example, looks almost the same as a vertically stretched BMW 3 series sedan. To find the best aspect ratio, I found the mean and median of the heights and widths of all the tensors. I then chose the size 275x550 to resize to. It was between the mean and median sizes, indicating that it was a roughly representative image size. I considered using the mode of the ratios, but considering how variable the sizes were, I did not believe it would be a good representation. After this, all images could be resized to 275x550 using the function `tf.image.resize_with_pad`. This function can upsample and downsample images to the desired size while preserving the original aspect ratio. The empty space is padded with zeros. In the future, I would consider using random noise instead, to ensure the model is not learning to identify classes based on the size of the padded area.

After cropping and resizing each batch of photos, the batch was written to disk using the following process. First, an empty NumPy array of the dimensions (Batch Size, Height, Width, 3) was created to hold the entries. Then, a loop called `.numpy()` on each tensor to bring the NumPy representation of the image off of the GPU and store it in the precreated array. The resulting array was about 450MB in size. Once it was all in memory, the batch was written to a file using the Python package `pickle`. This was done for each batch, leaving 7 batches for training and 2 for testing. At this point, any of the `.pkl` files could be used for training any neural network that accepts a NumPy array.

When I started training, I realized that the size 275x550 was too large to easily use convolutional neural nets on the GPU. Even basic structures would not fit on my 8GB GPU. To alleviate this issue, I reran the image processing at ⅗ and at ⅕ of the original size. The ⅕ size was especially convenient, because at this size the images could feasibly fit in one array, so no batching was necessary.
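
The geometry of an aspect-preserving resize with padding can be worked out by hand, which also shows how much of the final image is padding. The helper below is a hypothetical illustration of the idea; the exact rounding inside `tf.image.resize_with_pad` may differ.

```python
def padded_resize_geometry(height, width, target_height=275, target_width=550):
    """Return (new_height, new_width, pad_height, pad_width) for a resize that
    keeps the aspect ratio and pads the remainder of the target canvas."""
    # Use the smaller scale factor so the whole image fits inside the target.
    scale = min(target_height / height, target_width / width)
    new_height = min(target_height, max(1, round(height * scale)))
    new_width = min(target_width, max(1, round(width * scale)))
    return new_height, new_width, target_height - new_height, target_width - new_width


def padded_fraction(height, width, target_height=275, target_width=550):
    """Fraction of the target canvas that ends up as padding."""
    new_h, new_w, _, _ = padded_resize_geometry(height, width, target_height, target_width)
    return 1.0 - (new_h * new_w) / (target_height * target_width)


# A 3000x4000 image (height x width) is limited by its height.
print(padded_resize_geometry(3000, 4000))  # (275, 367, 0, 183)
# A 50x100 image already has the 1:2 target ratio, so nothing is padded.
print(padded_resize_geometry(50, 100))     # (275, 550, 0, 0)
```

Averaging `padded_fraction` over all cropped image sizes for a few candidate targets is one way to compare aspect ratios by how much padding they introduce.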

### Training Template and Body styles

Initially, I started training on my desktop with the full-size images. I quickly found that my GPU would not be large enough for this task. Of the 8GB, about 2 were taken up by the OS and another 2 by the data being read in, leaving only 4 for the neural network. To have more resources, training was moved onto Colab from this point onwards. In a GPU instance, Colab grants the user access to a Tesla T4 GPU with 15GB of memory. The whole GPU can be used as well, because there is no graphical overhead in a Colab instance. This also made it easier to share notebooks, so the whole team could try out different networks for each of their tasks. The notebook `final_training_body_style` was used as a template for the other training notebooks. First, the notebook imports all the needed packages, as in any other notebook. It also installs TensorFlow 2.0, because Colab comes with 1.13 by default. Then, using the package `google.colab`, the user can mount their Google Drive account as a directory in Colab. This was very helpful for using the pickled files created earlier, as well as for storing trained networks.

Next, the notebook imports `neuralnetworks_pytorch`. This is the file provided in class, with certain modifications. The package presumes that the user is using a square input if they are running a convolution. However, our input is rectangular, so the file had to be modified to work correctly. We could have made the images square instead, but this would have resulted in more unnecessary padding. To support rectangular input, the parameter `input_height_width` is split into two parameters, `input_height` and `input_width`. Then, when computing the size of the convolutional layers, `input_height` and `input_width` are updated separately by subtracting the window size, dividing by the stride, and adding one. Finally, the `n_inputs` for the fully connected network is found by taking the quantity `input_height * input_width * n_inputs` instead of `input_height_width ** 2 * n_inputs`. After these changes, the file could run rectangular convolutions just fine.
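
The size bookkeeping for rectangular input can be checked with a few lines of arithmetic. The sketch below is not the modified `neuralnetworks_pytorch` code; the function names, the 55x110 input, the window sizes, the strides, and the 32 units in the last convolutional layer are assumptions chosen for illustration.

```python
def conv_output_shape(input_height, input_width, windows, strides):
    """Track height and width separately through unpadded convolutional layers."""
    for window, stride in zip(windows, strides):
        input_height = (input_height - window) // stride + 1
        input_width = (input_width - window) // stride + 1
    return input_height, input_width


def fully_connected_inputs(input_height, input_width, n_units_last_conv):
    """Number of values fed to the first fully connected layer."""
    return input_height * input_width * n_units_last_conv


height, width = conv_output_shape(55, 110, windows=[5, 3], strides=[2, 2])
print(height, width)                              # 12 26
print(fully_connected_inputs(height, width, 32))  # 9984
```

With the square formula the last value would have been `12 ** 2 * 32 = 4608`, which does not match the actual number of values leaving the final convolutional layer.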

The previously pickled files can be read back in as batches using the `pickle` package. The batches are ready-to-use NumPy arrays of the dimensions (Batch, Height, Width, Channels). This is called channels-last notation, which is normally used by TensorFlow. PyTorch uses the channels-first notation, so some adjustment is needed to use the neural network package. The function `np.rollaxis` allows the user to change channels last to channels first, resulting in the dimensions (Batch, Channels, Height, Width).
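
As a small check of the axis change, the example below moves the channel axis of a dummy batch. The batch size of 4 and the 55x110 image size are assumptions, and `np.moveaxis` is shown only as an equivalent alternative.

```python
import numpy as np

# Dummy channels-last batch: (Batch, Height, Width, Channels)
batch = np.arange(4 * 55 * 110 * 3).reshape(4, 55, 110, 3)

# Roll axis 3 (channels) back until it sits at position 1.
channels_first = np.rollaxis(batch, 3, 1)
print(channels_first.shape)  # (4, 3, 55, 110)

# np.moveaxis expresses the same change as "move the last axis to position 1".
assert np.array_equal(channels_first, np.moveaxis(batch, -1, 1))
```

Both calls return views of the original array, so the conversion itself does not copy the image data.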

In this example, the ⅕ dataset is used, so the data is read in all at once. The unique names of makes, body styles, and years are found with the function `np.unique`. Once these unique values are in lists, the target arrays `Tbatch` and `Tbatch_test` can be transformed to use the indices of these values instead of the values themselves, because the neural net package expects ints as class labels. The output class also needs to be selected. Some neural network structures allow for two-dimensional classification output. This would have allowed us to train the network to classify all three target classes at once (decade, body style, and manufacturer). However, the neural network package only supports 1D output, so one of these classes needs to be selected as the target array for training. After it is selected, `n_outputs` can be found from the number of unique values in the target array. `input_size_in_pixels` is set to three because the input has three channels.
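
The conversion from names to integer class labels can be sketched with `np.unique` and its `return_inverse` option. The five body styles below are made-up sample values, not rows from the dataset.

```python
import numpy as np

# Hypothetical target column with one body style per image
body_styles = np.array(["Sedan", "SUV", "Coupe", "SUV", "Sedan"])

# class_names holds the sorted unique values; T holds each sample's index into it.
class_names, T = np.unique(body_styles, return_inverse=True)
n_outputs = len(class_names)

print(class_names)  # ['Coupe' 'SUV' 'Sedan']
print(T)            # [2 1 0 1 2]
print(n_outputs)    # 3

# Indexing class_names with predicted indices recovers the readable names.
decoded = class_names[T]
```

Keeping `class_names` alongside the trained network is what allows an integer prediction to be turned back into a name for display.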

Next, various neural network structures were tested and evaluated on their accuracy. Initially, I found the largest gains from higher `n_hiddens` counts, so I pushed this value as high as it could go while still fitting on the GPU. At this point, Colab gives a warning about high GPU memory usage, but it does not run out of memory during training. Then I tried other hyperparameters. The biggest benefit came from changing the learning rate and `n_iterations` in tandem. Starting at a rate of .001, I decreased it in steps to .00005, where I had the most success with body style classification. This necessitated an increase in `n_iterations` at the same time. Lower `n_iterations` were tried, but the classification accuracy went down, so I stopped at .00005. The graph shows the performance of the network as `n_iterations` varies. The batch size is set at 200 based on memory limitations. Larger batches can speed up training, but too large a batch cannot fit on the GPU all at once. The strides and window sizes are chosen based on using odd kernel sizes and strides of 2. This helps to reduce the size of the network in memory and to increase performance. When testing where to allocate hidden units, I found that increasing the units per convolutional layer had some benefit, but increasing the number in the fully connected layers had the most benefit.

After finding a satisfying result, the network is written out to a file using `pickle`. These files are quite large at ~350 MB, and they are the files loaded by the server. The output of `image_to_numpy()` can be passed to `nnet.use()` to make predictions for any photo the user gives as input.


## Results

Show all results.  Intermediate results might be shown in above Methods section.  Plots, tables, whatever.

## Conclusions

What I learned.  What was difficult.  Changes I had to make to timeline.

### References

* A. Pandey, “File Upload with React & Flask,” Ashish Pandey, 06-Jul-2018. https://medium.com/@ashishpandey_1612/file-upload-with-react-flask-e115e6f2bf99.


Your report for a single person team should contain approximately 2,000 to 5,000 words, in markdown cells.  You can count words by running the following python code in your report directory.  Projects with two people, for example, should contain 4,000 to 8,000 words.

```python
import glob

import nbformat

nbfile = glob.glob('Haynes_Newell_Williams-TermProject.ipynb')
if len(nbfile) > 1:
    print('More than one ipynb file. Using the first one.  nbfile=', nbfile)
# nbformat.read replaces the deprecated nbformat.current.read and
# converts the notebook to the requested format version on load.
with open(nbfile[0], 'r', encoding='utf-8') as f:
    nb = nbformat.read(f, as_version=4)
word_count = 0
# Version 4 notebooks keep their cells directly on the notebook object.
for cell in nb.cells:
    if cell.cell_type == "markdown":
        word_count += len(cell['source'].replace('#', '').lstrip().split(' '))
print('Word count for file', nbfile[0], 'is', word_count)
```

```
Word count for file Haynes_Newell_Williams-TermProject.ipynb is 3746
```



