# Cars classification

#### Get the images from Google Drive (Machine Learning | Practical -> homework_files -> hw12 -> pictures.zip)


#### The data contains car images of 10 different types. The images are in color and originally have different sizes. The correct car-type labels are in the train.csv file.

---

In [2]:
!wget https://dphi-live.s3.eu-west-1.amazonaws.com/dataset/standford_cars.zip

In [3]:
!unzip standford_cars.zip

In [4]:
!cp -r ./standford_cars/train ./pictures

In [5]:
!cp ../input/carstraintest/predictions.csv ./predictions.csv

In [6]:
!cp ../input/carstraintest/train.csv ./train.csv

In [7]:
# !pip install opencv-python

In [8]:
import pandas as pd
import numpy as np
import tensorflow as tf
import os
import cv2
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [9]:
train = pd.read_csv('train.csv')

In [10]:
train

## Loading and preparing training data
The train and test images are stored in the same 'pictures' folder. The labels of the training images are given in the csv file 'train.csv', keyed by image id (i.e. the image file name).

#### Getting the labels of the images

In [11]:
labels = pd.read_csv("train.csv")
labels.head()

In [12]:
labels.tail()

#### Getting images file path

In [13]:
file_paths = [[fname, './pictures/' + fname] for fname in labels['filename']]

#### Confirming that every labelled image exists on disk

In [14]:
# file_paths is built from labels, so the two lengths always match;
# the informative check is whether each listed file is actually present
missing = [path for _, path in file_paths if not os.path.isfile(path)]
if not missing:
    print('All', len(file_paths), 'files listed in train.csv were found in ./pictures')
else:
    print(len(missing), 'files listed in train.csv are missing, e.g.', missing[:5])

#### Converting the file_paths to dataframe

In [15]:
images = pd.DataFrame(file_paths, columns=['filename', 'filepaths'])
images.head()

#### Combining the labels with the images

In [16]:
train_data = pd.merge(images, labels, how = 'inner', on = 'filename')
train_data.head()       

In [17]:
train_data

In [18]:
# train_data.to_csv('my_train.csv')

In [19]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
train_data['label'] = le.fit_transform(train_data['label'])

The 'train_data' dataframe now contains the image ids, their file paths, and their integer-encoded labels. The training data is ready.

In [20]:
train_data.head()
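
The label-encoding step above can be illustrated on a toy example. The sketch below is not part of the original notebook: it uses hypothetical class names and plain NumPy to mimic what `LabelEncoder` does (sort the unique classes, then replace each label by its index in that sorted list), and shows how the integer codes map back to names, which is the job `inverse_transform` does later on.

```python
import numpy as np

# Hypothetical car-type labels (not taken from train.csv)
raw_labels = np.array(["SUV", "Sedan", "Coupe", "SUV", "Coupe"])

# np.unique returns the sorted unique classes and, with return_inverse=True,
# the index of each label within that sorted array.
classes, encoded = np.unique(raw_labels, return_inverse=True)

print(classes)   # sorted class names
print(encoded)   # integer code for every label

# Indexing the class array with the codes recovers the original names.
decoded = classes[encoded]
print(decoded)
```

Because the codes depend on the sorted order of the class names, the same fitted encoder has to be reused when decoding predictions.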

## Data Pre-processing
All images must be brought to the same shape and converted to arrays of pixel values, because machine learning and deep learning models accept only numerical input of a fixed size. The labels also have to be numerical; they were already converted from category names to integers with LabelEncoder above.

In [21]:
data = []     # list of [image array, label] pairs
image_size = 100      # target width and height in pixels; other sizes work too
for i in range(len(train_data)):
    img_array = cv2.imread(train_data['filepaths'][i])    # loads a 3-channel BGR image; pass cv2.IMREAD_GRAYSCALE to load grayscale instead
    new_img_array = cv2.resize(img_array, (image_size, image_size))      # resize to image_size x image_size
    data.append([new_img_array, train_data['label'][i]])

In [22]:
# pixels of an image
data[5][0]

In [30]:
plt.imshow(np.flip(data[5][0], axis=-1)) # OpenCV loads images as BGR, so flip the channels to RGB for correct colors in the plot; the order does not matter to the model as long as training and test images use the same one
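
To see why flipping the last axis fixes the colors, here is a minimal sketch on a made-up two-pixel image. The array values are illustrative only; the point is that reversing the channel axis turns OpenCV's BGR order into the RGB order that matplotlib expects.

```python
import numpy as np

# A 1x2 "image" in BGR channel order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (BGR -> RGB).
rgb = np.flip(bgr, axis=-1)

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```

`cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` performs the same conversion.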

#### Shuffle the data

In [39]:
np.random.shuffle(data)  # optional: train_test_split below shuffles by default

#### Separating the images and labels


In [40]:
x = []
y = []
for image in data:
    x.append(image[0])
    y.append(image[1])

# converting x & y to numpy array as they are lists
x = np.array(x)
y = np.array(y)

In [41]:
np.unique(y, return_counts=True)

#### Splitting the data into Train and Validation Set
To check how well the model generalizes, we split the data (both the images and their labels) into a training set, which is used to fit the model, and a validation set, which is held out to measure how accurately the model predicts labels for images it has not seen.

For this purpose we use the 'train_test_split' function from the 'sklearn.model_selection' module.

In [42]:
# split the data
X_train, X_val, y_train, y_val = train_test_split(x,y,test_size=0.2, random_state = 42)
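
As a rough illustration of what an 80/20 split does, the sketch below reproduces the idea with plain NumPy on toy arrays. It is an assumption-laden simplification, not scikit-learn's implementation: the names `x_demo` and `y_demo` are hypothetical, and the exact samples selected will differ from what `train_test_split` picks for the same seed.

```python
import numpy as np

# Toy stand-ins for x and y (10 samples); shapes are illustrative only.
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10) % 2

rng = np.random.default_rng(42)          # fixed seed for reproducibility
indices = rng.permutation(len(x_demo))   # shuffled sample indices

n_val = int(round(0.2 * len(x_demo)))    # hold out 20% for validation
val_idx, train_idx = indices[:n_val], indices[n_val:]

# Index images and labels with the same indices so pairs stay aligned.
x_tr, x_va = x_demo[train_idx], x_demo[val_idx]
y_tr, y_va = y_demo[train_idx], y_demo[val_idx]

print(x_tr.shape, x_va.shape)
```

With imbalanced classes, passing `stratify=y` to `train_test_split` keeps the class proportions similar in both parts.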

## Building Model
Now we are finally ready, and we can train the model.

There are many machine learning and deep learning models, such as Random Forest, Decision Tree, Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN), to name a few. Here we use a CNN.

We then feed the model both the data (X_train) and the answers for that data (y_train).

In [49]:
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=200, kernel_size=(3, 3), activation='relu', input_shape=(100, 100, 3)),
    tf.keras.layers.Conv2D(filters=100, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    
    tf.keras.layers.Conv2D(filters=50, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(filters=50, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    
    tf.keras.layers.Conv2D(filters=30, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(filters=30, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    
    tf.keras.layers.Conv2D(filters=20, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(filters=20, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    
    # tf.keras.layers.Flatten(input_shape=(100, 100, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

In [50]:
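cnn.compile(loss='sparse_categorical_crossentropy',   # labels are integer class ids, not one-hot vectors
            metrics=['accuracy'])                     # no optimizer is passed, so Keras uses its default ('rmsprop')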
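
The `sparse_categorical_crossentropy` loss takes integer class ids and compares them with the softmax probabilities. A minimal NumPy sketch of the computation, using made-up probabilities for two samples and three classes (the numbers are hypothetical, and Keras adds numerical safeguards not shown here):

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer class ids, as produced by LabelEncoder

# Pick the probability the model assigned to the true class of each sample.
true_class_probs = probs[np.arange(len(labels)), labels]

# Cross-entropy per sample is -log(p_true); the reported loss is the mean.
per_sample_loss = -np.log(true_class_probs)
mean_loss = per_sample_loss.mean()
print(per_sample_loss, mean_loss)
```

The loss is small when the true class receives a probability close to 1 and grows without bound as that probability approaches 0.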

In [51]:
cnn.summary()
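
The output shapes reported by `cnn.summary()` can be checked by hand. The sketch below assumes the Keras defaults used in the model above: 'valid' padding with stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The helper names `conv_out` and `pool_out` are introduced here for illustration only.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

size = 100                  # input images are 100 x 100
for _ in range(4):          # four blocks of Conv -> Conv -> MaxPool
    size = pool_out(conv_out(conv_out(size)))
    print(size)             # spatial size after each block

flat_features = size * size * 20   # the last conv block has 20 filters
print(flat_features)               # length of the vector produced by Flatten
```

The spatial size shrinks from 100 to 2 over the four blocks, so the first Dense layer sees only a short feature vector.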

In [52]:
cnn.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_val, y_val))

## Validate the model
Wondering 🤔 how well your model learned? Let's check its performance on the validation data (X_val, y_val).

In [53]:
cnn.evaluate(X_val, y_val)
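
With the settings used at compile time, `cnn.evaluate` returns the loss followed by the accuracy. The accuracy is the fraction of images whose most probable class equals the true label. A small sketch with made-up probabilities (all values hypothetical) shows the calculation:

```python
import numpy as np

# Hypothetical predicted probabilities for 4 validation images and 3 classes.
val_probs = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.2, 0.7],
                      [0.5, 0.4, 0.1]])
val_true = np.array([0, 1, 2, 1])

val_pred = val_probs.argmax(axis=1)        # most probable class per image
accuracy = (val_pred == val_true).mean()   # fraction of correct predictions
print(val_pred, accuracy)
```

With 10 classes, a model that guesses uniformly at random would score about 0.1 on balanced data, which is a useful baseline when reading the result.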

## Predict The Output For Testing Dataset
We have trained and evaluated our model. Finally, we predict the labels for the testing data (the images listed in predictions.csv).

#### Load Test Set
Load the test data on which final submission is to be made.

In [55]:
# Loading the order of the image names that has been provided
test_image_order = pd.read_csv("./predictions.csv")
test_image_order.head()

#### Getting images file path

In [56]:
file_paths = [[fname, './pictures/' + fname] for fname in test_image_order['filename']]

#### Confirm that every image named in 'predictions.csv' exists on disk

In [57]:
# Check that each file listed in predictions.csv is present in ./pictures
missing = [path for _, path in file_paths if not os.path.isfile(path)]
if not missing:
    print('All', len(file_paths), 'files listed in predictions.csv were found')
else:
    print(len(missing), 'files listed in predictions.csv are missing, e.g.', missing[:5])

#### Converting the file_paths to dataframe

In [58]:
test_images = pd.DataFrame(file_paths, columns=['filename', 'filepaths'])
test_images.head()

## Data Pre-processing on test_data


In [59]:
test_pixel_data = []     # list of resized test image arrays
image_size = 100      # must match the size used for the training images
for i in range(len(test_images)):
    img_array = cv2.imread(test_images['filepaths'][i])   # loads a 3-channel BGR image, same as for training
    new_img_array = cv2.resize(img_array, (image_size, image_size))      # resize to image_size x image_size
    test_pixel_data.append(new_img_array)

In [60]:
test_pixel_data = np.array(test_pixel_data)

### Make Prediction on Test Dataset

In [61]:
pred = cnn.predict(test_pixel_data)

In [62]:
prediction = []
for value in pred:
    prediction.append(np.argmax(value))

In [63]:
predictions = le.inverse_transform(prediction)
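
The loop over `pred` can also be written as a single vectorized call. The sketch below uses a hypothetical 2x3 probability array and hypothetical class names to show that `np.argmax(..., axis=1)` gives the same indices as the row-by-row loop, and that indexing a sorted class array with them returns the class names, which is the effect of `inverse_transform`.

```python
import numpy as np

# Hypothetical softmax outputs for 2 test images and 3 classes.
pred_demo = np.array([[0.1, 0.7, 0.2],
                      [0.8, 0.1, 0.1]])
# Hypothetical class names, in the sorted order LabelEncoder would store them.
class_names = np.array(["Coupe", "SUV", "Sedan"])

loop_result = [int(np.argmax(row)) for row in pred_demo]   # row-by-row version
vectorized = np.argmax(pred_demo, axis=1)                  # one call for all rows

decoded = class_names[vectorized]   # map integer codes back to names
print(vectorized, decoded)
```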

## Saving prediction results


In [64]:
res = pd.DataFrame({'filename': test_images['filename'], 'label': predictions})  # predictions holds the predicted class name for each test image
res.to_csv("cnn_predictions.csv", index = False)      # the csv file is written to the notebook's working directory