# Image Classification
This notebook covers the image classification portion of the final project. We'll use a convolutional neural network (CNN) built with the TensorFlow framework.

In [1]:
import pandas as pd
import numpy as np
from matplotlib import image
from IPython.display import clear_output
import os
import shutil
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing import image_dataset_from_directory

## Prepping our data
We'll need to split our images into train and test sets. To do this we'll create two new directories, train_images and test_images. We'll write a function to make these directories.

In [2]:
def makeFolder(subfolder):
    parent_dir = os.getcwd()
    path = os.path.join(parent_dir,subfolder)
    try:
        os.mkdir(path)
        print('Successfully made directory.')
    except OSError as error:
        print(error)

Now we'll make the directories.

In [3]:
makeFolder('train_images')
makeFolder('test_images')

[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\train_images'
[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\test_images'
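The `[WinError 183]` messages above only mean that the folders already existed from an earlier run. A minimal alternative sketch, assuming Python 3.2 or newer (where `os.makedirs` accepts `exist_ok`); `make_folder_quiet` is a hypothetical name and not part of this notebook:

```python
import os
import tempfile

def make_folder_quiet(subfolder, parent_dir=None):
    """Create parent_dir/subfolder if needed; do nothing if it already exists."""
    parent_dir = parent_dir or os.getcwd()
    path = os.path.join(parent_dir, subfolder)
    os.makedirs(path, exist_ok=True)  # no error when the notebook is re-run
    return path

# Demonstration in a throwaway directory so the project folder is untouched.
with tempfile.TemporaryDirectory() as tmp:
    first = make_folder_quiet('train_images', tmp)
    second = make_folder_quiet('train_images', tmp)  # second call is a no-op
    print(first == second, os.path.isdir(first))
```

The trade-off is that a silent no-op also hides stale contents left over from a previous run, so the printed error in `makeFolder` can still be useful as a reminder.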


We'll create a function that copies the two class folders, animal_crossing and doom, into a destination directory.

In [4]:
def copyDir(destination):
    current = os.getcwd()
    for sub in ['animal_crossing','doom']:
        try:
            # os.path.join builds the path with the right separator for the OS
            shutil.copytree(os.path.join(current,sub),os.path.join(current,destination,sub))
            print(f'Successfully copied {sub} to {destination}')
        except OSError as error:
            print(error)

Now we'll copy the class folders into both directories.

In [5]:
copyDir('train_images')
copyDir('test_images')

[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\train_images\\animal_crossing'
[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\train_images\\doom'
[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\test_images\\animal_crossing'
[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\test_images\\doom'
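As with the folder creation, these errors come from re-running the cell after the copies already exist. A hedged sketch of a re-runnable copy, assuming Python 3.8 or newer for the `dirs_exist_ok` argument of `shutil.copytree`; `copy_class_dirs` and its parameters are hypothetical names used only for illustration:

```python
import os
import shutil
import tempfile

def copy_class_dirs(source_root, destination_root,
                    class_dirs=('animal_crossing', 'doom')):
    """Copy each class folder from source_root into destination_root."""
    for sub in class_dirs:
        src = os.path.join(source_root, sub)
        dst = os.path.join(destination_root, sub)
        # dirs_exist_ok=True (Python 3.8+) merges into an existing folder
        shutil.copytree(src, dst, dirs_exist_ok=True)

# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('animal_crossing', 'doom'):
        os.makedirs(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, 'example.jpg'), 'w').close()
    dest = os.path.join(tmp, 'train_images')
    copy_class_dirs(tmp, dest)
    copy_class_dirs(tmp, dest)  # running it twice does not raise
    print(sorted(os.listdir(dest)))
```

Note that merging into an existing folder will not restore the exact original state if files were deleted from the source in the meantime.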


Next we read the image file names for each split from train.csv and test.csv.

In [6]:
train = pd.read_csv('train.csv')['filename']
test = pd.read_csv('test.csv')['filename']

Now we'll loop through our directories and remove the files not in the specified list.

In [7]:
def removeFilesNotInList(array,folder):
    flist = array.tolist()
    for sub in ['animal_crossing','doom']:
        subpath = os.path.join(folder,sub)
        for file in os.listdir(subpath):
            f = os.path.join(subpath,file)
            if file not in flist:
                os.remove(f)

In [8]:
removeFilesNotInList(train,'.\\train_images')
removeFilesNotInList(test,'.\\test_images')
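Because `os.remove` cannot be undone, it can help to list what would be deleted before deleting anything. The following is a sketch under that assumption; `files_not_in_list` is a hypothetical helper, and it uses a `set` so that each membership check stays fast even for long file lists:

```python
import os
import tempfile

def files_not_in_list(filenames, folder, class_dirs=('animal_crossing', 'doom')):
    """Return paths under folder/<class> whose file name is not in filenames."""
    keep = set(filenames)  # set lookups avoid scanning the whole list each time
    extra = []
    for sub in class_dirs:
        subpath = os.path.join(folder, sub)
        for name in sorted(os.listdir(subpath)):
            if name not in keep:
                extra.append(os.path.join(subpath, name))
    return extra

# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('animal_crossing', 'doom'):
        os.makedirs(os.path.join(tmp, sub))
    for sub, name in [('animal_crossing', 'keep.jpg'), ('doom', 'drop.jpg')]:
        open(os.path.join(tmp, sub, name), 'w').close()
    to_remove = files_not_in_list(['keep.jpg'], tmp)
    print([os.path.basename(p) for p in to_remove])
```

Once the returned list looks right, each path can be passed to `os.remove`.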

Now let's load our training images into TensorFlow. We'll use the image_dataset_from_directory function, which looks in our directory for subdirectories containing images and assigns each image the class of its subdirectory. The function also resizes our images, which is important because every image fed to the model must be the same size. Finally, it groups the images into batches. Neural networks are trained one batch at a time, so our datasets need to be delivered in batches.

In [9]:
train_ds = image_dataset_from_directory('.\\train_images',image_size=(300,300),
                                       batch_size=32)

Found 1276 files belonging to 2 classes.


And now our testing dataset.

In [10]:
test_ds = image_dataset_from_directory('.\\test_images',image_size=(300,300),
                                       batch_size=32)

Found 318 files belonging to 2 classes.


Let's store our class names. We can use these later on to better understand our model's output.

In [11]:
class_names = train_ds.class_names
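When labels are inferred from the directory structure, Keras orders the class names alphanumerically by subdirectory name. The sketch below mimics only that ordering with the standard library; `infer_class_names` is a hypothetical helper, not a TensorFlow function:

```python
import os
import tempfile

def infer_class_names(directory):
    """One class per subdirectory, sorted by name (mimics the inferred order)."""
    return sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )

# Demonstration: folders created in the "wrong" order still sort the same way.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('doom', 'animal_crossing'):
        os.makedirs(os.path.join(tmp, sub))
    names = infer_class_names(tmp)
    label_of = {name: index for index, name in enumerate(names)}
    print(label_of)
```

Under this ordering animal_crossing is label 0 and doom is label 1, which matches how the prediction columns are read later in this notebook.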

## Model Creation
Now let's create our model. This model will have several different types of layers in it. The first layer rescales the images so that the model can work better. Neural networks tend to train better on small input values, so we divide each pixel by 255 to map it into the range [0, 1].

The 2nd type of layer is a 2D convolutional layer, which slides small filters over the image to detect local features such as edges and textures. More explanation on convolutions can be found here: https://www.youtube.com/watch?v=KuXjwB4LzSA&t=713s&ab_channel=3Blue1Brown.

The 3rd type of layer is a pooling layer. Pooling layers shrink the output passed to the next layer, which cuts computation time.

The 4th type is a flatten layer, which reshapes the multi-dimensional output into a single vector so that it can feed the dense layers.

The last type of layer is a dense layer. Dense layers consist of nodes that each combine all of their inputs through learned linear weights to make estimations. Note that the last dense layer has no activation function and contains one node for each of our classes, so it outputs raw scores (logits).

More info about convolutional neural networks in TensorFlow can be found in the following Colab notebook: https://colab.research.google.com/drive/1ZZXnCjFEOkp_KdNcNabd14yok0BAIuwS#forceEdit=true&sandboxMode=true.
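To make the rescaling, convolution, and pooling steps concrete, here is a small NumPy sketch on a single-channel 8x8 toy image. It is an illustration under simplifying assumptions (one channel, one hand-picked filter, no bias, no learned weights), not how Keras implements these layers internally; `conv2d_valid`, `max_pool`, and `edge_kernel` are hypothetical names:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image with no padding (Keras calls this 'valid')."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    return trimmed.reshape(out_h, size, out_w, size).max(axis=(1, 3))

pixels = np.arange(64, dtype=np.float64).reshape(8, 8) * 4   # toy values 0..252
scaled = pixels / 255.0                                      # rescale into [0, 1]
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)               # illustrative 3x3 filter
features = np.maximum(conv2d_valid(scaled, edge_kernel), 0)  # ReLU activation
pooled = max_pool(features, 3)
print(scaled.max(), features.shape, pooled.shape)
```

The 8x8 input shrinks to 6x6 after the 3x3 convolution and to 2x2 after 3x3 pooling, which is the same size reduction the real layers apply to our 300x300 images.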

In [12]:
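model = models.Sequential()
# Scale pixel values from [0, 255] down to [0, 1]
model.add(layers.Rescaling(1./255))
# Convolution + pooling blocks extract features and shrink the feature maps
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(300, 300, 3)))
model.add(layers.MaxPooling2D((3, 3)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((3, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
# Flatten the feature maps into one vector for the dense layers
model.add(layers.Flatten())
# Note: softmax on a hidden layer is unusual ('relu' is the common choice);
# it is kept here because the results below were produced with it
model.add(layers.Dense(32,activation='softmax'))
# One output node per class, no activation: the model returns raw logits
model.add(layers.Dense(2))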
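It can be useful to trace how the spatial size changes through the stack. This sketch assumes the Keras defaults that the model relies on (no padding and stride 1 for `Conv2D`, stride equal to the window size for `MaxPooling2D`); `conv_out` and `pool_out` are hypothetical helper names:

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=3):
    """Output width of a pooling layer whose stride equals its window."""
    return size // pool

size = 300
size = conv_out(size)   # Conv2D(32):   300 -> 298
size = pool_out(size)   # MaxPooling2D: 298 -> 99
size = conv_out(size)   # Conv2D(64):    99 -> 97
size = pool_out(size)   # MaxPooling2D:  97 -> 32
size = conv_out(size)   # Conv2D(32):    32 -> 30
flattened = size * size * 32  # 32 filters in the last convolution
print(size, flattened)  # 30 28800
```

So the flatten layer hands a vector of 28,800 values to the first dense layer.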

Now that we have created our model framework, we need to compile and fit it. The optimizer, loss function, and metric were taken from the Colab notebook referenced above.

In [13]:
model.compile(optimizer='adam',
             loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
             metrics=['accuracy'])
history = model.fit(train_ds,epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
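The `from_logits=True` argument tells the loss that our model outputs raw scores rather than probabilities. As a rough illustration of what the loss computes for a single image, here is a standard-library sketch; `sparse_crossentropy_from_logits` is a hypothetical name, the logit values are made up, and Keras additionally averages this quantity over each batch:

```python
import math

def sparse_crossentropy_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]

# Made-up logits for a two-class problem (index 0 or 1 is the true label).
print(sparse_crossentropy_from_logits([0.0, 0.0], 0))   # undecided: log(2)
print(sparse_crossentropy_from_logits([4.0, -4.0], 0))  # confident and right: small
print(sparse_crossentropy_from_logits([4.0, -4.0], 1))  # confident and wrong: large
```

A loss near log(2), about 0.693, therefore corresponds to a two-class model that is no better than guessing.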


Let's evaluate our performance on our test dataset. 

In [14]:
model.evaluate(test_ds)



[0.5689965486526489, 0.7672955989837646]
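`model.evaluate` returns the loss first, followed by the metrics in the order given to `compile`, so the two numbers above are the test loss and the test accuracy. A short sketch that turns the accuracy into a count of correctly classified images, using only values copied from the outputs in this notebook:

```python
# Values copied from the evaluate output above
test_loss, test_accuracy = 0.5689965486526489, 0.7672955989837646
n_test_images = 318  # from "Found 318 files belonging to 2 classes."

n_correct = round(test_accuracy * n_test_images)
print(f'loss={test_loss:.3f}, accuracy={test_accuracy:.1%}, '
      f'correct={n_correct}/{n_test_images}')
```

This works out to 244 of the 318 test images being classified correctly.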

## Making Predictions
Before we make predictions on our training and testing sets, let's copy our files into new folders so we can loop through the file names and make a prediction for each file. We need the associated file name in our output so that we can combine the predictions with our other dataset.

In [15]:
makeFolder('train_images_combined')
makeFolder('test_images_combined')

[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\train_images_combined'
[WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\sulli\\Code\\Python\\Data Mining\\Final Project\\test_images_combined'


Let's also create a function that copies every file from the class subfolders of an image directory into a single flat folder.

In [16]:
def copyFromDir(start,end):
    for folder in os.listdir(start):
        folder_path = os.path.join(start,folder)
        for file in os.listdir(folder_path):
            f = os.path.join(folder_path,file)
            shutil.copy(f,end)

In [17]:
copyFromDir('.\\train_images','.\\train_images_combined')
copyFromDir('.\\test_images','.\\test_images_combined')
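Flattening two class folders into one assumes that no file name appears in both, since a later copy would silently overwrite an earlier one. A hedged sketch of a check that could be run beforehand; `duplicate_basenames` is a hypothetical helper:

```python
import os
import tempfile
from collections import Counter

def duplicate_basenames(start):
    """Return file names that appear in more than one subfolder of start."""
    counts = Counter(
        name
        for folder in os.listdir(start)
        if os.path.isdir(os.path.join(start, folder))
        for name in os.listdir(os.path.join(start, folder))
    )
    return sorted(name for name, count in counts.items() if count > 1)

# Demonstration: 'a.jpg' exists in both class folders, 'b.jpg' in only one.
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [('animal_crossing', 'a.jpg'), ('doom', 'a.jpg'), ('doom', 'b.jpg')]:
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        open(os.path.join(tmp, sub, name), 'w').close()
    print(duplicate_basenames(tmp))
```

An empty list means the flat copy is safe with respect to name collisions.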

Now we collect the list of file names in each combined folder.

In [18]:
train_files = list(os.listdir(".\\train_images_combined"))
test_files  = list(os.listdir(".\\test_images_combined" ))

Finally we'll make a prediction function for our model. This is a little complicated because the model expects a batch of images as input, so each image is wrapped in a batch of one. This method was taken from the TensorFlow documentation: https://www.tensorflow.org/api_docs/python/tf/keras/utils/load_img.

In [19]:
def makePredFromDir(targetDir,model):
    animal_crossing = []
    doom = []
    for file in os.listdir(targetDir):
        f = os.path.join(targetDir,file)
        # Resize on load so every image matches the model's 300x300 input
        img = tf.keras.utils.load_img(f,target_size=(300,300))
        input_arr = tf.keras.utils.img_to_array(img)
        # Wrap the single image in a batch of one
        input_arr = np.array([input_arr])
        predictions = model.predict(input_arr,verbose=0)
        # Column 0 is animal_crossing and column 1 is doom
        animal_crossing.append(predictions[0,0])
        doom.append(predictions[0,1])
    return [animal_crossing,doom]
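The file names are listed in one cell and the predictions are produced by a second, separate `os.listdir` call, so the two rely on both listings coming back in the same order. One way to remove that dependence is to record the name and the outputs together. This is only a sketch: `predict_rows` is a hypothetical helper and `predict_fn` stands in for the load-and-predict steps above.

```python
import os
import tempfile

def predict_rows(target_dir, predict_fn):
    """Return one dict per file so each name stays attached to its outputs.

    predict_fn takes a file path and returns the two raw model outputs
    (animal_crossing first, doom second) for that image.
    """
    rows = []
    for name in sorted(os.listdir(target_dir)):
        first, second = predict_fn(os.path.join(target_dir, name))
        rows.append({'File': name,
                     'AnimalCrossingWeights': first,
                     'DoomWeights': second})
    return rows

def fake_predict(path):
    # Placeholder values for the demonstration, not real model output
    return 0.0, 1.0

with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.jpg', 'a.jpg'):
        open(os.path.join(tmp, name), 'w').close()
    rows = predict_rows(tmp, fake_predict)
    print([row['File'] for row in rows])
```

Passing `rows` to `pd.DataFrame` would give a table with the same three columns used in the export cells.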

Now let's make our predictions for our training set and testing set. We'll export them as CSV files for later use.

In [20]:
train_predictions = pd.DataFrame()
train_predictions['File'] = train_files
preds = makePredFromDir(".\\train_images_combined",model)
train_predictions['AnimalCrossingWeights']= preds[0]
train_predictions['DoomWeights'] = preds[1]
train_predictions.to_csv('TrainPred.csv',index=False)

In [21]:
test_predictions = pd.DataFrame()
test_predictions['File'] = test_files
preds = makePredFromDir(".\\test_images_combined",model)
test_predictions['AnimalCrossingWeights']= preds[0]
test_predictions['DoomWeights'] = preds[1]
test_predictions.to_csv('TestPred.csv',index=False)
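Because the final dense layer has no activation, the two exported columns hold raw logits rather than probabilities. If probabilities or a predicted label are wanted downstream, a softmax over each pair gives them. The helpers below are a hypothetical sketch, and the example values are made up:

```python
import math

def logits_to_probabilities(animal_crossing_logit, doom_logit):
    """Softmax over the two raw outputs: (p_animal_crossing, p_doom)."""
    peak = max(animal_crossing_logit, doom_logit)  # for numerical stability
    exp_ac = math.exp(animal_crossing_logit - peak)
    exp_doom = math.exp(doom_logit - peak)
    total = exp_ac + exp_doom
    return exp_ac / total, exp_doom / total

def predicted_class(animal_crossing_logit, doom_logit,
                    class_names=('animal_crossing', 'doom')):
    """Name of the class with the larger raw output."""
    return class_names[0] if animal_crossing_logit >= doom_logit else class_names[1]

# Made-up logits for one image
p_ac, p_doom = logits_to_probabilities(2.0, -1.0)
print(round(p_ac, 3), round(p_doom, 3), predicted_class(2.0, -1.0))
```

The two probabilities always sum to 1, and the predicted class is the same whether it is chosen from the logits or from the probabilities.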

The last thing we want to do is delete the copied directories, as they take up space on our devices. I only have 1 TB of storage on my machine, so I want to preserve this space.

In [22]:
shutil.rmtree(".\\train_images_combined")
shutil.rmtree(".\\test_images_combined" )
shutil.rmtree(".\\train_images")
shutil.rmtree(".\\test_images")