### Single-label Album Cover Classification of the MUMU Dataset 

In this project we attempt to build an algorithm that classifies album cover art by musical genre, using several convolutional neural networks. Our motivation is our shared love of music and our appreciation for album cover art. We believe that cover art sets the tone for an album and often carries telling signs of its genre. The stylistic differences between genres can be obvious to human observers, and we wanted to test whether a network can detect them as well. A total of 18,584 album covers and their 18,584 labels were taken from the MUMU dataset, with 80% allocated to training, 10% to testing, and 10% to development. Each album cover is classified as one of the following labels: 0 - Alternative Rock, 1 - Christian, 2 - Classical, 3 - Country, 4 - Dance & Electronic, 5 - Folk, 6 - Jazz, 7 - Latin Music, 8 - Metal, 9 - R&B, 10 - Rap & Hip-Hop, 11 - Reggae.

 The MUMU dataset was first downloaded and sorted into directories using the code found here: https://github.com/koenig125/album-artwork-classification/blob/single-label-classifier/build_dataset.py

 Next, we processed the train, test, and dev data into usable numpy arrays.

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

data_dir = os.path.join('..', 'album-artwork-classification', 'data', '300x300_MUMU')  # local path
train_dir = os.path.join(data_dir, 'train', 'images')
test_dir = os.path.join(data_dir, 'test', 'images')
dev_dir = os.path.join(data_dir, 'dev', 'images')
train_genres = os.path.join(data_dir, 'train', 'genres', 'y_train.npy')
test_genres = os.path.join(data_dir, 'test', 'genres', 'y_test.npy')
dev_genres = os.path.join(data_dir, 'dev', 'genres', 'y_dev.npy')


def list_images(image_dir):
    """Paths of all .jpg files in image_dir, sorted by creation time.

    This assumes creation order matches the order in which build_dataset.py
    wrote the images, i.e. the order of the labels in the .npy files.
    """
    return sorted(
        [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.jpg')],
        key=os.path.getctime)


def load_photos(filenames, target_size=(128, 128)):
    """Load and resize each image, scaling pixel values from [0, 255] to [0, 1]."""
    photos = []
    for file in filenames:
        photo = tf.keras.preprocessing.image.load_img(file, target_size=target_size)
        photo = tf.keras.preprocessing.image.img_to_array(photo)  # float32, shape (128, 128, 3)
        photos.append(photo / 255)
    return photos


# filenames of the image files
train_filenames = list_images(train_dir)
test_filenames = list_images(test_dir)
dev_filenames = list_images(dev_dir)

# integer genre labels (0-11), one per image
y_train = np.load(train_genres)
y_test = np.load(test_genres)
y_dev = np.load(dev_genres)

photos_dev = load_photos(dev_filenames)
photos_train = load_photos(train_filenames)
photos_test = load_photos(test_filenames)
x_dev = np.asarray(photos_dev)
x_train = np.asarray(photos_train)
x_test = np.asarray(photos_test)

splits = [('dev', x_dev, y_dev), ('train', x_train, y_train), ('test', x_test, y_test)]
for name, x, y in splits:
    print(f"x_{name} shape:", x.shape)
    print(f"y_{name} shape:", y.shape)

# Show the first image of each split together with its label
for name, x, y in splits:
    plt.imshow(x[0])
    plt.xticks([])
    plt.yticks([])
    plt.show()
    print(f"y_{name}:", y[0])


The genre labels are integers from 0 to 11, corresponding to the following genres:

In [None]:
genre_list = ['Metal', 'Alternative Rock', 'Dance & Electronic', 'Rap & Hip-Hop', 'R&B',
              'Jazz', 'Folk', 'Country', 'Latin Music', 'Reggae', 'Classical', 'Christian']
genre_list.sort()
for i in range(len(genre_list)):
    print("Genre:", i, genre_list[i])

Genre: 0 Alternative Rock
Genre: 1 Christian
Genre: 2 Classical
Genre: 3 Country
Genre: 4 Dance & Electronic
Genre: 5 Folk
Genre: 6 Jazz
Genre: 7 Latin Music
Genre: 8 Metal
Genre: 9 R&B
Genre: 10 Rap & Hip-Hop
Genre: 11 Reggae
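Because the labels follow the alphabetical order printed above, the same sorted list can turn a predicted index back into a genre name. A minimal sketch, assuming the label encoding shown above; the helper name `decode` is our own:

```python
# Rebuild the label-to-genre mapping: genre names sorted alphabetically,
# each label being the position of its genre in that list.
genre_list = sorted(['Metal', 'Alternative Rock', 'Dance & Electronic', 'Rap & Hip-Hop',
                     'R&B', 'Jazz', 'Folk', 'Country', 'Latin Music', 'Reggae',
                     'Classical', 'Christian'])

index_to_genre = dict(enumerate(genre_list))
genre_to_index = {genre: i for i, genre in index_to_genre.items()}


def decode(labels):
    """Convert integer labels (e.g. the argmax of the softmax output) to genre names."""
    return [index_to_genre[int(label)] for label in labels]


print(decode([0, 8, 11]))
```

With a trained model, `decode(np.argmax(model.predict(x_test[:5]), axis=1))` would give genre names for the first five test covers.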


## 1. Baseline Model

We started classification with the base model given to us in assignment 2.
We used a softmax output layer with 12 units, one per genre, together with the sparse categorical cross-entropy loss (because the labels are stored as integers 0-11 rather than one-hot vectors) and the Adam optimizer.
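To make the output layer and loss concrete, here is a minimal numpy sketch of what they compute for a batch: softmax turns 12 raw scores into probabilities, and sparse categorical cross-entropy takes the negative log-probability of the true integer label. It illustrates the math only, not Keras's own implementation; the function names and example scores are our own:

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability of the true class.

    `labels` are integer class indices, so the probability of the correct
    class is picked out by indexing instead of a dot product with a one-hot vector.
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


# Two hypothetical covers with 12 genre scores each
logits = np.zeros((2, 12))
logits[0, 8] = 3.0            # fairly confident "Metal"
labels = np.array([8, 6])     # true genres: Metal, Jazz
probs = softmax(logits)
print(probs.sum(axis=1))      # each row sums to 1
print(sparse_categorical_crossentropy(labels, probs))
```

A network that outputs a uniform distribution over the 12 genres scores a loss of ln 12 ≈ 2.48, which is a useful reference point when reading the loss curves.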

In [None]:
# Per-epoch metrics of every run (later cells keep appending to these lists):
# a = training accuracy, b = training loss, c = testing loss, d = testing accuracy
a = []
b = []
c = []
d = []

n_runs = 5
for _ in range(n_runs):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')  # one probability per genre
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',  # labels are integers 0-11
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                  )

    history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test)).history

    a.append(history['accuracy'])
    b.append(history['loss'])
    c.append(history['val_loss'])
    d.append(history['val_accuracy'])

print("a:", a)

# Training and testing curves of the last run
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(a[-1], c='k')
plt.ylabel('training accuracy')
plt.xlabel('epochs')
plt.twinx()
plt.plot(b[-1], c='b')
plt.ylabel('training loss (error)')
plt.title('training')

plt.subplot(1, 2, 2)
plt.plot(d[-1], c='k')
plt.ylabel('testing accuracy')
plt.xlabel('epochs')
plt.twinx()
plt.plot(c[-1], c='b')
plt.ylabel('testing loss (error)')
plt.title('testing')
plt.tight_layout()
plt.show()

# One figure per metric, with every run overlaid
for values, ylabel, title in [(a, 'training accuracy', 'Training'),
                              (d, 'testing accuracy', 'Testing'),
                              (b, 'training error', 'Training'),
                              (c, 'testing error', 'Testing')]:
    for i in range(n_runs):
        plt.plot(values[i], label=str(i + 1))
    plt.ylabel(ylabel)
    plt.xlabel('epochs')
    plt.title(f'{title} for {n_runs} runs')
    plt.legend()
    plt.show()






Plot results for Model 1:
<img src='https://drive.google.com/uc?id=1b9YhZ_6-EnkH0BNpgcwCUJQd1gLWqg-d'>

<img src='https://drive.google.com/uc?id=1NXC2Zrk2Ydku-sa4dAGPH2Qz8SgqdMEz'>

<img src='https://drive.google.com/uc?id=1GN2EUJCb92AYcTd7ZD4YGbCc1lFsQTES'>

<img src='https://drive.google.com/uc?id=1r17ACNar-QCKweaC0pS03rooZfGnN5i3'>

<img src='https://drive.google.com/uc?id=1rTlf25EQwjTaFopLED6lyquy6Zydvn2G'>

Overfitting shows up when the testing error starts increasing while the training error continues to decrease. We concluded that the model was quickly learning to fit the training data without generalizing to the testing data. The following sections introduce methods to address the overfitting in this architecture.
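The symptom described above can be checked directly from the per-epoch losses that Keras records in `model.history.history`. The sketch below is a hypothetical helper: it returns the first epoch at which testing loss rises while training loss is still falling. The loss values in the example are made up for illustration:

```python
def divergence_epoch(train_loss, val_loss):
    """First epoch (0-based) where validation loss goes up while training
    loss goes down, or None if that never happens."""
    for epoch in range(1, len(val_loss)):
        val_up = val_loss[epoch] > val_loss[epoch - 1]
        train_down = train_loss[epoch] < train_loss[epoch - 1]
        if val_up and train_down:
            return epoch
    return None


train_loss = [2.0, 1.8, 1.5, 1.2]   # made-up values
val_loss = [2.1, 1.9, 2.0, 2.3]
print(divergence_epoch(train_loss, val_loss))  # 2
```

For the baseline runs, `divergence_epoch(b[i], c[i])` would apply it to run `i`. A single noisy uptick is enough to trigger it, so it is a rough diagnostic rather than a stopping rule.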

## 2. Model with Activity Regularizers

Activity regularization is an approach to encourage a neural network to learn sparse features. The regularizer is applied to the output of a layer (its "activation") and is mostly used on hidden layers: it adds a regularization term, proportional to the size of those activations, to the cost function. With an L1 penalty, as used below, the network is pushed toward representations in which most units output zero for a given input. This differs from weight (kernel) regularization, which penalizes large weight values instead; both constrain the model and can help reduce overfitting.
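A minimal numpy sketch of the L1 activity penalty, assuming `tf.keras.regularizers.l1(0.001)` computes `0.001 * sum(|x|)` over the tensor it is given (here, a layer's activations). The function name and example activations are our own:

```python
import numpy as np


def l1_activity_penalty(activations, l1=0.001):
    """L1 activity penalty: l1 * sum(|a|) over a layer's outputs.

    When the regularizer is passed as `activity_regularizer`, Keras includes
    this kind of penalty in the loss being minimized, so outputs that are
    mostly zero are cheaper than many small non-zero outputs.
    """
    return l1 * float(np.sum(np.abs(activations)))


dense = np.full((4, 4), 0.5)   # every unit mildly active
sparse = np.zeros((4, 4))
sparse[0, 0] = 2.0             # a single strongly active unit
print(l1_activity_penalty(dense), l1_activity_penalty(sparse))
```

Even though the sparse map has the larger peak activation, it is penalized less, which is the pressure toward sparse features described above.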

Additionally, we tried downscaling the images to 90x90 pixels and gave the first convolutional layer 90 filters to match that size.


In [None]:
# The network expects 90x90 inputs, so downscale the 128x128 arrays loaded earlier
# (reloading the files with target_size=(90, 90) would work as well).
x_train_90 = tf.image.resize(x_train, (90, 90)).numpy()
x_test_90 = tf.image.resize(x_test, (90, 90)).numpy()

for _ in range(10):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(90, (3, 3), activation='relu', input_shape=(90, 90, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # L1 penalty on this layer's outputs encourages sparse activations
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', activity_regularizer=tf.keras.regularizers.l1(0.001)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', activity_regularizer=tf.keras.regularizers.l1(0.001)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                  )

    model.fit(x_train_90, y_train, epochs=10, validation_data=(x_test_90, y_test))

    a.append(model.history.history['accuracy'])
    b.append(model.history.history['loss'])
    c.append(model.history.history['val_loss'])
    d.append(model.history.history['val_accuracy'])


actual output from network: [0.24964698 0.02952243 0.01603668 0.06710279 0.11900116 0.05433126
 0.14699052 0.07987288 0.10439866 0.07019878 0.0367722  0.02612562]
category (the largest output): 0

Plot results for Model 2:
<img src='https://drive.google.com/uc?id=1li8YyC3c88SJb4uP23DBDwsMqxT7LiZI'>

<img src='https://drive.google.com/uc?id=1uUEc3lh6H_B91coAitxXvWOEVbv_LwCt'>

## 3. Convolutional Model with a Higher Learning Rate

We tried a learning rate of 0.01, ten times the Adam default of 0.001 used by the other models, to see how it affects training.
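One way to see what the learning rate controls in Adam: after bias correction, each per-weight step is roughly the size of the learning rate, regardless of how large the gradient is. The numpy sketch below computes Adam's very first update with the standard update rule and Keras's default `beta_1=0.9`, `beta_2=0.999`, and `epsilon=1e-7`; the function name is our own:

```python
import numpy as np


def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """Size of Adam's first update (t = 1) for a given gradient.

    After bias correction at t = 1, m_hat equals grad and v_hat equals grad**2,
    so the update is lr * grad / (|grad| + eps), roughly lr in magnitude.
    """
    m = (1 - beta1) * grad            # first-moment estimate (starts from 0)
    v = (1 - beta2) * grad ** 2       # second-moment estimate (starts from 0)
    m_hat = m / (1 - beta1)           # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (np.sqrt(v_hat) + eps)


grads = np.array([1e-3, 1.0, 100.0])  # very different gradient sizes
print(adam_first_step(grads, 0.001))
print(adam_first_step(grads, 0.01))
```

So moving from 0.001 to 0.01 makes each early update about ten times larger for every weight.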

In [None]:
# Trying a higher learning rate
fig, ax = plt.subplots(2, figsize=(12, 22))  # one panel for training accuracy, one for testing accuracy
for n in range(10):
    print(f'run {n}')
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),    
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')
    ])
    
    opt = tf.keras.optimizers.Adam(learning_rate=0.01)
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )

    model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), verbose=0)
    ax[0].plot(model.history.history['accuracy'])
    ax[1].plot(model.history.history['val_accuracy'])
    print('Validation accuracy: ', model.history.history['val_accuracy'])

plt.title('Basic Convolutional Model w/ LRate=0.01')
ax[0].set_xlabel('epochs')
ax[0].set_ylabel('training accuracy')
ax[1].set_xlabel('epochs')
ax[1].set_ylabel('testing accuracy')
plt.show()

**Plot results for Model 3:**

Training Accuracy
<img src='https://drive.google.com/uc?id=1odtm47GOVLq4Ege5x0BDqXcapV3olc2z'>

Testing Accuracy
<img src='https://drive.google.com/uc?id=1LZSWkjX4ZqFUlgy8ynFnNibu4KYmxb58'>

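As the plots above show, every run produced the same accuracy of ~25.7%, which neither increased nor decreased over the 10 epochs. A constant accuracy like this suggests that the network is predicting the same genre for every cover, so a learning rate of 0.01 appears to be too high for this model.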
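A quick check for this kind of plateau is to compare it with the share of the most common genre in the evaluation labels, which is exactly the accuracy of a classifier that always predicts that one genre. A minimal numpy sketch, assuming integer genre labels as above; the helper names and toy labels are our own:

```python
import numpy as np


def class_distribution(labels, n_classes=12):
    """Fraction of samples belonging to each genre."""
    labels = np.asarray(labels).astype(int).ravel()
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()


def majority_baseline_accuracy(labels, n_classes=12):
    """Accuracy of a classifier that always predicts the most common genre."""
    return float(class_distribution(labels, n_classes).max())


toy_labels = np.array([0, 0, 0, 1, 2, 0, 3, 0])   # made-up labels
print(majority_baseline_accuracy(toy_labels))
```

`majority_baseline_accuracy(y_test)` gives that reference value for the test set; a constant accuracy close to it is consistent with the network having collapsed to a single output class.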

## 4.1 Convolutional Model with Dropout Layers 1

Another way to address overfitting is to add dropout layers. Dropout is a regularization method in which a random fraction of a layer's outputs (20% with `Dropout(0.2)`) is set to zero at each training step. Each update therefore sees a different "thinned" version of the layer, with a different set of active nodes and connections to the previous layer, which discourages units from relying too heavily on one another. Dropout is only active during training; at inference time all units are used.
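A minimal numpy sketch of dropout, assuming the "inverted" convention that `tf.keras.layers.Dropout` documents (surviving outputs are scaled up by 1/(1 - rate) during training). The function name is our own:

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and scale the survivors by 1/(1 - rate).

    Scaling during training keeps the expected activation unchanged, so the
    layer can simply pass its input through at inference time.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate          # each unit survives with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
x = np.ones(10_000)
y = inverted_dropout(x, 0.2, rng)
print((y == 0).mean())   # fraction of dropped units, close to 0.2
print(y.mean())          # average activation, close to 1.0
```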

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),  # randomly zero 20% of the outputs during training
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(12, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
              )

model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

a.append(model.history.history['accuracy'])
b.append(model.history.history['loss'])
c.append(model.history.history['val_loss'])
d.append(model.history.history['val_accuracy'])

**Plot results for Model 4.1:**

Training Accuracy

<img src='https://drive.google.com/uc?id=1eSz8m_DQSCGj45PiINS5m7zKA28lAMK8'>

Testing Accuracy

<img src='https://drive.google.com/uc?id=1xesEIdydZQnIPsMBvj1brwOj__lttLFI'>

Training Loss

<img src='https://drive.google.com/uc?id=1MNGnZI9tMLlqvqzFRK3EfO0USSousY7t'>

Testing Loss

<img src='https://drive.google.com/uc?id=111eDIKsMQGANy6SVNgKjl3i2FQcaMdNQ'>

While the training accuracy keeps increasing, from ~25% to 40-55% over 10 epochs, the testing accuracy begins to decrease after 4-6 epochs. This decrease indicates that the model is still overfitting.

## 4.2 Convolutional Model with Dropout Layers 2

Next, we ran a similar model with 128 filters in the first convolutional layer, 64 filters in the third, and two dropout layers instead of three. We chose 128 filters to mirror the 128x128 input resolution, although in Keras the first argument of `Conv2D` sets the number of filters (output channels) and is independent of the image size, which is fixed by `input_shape`.
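The separation between filter count and image size can be checked with a small shape calculator. This sketch assumes the layer settings used in this notebook: 3x3 `Conv2D` with Keras's default `'valid'` padding and stride 1, and 2x2 `MaxPooling2D` with its default stride of 2. The function name and the layer-tuple format are our own:

```python
def conv_stack_output(size, layers, channels=3):
    """Track the output shape through 3x3 'valid' convolutions and 2x2 max pools.

    Each layer is ('conv', filters) or ('pool',). The filter count only sets
    the channel depth; the spatial size depends on the input size, kernel
    size, and pooling alone.
    """
    for layer in layers:
        if layer[0] == 'conv':
            size = size - 3 + 1      # 3x3 kernel, no padding, stride 1
            channels = layer[1]
        elif layer[0] == 'pool':
            size = size // 2         # 2x2 window, stride 2
    return size, size, channels


model_4_2 = [('conv', 128), ('pool',), ('conv', 64), ('pool',), ('conv', 64)]
baseline = [('conv', 32), ('pool',), ('conv', 64), ('pool',), ('conv', 64)]
print(conv_stack_output(128, model_4_2))
print(conv_stack_output(128, baseline))
print(conv_stack_output(90, model_4_2))
```

Both 128x128 stacks end in the same 28x28x64 feature map (50,176 values after `Flatten`); only the input resolution changes that number.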

In [None]:
# Change the input layer to 128 neurons

fig, ax = plt.subplots(4, figsize=(12, 24))  # training accuracy/loss, testing accuracy/loss
for n in range(10):
    print(f'run {n}')
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),    
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(12, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )

    model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), verbose=0)
    ax[0].plot(model.history.history['accuracy'])
    ax[1].plot(model.history.history['loss'])
    ax[2].plot(model.history.history['val_accuracy'])
    ax[3].plot(model.history.history['val_loss'])
    print('Validation accuracy: ', model.history.history['val_accuracy'])

plt.title('Basic Convolutional Model w/ 128 input neurons and 2 Dropout layers')
ax[0].set_xlabel('epochs')
ax[0].set_ylabel('training accuracy')
ax[1].set_xlabel('epochs')
ax[1].set_ylabel('training loss')
ax[2].set_xlabel('epochs')
ax[2].set_ylabel('testing accuracy')
ax[3].set_xlabel('epochs')
ax[3].set_ylabel('testing loss')
plt.show()

**Plot results for Model 4.2:**
Shown in order: Training Accuracy, Training Loss, Testing Accuracy, Testing Loss

<img src='https://drive.google.com/uc?id=1s0JEdAzhnwm2JXAKybPpCLLaUFONF9C-'>

As the plots above show, the model's training loss keeps improving after the 6th epoch while the testing loss begins to worsen. To address this, we tried early stopping.

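## 5. Early Stopping

Since the testing loss of the previous models began to rise around the 6th epoch, this model stops training after 6 epochs instead of 10. It also uses four convolutional layers, each followed by max pooling, and connects the flattened features directly to the softmax output.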

In [None]:
for _ in range(10):              
    model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(12, activation='softmax')
    ])
    
        
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )
    
    #model.fit(x_train_more, y_train_more, epochs=6, validation_data=(x_test, y_test))       
    model.fit(x_train, y_train, epochs=6, validation_data=(x_test, y_test))        
    
    a.append(model.history.history['accuracy'])
    b.append(model.history.history['loss'])
    c.append(model.history.history['val_loss'])
    d.append(model.history.history['val_accuracy'])

**Plot results for Model 5:**

<img src='https://drive.google.com/uc?id=1cWZlHJ-K1uI2x6p6yIEoghlei4Loyjtl'>

<img src='https://drive.google.com/uc?id=1HK6OrLCA4yY2HycWwoaN6DQbuO344epE'>

<img src='https://drive.google.com/uc?id=13N0b7L9LeRmR9VDZzh-pxNoL6k1cuPHP'>

<img src='https://drive.google.com/uc?id=1KQfSuKAuKh11ivuymhPS-8VDuIYpyM4W'>

<img src='https://drive.google.com/uc?id=1inBgpGMnFiP3srb2KsHaFFZ3uZwn0NID'>

From the plots as seen above we can observe that this model performs better and addresses the overfitting that was occurring previously. However, the testing accuracy is still not that much better. We will now explore methods to improve the accuracy of the model.
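The fixed 6-epoch cutoff could also be automated with Keras's `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)`, passed to `model.fit` through `callbacks=[...]`. The pure-Python sketch below mimics the patience rule that callback applies, so it can be tried on recorded loss curves; the function name and example losses are hypothetical:

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs; best_epoch is the epoch
    whose weights would be kept when restoring the best weights.
    """
    best = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:      # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


val_losses = [2.0, 1.8, 1.7, 1.75, 1.9, 2.1]   # made-up testing losses
print(early_stopping_epoch(val_losses, patience=2))  # (4, 2)
```

Unlike a fixed epoch count, this adapts to each run, which matters here because the turning point varied between runs.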

## 6. Acquiring More Training Data 

A common way to improve a model's accuracy is to increase the amount of training data. Although we could not acquire more data externally, we added the development set to the training set to create a bigger training set.

In [None]:
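# x_dev / y_dev were loaded together with the train and test sets in the first cell;
# stack them after the training data, keeping images and labels in the same order.
x_train_more = np.concatenate([x_train, x_dev])
y_train_more = np.concatenate([y_train, y_dev])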

for _ in range(10): 
    model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(12, activation='softmax')
    ])
    
        
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )
    
    model.fit(x_train_more, y_train_more, epochs=6, validation_data=(x_test, y_test))       
    #model.fit(x_train, y_train, epochs=6, validation_data=(x_test, y_test))        
    
    a.append(model.history.history['accuracy'])
    b.append(model.history.history['loss'])
    c.append(model.history.history['val_loss'])
    d.append(model.history.history['val_accuracy'])

actual output from network: [0.2710606  0.00456741 0.0137594  0.01716287 0.18178785 0.03163417
 0.14130108 0.01684094 0.22918765 0.05834203 0.03085023 0.00350579]

category (the largest output): 0

[[806  83  48 186 386 150 389 254 285 203 109  58]
 [  0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   1   0   0   0   0   0]
 [ 16   4   2   2  16   3   9   2   1   4   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0]
 [ 74  12   4  24  35  22  91  31  21  39   3   6]
 [  2   1   0   7   1   0   1   8   1   3   0   0]
 [ 58   5   5  11  24  16  33  15  74  15  14   6]
 [  3   2   0   9   5   4  10   3   2   9   5   1]
 [  0   0   0   0   0   0   0   0   0   0   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0]]

**Plot results for Model 6:**

<img src='https://drive.google.com/uc?id=1kril7TF-9Fo7kW5FWUZzJjioGrlgMnig'>

<img src='https://drive.google.com/uc?id=17BwHCOFBRILPYnaREOpn8gzxM7ec-q8Z'>

<img src='https://drive.google.com/uc?id=1BdA0RBvp0RP9Q1eRq-SDiN1fPbXbmk6-'>

<img src='https://drive.google.com/uc?id=1Bm2Rawu_VRAu4YkVaTpF77jsqdXJEm8Y'>

This model's testing accuracy is 1-2 percentage points higher than that of the previous model, and it varies less from run to run.

## 7.1 Data Augmentation with Initial Train Set, Batch Size of 64

Data augmentation is another way to reduce overfitting. It simulates a larger training set by applying random image manipulations, such as rotations, shifts, flips, brightness changes, and colour variations, so that the network sees a slightly different version of each cover in every epoch.
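A minimal numpy sketch of on-the-fly augmentation for one image with values in [0, 1]. It is a simplified stand-in for `ImageDataGenerator`, not its implementation: shifts are whole-pixel moves with edge replication (similar in spirit to `fill_mode='nearest'`), and there is no rotation, shear, or zoom. The function name and defaults are our own:

```python
import numpy as np


def augment(image, rng, max_shift=0.2, brightness_range=(0.75, 1.25)):
    """Randomly flip, shift, and brighten one (height, width, channels) image."""
    h, w, _ = image.shape
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                            # horizontal flip
    dy = int(rng.uniform(-max_shift, max_shift) * h)     # vertical shift in pixels
    dx = int(rng.uniform(-max_shift, max_shift) * w)     # horizontal shift in pixels
    # Pad with copies of the edge pixels, then crop back to the original size,
    # which moves the content by (dy, dx).
    padded = np.pad(out, ((abs(dy), abs(dy)), (abs(dx), abs(dx)), (0, 0)), mode='edge')
    top = abs(dy) - dy
    left = abs(dx) - dx
    out = padded[top:top + h, left:left + w, :]
    out = out * rng.uniform(*brightness_range)           # brightness change
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
cover = rng.random((128, 128, 3))          # stand-in for one album cover
batch = np.stack([augment(cover, rng) for _ in range(4)])
print(batch.shape)
```

Each call produces a different variant of the same cover, so across epochs the network rarely sees exactly the same input twice.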

In [None]:
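# Adding data augmentation
# Version 1: geometric transforms only (rotation, shifts, shear, zoom, horizontal flips).
# x_train was already divided by 255 when it was loaded, so `rescale` is left unset here:
# rescaling again would shrink the augmented training images to [0, 1/255]
# while the validation images stay in [0, 1].
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=45,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
datagen.fit(x_train)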

# Data augmentation version 1
fig, ax = plt.subplots(4, figsize=(12,24))
for n in range(10):
    print(f'run {n}')
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),    
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )

    # steps_per_epoch must be a whole number of batches
    model.fit(datagen.flow(x_train, y_train, batch_size=64),
              steps_per_epoch=int(np.ceil(len(x_train) / 64)), epochs=6, verbose=0,
              validation_data=(x_test, y_test))
    ax[0].plot(model.history.history['accuracy'])
    ax[1].plot(model.history.history['val_accuracy'])
    ax[2].plot(model.history.history['loss'])
    ax[3].plot(model.history.history['val_loss'])
    print('Validation accuracy: ', model.history.history['val_accuracy'])

plt.title('Convolutional Model w/ 4 hidden layers')
ax[0].set_xlabel('epochs')
ax[0].set_ylabel('training accuracy')
ax[1].set_xlabel('epochs')
ax[1].set_ylabel('testing accuracy')
ax[2].set_xlabel('epochs')
ax[2].set_ylabel('training loss')
ax[3].set_xlabel('epochs')
ax[3].set_ylabel('testing loss')
plt.show()

**Plot results for Model 7.1:**

Training and Testing Accuracy

<img src='https://drive.google.com/uc?id=11exgBLVG4nFT7xLbPqsJnQ8Bc4DS5Q4s'>

Training and Testing Loss
<img src='https://drive.google.com/uc?id=1IoJWp9qSCtZ-ke24Q_HKqBlj8VZYGwfN'>

This model showed somewhat less of a disparity between the training and testing accuracy, indicating that the combination of early stopping and data augmentation helped to address the overfitting. However, there is still room for improvement in both reducing overfitting, as shown in the testing accuracy plot, and reducing error.

## 7.2 Data Augmentation with Image Recolouring and Larger Train Set

The next model maintains the early stopping at 6 epochs and data augmentation, but adds more features to the data augmentation and incorporates the dev data into the training set for a larger dataset.

In [None]:
# Before trying a second round of data augmentation,
# combine the dev and train sets for more training data.
# x_dev and x_train were already loaded and scaled to [0, 1] in the first cell,
# so they can be stacked directly. Dev comes first for both images and labels,
# which keeps every image aligned with its label.
x_train_more = np.concatenate([x_dev, x_train])
y_train_more = np.concatenate([y_dev, y_train])

print("x_train shape:", x_train.shape)
print("x_train_more shape:", x_train_more.shape)
print("x_test shape:", x_test.shape)

print("y_train shape:", y_train.shape)
print("y_train_more shape:", y_train_more.shape)
print("y_test shape:", y_test.shape)

In [None]:
# Version 2: addition of image colouring augmentation
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=45,
        width_shift_range=0.2,
        height_shift_range=0.2,
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        brightness_range=(0.75,1.25),
        zca_whitening=False,
        zca_epsilon=1e-06,
        fill_mode='nearest')
datagen.fit(x_train)

# Data augmentation version 2
fig, ax = plt.subplots(4, figsize=(12,24))
for n in range(10):
    print(f'run {n}')
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),    
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )

    # steps_per_epoch must be a whole number of batches
    model.fit(datagen.flow(x_train_more, y_train_more, batch_size=64),
              steps_per_epoch=int(np.ceil(len(x_train_more) / 64)), epochs=6, verbose=0,
              validation_data=(x_test, y_test))
    ax[0].plot(model.history.history['accuracy'])
    ax[1].plot(model.history.history['val_accuracy'])
    ax[2].plot(model.history.history['loss'])
    ax[3].plot(model.history.history['val_loss'])
    print('Validation accuracy: ', model.history.history['val_accuracy'])

plt.title('Data Augmentation')
ax[0].set_xlabel('epochs')
ax[0].set_ylabel('training accuracy')
ax[1].set_xlabel('epochs')
ax[1].set_ylabel('testing accuracy')
ax[2].set_xlabel('epochs')
ax[2].set_ylabel('training loss')
ax[3].set_xlabel('epochs')
ax[3].set_ylabel('testing loss')
plt.show()

**Plot results for Model 7.2:**

Training Accuracy

<img src='https://drive.google.com/uc?id=12dzGcojFX2w_fhnOjTrx-jpve8lYF_uk'>

Testing Accuracy

<img src='https://drive.google.com/uc?id=13duiSgH8p_6dYsnQhwzwTW6mL3vJswyF'>

Training Loss

<img src='https://drive.google.com/uc?id=1_04aub3xJbsxucCdlWKMjXnl7q388_Gd'>

Testing Loss

<img src='https://drive.google.com/uc?id=1iMITrHi4r2XcsK2Y0zChN_RmPvvzhYTM'>

This model showed a large improvement in testing loss: in Model 7.1 the testing loss ranged from 3 to 10, whereas in this model it stays under 2.25. The next model applies one final modification to this one.

## 7.3 Data Augmentation with Batch Size of 32 

This model is almost identical to the previous model, using the larger training set and data augmentation. However, the batch size of the `datagen.flow` step has been reduced from 64 to 32.
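Halving the batch size doubles the number of weight updates in each epoch, and each update is computed from a noisier gradient estimate. The arithmetic, as a small sketch with a hypothetical dataset size:

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Gradient updates in one pass over the data (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


n = 16000   # hypothetical number of training images, for illustration only
for batch_size in (64, 32):
    print(batch_size, updates_per_epoch(n, batch_size))
```

Over the same 6 epochs, the model with a batch size of 32 therefore takes twice as many optimizer steps as the one with 64.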

In [None]:
# Data augmentation version 3
fig, ax = plt.subplots(4, figsize=(12,24))
for n in range(10):
    print(f'run {n}')
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),    
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(12, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy']  # in addition to the loss, also compute the categorization accuracy
                 )

    # steps_per_epoch must be a whole number of batches
    model.fit(datagen.flow(x_train_more, y_train_more, batch_size=32),
              steps_per_epoch=int(np.ceil(len(x_train_more) / 32)), epochs=6, verbose=0,
              validation_data=(x_test, y_test))
    ax[0].plot(model.history.history['accuracy'])
    ax[1].plot(model.history.history['val_accuracy'])
    ax[2].plot(model.history.history['loss'])
    ax[3].plot(model.history.history['val_loss'])
    print('Validation accuracy: ', model.history.history['val_accuracy'])

ax[0].set_xlabel('epochs')
ax[0].set_ylabel('training accuracy')
ax[1].set_xlabel('epochs')
ax[1].set_ylabel('testing accuracy')
ax[2].set_xlabel('epochs')
ax[2].set_ylabel('training loss')
ax[3].set_xlabel('epochs')
ax[3].set_ylabel('testing loss')
plt.show()

**Plot results for Model 7.3:**

Training Accuracy

<img src='https://drive.google.com/uc?id=1IT0xIzb-mzMGVI05dy4U5gunzrLWr9k-'>

Testing Accuracy

<img src='https://drive.google.com/uc?id=1kKmJxAjgD6W888gx-J4LzKQvePWh4zrx'>

Training Loss

<img src='https://drive.google.com/uc?id=1Kyzf0mHk2MWk3FfmecFtbMsJF4p8x3ei'>

Testing Loss

<img src='https://drive.google.com/uc?id=1pr3Q9V5wovgzzHSt-xZ9PNlmgWwSFMWQ'>

This model did not show a noticeable improvement over the previous one, although overfitting has been reduced and accuracy may still increase with more epochs. On closer inspection of the training labels, we found that the genres are distributed very unevenly, which likely biases the network towards the majority classes and affects classification accuracy. To counter this, we tried implementing class weights.
<img src='https://drive.google.com/uc?id=1ikhtd11cfP2N5Sdl5j_hTLwoxN4mXtXm'>

## 8. Adding Class Weights

To account for this label imbalance, we added class weights, which change how much each class contributes to the loss during training. The aim is to penalize misclassified minority-class examples more heavily by giving those classes a higher weight, while reducing the weight of the majority classes.
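
The `'balanced'` option of scikit-learn's `compute_class_weight`, used in the cell below, gives each class c the weight n / (K · n_c). Here n is the number of training samples, K is the number of classes present, and n_c is the number of samples in class c. The following pure-Python sketch of that formula uses hypothetical toy labels, and the helper name is our own:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight for each class c: n_samples / (n_classes * count_c).

    Rare classes get weights > 1 and frequent classes get weights < 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

# Hypothetical toy labels: class 0 is the majority
labels = [0] * 60 + [1] * 30 + [2] * 10
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # {0: 0.556, 1: 1.111, 2: 3.333}
```

With these weights, every class contributes the same total weight (n / K, here 100 / 3) to the loss. This equal contribution is what "balanced" refers to.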

In [None]:
from sklearn.utils import class_weight

# 'balanced' weights: n_samples / (n_classes * number of samples in each class)
classes = np.unique(y_train_more)
class_weights = class_weight.compute_class_weight(class_weight='balanced',
                                                  classes=classes,
                                                  y=y_train_more)
# Keras expects a dict mapping each class index to its weight
weight = {int(c): w for c, w in zip(classes, class_weights)}

weight_decay = 1e-4  # not used by the model below
for _ in range(10):  # train 10 independently initialized models
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(128, 128, 3)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(12, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])  # in addition to the loss, also compute the categorization accuracy

    # class_weight scales each training sample's loss by the weight of its true class
    model.fit(x_train_more, y_train_more, epochs=6,
              validation_data=(x_test, y_test), class_weight=weight)
    # model.fit(x_train, y_train, epochs=6, validation_data=(x_dev, y_dev))

actual output from network: [0.08180414 0.03029461 0.0525596  0.03892834 0.17196111 0.04264539
 0.08377606 0.042708   0.12648761 0.06940803 0.20566058 0.05376663]
category (the largest output): 10
[[ 19   0   0   0   8   0  10   3   4   0   2   0]
 [ 80   9   4  18  28  15  25  19  20  13   8   5]
 [217  29   8  51  99  46 108  73  45  60  16   8]
 [ 43   8   4  20  16  18  24  22   9  17   3   5]
 [ 99   9   3  18  64  15  42  22  25  10  11   8]
 [ 21   4   2   3  11   4  17   9   6   6   1   0]
 [ 25   3   2   9  18   7  24   7   7   9   5   3]
 [ 37  11   5   8  22  14  31  32  14  16   5   6]
 [176  13  14  37  82  31 107  41 158  45  24  11]
 [ 94  11   8  43  59  25  84  44  49  51  10  16]
 [118   6   6  24  45  14  41  27  41  34  44   7]
 [ 30   4   3   8  15   6  21  14   6  12   3   2]]

**Plot results for Model 8:**

<img src='https://drive.google.com/uc?id=10Y7onkbIWgwS3FVosfz0_kqs5ru8--tv'>

<img src='https://drive.google.com/uc?id=1o-S56M0cHmNFmZIA4Gtunz85OcCATcqW'>

<img src='https://drive.google.com/uc?id=1igIiWlrd6YJ3TT9YtSZeMGL8APt2jw1B'>

<img src='https://drive.google.com/uc?id=12E14ja-4reKyqNrcVjHGqaNMKaS7xidu'>

The class-weighted model does not perform well in terms of accuracy. However, we believe it better represents the classifier we are trying to build. We discuss this in the conclusion below.

## Conclusion

In conclusion, a few of our models were reasonably successful. The better-performing models used early stopping and dropout layers to counter overfitting. The model that was also trained on the dev data performed better. The models that used data augmentation overfit less because augmentation simulates more data by creating slight variations of the images. We consider our last model, which uses class weights, the most trustworthy even though its accuracy was lower than that of the other models, because it accounts for the disproportionate number of samples in each class. In hindsight, we should have introduced class weights from the start. Our other models attained higher validation accuracy only because they were biased towards the first label. They predicted the first label for most album covers, and that prediction was often correct because the test data were as imbalanced as the training data. This explains how those models reached accuracies of around 25%. Had they accounted for the class imbalance, their accuracy would likely have fallen to between 10% and 20%.
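
The argument above can be illustrated by comparing plain accuracy with balanced accuracy (the mean of per-class recalls) for a model biased towards the majority label. The sketch below is minimal and uses hypothetical toy labels rather than our actual test set. The helper names are our own:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced test set: 70% of samples belong to class 0
y_true = [0] * 70 + [1] * 20 + [2] * 10
# A degenerate model that always predicts the majority class
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))           # 0.7
print(balanced_accuracy(y_true, y_pred))  # 0.3333333333333333
```

When K classes are present in the labels, a majority-class predictor always has a balanced accuracy of 1/K. Balanced accuracy therefore exposes the bias that plain accuracy hides. scikit-learn provides this metric as `sklearn.metrics.balanced_accuracy_score`.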

From the models we developed over the course of this project, we suspect that the difficulty stems from the nature of what we are trying to classify and from the given labels. Classifying the genre of an image is more complex than classifying the shape of an object. For datasets like the MNIST hand-written digits or the CIFAR-10 images we worked with in class, a network can extract structures or distinguishable patterns that are consistent across a label. In our case, however, it can be difficult to identify a consistent pattern for each genre of album cover. Moreover, the task is not simply to classify what is inside the image. The network must also relate the objects within the image and recognize the style or aesthetic that indicates the genre.

Additionally, well-performing models are usually trained on significantly more data. CIFAR-10, for example, has 60,000 images, and more complex machine-learning models are trained on hundreds of thousands of images. Our dataset provided at most 14,581 images for training. Our results may also have suffered from how the labels are grouped. Many of the genres are similar to one another, or one can be considered a sub-genre of another. For instance, our first label is Alternative Rock and our ninth label is Metal, and metal is often considered a sub-genre of rock. One way to address this in the future is to multi-label the cover art, specifically with sub-genres. Sub-genres may be easier to classify than the main genres because each has more specific features than the broad range seen within a main genre.
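
Multi-labelling would change both the targets and the output. Each cover would get a multi-hot target vector, and each genre would be scored independently, for example with one sigmoid per genre and a threshold, instead of a single softmax argmax. In Keras, this typically means a `Dense(n_genres, activation='sigmoid')` output layer trained with `binary_crossentropy`. The following minimal pure-Python sketch shows the encoding and decoding steps. It uses a hypothetical four-genre vocabulary and made-up logits:

```python
import math

# Hypothetical genre vocabulary (sub-genres would be added here)
GENRES = ["Rock", "Metal", "Jazz", "Classical"]

def multi_hot(tags):
    """Encode a set of genre names as a 0/1 vector over GENRES."""
    return [1 if g in tags else 0 for g in GENRES]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits, threshold=0.5):
    """Apply an independent sigmoid per genre and keep every genre above the threshold."""
    return [g for g, z in zip(GENRES, logits) if sigmoid(z) > threshold]

print(multi_hot({"Rock", "Metal"}))     # [1, 1, 0, 0]
print(decode([2.0, 1.5, -3.0, -1.0]))  # ['Rock', 'Metal']
```

Unlike softmax, the sigmoid outputs do not compete with one another. A cover can therefore be assigned both a main genre and a sub-genre, or no genre at all.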

### Future approaches

Given more time and computing resources, we could test other CNN architectures such as AlexNet, ResNet-18, ResNet-34 and ResNet-152. We would also like to explore pre-trained versions of these architectures. Others who attempted album cover classification with such approaches reported that they were unsuccessful. They attributed this to the data their model had been pre-trained on, one example being CIFAR-10, which would explain poor performance on album covers, where genre classification is far less objective [1]. Others have reported success with pre-trained models, specifically VGG16 trained on the ImageNet dataset [2]. We would like to explore the latter approach and see how our data performs with such a model. We would also like to see how our current model performs at multi-label classification of album cover art, and how multi-labelling performs with more complex models.





[1] Nathan Lee and Robert Baraldi. Predicting musical genre from album cover art. CSE 546 final paper.

[2] Jonathan Li, Tongxin Cai, and Di Sun. Genre classification via album cover. CSE 230, Stanford University.

[3] Akshi Kumar, Arjun Rajpal, and Dushyant Rathore. Genre classification using feature extraction and deep learning techniques. pages 175–180, 2018.

[4] Christian Koenig. Classifying album genres by artwork. 2017. 

*Dataset:*

Oramas S., Nieto O., Barbieri F., & Serra X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
