# Machine Learning Fashionista 2.0
In this assignment we revisit the dataset from the dimension reduction unit. The pictures of clothing were all originally taken from ImageNet, a large dataset containing over a million photos across many different categories. For several years, an annual competition compared which techniques performed best on it. Winning entries were often open-sourced and made available to machine learning researchers for further research or for developing novel applications.

Now we want to compare SVMs and deep neural networks.

In [11]:
#Import necessary packages and libraries

import zipfile
import os
from PIL import Image

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

In [12]:
#get image file paths 
m_files = os.listdir('../fashion-classifier/men')
w_files = os.listdir('../fashion-classifier/women')
print('men:',len(m_files),'| women:',len(w_files))

#merge both datasets 
all_files = m_files + w_files
print('total:',len(all_files))

#convert image files into arrays
def read_img(files, folder):
    images = []
    for name in files:
        #open from the same folder that was listed, and close the file when done
        with Image.open(os.path.join(folder, name)) as src:
            #force RGB so every image flattens to 70*70*3 = 14700 values
            img = src.convert('RGB').resize((70,70)) #resize for standard sizing
            images.append(np.asarray(img).flatten()) #turn into array and flatten
    return np.vstack(images).astype(float) #stack into shape (n_images, 14700)

#read images from files (first 500 files per class)
men = read_img(m_files[:500],'../fashion-classifier/men')
women = read_img(w_files[:500],'../fashion-classifier/women')

#turn into dataframes and add class labels
men = pd.DataFrame(men)
men['label'] = 0
women = pd.DataFrame(women)
women['label'] = 1

#merge and shuffle
fashion = pd.concat([men, women], ignore_index=True) #DataFrame.append was removed in pandas 2.0
fashion = shuffle(fashion, random_state=42) #fixed seed so the shuffle is reproducible

#separate dependent and independent variables
X = fashion.loc[:, fashion.columns != 'label']
y = fashion['label']

#see dataframe
fashion.head()

#split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=42)

men: 1242 | women: 1270
total: 2512


## 1. Support Vector Machines
- Train a support vector classifier using each of the following kernels:
    - Linear
    - Poly (degree = 2)
    - RBF

- If you run into training time or memory issues, you may use a reduced dataset, but carefully detail why and how you reduced it. Unnecessarily reducing the dataset will result in reduced grades!
- Report your error rates on the testing dataset for the different kernels.
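
The steps above could be sketched roughly as follows. This is a minimal illustration rather than a reference solution: the helper name `svm_error_rates` is hypothetical, and standardizing the pixel values with `StandardScaler` is an assumption added here because SVMs tend to be sensitive to feature scale. It expects a train/test split like the one produced in the data-loading cell.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_error_rates(X_train, y_train, X_test, y_test):
    """Fit one SVC per kernel and return {kernel name: test error rate}.

    Hypothetical helper for illustration; the scaling step is an assumption.
    """
    kernels = {
        "linear": SVC(kernel="linear"),
        "poly": SVC(kernel="poly", degree=2),
        "rbf": SVC(kernel="rbf"),
    }
    errors = {}
    for name, svc in kernels.items():
        # Scale features using training statistics only, then fit the SVC
        model = make_pipeline(StandardScaler(), svc)
        model.fit(X_train, y_train)
        # Error rate = 1 - accuracy on the held-out test set
        errors[name] = 1.0 - model.score(X_test, y_test)
    return errors
```

A call such as `svm_error_rates(X_train, y_train, X_test, y_test)` would then give one test error rate per kernel, which can be reported side by side.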

In [None]:
#LINEAR KERNEL 

In [None]:
#POLY KERNEL

In [None]:
#RBF KERNEL

## 2. Deep Neural Networks
Using Keras, load the VGG16 network. This convolutional neural network was a top entry in the ImageNet 2014 challenge (first place in localization, second in classification), and the accompanying paper is available here, if you want to read more about it. Keras code to perform this step is available here, under the heading "Extract features with VGG16."
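
Loading the network might look like the sketch below. It assumes the Keras bundled with TensorFlow (`tensorflow.keras`); the helper names `load_vgg16_base` and `extract_features` are hypothetical, and the 70x70 input size is chosen here only to match the resize in the data-loading cell. Passing `weights="imagenet"` downloads the pretrained weights on first use.

```python
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input


def load_vgg16_base(input_shape=(70, 70, 3), weights="imagenet"):
    """Return VGG16 without its classifier head, with all layers frozen."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained convolutional filters fixed
    return base


def extract_features(base, images):
    """Run RGB images of shape (N, H, W, 3), values 0-255, through the base."""
    # astype makes a float copy, so the caller's array is left unchanged
    batch = preprocess_input(images.astype("float32"))
    return base.predict(batch, verbose=0)
```

With the flattened rows from the data-loading cell, the images would first need reshaping, for example `X_train.to_numpy().reshape(-1, 70, 70, 3)`, before being passed to `extract_features`.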

- Perform transfer learning using VGG16.
- What loss function did you choose, and why?
- What performance do you achieve on your test set and how does this compare to the performance you were originally able to achieve with the linear methods?
- (optional) You can also perform a "fine-tuning" step, in which we unfreeze the weights and run a few more iterations of gradient descent. Fine-tuning can help the network specialize for the particular task it is needed for. Then measure the new performance on your test set and compare it to the performance from the previous step.
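
One possible shape for the transfer-learning and fine-tuning steps is sketched below, again assuming `tensorflow.keras`. The helper names, the 64-unit head, the learning rates, and the use of binary cross-entropy with a single sigmoid output are all illustrative assumptions for a two-class problem, not the required answer to the questions above.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications.vgg16 import VGG16


def build_transfer_model(input_shape=(70, 70, 3), weights="imagenet"):
    """Frozen VGG16 base plus a small trainable binary classifier head."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # transfer learning: only the new head is trained

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run the base in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(64, activation="relu")(x)  # head size is an assumption
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model, base


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Optional step: unfreeze the base and recompile with a small step size."""
    base.trainable = True
    # Recompiling is needed for the change in trainable weights to take effect
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training would then call `model.fit` on images reshaped to `(N, 70, 70, 3)` and passed through `preprocess_input` from `tensorflow.keras.applications.vgg16`, followed by `model.evaluate` on the test set before and after the optional fine-tuning step.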

## 3. Comparison
Write a short comparison of the two methods, and briefly argue which method you consider superior and why.