# Welcome to the image classification notebook

This notebook covers the following areas:
 - visualize images in Python
 - perform feature reduction on the images using **Gabor filters**
 - divide the data into training and testing splits
 - build a simple **support vector machine** (SVM) classifier
 - perform hyper-parameter tuning

---

First we import some libraries:

In [None]:
# ___Cell no. 1___

import matplotlib.pyplot as plt
import numpy as np
import glob

# imports for the Gabor filter (used for feature extraction)
from scipy import ndimage as ndi
from skimage.filters import gabor_kernel
from scipy.stats import kurtosis, skew
# imports for the train/test split, the metrics and the SVM classifier
from sklearn.model_selection import train_test_split 
from sklearn import datasets,  metrics
from sklearn.svm import SVC


---

#### Gabor features

In [None]:
# ___Cell no. 2___

# define a function that uses Gabor filters to reduce each image to a fixed-size set of features:
# the kurtosis and skewness of the image after filtering it with each kernel
def compute_feats(image, kernels):
    feats = np.zeros((len(kernels), 2), dtype=np.double)
    for k, kernel in enumerate(kernels):
        filtered = ndi.convolve(image, kernel, mode='wrap')
        #feats[k, 0] = filtered.mean()
        #feats[k, 1] = filtered.var()
        feats[k, 0] = kurtosis(np.reshape(filtered,-1))
        feats[k, 1] = skew(np.reshape(filtered,-1))
    return feats
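
To make the two statistics stored by `compute_feats` concrete, here is a minimal pure-Python sketch of skewness and excess kurtosis. It assumes the SciPy defaults used above (population moments, with 3 subtracted from the kurtosis); the helper names and the sample lists are hypothetical and only for illustration.

```python
def central_moment(values, order):
    """Return the population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    # Third central moment divided by the standard deviation cubed.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Fourth central moment divided by the variance squared, minus 3
    # so that a normal distribution scores 0.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical pixel values
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]  # one bright outlier

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

A symmetric set of values has zero skewness, while a single bright outlier pushes the skewness above zero. This is the kind of difference the classifier later relies on.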

In [None]:
# ___Cell no. 3___

# prepare Gabor filter bank kernels
kernels = []
for sigma in (1,4):
    theta = np.pi
    for frequency in (0.05, 0.25):
        print('theta = {}, sigma = {} frequency = {}'.format(theta, sigma, frequency) )
        kernel = np.real(gabor_kernel(frequency,theta=theta,sigma_x=sigma, sigma_y=sigma))
        kernels.append(kernel)
                         
np.array(kernels, dtype=object).shape
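
For intuition about what each kernel in the bank contains, the sketch below builds the real part of a Gabor kernel by hand: a Gaussian envelope multiplied by a cosine wave. It is a simplified illustration, not the scikit-image implementation. The function name `gabor_real_sketch` and the `half_size` parameter are assumptions introduced here, whereas `gabor_kernel` chooses the kernel size itself.

```python
import numpy as np

def gabor_real_sketch(frequency, theta, sigma, half_size):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave.

    Hypothetical helper for illustration only; half_size is an assumption,
    and the same sigma is used in both directions.
    """
    coords = np.arange(-half_size, half_size + 1)
    y, x = np.meshgrid(coords, coords, indexing="ij")
    # Rotate the x coordinate so the wave travels along direction theta.
    rotated_x = x * np.cos(theta) + y * np.sin(theta)
    # Isotropic Gaussian envelope, normalised like a 2D normal density.
    envelope = np.exp(-0.5 * (x**2 + y**2) / sigma**2) / (2 * np.pi * sigma**2)
    carrier = np.cos(2 * np.pi * frequency * rotated_x)
    return envelope * carrier

sketch = gabor_real_sketch(frequency=0.25, theta=np.pi, sigma=1, half_size=3)
print(sketch.shape)
print(np.round(sketch[3], 4))  # middle row of the kernel
```

A larger `sigma` widens the envelope, and a higher `frequency` packs more oscillations into it, which is why the bank above varies both.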

#### Load Images

In [None]:
# ___Cell no. 4___

# collect the paths of the zebra images into a list
zebrafolder = '/users/hussein/source/zebra_imageClassification/images/zebra/'
zebra_images = glob.glob('{}*.jpg'.format(zebrafolder)) # (array1) collect all files that end with 'jpg'

# collect the paths of the non-zebra images (others) into a list
otherfolder = '/users/hussein/source/zebra_imageClassification/images/others/'
other_images = glob.glob('{}*.jpg'.format(otherfolder)) # (array2)
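
Note that `glob.glob` returns an empty list, without raising an error, when the folder path is wrong or contains no matching files. A small guard like the one sketched below can catch that early. The helper `list_jpgs` is hypothetical, and the demonstration uses a temporary folder with empty placeholder files rather than the real image folders.

```python
import glob
import os
import tempfile

def list_jpgs(folder):
    """Return the .jpg paths in folder, failing loudly if there are none."""
    paths = glob.glob(os.path.join(folder, '*.jpg'))
    if not paths:
        raise FileNotFoundError('no .jpg files found in {}'.format(folder))
    return paths

# Demonstration on a temporary folder holding two empty placeholder files.
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ('a.jpg', 'b.jpg'):
        open(os.path.join(demo_folder, name), 'w').close()
    found = list_jpgs(demo_folder)
    print(len(found))
```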

In [None]:
# ___Cell no. 5___

print((zebra_images[0])) # showing the first element in the array

The first image is number 376, which shows that the data is not sorted. Sorting can be essential in other cases, such as time series, but not in this tutorial.

---

**Exercise 1:** Try to sort the two arrays above (array1 and array2) using the sort function.
<br>
##### **hint: use google**
---

In [None]:
#---- code here -----


<br>

**Exercise 2:** Try to visualise the first 10 images.
<br>
##### **hint: use google**


In [None]:
#---- code here -----


---

#### Building a Machine learning model

Before starting the ML part, some preprocessing is needed. The main issue with this dataset is that the images come in different sizes, while the classifier expects the same number of features for every image. To use this as a train/test dataset, we can do one of two things:

1. Use Convolutional Neural Networks
2. Use an image feature reduction technique. 

Here we're going to use Method 2. The feature reduction technique we will use is a bank of Gabor filters, which reduces each image to 8 features: 4 kernels, each summarised by the kurtosis and skewness of the filtered image.
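
The feature count can be checked with a few lines of arithmetic. The sketch below only reuses the values from Cell no. 2 and Cell no. 3; the variable names are introduced here for illustration.

```python
from itertools import product

sigmas = (1, 4)                    # values looped over in Cell no. 3
frequencies = (0.05, 0.25)         # values looped over in Cell no. 3
statistics = ('kurtosis', 'skew')  # the two values stored per kernel in Cell no. 2

n_kernels = len(list(product(sigmas, frequencies)))
n_features = n_kernels * len(statistics)
n_columns = n_features + 1         # one extra column for the class label

print(n_kernels, n_features, n_columns)  # prints: 4 8 9
```

This is why the feature arrays in the next cells are created with 9 columns.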

In [None]:
# ___Cell no. 6___

zebra_feats = np.zeros((len(zebra_images),9))
for i, image in enumerate(zebra_images):
    im = plt.imread(image,format='jpeg')
    if len(im.shape) > 2:
        imean = im.mean(axis=2) # converting 3D (RGB) images to 2D (greyscale)
    else:
        imean = im
    imfeats = compute_feats(imean,kernels).reshape(-1) # computing the Gabor features
    zebra_feats[i,:-1] = imfeats  # adding the reduced features
    zebra_feats[i,-1] = 1  # adding the class label 1 for zebra and 0 for other images
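
The greyscale step can be tried in isolation. The sketch below wraps it in a hypothetical helper, `to_greyscale`, and applies it to two made-up arrays standing in for a colour and a greyscale image.

```python
import numpy as np

def to_greyscale(im):
    """Average the colour channels of an (H, W, C) image; pass 2D images through."""
    if im.ndim > 2:
        return im.mean(axis=2)
    return im

rgb = np.zeros((4, 6, 3))   # hypothetical 4x6 colour image
rgb[..., 0] = 30.0
rgb[..., 1] = 60.0
rgb[..., 2] = 90.0
grey = np.zeros((4, 6))     # hypothetical 4x6 greyscale image

print(to_greyscale(rgb).shape, to_greyscale(grey).shape)
```

Averaging the channels is a simple choice. Weighted conversions exist as well, but the plain mean matches what the cell above does.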

In [None]:
# ___Cell no. 7___

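nozebra_feats = np.zeros((len(other_images),9))
for i, image in enumerate(other_images):
    im = plt.imread(image,format='jpeg')
    if len(im.shape) > 2:
        imean = im.mean(axis=2) # converting 3D (RGB) images to 2D (greyscale)
    else:
        imean = im # the image is already greyscale
    imfeats = compute_feats(imean,kernels).reshape(-1) # computing the Gabor features
    nozebra_feats[i,:-1] = imfeats  # adding the reduced features
    nozebra_feats[i,-1] = 0  # class label 0 for other images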

In [None]:
# ___Cell no. 8___

print( "The shape of zebra data: " + str( np.array(zebra_feats).shape))
print( "The shape of nonzebra data: " + str( np.array(nozebra_feats).shape))

---

#### Combine the datasets


In [None]:
# ___Cell no. 9___

ds = np.concatenate((nozebra_feats,zebra_feats), axis=0)

#### Separate the input from the output

In [None]:
# ___Cell no. 10___

x = ds[:,:-1] # our input, features
y = ds[:,-1]  # the output 
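
A small stand-in example shows what the concatenation and slicing produce. The arrays below are made up (2 "other" rows and 3 "zebra" rows), and the `_demo` names are hypothetical.

```python
import numpy as np

# Hypothetical feature tables: 8 feature columns followed by 1 label column.
others_demo = np.hstack([np.arange(16.0).reshape(2, 8), np.zeros((2, 1))])
zebras_demo = np.hstack([np.arange(24.0).reshape(3, 8), np.ones((3, 1))])

ds_demo = np.concatenate((others_demo, zebras_demo), axis=0)
x_demo = ds_demo[:, :-1]  # every column except the last: the features
y_demo = ds_demo[:, -1]   # the last column: the class labels

print(ds_demo.shape, x_demo.shape, y_demo.shape)
print(y_demo)
```

After concatenation the labels are grouped by class (all 0s, then all 1s), so the rows need to be shuffled before they are divided into training and test sets.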

In [None]:
# ___Cell no. 11___

print( "The shape of input data: " + str( np.array(x).shape))
print( "The shape of output data: " + str( np.array(y).shape))

#### Split the data

In [None]:
# ___Cell no. 12___

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, shuffle = True, random_state=66)
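
To get a feel for what `test_size=0.33` means in numbers, the sketch below computes the expected split sizes by hand. It assumes the test share is rounded up and the remainder goes to training, which is how scikit-learn is understood to behave here. The helper `split_sizes` and the dataset sizes are hypothetical.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the test share is rounded up and the rest goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

for n_samples in (30, 150):   # hypothetical dataset sizes
    print(n_samples, split_sizes(n_samples, 0.33))
```

The shapes printed in the next cell show the actual sizes for this dataset.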

In [None]:
# ___Cell no. 13___

print('Training data and target sizes: \n{}, {}'.format(X_train.shape,y_train.shape))
print('Test data and target sizes: \n{}, {}'.format(X_test.shape,y_test.shape))

---

**Exercise 3:** Once you've run through the tutorial, come back to this point and see what difference changing the relative size of your train:test datasets makes.

---

#### Create a classifier

In [None]:
# ___Cell no. 14___

# a support vector classifier
classifier = SVC(C=1,kernel='rbf',gamma=1)
#fit to the training data
classifier.fit(X_train,y_train)
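
With `kernel='rbf'`, the classifier compares feature vectors through the function exp(-gamma * ||a - b||^2). The pure-Python sketch below evaluates that formula for a few made-up points to show the effect of distance; `rbf_kernel_value` is a hypothetical helper, not part of scikit-learn.

```python
import math

def rbf_kernel_value(a, b, gamma):
    """Similarity between two feature vectors under the RBF kernel."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

point = [0.0, 0.0]
near = [0.1, 0.0]   # hypothetical nearby sample
far = [3.0, 0.0]    # hypothetical distant sample

print(rbf_kernel_value(point, point, gamma=1))  # identical points give 1.0
print(rbf_kernel_value(point, near, gamma=1))
print(rbf_kernel_value(point, far, gamma=1))
```

A larger `gamma` makes the similarity fall off faster with distance, so each training sample influences a smaller neighbourhood.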

In [None]:
# ___Cell no. 15___

# now predict the class (zebra or other) of the test data
y_pred = classifier.predict(X_test)

#### Results

In [None]:
# ___Cell no. 16___

print("Confusion matrix:\n%s" % metrics.confusion_matrix(y_test, y_pred))
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(y_test, y_pred)))
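
For a binary problem with labels 0 and 1, the confusion matrix is laid out as `[[TN, FP], [FN, TP]]`: rows are the true classes and columns are the predicted classes. The sketch below shows how the headline numbers in the classification report follow from those four counts. The counts are hypothetical, not results from this notebook, and `binary_report` is a helper introduced only for illustration.

```python
def binary_report(matrix):
    """Precision, recall and accuracy for class 1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp)   # of the predicted zebras, how many were zebras
    recall = tp / (tp + fn)      # of the real zebras, how many were found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# Hypothetical counts, not results from this notebook.
demo_matrix = [[40, 10], [5, 45]]
precision, recall, accuracy = binary_report(demo_matrix)
print(precision, recall, accuracy)
```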

---

#### Now with parameter optimization

In [None]:
# ___Cell no. 17___

from sklearn.model_selection import RandomizedSearchCV


In [None]:
# ___Cell no. 18___

C_range = np.logspace(-2, 10, 5) # define a set of values for the parameter C
gamma_range = np.logspace(-9, 3, 5) # define a set of values for the parameter gamma

svm = SVC(kernel="rbf")
svm_par = dict(gamma=gamma_range, C=C_range)

In [None]:
# ___Cell no. 19___

svm_random = RandomizedSearchCV(estimator = svm, param_distributions = svm_par, n_iter = 4, cv = 3, verbose=2, random_state=42, n_jobs = 3)
svm_random.fit(X_train, y_train)
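
The idea behind the randomized search can be sketched without scikit-learn: list every (C, gamma) pair from the two ranges, then try only a few of them at random. The sketch below writes out the values that the two `np.logspace` calls produce and draws 4 pairs, analogous to `n_iter = 4`. The pairs drawn here will not match the ones scikit-learn picks, because the sampling code differs.

```python
import itertools
import random

# Evenly spaced exponents, mirroring np.logspace(-2, 10, 5) and np.logspace(-9, 3, 5).
C_values = [10.0 ** e for e in (-2, 1, 4, 7, 10)]
gamma_values = [10.0 ** e for e in (-9, -6, -3, 0, 3)]

grid = list(itertools.product(C_values, gamma_values))
print(len(grid))  # 25 possible (C, gamma) pairs

# Draw 4 distinct pairs at random instead of trying all 25.
rng = random.Random(42)
sampled = rng.sample(grid, 4)
for C, gamma in sampled:
    print('C = {:g}, gamma = {:g}'.format(C, gamma))
```

With `cv = 3`, each of the 4 sampled pairs is evaluated on 3 folds, so the search fits 12 models before refitting the best one on the full training set.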


In [None]:
# ___Cell no. 20___

print("the best chosen parameters are: " + str(svm_random.best_params_))

In [None]:
# ___Cell no. 21___

y_pred2 = svm_random.predict(X_test)

In [None]:
# ___Cell no. 22___

print("Confusion matrix:\n%s" % metrics.confusion_matrix(y_test, y_pred2))
print("Classification report for classifier : ")
print ( metrics.classification_report(y_test, y_pred2))

---

**Exercise 4:** Once you've run through the tutorial, change the SVM hyperparameters and observe the difference. You can also try other models, such as CNNs or random forests (RF).

---

### You are now ready to start working on a bigger task!
Think of what we can do to make this more interesting.

For example, we can replace the "zebra" and "others" data sets with ["sad face emoji"](https://www.google.com/search?q=sad+face+emoji&tbm=isch&ved=2ahUKEwisj7j7wI73AhUPahoKHXt-AeMQ2-cCegQIABAA&oq=sad+face+emoji&gs_lcp=CgNpbWcQAzIECAAQQzIECAAQQzIECAAQQzIECAAQQzIECAAQQzIFCAAQgAQyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEOgcIIxDvAxAnUNxAWKNFYNdLaABwAHgAgAHIA4gBmg6SAQcyLTIuMS4ymAEAoAEBqgELZ3dzLXdpei1pbWfAAQE&sclient=img&ei=HG5VYqyRLY_Uafv8hZgO&bih=977&biw=1841&rlz=1C1GCEU_enZA918ZA918&hl=en) and ["smiley face emoji"](https://www.google.com/search?q=smiley+face+emoji&tbm=isch&ved=2ahUKEwiyotz3wI73AhUHGRoKHdFUDQ0Q2-cCegQIABAA&oq=smiley+face&gs_lcp=CgNpbWcQARgBMgcIIxDvAxAnMgcIABCxAxBDMgQIABBDMgQIABBDMgQIABBDMgQIABBDMgQIABBDMgQIABBDMgQIABBDMgQIABBDUABYAGCwF2gAcAB4AIAB-AKIAfgCkgEDMy0xmAEAqgELZ3dzLXdpei1pbWfAAQE&sclient=img&ei=FG5VYrLIOYeyaNGptWg&bih=977&biw=1841&rlz=1C1GCEU_enZA918ZA918&hl=en) images, and then perform binary classification.

You will need to:
* Pick an interesting binary classification problem
* Get the URLs for both data sets, in two separate files
* Download both datasets
* Perform binary classification
* Draw some conclusions and present your findings