# 6 Support Vector Machine
## 6.1 Kernels
A Support Vector Machine (SVM) can [use different kernels](https://en.wikipedia.org/wiki/Kernel_method): linear, radial basis function (RBF), polynomial, sigmoid, etc. Running the code below on a classical example shows how some of them differ. Besides the usual packages, the *sklearn* package is also used here.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

#take the well-known iris dataset
iris = datasets.load_iris()
#we will use only sepal length and width
x=iris.data[:, :2]
y=iris.target

#build a grid over the feature plane for drawing the decision regions
x1, x2=x[:, 0], x[:, 1]
x_min, x_max=x1.min()-1, x1.max()+1
y_min, y_max=x2.min()-1, x2.max()+1
h=0.02
plot_x, plot_y=np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

#regularization
C=1.0  
models=(svm.SVC(kernel="linear", C=C),
          svm.SVC(kernel="rbf", gamma=0.7, C=C),
          svm.SVC(kernel="poly", degree=3, C=C))
models=(model.fit(x, y) for model in models)

# title for the plots
titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")


for model, title in zip(models, titles):
    points=model.predict(np.c_[plot_x.ravel(), plot_y.ravel()]).reshape(plot_x.shape)
    plt.contourf(plot_x, plot_y, points, cmap=plt.cm.coolwarm, alpha=0.8)
    plt.xlim(plot_x.min(), plot_x.max())
    plt.ylim(plot_y.min(), plot_y.max())
    plt.xlabel("Sepal length")
    plt.ylabel("Sepal width")
    plt.title(title)
    
    predicted=model.predict(x)
    print("Accuracy: %.2lf%%"%(100*np.sum(y==predicted)/y.size))
    
    plt.scatter(x1, x2, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
    
    plt.show()
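
For reference, the three kernels compared above can be written out as plain functions. This is a minimal sketch, not scikit-learn's implementation: the function names and the sample vectors `a` and `b` are made up for illustration, while the formulas follow the kernel definitions documented for `SVC` (with `coef0=0.0`, the `SVC` default).

```python
import numpy as np

def linear_kernel(a, b):
    # k(a, b) = <a, b>
    return np.dot(a, b)

def rbf_kernel(a, b, gamma):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def poly_kernel(a, b, degree, gamma=1.0, coef0=0.0):
    # k(a, b) = (gamma * <a, b> + coef0)^degree
    return (gamma * np.dot(a, b) + coef0) ** degree

# two hypothetical 2-D points (not taken from the iris data)
a = np.array([1.0, 2.0])
b = np.array([2.0, 0.0])

k_linear = linear_kernel(a, b)
k_rbf = rbf_kernel(a, b, gamma=0.7)
k_poly = poly_kernel(a, b, degree=3)

print("linear:", k_linear)
print("rbf:", k_rbf)
print("poly:", k_poly)
```

The linear kernel is an ordinary dot product, so its decision boundaries are straight lines; the RBF and polynomial kernels are nonlinear in the inputs, which is what allows the curved regions in the plots.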

**Tasks**

1. What accuracies are achieved when other features are used as well?
2. Split the dataset into a training and testing part, fit the SVM model on the training part, and test it on the testing part. What gives the highest accuracy?
3. Make the code below give over 90% accuracy, then explain how you achieved it and why it worked.

In [None]:
#Task 1

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

#take the well-known iris dataset
iris = datasets.load_iris()
#use all four features, as Task 1 asks
x=iris.data
y=iris.target


#regularization
C=1.0  
models=(svm.SVC(kernel="linear", C=C),
          svm.SVC(kernel="rbf", gamma=0.7, C=C),
          svm.SVC(kernel="poly", degree=3, C=C))
models=(model.fit(x, y) for model in models)

# title for the plots
titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")

for model, title in zip(models, titles):
    predicted=model.predict(x)
    print(title, end="")
    print(" accuracy: %.2lf%%"%(100*np.sum(y==predicted)/y.size))

In [None]:
#Task 2

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split

#take the well-known iris dataset
iris = datasets.load_iris()
#use all four features
x=iris.data
y=iris.target

#split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)


#regularization
C=1.0  
models=(svm.SVC(kernel="linear", C=C),
          svm.SVC(kernel="rbf", gamma=0.7, C=C),
          svm.SVC(kernel="poly", degree=3, C=C))
models=(model.fit(x_train, y_train) for model in models)

# title for the plots
titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")

for model, title in zip(models, titles):
    predicted=model.predict(x_test)
    print(title, end="")
    print(" accuracy: %.2lf%%"%(100*np.sum(y_test==predicted)/y_test.size))
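
A single 50/50 split gives an accuracy that depends on which samples land in the test part. As a hedged alternative (not part of the original exercise), the sketch below uses `cross_val_score` with 5 folds to average over several splits; the dictionary names `kernels` and `cv_scores` are chosen here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
x, y = iris.data, iris.target

# same three models as in the cells above
kernels = {
    "Linear kernel": svm.SVC(kernel="linear", C=1.0),
    "RBF kernel": svm.SVC(kernel="rbf", gamma=0.7, C=1.0),
    "Polynomial (degree 3) kernel": svm.SVC(kernel="poly", degree=3, C=1.0),
}

cv_scores = {}
for title, model in kernels.items():
    # 5-fold cross-validation: each sample is used for testing exactly once
    scores = cross_val_score(model, x, y, cv=5)
    cv_scores[title] = scores
    print("%s: %.2f%% +/- %.2f%%" % (title, 100 * scores.mean(), 100 * scores.std()))
```

The spread across folds indicates how much of the difference between kernels is just noise from the split.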

In [None]:
import numpy as np
from sklearn import svm, datasets

n1=400
n2=400

#class 1: n1 points on an arc with radius in [1.5, 2.5], centred at (1.5, 0)
class1=(np.tile(np.random.uniform(low=0.0, high=1, size=n1).reshape((n1, 1)), (1, 2))+3/2)*\
np.array([(np.cos(a), np.sin(a)) for a in np.random.uniform(low=2, high=8, size=n1)])+np.tile(np.array([[3/2, 0]]), (n1, 1))
class2=(np.tile(np.random.uniform(low=0.0, high=1, size=n2).reshape((n2, 1)), (1, 2))+3/2)*\
np.array([(np.cos(a), np.sin(a)) for a in np.random.uniform(low=-1, high=4, size=n2)])
x=np.vstack((class1, class2))
y=np.concatenate((np.ones((n1)), 2*np.ones((n2))))

idx=np.random.permutation(y.size)
x=x[idx, :]
y=y[idx]

s=round((n1+n2)/2)
#s=600

x_train=x[:s, :]
y_train=y[:s]

x_test=x[s:, :]
y_test=y[s:]

#EDIT ONLY FROM HERE...
model=svm.SVC(kernel="rbf")
model.fit(x_train, y_train)
#...TO HERE

predicted=model.predict(x_test)
print("Accuracy: %.2lf%%"%(100*np.sum(y_test==predicted)/y_test.size))


Switching the kernel to RBF raised the accuracy above 90%. This worked because the RBF kernel can describe the boundary between the classes better: the boundary can be curved, not only straight as with the linear kernel.
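
The same effect can be reproduced on a standard toy dataset. The sketch below is an illustration under assumptions, not the exercise data: it uses `make_moons` (two interleaved arcs) instead of the arcs generated above, and `gamma=2.0` is a value picked here by hand.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# two interleaved half-circles that no straight line can separate
x, y = make_moons(n_samples=800, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0)

kernel_accuracy = {}
for kernel in ("linear", "rbf"):
    # gamma is ignored by the linear kernel
    model = svm.SVC(kernel=kernel, gamma=2.0).fit(x_train, y_train)
    kernel_accuracy[kernel] = model.score(x_test, y_test)
    print("%s kernel accuracy: %.2f%%" % (kernel, 100 * kernel_accuracy[kernel]))
```

The linear kernel has to cut through both arcs with one straight line, while the RBF kernel can bend the boundary around them.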

## 6.2 Wine dataset
Here we are going to make some experiments with the wine dataset to see how features can [affect](https://en.wikipedia.org/wiki/Feature_selection) the classification.

**Tasks**

1. Which SVM kernel will achieve the highest accuracy when all features are used?

The linear kernel.

2. If you can use **only one** feature and any kernel to achieve highest possible accuracy, which feature and kernel would that be?

The seventh feature (it most often has the largest weight sum) and the linear kernel.

3. If you can use **only two** features and any kernel to achieve highest possible accuracy, which feature and kernel would that be?

The tenth and the seventh features, because these two most often have the largest weight sums, and the linear kernel.

4. How do you explain the results?

Looking at the boundary weights, the feature with the largest sum of absolute weights over all three boundaries is the most important one. Irrelevant features do not receive high weights, because their weights do not affect the classification.

Testing with the code below shows that the assumption holds for one feature, but for two features different combinations come out on top, most often with the RBF kernel.

In [None]:
import numpy as np
from sklearn import svm
from sklearn.datasets import load_wine
wine=load_wine()
x=wine.data
y=wine.target
idx=np.random.permutation(y.size)
x=x[idx, :]
y=y[idx]

#all features
features_idx=range(x.shape[1])
#only some of the features
#features_idx=[0, 1]

x=x[:, features_idx]

s=round(y.size/2)

x_train=x[:s, :]
y_train=y[:s]

x_test=x[s:, :]
y_test=y[s:]

models=(svm.SVC(kernel="linear"),
          svm.SVC(kernel="rbf"),
          svm.SVC(kernel="poly", degree=3))
models=(model.fit(x_train, y_train) for model in models)

titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")

#keep the weights of the first (linear) model; coef_ is only defined for the linear kernel
i=1
tezine=[]
for model, title in zip(models, titles):

    if i==1:
      i=0
      tezine=model.coef_

    predicted=model.predict(x_test)
    print(title, end="")
    print(" accuracy: %.2lf%%"%(100*np.sum(y_test==predicted)/y_test.size))

sume_tezina=[]
for w1, w2, w3 in zip(tezine[0], tezine[1], tezine[2]):
    sume_tezina.append(abs(w1)+abs(w2)+abs(w3))

i=1
for suma in sume_tezina:
  print(str(i)+". feature has sum of weights equal to "+ str(suma))
  i+=1
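
The wine features are measured on very different scales (proline is in the hundreds, hue is around one), so raw linear-SVM weights partly reflect units rather than importance. The sketch below is one possible refinement, not the method used above: it standardizes the features with `StandardScaler` before fitting and then ranks them by the same sum of absolute weights. The variable names are chosen here for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
x, y = wine.data, wine.target

# scale every feature to zero mean and unit variance, then fit a linear SVM
pipeline = make_pipeline(StandardScaler(), svm.SVC(kernel="linear"))
pipeline.fit(x, y)

# coef_ has one row per pair of classes (3 rows for 3 classes)
coef = pipeline.named_steps["svc"].coef_
weight_sums = np.abs(coef).sum(axis=0)

# feature indices sorted from largest to smallest weight sum
ranking = np.argsort(weight_sums)[::-1]
for idx in ranking[:5]:
    print("%s: %.3f" % (wine.feature_names[idx], weight_sums[idx]))
```

With standardized inputs, the weight sums of different features become comparable with each other, which makes the ranking easier to interpret.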

In [None]:
#Task 2
from sklearn.datasets import load_wine
wine=load_wine()


#all features
#features_idx=range(x.shape[1])
#only some of the features

max_acc=0
max_model=""
max_znacajka=0

#loop over every feature of the dataset
for feature in range(wine.data.shape[1]):
  x=wine.data
  y=wine.target
  idx=np.random.permutation(y.size)
  x=x[idx, :]
  y=y[idx]

  features_idx=[feature]

  x=x[:, features_idx]

  s=round(y.size/2)

  x_train=x[:s, :]
  y_train=y[:s]

  x_test=x[s:, :]
  y_test=y[s:]

  models=(svm.SVC(kernel="linear"),
            svm.SVC(kernel="rbf"),
            svm.SVC(kernel="poly", degree=3))
  models=(model.fit(x_train, y_train) for model in models)

  titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")

  for model, title in zip(models, titles):
      predicted=model.predict(x_test)
      acc=100*np.sum(y_test==predicted)/y_test.size
      #remember the best kernel and feature seen so far
      if acc>max_acc:
          max_acc=acc
          max_model=title
          max_znacajka=feature

print("For the "+ max_model+ " and feature "+ str(max_znacajka+1), end="")
print(" accuracy: "+ str(max_acc))

In [None]:
#Task 3
from sklearn.datasets import load_wine
wine=load_wine()


#all features
#features_idx=range(x.shape[1])
#only some of the features

max_acc=0
max_model=""
max_znacajka1=0
max_znacajka2=0
for feature1 in range(wine.data.shape[1]):

  for feature2 in range(wine.data.shape[1]):
    if feature1== feature2:
      continue

    x=wine.data
    y=wine.target
    idx=np.random.permutation(y.size)
    x=x[idx, :]
    y=y[idx]

    features_idx=[feature1, feature2]

    x=x[:, features_idx]

    s=round(y.size/2)

    x_train=x[:s, :]
    y_train=y[:s]

    x_test=x[s:, :]
    y_test=y[s:]

    models=(svm.SVC(kernel="linear"),
              svm.SVC(kernel="rbf"),
              svm.SVC(kernel="poly", degree=3))
    models=(model.fit(x_train, y_train) for model in models)

    titles = ("Linear kernel", "RBF kernel", "Polynomial (degree 3) kernel")

    for model, title in zip(models, titles):
        predicted=model.predict(x_test)
        acc=100*np.sum(y_test==predicted)/y_test.size
        #remember the best kernel and feature pair seen so far
        if acc>max_acc:
            max_acc=acc
            max_model=title
            max_znacajka1=feature1
            max_znacajka2=feature2

print("For the "+ max_model+ " and features "+ str(max_znacajka1+1)+ " and "+ str(max_znacajka2+1), end="")
print(": accuracy: "+ str(max_acc))

## 6.3 Speed
SVM is really great, but it has an important disadvantage with respect to neural networks in general. Here we are going to demonstrate it.

**Tasks**
1. Run the code below for various dataset sizes and each time store the time needed for the model to fit.
2. Draw a plot that shows the influence of dataset size on execution time.
3. How would you model the influence?

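With an exponential function, judging by the measured curve. The scikit-learn documentation describes the `SVC` fit time as growing at least quadratically with the number of samples, so a polynomial of degree two or higher is the better-supported model.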

4. How would you model the same influence in the case of a multilayer perceptron?

Probably with a polynomial or linear function.

In [None]:
import numpy as np
from sklearn import svm, datasets

def create_data(n1, n2):
    class1=np.c_[np.random.normal(0, 1, size=n1), np.random.normal(0, 1, size=n1)]
    class2=np.c_[np.random.normal(2, 1, size=n2), np.random.normal(0, 1, size=n2)]
    x=np.vstack((class1, class2))
    y=np.concatenate((np.ones((n1)), 2*np.ones((n2))))
    
    return x, y

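import time
import matplotlib.pyplot as plt

times=[]
#number of points per class; each dataset holds twice as many points
data_sizes=[1000, 5000, 10000, 50000, 100000]
for data_size in data_sizes:
  x, y=create_data(data_size, data_size)
  model=svm.SVC(kernel="linear", C=1.0)
  #measure only the time needed to fit the model
  start=time.perf_counter()
  model.fit(x, y)
  end=time.perf_counter()
  times.append(end-start)

plt.plot(data_sizes, times, marker="o")
plt.xlabel("Points per class")
plt.ylabel("Fit time [s]")
plt.show()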
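
One way to check a guess about the growth rate is to fit a power law `time = c * size**k` to the measurements: in log-log space this is a straight line whose slope is the exponent `k`. The helper `fit_power_law` below is a hypothetical name introduced for this sketch, and the timings in the example are synthetic values generated from a known quadratic, not measurements.

```python
import numpy as np

def fit_power_law(sizes, times):
    """Least-squares fit of times = c * sizes**k in log-log space; returns (k, c)."""
    k, log_c = np.polyfit(np.log(sizes), np.log(times), 1)
    return k, np.exp(log_c)

# synthetic timings that grow exactly quadratically (illustration only)
example_sizes = np.array([1000, 2000, 4000, 8000])
example_times = 1e-7 * example_sizes ** 2.0

k, c = fit_power_law(example_sizes, example_times)
print("estimated exponent: %.2f" % k)
```

Calling `fit_power_law(data_sizes, times)` on the measured values gives an estimated exponent: a value near 1 suggests linear growth, a value near 2 quadratic growth. If the points do not lie on a straight line in log-log space, a power law is not a good model.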