## 1.

In [1]:
# In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models.
# It represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, which allows
# non-linear models to be learned.

# Kernel function: a function that measures the similarity between two inputs as if they had been mapped into a (usually higher-dimensional)
#            feature space, without computing that mapping explicitly. In a Support Vector Machine, the kernel is the window through which the data is manipulated.
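As a minimal sketch of the polynomial kernel described above, the cell below evaluates K(x, y) = (gamma * <x, y> + coef0) ** degree by hand and compares it with scikit-learn's `polynomial_kernel`. The helper name `poly_kernel`, the toy matrix `X_toy`, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def poly_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    return (gamma * (X @ Y.T) + coef0) ** degree


# Two toy samples with two features each (illustrative values only).
X_toy = np.array([[1.0, 2.0], [3.0, 4.0]])

K_manual = poly_kernel(X_toy, X_toy, degree=2, gamma=1.0, coef0=1.0)
K_sklearn = polynomial_kernel(X_toy, X_toy, degree=2, gamma=1.0, coef0=1.0)

print(K_manual)
print(np.allclose(K_manual, K_sklearn))
```

Each entry of the resulting matrix is the similarity between one pair of samples; the SVM only ever needs these pairwise values, never the expanded polynomial features themselves.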

## 2.

In [2]:
# Self-coded SVM algorithm.
# from sklearn.metrics import accuracy_score
# from svm2 import SVM
# svm = SVM(kernel="poly")
# w, b, losses = svm.fit(X_train, y_train)
# pred = svm.predict(X_test)
# Accuracy: 0.7657142857142857
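The `svm2` module itself is not shown in this notebook, so the cell below is only a hypothetical sketch of what a self-coded SVM with the same `fit`/`predict` shape could look like. It swaps in a plain linear soft-margin SVM trained by subgradient descent on the hinge loss, not the polynomial-kernel variant used above; the class name `LinearSVM`, its hyperparameters, and the synthetic two-blob data are all assumptions.

```python
import numpy as np


class LinearSVM:
    """Hypothetical linear soft-margin SVM (subgradient descent on hinge loss)."""

    def __init__(self, C=1.0, lr=0.1, n_iters=200):
        self.C, self.lr, self.n_iters = C, lr, n_iters

    def fit(self, X, y):
        # Labels are expected to be -1 or +1.
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        losses = []
        for _ in range(self.n_iters):
            margins = y * (X @ self.w + self.b)
            # Objective: 0.5 * ||w||^2 + C * mean(hinge loss)
            hinge = np.maximum(0.0, 1.0 - margins)
            losses.append(0.5 * self.w @ self.w + self.C * hinge.mean())
            # Only samples that violate the margin contribute to the subgradient.
            violated = margins < 1
            grad_w = self.w - self.C * (X[violated].T @ y[violated]) / n_samples
            grad_b = -self.C * y[violated].sum() / n_samples
            self.w = self.w - self.lr * grad_w
            self.b = self.b - self.lr * grad_b
        return self.w, self.b, losses

    def predict(self, X):
        return np.sign(X @ self.w + self.b)


# Synthetic, well-separated blobs (illustrative data, not the notebook's dataset).
rng = np.random.default_rng(0)
X_blobs = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y_blobs = np.array([-1] * 50 + [1] * 50)

svm_demo = LinearSVM()
w_demo, b_demo, losses_demo = svm_demo.fit(X_blobs, y_blobs)
blob_accuracy = (svm_demo.predict(X_blobs) == y_blobs).mean()
print(blob_accuracy)
```

Returning `w`, `b` and the loss history mirrors the call signature in the commented snippet, which makes it easy to plot the losses and check that training converges.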

## 3.

In [3]:
# The value of ε affects the number of support vectors used to construct the regression function: the larger ε is, the fewer support vectors
# are selected. Larger ε values also produce flatter estimates. In other words, the value of ε determines the level of accuracy of the
# approximated function.
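The relationship between ε and the number of support vectors can be checked empirically. The sketch below fits scikit-learn's `SVR` on a small synthetic sine curve (the data, the seed, and the three ε values are assumptions for illustration) and counts the support vectors for each setting.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve; purely synthetic data for illustration.
rng = np.random.default_rng(0)
X_sine = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y_sine = np.sin(X_sine).ravel() + rng.normal(scale=0.1, size=100)

n_support = {}
for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X_sine, y_sine)
    # support_ holds the indices of the training samples kept as support vectors.
    n_support[eps] = len(model.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors")
```

With a wide tube, most points fall inside it and stop contributing to the solution, which is why the count drops as ε grows.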

## 4.

In [4]:
# Kernels: SVR can use different types of kernels, which are functions that determine the similarity between input vectors. A linear kernel is a 
# simple dot product between two input vectors, while a non-linear kernel is a more complex function that can capture more intricate patterns in the 
# data. The choice of kernel depends on the data’s characteristics and the task’s complexity.
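To make the difference between a linear and a non-linear kernel concrete, the sketch below builds both Gram matrices for three toy points with scikit-learn's pairwise helpers and checks them against the underlying formulas. The points in `A` and the value `gamma=0.5` are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

# Three toy points in 2D (illustrative values only).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Linear kernel: plain dot product between every pair of points.
K_lin = linear_kernel(A)

# RBF kernel: exp(-gamma * squared Euclidean distance), a non-linear similarity.
K_rbf = rbf_kernel(A, gamma=0.5)
sq_dists = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)

print(K_lin)
print(K_rbf)
```

Orthogonal points get a linear-kernel similarity of zero, while the RBF kernel still assigns them a positive similarity that depends only on their distance.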

In [5]:
# C: the 'C' parameter controls the trade-off between keeping the model flat (simple) and penalizing errors that fall outside the ε-insensitive tube. A larger value of 'C' penalizes those errors more heavily, so the model fits the training data more closely, while a smaller value of 'C' means that the model will be more lenient in allowing larger errors.
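For reference, the role of C is easiest to see in the standard ε-insensitive SVR primal problem, written here in the usual textbook notation (the symbols w, b, φ, ξ and ξ* do not appear elsewhere in this notebook and are introduced only for this formula):

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - (w^\top \phi(x_i) + b) \le \varepsilon + \xi_i, \\
& (w^\top \phi(x_i) + b) - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.
\end{aligned}
```

The first term favours a flat function, the second sums the slack of points lying outside the ε-tube, and C sets how strongly that slack is weighted against flatness.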

In [6]:
# gamma: defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'.
# The gamma parameter can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors.
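A quick numeric sketch shows what "far" and "close" mean here. It evaluates the RBF similarity exp(-gamma * ||x - y||^2) between two fixed points for several gamma values; the helper name `rbf_similarity`, the points, and the gamma values are assumptions for illustration.

```python
import numpy as np


def rbf_similarity(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for a single pair of points."""
    return np.exp(-gamma * np.sum((x - y) ** 2))


# Two points at Euclidean distance 2 (illustrative values only).
p = np.array([0.0, 0.0])
q = np.array([2.0, 0.0])

similarities = [rbf_similarity(p, q, g) for g in (0.01, 1.0, 10.0)]
for g, s in zip((0.01, 1.0, 10.0), similarities):
    print(f"gamma={g}: similarity={s:.3g}")
```

With a small gamma the two points still look similar, so each sample influences a wide region; with a large gamma the similarity collapses towards zero and each sample only affects its immediate neighbourhood.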

In [7]:
# Epsilon: the value of ϵ defines a margin of tolerance within which no penalty is given to errors. Remember that the support vectors are the instances
# on or outside this margin; those outside it are the samples being penalized, whose slack variables are non-zero. The larger ϵ is, the larger the errors you admit in your solution.
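The margin of tolerance corresponds to the ε-insensitive loss, max(0, |y - f(x)| - ε). The sketch below evaluates it for three hand-picked predictions; the function name `epsilon_insensitive_loss` and the sample values are assumptions for illustration.

```python
import numpy as np


def epsilon_insensitive_loss(targets, preds, epsilon):
    """Zero inside the epsilon-tube, linear in the excess error outside it."""
    return np.maximum(0.0, np.abs(targets - preds) - epsilon)


# Illustrative values: one prediction inside the tube, two outside it.
targets = np.array([1.0, 1.0, 1.0])
preds = np.array([1.05, 1.3, 0.2])

tube_losses = epsilon_insensitive_loss(targets, preds, epsilon=0.1)
print(tube_losses)
```

Only the second and third predictions are penalized, and only by the amount their error exceeds ε.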

## 5.

In [8]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

In [9]:
cancer=load_breast_cancer()

In [10]:
print(cancer.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

In [11]:
cancer.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [12]:
cancer.target_names

array(['malignant', 'benign'], dtype='<U9')

In [13]:
cancer.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [14]:
df=pd.DataFrame(data=cancer.data,columns=cancer['feature_names'])

In [15]:
df

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.80,1001.0,0.11840,0.27760,0.30010,0.14710,0.2419,0.07871,...,25.380,17.33,184.60,2019.0,0.16220,0.66560,0.7119,0.2654,0.4601,0.11890
1,20.57,17.77,132.90,1326.0,0.08474,0.07864,0.08690,0.07017,0.1812,0.05667,...,24.990,23.41,158.80,1956.0,0.12380,0.18660,0.2416,0.1860,0.2750,0.08902
2,19.69,21.25,130.00,1203.0,0.10960,0.15990,0.19740,0.12790,0.2069,0.05999,...,23.570,25.53,152.50,1709.0,0.14440,0.42450,0.4504,0.2430,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.14250,0.28390,0.24140,0.10520,0.2597,0.09744,...,14.910,26.50,98.87,567.7,0.20980,0.86630,0.6869,0.2575,0.6638,0.17300
4,20.29,14.34,135.10,1297.0,0.10030,0.13280,0.19800,0.10430,0.1809,0.05883,...,22.540,16.67,152.20,1575.0,0.13740,0.20500,0.4000,0.1625,0.2364,0.07678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,21.56,22.39,142.00,1479.0,0.11100,0.11590,0.24390,0.13890,0.1726,0.05623,...,25.450,26.40,166.10,2027.0,0.14100,0.21130,0.4107,0.2216,0.2060,0.07115
565,20.13,28.25,131.20,1261.0,0.09780,0.10340,0.14400,0.09791,0.1752,0.05533,...,23.690,38.25,155.00,1731.0,0.11660,0.19220,0.3215,0.1628,0.2572,0.06637
566,16.60,28.08,108.30,858.1,0.08455,0.10230,0.09251,0.05302,0.1590,0.05648,...,18.980,34.12,126.70,1124.0,0.11390,0.30940,0.3403,0.1418,0.2218,0.07820
567,20.60,29.33,140.10,1265.0,0.11780,0.27700,0.35140,0.15200,0.2397,0.07016,...,25.740,39.42,184.60,1821.0,0.16500,0.86810,0.9387,0.2650,0.4087,0.12400


In [16]:
X=cancer.data[:,:2]
y=cancer.target

In [17]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30,random_state=10)

In [18]:
from sklearn.svm import SVC
classifier=SVC()
classifier.fit(X_train,y_train)


In [19]:
y_pred=classifier.predict(X_test)

In [27]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
score=accuracy_score(y_test,y_pred)
print(score)

matrix=confusion_matrix(y_test,y_pred)
print(matrix)

print(classification_report(y_test,y_pred))

0.9181286549707602
[[ 51   8]
 [  6 106]]
              precision    recall  f1-score   support

           0       0.89      0.86      0.88        59
           1       0.93      0.95      0.94       112

    accuracy                           0.92       171
   macro avg       0.91      0.91      0.91       171
weighted avg       0.92      0.92      0.92       171



In [33]:
# Accuracy = correct predictions (diagonal) / all predictions
accuracy=(matrix[0][0]+matrix[1][1])/(matrix[0][0]+matrix[0][1]+matrix[1][0]+matrix[1][1])
print(accuracy)

0.9181286549707602


In [37]:
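# Precision for class 0 = correct class-0 predictions / all class-0 predictions (column 0)
precision=matrix[0][0]/(matrix[0][0]+matrix[1][0])
print(f"{precision:.4f}")

0.8947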
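The accuracy, precision and recall in the classification report can all be derived from the confusion matrix using sums, not products. The sketch below hard-codes the matrix printed earlier so that it runs on its own; the variable names are assumptions, and scikit-learn's convention of rows for true classes and columns for predicted classes is used.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predictions.
cm = np.array([[51, 8],
               [6, 106]])

# Accuracy: correct predictions (diagonal) over all predictions.
accuracy_cm = np.trace(cm) / cm.sum()

# Precision per class: diagonal over column sums (everything predicted as that class).
precision_per_class = np.diag(cm) / cm.sum(axis=0)

# Recall per class: diagonal over row sums (everything that truly is that class).
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(round(accuracy_cm, 2))
print(np.round(precision_per_class, 2))
print(np.round(recall_per_class, 2))
```

The rounded values agree with the precision and recall columns of the classification report for classes 0 and 1.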

In [52]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

# Search over kernel and C with cross-validation on the training set
parameter = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
reg=GridSearchCV(classifier,param_grid=parameter)
reg.fit(X_train,y_train)
y_predict=reg.predict(X_test)

# Score the tuned model's predictions (y_predict), not the earlier y_pred
score=accuracy_score(y_test,y_predict)
print(score)

matrix=confusion_matrix(y_test,y_predict)
print(matrix)

print(classification_report(y_test,y_predict))



In [50]:
y_pred

array([1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0,
       1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0,
       1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
       1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1,
       0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1])

In [53]:
reg.best_params_

{'C': 10, 'kernel': 'rbf'}

In [54]:
reg.best_estimator_

In [55]:
reg.best_score_

0.901867088607595
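`best_score_` is the mean cross-validated accuracy of the best parameter combination on the training data, so it is not directly comparable to the test-set accuracy. The sketch below reruns the same search in a self-contained form and lists the mean score of every candidate via `cv_results_`; the names `search`, `X_two`, `X_tr` and so on are assumptions, chosen to avoid clashing with the notebook's variables.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same setup as the notebook: first two features, 30% test split, random_state=10.
data = load_breast_cancer()
X_two, y_all = data.data[:, :2], data.target
X_tr, X_te, y_tr, y_te = train_test_split(X_two, y_all, test_size=0.30, random_state=10)

search = GridSearchCV(SVC(), param_grid={"kernel": ("linear", "rbf"), "C": [1, 10]})
search.fit(X_tr, y_tr)

# One mean cross-validation score per parameter combination.
mean_scores = search.cv_results_["mean_test_score"]
for params, mean_score in zip(search.cv_results_["params"], mean_scores):
    print(params, round(float(mean_score), 4))

# Accuracy of the refitted best estimator on the held-out test set.
tuned_test_accuracy = search.score(X_te, y_te)
print(tuned_test_accuracy)
```

Comparing `tuned_test_accuracy` with the untuned model's test accuracy is what shows whether the search actually helped.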