In [None]:
1:
  Polynomial functions and kernel functions are closely related in machine learning: the polynomial
kernel is one specific kernel function, commonly used in algorithms such as support vector machines.
A kernel function computes the inner product of two inputs in a higher-dimensional feature space, where the
data may become more separable, without constructing that feature space explicitly. The polynomial kernel does
this by taking the dot product of two vectors in the original input space, adding a constant, and raising the
result to a power d. In Scikit-learn's notation, K(x, z) = (gamma * <x, z> + coef0)^d. This value equals the dot
product of the two vectors after each has been expanded into suitably scaled polynomial features of degree up to d.
In summary, an explicit polynomial feature expansion and the polynomial kernel have different mathematical
formulations, but they serve the same purpose in machine learning algorithms - to work with the input data in a
higher-dimensional feature space where it may be more easily separated or classified.
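
The equivalence described above can be checked numerically. The following is a minimal sketch, assuming a
2-D input, degree 2, gamma = 1 and coef0 = 1. The helper names poly_kernel and phi are hypothetical and exist
only for this illustration.

```python
import numpy as np

def poly_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel computed directly in the input space."""
    return (gamma * np.dot(x, z) + coef0) ** degree

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (assumes gamma=1, coef0=1)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel evaluated in the 2-D input space
kernel_value = poly_kernel(x, z)

# Ordinary dot product evaluated in the 6-D feature space
explicit_value = np.dot(phi(x), phi(z))

print(kernel_value, explicit_value)
```

Both values agree up to floating-point rounding, although the kernel never builds the 6-D feature vectors.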

In [None]:
2:
   The following code implements an SVM with a polynomial kernel in Python using Scikit-learn, trained and
evaluated on the Iris dataset.

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train an SVM with a degree-3 polynomial kernel
svm = SVC(kernel='poly', degree=3)
svm.fit(X_train, y_train)

# Evaluate accuracy on the test set
y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)



Accuracy: 0.9777777777777777


In [None]:
3:
  In Support Vector Regression (SVR), epsilon is a hyperparameter that controls the width of
the epsilon-insensitive zone (the "tube") around the predicted values. Training points whose residuals
are smaller than epsilon lie inside the tube, incur no loss, and do not become support vectors.
Only points on or outside the tube boundary are support vectors.

As the value of epsilon is increased, the number of support vectors generally decreases.
This is because a wider tube contains more of the training points, so fewer points are left on or
outside its boundary to define the regression function. The result is a sparser model that
tolerates larger errors. Conversely, a very small epsilon turns almost every training point into a
support vector.

However, it should be noted that the relationship between epsilon and the number of support
vectors is not linear and depends on the noise level of the dataset and on other hyperparameters
used in the SVR algorithm, such as C and the kernel.
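
The effect of epsilon on the number of support vectors can be observed directly. The following is a minimal
sketch on synthetic data (a noisy sine curve, which is an assumption made for illustration and not part of
the assignment data). The names X_demo, y_demo and sv_counts are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
X_demo = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=100)

# Fit an RBF SVR for several tube widths and count the support vectors
sv_counts = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X_demo, y_demo)
    sv_counts[eps] = len(model.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors")
```

The exact counts depend on the data and the other hyperparameters, so they are not quoted here.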

In [None]:
4:
  The choice of kernel function, C parameter, epsilon parameter, and gamma parameter can significantly
affect the performance of Support Vector Regression (SVR).

Kernel Function: The kernel function determines the form of the regression function that the SVR can learn.
Some common kernel functions include linear, polynomial, and radial basis function (RBF). The choice of
kernel function depends on the complexity and non-linearity of the data. For example, if the relationship between
the features and the target is highly non-linear, a non-linear kernel such as RBF will usually perform better than
a linear kernel.

C Parameter: The C parameter controls the trade-off between keeping the regression function flat (simple) and
penalizing training errors that fall outside the epsilon tube. A smaller value of C applies stronger regularization
and gives a flatter function with more errors on the training set, while a larger value of C penalizes those errors
more heavily and fits the training set more closely. Increasing the value of C will make the model more complex and
may lead to overfitting. Decreasing the value of C will make the model simpler and may lead to underfitting.

Epsilon Parameter: The epsilon parameter controls the width of the insensitive tube around the regression function.
It determines the threshold below which errors are ignored: residuals smaller than epsilon incur no loss. A larger
value of epsilon will result in a wider tube, more errors being ignored, and fewer support vectors, while a smaller
value of epsilon will result in a narrower tube and more support vectors. Increasing the value of epsilon will make
the model more tolerant to errors but may result in lower accuracy. Decreasing the value of epsilon will make the
model fit the training data more closely, which may improve accuracy but also risks fitting noise.

Gamma Parameter: The gamma parameter applies to the RBF, polynomial, and sigmoid kernels and has no effect on the
linear kernel. For the RBF kernel it controls how far the influence of each training example reaches. A smaller value
of gamma gives each example a wide reach and results in a smoother regression function, while a larger value of gamma
makes the influence more local and results in a more complex function that tries to fit the training data more closely.
Increasing the value of gamma will make the model more complex and may lead to overfitting. Decreasing the value of gamma
will make the model simpler and may lead to underfitting.

In summary, the performance of SVR is highly dependent on the choice of kernel function, C parameter, epsilon parameter,
and gamma parameter. Choosing the right combination of these parameters can result in a highly accurate model, while choosing 
the wrong combination can result in poor performance or overfitting. Therefore, it is important to carefully tune these parameters
based on the characteristics of the data and the desired performance of the model.
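
One way to tune these four parameters together is a cross-validated grid search. The following is a minimal
sketch on synthetic data. The data, the grid values and the names X_demo, y_demo, svr_param_grid and svr_search
are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic non-linear regression problem: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(120, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(scale=0.1, size=120)

# Candidate values for the kernel, C, epsilon and gamma
# (gamma is ignored by the linear kernel)
svr_param_grid = {
    'kernel': ['linear', 'rbf'],
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 0.5],
    'gamma': ['scale', 0.1, 1.0],
}

# 5-fold cross-validation, scored by (negated) mean squared error
svr_search = GridSearchCV(SVR(), svr_param_grid, cv=5,
                          scoring='neg_mean_squared_error')
svr_search.fit(X_demo, y_demo)

print("Best parameters:", svr_search.best_params_)
print("Best cross-validated MSE:", -svr_search.best_score_)
```

Because the target here is a sine curve, a non-linear kernel is expected to win the search. On real data the grid
should be chosen from the scale of the features and the noise in the target.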

In [None]:
5:
  

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("diabetes.csv")


In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv("diabetes.csv")

# Split the dataset into features and labels
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv('diabetes.csv')

# Split the dataset into features and labels
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features using the StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)


In [18]:
from sklearn.svm import SVC

# Create an instance of the SVC classifier
svm_clf = SVC(kernel='linear')

# Train the classifier on the training data
svm_clf.fit(X_train, y_train)


In [20]:
y_pred = svm_clf.predict(X_test)

In [23]:
from sklearn.metrics import precision_score, recall_score , accuracy_score

# Calculate the precision of the classifier on the testing data
precision = precision_score(y_test, y_pred)

# Calculate the recall of the classifier on the testing data
recall = recall_score(y_test, y_pred)

# Calculate the accuracy of the classifier on the testing data
accuracy = accuracy_score(y_test, y_pred)

print("Precision:", precision)
print("Recall:", recall)
print("Accuracy:", accuracy)

Precision: 0.6666666666666666
Recall: 0.6545454545454545
Accuracy: 0.7597402597402597


In [24]:
from sklearn.model_selection import GridSearchCV

# Define the hyperparameters to search over
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto']
}

# Create an instance of the SVC classifier
svm_clf = SVC()

# Create an instance of GridSearchCV to search over the hyperparameters
grid_search = GridSearchCV(svm_clf, param_grid, cv=5)

# Train the classifier using GridSearchCV
grid_search.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding mean cross-validated accuracy
print("Best hyperparameters:", grid_search.best_params_)
print("Best accuracy:", grid_search.best_score_)


Best hyperparameters: {'C': 1, 'gamma': 'scale', 'kernel': 'rbf'}
Best accuracy: 0.7687458349993335


In [25]:
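from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Create the classifier with the tuned hyperparameters. The hyperparameters
# were tuned on standardized features, so the scaler is bundled with the
# classifier in a pipeline and raw features can be passed directly.
svm_clf = make_pipeline(StandardScaler(), SVC(C=1, kernel='rbf', gamma='scale'))

# Train the pipeline on the entire (unscaled) dataset
svm_clf.fit(X, y)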


In [None]:
import pickle

# Save the trained classifier to a file
with open('diabetes_svm.pkl', 'wb') as f:
    pickle.dump(svm_clf, f)
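
Loading a pickled classifier back works the same way in reverse. The following is a minimal round-trip sketch
that uses a small synthetic model and a temporary file, so that it runs without diabetes.csv. The names X_demo,
y_demo, clf, loaded_clf and demo_svm.pkl are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Small synthetic binary classification problem (stand-in for the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
clf = SVC(C=1, kernel='rbf', gamma='scale').fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'demo_svm.pkl')

    # Save the trained classifier to a file
    with open(model_path, 'wb') as f:
        pickle.dump(clf, f)

    # Load the classifier back from the file
    with open(model_path, 'rb') as f:
        loaded_clf = pickle.load(f)

# The reloaded model should reproduce the original predictions
same_predictions = np.array_equal(clf.predict(X_demo), loaded_clf.predict(X_demo))
print("Predictions match after reload:", same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same Scikit-learn version that
was used to save them.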
