## SVM 

Utilize the digits dataset from the sklearn library (import it using from sklearn.datasets import load_digits) to train a Support Vector Machine (SVM) classifier. Then, complete the following tasks:

1. Evaluate your model's accuracy using different kernel functions, specifically the RBF (Radial Basis Function) and linear kernels. +
2. Further, optimize your model by experimenting with the regularization and gamma parameters. Aim to achieve the highest possible accuracy score. +
3. Use 80% of the samples from the dataset for training your model +

In [1]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

In [4]:
# SVM tasks implementation
# SVM -> Support Vector Machine
'''
    Decision function: w * x + b, where w - weights, x - features, b - bias (intercept)
    A hyperplane separates the two classes with the widest possible margin between them
'''

digit_data = load_digits()

# Preprocessing

X = digit_data.data
y = digit_data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)


'''
    Bring all features to a common scale; SVMs are sensitive to feature scaling
'''

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Different kernel functions

svm_rbf = SVC(kernel='rbf') # nonlinear tasks
svm_linear = SVC(kernel='linear') # linear tasks
svm_rbf.fit(X_train_scaled, y_train)
svm_linear.fit(X_train_scaled, y_train)

# Prediction

y_pred_rbf = svm_rbf.predict(X_test_scaled)
y_pred_linear = svm_linear.predict(X_test_scaled)

# Accuracy score for each kernel

accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# The RBF kernel gives higher accuracy than the linear kernel on this split

print(f'RBF kernel: {accuracy_rbf * 100:.2f}%')
print(f'Linear kernel: {accuracy_linear * 100:.2f}%')

RBF kernel: 98.06%
Linear kernel: 95.83%
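
The two accuracies above come from a single train/test split, so the gap between the kernels depends partly on which samples landed in the test set. A minimal sketch of a cross-validated comparison follows. It assumes 5 folds and wraps the scaler and the classifier in a pipeline (neither choice is part of the original cell), so that scaling is refit inside each fold.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

cv_scores = {}
for kernel in ("rbf", "linear"):
    # The scaler is fitted only on the training part of each fold
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)  # accuracy by default for classifiers
    cv_scores[kernel] = scores
    print(f"{kernel}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

The per-fold spread (`std`) gives a rough sense of whether the difference between the kernels is larger than the split-to-split noise.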


In [5]:
# Optimizing parameters
from sklearn.model_selection import GridSearchCV

svm = SVC()

param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100], # regularization parameter
    'gamma': [1, 0.1, 0.01, 0.001], # RBF kernel coefficient (ignored by the linear kernel)
    'kernel': ['rbf', 'linear'] # kernel type
}

grid_search = GridSearchCV(svm, param_grid, refit=True, verbose=2, cv=5)
# refit=True ~> once the best parameters are found, the model is refitted with them on the whole training set
# verbose=2 ~> prints information about each cross-validation step, including the parameters being tested
# cv=5 ~> 5 folds: one part is used for validation and four for training, repeated 5 times so each part is held out once

grid_search.fit(X_train_scaled, y_train)

best_params = grid_search.best_params_

y_pred_optimized = grid_search.predict(X_test_scaled)

accuracy_optimized = accuracy_score(y_test, y_pred_optimized)

best_params, accuracy_optimized


Fitting 5 folds for each of 40 candidates, totalling 200 fits
[CV] END .......................C=0.001, gamma=1, kernel=rbf; total time=   0.1s
[CV] END .......................C=0.001, gamma=1, kernel=rbf; total time=   0.1s
[CV] END .......................C=0.001, gamma=1, kernel=rbf; total time=   0.1s
[CV] END .......................C=0.001, gamma=1, kernel=rbf; total time=   0.1s
[CV] END .......................C=0.001, gamma=1, kernel=rbf; total time=   0.1s
[CV] END ....................C=0.001, gamma=1, kernel=linear; total time=   0.0s
[CV] END ....................C=0.001, gamma=1, kernel=linear; total time=   0.0s
[CV] END ....................C=0.001, gamma=1, kernel=linear; total time=   0.0s
[CV] END ....................C=0.001, gamma=1, kernel=linear; total time=   0.0s
[CV] END ....................C=0.001, gamma=1, kernel=linear; total time=   0.0s
[CV] END .....................C=0.001, gamma=0.1, kernel=rbf; total time=   0.1s
[CV] END .....................C=0.001, gamma=0.

({'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}, 0.9777777777777777)
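
Because `gamma` has no effect on the linear kernel, a single combined grid refits identical linear models once per `gamma` value. A possible variant, sketched here under assumptions, passes `GridSearchCV` a list of grids so that `gamma` is only searched for the RBF kernel. The specific `C` and `gamma` values and the pipeline step names `scaler` and `svc` are illustrative choices, not taken from the cell above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13
)

# Scaling lives inside the pipeline, so it is refit on each CV training fold
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# One grid per kernel: gamma appears only where it is actually used
param_grid = [
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.1, 0.01, 0.001]},
    {"svc__kernel": ["linear"], "svc__C": [0.01, 0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"mean CV accuracy of best params: {search.best_score_:.4f}")
print(f"held-out test accuracy: {test_accuracy:.4f}")
```

Comparing `best_score_` (cross-validated on the training set) with the held-out test accuracy helps separate how good the selected parameters are from how lucky a particular test split was.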

## Naive Bayes

1. Use the wine dataset available in sklearn.datasets. +
2. Develop a classification model to categorize wines into three distinct classes.
3. Begin by importing the dataset and dividing it into separate training and testing subsets. +
4. Next, train two different Naive Bayes models: one using the Gaussian algorithm and another using the Multinomial algorithm. +
5. Compare the performance of these two models to determine which one yields better results. +
6. Finally, utilize the superior model to generate predictions for the test data you set aside earlier.

In [6]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.metrics import accuracy_score

In [20]:
wine_data = load_wine()

X = wine_data.data
y = wine_data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

'''
    Bayes Theorem:
        P(C|X) = P(C) * P(X|C) / P(X)

'''

gnb = GaussianNB() # Continuous data ~> real-valued features (any interval contains infinitely many possible values)
mnb = MultinomialNB() # Discrete data ~> non-negative counts (usually integers, e.g. word counts), with no values in between

gnb.fit(X_train, y_train)
mnb.fit(X_train, y_train)

y_pred_gnb = gnb.predict(X_test)
y_pred_mnb = mnb.predict(X_test)

accuracy_gnb = accuracy_score(y_test, y_pred_gnb)
accuracy_mnb = accuracy_score(y_test, y_pred_mnb)

print(f"Gaussian Naive Bayes: {accuracy_gnb * 100:.2f}%")
print(f"Multinomial Naive Bayes: {accuracy_mnb * 100:.2f}%")

Gaussian Naive Bayes: 100.00%
Multinomial Naive Bayes: 83.33%
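
With only 178 samples in the wine dataset, a 20% test set holds 36 wines, so a single split can make either model look better or worse than it is. The sketch below compares the same two models with stratified 5-fold cross-validation. The fold count, shuffling, and `random_state=13` are assumptions chosen to mirror the split above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB

X, y = load_wine(return_X_y=True)

# Stratified folds keep the three class proportions similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=13)

nb_scores = {}
for name, model in (("GaussianNB", GaussianNB()), ("MultinomialNB", MultinomialNB())):
    scores = cross_val_score(model, X, y, cv=cv)
    nb_scores[name] = scores
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

`MultinomialNB` runs here only because every wine feature is non-negative; it would raise an error on standardized data, which contains negative values.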


In [22]:
X

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]])
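
The columns in the array above differ by orders of magnitude (about 1 in some columns, about 1000 in the last). A small sketch for viewing the range of each feature by name, using the `feature_names` attribute of the object returned by `load_wine`:

```python
from sklearn.datasets import load_wine

wine = load_wine()

# Per-feature minimum and maximum across all samples
mins = wine.data.min(axis=0)
maxs = wine.data.max(axis=0)
spans = maxs - mins

for name, lo, hi in zip(wine.feature_names, mins, maxs):
    print(f"{name:<30} min={lo:9.2f}  max={hi:9.2f}")
```

`GaussianNB` estimates a separate mean and variance for each feature within each class, so these scale differences do not need to be removed before fitting it.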