## Task 6. Neural network (digits)

Using the small built-in dataset loaded with

`X, y = datasets.load_digits(return_X_y=True)`

demonstrate:

1. training and inference for multiclass classification,
2. binary classification into even and odd digits,
3. binary classification into '0' versus all other digits (give examples where the classifier makes mistakes),
4. *illustrate the tips on using a multilayer perceptron (https://scikit-learn.org/stable/modules/neural_networks_supervised.html#tips-on-practical-use).

In [1]:
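import warnings

import numpy as np

SEED = 42  # fixed seed so that splits and weight initialization are reproducible

def warn(*args, **kwargs):
    """No-op replacement for warnings.warn."""
    pass

# Silence every warning raised through warnings.warn for the rest of the session
warnings.warn = warn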

# Loading the dataset

In [2]:
from sklearn import datasets
X_orig, y_orig = datasets.load_digits(return_X_y=True)

In [3]:
from sklearn.metrics import f1_score, accuracy_score

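def nn_clf_score(y_true, y_pred):
    """Print the overall accuracy and the per-class F1 scores."""
    print(f"Accuracy: {accuracy_score(y_true, y_pred)}")
    # average=None returns one F1 value per class instead of a single aggregate
    print(f"F1 score: {f1_score(y_true, y_pred, average=None)}")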
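
To make the output of `nn_clf_score` easier to read, here is a small sketch of what `average=None` returns compared with a single macro-averaged value. The labels are made up for illustration and are not taken from the digits dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical toy labels: one sample of class 1 is predicted as class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# One F1 value per class, in the order of the sorted class labels
per_class = f1_score(y_true, y_pred, average=None)
# Unweighted mean of the per-class values
macro = f1_score(y_true, y_pred, average="macro")

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(f"Per-class F1: {np.round(per_class, 3)}")
print(f"Macro F1: {macro:.3f}")
```

Class 1 loses recall and class 2 loses precision, so the per-class vector shows where the error sits, while accuracy alone only shows that one error occurred.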

# Multiclass classification

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_orig.copy(), y_orig.copy(), test_size=0.3, random_state=SEED)

In [5]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
nn_clf_score(y_test, y_pred)


Accuracy: 0.9666666666666667
F1 score: [0.98113208 0.96969697 0.96907216 0.96153846 1.         0.94573643
 0.98113208 0.98181818 0.90909091 0.95867769]
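
The per-class F1 scores show which digits are harder, but not which digits they are confused with. A confusion matrix is one possible way to see that. The sketch below repeats the split and model settings used above; the listing of confused pairs is an addition for illustration and was not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 42

X_orig, y_orig = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_orig, y_orig, test_size=0.3, random_state=SEED
)

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)
print(cm)

# Keep only the off-diagonal cells, which are the errors
errors = cm.copy()
np.fill_diagonal(errors, 0)
for true_digit, pred_digit in zip(*np.nonzero(errors)):
    count = errors[true_digit, pred_digit]
    print(f"true {true_digit} predicted as {pred_digit}: {count} time(s)")
```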


# Binary classification (parity)

In [6]:
# 1 = even digit, 0 = odd digit
y = (y_orig % 2 == 0).astype(int)
y

array([1, 0, 1, ..., 1, 0, 1])

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_orig.copy(), y, test_size=0.3, random_state=SEED)

In [8]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
nn_clf_score(y_test, y_pred)

Accuracy: 0.9814814814814815
F1 score: [0.98214286 0.98076923]


# Binary classification (zero vs. the rest)

In [9]:
# 1 = the digit is 0, 0 = any other digit
y = (y_orig == 0).astype(int)
y

array([1, 0, 0, ..., 0, 0, 0])

In [10]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_orig.copy(), y, test_size=0.3, random_state=SEED)

In [11]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
nn_clf_score(y_test, y_pred)

Accuracy: 0.9981481481481481
F1 score: [0.99897436 0.99047619]
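
The task also asks for examples where this classifier is wrong. An accuracy of 0.998 on 540 test samples corresponds to a single error, provided the same split is reproduced. A minimal sketch for finding such samples follows. Passing `y_orig` to `train_test_split` as a third array, and the names `y_zero` and `digit_test`, are additions made here so that the true digit of each error can be reported.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 42

X_orig, y_orig = datasets.load_digits(return_X_y=True)
y_zero = (y_orig == 0).astype(int)  # 1 = the digit is 0, 0 = any other digit

# Split the original labels together with the binary ones,
# so each test sample keeps its true digit
X_train, X_test, y_train, y_test, _, digit_test = train_test_split(
    X_orig, y_zero, y_orig, test_size=0.3, random_state=SEED
)

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Positions in the test set where the prediction disagrees with the label
wrong = np.flatnonzero(y_pred != y_test)
print(f"Misclassified test samples: {len(wrong)} of {len(y_test)}")
for i in wrong:
    print(f"test index {i}: true digit {digit_test[i]}, predicted class {y_pred[i]}")
    # Each sample is a flattened 8x8 image; print it as a grid of intensities
    print(X_test[i].reshape(8, 8).astype(int))
```

The printed grids can also be passed to `matplotlib.pyplot.imshow` to view the misclassified images.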


# Trying out the tips

1. Multi-layer Perceptron is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0, 1] or [-1, +1], or standardize it to have mean 0 and variance 1. Note that you must apply the same scaling to the test set for meaningful results. You can use StandardScaler for standardization.

2. Finding a reasonable regularization parameter $\alpha$ is best done using GridSearchCV, usually in the range 10.0 ** -np.arange(1, 7).

3. Empirically, we observed that L-BFGS converges faster and with better solutions on small datasets. For relatively large datasets, however, Adam is very robust. It usually converges quickly and gives pretty good performance. SGD with momentum or Nesterov's momentum, on the other hand, can perform better than those two algorithms if learning rate is correctly tuned.

## 1. Rescaling pixel intensity from [0, 16] to [0, 1]

In [12]:
X_orig.min(), X_orig.max()

(0.0, 16.0)

In [13]:
X = X_orig.astype(float) / 16
X.min(), X.max()

(0.0, 1.0)

In [14]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y_orig.copy(), test_size=0.3, random_state=SEED)

In [15]:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(random_state=SEED, max_iter=500)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
nn_clf_score(y_test, y_pred)

Accuracy: 0.9796296296296296
F1 score: [1.         0.98       1.         0.98113208 1.         0.95454545
 0.98113208 0.98181818 0.94382022 0.97435897]


The improvement is clear: accuracy rose from 0.967 without scaling to 0.980.
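
The first tip also mentions `StandardScaler` and stresses that the test set must be scaled in the same way as the training set. Dividing by 16 is a fixed constant, so it is safe to apply before the split. With standardization, the mean and variance should be estimated on the training data only. One way to ensure this, sketched here as an assumption rather than as part of the original notebook, is to wrap the scaler and the classifier in a pipeline.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42

X_orig, y_orig = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_orig, y_orig, test_size=0.3, random_state=SEED
)

# fit() learns the scaling statistics from X_train only;
# predict() reuses those statistics on X_test
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=SEED, max_iter=500),
)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy with StandardScaler: {acc}")
```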

## 3. Choosing the solver

In [16]:
for solver in ['lbfgs', 'sgd', 'adam']:
    clf = MLPClassifier(solver=solver, random_state=SEED, max_iter=500)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(f"### Solver: {solver}")
    nn_clf_score(y_test, y_pred)

### Solver: lbfgs
Accuracy: 0.9611111111111111
F1 score: [0.97142857 0.95049505 0.95833333 0.95412844 0.99159664 0.94656489
 0.98113208 0.97247706 0.90909091 0.96551724]
### Solver: sgd
Accuracy: 0.9518518518518518
F1 score: [0.98113208 0.89795918 1.         0.96226415 0.95081967 0.94656489
 0.98113208 0.98181818 0.8988764  0.91525424]
### Solver: adam
Accuracy: 0.9796296296296296
F1 score: [1.         0.98       1.         0.98113208 1.         0.95454545
 0.98113208 0.98181818 0.94382022 0.97435897]
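
The third tip says that SGD can do better once its learning rate is tuned, while the loop above ran it with the default `learning_rate_init`. The sketch below tries a few values. The three learning rates are illustrative choices, not a tuned grid, and comparing on the test set is a shortcut; cross-validation would be more appropriate for a real selection.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 42

X_orig, y_orig = datasets.load_digits(return_X_y=True)
# Same [0, 1] rescaling and split as in the solver comparison
X_train, X_test, y_train, y_test = train_test_split(
    X_orig / 16, y_orig, test_size=0.3, random_state=SEED
)

results = {}
for lr in [0.001, 0.01, 0.1]:  # illustrative values only
    clf = MLPClassifier(
        solver="sgd",
        learning_rate_init=lr,
        momentum=0.9,             # library default, written out for clarity
        nesterovs_momentum=True,  # library default, written out for clarity
        random_state=SEED,
        max_iter=500,
    )
    clf.fit(X_train, y_train)
    results[lr] = accuracy_score(y_test, clf.predict(X_test))
    print(f"learning_rate_init={lr}: accuracy {results[lr]:.4f}")
```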


## 2. Tuning the regularization parameter $\alpha$

In [17]:
from sklearn.model_selection import GridSearchCV

params = {
    'solver': ['lbfgs'],
    # 1e-10 ... 1e6, a wider range than the one suggested in the tips
    'alpha': np.power(10.0, np.arange(-10.0, 7.0)),
    'random_state': [SEED],
    'max_iter': [500]
    }
mlp = MLPClassifier()
clf = GridSearchCV(mlp, params)
clf.fit(X_train, y_train)

print(f"Best params: {clf.best_params_}")
y_pred = clf.predict(X_test)
nn_clf_score(y_test, y_pred)

Best params: {'alpha': 0.1, 'max_iter': 500, 'random_state': 42, 'solver': 'lbfgs'}
Accuracy: 0.9814814814814815
F1 score: [1.         0.99009901 1.         0.98113208 1.         0.95454545
 0.98113208 0.99082569 0.95454545 0.96610169]
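
Only the best parameters are printed above. To see how sensitive the model is to $\alpha$, the cross-validation score of each candidate can be read from `cv_results_`. The sketch below uses the narrower range quoted in the tips, `10.0 ** -np.arange(1, 7)`, and `cv=3` to keep the run short. Both are choices made here and differ from the grid used in the cell above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 42

X_orig, y_orig = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_orig / 16, y_orig, test_size=0.3, random_state=SEED
)

# 1e-1 ... 1e-6, the range suggested in the tips
alphas = 10.0 ** -np.arange(1, 7)

search = GridSearchCV(
    MLPClassifier(solver="lbfgs", random_state=SEED, max_iter=500),
    {"alpha": alphas},
    cv=3,
)
search.fit(X_train, y_train)

# Mean cross-validated accuracy for every candidate value of alpha
for params, score in zip(
    search.cv_results_["params"], search.cv_results_["mean_test_score"]
):
    print(f"alpha={params['alpha']:g}: mean CV accuracy {score:.4f}")

print(f"Best alpha: {search.best_params_['alpha']:g}")
print(f"Test accuracy: {search.score(X_test, y_test):.4f}")
```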
