The paper from which the neural network optimization method is borrowed:
https://arxiv.org/pdf/1704.04289.pdf

In [5]:
import tensorflow as tf
import numpy as np
from ipywidgets import IntProgress
from tqdm import tqdm_notebook
import matplotlib.pyplot as plt
%matplotlib inline

The SimpleTwoLayerNN class defines the architecture of a fully connected neural network with one hidden layer. It takes the input, hidden-layer, and output dimensions as arguments and supports both classification and regression tasks.

All of the model's functionality is implemented in the NNFunctional class.

NNFunctional is initialized with a neural network and the parameters used to train and evaluate it. It implements the following methods:

* fit - optimizes the network parameters under the given constraints
* prune(p) - zeroes out the p*n_params weights with the smallest absolute value
* disable_optimization(p) - disables optimization for the p*n_params weights whose entries in the preconditioning matrix have the smallest absolute value
* reset_all_params - resets the network to its initial state so that it can be retrained from scratch
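To make the prune(p) description concrete, here is a minimal NumPy sketch of magnitude-based pruning. The function name `magnitude_prune` and the rounding rule `int(p * n)` are assumptions made for illustration; the actual implementation inside NNFunctional is not shown in this notebook and may differ.

```python
import numpy as np

def magnitude_prune(weights, p):
    """Return a copy of `weights` with the fraction `p` of smallest-|w| entries set to zero.

    Hypothetical helper for illustration; not the NNFunctional.prune implementation.
    """
    flat = weights.ravel().copy()
    n_prune = int(p * flat.size)  # number of weights to zero out (assumed rounding rule)
    if n_prune > 0:
        # Indices of the n_prune entries with the smallest absolute value
        idx = np.argsort(np.abs(flat))[:n_prune]
        flat[idx] = 0.0
    return flat.reshape(weights.shape)

w = np.array([[0.5, -0.1], [2.0, -0.03]])
pruned = magnitude_prune(w, 0.5)  # zeroes the two smallest-magnitude weights
```

With p = 0.5 on this 2x2 example, the entries -0.1 and -0.03 are zeroed while 0.5 and 2.0 are kept.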

We run a computational experiment on the digits dataset (handwritten digit classification, 64 features, 10 classes).

We use cross-entropy as the loss function and accuracy as the quality metric.
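For reference, the two quantities can be written out in a few lines of NumPy. This is only a sketch of the standard definitions, assuming 1-D integer labels and unnormalized logits; the notebook itself uses the TensorFlow versions, and the function names below are chosen for illustration.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels and unnormalized logits."""
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every sample
    return -log_probs[np.arange(len(labels)), labels].mean()

def accuracy(logits, labels):
    """Fraction of samples whose arg-max class equals the label."""
    return float((logits.argmax(axis=1) == labels).mean())

logits = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
labels = np.array([0, 2, 1])
loss = sparse_softmax_cross_entropy(logits, labels)
acc = accuracy(logits, labels)
```

For uniform logits over K classes the loss equals log(K), which is a handy sanity check for an untrained network (log 10 ≈ 2.30 for the digits task).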

In [6]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=42, test_size=0.3)
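# Reshape labels into column vectors of shape (n_samples, 1)
y_train = y_train[:, np.newaxis]
y_test = y_test[:, np.newaxis]
# Standardize features using training-set statistics only
X_mean = X_train.mean(axis=0)
X_std = X_train.std(axis=0)
eps = 1e-8  # guards against division by zero for constant features
X_train = (X_train - X_mean) / (X_std + eps)
X_test = (X_test - X_mean) / (X_std + eps)
# Drop features that are constant on the training set
X_train = X_train[:, X_std > 0]
X_test = X_test[:, X_std > 0]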



In [7]:
from architectures import SimpleTwoLayerNN
from functional import NNFunctional

In [8]:
nn = SimpleTwoLayerNN(X_train.shape[1], 32, 10, mode='classification')
func = NNFunctional(model=nn, 
                    loss=tf.losses.sparse_softmax_cross_entropy, 
                    metric=lambda x, y:tf.metrics.accuracy(x, y)[1],
                    learning_rate=0.01,
                    k_coef=1,
                    batch_size=32)

### Prune

For each value of p, we run 200 steps of full optimization, then zero out p*N weights and run another 3000 steps optimizing the remaining weights. Each setting is repeated 5 times. We then plot model quality as a function of p.
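The plotting code is not included in this notebook. As a hedged sketch, the repeated runs could be reduced to a mean and a standard deviation per value of p before plotting. The helper `summarize_runs` is hypothetical, and the small array below is a made-up placeholder, not experimental data.

```python
import numpy as np

def summarize_runs(accs):
    """Reduce an array of shape (n_p_values, n_runs) to per-p mean and standard deviation."""
    accs = np.asarray(accs, dtype=float)
    return accs.mean(axis=1), accs.std(axis=1)

# Placeholder values only: two settings of p, two runs each
example_accs = [[0.9, 1.0],
                [0.5, 0.7]]
mean_acc, std_acc = summarize_runs(example_accs)
```

The two returned arrays can then be passed to `plt.errorbar(p_values, mean_acc, yerr=std_acc)` to show accuracy against p with error bars.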

In [7]:
prune_params = np.linspace(0, 1, 51)[:-1]
train_accs = []
val_accs = []
for p in tqdm_notebook(prune_params):
    train_acc = []
    val_acc = []
    for _ in range(5):
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=200,
                                             val_data=(X_test, y_test),
                                             verbose_freq=199,
                                             warm_start=False,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.prune(p)
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=3000,
                                             val_data=(X_test, y_test),
                                             verbose_freq=2999,
                                             warm_start=True,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.reset_all_params()
        train_acc.append(train_history[-1])
        val_acc.append(val_history[-1])
    train_accs.append(train_acc)
    val_accs.append(val_acc)





In [8]:
train_accs = np.array(train_accs)
val_accs = np.array(val_accs)

In [9]:
np.savetxt('./results/prune_train_acc.txt', train_accs)
np.savetxt('./results/prune_val_acc.txt', val_accs)

### Disable optimization

For each value of p, we run 200 steps of full optimization, then disable optimization for p*N parameters and run another 3000 steps optimizing the remaining weights. Each setting is repeated 5 times. We then plot model quality as a function of p.
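Disabling optimization for a subset of parameters can be pictured as masking the gradient update. The sketch below assumes a per-parameter score array (for example, entries of a preconditioning matrix) and plain SGD; the names `trainable_mask` and `masked_sgd_step` are hypothetical, and the optimizer actually used by NNFunctional is not shown here.

```python
import numpy as np

def trainable_mask(scores, p):
    """Boolean mask that is False for the fraction `p` of entries with the smallest |score|."""
    flat = np.abs(scores).ravel()
    n_frozen = int(p * flat.size)  # number of parameters to freeze (assumed rounding rule)
    mask = np.ones(flat.size, dtype=bool)
    mask[np.argsort(flat)[:n_frozen]] = False
    return mask.reshape(scores.shape)

def masked_sgd_step(params, grads, mask, learning_rate=0.01):
    """Plain SGD update applied only where `mask` is True; frozen entries keep their values."""
    return params - learning_rate * grads * mask

params = np.array([1.0, 2.0, 3.0, 4.0])
grads = np.array([1.0, 1.0, 1.0, 1.0])
scores = np.array([0.2, 5.0, 0.1, 3.0])  # hypothetical per-parameter scores
mask = trainable_mask(scores, 0.5)
new_params = masked_sgd_step(params, grads, mask, learning_rate=0.1)
```

Unlike pruning, the frozen parameters keep their current values; they simply stop receiving updates.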

In [10]:
disable_params = np.linspace(0, 1, 51)
train_accs_disable = []
val_accs_disable = []
for p in tqdm_notebook(disable_params):
    train_acc = []
    val_acc = []
    for _ in range(5):
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=200,
                                             val_data=(X_test, y_test),
                                             verbose_freq=199,
                                             warm_start=False,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.disable_optimization(p, mode='H')
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=3000,
                                             val_data=(X_test, y_test),
                                             verbose_freq=2999,
                                             warm_start=True,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.reset_all_params()
        train_acc.append(train_history[-1])
        val_acc.append(val_history[-1])
    train_accs_disable.append(train_acc)
    val_accs_disable.append(val_acc)





In [11]:
train_accs_disable = np.array(train_accs_disable)
val_accs_disable = np.array(val_accs_disable)
print(train_accs_disable.shape, val_accs_disable.shape)

(51, 5) (51, 5)


In [12]:
np.savetxt('./results/disable_train_acc.txt', train_accs_disable)
np.savetxt('./results/disable_val_acc.txt', val_accs_disable)

## Baseline algorithm

### Disable optimization (minimal)

In [13]:
disable_params = np.linspace(0, 1, 51)
train_accs_disable_base = []
val_accs_disable_base = []
for p in tqdm_notebook(disable_params):
    train_acc = []
    val_acc = []
    for _ in range(5):
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=200,
                                             val_data=(X_test, y_test),
                                             verbose_freq=199,
                                             warm_start=False,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.disable_optimization(p, mode='minimal')
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=3000,
                                             val_data=(X_test, y_test),
                                             verbose_freq=2999,
                                             warm_start=True,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.reset_all_params()
        train_acc.append(train_history[-1])
        val_acc.append(val_history[-1])
    train_accs_disable_base.append(train_acc)
    val_accs_disable_base.append(val_acc)





In [14]:
train_accs_disable_base = np.array(train_accs_disable_base)
val_accs_disable_base = np.array(val_accs_disable_base)

In [15]:
np.savetxt('./results/disable_train_acc_base.txt', train_accs_disable_base)
np.savetxt('./results/disable_val_acc_base.txt', val_accs_disable_base)

## Random parameter selection

### Disable optimization (random)
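Random selection acts as a control for the score-based criteria above. A minimal sketch of what such a selection might look like, assuming it draws p*n_params indices uniformly without replacement; the helper `random_selection` is hypothetical, and the actual behaviour of mode='random' in NNFunctional is not shown in this notebook.

```python
import numpy as np

def random_selection(n_params, p, seed=None):
    """Pick int(p * n_params) distinct parameter indices uniformly at random."""
    rng = np.random.default_rng(seed)
    n_selected = int(p * n_params)
    return rng.choice(n_params, size=n_selected, replace=False)

idx = random_selection(n_params=100, p=0.3, seed=0)  # 30 distinct indices in [0, 100)
```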

In [16]:
disable_params = np.linspace(0, 1, 51)
train_accs_disable_random = []
val_accs_disable_random = []
for p in tqdm_notebook(disable_params):
    train_acc = []
    val_acc = []
    for _ in range(5):
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=200,
                                             val_data=(X_test, y_test),
                                             verbose_freq=199,
                                             warm_start=False,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.disable_optimization(p, mode='random')
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=3000,
                                             val_data=(X_test, y_test),
                                             verbose_freq=2999,
                                             warm_start=True,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.reset_all_params()
        train_acc.append(train_history[-1])
        val_acc.append(val_history[-1])
    train_accs_disable_random.append(train_acc)
    val_accs_disable_random.append(val_acc)





In [17]:
train_accs_disable_random = np.array(train_accs_disable_random)
val_accs_disable_random = np.array(val_accs_disable_random)

In [18]:
np.savetxt('./results/disable_train_acc_random.txt', train_accs_disable_random)
np.savetxt('./results/disable_val_acc_random.txt', val_accs_disable_random)

### Prune (random)

In [19]:
prune_params = np.linspace(0, 1, 51)[:-1]
train_accs_prune_random = []
val_accs_prune_random = []
for p in tqdm_notebook(prune_params):
    train_acc = []
    val_acc = []
    for _ in range(5):
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=200,
                                             val_data=(X_test, y_test),
                                             verbose_freq=199,
                                             warm_start=False,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.prune(p, mode='random')
        train_history, val_history = func.fit(X_train,
                                             y_train,
                                             steps=3000,
                                             val_data=(X_test, y_test),
                                             verbose_freq=2999,
                                             warm_start=True,
                                             print_out=False,
                                             tqdm=False
                                            )
        func.reset_all_params()
        train_acc.append(train_history[-1])
        val_acc.append(val_history[-1])
    train_accs_prune_random.append(train_acc)
    val_accs_prune_random.append(val_acc)





In [20]:
train_accs_prune_random = np.array(train_accs_prune_random)
val_accs_prune_random = np.array(val_accs_prune_random)

In [21]:
np.savetxt('./results/prune_train_acc_random.txt', train_accs_prune_random)
np.savetxt('./results/prune_val_acc_random.txt', val_accs_prune_random)