# Practical assignment

## Student information

1. **Full name**: Хайруллин Артур Миннахматович
2. **Faculty**: Mechanics and Mathematics
3. **Year**: 6
4. **Group**: 611

## Notes

* The completed notebook must be submitted to the bot
* Follow the honor code (a zero goes both to whoever copied and to whoever let them copy)
* Use only the **NumPy** library for the implementation; nothing else is allowed
* **Keras** is used only to test your implementation
* If a class fails the provided tests, the corresponding task is not graded
* Additional (private) tests may be used
 

## Implementing your own neural network package for running and training neural networks

The assignment has three parts:
1. Implementing the forward pass of a neural network (5 points)
2. Implementing gradients with respect to the input and propagating the gradient through the network (5 points)
3. Implementing gradients with respect to the parameters and backpropagation with parameter updates (10 points)

Extra points are available for implementing training of a network with convolutional layers (10 points), with transposed convolution (10 points), and an additional optimizer (5 points).

### 1. Implementing inference for your own neural network

1.1 Study the layer interface carefully. Every layer must provide at least three methods:
- a constructor
- a forward pass
- a backward pass: derivatives with respect to the input and to the parameters

In [26]:
class Layer(object):
    def __init__(self):
        self.name = 'Layer'       
    def forward(self, input_data):
        pass
    def backward(self, input_data):
        return [self.grad_x(input_data), self.grad_param(input_data)]
    
    def grad_x(self, input_data):
        pass
    def grad_param(self, input_data):
        return []
    
    def update_param(self, grads, learning_rate):
        pass
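
To make the interface concrete, here is a minimal sketch of a layer that follows it. `ScaleLayer` and its `alpha` argument are hypothetical and not part of the assignment. The sketch assumes, as the rest of this notebook does, that `grad_x` returns one Jacobian matrix per sample of the batch.

```python
import numpy as np

class Layer(object):
    # Minimal stand-in for the base class above so the example runs on its own.
    def forward(self, input_data):
        pass
    def grad_x(self, input_data):
        pass
    def grad_param(self, input_data):
        return []
    def update_param(self, grads, learning_rate):
        pass

class ScaleLayer(Layer):
    """Hypothetical layer y = alpha * x for inputs of shape (batch, n)."""
    def __init__(self, alpha=2.0):
        self.name = 'Scale'
        self.alpha = alpha
    def forward(self, input_data):
        return self.alpha * input_data
    def grad_x(self, input_data):
        # The Jacobian dy/dx of each sample is alpha * I; stack to (batch, n, n).
        b, n = input_data.shape
        return np.tile(self.alpha * np.eye(n), (b, 1, 1))

x = np.arange(6, dtype=float).reshape(2, 3)
layer = ScaleLayer(alpha=2.0)
y = layer.forward(x)
jac = layer.grad_x(x)
print(y.shape, jac.shape)
```

A layer without trainable parameters can keep the inherited `grad_param`, which returns an empty list, and the inherited no-op `update_param`.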


1.2 The interface of the Network class is shown below. Note the implementation of the predict method, which passes the input data through the layers one after another.

In [111]:
import numpy as np
from sklearn.model_selection import train_test_split
from tqdm import tqdm

class Network(object):
    def __init__(self, layers, loss=None):
        self.name = 'Network'
        self.layers = layers
        self.loss = loss
    
    def forward(self, input_data):
        return self.predict(input_data)
    
    def grad_x(self, input_data, labels):
        b = input_data.shape[0]
        grad = []
        for i in range(b):
            current_input = input_data
            g = np.eye(self.layers[0].grad_x(current_input)[i,...].shape[1])
            for layer in self.layers:
                g = layer.grad_x(current_input)[i]@g
                current_input = layer.forward(current_input)
            g = self.loss.grad_x(current_input, labels)[i]@g
            grad.append(g)
        return np.array(grad)
    def grad_param(self, input_data, labels):
        b = input_data.shape[0]
        current_input = input_data.copy()
        params = []
        for layer in self.layers:
            for p in range(len(params)):
                for s in range(len(params[p])):
                    a = []
                    for i in range(b):
                        a.append(layer.grad_x(current_input)[i]@params[p][s][i])
                    params[p][s] = np.array(a)
            params.append(layer.grad_param(current_input))
            current_input = layer.forward(current_input)
        for p in range(len(params)):
            for s in range(len(params[p])):
                a = []
                for i in range(b):
                    a.append(self.loss.grad_x(current_input, labels)[i]@params[p][s][i])
                params[p][s] = np.array(a)
        return params
    def update(self, grad_list, learning_rate):
        #print(grad_list)
        for i in range(len(self.layers)):
            self.layers[i].update_param(grad_list[i], learning_rate)
    def predict(self, input_data):
        current_input = input_data
        for layer in self.layers:
            current_input = layer.forward(current_input)     
        return current_input
    
    def calculate_loss(self, input_data, labels):
        return self.loss.forward(self.predict(input_data), labels)
    
    def train_step(self, input_data, labels, learning_rate=0.001):
        grad_list = self.grad_param(input_data, labels)
        self.update(grad_list, learning_rate)
    
    
    def fit(self, trainX, trainY, validation_split=0.25, 
            batch_size=1, nb_epoch=1, learning_rate=0.01):
        
        train_x, val_x, train_y, val_y = train_test_split(trainX, trainY, 
                                                          test_size=validation_split,
                                                          random_state=42)
        for epoch in range(nb_epoch):
            #train one epoch
            for i in tqdm(range(int(len(train_x)/batch_size))):
                batch_x = train_x[i*batch_size: (i+1)*batch_size]
                batch_y = train_y[i*batch_size: (i+1)*batch_size]
                self.train_step(batch_x, batch_y, learning_rate)
            #validate
            val_accuracy = self.evaluate(val_x, val_y)
            print('%d epoch: val %.2f' %(epoch+1, val_accuracy))
            
    def evaluate(self, testX, testY):
        y_pred = np.argmax(self.predict(testX), axis=1)            
        y_true = np.argmax(testY, axis=1)
        val_accuracy = np.sum((y_pred == y_true))/(len(y_true))
        return val_accuracy
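
`Network.grad_x` relies on the chain rule for Jacobians: the Jacobian of a composition is the product of the layer Jacobians, with each later layer multiplied on the left. The following is a minimal sketch that checks this for a single sample against central finite differences. The toy functions, the helper `numerical_jacobian`, and the tolerance are assumptions made for illustration only.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    # Central finite differences; column i holds d f / d x_i.
    x = np.asarray(x, dtype=float)
    y0 = f(x)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        jac[:, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

# Weight matrix of a toy dense layer that maps 3 inputs to 2 outputs.
W = np.array([[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]])

def dense(x):
    return x @ W

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = np.array([0.1, -0.3, 0.2])
s = softmax(dense(x))

jac_dense = W.T                             # shape (2, 3)
jac_softmax = np.diag(s) - np.outer(s, s)   # shape (2, 2)
jac_chain = jac_softmax @ jac_dense         # later layer multiplies on the left

jac_numeric = numerical_jacobian(lambda v: softmax(dense(v)), x)
print(np.allclose(jac_chain, jac_numeric, atol=1e-6))
```

The same kind of finite-difference comparison can serve as a quick sanity check for any layer's `grad_x` before running the Keras-based tests.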

#### 1.1 Implement the forward method for the following layers:

- DenseLayer
- ReLU
- Softmax
- FlattenLayer
- MaxPooling

In [112]:
# imports
import numpy as np

In [113]:
class DenseLayer(Layer):
    def __init__(self, input_dim, output_dim, W_init=None, b_init=None, random_state=42):
        self.name = 'Dense'
        self.input_dim = input_dim
        self.output_dim = output_dim
        if W_init is None or b_init is None:
            self.W = np.random.RandomState(random_state).random((input_dim, output_dim))
            self.b = np.zeros(output_dim, 'float32')
        else:
            self.W = W_init
            self.b = b_init
    def forward(self, input_data):
        out = input_data@self.W + self.b
        return out
    def grad_x(self, input_data):
        grad = []
        b = input_data.shape[0]
        for i in range(b):
            grad.append(self.W.T.copy())
        return np.array(grad)
    def grad_b(self, input_data):
        grad = []
        b = input_data.shape[0]
        for i in range(b):
            grad.append(np.eye(self.output_dim))
        return np.array(grad)
    def grad_W(self, input_data):
        grad = []
        b = input_data.shape[0]
        n = self.input_dim
        w = self.output_dim
        for l in range(b):
            a = np.zeros((w, w*n))
            for k in range(w):
                for i in range(n):
                    for j in range(w):
                        if k == j:
                            a[k][i*w + j] = input_data[l][i]
            grad.append(a)
        return np.array(grad)
    
    def update_W(self, grad, learning_rate):
        self.W -= learning_rate * np.mean(grad, axis=0).reshape(self.W.shape)
    
    def update_b(self, grad,  learning_rate):
        self.b -= learning_rate * np.mean(grad, axis=0)
        
    def update_param(self, params_grad, learning_rate):
        self.update_W(params_grad[0], learning_rate)
        self.update_b(params_grad[1], learning_rate)
    
    def grad_param(self, input_data):
        return [self.grad_W(input_data), self.grad_b(input_data)]
    
class ReLU(Layer):
    def __init__(self):
        self.name = 'ReLU'
    def forward(self, input_data):
        # Element-wise max(0, x); returns a new array for 2-D and 4-D inputs alike
        return np.maximum(input_data, 0)
    def grad_x(self, input_data):
        b = input_data.shape[0]
        if input_data.ndim == 2:
            n = input_data.shape[1]
            a = np.zeros((b, n, n))
            for k in range(b):
                for i in range(n):
                    if input_data[k][i] > 0:
                        a[k][i][i] = 1
                    if np.abs(input_data[k][i]) < 1e-12:
                        a[k][i][i] = 0.5
        if input_data.ndim == 4:
            a = np.zeros((b, input_data[0].size, input_data[0].size))
            for k in range(b):
                for l in range(input_data.shape[1]):
                    for i in range(input_data.shape[2]):
                        for j in range(input_data.shape[3]):
                            if input_data[k][l][i][j] > 0:
                                a[k][l*input_data.shape[2]*input_data.shape[3] + i*input_data.shape[3] + j]\
                                [l*input_data.shape[2]*input_data.shape[3] + i*input_data.shape[3] + j] = 1
                            if np.abs(input_data[k][l][i][j]) < 1e-12:
                                a[k][l*input_data.shape[2]*input_data.shape[3] + i*input_data.shape[3] + j]\
                                [l*input_data.shape[2]*input_data.shape[3] + i*input_data.shape[3] + j] = 0.5
        return a
    
class Softmax(Layer):
    def __init__(self):
        self.name = 'Softmax'
    def forward(self, input_data):
        out = []
        b = input_data.shape[0]
        for i in range(b):
            p = np.sum(np.exp(input_data[i,...]-np.max(input_data[i,...])))
            res = input_data[i,...].copy()
            out.append(np.exp(res-np.max(input_data[i,...]))/p)
        return np.array(out)
    def grad_x(self, input_data):
        b = input_data.shape[0]
        n = input_data.shape[1]
        grad = []
        f = self.forward(input_data)
        for k in range(b):
            a = np.zeros((n,n))
            for i in range(n):
                for j in range(n):
                    if i==j:
                        a[i][j] = f[k][i]*(1-f[k][i])
                    else:
                        a[i][j] = -f[k][i]*f[k][j]
            grad.append(a)
        return np.array(grad)


class FlattenLayer(Layer):
    def __init__(self):
        self.name = 'Flatten'
        
    def forward(self, input_data):
        return input_data.reshape(len(input_data), -1)
    def grad_x(self, input_data):
        grad = []
        b = input_data.shape[0]
        a = input_data[0,...]
        s = a.size
        for i in range(b):
            grad.append(np.eye(s))
        return np.array(grad)


class MaxPooling(Layer):
    def __init__(self, pool_size=2, stride=2):
        self.name = 'MaxPooling'
        self.pool_size = pool_size
        self.stride = stride
    def forward(self, input_data):
        b = input_data.shape[0]
        c = input_data.shape[1]
        h = int((input_data.shape[2] - self.pool_size)/self.stride) + 1
        w = int((input_data.shape[3] - self.pool_size)/self.stride) + 1
        out = np.zeros((b, c, h, w))
        for i in range(b):
            for j in range(c):
                for k in range(h):
                    for l in range(w):
                        out[i][j][k][l] = np.max(input_data[i, j, k*self.stride:(k*self.stride + self.pool_size),\
                                                            l*self.stride:(l*self.stride + self.pool_size)])
        return out
    def grad_x(self, input_data):
        b = input_data.shape[0]
        c = input_data.shape[1]
        h = int((input_data.shape[2] - self.pool_size)/self.stride) + 1
        w = int((input_data.shape[3] - self.pool_size)/self.stride) + 1
        out = np.zeros((b, c*h*w, c*input_data.shape[2]*input_data.shape[3]))
        for k in range(b):
            for l in range(c):
                for i in range(h):
                    for j in range(w):
                        ind = np.argmax(input_data[k, l, i*self.stride:(i*self.stride + self.pool_size),\
                                                j*self.stride:(j*self.stride + self.pool_size)])
                        ik = int(ind/self.pool_size)
                        jk = ind - ik*self.pool_size
                        out[k][l*h*w + i*w + j][l*input_data.shape[2]*input_data.shape[3] + \
                                                (i*self.stride + ik)*input_data.shape[3] + j*self.stride + jk] = 1
        return out
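
The loop-based Jacobians above can also be written with vectorized NumPy calls. The sketch below assumes 2-D inputs of shape (batch, n). The function names are hypothetical, and the results are meant to agree with `Softmax.grad_x` and `DenseLayer.grad_W` above, not to replace them.

```python
import numpy as np

def softmax_forward(x):
    # Row-wise softmax with the row maximum subtracted for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def softmax_jacobian(x):
    # J[k, i, j] = f[k, i] * (delta_ij - f[k, j])
    f = softmax_forward(x)
    n = x.shape[1]
    return f[:, :, None] * (np.eye(n)[None, :, :] - f[:, None, :])

def dense_grad_w(x, output_dim):
    # d(x @ W) / dW with W flattened row by row: kron(sample, I) per sample.
    eye = np.eye(output_dim)
    return np.stack([np.kron(sample[None, :], eye) for sample in x])

x = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]])
jac_s = softmax_jacobian(x)
jac_w = dense_grad_w(x, output_dim=2)
print(jac_s.shape, jac_w.shape)
```

The Kronecker form makes the structure of the dense-layer Jacobian explicit: column `i*output_dim + j` corresponds to `W[i, j]`, and only the row `k == j` receives the value `x[i]`.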

#### 1.2 Now implement a convolutional layer and a transposed convolution (optional)

In [114]:
class Conv2DLayer(Layer):
    def __init__(self, kernel_size=3, input_channels=2, output_channels=3, 
                 padding='same', stride=1, K_init=None, b_init=None, random_state=42):
        # padding: 'same' or 'valid'
        # Kernels are square, so kernel_size is a single number
        # The stride is the same in both directions, so stride is a single number
        # The kernel is stored as [output_channels, input_channels, kernel_size, kernel_size]
        self.name = 'Conv2D'
        self.kernel_size = kernel_size
        self.input_channels = input_channels
        self.output_channels = output_channels
        if K_init is None or b_init is None:
            self.kernel = np.random.RandomState(random_state).random((output_channels, input_channels, \
                                                                      kernel_size, kernel_size))
            self.bias = np.zeros(output_channels, 'float32')
        else:
            self.kernel = K_init
            self.bias = b_init
        self.padding = padding
        self.stride = stride
    def forward(self, input_data):
        # The input is a 4-D tensor of shape [batch, input_channels, height, width]
        # First check that the input dimensions agree with the kernel
        # The NumPy tensor out has to be filled in
        if input_data.shape[1]!=self.input_channels:
            raise ValueError('mismatched channels')
        if self.padding == 'same':
            shape = list(input_data.shape)
            shape[2]+=2*int(self.kernel_size/2)
            shape[3]+=2*int(self.kernel_size/2)
            a = np.zeros(shape)
            for k in range(shape[0]):
                for l in range(shape[1]):
                    for i in range(input_data.shape[2]):
                        for j in range(input_data.shape[3]):
                            a[k][l][i+int(self.kernel_size/2)]\
                            [j+int(self.kernel_size/2)] = input_data[k][l][i][j]
            inputd = a.transpose((0, 2, 3, 1))
        else:
            inputd = input_data.transpose((0, 2, 3, 1))
        out = self.conv(inputd, self.kernel.transpose((2, 3, 1, 0)), self.bias, self.stride)
        return out.transpose((0, 3, 1, 2))
    def conv(self, tens, kernel, bias, s):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        batch = tens.shape[0]
        cout = kernel.shape[3]
        cin = tens.shape[3]
        p = list(range(batch))
        for b in range(batch):
            p[b] = []
            for j in range(cout):
                res = np.zeros(oh*ow)
                for i in range(cin):
                    kerm = self.initm(tens, kernel, s, i, j)
                    tv = self.initv(tens, b, i)
                    r = kerm @ tv
                    res = res + r
                res = res + self.initb(tens, kernel, bias, s, j)
                res = np.reshape(res,(oh,-1))
                p[b].append(res)
        return np.transpose(np.array(p), (0, 2, 3, 1))
    def initm(self, tens, kernel, s, i, j):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        kerm = np.zeros((oh*ow, h*w))
        for k in range(oh):
            lm = k*s*w
            for l in range(ow):
                for n in range(kw):
                    for p in range(kw):
                        kerm[k*ow + l][lm + l*s + n*w + p] = kernel[n][p][i][j]
        return kerm
    def initv(self, tens, i, j):
        h = tens.shape[1]
        w = tens.shape[2]
        tv = np.zeros(h*w)
        for k in range(h):
            for l in range(w):
                tv[k*w + l] = tens[i][k][l][j]
        return tv
    def initb(self, tens, kernel, bias, s, j):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        bv = np.ones(oh*ow)
        bv = bias[j]*bv
        return bv        
    def grad_x(self, input_data):
        grad = []
        b = input_data.shape[0]
        if self.padding == 'same':
            shape = list(input_data.shape)
            shape[2]+=2*int(self.kernel_size/2)
            shape[3]+=2*int(self.kernel_size/2)
            a1 = np.zeros(shape)
            for k in range(shape[0]):
                for l in range(shape[1]):
                    for i in range(input_data.shape[2]):
                        for j in range(input_data.shape[3]):
                            a1[k][l][i+int(self.kernel_size/2)]\
                            [j+int(self.kernel_size/2)] = input_data[k][l][i][j]
            inputd = a1.transpose((0, 2, 3, 1))
            kw = self.kernel_size
            h = input_data.shape[2]
            w = input_data.shape[3]
            a = np.zeros((h*w, (h+kw-1)*(w+kw-1)))
            for i in range(h):
                for j in range(w):
                    a[i*w+j][w+kw-1+int(kw/2)+i*(w+kw-1)+j] = 1
            fres = []
            for j in range(self.output_channels):
                res = []
                for i in range(self.input_channels):
                    res.append(self.initm(inputd, self.kernel.transpose((2, 3, 1, 0)), self.stride, i, j)@a.T)
                res = np.array(res).transpose((1,0,2))
                res = res.reshape(res.shape[0],-1)
                fres.append(res)
            fres = np.array(fres)
            fres = np.array(fres.reshape(fres.shape[0]*fres.shape[1],-1))
            for i in range(b):
                grad.append(fres.copy())
            return np.array(grad)
        else:
            inputd = input_data.transpose((0, 2, 3, 1))
            fres = []
            for j in range(self.output_channels):
                res = []
                for i in range(self.input_channels):
                    res.append(self.initm(inputd, self.kernel.transpose((2, 3, 1, 0)), self.stride, i, j))
                res = np.array(res).transpose((1,0,2))
                res = res.reshape(res.shape[0],-1)
                fres.append(res)
            fres = np.array(fres)
            fres = np.array(fres.reshape(fres.shape[0]*fres.shape[1],-1))
            for i in range(b):
                grad.append(fres.copy())
            return np.array(grad)
    def grad_kernel(self, input_data):
        grad = []
        kw = self.kernel_size
        b = input_data.shape[0]
        h = input_data.shape[2]
        oh = int((h-kw)/self.stride) + 1
        w = input_data.shape[3]
        ow = int((w-kw)/self.stride) + 1
        if self.padding == 'same':
            shape = list(input_data.shape)
            shape[2]+=2*int(self.kernel_size/2)
            shape[3]+=2*int(self.kernel_size/2)
            inputd = np.zeros(shape)
            for k in range(shape[0]):
                for l in range(shape[1]):
                    for i in range(input_data.shape[2]):
                        for j in range(input_data.shape[3]):
                            inputd[k][l][i+int(self.kernel_size/2)]\
                            [j+int(self.kernel_size/2)] = input_data[k][l][i][j]
            for c in range(b):
                a = np.zeros((self.output_channels*h*w, kw*kw*self.output_channels*self.input_channels))
                for i in range(self.output_channels):
                    for j in range(h):
                        for k in range(w):
                            for l in range(self.input_channels):
                                for q in range(kw):
                                    for p in range(kw):
                                        a[i*h*w + j*w + k][i*kw*kw*self.input_channels + l*kw*kw + q*kw + p] = \
                                        inputd[c][l][j+q][k+p]
                grad.append(a.copy())
        else:
            for c in range(b):
                a = np.zeros((self.output_channels*oh*ow, kw*kw*self.output_channels*self.input_channels))
                for i in range(self.output_channels):
                    for j in range(oh):
                        for k in range(ow):
                            for l in range(self.input_channels):
                                for q in range(kw):
                                    for p in range(kw):
                                        a[i*oh*ow + j*ow + k][i*kw*kw*self.input_channels + l*kw*kw + q*kw + p] = \
                                        input_data[c][l][j*self.stride+q][k*self.stride+p]
                grad.append(a.copy())
        return np.array(grad)
    def grad_bias(self, input_data):
        grad = []
        kw = self.kernel_size
        b = input_data.shape[0]
        h = input_data.shape[2]
        oh = int((h-kw)/self.stride) + 1
        w = input_data.shape[3]
        ow = int((w-kw)/self.stride) + 1
        if self.padding == 'same':
            a = np.zeros((self.output_channels*h*w, self.output_channels))
            for i in range(self.output_channels):
                for j in range(h*w):
                    a[i*h*w + j][i] = 1
        else:
            a = np.zeros((self.output_channels*oh*ow, self.output_channels))
            for i in range(self.output_channels):
                for j in range(oh*ow):
                    a[i*oh*ow + j][i] = 1
        for i in range(b):
            grad.append(a.copy())
        return np.array(grad)
    def update_kernel(self, grad, learning_rate):
        self.kernel -= learning_rate * np.mean(grad, axis=0).reshape(self.kernel.shape)
    
    def update_bias(self, grad,  learning_rate):
        self.bias -= learning_rate * np.mean(grad, axis=0)
        
    def update_param(self, params_grad, learning_rate):
        self.update_kernel(params_grad[0], learning_rate)
        self.update_bias(params_grad[1], learning_rate)
    
    def grad_param(self, input_data):
        return [self.grad_kernel(input_data), self.grad_bias(input_data)]
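
For checking `Conv2DLayer.forward`, a compact reference convolution can help. The sketch below is a different technique from the matrix-based approach above: it uses sliding windows and `np.einsum`, and assumes NumPy 1.20 or newer for `sliding_window_view`. The function name `conv2d_reference` is hypothetical; the kernel layout `[output_channels, input_channels, k, k]` and the `k // 2` zero padding for `'same'` follow the class above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_reference(x, kernel, bias, stride=1, padding='valid'):
    # x: (batch, c_in, h, w); kernel: (c_out, c_in, k, k); bias: (c_out,)
    k = kernel.shape[2]
    if padding == 'same':
        p = k // 2  # zero padding on every side; assumes an odd kernel size
        x = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    # windows has shape (batch, c_in, h - k + 1, w - k + 1, k, k)
    windows = sliding_window_view(x, (k, k), axis=(2, 3))
    windows = windows[:, :, ::stride, ::stride]
    out = np.einsum('bcijpq,ocpq->boij', windows, kernel)
    return out + bias[None, :, None, None]

rng = np.random.RandomState(0)
x = rng.random_sample((2, 2, 5, 5))
kernel = rng.random_sample((3, 2, 3, 3))
bias = np.array([0.1, 0.2, 0.3])

out_valid = conv2d_reference(x, kernel, bias, stride=2, padding='valid')
out_same = conv2d_reference(x, kernel, bias, stride=1, padding='same')
print(out_valid.shape, out_same.shape)
```

The spatial output size agrees with the formula used throughout the class, `int((h - k) / stride) + 1`, applied to the (possibly padded) input.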

In [115]:
class Conv2DTrLayer(Layer):
    def __init__(self, kernel_size=3, input_channels=2, output_channels=3, 
                 padding=0, stride=1, K_init=None, b_init=None, random_state=42):      
        # padding: a number (how much to crop from the modified input map)
        # Kernels are square, so kernel_size is a single number
        # stride is a single number (the expansion factor)
        # The kernel has shape [kernel_size, kernel_size, input_channels, output_channels]
        self.name = 'Conv2DTr'
        self.kernel_size = kernel_size
        self.input_channels = input_channels
        self.output_channels = output_channels
        if K_init is None or b_init is None:
            self.kernel = np.random.RandomState(random_state).random((kernel_size, kernel_size,\
                                                                      input_channels, output_channels))
            self.bias = np.zeros(output_channels, 'float32')
        else:
            self.kernel = K_init
            self.bias = b_init
        self.padding = padding
        self.stride = stride
    def forward(self, input_data):
        # The input is a 4-D tensor of shape [batch, input_channels, height, width]
        # First check that the input dimensions agree with the kernel
        # The NumPy tensor out has to be filled in
        if input_data.shape[1]!=self.input_channels:
            raise ValueError('mismatched channels')
        shape = list(input_data.shape)
        shape[2] = (shape[2]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        shape[3] = (shape[3]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        a = np.zeros(shape)
        for k in range(shape[0]):
            for l in range(shape[1]):
                for i in range(input_data.shape[2]):
                    for j in range(input_data.shape[3]):
                        a[k][l][i*self.stride + self.kernel_size-1-self.padding]\
                        [j*self.stride + self.kernel_size-1-self.padding] = input_data[k][l][i][j]
        inputd = a.transpose((0, 2, 3, 1))
        out = self.conv(inputd, self.kernel, self.bias, 1)
        return out.transpose((0, 3, 1, 2))
    def conv(self, tens, kernel, bias, s):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        batch = tens.shape[0]
        cout = kernel.shape[3]
        cin = tens.shape[3]
        p = list(range(batch))
        for b in range(batch):
            p[b] = []
            for j in range(cout):
                res = np.zeros(oh*ow)
                for i in range(cin):
                    kerm = self.initm(tens, kernel, s, i, j)
                    tv = self.initv(tens, b, i)
                    r = kerm @ tv
                    res = res + r
                res = res + self.initb(tens, kernel, bias, s, j)
                res = np.reshape(res,(oh,-1))
                p[b].append(res)
        return np.transpose(np.array(p), (0, 2, 3, 1))
    def initm(self, tens, kernel, s, i, j):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        kerm = np.zeros((oh*ow, h*w))
        for k in range(oh):
            lm = k*s*w
            for l in range(ow):
                for n in range(kw):
                    for p in range(kw):
                        kerm[k*ow + l][lm + l*s + n*w + p] = kernel[n][p][i][j]
        return kerm
    def initv(self, tens, i, j):
        h = tens.shape[1]
        w = tens.shape[2]
        tv = np.zeros(h*w)
        for k in range(h):
            for l in range(w):
                tv[k*w + l] = tens[i][k][l][j]
        return tv
    def initb(self, tens, kernel, bias, s, j):
        kw = kernel.shape[1]
        h = tens.shape[1]
        oh = int((h-kw)/s) + 1
        w = tens.shape[2]
        ow = int((w-kw)/s) + 1
        bv = np.ones(oh*ow)
        bv = bias[j]*bv
        return bv 
    def grad_x(self, input_data):
        grad = []
        b = input_data.shape[0]
        shape = list(input_data.shape)
        shape[2] = (shape[2]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        shape[3] = (shape[3]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        a = np.zeros(shape)
        for k in range(shape[0]):
            for l in range(shape[1]):
                for i in range(input_data.shape[2]):
                    for j in range(input_data.shape[3]):
                        a[k][l][i*self.stride + self.kernel_size-1-self.padding]\
                        [j*self.stride + self.kernel_size-1-self.padding] = input_data[k][l][i][j]
        inputd = a.transpose((0, 2, 3, 1))
        kw = self.kernel_size
        h1 = shape[2]
        w1 = shape[3]
        h = input_data.shape[2]
        w = input_data.shape[3]
        a1 = np.zeros((h*w, h1*w1))
        for i in range(h):
            for j in range(w):
                a1[i*w+j][((w-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1)*(self.kernel_size-1-self.padding) + \
                          (self.kernel_size-1-self.padding) + j*self.stride + \
                          i*self.stride*((w-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1)] = 1
        fres = []
        for j in range(self.output_channels):
            res = []
            for i in range(self.input_channels):
                res.append(self.initm(inputd, self.kernel, 1, i, j)@a1.T)
            res = np.array(res).transpose((1,0,2))
            res = res.reshape(res.shape[0],-1)
            fres.append(res)
        fres = np.array(fres)
        fres = np.array(fres.reshape(fres.shape[0]*fres.shape[1],-1))
        for i in range(b):
            grad.append(fres.copy())
        return np.array(grad)
    def grad_kernel(self, input_data):
        grad = []
        kw = self.kernel_size
        b = input_data.shape[0]
        shape = list(input_data.shape)
        shape[2] = (shape[2]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        shape[3] = (shape[3]-1)*self.stride + 2*(self.kernel_size-1-self.padding) + 1
        inputd = np.zeros(shape)
        h = (input_data.shape[2]-1)*self.stride - 2*self.padding + kw
        w = (input_data.shape[3]-1)*self.stride - 2*self.padding + kw
        for k in range(shape[0]):
            for l in range(shape[1]):
                for i in range(input_data.shape[2]):
                    for j in range(input_data.shape[3]):
                        inputd[k][l][i*self.stride + self.kernel_size-1-self.padding]\
                        [j*self.stride + self.kernel_size-1-self.padding] = input_data[k][l][i][j]
        for c in range(b):
            a = np.zeros((self.output_channels*h*w, kw*kw*self.output_channels*self.input_channels))
            for i in range(self.output_channels):
                for j in range(h):
                    for k in range(w):
                        for l in range(self.input_channels):
                            for q in range(kw):
                                for p in range(kw):
                                    a[i*h*w + j*w + k][i*kw*kw*self.input_channels + l*kw*kw + q*kw + p] = \
                                    inputd[c][l][j+q][k+p]
            grad.append(a.copy())
        return np.array(grad)
    def grad_bias(self, input_data):
        grad = []
        kw = self.kernel_size
        b = input_data.shape[0]
        h = (input_data.shape[2]-1)*self.stride - 2*self.padding + kw
        w = (input_data.shape[3]-1)*self.stride - 2*self.padding + kw
        a = np.zeros((self.output_channels*h*w, self.output_channels))
        for i in range(self.output_channels):
            for j in range(h*w):
                a[i*h*w + j][i] = 1
        for i in range(b):
            grad.append(a.copy())
        return np.array(grad)
    def update_kernel(self, grad, learning_rate):
        self.kernel -= learning_rate * np.mean(grad, axis=0).reshape(self.kernel.shape)
    
    def update_bias(self, grad,  learning_rate):
        self.bias -= learning_rate * np.mean(grad, axis=0)
        
    def update_param(self, params_grad, learning_rate):
        self.update_kernel(params_grad[0], learning_rate)
        self.update_bias(params_grad[1], learning_rate)
    
    def grad_param(self, input_data):
        return [self.grad_kernel(input_data), self.grad_bias(input_data)]
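
The transposed convolution above works in two steps: it spreads the input pixels apart by `stride` and surrounds them with a zero border of width `kernel_size - 1 - padding`, then runs an ordinary stride-1 convolution. A minimal sketch of the first step and of the resulting size arithmetic follows. The helper names are hypothetical, and the sketch assumes `padding <= kernel_size - 1` so that the border is non-negative.

```python
import numpy as np

def dilate_and_pad(x, kernel_size, stride, padding):
    # Place input pixels `stride` apart inside a zero map with a border of
    # (kernel_size - 1 - padding) zeros, as the loops in forward do.
    b, c, h, w = x.shape
    border = kernel_size - 1 - padding
    hd = (h - 1) * stride + 2 * border + 1
    wd = (w - 1) * stride + 2 * border + 1
    out = np.zeros((b, c, hd, wd), dtype=x.dtype)
    out[:, :,
        border:border + (h - 1) * stride + 1:stride,
        border:border + (w - 1) * stride + 1:stride] = x
    return out

def transposed_output_size(h, kernel_size, stride, padding):
    # Size after a stride-1 'valid' convolution over the dilated map.
    return (h - 1) * stride - 2 * padding + kernel_size

x = np.arange(1.0, 5.0).reshape(1, 1, 2, 2)
d = dilate_and_pad(x, kernel_size=3, stride=2, padding=0)
print(d.shape, transposed_output_size(2, kernel_size=3, stride=2, padding=0))
```

The two formulas are consistent: a map of size `(h - 1) * stride + 2 * (kernel_size - 1 - padding) + 1` convolved with a `kernel_size` window at stride 1 yields `(h - 1) * stride - 2 * padding + kernel_size`, which is the output size used in `grad_kernel` and `grad_bias`.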

In [116]:
class Conv2DLayer1(Layer):
    def __init__(self, kernel_size=3, input_channels=2, output_channels=3, 
                 padding='valid', stride=1, K_init=None, b_init=None, random_state=42):
        # padding: 'same' or 'valid'
        # Kernels are square, so kernel_size is a single number
        # The stride is the same in both directions, so stride is a single number
        # The kernel is stored as [output_channels, input_channels, kernel_size, kernel_size]
        self.name = 'Conv2D'
        self.kernel_size = kernel_size
        self.input_channels = input_channels
        self.output_channels = output_channels
        if K_init is None or b_init is None:
            self.kernel = np.random.RandomState(random_state).random((output_channels, input_channels, \
                                                                      kernel_size, kernel_size))
            self.bias = np.zeros(output_channels, 'float32')
        else:
            self.kernel = K_init
            self.bias = b_init
        self.padding = padding
        self.stride = stride
    def forward(self, input_data):
        # Input: a 4D tensor of shape [batch, input_channels, height, width]
        # First check that the input dimensions are consistent with the kernel
        # Then fill the NumPy tensor out
        if input_data.shape[1]!=self.input_channels:
            raise ValueError('mismatched channels')
        oshape = (input_data.shape[0], self.output_channels, int((input_data.shape[2] - self.kernel_size)/self.stride)+1, \
                  int((input_data.shape[3] - self.kernel_size)/self.stride)+1) 
        out = np.zeros(oshape)
        for k in range(oshape[0]):
            for l in range(oshape[1]):
                for i in range(oshape[2]):
                    for j in range(oshape[3]):
                        for lk in range(self.input_channels):
                            for ik in range(self.kernel_size):
                                for jk in range(self.kernel_size):
                                    out[k][l][i][j]+=input_data[k][lk][i*self.stride+ik][j*self.stride+jk]*\
                                    self.kernel[l][lk][ik][jk]
                        out[k][l][i][j]+=self.bias[l]
        return out
            
    def grad_x(self, input_data):
        oshape = (input_data.shape[0], self.output_channels, int((input_data.shape[2] - self.kernel_size)/self.stride)+1, \
                  int((input_data.shape[3] - self.kernel_size)/self.stride)+1) 
        out = np.zeros((oshape[0], oshape[1]*oshape[2]*oshape[3], input_data[0].size))
        for k in range(oshape[0]):
            for l in range(oshape[1]):
                for i in range(oshape[2]):
                    for j in range(oshape[3]):
                        for lk in range(self.input_channels):
                            for ik in range(self.kernel_size):
                                for jk in range(self.kernel_size):
                                    out[k][l*oshape[2]*oshape[3] + i*oshape[3] + j]\
                                    [lk*input_data[0][0].size + (i*self.stride+ik)*input_data[0][0][0].size + \
                                     (j*self.stride+jk)] = self.kernel[l][lk][ik][jk]
        return out
    def grad_kernel(self, input_data):
        oshape = (input_data.shape[0], self.output_channels, int((input_data.shape[2] - self.kernel_size)/self.stride)+1, \
                  int((input_data.shape[3] - self.kernel_size)/self.stride)+1) 
        out = np.zeros((oshape[0], oshape[1]*oshape[2]*oshape[3], self.kernel.size))
        for k in range(oshape[0]):
            for l in range(oshape[1]):
                for i in range(oshape[2]):
                    for j in range(oshape[3]):
                        for lk in range(self.input_channels):
                            for ik in range(self.kernel_size):
                                for jk in range(self.kernel_size):
                                    out[k][l*oshape[2]*oshape[3] + i*oshape[3] + j]\
                                    [l*self.kernel[0].size + lk*self.kernel[0][0].size +\
                                     ik*self.kernel_size + jk] = input_data[k][lk][i*self.stride + ik][j*self.stride + jk]
        return out
    def grad_bias(self, input_data):
        grad = []
        kw = self.kernel_size
        b = input_data.shape[0]
        h = input_data.shape[2]
        oh = int((h-kw)/self.stride) + 1
        w = input_data.shape[3]
        ow = int((w-kw)/self.stride) + 1
        if self.padding == 'same':
            a = np.zeros((self.output_channels*h*w, self.output_channels))
            for i in range(self.output_channels):
                for j in range(h*w):
                    a[i*h*w + j][i] = 1
        else:
            a = np.zeros((self.output_channels*oh*ow, self.output_channels))
            for i in range(self.output_channels):
                for j in range(oh*ow):
                    a[i*oh*ow + j][i] = 1
        for i in range(b):
            grad.append(a.copy())
        return np.array(grad)
    def update_kernel(self, grad, learning_rate):
        self.kernel -= learning_rate * np.mean(grad, axis=0).reshape(self.kernel.shape)
    
    def update_bias(self, grad,  learning_rate):
        self.bias -= learning_rate * np.mean(grad, axis=0)
        
    def update_param(self, params_grad, learning_rate):
        self.update_kernel(params_grad[0], learning_rate)
        self.update_bias(params_grad[1], learning_rate)
    
    def grad_param(self, input_data):
        return [self.grad_kernel(input_data), self.grad_bias(input_data)]
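
The output shapes used by the layers above follow two size formulas: `int((size - kernel_size) / stride) + 1` for a 'valid' convolution and `(size - 1) * stride - 2 * padding + kernel_size` for the transposed convolution. As a minimal sketch, the helpers below (hypothetical names, not part of the assignment's classes) evaluate those formulas for a few input sizes.

```python
def conv_output_size(size, kernel_size, stride):
    """Spatial size after a 'valid' convolution: floor((size - k) / s) + 1."""
    return (size - kernel_size) // stride + 1


def conv_transpose_output_size(size, kernel_size, stride, padding):
    """Spatial size after a transposed convolution: (size - 1) * s - 2 * p + k."""
    return (size - 1) * stride - 2 * padding + kernel_size


print(conv_output_size(28, 3, 1))               # 26
print(conv_output_size(6, 3, 1))                # 4
print(conv_transpose_output_size(6, 3, 2, 1))   # 11
```

For example, a 28x28 image passed through a 3x3 'valid' convolution with stride 1 and one output channel flattens to 26 * 26 = 676 features.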

#### 1.4 Now it is time for a test.
#### If everything is implemented correctly, running the following cells should print: Test PASSED

There is no point in moving on to the later tasks until this test passes.
    

#### Reading the data

In [33]:
import numpy as np
np.random.seed(123)  # for reproducibility
from keras.utils import np_utils
from keras.datasets import mnist
 
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
 

Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)

(60000, 1, 28, 28) (60000, 10) (10000, 1, 28, 28) (10000, 10)


#### Preparing the models

In [34]:
import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Activation, Flatten, Input
from keras.layers import Convolution2D, Conv2D, MaxPooling2D, Conv2DTranspose

print(keras.__version__)

def get_keras_model():
    input_image = Input(shape=(28, 28, 1))
    pool1 = MaxPooling2D(pool_size=(2,2))(input_image)
    flatten = Flatten()(pool1)
    dense1 = Dense(10, activation='softmax')(flatten)
    model = Model(inputs=input_image, outputs=dense1)

    from keras.optimizers import Adam, SGD
    sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])

    history = model.fit(X_train.transpose((0,2,3,1)), Y_train, validation_split=0.25, 
                        batch_size=32, epochs=2, verbose=1)
    return model

2.11.0


In [35]:
def get_our_model(keras_model):
    maxpool = MaxPooling()
    flatten = FlattenLayer()
    dense = DenseLayer(196, 10, W_init=keras_model.get_weights()[0],
                       b_init=keras_model.get_weights()[1])
    softmax = Softmax()
    net = Network([maxpool, flatten, dense, softmax])
    return net

In [36]:
keras_model = get_keras_model()
print(keras_model.get_weights()[0].shape)
print(keras_model.get_weights()[1].shape)
our_model = get_our_model(keras_model)

Epoch 1/2
Epoch 2/2
(196, 10)
(10,)


In [37]:
keras_prediction = keras_model.predict(X_test.transpose((0,2,3,1)))
our_model_prediction = our_model.predict(X_test)



In [38]:
if np.sum(np.abs(keras_prediction - our_model_prediction)) < 0.01:
    print('Test PASSED')
else:
    print('Something went wrong!')

Test PASSED
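
The comparison above feeds Keras channels-last data (`X_test.transpose((0,2,3,1))`) while our model receives channels-first data, yet the Dense weights are reused unchanged. A plausible reason this works is that the images have a single channel, so both layouts flatten to the same vector. The sketch below (the helper name is hypothetical) checks this on small arrays and shows that the orders differ once there is more than one channel.

```python
import numpy as np

# One single-channel and one three-channel batch in channels-first layout (N, C, H, W).
x1 = np.arange(2 * 1 * 4 * 4).reshape(2, 1, 4, 4)
x3 = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)


def flatten_orders_agree(x_nchw):
    """True if flattening the NCHW and NHWC views of the same data gives equal vectors."""
    x_nhwc = x_nchw.transpose((0, 2, 3, 1))
    return np.array_equal(x_nchw.reshape(len(x_nchw), -1),
                          x_nhwc.reshape(len(x_nhwc), -1))


print(flatten_orders_agree(x1))  # True: with C == 1 both layouts flatten identically
print(flatten_orders_agree(x3))  # False: with C > 1 the element order differs
```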


### 2. Computing derivatives with respect to the input for neural network layers

In this task, numerical differentiation formulas must not be used to compute derivatives.

#### 2.1 Implement the forward method of the CrossEntropy class
Reminder: $$ crossentropy = L(p, y) = - \sum\limits_i y_i \log p_i, $$
where the vector $(p_1, ..., p_k)$ is the output of the classification algorithm and $(y_1, ..., y_k)$ are the correct class labels in one-hot encoding.
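
As a small worked example of the formula, the sketch below evaluates the cross-entropy for a batch of two predictions. The function name and the `eps` guard against `log(0)` are assumptions made for illustration.

```python
import numpy as np


def cross_entropy(p, y, eps=1e-12):
    """Per-sample cross-entropy for batches p, y of shape (batch, classes)."""
    return -np.sum(y * np.log(p + eps), axis=1)


p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
y = np.array([[1, 0, 0],
              [0, 0, 1]])
print(cross_entropy(p, y))  # approximately [0.357, 0.223], i.e. [-log 0.7, -log 0.8]
```

With one-hot labels only the term of the correct class survives, so the loss per sample is the negative log-probability assigned to that class.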

In [81]:
class CrossEntropy(object):
    def __init__(self, eps=0.00001):
        self.name = 'CrossEntropy'
        self.eps = eps
    
    def forward(self, input_data, labels):
        out = []
        b = input_data.shape[0]
        for i in range(b):
            out.append(-np.sum(labels[i,...]*np.log((input_data[i,...]+1e-12))))
        return np.array(out)
    
    def calculate_loss(self,input_data, labels):
        return self.forward(input_data, labels)
    
    def grad_x(self, input_data, labels):
        grad = []
        b = input_data.shape[0]
        for i in range(b):
            grad.append(-labels[i,...]/(input_data[i,...]+1e-12))
        return np.array(grad)

#### 2.2 Implement the grad_x method of the CrossEntropy class, which returns $\frac{\partial L}{\partial p}$
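
The gradient requested here follows directly from the definition of $L(p, y)$ in 2.1, since each term of the sum depends on only one component of $p$:

$$ \frac{\partial L}{\partial p_j} = - \sum\limits_i y_i \frac{\partial \log p_i}{\partial p_j} = - \frac{y_j}{p_j} $$

The `CrossEntropy.grad_x` code above adds a small constant to the denominator, presumably to avoid division by zero.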

The following test helps check that the code works:

In [70]:
def numerical_diff_net(net, x, labels):
    eps = 0.00001
    right_answer = []
    for i in range(len(x[0])):
        delta = np.zeros(len(x[0]))
        delta[i] = eps
        diff = (net.calculate_loss(x + delta, labels) - net.calculate_loss(x-delta, labels)) / (2*eps)
        right_answer.append(diff)
    return np.array(right_answer).T

def test_net(net):
    x = np.array([[1, 2, 3], [2, 3, 4]])
    labels = np.array([[0.3, 0.2, 0.5], [0.3, 0.2, 0.5]])
    num_grad = numerical_diff_net(net, x, labels)
    grad = net.grad_x(x, labels)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)
        
loss = CrossEntropy()
test_net(loss)

(2, 3)
(2, 3)
Test PASSED


#### 2.3 Implement the grad_x method of the Softmax class, which returns $\frac{\partial Softmax}{\partial x}$
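
For softmax output $s$, the Jacobian is $\frac{\partial s_i}{\partial x_j} = s_i(\delta_{ij} - s_j)$. The following is a minimal sketch of that formula for a batch of inputs. The function names are hypothetical and the code is not necessarily how the `Softmax` class in this notebook is written.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def softmax_jacobian(x):
    """Batch of Jacobians J[b, i, j] = d s_i / d x_j = s_i * (delta_ij - s_j)."""
    s = softmax(x)                               # (batch, k)
    diag = s[:, :, None] * np.eye(x.shape[1])    # s_i * delta_ij
    outer = s[:, :, None] * s[:, None, :]        # s_i * s_j
    return diag - outer                          # (batch, k, k)


x = np.array([[1.0, 2.0, 3.0], [2.0, -3.0, 4.0]])
J = softmax_jacobian(x)
print(J.shape)  # (2, 3, 3)
```

Each row of a softmax Jacobian sums to zero, because the outputs always sum to one.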

The following test helps check that the code works:

In [41]:
def numerical_diff_layer(layer, x):
    eps = 0.00001
    right_answer = []
    for i in range(len(x[0])):
        delta = np.zeros(len(x[0]))
        delta[i] = eps
        diff = (layer.forward(x + delta) - layer.forward(x-delta)) / (2*eps)
        right_answer.append(diff.T)
    return np.array(right_answer).T

def test_layer(layer):
    x = np.array([[1, 2, 3], [2, -3, 4]])
    num_grad = numerical_diff_layer(layer, x)
    grad = layer.grad_x(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)
        
layer = Softmax()
test_layer(layer)

(2, 3, 3)
(2, 3, 3)
Test PASSED


#### 2.4 Implement the grad_x method for the ReLU and DenseLayer classes
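
Both Jacobians have a simple closed form. The sketch below assumes that the dense layer computes `y = x @ W + b` with `W` of shape `(input_size, output_size)`. That assumption and the function names are illustrative, not the notebook's own code. The resulting shapes follow the `(batch, out, in)` convention that the tests print.

```python
import numpy as np


def relu_jacobian(x):
    """J[b, i, j] = 1 if i == j and x[b, i] > 0, else 0. Shape (batch, n, n)."""
    mask = (x > 0).astype(float)                 # (batch, n)
    return mask[:, :, None] * np.eye(x.shape[1])


def dense_jacobian_x(x, W):
    """For y = x @ W + b: J[b, m, i] = d y_m / d x_i = W[i, m]. Shape (batch, out, in)."""
    return np.broadcast_to(W.T, (x.shape[0],) + W.T.shape).copy()


x = np.array([[1.0, 2.0, 3.0], [2.0, -3.0, 4.0]])
W = np.arange(12, dtype=float).reshape(3, 4)
print(relu_jacobian(x).shape)        # (2, 3, 3)
print(dense_jacobian_x(x, W).shape)  # (2, 4, 3)
```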

In [42]:
layer = ReLU()
test_layer(layer)

(2, 3, 3)
(2, 3, 3)
Test PASSED


In [43]:
layer = DenseLayer(3,4)
test_layer(layer)

(2, 4, 3)
(2, 4, 3)
Test PASSED


#### 2.5 (4 points) Implement the grad_x method of the Network class, which should compute the derivative of the loss with respect to the input
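
By the chain rule, the derivative of the loss with respect to the input is the loss gradient multiplied by the layer Jacobians, taken from the last layer back to the first. The sketch below illustrates this with random Jacobians of shape `(batch, out, in)`. The helper name is hypothetical, and the real `Network.grad_x` may be organized differently.

```python
import numpy as np


def backprop_input_grad(loss_grad, jacobians):
    """Multiply dL/dp by the layer Jacobians from the last layer to the first.

    loss_grad: (batch, n_out) gradient of the loss w.r.t. the network output.
    jacobians: list of (batch, out_l, in_l) arrays, ordered first layer to last.
    """
    g = loss_grad
    for J in reversed(jacobians):
        # g[b, i] = sum_o g[b, o] * J[b, o, i]
        g = np.einsum('bo,boi->bi', g, J)
    return g


rng = np.random.RandomState(0)
J1 = rng.random_sample((2, 5, 3))   # layer 1: 3 inputs -> 5 outputs
J2 = rng.random_sample((2, 4, 5))   # layer 2: 5 inputs -> 4 outputs
dL_dp = rng.random_sample((2, 4))
print(backprop_input_grad(dL_dp, [J1, J2]).shape)  # (2, 3)
```

Multiplying from the loss side keeps every intermediate result a `(batch, n)` matrix, which is cheaper than first forming the product of full Jacobians.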

In [44]:
net = Network([DenseLayer(3, 10), ReLU(), DenseLayer(10, 3), Softmax()], loss=CrossEntropy())
test_net(net)

(2, 3)
(2, 3)
Test PASSED


In [45]:
def numerical_diff_net1(net, x, labels):
    eps = 0.00001
    right_answer = []
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            for k in range(x.shape[3]):
                delta = np.zeros((x.shape[1], x.shape[2], x.shape[3]))
                delta[i][j][k] = eps
                diff = (net.calculate_loss(x + delta, labels) - net.calculate_loss(x-delta, labels)) / (2*eps)
                right_answer.append(diff)
    return np.array(right_answer).T

def test_net1(net):
    x = X_train[::6000]
    labels = Y_train[::6000]
    num_grad = numerical_diff_net1(net, x, labels)
    grad = net.grad_x(x, labels)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)
net = Network([Conv2DLayer(padding='valid', stride=3, input_channels=1, output_channels=1),\
               Conv2DTrLayer(stride=2, input_channels=1, output_channels=1), \
               FlattenLayer(), DenseLayer(361, 10), ReLU(), DenseLayer(10, 10), Softmax()], loss=CrossEntropy())
test_net1(net)

(10, 784)
(10, 784)
Test PASSED


### 3. Implementing parameter gradients and backpropagation with network parameter updates

#### 3.1 Implement the functions grad_b and grad_W. The grad_W test assumes that W is flattened into a one-dimensional vector.
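
For a dense layer, assumed here to compute `y = x @ W + b`, the derivatives are $\frac{\partial y_m}{\partial b_j} = \delta_{mj}$ and $\frac{\partial y_m}{\partial W_{ij}} = x_i \delta_{mj}$. The sketch below builds both Jacobians, with `W` flattened row-major so that `W[i, j]` has index `i * out + j`, which is the order in which the test below perturbs `W`. The function names are hypothetical.

```python
import numpy as np


def dense_grad_b(x, n_out):
    """d y_m / d b_j = delta_mj, repeated for every sample. Shape (batch, out, out)."""
    return np.tile(np.eye(n_out), (x.shape[0], 1, 1))


def dense_grad_W(x, n_out):
    """d y_m / d W[i, j] = x_i * delta_mj, with W flattened row-major.

    Shape (batch, out, in * out).
    """
    batch, n_in = x.shape
    # grad[b, m, i, j] = x[b, i] * delta_mj
    grad = np.einsum('bi,mj->bmij', x, np.eye(n_out))
    return grad.reshape(batch, n_out, n_in * n_out)


x = np.random.RandomState(0).random_sample((2, 3))
print(dense_grad_b(x, 4).shape)  # (2, 4, 4)
print(dense_grad_W(x, 4).shape)  # (2, 4, 12)
```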

In [46]:
def numerical_grad_b(input_size, output_size, b, W, x):
    eps = 0.00001
    right_answer = []
    for i in range(len(b)):
        delta = np.zeros(b.shape)
        delta[i] = eps
        dense1 = DenseLayer(input_size, output_size, W_init=W, b_init=b+delta)
        dense2 = DenseLayer(input_size, output_size, W_init=W, b_init=b-delta)
        diff = (dense1.forward(x) - dense2.forward(x)) / (2*eps)
        right_answer.append(diff.T)
    return np.array(right_answer).T

def test_grad_b():
    input_size = 3
    output_size = 4 
    W_init = np.random.random((input_size, output_size))
    b_init = np.random.random((output_size,))
    x = np.random.random((2, input_size))
    
    dense = DenseLayer(input_size, output_size, W_init, b_init)
    grad = dense.grad_b(x)

    num_grad = numerical_grad_b(input_size, output_size, b_init, W_init, x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)

test_grad_b()

(2, 4, 4)
(2, 4, 4)
Test PASSED


In [47]:
def numerical_grad_W(input_size, output_size, b, W, x):
    eps = 0.00001
    right_answer = []
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            delta = np.zeros(W.shape)
            delta[i, j] = eps
            dense1 = DenseLayer(input_size, output_size, W_init=W+delta, b_init=b)
            dense2 = DenseLayer(input_size, output_size, W_init=W-delta, b_init=b)
            diff = (dense1.forward(x) - dense2.forward(x)) / (2*eps)
            right_answer.append(diff.T)
    return np.array(right_answer).T

def test_grad_W():
    input_size = 3
    output_size = 4 
    W_init = np.random.random((input_size, output_size))
    b_init = np.random.random((4,))
    x = np.random.random((2, input_size))
        
    dense = DenseLayer(input_size, output_size, W_init, b_init)
    grad = dense.grad_W(x)

    num_grad = numerical_grad_W(input_size, output_size, b_init, W_init, x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)

test_grad_W()

(2, 4, 12)
(2, 4, 12)
Test PASSED


In [48]:
def numerical_grad(input_size, output_size, b, W, x, l):
    eps = 0.00001
    right_answer = []
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            delta = np.zeros(W.shape)
            delta[i, j] = eps
            net1 = Network([DenseLayer(3, 3), ReLU(), DenseLayer(input_size, output_size, W_init=W+delta, b_init=b), ReLU(),DenseLayer(4, 4), ReLU(), Softmax()], loss=CrossEntropy())
            net2 = Network([DenseLayer(3, 3), ReLU(), DenseLayer(input_size, output_size, W_init=W-delta, b_init=b), ReLU(),DenseLayer(4, 4), ReLU(), Softmax()], loss=CrossEntropy())
            diff = (net1.calculate_loss(x, l) - net2.calculate_loss(x, l)) / (2*eps)
            right_answer.append(diff.T)
    return np.array(right_answer).T

def test_grad():
    input_size = 3
    output_size = 4 
    W_init = np.random.random((input_size, output_size))
    b_init = np.random.random((4,))
    x = np.random.random((2, input_size))
    l = np.array([[0,0,1,0],[0,0,0,1]])    
    dense = DenseLayer(input_size, output_size, W_init, b_init)
    grad = Network([DenseLayer(3, 3), ReLU(), dense, ReLU(), DenseLayer(4, 4), ReLU(), Softmax()], loss=CrossEntropy()).grad_param(x, l)[2][0]

    num_grad = numerical_grad(input_size, output_size, b_init, W_init, x, l)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad - grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)

test_grad()

(2, 12)
(2, 12)
Test PASSED


In [49]:
def my_init(shape, dtype=None):
    kernel = np.load('kernel.npy')
    return kernel
def my_initb(shape, dtype=None):
    kernel = np.load('bias.npy')
    return kernel
tens = np.load('tensor.npy')
kernel = np.load('kernel.npy')
bias = np.load('bias.npy')
layer = Conv2DLayer(kernel_size=3, input_channels=5, output_channels=2, 
                 padding='valid', stride=2, K_init=kernel.transpose((3,2,0,1)), b_init=bias)
res = layer.forward(tens.transpose((0,3,1,2)))
resk = Conv2D(2, 3, strides=2, padding="valid", kernel_initializer=my_init, bias_initializer=my_initb, input_shape=tens.shape[1:])(tens)
if np.sum(np.abs(res.transpose((0, 2, 3, 1)) - resk)) < 0.001:
        print('Test PASSED')
layer = Conv2DLayer(kernel_size=3, input_channels=5, output_channels=2, 
                 padding='same', stride=1, K_init=kernel.transpose((3,2,0,1)), b_init=bias)
res = layer.forward(tens.transpose((0,3,1,2)))
resk = Conv2D(2, 3, strides=1, padding="same", kernel_initializer=my_init, bias_initializer=my_initb, input_shape=tens.shape[1:])(tens)
if np.sum(np.abs(res.transpose((0, 2, 3, 1)) - resk)) < 0.01:
        print('Test PASSED')

Test PASSED
Test PASSED


In [50]:
def my_init(shape, dtype=None):
    kernel = np.load('kernel.npy')
    return kernel
def my_initb(shape, dtype=None):
    kernel = np.load('bias.npy')
    return kernel
tens = np.load('tensor.npy')
kernel = np.load('kernel.npy')
bias = np.load('bias.npy')
layer = Conv2DLayer1(kernel_size=3, input_channels=5, stride=2, output_channels=2, K_init=kernel.transpose((3,2,0,1)), b_init=bias)
res = layer.forward(tens.transpose((0,3,1,2)))
resk = Conv2D(2, 3, strides=2, padding="valid", kernel_initializer=my_init, bias_initializer=my_initb, input_shape=tens.shape[1:])(tens)
if np.sum(np.abs(res.transpose((0, 2, 3, 1)) - resk)) < 0.001:
        print('Test PASSED')
layer = Conv2DLayer(kernel_size=3, input_channels=5, output_channels=2, 
                 padding='same', stride=1, K_init=kernel.transpose((3,2,0,1)), b_init=bias)
res = layer.forward(tens.transpose((0,3,1,2)))
resk = Conv2D(2, 3, strides=1, padding="same", kernel_initializer=my_init, bias_initializer=my_initb, input_shape=tens.shape[1:])(tens)
if np.sum(np.abs(res.transpose((0, 2, 3, 1)) - resk)) < 0.01:
        print('Test PASSED')

Test PASSED
Test PASSED


In [95]:
def numerical_diff_layert(layer, x):
    eps = 0.00001
    right_answer = []
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            for k in range(x.shape[3]):
                delta = np.zeros((x.shape[1], x.shape[2], x.shape[3]))
                delta[i][j][k] = eps
                diff = (layer.forward(x + delta) - layer.forward(x-delta)) / (2*eps)
                right_answer.append(diff.reshape(2,-1).T)
    return np.array(right_answer).T

def test_layert(layer, cin):
    x = np.arange(72*cin).reshape((2, cin, 6, 6))
    num_grad = numerical_diff_layert(layer, x)
    grad = layer.grad_x(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad-grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad[0])
        print('Your gradient is ')
        print(grad[0])
        print(np.max(num_grad[0]-grad[0]))
layer = Conv2DLayer(padding='valid', stride=1, input_channels=2, output_channels=1)
test_layert(layer, 2)
layer = Conv2DLayer(padding='same', stride=1, input_channels=2, output_channels=1)
test_layert(layer, 2)

(2, 16, 72)
(2, 16, 72)
Test PASSED
(2, 36, 72)
(2, 36, 72)
Test PASSED


In [96]:
layer = FlattenLayer()
test_layert(layer, 2)

(2, 72, 72)
(2, 72, 72)
Test PASSED


In [97]:
layer = MaxPooling()
test_layert(layer, 2)

(2, 18, 72)
(2, 18, 72)
Test PASSED


In [100]:
layer = ReLU()
test_layert(layer, 2)

(2, 72, 72)
(2, 72, 72)
Test PASSED


In [120]:
def numerical_diff_layertb(layer, x):
    eps = 0.00001
    kernel = np.ones((2,2,3,3))
    bias = np.arange(2)
    right_answer = []
    for i in range(len(bias)):
        delta = np.zeros(len(bias))
        delta[i] = eps
        diff = (Conv2DLayer(padding='valid', stride=1, input_channels=2, output_channels=2, K_init=kernel, b_init=bias+delta)\
                .forward(x) - Conv2DLayer(padding='valid', stride=1, input_channels=2, \
                                           output_channels=2, K_init=kernel, b_init=bias-delta)\
                .forward(x)) / (2*eps)
        right_answer.append(diff.reshape(2,-1).T)
    return np.array(right_answer).T

def test_layertb(layer):
    x = np.arange(144).reshape((2, 2, 6, 6))
    num_grad = numerical_diff_layertb(layer, x)
    grad = layer.grad_bias(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad-grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad[0])
        print('Your gradient is ')
        print(grad)
kernel = np.ones((2,2,3,3))
bias = np.arange(2)
layer = Conv2DLayer(padding='valid', stride=1, input_channels=2, output_channels=2, K_init=kernel, b_init=bias)
test_layertb(layer)

(2, 32, 2)
(2, 32, 2)
Test PASSED


In [121]:
def numerical_diff_layertk(layer, x):
    eps = 0.00001
    kernel = np.ones((2,2,3,3))
    bias = np.arange(2)
    right_answer = []
    for k in range(kernel.shape[0]):
        for l in range(kernel.shape[1]):
            for i in range(kernel.shape[2]):
                for j in range(kernel.shape[3]):
                    delta = np.zeros(kernel.shape)
                    delta[k][l][i][j] = eps
                    diff = (Conv2DLayer(padding='valid', stride=1,\
                                        input_channels=2, output_channels=2, K_init=kernel+delta, b_init=bias)\
                            .forward(x) - Conv2DLayer(padding='valid', stride=1, input_channels=2, \
                                                              output_channels=2, K_init=kernel-delta, b_init=bias)\
                            .forward(x)) / (2*eps)
                    right_answer.append(diff.reshape(2,-1).T)
    return np.array(right_answer).T

def test_layertk(layer):
    x = np.arange(144).reshape((2, 2, 6, 6))
    num_grad = numerical_diff_layertk(layer, x)
    grad = layer.grad_kernel(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad-grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad[0][0])
        print('Your gradient is ')
        print(grad[0][0])
kernel = np.ones((2,2,3,3))
bias = np.arange(2)
layer = Conv2DLayer(padding='valid', stride=1, input_channels=2, output_channels=2, K_init=kernel, b_init=bias)
test_layertk(layer)

(2, 32, 36)
(2, 32, 36)
Test PASSED


In [122]:
layer = Conv2DTrLayer(padding=1, stride=2, input_channels=2, output_channels=2)
test_layert(layer, 2)

(2, 242, 72)
(2, 242, 72)
Test PASSED


In [123]:
def numerical_diff_layertb1(layer, x):
    eps = 0.00001
    kernel = np.ones((3,3,2,2))
    bias = np.arange(2)
    right_answer = []
    for i in range(len(bias)):
        delta = np.zeros(len(bias))
        delta[i] = eps
        diff = (Conv2DTrLayer(padding=1, stride=2, input_channels=2, output_channels=2, K_init=kernel, b_init=bias+delta)\
                .forward(x) - Conv2DTrLayer(padding=1, stride=2, input_channels=2, output_channels=2, \
                                                    K_init=kernel, b_init=bias-delta)\
                .forward(x)) / (2*eps)
        right_answer.append(diff.reshape(2,-1).T)
    return np.array(right_answer).T

def test_layertb1(layer):
    x = np.arange(64).reshape((2, 2, 4, 4))
    num_grad = numerical_diff_layertb1(layer, x)
    grad = layer.grad_bias(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad-grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)
kernel = np.ones((3,3,2,2))
bias = np.arange(2)
layer = Conv2DTrLayer(padding=1, stride=2, input_channels=2, output_channels=2, K_init=kernel, b_init=bias)
test_layertb1(layer)

(2, 98, 2)
(2, 98, 2)
Test PASSED


In [125]:
def numerical_diff_layertk1(layer, x):
    eps = 0.00001
    kernel = np.ones((3,3,2,2))
    bias = np.arange(2)
    right_answer = []
    for k in range(kernel.shape[3]):
        for l in range(kernel.shape[2]):
            for i in range(kernel.shape[0]):
                for j in range(kernel.shape[1]):
                    delta = np.zeros(kernel.shape)
                    delta[i][j][l][k] = eps
                    diff = (Conv2DTrLayer(padding=1, stride=2,\
                                        input_channels=2, output_channels=2, K_init=kernel+delta, b_init=bias)\
                            .forward(x) - Conv2DTrLayer(padding=1, stride=2, input_channels=2, \
                                                              output_channels=2, K_init=kernel-delta, b_init=bias)\
                            .forward(x)) / (2*eps)
                    right_answer.append(diff.reshape(2,-1).T)
    return np.array(right_answer).T

def test_layertk1(layer):
    x = np.arange(64).reshape((2, 2, 4, 4))
    num_grad = numerical_diff_layertk1(layer, x)
    grad = layer.grad_kernel(x)
    print(num_grad.shape)
    print(grad.shape)
    if np.sum(np.abs(num_grad-grad)) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad[0][1])
        print('Your gradient is ')
        print(grad[0][1])
kernel = np.ones((3,3,2,2))
bias = np.arange(2)
layer = Conv2DTrLayer(padding=1, stride=2, input_channels=2, output_channels=2, K_init=kernel, b_init=bias)
test_layertk1(layer)

(2, 98, 36)
(2, 98, 36)
Test PASSED


#### 3.2 Fully implement backpropagation in the train_step function of the Network class


We recommend first implementing Network.grad_param(), which returns a list with one element per layer, where each element is the list of that layer's parameter gradients.
Then, given this list of gradients, write a parameter update function for each layer.

Tip: write a test for the parameter-gradient code to make sure that the gradient through the whole network is computed correctly.
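
The two recommended steps can be sketched as follows. The gradient of the loss with respect to a layer's flattened parameters is the upstream gradient multiplied by that layer's parameter Jacobian, and the update averages the per-sample gradients over the batch, as the `update_kernel` and `update_bias` methods above do. The function names and the toy shapes are assumptions for illustration.

```python
import numpy as np


def param_grad(upstream, jac_param):
    """Per-sample gradient of the loss w.r.t. a layer's flattened parameters.

    upstream:  (batch, out) gradient of the loss w.r.t. the layer output.
    jac_param: (batch, out, n_params) Jacobian of the output w.r.t. the parameters.
    """
    return np.einsum('bo,bop->bp', upstream, jac_param)   # (batch, n_params)


def sgd_update(param, grad, learning_rate):
    """Average the per-sample gradients over the batch and take one SGD step."""
    return param - learning_rate * np.mean(grad, axis=0).reshape(param.shape)


W = np.ones((3, 4))
upstream = np.ones((2, 4))
jac_W = np.ones((2, 4, 12))
g = param_grad(upstream, jac_W)
print(g.shape)                       # (2, 12)
print(sgd_update(W, g, 0.25)[0, 0])  # 0.0
```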
    

In [59]:
def numerical_grad_nW(input_size, output_size, b, W, x, labels):
    eps = 0.00001
    right_answer = []
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            delta = np.zeros(W.shape)
            delta[i, j] = eps
            dense1 = Network([FlattenLayer(), DenseLayer(input_size, output_size), \
                              ReLU(), DenseLayer(20, 10, W_init=W+delta, b_init=b), Softmax()], loss=CrossEntropy())
            dense2 = Network([FlattenLayer(), DenseLayer(input_size, output_size), \
                              ReLU(), DenseLayer(20, 10, W_init=W-delta, b_init=b), Softmax()], loss=CrossEntropy())
            diff = (dense1.calculate_loss(x, labels) - dense2.calculate_loss(x, labels)) / (2*eps)
            right_answer.append(diff.T)
    return np.array(right_answer).T

def test_grad_nW():
    input_size = 784
    output_size = 20 
    #W_init = np.random.random((input_size, output_size))
    #b_init = np.random.random((output_size))
    W_init = np.random.random((20, 10))
    b_init = np.random.random((10))
    x = X_train[:3]
    labels = Y_train[:3]    
    dense = Network([FlattenLayer(), DenseLayer(input_size, output_size), \
                     ReLU(), DenseLayer(20, 10, W_init, b_init), Softmax()], loss=CrossEntropy())
    grad = dense.grad_param(x, labels)

    num_grad = numerical_grad_nW(input_size, output_size, b_init, W_init, x, labels)
    #print(num_grad.shape)
    #print(grad.shape)
    if np.sum(np.abs(num_grad - grad[3][0])) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad)

test_grad_nW()

Test PASSED


In [86]:
def numerical_grad_nW1(input_size, output_size, b, W, x, labels):
    eps = 0.00001
    right_answer = []
    for k in range(W.shape[0]):
        for l in range(W.shape[1]):
            for i in range(W.shape[2]):
                for j in range(W.shape[3]):
                    delta = np.zeros(W.shape)
                    delta[k, l, i, j] = eps
                    dense1 = Network([Conv2DLayer(padding='valid', stride=1, input_channels=1, output_channels=1, K_init=W+delta, b_init=b),\
                                      FlattenLayer(), DenseLayer(input_size, output_size), \
                                      ReLU(), DenseLayer(20, 10), Softmax()], loss=CrossEntropy())
                    dense2 = Network([Conv2DLayer(padding='valid', stride=1, input_channels=1, output_channels=1, K_init=W-delta, b_init=b),\
                                      FlattenLayer(), DenseLayer(input_size, output_size), \
                                      ReLU(), DenseLayer(20, 10), Softmax()], loss=CrossEntropy())
                    diff = (dense1.calculate_loss(x, labels) - dense2.calculate_loss(x, labels)) / (2*eps)
                    right_answer.append(diff.T)
    return np.array(right_answer).T

def test_grad_nW1():
    input_size = 676
    output_size = 20 
    #W_init = np.random.random((input_size, output_size))
    #b_init = np.random.random((output_size))
    K_init = np.random.random((1,1,3,3))
    b_init = np.random.random((1))
    x = X_train[::12000]
    labels = Y_train[::12000]    
    dense = Network([Conv2DLayer(padding='valid', stride=1, input_channels=1, output_channels=1, K_init=K_init, b_init=b_init),\
                     FlattenLayer(), DenseLayer(input_size, output_size), \
                     ReLU(), DenseLayer(20, 10), Softmax()], loss=CrossEntropy())
    grad = dense.grad_param(x, labels)

    num_grad = numerical_grad_nW1(input_size, output_size, b_init, K_init, x, labels)
    #print(num_grad.shape)
    #print(grad.shape)
    if np.sum(np.abs(num_grad - grad[0][0])) < 0.01:
        print('Test PASSED')
    else:
        print('Something went wrong!')
        print('Numerical grad is')
        print(num_grad)
        print('Your gradient is ')
        print(grad[0][0])

test_grad_nW1()

Test PASSED


#### 3.3 Study the implementation of the fit function of the Network class. Run model training. If everything works correctly, validation accuracy should increase
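
The iteration counts in the progress bars below are consistent with holding out the validation share first and then using only full mini-batches. The sketch below reproduces that arithmetic. It is an assumption about how `fit` splits the data, not a copy of its code, and the function name is hypothetical.

```python
def batches_per_epoch(n_samples, validation_split, batch_size):
    """Number of full mini-batches when the last incomplete batch is dropped."""
    n_train = int(n_samples * (1 - validation_split))
    return n_train // batch_size


print(batches_per_epoch(20000, 0.25, 16))  # 937
print(batches_per_epoch(10000, 0.25, 16))  # 468
print(batches_per_epoch(1000, 0.25, 16))   # 46
```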

In [84]:
net = Network([DenseLayer(784, 10), Softmax()], loss=CrossEntropy())
trainX = X_train.reshape(len(X_train), -1)
net.fit(trainX[::3], Y_train[::3], validation_split=0.25, 
            batch_size=16, nb_epoch=5, learning_rate=0.01)

100%|████████████████████████████████████████████████████████████████████████████████| 937/937 [02:06<00:00,  7.40it/s]


1 epoch: val 0.72


100%|████████████████████████████████████████████████████████████████████████████████| 937/937 [02:07<00:00,  7.32it/s]


2 epoch: val 0.79


100%|████████████████████████████████████████████████████████████████████████████████| 937/937 [02:08<00:00,  7.31it/s]


3 epoch: val 0.82


100%|████████████████████████████████████████████████████████████████████████████████| 937/937 [02:06<00:00,  7.38it/s]


4 epoch: val 0.84


100%|████████████████████████████████████████████████████████████████████████████████| 937/937 [02:07<00:00,  7.36it/s]


5 epoch: val 0.85


In [62]:
net = Network([DenseLayer(784, 20), ReLU(), DenseLayer(20, 10), Softmax()], loss=CrossEntropy())
trainX = X_train.reshape(len(X_train), -1)
net.fit(trainX[::6], Y_train[::6], validation_split=0.25, 
            batch_size=16, nb_epoch=5, learning_rate=0.001)    

100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [03:05<00:00,  2.53it/s]


1 epoch: val 0.12


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [03:08<00:00,  2.48it/s]


2 epoch: val 0.21


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [03:03<00:00,  2.55it/s]


3 epoch: val 0.30


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [03:01<00:00,  2.58it/s]


4 epoch: val 0.36


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [03:03<00:00,  2.55it/s]

5 epoch: val 0.44
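
The iteration counts in the progress bars above (937 and 468) can be reproduced with simple arithmetic. The sketch below assumes that `X_train` holds 60,000 samples (as in MNIST), that `fit` holds out the `validation_split` fraction before batching, and that an incomplete final batch is dropped. None of this is confirmed by the `fit` source, which is not shown in this part, and the helper name `batches_per_epoch` is hypothetical.

```python
def batches_per_epoch(n_total, step, validation_split, batch_size):
    # Number of samples selected by X_train[::step].
    n_used = len(range(0, n_total, step))
    # Samples left for training after the validation hold-out (assumed behaviour).
    n_train = int(n_used * (1 - validation_split))
    # Only full batches are counted (assumed behaviour).
    return n_train // batch_size

first_run = batches_per_epoch(60000, 3, 0.25, 16)
second_run = batches_per_epoch(60000, 6, 0.25, 16)
print(first_run, second_run)  # 937 468
```

Both values match the totals shown by the progress bars, so the assumptions are at least consistent with the logged output.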





#### 3.5 Demonstrate that your implementation can train deeper neural networks

In [102]:
net = Network([Conv2DLayer(padding='valid', stride=1, input_channels=1, output_channels=1), ReLU(),\
               FlattenLayer(), DenseLayer(676, 10), Softmax()], loss=CrossEntropy())
net.fit(X_train[::60], Y_train[::60], validation_split=0.25, 
            batch_size=16, nb_epoch=5, learning_rate=0.001)

100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:38<00:00,  2.15s/it]


1 epoch: val 0.04


100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:35<00:00,  2.08s/it]


2 epoch: val 0.06


100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:34<00:00,  2.07s/it]


3 epoch: val 0.07


100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:33<00:00,  2.03s/it]


4 epoch: val 0.08


100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:33<00:00,  2.03s/it]


5 epoch: val 0.10


In [110]:
net = Network([Conv2DLayer(padding='same', stride=1, input_channels=1, output_channels=1), ReLU(),\
               FlattenLayer(), DenseLayer(784, 10), Softmax()], loss=CrossEntropy())
net.fit(X_train[:600], Y_train[:600], validation_split=0.25, 
            batch_size=15, nb_epoch=5, learning_rate=0.001)

100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:11<00:00,  2.37s/it]


1 epoch: val 0.11


100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:10<00:00,  2.35s/it]


2 epoch: val 0.13


100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:10<00:00,  2.35s/it]


3 epoch: val 0.13


100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:09<00:00,  2.33s/it]


4 epoch: val 0.16


100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:09<00:00,  2.30s/it]


5 epoch: val 0.17


In [108]:
net = Network([Conv2DLayer(padding='valid', stride=3, input_channels=1, output_channels=1), ReLU(),\
               Conv2DTrLayer(stride=2, input_channels=1, output_channels=1), ReLU(), \
               FlattenLayer(), DenseLayer(361, 10), ReLU(), DenseLayer(10, 10), Softmax()], loss=CrossEntropy())
net.fit(X_train[::6], Y_train[::6], validation_split=0.25, 
            batch_size=16, nb_epoch=5, learning_rate=0.001)

100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [14:58<00:00,  1.92s/it]


1 epoch: val 0.10


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [14:05<00:00,  1.81s/it]


2 epoch: val 0.11


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [14:05<00:00,  1.81s/it]


3 epoch: val 0.12


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [13:56<00:00,  1.79s/it]


4 epoch: val 0.13


100%|████████████████████████████████████████████████████████████████████████████████| 468/468 [13:52<00:00,  1.78s/it]


5 epoch: val 0.14
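
The `DenseLayer` input sizes used in the three networks above (676, 784, 361) follow from the spatial size of the feature map after the convolutional layers. The sketch below reproduces them under stated assumptions: 28x28 single-channel inputs, a 3x3 kernel in both `Conv2DLayer` and `Conv2DTrLayer`, and a transposed convolution without padding. The helper names are hypothetical, and the formulas are the standard output-size formulas rather than code taken from the layers themselves.

```python
def conv_out(size, kernel, stride=1, padding='valid'):
    # Output side length of a square convolution.
    if padding == 'same':
        # 'same' padding keeps ceil(size / stride) positions.
        return -(-size // stride)
    # 'valid' padding: only positions where the kernel fits entirely.
    return (size - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1):
    # Output side length of an unpadded transposed convolution.
    return (size - 1) * stride + kernel

valid_side = conv_out(28, 3, stride=1, padding='valid')
same_side = conv_out(28, 3, stride=1, padding='same')
strided_side = conv_out(28, 3, stride=3, padding='valid')
transposed_side = conv_transpose_out(strided_side, 3, stride=2)

print(valid_side ** 2)       # 676
print(same_side ** 2)        # 784
print(transposed_side ** 2)  # 361
```

With one output channel, the flattened feature count is just the side length squared, which is why these numbers appear as the first argument of the `DenseLayer` that follows `FlattenLayer`.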
