# Convolutional Neural Networks

## What is a convolutional layer?

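A convolutional layer is a constrained, simplified form of a fully connected layer: the same small set of weights is reused at every spatial position, which sharply reduces the number of parameters in the network. Its defining operation is the convolution of the layer input with a set of weights.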

Formally, the model is defined as follows.

Consider a single two-dimensional image channel of size $W \times H$. A convolutional layer with a kernel of size $K \times K$ convolves every $K \times K$ window of the channel with a matrix of weights.

Parameters of a convolutional layer:

* the number of input filters (channels), $in$
* the number of output filters (channels), $out$.

For each filter, the output at pixel $(i, j)$ is defined as

$$
   out_{i,j} = \sum_{s = -[(K-1)/2]}^{[K/2]} \sum_{t=-[(K-1)/2]}^{[K/2]} W_{s, t} I_{i +s, j + t} + b,
$$

where $W$ is the $K \times K$ weight matrix of the filter (not to be confused with the image width $W$), $b$ is the bias, $I$ is the input channel (a two-dimensional array of size $W \times H$) to which the convolution is applied, and $[\cdot]$ denotes the integer part.

**Question.** How many trainable parameters does a convolutional layer have?

**Answer.** $(K \times K \times in + 1) \times out$: every output filter has one $K \times K$ kernel per input filter plus a single bias.
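The count can be checked with a few lines of Python. This is only a sketch: the helper name `conv_param_count` is hypothetical, and it assumes a square kernel, a weight tensor of shape `[K, K, in, out]` and one bias per output filter.

```python
def conv_param_count(kernel_size, in_channels, out_channels, use_bias=True):
    """Trainable parameters of a 2-D convolutional layer with a square kernel."""
    # One K x K kernel for every (input channel, output channel) pair
    weights = kernel_size * kernel_size * in_channels * out_channels
    # Assumption: a single bias per output channel
    biases = out_channels if use_bias else 0
    return weights + biases


first_layer = conv_param_count(5, 3, 6)    # 5*5*3*6 + 6
second_layer = conv_param_count(5, 6, 16)  # 5*5*6*16 + 16
print(first_layer, second_layer)
```

For a $5 \times 5$ layer mapping 3 channels to 6 this gives 456 parameters, and for 6 channels to 16 it gives 2416.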



## Additional parameters of a convolutional layer

Two more parameters of the layer have to be specified:
* stride: the step with which the kernel moves over the input
* padding: how the borders are handled, i.e. where the kernel starts and where it stops.

**Question.** What is the size of the output filter for a $K \times K$ kernel with stride (1, 1) when the kernel starts and ends at the corners of the image (no padding)?

**Answer.** $(W - K + 1) \times (H - K + 1)$.

To keep the size unchanged, the input filter is padded with zeros so that the output again has size $W \times H$ (for stride 1). This strategy is called same padding. The original strategy, without padding, is called valid padding.
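The two padding strategies can be summarized per axis in a small helper. This is a sketch under stated assumptions: the name `conv_output_size` is hypothetical, and for strides larger than 1 it uses the convention that same padding yields $\lceil size / stride \rceil$ outputs, which is the convention TensorFlow follows.

```python
import math


def conv_output_size(size, kernel, stride=1, padding='valid'):
    """Output length along one axis for a given input length, kernel and stride."""
    if padding == 'valid':
        # The kernel must fit entirely inside the input
        return (size - kernel) // stride + 1
    if padding == 'same':
        # Zero padding is added so that only the stride shrinks the output
        return math.ceil(size / stride)
    raise ValueError(f'unknown padding: {padding}')


print(conv_output_size(32, 5))                    # valid: W - K + 1
print(conv_output_size(32, 5, padding='same'))    # same: size is preserved
```

With stride 1 the valid case reduces to $W - K + 1$, in agreement with the answer above.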

Let us implement a convolutional layer.

In [0]:
import numpy as np

def conv2d_one_filter(X, W, padding='same', stride=(1, 1)):
    """
        Cross-correlate one 2-D channel with one kernel
        (the operation deep learning libraries call convolution).

        @param X: input image, [H x W]
        @param W: weights, [K x K]
        @param padding: padding type - 'same' or 'valid'
        @param stride: (vertical, horizontal) step of the kernel
    """

    kernel_y, kernel_x = W.shape[:2]

    # Shape of the zero-padded input and the offset of the image inside it
    if padding == 'same':
        y_shape = X.shape[0] + kernel_y - 1
        x_shape = X.shape[1] + kernel_x - 1
        padding_top = (kernel_y - 1) // 2
        padding_left = (kernel_x - 1) // 2
    else:
        y_shape, x_shape = X.shape[:2]
        padding_top = padding_left = 0

    x_padded = np.zeros((y_shape, x_shape), dtype=X.dtype)
    print(x_padded.shape)

    x_padded[
        padding_top:padding_top + X.shape[0],
        padding_left:padding_left + X.shape[1]
    ] = X

    # Number of kernel positions along each axis for the given stride.
    # For stride > 1 with 'same' padding the input is padded as for stride 1,
    # which may place the padding differently from TensorFlow's 'SAME'.
    out_y = (y_shape - kernel_y) // stride[0] + 1
    out_x = (x_shape - kernel_x) // stride[1] + 1
    result = np.zeros((out_y, out_x))

    for y in range(out_y):
        for x in range(out_x):
            top, left = y * stride[0], x * stride[1]
            result[y, x] = np.sum(x_padded[top:top + kernel_y, left:left + kernel_x] * W)
    return result
  

In [0]:
import scipy.signal

In [0]:
conv2d_one_filter(np.array([
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
]), np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
)

(5, 5)


array([[ 94., 154., 106.],
       [186., 285., 186.],
       [106., 154.,  94.]])
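The double loop can also be written without explicit Python loops. The sketch below is an alternative, not the implementation above: the name `conv2d_same_vectorized` is hypothetical, it covers only same padding with stride 1, and it relies on `sliding_window_view`, which needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same_vectorized(X, W):
    """Same-padded, stride-1 cross-correlation of a 2-D array with a 2-D kernel."""
    ky, kx = W.shape
    top, left = (ky - 1) // 2, (kx - 1) // 2
    # Zero-pad so that the output has the same shape as X
    padded = np.pad(
        X, ((top, ky - 1 - top), (left, kx - 1 - left)), mode='constant'
    )
    # windows[i, j] is the ky x kx patch whose top-left corner is (i, j)
    windows = sliding_window_view(padded, (ky, kx))
    # Multiply every patch by the kernel and sum over the patch axes
    return np.einsum('ijkl,kl->ij', windows, W)


X = np.arange(1, 10).reshape(3, 3)
result = conv2d_same_vectorized(X, X)
print(result)
```

On the 3 x 3 example it returns the same values as the loop version, with an integer dtype because no float buffer is allocated.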

In [0]:
import tensorflow as tf

In [0]:
sess = tf.InteractiveSession()
a = tf.placeholder(tf.float32, [1, 3, 3, 1])
w = tf.placeholder(tf.float32, [3, 3, 1, 1])
out_same = tf.nn.conv2d(a, w, padding='SAME')
out_valid = tf.nn.conv2d(a, w, padding='VALID')

In [0]:
sess.run(out_same, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
    w: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((3, 3, 1, 1))
})

array([[[[ 94.],
         [154.],
         [106.]],

        [[186.],
         [285.],
         [186.]],

        [[106.],
         [154.],
         [ 94.]]]], dtype=float32)

In [0]:
sess.run(out_valid, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
    w: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((3, 3, 1, 1))
})

array([[[[285.]]]], dtype=float32)

## Pooling

In [0]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))

In [0]:
pool_valid = tf.nn.max_pool2d(a, ksize=(2, 2), strides=(1, 1), padding='VALID')
pool_same = tf.nn.max_pool2d(a, ksize=(2, 2), strides=(1, 1), padding='SAME')

In [0]:
sess.run(pool_valid, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
})

array([[[[5.],
         [6.]],

        [[8.],
         [9.]]]], dtype=float32)

In [0]:
sess.run(pool_same, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
})

array([[[[5.],
         [6.],
         [6.]],

        [[8.],
         [9.],
         [9.]],

        [[8.],
         [9.],
         [9.]]]], dtype=float32)

In [0]:
import math

In [0]:
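def max_pooling(X, kernel_size=(2, 2), padding='same', strides=(2, 2)):
    """Max pooling of a 2-D array, following TensorFlow's SAME/VALID size rules."""
    height, width = X.shape[:2]
    if padding == 'same':
        out_height = math.ceil(height / strides[0])
        out_width = math.ceil(width / strides[1])
        # Total padding needed along each axis; the smaller half goes before the data
        pad_top = max((out_height - 1) * strides[0] + kernel_size[0] - height, 0) // 2
        pad_left = max((out_width - 1) * strides[1] + kernel_size[1] - width, 0) // 2
    else:
        # Count only the windows that fit entirely inside the input
        out_height = math.ceil((height - kernel_size[0] + 1) / strides[0])
        out_width = math.ceil((width - kernel_size[1] + 1) / strides[1])
        pad_top = pad_left = 0

    result = np.zeros((out_height, out_width), dtype=X.dtype)

    for y in range(out_height):
        for x in range(out_width):
            start_y = y * strides[0] - pad_top
            start_x = x * strides[1] - pad_left

            # Clip the window to the image: padded cells never win the max
            result[y, x] = np.max(
                X[
                    max(start_y, 0):start_y + kernel_size[0],
                    max(start_x, 0):start_x + kernel_size[1]
                ]
            )
    return result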

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
)

array([[5, 6],
       [8, 9]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='valid'
)

array([[5]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='valid',
    strides=(1, 1)
)

array([[5, 6],
       [8, 9]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='same',
    strides=(1, 1)
)

array([[5, 6, 6],
       [8, 9, 9],
       [8, 9, 9]])

## Basic building blocks

In [0]:
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

def conv_layer(
        input_tensor,
        output_channels,
        name='conv',
        kernel_size=(3, 3),
        strides=(1, 1),
        padding='SAME'
    ):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        input_shape = input_tensor.get_shape().as_list()
        
        input_channels = input_shape[-1]
        
        print(input_channels, output_channels)
        
        weights = tf.get_variable(name='weights', shape=[
            kernel_size[0], kernel_size[1], input_channels, output_channels
        ])
        
        bias = tf.get_variable(
            name='bias',
            shape=[output_channels],
            initializer=tf.zeros_initializer()
        )
        
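        # Pass the caller's padding through. The outputs recorded in this notebook
        # were produced with padding fixed to 'SAME', hence fc1's 1024 inputs in LeNet.
        conv = tf.nn.conv2d(
            input=input_tensor,
            filter=weights,
            strides=strides,
            padding=padding,
            name='conv'
        )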
        
        output = tf.nn.bias_add(conv, bias, name='output')
    return output

In [4]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))
b = conv_layer(a, 3)
example = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).reshape((1, 3, 3, 1))
sess.run(tf.global_variables_initializer())
sess.run(b, feed_dict={
    a: example
})


1 3


array([[[[-0.03561094, -2.5681984 , -0.18274444],
         [ 1.199398  , -2.0457542 ,  0.7028016 ],
         [ 0.37761962, -0.17797205,  3.029345  ]],

        [[-0.6255276 , -3.8368087 , -0.383636  ],
         [ 1.5125254 , -1.5389371 ,  0.21473914],
         [ 0.53970647,  0.22062658,  3.9709496 ]],

        [[-0.5383489 ,  1.4607241 , -0.7147733 ],
         [ 0.690181  ,  3.8535058 , -1.4720438 ],
         [-1.4785525 ,  0.9838684 , -1.1894754 ]]]], dtype=float32)

In [0]:
def max_pool(
    input_tensor,
    kernel_size=(2, 2),
    strides=(2, 2),
    padding='SAME',
    name='pool'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        output = tf.nn.max_pool2d(input_tensor, ksize=kernel_size, strides=strides, padding=padding, name='pool')
    return output

In [6]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))
b = max_pool(a)
example = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).reshape((1, 3, 3, 1))
sess.run(tf.global_variables_initializer())
sess.run(b, feed_dict={
    a: example
})

array([[[[5.],
         [6.]],

        [[8.],
         [9.]]]], dtype=float32)

In [0]:
def flatten(
    input_tensor,
    name='flatten'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        shape = input_tensor.get_shape().as_list()[1:]
        num_elements = np.prod(shape)
        return tf.reshape(input_tensor, [-1, num_elements], name='reshape')

In [8]:
a = tf.zeros([1, 3, 3, 1])
b = flatten(a)
sess.run(b)

array([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)

In [0]:
def dense(
    input_tensor,
    output_neurons,
    name='fc'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        input_neurons = input_tensor.get_shape().as_list()[1]
        
        weights = tf.get_variable(
            name='weights',
            shape=[input_neurons, output_neurons]
        )
        
        bias = tf.get_variable(
            name='bias',
            shape=[output_neurons]
        )
        
        product = tf.matmul(input_tensor, weights, name='product')
        
        output = tf.nn.bias_add(product, bias, name='output')
    return output

In [0]:
a = tf.zeros([1, 9])
b = dense(a, 18, name='fc')

In [11]:
sess.run(tf.global_variables_initializer())
sess.run(b)

array([[ 0.3247372 ,  0.266706  , -0.03641906,  0.34226698,  0.15396553,
        -0.2293282 ,  0.18490332, -0.01789281, -0.29601657, -0.02632582,
         0.35386634, -0.3678915 ,  0.02026698,  0.25488275, -0.29531676,
        -0.313573  ,  0.3309955 ,  0.3230583 ]], dtype=float32)

## Network architectures

### LeNet

In [0]:
def conv_block(x, output_channels, name, kernel_size=(3, 3), padding='SAME'):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        conv_out = conv_layer(x, output_channels, kernel_size=kernel_size, padding=padding)
        activation = tf.nn.relu(conv_out, name='relu')
    return activation

In [13]:
a = tf.zeros((1, 3, 3, 1))
b = conv_block(a, 6, 'conv1')

1 6


In [14]:
sess.run(tf.global_variables_initializer())
sess.run(b).shape

(1, 3, 3, 6)

In [0]:
def le_net(input_tensor):
    with tf.variable_scope('le_net', reuse=tf.AUTO_REUSE):
        conv1_out = conv_block(
            input_tensor, 6,
            name='conv1',
            kernel_size=(5, 5),
            padding='VALID'
        )
        pool1_out = max_pool(conv1_out, name='pool1')
        conv2_out = conv_block(
            pool1_out, 16,
            name='conv2',
            kernel_size=(5, 5),
            padding='VALID'
        )
        
        pool2_out = max_pool(conv2_out, name='pool2')
        
        flatten_out = flatten(pool2_out)
        
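        # Non-linearities between the dense layers; without them
        # fc1, fc2 and fc3 collapse into a single linear map.
        fc1_out = tf.nn.relu(dense(flatten_out, 120, name='fc1'), name='fc1_relu')
        fc2_out = tf.nn.relu(dense(fc1_out, 84, name='fc2'), name='fc2_relu')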
        
        output = dense(fc2_out, 10, name='fc3')
    return output
        

In [16]:
digits_placeholder = tf.placeholder(tf.float32, [None, 32, 32, 3])
logits = le_net(digits_placeholder)

3 6
6 16


In [0]:
labels_placeholder = tf.placeholder(tf.float32, [None, 10], name='le_net_labels')

In [18]:
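# Sigmoid cross-entropy treats each of the 10 classes as an independent binary
# problem; tf.nn.softmax_cross_entropy_with_logits_v2 is the usual choice when
# the classes are mutually exclusive.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels_placeholder,
        logits=logits
    )
)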


In [0]:
le_net_trainable_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='le_net')
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=le_net_trainable_variables)


In [0]:
le_net_predictions = tf.argmax(logits, axis=1)
le_net_target = tf.argmax(labels_placeholder, axis=1)

In [0]:
with tf.name_scope('le_net/metrics/train/'):
    le_net_accuracy_train, le_net_accuracy_train_op = tf.metrics.accuracy(
        labels=le_net_target,
        predictions=le_net_predictions
    )
    le_net_train_loss, le_net_train_loss_op = tf.metrics.mean(values=loss, name='loss')
with tf.name_scope('le_net/metrics/val/'):
    le_net_accuracy_val, le_net_accuracy_val_op = tf.metrics.accuracy(
        labels=le_net_target,
        predictions=le_net_predictions
    )
    le_net_val_loss, le_net_val_loss_op = tf.metrics.mean(values=loss, name='loss')

In [44]:
tf.local_variables()

[<tf.Variable 'le_net/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/accuracy_1/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/accuracy_1/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss_1/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss_1/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metric

In [0]:
from keras.datasets import cifar10
from keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train_labels = to_categorical(y_train)
y_test_labels = to_categorical(y_test)

In [0]:
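def reset_metrics(scope):
    # Streaming metrics keep their running totals in local variables;
    # re-initializing the ones whose name contains `scope` resets the metric.
    stream_variables = [v for v in tf.local_variables() if scope in v.name]
    sess.run(tf.variables_initializer(stream_variables))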

In [47]:
# Check that data is ready

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
sess.run([loss, le_net_accuracy_train], feed_dict={
    digits_placeholder: X_train[:10],
    labels_placeholder: y_train_labels[:10]
})

[37.533615, 0.0]

In [48]:
le_net_trainable_variables

[<tf.Variable 'le_net/conv1/conv/weights:0' shape=(5, 5, 3, 6) dtype=float32_ref>,
 <tf.Variable 'le_net/conv1/conv/bias:0' shape=(6,) dtype=float32_ref>,
 <tf.Variable 'le_net/conv2/conv/weights:0' shape=(5, 5, 6, 16) dtype=float32_ref>,
 <tf.Variable 'le_net/conv2/conv/bias:0' shape=(16,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc1/weights:0' shape=(1024, 120) dtype=float32_ref>,
 <tf.Variable 'le_net/fc1/bias:0' shape=(120,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc2/weights:0' shape=(120, 84) dtype=float32_ref>,
 <tf.Variable 'le_net/fc2/bias:0' shape=(84,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc3/weights:0' shape=(84, 10) dtype=float32_ref>,
 <tf.Variable 'le_net/fc3/bias:0' shape=(10,) dtype=float32_ref>]

In [0]:
def iterate_batches(X, y, batch_size, shuffle=True):
    assert len(X) == len(y)

    indices = np.arange(len(X))
    if shuffle:
        np.random.shuffle(indices)

    for index in range(0, len(X), batch_size):
        # Select the batch through the (possibly shuffled) indices
        batch_indices = indices[index:index + batch_size]
        yield X[batch_indices], y[batch_indices]

In [50]:
for epoch_num in range(50):
    reset_metrics('le_net/metrics/train')
    reset_metrics('le_net/metrics/val')
    for X_batch, y_batch in iterate_batches(X_train, y_train_labels, 500):
        _, _, _ = sess.run([optimizer, le_net_accuracy_train_op, le_net_train_loss_op], feed_dict={
            digits_placeholder: X_batch,
            labels_placeholder: y_batch
        })
        # print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([le_net_accuracy_train, le_net_train_loss]))
    
    for X_batch, y_batch in iterate_batches(X_test, y_test_labels, 500, shuffle=False):
        _, _ = sess.run([le_net_accuracy_val_op, le_net_val_loss_op], feed_dict = {
            digits_placeholder: X_batch,
            labels_placeholder: y_batch
        })
    print(f'Epoch {epoch_num + 1} val [acc, loss]:', sess.run([le_net_accuracy_val, le_net_val_loss]))

Epoch 1 train [acc, loss]: [0.12302, 3.3215249]
Epoch 1 val [acc, loss]: [0.1512, 0.36490077]
Epoch 2 train [acc, loss]: [0.16608, 0.35136887]
Epoch 2 val [acc, loss]: [0.187, 0.33783063]
Epoch 3 train [acc, loss]: [0.1994, 0.3302531]
Epoch 3 val [acc, loss]: [0.219, 0.32274953]
Epoch 4 train [acc, loss]: [0.23234, 0.3169203]
Epoch 4 val [acc, loss]: [0.2463, 0.31234303]
Epoch 5 train [acc, loss]: [0.26328, 0.30720618]
Epoch 5 val [acc, loss]: [0.2731, 0.3045999]
Epoch 6 train [acc, loss]: [0.28178, 0.2989803]
Epoch 6 val [acc, loss]: [0.2922, 0.2977122]
Epoch 7 train [acc, loss]: [0.30046, 0.29220814]
Epoch 7 val [acc, loss]: [0.3115, 0.29127377]
Epoch 8 train [acc, loss]: [0.31316, 0.28656602]
Epoch 8 val [acc, loss]: [0.3266, 0.2857734]
Epoch 9 train [acc, loss]: [0.32808, 0.28179651]
Epoch 9 val [acc, loss]: [0.3318, 0.2822675]
Epoch 10 train [acc, loss]: [0.3384, 0.27842686]
Epoch 10 val [acc, loss]: [0.3347, 0.27977246]
Epoch 11 train [acc, loss]: [0.34684, 0.27575427]
Epoch 11 v

### Batch Norm

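The idea was proposed in 2015 (Ioffe and Szegedy). The argument is that, as the weights of earlier layers change during training, the distribution of the inputs to each deeper layer keeps shifting, so the normalization applied to the network input no longer holds inside the network. The proposal is therefore to re-normalize the activations and then map them to a new, learned scale and shift.

In other words,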

$$
    out = \gamma \cdot \frac{x - \mathrm{E}x}{\sqrt{\mathrm{D}x + \varepsilon}} + \beta,
$$

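where $\mathrm{E}x$ and $\mathrm{D}x$ are the mean and the variance of $x$, $\varepsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are trainable parameters.


**Question.** How are $\mathrm{E}x$ and $\mathrm{D}x$ computed?

**Answer.** During training they are computed over the current batch. During validation the layer uses moving averages of $\mathrm{E}$ and $\mathrm{D}$ accumulated during training.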
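The two modes can be illustrated with a small NumPy sketch. It is not a library implementation: the class name `BatchNorm1D`, the `momentum` value and the initial running statistics are assumptions made for the example, and it normalizes over the batch axis of a `[batch, features]` array only.

```python
import numpy as np


class BatchNorm1D:
    """Batch normalization over the batch axis of a [batch, features] array."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # trainable scale
        self.beta = np.zeros(num_features)   # trainable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum             # assumed decay of the moving averages
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Statistics of the current batch
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Update the moving averages used at validation time
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
bn = BatchNorm1D(4)
y_train = bn(x, training=True)    # normalized with batch statistics
y_val = bn(x, training=False)     # normalized with the moving averages
```

In training mode each feature of `y_train` has approximately zero mean and unit variance. In validation mode the output depends only on the accumulated statistics, not on the batch being evaluated.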

### AlexNet

### VGG

Idea: use a large number of stacked $3 \times 3$ convolutions.
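A rough calculation shows why stacking small kernels is attractive. The sketch below uses hypothetical helper names and assumes stride 1, the same number of channels in and out of every layer, and no biases.

```python
def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions with a square kernel."""
    return layers * (kernel - 1) + 1


def stacked_conv_weights(kernel, channels, layers):
    """Weights of `layers` convolutions that each map `channels` to `channels` (no biases)."""
    return layers * kernel * kernel * channels * channels


# Three 3x3 layers see the same 7x7 neighbourhood as a single 7x7 layer
rf_stacked = receptive_field(3, 3)
rf_single = receptive_field(7, 1)

# ... but need fewer weights (64 channels chosen only for illustration)
weights_stacked = stacked_conv_weights(3, 64, 3)
weights_single = stacked_conv_weights(7, 64, 1)
print(rf_stacked, rf_single, weights_stacked, weights_single)
```

Under these assumptions the stack uses $27C^2$ weights against $49C^2$ for the single large kernel, and it also applies a non-linearity after every layer.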

### ResNet

Idea: use residual blocks.
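A residual block adds its input to the output of a learned transformation, $y = x + F(x)$. The NumPy sketch below is a deliberate simplification: it uses two dense layers for $F$ instead of convolutions with batch normalization, and the names `residual_block`, `w1` and `w2` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = x + F(x), where F(x) = relu(x @ w1) @ w2 is a two-layer transform."""
    return x + relu(x @ w1) @ w2


x = np.array([[1.0, -2.0, 3.0]])

# With zero weights F(x) = 0, so the block is exactly the identity mapping
y_identity = residual_block(x, np.zeros((3, 3)), np.zeros((3, 3)))

# With non-zero weights the block learns a correction on top of the identity
y = residual_block(x, np.eye(3), 0.5 * np.eye(3))
print(y_identity, y)
```

The identity case shows the design choice: a block can leave its input untouched by driving its weights to zero, which makes very deep stacks easier to optimize.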

### Inception

Idea: use inception blocks and an auxiliary classification loss.

### EffNet?

ToDo

### Dataset augmentations

Rotations, flips, crops, scaling.
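Most of these transformations need only array indexing. The sketch below is one possible version, with a hypothetical function name `augment`: it applies a random horizontal flip, a random rotation by a multiple of 90 degrees and a random crop to an image of shape `[H, W, C]`. Scaling is left out because it requires interpolation.

```python
import numpy as np


def augment(image, rng, crop_size):
    """Randomly flip, rotate by a multiple of 90 degrees and crop an [H, W, C] image."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                  # horizontal flip
    # Rotate in the spatial plane; channels stay in the last axis
    image = np.rot90(image, k=int(rng.integers(0, 4)), axes=(0, 1))
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - crop_size + 1))
    left = int(rng.integers(0, width - crop_size + 1))
    return image[top:top + crop_size, left:left + crop_size]


rng = np.random.default_rng(0)
image = np.arange(32 * 32 * 3).reshape(32, 32, 3)
augmented = augment(image, rng, crop_size=28)
print(augmented.shape)
```

Augmentations like these are normally applied to training batches only, so that validation metrics are measured on unmodified images.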