## [CS231n](http://cs231n.github.io/convolutional-networks/)
**Local Connectivity.** When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume. Instead, we connect each neuron to only a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron (equivalently, the filter size). The extent of the connectivity along the depth axis is always equal to the depth of the input volume. It is important to emphasize this asymmetry in how we treat the spatial dimensions (width and height) and the depth dimension: the connections are local in space (along width and height), but always full along the entire depth of the input volume.

`Example 1.` Suppose that the input volume has size [32x32x3] (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5 * 5 * 3 = 75 weights (plus 1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.

`Example 2.` Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3 * 3 * 20 = 180 connections to the input volume. Notice that, again, the connectivity is local in space (e.g. 3x3), but full along the input depth (20).
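The two counts above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `weights_per_neuron` is hypothetical and does not come from the CS231n notes, and it assumes a square receptive field.

```python
def weights_per_neuron(filter_size, input_depth):
    """Weights of one conv neuron: local in space, full along the depth axis.

    Hypothetical helper; assumes a square filter_size x filter_size receptive field.
    """
    return filter_size * filter_size * input_depth


# Example 1: 5x5 receptive field on a [32x32x3] input
print(weights_per_neuron(5, 3))    # 75 weights (plus 1 bias)

# Example 2: 3x3 receptive field on a [16x16x20] input
print(weights_per_neuron(3, 20))   # 180 weights
```

Note that the spatial size of the input (32x32 or 16x16) does not appear in the count; only the filter size and the input depth matter.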

We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1. For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output, since (7−3+0)/1+1 = 5. With stride 2 we would get a 3x3 output, since (7−3+0)/2+1 = 3.

# $W' = \frac{W - F + 2P}{S} + 1$
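As a quick check of the formula, here is a small sketch that evaluates it for the cases mentioned in the text. The function name `conv_output_size` is hypothetical. Raising an error when the stride does not divide evenly is a design choice of this sketch; the notebook's `single_conv` uses floor division instead.

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial output size (W - F + 2P) / S + 1 of a conv layer.

    Hypothetical helper. Rejecting non-integer results is an assumption
    of this sketch, not something the formula itself enforces.
    """
    span = W - F + 2 * P
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    if span % S != 0:
        raise ValueError("stride does not tile the padded input evenly")
    return span // S + 1


print(conv_output_size(7, 3, S=1, P=0))   # 5
print(conv_output_size(7, 3, S=2, P=0))   # 3
print(conv_output_size(32, 5, S=1, P=0))  # 28
```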

In [1]:
import numpy as np

In [78]:
def single_conv(feature_map, kernel, bias=None, stride=1):
    # bias is an array with the same spatial shape as the kernel; default to zeros
    if bias is None:
        bias = np.zeros(kernel.shape[:2], dtype=feature_map.dtype)
    if kernel.shape[:2] != bias.shape:
        print('Kernel size and bias size are not the same')
        return None
    #if feature_map.shape[-1] != kernel.shape[-1]:
    #    print('Kernel depth and feature depth are not same')
    #    return None
    # number of kernel positions per axis: (W - F) // S + 1 (no padding)
    h_step = (feature_map.shape[1] - kernel.shape[1]) // stride + 1
    v_step = (feature_map.shape[0] - kernel.shape[0]) // stride + 1
    print(h_step, v_step)
    new_feature = np.zeros((v_step, h_step), dtype=feature_map.dtype)
    for v in range(v_step):
        for h in range(h_step):
            # window under the kernel: rows span kernel.shape[0], columns span kernel.shape[1]
            top, left = v * stride, h * stride
            window = feature_map[top:top+kernel.shape[0], left:left+kernel.shape[1]]
            # inner product
            print('(v, h):({}, {})'.format(v, h))
            print('f:\n', window)
            print('k:\n', kernel)
            print('dot:\n', window * kernel)
            print('b:\n', bias)
            print('tmp_result', window * kernel + bias)
            result = np.sum(window * kernel + bias)
            print('sum:', result)
            new_feature[v, h] = result
            print('result:', new_feature[v, h])
            # relu activation: clamp negative responses to zero
            new_feature[v, h] = max(new_feature[v, h], 0)
    return new_feature

In [79]:
input_feature = np.array([[1, 1, 1, 1, 1, 1, 1],
                          [2, 2, 2, 2, 2, 2, 2],
                          [1, 1, 1, 1, 1, 1, 1],
                          [3, 3, 3, 3, 3, 3, 3],
                          [1, 1, 1, 1, 1, 1, 1],
                          [4, 4, 4, 4, 4, 4, 4],
                          [1, 1, 1, 1, 1, 1, 1],
                         ])
kernel = np.array([[0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]
                  ])
bias = np.zeros(kernel.shape).astype(input_feature.dtype)

new_feature = single_conv(input_feature, kernel, bias)

5 5
(v, h):(0, 0)
f:
 [[1 1 1]
 [2 2 2]
 [1 1 1]]
k:
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
dot:
 [[0 0 0]
 [2 2 2]
 [0 0 0]]
b:
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
tmp_result [[0 0 0]
 [2 2 2]
 [0 0 0]]
sum: 6
result: 6
(v, h):(0, 1)
f:
 [[1 1 1]
 [2 2 2]
 [1 1 1]]
k:
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
dot:
 [[0 0 0]
 [2 2 2]
 [0 0 0]]
b:
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
tmp_result [[0 0 0]
 [2 2 2]
 [0 0 0]]
sum: 6
result: 6
(v, h):(0, 2)
f:
 [[1 1 1]
 [2 2 2]
 [1 1 1]]
k:
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
dot:
 [[0 0 0]
 [2 2 2]
 [0 0 0]]
b:
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
tmp_result [[0 0 0]
 [2 2 2]
 [0 0 0]]
sum: 6
result: 6
(v, h):(0, 3)
f:
 [[1 1 1]
 [2 2 2]
 [1 1 1]]
k:
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
dot:
 [[0 0 0]
 [2 2 2]
 [0 0 0]]
b:
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
tmp_result [[0 0 0]
 [2 2 2]
 [0 0 0]]
sum: 6
result: 6
(v, h):(0, 4)
f:
 [[1 1 1]
 [2 2 2]
 [1 1 1]]
k:
 [[0 0 0]
 [1 1 1]
 [0 0 0]]
dot:
 [[0 0 0]
 [2 2 2]
 [0 0 0]]
b:
 [[0 0 0]
 [0 0 0]
 [0 0 0]]
tmp_result [[0 0 0]
 [2 2 2]
 [0 0 0]]
sum: 6
result

In [80]:
new_feature

array([[ 6,  6,  6,  6,  6],
       [ 3,  3,  3,  3,  3],
       [ 9,  9,  9,  9,  9],
       [ 3,  3,  3,  3,  3],
       [12, 12, 12, 12, 12]])
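The loop-based result above can be cross-checked with a vectorized version. The sketch below is not part of the original notebook: `conv2d_relu` is a hypothetical name, it assumes a single-channel 2D input, a scalar-free (zero) bias, and NumPy 1.20 or later for `sliding_window_view`. It also adds the zero padding `P` from the formula, which `single_conv` does not implement.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_relu(feature_map, kernel, stride=1, pad=0):
    """Hypothetical vectorized single-channel convolution followed by ReLU."""
    padded = np.pad(feature_map, pad)                     # zero padding on every border
    windows = sliding_window_view(padded, kernel.shape)   # all stride-1 windows
    windows = windows[::stride, ::stride]                 # keep every `stride`-th window
    out = (windows * kernel).sum(axis=(-2, -1))           # inner product per window
    return np.maximum(out, 0)                             # ReLU


# Same 7x7 input as above: every row is constant
rows = np.array([1, 2, 1, 3, 1, 4, 1])
input_feature = np.repeat(rows[:, None], 7, axis=1)
kernel = np.array([[0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]])

print(conv2d_relu(input_feature, kernel))                  # matches new_feature above
print(conv2d_relu(input_feature, kernel, stride=2).shape)  # (3, 3)
print(conv2d_relu(input_feature, kernel, pad=1).shape)     # (7, 7)
```

With stride 1 and no padding this reproduces the 5x5 array shown above. The stride-2 and pad-1 shapes agree with (W−F+2P)/S+1 for W=7 and F=3.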