**<h2>Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb</h2>**

![Deep Learning Algorithm Structure](https://showme.redstarplugin.com/d/d:MvQXLjhj)

This code implements a basic CNN (Convolutional Neural Network) in NumPy. CNNs are especially useful for image-related tasks. The following parts are new compared to the previous version of the code and are relevant to the implementation of a CNN:

### Kernel Initialization

```python
kernel_rows = 3
kernel_cols = 3
num_kernels = 16
```

These lines set the size and number of the kernels (also known as filters): each kernel is a 3x3 matrix, and there are 16 of them. These kernels are what will "slide over" the input image to produce feature maps.


```python
hidden_size = ((input_rows - kernel_rows) *  (input_cols - kernel_cols)) * num_kernels
```


The `hidden_size` variable is used to define the dimensions of the subsequent layers after the convolutional operation. Let's break down why it is calculated the way it is:

1. **Spatial Dimensions of Convolved Image**: A full convolution of a 28x28 image with a 3x3 kernel, without padding and with stride 1, has `(28 - 3 + 1) x (28 - 3 + 1)` = 26x26 output positions. The general formula is `O = (W - K + 1) x (H - K + 1)`, where `W` and `H` are the width and height of the input layer, and `K` is the size of the kernel. This code, however, uses `input_rows - kernel_rows` and `input_cols - kernel_cols` without the `+ 1`, and its patch loops use the same bound, so it visits 25x25 positions and skips the last valid row and column.

2. **Multiple Kernels**: The code uses 16 different kernels (`num_kernels` is set to 16). Each of these kernels produces its own 25x25 feature map when applied to the input image.

3. **Flattening the Output**: To connect this convolved layer to subsequent fully-connected layers, we usually flatten the output. Each 25x25 output is flattened to a 1D array of size 625 (`25 x 25 = 625`).

4. **Concatenating Multiple Flattened Outputs**: Since there are 16 such flattened outputs (one for each kernel), the overall size is `16 x 625 = 10000`.


Therefore, the `hidden_size` is calculated as `((input_rows - kernel_rows) * (input_cols - kernel_cols)) * num_kernels = (25 * 25) * 16 = 10000`. This is the size of each feature vector in `layer_1` after the convolution and before being passed through the fully-connected layer.
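
As a quick check of the arithmetic, the sketch below evaluates both the standard no-padding, stride-1 output size and the expression this notebook uses, which omits the `+ 1`. The helper name `valid_output_size` is hypothetical and not part of the original code.

```python
def valid_output_size(input_size, kernel_size):
    """Output positions of a no-padding, stride-1 convolution along one axis."""
    return input_size - kernel_size + 1

input_rows, input_cols = 28, 28
kernel_rows, kernel_cols = 3, 3
num_kernels = 16

# Standard formula: every position where the kernel fully fits.
full_rows = valid_output_size(input_rows, kernel_rows)
full_cols = valid_output_size(input_cols, kernel_cols)

# Expression used in this notebook (no "+ 1"), matching its patch loops.
hidden_size = ((input_rows - kernel_rows) * (input_cols - kernel_cols)) * num_kernels

print(full_rows, full_cols)   # 26 26
print(hidden_size)            # 10000
```

The two differ by one position per axis; what matters for the code to run is that `hidden_size` agrees with the number of patches the loops actually extract.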

```python
kernels = 0.02*np.random.random((kernel_rows*kernel_cols, num_kernels))-0.01
```

**<em>Q: Why is only 'weights_1_2' used in this code, and not 'weights_0_1'?</em>**

In a typical neural network with fully connected layers, weights between each layer, often denoted as `weights_0_1`, `weights_1_2`, etc., are used to map from one layer to the next. However, in the provided code, convolutional operations have replaced the fully connected layer that would have otherwise existed between `layer_0` (input layer) and `layer_1` (hidden layer). This is a key feature of Convolutional Neural Networks (CNNs).

Here's how it works in the given code:

1. The input images (stored in `layer_0`) are not mapped to `layer_1` using a weight matrix `weights_0_1`. Instead, they are convolved with a set of kernels. These kernels effectively replace `weights_0_1`. 

2. After convolution and activation, the output (stored in `layer_1`) is then mapped to `layer_2` (output layer) using a weight matrix, which is `weights_1_2`.

So, in summary, `weights_0_1` is effectively replaced by the set of kernels used for the convolutional operation, and that's why only `weights_1_2` appears in the code.
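
To make the substitution concrete, the sketch below compares the shapes involved. The `weights_0_1_shape` tuple describes a hypothetical fully connected alternative that the code never creates; it is shown only to compare parameter counts.

```python
import numpy as np

pixels_per_image = 28 * 28
kernel_rows, kernel_cols, num_kernels = 3, 3, 16
hidden_size = ((28 - kernel_rows) * (28 - kernel_cols)) * num_kernels

# Hypothetical dense layer between layer_0 and layer_1 (not created in the code).
weights_0_1_shape = (pixels_per_image, hidden_size)

# What the code uses instead: one column of 9 weights per kernel.
kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01

dense_params = weights_0_1_shape[0] * weights_0_1_shape[1]
print(dense_params)   # 7840000
print(kernels.size)   # 144
```

The same 144 kernel weights are reused at every image position, which is why the convolutional layer needs so few parameters.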

-----------------------------------------------------------------------------------------------------------------

### Image Section Extraction

```python
def get_image_section(layer,row_from, row_to, col_from, col_to):
    section = layer[:,row_from:row_to,col_from:col_to]
    return section.reshape(-1,1,row_to-row_from, col_to-col_from)
```

This function extracts a section of the image to apply the kernel. It reshapes the extracted section so that it can be multiplied by the kernel.

The function `get_image_section` extracts a specific region from the 3D input array `layer`. This region is defined by the rows `[row_from:row_to]` and columns `[col_from:col_to]`. Let's break down each line:

### Line 1: Extracting the Region

```python
section = layer[:,row_from:row_to,col_from:col_to]
```

- `layer`: This is a 3D array where the first dimension represents the batch size, the second represents rows, and the third represents columns of the image.
  
- `[:, row_from:row_to, col_from:col_to]`: This is NumPy slicing syntax, which extracts a section of the array `layer`.
  - `:` means to take all elements along the first dimension (the batch dimension in this case).
  - `row_from:row_to` means to take all rows from `row_from` to `row_to-1`.
  - `col_from:col_to` means to take all columns from `col_from` to `col_to-1`.


### Line 2: Reshaping the Region

```python
return section.reshape(-1,1,row_to-row_from, col_to-col_from)
```

- `reshape(-1, 1, row_to-row_from, col_to-col_from)`: This reshapes the `section` array. Here's what each argument does:
  - `-1`: The size of this dimension is automatically calculated. Here it resolves to the batch size, because `section` holds one window per image in the batch.
  - `1`: Adds an extra dimension of size 1. The patches from different positions are later joined along this axis.
  - `row_to-row_from`: The height of the section. It's the difference between `row_to` and `row_from`.
  - `col_to-col_from`: The width of the section. It's the difference between `col_to` and `col_from`.

This reshaped section is then returned by the function. The extra axis lets the individual sections be stacked together and then flattened for the dot product with the kernels.

### The `-1` in `reshape`

When using `-1` as a dimension in `numpy.reshape`, it's essentially a "wildcard" that tells NumPy to automatically calculate the size of that dimension. The size is determined such that the total number of elements in the array remains unchanged.

For example, suppose you have an array of shape `(2, 3, 4)`; this array has `2 * 3 * 4 = 24` total elements. If you reshape it with dimensions `(2, -1, 2)`, NumPy will automatically calculate that the `-1` should be `6`, because `2 * 6 * 2 = 24`. So, the shape becomes `(2, 6, 2)`.

In this code, the `-1` resolves to the batch size, so the same function works unchanged for a training batch of 128 images and for a single test image.
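
A small sketch of both points, reusing `get_image_section` as defined in this notebook. The zero-filled arrays are stand-ins for real image batches.

```python
import numpy as np

def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

# The (2, 3, 4) example from the text: 24 elements, so -1 resolves to 6.
a = np.arange(24).reshape(2, 3, 4)
print(a.reshape(2, -1, 2).shape)                          # (2, 6, 2)

# Stand-in batches of 28x28 images; only the batch size differs.
train_batch = np.zeros((128, 28, 28))
single_image = np.zeros((1, 28, 28))
train_section = get_image_section(train_batch, 0, 3, 0, 3)
test_section = get_image_section(single_image, 0, 3, 0, 3)
print(train_section.shape)                                # (128, 1, 3, 3)
print(test_section.shape)                                 # (1, 1, 3, 3)
```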

### The `1` in `reshape`

Adding an extra dimension with size `1` can be useful for several reasons:

1. **Broadcasting:** NumPy allows broadcasting of dimensions with size `1` when performing array operations. This can be useful if you want to perform element-wise operations between arrays that mostly have matching dimensions, except for one that is missing or has a size of `1`.

2. **Shape Compatibility:** Some machine learning libraries and functions require input arrays to have a specific number of dimensions. Adding a dimension with size `1` allows the array to meet this requirement without changing the overall semantics of the data.

3. **Concatenation and Stacking:** If you plan to concatenate or stack arrays along a new dimension, initializing that dimension with a size of `1` is useful.

4. **Clearer Semantics:** Sometimes an extra dimension is added to make it clear what each dimension represents. For example, a 3D array where the first dimension represents different samples, the second represents the height, and the third represents the width can be made 4D to indicate that there's only one "channel" (like grayscale in image data), even though this channel contains only a single slice.

In this code, the extra dimension is the axis along which the patches are later joined by `np.concatenate(sects, axis=1)`.

Adding an extra dimension with size `1` is often done in NumPy using the `reshape` method. Here's a simple example:

### Original Array

Let's say we have a 1D array with shape `(4,)` and it looks like this:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
print("Original array:", arr)
print("Shape:", arr.shape)
```

Output:

```
Original array: [1 2 3 4]
Shape: (4,)
```

### Adding an Extra Dimension

Now, we'll add an extra dimension with size `1`:

#### Row Vector

For making it a row vector:

```python
reshaped_arr = arr.reshape((1, 4))
print("Reshaped array:", reshaped_arr)
print("Shape:", reshaped_arr.shape)
```

Output:

```
Reshaped array: [[1 2 3 4]]
Shape: (1, 4)
```

#### Column Vector

For making it a column vector:

```python
reshaped_arr = arr.reshape((4, 1))
print("Reshaped array:", reshaped_arr)
print("Shape:", reshaped_arr.shape)
```

Output:

```
Reshaped array: [[1]
 [2]
 [3]
 [4]]
Shape: (4, 1)
```

In both of these reshaped arrays, we added an extra dimension with size `1`. The reshaped array still represents the same data but in a form that may be more suitable for certain types of operations.

For example, this kind of reshaping is often useful when you're working with machine learning libraries that expect inputs to have a certain number of dimensions.

----------------------------------------------------------------------------------------------------------------------------------

```python
batch_start, batch_end=((i * batch_size),((i+1)*batch_size))
```

This sets up the indices for batching the dataset. `batch_start` and `batch_end` determine the beginning and end of each batch.

```python
layer_0 = images[batch_start:batch_end]
layer_0 = layer_0.reshape(layer_0.shape[0],28,28)
```

`layer_0` contains the batch of input images, each reshaped to 28x28 pixels. After `layer_0.reshape(layer_0.shape[0], 28, 28)`, `layer_0` is a 3D array (tensor): each element along the first dimension is a 2D matrix of shape `28x28`, representing one image. Here's the breakdown:

1. Initially, `layer_0` is a 2D array with shape `(128, 784)`: one batch of 128 images sliced from the 1000 training images, where each image is a 1D array of 784 pixels.
2. After the reshape operation, `layer_0` becomes a 3D array with shape `(128, 28, 28)`. Now it represents 128 images, where each image is a 2D array (or matrix) of shape `28x28`.

So, in the reshaped `layer_0`, each image has been reorganized from a 1D array of 784 pixels into a 2D array of `28x28` pixels. This is the layout needed to slide a kernel across the rows and columns of the image.
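
The batching and reshaping can be tried in isolation. This is a minimal sketch in which an `np.arange` array stands in for the 1000 flattened MNIST images, so that pixel positions are easy to trace.

```python
import numpy as np

batch_size = 128
images = np.arange(1000 * 784).reshape(1000, 784)   # stand-in for 1000 flattened images

num_batches = int(len(images) / batch_size)
print(num_batches)                      # 7 full batches; the last 104 images are unused

for i in range(num_batches):
    batch_start, batch_end = (i * batch_size, (i + 1) * batch_size)
    layer_0 = images[batch_start:batch_end]
    layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

print(batch_start, batch_end)           # 768 896
print(layer_0.shape)                    # (128, 28, 28)

# Pixel (row 1, col 2) of the first image in the batch is flat index 1 * 28 + 2.
same_pixel = layer_0[0, 1, 2] == images[batch_start, 1 * 28 + 2]
print(same_pixel)                       # True
```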

-------------------------------------------------------------------------------------------------------------------------------

### Feature Map Creation

```python
sects = list()
for row_start in range(layer_0.shape[1]-kernel_rows):
    for col_start in range(layer_0.shape[2] - kernel_cols):
        sect = get_image_section(layer_0,
                                 row_start,
                                 row_start+kernel_rows,
                                 col_start,
                                 col_start+kernel_cols)
        sects.append(sect)
```

This loop iterates over the image, and at each iteration, it calls `get_image_section` to obtain a section of the image. These sections are what the kernel will be applied to.

The line `sects = list()` initializes an empty list and assigns it to the variable `sects`. This is preparation for populating this list later in the code.
This empty list will hold the sections of the image to which the kernel (convolution filter) will be applied.

In the context of the given code, `sects` is meant to hold "sections" of the input images that are processed through the convolutional kernels. Specifically, the code loops through different positions of each image in the mini-batch to extract subsections of the image, defined by the `kernel_rows` and `kernel_cols`. These subsections are then appended to the `sects` list.

Here's a simplified explanation:

- `sects` starts as an empty list.
- The code extracts subsections of the image and appends them to `sects`.
- Eventually, `sects` contains all the different portions of the input image that the convolutional kernel will scan through.

By initializing `sects` as an empty list, you ensure that it's ready to accept these subsections as they are computed.

In one iteration of `i`, `sects` holds every patch position for the whole batch: each entry has shape `(128, 1, 3, 3)`, i.e. the same 3x3 window taken from all 128 images in the batch.

For a single 28x28 image and a 3x3 kernel, the kernel slides across the image in strides of 1 pixel both horizontally and vertically.

The number of positions this loop visits can be calculated as follows:

1. Horizontally: `range(28 - 3)` yields starting columns 0 to 24, making it 25 starting positions.
2. Vertically: `range(28 - 3)` yields starting rows 0 to 24, making it 25 starting positions as well.

So for each image, you'll have 25 (horizontal positions) x 25 (vertical positions) = 625 patches. (A 3x3 kernel actually fits at 26 positions per axis, 0 to 25; the loop bound leaves out the last one.)

Therefore, for a 28x28 image with a 3x3 kernel, the list `sects` will hold 625 entries.
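
The count can be confirmed by running the extraction loop on a small random batch. The batch size of 4 is an arbitrary choice for this sketch; note that the loop bound `28 - 3` gives 25 starting positions per axis.

```python
import numpy as np

def get_image_section(layer, row_from, row_to, col_from, col_to):
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to - row_from, col_to - col_from)

kernel_rows, kernel_cols = 3, 3
layer_0 = np.random.random((4, 28, 28))   # toy batch of 4 images (assumed size)

sects = []
for row_start in range(layer_0.shape[1] - kernel_rows):
    for col_start in range(layer_0.shape[2] - kernel_cols):
        sects.append(get_image_section(layer_0,
                                       row_start, row_start + kernel_rows,
                                       col_start, col_start + kernel_cols))

print(len(sects))        # 625 window positions (25 x 25)
print(sects[0].shape)    # (4, 1, 3, 3): one window from every image in the batch
```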

**<em>Q: Why does the for loop "for row_start in range(layer_0.shape[1]-kernel_rows):" use 'layer_0.shape[1]' and not 'layer_0.shape[0]'?</em>**

In the code, `layer_0` is reshaped as `[batch_size, 28, 28]`, where `batch_size` is the number of images in a mini-batch, and 28x28 is the dimension of each image.

When you look at `layer_0.shape[1]`, it refers to the number of rows in each image, which is 28 in this case. Similarly, `layer_0.shape[2]` would refer to the number of columns in each image, which is also 28.

The reason for using `layer_0.shape[1]` instead of `layer_0.shape[0]` is that the for-loop is iterating over the rows of each individual image, not over the batch size. It moves the kernel over each 28x28 image, so `row_start` runs from 0 to 24 (`range(28 - 3)` has 25 values). Because the slice keeps the whole batch dimension, each step slides the 3x3 kernel over every image in the batch at once.

------------------------------------------------------------------------------------------------------------------------------------------------------

### Convolution

```python
expanded_input = np.concatenate(sects,axis=1)
es = expanded_input.shape
flattened_input = expanded_input.reshape(es[0]*es[1],-1)

kernel_output = flattened_input.dot(kernels)
```

The `expanded_input` holds all the sections of the image, concatenated; `flattened_input` is the same data with each section flattened into a row. The dot product with `kernels` computes the convolution operation, creating the feature map.

Let's break down these lines of code one by one:

1. `expanded_input = np.concatenate(sects,axis=1)`: 
   - `sects` is a list of patches extracted from the input images. Each patch has dimensions `[batch_size, 1, kernel_rows, kernel_cols]`.
   - The `np.concatenate` function joins these patches along `axis=1`. After concatenation, you'll have a tensor where each "column" corresponds to a patch from the input images.
   
2. `es = expanded_input.shape`: 
   - This line stores the shape of `expanded_input` in `es`. Let's say, if you had 100 patches and your batch size is 128, `es` could look like `[128, 100, 3, 3]` (assuming 3x3 kernels).

3. `flattened_input = expanded_input.reshape(es[0]*es[1],-1)`:
   - The reshape operation is flattening the tensor along the first two dimensions (`batch_size` and number of patches).
   - `es[0]*es[1]` would give you `128 * 100 = 12800` if we follow the earlier example. The `-1` in reshape means that the remaining dimensions are calculated automatically. In this case, the last two dimensions are 3x3, so they would be flattened into a single dimension of size `3 * 3 = 9`.

4. `kernel_output = flattened_input.dot(kernels)`:
   - Here, you have `flattened_input`, which has a shape like `[12800, 9]`, being matrix-multiplied (dot product) with `kernels`, which has a shape like `[9, num_kernels]` (let's say `[9, 16]` for 16 kernels).
   - The output, `kernel_output`, will have a shape of `[12800, 16]`. Each row in `kernel_output` contains the activations produced by all 16 kernels for a specific patch in the input images.
   
Overall, these operations are aimed at preparing the input in a way that allows you to perform convolution operations effectively, applying multiple kernels over multiple patches and flattening them for easier calculations.
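
To see that the matrix product really is the sliding-window computation, the sketch below builds `kernel_output` the same way and compares one entry against a direct elementwise multiply-and-sum. The batch size of 2 and the chosen image, window and kernel indices are arbitrary assumptions for the check.

```python
import numpy as np

np.random.seed(0)
batch, kernel_rows, kernel_cols, num_kernels = 2, 3, 3, 16
layer_0 = np.random.random((batch, 28, 28))
kernels = 0.02 * np.random.random((kernel_rows * kernel_cols, num_kernels)) - 0.01

sects = []
for r in range(layer_0.shape[1] - kernel_rows):
    for c in range(layer_0.shape[2] - kernel_cols):
        sect = layer_0[:, r:r + kernel_rows, c:c + kernel_cols]
        sects.append(sect.reshape(-1, 1, kernel_rows, kernel_cols))

expanded_input = np.concatenate(sects, axis=1)                # (2, 625, 3, 3)
es = expanded_input.shape
flattened_input = expanded_input.reshape(es[0] * es[1], -1)   # (1250, 9)
kernel_output = flattened_input.dot(kernels)                  # (1250, 16)

# Direct check: image 1, window at row 5 / col 7, kernel 3.
img, r, c, k = 1, 5, 7, 3
patch_index = r * (28 - kernel_cols) + c      # windows are stored row by row
row = img * es[1] + patch_index               # images are stacked one after another
direct = np.sum(layer_0[img, r:r + 3, c:c + 3] * kernels[:, k].reshape(3, 3))
print(np.isclose(kernel_output[row, k], direct))   # True
```

Each row of `flattened_input` is one flattened 3x3 window, and each column of `kernels` is one flattened 3x3 filter, so a single `dot` evaluates every filter at every window.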

### Activation and Dropout

```python
layer_1 = tanh(kernel_output.reshape(es[0],-1))
dropout_mask = np.random.randint(2,size=layer_1.shape)
layer_1 *= dropout_mask * 2
```

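Here, the feature map goes through a tanh activation function. Then dropout, a regularization technique, is applied: `dropout_mask` zeroes each activation with probability 0.5, and the factor of 2 rescales the surviving activations so the layer's expected output stays the same.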
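
A minimal sketch of the masking step on its own, with a random array standing in for the tanh-activated hidden layer:

```python
import numpy as np

np.random.seed(1)
layer_1 = np.tanh(np.random.randn(4, 10000))   # stand-in for the activated hidden layer
before = layer_1.copy()

dropout_mask = np.random.randint(2, size=layer_1.shape)   # each entry is 0 or 1
layer_1 *= dropout_mask * 2

kept = dropout_mask == 1
survivors_doubled = np.allclose(layer_1[kept], 2 * before[kept])
dropped_are_zero = bool(np.all(layer_1[~kept] == 0))
print(survivors_doubled)      # True
print(dropped_are_zero)       # True
print(dropout_mask.mean())    # close to 0.5
```

At test time the notebook applies no mask and no scaling, which is consistent with the training-time factor of 2.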

### Backpropagation for CNN

```python
layer_2_delta = (labels[batch_start:batch_end]-layer_2)\
                / (batch_size * layer_2.shape[0])
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * \
                tanh2deriv(layer_1)
layer_1_delta *= dropout_mask
weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
k_update = flattened_input.T.dot(l1d_reshape)
kernels -= alpha * k_update
```

In the backpropagation step, the gradients are calculated with respect to each kernel. Then, the kernels are updated.

```python
layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size * layer_2.shape[0])
```
1. `layer_2_delta`: This is the error term for the output layer (`layer_2`). 
   - The error is computed as the difference between the true labels and the predicted values (`labels[batch_start:batch_end] - layer_2`).
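   - The division by `(batch_size * layer_2.shape[0])` is a normalization term. Both factors equal the batch size here (`layer_2` has one row per image), so the error is divided by `batch_size ** 2`. That scales it down by more than a plain average over the batch and acts like a smaller effective learning rate.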

```python
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)
```
2. `layer_1_delta`: This is the error term for the hidden layer (`layer_1`). 
   - It's calculated by taking the dot product of `layer_2_delta` and the transpose of `weights_1_2`. This backpropagates the error from the output layer to the hidden layer.
   - This is then element-wise multiplied by `tanh2deriv(layer_1)`, which is the derivative of the activation function (tanh in this case) computed from the values in `layer_1`. The result is the error signal for the hidden layer's pre-activation values, i.e. the output of the convolution.

```python
layer_1_delta *= dropout_mask
```
3. This line applies the dropout mask to `layer_1_delta`.
   - Dropout is used during training to randomly set some activations to zero, preventing overfitting. This line ensures that the same activations that were zeroed during the forward pass also have zero gradients during the backpropagation.

```python
weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
```
4. This updates the weights between `layer_1` and `layer_2`.
   - `alpha` is the learning rate, controlling how much we want to update the weights.
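   - `layer_1.T.dot(layer_2_delta)` gives the update direction for `weights_1_2`. Since `layer_2_delta` is built from `labels - layer_2` (the negative of the loss gradient, assuming softmax with a cross-entropy loss), adding it with `+=` moves the weights downhill.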

```python
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
```
5. The error term for `layer_1` (`layer_1_delta`) is reshaped back into the shape of `kernel_output`. This is done because you want to distribute this error back to each patch you originally extracted.

```python
k_update = flattened_input.T.dot(l1d_reshape)
```
6. `k_update`: This is the gradient for updating the kernels.
   - The dot product is taken between the transpose of `flattened_input` and `l1d_reshape` to compute how much each kernel contributed to the overall error.

```python
kernels -= alpha * k_update
```
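7. The kernels are updated.
   - The values stored in `k_update` are used to update each kernel. The learning rate `alpha` controls the magnitude of this update. Note that `k_update` is derived from the same `labels - layer_2` error as the `weights_1_2` update, which is applied with `+=`, so the `-=` here steps the kernels in the opposite direction. This sign is worth double-checking if training looks unstable.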

Each line of code in the backpropagation section plays a role in updating the model's parameters (the kernels and `weights_1_2`; this model has no biases), with the ultimate aim of minimizing the loss function.
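
Tracing the array shapes through the backward pass can make the reshapes easier to follow. The sketch below uses toy sizes (4 images, 9 patches per image, 2 kernels, 3 classes) and random stand-ins for the patches and labels, all of which are assumptions; the expressions are the ones discussed above.

```python
import numpy as np

np.random.seed(0)
batch_size, num_patches, num_kernels, num_labels = 4, 9, 2, 3
hidden_size = num_patches * num_kernels

def tanh2deriv(output):
    return 1 - output ** 2

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

# Toy forward pass (random stand-ins for patches, kernels, weights, labels).
flattened_input = np.random.random((batch_size * num_patches, 9))
kernels = 0.02 * np.random.random((9, num_kernels)) - 0.01
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1
labels = np.eye(num_labels)[np.random.randint(num_labels, size=batch_size)]

kernel_output = flattened_input.dot(kernels)
layer_1 = np.tanh(kernel_output.reshape(batch_size, -1))
dropout_mask = np.random.randint(2, size=layer_1.shape)
layer_1 *= dropout_mask * 2
layer_2 = softmax(layer_1.dot(weights_1_2))

# Backward pass, same expressions as in the notebook.
layer_2_delta = (labels - layer_2) / (batch_size * layer_2.shape[0])
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)
layer_1_delta *= dropout_mask
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
k_update = flattened_input.T.dot(l1d_reshape)

print(layer_2_delta.shape)   # (4, 3)
print(layer_1_delta.shape)   # (4, 18)
print(l1d_reshape.shape)     # (36, 2)
print(k_update.shape)        # (9, 2)
```

`k_update` ends up with the same shape as `kernels`, which is what allows it to be applied as an elementwise update.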


```python
import numpy as np, sys
np.random.seed(1)

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000,28*28) / 255,
                  y_train[0:1000])


one_hot_labels = np.zeros((len(labels),10))
for i,l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test),28*28) / 255
test_labels = np.zeros((len(y_test),10))
for i,l in enumerate(y_test):
    test_labels[i][l] = 1

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

alpha, iterations = (2, 300)
pixels_per_image, num_labels = (784, 10)
batch_size = 128

input_rows = 28
input_cols = 28

kernel_rows = 3
kernel_cols = 3
num_kernel = 16

hidden_size = ((input_rows - kernel_rows) * (input_cols - kernel_cols)) * num_kernel

kernels = 0.02*np.random.random((kernel_rows * kernel_cols, num_kernel)) - 0.01

weights_1_2 = 0.2*np.random.random((hidden_size, num_labels)) - 0.1


def get_image_section(layer,row_from, row_to, col_from, col_to):
    section = layer[:,row_from:row_to,col_from:col_to]
    return section.reshape(-1,1,row_to-row_from, col_to-col_from)

for j in range(iterations):
    correct_cnt = 0
    for i in range(int(len(images)/batch_size)):
        # Batching initialization
        batch_start, batch_end = ((i*batch_size),((i+1)*batch_size))
        # Reshape the input layer
        layer_0 = images[batch_start:batch_end]
        layer_0 = layer_0.reshape(layer_0.shape[0],28,28)

        # Convolution initialization: this list holds the image sections
        # to which the kernels (convolution filters) will be applied.
        sects = list()
        # Convolution process
        for row_start in range(layer_0.shape[1] - kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0, row_start, row_start + kernel_rows, col_start, col_start + kernel_cols)
                sects.append(sect)

        # Preparation for the dot product
        expanded_input = np.concatenate(sects,axis=1) # concatenate all patches
        es = expanded_input.shape # (128, 625, 3, 3): batch size, patches, kernel_rows, kernel_cols
        # Merge the batch (128) and patch (625) axes into 128*625 rows;
        # -1 lets NumPy infer the remaining size (9).
        flattened_input = expanded_input.reshape(es[0]*es[1],-1)

        # Hidden layer activation and dropout
        kernel_output = flattened_input.dot(kernels) # kernels is [9,16]; the product is [128*625,16]
        layer_1 = tanh(kernel_output.reshape(es[0],-1)) # regroup patch outputs per image (128 rows), then apply tanh
        drop_mask = np.random.randint(2,size=layer_1.shape)
        layer_1 *= drop_mask * 2 # dropout for regularization, applied after the tanh activation
        layer_2 = softmax(layer_1.dot(weights_1_2)) # softmax produces a probability for each class label

        # Accuracy counting: number of correct predictions within the batch
        for k in range(batch_size):
            labelset = labels[batch_start+k:batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1]) == np.argmax(labelset))
            correct_cnt += _inc

        # Backpropagation
        # batch_size and layer_2.shape[0] are both 128 here, so this divides by batch_size ** 2.
        layer_2_delta = (labels[batch_start:batch_end] - layer_2) / (batch_size * layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
        layer_1_delta *= drop_mask
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        # kernels play the role of weights_0_1: the patches are the input to layer_1,
        # so the kernel update needs a delta shaped like the per-patch output.
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape) # delta (after dot product and dropout mask) in kernel_output's shape
        k_update = flattened_input.T.dot(l1d_reshape) # patch inputs dotted with the reshaped delta
        kernels -= alpha * k_update

    # Test: same flow as training. Reshape the input to fit the kernels,
    # then run the forward pass only.
    test_correct_cnt = 0

    for i in range(len(test_images)):

        layer_0 = test_images[i:i+1]
        # layer_1 = tanh(np.dot(layer_0, weights_0_1))
        layer_0 = layer_0.reshape(layer_0.shape[0], 28,28)

        sects = list()
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2]-kernel_cols):
                sect = get_image_section(layer_0, row_start, row_start+kernel_rows, col_start, col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects,axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1],-1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0],-1))
        layer_2 = np.dot(layer_1, weights_1_2)

        test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))

    if(j % 1 == 0):
        sys.stdout.write("\n" + "I:" + str(j) + " Test-Acc:" + \
                         str(test_correct_cnt/float(len(test_images))) + \
                         " Train-Acc:" + str(correct_cnt/float(len(images))))
```

```
I:0 Test-Acc:0.0288 Train-Acc:0.055
I:1 Test-Acc:0.0273 Train-Acc:0.037
I:2 Test-Acc:0.028 Train-Acc:0.037
I:3 Test-Acc:0.0292 Train-Acc:0.04
I:4 Test-Acc:0.0339 Train-Acc:0.046
I:5 Test-Acc:0.0478 Train-Acc:0.068
I:6 Test-Acc:0.076 Train-Acc:0.083
I:7 Test-Acc:0.1316 Train-Acc:0.096
I:8 Test-Acc:0.2137 Train-Acc:0.127
I:9 Test-Acc:0.2941 Train-Acc:0.148
I:10 Test-Acc:0.3563 Train-Acc:0.181
I:11 Test-Acc:0.4023 Train-Acc:0.209
I:12 Test-Acc:0.4358 Train-Acc:0.238
I:13 Test-Acc:0.4473 Train-Acc:0.286
I:14 Test-Acc:0.4389 Train-Acc:0.274
I:15 Test-Acc:0.3951 Train-Acc:0.257
I:16 Test-Acc:0.2222 Train-Acc:0.243
I:17 Test-Acc:0.0613 Train-Acc:0.112
I:18 Test-Acc:0.0266 Train-Acc:0.035
I:19 Test-Acc:0.0127 Train-Acc:0.026
I:20 Test-Acc:0.0133 Train-Acc:0.022
I:21 Test-Acc:0.0185 Train-Acc:0.038
I:22 Test-Acc:0.0363 Train-Acc:0.038
I:23 Test-Acc:0.0928 Train-Acc:0.067
I:24 Test-Acc:0.1994 Train-Acc:0.081
I:25 Test-Acc:0.3086 Train-Acc:0.154
I:26 Test-Acc:0.4276 Train-Acc:0.204
I:27 Test-Acc
```

```python
import numpy as np, sys
np.random.seed(1)

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000,28*28) / 255,
                  y_train[0:1000])


one_hot_labels = np.zeros((len(labels),10))
for i,l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test),28*28) / 255
test_labels = np.zeros((len(y_test),10))
for i,l in enumerate(y_test):
    test_labels[i][l] = 1

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp / np.sum(temp, axis=1, keepdims=True)

alpha, iterations = (2, 300)
pixels_per_image, num_labels = (784, 10)
batch_size = 128

input_rows = 28
input_cols = 28

kernel_rows = 3
kernel_cols = 3
num_kernels = 16

hidden_size = ((input_rows - kernel_rows) * 
               (input_cols - kernel_cols)) * num_kernels

# weights_0_1 = 0.02*np.random.random((pixels_per_image,hidden_size))-0.01
kernels = 0.02*np.random.random((kernel_rows*kernel_cols,
                                 num_kernels))-0.01

weights_1_2 = 0.2*np.random.random((hidden_size,
                                    num_labels)) - 0.1


def get_image_section(layer,row_from, row_to, col_from, col_to):
    section = layer[:,row_from:row_to,col_from:col_to]
    return section.reshape(-1,1,row_to-row_from, col_to-col_from)

for j in range(iterations):
    correct_cnt = 0
    for i in range(int(len(images) / batch_size)):
        batch_start, batch_end=((i * batch_size),((i+1)*batch_size))
        layer_0 = images[batch_start:batch_end]
        layer_0 = layer_0.reshape(layer_0.shape[0],28,28)

        sects = list()
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start+kernel_rows,
                                         col_start,
                                         col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects,axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1],-1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0],-1))
        dropout_mask = np.random.randint(2,size=layer_1.shape)
        layer_1 *= dropout_mask * 2
        layer_2 = softmax(np.dot(layer_1,weights_1_2))

        for k in range(batch_size):
            labelset = labels[batch_start+k:batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1]) == 
                               np.argmax(labelset))
            correct_cnt += _inc

        layer_2_delta = (labels[batch_start:batch_end]-layer_2)\
                        / (batch_size * layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * \
                        tanh2deriv(layer_1)
        layer_1_delta *= dropout_mask
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(l1d_reshape)
        kernels -= alpha * k_update

    test_correct_cnt = 0

    for i in range(len(test_images)):

        layer_0 = test_images[i:i+1]
        # layer_1 = tanh(np.dot(layer_0,weights_0_1))
        layer_0 = layer_0.reshape(layer_0.shape[0],28,28)

        sects = list()
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2] - kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start,
                                         row_start+kernel_rows,
                                         col_start,
                                         col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects,axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1],-1)

        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0],-1))
        layer_2 = np.dot(layer_1,weights_1_2)

        test_correct_cnt += int(np.argmax(layer_2) == 
                                np.argmax(test_labels[i:i+1]))
    if(j % 1 == 0):
        sys.stdout.write("\n"+ \
         "I:" + str(j) + \
         " Test-Acc:"+str(test_correct_cnt/float(len(test_images)))+\
         " Train-Acc:" + str(correct_cnt/float(len(images))))
```

