The whole idea behind a "neural network" is to imitate how humans learn. So how do humans learn?

A neuron collects signals through input channels called "dendrites," processes the information in its nucleus, and then sends an output along a long, thin branch called an axon. Humans learn by changing the strength of the bonds between neurons. A neural network is an attempt to represent all of this mathematically.

We can represent the biological nucleus of a neuron as a summation: the total of all of the inputs (training data), each multiplied by a "weight" that represents the strength of the bond between neurons in a human brain. Simply stated, larger weights equate to stronger bonds. Traditional neural networks also include a bias term that shifts the summation value up or down. The summation is then passed through what is called an activation function. Common ones include sigmoid (binary classification), softmax (multi-class classification), and ReLU (typically used in hidden layers). The activation function is the crux of deep learning: without it, the weights and bias would only perform a linear transformation (essentially a linear regression model), no matter how many layers are stacked. The activation function is what allows the network to learn complex non-linear patterns, so choosing one is a critical step in designing any neural network.
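
As a rough illustration of the paragraph above, here is a minimal sketch of a single artificial neuron in plain Python. The input values, weights, and bias are made up for the example, and the function names (`neuron`, `sigmoid`, `relu`) are just illustrative; a real framework such as Keras does the same arithmetic on whole arrays at once.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged; clamp negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

# The weighted sum here is 0.2 - 0.3 - 0.4 + 0.1 = -0.4 (up to rounding).
relu_out = neuron(inputs, weights, bias, relu)        # negative sum, so ReLU gives 0.0
sigmoid_out = neuron(inputs, weights, bias, sigmoid)  # about 0.401
print(relu_out, sigmoid_out)
```

The same weighted sum produces very different outputs depending on the activation, which is why the choice matters.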

Imagine thousands of these "artificial neurons" connected together. A group of neurons sitting between the inputs and the outputs is called a "hidden layer." The hidden layer is connected to an output layer, which produces the network's prediction for a given set of inputs.
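
To make the layer idea concrete, the following sketch pushes one input through a tiny hidden layer and an output layer. Everything here is an assumption for illustration: the layer sizes (2 inputs, 3 hidden neurons, 2 classes), the hand-picked weights, and the helper name `dense`. A trained network would learn its weights rather than have them written down.

```python
import math

def relu(z):
    return max(0.0, z)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One layer: each row of `weights` holds the weights of one neuron."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

# Hypothetical, hand-picked parameters: 2 inputs -> 3 hidden neurons -> 2 classes.
hidden_w = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.5, -0.2, 0.3], [-0.4, 0.6, 0.1]]
output_b = [0.0, 0.0]

x = [1.0, 2.0]
hidden = [relu(z) for z in dense(x, hidden_w, hidden_b)]  # hidden layer with ReLU
probs = softmax(dense(hidden, output_w, output_b))        # output layer with softmax
prediction = probs.index(max(probs))                      # index of the most likely class
print(hidden, probs, prediction)
```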

Let's talk a little bit more about the inputs. How could a neural network take, for example, an image and predict the type of article of clothing it shows? You cannot just feed an image directly to a network in the same way you could hold up a picture to a person. The pipeline from image to prediction requires some legwork and goes something like this: take an image, say of a shoe. The shoe image undergoes "convolution" (defined below), "pooling" (essentially downsampling), and "flattening" before it is ready to be fed into the fully connected (dense) layers of a CNN.

#### Feature Detector

Convolutions use a kernel matrix (the feature detector) to scan a given image and apply a filter to obtain a certain effect.

An image kernel is a matrix used to apply effects such as blurring and sharpening. Kernels are used in ML for feature extraction, to pick out the _most important_ patterns in an image (edges, for example). Convolutions preserve the spatial relationship between pixels.

Feature maps are the outputs of convolution (the process of running a feature detector over an actual image).

The feature map is a new version of the image over which the feature detector was applied. Its size depends on the image size, the feature detector size, the stride, and the padding: with no padding and a stride of 1, a 3x3 feature detector applied to a 28x28 image yields a 26x26 feature map. The feature map might be a blurred version, a sharpened version, etc., depending on the values in the feature detector that multiply the pixel values in your image.
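
The sliding-window idea can be sketched in a few lines of plain Python. This is a simplified illustration, not how Keras implements it: it assumes a square kernel, a stride of 1, and no padding, and (like most deep learning libraries) it does not flip the kernel, so strictly speaking it computes a cross-correlation. The toy image and the vertical-edge kernel are made up for the example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k = len(kernel)                  # assumes a square k x k kernel
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# A simple vertical-edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
```

The 5x5 image shrinks to a 3x3 feature map, and the large values mark where the dark-to-bright edge sits.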

#### Pooling (downsampling layer)

Pooling helps avoid overfitting by reducing the dimensionality of the feature map. This improves computational efficiency while preserving the most prominent features. Max pooling returns the maximum feature response within each window (2x2, for example) of a feature map. (Min and average pooling also exist.) With a 2x2 window, this lets us move from, say, 40x40 to 20x20. We keep the prominent features but represent them in a much more condensed form.
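
A minimal sketch of max pooling, assuming non-overlapping windows (the stride equals the window size) and made-up feature map values; the helper name `max_pool2d` is only illustrative:

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 9, 5],
               [1, 1, 3, 7]]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

Each dimension is halved (4x4 becomes 2x2), yet the strongest response in every region survives.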

#### Flattening

Converts a 2D feature map (a 2x2 matrix, for example) into a one-dimensional vector that can be used as input to the fully connected (dense) layers of the CNN!
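
In code, flattening is just unrolling the rows one after another. A tiny sketch with made-up values (Keras's `Flatten` layer does the equivalent for every sample in a batch):

```python
def flatten(matrix):
    """Unroll a 2D list row by row into a single flat list."""
    return [value for row in matrix for value in row]

pooled = [[6, 2],
          [2, 9]]

vector = flatten(pooled)
print(vector)  # [6, 2, 2, 9]
```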

#### Training the model

In [3]:
from sklearn.model_selection import train_test_split

In [None]:
X_train = training[:, 1:]/255
y_train = training[:, 0]

In [None]:
X_test = testing[:, 1:]/255
y_test = testing[:, 0]

In [None]:
X_train, X_validate, y_train, y_validate = train_test_split(X_train, y_train, test_size = .2, random_state = 12345)

In [4]:
X_train = X_train.reshape(X_train.shape[0], *(28, 28, 1))
X_test = X_test.reshape(X_test.shape[0], *(28, 28, 1))
X_validate = X_validate.reshape(X_validate.shape[0], *(28,28,1))

In [9]:
import keras 
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import Adam
from keras.callbacks import TensorBoard

In [24]:
cnn_model = Sequential()

In [25]:
cnn_model.add(Conv2D(32, (3, 3), input_shape = (28, 28, 1), activation = 'relu'))

In [26]:
cnn_model.add(MaxPooling2D(pool_size = (2,2)))

In [27]:
cnn_model.add(Flatten())

In [32]:
cnn_model.add(Dense(units = 32, activation = 'relu'))

In [None]:
cnn_model.add(Dense(units = 10, activation = 'softmax'))

In [31]:
cnn_model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam(learning_rate = 0.001), metrics = ['accuracy'])

In [None]:
epochs = 50

In [None]:
cnn_model.fit(X_train, y_train,
             batch_size = 512,
             epochs = epochs,
             verbose = 1, 
             validation_data = (X_validate, y_validate))
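
As a sanity check on the architecture built above, the shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for the layers used here ("valid" padding and a stride of 1 for `Conv2D`, and a pooling stride equal to the pool size); the helper `conv_output_size` is a hypothetical name, not part of Keras. Calling `cnn_model.summary()` should report matching numbers if those assumptions hold.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

filters = 32

conv_side = conv_output_size(28, 3)        # 28x28 image, 3x3 kernel -> 26
pool_side = conv_side // 2                 # 2x2 max pooling -> 13
flat_len = pool_side * pool_side * filters # 13 * 13 * 32 -> 5408

# Parameters = (weights per unit + 1 bias) * number of units.
conv_params = (3 * 3 * 1 + 1) * filters    # 320
dense_params = (flat_len + 1) * 32         # 173088
output_params = (32 + 1) * 10              # 330
total_params = conv_params + dense_params + output_params

print(conv_side, pool_side, flat_len)      # 26 13 5408
print(total_params)                        # 173738
```

Almost all of the parameters sit in the first dense layer, right after flattening.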

#### Evaluating the Model