#### 21st June 2022 
* Have found a potential task and method, but struggling to get predictions in the right size
* Need to learn model dimensions, and why CNNs etc are not giving the right shape

### Notes on Hands-on, Ch10
#### Example network, FashionMNIST NN

In [None]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

What does this do?
* Takes a 28x28 image as input
* Flatten reshapes it into a 1D vector of 784 = 28*28 values
* First Dense layer has 300 nodes, each taking 784 inputs plus a bias term, so 784*300 + 300 = 235,500 parameters
* Next layer has 100 nodes, each taking the 300 outputs of the previous layer plus a bias term, so 300*100 + 100 = 30,100 parameters
* Last layer has 10 nodes, each taking the 100 outputs of the previous layer plus a bias term, so 100*10 + 10 = 1,010 parameters
* Softmax turns the 10 outputs into class probabilities that sum to 1; the predicted class is the one with the highest probability
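The parameter arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the counting rule (weights plus one bias per node); `dense_param_counts` is a made-up helper name, not a Keras function.

```python
# Parameter count for a stack of Dense layers: each layer has
# (inputs * units) weights plus one bias per unit.
def dense_param_counts(n_inputs, layer_units):
    counts = []
    for units in layer_units:
        counts.append(n_inputs * units + units)
        n_inputs = units  # this layer's outputs feed the next layer
    return counts

# Flatten turns the 28x28 image into 784 inputs for the first Dense layer
counts = dense_param_counts(28 * 28, [300, 100, 10])
total = sum(counts)

print(counts)  # [235500, 30100, 1010]
print(total)   # 266610
```

The per-layer numbers should match the "Param #" column that `model.summary()` prints for this model.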

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

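* Loss is sparse categorical cross-entropy: a classification loss for integer class labels (here 0 to 9) rather than one-hot vectors
* Optimizer is stochastic gradient descent
* Training optimises the loss function; accuracy is only reported as a metric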
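To make the softmax and loss notes concrete, here is a minimal pure-Python sketch for a single example. It is a simplified version of the maths, not Keras's implementation (which works on batches and clips probabilities), and the logits are made-up numbers.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_class, probs):
    # "Sparse" means the label is a class index, not a one-hot vector
    return -math.log(probs[true_class])

logits = [2.0, 1.0, 0.1]              # hypothetical raw scores for 3 classes
probs = softmax(logits)               # probabilities that sum to 1
predicted = probs.index(max(probs))   # class with the highest probability

loss_if_correct = sparse_categorical_crossentropy(0, probs)
loss_if_wrong = sparse_categorical_crossentropy(2, probs)
```

The loss is small when the true class gets a high probability and grows as that probability shrinks, which is what the optimizer pushes against.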

#### More complex models using functional API

In [None]:
input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation="relu")(input_)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_], outputs=[output])

* We create input_ with the shape of one training instance (X_train.shape[1:] drops the batch dimension)
* We create a Dense layer and immediately call it like a function on input_, which gives hidden1; hidden2 is built the same way by calling a new layer on hidden1
* Calling layers like functions on the outputs of earlier layers is why this is called the functional API
* Concatenate is given both input_ and hidden2, so the input also goes directly to concat, bypassing the hidden layers
* Only hidden1 and hidden2 have activation functions; the output layer has none, so it is linear
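A quick way to see what the concatenation does to the dimensions is to follow the widths through the network by hand. The sketch below assumes 8 input features (an assumption, the notes do not say what X_train contains), and `dense_params` is a made-up helper.

```python
n_features = 8  # assumption: X_train.shape[1] == 8

def dense_params(n_in, units):
    # weights plus one bias per unit
    return n_in * units + units

hidden1_params = dense_params(n_features, 30)   # input_ -> hidden1
hidden2_params = dense_params(30, 30)           # hidden1 -> hidden2
concat_width = n_features + 30                  # input_ and hidden2 side by side
output_params = dense_params(concat_width, 1)   # concat -> single output

total_params = hidden1_params + hidden2_params + output_params
print(concat_width, total_params)  # 38 1239
```

The output layer sees 38 values rather than 30 because the raw input is stacked next to hidden2.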

In [None]:
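# The fit call below passes two arrays, so it needs a model with two inputs
# (the single-input model above would reject them). Split the data first.

# A gets the first 5 features; B gets every feature from the third onwards (index 2)
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]

# Some examples from the test set
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]

# Assumed two-input variant of the model above: A goes straight to concat, B goes through the hidden layers
input_A = keras.layers.Input(shape=X_train_A.shape[1:])
input_B = keras.layers.Input(shape=X_train_B.shape[1:])
hidden1 = keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input_A, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_A, input_B], outputs=[output])

# learning_rate replaces the deprecated lr argument
model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))

# Fit on the tuples A and B, with y as the dependent variable
history = model.fit((X_train_A, X_train_B), y_train,
                    epochs=20,
                    validation_data=((X_valid_A, X_valid_B), y_valid))

mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))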
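The column slices above overlap, which is easy to miss. The sketch below uses a plain Python list as a stand-in for one row of a NumPy array and assumes 8 features per row (an assumption, the notes do not give the dataset width).

```python
# One hypothetical row with 8 features, labelled by column index
row = list(range(8))

row_A = row[:5]   # mirrors X[:, :5] -> columns 0 to 4
row_B = row[2:]   # mirrors X[:, 2:] -> columns 2 to 7

# Columns that are sent to both inputs
shared = [c for c in row_A if c in row_B]
print(len(row_A), len(row_B), shared)  # 5 6 [2, 3, 4]
```

So input A has 5 features, input B has 6, and columns 2 to 4 are fed to both.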

#### Hyperparameters

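* Lower hidden layers model low-level structures (e.g. line segments), intermediate layers combine them into intermediate-level structures (e.g. shapes), and the highest layers combine those into high-level structures (e.g. faces)
* This hierarchy helps deep networks converge faster to a good solution and generalise better to new datasets
* Neurons per layer: it used to be common to shrink the layers in a pyramid shape (fewer neurons in each successive layer), but now the same size is usually used for all hidden layers
* In general, increasing the number of layers pays off more than increasing the number of neurons per layer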

### Notes on Hands-on, Ch11

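* Vanishing gradients: gradients often get smaller and smaller as backpropagation moves down to the lower layers, so the lower layers' weights barely change. Exploding gradients are the opposite: the gradients grow until the updates become huge. More generally, gradients are unstable, so different layers can learn at very different speeds
* A 2015 paper (Ioffe and Szegedy) proposes batch normalization as a way to deal with these gradient problems: it zero-centres and normalises each input over the mini-batch, then scales and shifts the result
* BN adds four parameters per input: a scale and a shift (trainable), plus a moving mean and a moving variance (not trainable, used at test time)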
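The BN steps above can be sketched in plain Python for a single input feature over one mini-batch. This is a simplified illustration of the training-time calculation, not the Keras layer; the `eps` value and the helper names are assumptions for the example.

```python
import math

def batch_norm_train(batch, gamma, beta, eps=1e-3):
    # 1. zero-centre and normalise using the mini-batch statistics
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # 2. scale (gamma) and shift (beta) the result
    outputs = [gamma * x_hat + beta for x_hat in normalized]
    return outputs, mean, var

def bn_param_count(n_inputs):
    trainable = 2 * n_inputs       # gamma and beta per input
    non_trainable = 2 * n_inputs   # moving mean and moving variance per input
    return trainable, non_trainable

batch = [1.0, 2.0, 3.0, 4.0]  # one feature, four instances
outputs, mean, var = batch_norm_train(batch, gamma=1.0, beta=0.0)

print(bn_param_count(784))  # (1568, 1568)
```

With 784 inputs (for example a BN layer placed right after Flatten), that is 4 * 784 = 3,136 parameters, half of them trainable.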

### Notes on Hands-on, Ch14

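* Each filter in a convolutional layer is a different set of weights (a kernel) slid over the input, producing one feature map, so filters=32 means 32 different filters are passed over the input, giving 32 feature maps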
* An example of a filter is one that picks out horizontal or vertical lines
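As a rough illustration of the line-detecting filters mentioned above, the sketch below slides two hand-written 3x3 kernels over a tiny image containing one vertical line. In a real CNN the kernel weights are learned; the image, kernels and `convolve2d` helper here are made up for the example, and no padding is applied.

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image (stride 1, no padding) and
    # sum the element-wise products at each position
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# 5x5 image with a vertical line in the middle column
image = [[0, 0, 1, 0, 0] for _ in range(5)]

vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]     # responds to vertical lines
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]   # responds to horizontal lines

vertical_map = convolve2d(image, vertical)
horizontal_map = convolve2d(image, horizontal)

print(vertical_map)    # [[0, 3, 0], [0, 3, 0], [0, 3, 0]]
print(horizontal_map)  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

The vertical filter gives a strong response exactly where the line is, while the horizontal filter gives a weak, uniform response.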


In [None]:
conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, padding="same", activation="relu")
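Since the output shape is where things tend to go wrong, here is a small sketch of the shape and parameter arithmetic for this layer. It assumes a 28x28 greyscale input (1 channel), which is an assumption rather than something stated in these notes; the helper names are made up.

```python
import math

def conv2d_same_output_size(size, stride):
    # With padding="same", the output size is ceil(input size / stride)
    return math.ceil(size / stride)

def conv2d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias
    return kernel_size * kernel_size * in_channels * filters + filters

height, width, channels = 28, 28, 1  # assumption: 28x28 greyscale image

out_shape = (
    conv2d_same_output_size(height, 1),
    conv2d_same_output_size(width, 1),
    32,  # one feature map per filter
)
n_params = conv2d_params(3, channels, 32)

print(out_shape, n_params)  # (28, 28, 32) 320
```

With strides=1 and padding="same" the height and width are unchanged; only the channel dimension changes, from the input channels to the number of filters.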

* Pooling layers shrink (subsample) the input image, which reduces the computational load, the memory usage, and the number of parameters in later layers (a pooling layer has no weights of its own)
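A minimal sketch of 2x2 max pooling with stride 2 in plain Python shows the shrinking. The 4x4 image is made up, and the helper assumes even height and width.

```python
def max_pool_2x2(image):
    # Take the max of each non-overlapping 2x2 block (stride 2)
    return [
        [
            max(image[i][j], image[i][j + 1],
                image[i + 1][j], image[i + 1][j + 1])
            for j in range(0, len(image[0]) - 1, 2)
        ]
        for i in range(0, len(image) - 1, 2)
    ]

image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
pooled = max_pool_2x2(image)
print(pooled)  # [[4, 8], [12, 16]]
```

The 4x4 input becomes 2x2, so each later layer has a quarter as many values to process.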
