# Advanced Optimization

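Plain gradient descent with a single fixed learning rate can be slow to converge: too small a rate takes many steps, while too large a rate oscillates around the minimum. Let's fix this.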

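To address the choice of the learning rate $\alpha$, the *Adam* (Adaptive Moment Estimation) algorithm adjusts the step size automatically and separately for each parameter: if a parameter keeps moving in the same direction, its effective step grows; if it oscillates, its effective step shrinks. The result is a smoother, faster path toward the minimum.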
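The update itself fits in a few lines. The NumPy sketch below follows the standard Adam rule (running averages `m` and `v` of the gradient and the squared gradient, with bias correction). The toy cost function, the `adam_minimize` helper, and `alpha=0.1` are assumptions for illustration; `alpha` is larger than the common default of 0.001 so the toy problem converges quickly.

```python
import numpy as np

def adam_minimize(grad_fn, w0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function with the Adam update rule, given its gradient (illustrative sketch)."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # running mean of gradients (1st moment)
    v = np.zeros_like(w)  # running mean of squared gradients (2nd moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        # Bias correction: m and v start at zero, so early averages are too small
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        # Per-parameter step: large when recent gradients agree in sign, small when they oscillate
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Hypothetical badly scaled bowl: f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimum at (3, -1)
def grad(w):
    return np.array([2 * (w[0] - 3), 20 * (w[1] + 1)])

w_min = adam_minimize(grad, [0.0, 0.0])
print(w_min)  # close to [3, -1]
```

Because each coordinate is divided by its own `sqrt(v_hat)`, the steep `w1` direction and the shallow `w0` direction take steps of similar size, which is the per-parameter adaptation described above.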

Using Adam in TensorFlow (Keras) only requires changing the `optimizer` argument of `compile`:

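```python
import keras  # Keras 3; with older TensorFlow 2.x use `from tensorflow import keras`

model = keras.Sequential([
    keras.layers.Dense(units=25, activation='sigmoid'),
    keras.layers.Dense(units=15, activation='sigmoid'),
    # Linear output: the layer emits raw scores (logits) for the 10 classes
    keras.layers.Dense(units=10, activation='linear')
])

model.compile(
    # Adam with an initial learning rate of 0.001; Adam then adapts the step per parameter
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    # from_logits=True applies the softmax inside the loss, which is more numerically stable
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)

# model.fit(X, Y, epochs=100)  # X: training features, Y: integer class labels
```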
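The output layer is `linear` and the loss uses `from_logits=True` so that the loss receives raw scores and can apply the softmax in a numerically stable form. A minimal NumPy sketch of that idea, assuming the log-sum-exp trick; this is an illustration, not Keras's actual implementation, and `sparse_ce_from_logits` is a hypothetical name:

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy between integer labels and raw (linear) logits."""
    logits = np.asarray(logits, dtype=float)
    # Subtract each row's max before exponentiating (log-sum-exp trick) to avoid overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])  # a naive softmax would overflow on this row
labels = np.array([0, 0])
print(sparse_ce_from_logits(logits, labels))  # finite, about 0.21
```

At prediction time the model still outputs logits, so a softmax has to be applied to them to obtain class probabilities.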

## Additional Layer Types

Besides the `Dense` layer, there are other layer types that connect to their inputs differently.

### Dense 

Each unit's activation is a function of *all* the outputs (activations) of the previous layer.
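As a sketch, a dense layer with a sigmoid activation computes $g(\mathbf{x} W + \mathbf{b})$, so every unit's output depends on every input. The weights and inputs below are hypothetical values chosen so the arithmetic is easy to check:

```python
import numpy as np

def dense(x, W, b):
    """Dense layer with a sigmoid activation: every unit sees every input."""
    z = x @ W + b  # W has shape (n_inputs, n_units); column j weights ALL inputs for unit j
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])     # outputs of the previous layer
W = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  1.0]])       # hypothetical weights, none of them zero
b = np.array([-6.0, -2.0])
a = dense(x, W, b)
print(a)  # [0.5 0.5], since z = [1+2+3-6, 1-2+3-2] = [0, 0]
```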

### Convolutional

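Each unit's activation is a function of only a subset (a window) of the previous layer's outputs. This needs fewer parameters and less computation than a dense layer, so it typically needs less training data and is less prone to overfitting.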

#### Example - EKG

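An EKG is a time-series signal, so each unit can process a window of time (for example, one stretch of the repeating heartbeat) rather than the whole trace. The windows typically overlap, and each unit in the following layer combines several windows, so later layers still see multiple repetitions of the signal.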
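A minimal NumPy sketch of two windowed layers on a time series, assuming a noisy sine wave as a stand-in for a real EKG trace and random weights. Unlike the description above, it shares one set of weights across all windows, as a true convolution does; `windowed_layer` is a hypothetical helper. With a window of 20 samples and a stride of 10, neighboring windows overlap by half:

```python
import numpy as np

def windowed_layer(x, w, b, stride):
    """1D convolution-style layer: unit i sees only x[i*stride : i*stride + len(w)]."""
    k = len(w)
    n_units = (len(x) - k) // stride + 1
    z = np.array([x[i * stride : i * stride + k] @ w + b for i in range(n_units)])
    return 1 / (1 + np.exp(-z))  # sigmoid activation

rng = np.random.default_rng(0)
t = np.linspace(0, 4, 100)  # hypothetical 100-sample trace covering 4 "beats"
ekg = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(100)  # stand-in, not real EKG data

h1 = windowed_layer(ekg, rng.standard_normal(20), 0.0, stride=10)  # 9 overlapping windows
h2 = windowed_layer(h1, rng.standard_normal(5), 0.0, stride=2)     # each unit combines 5 windows
print(h1.shape, h2.shape)  # (9,) (3,)
```

Each unit of `h2` covers 5 windows of `h1`, i.e. 60 input samples, or about 2.4 periods of the roughly 25-sample signal, so the second layer does see multiple repetitions.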

## Back-Propagation

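Back-propagation computes all the partial derivatives of the cost function simultaneously, in a single right-to-left pass over the computation graph. For a graph with $N$ nodes and $P$ parameters this takes roughly $N + P$ steps, instead of the roughly $N \times P$ steps of the direct method, which differentiates with respect to each parameter separately.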
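A hand-worked sketch on a tiny computation graph, $J = \frac{1}{2}(wx + b - y)^2$, with input values chosen for illustration. The forward pass evaluates each node once; the backward pass then walks the graph right to left, multiplying each node's local derivative by the derivative already computed for the node after it, so both $\partial J/\partial w$ and $\partial J/\partial b$ come out of one sweep:

```python
def forward_backward(w, b, x, y):
    # Forward pass: evaluate each node of the graph left to right
    c = w * x
    a = c + b
    d = a - y
    J = 0.5 * d**2
    # Backward pass: one right-to-left sweep, reusing each downstream derivative
    dJ_dd = d            # J = d^2 / 2
    dJ_da = dJ_dd * 1.0  # d = a - y
    dJ_db = dJ_da * 1.0  # a = c + b
    dJ_dc = dJ_da * 1.0  # a = c + b
    dJ_dw = dJ_dc * x    # c = w * x
    return J, {"w": dJ_dw, "b": dJ_db}

J, grads = forward_backward(w=2.0, b=8.0, x=-2.0, y=2.0)
print(J, grads)  # 2.0 {'w': -4.0, 'b': 2.0}
```

Checking the same derivatives numerically (nudging one parameter and re-running the forward pass) would need a separate forward pass per parameter, which is where the $N \times P$ cost of the direct method comes from.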