import tensorflow as tf

# Loss functions
* In principle, losses can be defined manually from tensor operations.
* However, because training a model by minimizing a loss is so common, TensorFlow provides the standard loss functions out of the box.
* They live in [`tf.losses`](https://www.tensorflow.org/api_docs/python/tf/losses).
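
As a minimal sketch of the first two bullets, assuming TensorFlow 2.x with eager execution (where the built-in losses are also reachable as `tf.keras.losses`), a mean squared error can be written by hand from tensor operations and compared with the built-in version. The sample values are made up for illustration.

```python
import tensorflow as tf

# Hypothetical targets and predictions, shape (batch, 1).
y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.5], [2.0], [2.0]])

# Manual definition: mean of the squared differences.
manual_mse = tf.reduce_mean(tf.square(y_true - y_pred))

# Built-in equivalent.
builtin_mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred)

print(float(manual_mse), float(builtin_mse))
```

Both values should agree up to floating-point rounding.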

# Optimizers
* TensorFlow closely follows the neural-network training paradigm:
- Provide training data (`tf.data` or `tf.placeholder`)
- Set up the model (`tf.layers`)
- Set up the loss (`tf.losses`)
- Find the model parameters by minimizing the loss evaluated on the training data
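
The four steps above can be sketched end to end. This is only an illustration under assumptions: it uses TensorFlow 2.x, where `tf.keras.layers` and `tf.keras.losses` take the roles the list assigns to `tf.layers` and `tf.losses`, and the data is a made-up straight line.

```python
import numpy as np
import tensorflow as tf

# 1) Training data: synthetic points on the line y = 2x + 1 (made up).
x = np.linspace(0.0, 1.0, 32, dtype="float32").reshape(-1, 1)
y = 2.0 * x + 1.0
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

# 2) Model: a single dense layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])

# 3) Loss and 4) optimizer, then fit the parameters to the data.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.MeanSquaredError(),
)
history = model.fit(dataset, epochs=5, verbose=0)

print(history.history["loss"])  # one averaged loss value per epoch
```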

* What is the "output" of an optimizer?
- A single optimization step
- In `Keras`, the number of iterations is controlled by `batch_size` and `epochs`
- For `Estimators`, the number of iterations has to be controlled via `repeat` _within Datasets_
- Open question: how can the optimization history be tracked for `Estimators`?
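
To illustrate how `repeat` and batching inside a `tf.data` pipeline fix the number of optimizer steps, here is a small sketch (assuming TensorFlow 2.x with eager execution; the sizes are arbitrary). It only counts batches and does not involve an `Estimator`.

```python
import tensorflow as tf

# 10 made-up samples, repeated for 3 passes, in batches of 5.
dataset = tf.data.Dataset.range(10).repeat(3).batch(5)

# Each batch corresponds to one optimizer step.
num_steps = sum(1 for _ in dataset)
print(num_steps)  # 30 samples / 5 per batch = 6 steps
```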

* So the last ingredient is the optimizer!
- Optimizers live in `tf.train`
- The simplest optimizer one can imagine is gradient descent: `tf.train.GradientDescentOptimizer`
- An optimizer has to be instantiated and then applied to a loss (which in turn depends on the input data).

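~~~~(.python)
# TF 1.x API (available as tf.compat.v1.train in TF 2.x)
loss = ???  # placeholder: any scalar loss tensor, e.g. one built with tf.losses
optimizer = tf.train.GradientDescentOptimizer(0.01)  # learning rate 0.01
train = optimizer.minimize(loss)  # op that performs a single update step when run
~~~~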
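
For comparison, a single gradient descent step can be spelled out by hand. This sketch assumes TensorFlow 2.x with eager execution and uses a toy quadratic loss chosen for illustration; the update rule is plain gradient descent written manually rather than through an optimizer class.

```python
import tensorflow as tf

w = tf.Variable(0.0)  # parameter to optimize
learning_rate = 0.1

with tf.GradientTape() as tape:
    loss = tf.square(w - 3.0)  # toy loss with its minimum at w = 3

grad = tape.gradient(loss, w)       # d(loss)/dw = 2 * (w - 3) = -6
w.assign_sub(learning_rate * grad)  # one step: w <- w - lr * grad

print(float(w))  # approximately 0.6
```

Repeating the tape-and-update block moves `w` further toward 3, which is what the iteration settings ultimately control.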



# Keras
* **3 ways of constructing a model:** 
    - Sequential API 
    - Functional API 
    - Model subclassing API
* **Fitting:**
    - `fit()` method (native Keras)
    - `tf.GradientTape` (TensorFlow-specific)
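
The three construction styles can be compared on the same small network. This is a sketch assuming `tf.keras` from TensorFlow 2.x; the layer sizes and the class name `SmallNet` are arbitrary choices for illustration.

```python
import tensorflow as tf

# 1) Sequential API: a plain stack of layers.
sequential = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# 2) Functional API: layers are called on symbolic tensors.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
functional = tf.keras.Model(inputs=inputs, outputs=outputs)

# 3) Model subclassing: the forward pass is written in call().
class SmallNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(8, activation="relu")
        self.out = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.out(self.hidden(x))

subclassed = SmallNet()

# All three map a batch of shape (2, 4) to shape (2, 1).
x = tf.zeros((2, 4))
shapes = [tuple(m(x).shape) for m in (sequential, functional, subclassed)]
print(shapes)
```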
    
For all of the following components, Keras offers a rich family of built-in implementations and also allows you to write custom ones:
* Layers
* Weight initialization
* Optimizers
* Regularizers
* Activation functions
* Loss functions (= objective function for optimizers)
* Metrics (for judging model performance)


### Transfer learning

### Regularization
A layer computes $a(Wx + b)$, where $W$ are the weights, $b$ is the bias, and $a$ is the activation function. Corresponding to these three ingredients, there are three types of [regularizers in Keras](https://keras.io/regularizers/):
* Kernel regularizer (`kernel_regularizer`): acts on $W$
* Bias regularizer (`bias_regularizer`): acts on $b$
* Activity regularizer (`activity_regularizer`): acts on the output of $a$

*Questions:*
* How are these 3 regularizers combined?
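
One way to probe this question is to attach all three regularizers to a single layer and inspect `layer.losses`. The sketch below assumes `tf.keras` from TensorFlow 2.x; the constant initializers and the factor 0.01 are chosen only to make the numbers easy to check. Each regularizer contributes its own scalar penalty, and during `fit()` Keras adds these scalars to the data loss.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(
    2,
    activation="relu",
    kernel_initializer="ones",  # W is a 3x2 matrix of ones
    bias_initializer="ones",    # b is a vector of two ones
    kernel_regularizer=tf.keras.regularizers.l1(0.01),
    bias_regularizer=tf.keras.regularizers.l1(0.01),
    activity_regularizer=tf.keras.regularizers.l1(0.01),
)

x = tf.ones((1, 3))  # a single sample, so per-batch averaging has no effect
y = layer(x)         # each output is relu(3 + 1) = 4

# One scalar penalty per regularizer:
#   kernel: 0.01 * 6, bias: 0.01 * 2, activity: 0.01 * 8
penalties = [float(p) for p in layer.losses]
total_penalty = sum(penalties)
print(len(penalties), total_penalty)
```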


### Keras Backend
* Keras is compatible with [various backends](https://keras.io/backend/) (TensorFlow, Theano, CNTK)
* Routing mathematical operations through the backend keeps code portable across these backends.
* This is mainly used for custom regularization functions or loss functions, e.g.
```
import keras.backend as K
K.sum(x)
```
instead of the TensorFlow-specific call 
```
tf.math.reduce_sum(x)
```

### Custom Layers
* [Documentation](https://keras.io/layers/writing-your-own-keras-layers/)
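
As a rough illustration of what a custom layer looks like, here is a minimal sketch assuming `tf.keras` from TensorFlow 2.x. The layer `Scale` is a hypothetical example, not taken from the linked documentation: it multiplies each input feature by a trainable factor.

```python
import tensorflow as tf

class Scale(tf.keras.layers.Layer):
    """Hypothetical layer: multiplies each input feature by a learned factor."""

    def build(self, input_shape):
        # One trainable factor per feature, initialized to 1.
        self.scale = self.add_weight(
            name="scale",
            shape=(input_shape[-1],),
            initializer="ones",
            trainable=True,
        )

    def call(self, inputs):
        return inputs * self.scale

layer = Scale()
x = tf.constant([[1.0, 2.0, 3.0]])
y = layer(x)  # build() is triggered by the first call

print(y.numpy(), len(layer.trainable_weights))
```

Weights are created in `build()` rather than `__init__()` so that their shape can depend on the input the layer first sees.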

### Custom Regularization
* [Documentation](https://keras.io/regularizers/)
* Each of the 3 regularizers can be implemented manually and then passed to Keras
* Open question: how can a regularizer be implemented that combines e.g. weights and biases in a non-additive fashion?
* Signature for weight regularization
`def regularizer(weight_matrix) -> float:`
* Example

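```
from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

def l1_reg(weight_matrix):
    # L1 penalty: 0.01 times the sum of absolute weight values
    return 0.01 * K.sum(K.abs(weight_matrix))

model = Sequential()
model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))
```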
The same pattern applies to bias and activity regularization functions. 

**Question:**
How can one write regularization functions that take, e.g., both weights and biases as input and combine them in a non-additive fashion?
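
One possible approach, offered as a sketch and not as the way Keras intends it, is to bypass the per-tensor regularizer arguments and compute the penalty directly inside a `tf.GradientTape` block, where `layer.kernel` and `layer.bias` are both accessible. The example assumes TensorFlow 2.x; the product-of-norms penalty and the factors 0.01 and 0.1 are hypothetical choices.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(
    1, kernel_initializer="ones", bias_initializer="ones"
)
x = tf.ones((2, 3))       # made-up inputs
y_true = tf.zeros((2, 1)) # made-up targets

with tf.GradientTape() as tape:
    y_pred = layer(x)
    data_loss = tf.reduce_mean(tf.square(y_true - y_pred))
    # Hypothetical non-additive penalty: product of the L1 norms of W and b.
    penalty = (tf.reduce_sum(tf.abs(layer.kernel))
               * tf.reduce_sum(tf.abs(layer.bias)))
    loss = data_loss + 0.01 * penalty

# Gradients flow through both the data loss and the joint penalty.
grads = tape.gradient(loss, layer.trainable_variables)

# One manual gradient descent step on W and b.
for var, grad in zip(layer.trainable_variables, grads):
    var.assign_sub(0.1 * grad)

print(float(penalty), float(loss))
```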


### Custom Loss functions
* Apart from the [available loss functions](https://keras.io/losses/) in Keras, it is possible to construct custom ones
* They have to obey the signature `def loss(y_true, y_pred) -> float:`
* That is, a loss takes the true and the predicted labels as input.
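
A minimal sketch of such a custom loss, assuming `tf.keras` from TensorFlow 2.x: the function below re-implements a mean absolute error by hand and is passed to `compile()` like a built-in loss. The model and the sample values are placeholders for illustration.

```python
import tensorflow as tf

def mean_absolute_error(y_true, y_pred):
    # One loss value per sample; Keras averages these over the batch.
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss=mean_absolute_error)

# The function can also be evaluated directly on tensors.
y_true = tf.constant([[1.0], [2.0]])
y_pred = tf.constant([[1.5], [1.0]])
per_sample = mean_absolute_error(y_true, y_pred)
print(per_sample.numpy())
```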

**Question:**
How to write loss functions that also depend on the layer weights?
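
One pattern that may work here is a closure: a factory function receives the layer and returns a loss with the usual `(y_true, y_pred)` signature that also reads the layer's weights. This is a sketch under assumptions (TensorFlow 2.x, `tf.keras`); `make_loss` and the squared-weight term are hypothetical and only illustrate the idea.

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(1, kernel_initializer="ones")
model = tf.keras.Sequential([tf.keras.Input(shape=(2,)), layer])

def make_loss(dense_layer, strength):
    """Hypothetical helper: MSE plus a squared-weight term for one layer."""
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        weight_term = tf.reduce_sum(tf.square(dense_layer.kernel))
        return mse + strength * weight_term
    return loss

loss_fn = make_loss(layer, strength=0.1)
model.compile(optimizer="sgd", loss=loss_fn)

# Direct evaluation: predictions are 2, so MSE = 4; weight term = 2.
x = tf.ones((4, 2))
y_true = tf.zeros((4, 1))
value = loss_fn(y_true, model(x))
print(float(value))  # approximately 4.2
```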


