# TensorFlow 2.0

Google Colaboratory link: 
https://colab.research.google.com/drive/1MTakxD_oF6gwwhrcNs03LA1eDjYiSSL5

In [None]:
!pip install tensorflow-gpu==2.0.0a0

In [None]:
import tensorflow as tf

import numpy as np

from sklearn import datasets
import tensorflow_datasets as tfds

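We'll now explore the brand-new TensorFlow 2.0, which is still in the testing stage (hence the alpha build installed above).

TensorFlow 2.0 has changed a lot compared to 1.\*; the most notable change is the **eager execution** paradigm: operations run immediately and return concrete values, instead of only adding nodes to a graph that is executed later.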
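
To make "eager" concrete, here is a minimal sketch, assuming a TensorFlow 2.x install where eager execution is on by default. The tensors `a` and `b` are made up for illustration.

```python
import tensorflow as tf

# Assumes TensorFlow 2.x, where eager execution is enabled by default.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # identity matrix

product = a @ b                 # runs immediately, no session or graph setup
print(tf.executing_eagerly())   # reports whether eager mode is active
print(product.numpy())          # concrete values are available right away
```

The result of `a @ b` is an ordinary tensor holding numbers, so it can be printed, inspected in a debugger, or converted to NumPy on the spot.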


Excitement!

As in PyTorch, we'll start with the Iris dataset.

In [None]:
iris = datasets.load_iris()
X_train = iris['data'].astype(np.float32)  # float32 to match the dtype of the weights defined below
y_train = iris['target']
# We'll train on the whole dataset. Don't ever do that in practice, but it's good enough for illustrating the behaviour!

In [None]:
# Of course, you can also define your own operations - TensorFlow's syntax is in many ways similar to NumPy's
def relu(activation):
    # Multiply by a 0/1 mask that keeps only the positive entries
    return activation * tf.cast(activation > 0, dtype=tf.float32)
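
To illustrate the similarity with NumPy, the same masking trick might look like this in plain NumPy. The name `relu_np` and the sample vector are made up for this sketch.

```python
import numpy as np

def relu_np(activation):
    # Same idea as the TensorFlow version: multiply by a 0/1 mask of the positive entries.
    return activation * (activation > 0).astype(np.float32)

x = np.array([-2.0, -0.5, 0.0, 1.5], dtype=np.float32)
result = relu_np(x)
print(result)
```

The only real difference is the cast: `tf.cast(..., dtype=tf.float32)` on one side, `.astype(np.float32)` on the other.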

In [None]:
D_in, H, D_out = 4, 10, 3


W1 = tf.Variable(tf.random.uniform((D_in, H)))
W2 = tf.Variable(tf.random.uniform((H, D_out)))

  
def loss(y, y_pred):
  labels = tf.one_hot(y, D_out)
  return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=y_pred))

def train_step(X, y, lr, W1, W2):
  with tf.GradientTape() as t: 
    # In eager mode, the gradient tape records the operations executed in this block, so that gradients can be computed from them afterwards
    y_pred = relu(X @ W1) @ W2
    current_loss = loss(y, y_pred)
  dW1, dW2 = t.gradient(current_loss, [W1, W2])
  W1.assign_sub(lr * dW1)
  W2.assign_sub(lr * dW2)
  return current_loss
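
For intuition about what `loss` computes, here is a rough NumPy sketch of softmax cross-entropy on one-hot labels. It is an illustration of the formula, not TensorFlow's actual implementation, and the function name `softmax_cross_entropy_np` is made up.

```python
import numpy as np

def softmax_cross_entropy_np(y, logits, num_classes):
    labels = np.eye(num_classes)[y]                        # one-hot encode the integer labels
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return -(labels * log_probs).sum(axis=1).mean()        # mean negative log-likelihood

# With all-zero logits every class gets probability 1/3, so the loss is log(3).
logits = np.zeros((2, 3))
y = np.array([0, 2])
uniform_loss = softmax_cross_entropy_np(y, logits, 3)
print(uniform_loss, np.log(3))
```

A loss near log(3) therefore means the network is doing no better than guessing uniformly among the three Iris classes.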


In [None]:
for epoch in range(10):
  current_loss = train_step(X_train, y_train, 0.1, W1, W2)
  print(current_loss)
  


If this looks intuitive, it's because *it is*. 

In TF 1.\* eager execution was not the default. Instead, you had to construct a *computation graph* out of variables such as $W_1$ and $W_2$ and *placeholders*, which marked the spots where inputs would be fed into the graph when it was run.

This produced *static computation graphs*: once constructed, a graph could be optimized by TF's internals, but it could not be changed afterwards.
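
The define-then-run idea can be sketched with a toy graph in pure Python. The classes below (`Placeholder`, `Constant`, `Multiply`) are hypothetical and only mimic the concept; they are not TensorFlow's API.

```python
# Toy illustration only: these classes are hypothetical, not TensorFlow's API.
class Placeholder:
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]          # value is supplied only at run time


class Constant:
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Multiply:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


# Step 1: define the graph once. Nothing is computed here.
x = Placeholder("x")
w = Constant(3.0)
graph = Multiply(x, w)

# Step 2: run the same graph many times with different inputs.
outputs = [graph.evaluate({"x": value}) for value in (1.0, 2.0, 5.0)]
print(outputs)
```

Note how the structure (`graph`) and the execution (`evaluate`) are separate steps, which is exactly the split the lists below weigh up.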

**Benefits of static graphs**
* define once, run many times
* once the graph is defined, the framework (TF's internals) can optimize it
* we know how everything is connected, so gradients are easy to compute
* a more functional approach (if you like functional programming. It's awesome, really.)
* graph can be serialized and run without the code that defined it

**Flaws of static graphs**
* the code that defines the structure of the graph and the code that runs computations through it live in different places, which may be confusing
* once defined, a graph cannot be changed
* Python if-statements and for-loops are not supported inside the graph, and the workarounds that enable similar behaviour (graph-level ops such as `tf.cond` and `tf.while_loop`) are ugly
  * theoretically you could (and probably should) aim to replace them with mapping/reducing/filtering, but ML is a highly iterative process, so good luck with that :(
  
**Dynamic graphs**
* constructed from scratch with each usage
* you need the code that defined the graph
* executing the graph is much more like running ordinary code (loops, conditional statements)
  * useful e.g. in RNNs, where the output of the graph is fed back into it
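
As a small sketch of what "ordinary code" buys you, assuming TensorFlow 2.x in eager mode: the loop below is a plain Python `while` whose iteration count depends on the data, and the gradient tape still differentiates through it. The starting value 2.0 and the threshold 100.0 are arbitrary.

```python
import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape() as tape:
    y = tf.square(x)
    # Plain Python loop: how many times it runs depends on the current value of y.
    while y < 100.0:
        y = y * x

# For x = 2.0 the loop stops at y = x**7 = 128, so dy/dx = 7 * x**6.
grad = tape.gradient(y, x)
print(y.numpy(), grad.numpy())
```

With a different starting value the loop would run a different number of times, and the recorded computation (and hence the gradient) would change accordingly.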




## As for CIFAR-10

Let's now try to train a network on a more serious dataset!

Thanks to the all-new **TensorFlow Datasets API**, we can download the dataset we want and build a processing pipeline for our data.

In [None]:
ds_train, ds_test = tfds.load(name="cifar10", split=["train", "test"])


In [None]:
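def pipeline(ds):
  # Repeat indefinitely, shuffle within a 1024-element buffer, batch, then scale pixels to [0, 1]
  return (ds.repeat()
            .shuffle(1024)
            .batch(128)
            .map(lambda batch: (tf.cast(batch["image"], tf.float32) / 255.0, batch["label"])))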

In [None]:
cifar_train = pipeline(ds_train)
cifar_test = pipeline(ds_test)
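
To check what such a pipeline yields without downloading anything, the same chain of calls can be tried on made-up data. This is a sketch: the `fake` dictionary is a hypothetical stand-in with CIFAR-10-like shapes, the buffer and batch sizes are shrunk, and an explicit cast is added for clarity.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data with CIFAR-10-like shapes, so nothing is downloaded.
fake = {
    "image": np.random.randint(0, 256, size=(8, 32, 32, 3), dtype=np.uint8),
    "label": np.arange(8, dtype=np.int64) % 10,
}
ds = tf.data.Dataset.from_tensor_slices(fake)

# Same order of steps as pipeline(), with smaller buffer and batch sizes.
batched = ds.repeat().shuffle(8).batch(4).map(
    lambda batch: (tf.cast(batch["image"], tf.float32) / 255.0, batch["label"]))

images, labels = next(iter(batched))
print(images.shape, images.dtype, labels.shape)
```

Each element is an `(images, labels)` tuple, which is the format Keras expects when training on a dataset. Because of `repeat()`, the dataset never runs out, so the number of steps per epoch has to be given explicitly when training.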


Unlike PyTorch, TF 1.\* didn't provide one single way to minimize the amount of code you have to write when creating a model.

TF 2.0 changed that, with tf.keras (https://keras.io/) becoming a first-class citizen. The tf.estimator API has also been kept, mostly because guys at Google are too lazy to refactor all of their code :P

Keras lets you define and train models much more simply. 
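
For comparison, the two-layer Iris network from earlier could be written in Keras roughly like this. It's a sketch rather than a drop-in replacement: biases are switched off to mirror the bias-free `W1`/`W2` setup, plain SGD with learning rate 0.1 is an assumption chosen to match the manual update, and Keras uses its own default weight initialization.

```python
from tensorflow import keras as k

iris_model = k.models.Sequential([
    k.layers.Dense(10, activation='relu', use_bias=False, input_shape=(4,)),
    k.layers.Dense(3, use_bias=False),  # outputs raw logits, no softmax
])
iris_model.compile(
    # from_logits=True because the last layer has no softmax
    loss=k.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=k.optimizers.SGD(0.1),
)
print(iris_model.count_params())  # 4*10 + 10*3 weights
```

Training would then be a single `iris_model.fit(X_train, y_train, ...)` call instead of the hand-written `train_step` loop.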

In [None]:
import tensorflow.keras as k

First, let's define our own reusable block of layers:

In [None]:
def conv_block(out_channels: int):
  return k.models.Sequential([
      k.layers.Conv2D(out_channels, 3, padding='same', activation='relu'),
      k.layers.BatchNormalization(),
      k.layers.Dropout(0.3)
  ])
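
A quick way to see what the block does to a tensor is to call it on a dummy batch. In this sketch `conv_block` is copied from above so the snippet runs on its own; the batch of zeros and the channel count 16 are arbitrary.

```python
import tensorflow as tf
from tensorflow import keras as k

def conv_block(out_channels: int):
    # Copied from the cell above so that this snippet is self-contained.
    return k.models.Sequential([
        k.layers.Conv2D(out_channels, 3, padding='same', activation='relu'),
        k.layers.BatchNormalization(),
        k.layers.Dropout(0.3)
    ])

block = conv_block(16)
out = block(tf.zeros((2, 32, 32, 3)))  # dummy batch of two 32x32 RGB images
print(out.shape)
```

Because of `padding='same'` and the default stride of 1, the block keeps the spatial size and only changes the number of channels.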

In [None]:
model = k.models.Sequential([
    k.layers.Conv2D(32, 3, activation='relu', padding='same', input_shape=(32, 32, 3)),
    conv_block(32),
    conv_block(32),
    k.layers.Conv2D(32, 3, padding='same', strides=2),
    conv_block(64),
    conv_block(64),
    conv_block(64),
    
    k.layers.Conv2D(64, 3, padding='same', strides=2),
    conv_block(128),
    conv_block(128),
    conv_block(128),
    
    k.layers.Conv2D(128, 3, padding='same', strides=2),
    conv_block(256),
    conv_block(256),

    k.layers.Flatten(),

    k.layers.Dense(10, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', 
              optimizer=k.optimizers.Adam(6e-4), 
              metrics=['accuracy']) 
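
To see what reaches the final `Dense` layer, the spatial size can be traced by hand. This sketch relies on the rule that a `padding='same'` convolution with stride s maps a side of length n to ceil(n / s); the helper name `same_conv_output` is made up for illustration.

```python
import math

def same_conv_output(size: int, stride: int) -> int:
    # With padding='same', the output side length is ceil(size / stride).
    return math.ceil(size / stride)

side = 32  # CIFAR-10 images are 32x32
for _ in range(3):  # the model above has three stride-2 convolutions
    side = same_conv_output(side, stride=2)

flattened = side * side * 256        # the last conv_block outputs 256 channels
dense_params = flattened * 10 + 10   # weights plus biases of Dense(10)
print(side, flattened, dense_params)
```

So the image shrinks 32 → 16 → 8 → 4, and `Flatten` hands a 4096-element vector to the classifier.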

Let's train!

In [None]:
# Model.fit accepts tf.data datasets directly; fit_generator is deprecated in later TF 2.x releases
model.fit(cifar_train, steps_per_epoch=400, epochs=10, validation_data=cifar_test, validation_steps=10)
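
A quick sanity check on the training call's arguments, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images and the batch size of 128 used in `pipeline`:

```python
batch_size = 128
train_size, test_size = 50_000, 10_000   # standard CIFAR-10 split sizes

samples_per_epoch = 400 * batch_size     # steps_per_epoch * batch size
validation_samples = 10 * batch_size     # validation_steps * batch size

train_passes = samples_per_epoch / train_size
validation_fraction = validation_samples / test_size
print(train_passes, validation_fraction)
```

Each "epoch" therefore covers slightly more than one pass over the training data, while every validation run only samples about 13% of the test set, so the reported validation accuracy will be somewhat noisy.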