# Difference between PyTorch and TensorFlow

https://towardsdatascience.com/pytorch-vs-tensorflow-in-code-ada936fd5406

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture07.pdf

http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture08.pdf

## Preprocessing

In [None]:
import io
import requests
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

url = 'https://raw.githubusercontent.com/TimS-ml/DataMining/master/z_Other/tweets.csv'

f = requests.get(url).content
df = pd.read_csv(io.StringIO(f.decode('utf-8')))
df = df.iloc[:, 1:]
df.columns = ['sentiments', 'tweets']

# df.shape  # (31962, 2)
df.head()

In [None]:
# instantiate and fit tokenizer
tokenizer = Tokenizer(num_words=20000, oov_token='<OOV>')
tokenizer.fit_on_texts(df.tweets)

# transform tweets into sequences of integers
sequences = tokenizer.texts_to_sequences(df.tweets)

# pad sequences so that they have uniform length
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=42)
assert(padded.shape==(31962, 42))

seq = padded
labels = np.array(df.sentiments)

# PyTorch

There are two ways to build a neural network model in PyTorch.



## Two ways of building NN in PT

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

### [1] Model Subclassing
Similar to TensorFlow, in PyTorch you subclass the `nn.Module` class and define your layers in the `__init__()` method.

The only difference is that the forward pass goes in a method named `forward` *instead of `call`*.

Difference to the Keras model: <u>PyTorch has no layer named global average pooling, so `nn.AvgPool1d` needs a kernel size equal to the pooled dimension in order to act as global average pooling.</u>
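
The kernel-size point can be checked with a small sketch. The tensor below is a hypothetical stand-in for an embedded batch (4 tweets, 42 tokens, 50 embedding dimensions); `nn.AdaptiveAvgPool1d` is not used elsewhere in this notebook and is shown only as an alternative that avoids hard-coding the width.

```python
import torch
import torch.nn as nn

# Hypothetical embedded batch: 4 tweets, 42 tokens each, 50-dim embeddings.
x = torch.randn(4, 42, 50)

# AvgPool1d pools over the last dimension; a kernel as wide as that
# dimension (50) collapses it to a single value per position.
pooled = nn.AvgPool1d(kernel_size=50)(x)   # shape (4, 42, 1)

# AdaptiveAvgPool1d(1) gives the same result without hard-coding the width.
adaptive = nn.AdaptiveAvgPool1d(1)(x)      # shape (4, 42, 1)

# Both match a plain mean over the last dimension.
manual = x.mean(dim=-1, keepdim=True)
print(pooled.shape)
print(torch.allclose(pooled, manual, atol=1e-6))
print(torch.allclose(adaptive, manual, atol=1e-6))
```

Note that the pooling runs over whatever sits in the last dimension, which for the output of `nn.Embedding` is the embedding dimension.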

In [None]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=20000,
                                            embedding_dim=50)
        self.pooling_layer = nn.AvgPool1d(kernel_size=50)
        self.fc_layer = nn.Linear(in_features=42, out_features=1)
    
    def forward(self, inputs):
        x = self.embedding_layer(inputs)
        # drop the pooled dimension without hard-coding the batch size
        x = self.pooling_layer(x).view(inputs.size(0), -1)
        return torch.sigmoid(self.fc_layer(x))
    
model = Model()

### [2] Sequential
PyTorch also offers a `Sequential` module that looks almost equivalent to TensorFlow’s.

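Not every layer works inside PyTorch’s `nn.Sequential`: it passes a single output from one layer to the next, so layers that return tuples (such as `nn.LSTM`) need a wrapper or a subclassed model.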

In [None]:
# PyTorch nn.Sequential
model = nn.Sequential(
    nn.Embedding(num_embeddings=20000, embedding_dim=50),
    nn.AvgPool1d(kernel_size=50),
    nn.Flatten(start_dim=1),
    nn.Linear(in_features=42, out_features=1),
    nn.Sigmoid()
)

## Training a NN in PT

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

The training loop needs to be implemented from scratch.

In order to process the data in batches, a dataloader must be created. Here the dataloader returns one batch at a time as a dictionary with the keys `tweets` and `sentiments`.
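
The notebook does not show how the dataloader is built, so the following is only a minimal sketch of one way to get dictionary batches. `TweetDataset`, `demo_seq`, `demo_labels` and `demo_loader` are hypothetical names, and the random arrays merely imitate the layout of `seq` and `labels` from the preprocessing step.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TweetDataset(Dataset):
    """Hypothetical dataset wrapping padded sequences and their labels."""

    def __init__(self, seq, labels):
        self.seq = torch.as_tensor(seq, dtype=torch.long)            # token ids
        self.labels = torch.as_tensor(labels, dtype=torch.float32)   # 0/1 targets

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, idx):
        # Returning a dict makes the default collate function build dict batches.
        return {"tweets": self.seq[idx], "sentiments": self.labels[idx]}

# Stand-in data with the same layout as `seq` and `labels`.
demo_seq = np.random.randint(0, 20000, size=(100, 42))
demo_labels = np.random.randint(0, 2, size=100)

demo_loader = DataLoader(TweetDataset(demo_seq, demo_labels),
                         batch_size=32, shuffle=True)
batch = next(iter(demo_loader))
print(batch["tweets"].shape, batch["sentiments"].shape)
```

Building the loader from `seq` and `labels` instead of the stand-in arrays would give an object that can play the role of `dataloader` in the training loop.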

Short description of the training loop: 
- For each batch, we calculate the loss and then call `loss.backward()` to backpropagate the gradient through the layers.
- In addition, we call `optimizer.step()` to tell the optimizer to update the parameters.


In [None]:
# define the loss fn and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# initialize empty list to track batch losses
batch_losses = []

# train the neural network for 5 epochs
for epoch in range(5):
    # reset iterator
    dataiter = iter(dataloader)
    
    for batch in dataiter:
        # reset gradients
        optimizer.zero_grad()
        
        # forward propagation through the network
        out = model(batch["tweets"])
        
        # calculate the loss (BCELoss expects float targets shaped like the output)
        loss = criterion(out, batch["sentiments"].float().view_as(out))
        
        # track batch loss
        batch_losses.append(loss.item())
        
        # backpropagation
        loss.backward()
        
        # update the parameters
        optimizer.step()

# TensorFlow

TensorFlow is a lot like Scikit-Learn thanks to its `fit` function, which makes training a model super easy and quick.

There are three ways to build a neural network model in Keras.

## Three ways of building NN in TF

In [None]:
import tensorflow as tf


### [1] Model subclassing

You can create your own fully-customizable models by subclassing the `tf.keras.Model` class and implementing the forward pass in the `call` method. 

Put differently, layers are defined in the `__init__()` method and the logic of the forward pass in the `call` method.


In [None]:
class Model(tf.keras.Model):
    def __init__(self):
        super(Model, self).__init__()
        self.embedding_layer = tf.keras.layers.Embedding(input_dim=20000,
                                                         output_dim=50,
                                                         input_length=42,
                                                         mask_zero=True)
        self.flatten_layer = tf.keras.layers.Flatten()
        self.fc1_layer =  tf.keras.layers.Dense(128, activation='relu')
        self.fc2_layer =  tf.keras.layers.Dense(1, activation='sigmoid')
        
    def call(self, inputs):
        x = self.embedding_layer(inputs)
        x = self.flatten_layer(x)
        x = self.fc1_layer(x)
        return self.fc2_layer(x)
        
model = Model()


### [2] Functional API
Given some input tensor(s) and output tensor(s), you can also instantiate a `Model`.

With this approach, you essentially define a layer and immediately pass it the output of the previous layer.


In [None]:
inputs = tf.keras.layers.Input(shape=(42,))
x = tf.keras.layers.Embedding(input_dim=20000,
                              output_dim=50,
                              input_length=42,
                              mask_zero=True)(inputs)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

### [3] Sequential model API
A sequential model typically consists of just a few common layers stacked one after another, which makes it a kind of shortcut to a trainable model.

It is too inflexible if you wish to implement more sophisticated ideas.



In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000,
                              output_dim=50,
                              input_length=42,
                              mask_zero=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')])

## Two useful functions of TF

First, calling `model.summary()` prints a compact summary of the model and the number of parameters.

Second, by calling `tf.keras.utils.plot_model()` you get a graphical summary of the model.
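
As a rough illustration of the first function, the sketch below rebuilds a small model along the lines of the Sequential example and prints its summary. The explicit `tf.keras.Input` line and the omission of `mask_zero` are assumptions made to keep the example short, not part of the original model.

```python
import tensorflow as tf

# A small model mirroring the Sequential example (mask_zero left out for brevity).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(42,)),
    tf.keras.layers.Embedding(input_dim=20000, output_dim=50),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Prints one row per layer plus the total parameter count.
model.summary()

# The same total is available programmatically:
# 20000*50 (embedding) + 2100*128 + 128 (first dense) + 128 + 1 (second dense)
print(model.count_params())
```

`tf.keras.utils.plot_model(model)` takes the same model object, but it usually also requires the `pydot` package and Graphviz to be installed.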

## Training a NN in Keras

Before you can train a Keras model, it must be compiled by running the `model.compile()` function, which is also where you specify the loss function and optimizer.

```python
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
```

Keras models have a convenient `model.fit()` function for training a model (just like Scikit-Learn), which also takes care of batch processing and even evaluates the model during training (if you tell it to do so).

```python
model.fit(x=X, y=y, batch_size=32, epochs=5, verbose=2, validation_split=0.2)
```

## Keras Tutorial

This short introduction uses [Keras](https://www.tensorflow.org/guide/keras/overview) to:

1. Build a neural network that classifies images.
2. Train this neural network.
3. And, finally, evaluate the accuracy of the model.

In [None]:
import tensorflow as tf

Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:

`tf.keras.models.Sequential`

https://www.tensorflow.org/guide/keras/sequential_model

https://www.tensorflow.org/api_docs/python/tf/keras/Sequential

`tf.keras.layers.Flatten`

Flattens the input. Does not affect the batch size.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten

`tf.keras.layers.Dense`

Just your regular densely-connected NN layer.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense

`Dense` implements the operation:
- `output = activation(dot(input, kernel) + bias)`
- where `activation` is the element-wise activation function passed as the `activation` argument,
- `kernel` is a weights matrix created by the layer,
- `bias` is a bias vector created by the layer (only applicable if `use_bias` is `True`).
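
The operation above can be sketched in plain NumPy. `dense_forward` and `relu` are hypothetical helpers written for illustration, and the weights are random placeholders rather than anything Keras would initialize.

```python
import numpy as np

def dense_forward(inputs, kernel, bias, activation=None):
    """output = activation(dot(input, kernel) + bias)"""
    z = inputs @ kernel + bias
    return z if activation is None else activation(z)

def relu(z):
    # element-wise activation
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 784))            # batch of 2 flattened 28x28 images
kernel = rng.normal(size=(784, 128)) * 0.01   # placeholder weights matrix
bias = np.zeros(128)                          # placeholder bias vector

out = dense_forward(inputs, kernel, bias, activation=relu)
print(out.shape)  # (2, 128)
```

The batch dimension passes through unchanged; only the last dimension changes from 784 to the number of units, 128.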

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

For each example the model returns a vector of "[logits](https://developers.google.com/machine-learning/glossary#logits)" or "[log-odds](https://developers.google.com/machine-learning/glossary#log-odds)" scores, one for each class.

In [None]:
predictions = model(x_train[:1]).numpy()
predictions

The `tf.nn.softmax` function converts these logits to "probabilities" for each class: 

In [None]:
tf.nn.softmax(predictions).numpy()

Note: It is possible to bake this `tf.nn.softmax` in as the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.

The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits and a `True` index and returns a scalar loss for each example.

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class. It is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [None]:
loss_fn(y_train[:1], predictions).numpy()
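
The expected initial loss can also be checked by hand. The following NumPy sketch computes the negative log probability of the true class directly from logits; the function name is hypothetical and this is not how Keras implements the loss internally, only the same formula written out with a log-softmax.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Negative log-probability of the true class, computed from raw logits."""
    shifted = logits - np.max(logits)                       # guard against overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))   # log-softmax
    return -log_probs[label]

# Equal logits mean a uniform distribution over 10 classes.
uniform_logits = np.zeros(10)
uniform_loss = sparse_categorical_crossentropy_from_logits(uniform_logits, label=3)
print(uniform_loss)  # ≈ 2.3026, i.e. -log(1/10)

# Large logits would overflow a naive softmax, but the shifted form stays finite.
big_logits = np.array([1000.0, 999.0, 998.0])
big_loss = sparse_categorical_crossentropy_from_logits(big_logits, label=0)
print(np.isfinite(big_loss))
```

Working from logits rather than from softmax outputs is what allows the shift by the maximum logit, which is the kind of numerical stability the note about `from_logits=True` refers to.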

In [None]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

The `Model.fit` method adjusts the model parameters to minimize the loss: 

In [None]:
model.fit(x_train, y_train, epochs=5)

The `Model.evaluate` method checks the model's performance, usually on a "[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)" or "[Test-set](https://developers.google.com/machine-learning/glossary#test-set)".

In [None]:
model.evaluate(x_test,  y_test, verbose=2)

The image classifier is now trained to ~98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:

In [None]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

In [None]:
probability_model(x_test[:5])