
In [2]:
!pip uninstall tensorflow
!pip install tensorflow==2.0.0

Uninstalling tensorflow-1.15.0:
  Would remove:
    /usr/local/bin/estimator_ckpt_converter
    /usr/local/bin/freeze_graph
    /usr/local/bin/saved_model_cli
    /usr/local/bin/tensorboard
    /usr/local/bin/tf_upgrade_v2
    /usr/local/bin/tflite_convert
    /usr/local/bin/toco
    /usr/local/bin/toco_from_protos
    /usr/local/lib/python3.6/dist-packages/tensorflow-1.15.0.dist-info/*
    /usr/local/lib/python3.6/dist-packages/tensorflow/*
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/*
Proceed (y/n)? y
  Successfully uninstalled tensorflow-1.15.0
Collecting tensorflow==2.0.0
  Downloading https://files.pythonhosted.org/packages/46/0f/7bd55361168bb32796b360ad15a25de6966c9c1beb58a8e30c01c8279862/tensorflow-2.0.0-cp36-cp36m-manylinux2010_x86_64.whl (86.3MB)
     |████████████████████████████████| 86.3MB 118kB/s
Collecting tensorflow-estimator<2.1.0,>=2.0.0
  Downloading https://files.pythonhosted.org/packages/fc/08/8b927337b7019c374719145d1dceba21a8bb909b93

In [7]:
import tensorflow
print(tensorflow.__version__)

2.0.0


## **Step 1: define the model**

In [0]:
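from tensorflow.keras import Sequential
# an empty model; layers are then added with model.add(...)
model = Sequential()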

## **Step 2: compile the model**

In [0]:
"""
Select an algorithm to perform the optimisation procedure:

Adadelta: a stochastic gradient descent method based on an adaptive learning rate 
per dimension. It addresses two drawbacks of Adagrad: 1) the continual decay of 
learning rates throughout training, and 2) the need for a manually selected global 
learning rate.
Adagrad: an optimizer with parameter-specific learning rates, which are adapted 
according to how frequently a parameter is updated during training. The more 
updates a parameter receives, the smaller its updates become.
Adam: a stochastic gradient descent method based on adaptive estimation of the 
first-order and second-order moments of the gradients.
Adamax: a variant of Adam based on the infinity norm. Its default parameters follow 
those given in the Adam paper. Adamax is sometimes superior to Adam, 
especially in models with embeddings.
Ftrl: implements the FTRL ("Follow The Regularized Leader") algorithm, which is best 
suited to shallow models with large, sparse feature spaces.
Nadam: much as Adam is essentially RMSprop with momentum, Nadam is Adam with 
Nesterov momentum.
Optimizer: the base class that all of the optimizers above inherit from.
RMSprop: maintains a moving (discounted) average of the squared gradients and 
divides each gradient by the root of this average.
SGD: stochastic gradient descent, with optional momentum and Nesterov momentum.

Select a loss function to use with the optimizer:

BinaryCrossentropy: Computes the cross-entropy loss between true labels (0 or 1) and predicted probabilities.
CategoricalCrossentropy*: Computes the cross-entropy loss between one-hot encoded labels and predictions.
CategoricalHinge*: Computes the categorical hinge loss between y_true and y_pred.
CosineSimilarity: Computes the cosine similarity between y_true and y_pred.
Hinge*: Computes the hinge loss between y_true and y_pred.
Huber: Computes the Huber loss between y_true and y_pred.
KLDivergence: Computes the Kullback-Leibler divergence loss between y_true and y_pred.
LogCosh: Computes the logarithm of the hyperbolic cosine of the prediction error.
Loss: The base class for all losses.
MeanAbsoluteError: Computes the mean of the absolute differences between labels and predictions.
MeanAbsolutePercentageError: Computes the mean absolute percentage error between y_true and y_pred.
MeanSquaredError: Computes the mean of the squared errors between labels and predictions.
MeanSquaredLogarithmicError: Computes the mean squared logarithmic error between y_true and y_pred.
Poisson: Computes the Poisson loss between y_true and y_pred.
Reduction: Types of loss reduction, i.e. how per-example losses are combined (for example summed, or averaged over the batch); not a loss itself.
SparseCategoricalCrossentropy*: Computes the cross-entropy loss between integer labels and predictions.
SquaredHinge*: Computes the squared hinge loss between y_true and y_pred.

*The main difference between the hinge loss and the cross-entropy loss is that the former 
arises from trying to maximize the margin between the decision boundary and the data points, 
thus attempting to ensure that each point is classified both correctly and confidently, 
while the latter comes from a maximum likelihood estimate of the model's parameters. 
The softmax function, whose outputs are used by the cross-entropy loss, lets us 
interpret the model's scores as probabilities relative to one another.

Select a metric to monitor the algorithm with during training:

AUC: Computes the approximate AUC (Area under the curve) via a Riemann sum.
Accuracy: Calculates how often predictions match labels.
BinaryAccuracy: Calculates how often predictions match binary labels.
BinaryCrossentropy: Computes the crossentropy metric between the labels and predictions.
CategoricalAccuracy: Calculates how often predictions match one-hot labels.
CategoricalCrossentropy: Computes the crossentropy metric between the labels and predictions.
CategoricalHinge: Computes the categorical hinge metric between y_true and y_pred.
CosineSimilarity: Computes the cosine similarity between the labels and predictions.
FalseNegatives: Calculates the number of false negatives.
FalsePositives: Calculates the number of false positives.
Hinge: Computes the hinge metric between y_true and y_pred.
KLDivergence: Computes Kullback-Leibler divergence metric between y_true and y_pred.
LogCoshError: Computes the logarithm of the hyperbolic cosine of the prediction error.
Mean: Computes the (weighted) mean of the given values.
MeanAbsoluteError: Computes the mean absolute error between the labels and predictions.
MeanAbsolutePercentageError: Computes the mean absolute percentage error between y_true and y_pred.
MeanIoU: Computes the mean Intersection-Over-Union metric.
MeanRelativeError: Computes the mean relative error by normalizing with the given values.
MeanSquaredError: Computes the mean squared error between y_true and y_pred.
MeanSquaredLogarithmicError: Computes the mean squared logarithmic error between y_true and y_pred.
MeanTensor: Computes the element-wise (weighted) mean of the given tensors.
Metric: Encapsulates metric logic and state.
Poisson: Computes the Poisson metric between y_true and y_pred.
Precision: Computes the precision of the predictions with respect to the labels.
Recall: Computes the recall of the predictions with respect to the labels.
RootMeanSquaredError: Computes root mean squared error metric between y_true and y_pred.
SensitivityAtSpecificity: Computes the sensitivity at a given specificity.
SparseCategoricalAccuracy: Calculates how often predictions match integer labels.
SparseCategoricalCrossentropy: Computes the crossentropy metric between the labels and predictions.
SparseTopKCategoricalAccuracy: Computes how often integer targets are in the top K predictions.
SpecificityAtSensitivity: Computes the specificity at a given sensitivity.
SquaredHinge: Computes the squared hinge metric between y_true and y_pred.
Sum: Computes the (weighted) sum of the given values.
TopKCategoricalAccuracy: Computes how often targets are in the top K predictions.
TrueNegatives: Calculates the number of true negatives.
TruePositives: Calculates the number of true positives.
"""
from tensorflow.keras.optimizers import SGD
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
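To make the `SGD(learning_rate=0.01, momentum=0.9)` plus `binary_crossentropy` pairing concrete, here is a minimal NumPy sketch (not the Keras implementation) that trains a logistic-regression model with the momentum update rule described in the Keras `SGD` documentation: `velocity = momentum * velocity - learning_rate * gradient`, then `w = w + velocity`. For simplicity it uses full-batch gradients and a synthetic, hypothetical toy dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

def train_logistic_momentum(X, y, learning_rate=0.01, momentum=0.9, steps=500):
    """Full-batch gradient descent with momentum on a logistic-regression model."""
    w = np.zeros(X.shape[1])
    b = 0.0
    v_w = np.zeros_like(w)   # velocity for the weights
    v_b = 0.0                # velocity for the bias
    history = []
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        history.append(binary_crossentropy(y, p))
        # gradient of the mean BCE with respect to the logits is (p - y) / n
        grad_z = (p - y) / len(y)
        grad_w = X.T @ grad_z
        grad_b = grad_z.sum()
        # momentum update: velocity = momentum * velocity - lr * grad; param += velocity
        v_w = momentum * v_w - learning_rate * grad_w
        v_b = momentum * v_b - learning_rate * grad_b
        w = w + v_w
        b = b + v_b
    return w, b, history

# toy, linearly separable data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b, history = train_logistic_momentum(X, y)
print(f"initial loss {history[0]:.3f}, final loss {history[-1]:.3f}")
```

Starting from zero weights every prediction is 0.5, so the first loss is log(2); the momentum term then accumulates consistent gradient directions and speeds up the descent.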

## **Step 3: fit the model**

In [0]:
"""
Neural networks are trained using gradient descent, where the estimate of the error 
gradient used to update the weights is calculated from a subset of the training dataset.

The number of training examples used in each estimate of the error gradient is called 
the batch size. It is an important hyperparameter that influences the dynamics of the 
learning algorithm:

1) Batch size controls the accuracy of the estimate of the error gradient: larger 
batches give a less noisy estimate.
2) Batch (batch size = the whole training set), stochastic (batch size = 1) and 
minibatch (anything in between) gradient descent are the three main flavors of the 
learning algorithm.
3) There is a tension between batch size and the speed and stability of the learning 
process: small batches give frequent but noisy updates, large batches give fewer but 
more stable ones.
"""
# X, y: the training inputs and targets
model.fit(X, y, epochs=100, batch_size=32, verbose=2)
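A sketch of how a training set is carved into minibatches for one epoch, assuming the rows are reshuffled every epoch as `fit` does by default (`shuffle=True`). With 235 training rows (the size of the ionosphere training split below) and `batch_size=32`, one epoch performs 8 weight updates, the last on a batch of 11. The `iterate_minibatches` helper is hypothetical, written only for illustration.

```python
import math
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering one epoch; the last batch may be smaller."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

n_samples, batch_size = 235, 32
X = np.zeros((n_samples, 34), dtype="float32")   # placeholder data
y = np.zeros(n_samples)
batches = list(iterate_minibatches(X, y, batch_size, np.random.default_rng(0)))
print(len(batches), math.ceil(n_samples / batch_size))   # 8 updates per epoch
print([len(xb) for xb, _ in batches])                    # seven batches of 32, then one of 11
```

Setting `batch_size=1` turns this into stochastic gradient descent, and `batch_size=len(X)` into batch gradient descent.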

## **Step 4: evaluate the model**

In [0]:
"""
This should be data not used in the training process, so that we get an 
unbiased estimate of the model's performance when making predictions 
on new data.
"""
# X_test, y_test: a hold-out set not used during training
# with metrics=['accuracy'], evaluate() returns [loss, accuracy]
loss, acc = model.evaluate(X_test, y_test, verbose=0)
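A hold-out set is simply a shuffled subset of rows kept apart from the training rows. The hypothetical `holdout_split` helper below is a NumPy sketch of that idea; it rounds a fractional test size up, which matches how scikit-learn's `train_test_split` sizes the test set (351 ionosphere rows with `test_size=0.33` give the 235/116 split printed further down).

```python
import math
import numpy as np

def holdout_split(n_samples, test_size, seed=0):
    """Return (train_idx, test_idx) after shuffling; the test set size is rounded up."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = math.ceil(test_size * n_samples)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(351, test_size=0.33)
print(len(train_idx), len(test_idx))   # 235 116
```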

## **Method 1 example: sequential model building API**

In [0]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# define the model
model = Sequential()
model.add(Dense(100, input_shape=(8,)))
model.add(Dense(80))
model.add(Dense(30))
model.add(Dense(10))
model.add(Dense(5))
model.add(Dense(1))

## **Method 2 example: functional model building API**

In [0]:
from tensorflow.keras import Model
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
# define the layers
x_in = Input(shape=(8,))
x = Dense(10)(x_in)
x_out = Dense(1)(x)
# define the model
model = Model(inputs=x_in, outputs=x_out)
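Under either API, a `Dense` layer computes `activation(x @ kernel + bias)`, and its default activation is linear. The NumPy sketch below runs a forward pass through layers shaped like the functional model (8 inputs, 10 units, 1 output) with random, hypothetical weights, and checks that without activations the two layers collapse into a single linear map. This is why the MLP examples below pass `activation='relu'` to their hidden layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, kernel, bias, activation=None):
    """One fully connected layer: activation(x @ kernel + bias); linear when activation is None."""
    z = x @ kernel + bias
    return z if activation is None else activation(z)

# hypothetical weights with the same shapes as the functional model: 8 -> 10 -> 1
W1, b1 = rng.normal(size=(8, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)

x_in = rng.normal(size=(4, 8))        # a batch of 4 examples with 8 features
x = dense(x_in, W1, b1)
x_out = dense(x, W2, b2)
print(x_out.shape)                    # (4, 1)

# with no activations, the two layers equal one linear map x @ (W1 @ W2) + (b1 @ W2 + b2)
collapsed = x_in @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(x_out, collapsed))  # True
```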

# **Deep Learning model examples: MLP**

In [0]:
# The ionosphere dataset used below involves predicting whether radar returns show evidence of structure in the ionosphere.
"""
MLP (Multilayer Perceptron): a standard fully connected neural network model.

It is composed of layers of nodes, where each node is connected to all outputs 
from the previous layer and the output of each node is connected to all inputs of 
the nodes in the next layer.

An MLP is created with one or more Dense layers. This model is appropriate for 
tabular data, that is, data as it looks in a table or spreadsheet, with one column 
for each variable and one row for each observation. There are three predictive modeling 
problems you may want to explore with an MLP: binary classification, 
multiclass classification, and regression.
"""
# mlp for binary classification
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from sklearn.metrics import classification_report

In [3]:
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode strings to integer
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]

(235, 34) (116, 34) (235,) (116,)


In [0]:
# define model
"""
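activation function - the non-linearity applied to each layer's output: ReLU in the 
hidden layers, and a sigmoid in the output layer so that the prediction is the 
probability of class 1.
kernel initializer - how the layer's weights are set before training; 'he_normal' 
draws them from a truncated normal distribution with standard deviation 
sqrt(2 / fan_in), which suits ReLU layers.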
"""
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
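A NumPy sketch of what `activation` and `kernel_initializer` do in the model above: ReLU in the hidden layers, a sigmoid on the output, and He-style initial weights with standard deviation sqrt(2 / fan_in). It is an approximation, not Keras' code: `he_normal` samples from a truncated normal, whereas the sketch uses a plain normal, random stand-in inputs, and (for brevity) He weights in the output layer too.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, used by the hidden layers."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squashes a score into (0, 1), used by the output layer."""
    return 1.0 / (1.0 + np.exp(-z))

def he_normal_sketch(fan_in, fan_out, rng):
    """Simplified He initialisation: zero-mean normal with std sqrt(2 / fan_in).
    Keras' 'he_normal' samples from a truncated normal instead."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
n_features = 34                              # number of ionosphere input columns
W1 = he_normal_sketch(n_features, 10, rng)   # Dense(10, relu)
W2 = he_normal_sketch(10, 8, rng)            # Dense(8, relu)
W3 = he_normal_sketch(8, 1, rng)             # Dense(1, sigmoid); Keras uses its default initializer here

x = rng.normal(size=(5, n_features))         # 5 random stand-in rows
p = sigmoid(relu(relu(x @ W1) @ W2) @ W3)    # biases omitted (Keras initialises them to zero)
print(p.shape)                               # (5, 1): one probability of class 1 per row
```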

In [0]:
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [0]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

In [16]:
# evaluate the model
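# threshold the sigmoid probabilities at 0.5 (what predict_classes did; it was removed in later TensorFlow releases)
y_predictions = (model.predict(X_test) > 0.5).astype('int32').ravel()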
print(classification_report(y_test,y_predictions))

              precision    recall  f1-score   support

           0       0.97      0.80      0.88        41
           1       0.90      0.99      0.94        75

    accuracy                           0.92       116
   macro avg       0.94      0.90      0.91       116
weighted avg       0.93      0.92      0.92       116
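Each row of the classification report can be reproduced from counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. `support` is the number of true examples of the class; the `macro avg` row is the unweighted mean over classes, and `weighted avg` weights each class by its support. A minimal sketch with a hypothetical `per_class_scores` helper and toy labels:

```python
import numpy as np

def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(per_class_scores(y_true, y_pred, 1))   # precision, recall and F1 all 2/3
print(per_class_scores(y_true, y_pred, 0))   # precision, recall and F1 all 0.5
```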



In [17]:
# mlp for multiclass classification
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode strings to integer
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]

(100, 4) (50, 4) (100,) (50,)


In [0]:
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(3, activation='softmax'))

In [0]:
# compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
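The only difference between `sparse_categorical_crossentropy` and `categorical_crossentropy` is the label format: integers (as produced by `LabelEncoder` here) versus one-hot vectors. A NumPy sketch of the softmax output layer and the sparse loss, using hypothetical logits:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    """Mean negative log-probability of the true class; labels are integers 0..k-1."""
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.log(probs[np.arange(len(y_int)), y_int]))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
y_int = np.array([0, 2])          # integer labels, one per row
probs = softmax(logits)
print(probs.sum(axis=1))          # each row sums to 1
print(sparse_categorical_crossentropy(y_int, probs))
```

Taking `probs.argmax(axis=1)` gives the predicted class for each row.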

In [0]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

In [21]:
# evaluate the model
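# pick the class with the highest softmax probability (what predict_classes did; it was removed in later TensorFlow releases)
y_predictions = model.predict(X_test).argmax(axis=-1)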
print(classification_report(y_test,y_predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       0.95      0.95      0.95        19
           2       0.95      0.95      0.95        19

    accuracy                           0.96        50
   macro avg       0.96      0.96      0.96        50
weighted avg       0.96      0.96      0.96        50



In [22]:
# mlp for regression
from numpy import sqrt
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]

(339, 13) (167, 13) (339,) (167,)


In [0]:
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))

In [0]:
# compile the model
model.compile(optimizer='adam', loss='mse')

In [0]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

In [27]:
# evaluate the model
error = model.evaluate(X_test, y_test, verbose=0)
print('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error)))

MSE: 36.022, RMSE: 6.002


# **Deep Learning model examples: CNN**

In [2]:
"""
Convolutional Neural Networks (CNNs) are a type of network designed for image input. 
They are composed of convolutional layers, which extract features (called feature maps), 
and pooling layers, which distill the features down to their most salient elements.
"""
# example of a cnn for image classification
from numpy import unique
from numpy import argmax
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
# load dataset
(x_train, y_train), (x_test, y_test) = load_data()
# reshape data to have a single channel
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2], 1))
# determine the shape of the input images
in_shape = x_train.shape[1:]
# determine the number of classes
n_classes = len(unique(y_train))
print(in_shape, n_classes)
# normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(28, 28, 1) 10


In [0]:
# define model
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
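A NumPy sketch of what one `Conv2D` filter and `MaxPooling2D((2, 2))` do to a single-channel image, assuming Keras' defaults of stride 1 and `'valid'` padding. It also shows the shape arithmetic behind the model: a 3x3 filter turns a 28x28 image into a 26x26 feature map, pooling halves that to 13x13, and with 32 filters `Flatten` passes 13 * 13 * 32 = 5408 values to the `Dense(100)` layer. The loops favour clarity over speed, and the random image is a stand-in.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation (one Conv2D filter, before bias and activation)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a whole window are dropped."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))   # stand-in for one MNIST digit
fmap = conv2d_valid(image, np.ones((3, 3)) / 9.0)    # a simple averaging filter
pooled = max_pool2d(fmap)
print(fmap.shape, pooled.shape)                      # (26, 26) (13, 13)
print(13 * 13 * 32)                                  # 5408 values reach Flatten's output
```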

In [0]:
# define loss and optimizer
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [5]:
# fit the model
model.fit(x_train, y_train, epochs=10, batch_size=128, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f521f81f438>

In [6]:
# evaluate the model
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print('Accuracy: %.3f' % acc)

Accuracy: 0.987


# **Deep Learning model examples: RNN**

In [8]:
"""
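Recurrent Neural Networks, or RNNs for short, are designed to operate on sequences of data.
The Long Short-Term Memory (LSTM) network used below is a popular type of RNN whose 
gated memory cell helps it learn longer-range dependencies in a sequence.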
"""

# lstm for time series forecasting
from numpy import sqrt
from numpy import asarray
from pandas import read_csv
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
 
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return asarray(X), asarray(y)
 
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
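# squeeze('columns') turns the single-column DataFrame into a Series
# (read_csv's squeeze=True argument was removed in pandas 2.0)
df = read_csv(path, header=0, index_col=0).squeeze('columns')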
# retrieve the values
values = df.values.astype('float32')
# specify the window size
n_steps = 5
# split into samples
X, y = split_sequence(values, n_steps)
# reshape into [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
# split into train/test
n_test = 12
X_train, X_test, y_train, y_test = X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(91, 5, 1) (12, 5, 1) (91,) (12,)


In [0]:
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', kernel_initializer='he_normal', input_shape=(n_steps,1)))
model.add(Dense(50, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(50, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1))
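A NumPy sketch of the recurrence inside the `LSTM(100)` layer, assuming the standard gate equations with weights packed in Keras' [input, forget, cell, output] order. At each of the `n_steps` time steps the gates decide what to forget, what to write to the memory cell and what to expose as the hidden state; only the final hidden state (100 values) is passed to the first `Dense` layer. The `activation` argument stands in for the layer's `activation='relu'`, which replaces the default tanh; the gates keep their sigmoid. The weights and input window are random and hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, activation=np.tanh):
    """Run one LSTM layer over x_seq (timesteps, features) and return the final hidden state.

    W: (features, 4*units), U: (units, 4*units), b: (4*units,), gates packed as [i, f, c, o].
    """
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = activation(z[2 * units:3 * units])  # candidate cell values
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g                       # update the memory cell
        h = o * activation(c)                   # new hidden state
    return h

rng = np.random.default_rng(0)
n_steps, n_feat, units = 5, 1, 100
W = rng.normal(0, 0.1, size=(n_feat, 4 * units))
U = rng.normal(0, 0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
window = rng.normal(size=(n_steps, n_feat))   # one sample shaped like X_train[0]
h_final = lstm_forward(window, W, U, b)
print(h_final.shape)                          # (100,): what the first Dense(50) receives
```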

In [0]:
# compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

In [0]:
# fit the model
model.fit(X_train, y_train, epochs=350, batch_size=32, verbose=2, validation_data=(X_test, y_test))

In [14]:
# evaluate the model
mse, mae = model.evaluate(X_test, y_test, verbose=0)
print('MSE: %.3f, RMSE: %.3f, MAE: %.3f' % (mse, sqrt(mse), mae))

MSE: 16946080.000, RMSE: 4116.562, MAE: 3303.697


### **How to visualise**

In [0]:
model.summary()
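`model.summary()` prints each layer's output shape and parameter count. The counts for the LSTM model above can be checked by hand: a `Dense` layer has `inputs * units + units` parameters, and an `LSTM` layer has four gates, each with input weights, recurrent weights and a bias, giving `4 * (units * (inputs + units) + units)`. The layer names in this sketch are illustrative; Keras generates its own.

```python
def dense_params(n_in, units):
    """Weights plus one bias per unit."""
    return n_in * units + units

def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (n_in + units) + units)

layers = [
    ("lstm",    lstm_params(1, 100)),     # LSTM(100) on 1 feature per time step
    ("dense",   dense_params(100, 50)),   # Dense(50)
    ("dense_1", dense_params(50, 50)),    # Dense(50)
    ("dense_2", dense_params(50, 1)),     # Dense(1)
]
for name, count in layers:
    print(f"{name:<8} {count:>6}")
print("Total params:", sum(count for _, count in layers))   # 48451
```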

### **How to Reduce Overfitting With Dropout**

In [0]:
"""
Dropout is a regularization method that reduces overfitting to the training 
dataset and makes the model more robust. During training, a randomly chosen fraction 
of a layer's outputs is ignored, or "dropped out". This makes the layer look like, 
and be treated like, a layer with a different number of nodes and different 
connectivity to the prior layer. Dropout makes the training process noisy, forcing nodes 
within a layer to probabilistically take on more or less responsibility for the inputs.
In Keras it is added as a Dropout layer (e.g. Dropout(0.5) in the CNN example above), 
which is only active during training.
"""

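Keras' `Dropout` layer uses inverted dropout: during training each input is set to 0 with probability `rate` and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged; at inference the layer passes inputs through untouched. A NumPy sketch of that behaviour, with a hypothetical `dropout` helper and the same rate as the CNN example's `Dropout(0.5)`:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x                              # identity at inference time
    keep_mask = rng.random(x.shape) >= rate   # each unit is kept with probability 1 - rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
print(np.unique(train_out))   # [0. 2.]: each unit is either dropped or scaled up
print(train_out.mean())       # close to 1.0, so the expected value is preserved
print(np.array_equal(dropout(activations, 0.5, False, rng), activations))   # True
```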
### **How to Accelerate Training With Batch Normalization**

In [0]:
"""
Batch normalization is a technique for training very deep neural networks that 
standardizes the inputs to a layer for each mini-batch. This stabilizes the 
learning process and can substantially reduce the number of training epochs 
required to train deep networks.
"""
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
# define model (n_features is the number of input columns, as in the MLP examples)
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
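A NumPy sketch of the training-mode computation inside `BatchNormalization`: each feature is standardised with the mini-batch mean and variance, then scaled by a learned `gamma` and shifted by a learned `beta`. The epsilon of 1e-3 matches Keras' default. At inference time the layer uses moving averages of the batch statistics instead, which this sketch omits; the batch here is random stand-in data.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, epsilon=1e-3):
    """Normalise each feature over the mini-batch, then scale by gamma and shift by beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 10))   # 32 rows of 10 activations
out = batch_norm_train(batch, gamma=np.ones(10), beta=np.zeros(10))
print(out.mean(axis=0).round(6))   # about 0 for every feature
print(out.std(axis=0).round(3))    # about 1 for every feature
```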