# TensorFlow

- TensorFlow graphs have edges and nodes. Edges are tensors and nodes are operations.
- You can define constants and variables. They can be 0-dimensional (scalars), 1-dimensional (vectors), 2-dimensional (matrices), or higher-dimensional tensors.
- **add()** performs element-wise addition; both tensors must have the same shape (or shapes that broadcast).
- **multiply()** performs element-wise multiplication; the tensors must have the same shape (or shapes that broadcast).
- **matmul()** performs matrix multiplication. For matmul(A, B), the number of columns of A must equal the number of rows of B.
- **reduce_sum(A)** sums over all dimensions of A. reduce_sum(A, i) sums over dimension i (e.g. 0, 1 or 2).

- **gradient()** calculates the gradient of a function at a point (in TensorFlow 2 this is done with tf.GradientTape). It can be used to find optimum values (minimum or maximum).
- **reshape()** reshapes a tensor.
- **random()** fills a tensor with entries drawn from a probability distribution (e.g. random.normal()).
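A minimal sketch of these operations in TensorFlow 2.x, using made-up values. The tensors a, b, c and x are illustrative only; the gradient is computed with tf.GradientTape, which is how TF 2 exposes the gradient() operation.

```python
import tensorflow as tf

# Constants of different shapes
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # 2x2
b = tf.ones((2, 2))                        # 2x2 of ones

# Element-wise operations: both operands have the same shape
added = tf.add(a, b)            # [[2, 3], [4, 5]]
multiplied = tf.multiply(a, b)  # same values as a

# Matrix multiplication: columns of a (2) == rows of c (2)
c = tf.constant([[1.0], [1.0]])  # 2x1
product = tf.matmul(a, c)        # 2x1: the row sums of a

# reduce_sum over all dimensions, then over dimension 0 only
total = tf.reduce_sum(a)        # 10.0
col_sums = tf.reduce_sum(a, 0)  # [4.0, 6.0]

# Reshape a 2x2 tensor into a 4x1 tensor
reshaped = tf.reshape(a, (4, 1))

# Fill a 3x2 tensor with samples from a normal distribution
noise = tf.random.normal([3, 2], mean=0.0, stddev=1.0)

# Gradient of y = x^2 at x = -1 is 2x = -2
x = tf.Variable(-1.0)
with tf.GradientTape() as tape:
    y = tf.multiply(x, x)
grad = tape.gradient(y, x)

print(added.numpy(), product.numpy(), total.numpy(), col_sums.numpy(), grad.numpy())
```

Because the gradient is zero at a minimum or maximum, following it (as the optimizers below do) is how these optimum values are found.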

In [None]:
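from tensorflow import reshape, matmul, reduce_sum

# letter (a tensor with 3 columns) and model (a 1x3 tensor) are assumed to be defined earlier
# Reshape model from a 1x3 to a 3x1 tensor
model = reshape(model, (3, 1))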

# Multiply letter by model
output = matmul(letter, model)

# Sum over output and print prediction using the numpy method
prediction = reduce_sum(output)
print(prediction.numpy())

'''Linear Regression'''
import tensorflow as tf
from tensorflow import keras

# size_log and price_log (log-transformed features and targets) are assumed to be loaded already
intercept = tf.Variable(0.1)
slope = tf.Variable(0.1)

# Define a linear regression model
def linear_regression(intercept, slope, features=size_log):
    return intercept + slope * features

# Set loss_function() to take the variables as arguments
def loss_function(intercept, slope, features=size_log, targets=price_log):
    # Set the predicted values
    predictions = linear_regression(intercept, slope, features)
    # Return the mean squared error loss
    return keras.losses.mse(targets, predictions)

# Initialize an Adam optimizer with a learning rate of 0.5
opt = keras.optimizers.Adam(0.5)

for j in range(100):
    # Apply minimize, pass the loss function, and supply the variables
    opt.minimize(lambda: loss_function(intercept, slope), var_list=[intercept, slope])

# Plot data and regression line (plot_results is a plotting helper defined elsewhere)
plot_results(intercept, slope)

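'''Deep Learning'''
##### Low Level
from tensorflow import Variable, ones, matmul, keras

# From previous step (borrower_features is assumed to be a float32 tensor with 3 columns)
bias1 = Variable(1.0)
weights1 = Variable(ones((3, 2)))
product1 = matmul(borrower_features, weights1)
dense1 = keras.activations.sigmoid(product1 + bias1)

# Initialize bias2 and weights2
bias2 = Variable(1.0)
weights2 = Variable(ones((2, 1)))

# Perform matrix multiplication of dense1 and weights2
product2 = matmul(dense1, weights2)

# Apply sigmoid to get the output layer's prediction
prediction = keras.activations.sigmoid(product2 + bias2)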
             
############ High level
             
# Define the first dense layer
dense1 = keras.layers.Dense(7, activation='sigmoid')(borrower_features)

# Define a dense layer with 3 output nodes
dense2 = keras.layers.Dense(3, activation='sigmoid')(dense1)

# Define a dense layer with 1 output node
predictions = keras.layers.Dense(1, activation='sigmoid')(dense2)

## Deep learning with TensorFlow:
- The high-level approach uses the Keras API and simpler function calls such as: keras.layers.Dense(# of nodes, activation='function')
- The low-level approach uses linear algebra functions such as: prod = matmul(inputs, weights), dense = keras.activations.sigmoid(prod)

**Common activation functions**
- Sigmoid: Primarily used in the output layer of binary classification problems. Output is between 0 and 1 (can be interpreted as a probability).
- tanh: Similar in shape to sigmoid, but output is between -1 and 1.
- ReLU (Rectified Linear Unit): Mostly used in hidden layers. Output ranges from 0 to infinity; never negative.
- leaky_relu: Like ReLU, but lets a small, scaled-down portion of negative values through.
- Softmax: Used in the output layer when there are more than 2 classes. Ensures the outputs sum to one, so they can be interpreted as probabilities.
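To see the output ranges listed above, here is a small sketch that applies each activation to the same hypothetical pre-activation values. The leaky_relu slope of 0.1 is an arbitrary choice for illustration.

```python
import tensorflow as tf

# Hypothetical pre-activation values for three units
z = tf.constant([-2.0, 0.0, 3.0])

sig = tf.keras.activations.sigmoid(z)   # squashed into (0, 1); sigmoid(0) = 0.5
th = tf.keras.activations.tanh(z)       # squashed into (-1, 1)
relu = tf.keras.activations.relu(z)     # negative values clipped to 0
leaky = tf.nn.leaky_relu(z, alpha=0.1)  # negative values scaled by 0.1 instead of clipped

# Keras softmax expects at least 2-D input (batch, classes), so add a batch dimension
probs = tf.keras.activations.softmax(tf.reshape(z, (1, 3)))  # entries sum to 1

for name, values in [("sigmoid", sig), ("tanh", th), ("relu", relu),
                     ("leaky_relu", leaky), ("softmax", probs)]:
    print(name, values.numpy().round(3))
```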

**Three most common optimizers**
- SGD: Stochastic Gradient Descent. The learning rate needs to be tuned.
- RMSprop (Root Mean Square Propagation) optimizer:
    - Allows a different learning rate for each feature. Good for high-dimensional problems.
    - Lets you build **momentum** (momentum parameter) and allows **decay** (rho parameter)
- Adam:
    - Adaptive moment estimation
    - Good first choice
    - Can set a faster decay rate by lowering the **beta_1** parameter
    - Performs well (compared to RMSprop) with its default parameter values, which are commonly used.
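As a sketch of how the three optimizers and their parameters are set in Keras, the snippet below minimizes the toy function f(x) = (x - 3)^2 with each one. The helper minimize_quadratic and all hyperparameter values are assumptions chosen for illustration, not tuned recommendations.

```python
import tensorflow as tf

def minimize_quadratic(optimizer, steps=500):
    """Minimize f(x) = (x - 3)^2 from x = 0 with the given optimizer; return the final x."""
    x = tf.Variable(0.0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = (x - 3.0) ** 2
        grads = tape.gradient(loss, [x])
        optimizer.apply_gradients(list(zip(grads, [x])))
    return float(x.numpy())

# Illustrative hyperparameters, not tuned recommendations
optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.1),
    # rho = decay rate of the moving average of squared gradients
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=0.01, rho=0.9, momentum=0.5),
    # beta_1 lowered from its 0.9 default: faster decay of the first-moment estimate
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.1, beta_1=0.8),
}

results = {name: minimize_quadratic(opt) for name, opt in optimizers.items()}
print(results)  # each value ends up near the minimum at x = 3
```

Each optimizer instance is used for exactly one variable here, since Keras optimizers keep per-variable state (moments, velocities).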

In [None]:
from tensorflow import constant, float32, keras

# Construct input layer from borrower features (inputs_arrays is assumed to be a NumPy array)
inputs = constant(inputs_arrays, float32)

# Define first dense layer
dense1 = keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = keras.layers.Dense(8, activation='relu')(dense1)

# Define output layer
outputs = keras.layers.Dense(6, activation='softmax')(dense2)

# Print first five predictions
print(outputs.numpy()[:5])

**Initialization** is important in deep learning. Some methods include:
- tf.random.normal([500, 500]): use a normal distribution to initialize the weight variables
- The default Dense layer in Keras (keras.layers.Dense(#, activation='func')) uses the Glorot uniform initializer
- Use the kernel_initializer parameter to choose another initializer. For example, to initialize with zeros: keras.layers.Dense(#, activation='func', kernel_initializer='zeros')
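A short sketch comparing these initialization options. The input batch of ones and the layer size of 4 are hypothetical; the layers are called once so that Keras builds (and initializes) their weights.

```python
import tensorflow as tf

# Weights initialized from a standard normal distribution
w = tf.Variable(tf.random.normal([500, 500]))

inputs = tf.ones((2, 10))  # hypothetical batch: 2 samples, 10 features

# Default Dense layer: Glorot uniform kernel, zero bias
default_layer = tf.keras.layers.Dense(4, activation="relu")
# Same layer, but with the kernel initialized to zeros
zeros_layer = tf.keras.layers.Dense(4, activation="relu", kernel_initializer="zeros")

# Calling a layer on data builds its weights
default_out = default_layer(inputs)
zeros_out = zeros_layer(inputs)

print(type(default_layer.kernel_initializer).__name__)  # class of the default initializer
print(zeros_layer.kernel.numpy().max())                 # every kernel entry is 0
```

An all-zero kernel makes every unit in the layer compute the same output, which is one reason random initializers such as Glorot uniform are the default.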

To avoid overfitting, you can use **dropout**, which randomly drops certain nodes (and the weights connected to them) during training. This can improve out-of-sample performance.
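A minimal sketch of what a Keras Dropout layer does in training versus inference mode, using a hypothetical tensor of ones as the previous layer's activations.

```python
import tensorflow as tf

x = tf.ones((4, 10))  # hypothetical activations from a previous layer
dropout = tf.keras.layers.Dropout(0.5)

# Training: roughly half the units are set to 0 and the rest are scaled by 1 / (1 - 0.5) = 2,
# so the expected sum of the activations stays the same
train_out = dropout(x, training=True)

# Inference: dropout does nothing
infer_out = dropout(x, training=False)

print(train_out.numpy())
print(infer_out.numpy())
```

When a Dropout layer is called directly (outside model.fit), passing training=True explicitly is what turns dropout on.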

In [None]:
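from tensorflow import Variable, random, ones, matmul, keras

# Define the layer 1 weights (borrower_features is assumed to have 23 columns)
w1 = Variable(random.normal([23, 7]))

# Initialize the layer 1 bias
b1 = Variable(ones([7]))

# Define the layer 2 weights
w2 = Variable(random.normal([7, 1]))

# Define the layer 2 bias (a float, so it can be added to float32 tensors)
b2 = Variable([0.0])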

def model(w1, b1, w2, b2, features=borrower_features, training=True):
    # Apply relu activation functions to layer 1
    layer1 = keras.activations.relu(matmul(features, w1) + b1)
    # Apply dropout rate of 0.25 (only active when training=True)
    dropout = keras.layers.Dropout(0.25)(layer1, training=training)
    return keras.activations.sigmoid(matmul(dropout, w2) + b2)

# Define the loss function
def loss_function(w1, b1, w2, b2, features=borrower_features, targets=default):
    predictions = model(w1, b1, w2, b2, features)
    # Pass targets and predictions to the binary cross-entropy loss
    return keras.losses.binary_crossentropy(targets, predictions)

# A fresh optimizer for these variables (the learning rate of 0.01 is an assumption)
opt = keras.optimizers.Adam(learning_rate=0.01)

# Train the model
for j in range(100):
    opt.minimize(lambda: loss_function(w1, b1, w2, b2),
                 var_list=[w1, b1, w2, b2])

# Make predictions with model using test features (dropout disabled)
model_predictions = model(w1, b1, w2, b2, test_features, training=False)

# Construct the confusion matrix
confusion_matrix(test_targets, model_predictions)

### Defining a model with the Sequential API
The easiest way to build a neural network:
- from tensorflow.keras.utils import to_categorical for one-hot encoding of the output column
- model = keras.Sequential()
- model.add(layer) for each layer
- model.add(layers.Dropout(0.#)) to drop a percentage of units and avoid overfitting
- model.summary() to review the architecture
- model.compile(optimizer='optimizer_name', loss='loss_func') to set the optimizer and loss function before training
- model.get_weights() to get the weights of the model at any given time

Instead of the Sequential API, the functional API can combine two models that predict the same set of outputs.

- model.fit(features, labels, epochs=#, validation_split=0.#, callbacks=[early_stopping_monitor]) to train the model

### Fine tuning
- Can fine-tune models by defining your own optimizers: import SGD (constant learning rate) from keras.optimizers, or modify the parameters of other optimizers.
- **Callbacks** can stop training for different reasons. Early stopping stops training when a monitored metric (e.g. validation accuracy) is no longer improving.
    - from tensorflow.keras.callbacks import EarlyStopping
- **Checkpoints** can be used to store parameters between steps, especially to store the best values.
    - from tensorflow.keras.callbacks import ModelCheckpoint
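The pieces above can be combined as in the sketch below: one-hot labels with to_categorical, an SGD optimizer with a custom learning rate, and EarlyStopping plus ModelCheckpoint callbacks during fit. The synthetic data, layer sizes, patience and file name are all hypothetical. The checkpoint file ends in .weights.h5 together with save_weights_only=True, a combination accepted by both older tf.keras and Keras 3.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

# Synthetic data (hypothetical): 200 samples, 8 features, 3 classes
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8)).astype("float32")
labels = to_categorical(rng.integers(0, 3, size=200), num_classes=3)  # one-hot

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
callbacks = [
    # Stop when validation loss has not improved for 3 epochs
    EarlyStopping(monitor="val_loss", patience=3),
    # Keep only the weights from the epoch with the lowest validation loss
    ModelCheckpoint(checkpoint_path, monitor="val_loss",
                    save_best_only=True, save_weights_only=True),
]
history = model.fit(features, labels, epochs=20, validation_split=0.2,
                    callbacks=callbacks, verbose=0)
print(len(history.history["loss"]), "epochs run")
```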

### Saving and loading a model
- model.save('model_name.h5')
- To load a model: from tensorflow.keras.models import load_model, then model = load_model('model_name.h5')
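A minimal round trip, assuming a small hypothetical model and a temporary directory: save to an HDF5 file, load it back with load_model, and check that both models give the same predictions.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

path = os.path.join(tempfile.mkdtemp(), "model_name.h5")
model.save(path)            # architecture and weights in one file
restored = load_model(path)

x = np.ones((2, 4), dtype="float32")
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```

Newer Keras versions prefer the native .keras format and treat .h5 as a legacy format, but both can be loaded with load_model.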

In [None]:
'''Example'''
from tensorflow import keras

# Define a Keras sequential model
model = keras.models.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Apply dropout to the first layer's output
model.add(keras.layers.Dropout(0.25))

# Define the second dense layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Print the model architecture (summary() prints directly and returns None)
model.summary()

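'''Using 2 models in functional form'''
# Input layers for both models (784 input features is an assumption, matching the example above)
m1_inputs = keras.Input(shape=(784,))
m2_inputs = keras.Input(shape=(784,))

# For model 1, pass the input layer to layer 1 and layer 1 to layer 2
m1_layer1 = keras.layers.Dense(12, activation='sigmoid')(m1_inputs)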
m1_layer2 = keras.layers.Dense(4, activation='softmax')(m1_layer1)

# For model 2, pass the input layer to layer 1 and layer 1 to layer 2
m2_layer1 = keras.layers.Dense(12, activation='relu')(m2_inputs)
m2_layer2 = keras.layers.Dense(4, activation='softmax')(m2_layer1)

# Merge model outputs and define a functional model
merged = keras.layers.add([m1_layer2, m2_layer2])
model = keras.Model(inputs=[m1_inputs, m2_inputs], outputs=merged)

# Print a model summary (summary() prints directly and returns None)
model.summary()

'''Validation'''

# Define sequential model
model = keras.Sequential()

# Define the first layer
model.add(keras.layers.Dense(32, activation='sigmoid', input_shape=(784,)))

# Add activation function to classifier
model.add(keras.layers.Dense(4, activation='softmax'))

# Set the optimizer, loss function, and metrics
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Add the number of epochs and the validation split
model.fit(sign_language_features, sign_language_labels, epochs=10, validation_split=0.10)

### Estimators API
- Faster deployment
- Define feature columns
- Define an input function
- Define an estimator (use a predefined one)
- Note: Estimators are deprecated and have been removed from recent TensorFlow releases

In [None]:
import numpy as np
from tensorflow import feature_column, estimator

# Define feature columns for bedrooms and bathrooms
bedrooms = feature_column.numeric_column("bedrooms")
bathrooms = feature_column.numeric_column("bathrooms")

# Define the list of feature columns
feature_list = [bedrooms, bathrooms]

def input_fn():
    # Define the labels (housing is assumed to be a pandas DataFrame loaded earlier)
    labels = np.array(housing['price'])
    # Define the features
    features = {'bedrooms': np.array(housing['bedrooms']),
                'bathrooms': np.array(housing['bathrooms'])}
    return features, labels

# Define the model and set the number of steps
model = estimator.DNNRegressor(feature_columns=feature_list, hidden_units=[2,2])
model.train(input_fn, steps=1)

**TensorFlow Hub**
- Has pretrained models
- Supports transfer learning. Its models are trained on much larger datasets, so they can give better predictions, especially for image models.

**TensorFlow Probability**
- More statistical distributions
- Trainable distributions
- Extended set of optimizers

## Improving the model

### Learning Curves
- As the number of epochs increases, the loss curve typically decreases and the accuracy curve increases.
- The curves can be unstable in some regions. Factors that affect them include:
    - Optimizer
    - Learning rate
    - Batch size
    - Network architecture
    - Weight initialization, etc.

#### Choosing the correct activation function
- In general, start with ReLU: it generalizes well in most cases and trains fast. Avoid sigmoid in hidden layers.

#### Batch size and normalization
- A mini-batch is a subset of the training samples.
- Without mini-batches (full-batch training), weights are updated once at the end of every epoch. If we divide the training data into mini-batches, they are updated after every mini-batch, i.e. more often.
- Networks train faster on mini-batches.
- Less RAM is needed per step, so larger datasets can be handled.
- Training needs more iterations, and a good batch size has to be found. A batch size of 1 is stochastic gradient descent.
- Larger datasets generally call for larger batch sizes.
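To make the update counts concrete, here is a small NumPy sketch that splits a hypothetical 500-sample dataset into shuffled mini-batches and counts the weight updates per epoch. The minibatches helper is written for this note; Keras does the equivalent internally when you pass batch_size to fit.

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once (one epoch)."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.arange(1000, dtype=float).reshape(500, 2)  # hypothetical 500 samples, 2 features
y = np.zeros(500)
rng = np.random.default_rng(0)

# One weight update per mini-batch:
# full batch (500) -> 1 update per epoch, batch size 1 (SGD) -> 500 updates
updates_per_epoch = {bs: sum(1 for _ in minibatches(X, y, bs, rng))
                     for bs in (500, 128, 32, 1)}
print(updates_per_epoch)
```

The last mini-batch is smaller when the dataset size is not a multiple of the batch size (500 samples with batch size 128 gives 3 full batches and one of 116).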

#### Regularization

**Dropout**
- Randomly drops a subset of units during forward and backward propagation. This reduces the network's sensitivity to noise in the data and keeps units from becoming overly correlated.
- **from tensorflow.keras.layers import Dropout**
- Add dropout after a layer and set the fraction of that layer's units to be dropped.

**Batch normalization**
Makes sure consecutive layers get normalized data as inputs, to avoid issues with optimization and gradient descent.
- Allows higher learning rates
- Reduces dependence on weight initialization
- Improves gradient flow
- Limits internal covariate shift

**Batch normalization and Dropout layers sometimes DO NOT WORK WELL TOGETHER**
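A sketch of what batch normalization does to a layer's inputs, using hypothetical data with mean around 50 and standard deviation around 10. In training mode the layer normalizes with the current batch's statistics; its learnable scale (gamma) starts at 1 and its shift (beta) at 0, so the output is close to mean 0 and standard deviation 1.

```python
import numpy as np
import tensorflow as tf

# Hypothetical layer inputs: 256 samples, 4 features with mean ~50 and std ~10
rng = np.random.default_rng(0)
x = tf.constant(rng.normal(loc=50.0, scale=10.0, size=(256, 4)), dtype=tf.float32)

bn = tf.keras.layers.BatchNormalization()
# training=True: normalize with this batch's mean and variance
y = bn(x, training=True)

print(tf.reduce_mean(y, axis=0).numpy())      # close to 0 for every feature
print(tf.math.reduce_std(y, axis=0).numpy())  # close to 1 for every feature

# In a Sequential model, BatchNormalization sits between layers like any other layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```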

#### Reviewing the model
- Use the model's training history to review how its loss and accuracy improved during training.

In [None]:
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Store initial model weights
initial_weights = model.get_weights()
# Lists for storing accuracies
train_accs = []
test_accs = []

for train_size in train_sizes:
    # Split a fraction according to train_size
    X_train_frac, _, y_train_frac, _ = train_test_split(X_train, y_train, train_size=train_size)
    # Reset the model to its initial weights
    model.set_weights(initial_weights)
    # Fit model on the training set fraction
    training = model.fit(X_train_frac, y_train_frac,
                         epochs=100,                        # one epoch = one pass through all training samples
                         batch_size=128,                    # size of each mini-batch (default 32; powers of 2 are common)
                         verbose=0,                         # 0 = silent; controls how much is logged during training
                         validation_data=(X_test, y_test),  # data used to evaluate the model after each epoch
                         # validation_split=0.x,            # can be used instead of validation_data
                         callbacks=[EarlyStopping(monitor='loss', patience=1),
                                    ModelCheckpoint('weights.hdf5', monitor='val_loss', save_best_only=True)]
                         )

    # Get the accuracy for this training set fraction
    # (evaluate()[1] assumes the model was compiled with metrics=['accuracy'])
    train_acc = model.evaluate(X_train_frac, y_train_frac, verbose=0)[1]
    train_accs.append(train_acc)

    # Get the accuracy on the whole test set
    test_acc = model.evaluate(X_test, y_test, verbose=0)[1]
    test_accs.append(test_acc)
    print("Done with size: ", train_size)
#######################REVIEW MODEL ##########################

import matplotlib.pyplot as plt

# Extract the history from the training object
# (training is the History object returned by model.fit())
history = training.history

# Plot the training loss 
plt.plot(history['loss'])
# Plot the validation loss
plt.plot(history['val_loss'])

# Show the figure
plt.show()

#################### Batch Normalization #####################

from tensorflow.keras.layers import BatchNormalization
model.add(BatchNormalization()) # can be added between layers as its own layer in Sequential API

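### Using RandomizedSearchCV from scikit-learn
- Use the Keras scikit-learn wrapper (tensorflow.keras.wrappers.scikit_learn) to make your model usable by sklearn. This wrapper is deprecated and removed in newer TensorFlow releases; the SciKeras package provides a replacement.
- To keep the search fast:
    - Use fewer epochs
    - Use RandomizedSearchCV instead of grid search
    - Use a smaller sample of the dataset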

In [None]:
# Import sklearn wrapper from keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import cross_val_score, RandomizedSearchCV

# Builds a new Keras sequential model every time it is called
def create_model(nl=1, nn=256, optimizer='adam', activation='relu', loss_func='binary_crossentropy'):
    model = Sequential()
    model.add(Dense(16, input_shape=(2,), activation='relu'))
    # Add as many hidden layers as specified in nl
    for i in range(nl):
        # Layers have nn neurons
        model.add(Dense(nn, activation=activation))
    # End defining and compiling your model (outside the loop)
    model.add(Dense(1, activation='sigmoid'))
    # The wrapper's score() needs the accuracy metric
    model.compile(optimizer=optimizer, loss=loss_func, metrics=['accuracy'])
    return model

# Create a model as a sklearn estimator (create_model must be defined first)
model = KerasClassifier(build_fn=create_model)

# Check how your keras model performs with 5 fold crossvalidation
kfold = cross_val_score(model, X, y, cv=5)

# Define parameters, named just like in create_model() (epochs and batch_size go to fit())
params = dict(nl=[1, 2, 9],
              optimizer=['sgd', 'adam'],
              epochs=[3],
              batch_size=[5, 10, 20],
              activation=['relu', 'tanh'],
              nn=[128, 256, 1000])

# Repeat the random search
random_search = RandomizedSearchCV(model, param_distributions=params, cv=3)
random_search_results = random_search.fit(X, y)

# Print results
print("Best: {} using {}".format(random_search_results.best_score_, random_search_results.best_params_))



## Autoencoders
An autoencoder is a neural network that is trained to attempt to copy its input to its output.
- They are designed to be unable to learn to copy perfectly. 
- Usually they are restricted in ways that allow them to copy only approximately, and to copy only input that resembles the training data. 
- Because the model is forced to prioritize which aspects of the input should be copied, it often learns useful properties of the data.

#### Used for
- Dimensionality reduction
- Anomaly detection
- noise removal etc. 


In [None]:
'''Backend function
    Computes the output of a given layer for some input, i.e. shows what a layer produces at a certain stage of the model
'''
# Import tensorflow.keras backend
import tensorflow.keras.backend as K

# Input tensor from the 1st layer of the model
inp = model.layers[0].input

# Output tensor from the 1st layer of the model
out = model.layers[0].output

# Define a function from inputs to outputs
inp_to_out = K.function([inp], [out])

# Print the results of passing X_test through the 1st layer
print(inp_to_out([X_test]))

'''Autoencoder'''


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Start with a sequential model
autoencoder = Sequential()

# Add a dense layer with the original image pixels as input (MNIST: 28x28 = 784) and the encoded representation (32 neurons) as output
autoencoder.add(Dense(32, input_shape=(784, ), activation="relu"))

# Add an output layer with as many neurons as the original image pixels
autoencoder.add(Dense(784, activation = "sigmoid"))

# Compile your model with adadelta
autoencoder.compile(optimizer = 'adadelta', loss = 'binary_crossentropy')

# Summarize your model structure
autoencoder.summary()

#### CHECKING THE ENCODER

# Build your encoder by using the first layer of your autoencoder
encoder = Sequential()
encoder.add(autoencoder.layers[0])

# Encode the noisy images and show the encodings 
encodings = encoder.predict(X_test_noise)

## CNN
- Convolution is a mathematical operation that preserves spatial relationships. 
- Convolution applies a filter (kernel) to the input array, which can reduce its dimensions
- (Convolution + ReLU) + Pooling -> Flatten -> Fully Connected -> Classification

### Convolution: Kernels
- Act as a feature-mapping operator over the input array (e.g. images).
- The resulting feature maps need to be flattened before they can feed into a fully connected network.
- Conv2D is used for images. The input shape should match the image shape, since we don't want to lose spatial relationships.
- **Padding**
    - Use "zero-padding" so that the output of the feature mapping has the same size as the input array.
    - padding = "valid": no zero padding is added. padding = "same": zero padding is added.
- **Strides**
    - The step size of the kernel as it slides across the image.
    - Stride = 1 is the default. If stride > 1, the convolution output will be smaller.
 
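The output size of a convolution, per spatial dimension, is

$$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$

where I = size of the input, K = size of the kernel, P = size of the zero padding, and S = stride. The floor (rounding down) only matters when I - K + 2P is not divisible by S.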
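The formula can be checked with a small helper. conv_output_size is a hypothetical function written for this note, and the Keras Conv2D layer (applied to a zero image of MNIST size) is only used to confirm the shape Keras infers.

```python
import math

import tensorflow as tf

def conv_output_size(I, K, P, S):
    """O = floor((I - K + 2P) / S) + 1 for one spatial dimension."""
    return math.floor((I - K + 2 * P) / S) + 1

# 28x28 input, 3x3 kernel, no padding ("valid"), stride 1
print(conv_output_size(28, 3, 0, 1))
# Same, but stride 2: (28 - 3) / 2 = 12.5 is rounded down to 12, plus 1
print(conv_output_size(28, 3, 0, 2))
# Zero padding of 1 on each side keeps the size ("same" padding for a 3x3 kernel)
print(conv_output_size(28, 3, 1, 1))

# Cross-check against Keras shape inference: (batch, height, width, filters)
layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=2, padding="valid")
out = layer(tf.zeros((1, 28, 28, 1)))
print(out.shape)
```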
 
### Pooling:
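- Reduces the size of the feature maps after convolution, and with it the number of parameters in later layers.
- Takes a small window of the matrix and replaces it with one single value (example: its maximum, for max pooling).
- Keras has different pooling layers, e.g. MaxPool2D and AveragePooling2D.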
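A worked example of 2x2 pooling on a made-up 4x4 single-channel input. Keras pooling layers expect (batch, height, width, channels), and the stride defaults to the pool size, so the four 2x2 windows do not overlap.

```python
import tensorflow as tf

# A 4x4 single-channel "image", reshaped to (batch, height, width, channels)
image = tf.reshape(tf.constant([[1., 3., 2., 0.],
                                [4., 2., 1., 5.],
                                [0., 1., 7., 2.],
                                [6., 2., 3., 1.]]), (1, 4, 4, 1))

# Each 2x2 window is replaced by its maximum ...
max_pooled = tf.keras.layers.MaxPool2D(pool_size=2)(image)
# ... or by its average
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(image)

print(tf.squeeze(max_pooled).numpy())  # [[4. 5.] [6. 7.]]
print(tf.squeeze(avg_pooled).numpy())  # [[2.5 2.] [2.25 3.25]]
```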

In [None]:
# Need to import Flatten, Conv2D, Dropout, etc. from keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout

# Instantiate your model as usual
model = Sequential()
# Add a convolutional layer with 32 filters of size 3x3
model.add(Conv2D(filters=32,                 
                 kernel_size=3,                 
                 input_shape=(28, 28, 1),                 
                 activation='relu'))

#Pooling layer to reduce number of parameters
model.add(MaxPool2D(2))

# Add another convolutional layer
model.add(Conv2D(8, kernel_size=3, activation='relu'))

# Add a dropout layer (explained earlier)
model.add(Dropout(0.20)) #20% of units from previous layer

# Flatten the output of the previous layer
model.add(Flatten())
# End this multiclass model with 3 outputs and softmax
model.add(Dense(3, activation='softmax'))

'''Pre-processing images for ResNet50'''

import numpy as np
# Import image from keras preprocessing
from tensorflow.keras.preprocessing import image
# Import preprocess_input from tensorflow keras applications resnet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Load the image with the right target size for your model
img = image.load_img(img_path, target_size=(224, 224))
# Turn it into an array
img = image.img_to_array(img)
# Expand the dimensions so that it's understood by our network:
# img.shape turns from (224, 224, 3) into (1, 224, 224, 3)
img = np.expand_dims(img, axis=0)
# Pre-process the img in the same way training images were
img = preprocess_input(img)

# Import ResNet50 and decode_predictions from tensorflow.keras.applications.resnet50
from tensorflow.keras.applications.resnet50 import ResNet50, decode_predictions

# Instantiate a ResNet50 model with imagenet weights
model = ResNet50(weights='imagenet')

# Predict with ResNet50 on our img
preds = model.predict(img)
# Decode predictions and print it
print('Predicted:', decode_predictions(preds, top=1)[0])


**Functional API**

In [None]:
# Input/dense/output layers: 2 input columns (seed_diff, pred) and 2 outputs (score_1, score_2)
from tensorflow.keras.layers import Input, Dense
input_tensor = Input(shape=(2,))
output_tensor = Dense(2)(input_tensor)

# Build the model
from tensorflow.keras.models import Model
model = Model(input_tensor, output_tensor)

# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')

# Fit the model
model.fit(games_tourney_train[['seed_diff', 'pred']],
          games_tourney_train[['score_1', 'score_2']],
          verbose=True,
          epochs=100,
          batch_size=16384)

# Import the plotting function
from tensorflow.keras.utils import plot_model
import matplotlib.pyplot as plt

# Summarize the model
model.summary()

# Plot the model
plot_model(model, to_file='model.png')

# Display the image
data = plt.imread('model.png')
plt.imshow(data)
plt.show()

# Print the model's weights
print(model.get_weights())

# Print the column means of the training data
print(games_tourney_train.mean())

### Embedding Layer
- Similar in spirit to PCA: a low-dimensional representation of high-dimensional data.
- Used especially when categorical data has high cardinality.
- The Embedding layer adds an extra dimension to the input. This is important for text and image data. For categorical data, the embedding layer's output needs to be flattened.
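A shape-focused sketch of the extra dimension and the Flatten step, assuming a hypothetical n_teams = 10 and a 1-dimensional embedding (one "strength" value per team).

```python
import numpy as np
import tensorflow as tf

n_teams = 10  # hypothetical number of unique team IDs

team_in = tf.keras.Input(shape=(1,), dtype="int32")
embedded = tf.keras.layers.Embedding(input_dim=n_teams, output_dim=1)(team_in)  # (batch, 1, 1)
flat = tf.keras.layers.Flatten()(embedded)                                      # (batch, 1)
lookup_model = tf.keras.Model(team_in, flat)

# Look up the (randomly initialized) strength of three teams
team_ids = np.array([[0], [3], [9]], dtype="int32")
strengths = lookup_model.predict(team_ids, verbose=0)
print(embedded.shape, flat.shape, strengths.shape)
```

The Embedding layer turns each integer ID into a vector of length output_dim, which adds a dimension; Flatten removes it again so the result can be combined with ordinary Dense-layer inputs.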

In [None]:
# Imports
from numpy import unique
from tensorflow.keras.layers import Input, Embedding, Flatten
from tensorflow.keras.models import Model

# n_teams (the number of unique team IDs, e.g. counted with unique()) is assumed to be defined earlier

# Create an input layer for the team ID
input_tensor = Input(shape=(1,))

# Create an embedding layer
embedding_layer = Embedding(input_dim=n_teams,  #number of categorical variables
                        output_dim=1,
                        input_length=1,
                        name='Team-Strength')

# Lookup the input in the team strength embedding layer
strength_lookup = embedding_layer(input_tensor)

# Flatten the output
strength_lookup_flat = Flatten()(strength_lookup)

# Combine the operations into a single, re-usable model
team_strength_model = Model(input_tensor, strength_lookup_flat, name='Team-Strength-Model')



