## Constants and variables

### What is TensorFlow?
- Open-source library for graph-based numerical computation
    - Developed by the Google Brain Team
- Low- and high-level APIs
    - Addition, multiplication, differentiation
    - Machine learning models
- Important changes in TensorFlow 2.0
    - Eager execution by default
    - Model building with Keras and Estimators

### What is a tensor?
- Generalization of vectors and matrices
- Collection of numbers
- Specific shape

### Defining tensors in TensorFlow

In [1]:
import tensorflow as tf

# 0D Tensor (a scalar: empty shape)
d0 = tf.ones(())

In [2]:
# 1D Tensor
d1 = tf.ones((2,))

# 2D Tensor
d2 = tf.ones((2, 2))

# 3D Tensor
d3 = tf.ones((2, 2, 2))

In [7]:
print(d3)

Tensor("ones_3:0", shape=(2, 2, 2), dtype=float32)


In [3]:
# Print the 3D tensor
# print(d3.numpy())    # .numpy() needs eager execution, which is the default from TensorFlow 2.0

AttributeError: 'Tensor' object has no attribute 'numpy'
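In TensorFlow 2.x, where eager execution is the default, tensors hold concrete values and `.numpy()` works. A minimal sketch, assuming TensorFlow 2.x is installed; `tf.ones(())` is used to get a true 0-D (scalar) tensor:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x (eager execution on by default)
d0 = tf.ones(())           # 0-D tensor (scalar), shape ()
d3 = tf.ones((2, 2, 2))    # 3-D tensor

print(d0.shape, d3.shape)  # the shapes () and (2, 2, 2)
print(d3.numpy())          # a NumPy array holding eight 1.0 values
```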

### Defining constants in TensorFlow
- A constant is the simplest category of tensor
    - Not trainable
    - Can have any dimension

In [4]:
from tensorflow import constant

In [8]:
# Define a 2x3 constant.
a = constant(3, shape=[2, 3])

# Define a 2x2 constant.
b = constant([1, 2, 3, 4], shape=[2, 2])

### Using convenience functions to define constants

```
Operation           |  Example
--------------------------------------------
tf.constant()         constant([1, 2, 3])
tf.zeros()            zeros([2, 2])
tf.zeros_like()       zeros_like(input_tensor)
tf.ones()             ones([2, 2])
tf.ones_like()        ones_like(input_tensor)
tf.fill()             fill([3, 3], 7)
```
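A quick sketch of the functions in the table, assuming TensorFlow 2.x so the results can be inspected with `.numpy()`. The `*_like` variants copy the shape and dtype of an existing tensor:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x (eager execution)
x = tf.constant([[1, 2], [3, 4]])   # int32 by default for Python ints
a = tf.constant(3, shape=[2, 3])    # scalar expanded to a 2x3 tensor of 3s
z = tf.zeros([2, 2])                # float32 zeros
zl = tf.zeros_like(x)               # zeros with x's shape and dtype (int32)
o = tf.ones([2, 2])                 # float32 ones
ol = tf.ones_like(x)                # ones with x's shape and dtype
f = tf.fill([3, 3], 7)              # 3x3 tensor filled with 7

for name, t in [("a", a), ("zl", zl), ("ol", ol), ("f", f)]:
    print(name, t.dtype.name, t.numpy().tolist())
```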

### Defining and initializing variables

In [9]:
# Define a variable
a0 = tf.Variable([1, 2, 3, 4, 5, 6], dtype=tf.float32)
a1 = tf.Variable([1, 2, 3, 4, 5, 6], dtype=tf.int16)

In [10]:
# Define a constant
b = tf.constant(2, tf.float32)

In [11]:
# Compute their product
c0 = tf.multiply(a0, b)
c1 = a0 * b
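Unlike a constant, a variable can be updated in place, which is what makes it trainable. A minimal sketch, assuming TensorFlow 2.x; `assign_add()` is a standard `tf.Variable` method:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x (eager execution)
a0 = tf.Variable([1, 2, 3, 4, 5, 6], dtype=tf.float32)
b = tf.constant(2, tf.float32)

c0 = tf.multiply(a0, b)   # explicit operation
c1 = a0 * b               # overloaded * operator, same result
print(c0.numpy(), c1.numpy())

a0.assign_add(tf.ones(6))  # in-place update: a0 becomes [2, 3, ..., 7]
print(a0.numpy())
```

`c0` and `c1` are ordinary tensors computed before the update, so they keep their old values.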

### Basic Operations
TensorFlow is graph-based: a computation is a graph of operations (nodes) applied to tensors (edges).

In [12]:
from tensorflow import constant, add

# Define single-element tensors (shape (1,); a true 0-D scalar would be constant(1))
A0 = constant([1])
B0 = constant([2])

In [13]:
# Define 1-dimensional tensors
A1 = constant([1, 2])
B1 = constant([3, 4])

# Define 2-dimensional tensors
A2 = constant([[1, 2], [3, 4]])
B2 = constant([[5, 6], [7, 8]])

In [14]:
# Perform tensor addition with add()
C0 = add(A0, B0)
C1 = add(A1, B1)
C2 = add(A2, B2)

print(C0)
print(C1)
print(C2)

Tensor("Add:0", shape=(1,), dtype=int32)
Tensor("Add_1:0", shape=(2,), dtype=int32)
Tensor("Add_2:0", shape=(2, 2), dtype=int32)


- The add() operation performs **element-wise addition** with two tensors.
- Element-wise addition requires both tensors to have the same shape.
- The + operator is overloaded, so it can be used for addition as well:

In [15]:
print(A0 + B0)

Tensor("add:0", shape=(1,), dtype=int32)


- Element-wise multiplication is performed using the multiply() operation.
- Matrix multiplication is performed with the matmul() operation.
    - matmul(A, B) multiplies A by B

In [16]:
from tensorflow import ones, matmul, multiply

# Define tensors
A0 = ones(1)
A31 = ones([3, 1])
A34 = ones([3, 4])
A43 = ones([4, 3])
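Continuing with these four tensors, the sketch below (assuming TensorFlow 2.x) shows which products are valid. `multiply()` needs compatible shapes, while `matmul()` needs rank-2 inputs whose inner dimensions agree, so the rank-1 `A0` cannot be passed to `matmul()`:

```python
import tensorflow as tf
from tensorflow import ones, matmul, multiply

# Assumes TensorFlow 2.x (eager execution)
A0 = ones(1)
A31 = ones([3, 1])
A34 = ones([3, 4])
A43 = ones([4, 3])

# Valid: element-wise products of a tensor with itself
A0A0 = multiply(A0, A0)        # shape (1,)
A31A31 = multiply(A31, A31)    # shape (3, 1)

# Valid: (3x4) @ (4x3) -> (3x3); each entry sums four 1*1 products
A34A43 = matmul(A34, A43)
print(A34A43.numpy())

# Invalid: A0 is rank 1, but matmul() expects matrices
try:
    matmul(A0, A31)
    matmul_failed = False
except (tf.errors.InvalidArgumentError, ValueError) as err:
    matmul_failed = True
    print("matmul(A0, A31) failed:", type(err).__name__)
```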

### Summing over tensor dimensions
- The reduce_sum() operator sums over the dimensions of a tensor
    - reduce_sum(A) sums over all dimensions of A
    - reduce_sum(A, i) sums over dimension i

In [17]:
from tensorflow import ones, reduce_sum

# Define a 2x3x4 tensor of ones
A = ones([2, 3, 4])

In [19]:
# Sum over all dimensions
B = reduce_sum(A)

# Sum over dimensions 0, 1, and 2
B0 = reduce_sum(A, 0)
B1 = reduce_sum(A, 1)
B2 = reduce_sum(A, 2)

In [20]:
print(B)
print(B0)
print(B1)
print(B2)

Tensor("Sum_3:0", shape=(), dtype=float32)
Tensor("Sum_4:0", shape=(3, 4), dtype=float32)
Tensor("Sum_5:0", shape=(2, 4), dtype=float32)
Tensor("Sum_6:0", shape=(2, 3), dtype=float32)
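With eager execution (TensorFlow 2.x assumed) the sums can be inspected directly. Summing a 2x3x4 tensor of ones over dimension i removes that dimension, and each remaining entry equals the size of the dimension that was summed away:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x (eager execution)
A = tf.ones([2, 3, 4])

B = tf.reduce_sum(A)        # 2*3*4 = 24.0
B0 = tf.reduce_sum(A, 0)    # shape (3, 4), every entry 2.0
B1 = tf.reduce_sum(A, 1)    # shape (2, 4), every entry 3.0
B2 = tf.reduce_sum(A, 2)    # shape (2, 3), every entry 4.0

print(B.numpy(), B0.shape, B1.shape, B2.shape)
```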


### Advanced operations

```
Operation     |                Use
--------------|-----------------------------------------------------------------
gradient()    |    Computes the slope of a function at a point
reshape()     |    Reshapes a tensor (e.g. 10x10 to 100x1)
random()      |    Populates tensor with entries drawn from a probability distribution
```

### Find the optimum
- In many problems, we will want to find the optimum of a function.
    - Minimum: Lowest value of a loss function.
    - Maximum: Highest value of an objective function.
- We can do this using the gradient() operation.
    - Optimum: Find a point where gradient = 0
    - Minimum: Change in gradient > 0
    - Maximum: Change in gradient < 0

In [21]:
# Define x
x = tf.Variable(-1.0)

# Define y within an instance of GradientTape (not available in the older TensorFlow version this notebook was run with)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.multiply(x, x)

AttributeError: module 'tensorflow' has no attribute 'GradientTape'

In [None]:
# Evaluate the gradient of y at x = -1
g = tape.gradient(y, x)
print(g.numpy())
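Under TensorFlow 2.x these cells run as written, and for y = x² the gradient at x = -1 is 2x = -2. To apply the minimum/maximum test from the list above, this sketch nests two tapes so it also gets the second derivative (the change in the gradient); nesting tapes is a standard `tf.GradientTape` pattern rather than something from the original notes:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x; trainable variables are watched automatically
x = tf.Variable(-1.0)

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = tf.multiply(x, x)       # y = x^2
    g = inner.gradient(y, x)        # dy/dx = 2x, i.e. -2.0 at x = -1
h = outer.gradient(g, x)            # d2y/dx2 = 2.0

print(g.numpy(), h.numpy())
# g != 0, so x = -1 is not an optimum; h > 0, so the optimum where g = 0 (x = 0) is a minimum
```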

### Images as tensors
#### How to reshape a grayscale image

In [22]:
# Generate grayscale image
gray = tf.random.uniform([2, 2], maxval=255, dtype='int32')    # tf.random is also missing from the older TensorFlow version used here

# Reshape grayscale image
gray = tf.reshape(gray, [2*2, 1])

AttributeError: module 'tensorflow' has no attribute 'random'

#### How to reshape a color image

In [None]:
# Generate color image
color = tf.random.uniform([2, 2, 3], maxval=255, dtype='int32')

# Reshape color image
color = tf.reshape(color, [2*2, 3])
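Random pixel values make it hard to see what `reshape()` does, so this sketch (TensorFlow 2.x assumed) numbers the entries with `tf.range()` instead. Reshaping keeps the row-major order of the values: the grayscale image becomes a single column of pixels, and the color image becomes one row per pixel holding its three channel values:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x; tf.range values stand in for random intensities
gray = tf.reshape(tf.range(4), [2, 2])             # 2x2 grayscale image, values 0..3
gray_flat = tf.reshape(gray, [2 * 2, 1])           # 4 pixels x 1 channel

color = tf.reshape(tf.range(12), [2, 2, 3])        # 2x2 image, 3 channels, values 0..11
color_flat = tf.reshape(color, [2 * 2, 3])         # 4 pixels x 3 channels

print(gray_flat.numpy().ravel())   # pixel values in row-major order
print(color_flat.numpy())          # row k holds the channels of pixel k
```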

#### Input data

#### Importing data for use in TensorFlow
- Data can be imported using TensorFlow
    - Useful for managing complex pipelines
- Simpler option
    - Import data using pandas
    - Convert data to a NumPy array
    - Use in TensorFlow without modification

In [None]:
import numpy as np
import pandas as pd

# Load data from csv
housing = pd.read_csv('kc_housing.csv')

# Convert to numpy array
housing = np.array(housing)

- pandas also has methods for handling data in other formats
    - E.g. read_json(), read_html(), read_excel()

### Using mixed type datasets

In [None]:
# Load KC dataset
housing = pd.read_csv('kc_housing.csv')

# Convert price column to float32
price = np.array(housing['price'], np.float32)

# Convert waterfront column to Boolean (np.bool was removed from NumPy; use the built-in bool)
waterfront = np.array(housing['waterfront'], bool)

In [None]:
# Cast approach
price = tf.cast(housing['price'], tf.float32)
waterfront = tf.cast(housing['waterfront'], tf.bool)
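Because `kc_housing.csv` is not included here, this sketch builds a hypothetical three-row stand-in in memory with made-up values and compares the NumPy and `tf.cast()` approaches. It assumes pandas, NumPy, and TensorFlow 2.x are installed:

```python
import io

import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for kc_housing.csv (made-up values)
csv_text = "price,waterfront\n221900,0\n538000,1\n180000,0\n"
housing = pd.read_csv(io.StringIO(csv_text))

# NumPy approach
price = np.array(housing['price'], np.float32)
waterfront = np.array(housing['waterfront'], bool)

# tf.cast approach: converts the column to a tensor, then casts it
price_tf = tf.cast(housing['price'], tf.float32)
waterfront_tf = tf.cast(housing['waterfront'], tf.bool)   # nonzero -> True

print(price.dtype, waterfront.tolist())
print(price_tf.dtype, waterfront_tf.numpy().tolist())
```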

## Loss functions

- Fundamental TensorFlow operation
    - Used to train a model
    - Measure of model fit
- Higher value -> worse fit
    - Minimize the loss function
  
- TensorFlow has operations for common loss functions
    - Mean Squared Error (MSE)
    - Mean Absolute Error (MAE)
    - Huber Error
- Loss functions are accessible from the tf.keras.losses module
    - tf.keras.losses.mse()
    - tf.keras.losses.mae()
    - tf.keras.losses.Huber()

- Other loss functions
    - Mean Absolute Percentage Error (MAPE) / tf.keras.losses.mape()
    - Mean Squared Logarithmic Error (MSLE) / tf.keras.losses.msle()
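To make the three common losses concrete, this sketch evaluates them on a tiny made-up example with errors of 0.5, 0, and 2. It assumes TensorFlow 2.x and uses the long-form names `mean_squared_error` and `mean_absolute_error` (the `mse`/`mae` aliases refer to the same losses). `Huber()` uses its default `delta=1.0`, so errors larger than 1 are penalized linearly instead of quadratically:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x; the values are made up for illustration
targets = tf.constant([1.0, 2.0, 3.0])
predictions = tf.constant([1.5, 2.0, 5.0])   # errors: 0.5, 0.0, 2.0

mse = tf.keras.losses.mean_squared_error(targets, predictions)
mae = tf.keras.losses.mean_absolute_error(targets, predictions)
huber = tf.keras.losses.Huber()(targets, predictions)  # delta = 1.0

# Hand calculation:
# MSE   = (0.25 + 0 + 4) / 3                  = 1.41667
# MAE   = (0.5 + 0 + 2) / 3                   = 0.83333
# Huber = (0.5 * 0.5**2 + 0 + (2 - 0.5)) / 3  = 0.54167
print(float(mse), float(mae), float(huber))
```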

In [None]:
# Compute the MSE loss
loss = tf.keras.losses.mse(target, predictions)

In [None]:
# Define a loss function to compute the MSE
def loss_function(intercept, slope, target, features):
    # Compute the predictions for a linear model
    predictions = intercept + features * slope
    # Return the loss
    return tf.keras.losses.mse(target, predictions)

In [None]:
# Compute the loss for given input data and model parameters
loss_function(intercept, slope, prices, size)

## Linear regression

- A linear regression model assumes a linear relationship:
    - price = intercept + size \* slope + error
- This is an example of a univariate regression.
    - There is only one feature, size
- Multiple regression models have more than one feature.
    - E.g. size and location    

In [None]:
# Define the targets and features
price = np.array(housing['price'], np.float32)
size = np.array(housing['sqft_living'], np.float32)

# Define the intercept and slope (pass dtype by keyword: the second positional argument of tf.Variable is `trainable`)
intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)

In [None]:
# Compute the predicted values and loss function
def loss_function(intercept, slope, size, price):
    predictions = intercept + size * slope
    return tf.keras.losses.mse(price, predictions)

In [None]:
# Define an optimization operation
opt = tf.keras.optimizers.Adam()

# Minimize the loss function and print the loss
for j in range(1000):
    opt.minimize(lambda: loss_function(intercept, slope, size, price), var_list=[intercept, slope])
    print(loss_function(intercept, slope, size, price))
    
# Print the trained parameters
print(intercept.numpy(), slope.numpy())
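The training loop can also be written with `tf.GradientTape` and explicit update steps, which makes visible what an optimizer does on each iteration. This sketch swaps Adam for plain gradient descent with a fixed learning rate of 0.5 (chosen for this data) and fits made-up, noise-free data generated from price = 1 + 2 * size, so the fitted parameters should approach 1 and 2. It assumes TensorFlow 2.x:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x; made-up, noise-free data: price = 1 + 2 * size
size = tf.linspace(0.0, 1.0, 11)
price = 1.0 + 2.0 * size

intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)
learning_rate = 0.5

def loss_function(intercept, slope, size, price):
    predictions = intercept + size * slope
    return tf.reduce_mean(tf.square(predictions - price))   # MSE

for step in range(500):
    with tf.GradientTape() as tape:
        loss = loss_function(intercept, slope, size, price)
    d_intercept, d_slope = tape.gradient(loss, [intercept, slope])
    # Plain gradient descent step: move each parameter against its gradient
    intercept.assign_sub(learning_rate * d_intercept)
    slope.assign_sub(learning_rate * d_slope)

print(intercept.numpy(), slope.numpy())   # close to 1.0 and 2.0
```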

## Batch training

### What is batch training?
Split the data into several batches and process only one batch at a time.  
- pd.read_csv() allows us to load data in batches
    - Avoid loading entire dataset
    - chunksize parameter provides batch size

In [None]:
import pandas as pd
import numpy as np

# Load data in batches
for batch in pd.read_csv('kc_housing.csv', chunksize=100):
    # Extract price column
    price = np.array(batch['price'], np.float32)
    # Extract size column (named sqft_living in this dataset)
    size = np.array(batch['sqft_living'], np.float32)
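To see how `chunksize` splits a file, this sketch reads a made-up 250-row CSV from memory (a hypothetical stand-in for `kc_housing.csv`) in batches of 100. Each batch is an ordinary DataFrame, and the last one holds whatever rows remain:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for kc_housing.csv: 250 rows of made-up values
rows = "\n".join(f"{1000 * i},{10 * i}" for i in range(250))
csv_text = "price,sqft_living\n" + rows

batch_sizes = []
for batch in pd.read_csv(io.StringIO(csv_text), chunksize=100):
    price = np.array(batch['price'], np.float32)
    size = np.array(batch['sqft_living'], np.float32)
    batch_sizes.append(len(price))

print(batch_sizes)   # [100, 100, 50]
```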

### Training a linear model in batches

In [None]:
intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)

In [None]:
# Compute predicted values and return loss function
def loss_function(intercept, slope, features, target):
    predictions = intercept + features * slope
    return tf.keras.losses.mse(target, predictions)

# Define optimization operation
opt = tf.keras.optimizers.Adam()

In [None]:
# Load the data in batches from pandas
for batch in pd.read_csv('kc_housing.csv', chunksize=100):
    # Extract the target and feature column
    price_batch = np.array(batch['price'], np.float32)
    size_batch = np.array(batch['sqft_living'], np.float32)
    # Minimize the loss function
    opt.minimize(lambda: loss_function(intercept, slope, size_batch, price_batch), var_list=[intercept, slope])

In [None]:
# Print parameter values
print(intercept.numpy(), slope.numpy())

### Full sample versus batch training
1. Full Sample
    1. One step per epoch
    2. Accepts dataset without modification
    3. Limited by memory
2. Batch Training
    1. Multiple steps per epoch
    2. Requires division of dataset
    3. No limit on dataset size

## Dense layers

### The linear regression model
Expressed as a linear combination of the input features.

### What is a neural network?
Input layer (features) -> Hidden layers -> Output layer (prediction)  
  
- Dense layer: every node is connected to all nodes of the previous layer.

#### A trivial dense layer

In [None]:
# Define input data (float32, to match the dtype of the weights in matmul)
inputs = tf.constant([[1, 35]], tf.float32)

# Define weights
weights = tf.Variable([[-0.05], [-0.01]])

# Multiply inputs by the weights
product = tf.matmul(inputs, weights)

# Define dense layer
dense = tf.keras.activations.sigmoid(product)

#### Defining a complete model

In [None]:
# Define input layer
inputs = tf.constant(data, tf.float32)

# Define first dense layer
dense1 = tf.keras.layers.Dense(10, activation='sigmoid')(inputs)  # number of output units (nodes), activation function

# Define second dense layer
dense2 = tf.keras.layers.Dense(5, activation='sigmoid')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)

### High-level versus low-level approach

- High-level approach
    - High-level API operations (dense = keras.layers.Dense(10, activation='sigmoid'))
- Low-level approach
    - Linear-algebraic operations (prod = matmul(inputs, weights); dense = keras.activations.sigmoid(prod))
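The two approaches compute the same thing. As a sketch (TensorFlow 2.x assumed), the trivial dense layer from earlier can be rebuilt with `tf.keras.layers.Dense`, using `use_bias=False` and `set_weights()` so the high-level layer holds the same weights as the low-level version:

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x
inputs = tf.constant([[1.0, 35.0]])
weights = np.array([[-0.05], [-0.01]], dtype=np.float32)

# Low-level: matrix multiplication followed by an activation
low = tf.keras.activations.sigmoid(tf.matmul(inputs, weights))

# High-level: a Dense layer with the same (bias-free) weights
layer = tf.keras.layers.Dense(1, activation='sigmoid', use_bias=False)
layer(inputs)                  # first call builds the layer's kernel
layer.set_weights([weights])   # overwrite the randomly initialized kernel
high = layer(inputs)

# Both equal sigmoid(1 * -0.05 + 35 * -0.01) = sigmoid(-0.4), about 0.4013
print(np.asarray(low), np.asarray(high))
```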

## Activation functions

- Components of a typical hidden layer
    - Linear: Matrix multiplication
    - Nonlinear: Activation function

### Why nonlinearities are important
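Nonlinearities let the network capture nonlinear relationships. Stacking purely linear layers adds nothing, because their composition can be written as a single linear transformation.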
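The claim that stacked linear layers collapse into one can be checked numerically. In this NumPy sketch with random made-up weights, two linear layers with biases give the same output as a single layer with weights W1 @ W2 and bias b1 @ W2 + b2, because (xW1 + b1)W2 + b2 = x(W1W2) + (b1W2 + b2):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                          # 5 examples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two linear layers, no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# One equivalent linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))   # True
```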

In [None]:
# A simple example
# Define example borrower features
young, old = 0.3, 0.6
low_bill, high_bill = 0.1, 0.5

# Compute products and sums
young_high = 1.0 * young + 1.0 * high_bill
young_low = 1.0 * young + 1.0 * low_bill
old_high = 1.0 * old + 1.0 * high_bill
old_low = 1.0 * old + 1.0 * low_bill

# Print difference for young
print(young_high - young_low)

# Print difference for old
print(old_high - old_low)

In [None]:
# Print difference for young after activation is applied
print(tf.keras.activations.sigmoid(young_high).numpy() - tf.keras.activations.sigmoid(young_low).numpy())

# Print difference for old after activation is applied
print(tf.keras.activations.sigmoid(old_high).numpy() - tf.keras.activations.sigmoid(old_low).numpy())

#### The sigmoid activation function
- Binary classification
- Low-level: tf.keras.activations.sigmoid()
- High-level: sigmoid
  
#### The relu activation function
- Hidden layers
- Low-level: tf.keras.activations.relu()
- High-level: relu
  
#### The softmax activation function
- Output layer (> 2 classes): multiclass classification
- Low-level: tf.keras.activations.softmax()
- High-level: softmax
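A quick look at what the three activations return, assuming TensorFlow 2.x. `relu` zeroes out negative inputs, `sigmoid` maps each value into (0, 1), and `softmax` turns each row into probabilities that sum to 1 (the input is 2-D because some Keras versions reject 1-D input to softmax):

```python
import tensorflow as tf

# Assumes TensorFlow 2.x
z = tf.constant([[-1.0, 0.0, 2.0]])

relu_out = tf.keras.activations.relu(z)        # negatives become 0
sigmoid_out = tf.keras.activations.sigmoid(z)  # each value in (0, 1); sigmoid(0) = 0.5
softmax_out = tf.keras.activations.softmax(z)  # the row sums to 1

print(relu_out.numpy(), sigmoid_out.numpy(), softmax_out.numpy())
```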

#### Activation functions in neural networks

In [None]:
# Define input layer
inputs = tf.constant(borrower_features, tf.float32)

# Define dense layer 1
dense1 = tf.keras.layers.Dense(16, activation='relu')(inputs)

# Define dense layer 2
dense2 = tf.keras.layers.Dense(8, activation='sigmoid')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(4, activation='softmax')(dense2)

## Optimizers

### How to find a minimum?
Follow the slope downhill: the gradient descent algorithm.

- SGD (Stochastic gradient descent optimizer)
    - tf.keras.optimizers.SGD()
    - learning_rate
- RMSprop (Root mean squared propagation optimizer)
    - Applies different learning rates to each feature
    - tf.keras.optimizers.RMSprop()
    - learning_rate
    - rho (decay), momentum
- Adam (Adaptive moment optimizer)
    - tf.keras.optimizers.Adam()
    - learning_rate
    - beta_1
    - beta_2
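The core gradient descent update is x ← x - learning_rate * f'(x). Here is a plain-Python sketch (no TensorFlow) on the made-up function f(x) = (x - 3)², whose minimum is at x = 3. The optimizers listed above build on this same step, for example by adapting the step size per parameter:

```python
def f_prime(x):
    """Derivative of the example function f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

x = -1.0              # starting point
learning_rate = 0.1

for step in range(100):
    x = x - learning_rate * f_prime(x)   # step downhill along the slope

print(round(x, 6))   # 3.0
```

Each step shrinks the distance to 3 by a factor of 1 - 2 * 0.1 = 0.8, so 100 steps are plenty here.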

#### A complete example

In [None]:
import tensorflow as tf

# Compute the predicted values and loss
def loss_function(weights):
    product = tf.matmul(borrower_features, weights)
    predictions = tf.keras.activations.sigmoid(product)
    return tf.keras.losses.binary_crossentropy(default, predictions)

In [None]:
# Minimize the loss function with adam
opt = tf.keras.optimizers.Adam(learning_rate=0.1, beta_1=0.9, beta_2=0.8)
opt.minimize(lambda : loss_function(weights), var_list=[weights])

Tune the initial values and hyperparameters carefully so the optimizer does not get stuck in a local minimum!
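To illustrate why the starting point matters, this plain-Python sketch runs the same gradient descent from two starting points on the made-up function f(x) = x⁴ - 3x² + x, which has a global minimum near x ≈ -1.30 and a shallower local minimum near x ≈ 1.13:

```python
def f(x):
    return x**4 - 3 * x**2 + x

def f_prime(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, learning_rate=0.01, steps=1000):
    for _ in range(steps):
        x = x - learning_rate * f_prime(x)
    return x

x_right = gradient_descent(2.0)    # gets stuck in the local minimum (about 1.13)
x_left = gradient_descent(-2.0)    # reaches the global minimum (about -1.30)

print(round(x_right, 3), round(f(x_right), 3))
print(round(x_left, 3), round(f(x_left), 3))
```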

## Training a network in TensorFlow

### Random initializers
- Often need to initialize thousands of variables
    - tf.ones() may perform poorly
    - Tedious and difficult to initialize variables individually
- Alternatively, draw initial values from distribution
    - Random normal
    - Uniform
    - Glorot initializer (first time I've seen this one)

#### Initializing variables in TensorFlow

In [None]:
import tensorflow as tf

# Define 500x500 random normal variable
weights = tf.Variable(tf.random.normal([500, 500]))

# Define 500x500 truncated random normal variable (values more than two standard deviations from the mean are discarded and re-drawn)
weights = tf.Variable(tf.random.truncated_normal([500, 500]))

In [None]:
# Define a dense layer with the default initializer
dense = tf.keras.layers.Dense(32, activation='relu')

# Define a dense layer with the zeros initializer
dense = tf.keras.layers.Dense(32, activation='relu', kernel_initializer='zeros')
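A sketch (TensorFlow 2.x assumed) checking two of the initializers mentioned above. With the default standard deviation of 1, `truncated_normal` re-draws any value more than two standard deviations from the mean, so every entry lies within [-2, 2]. The Glorot (Xavier) uniform initializer, the default `kernel_initializer` of `tf.keras.layers.Dense`, draws from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)):

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x
normal = np.asarray(tf.random.normal([500, 500]))
truncated = np.asarray(tf.random.truncated_normal([500, 500]))

# Glorot uniform: fan_in = fan_out = 500 for a 500x500 kernel
glorot = np.asarray(tf.keras.initializers.GlorotUniform(seed=0)(shape=(500, 500)))
limit = np.sqrt(6.0 / (500 + 500))   # about 0.0775

print(float(np.abs(normal).max()))       # typically above 2
print(float(np.abs(truncated).max()))    # at most 2
print(float(np.abs(glorot).max()), limit)
```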

In this exercise, you will train a neural network to predict whether a credit card holder will default.  
The features and targets you will use to train your network are available in the Python shell as borrower_features and default.  
You defined the weights and biases in the previous exercise.  
  
Note that output_layer is defined as σ(layer1∗weights2+bias2), where σ is the sigmoid activation,  
layer1 is a tensor of nodes for the first hidden dense layer, weights2 is a tensor of weights, and bias2 is the bias tensor.  
  
The trainable variables are weights1, bias1, weights2, and bias2. Additionally,  
the following operations have been imported for you: nn.relu() and keras.layers.Dropout()

Apply a rectified linear unit activation function to the first layer.  
Apply 25% dropout to layer1.  
Pass the target, targets, and the predicted values, layer2, to the cross entropy loss function.  
Add the four trainable variables to var_list in the order in which they appear as arguments to loss_function().  

In [None]:
from tensorflow import keras, nn, add, matmul

def loss_function(weights1, bias1, weights2, bias2, features, targets):
    # Apply relu activation functions to layer 1
    layer1 = nn.relu(add(matmul(features, weights1), bias1))
    # Apply dropout; training=True makes the layer actually drop units when called directly
    dropout = keras.layers.Dropout(0.25)(layer1, training=True)
    layer2 = nn.sigmoid(add(matmul(dropout, weights2), bias2))
    # Pass targets and layer2 to the cross entropy loss
    return keras.losses.binary_crossentropy(targets, layer2)

# Assumes an optimizer `opt` was defined earlier (e.g. opt = keras.optimizers.Adam())
for j in range(0, 30000, 2000):
    features, targets = borrower_features[j:j+2000, :], default[j:j+2000, :]
    # Complete the optimizer
    opt.minimize(lambda: loss_function(weights1, bias1, weights2, bias2, features, targets), var_list=[weights1, bias1, weights2, bias2])
    
print(weights1.numpy())

### Neural networks and overfitting
#### Applying dropout
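To reduce overfitting, dropout randomly drops (zeroes out) a fraction of the units during training, cutting some of the connections.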
#### Implementing dropout in a network

In [None]:
import numpy as np
import tensorflow as tf

# Define input data
inputs = np.array(borrower_features, np.float32)

# Define dense layer 1
dense1 = tf.keras.layers.Dense(32, activation='relu')(inputs)

# Define dense layer 2
dense2 = tf.keras.layers.Dense(16, activation='relu')(dense1)

# Apply dropout operation
dropout1 = tf.keras.layers.Dropout(0.25)(dense2)    # drops 25% of the units (only while training)

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dropout1)
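Dropout only acts while training. This sketch (TensorFlow 2.x assumed) calls a `Dropout(0.25)` layer directly on a tensor of ones. With `training=True` about a quarter of the entries are zeroed and the survivors are scaled by 1 / (1 - 0.25), which keeps the expected sum unchanged. With `training=False`, which is generally what a direct call outside `model.fit()` falls back to, the input passes through untouched:

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x
x = tf.ones([1000, 10])
dropout = tf.keras.layers.Dropout(0.25)

train_out = np.asarray(dropout(x, training=True))
eval_out = np.asarray(dropout(x, training=False))

print(np.unique(train_out))        # 0.0 and about 1.333
print((train_out == 0).mean())     # close to 0.25
print(np.all(eval_out == 1.0))     # True
```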

## Defining neural networks with Keras

### The Sequential API
- Input layer
- Hidden layers
- Output layer
- Ordered in sequence

#### Building a sequential model

In [None]:
# Define a sequential model
model = tf.keras.Sequential()

# Define first hidden layer
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(28*28,)))

# Define second hidden layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('adam', loss='categorical_crossentropy')  # Multiclass Classification에 이용..
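A quick way to check a Sequential model is to count its parameters: each Dense layer has (inputs × units) weights plus one bias per unit. This sketch rebuilds the model above (TensorFlow 2.x assumed), declaring the input with `tf.keras.Input` instead of the `input_shape` argument, which newer Keras versions discourage:

```python
import tensorflow as tf

# Assumes TensorFlow 2.x
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(16, activation='relu'),     # 784*16 + 16 = 12,560
    tf.keras.layers.Dense(8, activation='relu'),      # 16*8 + 8    = 136
    tf.keras.layers.Dense(4, activation='softmax'),   # 8*4 + 4     = 36
])
model.compile('adam', loss='categorical_crossentropy')

model.summary()
print(model.count_params())   # 12732
```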

#### Using the functional API

In [None]:
# Define model 1 input layer shape
model1_inputs = tf.keras.Input(shape=(28*28,))

# Define model 2 input layer shape
model2_inputs = tf.keras.Input(shape=(10,))

# Define layer 1 for model 1
model1_layer1 = tf.keras.layers.Dense(12, activation='relu')(model1_inputs)

# Define layer 2 for model 1
model1_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model1_layer1)

# Define layer 1 for model 2
model2_layer1 = tf.keras.layers.Dense(8, activation='relu')(model2_inputs)

# Define layer 2 for model 2
model2_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model2_layer1)

# Merge model 1 and model 2
merged = tf.keras.layers.add([model1_layer2, model2_layer2])

# Define a functional model
model = tf.keras.Model(inputs=[model1_inputs, model2_inputs], outputs=merged)

# Compile the model
model.compile('adam', loss='categorical_crossentropy')

In some cases, the sequential API will not be sufficiently flexible to accommodate your desired model architecture  
and you will need to use the functional API instead.  
If, for instance, you want to train two models with different architectures jointly,  
you will need to use the functional API to do this. You will use the functional API to merge the two models. 
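As a sketch of the merged model (TensorFlow 2.x assumed), the functional model takes a list with one input array per branch and returns one 4-column output per example. With all-zero inputs and the Dense layers' default zero biases, each branch's softmax outputs 0.25 per class, so the added outputs are 0.5 everywhere, a handy sanity check that the branches were merged by addition:

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x
model1_inputs = tf.keras.Input(shape=(28 * 28,))
model2_inputs = tf.keras.Input(shape=(10,))

model1_layer1 = tf.keras.layers.Dense(12, activation='relu')(model1_inputs)
model1_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model1_layer1)
model2_layer1 = tf.keras.layers.Dense(8, activation='relu')(model2_inputs)
model2_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model2_layer1)

merged = tf.keras.layers.add([model1_layer2, model2_layer2])
model = tf.keras.Model(inputs=[model1_inputs, model2_inputs], outputs=merged)

# Two examples of all-zero inputs, one array per branch
x1 = np.zeros((2, 28 * 28), dtype=np.float32)
x2 = np.zeros((2, 10), dtype=np.float32)
out = np.asarray(model([x1, x2]))

print(out.shape)   # (2, 4)
print(out)         # every entry 0.5 (0.25 from each branch)
```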

## Training and validation with Keras

### Overview of training
1. Load and clean data
2. Define model
3. Train and validate model
4. Evaluate model

#### How to train a model

In [None]:
# Define a sequential model
model = tf.keras.Sequential()

# Define the hidden layer
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the output layer
model.add(tf.keras.layers.Dense(4, activation='softmax'))

# Compile model
model.compile('adam', loss='categorical_crossentropy')

# Train model
model.fit(image_features, image_labels)

#### The fit() operation
- Required arguments
    - features
    - labels
- Many optional arguments
    - batch_size
    - epochs
    - validation_split

#### Performing validation

In [None]:
# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)

#### Changing the metric

In [None]:
# Recompile the model with the accuracy metric
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)

#### Evaluating models
When you train a model, you will typically divide your data into three subsets: train, validation, and test.  
During the training phase, you will use the train and validation subsets.  
As you do this, you will adjust the number of epochs, the learning rate, and other "hyperparameters" to reduce the train and validation set losses.  
Once this process is complete, you will evaluate your model on a separate test set.  
You can apply its .evaluate(x, y) method to compute the loss and metric values for features x and labels y.

In [None]:
model.evaluate(features, labels)
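An end-to-end sketch of fit, validate, and evaluate, assuming TensorFlow 2.x. The features and labels are random placeholders, so the accuracy itself means nothing; the point is the shape of the workflow: `validation_split` holds out the last 20% of the training data, `history.history` records per-epoch train and validation metrics, and `evaluate()` on a separate test set returns the loss followed by the compiled metrics:

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x; random placeholder data with 4 classes and 784 features
rng = np.random.default_rng(0)
train_features = rng.random((200, 784)).astype(np.float32)
train_labels = tf.keras.utils.to_categorical(rng.integers(0, 4, 200), 4)
test_features = rng.random((50, 784)).astype(np.float32)
test_labels = tf.keras.utils.to_categorical(rng.integers(0, 4, 50), 4)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),
])
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train on the first 80% of the training data, validate on the last 20%
history = model.fit(train_features, train_labels, epochs=2,
                    validation_split=0.20, verbose=0)
print(sorted(history.history))   # per-epoch loss/accuracy for train and validation

# Evaluate once on the held-out test set: [loss, accuracy]
test_loss, test_accuracy = model.evaluate(test_features, test_labels, verbose=0)
print(test_loss, test_accuracy)
```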

### Training models with the Estimators API

### What is the Estimators API?
- High level submodule
- Less flexible
- Enforces best practices
- Faster deployment
- Many premade models

### Model specification and training
1. Define feature column
2. Load and transform data
3. Define an estimator
4. Apply train operation

### Defining feature columns

In [None]:
# Define a numeric feature column
size = tf.feature_column.numeric_column("size")

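# Define a categorical feature column (integer vocabulary, matching the integer "rooms" values in input_fn)
rooms = tf.feature_column.categorical_column_with_vocabulary_list("rooms", [1, 2, 3, 4, 5])

# Create feature column list (DNN estimators need dense columns, so wrap the categorical column in an indicator column)
features_list = [size, tf.feature_column.indicator_column(rooms)]

# Define a matrix feature column
features_list = [tf.feature_column.numeric_column('image', shape=(784,))]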

### Loading and transforming data

In [None]:
# Define input data function
def input_fn():
    # Define feature dictionary
    features = {"size": [1340, 1690, 2720], "rooms": [1, 3, 4]}
    # Define labels
    labels = [221900, 538000, 180000]
    return features, labels

### Define and train a regression estimator

In [None]:
# Define a deep neural network regression
model0 = tf.estimator.DNNRegressor(feature_columns=features_list, hidden_units=[10, 6, 6, 3])

# Train the regression model
model0.train(input_fn, steps=20)

# Define a deep neural network classifier
model1 = tf.estimator.DNNClassifier(feature_columns=features_list, hidden_units=[32, 16, 8], n_classes=4)

# Train the classifier
model1.train(input_fn, steps=20)

### TensorFlow extensions
- TensorFlow Hub
    - Pretrained models
    - Transfer learning
- TensorFlow Probability
    - More statistical distributions
    - Trainable distributions
    - Extended set of optimizers

### TensorFlow 2.0
- Eager execution by default
- Tighter Keras integration
- Estimators
- tf.function()