### The linear algebra of dense layers

There are two ways to define a dense layer in tensorflow. The first involves the use of low-level, linear algebraic operations. The second makes use of high-level keras operations. In this exercise, we will use the first method to construct the network shown in the image below.

This image depicts a neural network with 5 input nodes and 3 output nodes.

The input layer contains 3 features -- education, marital status, and age -- which are available as borrower_features. The hidden layer contains 2 nodes and the output layer contains a single node.

For each layer, you will take the previous layer as an input, initialize a set of weights, compute the product of the inputs and weights, and then apply an activation function. Note that Variable(), ones(), matmul(), and keras() have been imported from tensorflow.

**Instructions**
* Initialize weights1 as a variable using a 3x2 tensor of ones.
* Compute the product of borrower_features by weights1 using matrix multiplication.
* Use a sigmoid activation function to transform product1 + bias1.
 

In [2]:
import tensorflow as tf


In [8]:
# Features must be a float32 1x3 matrix: tf.matmul needs rank-2 inputs with matching dtypes
borrower_features = tf.constant([[2, 2, 43]], dtype=tf.float32)

In [7]:
# Initialize bias1
bias1 = tf.Variable(1.0)

# Initialize weights1 as 3x2 variable of ones
weights1 = tf.Variable(tf.ones((3, 2)))

# Perform matrix multiplication of borrower_features (1x3) and weights1 (3x2)
product1 = tf.matmul(borrower_features, weights1)

# Apply sigmoid activation function to product1 + bias1
dense1 = tf.keras.activations.sigmoid(product1 + bias1)

# Print shape of dense1
print("\n dense1's output shape: {}".format(dense1.shape))
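To make the linear algebra explicit, here is a minimal sketch of the same forward pass in plain Python rather than TensorFlow. The helper names sigmoid() and matmul() are written here for illustration only and are not the TensorFlow operations; the feature values, the 3x2 weights of ones, and the bias of 1.0 come from the exercise.

```python
import math

def sigmoid(z):
    """Logistic sigmoid for a single float."""
    return 1.0 / (1.0 + math.exp(-z))

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# One borrower as a 1x3 row: education, marital status, age
borrower_features = [[2.0, 2.0, 43.0]]
weights1 = [[1.0, 1.0] for _ in range(3)]   # 3x2 matrix of ones
bias1 = 1.0

# (1x3) times (3x2) gives a 1x2 product: one value per hidden node
product1 = matmul(borrower_features, weights1)

# Add the bias and squash each value into (0, 1)
dense1 = [[sigmoid(v + bias1) for v in row] for row in product1]

print(product1)                      # [[47.0, 47.0]]
print(len(dense1), len(dense1[0]))   # 1 2
```

Each hidden node sums the three features (2 + 2 + 43 = 47) because every weight is 1, so the sigmoid input is 48 and both activations saturate very close to 1.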

### Using the dense layer operation

We've now seen how to define dense layers in tensorflow using linear algebra. In this exercise, we'll skip the linear algebra and let keras work out the details. This will allow us to construct the network below, which has 2 hidden layers and 10 features, using less code than we needed for the network with 1 hidden layer and 3 features.

This image depicts a neural network with 10 input nodes and 1 output node.

To construct this network, we'll need to define three dense layers, each of which takes the previous layer as an input, multiplies it by weights, and applies an activation function. Note that input data has been defined and is available as a 100x10 tensor: borrower_features. Additionally, the keras.layers module is available.

**Instructions**
* Set dense1 to be a dense layer with 7 output nodes and a sigmoid activation function.
* Define dense2 to be a dense layer with 3 output nodes and a sigmoid activation function.
* Define predictions to be a dense layer with 1 output node and a sigmoid activation function.
* Print the shapes of dense1, dense2, and predictions in that order using the .shape attribute. Why does each of these tensors have 100 rows?



In [None]:
# Define the first dense layer
dense1 = keras.layers.Dense(7, activation='sigmoid')(borrower_features)

# Define a dense layer with 3 output nodes
dense2 = keras.layers.Dense(3, activation='sigmoid')(dense1)

# Define a dense layer with 1 output node
predictions = keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print the shapes of dense1, dense2, and predictions
print('\n shape of dense1: ', dense1.shape)
print('\n shape of dense2: ', dense2.shape)
print('\n shape of predictions: ', predictions.shape)
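The question about the 100 rows can be checked with a small sketch. The block below is plain Python, not Keras: dense() is a hypothetical stand-in that uses random weights and no bias, and the 100x10 input is random stand-in data. It only illustrates how the shapes propagate.

```python
import math
import random

def dense(inputs, n_out, rng):
    """Sigmoid dense layer with random weights and no bias, applied to a list of rows."""
    n_in = len(inputs[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    outputs = []
    for row in inputs:
        # One weighted sum per output node
        z = [sum(row[i] * weights[i][j] for i in range(n_in)) for j in range(n_out)]
        outputs.append([1.0 / (1.0 + math.exp(-v)) for v in z])
    return outputs

def shape(matrix):
    """Return (rows, columns) of a nested list."""
    return (len(matrix), len(matrix[0]))

rng = random.Random(0)
# Stand-in for the 100x10 borrower_features tensor
features = [[rng.random() for _ in range(10)] for _ in range(100)]

dense1 = dense(features, 7, rng)
dense2 = dense(dense1, 3, rng)
predictions = dense(dense2, 1, rng)

print(shape(dense1), shape(dense2), shape(predictions))   # (100, 7) (100, 3) (100, 1)
```

Each layer changes only the number of columns. The 100 rows are the 100 examples in the batch, and every example passes through the same weights, so the row count is preserved from input to prediction.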

### Binary classification problems

In this exercise, you will again make use of credit card data. The target variable, default, indicates whether a credit card holder defaults on his or her payment in the following period. Since there are only two options--default or not--this is a binary classification problem. While the dataset has many features, you will focus on just three: the size of the three latest credit card bills. Finally, you will compute predictions from your untrained network, outputs, and compare those to the target variable, default.

The tensor of features has been loaded and is available as bill_amounts. Additionally, the constant(), float32, and keras.layers.Dense() operations are available.

***Instructions***

* Define inputs as a 32-bit floating point constant tensor using bill_amounts.
* Set dense1 to be a dense layer with 3 output nodes and a relu activation function.
* Set dense2 to be a dense layer with 2 output nodes and a relu activation function.
* Set the output layer to be a dense layer with a single output node and a sigmoid activation function.


In [9]:
import pandas as pd

In [11]:
credit_3k = pd.read_csv('../uci_credit_card.csv', nrows=3000)
credit_3k.head()

Unnamed: 0,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,...,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default.payment.next.month
0,1,20000.0,2,2,1,24,2,2,-1,-1,...,0,0,0.0,0.0,689.0,0.0,0.0,0.0,0,1
1,2,120000.0,2,2,2,26,-1,2,0,0,...,3272,3455,3261.0,0.0,1000.0,1000.0,1000.0,0.0,2000,1
2,3,90000.0,2,2,2,34,0,0,0,0,...,14331,14948,15549.0,1518.0,1500.0,1000.0,1000.0,1000.0,5000,0
3,4,50000.0,2,2,1,37,0,0,0,0,...,28314,28959,29547.0,2000.0,2019.0,1200.0,1100.0,1069.0,1000,0
4,5,50000.0,1,2,1,57,-1,0,-1,0,...,20940,19146,19131.0,2000.0,36681.0,10000.0,9000.0,689.0,679,0


In [27]:
# Select the three latest bill amounts (BILL_AMT columns, not the PAY_AMT payment columns)
bill_amounts = credit_3k.loc[:,'BILL_AMT1':'BILL_AMT3'].values
default = credit_3k.loc[:, 'default.payment.next.month'].values.reshape(-1,1)

In [34]:
# Construct input layer from features
inputs = tf.constant(bill_amounts, tf.float32)

# Define first dense layer
dense1 = tf.keras.layers.Dense(3, activation='relu')(inputs)

# Define second dense layer
dense2 = tf.keras.layers.Dense(2, activation='relu')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print error for first five examples
error = default[:5] - outputs.numpy()[:5]
print(error)

[[1.]
 [1.]
 [0.]
 [0.]
 [0.]]


### Multiclass classification problems

In this exercise, we expand beyond binary classification to cover multiclass problems. A multiclass problem has targets that can take on three or more values. In the credit card dataset, the education variable can take on 6 different values, each corresponding to a different level of education. We will use that as our target in this exercise and will also expand the feature set from 3 to 10 columns.

As in the previous problem, you will define an input layer, dense layers, and an output layer. You will also print the untrained model's predictions, which are probabilities assigned to the classes. The tensor of features has been loaded and is available as borrower_features. Additionally, the constant(), float32, and keras.layers.Dense() operations are available.

***Instructions***

* Define the input layer as a 32-bit constant tensor using borrower_features.
* Set the first dense layer to have 10 output nodes and a sigmoid activation function.
* Set the second dense layer to have 8 output nodes and a rectified linear unit activation function.
* Set the output layer to have 6 output nodes and the appropriate activation function.


In [35]:
# Select the 10 feature columns without overwriting bill_amounts from the previous exercise
borrower_features = credit_3k.loc[300:,'BILL_AMT1':'PAY_AMT4'].values

In [38]:
# Construct input layer from borrower features
inputs = tf.constant(borrower_features, tf.float32)

# Define first dense layer
dense1 = tf.keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = tf.keras.layers.Dense(8, activation='relu')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(6, activation='softmax')(dense2)

# Print first five predictions
print(outputs.numpy()[:5])

[[0.09967188 0.22663516 0.07982659 0.04413212 0.42941287 0.12032136]
 [0.11888196 0.28357947 0.09343815 0.05114814 0.31083646 0.14211582]
 [0.09967188 0.22663516 0.07982659 0.04413212 0.42941287 0.12032136]
 [0.09967188 0.22663516 0.07982659 0.04413212 0.42941287 0.12032136]
 [0.12363368 0.26292917 0.07158358 0.06026991 0.37084603 0.11073761]]
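Each row of the printed predictions is a probability distribution over the 6 education classes, which is what the softmax activation produces. As a minimal sketch in plain Python, with made-up scores standing in for the output layer's pre-activation values:

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 6 classes (not taken from the model above)
logits = [0.5, 1.2, -0.3, -1.0, 2.0, 0.1]
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(round(sum(probs), 6))   # 1.0
print(predicted_class)        # 4
```

A sigmoid on each of the 6 outputs would also give values between 0 and 1, but they would not be forced to sum to 1, which is why softmax is the appropriate choice for mutually exclusive classes.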


### Find the local minima

In [40]:
# Initialize x_1 and x_2
x_1 = tf.Variable(6.0, dtype=tf.float32)
x_2 = tf.Variable(0.3, dtype=tf.float32)

# Define the optimization operation
opt = tf.keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())

NameError: name 'loss_function' is not defined
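Since loss_function() is not defined in this notebook, the effect can still be illustrated with a stand-in. The sketch below is plain Python and assumes a hypothetical loss, x**4 - 3*x**2 + x, chosen only because it has one local and one global minimum. It is not the loss function from the exercise, and the starting points and step count are also illustrative.

```python
def loss(x):
    """Hypothetical stand-in loss with a local minimum near 1.13 and a global one near -1.30."""
    return x**4 - 3.0 * x**2 + x

def grad(x):
    """Analytic derivative of the stand-in loss."""
    return 4.0 * x**3 - 6.0 * x + 1.0

def gradient_descent(x, learning_rate=0.01, steps=500):
    """Repeatedly step against the gradient, as plain SGD does for a single variable."""
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

x_1 = gradient_descent(2.0)    # starts to the right of the barrier
x_2 = gradient_descent(-2.0)   # starts to the left of the barrier

print(round(x_1, 2), round(x_2, 2))   # approximately 1.13 and -1.3
print(loss(x_1) > loss(x_2))          # True: x_1 settled in the shallower minimum
```

Both runs use the same optimizer and learning rate. Only the starting point differs, and that alone decides which minimum gradient descent reaches.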

### Avoiding local minima

The previous problem showed how easy it is to get stuck in local minima. We had a simple optimization problem in one variable and gradient descent still failed to deliver the global minimum when we had to travel through local minima first. One way to avoid this problem is to use momentum, which allows the optimizer to break through local minima. We will again use the loss function from the previous problem, which has been defined and is available for you as loss_function().

The graph is of a single variable function that contains multiple local minima and a global minimum.

Several optimizers in tensorflow have a momentum parameter, including SGD and RMSprop. You will make use of RMSprop in this exercise. Note that x_1 and x_2 have been initialized to the same value this time. Furthermore, keras.optimizers.RMSprop() has also been imported for you from tensorflow.

In [None]:
# Initialize x_1 and x_2
x_1 = Variable(0.05, dtype=float32)
x_2 = Variable(0.05, dtype=float32)

# Define the optimization operation for opt_1 and opt_2
opt_1 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)
opt_2 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.0)

for j in range(100):
    opt_1.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Define the minimization operation for opt_2
    opt_2.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
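The role of momentum can be sketched without TensorFlow. Note the swap: the block below uses classical momentum on plain gradient descent, not RMSprop, and the loss is a hypothetical stand-in (x**4 - 3*x**2 + x) rather than the exercise's loss_function(). The learning rate, momentum value, and step count are illustrative assumptions.

```python
def grad(x):
    """Derivative of the hypothetical stand-in loss x**4 - 3*x**2 + x."""
    return 4.0 * x**3 - 6.0 * x + 1.0

def minimize(x, learning_rate, momentum, steps):
    """Gradient descent with classical momentum; momentum=0.0 is plain gradient descent."""
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction of the previous step and add the new gradient step
        velocity = momentum * velocity - learning_rate * grad(x)
        x += velocity
    return x

# Same starting point for both runs; only the momentum differs
x_1 = minimize(2.0, learning_rate=0.0003, momentum=0.99, steps=20000)
x_2 = minimize(2.0, learning_rate=0.0003, momentum=0.0, steps=20000)

print(x_1, x_2)
```

Without momentum, the run stops at the first minimum it reaches, near 1.13 for this stand-in loss. With momentum, the accumulated velocity can carry the variable over the barrier toward the deeper minimum near -1.30. Whether that happens depends on the learning rate, the momentum value, and the starting point, so it is worth printing both results rather than assuming the outcome.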

### Defining the model and loss function

In this exercise, you will train a neural network to predict whether a credit card holder will default. The features and targets you will use to train your network are available in the Python shell as borrower_features and default. You defined the weights and biases in the previous exercise.

Note that the predictions layer is defined as $\sigma(\text{layer1} \cdot w2 + b2)$, where $\sigma$ is the sigmoid activation, layer1 is a tensor of nodes for the first hidden dense layer, w2 is a tensor of weights, and b2 is the bias tensor.

The trainable variables are w1, b1, w2, and b2. Additionally, the following operations have been imported for you: keras.activations.relu() and keras.layers.Dropout().

**Instructions**

* Apply a rectified linear unit activation function to the first layer.
* Apply 25% dropout to layer1.
* Pass the target, targets, and the predicted values, predictions, to the cross entropy loss function.


In [41]:
# Define the layer 1 weights
w1 = tf.Variable(tf.random.normal([23, 7]))

# Initialize the layer 1 bias
b1 = tf.Variable(tf.ones([7]))

# Define the layer 2 weights
w2 = tf.Variable(tf.random.normal([7, 1]))

# Define the layer 2 bias
b2 = tf.Variable([0.00])

In [43]:
# Define the model
def model(w1, b1, w2, b2, features = borrower_features):
    # Apply relu activation functions to layer 1
    layer1 = tf.keras.activations.relu(tf.matmul(features, w1) + b1)
    # Apply dropout rate of 0.25 (Keras only drops units when the layer is called with training=True)
    dropout = tf.keras.layers.Dropout(0.25)(layer1)
    return tf.keras.activations.sigmoid(tf.matmul(dropout, w2) + b2)

# Define the loss function
def loss_function(w1, b1, w2, b2, features = borrower_features, targets = default):
    # Forward the features argument so the loss can be evaluated on any dataset
    predictions = model(w1, b1, w2, b2, features)
    # Pass targets and predictions to the cross entropy loss
    return tf.keras.losses.binary_crossentropy(targets, predictions)
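To see what the two building blocks compute, here is a minimal plain-Python sketch of binary cross entropy and of dropout. Both functions are simplified stand-ins written for illustration, not the Keras implementations. In particular, the clipping constant eps and the "inverted dropout" rescaling are assumptions about a typical implementation.

```python
import math
import random

def binary_crossentropy(targets, predictions, eps=1e-7):
    """Mean binary cross entropy over paired 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# An uninformative prediction of 0.5 costs ln(2), about 0.693, per example
print(binary_crossentropy([1, 0], [0.5, 0.5]))

# Confident, correct predictions cost much less
print(binary_crossentropy([1, 0], [0.9, 0.1]))

# With rate=0.25, surviving values are scaled by 1 / 0.75
rng = random.Random(0)
print(dropout([1.0] * 8, rate=0.25, rng=rng))
```

The loss falls as the predicted probability for the true class approaches 1. Dropout removes a random quarter of the layer1 nodes on average, which discourages the network from relying on any single node.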

### Training neural networks with TensorFlow

In the previous exercise, you defined a model, model(w1, b1, w2, b2, features), and a loss function, loss_function(w1, b1, w2, b2, features, targets), both of which are available to you in this exercise. You will now train the model and then evaluate its performance by predicting default outcomes in a test set, which consists of test_features and test_targets and is available to you. The trainable variables are w1, b1, w2, and b2. Additionally, the following operations have been imported for you: keras.activations.relu() and keras.layers.Dropout().

In [44]:
# Train the model
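for j in range(100):
    # Complete the optimizer
    opt.minimize(lambda: loss_function(w1, b1, w2, b2),
                 var_list=[w1, b1, w2, b2])

# Make predictions with model using test features
model_predictions = model(w1, b1, w2, b2, test_features)

# Round the predicted probabilities to 0 or 1 and flatten both tensors to 1-D
predicted_labels = tf.cast(tf.round(tf.reshape(model_predictions, [-1])), tf.int32)
true_labels = tf.cast(tf.reshape(test_targets, [-1]), tf.int32)

# Construct the confusion matrix
tf.math.confusion_matrix(true_labels, predicted_labels)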

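InvalidArgumentError: cannot compute MatMul as input #1(zero-based) was expected to be a double tensor but is a float tensor [Op:MatMul]

This error indicates a dtype mismatch inside tf.matmul: the features are most likely stored as a float64 (double) array, while w1 is a float32 variable. Casting the features to float32, for example with tf.cast(), before they are passed to the model should resolve it.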
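Independently of the error, the evaluation step can be sketched in plain Python. The confusion_matrix() below is a hypothetical helper, and the labels and probabilities are made up for illustration. It thresholds the predicted probabilities at 0.5 and counts each combination of actual and predicted class, with rows for the actual class and columns for the predicted class.

```python
def confusion_matrix(targets, probabilities, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for 0/1 targets and predicted probabilities."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(targets, probabilities):
        predicted = 1 if p >= threshold else 0
        matrix[y][predicted] += 1   # row = actual class, column = predicted class
    return matrix

# Made-up test labels and model outputs
test_targets = [0, 0, 1, 1, 0, 1]
model_predictions = [0.1, 0.7, 0.8, 0.4, 0.2, 0.9]

cm = confusion_matrix(test_targets, model_predictions)
accuracy = (cm[0][0] + cm[1][1]) / len(test_targets)

print(cm)   # [[2, 1], [1, 2]]
print(accuracy)
```

The diagonal holds the correct predictions. The off-diagonal entries separate false positives (predicted default, did not default) from false negatives (missed defaults), which a single accuracy number would hide.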