# Neural Networks with TensorFlow

In this notebook we will implement some more complicated neural networks. Rather than doing everything from scratch as we have up to now, we will make use of [TensorFlow 2](https://www.tensorflow.org/) and the [Keras](https://keras.io) high-level interface.

## Installing TensorFlow and Keras

TensorFlow and Keras are not included with the base Anaconda install, but can be easily installed by running the following commands on the Anaconda Command Prompt/terminal window:
```
conda install notebook jupyterlab nb_conda_kernels
conda create -n tf tensorflow ipykernel mkl
```
Once this has been done, you should be able to select the `Python [conda env:tf]` kernel from the Kernel->Change Kernel menu item at the top of this notebook. Then we import the TensorFlow package:

In [1]:
import tensorflow as tf

## Creating a simple network with TensorFlow

We will start by creating a very simple fully connected feedforward network using TensorFlow/Keras. The network will mimic the one we implemented previously, but TensorFlow/Keras will take care of most of the details for us.

First, let us load the MNIST digits dataset that we will be using to train our network. This is available directly within Keras:

In [2]:
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()

The data comes as a set of integers in the range [0,255] representing the shade of gray of a given pixel. Let's first rescale them to be in the range [0,1]:

In [3]:
x_train, x_test = x_train / 255.0, x_test / 255.0

We also have to convert the y values from integers to a one-hot representation as a 10-component vector with one non-zero entry:

In [4]:
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)
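To make the one-hot representation concrete, here is a minimal NumPy sketch of the kind of conversion `to_categorical` performs. The helper `one_hot` and the three example labels are hypothetical and only serve as illustration; they are not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set the column given by each label to 1, one row per label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example_labels = [5, 0, 4]  # hypothetical digit labels
example_vec = one_hot(example_labels, 10)
print(example_vec[0])  # the row for digit 5 has its only 1 at index 5
```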

Now we can build a neural network model using Keras. This uses a very simple high-level modular structure where we only have to specify the layers in our model and the properties of each layer. The layers we will have are as follows:
1. Input layer: This will be a 28x28 matrix of numbers.
2. `Flatten` layer: Convert our 28x28 pixel image into an array of size 784.
3. `Dense` layer: a fully connected layer of the type we have been using up to now. We will use 30 neurons and the sigmoid activation function.
4. `Dense` layer: fully connected output layer with 10 neurons (one per digit) and the softmax activation function.

In [5]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(30, activation='sigmoid'),
  tf.keras.layers.Dense(10, activation='softmax')
])
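As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes each `Dense` layer has one weight per input-output pair plus one bias per neuron; `dense_params` is a hypothetical helper, not a Keras function. One would expect `model.summary()` to report the same total.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_size = 28 * 28                          # 784 inputs after Flatten
hidden_params = dense_params(flatten_size, 30)  # hidden layer
output_params = dense_params(30, 10)            # output layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 23550 310 23860
```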


Next we compile this model, specifying the optimization algorithm (Adam) and the loss function (mean squared error) to be used.

In [6]:
model.compile(optimizer='adam',
              loss="mean_squared_error",
              metrics=["categorical_accuracy"])
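To make the loss concrete, here is a minimal NumPy sketch of a softmax output followed by the mean squared error for a single image. The all-zero logits and the target digit 3 are hypothetical values chosen so the arithmetic is easy to follow: a network that assigns probability 0.1 to every digit has a loss of about 0.09.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences over the 10 output components."""
    return np.mean((y_true - y_pred) ** 2)

logits = np.zeros(10)   # hypothetical: the network has no preference yet
target = np.eye(10)[3]  # one-hot vector for the digit 3

probs = softmax(logits)
loss = mean_squared_error(target, probs)
print(probs.sum(), loss)  # probabilities sum to 1; loss is approximately 0.09
```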

We now train the model with our training data. We will run for 5 epochs.
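Each epoch is one pass over the 60,000 training images, processed in batches. The short calculation below shows where the step count of 1875 per epoch in the training output comes from, assuming the Keras default `batch_size` of 32 (the call to `fit` here does not set it explicitly).

```python
import math

n_train = 60000   # number of images in the MNIST training set
batch_size = 32   # Keras default when batch_size is not passed to fit
steps_per_epoch = math.ceil(n_train / batch_size)

print(steps_per_epoch)  # 1875
```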

In [7]:
model.fit(x_train, y_train_vec, epochs=5)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.7691 - loss: 0.0390
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9263 - loss: 0.0120
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9404 - loss: 0.0096
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9497 - loss: 0.0082
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9553 - loss: 0.0073


<keras.src.callbacks.history.History at 0x1cbd0fe8c40>

Finally, we check the accuracy of our model against the test data:

In [8]:
model.evaluate(x_test, y_test_vec, verbose=False)

[0.007564563769847155, 0.9514999985694885]

The two numbers returned are the test loss and the test accuracy. The test accuracy is about 95.1%, close to the 95.5% reached in the final training epoch.
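The accuracy metric counts how often the largest output component lines up with the true digit. Below is a minimal NumPy sketch of that idea; the function and the four hypothetical three-class predictions are for illustration only and are not the Keras implementation.

```python
import numpy as np

def categorical_accuracy(y_true_onehot, y_pred_probs):
    """Fraction of rows where the predicted class equals the true class."""
    true_classes = np.argmax(y_true_onehot, axis=1)
    predicted_classes = np.argmax(y_pred_probs, axis=1)
    return np.mean(true_classes == predicted_classes)

# Hypothetical data: four samples, three classes.
y_true_demo = np.eye(3)[[0, 1, 2, 1]]
y_pred_demo = np.array([
    [0.7, 0.2, 0.1],  # predicts class 0, correct
    [0.1, 0.8, 0.1],  # predicts class 1, correct
    [0.3, 0.4, 0.3],  # predicts class 1, but the true class is 2
    [0.2, 0.5, 0.3],  # predicts class 1, correct
])

demo_accuracy = categorical_accuracy(y_true_demo, y_pred_demo)
print(demo_accuracy)  # 0.75
```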

### Exercises
Experiment with this network:
1. Change the number of neurons in the hidden layer.
2. Add more hidden layers.
3. Change the activation function in the hidden layer to `relu` (for examples see the list of [Keras Layer Activation Functions](https://keras.io/api/layers/activations/)).
4. Change the activation in the output layer to something other than `softmax`.
5. Change the loss function (for examples see the list of [Keras Loss Functions](https://keras.io/api/losses/)).

How does the performance of your network change with these modifications?

#### 1

In [9]:
import tensorflow as tf

# Load MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Rescale data
x_train, x_test = x_train / 255.0, x_test / 255.0

# One-hot encode target labels
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)

# Function to create and train a model
def create_and_train_model(hidden_neurons=30, epochs=5):
    # Define the model
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(hidden_neurons, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    # Compile the model
    model.compile(optimizer='adam',
                  loss="mean_squared_error",
                  metrics=["categorical_accuracy"])
    
    # Train the model
    history = model.fit(x_train, y_train_vec, epochs=epochs, verbose=True)
    
    # Evaluate on test data
    evaluation = model.evaluate(x_test, y_test_vec, verbose=False)
    
    print(f"\nModel with {hidden_neurons} hidden neurons:")
    print(f"Test Loss: {evaluation[0]:.4f}")
    print(f"Test Accuracy: {evaluation[1]:.4f}")
    return history, evaluation

# Experiment with different numbers of hidden neurons
hidden_neurons_list = [10, 30, 50, 100]
for neurons in hidden_neurons_list:
    create_and_train_model(hidden_neurons=neurons)


Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.6417 - loss: 0.0586
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.8863 - loss: 0.0193
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9050 - loss: 0.0153
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9133 - loss: 0.0139
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9220 - loss: 0.0125

Model with 10 hidden neurons:
Test Loss: 0.0122
Test Accuracy: 0.9229
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.7626 - loss: 0.0391
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9244 - loss: 0.012

#### 2

In [10]:
import tensorflow as tf

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the input data to the range [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert labels to one-hot encoding
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)

# Define a model with more hidden layers, using sigmoid activation
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(30, activation='sigmoid'),  # First hidden layer
    tf.keras.layers.Dense(30, activation='sigmoid'),  # Second hidden layer
    tf.keras.layers.Dense(30, activation='sigmoid'),  # Third hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

# Compile the model
model.compile(optimizer='adam',
              loss="mean_squared_error",
              metrics=["categorical_accuracy"])

# Train the model
model.fit(x_train, y_train_vec, epochs=5)

# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test_vec, verbose=False)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")


Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.5171 - loss: 0.0614
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9158 - loss: 0.0136
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9405 - loss: 0.0094
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9503 - loss: 0.0078
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9566 - loss: 0.0068
Test Loss: 0.0072, Test Accuracy: 0.9532


#### 3

In [11]:
import tensorflow as tf

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the input data to the range [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert labels to one-hot encoding
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)

# Define a model with hidden layers using ReLU activation
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(30, activation='relu'),  # First hidden layer with ReLU
    tf.keras.layers.Dense(30, activation='relu'),  # Second hidden layer with ReLU
    tf.keras.layers.Dense(30, activation='relu'),  # Third hidden layer with ReLU
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer
])

# Compile the model
model.compile(optimizer='adam',
              loss="mean_squared_error",
              metrics=["categorical_accuracy"])

# Train the model
model.fit(x_train, y_train_vec, epochs=5)

# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test_vec, verbose=False)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")


Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.7988 - loss: 0.0284
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9449 - loss: 0.0087
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9569 - loss: 0.0067
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9617 - loss: 0.0059
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9671 - loss: 0.0052
Test Loss: 0.0065, Test Accuracy: 0.9588


#### 4

In [12]:
import tensorflow as tf

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the input data to the range [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert labels to one-hot encoding
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)

# Define a model with sigmoid activation in the output layer
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(30, activation='relu'),  # First hidden layer with ReLU
    tf.keras.layers.Dense(30, activation='relu'),  # Second hidden layer with ReLU
    tf.keras.layers.Dense(30, activation='relu'),  # Third hidden layer with ReLU
    tf.keras.layers.Dense(10, activation='sigmoid')  # Output layer with Sigmoid
])

# Compile the model
model.compile(optimizer='adam',
              loss="binary_crossentropy",  # Binary crossentropy for sigmoid output
              metrics=["categorical_accuracy"])

# Train the model
model.fit(x_train, y_train_vec, epochs=5)

# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test_vec, verbose=False)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")


Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.7197 - loss: 0.1643
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - categorical_accuracy: 0.9440 - loss: 0.0364
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9579 - loss: 0.0274
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.9646 - loss: 0.0224
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9703 - loss: 0.0196
Test Loss: 0.0217, Test Accuracy: 0.9678


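Here the output layer uses sigmoid instead of softmax. Sigmoid is commonly used for binary classification, but it can be adapted to multi-class classification by treating each of the 10 outputs as an independent yes/no prediction, which is why the loss was changed to binary cross-entropy.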
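The difference between the two output activations can be seen on a small example. This is a minimal NumPy sketch with hypothetical logits for three classes: softmax outputs sum to 1, while sigmoid outputs are independent and generally do not. The `binary_crossentropy` helper is an illustrative re-implementation, not the Keras function.

```python
import numpy as np

def sigmoid(z):
    """Squash each score independently into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Squash scores jointly into probabilities that sum to 1."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-output log loss, treating each output as a yes/no question."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

logits = np.array([2.0, -1.0, 0.5])  # hypothetical raw scores
target = np.array([1.0, 0.0, 0.0])   # one-hot label for class 0

sig = sigmoid(logits)
soft = softmax(logits)
bce = binary_crossentropy(target, sig)

print("sigmoid sum:", sig.sum())   # greater than 1 for these logits
print("softmax sum:", soft.sum())  # 1 up to rounding
print("binary cross-entropy:", bce)
```

Both activations pick the same class here because they preserve the ordering of the logits; only the interpretation of the outputs differs.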

#### 5

In [13]:
import tensorflow as tf

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the input data to the range [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert labels to one-hot encoding
y_train_vec = tf.keras.utils.to_categorical(y_train, 10)
y_test_vec = tf.keras.utils.to_categorical(y_test, 10)

# Define a function to create and train a model with a specific loss function
def train_model_with_loss(loss_function):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(30, activation='relu'),  # First hidden layer with ReLU
        tf.keras.layers.Dense(30, activation='relu'),  # Second hidden layer with ReLU
        tf.keras.layers.Dense(30, activation='relu'),  # Third hidden layer with ReLU
        tf.keras.layers.Dense(10, activation='softmax')  # Output layer with Softmax
    ])
    
    # Compile the model with the specified loss function
    model.compile(optimizer='adam',
                  loss=loss_function,  # Use the loss function passed to the function
                  metrics=["categorical_accuracy"])
    
    # Train the model
    print(f"\nTraining with loss function: {loss_function}")
    model.fit(x_train, y_train_vec, epochs=5, verbose=1)
    
    # Evaluate the model
    loss, accuracy = model.evaluate(x_test, y_test_vec, verbose=False)
    print(f"Test Loss: {loss:.4f}, Test Accuracy: {accuracy:.4f}")
    return loss, accuracy

# Experiment with different loss functions
loss_functions = [
    "categorical_crossentropy",  # Standard for multi-class classification
    "mean_squared_error",        # Used previously, though not optimal for classification
    "mean_absolute_error",       # Less common but measures average absolute differences
    "hinge"                      # Suitable for binary classification or margin-based tasks
]

results = {}
for loss_fn in loss_functions:
    results[loss_fn] = train_model_with_loss(loss_fn)

# Display the results
print("\nSummary of Results:")
for loss_fn, (loss, accuracy) in results.items():
    print(f"{loss_fn}: Test Loss = {loss:.4f}, Test Accuracy = {accuracy:.4f}")



Training with loss function: categorical_crossentropy
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.8077 - loss: 0.6336
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9483 - loss: 0.1763
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - categorical_accuracy: 0.9596 - loss: 0.1328
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - categorical_accuracy: 0.9661 - loss: 0.1127
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - categorical_accuracy: 0.9698 - loss: 0.0986
Test Loss: 0.1247, Test Accuracy: 0.9643

Training with loss function: mean_squared_error
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - categorical_accuracy: 0.7986 - loss: 0.0277
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━

### Explanation of Loss Functions:

1. **categorical_crossentropy**:
   - Most commonly used for multi-class classification with one-hot encoded labels.
   - Measures the log loss between predicted and true probability distributions.

2. **mean_squared_error**:
   - Measures the squared differences between predicted and true values.
   - Better suited for regression tasks but included here for comparison.

3. **mean_absolute_error**:
   - Measures the average absolute differences between predicted and true values.
   - Similar to mean squared error but penalizes outliers less harshly.

4. **hinge**:
   - Typically used for binary classification tasks.
   - Evaluates the margin between predicted and true class labels, with penalties for violations.
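The first three losses in the list above can be compared on a toy example. This is a minimal NumPy sketch, not the Keras implementations; the three-class predictions are hypothetical. Hinge is left out here because it uses a different label convention. The example contrasts a confident correct prediction with a confident wrong one.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([0.0, 0.0, 1.0])      # true class is 2
good = np.array([0.05, 0.05, 0.90])     # hypothetical confident, correct output
bad = np.array([0.90, 0.05, 0.05])      # hypothetical confident, wrong output

for name, fn in [("crossentropy", categorical_crossentropy),
                 ("mse", mean_squared_error),
                 ("mae", mean_absolute_error)]:
    print(f"{name}: good={fn(y_true, good):.4f}  bad={fn(y_true, bad):.4f}")
```

Cross-entropy grows without bound as the probability of the true class approaches zero, whereas the squared and absolute errors on probabilities stay bounded, which is one reason the raw loss values of different loss functions are not directly comparable.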

### Output:
At the end of the training, the results will display the **test loss** and **test accuracy** for each loss function. This allows you to compare the effectiveness of different loss functions for your neural network.

# Neural Network Performance with Different Loss Functions

## Code Summary and Observations for Loss Functions:

### 1. **categorical_crossentropy**
- **Usage**: Best suited for multi-class classification tasks with one-hot encoded labels.
- **Performance**:
  - Test Loss: 0.1247
  - Test Accuracy: 96.43%
- **Observation**: Gives the highest test accuracy of the four loss functions in this run.

---

### 2. **mean_squared_error**
- **Usage**: Measures squared differences between predicted and true values; often used for regression.
- **Performance**:
  - Test Loss: 0.0068
  - Test Accuracy: 95.59%
- **Observation**: Works for classification, but gives slightly lower test accuracy than categorical cross-entropy in this run (95.59% versus 96.43%).

---

### 3. **mean_absolute_error**
- **Usage**: Measures average absolute differences; penalizes outliers less harshly compared to mean squared error.
- **Performance**:
  - Test Loss: 0.0112
  - Test Accuracy: 94.56%
- **Observation**: Gives slightly lower test accuracy than mean squared error in this run (94.56% versus 95.59%).

---

### 4. **hinge**
- **Usage**: Evaluates margins between predicted and true labels; commonly used in binary classification.
- **Performance**:
  - Test Loss: 0.9108
  - Test Accuracy: 94.72%
- **Observation**: Reaches an accuracy close to that of the other losses even though hinge is designed for binary, margin-based classification. The loss value of about 0.9 is on a different scale from the other losses and should not be compared with them directly.
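The hinge loss value near 0.9 can be understood with a short calculation. The sketch below assumes, as the Keras documentation describes, that 0/1 labels are converted to -1/+1 before the hinge formula is applied; the `hinge` function is an illustrative NumPy re-implementation. With 10 outputs restricted to the range [0, 1], each of the nine incorrect classes contributes at least 1, so even a perfect prediction has a loss of 0.9.

```python
import numpy as np

def hinge(y_true_onehot, y_pred):
    """Mean of max(1 - y * y_pred, 0) with labels mapped from 0/1 to -1/+1."""
    y_signed = 2.0 * y_true_onehot - 1.0
    return np.mean(np.maximum(1.0 - y_signed * y_pred, 0.0))

y_true = np.eye(10)[7]       # hypothetical label: the digit 7
perfect = y_true.copy()      # probability 1 on the correct digit
uniform = np.full(10, 0.1)   # probability 0.1 on every digit

perfect_loss = hinge(y_true, perfect)
uniform_loss = hinge(y_true, uniform)
print(perfect_loss)  # 0.9
print(uniform_loss)  # about 1.08
```

Seen this way, the reported test loss of 0.9108 is only slightly above the smallest value this setup can reach.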

---

## Comparison Summary:

| Loss Function           | Test Loss | Test Accuracy |
|-------------------------|-----------|---------------|
| categorical_crossentropy | 0.1247    | **96.43%**    |
| mean_squared_error       | 0.0068    | 95.59%        |
| mean_absolute_error      | 0.0112    | 94.56%        |
| hinge                    | 0.9108    | 94.72%        |

---

## Key Insights:
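- **categorical_crossentropy** gave the highest test accuracy on MNIST in this run, which fits its design: it compares the predicted probability distribution directly with the true class.
- Regression-based loss functions like **mean_squared_error** and **mean_absolute_error** are viable alternatives but gave slightly lower accuracy here.
- **hinge loss**, while effective for binary classification, is less suited for multi-class problems like MNIST.
- The test loss values are measured on different scales, so only the test accuracy column can be compared across loss functions. Each result comes from a single 5-epoch run, so small differences may not be significant.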

These observations highlight the importance of selecting the correct loss function based on the problem domain.
