<a href="https://colab.research.google.com/github/Cliffochi/aviva_data_science_course/blob/main/TensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## TensorFlow

### [Question 1] Looking back at Scratch

Looking back at implementing deep learning from scratch, here are the key components that were needed:  

### **1. Model Architecture**  
- **Neural Network Layers**: Defining the structure (e.g., Dense, Conv2D, LSTM).  
- **Weight Initialization**: Setting initial values for weights (e.g., random, Xavier, He).  
- **Bias Initialization**: Initializing bias terms.  
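
As a rough illustration of the initialization schemes named above, the sketch below draws the weights of one dense layer with He initialization (standard deviation `sqrt(2 / fan_in)`) and Xavier/Glorot normal initialization (`sqrt(2 / (fan_in + fan_out))`). It uses only the standard library; the helper names `he_normal` and `xavier_normal` are illustrative and not taken from any framework.

```python
import math
import random

def he_normal(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix with std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

def xavier_normal(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix with std = sqrt(2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)          # fixed seed so the draw is reproducible
W1 = he_normal(4, 50, rng)      # 4 input features -> 50 hidden units (ReLU layer)
W2 = xavier_normal(50, 1, rng)  # 50 hidden units -> 1 output
b1 = [0.0] * 50                 # biases are commonly started at zero
```

He initialization is usually paired with ReLU layers, Xavier with sigmoid or tanh layers.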

### **2. Forward Propagation**  
- **Input Handling**: Passing input data through the network.  
- **Activation Functions**: Applying ReLU, Sigmoid, Tanh, etc.  
- **Loss Calculation**: Computing loss (e.g., Cross-Entropy, MSE).  

### **3. Backward Propagation (Gradient Calculation)**  
- **Gradient Computation**: Calculating gradients using chain rule.  
- **Loss Gradient**: Deriving gradients w.r.t. loss.  
- **Weight & Bias Updates**: Adjusting parameters using gradients.  
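
The chain-rule steps above can be sketched for the smallest possible case: one sigmoid neuron trained with binary cross-entropy on a single sample. For this pairing the gradient with respect to the pre-activation simplifies to `p - y`. The function name `forward_backward` and the toy numbers are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(w, b, x, y):
    """Return the loss and the gradients for one sample of a single sigmoid neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b               # forward: affine step
    p = sigmoid(z)                                             # forward: activation
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))      # binary cross-entropy
    dz = p - y                                                 # dL/dz for sigmoid + BCE
    dw = [dz * xi for xi in x]                                 # chain rule: dL/dw = dL/dz * x
    db = dz                                                    # chain rule: dL/db = dL/dz
    return loss, dw, db

w, b = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1.0
lr = 0.1

loss_before, dw, db = forward_backward(w, b, x, y)
w = [wi - lr * gi for wi, gi in zip(w, dw)]   # gradient-descent update
b = b - lr * db
loss_after, _, _ = forward_backward(w, b, x, y)
```

One update step lowers the loss on this sample, which is the behavior a correct backward pass should show.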

### **4. Optimization**  
- **Optimizer Implementation**: Updating weights (e.g., SGD, Adam, RMSprop).  
- **Learning Rate**: Managing step size for updates.  
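
To make the difference between optimizers concrete, here is a minimal scalar sketch of one SGD step and one Adam step. The hyperparameter values (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are commonly used defaults and are assumed here; the function names are illustrative.

```python
import math

def sgd_step(param, grad, lr=0.001):
    """Plain SGD: move against the gradient, scaled by the learning rate."""
    return param - lr * grad

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p_sgd = sgd_step(1.0, 4.0)                            # step size depends on the gradient size
p_adam, m, v = adam_step(1.0, 4.0, m=0.0, v=0.0, t=1) # first step is about lr, whatever the gradient size
```

On the first step Adam moves by roughly the learning rate regardless of gradient magnitude, while the SGD step scales directly with the gradient.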

### **5. Training Loop**  
- **Epoch Loop**: Iterating over the entire dataset.  
- **Batch Processing**: Splitting data into mini-batches.  
- **Validation**: Monitoring performance on validation data.  

### **6. Evaluation**  
- **Accuracy/Loss Metrics**: Measuring model performance.  
- **Prediction**: Running inference on test data.  

### **7. Data Handling**  
- **Data Loading**: Reading input data (e.g., CSV, images).  
- **Preprocessing**: Normalization, reshaping, one-hot encoding.  
- **Batching**: Creating mini-batches for training.  
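
Two of the preprocessing steps listed above, one-hot encoding and normalization, can be sketched in a few lines of plain Python. The helper names and sample values are hypothetical.

```python
def one_hot(labels, classes):
    """Turn each label into a vector with a 1.0 at the index of its class."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1.0 if index[label] == j else 0.0 for j in range(len(classes))]
            for label in labels]

def min_max_scale(column):
    """Rescale a list of numbers linearly so they fall in [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

encoded = one_hot(["virginica", "setosa"], ["setosa", "versicolor", "virginica"])
scaled = min_max_scale([0.0, 127.5, 255.0])   # e.g. pixel intensities
```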

### **8. Debugging & Monitoring**  
- **Gradient Checking**: Ensuring correct backpropagation.  
- **Logging**: Tracking loss/accuracy over epochs.  
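
Gradient checking compares a hand-derived gradient with a numerical estimate. The sketch below uses a central difference on a toy scalar loss; the loss function itself is an arbitrary choice made for illustration.

```python
import math

def loss(w):
    """Toy scalar loss: squared distance between tanh(w) and a target of 0.5."""
    return (math.tanh(w) - 0.5) ** 2

def analytic_grad(w):
    """Derivative obtained by hand with the chain rule."""
    t = math.tanh(w)
    return 2.0 * (t - 0.5) * (1.0 - t * t)

def numeric_grad(f, w, h=1e-5):
    """Central-difference estimate of df/dw."""
    return (f(w + h) - f(w - h)) / (2.0 * h)

w = 0.3
g_analytic = analytic_grad(w)
g_numeric = numeric_grad(loss, w)
rel_error = abs(g_analytic - g_numeric) / max(abs(g_analytic), abs(g_numeric))
```

A small relative error suggests the analytic gradient is right; a large one usually points to a mistake in the backward pass.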

---  
### **How Frameworks (Like TensorFlow) Implement These**  
1. **Automatic Differentiation** → No manual gradient computation (uses `tf.GradientTape`).  
2. **Predefined Layers** → `tf.keras.layers` handles weight initialization and forward pass.  
3. **Built-in Optimizers** → `tf.optimizers` (SGD, Adam, etc.) manage updates.  
4. **Loss Functions** → `tf.losses` provides common loss computations.  
5. **Training Loop Abstraction** → `model.fit()` automates epochs/batches.  


### [Question 2] Considering compatibility between Scratch and TensorFlow

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Iris.csv")

# Filter the DataFrame by specific conditions
df = df[(df["Species"] == "Iris-versicolor") | (df["Species"] == "Iris-virginica")]
y = df["Species"]
X = df.loc[:, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Convert labels to numeric
y[y == "Iris-versicolor"] = 0
y[y == "Iris-virginica"] = 1
y = y.astype(np.int64)[:, np.newaxis]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Further split train into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

class GetMiniBatch:
    """
    Iterator for retrieving mini-batches

    Parameters
    ----------
    X : ndarray of shape (n_samples, n_features)
        Training data
    y : ndarray of shape (n_samples, 1)
        Ground truth labels
    batch_size : int
        Size of each mini-batch
    seed : int
        Seed for NumPy's random number generator
    """
    def __init__(self, X, y, batch_size=10, seed=0):
        self.batch_size = batch_size
        np.random.seed(seed)
        shuffle_index = np.random.permutation(np.arange(X.shape[0]))
        self.X = X[shuffle_index]
        self.y = y[shuffle_index]
        # Use np.intp instead of np.int as np.int is deprecated
        self._stop = np.ceil(X.shape[0]/self.batch_size).astype(np.intp)
    def __len__(self):
        return self._stop
    def __getitem__(self, item):
        p0 = item * self.batch_size
        p1 = item * self.batch_size + self.batch_size
        return self.X[p0:p1], self.y[p0:p1]
    def __iter__(self):
        self._counter = 0
        return self
    def __next__(self):
        if self._counter >= self._stop:
            raise StopIteration()
        p0 = self._counter * self.batch_size
        p1 = self._counter * self.batch_size + self.batch_size
        self._counter += 1
        return self.X[p0:p1], self.y[p0:p1]

# Set hyperparameters
learning_rate = 0.001
batch_size = 10
num_epochs = 100

n_hidden1 = 50
n_hidden2 = 100
n_input = X_train.shape[1]
n_samples = X_train.shape[0]
n_classes = 1

# Define the model using Keras
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden1, activation='relu', input_shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    tf.keras.layers.Dense(n_classes)
])

# Compile the model
# Use Adam optimizer and BinaryCrossentropy loss for binary classification
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Mini-batch iterator for training
get_mini_batch_train = GetMiniBatch(X_train, y_train, batch_size=batch_size)

# Create a TensorFlow dataset from the generator
# output_signature replaces the deprecated output_types/output_shapes arguments
train_dataset = tf.data.Dataset.from_generator(
    lambda: get_mini_batch_train, # Use the instance of your iterator
    output_signature=(
        tf.TensorSpec(shape=(None, n_input), dtype=tf.float64),   # features (None = batch size)
        tf.TensorSpec(shape=(None, n_classes), dtype=tf.int64))   # labels
)

# Train the model using the created dataset
history = model.fit(train_dataset,
                    epochs=num_epochs,
                    verbose=0, # Set to 1 to see progress
                    validation_data=(X_val, y_val))

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print("test_acc : {:.3f}".format(test_acc))

test_acc : 0.950
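
The `GetMiniBatch` iterator above yields `ceil(n_samples / batch_size)` batches per epoch, and the last batch can be smaller than the rest. The sketch below reproduces that slice arithmetic in plain Python. The sample count of 64 is an assumption: it is what the two 80/20 splits leave for training if the CSV holds the usual 50 rows per species (100 rows after filtering).

```python
import math

def batch_slices(n_samples, batch_size):
    """Return the (start, stop) index pairs that cover n_samples in order."""
    n_batches = math.ceil(n_samples / batch_size)
    return [(i * batch_size, min((i + 1) * batch_size, n_samples))
            for i in range(n_batches)]

slices = batch_slices(64, 10)
sizes = [stop - start for start, stop in slices]
print(len(slices), sizes)  # 7 [10, 10, 10, 10, 10, 10, 4]
```

The original class does not clamp the stop index because NumPy slicing already truncates at the end of the array.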


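The tables below map the **"things needed to implement deep learning"** from scratch onto the low-level TensorFlow sample code. That sample uses TensorFlow 1.x graph-mode APIs (`tf.placeholder`, `tf.Session`, `tf.train.AdamOptimizer`), whereas the cell above reimplements the same model with `tf.keras`.  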

---

### **1. Model Architecture**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Manually define layers (e.g., `DenseLayer` class). | Layers are defined via `tf.Variable` for weights/biases and `tf.matmul` + `tf.add`. |
| Explicit weight initialization (e.g., He initialization). | Uses `tf.random_normal` for initialization (renamed `tf.random.normal` in TF 2.x). |
| Hand-coded forward pass (e.g., `forward()` method). | Forward pass is a sequence of `tf.matmul`, `tf.add`, and `tf.nn.relu`. |

**Code Example:**  
```python
# TensorFlow's manual layer definition
layer_1 = tf.add(tf.matmul(x, weights['w1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
```

---

### **2. Forward & Backward Propagation**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
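| Manual gradient calculations (chain rule). | **Automatic differentiation**: `optimizer.minimize(loss_op)` adds the gradient ops to the graph, so running `train_op` performs backpropagation (TF 2.x eager code uses `tf.GradientTape` for the same purpose). |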
| Hand-written loss computation (e.g., cross-entropy). | Uses `tf.nn.sigmoid_cross_entropy_with_logits`. |

**Code Example:**  
```python
# Loss and gradients handled automatically
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logits))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)  # Backpropagation happens here
```
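
For reference, the loss that `tf.nn.sigmoid_cross_entropy_with_logits` computes can be written in a numerically stable form, `max(x, 0) - x * z + log(1 + exp(-|x|))`, where `x` is the logit and `z` the label. The plain-Python sketch below implements that formula next to the naive version; the function names are illustrative.

```python
import math

def stable_sigmoid_ce(label, logit):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_sigmoid_ce(label, logit):
    """Textbook form; breaks down for logits of large magnitude."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

same = abs(stable_sigmoid_ce(1.0, 2.0) - naive_sigmoid_ce(1.0, 2.0))
extreme = stable_sigmoid_ce(0.0, 1000.0)   # finite; the naive form would hit log(0) here
```

This is why the loss takes logits rather than probabilities: the sigmoid and the logarithm are combined before either can overflow.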

---

### **3. Training Loop**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Custom epoch/batch loops (e.g., `for epoch in epochs`). | Same epoch loop, but batching is abstracted via `GetMiniBatch` class. |
| Manual parameter updates (e.g., `weights -= lr * gradients`). | Optimizer (`AdamOptimizer`) handles updates via `train_op`. |

**Code Example:**  
```python
for epoch in range(num_epochs):
    for mini_batch_x, mini_batch_y in get_mini_batch_train:
        sess.run(train_op, feed_dict={X: mini_batch_x, Y: mini_batch_y})  # Updates weights
```

---

### **4. Data Handling**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Manual data splitting/shuffling. | Uses `sklearn.model_selection.train_test_split` and custom `GetMiniBatch` iterator. |
| One-hot encoding by hand. | Labels converted to `0`/`1` manually. |

**Code Example:**  
```python
# Manual batching (similar to scratch)
class GetMiniBatch:
    def __next__(self):
        return self.X[p0:p1], self.y[p0:p1]  # Returns mini-batch
```

---

### **5. Evaluation**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Custom accuracy/loss calculations. | Uses `tf.reduce_mean` and `tf.cast` for metrics. |
| Manual validation/testing loops. | Evaluated in-session via `sess.run`. |

**Code Example:**  
```python
correct_pred = tf.equal(tf.sign(Y - 0.5), tf.sign(tf.sigmoid(logits) - 0.5))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```
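
The same accuracy computation can be sketched without TensorFlow: threshold the sigmoid of each logit at 0.5 and count matches against the labels. The labels and logits below are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_accuracy(labels, logits):
    """Fraction of samples whose thresholded prediction equals the label."""
    predictions = [1 if sigmoid(z) > 0.5 else 0 for z in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

labels = [0, 1, 1, 0]
logits = [-2.0, 0.3, -0.1, 1.5]          # predictions become [0, 1, 0, 1]
acc = binary_accuracy(labels, logits)    # 2 of 4 correct
```

Because `sigmoid(z) > 0.5` exactly when `z > 0`, thresholding the raw logit at zero gives the same predictions.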

---

### **Key Takeaways**  
1. **TensorFlow Automates Gradients**: No need for manual backpropagation (handled by `optimizer.minimize()`).  
2. **Low-Level Still Requires Manual Setup**: Layers, batching, and training loops are explicit (unlike `tf.keras`).  
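3. **Placeholders Feed Data**: `tf.placeholder` defines a graph input that is filled through `feed_dict` at `sess.run` time (TF 2.x drops placeholders in favor of eager tensors, with `tf.data` for input pipelines).  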

---
### **Why This Matters**  
Understanding this mapping helps:  
- Transition from scratch to frameworks.  
- Debug low-level TensorFlow code.  
- Appreciate high-level APIs (like `tf.keras`) that abstract these steps.  

## 3. Application to other datasets
### [Problem 3] Create a model for Iris using all three objective variables

#### Iris (3-Class Classification)
Key Changes from Binary to Multi-Class:

- Labels: One-hot encode `Species` (3 classes: setosa, versicolor, virginica).

- Loss Function: Use `tf.nn.softmax_cross_entropy_with_logits` instead of the sigmoid version (the Keras cell below uses the corresponding `tf.keras.losses.CategoricalCrossentropy(from_logits=True)`).

- Output Layer: 3 neurons (one per class) with linear activation (logits).

- Accuracy: Compare the predicted class (`tf.argmax`) with the true class.
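
The changes listed above can be sketched in plain Python for a single sample: softmax turns the three logits into probabilities, cross-entropy scores them against the one-hot label, and argmax picks the predicted class. The logits are made-up values and the helper names are illustrative.

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

logits = [2.0, 1.0, 0.1]     # one row of model output for the 3 classes
label = [1.0, 0.0, 0.0]      # one-hot: the true class is index 0
probs = softmax(logits)
loss = categorical_cross_entropy(label, logits)
is_correct = argmax(logits) == argmax(label)
```

Softmax preserves the ordering of the logits, so accuracy can be computed from the logits directly.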

In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load Iris dataset (all 3 classes)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Iris.csv")
X = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].values
y = pd.get_dummies(df['Species']).values  # One-hot encoding

# Split into train/val/test (60/20/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

# Convert data types to match TensorFlow defaults (float32)
X_train = X_train.astype(np.float32)
X_val = X_val.astype(np.float32)
X_test = X_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_val = y_val.astype(np.float32)
y_test = y_test.astype(np.float32)


# Hyperparameters
learning_rate = 0.001
batch_size = 10
num_epochs = 100
n_input = X_train.shape[1] # Number of features
n_classes = 3  # Now 3 classes!

n_hidden1 = 50
n_hidden2 = 100


# Define the model using Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden1, activation='relu', input_shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    # For multi-class classification, the output layer has n_classes neurons.
    # No activation is needed here if using from_logits=True in the loss function.
    tf.keras.layers.Dense(n_classes)
])

# Compile the model
# Use Adam optimizer and CategoricalCrossentropy loss for multi-class classification
# from_logits=True means the output layer does not have a softmax activation
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model using the data directly
# Keras handles the batching internally when you provide numpy arrays
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=0, # Set to 1 to see progress
                    validation_data=(X_val, y_val))

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {test_acc:.3f}")

Test Accuracy: 1.000
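
The 60/20/20 split in the cell above comes from two chained calls: 20% is held out for testing, then 25% of the remaining 80% becomes validation data, which is 20% of the original. The sketch below checks that arithmetic with exact fractions. The row count of 150 assumes the standard Iris CSV, and rounding the held-out part up is an assumption about how the splitter rounds.

```python
import math
from fractions import Fraction

def two_stage_split_sizes(n_rows, test_fraction, val_fraction_of_rest):
    """Sizes of (train, val, test) after two chained hold-out splits."""
    n_test = math.ceil(n_rows * test_fraction)          # held-out part rounded up (assumed)
    n_rest = n_rows - n_test
    n_val = math.ceil(n_rest * val_fraction_of_rest)
    n_train = n_rest - n_val
    return n_train, n_val, n_test

test_fraction = Fraction(1, 5)                          # test_size=0.2
val_fraction_of_rest = Fraction(1, 4)                   # test_size=0.25 on the remainder
overall_val_fraction = val_fraction_of_rest * (1 - test_fraction)   # 1/4 * 4/5 = 1/5

sizes = two_stage_split_sizes(150, test_fraction, val_fraction_of_rest)
print(sizes)  # (90, 30, 30)
```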


### House Prices (Regression)
Key Changes for Regression:

- Labels: Continuous values (no one-hot encoding).

- Loss Function: Use Mean Squared Error (`tf.losses.mean_squared_error` in TF 1.x, `tf.keras.losses.MeanSquaredError` in Keras).

- Output Layer: 1 neuron with linear activation.

- Metrics: Track MSE or RMSE instead of accuracy.
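
As a small worked example of the regression metrics named above, the sketch below computes MSE and RMSE by hand. The prices are hypothetical values, not rows from the dataset.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [200000.0, 150000.0, 300000.0]   # hypothetical sale prices
y_pred = [210000.0, 140000.0, 330000.0]   # hypothetical predictions

mse = mean_squared_error(y_true, y_pred)  # in squared dollars
rmse = math.sqrt(mse)                     # back in dollars, easier to interpret
```

RMSE is in the same unit as the target, which is why it is often reported next to MSE for price prediction.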

In [7]:
# Load House Prices dataset (example)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")
X = df[['feature1', 'feature2', ...]].values  # Placeholder column names; they are not in the CSV, hence the KeyError below
y = df['price'].values.reshape(-1, 1)  # Reshape to (n_samples, 1)

# Split into train/val/test (same as Iris)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Placeholders
X_ph = tf.placeholder(tf.float32, [None, X.shape[1]])
Y_ph = tf.placeholder(tf.float32, [None, 1])  # Single output

# Model (regression)
def house_net(x):
    weights = {
        'w1': tf.Variable(tf.random_normal([X.shape[1], 50])),
        'w2': tf.Variable(tf.random_normal([50, 100])),
        'w3': tf.Variable(tf.random_normal([100, 1]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([50])),
        'b2': tf.Variable(tf.random_normal([100])),
        'b3': tf.Variable(tf.random_normal([1]))
    }
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['w1']), biases['b1']))
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['w2']), biases['b2']))
    return tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])  # Linear output

pred = house_net(X_ph)

# Loss (MSE)
loss_op = tf.losses.mean_squared_error(labels=Y_ph, predictions=pred)
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = optimizer.minimize(loss_op)

# Training (same loop as Iris, but track MSE/RMSE)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        # ... (same training loop)
        val_mse = sess.run(loss_op, feed_dict={X_ph: X_val, Y_ph: y_val})
        print(f"Epoch {epoch}, Val MSE: {val_mse:.2f}")

KeyError: "None of [Index(['feature1', 'feature2', Ellipsis], dtype='object')] are in the [columns]"

In [9]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load House Prices dataset (example)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")

X = df[['GrLivArea', 'YearBuilt']].values  # Select features
y = df['SalePrice'].values.reshape(-1, 1)  # Assuming 'SalePrice' is the target variable and reshaping it

# Convert data types to match TensorFlow defaults (float32)
X = X.astype(np.float32)
y = y.astype(np.float32)

# Split into train/val/test (same as Iris)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Hyperparameters
learning_rate = 0.001 # Using the learning rate from the Iris example
batch_size = 10 # Using the batch size from the Iris example
num_epochs = 100
n_input = X_train.shape[1] # Number of features
n_output = 1 # Single output for regression

n_hidden1 = 50
n_hidden2 = 100

# Define the model using Keras Sequential API for regression
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden1, activation='relu', input_shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    # For regression, the output layer has 1 neuron and no activation (linear output)
    tf.keras.layers.Dense(n_output)
])

# Compile the model
# Use Adam optimizer and MeanSquaredError loss for regression
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mse']) # Use Mean Squared Error as a metric

# Train the model using the data directly
# Keras handles the batching internally when you provide numpy arrays
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=0, # Set to 1 to see progress
                    validation_data=(X_val, y_val))

# Evaluate the model on the test set
test_loss, test_mse = model.evaluate(X_test, y_test, verbose=0)
print(f"Test MSE: {test_mse:.2f}")

# You might also want to calculate RMSE for regression
test_rmse = np.sqrt(test_mse)
print(f"Test RMSE: {test_rmse:.2f}")

Test MSE: 3921277952.00
Test RMSE: 62620.11
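
The model above is trained on raw feature values. The Question 4 cell below standardizes the features with `StandardScaler`, fitting it on the training data only. The sketch here shows that idea in plain Python using the population standard deviation; the area values are hypothetical.

```python
import statistics

def fit_standardizer(column):
    """Learn the mean and population standard deviation from training data only."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in column]

train_area = [1000.0, 1500.0, 2000.0]   # hypothetical GrLivArea values (training split)
val_area = [2500.0]                     # hypothetical validation value

mean, std = fit_standardizer(train_area)
train_scaled = transform(train_area, mean, std)
val_scaled = transform(val_area, mean, std)   # reuses the training statistics
```

Reusing the training mean and standard deviation on the validation and test splits keeps information from those splits out of the preprocessing step.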


### [Question 4] Create a model for House Prices

In [14]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf

# Load dataset (ensure 'train.csv' is in your directory)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")

# Select features and target. Using more features than the example to potentially improve performance.
# Added numerical features that are likely relevant to house prices.
# Note: 'Fireplaces' appears twice in this list, so that column is fed to the model twice.
features = ['GrLivArea', 'YearBuilt', 'OverallQual', 'TotalBsmtSF',
            '1stFlrSF', '2ndFlrSF', 'BsmtFinSF1', 'GarageArea', 'WoodDeckSF',
            'Fireplaces', 'LotFrontage', 'LotArea', 'MasVnrArea', 'BedroomAbvGr',
            'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces']

# Filter out rows with missing values in the selected features for simplicity in this example
# In a real project, you would handle missing data more robustly (e.g., imputation)
df = df.dropna(subset=features + ['SalePrice'])

X = df[features].values
y = df['SalePrice'].values.reshape(-1, 1)  # Reshape to (n_samples, 1)

# Split into train/val/test (60/20/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)  # 0.25*0.8=0.2

# Standardize features (critical for regression). Fit only on training data.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Convert data types to float32, which is common for TensorFlow
X_train = X_train.astype(np.float32)
X_val = X_val.astype(np.float32)
X_test = X_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_val = y_val.astype(np.float32)
y_test = y_test.astype(np.float32)


# Hyperparameters
# Adjusted learning rate and epochs slightly for a potentially larger model
learning_rate = 0.005
batch_size = 32
num_epochs = 200
n_features = X_train.shape[1] # Number of features

# Define the model using Keras Sequential API for regression
# Using slightly larger hidden layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(n_features,)),
    tf.keras.layers.Dense(64, activation='relu'),
    # For regression, the output layer has 1 neuron and no activation (linear output)
    tf.keras.layers.Dense(1)
])

# Compile the model
# Use Adam optimizer and MeanSquaredError loss for regression
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mse']) # Use Mean Squared Error as a metric

# Train the model using the data directly
# Keras handles the batching internally when you provide numpy arrays
print("Starting training...")
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=1, # Set to 1 to see progress during training
                    validation_data=(X_val, y_val))
print("Training finished.")

# Evaluate the model on the test set
print("Evaluating on test set...")
test_loss, test_mse = model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest MSE: {test_mse:.2f}")

# Calculate and print RMSE for regression
test_rmse = np.sqrt(test_mse)
print(f"Test RMSE: ${test_rmse:,.2f}") # Format as USD

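# Example prediction
# Build the sample from a name -> value mapping so the column order always matches 'features'
sample_house = {'GrLivArea': 1500, 'YearBuilt': 2003, 'OverallQual': 7, 'TotalBsmtSF': 1000,
                '1stFlrSF': 1500, '2ndFlrSF': 0, 'BsmtFinSF1': 1000, 'GarageArea': 400,
                'WoodDeckSF': 100, 'Fireplaces': 1, 'LotFrontage': 70, 'LotArea': 8000,
                'MasVnrArea': 200, 'BedroomAbvGr': 3, 'KitchenAbvGr': 1, 'TotRmsAbvGrd': 7}
sample_house_data = np.array([[sample_house[name] for name in features]], dtype=np.float32)
sample_house_scaled = scaler.transform(sample_house_data)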

predicted_price = model.predict(sample_house_scaled)
print(f"Predicted Price for sample house: ${predicted_price[0][0]:,.2f}")

Starting training...
Epoch 1/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - loss: 37558546432.0000 - mse: 37558546432.0000 - val_loss: 35265757184.0000 - val_mse: 35265757184.0000
Epoch 2/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 39505211392.0000 - mse: 39505211392.0000 - val_loss: 35102789632.0000 - val_mse: 35102789632.0000
Epoch 3/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 39800643584.0000 - mse: 39800643584.0000 - val_loss: 34530549760.0000 - val_mse: 34530549760.0000
Epoch 4/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 38116704256.0000 - mse: 38116704256.0000 - val_loss: 33213003776.0000 - val_mse: 33213003776.0000
Epoch 5/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 34550554624.0000 - mse: 34550554624.0000 - val_loss: 30827575296.0000 - val_mse: 30827575296.0000
Epoch 6/200
23/23 ━━━━━━━━━━━━━━━━━━━━

### [Question 5] Create an MNIST model
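
Before the code, a quick check of the sizes involved may help. The cell below flattens each 28x28 image, holds out the last 10% of the 60,000 training images for validation, and trains with a batch size of 32. The sketch works through that arithmetic in plain Python; the variable names mirror the cell but the snippet is standalone.

```python
import math

n_train_total = 60_000        # size of the MNIST training set
validation_split = 0.1        # last 10% held out for validation
batch_size = 32

pixels_per_image = 28 * 28    # length of each flattened image vector
split_index = int(n_train_total * (1 - validation_split))
n_val = n_train_total - split_index
steps_per_epoch = math.ceil(split_index / batch_size)   # last batch is partial

print(pixels_per_image, split_index, n_val, steps_per_epoch)  # 784 54000 6000 1688
```

The step count agrees with the `1688/1688` shown in the training log further down.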

In [16]:
import numpy as np
import tensorflow as tf

# Load MNIST data using tf.keras.datasets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data
# Flatten each 28x28 image into a 784-element vector for the Dense layers
# and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.reshape(x_train.shape[0], 784).astype('float32') / 255.0
x_test = x_test.reshape(x_test.shape[0], 784).astype('float32') / 255.0

# One-hot encode the labels
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Split off a validation set from the training data
# Using a simple split here, similar to previous examples
# In a real scenario, you might use train_test_split from sklearn or tf.data.Dataset tools
validation_split = 0.1 # 10% of training data for validation
split_index = int(x_train.shape[0] * (1 - validation_split))

x_val, y_val = x_train[split_index:], y_train[split_index:]
x_train, y_train = x_train[:split_index], y_train[:split_index]


# Hyperparameters (adjusted slightly from the original to match Keras style better)
learning_rate = 0.001 # Using a common starting learning rate
batch_size = 32     # Keras typically defaults to 32
num_epochs = 10
n_input = x_train.shape[1]    # 28x28 flattened = 784
n_hidden1 = 512   # Hidden layer size
n_hidden2 = 256   # Add another hidden layer for potentially better performance
n_classes = num_classes  # Digits 0-9

# Define the model using Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden1, activation='relu', input_shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    # Output layer for multi-class classification
    # Use softmax activation if not using from_logits=True in loss
    # Using linear output here and from_logits=True in loss is also valid.
    tf.keras.layers.Dense(n_classes)
])

# Compile the model
# Use Adam optimizer and CategoricalCrossentropy loss for multi-class classification
# from_logits=True means the output layer does not have a softmax activation
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model using the data directly
# Keras handles the batching internally when you provide numpy arrays
print("Starting training...")
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=1, # Set to 1 to see progress during training
                    validation_data=(x_val, y_val))
print("Training finished.")


# Evaluate the model on the test set
print("Evaluating on test set...")
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=1)
print(f"\nTest Accuracy: {test_acc:.4f}")

# Example prediction (predict on a single test image)
# Reshape the test image to (1, 784) for prediction
sample_image = x_test[0].reshape(1, n_input)
# The output layer has no softmax, so predict() returns raw logits
predicted_logits = model.predict(sample_image)
# The argmax of the logits equals the argmax of the softmax probabilities
predicted_class = np.argmax(predicted_logits)

# Find the true label (which is one-hot encoded, so find the index of 1)
true_class = np.argmax(y_test[0])

print("\nSample Image Prediction:")
print(f"  Predicted Logits: {predicted_logits[0]}")
# Apply softmax to the logits to get class probabilities if desired
# predicted_probs = tf.nn.softmax(predicted_logits).numpy()[0]
# print(f"  Predicted Probabilities (softmax): {predicted_probs}")
print(f"  Predicted Class: {predicted_class}")
print(f"  True Class: {true_class}")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

Starting training...
Epoch 1/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 21s 12ms/step - accuracy: 0.9004 - loss: 0.3324 - val_accuracy: 0.9750 - val_loss: 0.0842
Epoch 2/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 18s 11ms/step - accuracy: 0.9748 - loss: 0.0816 - val_accuracy: 0.9803 - val_loss: 0.0666
Epoch 3/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 21s 11ms/step - accuracy: 0.9830 - loss: 0.0536 - val_accuracy: 0.9738 - val_loss: 0.0963
Epoch 4/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 17s 10ms/step - accuracy: 0.9882 - loss: 0.0357 - val_accuracy: 0.9768 - val_loss: 0.0870
Epoch 5/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 22s 11ms/step - accuracy: 0.9908 - loss: 0.0278 - val_accuracy: 0.9795 - val_loss: 0.0701
Epoch 6/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 17s 10ms/step - accuracy: 0.9924 - loss: 0.0240 - val_accuracy: 0.9807 - 