<a href="https://colab.research.google.com/github/Cliffochi/aviva_data_science_course/blob/main/TensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##TensorFlow

###[Question 1] Looking back at Scratch

Looking back at implementing deep learning from scratch, here are the key components that were needed:  

### **1. Model Architecture**  
- **Neural Network Layers**: Defining the structure (e.g., Dense, Conv2D, LSTM).  
- **Weight Initialization**: Setting initial values for weights (e.g., random, Xavier, He).  
- **Bias Initialization**: Initializing bias terms.  

### **2. Forward Propagation**  
- **Input Handling**: Passing input data through the network.  
- **Activation Functions**: Applying ReLU, Sigmoid, Tanh, etc.  
- **Loss Calculation**: Computing loss (e.g., Cross-Entropy, MSE).  

### **3. Backward Propagation (Gradient Calculation)**  
- **Gradient Computation**: Calculating gradients using chain rule.  
- **Loss Gradient**: Deriving the gradient of the loss w.r.t. the network output.  
- **Weight & Bias Updates**: Adjusting parameters using gradients.  
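
As a reminder of what sections 2 and 3 involve in practice, here is a minimal NumPy sketch of one forward pass, one backward pass, and one parameter update for a single dense layer with a sigmoid output and binary cross-entropy loss. The function name `dense_sigmoid_step`, the toy data, and the learning rate are illustrative assumptions, not code from the original scratch implementation.

```python
import numpy as np

def dense_sigmoid_step(X, y, W, b, lr=0.1):
    """One forward/backward pass and SGD update for a single dense layer.

    X: (n, d) inputs, y: (n, 1) labels in {0, 1}, W: (d, 1), b: (1,).
    Returns the updated parameters and the loss measured before the update.
    """
    n = X.shape[0]
    # Forward propagation: affine transform followed by sigmoid
    z = X @ W + b
    a = 1.0 / (1.0 + np.exp(-z))
    # Binary cross-entropy; eps avoids log(0)
    eps = 1e-12
    loss = -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
    # Backward propagation: for sigmoid + cross-entropy, dL/dz = (a - y) / n
    dz = (a - y) / n
    dW = X.T @ dz
    db = dz.sum(axis=0)
    # Parameter update (plain SGD)
    return W - lr * dW, b - lr * db, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = (X[:, :1] > 0).astype(np.float64)  # toy labels derived from the first feature
W, b = np.zeros((4, 1)), np.zeros(1)

losses = []
for _ in range(50):
    W, b, loss = dense_sigmoid_step(X, y, W, b)
    losses.append(loss)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With zero-initialized weights the first loss equals ln 2 (about 0.6931), which is a handy sanity check before trusting the gradients.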

### **4. Optimization**  
- **Optimizer Implementation**: Updating weights (e.g., SGD, Adam, RMSprop).  
- **Learning Rate**: Managing step size for updates.  

### **5. Training Loop**  
- **Epoch Loop**: Iterating over the entire dataset.  
- **Batch Processing**: Splitting data into mini-batches.  
- **Validation**: Monitoring performance on validation data.  

### **6. Evaluation**  
- **Accuracy/Loss Metrics**: Measuring model performance.  
- **Prediction**: Running inference on test data.  

### **7. Data Handling**  
- **Data Loading**: Reading input data (e.g., CSV, images).  
- **Preprocessing**: Normalization, reshaping, one-hot encoding.  
- **Batching**: Creating mini-batches for training.  
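
The data-handling steps above can be sketched in a few lines of NumPy. The helper names `one_hot` and `iterate_minibatches` and the toy array are assumptions made for illustration; they stand in for whatever the scratch implementation used.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer labels of shape (n,) to one-hot rows of shape (n, num_classes)."""
    return np.eye(num_classes, dtype=np.float32)[labels]

def iterate_minibatches(X, y, batch_size, seed=0):
    """Yield shuffled (X_batch, y_batch) pairs; the last batch may be smaller."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

# Toy data: 25 samples, 3 features, 3 classes (values are made up)
X = np.arange(75, dtype=np.float32).reshape(25, 3)
labels = np.arange(25) % 3

# Normalization: scale each feature to zero mean and unit variance
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
Y = one_hot(labels, num_classes=3)

batch_shapes = [(xb.shape, yb.shape)
                for xb, yb in iterate_minibatches(X_norm, Y, batch_size=10)]
print(batch_shapes)  # three batches with 10, 10, and 5 samples
```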

### **8. Debugging & Monitoring**  
- **Gradient Checking**: Ensuring correct backpropagation.  
- **Logging**: Tracking loss/accuracy over epochs.  
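
Gradient checking is usually done by comparing the hand-derived gradient with a finite-difference estimate. The sketch below does this for a sigmoid model with binary cross-entropy; the function names, the step size `h`, and the random toy data are assumptions chosen for the example.

```python
import numpy as np

def bce_loss(W, X, y):
    """Mean binary cross-entropy of a sigmoid(X @ W) model."""
    a = 1.0 / (1.0 + np.exp(-(X @ W)))
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def analytic_grad(W, X, y):
    """Gradient of bce_loss w.r.t. W, derived by hand with the chain rule."""
    a = 1.0 / (1.0 + np.exp(-(X @ W)))
    return X.T @ (a - y) / X.shape[0]

def numerical_grad(f, W, h=1e-5):
    """Central finite-difference approximation of the gradient of f at W."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        step = np.zeros_like(W)
        step.flat[i] = h
        grad.flat[i] = (f(W + step) - f(W - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.integers(0, 2, size=(6, 1)).astype(np.float64)
W = rng.normal(size=(3, 1))

g_analytic = analytic_grad(W, X, y)
g_numeric = numerical_grad(lambda w: bce_loss(w, X, y), W)
max_diff = np.max(np.abs(g_analytic - g_numeric))
print(f"max abs difference: {max_diff:.2e}")
```

A difference many orders of magnitude below the gradient values suggests the backward pass is correct; a large difference usually points to a sign or indexing error.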

---  
### **How Frameworks (Like TensorFlow) Implement These**  
1. **Automatic Differentiation** → No manual gradient computation (uses `tf.GradientTape`).  
2. **Predefined Layers** → `tf.keras.layers` handles weight initialization and forward pass.  
3. **Built-in Optimizers** → `tf.keras.optimizers` (SGD, Adam, etc.) manage updates.  
4. **Loss Functions** → `tf.keras.losses` provides common loss computations.  
5. **Training Loop Abstraction** → `model.fit()` automates epochs/batches.  
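
To make the list above concrete, here is a hedged sketch of a single custom training loop that touches each piece except `model.fit()`: predefined layers, a built-in loss, a built-in optimizer, and `tf.GradientTape` for the gradients. The layer sizes, the toy data, and the choice of SGD are assumptions for illustration only.

```python
import tensorflow as tf

# Toy binary-classification data (made up for illustration)
tf.random.set_seed(0)
X = tf.random.normal((32, 4))
y = tf.cast(X[:, :1] > 0, tf.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),  # predefined layer creates its own weights
    tf.keras.layers.Dense(1),                     # outputs logits
])
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

losses = []
for step in range(20):
    with tf.GradientTape() as tape:
        logits = model(X, training=True)  # forward pass
        loss = loss_fn(y, logits)         # loss computation
    # Automatic differentiation replaces hand-written backpropagation
    grads = tape.gradient(loss, model.trainable_variables)
    # The optimizer applies the parameter update
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    losses.append(float(loss))

print(f"loss went from {losses[0]:.4f} to {losses[-1]:.4f}")
```

`model.fit()` wraps essentially this loop, adding batching, metrics, and validation.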


###[Question 2] Considering compatibility between Scratch and TensorFlow

In [17]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Iris.csv")

# Keep only the two species used for binary classification
df = df[(df["Species"] == "Iris-versicolor") | (df["Species"] == "Iris-virginica")]
y = df["Species"]
X = df.loc[:, ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Convert labels to numeric
y[y == "Iris-versicolor"] = 0
y[y == "Iris-virginica"] = 1
y = y.astype(np.int64)[:, np.newaxis]

# Split into train, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

class GetMiniBatch:
    def __init__(self, X, y, batch_size=10, seed=0):
        self.batch_size = batch_size
        np.random.seed(seed)
        shuffle_index = np.random.permutation(np.arange(X.shape[0]))
        self.X = X[shuffle_index]
        self.y = y[shuffle_index]
        self._stop = np.ceil(X.shape[0] / self.batch_size).astype(np.intp)
    def __len__(self):
        return self._stop
    def __getitem__(self, item):
        p0 = item * self.batch_size
        p1 = item * self.batch_size + self.batch_size
        return self.X[p0:p1], self.y[p0:p1]
    def __iter__(self):
        self._counter = 0
        return self
    def __next__(self):
        if self._counter >= self._stop:
            raise StopIteration()
        p0 = self._counter * self.batch_size
        p1 = self._counter * self.batch_size + self.batch_size
        self._counter += 1
        return self.X[p0:p1], self.y[p0:p1]

# Hyperparameters
learning_rate = 0.001
batch_size = 10
num_epochs = 100

n_hidden1 = 50
n_hidden2 = 100
n_input = X_train.shape[1]
n_samples = X_train.shape[0]
n_classes = 1

# Define the model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden1, activation='relu'),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    tf.keras.layers.Dense(n_classes)
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Create training dataset using tf.data
get_mini_batch_train = GetMiniBatch(X_train, y_train, batch_size=batch_size)

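train_dataset = tf.data.Dataset.from_generator(
    lambda: get_mini_batch_train,
    # output_signature replaces the deprecated output_types/output_shapes arguments
    output_signature=(
        tf.TensorSpec(shape=(None, n_input), dtype=tf.float64),
        tf.TensorSpec(shape=(None, n_classes), dtype=tf.int64),
    ),
).repeat()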

steps_per_epoch = len(get_mini_batch_train)

# Train the model
history = model.fit(
    train_dataset,
    epochs=num_epochs,
    steps_per_epoch=steps_per_epoch,
    validation_data=(X_val, y_val),
    verbose=0
)

# Evaluate on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print("test_acc : {:.3f}".format(test_acc))


test_acc : 0.900
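
The split sizes and the value of `steps_per_epoch` in this cell can be checked with simple arithmetic. The sketch below assumes the CSV is the standard Iris file with 50 rows per species, so 100 rows remain after filtering; that row count is an assumption, since the file itself is not shown here.

```python
import math

n_rows = 100                      # assumption: 50 versicolor + 50 virginica rows
n_test = n_rows // 5              # first split, test_size=0.2
n_train_val = n_rows - n_test
n_val = n_train_val // 5          # second split, test_size=0.2
n_train = n_train_val - n_val

batch_size = 10
# Same quantity that len(GetMiniBatch(...)) returns
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(n_train, n_val, n_test)       # 64 16 20
print(steps_per_epoch, last_batch)  # 7 4
print(round(0.900 * n_test))        # 18 correct test predictions
```

Under that assumption, a test accuracy of 0.900 corresponds to 18 of 20 test samples classified correctly, so a single sample changes the score by 0.05.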


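The tables below describe how the **"things needed to implement deep learning"** from scratch map onto the low-level TensorFlow sample code, which uses TensorFlow 1.x APIs such as `tf.placeholder` and `sess.run` rather than the Keras API used in the cell above.  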

---

### **1. Model Architecture**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Manually define layers (e.g., `DenseLayer` class). | Layers are defined via `tf.Variable` for weights/biases and `tf.matmul` + `tf.add`. |
| Explicit weight initialization (e.g., He initialization). | Uses `tf.random_normal` for initialization. |
| Hand-coded forward pass (e.g., `forward()` method). | Forward pass is a sequence of `tf.matmul`, `tf.add`, and `tf.nn.relu`. |

**Code Example:**  
```python
# TensorFlow's manual layer definition
layer_1 = tf.add(tf.matmul(x, weights['w1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
```

---

### **2. Forward & Backward Propagation**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Manual gradient calculations (chain rule). | **Automatic differentiation**: `optimizer.minimize(loss_op)` adds the gradient ops to the graph, so no backward pass is written by hand (`tf.GradientTape` is the TF 2.x counterpart). |
| Hand-written loss computation (e.g., cross-entropy). | Uses `tf.nn.sigmoid_cross_entropy_with_logits`. |

**Code Example:**  
```python
# Loss and gradients handled automatically
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logits))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)  # Backpropagation happens here
```

---

### **3. Training Loop**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Custom epoch/batch loops (e.g., `for epoch in epochs`). | Same epoch loop, but batching is abstracted via `GetMiniBatch` class. |
| Manual parameter updates (e.g., `weights -= lr * gradients`). | Optimizer (`AdamOptimizer`) handles updates via `train_op`. |

**Code Example:**  
```python
for epoch in range(num_epochs):
    for mini_batch_x, mini_batch_y in get_mini_batch_train:
        sess.run(train_op, feed_dict={X: mini_batch_x, Y: mini_batch_y})  # Updates weights
```

---

### **4. Data Handling**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Manual data splitting/shuffling. | Uses `sklearn.model_selection.train_test_split` and custom `GetMiniBatch` iterator. |
| One-hot encoding by hand. | Labels converted to `0`/`1` manually. |

**Code Example:**  
```python
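# Manual batching (similar to scratch): excerpt of GetMiniBatch.__next__
class GetMiniBatch:
    def __next__(self):
        if self._counter >= self._stop:
            raise StopIteration()
        p0 = self._counter * self.batch_size  # start index of this mini-batch
        self._counter += 1
        return self.X[p0:p0 + self.batch_size], self.y[p0:p0 + self.batch_size]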
```

---

### **5. Evaluation**  
| **Scratch** | **TensorFlow (Low-Level)** |
|-------------|---------------------------|
| Custom accuracy/loss calculations. | Uses `tf.reduce_mean` and `tf.cast` for metrics. |
| Manual validation/testing loops. | Evaluated in-session via `sess.run`. |

**Code Example:**  
```python
correct_pred = tf.equal(tf.sign(Y - 0.5), tf.sign(tf.sigmoid(logits) - 0.5))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
```
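
The `tf.sign` comparison above can look cryptic, so here is a NumPy sketch of the same computation on made-up labels and logits. The function name `binary_accuracy_from_logits` is an assumption for illustration.

```python
import numpy as np

def binary_accuracy_from_logits(y_true, logits):
    """Fraction of samples where thresholding sigmoid(logits) at 0.5 matches the 0/1 label."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Same comparison as the tf.sign expression: both sides evaluate to -1 or +1
    correct = np.sign(y_true - 0.5) == np.sign(probs - 0.5)
    return correct.astype(np.float32).mean()

y_true = np.array([[0], [1], [1], [0]], dtype=np.float32)
logits = np.array([[-2.0], [1.5], [-0.3], [0.7]], dtype=np.float32)

acc = binary_accuracy_from_logits(y_true, logits)
# sigmoid(z) > 0.5 exactly when z > 0, so thresholding the logits at 0 is equivalent
acc_shortcut = ((logits > 0) == (y_true == 1)).mean()
print(acc, acc_shortcut)  # 0.5 0.5
```

The first two samples are classified correctly and the last two are not, which gives the accuracy of 0.5.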

---

### **Key Takeaways**  
1. **TensorFlow Automates Gradients**: No need for manual backpropagation (handled by `optimizer.minimize()`).  
2. **Low-Level Still Requires Manual Setup**: Layers, batching, and training loops are explicit (unlike `tf.keras`).  
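3. **Placeholders Feed Data**: In TF 1.x, `tf.placeholder` plus `feed_dict` is how each mini-batch enters the graph; TF 2.x removes placeholders, and data is passed directly as tensors or through a `tf.data` pipeline.  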
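
For comparison with the placeholder approach, a TF 2.x input pipeline might look like the sketch below. It uses `tf.data.Dataset.from_tensor_slices` with `shuffle` and `batch`; the toy arrays and the batch size of 10 are assumptions standing in for `X_train` and `y_train`.

```python
import numpy as np
import tensorflow as tf

# Toy arrays standing in for X_train / y_train (values are made up)
X = np.random.default_rng(0).normal(size=(64, 4)).astype(np.float32)
y = np.zeros((64, 1), dtype=np.float32)

# from_tensor_slices yields one (features, label) pair per row;
# shuffle and batch then take over the job of a hand-written mini-batch iterator.
dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=len(X), seed=0)
    .batch(10)
)

batch_sizes = [int(xb.shape[0]) for xb, yb in dataset]
print(batch_sizes)  # [10, 10, 10, 10, 10, 10, 4]
```

A dataset built this way can be passed straight to `model.fit`, which then takes the batch size from the dataset itself.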

---
### **Why This Matters**  
Understanding this mapping helps:  
- Transition from scratch to frameworks.  
- Debug low-level TensorFlow code.  
- Appreciate high-level APIs (like `tf.keras`) that abstract these steps.  

##3. Application to other datasets
###[Question 3] Create a model for Iris using all three objective variables

####Iris (3-Class Classification)
Key Changes from Binary to Multi-Class:

- Labels: One-hot encode Species (3 classes: setosa, versicolor, virginica).

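- Loss Function: Use softmax cross-entropy instead of sigmoid cross-entropy (`tf.keras.losses.CategoricalCrossentropy(from_logits=True)` in the code below; `tf.nn.softmax_cross_entropy_with_logits` is the low-level equivalent).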

- Output Layer: 3 neurons (one per class) with linear activation (logits).

- Accuracy: Compare predicted class (tf.argmax) with true class.
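
To show what the loss and accuracy changes amount to numerically, here is a NumPy sketch of softmax cross-entropy computed from logits and of arg-max accuracy. The function names and the example logits are assumptions for illustration; the notebook itself relies on the Keras loss and metric.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1).mean()

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose arg-max logit matches the arg-max of the label."""
    return (logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean()

# Three samples, three classes (made-up logits)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.5, 0.0]])
labels = np.eye(3)[[0, 2, 0]]  # true classes: 0, 2, 0

loss_value = softmax_cross_entropy(logits, labels)
acc_value = accuracy(logits, labels)
print(f"loss: {loss_value:.4f}")
print(f"accuracy: {acc_value:.4f}")  # 2 of 3 rows are correct
```

Because the softmax is folded into the loss, the output layer can stay linear, which is why the model below ends in a `Dense` layer without an activation.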

In [20]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load Iris dataset (all 3 classes)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Iris.csv")
X = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].values
y = pd.get_dummies(df['Species']).values  # One-hot encoding

# Split into train/val/test (60/20/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

# Convert data types to match TensorFlow defaults (float32)
X_train = X_train.astype(np.float32)
X_val = X_val.astype(np.float32)
X_test = X_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_val = y_val.astype(np.float32)
y_test = y_test.astype(np.float32)

# Hyperparameters
learning_rate = 0.001
batch_size = 10
num_epochs = 100
n_input = X_train.shape[1]  # Number of features
n_classes = 3  # 3 classes in Iris dataset

n_hidden1 = 50
n_hidden2 = 100

# Define the model using Keras Sequential API with explicit Input layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_input,)),  # Fix for the input_shape warning
    tf.keras.layers.Dense(n_hidden1, activation='relu'),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    tf.keras.layers.Dense(n_classes)  # No softmax, using from_logits=True
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=0,  # Change to 1 to view training progress
                    validation_data=(X_val, y_val))

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {test_acc:.3f}")


Test Accuracy: 1.000


####House Prices (Regression)
Key Changes for Regression:

- Labels: Continuous values (no one-hot encoding).

- Loss Function: Use Mean Squared Error (`tf.keras.losses.MeanSquaredError` in the code below).

- Output Layer: 1 neuron with linear activation.

- Metrics: Track MSE or RMSE instead of accuracy.
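
The relationship between the two regression metrics is easy to verify by hand. The sketch below computes MSE and RMSE in NumPy on made-up prices; the numbers are illustrative and not taken from the House Prices data.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all samples."""
    return np.mean((y_true - y_pred) ** 2)

# Made-up sale prices and predictions, in dollars
y_true = np.array([[200000.0], [150000.0], [320000.0]])
y_pred = np.array([[210000.0], [140000.0], [300000.0]])

test_mse = mse(y_true, y_pred)
test_rmse = np.sqrt(test_mse)  # RMSE is in the same units as the target

print(f"MSE: {test_mse:.2f}")    # MSE: 200000000.00
print(f"RMSE: {test_rmse:.2f}")  # RMSE: 14142.14
```

MSE is in squared dollars, which is why its values look enormous; RMSE is easier to interpret because it is on the same scale as `SalePrice`.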

In [21]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Load House Prices dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")

# Select features and target
X = df[['GrLivArea', 'YearBuilt']].values
y = df['SalePrice'].values.reshape(-1, 1)

# Convert data types to float32 for TensorFlow
X = X.astype(np.float32)
y = y.astype(np.float32)

# Split data: 60% train, 20% val, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Hyperparameters
learning_rate = 0.001
batch_size = 10
num_epochs = 100
n_input = X_train.shape[1]
n_output = 1

n_hidden1 = 50
n_hidden2 = 100

# Define the model with an explicit Input layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_input,)),  # Fixes the input_shape warning
    tf.keras.layers.Dense(n_hidden1, activation='relu'),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    tf.keras.layers.Dense(n_output)  # Linear output for regression
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mse'])

# Train the model
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=0,
                    validation_data=(X_val, y_val))

# Evaluate the model
test_loss, test_mse = model.evaluate(X_test, y_test, verbose=0)
print(f"Test MSE: {test_mse:.2f}")

# Calculate RMSE
test_rmse = np.sqrt(test_mse)
print(f"Test RMSE: {test_rmse:.2f}")


Test MSE: 3856587008.00
Test RMSE: 62101.43


###[Question 4] Create a model for House Prices

In [22]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load dataset (ensure 'train.csv' is in your directory or drive)
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/train.csv")

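# Select features and target
# Note: 'Fireplaces' is listed twice, so df[features] has 17 columns with one
# duplicated; the sample row used for prediction below must follow this order.
features = ['GrLivArea', 'YearBuilt', 'OverallQual', 'TotalBsmtSF',
            '1stFlrSF', '2ndFlrSF', 'BsmtFinSF1', 'GarageArea', 'WoodDeckSF',
            'Fireplaces', 'LotFrontage', 'LotArea', 'MasVnrArea', 'BedroomAbvGr',
            'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces']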

# Drop rows with missing values in the selected features
df = df.dropna(subset=features + ['SalePrice'])

X = df[features].values
y = df['SalePrice'].values.reshape(-1, 1)

# Split into train/val/test sets (60% train, 20% val, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)  # 0.25*0.8 = 0.2

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Convert to float32 for TensorFlow
X_train = X_train.astype(np.float32)
X_val = X_val.astype(np.float32)
X_test = X_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_val = y_val.astype(np.float32)
y_test = y_test.astype(np.float32)

# Hyperparameters
learning_rate = 0.005
batch_size = 32
num_epochs = 200
n_features = X_train.shape[1]

# Define the model using Input() to suppress warning
model = Sequential([
    Input(shape=(n_features,)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(1)  # Linear activation for regression
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss='mse',
              metrics=['mse'])

# Train the model
print("Starting training...")
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=(X_val, y_val))
print("Training finished.")

# Evaluate on test set
print("Evaluating on test set...")
test_loss, test_mse = model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest MSE: {test_mse:.2f}")

# Compute RMSE
test_rmse = np.sqrt(test_mse)
print(f"Test RMSE: ${test_rmse:,.2f}")

# Example prediction
sample_house_data = np.array([[1500, 2003, 7, 1000, 1500, 0, 1000, 400, 100,
                               1, 70, 8000, 200, 3, 1, 7, 1]])
sample_house_scaled = scaler.transform(sample_house_data.astype(np.float32))

predicted_price = model.predict(sample_house_scaled)
print(f"Predicted Price for sample house: ${predicted_price[0][0]:,.2f}")


Starting training...
Epoch 1/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 2s 15ms/step - loss: 39097364480.0000 - mse: 39097364480.0000 - val_loss: 35268038656.0000 - val_mse: 35268038656.0000
Epoch 2/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 40433647616.0000 - mse: 40433647616.0000 - val_loss: 35118989312.0000 - val_mse: 35118989312.0000
Epoch 3/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 37705216000.0000 - mse: 37705216000.0000 - val_loss: 34625261568.0000 - val_mse: 34625261568.0000
Epoch 4/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 40683941888.0000 - mse: 40683941888.0000 - val_loss: 33418534912.0000 - val_mse: 33418534912.0000
Epoch 5/200
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 35884494848.0000 - mse: 35884494848.0000 - val_loss: 31246186496.0000 - val_mse: 31246186496.0000
Epoch 6/200
23/23

###[Question 5] Create an MNIST model

In [23]:
import numpy as np
import tensorflow as tf

# Load MNIST data using tf.keras.datasets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(x_train.shape[0], 784).astype('float32') / 255.0
x_test = x_test.reshape(x_test.shape[0], 784).astype('float32') / 255.0

# One-hot encode the labels
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Create validation set
validation_split = 0.1
split_index = int(x_train.shape[0] * (1 - validation_split))
x_val, y_val = x_train[split_index:], y_train[split_index:]
x_train, y_train = x_train[:split_index], y_train[:split_index]

# Hyperparameters
learning_rate = 0.001
batch_size = 32
num_epochs = 10
n_input = x_train.shape[1]
n_hidden1 = 512
n_hidden2 = 256
n_classes = num_classes

# Define the model using Keras Sequential API with Input layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden1, activation='relu'),
    tf.keras.layers.Dense(n_hidden2, activation='relu'),
    tf.keras.layers.Dense(n_classes)  # No softmax since using from_logits=True
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
print("Starting training...")
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=(x_val, y_val))
print("Training finished.")

# Evaluate the model on the test set
print("Evaluating on test set...")
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=1)
print(f"\nTest Accuracy: {test_acc:.4f}")

# Predict on a sample image
sample_image = x_test[0].reshape(1, n_input)
predicted_logits = model.predict(sample_image)
predicted_class = np.argmax(predicted_logits)

# True class
true_class = np.argmax(y_test[0])

# Optionally apply softmax to get probabilities
predicted_probs = tf.nn.softmax(predicted_logits).numpy()[0]

print(f"\nSample Image Prediction:")
print(f"  Predicted Logits: {predicted_logits[0]}")
print(f"  Predicted Probabilities (softmax): {predicted_probs}")
print(f"  Predicted Class: {predicted_class}")
print(f"  True Class: {true_class}")


Starting training...
Epoch 1/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 20s 11ms/step - accuracy: 0.8947 - loss: 0.3424 - val_accuracy: 0.9663 - val_loss: 0.1039
Epoch 2/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 17s 10ms/step - accuracy: 0.9743 - loss: 0.0806 - val_accuracy: 0.9770 - val_loss: 0.0762
Epoch 3/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 17s 10ms/step - accuracy: 0.9831 - loss: 0.0522 - val_accuracy: 0.9798 - val_loss: 0.0732
Epoch 4/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 22s 11ms/step - accuracy: 0.9873 - loss: 0.0391 - val_accuracy: 0.9760 - val_loss: 0.0793
Epoch 5/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 19s 10ms/step - accuracy: 0.9904 - loss: 0.0287 - val_accuracy: 0.9818 - val_loss: 0.0768
Epoch 6/10
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 22s 11ms/step - accuracy: 0.9922 - loss: 0.0227 - val_accuracy: 0.9800 -