## Backpropagation
Backpropagation is a fundamental technique for training neural networks. It computes the gradient of the loss function, which measures the error between the predicted output and the actual output, with respect to every weight and bias in the model. Each gradient component tells us how much the loss changes when the corresponding parameter is changed by a small amount. The goal is to reduce the loss, which is achieved by iteratively updating the weights and biases in the direction opposite to the gradient.

Training with backpropagation consists of two phases: a feedforward pass, followed by a backward pass in which the gradients used to optimize the weights and biases are computed.

### Feedforward Pass:
This is the first step in training a neural network: data flows from the input layer through the hidden layers to the output layer. Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function. The hidden layers transform the data into increasingly abstract features, and the output layer produces the prediction or classification. The intermediate values computed along the way are kept, because the backward pass needs them to compute the gradients.
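To make the forward pass concrete, here is a minimal NumPy sketch of a network with one ReLU hidden layer and a softmax output layer. The layer sizes, the random weights, and the helper names `relu`, `softmax`, and `forward` are illustrative assumptions, not part of any library.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum plus bias, followed by ReLU
    h = relu(x @ W1 + b1)
    # Output layer: weighted sum plus bias, followed by softmax
    return softmax(h @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                     # 5 hypothetical samples, 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # hidden layer with 8 units
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # output layer with 3 classes

probs = forward(x, W1, b1, W2, b2)
print(probs.shape)         # (5, 3)
print(probs.sum(axis=1))   # each row sums to 1, up to rounding
```

Each row of `probs` is a probability distribution over the three classes for one input sample.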

### Backward Pass:
The backward pass starts once the forward pass has produced predictions. It first measures the error between the predicted and actual values with the loss function. This error information is then propagated backward from the output layer toward the input layer, applying the chain rule layer by layer. The key objective is to compute the gradients of the loss with respect to the network's weights and biases. These gradients show how much each weight and bias contributes to the error, and therefore how each parameter should be adjusted to reduce it.
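As an illustration of what the backward pass computes, the following NumPy sketch derives the gradients by hand for a single softmax layer trained with cross-entropy, then compares one gradient entry with a finite-difference estimate. The function name `loss_and_grads` and the random data are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(x, y, W, b):
    n = x.shape[0]
    probs = softmax(x @ W + b)                      # forward pass
    loss = -np.log(probs[np.arange(n), y]).mean()   # mean cross-entropy
    # Backward pass: for softmax + cross-entropy,
    # dL/dlogits = (probs - one_hot(y)) / n
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dW = x.T @ dlogits          # chain rule through the weighted sum
    db = dlogits.sum(axis=0)    # the bias is added to every sample
    return loss, dW, db

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
y = rng.integers(0, 3, size=6)
W, b = rng.normal(size=(4, 3)), np.zeros(3)

loss, dW, db = loss_and_grads(x, y, W, b)

# Sanity check: central finite difference for one weight
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss_and_grads(x, y, W_plus, b)[0]
           - loss_and_grads(x, y, W_minus, b)[0]) / (2 * eps)
print(abs(numeric - dW[0, 0]) < 1e-6)
```

Automatic differentiation frameworks perform the same chain-rule bookkeeping for every layer, so these derivatives do not have to be written by hand.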

The weights are then updated, and both passes are repeated until the loss is sufficiently small.
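The iterative update can be sketched on a toy problem with a single parameter. The loss `(w - 3)^2`, the starting value, and the learning rate below are arbitrary choices for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0                # initial parameter value (arbitrary)
learning_rate = 0.1    # step size (arbitrary choice for this sketch)
history = []

for step in range(50):
    history.append(loss(w))
    w -= learning_rate * grad(w)   # move against the gradient

print(w)   # approaches the minimizer w = 3
```

Every step shrinks the distance to the minimum by a constant factor here, so the recorded losses in `history` decrease monotonically.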

### Backpropagation in TensorFlow
TensorFlow is one of the most popular deep learning libraries for training deep neural networks efficiently. Let's look at how backpropagation works in TensorFlow.

In TensorFlow, backpropagation is carried out by automatic differentiation, so we never have to derive or code the gradients by hand. TensorFlow records the operations executed during the forward pass (on a `tf.GradientTape`, or in a computational graph when the code is traced with `tf.function`). Each recorded node is a mathematical operation with a known derivative, and TensorFlow walks these nodes in reverse order to compute the gradients in the backward pass.
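A minimal sketch of automatic differentiation with `tf.GradientTape`, using a toy function chosen for this example:

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations on x executed inside this block are recorded on the tape
    y = x ** 2 + 2.0 * x

# dy/dx = 2x + 2, which is 8 at x = 3
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())  # 8.0
```

No derivative formula appears in the code: the tape replays the recorded operations backward to obtain it.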

The goal of backpropagation is to optimize the weights and biases of the model so that the loss is minimized. We therefore use TensorFlow's automatic differentiation to compute the gradient of the loss function with respect to the weights and biases. When a `tf.Variable` is created, it takes a `trainable` argument, which defaults to `True`. This tells TensorFlow to track the variable during training and to compute its gradient with respect to the loss function.
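The effect of the `trainable` flag can be seen in a small sketch. The variable names and values are made up for illustration.

```python
import tensorflow as tf

w = tf.Variable(2.0, trainable=True)    # watched automatically by GradientTape
c = tf.Variable(5.0, trainable=False)   # not watched unless tape.watch(c) is called

with tf.GradientTape() as tape:
    loss = (w * 3.0 + c) ** 2

grad_w, grad_c = tape.gradient(loss, [w, c])
print(grad_w.numpy())  # 2 * (2*3 + 5) * 3 = 66.0
print(grad_c)          # None, because c was not tracked
```

A gradient of `None` means the tape has no record connecting the loss to that variable.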

Once we have the gradients, an optimizer from `tf.keras.optimizers`, such as SGD, Adagrad, or Adam, uses them to update the weights.
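A single optimizer step might look like the following sketch, which applies plain SGD to one variable. The toy loss `w ** 2` and the learning rate are assumptions chosen so that the result is easy to check by hand.

```python
import tensorflow as tf

w = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = w ** 2          # toy loss whose gradient is 2w

grads = tape.gradient(loss, [w])
# SGD update: w <- w - learning_rate * gradient = 1.0 - 0.1 * 2.0
optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())  # approximately 0.8
```

Other optimizers such as Adam keep extra per-variable state, but they are driven through the same `apply_gradients` call.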

#### Implementing Backpropagation
**Installing the libraries**

First install TensorFlow and scikit-learn by running the following command in your terminal (in a notebook, prefix it with `%`):

```shell
pip install tensorflow scikit-learn
```

**Importing Libraries**


```python
# Import the libraries
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
```



Here we import the libraries needed to build the model. In addition to TensorFlow, these are:

- NumPy: A Python library for numerical computations, including support for large, multi-dimensional arrays and matrices, along with a wide array of mathematical functions.
- Scikit-learn (`sklearn`): A Python library for machine learning that provides tools for data preprocessing, modelling, evaluation, and various algorithms for classification, regression, and more.

**Loading the dataset**


```python
# Load the Iris dataset
iris = datasets.load_iris()

# Extract the features (X) and target labels (y) from the dataset
# X contains the feature data
X = iris.data
# y contains the target labels
y = iris.target
```



This code loads the Iris dataset and separates the features from the target labels. In general, preprocessing involves cleaning the data, removing outliers, and scaling numerical features with large values to a specific range; none of these steps is applied in this example. To evaluate the model, the data is then split into training and testing sets.

**Splitting into training and testing sets**



```python
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```



Here we divide the Iris dataset into a training set (80%) and a testing set (20%) to support the development and evaluation of the model. The `random_state` argument is set for reproducibility, ensuring that the same split is obtained each time the code is run.

**Defining a machine learning model**

Here we define a model using TensorFlow's Keras API.
The model consists of two layers:

- Dense layer: a hidden layer with the ReLU activation function, whose input shape matches the number of features in the training data.
- Output layer: a layer with three neurons and a softmax activation function that produces class probabilities.


```python
# Define the neural network architecture: two layers, with 3 neurons in the output layer
hidden_layer_size = 32
model = tf.keras.Sequential([
    # Declare the input shape explicitly; it matches the number of features
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')  # 3 classes for Iris dataset
])

model.summary()
```
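`model.summary()` prints each layer's output shape and parameter count. As a cross-check, the counts for this architecture can be worked out by hand. The helper `dense_params` below is a hypothetical function written for this sketch.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

n_features = 4        # Iris has four features per sample
hidden_units = 32     # matches hidden_layer_size in the model definition
n_classes = 3

hidden = dense_params(n_features, hidden_units)   # 4*32 + 32 = 160
output = dense_params(hidden_units, n_classes)    # 32*3 + 3 = 99
print(hidden, output, hidden + output)            # 160 99 259
```

These 259 values are the trainable parameters that the training loop adjusts.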




**Loss function and optimizer**

Here we are defining the loss function and optimizer used for the model.

```python
# Define hyperparameters
learning_rate = 0.01
epochs = 1000

# Define the loss function and optimizer
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
```


- Sparse Categorical Crossentropy: A loss function for classification tasks in which the target labels are integers. It computes the cross-entropy between the predicted class probabilities and the true class labels. The result is the same as one-hot encoding the labels and applying categorical cross-entropy, but no explicit one-hot conversion is needed.
- Stochastic Gradient Descent (SGD): An optimization algorithm for training models. It updates the model parameters using small, randomly sampled subsets (mini-batches) of the training data. The noise this introduces makes each update cheap and can help the model escape poor local minima. Note that the training loop below passes the whole training set in every step, so in this example the optimizer performs full-batch gradient descent.
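For comparison with the full-batch loop used in this article, the sketch below shows one way to draw random mini-batches with NumPy. The helper `iterate_minibatches`, the batch size, and the toy arrays are assumptions for illustration.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # Shuffle the sample indices once per epoch, then yield consecutive slices
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = np.arange(20, dtype=np.float32).reshape(10, 2)   # 10 hypothetical samples
y = np.arange(10)

batch_sizes = [len(xb) for xb, yb in iterate_minibatches(X, y, batch_size=4, rng=rng)]
print(batch_sizes)  # [4, 4, 2]
```

In a mini-batch training loop, each yielded pair would take the place of the full training arrays in one gradient step.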



{{< notice type="info" class="" >}}
**Categorical Crossentropy**

```python
import numpy as np 
import tensorflow as tf
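y_true = np.array([1, 2])  # ground-truth labels as integer class indices
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])  # predicted probabilities
delta = 1e-11  # small constant added before taking the log (used in the NumPy version below)
y_true_one_hot = tf.keras.utils.to_categorical(  # one-hot encoding
    y_true,  # array-like of class indices (integers from 0 to num_classes - 1)
    num_classes=3,  # number of classes
    dtype="float32"  # output data type, float32 by default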
)
print(y_true_one_hot)
```
```
[[0. 1. 0.]
 [0. 0. 1.]]
``` 
![](tensorflow_bpn.files/2024-12-02-16-56-56.png)

#### Implementation using tf.keras.losses.SparseCategoricalCrossentropy
```python
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())
```
```
1.1769392490386963
```

#### NumPy implementation
```python
y_pred_delta = y_pred + delta  # adding a tiny value prevents negative infinity from np.log(0)
print(y_pred_delta)

# [[5.0e-02 9.5e-01 1.0e-11]
#  [1.0e-01 8.0e-01 1.0e-01]]

y_pred_log = np.log(y_pred_delta)  # np.log is the natural logarithm (base e)
print(y_pred_log)

# [[ -2.99573227  -0.05129329 -25.32843602]
#  [ -2.30258509  -0.22314355  -2.30258509]]

print(y_true_one_hot * y_pred_log)  # keep only the log-probability of the true class

# [[-0.         -0.05129329 -0.        ]
#  [-0.         -0.         -2.30258509]]

loss = -np.sum(y_true_one_hot * y_pred_log) / 2  # average over the 2 samples
print(loss)

# approximately 1.176939, matching the TensorFlow result up to floating-point precision
```

{{< /notice >}}

**Backpropagation**

Now we implement backpropagation for the model defined above inside a training loop.


```python
# Training loop
# Iterate through the specified number of training epochs

for epoch in range(epochs):

    # Use TensorFlow's GradientTape to record operations for automatic differentiation
    with tf.GradientTape() as tape:
        # Forward pass: the softmax output layer returns class probabilities
        predictions = model(X_train)

        # Calculate the loss by comparing the predictions with the true training labels (y_train)
        loss_value = loss_fn(y_train, predictions)

    # Backpropagation: compute gradients of the loss with respect to the model parameters
    grads = tape.gradient(loss_value, model.trainable_variables)

    # Apply the computed gradients to update the model's parameters using the optimizer
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Print the loss at regular intervals to monitor training progress
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss_value.numpy()}")
```
```
Epoch 100/1000, Loss: 0.5040257573127747
Epoch 200/1000, Loss: 0.3756517469882965
Epoch 300/1000, Loss: 0.31077930331230164
Epoch 400/1000, Loss: 0.26728513836860657
Epoch 500/1000, Loss: 0.23448063433170319
Epoch 600/1000, Loss: 0.20877058804035187
Epoch 700/1000, Loss: 0.18835049867630005
Epoch 800/1000, Loss: 0.17195174098014832
Epoch 900/1000, Loss: 0.15864413976669312
Epoch 1000/1000, Loss: 0.1477571427822113
```


The code above is a training loop for a neural network. It iterates through the specified number of epochs, computing the predictions and the loss, then updating the model parameters using backpropagation and an optimizer. Training progress is monitored by printing the loss every 100 epochs.
The loss clearly decreases as the number of epochs increases. This is the result of backpropagation: the gradients it computes let the optimizer adjust the weights of each layer toward the desired output, which improves the model's fit to the training data.
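The loss alone does not say how many samples are classified correctly. A small accuracy helper, sketched here with NumPy on made-up probabilities, is one way to measure that. The function name `accuracy` and the sample values are assumptions for illustration.

```python
import numpy as np

def accuracy(probabilities, labels):
    # Predicted class = index of the largest probability in each row
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == labels))

# Hypothetical probabilities for four samples, for illustration only
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 1])

print(accuracy(probs, labels))  # 0.75
```

With the trained model, the same helper could be applied to the held-out data as `accuracy(model(X_test).numpy(), y_test)`.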

**Advantages**

- Efficient gradient calculation: TensorFlow's automatic differentiation makes it efficient to compute gradients during backpropagation. This is crucial for optimizing the model's parameters.
- Flexibility: TensorFlow lets you define and customize complex neural network architectures easily, making it suitable for a wide range of ML tasks.
- GPU acceleration: TensorFlow integrates with GPUs, which can significantly speed up the training of neural networks.
- Deployment: TensorFlow provides tools for converting trained models into formats suitable for deployment on various platforms, including mobile devices and the web.

**Disadvantages**

- Increased memory consumption: The intermediate values of the forward pass must be stored so that the backward pass can compute the gradients.
- Computational overhead: Automatic differentiation can add noticeable overhead for simple functions, so for these it may be better to write the gradients by hand.
- Updates and compatibility: TensorFlow occasionally introduces changes that require adjustments to existing code. Compatibility with older versions can be a concern for long-term projects.
- Resource intensive: Training deep neural networks with TensorFlow can require powerful GPUs or TPUs, which may not be readily available to everyone.
