# TensorFlow debugging basics

This notebook describes basic strategies for debugging models written in TensorFlow. We take a buggy version of the basic text classification example and debug it.


Colab has very limited debugging support. Although we've been running our models in Colab so far, it is useful to run the code locally and overfit on a small dataset to verify that a model is correct. Therefore, **run this notebook locally on your computer** within [VSCode](https://code.visualstudio.com/). If you run into memory issues, try using a smaller batch size.

This notebook uses VSCode for illustration, but you should also be able to use PyCharm instead.
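To make the overfitting check concrete, here is a minimal NumPy sketch (not part of the course code). It trains a logistic-regression model with plain gradient descent on 8 random examples with random labels. The dataset size, feature count, learning rate and step count are arbitrary assumptions. A correct model and training loop should be able to memorize such a tiny dataset and drive the training loss toward zero. If the loss plateaus well above zero, suspect a bug in the model, the loss, or the update step rather than a lack of data.

```python
import numpy as np

def overfit_tiny_dataset(num_examples=8, num_features=20, steps=4000, lr=0.25, seed=0):
    """Fit logistic regression to a tiny random dataset; return (losses, accuracy)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(num_examples, num_features))
    y = rng.integers(0, 2, size=num_examples).astype(np.float64)  # random 0/1 labels
    w = np.zeros(num_features)
    b = 0.0
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(steps):
        # Forward pass
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        # Mean binary cross-entropy loss
        loss = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
        losses.append(loss)
        # Gradient of the mean loss w.r.t. the logits, then w.r.t. w and b
        grad_logits = (probs - y) / num_examples
        # Gradient-descent update
        w -= lr * (x.T @ grad_logits)
        b -= lr * grad_logits.sum()
    accuracy = np.mean((probs > 0.5) == (y == 1))
    return losses, accuracy

losses, accuracy = overfit_tiny_dataset()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.4f}, accuracy {accuracy}")
```

The loop mirrors the forward pass, loss, gradient and update steps that the custom `train_step` later in this notebook performs with `tf.GradientTape`.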



#### Prerequisite: Install VSCode and relevant plugins

1. Follow [these instructions](https://code.visualstudio.com/docs/setup/setup-overview) to install VSCode. 
2. Install VSCode Plugins for [Python](https://code.visualstudio.com/docs/python/python-tutorial) and [Jupyter](https://code.visualstudio.com/docs/datascience/jupyter-notebooks).
3. Make sure you're using Python 3.9 or 3.10 by running the command below:

```
python --version
```

Example output: _Python 3.10.0_

4. Set up a new virtual environment. On the command line (PowerShell on Windows, Terminal on Mac), run the following commands one at a time to create and activate a new virtual env. In the example below, `debug_notebook` is our working directory and `tfdebug` is the name of the virtual env. You can pick any names you like.

```
mkdir debug_notebook
cd debug_notebook
python -m venv tfdebug
# activate the venv on Mac/Linux:
source tfdebug/bin/activate
# on Windows, activate it with this command instead:
# .\tfdebug\Scripts\activate
python -m pip install --upgrade pip
```


5. Install the required libraries into the `tfdebug` virtual env by running the following commands on the command line:

```
pip install ipykernel scikit-learn nltk matplotlib
pip install tensorflow
pip install tensorflow-datasets
pip install pydot
pip install transformers
```

6. Copy this notebook into the `debug_notebook` directory created above.
7. Launch VSCode from that directory with the command below, then open this notebook within the IDE.
```
code .
```
8. Ensure that the Python interpreter used within VSCode is the one in the `tfdebug` venv, as shown in the image below.



In [1]:

%%html
<img src="http://drive.google.com/uc?export=view&id=1zKmOCFOjbLMbv8tW4v8a-gF0l3iIA1kc">
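Besides checking the interpreter picker, you can confirm from inside a notebook cell which interpreter is actually running. The sketch below relies on standard-library behavior: inside a `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` points at the base Python installation. The helper name `active_virtualenv` is our own, not part of any library.

```python
import sys

def active_virtualenv():
    """Return the active venv directory, or None when running the base interpreter."""
    # Inside a venv, sys.prefix is the venv directory and sys.base_prefix is the base install.
    return sys.prefix if sys.prefix != sys.base_prefix else None

print(sys.version)          # should start with 3.9 or 3.10
print(active_virtualenv())  # should be a path ending in 'tfdebug'
```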

##### Turn off the GPU, if any

In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide all GPUs; run this before TensorFlow is imported
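`CUDA_VISIBLE_DEVICES` is read by the CUDA runtime when TensorFlow initializes its GPU devices, so it is safest to set it before TensorFlow is imported. Setting it afterwards may have no effect until the kernel is restarted. The hypothetical helper below wraps the assignment and warns if TensorFlow is already loaded. Once TensorFlow is imported, `tf.config.list_physical_devices('GPU')` should then return an empty list.

```python
import os
import sys
import warnings

def hide_gpus():
    """Hypothetical helper: hide every CUDA device from TensorFlow in this process."""
    if "tensorflow" in sys.modules:
        # TensorFlow may already have initialized CUDA, in which case this has no effect.
        warnings.warn("TensorFlow is already imported; restart the kernel and call hide_gpus() first.")
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # '-1' matches no device index

hide_gpus()
```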

#### Step debugging

(You can skip this section if you're familiar with debugging within an IDE.)

If your setup works, you should be able to step through the _(poorly written)_ Fibonacci function below within VSCode. Try to find the bug by stepping through the cell.

1. Add a breakpoint inside the `fib` function.
2. Instead of running the cell below normally, click the dropdown next to its run icon and choose **Debug Cell**.
3. Step through the code using the icons in the debug toolbar at the top. Make sure you understand step over, step into, step out, and conditional breakpoints.
4. Try evaluating an expression in the debug console (screenshot below).

Check the [VSCode documentation](https://code.visualstudio.com/docs/editor/debugging) for more details.

_Hint: when does the function terminate?_

In [None]:
def fib(n):
    return fib(n - 1) + fib(n - 2) 

print(fib(5))

In [2]:

%%html
<img src="http://drive.google.com/uc?export=view&id=15r86iTVU9QG-bSNDyh5RzqBch7DjDevR">
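When a debugger isn't available (for example, on a remote machine), a lightweight alternative is to record each call's arguments and stop at an artificial depth limit. The `trace_calls` decorator below is a hypothetical helper, not part of the notebook. It is applied to a separate copy of the function, `traced_fib`, so `fib` itself is not overwritten. The recorded `calls` list shows the sequence of arguments, which is the same information you would collect by stepping into each recursive call.

```python
import functools

def trace_calls(max_depth=10):
    """Decorator: record (depth, args) for every call and stop past max_depth."""
    def decorator(func):
        calls = []  # (depth, args) for every call, in call order
        depth = 0
        @functools.wraps(func)
        def wrapper(*args):
            nonlocal depth
            calls.append((depth, args))
            if depth >= max_depth:
                raise RecursionError(f"{func.__name__}{args} reached depth {max_depth}")
            depth += 1
            try:
                return func(*args)
            finally:
                depth -= 1
        wrapper.calls = calls
        return wrapper
    return decorator

@trace_calls(max_depth=6)
def traced_fib(n):
    # Same body as fib above, but recursing through the traced wrapper.
    return traced_fib(n - 1) + traced_fib(n - 2)

try:
    traced_fib(5)
except RecursionError as err:
    print("stopped:", err)

# Print the recorded calls as an indented call tree.
for depth, args in traced_fib.calls:
    print("  " * depth + f"traced_fib{args}")
```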

### Debugging a BERT classification model

To demonstrate debugging a TensorFlow model, we use the BERT classification model example from the lesson 4 notebook. Because resources are limited on local computers, we load only 20 records for training and 5 for testing. The setup code is hidden so that we can focus on the debugging concepts.

In [None]:
#@title
import numpy as np
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.layers import Embedding, Input, Dense, Lambda
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
import tensorflow_datasets as tfds



import sklearn as sk
import os
import nltk
from nltk.data import find

import matplotlib.pyplot as plt

import re


from transformers import BertTokenizer, TFBertModel

train_data, test_data = tfds.load(
    name="imdb_reviews", 
    split=('train[:80%]', 'test[80%:]'),
    as_supervised=True)

num_train_examples = 20      # set number of train examples - 1500 for realtime demo
num_test_examples = 5        # set number of test examples - 500 for realtime demo

# make it easier to use a variety of BERT subword models
model_checkpoint = 'bert-base-cased'

train_examples, train_labels = next(iter(train_data.batch(num_train_examples))) # load num_train_examples records for training
test_examples, test_labels = next(iter(test_data.batch(num_test_examples))) # load num_test_examples records for test

# BERT Tokenization of training and test data
max_length = 128                 # set max_length

all_train_examples = [x.decode('utf-8') for x in train_examples.numpy()]
all_test_examples = [x.decode('utf-8') for x in test_examples.numpy()]

bert_tokenizer = BertTokenizer.from_pretrained(model_checkpoint)

x_train = bert_tokenizer(all_train_examples[:num_train_examples],
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_train = train_labels[:num_train_examples]

x_test = bert_tokenizer(all_test_examples[:num_test_examples],
              max_length=max_length,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_test = test_labels[:num_test_examples]
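Here is what to expect when you inspect `x` in the debugger. With `padding='max_length'` and `truncation=True`, every review becomes exactly `max_length` token ids, so `x_train.input_ids`, `x_train.token_type_ids` and `x_train.attention_mask` each have shape `(20, 128)` here. The pure-Python sketch below is a simplified illustration of that behavior for a single sequence, not the Hugging Face implementation. The special-token ids 101/102/0 are assumptions (read `bert_tokenizer.cls_token_id`, `bert_tokenizer.sep_token_id` and `bert_tokenizer.pad_token_id` for the real values), and the content token ids are made up.

```python
def encode_fixed_length(token_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Simplified sketch of padding='max_length' + truncation=True for ONE sequence."""
    body = list(token_ids)[: max_length - 2]  # reserve two slots for [CLS] and [SEP]
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)     # 1 = real token
    num_pad = max_length - len(input_ids)
    input_ids += [pad_id] * num_pad
    attention_mask += [0] * num_pad           # 0 = padding
    token_type_ids = [0] * max_length         # single-segment input: all segment 0
    return {"input_ids": input_ids,
            "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}

short = encode_fixed_length([7592, 2088], max_length=8)
print(short["input_ids"])       # [101, 7592, 2088, 102, 0, 0, 0, 0]
print(short["attention_mask"])  # [1, 1, 1, 1, 0, 0, 0, 0]

truncated = encode_fixed_length(list(range(1000, 1020)), max_length=8)
print(truncated["input_ids"])   # [101, 1000, 1001, 1002, 1003, 1004, 1005, 102]
```

For single-sentence input like these reviews, `token_type_ids` is all zeros, while `attention_mask` marks real tokens with 1 and padding with 0. This makes the two easy to tell apart when you check what reaches the model.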



#### Custom model implementation and eager execution

In the previous examples and notebooks in this course, we trained our models with high-level APIs such as `model.fit`, which hide most of the complexity. However, Keras lets you [customize](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) what happens in `model.fit` by overriding methods such as `train_step` and `test_step`, exposing the underlying steps. This is useful for more complex models as well as for debugging. Because Keras follows the principle of *progressive disclosure of complexity*, the `train_step` and `test_step` methods below are optional: if you remove them, the model still works using the default implementations in the base class.

It is also important to understand the difference between graph execution and eager execution; [this guide](https://www.tensorflow.org/guide/intro_to_graphs) explains it well. By default, `model.fit` compiles `train_step` into a graph with `tf.function`, so the Python code inside it runs only while the graph is being traced: breakpoints there are not hit on every batch, and tensors have no concrete values to inspect. To debug the model, we therefore need to run it in eager mode.

In [None]:
import traceback
import pickle

class BertClassificationModel(Model):
    def __init__(self,  checkpoint, hidden_size=201,  dropout=0.3):
        super().__init__()
        self.bert_model = TFBertModel.from_pretrained(checkpoint)
        self.top_layer = tf.keras.layers.Dense(hidden_size, activation='relu', name='hidden_layer')
        self.classifier = tf.keras.layers.Dense(1, activation='sigmoid', name='classification_layer')
        self.dropout = tf.keras.layers.Dropout(dropout)
        self.hidden_size = hidden_size

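    def call(self, inputs, training=False):
        input_ids, token_type_ids, attention_mask = inputs

        # Pass BERT inputs by keyword: TFBertModel's positional order is
        # (input_ids, attention_mask, token_type_ids), not the order unpacked above.
        result = self.bert_model(input_ids=input_ids,
                                 token_type_ids=token_type_ids,
                                 attention_mask=attention_mask,
                                 training=training)
        cls_out = result[1]  # pooled [CLS] output
        hidden = self.top_layer(cls_out)
        hidden = self.dropout(hidden, training=training)
        classification = self.classifier(hidden)  # classify from the hidden layer
        return classification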
        return classification

    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data
        self.x, self.y = x, y

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients((grad, var) for grad, var in zip(gradients, trainable_vars) if grad is not None)
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        # Unpack the data
        x, y = data
        self.x, self.y = x, y

        # Compute predictions
        y_pred = self(x, training=False)
        # Updates the metrics tracking the loss
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}

#### Debug the code block below

The cell below does the same thing as the one in the lesson notebook, but it contains an error. To debug it, set the `run_eagerly` flag to `True`. This keeps the model from executing in graph mode, so you can step through the model execution.
Step through the code and find where it fails. Try different things:
* Add a breakpoint somewhere within the `train_step` function
* Step through the code
* Inspect the shapes and values of `x` and `y`
* Evaluate the output of the `self.compiled_loss(..)` call directly in the debug console

Once the bug is fixed, set `run_eagerly` to `False` and try debugging again. Does it work?

In [None]:
bert_classification_model = BertClassificationModel(checkpoint=model_checkpoint)

bert_classification_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00005),
                                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                                metrics='accuracy')
bert_classification_model.run_eagerly = True
try:
    bert_classification_model_history = bert_classification_model.fit(
        [x_train.input_ids, x_train.token_type_ids, x_train.attention_mask],
        y_train,
        validation_data=([x_test.input_ids, x_test.token_type_ids, x_test.attention_mask], y_test),
        batch_size=2,
        epochs=2
    )
except Exception as e:
    bert_classification_model.save_weights('cl_model_weights.ckpt')
    features_dict = {}
    for n, v in enumerate(bert_classification_model.x):
        features_dict[n] = v.numpy()
    with open('cl_features.pkl','wb') as f:
        pickle.dump(features_dict,f)
    labels_dict = {}
    labels_dict["y"] = bert_classification_model.y.numpy()
    with open('cl_labels.pkl', 'wb') as f:
        pickle.dump(labels_dict, f)
    raise e
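When the `except` branch above runs, it writes the model weights and the last batch seen by `train_step`/`test_step` to disk, so the failure can be reproduced later without rerunning `fit`. Below is a minimal sketch of the reloading side. The function name and default paths are assumptions that mirror the file names used above. Unpickling works without TensorFlow only if the saved objects are NumPy arrays; a pickled `tf.Tensor` needs TensorFlow importable.

```python
import pickle

def load_failed_batch(features_path="cl_features.pkl", labels_path="cl_labels.pkl"):
    """Hypothetical helper: reload the batch dumped by the except-branch.

    Returns (features, y); features is a list in model-input order:
    [input_ids, token_type_ids, attention_mask].
    """
    with open(features_path, "rb") as f:
        features_dict = pickle.load(f)  # {0: input_ids, 1: token_type_ids, 2: attention_mask}
    with open(labels_path, "rb") as f:
        labels_dict = pickle.load(f)    # {"y": labels}
    features = [features_dict[i] for i in sorted(features_dict)]
    return features, labels_dict["y"]
```

For example, in a fresh session you could recreate the model and run `features, y = load_failed_batch()`. Then call `bert_classification_model(features, training=False)` once to build its weights and `bert_classification_model.load_weights('cl_model_weights.ckpt')` to restore them, assuming the TensorFlow checkpoint format that `save_weights` uses for this path. You can then step through the loss computation on exactly the batch that failed.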