<img src="https://github.com/arthurflor23/handwritten-text-recognition/blob/master/doc/images/000.png?raw=true" />

# Handwritten Text Recognition using TensorFlow 2.0

This tutorial shows how to use the [Handwritten Text Recognition](https://github.com/arthurflor23/handwritten-text-recognition) project in Google Colab.



## 1 Localhost Environment

We'll make sure you have the project in your Google Drive with the datasets in HDF5 format. If you already have the structured files in the cloud, skip this step.

### 1.1 Datasets

You can use the following datasets:

a. [Bentham](http://transcriptorium.eu/datasets/bentham-collection/)

b. [IAM](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database)

c. [Rimes](http://www.a2ialab.com/doku.php?id=rimes_database:start)

d. [Saint Gall](http://www.fki.inf.unibe.ch/databases/iam-historical-document-database/saint-gall-database)

### 1.2 Raw folder

On localhost, download the project code from GitHub and extract the chosen dataset (or all of them, if you prefer) into the **raw** folder. Don't change the structure of the datasets, since the scripts rely on their **original structure**. Your project directory will look like this:

```
.
├── doc
│   ├── images
│   └── results
├── LICENSE
├── raw
│   ├── bentham
│   │   ├── BenthamDatasetR0-GT
│   │   └── BenthamDatasetR0-Images
│   ├── iam
│   │   ├── ascii
│   │   ├── forms
│   │   ├── largeWriterIndependentTextLineRecognitionTask
│   │   ├── lines
│   │   └── xml
│   ├── rimes
│   │   ├── eval_2011
│   │   ├── eval_2011_annotated.xml
│   │   ├── training_2011
│   │   └── training_2011.xml
│   └── saintgall
│       ├── data
│       ├── ground_truth
│       ├── README.txt
│       └── sets
├── README.md
├── requirements.txt
└── src
    ├── data
    │   ├── generator.py
    │   ├── preproc.py
    ├── main.py
    ├── network
    │   ├── architecture.py
    │   ├── gated.py
    │   ├── model.py
    ├── transform
    │   ├── bentham.py
    │   ├── iam.py
    │   ├── rimes.py
    │   └── saintgall.py
    └── tutorial.ipynb

```
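
Before moving on, it can help to confirm that an extracted dataset matches the tree above. The following is a minimal sketch, not part of the project: the helper name `missing_raw_entries` and the `EXPECTED_RAW` table are hypothetical, and the expected entries are simply copied from the listing above.

```python
import os
import tempfile

# Expected entries per dataset, copied from the directory tree above.
EXPECTED_RAW = {
    "bentham": ["BenthamDatasetR0-GT", "BenthamDatasetR0-Images"],
    "iam": ["ascii", "forms", "largeWriterIndependentTextLineRecognitionTask",
            "lines", "xml"],
    "rimes": ["eval_2011", "eval_2011_annotated.xml",
              "training_2011", "training_2011.xml"],
    "saintgall": ["data", "ground_truth", "README.txt", "sets"],
}


def missing_raw_entries(raw_dir, dataset):
    """Return the expected files/folders that are absent from raw/<dataset>."""
    base = os.path.join(raw_dir, dataset)
    return [name for name in EXPECTED_RAW[dataset]
            if not os.path.exists(os.path.join(base, name))]


# Demo on a throwaway directory that mimics a partial Saint Gall extraction.
with tempfile.TemporaryDirectory() as raw_dir:
    os.makedirs(os.path.join(raw_dir, "saintgall", "data"))
    os.makedirs(os.path.join(raw_dir, "saintgall", "sets"))
    demo_missing = missing_raw_entries(raw_dir, "saintgall")
    print("missing:", demo_missing)
```

On a real checkout you would call `missing_raw_entries("raw", "iam")` from the project root; an empty list means every entry from the tree is in place.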

After that, create a virtual environment and install the dependencies with Python 3 and pip:

> ```python -m venv .venv && source .venv/bin/activate```

> ```pip install -r requirements.txt```

### 1.3 HDF5 files

Now you'll run the *transform* function from **main.py**. To do this, execute the following in the **src** folder:

> ```python main.py --dataset=<DATASET_NAME> --transform```

Your data will be preprocessed and encoded, then saved in the **data** folder. Your project directory will now look like this:


```
.
├── data
│   ├── bentham.hdf5
│   ├── iam.hdf5
│   ├── rimes.hdf5
│   └── saintgall.hdf5
├── doc
│   ├── images
│   └── results
├── LICENSE
├── raw
│   ├── bentham
│   │   ├── BenthamDatasetR0-GT
│   │   └── BenthamDatasetR0-Images
│   ├── iam
│   │   ├── ascii
│   │   ├── forms
│   │   ├── largeWriterIndependentTextLineRecognitionTask
│   │   ├── lines
│   │   └── xml
│   ├── rimes
│   │   ├── eval_2011
│   │   ├── eval_2011_annotated.xml
│   │   ├── training_2011
│   │   └── training_2011.xml
│   └── saintgall
│       ├── data
│       ├── ground_truth
│       ├── README.txt
│       └── sets
├── README.md
├── requirements.txt
└── src
    ├── data
    │   ├── generator.py
    │   ├── preproc.py
    ├── main.py
    ├── network
    │   ├── architecture.py
    │   ├── gated.py
    │   ├── model.py
    ├── transform
    │   ├── bentham.py
    │   ├── iam.py
    │   ├── rimes.py
    │   └── saintgall.py
    └── tutorial.ipynb

```
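
If you transform several datasets, a small script can build the command for each one and report which HDF5 files are already present. This is only a sketch: the helper names `transform_command` and `expected_hdf5` are hypothetical, the flags are the ones shown in the command above, and the paths assume the script runs from the **src** folder.

```python
import os

DATASETS = ["bentham", "iam", "rimes", "saintgall"]


def transform_command(dataset):
    """Build the argument list for the transform step of one dataset."""
    return ["python", "main.py", f"--dataset={dataset}", "--transform"]


def expected_hdf5(dataset, data_dir=os.path.join("..", "data")):
    """Path where the tree above places the transformed file, relative to src."""
    return os.path.join(data_dir, f"{dataset}.hdf5")


# Print each command next to its expected output file and current status.
for name in DATASETS:
    status = "found" if os.path.isfile(expected_hdf5(name)) else "missing"
    print(" ".join(transform_command(name)), "->", expected_hdf5(name), f"({status})")
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` if you prefer to launch the transforms from Python instead of typing them one by one.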

Then upload the **data** and **src** folders to the same directory in your Google Drive.

## 2 Google Drive Environment


### 2.1 TensorFlow 2.0

Make sure the Jupyter notebook is using GPU mode. If possible, use a **Tesla T4** instead of a Tesla K80, since it is faster.

In [0]:
import tensorflow as tf

device_name = tf.test.gpu_device_name()

if device_name != "/device:GPU:0":
    raise SystemError("GPU device not found")

print(f"Found GPU at: {device_name}\n")

!nvidia-smi

Now, we'll install TensorFlow 2.0 with GPU support.

In [0]:
!pip install -q tensorflow-gpu==2.0.0-beta1

### 2.2 Google Drive

Mount your Google Drive partition.

**Note:** *"Colab Notebooks/handwritten-text-recognition/src/"* is the directory where you put the project folders, specifically the **src** folder.

In [0]:
from google.colab import drive

drive.mount("./gdrive")

%cd "./gdrive/My Drive/Colab Notebooks/handwritten-text-recognition/src/"
!ls -l

After mounting, you can see the list of files in the project folder.

## 3 Set Python Classes

### 3.1 Environment

First, let's define our environment variables.

Set the main configuration parameters, such as input size, batch size, number of epochs and list of characters. These settings keep the notebook compatible with **main.py**:

* **dataset**: "bentham", "iam", "rimes", "saintgall"

* **arch**: network to run: "bluche", "puigcerver", "flor"

* **epochs**: number of epochs

* **batch_size**: size of each batch

In [0]:
import os

# define parameters
dataset = "iam"
arch = "flor"
epochs = 1000
batch_size = 16

# define paths
hdf5_src = os.path.join("..", "data", f"{dataset}.hdf5")
output = os.path.join("..", "output", f"{dataset}_{arch}")

# define input size, number max of chars per line and list of valid chars
input_size = (1024, 128, 1)
max_text_length = 128
charset = "".join([chr(i) for i in range(32, 127)])

print("source:", hdf5_src)
print("output:", output)
print("charset:", charset)
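
To make the character list concrete, the sketch below rebuilds the same charset and maps text to integer indices and back. The helpers `encode_text` and `decode_indices` are hypothetical illustrations of the idea; the project's own encoding in **data/preproc.py** and **data/generator.py** may differ.

```python
# Same construction as in the cell above: ASCII codes 32 (space) to 126 (~).
charset = "".join(chr(i) for i in range(32, 127))


def encode_text(text, charset=charset):
    """Map each character to its position in charset, dropping unknown ones."""
    return [charset.index(ch) for ch in text if ch in charset]


def decode_indices(indices, charset=charset):
    """Inverse of encode_text: turn positions back into characters."""
    return "".join(charset[i] for i in indices)


print(len(charset))                           # number of valid characters
print(encode_text("Hi!"))                     # indices relative to the space character
print(decode_indices(encode_text("Hi!")))     # round trip back to text
```

Because the charset starts at the space character, each index is simply the ASCII code minus 32.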

### 3.2 DataGenerator Class

The second class is **DataGenerator()**, responsible for:

* Loading the dataset partitions (train, valid, test);

* Managing batches for the train/validation/test processes.

In [0]:
from data.generator import DataGenerator

dtgen = DataGenerator(hdf5_src=hdf5_src,
                      batch_size=batch_size,
                      max_text_length=max_text_length)

print(f"Train images: {dtgen.total_train}")
print(f"Validation images: {dtgen.total_valid}")
print(f"Test images: {dtgen.total_test}")
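
The batch handling can be pictured with a plain Python generator. This is a simplified sketch under stated assumptions, not the project's `DataGenerator`: the names `batch_steps` and `cycle_batches` are hypothetical, and it assumes the number of steps per epoch is the sample count divided by the batch size, rounded up.

```python
import math


def batch_steps(total, batch_size):
    """Number of batches needed to cover `total` samples once (assumed ceil)."""
    return math.ceil(total / batch_size)


def cycle_batches(samples, batch_size):
    """Yield consecutive batches forever, restarting at the end of the data."""
    if not samples:
        raise ValueError("samples must not be empty")
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


samples = list(range(10))                 # stand-in for 10 images
steps = batch_steps(len(samples), 4)
gen = cycle_batches(samples, 4)
one_epoch = [next(gen) for _ in range(steps)]
print(steps, one_epoch)
```

An endless generator like this is why Keras-style training loops need an explicit step count: the generator never signals the end of an epoch on its own.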

### 3.3 HTRModel Class

The third class, **HTRModel()**, was developed to be easy to use and to abstract the complicated flow of an HTR system. It's responsible for:

* Creating the model with the Handwritten Text Recognition flow, which computes the loss with CTC and decodes the output to calculate the HTR metrics (CER, WER, SER);

* Saving and loading models;

* Loading weights into the model, if they exist;

* Running the training and prediction processes using a *generator*.

To keep HTRModel dynamic, its parameters are the *input_layer* and *output_layer* of your own network (the default code has Bluche and Puigcerver implementations in **network/architecture.py**).
The last parameter is the list of characters you want to work with (this tutorial uses the 95 printable ASCII characters, which gives 96 output classes with the extra one added by `len(charset) + 1`).
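
The decoding step in a CTC-based recognizer can be illustrated with greedy (best-path) decoding: take the most likely class at each time step, collapse consecutive repeats, then drop the blank class. The sketch below is pure Python and is not the project's decoder; the function name `greedy_ctc_decode` is hypothetical, and it assumes the blank is the last class index (`len(charset)`), which is the convention in Keras CTC utilities.

```python
charset = "".join(chr(i) for i in range(32, 127))


def greedy_ctc_decode(frame_labels, charset, blank=None):
    """Collapse repeated labels and remove blanks from per-frame argmax labels."""
    if blank is None:
        blank = len(charset)  # assumption: blank is the extra, last class
    chars = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            chars.append(charset[label])
        previous = label
    return "".join(chars)


t, o, blank = charset.index("t"), charset.index("o"), len(charset)
frames = [t, t, blank, o, o, blank, o, blank]
print(greedy_ctc_decode(frames, charset))
```

The blank between the two runs of `o` is what lets the decoder keep a doubled letter instead of collapsing it into one.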

In [0]:
from network.model import HTRModel
from network import architecture

# get the input_layer, output_layer and optimizer from default network
network_func = getattr(architecture, arch)
ioo = network_func(input_size=input_size, output_size=len(charset) + 1)

# initiate and compile the HTRModel
model = HTRModel(inputs=ioo[0], outputs=ioo[1], charset=charset)
model.compile(optimizer=ioo[2])

# save network summary
model.summary(output, "summary.txt")

# load checkpoint weights (HDF5) if they exist and get default callbacks
checkpoint = "checkpoint_weights.hdf5"

model.load_checkpoint(output, checkpoint)
callbacks = model.callbacks(logdir=output, hdf5_target=checkpoint)
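
The `getattr(architecture, arch)` call above selects a network-building function by name. The sketch below shows the same lookup pattern with stand-ins: `toy_network`, `get_network` and the `SimpleNamespace` replacing the real `network.architecture` module are all hypothetical, and the returned strings are placeholders for real layers and an optimizer.

```python
from types import SimpleNamespace


def toy_network(input_size, output_size):
    """Stand-in for an architecture function; returns placeholder objects."""
    inputs = f"Input{input_size}"
    outputs = f"Dense({output_size})"
    optimizer = "optimizer"
    return inputs, outputs, optimizer


# Stand-in for the `network.architecture` module.
architecture = SimpleNamespace(bluche=toy_network, flor=toy_network)


def get_network(arch):
    """Look up an architecture by name, with a readable error for typos."""
    try:
        return getattr(architecture, arch)
    except AttributeError:
        available = ", ".join(sorted(vars(architecture)))
        raise ValueError(f"unknown arch {arch!r}; available: {available}") from None


inputs, outputs, optimizer = get_network("flor")(input_size=(1024, 128, 1),
                                                 output_size=96)
print(inputs, outputs, optimizer)
```

Unpacking the result into three names, as done here, is an alternative to indexing `ioo[0]`, `ioo[1]` and `ioo[2]`.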

## 4 TensorBoard

To make the model's training easier to visualize, you can start TensorBoard.

**Note**: All data is saved in the output folder.

In [0]:
%load_ext tensorboard
%tensorboard --reload_interval=180 --logdir={output}

## 5 Training

The training process is similar to Keras's *fit_generator*. After training, the information (epochs and minimum loss) is saved to **train.txt** in the output folder.

In [0]:
import time

# to calculate total and average time per epoch
start_time = time.time()

h = model.fit_generator(generator=dtgen.next_train_batch(),
                        epochs=epochs,
                        steps_per_epoch=dtgen.train_steps,
                        validation_data=dtgen.next_valid_batch(),
                        validation_steps=dtgen.valid_steps,
                        callbacks=callbacks,
                        shuffle=True,
                        verbose=1)

total_time = time.time() - start_time

loss = h.history['loss']
val_loss = h.history['val_loss']

min_val_loss = min(val_loss)
min_val_loss_i = val_loss.index(min_val_loss)

train_corpus = "\n".join([
    f"Total train images:       {dtgen.total_train}",
    f"Total validation images:  {dtgen.total_valid}",
    f"Batch:                    {batch_size}\n",
    f"Total time:               {(total_time / 60):.0f} min",
    f"Average time per epoch:   {(total_time / len(loss)):.0f} sec\n",
    f"Total epochs:             {len(loss)}",
    f"Best epoch                {min_val_loss_i + 1}\n",
    f"Training loss:            {loss[min_val_loss_i]:.4f}",
    f"Validation loss:          {min_val_loss:.4f}"
])

with open(os.path.join(output, "train.txt"), "w") as lg:
    print(f"\n{train_corpus}")
    lg.write(train_corpus)
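
As a worked example of the summary logic above, the sketch below applies the same steps to a toy history. The numbers are made up for illustration and the helper name `summarize_history` is hypothetical.

```python
def summarize_history(history, total_time):
    """Pick the epoch with the lowest validation loss and report its values."""
    loss = history["loss"]
    val_loss = history["val_loss"]
    best = val_loss.index(min(val_loss))  # first epoch reaching the minimum
    return {
        "total_epochs": len(loss),
        "best_epoch": best + 1,           # epochs are reported 1-based
        "training_loss": loss[best],
        "validation_loss": val_loss[best],
        "seconds_per_epoch": total_time / len(loss),
    }


# Made-up values: validation loss is lowest in the second epoch.
toy_history = {"loss": [2.10, 1.40, 1.05, 0.90],
               "val_loss": [2.30, 1.60, 1.70, 1.65]}
summary = summarize_history(toy_history, total_time=500.0)

for key, value in summary.items():
    print(f"{key}: {value}")
```

The epoch count comes from the length of the loss list, so it reflects the epochs that actually ran even if training stops before the configured `epochs` value.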

## 6 Predict/Evaluate

The predict function merges two functionalities: (1) *predict*, (2) *evaluate*.

The predict process is similar to Keras's *predict_generator*, but takes one extra parameter, **metrics**. If you pass this parameter, the metrics (loss, cer, wer, ser) are calculated in addition to the predictions. After that, the information (predictions and metrics) is saved in the output directory.

In [0]:
# predict[0]: ground truth
# predict[1]: predictions
# evaluate: values of the requested metrics, in the order given
predict, evaluate = model.predict_generator(generator=dtgen.next_test_batch(),
                                            steps=dtgen.test_steps,
                                            metrics=["loss", "cer", "wer", "ser"],
                                            verbose=1)

# save evaluation
eval_corpus = "\n".join([
    f"Total test images:    {dtgen.total_test}\n",
    f"Metrics:",
    f"Test Loss:            {evaluate[0]:.4f}",
    f"Character Error Rate: {evaluate[1]:.4f}",
    f"Word Error Rate:      {evaluate[2]:.4f}",
    f"Sequence Error Rate:  {evaluate[3]:.4f}"
])

with open(os.path.join(output, "evaluate.txt"), "w") as lg:
    lg.write(eval_corpus)
    print(f"\n{eval_corpus}")

# save predicts
pred_corpus = "\n".join([f"L: {l}\nP: {p}\n" for (l, p) in zip(predict[0], predict[1])])

with open(os.path.join(output, "predict.txt"), "w") as lg:
    lg.write(pred_corpus)
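
For intuition about the error rates reported above, the sketch below computes CER, WER and SER from pairs of ground-truth and predicted lines using the common edit-distance definitions. It is an assumption that the project normalizes the same way; the names `edit_distance` and `error_rates` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rates(truths, preds):
    """Return (CER, WER, SER), each averaged over all lines."""
    cer = wer = ser = 0.0
    for truth, pred in zip(truths, preds):
        cer += edit_distance(truth, pred) / max(len(truth), 1)
        words = truth.split()
        wer += edit_distance(words, pred.split()) / max(len(words), 1)
        ser += truth != pred
    n = max(len(truths), 1)
    return cer / n, wer / n, ser / n


cer, wer, ser = error_rates(["the cat", "a dog"], ["the cat", "a dig"])
print(f"CER: {cer:.4f}  WER: {wer:.4f}  SER: {ser:.4f}")
```

In this toy case one wrong character out of five gives a small CER, but it makes one of two words and one of two lines wrong, so WER and SER are higher.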