# Autoencoders for Document Denoising

## About Autoencoders
An autoencoder is a type of artificial neural network that learns efficient encodings of data in an unsupervised manner. Its aim is to learn a compact representation (encoding) of a dataset, typically for dimensionality reduction, by training the network to ignore signal “noise”. Alongside this reduction (encoder) side, a reconstruction (decoder) side is learned, which tries to generate from the reduced encoding an output as close as possible to the original input. The network effectively encodes its own input, hence the name. For document denoising, the same idea is applied to scanned or photographed pages: the network receives a noisy image and is expected to output a clean one.

![](https://osclasspoint.com/kaggle/autoencoder.png)
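To make "as close as possible to its original input" concrete, reconstruction quality is usually measured pixel by pixel. The sketch below is illustrative only: it uses a small synthetic image, an assumed Gaussian noise model, and a hypothetical `mse` helper, none of which come from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "clean" page: white background (1.0) with a dark stripe (0.0) standing in for text.
clean = np.ones((8, 8), dtype="float32")
clean[3, 1:7] = 0.0

# Corrupt it with Gaussian noise (an assumed noise model), then clip back into [0, 1].
noisy = np.clip(clean + rng.normal(0.0, 0.2, clean.shape), 0.0, 1.0).astype("float32")


def mse(a, b):
    """Mean squared error between two images of the same shape (hypothetical helper)."""
    return float(np.mean((a - b) ** 2))


print("error of a perfect reconstruction:", mse(clean, clean))
print("error of the noisy input:", mse(clean, noisy))
```

A denoising model is doing useful work when its output has a lower error against the clean page than the noisy input does.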

## Import libraries and data

In [None]:
import numpy as np
import matplotlib as mpl
import matplotlib.image  # load the submodule explicitly so mpl.image.imsave is available
import os
import cv2

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Dropout, BatchNormalization, Input

%matplotlib inline

In [None]:
# only needed on Google Colab: mount Google Drive and switch to the project directory
from google.colab import drive
drive.mount('/content/drive')
os.chdir("/content/drive/MyDrive/ColabNotebooks/DocDenoise")
!ls

In [None]:
# check GPU details
!nvidia-smi

In [None]:
# root data directory
path = 'data/'
# subdirectory holding the images to be denoised
to_process_path = 'to_process/'
# subdirectory where the denoised images are written
processed_path = 'processed/'
# sorted list of input image filenames
to_process_img = sorted(os.listdir(path + to_process_path))
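Note that `os.listdir` returns every entry in the directory, including hidden files and subfolders, and `cv2.imread` returns `None` for anything it cannot decode. A minimal sketch of one way to keep only image files follows; the `list_images` helper and the extension set are assumptions for illustration, not part of the original notebook.

```python
import os

# Extensions treated as images; adjust to match the actual data (assumption).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(directory):
    """Return the sorted filenames in `directory` whose extension looks like an image."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
```

With this helper, the listing could be written as `to_process_img = list_images(path + to_process_path)`.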

## Data preparation
The next step is to define a function that preprocesses a single image, and then to store the preprocessed images in a list. Because the dataset is small, all images fit in memory at once, so there is no need to load them in batches.

In [None]:
IMG_WIDTH = 3024
IMG_HEIGHT = 4032

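# load one image and convert it to the model's input format
def process_image(path):
    """Return a normalized grayscale array of shape (IMG_HEIGHT, IMG_WIDTH, 1)."""
    img = cv2.imread(path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    img = np.asarray(img, dtype="float32")
    # cv2.resize expects the target size as (width, height)
    img = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scale pixel values from [0, 255] to [0, 1]
    img = img/255.0
    # add the trailing channel axis expected by Conv2D
    img = np.reshape(img, (IMG_HEIGHT, IMG_WIDTH, 1))

    return img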

In [None]:
# preprocess images
test_chinese = []

for f in to_process_img:
    test_chinese.append(process_image(path + to_process_path + f))

test_chinese = np.asarray(test_chinese)
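Because every image is resized to the same dimensions, `np.asarray` can stack the list into a single 4-D array laid out as `(num_images, height, width, channels)`, which is the default channels-last layout for Keras `Conv2D` layers. The sketch below demonstrates this with small stand-in dimensions and fake data, so the variable names are illustrative only.

```python
import numpy as np

# Small stand-in dimensions so the example runs instantly (the notebook uses 4032 x 3024).
height, width = 4, 3

# Three fake preprocessed pages, each shaped (height, width, 1) with values in [0, 1].
pages = [np.full((height, width, 1), v, dtype="float32") for v in (0.0, 0.5, 1.0)]

# Stacking equally shaped arrays adds a leading "image index" axis.
batch = np.asarray(pages)

print(batch.shape)  # (3, 4, 3, 1): (num_images, height, width, channels)
print(batch.dtype)  # float32
```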

## Modeling

In [None]:
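def build_model():
    """Build and compile the convolutional autoencoder."""
    input_layer = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 1))

    # encoding: two convolutions, then downsample by a factor of 2
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(input_layer)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)

    x = MaxPooling2D((2, 2), padding='same')(x)

    # dropout is only active during training; it is a no-op in predict()
    x = Dropout(0.5)(x)

    # decoding: two convolutions, then upsample back to the input resolution
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)

    x = UpSampling2D((2, 2))(x)

    # sigmoid keeps output pixels in [0, 1], matching the normalized input
    output_layer = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    autoencoder = Model(inputs=[input_layer], outputs=[output_layer])
    opt = tf.keras.optimizers.Adam()
    autoencoder.compile(optimizer=opt, loss='mean_squared_error', metrics=['mae'])

    return autoencoder


# a distinct function name avoids shadowing the builder with the model instance
model = build_model()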
model.summary()
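The network halves the spatial resolution once with `MaxPooling2D((2, 2), padding='same')` and doubles it once with `UpSampling2D((2, 2))`, so the output has the same size as the input only when both image dimensions are even. The following sketch works through that arithmetic; `output_size` is a hypothetical helper written for this explanation, not a Keras function.

```python
import math


def output_size(height, width, pool=2):
    """Spatial size after one 'same'-padded max-pool followed by one upsampling step.

    Hypothetical helper mirroring the single pool/upsample pair in the model above.
    """
    pooled_h = math.ceil(height / pool)  # 'same' padding rounds up
    pooled_w = math.ceil(width / pool)
    return pooled_h * pool, pooled_w * pool


print(output_size(4032, 3024))  # (4032, 3024): matches the input
print(output_size(4033, 3025))  # (4034, 3026): one pixel larger than the input
```

With the values used here (4032 x 3024) the sizes match, but choosing an odd `IMG_WIDTH` or `IMG_HEIGHT` would make the output larger than the target and break the loss computation during training.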

### Load model

In [None]:
# Restore the weights
model.load_weights('./checkpoints/autoencoders/checkpoint_kaggle_80eps')
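The path passed to `load_weights` has no file extension, which suggests a TensorFlow-format checkpoint prefix rather than a single file. Assuming that is the case, saving would have produced `<prefix>.index` plus one or more `<prefix>.data-*` shard files. A hedged sketch of a pre-flight check is shown below; `tf_checkpoint_exists` is a hypothetical helper, and newer Keras releases may expect a different weights format altogether.

```python
import os


def tf_checkpoint_exists(prefix):
    """Return True if a TensorFlow-format checkpoint index file exists for `prefix`.

    Assumes the weights were saved in the TensorFlow checkpoint format, which
    writes `<prefix>.index` alongside one or more `<prefix>.data-*` shards.
    """
    return os.path.isfile(prefix + ".index")


checkpoint_prefix = "./checkpoints/autoencoders/checkpoint_kaggle_80eps"
if not tf_checkpoint_exists(checkpoint_prefix):
    print("No checkpoint index found at", checkpoint_prefix + ".index")
```

Running a check like this before `model.load_weights(...)` gives a clearer message than the loader's own error when the working directory is wrong.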

## Denoising and saving images

In [None]:
Y_test_chinese = model.predict(test_chinese, verbose=1, batch_size=1)
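# make sure the output directory exists before writing to it
os.makedirs(path + processed_path, exist_ok=True)

# save each denoised image under its original filename
for filename, image in zip(to_process_img, Y_test_chinese):
    im_path = path + processed_path + filename
    mpl.image.imsave(im_path, image[:, :, 0], cmap='gray')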
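The `batch_size=1` argument in the prediction call is plausibly a memory precaution: the model applies 128-channel convolutions at full image resolution. The back-of-the-envelope sketch below estimates the size of one such activation tensor. `activation_bytes` is a hypothetical helper, and the real footprint depends on how TensorFlow allocates and reuses buffers.

```python
IMG_HEIGHT, IMG_WIDTH = 4032, 3024  # same values as in the notebook
CHANNELS = 128                      # widest Conv2D layer in the model
BYTES_PER_FLOAT32 = 4


def activation_bytes(height, width, channels, batch_size=1):
    """Bytes needed to hold one float32 activation tensor of the given shape."""
    return batch_size * height * width * channels * BYTES_PER_FLOAT32


full_res = activation_bytes(IMG_HEIGHT, IMG_WIDTH, CHANNELS)
print(f"{full_res / 2**30:.2f} GiB per image for one 128-channel full-resolution feature map")
```

At roughly 5.8 GiB for a single feature map, even two images per batch could exceed the memory of many GPUs, so reducing `IMG_WIDTH` and `IMG_HEIGHT` is one option if prediction runs out of memory.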

## Next steps
- Training the model on a larger dataset
- Tuning hyperparameters to achieve better denoising performance
- Fine-tuning the model on different datasets to support additional tasks (e.g., watermark removal and motion deblurring)