<a href="https://colab.research.google.com/github/clemsage/NeuralDocumentClassification/blob/master/skeleton.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setting up the computing environment


## Install and import TensorFlow 2.0 with GPU

In Colab, open Edit > Notebook settings and select "GPU" in the Hardware accelerator drop-down.

In [0]:
!pip install tensorflow-gpu==2.0
import tensorflow as tf
print(tf.__version__)

## Confirm TensorFlow can see the GPU

In [0]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

## Additional information about hardware

In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

For CPU information and RAM, run:

In [0]:
!cat /proc/cpuinfo
!cat /proc/meminfo
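
The two shell commands print a lot of raw text. As a rough sketch, the same headline figures can be read from Python with the standard library only. The helper name `read_meminfo` is hypothetical (it is not part of this notebook), and it assumes a Linux host such as Colab; on other systems it simply returns an empty dictionary.

```python
import os
from pathlib import Path


def read_meminfo(path="/proc/meminfo"):
    """Parse 'Key:   value [kB]' lines into a dict of integers (hypothetical helper)."""
    info = {}
    meminfo = Path(path)
    if not meminfo.exists():  # not a Linux host, nothing to parse
        return info
    for line in meminfo.read_text().splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        # Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


cpu_count = os.cpu_count()  # number of logical CPUs, or None if undetermined
mem = read_meminfo()
print("Logical CPUs:", cpu_count)
if "MemTotal" in mem:
    # MemTotal is given in kB, so dividing by 1024**2 yields GiB.
    print("Total RAM: {:.1f} GiB".format(mem["MemTotal"] / 1024 ** 2))
```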

## Other useful package imports

In [0]:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Working on the dataset

## Information about the dataset

In [0]:
class_names = ['email', 'form', 'handwritten', 'invoice', 'advertisement']

## Import the dataset

In [0]:
num_examples = 100
height_image, width_image, depth_image = 800, 600, 1
num_classes = len(class_names)

images = np.random.randint(low=0, high=256, size=(num_examples, height_image, 
                                                  width_image, depth_image))
#images_rgb = np.concatenate([images for _ in range(3)], axis=-1)
labels = np.random.randint(low=0, high=num_classes, size=num_examples)

## Explore the data

Get the number of images in the dataset:

In [0]:
print(len(images))

Get the height, width and depth of each image:

In [0]:
print(images[0].shape)

Plot 5 random images of each class in grayscale:

In [0]:
plt.figure(figsize=(20, 10))
n_images_per_class = 5

for class_idx in range(len(class_names)):
  labels_idx = np.where(labels == class_idx)[0]
  np.random.shuffle(labels_idx)
  labels_idx = labels_idx[:n_images_per_class]
  
  # Iterate over the sampled indices so a class with fewer than
  # n_images_per_class examples does not raise an IndexError.
  for i, image_idx in enumerate(labels_idx):
    plt.subplot(len(class_names), n_images_per_class, 
                class_idx*n_images_per_class + i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(np.squeeze(images[image_idx]), cmap='gray')
    #plt.imshow(images_rgb[image_idx])
    plt.xlabel(class_names[labels[image_idx]])

plt.show()

Get the class distribution in the dataset:

In [0]:
print({class_names[key]: val for key, val in Counter(labels).items()})
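
`Counter` only reports classes that actually occur in `labels`. If empty classes should show up with a count of 0, one possible alternative is `np.bincount` with `minlength`. The sketch below uses a small hand-written label array (`toy_labels`, an illustrative assumption, not the notebook's data) in which the last class is deliberately absent.

```python
import numpy as np

class_names = ['email', 'form', 'handwritten', 'invoice', 'advertisement']

# Toy labels for illustration only; class 4 ('advertisement') never occurs.
toy_labels = np.array([0, 1, 1, 3, 0, 2, 1])

# minlength pads the result so every class gets an entry, even with zero examples.
counts = np.bincount(toy_labels, minlength=len(class_names))
toy_distribution = {name: int(n) for name, n in zip(class_names, counts)}
print(toy_distribution)
```

In the notebook itself, passing `labels` instead of `toy_labels` should give the same counts as the `Counter` version, plus explicit zeros for missing classes.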

## Preprocess the data

Resize the images to 299 x 299 pixels:

In [0]:
images = tf.image.resize(images, (299, 299))

Scale the pixel values to the range [0, 1]:

In [0]:
images = images / 255.0

Verify the image shape and the range of the pixel values:

In [0]:
print(images.shape)
print(np.min(images))
print(np.max(images))
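
Printed values have to be checked by eye. A minimal sketch of an automated check follows, assuming the batch can be converted with `np.asarray` and is laid out as (examples, height, width, depth). The function name `check_preprocessed` and the stand-in `demo` batch are hypothetical and not part of the original notebook.

```python
import numpy as np


def check_preprocessed(batch, expected_hw=(299, 299)):
    """Raise ValueError unless batch is (N, H, W, C) with values in [0, 1]."""
    arr = np.asarray(batch)
    if arr.ndim != 4 or tuple(arr.shape[1:3]) != tuple(expected_hw):
        raise ValueError("unexpected shape: {}".format(arr.shape))
    if arr.min() < 0.0 or arr.max() > 1.0:
        raise ValueError(
            "values outside [0, 1]: min={}, max={}".format(arr.min(), arr.max()))
    return arr.shape


# Stand-in batch for illustration; in the notebook, pass the preprocessed images.
demo = np.random.randint(0, 256, size=(2, 299, 299, 1)) / 255.0
print(check_preprocessed(demo))
```

Raising an exception rather than printing makes a later cell fail early if the preprocessing steps are skipped or run twice.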