##### **Background:**

Our company develops innovative Artificial Intelligence and Computer Vision solutions that revolutionize industries. Machines that can see: We pack our solutions in small yet intelligent devices that can be easily integrated to your existing data flow. Computer vision for everyone: Our devices can recognize faces, estimate age and gender, classify clothing types and colors, identify everyday objects and detect motion. Technical consultancy: We help you identify use cases of artificial intelligence and computer vision in your industry. Artificial intelligence is the technology of today, not the future.

MonReader is a new mobile document digitization experience for the blind, for researchers and for everyone else in need for fully automatic, highly fast and high-quality document scanning in bulk. It is composed of a mobile app and all the user needs to do is flip pages and everything is handled by MonReader: it detects page flips from low-resolution camera preview and takes a high-resolution picture of the document, recognizing its corners and crops it accordingly, and it dewarps the cropped document to obtain a bird's eye view, sharpens the contrast between the text and the background and finally recognizes the text with formatting kept intact, being further corrected by MonReader's ML powered redactor.



MonReader is a new mobile document digitalization experience for the blind, for researchers and for everyone else in need for fully automatic, highly fast and high-quality document scanning in bulk. It is composed of a mobile app and all the user needs to do is flip pages and everything is handled by MonReader: it detects page flips from low-resolution camera preview and takes a high-resolution picture of the document, recognizing its corners and crops it accordingly, and it dewarps the cropped document to obtain a bird's eye view, sharpens the contrast between the text and the background and finally recognizes the text with formatting kept intact, being further corrected by MonReader's ML powered redactor.



##### **Data Description:**

We collected page flipping video from smart phones and labelled them as flipping and not flipping.

We clipped the videos as short videos and labelled them as flipping or not flipping. The extracted frames are then saved to disk in a sequential order with the following naming structure: VideoID_FrameNumber

##### **Goal(s):**

Predict if the page is being flipped using a single image.

##### **Success Metrics:**

Evaluate model performance based on F1 score, the higher the better.

##### **Bonus(es):**

Predict if a given sequence of images contains an action of flipping.

In [1]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.models import Model
from tensorflow.keras.metrics import Precision, Recall
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import classification_report
import numpy as np

##### **Load Data and Augment Training Data**

In [6]:
# Directory setup
train_data_dir = 'images/training'
test_data_dir = 'images/testing'

img_width, img_height = 224, 224

#data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    width_shift_range=0.3,  
    height_shift_range=0.3,
    shear_range=0.3,
    zoom_range=0.3,
    horizontal_flip=True,
    fill_mode='nearest')

#data generator for testing (no augmentation)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=32,
    class_mode='binary')

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=32,
    class_mode='binary')

Found 2392 images belonging to 2 classes.
Found 597 images belonging to 2 classes.


##### **Model Creation and Design**

In [None]:
#model architecture
inputs = Input(shape=(img_width, img_height, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = MaxPooling2D((2, 2))(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2))(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=predictions)

#calculate class weights
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
class_weights = dict(enumerate(class_weights))

#metrics
precision = Precision()
recall = Recall()

def f1_score(y_true, y_pred):
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))

##### **Model Training**

In [7]:
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=[f1_score, 'accuracy', precision, recall, 'AUC'])

#model training
epochs = 10
history = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=test_generator,
    class_weight=class_weights)

#save the model
model.save('page_flip_classifier.h5')

Epoch 1/10


  self._warn_if_super_not_called()


[1m75/75[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 474ms/step - AUC: 0.6317 - accuracy: 0.5933 - f1_score: 0.5038 - loss: 0.6675 - precision_1: 0.6108 - recall_1: 0.5169 - val_AUC: 0.6944 - val_accuracy: 0.5628 - val_f1_score: 0.7023 - val_loss: 0.6806 - val_precision_1: 0.5405 - val_recall_1: 1.0000
Epoch 2/10
[1m75/75[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 467ms/step - AUC: 0.7894 - accuracy: 0.7278 - f1_score: 0.7465 - loss: 0.5701 - precision_1: 0.7129 - recall_1: 0.7747 - val_AUC: 0.7561 - val_accuracy: 0.4858 - val_f1_score: 0.0000e+00 - val_loss: 0.7051 - val_precision_1: 0.0000e+00 - val_recall_1: 0.0000e+00
Epoch 3/10
[1m75/75[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 468ms/step - AUC: 0.8522 - accuracy: 0.7825 - f1_score: 0.8148 - loss: 0.4852 - precision_1: 0.7617 - recall_1: 0.8415 - val_AUC: 0.8100 - val_accuracy: 0.4858 - val_f1_score: 0.0000e+00 - val_loss: 1.0447 - val_precision_1: 0.0000e+00 - val_recall_1: 0.0000e+00
Epoch 



##### **Performance Metrics**

In [8]:
# Prediction and evaluation
y_pred = model.predict(test_generator)
y_pred = (y_pred > 0.5).astype(int)
y_true = test_generator.classes
report = classification_report(y_true, y_pred)
print(report)

[1m19/19[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 215ms/step
              precision    recall  f1-score   support

           0       0.49      0.30      0.37       290
           1       0.52      0.70      0.59       307

    accuracy                           0.51       597
   macro avg       0.50      0.50      0.48       597
weighted avg       0.50      0.51      0.49       597

