<a href="https://colab.research.google.com/github/HidekiAI/ML-manga109-OCR/blob/trunk/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


The first two cells are essential setup, although you may not need both on Colab and on a local Jupyter notebook. Keep them anyway: after a crash or restart they have to be run again before anything else works. On Colab, first make sure the remote drive is mounted. So that the Bash and Python scripts use the same paths on every platform, a local setup needs a soft link (or junction) and/or a bind mount (i.e. `mount --bind`) that exposes the same `/content/drive` path.

Note that the cell below is ONLY necessary for Google Colab to access your Google Drive. On a local Jupyter notebook, do the following instead (not exact, just an example):

-   Linux: run `ln -sv ~/Google/MyDrive /content/drive` to soft-link your Google Drive folder as `/content/drive` (later cells expect to find `/content/drive/MyDrive`)
-   Windows: from a Command Prompt (right-click to launch as Administrator), run `mklink /D "C:\content\drive" "C:\Users\HidekiAI\Google\MyDrive"` to create a directory link (`mklink` is built into `cmd.exe`; use `/J` instead of `/D` for a junction)
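
To keep later cells identical on Colab and locally, it can help to detect the environment and confirm the drive path up front. The following is only a minimal sketch and not part of the original notebook: `running_in_colab` and `find_drive_root` are hypothetical helper names, and the only path checked by default is the `/content/drive/MyDrive` location that the later cells use.

```python
import importlib.util
import os


def running_in_colab() -> bool:
    """Return True when the google.colab package can be found (i.e. on Colab)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent 'google' package is missing or malformed
        return False


def find_drive_root(candidates=("/content/drive/MyDrive",)):
    """Return the first candidate directory that exists, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


print("Running in Colab:", running_in_colab())
print("Drive root:", find_drive_root())
```

If `find_drive_root()` returns `None` on a local machine, the soft link or junction described above is most likely missing.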


In [None]:
#!/usr/bin/python
# No need to execute this if running locally, this is only for Google CoLab usage
from google.colab import drive
drive.mount('/content/drive')

Next, we need the (official) tools/libraries for reading the Manga109 annotation data from https://github.com/manga109. This is essential for both Colab and local development.


In [None]:
#!/bin/bash
# MUST run this on BOTH CoLab and local...
!pip install manga109api

I want to know which version of TF is installed, since I cannot run the GPU version on my local machine. If the cell below returns an empty list (`[]`) for both CPU and GPU, do the next step (installing TensorFlow) first and then come back here. Once you have verified that either a CPU or a GPU device is available, you can skip most of the TensorFlow diagnostic checks and go straight to the cell that defines the globals for the source and destination data directories.


In [None]:
#!/usr/bin/env python
# Optionally run this to check the TensorFlow version and configuration
import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Check TensorFlow configuration
print("TensorFlow configuration:")
print(tf.config.list_physical_devices('GPU'))  # List available GPUs
print(tf.config.list_physical_devices('CPU'))  # List available CPUs

We want to make sure TensorFlow is installed in the Python (virtual) environment for the local setup...

-   TensorFlow Object Detection is now deprecated
-   TensorFlow Addons (used for TF-Vision) reached end of life in May 2024; switch to Keras instead, which is accessible directly as long as TF is installed


In [None]:
#!/bin/bash
# NOTE: NO NEED to run this on CoLab, only on local...
!pip install --upgrade pip

!pip install -U --pre tensorflow=="2.*"
!pip install tensorflow
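# For GPU support, comment out the line above and uncomment one of the lines below.
# NOTE: the separate tensorflow-gpu package is deprecated; tensorflow[and-cuda] is the current way to pull in the CUDA libraries
#!pip install tensorflow-gpu
#!pip install tensorflow[and-cuda]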

!pip install transformers
!pip install tf-models-official
!pip install tf-keras-vis

Next, I'd like to make absolutely sure we have access to TF-Vision for text detection. Because tensorflow-addons reached end of life in May 2024, we just need to verify that Keras is accessible...


In [None]:
#!/usr/bin/env python
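# Optionally run this to verify that Keras layers, models, and applications are accessible
from tensorflow.keras.layers import Input, Dense, Conv2D, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential, Model
# Import everything through tensorflow.keras to avoid mixing it with a standalone keras package
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers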
import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)


def create_model(input_shape, num_classes):
    inputs = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu')(inputs)
    x = MaxPooling2D((2, 2))(x)
    x = Conv2D(64, (3, 3), activation='relu')(x)
    x = MaxPooling2D((2, 2))(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=inputs, outputs=outputs)
    return model


def create_ssd_model(num_classes, image_size=(224, 224), weights='imagenet', include_top=False):
    base_model = MobileNetV2(input_shape=(
        image_size[0], image_size[1], 3), weights=weights, include_top=include_top)

    for layer in base_model.layers:
        layer.trainable = False

    ssd_output = layers.Conv2D(num_classes, kernel_size=(
        1, 1), activation='softmax')(base_model.output)

    model = Model(inputs=base_model.input, outputs=ssd_output)

    return model


# Access Keras functionality through tf.keras
# Define a simple Sequential model
test_keras_model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
test_keras_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

# Print model summary
test_keras_model.summary()

# Also should verify model creations based on what I am using later...
test_keras_model = create_model(input_shape=(224, 224, 3), num_classes=4)
test_keras_model.compile(optimizer='adam', loss='categorical_crossentropy',
                         metrics=['accuracy'])
test_keras_model.summary()

test_keras_model = create_ssd_model(4, image_size=(
    224, 224), weights='imagenet', include_top=False)
test_keras_model.compile(optimizer='adam', loss='categorical_crossentropy',
                         metrics=['accuracy'])
test_keras_model.summary()

Once TF-Vision is loaded, let's verify for sure via Python...


In [None]:
#!/usr/bin/env python
# Optionally run this to verify that Keras application models are accessible
import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Try importing and building a Keras application model (e.g., EfficientNet)
try:
    # The import lives inside the try block so that an ImportError is actually caught
    from tensorflow.keras.applications import EfficientNetB0
    # Building the model downloads the ImageNet weights on first use
    test_keras_model = EfficientNetB0(weights='imagenet')
    print("TensorFlow Vision (via Keras) is accessible.")

except ImportError:
    print("TensorFlow Vision (via Keras) is not accessible.")

Verify via either Bash or Python that we can access the `/content/drive` mount.


In [None]:
#!/bin/bash
! pwd && [ -e /content/drive/MyDrive ] || echo "Unable to validate Google Drive from bash script"

In [None]:
#!/usr/bin/env python
import os

# directory path to the Manga109 dataset (read-only)
global project_root_dir
global manga109_dir
# directory path to the TensorFlow TFRecord model (read-write)
global tf_model_dir

# Check if Google Drive is mounted and/or locally have symlink (or junctions) to access '/content/drive/MyDrive'
if os.path.isdir('/content/drive'):
    # list contents of the root directory of Google drive
    # change this to your own path
    project_root_dir = '/content/drive/MyDrive/projects/ML-manga-ocr-rust/'
    drive_files = os.listdir(project_root_dir)
    print(drive_files)

    data_paths = os.path.join(project_root_dir, 'data/')  # should pre-exist!
    drive_files = os.listdir(data_paths)
    print(drive_files)

    tf_model_dir = os.path.join(data_paths, 'tf_model/')
    # mkdir if not exists
    if not os.path.exists(tf_model_dir):
        os.makedirs(tf_model_dir)
        print('Created TensorFlow model directory at ', tf_model_dir)

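    zip_path = os.path.join(project_root_dir, 'data/Manga109s.zip')
    # assumes the archive unpacks into a top-level 'Manga109s/' directory (see manga109_dir below)
    unzipped_dir = os.path.join(data_paths, 'Manga109s')
    if os.path.exists(zip_path):
        # only UNZIP IF the extracted dir does not exist, else assume it's already unzipped
        # (data_paths itself always exists at this point, so it cannot be used for this check)
        if not os.path.exists(unzipped_dir):
            #!unzip '{zip_path}' -d '{data_paths}'
            import zipfile  # only needed for this one-off extraction
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(data_paths)
                print('Unzipped the data to ', data_paths)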
    drive_files = os.listdir(data_paths)
    print(drive_files)

    manga109_dir = os.path.join(
        data_paths, 'Manga109s/Manga109s_released_2023_12_07/')
    data_dir_files = os.listdir(manga109_dir)
    print(data_dir_files)

    # lastly, notify users of their license by printing the readme.txt
    readme_path = os.path.join(manga109_dir, 'readme.txt')
    with open(readme_path, 'r', encoding="utf-8") as bookmark_file:
        print(bookmark_file.read())
else:
    print("Google Drive is not mounted.")

Now that we have manga dir accessible, let's try out the manga109api...

NOTE: See also https://github.com/manga109/manga109-demos/tree/master/visualization, which is basically the same thing but I'm using PyPlot...
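
The visualization cell reads each page as `annotation["page"][page_index][annotation_type]` and takes the `@xmin`, `@ymin`, `@xmax` and `@ymax` keys of every region. As a minimal sketch of that lookup, the hypothetical helper `boxes_for_page` below collects the boxes of one page into plain tuples. The sample dictionary is made up for illustration; it only mimics the keys used in this notebook and is not real Manga109 data.

```python
def boxes_for_page(page, annotation_types=("body", "face", "frame", "text")):
    """Return (annotation_type, xmin, ymin, xmax, ymax) tuples for one page dict."""
    boxes = []
    for annotation_type in annotation_types:
        # A page may have no regions of a given type, so default to an empty list
        for roi in page.get(annotation_type, []):
            boxes.append((annotation_type,
                          roi["@xmin"], roi["@ymin"], roi["@xmax"], roi["@ymax"]))
    return boxes


# Made-up page with one frame and one text region (not real Manga109 data)
sample_page = {
    "frame": [{"@xmin": 0, "@ymin": 0, "@xmax": 800, "@ymax": 600}],
    "text": [{"@xmin": 100, "@ymin": 50, "@xmax": 180, "@ymax": 220}],
}
print(boxes_for_page(sample_page))
```

With a real dataset, the same helper would be called as `boxes_for_page(annotation["page"][page_index])`.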


In [None]:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import manga109api
from PIL import Image, ImageDraw


def draw_rectangle(img, x0, y0, x1, y1, annotation_type):
    assert annotation_type in ["body", "face", "frame", "text"]
    color = {"body": "#258039", "face": "#f5be41",
             "frame": "#31a9b8", "text": "#cf3721"}[annotation_type]
    draw = ImageDraw.Draw(img)
    draw.rectangle([x0, y0, x1, y1], outline=color, width=10)


test_book = "YumeiroCooking"
page_index = 6

p = manga109api.Parser(root_dir=manga109_dir)
annotation = p.get_annotation(book=test_book)
img = Image.open(p.img_path(book=test_book, index=page_index))

for annotation_type in ["body", "face", "frame", "text"]:
    rois = annotation["page"][page_index][annotation_type]
    for roi in rois:
        draw_rectangle(img, roi["@xmin"], roi["@ymin"],
                       roi["@xmax"], roi["@ymax"], annotation_type)

# Display the image with the annotation rectangles drawn on it
plt.imshow(img)
plt.axis('off')
plt.show()

Load and Preprocess Images with TensorFlow:


If you did see an image load up with rectangles around texts, you are now ready to integrate it with TF-Vision...


In [None]:
#!/usr/bin/env python

import matplotlib.pyplot as plt
import tensorflow as tf
import manga109api
from PIL import Image, ImageDraw

# Initialize Manga109 API
manga109 = manga109api.Parser(root_dir=manga109_dir)

# Choose a manga volume and page index
test_volume = 'YumeiroCooking'
page_index = 6

# Load image using Manga109 API
test_image = Image.open(manga109.img_path(book=test_volume, index=page_index))

# Preprocess image using TensorFlow Keras
test_image = tf.keras.preprocessing.image.img_to_array(test_image)
test_image = tf.keras.applications.efficientnet.preprocess_input(test_image)

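# Display preprocessed image
# NOTE: img_to_array returns float32 values in the 0-255 range (the EfficientNet
# preprocess_input is a pass-through), so cast to uint8 for matplotlib
plt.imshow(test_image.astype('uint8'))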
plt.axis('off')
plt.show()

If the above worked for a single book/volume, we can now iterate over ALL the books the dataset knows about. There is a minor issue: a curated annotation file may reference a JPG even though the images directory for that book no longer exists, so we have to check whether each file exists (extra I/O, at some cost in performance).
We'll preprocess each image before turning it into a TFRecord. Ideally this would live in a separate cell, but holding that many images at once runs out of memory; hence we check whether an image has text regions and, if so, create a TFRecord for that region.
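
The two checks described above (the image file must exist, and the page must contain at least one text region) can be sketched as a generator. This is an illustration under assumptions, not the notebook's actual implementation: `pages_with_text` and `image_path_for` are hypothetical names, and the annotation is assumed to have the `annotation["page"][i]["text"]` shape used in the visualization cell. In the notebook, the path lookup would be something like `lambda i: p.img_path(book=test_book, index=i)`.

```python
import os


def pages_with_text(annotation, image_path_for):
    """Yield (page_index, image_path, text_rois) for usable pages only.

    annotation: dict shaped like the result of manga109api's get_annotation
    image_path_for: callable mapping a page index to an image file path
    """
    for page_index, page in enumerate(annotation["page"]):
        text_rois = page.get("text", [])
        if not text_rois:
            continue  # no text regions, nothing to record for this page
        image_path = image_path_for(page_index)
        if not os.path.isfile(image_path):
            continue  # the annotation refers to an image that is missing on disk
        yield page_index, image_path, text_rois
```

The text-region check comes first so that pages without text never trigger the file-system lookup, which keeps the extra I/O to a minimum. Because it is a generator, only one page's data needs to be held at a time.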


In [None]:
# Import libraries
import os
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam
import numpy as np
import keras
import manga109api  # used below to count the books; imported here so this cell runs on its own

global IMAGE_SIZE
global BATCH_SIZE
global EPOCHS
global LEARNING_RATE
global NUM_CLASSES_BOOK_COUNT
global saved_model_weights
global saved_model_all

# Define paths to data
# Your dir structure should look like this:
#   project_root_dir/
#   ├── images/
#   │   ├── book1/
#   │   │   ├── page1.jpg
#   ...
#   │   │   └── pageN.jpg
#   ...
#   │   └── bookN/
#   │       ├── page1.jpg
#   ...
#   │       └── pageN.jpg
#   ├── annotations/
#   │   ├── book1.xml
#   │   ...
#   │   └── bookN.xml
#   └── books.txt  <--------------------- this is the list of books in the dataset, i.e. `$ls images > books.txt`
# dataset_root_dir = manga109_dir
dataset_root_dir = os.path.join(project_root_dir, 'data/SmallTestData')
train_images_dir = os.path.join(dataset_root_dir, 'images')
train_annotations_dir = os.path.join(dataset_root_dir, 'annotations')
test_images_dir = os.path.join(project_root_dir, 'data/ForTrainTesting/images')
test_annotations_dir = os.path.join(
    project_root_dir, 'data/ForTrainTesting/annotations')
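# NOTE: Keras 3 requires weight files to end in '.weights.h5'; this name also works with older Keras
saved_model_weights = os.path.join(
    tf_model_dir, 'manga109_ocr_model.weights.h5')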
saved_model_all = os.path.join(tf_model_dir, 'manga109_ocr_model.h5')

# BEFORE you start the long training process, crash if the directories are not found
if not os.path.exists(project_root_dir):
    raise FileNotFoundError(
        "Project root directory not found at " + project_root_dir)
if not os.path.exists(dataset_root_dir):
    raise FileNotFoundError(
        "Manga109 dataset directory not found at " + dataset_root_dir)
if not os.path.exists(tf_model_dir):
    raise FileNotFoundError(
        "Target Output directory not found at " + tf_model_dir)
if not os.path.exists(train_images_dir):
    raise FileNotFoundError(
        "Training images directory not found at " + train_images_dir)
if not os.path.exists(train_annotations_dir):
    raise FileNotFoundError(
        "Training annotations directory not found at " + train_annotations_dir)
if not os.path.exists(test_images_dir):
    raise FileNotFoundError(
        "Testing images directory not found at " + test_images_dir)
if not os.path.exists(test_annotations_dir):
    raise FileNotFoundError(
        "Testing annotations directory not found at " + test_annotations_dir)
# check that books.txt exists
books_txt = os.path.join(dataset_root_dir, 'books.txt')
if not os.path.exists(books_txt):
    raise FileNotFoundError("Books.txt not found at " + books_txt)
# check that the number of lines in books.txt matches the number of books in the images directory
books_count = len(os.listdir(train_images_dir))
with open(books_txt, 'r') as f:
    books_txt_count = len(f.readlines())
if books_count != books_txt_count:
    raise ValueError(
        "Mismatch in books count between images directory and books.txt")
print("Books count in images directory matches books.txt")

# Define constants
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32
EPOCHS = 10
LEARNING_RATE = 0.001
#   Found 6519 images belonging to 83 classes.   <-- pay attention to class count here!
#   Found 1587 images belonging to 83 classes.
#   Epoch 1/10
#     37/204 [====>.........................] - ETA: 28:49 - loss: 4.6653 - accuracy: 0.0220
# update NUM_CLASSES_BOOK_COUNT based on number of books in annotations/images directory:
# quickest is to use Manga109 API to count the books:
dir_list = os.listdir(train_images_dir)
print("Number of books found in the IMAGES dataset:", len(dir_list))
print(dir_list)
dir_list = os.listdir(train_annotations_dir)
print("Number of books found in the ANNOTATIONS dataset:", len(dir_list))
print(dir_list)
manga109 = manga109api.Parser(root_dir=dataset_root_dir)
# This is critical, if you get InvalidValue error, most likely, the class count has changed!
NUM_CLASSES_BOOK_COUNT = len(manga109.books)
print("Number of classes (books) found in the dataset:", NUM_CLASSES_BOOK_COUNT)
# make sure books_count == NUM_CLASSES_BOOK_COUNT
if books_count != NUM_CLASSES_BOOK_COUNT:
    raise ValueError(
        "Mismatch in books count between images directory and annotations directory")
print("# Books count in images directory matches annotations directory")

Training models...
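
The training cell below uses an `EarlyStopping` callback with `monitor='val_loss'` and `patience=3`. As a rough mental model of what that means, here is a simplified pure-Python sketch (it ignores options such as `min_delta` and `baseline`, and `early_stopping_epoch` is a hypothetical helper, not a Keras function). The loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only
losses = [1.00, 0.90, 0.95, 0.92, 0.91, 0.80]
print(early_stopping_epoch(losses))
```

With these numbers the best loss (0.90) is reached at epoch index 1, the next three epochs fail to beat it, and training stops at epoch index 4 without ever reaching the 0.80 value. If the validation loss is noisy, a larger `patience` may be worth trying.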


In [None]:
# Import libraries
import os
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam
import numpy as np
import keras

# Define the datasets. ImageDataGenerator (tf.keras.preprocessing.image) is deprecated,
# so image_dataset_from_directory is used instead.
# See https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory
# datagen = tf.keras.preprocessing.image.ImageDataGenerator(
#    preprocessing_function=preprocess_input,
#    rescale=1. / 255,
#    validation_split=0.2)
# Usage: https://keras.io/api/data_loading/image/
#   # Data directory structure:
#   #   training_data/
#   #   ...class_a/
#   #   ......a_image_1.jpg
#   #   ......a_image_2.jpg
#   #   ...class_b/
#   #   ......b_image_1.jpg
#   #   ......b_image_2.jpg
#   #   etc.
#   train_ds = keras.utils.image_dataset_from_directory(
#       directory='training_data/',
#       labels='inferred',
#       label_mode='categorical',
#       batch_size=32,
#       image_size=(256, 256))
#   validation_ds = keras.utils.image_dataset_from_directory(
#       directory='validation_data/',
#       labels='inferred',
#       label_mode='categorical',
#       batch_size=32,
#       image_size=(256, 256))
#
#   model = keras.applications.Xception(
#       weights=None, input_shape=(256, 256, 3), classes=10)
#   model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
#   model.fit(train_ds, epochs=10, validation_data=validation_ds)
dataset = tf.keras.utils.image_dataset_from_directory(
    train_images_dir,
    labels='inferred',
    label_mode='categorical',
    class_names=None,
    color_mode='rgb',
    batch_size=BATCH_SIZE,
    image_size=IMAGE_SIZE,
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset='training',
    interpolation='bilinear',
    follow_links=False
)
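# NOTE: normalized_dataset only demonstrates Rescaling; the training below uses train_generator/val_generator
normalization_layer = tf.keras.layers.Rescaling(1./255)
normalized_dataset = dataset.map(lambda x, y: (normalization_layer(x), y))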


# class_mode: {'input', 'categorical', 'sparse', 'binary', None}
# train_generator = dataset.flow_from_directory(
#    train_images_dir,
#    target_size=IMAGE_SIZE,
#    batch_size=BATCH_SIZE,
#    class_mode='categorical',
#    subset='training')
train_generator = keras.utils.image_dataset_from_directory(
    train_images_dir,
    labels='inferred',
    label_mode='categorical',
    class_names=None,
    color_mode='rgb',
    batch_size=BATCH_SIZE,
    image_size=IMAGE_SIZE,
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset='training',
    interpolation='bilinear',
    follow_links=False
)

# class_mode: {'input', 'categorical', 'sparse', 'binary', None}
# val_generator = dataset.flow_from_directory(
#    train_images_dir,
#    target_size=IMAGE_SIZE,
#    batch_size=BATCH_SIZE,
#    class_mode='categorical',
#    subset='validation')
val_generator = keras.utils.image_dataset_from_directory(
    train_images_dir,
    labels='inferred',
    label_mode='categorical',
    class_names=None,
    color_mode='rgb',
    batch_size=BATCH_SIZE,
    image_size=IMAGE_SIZE,
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset='validation',
    interpolation='bilinear',
    follow_links=False
)

my_model = keras.applications.Xception(
    include_top=True,
    weights='imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    # classes=NUM_CLASSES_BOOK_COUNT,   # when using 'weights' as 'imagenet' with 'include_top' as true, 'classes' should be 1000
    classifier_activation='softmax'
)
my_model.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])
my_model.summary()
# my_model.fit(train_generator, validation_data=val_generator, epochs=EPOCHS)

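# Define the model. NOTE: despite the name, this is not an SSD detector; it is a
# frozen MobileNetV2 backbone with a softmax classification head (one class per book)


def create_ssd_model(num_classes, image_size=(224, 224), weights='imagenet', include_top=False):
    base_model = MobileNetV2(input_shape=(
        image_size[0], image_size[1], 3), weights=weights, include_top=include_top)

    # freeze the pretrained backbone
    base_model.trainable = False

    inputs = layers.Input(shape=(image_size[0], image_size[1], 3))
    # image_dataset_from_directory yields pixels in [0, 255]; MobileNetV2 expects [-1, 1]
    x = layers.Rescaling(1. / 127.5, offset=-1)(inputs)
    x = base_model(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    ssd_output = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs=inputs, outputs=ssd_output)

    return model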


# Compile model
model = create_ssd_model(NUM_CLASSES_BOOK_COUNT, image_size=(
    224, 224), weights='imagenet', include_top=False)
optimizer = Adam(learning_rate=LEARNING_RATE)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy', metrics=['accuracy'])

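# Define callbacks for checkpoints and early stopping
# NOTE: filepath must be a file, not a directory; 'checkpoint.weights.h5' is an arbitrary name
# chosen here (the '.weights.h5' suffix is required by Keras 3 when save_weights_only=True)
checkpoint_callback = ModelCheckpoint(filepath=os.path.join(tf_model_dir, 'checkpoint.weights.h5'),
                                      save_weights_only=True,
                                      save_best_only=True,
                                      monitor='val_loss',
                                      mode='min',
                                      verbose=1)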
early_stopping_callback = EarlyStopping(monitor='val_loss',
                                        patience=3,
                                        mode='min',
                                        verbose=1)

# Train model with callbacks
# my_model.fit(train_generator, validation_data=val_generator, epochs=EPOCHS)
history = model.fit(train_generator,
                    epochs=EPOCHS,
                    validation_data=val_generator,
                    callbacks=[checkpoint_callback, early_stopping_callback])

# save just the weights for my model
model.save_weights(saved_model_weights)
# saving with config and weights
model.save(saved_model_all)

# Evaluate model
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
# class_mode: {'input', 'categorical', 'sparse', 'binary', None}
# test_generator = dataset.flow_from_directory(
# test_images_dir,
# target_size=IMAGE_SIZE,
# batch_size=BATCH_SIZE,
# class_mode='categorical',
# )
# test_generator = keras.utils.image_dataset_from_directory(
#    test_images_dir,
#    labels='inferred',
#    label_mode='categorical',
#    class_names=None,
#    color_mode='rgb',
#    batch_size=BATCH_SIZE,
#    image_size=IMAGE_SIZE,
#    shuffle=True,
#    seed=123,
#    validation_split=0.2,
#    subset='validation',
#    interpolation='bilinear',
#    follow_links=False
# )
#
# Evaluate model on test data
# loss, accuracy = model.evaluate(test_generator)
# print("Test Loss:", loss)
# print("Test Accuracy:", accuracy)
#

Testing trained models:
Load the model (weights and config) created in the previous cell and evaluate it.
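
One thing worth checking before evaluating: with `labels='inferred'`, `image_dataset_from_directory` derives the class of each image from its sub-directory name, with the names taken in sorted order. The test accuracy is therefore only meaningful if the test images directory contains the same book folders as the training images directory. The sketch below illustrates such a check; `class_names_from_directory` and `check_same_classes` are hypothetical helper names, not part of Keras or of the original notebook.

```python
import os


def class_names_from_directory(images_dir):
    """Return sub-directory names in sorted order (one class per directory)."""
    return sorted(
        name for name in os.listdir(images_dir)
        if os.path.isdir(os.path.join(images_dir, name))
    )


def check_same_classes(train_dir, test_dir):
    """Raise ValueError when the two directories would yield different labels."""
    train_classes = class_names_from_directory(train_dir)
    test_classes = class_names_from_directory(test_dir)
    if train_classes != test_classes:
        missing = sorted(set(train_classes) - set(test_classes))
        extra = sorted(set(test_classes) - set(train_classes))
        raise ValueError(f"Class mismatch: missing={missing}, extra={extra}")
    return train_classes
```

In this notebook the call would be `check_same_classes(train_images_dir, test_images_dir)`, run before `model.evaluate`.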


In [None]:
# Import libraries
import os
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam
import numpy as np
import keras

if not os.path.exists(tf_model_dir):
    raise FileNotFoundError(
        "Target Output directory not found at " + tf_model_dir)
if not os.path.exists(saved_model_all):
    raise FileNotFoundError("Model file not found at " + saved_model_all)
if not os.path.exists(saved_model_weights):
    raise FileNotFoundError(
        "Model weights file not found at " + saved_model_weights)

# load weights and config
model = keras.models.load_model(saved_model_all)

# Evaluate on the whole test directory (validation_split/subset would keep only 20% of it)
test_generator = keras.utils.image_dataset_from_directory(
    test_images_dir,
    labels='inferred',
    label_mode='categorical',
    class_names=None,
    color_mode='rgb',
    batch_size=BATCH_SIZE,
    image_size=IMAGE_SIZE,
    shuffle=True,
    seed=123,
    interpolation='bilinear',
    follow_links=False
)

# Evaluate model on test data
loss, accuracy = model.evaluate(test_generator)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)