<a href="https://colab.research.google.com/github/HidekiAI/ML-manga109-OCR/blob/trunk/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


First, we want to make sure TensorFlow is installed in the Python (virtual) environment...

-   The TensorFlow Object Detection API is now deprecated
-   TensorFlow Addons (previously used for TF-Vision) reached end of life in May 2024; its functionality needs to be switched over to Keras, which should be accessible directly as long as TF is installed


In [None]:
#!/bin/bash
# NOTE: NO NEED to run this on CoLab, only on local...
!pip install --upgrade pip

!pip install -U --pre tensorflow=="2.*"
# Comment above and uncomment below if you want GPU support
# (the separate tensorflow-gpu package has been retired; GPU support now comes with tensorflow itself)
#!pip install -U "tensorflow[and-cuda]"

!pip install transformers
!pip install tf-models-official
!pip install tf-keras-vis

Next, we'll need the (official) tools/libraries to read manga109 (annotation) data from https://github.com/manga109


In [None]:
#!/bin/bash
# MUST run this on BOTH CoLab and local...
!pip install manga109api

I want to know which version of TF is installed and which devices it can see, since I cannot run the GPU version on my local machine...


In [None]:
#!/usr/bin/env python
# Optionally run this to check the TensorFlow version and configuration
import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Check TensorFlow configuration
print("TensorFlow configuration:")
print(tf.config.list_physical_devices('GPU'))  # List available GPUs
print(tf.config.list_physical_devices('CPU'))  # List available CPUs

Next, I'd like to make sure we have access to TF-Vision for text detection. Because tensorflow-addons reached end of life in May 2024, we just need to verify that Keras is accessible through `tf.keras`...


In [None]:
#!/usr/bin/env python
# Optionally run this to verify that Keras is accessible through tf.keras
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Access Keras functionality through tf.keras
# Define a simple Sequential model
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

Once TF-Vision is loaded, let's verify for sure via Python...


In [None]:
#!/usr/bin/env python
# Optionally run this to verify that a pretrained vision model can be loaded through Keras
import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Try importing a TensorFlow Vision model (e.g., EfficientNet)
try:
    # The import has to be inside the try block for ImportError to be caught
    from tensorflow.keras.applications import EfficientNetB0

    # Instantiate EfficientNetB0 (downloads the ImageNet weights on first use)
    model = EfficientNetB0(weights='imagenet')
    print("TensorFlow Vision (via Keras) is accessible.")
except ImportError:
    print("TensorFlow Vision (via Keras) is not accessible.")

Note that the cell below is ONLY necessary on Google CoLab, to access your Google Drive. If you are running a local notebook (Jupyter), do the following instead (not exact, just an example):

-   Linux: make sure to `ln -sv ~/Google/MyDrive /content/drive` to symlink your Google Drive as `/content/drive`
-   Windows: from a Command Prompt (right-click to launch as Admin), run `mklink /D "C:/content/drive" "C:/Users/HidekiAI/Google/MyDrive/"` to create a directory symbolic link
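
Since the location differs between CoLab and a local machine, a small helper can pick whichever directory actually exists instead of assuming the link is in place. This is only a minimal sketch: `find_drive_root` and the candidate paths are hypothetical names for illustration, not part of this notebook.

```python
import os
import tempfile


def find_drive_root(candidates):
    """Return the first candidate that is an existing directory, or None."""
    for path in candidates:
        expanded = os.path.expanduser(path)
        if os.path.isdir(expanded):
            return expanded
    return None


# Demonstration: a temporary directory stands in for a mounted drive
with tempfile.TemporaryDirectory() as tmp:
    fake_drive = os.path.join(tmp, "MyDrive")
    os.makedirs(fake_drive)
    missing = os.path.join(tmp, "does-not-exist")
    found = find_drive_root([missing, fake_drive])
    print(found == fake_drive)
```

In the notebook, the candidate list could hold `/content/drive/MyDrive` first and a local folder second, so the same cell works in both environments.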


In [None]:
#!/usr/bin/python
# No need to execute this if running locally, this is only for Google CoLab usage
from google.colab import drive
drive.mount('/content/drive')

Verify, either via Bash or Python, that we can access the `/content/drive` mount.


In [None]:
#!/bin/bash
! pwd && [ -e /content/drive/MyDrive ] || echo "Unable to validate Google Drive from bash script"

In [None]:
#!/usr/bin/env python
import os
import zipfile

# directory path to the Manga109 dataset (read-only)
manga109_dir = None
# directory path to the TensorFlow TFRecord model (read-write)
tf_model_dir = None

# Check if Google Drive is mounted and/or locally have symlink (or junctions) to access '/content/drive/MyDrive'
if os.path.isdir('/content/drive'):
    # change this to your own path
    root_paths = '/content/drive/MyDrive/projects/ML-manga-ocr-rust/'
    data_paths = os.path.join(root_paths, 'data/')  # should pre-exist!
    tf_model_dir = os.path.join(data_paths, 'tf_model/')
    # mkdir if not exists
    if not os.path.exists(tf_model_dir):
        os.makedirs(tf_model_dir)
        print('Created TensorFlow model directory at ', tf_model_dir)

    # list contents of the project root and of the data directory
    print(os.listdir(root_paths))
    print(os.listdir(data_paths))

    zip_path = os.path.join(data_paths, 'Manga109s.zip')
    extracted_dir = os.path.join(data_paths, 'Manga109s/')
    # only UNZIP IF the extracted dir does not exist, else assume it's already unzipped
    # (data_paths itself always exists at this point, so it cannot be used for this check)
    if os.path.exists(zip_path) and not os.path.exists(extracted_dir):
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(data_paths)
        print('Unzipped the data to ', data_paths)

    manga109_dir = os.path.join(
        extracted_dir, 'Manga109s_released_2023_12_07/')
    print(os.listdir(manga109_dir))
    # lastly, notify users of their license by printing the readme.txt
    readme_path = os.path.join(manga109_dir, 'readme.txt')
    with open(readme_path, 'r', encoding="utf-8") as file:
        print(file.read())
else:
    print("Google Drive is not mounted.")

Now that we have manga dir accessible, let's try out the manga109api...


In [None]:
#!/usr/bin/env python
import manga109api
from PIL import Image, ImageDraw


def draw_rectangle(img, x0, y0, x1, y1, annotation_type):
    assert annotation_type in ["body", "face", "frame", "text"]
    color = {"body": "#258039", "face": "#f5be41",
             "frame": "#31a9b8", "text": "#cf3721"}[annotation_type]
    draw = ImageDraw.Draw(img)
    draw.rectangle([x0, y0, x1, y1], outline=color, width=10)


if __name__ == "__main__":
    book = "YumeiroCooking"
    page_index = 6

    p = manga109api.Parser(root_dir=manga109_dir)
    annotation = p.get_annotation(book=book)
    img = Image.open(p.img_path(book=book, index=page_index))

    for annotation_type in ["body", "face", "frame", "text"]:
        rois = annotation["page"][page_index][annotation_type]
        for roi in rois:
            draw_rectangle(img, roi["@xmin"], roi["@ymin"],
                           roi["@xmax"], roi["@ymax"], annotation_type)

    # Display the annotated image
    import matplotlib.pyplot as plt
    plt.imshow(img)
    plt.axis('off')
    plt.show()

Load and Preprocess Images with TensorFlow:


If an image loaded with rectangles drawn around the text regions, you are now ready to integrate it with TF-Vision...


In [None]:
#!/usr/bin/env python

import matplotlib.pyplot as plt
import tensorflow as tf
import manga109api
from PIL import Image, ImageDraw

# Initialize Manga109 API
manga109 = manga109api.Parser(root_dir=manga109_dir)

# Choose a manga volume and page index
volume = 'YumeiroCooking'
page_index = 6

# Load image using Manga109 API
image = Image.open(manga109.img_path(book=volume, index=page_index))

# Preprocess image using TensorFlow Keras
image = tf.keras.preprocessing.image.img_to_array(image)
image = tf.keras.applications.efficientnet.preprocess_input(image)

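# Display preprocessed image
# (preprocess_input is a pass-through for EfficientNet, so the values are still floats in [0, 255];
#  cast to uint8 so that imshow does not clip them to white)
plt.imshow(image.astype('uint8'))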
plt.axis('off')
plt.show()

If the above worked for a single book/volume, we can now iterate over ALL the books the parser knows about. There is a minor issue: the curated annotation file can reference a JPG for a book whose images directory no longer exists, so we have to check whether each file exists first (extra I/O, at some cost in performance).
We'll preprocess each image before turning it into a TFRecord. Ideally this would live in a separate cell, but holding that many images at once runs out of memory; hence we check whether a page has text regions and, if so, write a TFRecord entry for that page and its text regions right away.
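
The two checks described above (does the page have text regions, and does its image exist) can be sketched on their own, without TensorFlow. This is an illustration under assumptions: `pages_with_text`, the `images/Book/...` paths and the `on_disk` set are hypothetical, and the page dictionaries only mimic the shape that `manga109api` returns.

```python
import os


def pages_with_text(pages, image_path_for, exists=os.path.exists):
    """Yield (index, page) for pages that have text regions and an image on disk."""
    for index, page in enumerate(pages):
        if not page.get('text'):
            continue  # no text regions, nothing to record for this page
        if not exists(image_path_for(index)):
            continue  # the annotation refers to an image that is not on disk
        yield index, page


# Toy annotation in the same shape as the page dictionaries
pages = [
    {'@index': 0, '@width': 1654, '@height': 1170, 'text': []},
    {'@index': 1, '@width': 1654, '@height': 1170,
     'text': [{'@id': '00000001', '@xmin': 550, '@ymin': 660,
               '@xmax': 583, '@ymax': 696, '#text': 'あ', 'type': 'text'}]},
    {'@index': 2, '@width': 1654, '@height': 1170,
     'text': [{'@id': '000000de', '@xmin': 1131, '@ymin': 752,
               '@xmax': 1200, '@ymax': 841, '#text': 'わっ', 'type': 'text'}]},
]
on_disk = {'images/Book/001.jpg'}  # pretend the JPG for page 2 is missing

selected = list(pages_with_text(
    pages,
    image_path_for=lambda i: f'images/Book/{i:03d}.jpg',
    exists=lambda path: path in on_disk))
print([index for index, _ in selected])  # → [1]
```

Checking for text regions first avoids the file-system lookup for pages that would be skipped anyway.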


In [None]:
#!/usr/bin/env python
import os

import tensorflow as tf
from PIL import Image

# Each text region looks like so:
#       'text': [
#           {'@id': '000000d2', '@xmin': 698, '@ymin': 238, '@xmax': 711, '@ymax': 284, '#text': 'ブン？', 'type': 'text'},
#           {'@id': '000000dc', '@xmin': 356, '@ymin': 273, '@xmax': 403, '@ymax': 340, '#text': 'あの人.....', 'type': 'text'},
#           {'@id': '000000de', '@xmin': 1131, '@ymin': 752, '@xmax': 1200, '@ymax': 841, '#text': 'わっ', 'type': 'text'},
#           {'@id': '000000e0', '@xmin': 482, '@ymin': 91, '@xmax': 498, '@ymax': 145, '#text': 'あれ？', 'type': 'text'}]}
# It seems that so far, best approach is to map a single (preprocessed) image to a set (list/array) of text regions


def create_tf_manga109_rects_from_page(preprocessed_image, page_width, page_height, text_rects):
    # Initialize lists for rectangle coordinates
    xmin_list, ymin_list, xmax_list, ymax_list = [], [], [], []

    for text_rect in text_rects:
        # Append rectangle coordinates to the lists
        xmin_list.append(text_rect['@xmin'])
        ymin_list.append(text_rect['@ymin'])
        xmax_list.append(text_rect['@xmax'])
        ymax_list.append(text_rect['@ymax'])

    # Create a TensorFlow Example from the image and the text regions
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[preprocessed_image.tobytes()])),
        'page_width': tf.train.Feature(int64_list=tf.train.Int64List(value=[page_width])),
        'page_height': tf.train.Feature(int64_list=tf.train.Int64List(value=[page_height])),
        'xmin': tf.train.Feature(int64_list=tf.train.Int64List(value=xmin_list)),
        'ymin': tf.train.Feature(int64_list=tf.train.Int64List(value=ymin_list)),
        'xmax': tf.train.Feature(int64_list=tf.train.Int64List(value=xmax_list)),
        'ymax': tf.train.Feature(int64_list=tf.train.Int64List(value=ymax_list)),
    }))

    return tf_example


def create_tf_records(writer, preprocessed_image, page, output_path):
    # sample output:
    #   Page= 10 , page= {'@index': 10, '@width': 1654, '@height': 1170, 'frame': [{'@id': '000000cd', '@xmin': 341, '@ymin': 96, '@xmax': 485, '@ymax': 354, 'type': 'frame'}, {'@id': '000000cf', '@xmin': 834, '@ymin': 505, '@xmax': 1648, '@ymax': 745, 'type': 'frame'}, {'@id': '000000d3', '@xmin': 897, '@ymin': 750, '@xmax': 1216, '@ymax': 1169, 'type': 'frame'}, {'@id': '000000d5', '@xmin': 80, '@ymin': 721, '@xmax': 748, '@ymax': 1098, 'type': 'frame'}, {'@id': '000000d6', '@xmin': 1098, '@ymin': 1, '@xmax': 1653, '@ymax': 502, 'type': 'frame'}, {'@id': '000000da', '@xmin': 1214, '@ymin': 746, '@xmax': 1565, '@ymax': 1096, 'type': 'frame'}, {'@id': '000000e1', '@xmin': 489, '@ymin': 3, '@xmax': 745, '@ymax': 360, 'type': 'frame'}, {'@id': '000000e2', '@xmin': 82, '@ymin': 96, '@xmax': 340, '@ymax': 356, 'type': 'frame'}, {'@id': '000000e3', '@xmin': 1, '@ymin': 372, '@xmax': 817, '@ymax': 720, 'type': 'frame'}, {'@id': '000000e4', '@xmin': 901, '@ymin': 100, '@xmax': 1092, '@ymax': 502, 'type': 'frame'}], 'face': [{'@id': '000000d0', '@xmin': 987, '@ymin': 163, '@xmax': 1029, '@ymax': 192, '@character': '00000010', 'type': 'face'}, {'@id': '000000d4', '@xmin': 350, '@ymin': 150, '@xmax': 471, '@ymax': 242, '@character': '00000003', 'type': 'face'}, {'@id': '000000d9', '@xmin': 1043, '@ymin': 775, '@xmax': 1088, '@ymax': 808, '@character': '00000010', 'type': 'face'}], 'body': [{'@id': '000000ce', '@xmin': 595, '@ymin': 129, '@xmax': 662, '@ymax': 227, '@character': '00000010', 'type': 'body'}, {'@id': '000000d1', '@xmin': 343, '@ymin': 100, '@xmax': 480, '@ymax': 351, '@character': '00000003', 'type': 'body'}, {'@id': '000000d7', '@xmin': 969, '@ymin': 138, '@xmax': 1050, '@ymax': 367, '@character': '00000010', 'type': 'body'}, {'@id': '000000d8', '@xmin': 991, '@ymin': 752, '@xmax': 1148, '@ymax': 1062, '@character': '00000010', 'type': 'body'}, {'@id': '000000db', '@xmin': 1310, '@ymin': 515, '@xmax': 1623, '@ymax': 743, '@character': '00000090', 'type': 
    #   'body'}, {'@id': '000000dd', '@xmin': 139, '@ymin': 385, '@xmax': 817, '@ymax': 723, '@character': '00000090', 'type': 'body'}, {'@id': '000000df', '@xmin': 383, '@ymin': 372, '@xmax': 477, '@ymax': 459, '@character': '00000010', 'type': 'body'}],
    #       'text': [
    #           {'@id': '000000d2', '@xmin': 698, '@ymin': 238, '@xmax': 711, '@ymax': 284, '#text': 'ブン？', 'type': 'text'},
    #           {'@id': '000000dc', '@xmin': 356, '@ymin': 273, '@xmax': 403, '@ymax': 340, '#text': 'あの人.....', 'type': 'text'},
    #           {'@id': '000000de', '@xmin': 1131, '@ymin': 752, '@xmax': 1200, '@ymax': 841, '#text': 'わっ', 'type': 'text'},
    #           {'@id': '000000e0', '@xmin': 482, '@ymin': 91, '@xmax': 498, '@ymax': 145, '#text': 'あれ？', 'type': 'text'}]}
    text_rects = page.get('text')

    tf_result = create_tf_manga109_rects_from_page(
        preprocessed_image, page['@width'], page['@height'],
        text_rects)
    writer.write(tf_result.SerializeToString())


# Note that the writer will overwrite existing files, so we'll create a writer outside the loop
tf_record_paths = os.path.join(tf_model_dir, 'manga109_detection.tfrecords')
writer = tf.io.TFRecordWriter(tf_record_paths)

# Example usage:
# annotation_data = {
#    'YumeiroCooking/000.jpg': [{'class': 'body', 'xmin': 100, 'ymin': 50, 'xmax': 200, 'ymax': 150}],
#    'YumeiroCooking/001.jpg': [{'class': 'face', 'xmin': 50, 'ymin': 30, 'xmax': 100, 'ymax': 80}],
# }
# Iterate through all books
for book in manga109.books:
    print(f"Processing book: {book}")
    annotations_of_this_book = manga109.get_annotation(book)
    pages = annotations_of_this_book['page']

    # Iterate through all pages in the book
    # sample output:
    #   Page= 0 , page= {'@index': 0, '@width': 1654, '@height': 1170, 'frame': [], 'face': [], 'body': [], 'text': []}
    #   Page= 2 , page= {'@index': 2, '@width': 1654, '@height': 1170, 'frame': [{'@id': '00000000', '@xmin': 83, '@ymin': 86, '@xmax': 751, '@ymax': 1090, 'type': 'frame'}], 'face': [{'@id': '00000004', '@xmin': 406, '@ymin': 684, '@xmax': 456, '@ymax': 764, '@character': '00000003', 'type': 'face'}], 'body': [{'@id': '00000002', '@xmin': 178, '@ymin': 660, '@xmax': 548, '@ymax': 965, '@character': '00000003', 'type': 'body'}], 'text': [{'@id': '00000001', '@xmin': 550, '@ymin': 660, '@xmax': 583, '@ymax': 696, '#text': 'あ', 'type': 'text'}]}
    #   Page= 7 , page= {'@index': 7, '@width': 1654, '@height': 1170, 'frame': [{'@id': '0000007e', '@xmin': 53, '@ymin': 6, '@xmax': 419, '@ymax': 361, 'type': 'frame'}, {'@id': '00000086', '@xmin': 901, '@ymin': 93, '@xmax': 1382, '@ymax': 500, 'type': 'frame'}, {'@id': '00000088', '@xmin': 901, '@ymin': 519, '@xmax': 1567, '@ymax': 1169, 'type': 'frame'}, {'@id': '0000008c', '@xmin': 435, '@ymin': 98, '@xmax': 747, '@ymax': 361, 'type': 'frame'}, {'@id': '0000008d', '@xmin': 5, '@ymin': 361, '@xmax': 819, '@ymax': 1169, 'type': 'frame'}, {'@id': '0000008e', '@xmin': 1385, '@ymin': 96, '@xmax': 1565, '@ymax': 501, 'type': 'frame'}], 'face': [{'@id': '0000007a', '@xmin': 223, '@ymin': 199, '@xmax': 348, '@ymax': 286, '@character': '00000003', 'type': 'face'}, {'@id': '0000007f', '@xmin': 1117, '@ymin': 204, '@xmax': 1231, '@ymax': 294, '@character': '00000003', 'type': 'face'}, {'@id': '00000080', '@xmin': 1403, '@ymin': 449, '@xmax': 1454, '@ymax': 494, '@character': '00000010', 'type': 'face'}, {'@id': '00000083', '@xmin': 492, '@ymin': 276, '@xmax': 541, '@ymax': 316, '@character': '00000010', 'type': 'face'}], 'body': [{'@id': '00000077', '@xmin': 431, '@ymin': 249, '@xmax': 597, '@ymax': 363, '@character': '00000010', 'type': 'body'}, {'@id': '00000079', '@xmin': 1400, '@ymin': 444, '@xmax': 1458, '@ymax': 501, '@character': '00000010', 'type': 'body'}, {'@id': '00000081', '@xmin': 161, '@ymin': 91, '@xmax': 419, '@ymax': 363, '@character': '00000003', 'type': 'body'}, {'@id': '00000087', '@xmin': 1043, '@ymin': 114, '@xmax': 1364, '@ymax': 501, '@character': '00000003', 'type': 'body'}, {'@id': '0000008f', '@xmin': 37, '@ymin': 415, '@xmax': 766, '@ymax': 1012, '@character': '00000090', 'type': 'body'}], 'text': [{'@id': '00000078', '@xmin': 463, '@ymin': 695, '@xmax': 477, '@ymax': 736, '#text': 'しょうぶ.....', 'type': 'text'}, {'@id': '0000007b', '@xmin': 217, '@ymin': 348, '@xmax': 268, '@ymax': 456, '#text': 'こらっ\nこのやろ', 'type': 'text'}, {'@id': 
    #   '0000007c', '@xmin': 55, '@ymin': 251, '@xmax': 95, '@ymax': 334, '#text': 'おいっ', 'type': 'text'}, {'@id': '0000007d', '@xmin': 693, '@ymin': 92, '@xmax': 749, '@ymax': 178, '#text': '出てこいっ！', 'type': 'text'}, {'@id': '00000082', '@xmin': 1284, '@ymin': 78, '@xmax': 1380, '@ymax': 300, '#text': 'そこかっ！', 'type': 'text'}, {'@id': '00000084', '@xmin': 573, '@ymin': 260, '@xmax': 622, '@ymax': 316, '#text': 'むちゃ言うな', 'type': 'text'}, {'@id': '00000085', '@xmin': 397, '@ymin': 90, '@xmax': 414, '@ymax': 173, '#text': 'どこだ！', 'type': 'text'}, {'@id': '00000089', '@xmin': 327, '@ymin': 723, '@xmax': 374, '@ymax': 772, '#text': 'なんちゃって\nはは....', 'type': 'text'}, {'@id': '0000008a', '@xmin': 532, '@ymin': 483, '@xmax': 631, '@ymax': 660, '#text': 'おいっ......てばっ\n出てきてわたしと', 'type': 'text'}, {'@id': '0000008b', '@xmin': 89, '@ymin': 85, '@xmax': 175, '@ymax': 203, '#text': '出て来て私と勝負しろっ！', 'type': 'text'}]}
    #   Page= 10 , page= {'@index': 10, '@width': 1654, '@height': 1170, 'frame': [{'@id': '000000cd', '@xmin': 341, '@ymin': 96, '@xmax': 485, '@ymax': 354, 'type': 'frame'}, {'@id': '000000cf', '@xmin': 834, '@ymin': 505, '@xmax': 1648, '@ymax': 745, 'type': 'frame'}, {'@id': '000000d3', '@xmin': 897, '@ymin': 750, '@xmax': 1216, '@ymax': 1169, 'type': 'frame'}, {'@id': '000000d5', '@xmin': 80, '@ymin': 721, '@xmax': 748, '@ymax': 1098, 'type': 'frame'}, {'@id': '000000d6', '@xmin': 1098, '@ymin': 1, '@xmax': 1653, '@ymax': 502, 'type': 'frame'}, {'@id': '000000da', '@xmin': 1214, '@ymin': 746, '@xmax': 1565, '@ymax': 1096, 'type': 'frame'}, {'@id': '000000e1', '@xmin': 489, '@ymin': 3, '@xmax': 745, '@ymax': 360, 'type': 'frame'}, {'@id': '000000e2', '@xmin': 82, '@ymin': 96, '@xmax': 340, '@ymax': 356, 'type': 'frame'}, {'@id': '000000e3', '@xmin': 1, '@ymin': 372, '@xmax': 817, '@ymax': 720, 'type': 'frame'}, {'@id': '000000e4', '@xmin': 901, '@ymin': 100, '@xmax': 1092, '@ymax': 502, 'type': 'frame'}], 'face': [{'@id': '000000d0', '@xmin': 987, '@ymin': 163, '@xmax': 1029, '@ymax': 192, '@character': '00000010', 'type': 'face'}, {'@id': '000000d4', '@xmin': 350, '@ymin': 150, '@xmax': 471, '@ymax': 242, '@character': '00000003', 'type': 'face'}, {'@id': '000000d9', '@xmin': 1043, '@ymin': 775, '@xmax': 1088, '@ymax': 808, '@character': '00000010', 'type': 'face'}], 'body': [{'@id': '000000ce', '@xmin': 595, '@ymin': 129, '@xmax': 662, '@ymax': 227, '@character': '00000010', 'type': 'body'}, {'@id': '000000d1', '@xmin': 343, '@ymin': 100, '@xmax': 480, '@ymax': 351, '@character': '00000003', 'type': 'body'}, {'@id': '000000d7', '@xmin': 969, '@ymin': 138, '@xmax': 1050, '@ymax': 367, '@character': '00000010', 'type': 'body'}, {'@id': '000000d8', '@xmin': 991, '@ymin': 752, '@xmax': 1148, '@ymax': 1062, '@character': '00000010', 'type': 'body'}, {'@id': '000000db', '@xmin': 1310, '@ymin': 515, '@xmax': 1623, '@ymax': 743, '@character': '00000090', 'type': 
    #   'body'}, {'@id': '000000dd', '@xmin': 139, '@ymin': 385, '@xmax': 817, '@ymax': 723, '@character': '00000090', 'type': 'body'}, {'@id': '000000df', '@xmin': 383, '@ymin': 372, '@xmax': 477, '@ymax': 459, '@character': '00000010', 'type': 'body'}], 'text': [{'@id': '000000d2', '@xmin': 698, '@ymin': 238, '@xmax': 711, '@ymax': 284, '#text': 'ブン？', 'type': 'text'}, {'@id': '000000dc', '@xmin': 356, '@ymin': 273, '@xmax': 403, '@ymax': 340, '#text': 'あの人.....', 'type': 'text'}, {'@id': '000000de', '@xmin': 1131, '@ymin': 752, '@xmax': 1200, '@ymax': 841, '#text': 'わっ', 'type': 'text'}, {'@id': '000000e0', '@xmin': 482, '@ymin': 91, '@xmax': 498, '@ymax': 145, '#text': 'あれ？', 'type': 'text'}]}
    # NOTE: each page can have multiple text regions
    for page_index, page in enumerate(pages):
        # print("\tPage=", page_index, ", page=", page)

        # The annotation may list a JPG that is missing on disk, which would abort the loop,
        # hence we verify that the file exists first and skip the page otherwise.
        # NOTE: look up the CURRENT book here, not the single `volume` used in the earlier cell
        image_path = manga109.img_path(book=book, index=page_index)
        if not os.path.exists(image_path):
            print("File not found: ", image_path)
            continue

        # We're ONLY interested in classifying text regions, so skip pages without any
        # before paying the cost of loading the image
        text_rects = page.get('text')
        if not text_rects:
            continue

        # Load image using Manga109 API
        image = Image.open(image_path)
        image_flattened = tf.keras.preprocessing.image.img_to_array(image)
        preprocessed = tf.keras.applications.efficientnet.preprocess_input(
            image_flattened)

        # Display preprocessed image (optional)
        # import matplotlib.pyplot as plt
        # plt.imshow(preprocessed)
        # plt.axis('off')
        # plt.show()

        create_tf_records(writer, preprocessed, page, tf_record_paths)

        # for debugging, if we find pages with texts, print them
        print(page_index, end=" ")
    writer.flush()
writer.close()
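
One detail worth keeping in mind when reading these records back: the `image` feature holds the raw bytes of the array, so the reader has to decode them with the same dtype the writer used. `img_to_array` returns float32 by default, which takes four bytes per value. The sketch below uses only the standard library, with `array` standing in for the NumPy array (an illustration, not the notebook's code), to show how the element count changes when the same bytes are read as 8-bit values.

```python
from array import array

height, width, channels = 2, 3, 1
pixels = [float(v) for v in range(height * width * channels)]

as_float32 = array('f', pixels)  # 4 bytes per value, like a float32 image
raw = as_float32.tobytes()

wrong = array('B')               # the same bytes read as unsigned 8-bit values
wrong.frombytes(raw)
right = array('f')               # the same bytes read with the writer's type
right.frombytes(raw)

print(len(raw), len(wrong), len(right))  # → 24 24 6
```

Read as 8-bit values, the buffer appears to hold four times as many elements, which is why a later reshape to `[page_height, page_width, -1]` would yield the wrong number of channels.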

Model and Loss Function:


In [None]:
#!/usr/bin/env python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

# Define your model architecture
# (`model` is assigned at module level below, so no `global` statement is needed)


def create_model(input_shape, num_classes):
    inputs = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu')(inputs)
    x = MaxPooling2D((2, 2))(x)
    x = Conv2D(64, (3, 3), activation='relu')(x)
    x = MaxPooling2D((2, 2))(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu')(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=inputs, outputs=outputs)
    return model


# Compile the model
model = create_model(input_shape=(224, 224, 3), num_classes=4)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
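
The model above takes 224x224 inputs, while the sample pages are 1654x1170, so a page fed to it has to be resized and its rectangles scaled by the same factors. A minimal sketch, assuming the page is simply stretched to the target size without preserving the aspect ratio; `scale_rect` is a hypothetical helper, not part of this notebook.

```python
def scale_rect(rect, page_width, page_height, target_width=224, target_height=224):
    """Scale a Manga109 rectangle from page pixels to the resized image."""
    sx = target_width / page_width
    sy = target_height / page_height
    return {
        'xmin': rect['@xmin'] * sx,
        'ymin': rect['@ymin'] * sy,
        'xmax': rect['@xmax'] * sx,
        'ymax': rect['@ymax'] * sy,
    }


# One of the text rectangles from the sample page (1654x1170)
rect = {'@id': '000000d2', '@xmin': 698, '@ymin': 238, '@xmax': 711, '@ymax': 284}
scaled = scale_rect(rect, page_width=1654, page_height=1170)
print({key: round(value, 2) for key, value in scaled.items()})
```

Because the horizontal and vertical factors differ, boxes change their aspect ratio along with the image; letterboxing would be an alternative if that distortion turns out to matter.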

Data Augmentation:
This is possibly not needed since I am now using TFRecord...

In [None]:
#!/usr/bin/env python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define data augmentation parameters
# (`datagen` is assigned at module level, so no `global` statement is needed)
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
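
One caveat if flips are ever used with rectangle targets: `ImageDataGenerator` transforms only the pixels, so the box coordinates would have to be transformed separately. A minimal sketch of the horizontal-flip case, treating coordinates as distances from the left edge of the page; `flip_rect_horizontally` is a hypothetical helper for illustration.

```python
def flip_rect_horizontally(rect, page_width):
    """Mirror a rectangle around the vertical centre line of the page."""
    return {
        **rect,
        '@xmin': page_width - rect['@xmax'],
        '@xmax': page_width - rect['@xmin'],
    }


rect = {'@xmin': 698, '@ymin': 238, '@xmax': 711, '@ymax': 284}
flipped = flip_rect_horizontally(rect, page_width=1654)
print(flipped)  # → {'@xmin': 943, '@ymin': 238, '@xmax': 956, '@ymax': 284}
```

Note that the old `xmax` becomes the new `xmin`, so the rectangle stays well-formed; flipping twice returns the original coordinates.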

Training...


In [None]:
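#!/usr/bin/env python
import tensorflow as tf

# Train the model
BATCH_SIZE = 32
# Count the records in the TFRecord file; len(tf_record_paths) would only be the length of the path string
num_records = sum(1 for _ in tf.data.TFRecordDataset(tf_record_paths))
steps_per_epoch = max(1, num_records // BATCH_SIZE)
EPOCHS = 10
SHUFFLE_BUFFER_SIZE = 10000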
#
#  # Convert annotations to one-hot encoding for classes
#  # Assuming you have a function to convert class labels to numeric IDs
#  # For simplicity, let's assume classes are already numeric IDs
#  # If not, you need to convert them using label encoding
#  # Also, you need to modify your create_tf_example function to include numeric class labels
#  # Assuming you have label_map as a dictionary mapping class labels to numeric IDs
#  # label_map = {'body': 0, 'face': 1, 'frame': 2, 'text': 3}
#
#  # Define steps per epoch based on the number of training samples and batch size
#  steps_per_epoch = len(annotation_data) // BATCH_SIZE
#


#  # Train the model using fit_generator
#  model.fit_generator(
#      datagen.flow_from_directory(
#          '/path/to/dataset', target_size=(224, 224), batch_size=BATCH_SIZE),
#      steps_per_epoch=steps_per_epoch,
#      epochs=EPOCHS
#  )


def parse_tfrecord_fn(example):
    feature_description = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'xmin': tf.io.VarLenFeature(tf.int64),
        'ymin': tf.io.VarLenFeature(tf.int64),
        'xmax': tf.io.VarLenFeature(tf.int64),
        'ymax': tf.io.VarLenFeature(tf.int64),
        'page_width': tf.io.FixedLenFeature([], tf.int64),
        'page_height': tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(example, feature_description)

    # The writer stored float32 values (img_to_array's default dtype), so decode with the same dtype
    image = tf.io.decode_raw(example['image'], tf.float32)
    image = tf.reshape(
        image, [example['page_height'], example['page_width'], -1])
    xmin = tf.sparse.to_dense(example['xmin'])
    ymin = tf.sparse.to_dense(example['ymin'])
    xmax = tf.sparse.to_dense(example['xmax'])
    ymax = tf.sparse.to_dense(example['ymax'])

    return image, xmin, ymin, xmax, ymax


# Load TFRecord data
raw_dataset = tf.data.TFRecordDataset(tf_record_paths)

# Parse the data
parsed_dataset = raw_dataset.map(parse_tfrecord_fn)

# Shuffle and batch the data
dataset_from_tfrecord = parsed_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)

# Train the model. fit_generator() has been deprecated since TensorFlow 2.1 and is removed in Keras 3;
# fit() accepts a tf.data.Dataset directly, so the generator wrapper is no longer needed.
# NOTE: fit() expects (input, target) pairs, while this dataset yields
# (image, xmin, ymin, xmax, ymax); mapping the boxes to the model's targets still has to be defined
model.fit(
    #datagen.flow_from_directory( '/path/to/dataset', target_size=(224, 224), batch_size=BATCH_SIZE),
    dataset_from_tfrecord.repeat(),
    steps_per_epoch=steps_per_epoch,
    epochs=EPOCHS
)
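
For reference, `steps_per_epoch` is the number of batches in one pass over the data, so it depends on the number of records and the batch size. A minimal sketch of that arithmetic; `steps_per_epoch_for` is a hypothetical helper and the record count is made up for illustration.

```python
import math


def steps_per_epoch_for(num_records, batch_size, drop_remainder=False):
    """Number of batches needed to go through every record once."""
    if drop_remainder:
        return num_records // batch_size
    return math.ceil(num_records / batch_size)


print(steps_per_epoch_for(1000, 32))                       # → 32 (the last batch holds 8 records)
print(steps_per_epoch_for(1000, 32, drop_remainder=True))  # → 31 (full batches only)
```

Which variant fits depends on whether the partial last batch is kept when batching the dataset.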