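# Handwriting Recognition Application
Let's build a handwriting recognition application in a Colab environment.
The application supports the following user use cases.
- The user can upload a handwriting image file.
- The user can write by hand on a canvas.
- The user can view the recognized text.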

## Colab environment setup
Install the Python packages and download the images used as examples.

In [1]:
# Set to False when running locally
using_colab = True

In [2]:
if using_colab:
    !wget https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/requirements.txt
    !pip install -r requirements.txt

    !mkdir -p examples
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Red.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/sentence.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/i_love_you.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/merrychristmas.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Rock.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Bob.png
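
The eight `wget` lines above differ only in the file name, so the download can also be written as a loop. The following is a minimal sketch using only the Python standard library; `BASE_URL`, `EXAMPLE_FILES`, `example_urls`, and `download_examples` are names introduced here for illustration and are not part of the original notebook.

```python
import os
import urllib.request

# Base URL and file names copied from the wget commands above.
BASE_URL = "https://github.com/mrsyee/dl_apps/raw/main/ocr/examples"
EXAMPLE_FILES = [
    "Hello.png", "Hello_cursive.png", "Red.png", "sentence.png",
    "i_love_you.png", "merrychristmas.png", "Rock.png", "Bob.png",
]


def example_urls(base_url=BASE_URL, names=EXAMPLE_FILES):
    """Pair each example file name with its download URL."""
    return [(name, f"{base_url}/{name}") for name in names]


def download_examples(target_dir="examples"):
    """Download every example image into target_dir, skipping files already present."""
    os.makedirs(target_dir, exist_ok=True)
    for name, url in example_urls():
        path = os.path.join(target_dir, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
```

Calling `download_examples()` could stand in for the `mkdir` and `wget` lines, and it can be re-run without downloading files twice.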

## Import dependencies

In [1]:
import os

import gradio as gr
import numpy as np
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

## Image upload UI

In [2]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")

In [3]:
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://08ae3afd0c1f105b32.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




In [4]:
app.close()

Closing server running on port: 7861


## TrOCR inferencer class
The TrOCR inferencer class initializes the TrOCR model and processor and performs inference.

In [5]:
class TrOCRInferencer:
    def __init__(self):
        print("[INFO] Initialize TrOCR Inferencer.")
        self.processor = TrOCRProcessor.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )
        self.model = VisionEncoderDecoderModel.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )

    def inference(self, image: Image.Image) -> str:
        """Run inference with the model.

        The call runs in three steps: preprocessing, inference, and postprocessing.
        """
        # preprocess
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        # inference
        generated_ids = self.model.generate(pixel_values)
        # postprocess
        generated_text = self.processor.batch_decode(
            generated_ids, skip_special_tokens=True
        )[0]

        return generated_text


inferencer = TrOCRInferencer()

[INFO] Initialize TrOCR Inferencer.


Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-base-handwritten and are newly initialized: ['encoder.pooler.dense.weight', 'encoder.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
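
The `inference` method above handles one image per call. The same three steps also work on a list of images, because the processor accepts a list and `batch_decode` returns one string per row of generated ids. The following is a sketch under that assumption; `batch_inference` is a hypothetical helper, not part of the original class, and it takes the processor and model as arguments so it can be tried without reloading the weights.

```python
def batch_inference(processor, model, images):
    """Recognize text in several images with a single generate() call.

    processor and model are expected to be the TrOCRProcessor and
    VisionEncoderDecoderModel instances held by the inferencer.
    """
    if not images:
        return []
    # preprocess: a list of PIL images becomes one batched tensor
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    # inference: one row of token ids per input image
    generated_ids = model.generate(pixel_values)
    # postprocess: batch_decode returns one string per row
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

With the objects created above, a call would look like `batch_inference(inferencer.processor, inferencer.model, [image_a, image_b])`, where `image_a` and `image_b` are placeholder PIL images.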


## Implementing inference

In [6]:
def image_to_text(image: np.ndarray) -> str:
    image = Image.fromarray(image).convert("RGB")
    text = inferencer.inference(image)
    return text
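
When the image component is empty, Gradio passes `None` to the callback, and `Image.fromarray(None)` then raises an error instead of returning text. One way to handle this is a small wrapper, sketched below; `guard_empty_input` is a hypothetical helper introduced here, not part of the original notebook.

```python
def guard_empty_input(fn, empty_result=""):
    """Wrap a Gradio callback so that a missing image returns empty_result."""
    def wrapper(image):
        if image is None:  # nothing was uploaded or drawn
            return empty_result
        return fn(image)
    return wrapper
```

The wrapped function can be passed to the button in place of the original, for example `convert_btn.click(fn=guard_empty_input(image_to_text), inputs=image, outputs=output)`.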

In [7]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=image, outputs=output
    )

In [8]:
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://1d6e932923de7aa512.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




In [9]:
app.close()

Closing server running on port: 7861


## Canvas UI

In [10]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    sketchpad = gr.Sketchpad(
        label="Handwritten Sketchpad",
        shape=(600, 192),
        brush_radius=2,
        invert_colors=False,
    )
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=sketchpad, outputs=output
    )
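
The `shape`, `brush_radius`, and `invert_colors` arguments above follow the Gradio 3.x `Sketchpad` API, where the callback receives a NumPy array. Newer Gradio releases are assumed here to pass a dict whose `"composite"` entry holds the merged drawing; this is an assumption worth checking against the installed version. The sketch below normalizes both cases before the value reaches `image_to_text`; `extract_sketch_array` is a hypothetical helper.

```python
def extract_sketch_array(value):
    """Return the drawn image from a Sketchpad value in either format."""
    if isinstance(value, dict):
        # Assumed layout in newer Gradio releases: merged drawing under "composite".
        return value.get("composite")
    # Gradio 3.x style: the value is already the image array (or None when empty).
    return value
```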

In [11]:
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://2d6623c662b7fe3bc9.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




In [12]:
app.close()

Closing server running on port: 7861


## Implementing the Gradio app

In [15]:
# Implement app
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    with gr.Tab("Image upload"):
        image = gr.Image(label="Handwritten image file")
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=image, outputs=output
        )

        gr.Markdown("## Image Examples")
        gr.Examples(
            examples=[
                os.path.join(os.getcwd(), "examples/Hello.png"),
                os.path.join(os.getcwd(), "examples/Hello_cursive.png"),
                os.path.join(os.getcwd(), "examples/Red.png"),
                os.path.join(os.getcwd(), "examples/sentence.png"),
                os.path.join(os.getcwd(), "examples/i_love_you.png"),
                os.path.join(os.getcwd(), "examples/merrychristmas.png"),
                os.path.join(os.getcwd(), "examples/Rock.png"),
                os.path.join(os.getcwd(), "examples/Bob.png"),
            ],
            inputs=image,
            outputs=output,
            fn=image_to_text,
        )

    with gr.Tab("Drawing"):
        sketchpad = gr.Sketchpad(
            label="Handwritten Sketchpad",
            shape=(600, 192),
            brush_radius=2,
            invert_colors=False,
        )
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=sketchpad, outputs=output
        )
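
The `examples` list above repeats the same `os.path.join` call for every file. A possible alternative is to collect the paths from the `examples` directory, as in the sketch below; `list_example_images` is a hypothetical helper introduced here.

```python
import os


def list_example_images(directory="examples", extension=".png"):
    """Return sorted absolute paths of the image files found in directory."""
    if not os.path.isdir(directory):
        return []
    return sorted(
        os.path.abspath(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name.lower().endswith(extension)
    )
```

It could be passed as `gr.Examples(examples=list_example_images(), inputs=image, outputs=output, fn=image_to_text)`. The examples then appear in sorted order rather than the hand-written order, and any new image dropped into the directory is picked up automatically.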

In [16]:
# Launch the app
app.launch(inline=False, share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://5461a22edfc9753d24.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces




In [17]:
app.close()

Closing server running on port: 7861
