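# Handwriting Recognition Application
Let's build a handwriting recognition application in a Colab environment. The application supports the following user use cases:
- The user can upload a handwriting image file.
- The user can write by hand on a canvas.
- The user can view the recognized text.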

## Colab environment setup
Install the required Python packages and download the images used as examples.

In [None]:
# Set to False when running locally
using_colab = True

In [None]:
if using_colab:
    !wget https://raw.githubusercontent.com/mrsyee/dl_apps/main/ocr/requirements-colab.txt
    !pip install -r requirements-colab.txt

    !mkdir -p examples
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Hello_cursive.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Red.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/sentence.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/i_love_you.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/merrychristmas.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Rock.png
    !cd examples && wget https://github.com/mrsyee/dl_apps/raw/main/ocr/examples/Bob.png

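
The setup cell shells out to `wget` once per file, and re-running it repeats every download. As a rough alternative, the same downloads can be sketched in plain Python with the standard library. The helper names `example_urls` and `download_examples` are hypothetical and not part of the original notebook; the base URL and file names are taken from the setup cell.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the setup cell.
BASE_URL = "https://github.com/mrsyee/dl_apps/raw/main/ocr/examples"
EXAMPLE_FILES = [
    "Hello.png",
    "Hello_cursive.png",
    "Red.png",
    "sentence.png",
    "i_love_you.png",
    "merrychristmas.png",
    "Rock.png",
    "Bob.png",
]


def example_urls() -> list[str]:
    """Build the download URL for every example image."""
    return [f"{BASE_URL}/{name}" for name in EXAMPLE_FILES]


def download_examples(dest: str = "examples") -> list[Path]:
    """Download the example images into dest, skipping files that already exist."""
    target_dir = Path(dest)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder exists
    saved = []
    for url in example_urls():
        path = target_dir / url.rsplit("/", 1)[-1]
        if not path.exists():
            urlretrieve(url, str(path))  # urllib follows the GitHub redirect
        saved.append(path)
    return saved
```

Calling `download_examples()` in place of the `wget` lines should leave the same eight files under `examples/`, and a second call should download nothing.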

## Import dependencies

In [None]:
import os

import gradio as gr
import numpy as np
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel



## Image upload UI

In [None]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")

In [None]:
app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://1572fc492e87cf85a2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [None]:
app.close()

Closing server running on port: 7860


## TrOCR inferencer class
The TrOCR inferencer class initializes the TrOCR model and processor and runs inference.

In [None]:
class TrOCRInferencer:
    def __init__(self):
        print("[INFO] Initialize TrOCR Inferencer.")
        self.processor = TrOCRProcessor.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )
        self.model = VisionEncoderDecoderModel.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )

    def inference(self, image: Image.Image) -> str:
        """Run OCR on a single image and return the recognized text.

        The steps are preprocessing, inference, and postprocessing.
        """
        # preprocess
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        # inference
        generated_ids = self.model.generate(pixel_values)
        # postprocess
        generated_text = self.processor.batch_decode(
            generated_ids, skip_special_tokens=True
        )[0]

        return generated_text


inferencer = TrOCRInferencer()

[INFO] Initialize TrOCR Inferencer.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at microsoft/trocr-base-handwritten and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


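
Constructing the inferencer downloads and loads the checkpoint, so creating it more than once per session is wasteful. One possible guard is to hand out a single cached instance. The sketch below uses a stand-in class, `StubInferencer`, so that it runs without the model; both `StubInferencer` and `get_inferencer` are hypothetical names, and in the notebook the factory would return `TrOCRInferencer()` instead.

```python
from functools import lru_cache


class StubInferencer:
    """Stand-in exposing the same inference(image) -> str interface (illustration only)."""

    instances = 0  # counts how many times the constructor has run

    def __init__(self) -> None:
        StubInferencer.instances += 1

    def inference(self, image) -> str:
        return "stub text"


@lru_cache(maxsize=1)
def get_inferencer() -> StubInferencer:
    """Create the inferencer on first use and reuse it on every later call."""
    return StubInferencer()
```

Every caller that uses `get_inferencer()` then shares one object, so the expensive constructor runs only once.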

## Implementing the inference function

In [None]:
def image_to_text(image: np.ndarray) -> str:
    image = Image.fromarray(image).convert("RGB")
    text = inferencer.inference(image)
    return text
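
`image_to_text` assumes it always receives an array. Gradio passes `None` to the callback when the image component is empty, in which case `Image.fromarray` raises an error. A minimal sketch of a guard follows; `make_safe_handler` is a hypothetical helper, and a lambda stands in for the real OCR call so the example runs without the model.

```python
from typing import Any, Callable, Optional


def make_safe_handler(
    recognize: Callable[[Any], str], empty_message: str = ""
) -> Callable[[Optional[Any]], str]:
    """Wrap an OCR callback so that a missing image yields a message, not an exception."""

    def handler(image: Optional[Any]) -> str:
        if image is None:  # nothing uploaded or drawn yet
            return empty_message
        return recognize(image)

    return handler


# A lambda stands in for image_to_text here (assumption for illustration).
safe_handler = make_safe_handler(lambda image: "hello", "Please provide an image first.")
```

In the notebook this could be wired into the button as `fn=make_safe_handler(image_to_text, "Please provide an image first.")`.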

In [None]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    image = gr.Image(label="Handwritten image file")
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=image, outputs=output
    )

In [None]:
app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://74c4d7324bcdc30d0c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [None]:
app.close()

Closing server running on port: 7860


## Canvas UI

In [None]:
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
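    # Note: shape, brush_radius, and invert_colors follow the Gradio 3.x
    # Sketchpad API; newer Gradio releases may not accept them.
    sketchpad = gr.Sketchpad(
        label="Handwritten Sketchpad",
        shape=(600, 192),
        brush_radius=2,
        invert_colors=False,
    )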
    output = gr.Textbox(label="Output Box")
    convert_btn = gr.Button("Convert")
    convert_btn.click(
        fn=image_to_text, inputs=sketchpad, outputs=output
    )

In [None]:
app.launch(inline=False, share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://20411fc39f471d8bff.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [None]:
app.close()

Closing server running on port: 7860


## Final app implementation

In [None]:
# Implement inferencer
class TrOCRInferencer:
    def __init__(self):
        print("[INFO] Initialize TrOCR Inferencer.")
        self.processor = TrOCRProcessor.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )
        self.model = VisionEncoderDecoderModel.from_pretrained(
            "microsoft/trocr-base-handwritten"
        )

    def inference(self, image: Image.Image) -> str:
        """Run OCR on a single image and return the recognized text.

        The steps are preprocessing, inference, and postprocessing.
        """
        # preprocess
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        # inference
        generated_ids = self.model.generate(pixel_values)
        # postprocess
        generated_text = self.processor.batch_decode(
            generated_ids, skip_special_tokens=True
        )[0]

        return generated_text

inferencer = TrOCRInferencer()


# Implement event function
def image_to_text(image: np.ndarray) -> str:
    image = Image.fromarray(image).convert("RGB")
    text = inferencer.inference(image)
    return text


# Implement app
with gr.Blocks() as app:
    gr.Markdown("# Handwritten Image OCR")
    with gr.Tab("Image upload"):
        image = gr.Image(label="Handwritten image file")
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=image, outputs=output
        )

        gr.Markdown("## Image Examples")
        gr.Examples(
            examples=[
                os.path.join(os.getcwd(), "examples/Hello.png"),
                os.path.join(os.getcwd(), "examples/Hello_cursive.png"),
                os.path.join(os.getcwd(), "examples/Red.png"),
                os.path.join(os.getcwd(), "examples/sentence.png"),
                os.path.join(os.getcwd(), "examples/i_love_you.png"),
                os.path.join(os.getcwd(), "examples/merrychristmas.png"),
                os.path.join(os.getcwd(), "examples/Rock.png"),
                os.path.join(os.getcwd(), "examples/Bob.png"),
            ],
            inputs=image,
            outputs=output,
            fn=image_to_text,
        )

    with gr.Tab("Drawing"):
        sketchpad = gr.Sketchpad(
            label="Handwritten Sketchpad",
            shape=(600, 192),
            brush_radius=2,
            invert_colors=False,
        )
        output = gr.Textbox(label="Output Box")
        convert_btn = gr.Button("Convert")
        convert_btn.click(
            fn=image_to_text, inputs=sketchpad, outputs=output
        )

In [None]:
# Launch the app
app.launch(inline=False, share=True)

In [None]:
app.close()