![Logo](assets/effocr_logo.png)

# EfficientOCR (EffOCR)

Welcome to the EffOCR demo! [EffOCR](https://arxiv.org/abs/2304.02737) is a new OCR architecture that is

- sample efficient and
- computationally efficient,

now implemented as a Python package.

Let's check it out!

## Installation and Setup

Installing EffOCR is easy with `pip`!

In [None]:
%pip install efficient_ocr

For the remainder of the demo, we'll need to import the main `EffOCR` class, as well as a few other helpful libraries.

In [None]:
from efficient_ocr import EffOCR

In [5]:
import yaml
# import ...

## Configs

Whether you're running training or inference with EffOCR, you'll need to specify a config. Fortunately, EffOCR only requires you to directly specify a single config file in `YAML` format. (If you want to use MMDetection as the object detection backend, you'll also need an MMDetection config file.)

Here's how it looks, for example:

In [7]:
with open('./config/config_dummy.yaml') as f:
    print(yaml.safe_dump(yaml.safe_load(f), explicit_start=True))

---
Global:
  char_only: false
  recognition_only: false
  wandb_project: name_of_wandb_project_for_effocr
Line:
  batch_size: 16
  conf_thresh: 0.2
  device: cpu
  epochs: 50
  input_shape: (640, 640)
  iou_thresh: 0.15
  max_det: 200
  min_seg_ratio: 2
  model_backend: yolo
  model_path: /path/to/line/detection/model
  num_cores: null
  providers: null
  training_name: name_of_training_run
  visualize: null
Localizer:
  batch_size: 16
  conf_thresh: 0.25
  device: cpu
  epochs: 50
  input_shape: (640, 640)
  iou_thresh: 0.1
  max_det: 200
  mmdet_config: null
  model_backend: yolo
  model_path: /path/to/localizer/model
  num_cores: null
  onnx_providers: null
  training_name: name_of_training_run
  vertical: false
  visualize: null
Recognizer:
  char:
    adamw_beta1: 0.9
    adamw_beta2: 0.999
    ascender: true
    aug_paired: false
    batch_size: 128
    char_only_sampler: false
    char_trans_version: 2
    dec_lr_factor: 0.9
    default_font_name: Noto
    diff_sizes: false
   

Some fields are self-explanatory, others are not! Visit the documentation on our GitHub repo to learn more.
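If you want to tweak a config programmatically before training, the nested structure (`Global`, `Line`, `Localizer`, `Recognizer`) maps directly onto Python dicts once loaded with `yaml.safe_load`. The sketch below uses two hypothetical helpers, `get_value` and `set_value`, that address fields with dotted keys; they are not part of the `efficient_ocr` package. It also reflects one YAML detail visible in the output above: `(640, 640)` is not YAML list syntax, so a plain YAML parser returns `input_shape` as the string `"(640, 640)"`. How EffOCR itself parses that field is not shown here, so `parse_shape` is only an assumption about what you might want when reading the file yourself.

```python
import ast
from typing import Any

# A small subset of the dummy config above, as yaml.safe_load would return it.
config = {
    "Global": {"char_only": False, "recognition_only": False},
    "Line": {"device": "cpu", "input_shape": "(640, 640)", "conf_thresh": 0.2},
    "Localizer": {"device": "cpu", "input_shape": "(640, 640)", "conf_thresh": 0.25},
    "Recognizer": {"char": {"batch_size": 128}},
}


def get_value(cfg: dict, dotted_key: str) -> Any:
    """Look up a nested field such as 'Localizer.conf_thresh' (hypothetical helper)."""
    node: Any = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def set_value(cfg: dict, dotted_key: str, value: Any) -> None:
    """Set a nested field, creating missing intermediate sections (hypothetical helper)."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def parse_shape(raw: Any) -> tuple:
    """YAML reads '(640, 640)' as a string; convert it (or a list) to a tuple of ints."""
    shape = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return tuple(int(x) for x in shape)


set_value(config, "Localizer.conf_thresh", 0.3)
set_value(config, "Recognizer.char.few_shot", 10)
print(get_value(config, "Recognizer.char"))  # {'batch_size': 128, 'few_shot': 10}
print(parse_shape(get_value(config, "Localizer.input_shape")))  # (640, 640)
```

After editing, the dict can be written back with `yaml.safe_dump` and passed to `EffOCR` as a new config file.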

## (Zero-Shot) Inference

Many users will want to use EffOCR off the shelf for zero-shot inference. Our package makes this possible in just `X` lines of code:

### Example: Inference on Archival Data

The digitization of archival content in particular will interest many in the social sciences and (digital) humanities. To showcase the zero-shot capabilities of one of our models, trained on mixed English print from archival and historical sources, we apply it to...

## Sample-Efficient Training

EffOCR is a sample-efficient OCR architecture, and our package lets you train EffOCR yourself to take advantage of this! You can do so in just `2` lines of code.

In [None]:
effocr = EffOCR(data_json, data_dir, config_yaml)
effocr.train()

As you can see, to fully train EffOCR, you only need `3` inputs:

1. `data_json = /path/to/coco/json`: a single COCO-style JSON file with annotations for words (or equivalent orthographic units), characters, and lines. (N.B. We provide an example config file for a popular image annotation service to help you start annotating your own dataset from scratch!)
2. `data_dir = /path/to/coco/images/dir`: a single directory containing the images you annotated. 
3. `config_yaml = /path/to/config/yaml`: an EffOCR YAML config file. 

That's it! `EffOCR` does the rest, training all four modules in EffOCR.
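Before launching a long training run, it can help to sanity-check `data_json`. The sketch below relies only on the standard COCO layout (top-level `images`, `annotations`, and `categories`, with each annotation pointing at an `image_id` and a `category_id`) and counts annotations per category. The category names in the toy example (`line`, `word`, `char`) are illustrative assumptions; EffOCR may expect different names, so check the documentation on our GitHub repo.

```python
import json
from collections import Counter


def summarize_coco(coco: dict) -> Counter:
    """Count annotations per category name in a COCO-style dict."""
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            raise ValueError(f"missing top-level key: {key!r}")
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    image_ids = {img["id"] for img in coco["images"]}
    counts: Counter = Counter()
    for ann in coco["annotations"]:
        # Every annotation must reference an image that is actually listed.
        if ann["image_id"] not in image_ids:
            raise ValueError(f"annotation {ann['id']} points to unknown image {ann['image_id']}")
        counts[id_to_name[ann["category_id"]]] += 1
    return counts


def summarize_coco_file(path: str) -> Counter:
    """Load a COCO JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_coco(json.load(f))


# Hypothetical toy annotations; bbox is [x, y, width, height] as in COCO.
toy = {
    "images": [{"id": 1, "file_name": "page_001.png", "width": 800, "height": 1200}],
    "categories": [{"id": 1, "name": "line"}, {"id": 2, "name": "word"}, {"id": 3, "name": "char"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 300, 40]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [10, 12, 90, 36]},
        {"id": 3, "image_id": 1, "category_id": 3, "bbox": [10, 12, 20, 36]},
        {"id": 4, "image_id": 1, "category_id": 3, "bbox": [32, 12, 20, 36]},
    ],
}
print(summarize_coco(toy))  # Counter({'char': 2, 'line': 1, 'word': 1})
```

On a real dataset you would call `summarize_coco_file(data_json)`; a category with zero or very few annotations is a hint that the corresponding module will be hard to train.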

You can also train only certain modules within your EffOCR model; e.g., to train just the localizer and line detection modules:

In [None]:
effocr = EffOCR(data_json, data_dir, config_yaml)
effocr.train(target=['line_detection', 'word_and_character_detection'])

### Example: Few-Shot Learning in Historical Japanese

In practice, we suspect that many people will be interested in training and fine-tuning their own EffOCR recognizer modules. One use case we've been interested in is the digitization of historical Japanese firm-level reports, where the data of interest appear in tables of vertically oriented text.

Because this setting is so low-resource, the best available alternative to EffOCR has a CER of 55.6%! But with just `1` training sample for each of a few hundred kanji, EffOCR achieves a CER of just X%!
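CER (character error rate) is commonly computed as the total character-level edit distance between predictions and ground truth, divided by the total number of ground-truth characters. The sketch below implements that common definition with a plain Levenshtein distance; it is an illustration of the metric, not the evaluation code behind the numbers quoted here, whose exact normalization may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(refs: list, hyps: list) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return edits / total_chars


# One wrong kanji out of four reference characters gives a CER of 0.25.
print(cer(["株式会社"], ["株式会杜"]))  # 0.25
```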

To use EffOCR for few-shot learning, first assemble the relevant data, then specify
```YAML
Recognizer:
  char:
    few_shot: 10
```
in the config and train the relevant modules:

In [None]:
effocr = EffOCR(data_json, data_dir, config_yaml)
effocr.train(target=['char_recognition'])
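When assembling few-shot data yourself, you may want to build a split with at most `k` examples per character. The sketch below does this with a hypothetical helper, `k_shot_subset`, over hypothetical `(label, sample_id)` pairs, one per annotated character crop. We are assuming, not asserting, that a per-character cap is what `few_shot` controls; the helper is only a way to prepare or inspect such a split, not EffOCR's internal sampling logic.

```python
import random
from collections import defaultdict


def k_shot_subset(samples, k, seed=0):
    """Keep at most k sample ids per character label (hypothetical helper).

    samples: iterable of (label, sample_id) pairs.
    Returns a dict mapping each label to the list of kept sample ids.
    """
    by_label = defaultdict(list)
    for label, sample_id in samples:
        by_label[label].append(sample_id)
    rng = random.Random(seed)  # seeded so the split is reproducible
    subset = {}
    for label, ids in by_label.items():
        # Labels with k or fewer samples keep all of them; others are subsampled.
        subset[label] = ids if len(ids) <= k else rng.sample(ids, k)
    return subset


samples = [("株", 1), ("株", 2), ("株", 3), ("式", 4), ("会", 5), ("会", 6)]
subset = k_shot_subset(samples, k=2)
print({label: len(ids) for label, ids in subset.items()})  # {'株': 2, '式': 1, '会': 2}
```

Seeding the random generator keeps the split reproducible, which matters when comparing CER across few-shot runs.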

Inference and visualization with the newly trained model are just as easy, as shown above!

Then upload it to the HF model hub!