TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers

This is the official PyTorch implementation of TextAdaIN (ECCV 2022).

Oren Nuriel, Sharon Fogel, Ron Litman

TextAdaIN creates local distortions in the feature map which prevent the network from overfitting to local statistics. It does so by viewing each feature map as a sequence of elements and deliberately mismatching fine-grained feature statistics between elements in a mini-batch.

(Figure: overview of the TextAdaIN method)
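To make the idea concrete, below is a minimal, self-contained sketch of this statistic-mismatching step. It is not the official TextAdaIN.py: the window count, the application probability, and the exact way statistics are computed per window are assumptions here, chosen only to illustrate the mechanism of splitting the feature map along the width axis and swapping per-window mean and standard deviation with another element of the mini-batch.

import torch
import torch.nn as nn


class TextAdaINSketch(nn.Module):
    """Illustrative sketch of the TextAdaIN idea (not the official TextAdaIN.py).

    During training, the feature map is split into K windows along the width
    (sequence) axis. Each window is normalized with its own mean and standard
    deviation and re-scaled with the statistics of the corresponding window from
    a randomly chosen sample in the mini-batch (an AdaIN-style swap). The module
    has no learnable parameters and is an identity at inference time.
    """

    def __init__(self, num_windows=5, p=0.5, eps=1e-5):
        super().__init__()
        self.num_windows = num_windows  # K windows along the width axis (assumed default)
        self.p = p                      # probability of applying the swap (assumed default)
        self.eps = eps

    def forward(self, x):
        # x: (N, C, H, W) feature map from a convolutional layer
        if not self.training or torch.rand(1).item() > self.p:
            return x

        n, c, h, w = x.shape
        k = min(self.num_windows, w)
        w_trim = (w // k) * k                      # drop the remainder so windows are equal-sized
        head, tail = x[..., :w_trim], x[..., w_trim:]

        # reshape to (N, C, K, H * W/K): one window = all positions in a width slice
        windows = head.reshape(n, c, h, k, w_trim // k).permute(0, 1, 3, 2, 4)
        windows = windows.reshape(n, c, k, -1)

        mu = windows.mean(dim=-1, keepdim=True)
        sigma = windows.std(dim=-1, keepdim=True) + self.eps

        # deliberately mismatch statistics with another element of the mini-batch
        perm = torch.randperm(n, device=x.device)
        out = (windows - mu) / sigma * sigma[perm] + mu[perm]

        # restore the original layout and re-attach the trimmed tail
        out = out.reshape(n, c, k, h, w_trim // k).permute(0, 1, 3, 2, 4)
        out = out.reshape(n, c, h, w_trim)
        return torch.cat([out, tail], dim=-1)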

Overcoming the shortcut

Below we see the attention maps of a text recognizer before and after applying local corruptions to the input image. Each example shows the input image (bottom), the attention map (top), and the model prediction (left). Each line in the attention map is a time step representing the attention per character prediction. (a) The baseline model, which uses local statistics as a shortcut, misinterprets the corrupted images. (b) Our proposed method, which overcomes this shortcut, enhances performance under both standard and challenging testing conditions.

(Figure: attention maps of the baseline and TextAdaIN models on corrupted inputs)

Integrating into your favorite text recognizer backbone

Sample code for the class can be found in TextAdaIN.py.

As the module has no learnable weights and is not applied during inference, a model trained with it can be loaded with or without the module.

# in the __init__ of a PyTorch module (no learnable weights; the module is not
# applied during inference, so the model can be loaded with or without it)
self.text_adain = TextAdaIN()

# in the forward pass, right after a convolution and before the batch norm
out = self.conv(out)
out = self.text_adain(out)
out = self.bn(out)

Results

Below are the results for a variety of settings (scene text and handwriting, multiple architectures), with and without TextAdaIN. Applying TextAdaIN to state-of-the-art recognizers increases performance.

| Method | Scene Text: Regular (5,529) | Scene Text: Irregular (3,010) | Handwritten: IAM (17,990) | Handwritten: RIMES (7,734) |
|---|---|---|---|---|
| Baek et al. (CTC) | 88.7 | 72.9 | 80.6 | 87.8 |
| + TextAdaIN | 89.5 (+0.8) | 73.8 (+0.9) | 81.5 (+0.9) | 90.7 (+2.9) |
| Baek et al. (Attn) | 92.0 | 77.4 | 82.7 | 90.2 |
| + TextAdaIN | 92.2 (+0.2) | 77.7 (+0.3) | 84.1 (+1.4) | 93.0 (+2.8) |
| Litman et al. | 93.6 | 83.0 | 85.7 | 93.3 |
| + TextAdaIN | 94.2 (+0.6) | 83.4 (+0.4) | 87.3 (+1.6) | 94.4 (+1.1) |
| Fang et al. | 93.9 | 82.0 | 85.4 | 92.0 |
| + TextAdaIN | 94.2 (+0.3) | 82.8 (+0.8) | 86.3 (+0.9) | 93.0 (+1.0) |

Experiments - Plug n' play

Standard Text Recognizer

To run with the Baek et al. framework, insert the TextAdaIN module into the ResNet backbone after every convolutional layer in the feature extractor, as described above. After this is done, simply run the command line as instructed in the training & evaluation section.

For scene text we use the original configurations.

When training on handwriting datasets, we run with the following configuration:

python train.py --train_data <train data path> --valid_data <val data path> --select_data / --batch_ratio 1 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --exp-name handwriting --sensitive --rgb --num_iter 200000 --batch_size 128 --textadain 
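For illustration, here is a hedged sketch of what such a modified residual block might look like. The block and layer names below are hypothetical and not taken from the actual feature extractors; TextAdaIN is the sample class from TextAdaIN.py, instantiated exactly as in the snippet above.

import torch.nn as nn
from TextAdaIN import TextAdaIN  # sample class shipped with this repository


class BasicBlockWithTextAdaIN(nn.Module):
    """Illustrative ResNet basic block with TextAdaIN after each convolution."""

    def __init__(self, in_planes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.text_adain = TextAdaIN()  # no learnable weights, so a shared instance is fine

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.text_adain(self.conv1(x))   # TextAdaIN right after the convolution,
        out = self.relu(self.bn1(out))         # before the batch norm, as shown above
        out = self.text_adain(self.conv2(out))
        out = self.bn2(out)
        return self.relu(out + identity)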

ABINet

To run with ABINet, insert the TextAdaIN module into the ResNet backbone after every convolutional layer in the feature extractor, as described above. After this is done, simply run the command line as instructed in the training section.

Please refer to the implementation details in the paper for further information.

Citation

If you find this work useful, please consider citing it:

@article{nuriel2021textadain,
  title={TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers},
  author={Nuriel, Oren and Fogel, Sharon and Litman, Ron},
  journal={arXiv preprint arXiv:2105.03906},
  year={2021}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.