# Typo Detector with OpenVINO

Typo detection in AI is the process of identifying and correcting typographical errors in text data using machine learning algorithms. Its goal is to improve the accuracy, readability, and usability of text by catching mistakes made during the writing process.

A typo detector takes a sentence as input and identifies all typographical errors in it, such as misspellings and homophone errors.

This tutorial shows how to use the [Typo Detector](https://huggingface.co/m3hrdadfi/typo-detector-distilbert-en) from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library to perform this task.

## Imports

In [4]:
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification, pipeline
from openvino.runtime import Core
from pathlib import Path
import numpy as np
import torch

## Methods

There are two methods for using the typo detection model with OpenVINO. In this tutorial we will look at both.

##### 1. Using the [Hugging Face Optimum](https://huggingface.co/docs/optimum/index) library
The Hugging Face Optimum API is a high-level API that allows us to convert and quantize models from the Hugging Face Transformers library to the OpenVINO™ IR format.

##### 2. Converting the model to ONNX and then to OpenVINO IR
First, the PyTorch model is converted to the ONNX format, and then the [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) tool is used to convert it to the OpenVINO IR format.

## Hugging Face Optimum library

For this method, we need to install the Hugging Face Optimum library accelerated by OpenVINO integration.

Optimum Intel can be used to load optimized models from the [Hugging Face Hub](https://huggingface.co/docs/optimum/intel/hf.co/models) and create pipelines that run inference with OpenVINO Runtime using Hugging Face APIs. The Optimum Inference models are API compatible with Hugging Face Transformers models. This means we just need to replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.

In [5]:
!pip install optimum[openvino]



Import required class

In [6]:
from optimum.intel.openvino import OVModelForTokenClassification

##### Load the model

We load the relevant pre-trained model with the `OVModelForTokenClassification` class. To load a Transformers model and convert it to the OpenVINO format on the fly, set `export=True` when loading the model (older Optimum releases name this argument `from_transformers=True`).

In [8]:
# The pretrained model we are using
model_id = "m3hrdadfi/typo-detector-distilbert-en"

model_dir = Path("model")

# Reuse the converted model if it was saved earlier; otherwise convert it and save it
if model_dir.exists():
    model = OVModelForTokenClassification.from_pretrained(model_dir)
else:
    model = OVModelForTokenClassification.from_pretrained(model_id, from_transformers=True)
    model.save_pretrained(model_dir)

##### Load the tokenizer

Text preprocessing turns raw text into a form that can be fed into the model. [Tokenization](https://towardsdatascience.com/tokenization-for-natural-language-processing-a179a891bad4) splits paragraphs and sentences into smaller units (words or sub-word pieces) and maps each unit to an integer ID from the tokenizer's vocabulary. The model then looks up a vector for each ID, and it is in that vector space that similar words have similar representations, which helps the model understand the context of a sentence. We use an [AutoTokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer) from Hugging Face, which loads the pretrained tokenizer that matches the model.
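
As a rough illustration of the splitting step, the sketch below applies greedy longest-match WordPiece splitting (the sub-word scheme used by BERT-style models such as DistilBERT) with a tiny hand-made vocabulary. The vocabulary, the `toy_vocab`, `wordpiece`, and `encode` names, and the resulting IDs are invented for this example and do not correspond to the real tokenizer's vocabulary.

```python
# Toy vocabulary: token -> ID. Invented for illustration only.
toy_vocab = {
    "[UNK]": 0,
    "he": 1,
    "had": 2,
    "struggle": 3,
    "##d": 4,
    "st": 5,
    "##gr": 6,
    "##uggle": 7,
}


def wordpiece(word, vocab):
    """Split one lowercase word into sub-word pieces by greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: treat the whole word as unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(sentence, vocab):
    """Lowercase, split on whitespace, and map every piece to its ID."""
    tokens = [p for w in sentence.lower().split() for p in wordpiece(w, vocab)]
    return tokens, [vocab[t] for t in tokens]


tokens, ids = encode("He had stgruggled", toy_vocab)
print(tokens)
print(ids)
```

In this toy setup the misspelled word `stgruggled` breaks into four pieces, while the correctly spelled `struggled` needs only two.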

In [None]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

Then we create an inference pipeline for the `token-classification` task. You can find more information about using Hugging Face inference pipelines in this [tutorial](https://huggingface.co/docs/transformers/pipeline_tutorial).

In [9]:
nlp = pipeline('token-classification', model=model, tokenizer=tokenizer, aggregation_strategy="average")
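
The `aggregation_strategy="average"` argument controls how sub-word predictions are merged back into whole words: roughly, the per-label scores of a word's pieces are averaged, and the label with the highest mean is assigned to the word. The sketch below mimics that idea in plain Python. The label names, the scores, and the `average_word_label` helper are made up for illustration; this is not the pipeline's actual implementation.

```python
def average_word_label(piece_scores):
    """Average per-label scores over a word's pieces and pick the best label.

    piece_scores: list of dicts, one per sub-word piece, mapping label -> score.
    """
    labels = piece_scores[0].keys()
    mean = {
        label: sum(scores[label] for scores in piece_scores) / len(piece_scores)
        for label in labels
    }
    best = max(mean, key=mean.get)
    return best, mean[best]


# Hypothetical scores for the four pieces of one misspelled word
pieces = [
    {"O": 0.75, "TYPO": 0.25},
    {"O": 0.25, "TYPO": 0.75},
    {"O": 0.25, "TYPO": 0.75},
    {"O": 0.25, "TYPO": 0.75},
]

label, score = average_word_label(pieces)
print(label, score)
```

Averaging means a single piece that looks clean (the first one here) does not prevent the word as a whole from being flagged.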

Function to find typos in a sentence and print them

In [11]:
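def show_typos(sentence):
    # Character spans of the words the pipeline flags as typos
    spans = [(r["start"], r["end"]) for r in nlp(sentence)]

    # Wrap each span in <i>...</i>, working right to left so that
    # earlier offsets stay valid after each insertion
    detected = sentence
    for start, end in sorted(spans, reverse=True):
        detected = f"{detected[:start]}<i>{detected[start:end]}</i>{detected[end:]}"

    print("[Input]: ", sentence)
    print("[Detected]: ", detected)
    print("-" * 130)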
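
When an aggregation strategy is set, a token-classification pipeline returns one dictionary per detected group, with `entity_group`, `score`, `word`, `start`, and `end` fields. The sketch below works on a hand-written list in that shape (the values are invented, not real model output) and keeps only detections above a hypothetical `min_score` threshold, which is one way the results could be filtered before display.

```python
# Hand-written records in the shape of aggregated pipeline output.
# The scores and the "TYPO" label are illustrative, not real predictions.
sentence = "Letterma also apologized two his staff ."
results = [
    {"entity_group": "TYPO", "score": 0.98, "word": "letterma", "start": 0, "end": 8},
    {"entity_group": "TYPO", "score": 0.41, "word": "two", "start": 25, "end": 28},
]


def confident_typos(sentence, results, min_score=0.5):
    """Return (text, start, end) for detections scoring at least min_score."""
    return [
        (sentence[r["start"]:r["end"]], r["start"], r["end"])
        for r in results
        if r["score"] >= min_score
    ]


print(confident_typos(sentence, results))
print(confident_typos(sentence, results, min_score=0.0))
```

Slicing the original sentence with `start` and `end` preserves the input's casing, whereas the `word` field may be normalized by the tokenizer.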

Demo

In [None]:
sentences = [
 "He had also stgruggled with addiction during his time in Congress .",
 "The review thoroughla assessed all aspects of JLENS SuR and CPG esign maturit and confidence .",
 "Letterma also apologized two his staff for the satyation .",
 "Vincent Jay had earlier won France 's first gold in gthe 10km biathlon sprint .",
 "It is left to the directors to figure out hpw to bring the stry across to tye audience .",
]

for sentence in sentences:
    show_typos(sentence)