## Grammar Checker for Natural Language Text: A Syntactic Processing Approach



1. *Data Collection and Annotation*:
   - Gather a large dataset of natural language text with annotated grammatical errors. Annotate each error type (e.g., subject-verb agreement, tense inconsistency) to create a labeled dataset for training your ML models.

2. *Feature Extraction*:
   - Extract features from the input text that are relevant to grammatical error detection and correction. These features may include syntactic features derived from parsed trees/graphs, lexical features, contextual features, and error-specific features.

3. *Model Selection*:
   - Choose appropriate ML models for error detection and correction tasks. Common models include:
     - *Sequence Models*: Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks suit sequence labeling tasks such as part-of-speech tagging or named entity recognition. Grammatical error detection can be framed the same way, with each token labeled as correct or as a particular error type.
     - *Transformer Models*: Transformer-based architectures like BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer) can capture contextual information effectively and are suitable for a wide range of NLP tasks.
     - *Conditional Random Fields (CRFs)*: CRFs are often used for sequence labeling tasks, where the output depends on the entire input sequence.

4. *Training*:
   - Train your ML models using the annotated dataset. Fine-tune pre-trained models on your specific task to improve performance and adapt them to the grammatical error detection and correction domain.

5. *Evaluation*:
   - Evaluate the performance of your ML models on a separate evaluation dataset. Use standard evaluation metrics such as precision, recall, and F1-score to measure the accuracy of error detection and correction.

6. *Integration with Syntactic Processing*:
   - Integrate ML-based models with syntactic processing techniques to leverage the strengths of both approaches. For example, use syntactic parsing to provide structural information to ML models or combine syntactic features with learned representations for better error detection and correction.

7. *User Interface*:
   - Design a user-friendly interface that interacts with your ML models to input text, display detected errors, and suggest corrections. Provide options for users to accept or reject corrections and provide feedback to improve the system.

8. *Documentation and Deployment*:
   - Document the ML models used, training procedures, and integration with other components of the grammar checker. Deploy the grammar checker with ML models as a standalone tool or integrate it into existing applications or platforms.
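
Step 2 can be made concrete with a small sketch. The function below builds lexical and contextual features for each token, in the dictionary-per-token form that CRF-style sequence labelers commonly accept. The names `token_features` and `sentence_features` and the particular feature set are illustrative assumptions, not part of any specific library; syntactic features from a parser could be added to the same dictionary.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the token and its neighbours.

    Illustrative feature set (an assumption, not a standard): surface form,
    a short suffix, capitalisation, and the previous/next tokens as context.
    """
    word = tokens[i]
    return {
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        "is_first": i == 0,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(sentence):
    """Whitespace-tokenize a sentence and featurize every token."""
    tokens = sentence.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features("My camera battery a dead")
print(features[3])  # features for the suspicious token "a"
```

Whitespace tokenization keeps the sketch dependency-free; a real pipeline would more likely use the tokenizer that matches its parser or model.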

By leveraging ML models as a major part of your grammar checker project, you can achieve higher accuracy and robustness in detecting and correcting grammatical errors in natural language text.
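
As a rough illustration of the evaluation step, the sketch below computes precision, recall, and F1 for error detection. It assumes, hypothetically, that gold and predicted errors are represented as sets of `(token_index, error_type)` pairs; a real annotated dataset may use a different scheme.

```python
def detection_scores(gold, predicted):
    """Precision, recall, and F1 over sets of (token_index, error_type) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical annotations: two gold errors, two predictions, one overlap.
gold = {(3, "verb"), (7, "agreement")}
predicted = {(3, "verb"), (5, "tense")}
precision, recall, f1 = detection_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Matching on both position and type is a strict choice: a prediction at the right position with the wrong error type counts as a miss.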

### Pseudocode for Grammar Checker

In [None]:
# Pseudocode for Grammar Checker

# Load the trained model
model = load_model('path_to_your_model')

# Load the tokenizer
tokenizer = load_tokenizer('path_to_your_tokenizer')

def grammar_checker(input_text):
    # Tokenize the input text
    tokenized_text = tokenizer.tokenize(input_text)
    
    # Predict the error locations and types using the model
    error_predictions = model.predict(tokenized_text)
    
    # For each predicted error in the text
    for error in error_predictions:
        # Identify the error type and location
        error_type, error_location = identify_error(error)
        
        # Generate correction for the error
        correction = generate_correction(error_type, error_location, tokenized_text)
        
        # Replace the error in the text with the correction
        tokenized_text = replace_error_with_correction(tokenized_text, error_location, correction)
    
    # Detokenize the text
    corrected_text = tokenizer.detokenize(tokenized_text)
    
    return corrected_text
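
One detail the pseudocode above leaves open: if a correction changes the number of tokens, the locations of the remaining errors shift. A common way around this is to apply edits from right to left. The sketch below assumes each edit is a hypothetical `(start, end, replacement_tokens)` triple over token indices; the pseudocode does not specify a format, so treat this representation as an assumption.

```python
def apply_edits(tokens, edits):
    """Apply (start, end, replacement_tokens) edits to a token list.

    Edits are applied from the rightmost span to the leftmost, so replacing
    one span never shifts the indices of spans still waiting to be applied.
    Spans are half-open [start, end) and assumed not to overlap.
    """
    corrected = list(tokens)  # work on a copy; the input list is left untouched
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        corrected[start:end] = replacement
    return corrected


tokens = "I like for walks and World is flat".split()
edits = [
    (5, 6, ["the", "world"]),  # "World" -> "the world" (adds a token)
    (2, 3, []),                # drop "for" (removes a token)
]
print(" ".join(apply_edits(tokens, edits)))
```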


In [None]:
# Pseudocode for Grammar Checker using Gramformer

# Import the necessary libraries
from gramformer import Gramformer

# Initialize the Gramformer model
gf = Gramformer(models=1)  # 1 = corrector; 2 = detector (not implemented yet)

def grammar_checker(input_text):
    # Use the Gramformer model to correct the input text
    corrections = gf.correct(input_text)
    
    # As the outputs later in this notebook show, 'correct' returns a set
    # of candidate strings, which cannot be indexed with [0].
    # Take one candidate; fall back to the input if nothing came back.
    corrected_text = next(iter(corrections)) if corrections else input_text
    
    return corrected_text


**Introduction**
Gramformer is a framework for detecting, highlighting, and correcting grammatical errors in natural language text¹. It was created by Prithiviraj Damodaran and is open to collaboration¹. Gramformer exposes three separate interfaces to a family of algorithms to detect, highlight, and correct grammar errors¹.

**Methodology**
Gramformer builds on NLP Transformer models such as T5². It works at the sentence level and has been trained on sentences of length 64¹. It combines several lines of recent research in grammar correction¹. However, the fine-tuning was done on relatively small models with limited data because of compute budget constraints¹.

**Applications**
Gramformer has potential applications in several areas:
1. **Post-processing machine-generated text**: As machine text generation becomes mainstream, so will post-processing of its output¹.
2. **Human-In-The-Loop (HITL) text**: Most supervised NLU systems (chatbots and conversational agents) need humans or experts to enter or edit text, and that text needs to be grammatically correct¹.
3. **Assisted writing for humans**: Integration into the custom text editors of your apps¹.
4. **Custom platform integration**: Today, grammatical safety nets for authoring social content (posts or comments) or text in messaging platforms are minimal (word-level correction) or non-existent¹.

**Limitations**
While Gramformer is a powerful tool, it has limitations. It works at the sentence level and has been trained on sentences of length 64, so it is not (yet) suitable for long prose or paragraphs¹. Its results should also be taken with a pinch of salt and treated as a proof of concept for a novel method of generating grammar error correction datasets¹.
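
One possible workaround for the sentence-level limitation is to split longer text into sentences and correct them one at a time. The sketch below is a minimal illustration under stated assumptions: `correct_paragraph` is a hypothetical helper, the regex split is deliberately naive, and a stand-in corrector replaces the real model so the example runs without downloading anything.

```python
import re


def correct_paragraph(paragraph, correct_sentence):
    """Split a paragraph into sentences and correct each one separately.

    `correct_sentence` is any callable mapping one sentence to its corrected
    form. The split on ., ! or ? followed by whitespace is a naive
    assumption; it will mis-split abbreviations such as "e.g. this".
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return " ".join(correct_sentence(s) for s in sentences)


# Stand-in corrector so the sketch runs without a model: capitalise the first letter.
def fake_corrector(sentence):
    return sentence[0].upper() + sentence[1:]


print(correct_paragraph("my camera is dead. i need a new one!", fake_corrector))
```

With Gramformer, `correct_sentence` could be a small wrapper such as `lambda s: next(iter(gf.correct(s)))`, since the outputs later in this notebook show `correct` returning a set of strings. Correcting sentences in isolation loses cross-sentence context, so this is a stopgap and not a substitute for a model trained on longer text.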

**Future Work**
The creator of Gramformer is working on a version based on a larger base model and a lot more data for those who might want to use this in a production setup¹.

**Conclusion**
In conclusion, Gramformer is a promising tool for grammar correction. It combines the power of transformer models with the flexibility of Python, making it a valuable resource for anyone looking to improve the grammatical accuracy of their text².

(1) https://github.com/PrithivirajDamodaran/Gramformer.
(2) https://www.vennify.ai/gramformer-correct-grammar-transformer-nlp/.

## Setup and Installation

In [1]:
!pip install -U git+https://github.com/PrithivirajDamodaran/Gramformer.git
!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
!pip install --upgrade gramformer
!pip install transformers
!pip install spacy
!python -m spacy download en_core_web_sm

In [1]:
from gramformer import Gramformer

# Try constructing Gramformer with each `models` value.
# Note: "loaded successfully" below only means the constructor did not raise.
# Judging by the output, only models=1 actually loads a model.
try:
    gf_detector = Gramformer(models=0, use_gpu=False)  # models=0: no load message is printed
    print("Detector model loaded successfully.")
except Exception as e:
    print("Error loading detector model:", e)

try:
    gf_highlighter = Gramformer(models=1, use_gpu=False)  # models=1: correct/highlight model
    print("Highlighter model loaded successfully.")
except Exception as e:
    print("Error loading highlighter model:", e)

try:
    gf_corrector = Gramformer(models=2, use_gpu=False)  # models=2: prints "TO BE IMPLEMENTED!!!"
    print("Corrector model loaded successfully.")
except Exception as e:
    print("Error loading corrector model:", e)




Detector model loaded successfully.




[Gramformer] Grammar error correct/highlight model loaded..
Highlighter model loaded successfully.
TO BE IMPLEMENTED!!!
Corrector model loaded successfully.


In [2]:
from gramformer import Gramformer

# Load Gramformer; models=1 loads the correction/highlight model
gf = Gramformer(models=1, use_gpu=False)

# Correct a sentence; correct() returns a set of candidate corrections
corrected_sentences = gf.correct('My camera battery a dead')

print("Corrected sentences:", corrected_sentences)


[Gramformer] Grammar error correct/highlight model loaded..
Corrected sentences: {'My camera battery is dead.'}


## Instantiate Gramformer

In [3]:
gf = Gramformer(models=1, use_gpu=True) # 1 = corrector; 2 = detector (not implemented yet); use_gpu=True needs a CUDA device

[Gramformer] Grammar error correct/highlight model loaded..


## Run Correction

In [4]:
gf.correct('My camera battery a dead')

{'My camera battery is dead.'}

In [5]:
sentences = [
    'I like for walks', 
    'World is flat', 
    'Red a color', 
    'I wish my Computer was run faster.'
]

In [6]:
for sentence in sentences:
    res = gf.correct(sentence)
    print(res)

{'I like walks.'}
{'The world is flat.'}
{'Red a color'}
{'I wish my computer was running faster.'}
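
The outputs above show that some sentences are rewritten while `'Red a color'` comes back unchanged. To see what changed, original and corrected sentences can be compared word by word. The sketch below uses the standard library's `difflib` and two of the sentence pairs from the output above; `word_changes` is a hypothetical helper and is independent of Gramformer's own highlight interface.

```python
import difflib


def word_changes(original, corrected):
    """List (operation, original_words, corrected_words) for each changed span."""
    source, target = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    return [
        (op, source[i1:i2], target[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


pairs = [
    ("I like for walks", "I like walks."),
    ("Red a color", "Red a color"),
]
for original, corrected in pairs:
    changes = word_changes(original, corrected)
    print(original, "->", changes if changes else "no change")
```

Because the comparison is on whitespace-separated words, added punctuation such as the final period makes `walks` and `walks.` count as different words. Stripping punctuation before comparing would give a cleaner diff.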


## Putting it Together with Gradio

Gradio is an open-source Python library that allows you to quickly create customizable user interfaces for your machine learning models, APIs, or any arbitrary Python function¹²³⁴. It's designed to work with a wide range of machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn³.

Here's how it works:

1. **Function Wrapping**: You wrap your function, machine learning model, or API with Gradio's `Interface` class¹². This class takes three core arguments: the function to wrap (`fn`), the input components (`inputs`), and the output components (`outputs`)².

2. **Interface Creation**: Calling `gradio.Interface(fn=..., inputs=..., outputs=...)` with those three arguments constructs the interface object¹. Nothing is served until the interface is launched.

3. **Launching the Interface**: You launch the interface using the `launch()` method¹. This method can take a `share` parameter, which, if set to `True`, creates a publicly shareable link from your computer for the interface¹.

4. **Interacting with the Interface**: Once the interface is launched, you can interact with it directly in your Python notebook, or anyone can interact with it via the shared link¹².

5. **Hot Reloading**: When developing locally, you can run your Gradio app in hot reload mode, which automatically reloads the Gradio app whenever you make changes to the file².

Gradio requires no knowledge of HTML, CSS, or JavaScript, making it a great tool for quickly prototyping and sharing machine learning models and other functions¹²³⁴.

(1) https://www.geeksforgeeks.org/python-create-uis-for-prototyping-machine-learning-model-with-gradio/.
(2) https://www.gradio.app/guides/quickstart.
(3) https://medium.com/@HeCanThink/gradio-the-new-frontier-in-interactive-python-data-apps-64b5ce06628a.
(4) https://www.machinelearningnuggets.com/gradio-tutorial/.

In [7]:
import gradio as gr

In [8]:
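def correct(sentence):
    res = gf.correct(sentence)  # Gramformer returns a set of candidate corrections
    # Return one candidate as a string; fall back to the input if nothing came back
    return next(iter(res)) if res else sentence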

In [9]:
app_inputs = gr.Textbox(lines=2, placeholder="Enter sentence here...")

In [10]:
interface = gr.Interface(fn=correct,
                         inputs=app_inputs,
                         outputs='text',
                         title="Sup, I'm Gramformer")

In [11]:
interface.launch()

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


