
In this work, we introduce a transformer-based encoder-only grammatical error correction approach for improving automatic speech recognition by utilizing both a grammatical acceptability classifier (GAC) and a grammatical error correction model (GEC).


UC-Berkeley-I-School/Improving-ASR-Output-Using-a-Transformer-based-Grammatical-Error-Correction-Approach


Improving-ASR-Output-Using-a-Transformer-based-Grammatical-Error-Correction-Approach

DATASCI 266 Natural Language Processing with Deep Learning Final Project

Authors: Rachel Gao, Juliana Gómez-Consuegra, Erica Nakabayashi
Emails: rachelgao, julianagc, ericanaka @berkeley.edu

Abstract

In this work, we introduce a transformer-based, encoder-only grammatical error correction approach for improving automatic speech recognition (ASR) output, using both a grammatical acceptability classifier (GAC) and a grammatical error correction (GEC) model. We investigate different strategies for optimizing the models and show that training our GAC model on raw data yields better outputs than training it on cleaned and augmented data. We also find that the GEC model that punctuates and proper-cases the input by means of Named Entity Recognition (NER) outperforms the other GEC models and reduces over-correction. Taking phonetics into account also improves model performance, and performance varies by gender and emotion.
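The abstract pairs a classifier (GAC) with a corrector (GEC). One way such a pairing can be wired together is sketched below: the GEC model is only called when the GAC score falls below a threshold. This is a hypothetical illustration, not the authors' code. The function names, the 0.5 default threshold, and the toy stand-in models are all assumptions; in practice the two callables would wrap trained transformer models.

```python
from typing import Callable


def correct_transcript(
    sentence: str,
    acceptability: Callable[[str], float],  # hypothetical GAC: returns P(acceptable)
    corrector: Callable[[str], str],        # hypothetical GEC: returns a corrected sentence
    threshold: float = 0.5,                 # assumed cut-off, not taken from the paper
) -> str:
    """Return the sentence unchanged if the GAC accepts it; otherwise run the GEC."""
    # Sentences the classifier already accepts are left untouched,
    # which is one way to limit over-correction.
    if acceptability(sentence) >= threshold:
        return sentence
    return corrector(sentence)


# Toy stand-ins for the real models, for illustration only.
def toy_gac(s: str) -> float:
    return 0.9 if s.endswith(".") else 0.2


def toy_gec(s: str) -> str:
    return s[0].upper() + s[1:] + "."


print(correct_transcript("the cat sat", toy_gac, toy_gec))   # The cat sat.
print(correct_transcript("The dog ran.", toy_gac, toy_gec))  # The dog ran.
```

Gating on the classifier keeps the corrector away from sentences that already look acceptable. The threshold then trades recall of real errors against the risk of rewriting correct text.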

Code

The code for this project is hosted on the shared Google Drive linked below. If you have trouble accessing it, please email the authors to request read access.

Shared Drive

Below are instructions for navigating our project files.

Files and Folders

Our project had three major milestones, each reflected in its own folder:

Data

The Data folder contains all the raw data we gathered, the notebooks for cleaning and exploring the original data, and the base-cleaned data ready for use in both of our models. This folder also contains all of the experimentation data from our GEC models, for analytical evaluation.

Grammatical_Acceptability_Classifier

The Grammatical_Acceptability_Classifier folder contains our experiment notebooks, a comparison sheet, and the final model.

Grammatical_Error_Correction

The Grammatical_Error_Correction folder contains our different GEC model architectures. As described in our paper, we used five different processes. Each subfolder contains the given model architecture with different thresholds, beam widths (k), and other parameters, as applicable.

  • 0.SimpleGEC
  • 1.FineTuneGEC
  • 2.Dynamic GEC
  • 3.PhoneticGEC
  • 4.RawGEC
  • 5.TwoGramRawGEC
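The 3.PhoneticGEC folder takes phonetics into account. The README does not say which phonetic representation it uses. As a hypothetical illustration only, the sketch below uses American Soundex to keep the GEC candidate words that sound like the ASR word they would replace. The idea is that ASR errors are often near-homophones. The function names and the candidate list are assumptions.

```python
def soundex(word: str) -> str:
    """American Soundex code: first letter plus three digits (e.g. Robert -> R163)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")  # the first letter's code still blocks repeats
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        # Vowels separate equal codes; 'h' and 'w' do not.
        if ch not in "hw":
            prev = digit
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def phonetic_candidates(asr_word: str, candidates: list[str]) -> list[str]:
    """Hypothetical filter: keep candidates with the same Soundex code as the ASR word."""
    target = soundex(asr_word)
    return [c for c in candidates if soundex(c) == target]


print(phonetic_candidates("there", ["their", "they're", "the", "here"]))
# ['their', "they're"]
```

Soundex is coarse and English-specific. A real phonetic model might instead compare phoneme sequences, but the filtering pattern stays the same.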

In this root folder, you can also find:

  • A spreadsheet with metrics on 50 sentences from our reduced set, giving a glimpse of all fine-tuning attempts.
  • Evaluation notebooks for all final models on the reduced training data.
  • Trials of various functions and functionality, in folders whose names start with "TO_".
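The README does not list which metrics the spreadsheet and evaluation notebooks report. Word error rate (WER) is the standard metric for ASR output, so, assuming it is among them, here is a minimal sketch of how it can be computed. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

This comparison is exact and case-sensitive. Because a GEC model may add punctuation and casing, one design choice is whether to normalize both strings before scoring, which changes what the metric rewards.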

References

This folder contains all references used in our paper.

Paper

The final project report is available Here. Previous deliverables can also be found here.
