This project develops a web application tailored to English learners from Japanese- or Mandarin-speaking backgrounds. The app identifies errors in sentences written by learners and suggests corrections, with the aim of improving their learning experience and language proficiency. The Methods & Techniques section below describes the implementation details of the end-to-end NLP pipeline.
Our team, A Very Beta ChatGPT 4.5, is made up of the following members:
- Stefan Hall
- Tim Wang
- Amy Yang
To run the application locally, navigate to the 'Model Deployment' folder and run the commands below in your terminal:
pip install -r requirements.txt
streamlit run app.py
To enable public access, the web application is deployed on Hugging Face Spaces. Find the web app here. The fine-tuned models are also available on Hugging Face.
You can select your native language and the language model to use.
- Your native language: Japanese or Mandarin. (For this project, only these two language backgrounds are considered.)
- Language model to use: BART or GPT-2.
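As a rough illustration of how the two selections could be turned into a model choice, the sketch below maps each (native language, model) pair to a checkpoint name. It assumes one fine-tuned checkpoint per combination, which this README does not confirm, and the identifiers in `CHECKPOINTS` as well as the function name `resolve_checkpoint` are hypothetical placeholders rather than names taken from the project.

```python
from typing import Dict, Tuple

NATIVE_LANGUAGES = ("Japanese", "Mandarin")
MODEL_FAMILIES = ("BART", "GPT-2")

# Hypothetical placeholder identifiers (assumption): replace them with the
# names of the actual fine-tuned checkpoints.
CHECKPOINTS: Dict[Tuple[str, str], str] = {
    (language, family): f"<fine-tuned-{family.lower()}-{language.lower()}>"
    for language in NATIVE_LANGUAGES
    for family in MODEL_FAMILIES
}


def resolve_checkpoint(native_language: str, model_family: str) -> str:
    """Return the checkpoint name for a pair of user selections."""
    key = (native_language, model_family)
    if key not in CHECKPOINTS:
        raise ValueError(
            f"Unsupported selection {key!r}; expected a language in "
            f"{NATIVE_LANGUAGES} and a model in {MODEL_FAMILIES}."
        )
    return CHECKPOINTS[key]
```

Rejecting unknown combinations explicitly keeps a mistyped option from silently falling back to the wrong model.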
Once the options above are selected, enter a sentence and click the 'Generate' button to get the recommended (corrected) sentence, along with POS tags and a dependency tree that visualise the relationships between the words in the sentence.
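The POS tagging and dependency tree output could be produced along the following lines with spaCy. This is a minimal sketch, not the app's actual code: the helper names `token_rows` and `analyse` are hypothetical, and the `en_core_web_sm` model is an assumption that would need to be installed separately.

```python
from typing import Any, Dict, Iterable, List, Tuple


def token_rows(doc: Iterable[Any]) -> List[Dict[str, str]]:
    """Collect text, POS tag, dependency label and head word for each token.

    Works on a spaCy Doc, or on any iterable of objects that expose
    `text`, `pos_`, `dep_` and `head` attributes.
    """
    return [
        {
            "token": token.text,
            "pos": token.pos_,
            "dep": token.dep_,
            "head": token.head.text,
        }
        for token in doc
    ]


def analyse(
    sentence: str, model_name: str = "en_core_web_sm"
) -> Tuple[List[Dict[str, str]], str]:
    """Parse a sentence and return (token rows, dependency tree as SVG markup)."""
    # Imported lazily so the helper above can be used without spaCy installed.
    import spacy
    from spacy import displacy

    nlp = spacy.load(model_name)  # assumed model; any English pipeline with a parser works
    doc = nlp(sentence)
    svg = displacy.render(doc, style="dep", jupyter=False)
    return token_rows(doc), svg
```

The rows returned by `analyse` can be shown as a table, and the SVG string can be embedded in the page as HTML.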
The dataset used in this project comes from the NAIST Lang-8 Learner Corpora. The Python file extract_err-cor-pair_new.py in the repository extracts the data into CSV files. It is a modified version of this original file.
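Once the pairs are in CSV form, they could be loaded for fine-tuning roughly as shown below. This is a sketch under stated assumptions: the column names `error` and `correction` and the function `read_pairs` are hypothetical, since the README does not describe the layout of the extracted files, and dropping pairs whose two sides are identical is a design choice made here for illustration.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(
    handle: TextIO,
    source_col: str = "error",        # assumed column name
    target_col: str = "correction",   # assumed column name
) -> Iterator[Tuple[str, str]]:
    """Yield (learner sentence, corrected sentence) pairs from a CSV file.

    Rows with an empty side, or where both sides are identical, are skipped.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        source = (row.get(source_col) or "").strip()
        target = (row.get(target_col) or "").strip()
        if source and target and source != target:
            yield source, target


# Small in-memory sample standing in for an extracted CSV file.
SAMPLE = io.StringIO(
    "error,correction\n"
    "I has a pen.,I have a pen.\n"
    "She is happy.,She is happy.\n"
    ",Missing source.\n"
)
pairs = list(read_pairs(SAMPLE))
```

For a real file, pass an open handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`.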
- BART Model
- CaliberAI/streamlit-nlg-gpt-2
- Video: Deploy Fine Tuned BERT or Transformers model on Streamlit Cloud
- OpenAI GPT2
- Gupta, S. (2020, December 10). Parts-of-Speech tagging app using Streamlit, spacy and Python. Srishti Gupta's Blog. https://srishti.hashnode.dev/parts-of-speech-tagging-app-using-streamlit-spacy-and-python
- Lang8-NAIST-extractor