Express your thoughts — achieve your goals!
| 🔥 Try Now! | 💿 Docs | 🏆 Competition | 🏋️♀️ Weights | 📈 Monitor |
Writing skills are essential for the modern person and should be developed throughout life. A good piece of writing can help in career, relationships, personal effectiveness, and even self-understanding. However, improving this competency is difficult in the absence of a reviewer.
Accordingly, creating an open-source automated text evaluator is a natural step toward better writing skills in society. First, it can speed up the essay review process for teachers. Second, such a tool can make assessment less biased. Third, it can help non-native speakers identify linguistic gaps and thereby facilitate learning.
As part of the Feedback Prize competition, our goal is to create an automated solution that scores students' essays on multiple criteria: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. For each criterion, the system assigns a score from 1.0 to 5.0.
Check out the current version, available at this link!
- GNU `make` utility (link)
- Python 3.7.13 (link)
- `poetry` package manager (link)
- At least 2 GB of free disk space
Once all of the above is satisfied, execute the following to set up poetry:

```bash
poetry lock
poetry install --no-root
```
To your delight, building the project is done via a single command:

```bash
poetry run make build
```
Our MLOps pipeline is powered by the Hydra package, which lets us configure experiments via YAML files. This directory contains the nested configuration for our experiments.
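For orientation, a Hydra entry point typically looks like the minimal sketch below. The config path and config name here are assumptions for illustration; the actual `src/main.py` may be organized differently.

```python
# Minimal Hydra entry-point sketch; the config path and name are assumptions,
# not the exact contents of src/main.py.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="config/conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra composes the final config from the yaml tree plus CLI overrides,
    # e.g. `+experiment=sanity_constant_predictor`.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```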
Try running a baseline with the following command:

```bash
poetry run python -m src.main +experiment=sanity_constant_predictor
```
The directory `src/config/conf/experiments` contains the basic set-ups we used in the competition. We highly encourage you to fine-tune these configurations and create your own to achieve even better results!
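As a rough illustration, an experiment file in that directory might look like the sketch below; every key shown here (model name, training settings, fold count) is a hypothetical example rather than the repository's actual schema.

```yaml
# experiments/my_experiment.yaml -- hypothetical example, not the real schema
# @package _global_
model:
  name: microsoft/deberta-v3-large
  max_length: 512
training:
  learning_rate: 2e-5
  epochs: 5
  n_folds: 5
```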
Our top-performing solution is based on the fine-tuned DeBERTa model `deberta-v3-large` and six CatBoost regressors that predict the analytic measures; a rough sketch of this two-stage design follows below. On top of this solution, we built an automatic essay evaluation system deployed as a Hugging Face demo.
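For the curious, the sketch below shows the general shape of such a two-stage pipeline: mean-pooled DeBERTa embeddings feeding one CatBoost regressor per analytic measure. The pooling strategy, hyperparameters, and toy data are our own illustrative assumptions, not the exact competition code.

```python
# Illustrative two-stage pipeline: DeBERTa embeddings -> per-measure CatBoost.
# Pooling, hyperparameters, and the toy data below are assumptions for demo purposes.
import torch
from catboost import CatBoostRegressor
from transformers import AutoModel, AutoTokenizer

MEASURES = ["cohesion", "syntax", "vocabulary",
            "phraseology", "grammar", "conventions"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
encoder = AutoModel.from_pretrained("microsoft/deberta-v3-large").eval()


def embed(texts):
    """Mean-pool the encoder's last hidden state into one vector per essay."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (batch, dim)


# Toy training data; replace with the real essays and analytic scores.
essays = ["I think school uniforms are a good idea because ...",
          "Technology have changed the way students learn ..."]
labels = {m: [3.0, 2.5] for m in MEASURES}

features = embed(essays)
models = {}
for measure in MEASURES:
    # One regressor per analytic measure, trained on the pooled embeddings.
    model = CatBoostRegressor(loss_function="RMSE", iterations=100, verbose=False)
    model.fit(features, labels[measure])
    models[measure] = model

print({m: float(models[m].predict(features)[0]) for m in MEASURES})
```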
The interface is intuitive and user-friendly: the entire workflow is guided by textual annotations. The user is asked to paste an essay into the corresponding text field. Once the document is ready, our system runs model inference and visualises the results in the very same window. The essay shown seems to belong to a solid B student. Good for them!
In terms of the ISO 25010 standard, this project mainly focuses on performance efficiency: it should ...
- achieve a tolerable mean columnwise root mean squared error (MCRMSE, the competition metric; see the formula after this list);
- perform fast at inference (leaderboard position depends on runtime);
- utilize the GPU effectively (affects the duration of each experiment).
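For reference, MCRMSE averages the per-column RMSE over the scored measures:

$$\mathrm{MCRMSE} = \frac{1}{N_t}\sum_{j=1}^{N_t}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{ij} - \hat{y}_{ij}\right)^2},$$

where $N_t = 6$ is the number of analytic measures, $n$ is the number of essays, and $y_{ij}$, $\hat{y}_{ij}$ are the true and predicted scores of essay $i$ on measure $j$.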
Still, we also put significant effort into (partially) automating routine operations and keeping contributors from violating style rules or shipping broken code:
- Using Poetry to avoid dependency hell (a replacement for `pip`);
- A continuous integration workflow that performs linting according to PEP 8 and unit/integration testing;
- A submission workflow that uploads our best-performing solution to a Kaggle kernel;
- Configurable experiments via Hydra that keep our studies clean and structured;
- Syncing experiments to Weights & Biases, which helps us monitor the progress of our experiments;
- Automated project builds via a Makefile;
- Evaluation via cross-validation, considered the most objective among possible ways to assess a model's generalization;
- Reproducible experimentation that guarantees the same set-up gives identical results on different machines;
- Notifications in Telegram when training is completed;
- Thoughtful pull request reviews (e.g. here); our inspection was based on the book "Clean Code" by Uncle Bob;
- Badges with codecov, codacy, continuous integration, and Kaggle submission;
- Used Snyk to find vulnerabilities, e.g., in this PR;
- Used the `sphinx` package to auto-generate our documentation;
- Tried to attach commits to tickets (mostly in the latter part of development);
- Pre-commit hooks that run autopep8, dependency sorting, and autoflake (a sample configuration is sketched after this list).
  - Disclaimer: this practice is considered to be harmful and can slow down development! We actively applied it only in the early stages of development. Use it wisely!
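For reference, a pre-commit configuration of this kind might look like the sketch below; the hook revisions are placeholders, so pin them to whatever versions your project actually uses.

```yaml
# .pre-commit-config.yaml -- illustrative sketch; revs below are placeholders
repos:
  - repo: https://github.com/pre-commit/mirrors-autopep8
    rev: v2.0.0  # placeholder revision
    hooks:
      - id: autopep8
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1  # placeholder revision
    hooks:
      - id: isort
  - repo: https://github.com/PyCQA/autoflake
    rev: v2.0.0  # placeholder revision
    hooks:
      - id: autoflake
        args: [--in-place, --remove-all-unused-imports]
```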
In our development process we followed the practices described by Uncle Bob in his magnificent "Clean Code". Please consult this book in case of any trouble.
Fork this repository and develop your own tool. Make sure it is error-free and the test coverage is at least 60 percent. Update the config files accordingly, and check that they still work.
While producing your code, use this famous git workflow. Also note that our branches use the prefixes `feature/`, `fix/`, and `ci-cd/`.
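For example, a branch adding a new feature would be created like this (the branch name is just an illustration):

```bash
git checkout -b feature/essay-length-feature
```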
Then send a pull request. In the description, list the main features of your tool, the technology stack used, and a brief description of the algorithms. This should be enough for us to accept your code.
To check the quality of the code, we use `flake8` and `codacy`.
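Assuming `flake8` is available in the Poetry environment, you can run the same check locally before opening a pull request:

```bash
poetry run flake8 .
```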
Shamil Arslanov
Email: s.arslanov@innopolis.university
GitHub: @homomorfism
Maxim Faleev
Email: m.faleev@innopolis.university
GitHub: @implausibleDeniability
Danis Alukaev
Email: d.alukaev@innopolis.university
GitHub: @DanisAlukaev
Lunguask is free and open-source software licensed under the MIT License.