Vitsyn-Morgunov-and-Nikulin/automatic-essay-evaluator

Express your thoughts — achieve your goals!

📝 About the project

Writing skills are essential in modern life and should be developed throughout one's lifetime. A good piece of writing can help in one's career, relationships, personal effectiveness, and even self-understanding. However, improving this skill is difficult without a reviewer.

Accordingly, creating an open-source automated text evaluator is a natural step towards better writing skills across society. First, it can speed up essay review by teachers. Second, such a tool can make assessment less biased. Third, it can help non-native speakers identify linguistic gaps and thereby facilitate learning.

As part of the Feedback Prize competition, our goal is to create an automated solution that scores students' essays on multiple criteria: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. For each criterion, the system assigns a score from 1.0 to 5.0.

⚡ Getting Started

Use our service

Check out the current version of our service, available at this link!

Prerequisites:

  1. GNU Make
  2. Python 3.7.13
  3. The Poetry package manager
  4. At least 2 GB of free disk space

Once all of the above are satisfied, run the following to set up the Poetry environment:

poetry lock
poetry install --no-root

Run application locally

To your delight, it's done via a single command:

poetry run make build

Run experiments

Our MLOps pipeline is powered by the Hydra package, which lets us configure experiments via YAML files. This directory contains the nested configuration for our experiments.

Try running a baseline with the following command:

poetry run python -m src.main +experiment=sanity_constant_predictor

The directory src/config/conf/experiments contains the basic setups we used in the competition. We highly encourage you to fine-tune these configurations and create your own to achieve even better results!
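The name of the sanity_constant_predictor experiment suggests a baseline that ignores the essay text and always returns the same scores. As a rough sketch of that idea only (not the repository's code), the snippet below assumes the constant is the per-criterion mean of the training labels. The class name ConstantPredictor and its fit/predict methods are hypothetical.

```python
from statistics import mean

# The six criteria scored in the competition.
CRITERIA = ("cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions")


class ConstantPredictor:
    """Hypothetical sanity baseline: always predicts the training mean."""

    def fit(self, essays, scores):
        # scores is a list of dicts mapping criterion name -> score.
        self.means_ = {c: mean(row[c] for row in scores) for c in CRITERIA}
        return self

    def predict(self, essays):
        # The essay text is ignored on purpose; only the number of essays matters.
        return [dict(self.means_) for _ in essays]


train_essays = ["first essay", "second essay"]
train_scores = [
    {c: 3.0 for c in CRITERIA},
    {c: 4.0 for c in CRITERIA},
]
baseline = ConstantPredictor().fit(train_essays, train_scores)
predictions = baseline.predict(["an unseen essay"])
print(predictions[0])
```

A baseline like this is mainly useful as a floor: any trained model that fails to beat it points to a problem in the pipeline rather than in the model.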

📖 How it works?

Our top-performing solution is based on a fine-tuned DeBERTa model (deberta-v3-large) and six CatBoost regressors that predict the analytical measures. On top of this solution we built an automatic essay evaluator powered by the Hugging Face demo engine.
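One way to picture how the two components could fit together is sketched below. It assumes, and this is not confirmed by the description above, that the fine-tuned encoder turns an essay into a feature vector and that each of the six regressors maps those features to one criterion. The embed and regressors arguments are hypothetical stand-ins for DeBERTa and CatBoost, and clipping to the 1.0 to 5.0 range is likewise an assumption.

```python
from typing import Callable, Dict, List

CRITERIA = ("cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions")

# Hypothetical interfaces: a text encoder and a per-criterion regressor.
Embedder = Callable[[str], List[float]]
Regressor = Callable[[List[float]], float]


def clip_score(value: float, low: float = 1.0, high: float = 5.0) -> float:
    """Keep a raw prediction inside the valid score range."""
    return max(low, min(high, value))


def evaluate_essay(
    text: str, embed: Embedder, regressors: Dict[str, Regressor]
) -> Dict[str, float]:
    """Run one shared encoder, then one regressor per criterion."""
    features = embed(text)
    return {c: clip_score(regressors[c](features)) for c in CRITERIA}


# Toy stand-ins so the sketch runs without any ML libraries.
def toy_embed(text: str) -> List[float]:
    words = text.split()
    return [float(len(words)), float(len(set(words)))]


toy_regressors = {c: (lambda features: 1.0 + 0.5 * features[1]) for c in CRITERIA}

scores = evaluate_essay("writing is a skill worth practising", toy_embed, toy_regressors)
print(scores)
```

Keeping the encoder and the regressors behind simple callables makes it easy to swap in the real models while leaving the scoring loop unchanged.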

drawing

The interface is intuitive and user-friendly: the entire workflow is guided by textual annotations. The user is asked to paste an essay into the corresponding text field. Once the document is ready, our system runs inference with the model and visualises the results in the same window. In the example shown, the essay seems to belong to a solid B student. Good for them!

🚀 Quality Ensuring

In terms of the ISO 25010 standard, this project mainly focuses on performance efficiency. It should:

  • achieve tolerable mean columnwise root mean squared error (competition metric);
  • perform fast on inference (position in leaderboard depends on runtime);
  • utilize GPU effectively (affects time of each experiment).
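For reference, the competition metric in the first bullet averages the root mean squared error over the score columns. The sketch below is a plain-Python illustration of that definition. The function name mcrmse and the list-of-rows input layout are assumptions made for this example, not the project's evaluation code.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise root mean squared error.

    y_true and y_pred are equally sized lists of rows,
    where each row holds one score per criterion.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n_rows = len(y_true)
    n_cols = len(y_true[0])

    column_rmses = []
    for col in range(n_cols):
        # RMSE of a single criterion across all essays.
        squared_error = sum((t[col] - p[col]) ** 2 for t, p in zip(y_true, y_pred))
        column_rmses.append(math.sqrt(squared_error / n_rows))

    # Average the per-criterion errors into one number.
    return sum(column_rmses) / n_cols


# Two essays, two criteria: the first column is predicted perfectly,
# the second is off by exactly one point on both essays.
truth = [[3.0, 4.0], [3.0, 2.0]]
preds = [[3.0, 3.0], [3.0, 3.0]]
print(mcrmse(truth, preds))
```

Because each column is scored separately before averaging, a model that is weak on a single criterion is penalised even when its overall error looks small.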

Still, we also put significant effort into (partially) automating routine operations and preventing programmers from violating style rules and committing non-working code.

✏️ How to contribute?

In our development process we followed the practices described by Uncle Bob in his magnificent "Clean Code". Please consult this book if you run into any trouble.

Make a fork of this repository and develop your own tool. Make sure it is error-free and that test coverage is at least 60 percent. Update the config files accordingly and check that they work.

While writing your code, use this famous git workflow. Also note that our branches use the prefixes feature/, fix/, and ci-cd/.

Then send a pull request. In the description, list the main features of the tool, the technology stack used, and a brief description of the algorithms. This should be enough for us to accept your code.

To check code quality, we use flake8 and Codacy.

💻 Contributors

Shamil Arslanov
      Email: s.arslanov@innopolis.university
      GitHub: @homomorfism

Maxim Faleev
      Email: m.faleev@innopolis.university
      GitHub: @implausibleDeniability

Danis Alukaev
      Email: d.alukaev@innopolis.university
      GitHub: @DanisAlukaev

📃 Licence

Lunguask is free and open-source software licensed under the MIT License.