Skip to content

The purpose of the study is to automate the analysis of scientific and educational documents in the context of research works using LLM

License

Notifications You must be signed in to change notification settings

LISA-ITMO/Edulytica

Repository files navigation

ITMO license python codecov build docs

Edulytica

logo

Description

The purpose of the application is to automate the analysis of scientific and educational documents in the context of research works using LLM (Large language model, large language models) to reduce the time and intellectual costs of teachers. The result is two LLMs, trained on specially collected data, capable of summarizing the text of a large document and revealing whether the stated goals and objectives of the work have been achieved.

Features

Please help us improve this project, share your feedback with opening issue!

Installation

1. Clone the repository

git clone https://github.com/LISA-ITMO/Edulytica.git

2. Activate venv

source ~/PyProject/Edulytica/api_venv/bin/activate

3. Install requirements

pip install -r requirements.txt

4. Start Application

python3 src/edulytica_api/app.py

5. Activate Celery

celery -A src.edulytica_api.celery.tasks worker --loglevel=info -E -P gevent

6. Run npm

npm start

7. Run Celery task

celery -A src.edulytica_api.celery.tasks flower

Getting started

First, you can familiarize yourself with the examples in JSON format of the system's responses to the test sample of works.

When you have managed to launch the service, you can send the documents yourself and get acquainted with the results of their verification!

Documentation

Details of the documentation can be found at the links below:

  • algorithms - part of the task of analyzing the text how much it is necessary to change the source text (which is written by AI) so that AI recognition systems do not recognize AI in this text;
  • data_handling - an auxiliary module that stores parsers of data and documents for generating datasets;
  • edulytica_api - this module stores the source code of the web service;
  • extracting_rules - This module is devoted to an experiment with extracting design rules using LLM;
  • rag - Package for an experiment with semantic search, kNN and the mBERT model are used.

Code documentation is available at the link.

Requirements

For more information, see the file requiremets.txt.

Contacts

Our contacts:

Conferences

  1. XIII Конгресс молодых ученых ИТМО:
    • Дворников А.С., Стрижов Д.А., Унтила А.А., Федоров Д.А. ИССЛЕДОВАНИЕ СГЕНЕРИРОВАННОГО ТЕКСТА НА ПРЕДМЕТ РАСПОЗНАВАНИЯ ИЗМЕНЕНИЙ СЕРВИСАМИ ИДЕНТИФИКАЦИИ ИСКУССТВЕННОГО ИНТЕЛЛЕКТА - 2024;
    • Мищенко М.Ю., Мустафин Д.Э., Унтила А.А. Оценка релевантности неструктурированных данных для анализа и дообучения LLM - 2024;
    • Маракулин А.А., Дедкова А.В., Аминов Н.С., Федоров Д.А. СРАВНИТЕЛЬНЫЙ АНАЛИЗ МЕТОДОВ PEFT ДЛЯ ДООБУЧЕНИЯ БОЛЬШИХ ЯЗЫКОВЫХ МОДЕЛЕЙ - 2024;
    • Богданов М.А., Никифоров М.А., Аминов Н.С., Терещенко В.В., Федоров Д.А. АНАЛИЗ БОЛЬШИХ ДОКУМЕНТОВ ПРИ ПОМОЩИ БОЛЬШИХ ЯЗЫКОВЫХ МОДЕЛЕЙ - 2024;
  2. 53 конференция ППС:
    • Мустафин Д.Э., Крылов М.М., Терещенко В.В.ХРАНЕНИЕ ГЕТЕРОГЕННЫХ ДАННЫХ ДЛЯ ИХ ПОСЛЕДУЮЩЕЙ ОБРАБОТКИ - 2023;
    • Богданов М.А., Терещенко В.В., Аминов Н.С.ПРЕДВАРИТЕЛЬНЫЙ АНАЛИЗ ДОКУМЕНТОВ УЧЕБНОГО ПРОЦЕССА ДЛЯ ПОСЛЕДУЮЩЕГО ИХ ТЕМАТИЧЕСКОГО МОДЕЛИРОВАНИЯ - 2023;
    • Синюков Л.В., Лаптев Е.И., Терещенко В.В.ОЦЕНКА ВЛИЯНИЯ ОБРАЗОВАТЕЛЬНЫХ ДИСЦИПЛИН НА РЕЗУЛЬТАТ КУРСОВЫХ РАБОТ С ИСПОЛЬЗОВАНИЕМ ТЕМАТИЧЕСКОГО МОДЕЛИРОВАНИЯ - 2023;
    • Дворников А.С., Стрижов Д.А., Аминов Н.С. РАЗРАБОТКА LLM-МОДЕЛИ КЛАССИФИКАЦИИ ТЕКСТА С ЦЕЛЬЮ АВТОМАТИЧЕСКОГО ОПРЕДЕЛЕНИЯ ДОКУМЕНТА, НАПИСАННОГО ИСКУССТВЕННЫМ ИНТЕЛЛЕКТОМ - 2023.

Authors

Tereshchenko Vladislav
Martsinkevich Viacheslav
Aminov Natig
Mischenko Maxim
Bogdanov Maxim
Dvornikov Artem
Laptev Egor
Sinyukov Lev

About

The purpose of the study is to automate the analysis of scientific and educational documents in the context of research works using LLM

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published