Text2Trait is a project that combines a user-friendly frontend application with a backend algorithm powered by the [LasUIE tool](https://github.com/ChocoWu/LasUIE/tree/master). Each part of the application requires different libraries, so every folder with a given utility contains a `requirements.txt` or `pyproject.toml` file that lets you install the appropriate versions of its dependencies.
- Frontend Application
- Backend Algorithm (LasUIE-based)
- Utility Scripts
- Dataset
The frontend is relatively easy to use and designed for quick setup.
- Install all dependencies listed in the `pyproject.toml` file.
- Locate and run the `app.py` script: `python app.py`
- After running the script, your terminal will display a message similar to:
  `Running on http://127.0.0.1:5000/`
- Open the displayed link in your browser; the application should load immediately.

Every part of the code in this section is thoroughly commented. If you are unsure how a particular method works, its description is located directly below the method definition.
This backend leverages the LasUIE model and is centered around three key files:
- `run_finetune.py`
- `run_inference.py`
- `config.json`

- `run_finetune.py`: fine-tunes a selected backbone model from popular GLMs such as T5, BERT, or Flan-T5.
  - A wide range of hyperparameters can be configured.
  - Due to limitations of the original implementation, several updates were made to align it with the LasUIE workflow.
  - All modifications are clearly marked in the code for easy reference.
  - Additionally, `utils.py` (inside the `engine` folder) has been updated with similar improvements.
- `run_inference.py`: designed for straightforward usage:
  - Set your desired hyperparameters in the file.
  - Ensure the correct directory is selected.
  - Run the script.

  ⚠️ Note: When using a trained model from the `checkpoint` folder, make sure to update the model name in the file to match the one in the folder.
- `config.json`: a configuration file containing general parameters that influence both training and inference, such as:
  - Backbone model type
  - Learning rate
  - Other key settings
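The configuration and checkpoint steps above can be illustrated with a small helper. This is a minimal sketch, not code from the repository: the key names `backbone` and `learning_rate`, their default values, and the assumption that each trained model lives in its own subfolder of `checkpoint` are all hypothetical. It loads a JSON config (falling back to defaults for missing keys) and checks that the model name set for inference actually matches a folder in `checkpoint`, which is the mismatch the note above warns about.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical defaults for illustration; the real key names in
# Text2Trait's config.json may differ.
DEFAULTS: dict[str, Any] = {"backbone": "t5-base", "learning_rate": 1e-4}


def load_config(path: str) -> dict[str, Any]:
    """Read a JSON config file; keys missing from it fall back to DEFAULTS."""
    with Path(path).open(encoding="utf-8") as f:
        user_cfg = json.load(f)
    if not isinstance(user_cfg, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return {**DEFAULTS, **user_cfg}  # values from the file override defaults


def resolve_checkpoint(checkpoint_dir: str, model_name: str) -> Path:
    """Return checkpoint_dir/model_name, or fail listing the names that exist.

    Assumes (hypothetically) that each trained model is saved as its own
    subfolder of checkpoint_dir.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {root}")
    candidate = root / model_name
    if candidate.is_dir():
        return candidate
    # Report the available names so the user knows what to set instead.
    available = sorted(p.name for p in root.iterdir() if p.is_dir())
    raise ValueError(f"model {model_name!r} not in {root}; available: {available}")
```

For example, `cfg = load_config("config.json")` followed by `resolve_checkpoint("checkpoint", "your-model-name")` fails early with a readable message instead of an opaque loading error deeper in the inference run.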
This section provides a collection of lightweight, well-documented scripts to streamline data preparation for training. Each script is clearly named and does exactly what it promises. You can use them to:
- Convert PDF data into `.txt` and `.json` formats
- Transform Excel data into the required JSON training format
- Split datasets into train, validation (dev), and test sets
- Transform inference data into a knowledge graph that the application uses to visualize results
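Splitting a dataset into train, dev, and test sets typically follows the pattern below. This is a minimal sketch rather than the repository's own script: the function name `split_dataset`, the default 80/10/10 ratios, and the seed are assumptions chosen for illustration. A seeded shuffle keeps the split reproducible, and the test set takes whatever remains so that every record lands in exactly one part.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_dataset(
    records: Sequence[T],
    train_ratio: float = 0.8,
    dev_ratio: float = 0.1,
    seed: int = 42,
) -> tuple[list[T], list[T], list[T]]:
    """Shuffle records and split them into train, dev and test lists.

    The test set receives whatever remains after train and dev, so the
    three parts always cover every record exactly once.
    """
    if train_ratio < 0 or dev_ratio < 0 or train_ratio + dev_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    shuffled = list(records)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split

    n_train = int(len(shuffled) * train_ratio)
    n_dev = int(len(shuffled) * dev_ratio)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test
```

Each returned list can then be written to its own JSON file with `json.dump`; using a dedicated `random.Random` instance avoids disturbing the global random state used elsewhere in training.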