
improve pylaia dataset parser #12

Open
milanalimova wants to merge 4 commits into achimrabus:main from milanalimova:fix-pylaia-dataset-parser

@milanalimova

Example usage:
python convert_to_pylaia_new.py --input_train_csv output_from_transkribus_parser\train.csv --input_val_csv output_from_transkribus_parser\val.csv --output_dir output_dir\ --train_img_root output_from_transkribus_parser\ --val_img_root output_from_transkribus_parser\ --height 96 --process_images_from train
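
For readers who want to see the interface at a glance, here is a hypothetical argparse sketch that accepts the flags used in the command. Only the flag names come from the command; the `build_parser` name, the `int` type for `--height`, and the help texts are assumptions. This is not the script's actual source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: flag names are taken from the example
    # command; types and help texts are guesses, not the script's source.
    p = argparse.ArgumentParser(
        description="Convert train/val CSV splits to a PyLaia-style dataset (sketch)."
    )
    p.add_argument("--input_train_csv", help="CSV file for the training split")
    p.add_argument("--input_val_csv", help="CSV file for the validation split")
    p.add_argument("--output_dir", help="directory the converted dataset is written to")
    p.add_argument("--train_img_root", help="assumed base directory of training images")
    p.add_argument("--val_img_root", help="assumed base directory of validation images")
    # Assumed to be a pixel height that line images are resized to.
    p.add_argument("--height", type=int, help="target line-image height (assumed pixels)")
    # Semantics unknown beyond the example value 'train'.
    p.add_argument("--process_images_from", help="which split's images to process, e.g. 'train'")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With the command above, `--height` would arrive as the integer 96 and `--process_images_from` as the string `train`.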

achimrabus added a commit that referenced this pull request Feb 20, 2026
Parser now accepts both "image.png text" (space) and "image.png,text" (CSV)
formats, auto-detected per line. Fixes compatibility with convert_to_pylaia.py
variants and milanalimova's PR #12 without breaking existing datasets.

Also updates module docstring to reference Puigcerver (2017) architecture
and clarify that the PyLaia package is not required.
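
A minimal sketch of per-line format auto-detection as described in this commit message. It is not the code from the commit. The detection rule (whichever of `,` or a space comes first on the line decides the format) and the names `parse_line` and `parse_lines` are assumptions, and the sketch presumes image file names contain neither character.

```python
import csv
from typing import Iterable, Iterator, Tuple


def parse_line(line: str) -> Tuple[str, str]:
    """Split one dataset line into (image_path, text).

    Hypothetical rule: the separator is whichever of ',' or ' ' occurs
    first, so image names must not contain either character.
    """
    line = line.rstrip("\r\n")
    comma = line.find(",")
    space = line.find(" ")
    if comma != -1 and (space == -1 or comma < space):
        # CSV style "image.png,text": the csv module unquotes quoted text,
        # and extra unquoted commas are rejoined into the transcription.
        fields = next(csv.reader([line]))
        return fields[0], ",".join(fields[1:])
    # Space style "image.png text": split only on the first space so the
    # rest of the line (including further spaces) stays in the text.
    image, _, text = line.partition(" ")
    return image, text


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (image_path, text) pairs, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_line(line)
```

Because the decision is made per line, a file that mixes both styles still parses. A stricter variant could also check that the first field ends in an image extension before accepting the CSV interpretation.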