This project translates table images to editable texts and outputs a structured JSON files.
Python 3.7
OpenCV 3.4.2
Tesseract 5.0.0
Jsonschema 3.0.2
Make some changes in read_table_image.py first and then run it directly.
- Change
imgPathandtarget_dirto your own path of the input images and the output location. - To recognize special characters, download the 'traineddata' files in
tessdataor fine-tune the model through Tesseract. In the defined function img2text, add the name of the traineddata you need to the parameter'-lang'. Use '+' between different traineddata. - Save processed images during the processes by indications in the comments.
- The output is a structured JSON file named as
'image name + _table.json'. An image showing cell detection result in a table is also provided.