Read-table-image

This project converts images of tables into editable text and outputs a structured JSON file.

Requirements

Python 3.7
OpenCV 3.4.2
Tesseract 5.0.0
Jsonschema 3.0.2

Edit read_table_image.py as described below, then run it directly.

Set imgPath to the path of your input images and target_dir to the output location.
To recognize special characters, download the 'traineddata' files in tessdata or fine-tune the model with Tesseract. In the img2text function, add the names of the traineddata files you need to the '-lang' parameter, separating multiple names with '+'.
To save the intermediate processed images, follow the instructions in the code comments.
The output is a structured JSON file named 'image name + _table.json'. An image showing the cell detection result for the table is also produced.
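
The two conventions above (joining traineddata names with '+' and naming the output file) can be sketched in a few lines of Python. This is only an illustration and is not taken from read_table_image.py: the helper names build_lang_arg and output_json_path are hypothetical, and the sketch assumes that "image name" means the input file name without its extension.

```python
import os


def build_lang_arg(traineddata_names):
    """Join traineddata names with '+', e.g. ['eng', 'chi_sim'] -> 'eng+chi_sim'.

    Hypothetical helper: the resulting string is what would be passed
    as the language parameter inside img2text.
    """
    names = [n.strip() for n in traineddata_names if n and n.strip()]
    if not names:
        raise ValueError("at least one traineddata name is required")
    return "+".join(names)


def output_json_path(img_path, target_dir):
    """Build the output path '<image name>_table.json' inside target_dir.

    Assumption: 'image name' is the file name without its extension.
    """
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return os.path.join(target_dir, stem + "_table.json")


if __name__ == "__main__":
    # Example values only; replace with your own paths and languages.
    print(build_lang_arg(["eng", "chi_sim"]))
    print(output_json_path("/data/in/page1.png", "/data/out"))
```

If the script actually keeps the extension in the output name, only the stem computation in output_json_path would need to change.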
