Implementation of a Flask application for semi-automatic labeling of paragraphs in document images
- Install Python 3 and pip (if you haven't installed them yet)
- Install the required packages via pip:
pip install -r requirements.txt
- Install Tesseract for your OS and specify the path to it:
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract' # path to Tesseract
- Move the images to be labeled to the data/images folder
- Run the initialize.py script:
python initialize.py
- Edit the labels dictionary in the app.py file for your task:
labels = {
'header': (255, 0, 0),
'text': (0, 255, 0),
'list': (0, 0, 255),
}
- Launch the application:
python app.py
- Go to localhost:5000 and label your images
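
The two configuration steps above (the Tesseract path and the labels dictionary) can be sketched as follows. This is only an illustration, not code from app.py: the helper names `resolve_tesseract_cmd`, `check_labels` and `number_keys` are hypothetical, and the assumption that labels are numbered from 1 in dictionary order is a guess about how the number keys map to labels.

```python
import shutil

# Fallback path taken from the setup step above; adjust it for your OS.
DEFAULT_TESSERACT = r'C:/Program Files/Tesseract-OCR/tesseract'

labels = {
    'header': (255, 0, 0),
    'text': (0, 255, 0),
    'list': (0, 0, 255),
}


def resolve_tesseract_cmd(fallback=DEFAULT_TESSERACT, which=shutil.which):
    """Return the tesseract executable found on PATH, else the fallback path."""
    found = which('tesseract')
    return found if found else fallback


def check_labels(labels):
    """Raise ValueError unless every colour is a triple of ints in 0..255."""
    for name, colour in labels.items():
        ok = (
            isinstance(colour, (tuple, list))
            and len(colour) == 3
            and all(isinstance(c, int) and 0 <= c <= 255 for c in colour)
        )
        if not ok:
            raise ValueError(f'bad colour for label {name!r}: {colour!r}')


def number_keys(labels):
    """Map '1', '2', ... to label names in dictionary order (assumed numbering)."""
    return {str(i): name for i, name in enumerate(labels, start=1)}


check_labels(labels)
print(resolve_tesseract_cmd())
print(number_keys(labels))
```

The value returned by `resolve_tesseract_cmd()` could then be assigned to `pytesseract.pytesseract.tesseract_cmd` instead of a hard-coded path.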
Press the space key.
While holding the left mouse button, move the cursor to another point and release the mouse button. Select the desired label in the list that appears or press the number of this label on the keyboard.
Hover over the bounding box and click the right mouse button (for smartphones: double-click on the bounding box).
Hover over the bounding box, hold down the left mouse button, drag the bounding box to the desired position and release the mouse button.
Hover over the border of the bounding rectangle, hold down the left mouse button, drag the border to the desired location and release the button.
Press the + or - key to zoom in or out, respectively.
Press the reset button.
Press the save button: the labeled image is moved from the images folder to the labels folder, and a .json file and a .jpg image with the bounding boxes are created.
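
To make the save step more concrete, here is a minimal sketch of moving an image into a labels folder and writing its boxes to a .json file. The function name `save_labeled` and the JSON layout (`image`, `boxes`, `label`, `x`, `y`, `width`, `height`) are assumptions made for illustration; the actual format written by the application may differ. The sketch also leaves out drawing the boxes onto the .jpg.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_labeled(image_path, boxes, labels_dir):
    """Move the image into labels_dir and write its boxes next to it as JSON."""
    image_path = Path(image_path)
    labels_dir = Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical schema: one record per box with a label and pixel coordinates.
    record = {'image': image_path.name, 'boxes': boxes}
    json_path = labels_dir / (image_path.stem + '.json')
    json_path.write_text(json.dumps(record, indent=2), encoding='utf-8')

    moved = labels_dir / image_path.name
    shutil.move(str(image_path), str(moved))
    return moved, json_path


# Demo in a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'images').mkdir()
    src = root / 'images' / 'page1.jpg'
    src.write_bytes(b'')  # empty placeholder file, not a real JPEG
    boxes = [{'label': 'header', 'x': 10, 'y': 20, 'width': 300, 'height': 40}]
    moved, json_path = save_labeled(src, boxes, root / 'labels')
    saved = json.loads(json_path.read_text(encoding='utf-8'))
    print(moved.name, saved['boxes'][0]['label'])
```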