Layout Parser text boxes not properly aligned causing incorrect sorting of text boxes #50

@farazk86

Hi,

I'm using Layout Parser to run OCR on a research paper, but on almost every page of the PDF the detected text boxes are not properly aligned. For example, I input this page:

[image: input page]

I perform detection using:

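import layoutparser as lp

# Load the PubLayNet Mask R-CNN model; keep only detections scoring at least 0.8
model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})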
layout = model.detect(image)

# Show the detected layout of the input image
lp.draw_box(image, layout, box_width=3)

The detected layout is shown below:

[image: detected layout]

As can be seen, the bottom-left box is not properly aligned, which causes a problem with the sort script given in the tutorial:

# sort the left and right blocks and assign id to each
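# PIL's Image.size is (width, height); for a NumPy array use image.shape[:2], which is (height, width)
w, h = image.size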

left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)

left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1])

right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1])

# And finally combine the two list and add the index
# according to the order
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])

# visualize the cleaned text blocks
lp.draw_box(image, text_blocks,
            box_width=3, 
            show_element_id=True)

[image: sorted text blocks with element ids]

The misaligned box is given an index of 0, which is not correct.

Is there any way to avoid this problem?

Thank you
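
For reference, the column split and sort from the script above can be reproduced without the library, which may help when checking which column a given box lands in. The following is only a minimal sketch under stated assumptions: `Box` and `reading_order` are hypothetical names (not part of layoutparser), the coordinates are made up for a 1000-pixel-wide page, and the `<=` comparison on the box center is an approximation of what `filter_by(..., center=True)` does, not a copy of its implementation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a detected text block (not a layoutparser type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center_x(self) -> float:
        return (self.x1 + self.x2) / 2


def reading_order(boxes, page_width, tolerance=1.05):
    """Left column top to bottom, then right column top to bottom.

    A box is assigned to the left column when its horizontal center is at or
    before the split line (page_width / 2 * tolerance).
    """
    split = page_width / 2 * tolerance
    left = sorted((b for b in boxes if b.center_x <= split), key=lambda b: b.y1)
    right = sorted((b for b in boxes if b.center_x > split), key=lambda b: b.y1)
    return left + right


# Made-up coordinates; the last box belongs to the left column visually,
# but its detected extent spills far past the page midline.
boxes = [
    Box(520, 100, 950, 400),   # right column, top
    Box(50, 100, 480, 400),    # left column, top
    Box(520, 450, 950, 900),   # right column, bottom
    Box(60, 440, 1000, 900),   # left column, bottom, over-wide detection
]

ordered = reading_order(boxes, page_width=1000)
for idx, box in enumerate(ordered):
    print(idx, box)
```

In this made-up case the over-wide box has its center at x = 530, just past the split at x = 525, so it is sorted into the right column even though it starts at the left margin. Printing the center of each detected block against the split value is one way to see whether the same thing happens on a real page.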

Labels: enhancement (New feature or request)