Merged
8 changes: 2 additions & 6 deletions apps/common/handle/impl/doc_split_handle.py
@@ -112,11 +112,7 @@ def get_image_id(image_id):

 title_font_list = [
     [36, 100],
-    [26, 36],
-    [24, 26],
-    [22, 24],
-    [18, 22],
-    [16, 18]
+    [30, 36]
 ]


@@ -130,7 +126,7 @@ def get_title_level(paragraph: Paragraph):
     if len(paragraph.runs) == 1:
         font_size = paragraph.runs[0].font.size
         pt = font_size.pt
-        if pt >= 16:
+        if pt >= 30:
             for _value, index in zip(title_font_list, range(len(title_font_list))):
                 if pt >= _value[0] and pt < _value[1]:
                     return index + 1
Contributor Author

Some notes on this change:

  1. get_image_id appears only in the hunk header as context; the diff does not modify it.

  2. In the get_title_level function:

    • The guard changes from pt >= 16 to pt >= 30. It is redundant with the range checks inside the loop, since any size below 30 fails every range in the new title_font_list.
  3. title_font_list is a list literal that now covers only 30 pt and above: the five ranges from [26, 36] down to [16, 18] are replaced by a single [30, 36]. Paragraphs set in 16 pt up to just under 30 pt, which previously mapped to levels 2 through 6, are no longer detected as titles by font size. If such headings exist in the documents being split, the list should keep ranges that cover them.

  4. The ranges are half-open. Sizes from 30 up to (but not including) 36 match [30, 36] and map to level 2, sizes from 36 up to (but not including) 100 match [36, 100] and map to level 1, and sizes of 100 pt or more match nothing.
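
The effect of the narrowed list on title detection can be checked in isolation. The following is a minimal sketch, not code from doc_split_handle.py: level_for_pt is a hypothetical helper that takes a plain point size instead of a python-docx Paragraph, the two list names are made up for the comparison, and returning None when nothing matches is an assumption.

```python
# Hypothetical helper and names for illustration; not part of doc_split_handle.py.
OLD_TITLE_FONT_LIST = [[36, 100], [26, 36], [24, 26], [22, 24], [18, 22], [16, 18]]
NEW_TITLE_FONT_LIST = [[36, 100], [30, 36]]


def level_for_pt(pt, ranges):
    """Return the 1-based index of the half-open range [low, high) holding pt.

    Returning None when no range matches is an assumption of this sketch.
    """
    for index, (low, high) in enumerate(ranges):
        if low <= pt < high:
            return index + 1
    return None


# Compare the level assigned before and after the change for a few sizes.
for pt in (40, 32, 28, 20, 12):
    print(pt, level_for_pt(pt, OLD_TITLE_FONT_LIST), level_for_pt(pt, NEW_TITLE_FONT_LIST))
```

Sizes such as 28 pt and 20 pt get a level from the old list but none from the new one, which is the behavioral change worth confirming against real documents.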

Here's an improved version of the get_title_level function based on these considerations:

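from docx.text.paragraph import Paragraph

def get_title_level(paragraph: Paragraph):
    if len(paragraph.runs) == 1:
        font_size = paragraph.runs[0].font.size
        if font_size is None:
            # python-docx reports None when the run does not set its own size
            return None
        pt = font_size.pt

        # Binary search over title_font_list (the module-level list from the diff).
        # The list is sorted by descending size, so larger fonts sit at lower indices.
        left, right = 0, len(title_font_list) - 1
        while left <= right:
            mid = left + (right - left) // 2
            low, high = title_font_list[mid]
            if low <= pt < high:
                return mid + 1
            elif pt >= high:
                right = mid - 1  # larger sizes are earlier in the list
            else:
                left = mid + 1   # smaller sizes are later in the list

    # No range matched. Assumption: callers treat None as "not a title".
    return None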

Potential Optimization Suggestions:

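  • For better readability and maintainability, separate the font-size lookup from the paragraph inspection, for example by moving the range search into its own helper function.
  • If the number of title levels grows significantly, a dictionary will not help directly, because the lookup is by range rather than by exact key. A sorted list of boundaries searched with the standard-library bisect module gives logarithmic lookups.
  • Ensure that the logic handles edge cases correctly, such as a paragraph with more than one run, or a run whose font.size is None.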
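
As a sketch of what a faster lookup could look like with many title levels: a dictionary keyed on exact sizes cannot answer range queries, so this example swaps in a sorted boundary list searched with the standard-library bisect module. BOUNDS, LEVELS, and level_for_pt_bisect are hypothetical names, the boundaries are taken from the new title_font_list, and returning None for sizes outside every range is an assumption.

```python
from bisect import bisect_right

# Ascending boundaries taken from the new title_font_list: [30, 36) and [36, 100).
BOUNDS = [30, 36, 100]
# LEVELS[i] is the title level for sizes in [BOUNDS[i], BOUNDS[i + 1]).
LEVELS = [2, 1]


def level_for_pt_bisect(pt):
    """Hypothetical helper: map a font size in points to a title level, or None."""
    i = bisect_right(BOUNDS, pt)
    if i == 0 or i == len(BOUNDS):
        # Below the smallest boundary, or at/above the largest one.
        return None
    return LEVELS[i - 1]


for pt in (29.5, 30, 35.5, 36, 99, 100):
    print(pt, level_for_pt_bisect(pt))
```

With only two ranges the linear scan in the diff is just as good; this layout only pays off if the list grows, and it assumes the ranges are contiguous and non-overlapping.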
